aboutsummaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2022-11-09regulator: qcom_smd: Add PMR735a regulatorsKonrad Dybcio1-0/+2
PMR735a is already supported in the RPMH regulator driver, but there are cases where it's bundled with SMD RPM SoCs. Port it over to qcom_smd-regulator to enable usage in such cases. Signed-off-by: Konrad Dybcio <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2022-11-09KVM: replace direct irq.h inclusionPaolo Bonzini1-0/+2
virt/kvm/irqchip.c is including "irq.h" from the arch-specific KVM source directory (i.e. not from arch/*/include) for the sole purpose of retrieving irqchip_in_kernel. Making the function inline in a header that is already included, such as asm/kvm_host.h, is not possible because it needs to look at struct kvm which is defined after asm/kvm_host.h is included. So add a kvm_arch_irqchip_in_kernel non-inline function; irqchip_in_kernel() is only performance critical on arm64 and x86, and the non-inline function is enough on all other architectures. irq.h can then be deleted from all architectures except x86. Signed-off-by: Paolo Bonzini <[email protected]>
2022-11-09kvm: Add interruptible flag to __gfn_to_pfn_memslot()Peter Xu1-2/+2
Add a new "interruptible" flag showing that the caller is willing to be interrupted by signals during the __gfn_to_pfn_memslot() request. Wire it up with a FOLL_INTERRUPTIBLE flag that we've just introduced. This prepares KVM to be able to respond to SIGUSR1 (for QEMU that's the SIGIPI) even during e.g. handling an userfaultfd page fault. No functional change intended. Signed-off-by: Peter Xu <[email protected]> Reviewed-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-11-09kvm: Add KVM_PFN_ERR_SIGPENDINGPeter Xu1-0/+10
Add a new pfn error to show that we've got a pending signal to handle during hva_to_pfn_slow() procedure (of -EINTR retval). Signed-off-by: Peter Xu <[email protected]> Reviewed-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-11-09mm/gup: Add FOLL_INTERRUPTIBLEPeter Xu1-0/+1
We have had FAULT_FLAG_INTERRUPTIBLE but it was never applied to GUPs. One issue with it is that not all GUP paths are able to handle signal delivers besides SIGKILL. That's not ideal for the GUP users who are actually able to handle these cases, like KVM. KVM uses GUP extensively on faulting guest pages, during which we've got existing infrastructures to retry a page fault at a later time. Allowing the GUP to be interrupted by generic signals can make KVM related threads to be more responsive. For examples: (1) SIGUSR1: which QEMU/KVM uses to deliver an inter-process IPI, e.g. when the admin issues a vm_stop QMP command, SIGUSR1 can be generated to kick the vcpus out of kernel context immediately, (2) SIGINT: which can be used with interactive hypervisor users to stop a virtual machine with Ctrl-C without any delays/hangs, (3) SIGTRAP: which grants GDB capability even during page faults that are stuck for a long time. Normally hypervisor will be able to receive these signals properly, but not if we're stuck in a GUP for a long time for whatever reason. It happens easily with a stucked postcopy migration when e.g. a network temp failure happens, then some vcpu threads can hang death waiting for the pages. With the new FOLL_INTERRUPTIBLE, we can allow GUP users like KVM to selectively enable the ability to trap these signals. Reviewed-by: John Hubbard <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Peter Xu <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-11-09bug: introduce ASSERT_STRUCT_OFFSETMaxim Levitsky1-0/+9
ASSERT_STRUCT_OFFSET allows to assert during the build of the kernel that a field in a struct have an expected offset. KVM used to have such macro, but there is almost nothing KVM specific in it so move it to build_bug.h, so that it can be used in other places in KVM. Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-11-09rcu: Implement lockdep_rcu_enabled for !CONFIG_DEBUG_LOCK_ALLOCJohn Ogness1-0/+5
Provide an implementation for debug_lockdep_rcu_enabled() when CONFIG_DEBUG_LOCK_ALLOC is not enabled. This allows code to check if rcu lockdep debugging is available without needing an extra check if CONFIG_DEBUG_LOCK_ALLOC is enabled. Signed-off-by: John Ogness <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2022-11-09driver core: class: make namespace and get_ownership take const *Greg Kroah-Hartman1-2/+2
The callbacks in struct class namespace() and get_ownership() do not modify the struct device passed to them, so mark the pointer as constant and fix up all callbacks in the kernel to have the correct function signature. This helps make it more obvious what calls and callbacks do, and do not, modify structures passed to them. Cc: "Rafael J. Wysocki" <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2022-11-09Merge tag 'rxrpc-next-20221108' of ↵David S. Miller2-1/+4
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs rxrpc changes David Howells says: ==================== rxrpc: Increasing SACK size and moving away from softirq, part 1 AF_RXRPC has some issues that need addressing: (1) The SACK table has a maximum capacity of 255, but for modern networks that isn't sufficient. This is hard to increase in the upstream code because of the way the application thread is coupled to the softirq and retransmission side through a ring buffer. Adjustments to the rx protocol allows a capacity of up to 8192, and having a ring sufficiently large to accommodate that would use an excessive amount of memory as this is per-call. (2) Processing ACKs in softirq mode causes the ACKs get conflated, with only the most recent being considered. Whilst this has the upside that the retransmission algorithm only needs to deal with the most recent ACK, it causes DATA transmission for a call to be very bursty because DATA packets cannot be transmitted in softirq mode. Rather transmission must be delegated to either the application thread or a workqueue, so there tend to be sudden bursts of traffic for any particular call due to scheduling delays. (3) All crypto in a single call is done in series; however, each DATA packet is individually encrypted so encryption and decryption of large calls could be parallelised if spare CPU resources are available. This is the first of a number of sets of patches that try and address them. The overall aims of these changes include: (1) To get rid of the TxRx ring and instead pass the packets round in queues (eg. sk_buff_head). On the Tx side, each ACK packet comes with a SACK table that can be parsed as-is, so there's no particular need to maintain our own; we just have to refer to the ACK. On the Rx side, we do need to maintain a SACK table with one bit per entry - but only if packets go missing - and we don't want to have to perform a complex transformation to get the information into an ACK packet. (2) To try and move almost all processing of received packets out of the softirq handler and into a high-priority kernel I/O thread. Only the transferral of packets would be left there. I would still use the encap_rcv hook to receive packets as there's a noticeable performance drop from letting the UDP socket put the packets into its own queue and then getting them out of there. (3) To make the I/O thread also do all the transmission. The app thread would be responsible for packaging the data into packets and then buffering them for the I/O thread to transmit. This would make it easier for the app thread to run ahead of the I/O thread, and would mean the I/O thread is less likely to have to wait around for a new packet to come available for transmission. (4) To logically partition the socket/UAPI/KAPI side of things from the I/O side of things. The local endpoint, connection, peer and call objects would belong to the I/O side. The socket side would not then touch the private internals of calls and suchlike and would not change their states. It would only look at the send queue, receive queue and a way to pass a message to cause an abort. (5) To remove as much locking, synchronisation, barriering and atomic ops as possible from the I/O side. Exclusion would be achieved by limiting modification of state to the I/O thread only. Locks would still need to be used in communication with the UDP socket and the AF_RXRPC socket API. (6) To provide crypto offload kernel threads that, when there's slack in the system, can see packets that need crypting and provide parallelisation in dealing with them. (7) To remove the use of system timers. Since each timer would then send a poke to the I/O thread, which would then deal with it when it had the opportunity, there seems no point in using system timers if, instead, a list of timeouts can be sensibly consulted. An I/O thread only then needs to schedule with a timeout when it is idle. (8) To use zero-copy sendmsg to send packets. This would make use of the I/O thread being the sole transmitter on the socket to manage the dead-reckoning sequencing of the completion notifications. There is a problem with zero-copy, though: the UDP socket doesn't handle running out of option memory very gracefully. With regard to this first patchset, the changes made include: (1) Some fixes, including a fallback for proc_create_net_single_write(), setting ack.bufferSize to 0 in ACK packets and a fix for rxrpc congestion management, which shouldn't be saving the cwnd value between calls. (2) Improvements in rxrpc tracepoints, including splitting the timer tracepoint into a set-timer and a timer-expired trace. (3) Addition of a new proc file to display some stats. (4) Some code cleanups, including removing some unused bits and unnecessary header inclusions. (5) A change to the recently added UDP encap_err_rcv hook so that it has the same signature as {ip,ipv6}_icmp_error(), and then just have rxrpc point its UDP socket's hook directly at those. (6) Definition of a new struct, rxrpc_txbuf, that is used to hold transmissible packets of DATA and ACK type in a single 2KiB block rather than using an sk_buff. This allows the buffer to be on a number of queues simultaneously more easily, and also guarantees that the entire block is in a single unit for zerocopy purposes and that the data payload is aligned for in-place crypto purposes. (7) ACK txbufs are allocated at proposal and queued for later transmission rather than being stored in a single place in the rxrpc_call struct, which means only a single ACK can be pending transmission at a time. The queue is then drained at various points. This allows the ACK generation code to be simplified. (8) The Rx ring buffer is removed. When a jumbo packet is received (which comprises a number of ordinary DATA packets glued together), it used to be pointed to by the ring multiple times, with an annotation in a side ring indicating which subpacket was in that slot - but this is no longer possible. Instead, the packet is cloned once for each subpacket, barring the last, and the range of data is set in the skb private area. This makes it easier for the subpackets in a jumbo packet to be decrypted in parallel. (9) The Tx ring buffer is removed. The side annotation ring that held the SACK information is also removed. Instead, in the event of packet loss, the SACK data attached an ACK packet is parsed. (10) Allocate an skcipher request when needed in the rxkad security class rather than caching one in the rxrpc_call struct. This deals with a race between externally-driven call disconnection getting rid of the skcipher request and sendmsg/recvmsg trying to use it because they haven't seen the completion yet. This is also needed to support parallelisation as the skcipher request cannot be used by two or more threads simultaneously. (11) Call udp_sendmsg() and udpv6_sendmsg() directly rather than going through kernel_sendmsg() so that we can provide our own iterator (zerocopy explicitly doesn't work with a KVEC iterator). This also lets us avoid the overhead of the security hook. ==================== Signed-off-by: David S. Miller <[email protected]>
2022-11-09net/core: Allow live renaming when an interface is upAndy Ren1-3/+1
Allow a network interface to be renamed when the interface is up. As described in the netconsole documentation [1], when netconsole is used as a built-in, it will bring up the specified interface as soon as possible. As a result, user space will not be able to rename the interface since the kernel disallows renaming of interfaces that are administratively up unless the 'IFF_LIVE_RENAME_OK' private flag was set by the kernel. The original solution [2] to this problem was to add a new parameter to the netconsole configuration parameters that allows renaming of the interface used by netconsole while it is administratively up. However, during the discussion that followed, it became apparent that we have no reason to keep the current restriction and instead we should allow user space to rename interfaces regardless of their administrative state: 1. The restriction was put in place over 20 years ago when renaming was only possible via IOCTL and before rtnetlink started notifying user space about such changes like it does today. 2. The 'IFF_LIVE_RENAME_OK' flag was added over 3 years ago in version 5.2 and no regressions were reported. 3. In-kernel listeners to 'NETDEV_CHANGENAME' do not seem to care about the administrative state of interface. Therefore, allow user space to rename running interfaces by removing the restriction and the associated 'IFF_LIVE_RENAME_OK' flag. Help in possible triage by emitting a message to the kernel log that an interface was renamed while UP. [1] https://www.kernel.org/doc/Documentation/networking/netconsole.rst [2] https://lore.kernel.org/netdev/[email protected]/ Signed-off-by: Andy Ren <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-11-09gpiolib: remove devm_fwnode_get_[index_]gpiod_from_child()Dmitry Torokhov1-21/+0
Now that there are no more users of these APIs in the kernel we can remove them. Signed-off-by: Dmitry Torokhov <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Reviewed-by: Linus Walleij <[email protected]> Signed-off-by: Bartosz Golaszewski <[email protected]>
2022-11-09tty: Convert tty_buffer flags to boolIlpo Järvinen2-6/+3
The struct tty_buffer has flags which is only used for storing TTYB_NORMAL. There is also a few quite confusing operations for checking the presense of TTYB_NORMAL. Simplify things by converting flags to bool. Despite the name remaining the same, the meaning of "flags" is altered slightly by this change. Previously it referred to flags of the buffer (only TTYB_NORMAL being used as a flag). After this change, flags tell whether the buffer contains/should be allocated with flags array along with character data array. It is much more suitable name that TTYB_NORMAL was for this purpose, thus the name remains. Signed-off-by: Ilpo Järvinen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2022-11-09efi: libstub: Merge zboot decompressor with the ordinary stubArd Biesheuvel1-1/+0
Even though our EFI zboot decompressor is pedantically spec compliant and idiomatic for EFI image loaders, calling LoadImage() and StartImage() for the nested image is a bit of a burden. Not only does it create workflow issues for the distros (as both the inner and outer PE/COFF images need to be signed for secure boot), it also copies the image around in memory numerous times: - first, the image is decompressed into a buffer; - the buffer is consumed by LoadImage(), which copies the sections into a newly allocated memory region to hold the executable image; - once the EFI stub is invoked by StartImage(), it will also move the image in memory in case of KASLR, mirrored memory or if the image must execute from a certain a priori defined address. There are only two EFI spec compliant ways to load code into memory and execute it: - use LoadImage() and StartImage(), - call ExitBootServices() and take ownership of the entire system, after which anything goes. Given that the EFI zboot decompressor always invokes the EFI stub, and given that both are built from the same set of objects, let's merge the two, so that we can avoid LoadImage()/StartImage but still load our image into memory without breaking the above rules. This also means we can decompress the image directly into its final location, which could be randomized or meet other platform specific constraints that LoadImage() does not know how to adhere to. It also means that, even if the encapsulated image still has the EFI stub incorporated as well, it does not need to be signed for secure boot when wrapping it in the EFI zboot decompressor. In the future, we might decide to retire the EFI stub attached to the decompressed image, but for the time being, they can happily coexist. Signed-off-by: Ard Biesheuvel <[email protected]>
2022-11-09efi: libstub: Move screen_info handling to common codeArd Biesheuvel1-1/+1
Currently, arm64, RISC-V and LoongArch rely on the fact that struct screen_info can be accessed directly, due to the fact that the EFI stub and the core kernel are part of the same image. This will change after a future patch, so let's ensure that the screen_info handling is able to deal with this, by adopting the arm32 approach of passing it as a configuration table. While at it, switch to ACPI reclaim memory to hold the screen_info data, which is more appropriate for this kind of allocation. Signed-off-by: Ard Biesheuvel <[email protected]>
2022-11-08swap: add a limit for readahead page-cluster valueKairui Song1-0/+1
Currenty there is no upper limit for /proc/sys/vm/page-cluster, and it's a bit shift value, so it could result in overflow of the 32-bit integer. Add a reasonable upper limit for it, read-in at most 2**31 pages, which is a large enough value for readahead. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kairui Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08mm/hwpoison: introduce per-memory_block hwpoison counterNaoya Horiguchi2-1/+22
Currently PageHWPoison flag does not behave well when experiencing memory hotremove/hotplug. Any data field in struct page is unreliable when the associated memory is offlined, and the current mechanism can't tell whether a memory block is onlined because a new memory devices is installed or because previous failed offline operations are undone. Especially if there's a hwpoisoned memory, it's unclear what the best option is. So introduce a new mechanism to make struct memory_block remember that a memory block has hwpoisoned memory inside it. And make any online event fail if the onlining memory block contains hwpoison. struct memory_block is freed and reallocated over ACPI-based hotremove/hotplug, but not over sysfs-based hotremove/hotplug. So the new counter can distinguish these cases. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Naoya Horiguchi <[email protected]> Reported-by: kernel test robot <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Jane Chu <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08mm/hwpoison: pass pfn to num_poisoned_pages_*()Naoya Horiguchi1-2/+2
No functional change. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Naoya Horiguchi <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Jane Chu <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08mm/hwpoison: move definitions of num_poisoned_pages_* to memory-failure.cNaoya Horiguchi2-22/+7
These interfaces will be used by drivers/base/memory.c by later patch, so as a preparatory work move them to more common header file visible to the file. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Naoya Horiguchi <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Jane Chu <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned ↵Naoya Horiguchi2-6/+10
hugepage Patch series "mm, hwpoison: improve handling workload related to hugetlb and memory_hotplug", v7. This patchset tries to solve the issue among memory_hotplug, hugetlb and hwpoison. In this patchset, memory hotplug handles hwpoison pages like below: - hwpoison pages should not prevent memory hotremove, - memory block with hwpoison pages should not be onlined. This patch (of 4): HWPoisoned page is not supposed to be accessed once marked, but currently such accesses can happen during memory hotremove because do_migrate_range() can be called before dissolve_free_huge_pages() is called. Clear HPageMigratable for hwpoisoned hugepages to prevent them from being migrated. This should be done in hugetlb_lock to avoid race against isolate_hugetlb(). get_hwpoison_huge_page() needs to have a flag to show it's called from unpoison to take refcount of hwpoisoned hugepages, so add it. [[email protected]: remove TestClearHPageMigratable and reduce to test and clear separately] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Naoya Horiguchi <[email protected]> Reported-by: Miaohe Lin <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Jane Chu <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08compiler-gcc: document minimum version for `__no_sanitize_coverage__`Miguel Ojeda1-0/+3
The attribute was added in GCC 12.1. This will simplify future cleanups, and is closer to what we do in `compiler_attributes.h`. Link: https://godbolt.org/z/MGbT76j6G Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miguel Ojeda <[email protected]> Acked-by: Marco Elver <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Dan Li <[email protected]> Cc: Kees Cook <[email protected]> Cc: Kumar Kartikeya Dwivedi <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Uros Bizjak <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08compiler-gcc: remove attribute support check for `__no_sanitize_undefined__`Miguel Ojeda1-4/+0
The attribute was added in GCC 4.9, while the minimum GCC version supported by the kernel is GCC 5.1. Therefore, remove the check. Link: https://godbolt.org/z/GrMeo6fYr Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miguel Ojeda <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Dan Li <[email protected]> Cc: Kees Cook <[email protected]> Cc: Kumar Kartikeya Dwivedi <[email protected]> Cc: Marco Elver <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Uros Bizjak <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08compiler-gcc: remove attribute support check for `__no_sanitize_thread__`Miguel Ojeda1-1/+1
The attribute was added in GCC 5.1, which matches the minimum GCC version supported by the kernel. Therefore, remove the check. Link: https://godbolt.org/z/vbxKejxbx Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miguel Ojeda <[email protected]> Acked-by: Marco Elver <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Dan Li <[email protected]> Cc: Kees Cook <[email protected]> Cc: Kumar Kartikeya Dwivedi <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Uros Bizjak <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08compiler-gcc: remove attribute support check for `__no_sanitize_address__`Miguel Ojeda1-4/+0
The attribute was added in GCC 4.8, while the minimum GCC version supported by the kernel is GCC 5.1. Therefore, remove the check. Link: https://godbolt.org/z/84v56vcn8 Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miguel Ojeda <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Dan Li <[email protected]> Cc: Kees Cook <[email protected]> Cc: Kumar Kartikeya Dwivedi <[email protected]> Cc: Marco Elver <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Uros Bizjak <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08compiler-gcc: be consistent with underscores use for `no_sanitize`Miguel Ojeda1-4/+4
Patch series "compiler-gcc: be consistent with underscores use for `no_sanitize`". This patch (of 5): Other macros that define shorthands for attributes in e.g. `compiler_attributes.h` and elsewhere use underscores. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miguel Ojeda <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Cc: Marco Elver <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Dan Li <[email protected]> Cc: Kees Cook <[email protected]> Cc: Kumar Kartikeya Dwivedi <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Uros Bizjak <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08mm: remove FGP_HEADMatthew Wilcox (Oracle)1-3/+2
This is no longer used; all callers have been converted to use folios instead. Somehow this manages to save 11 bytes of text. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08memory: move hotplug memory notifier priority to same file for easy sortingLiu Shixin2-3/+7
The priority of hotplug memory callback is defined in a different file. And there are some callers using numbers directly. Collect them together into include/linux/memory.h for easy reading. This allows us to sort their priorities more intuitively without additional comments. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liu Shixin <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Waiman Long <[email protected]> Cc: zefan li <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08memory: remove unused register_hotmemory_notifier()Liu Shixin1-6/+0
Remove unused register_hotmemory_notifier(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liu Shixin <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Waiman Long <[email protected]> Cc: zefan li <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08mm: fix typo in struct vm_operations_struct commentsRolf Eike Beer1-1/+1
There is no eprotect(), so I assume this is about mprotect(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Rolf Eike Beer <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08mm/hugetlb: add folio_hstate()Sidhartha Kumar1-2/+12
Helper function to retrieve hstate information from a hugetlb folio. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Sidhartha Kumar <[email protected]> Reported-by: kernel test robot <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Colin Cross <[email protected]> Cc: David Howells <[email protected]> Cc: "Eric W . Biederman" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Muchun Song <[email protected]> Cc: Peter Xu <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: William Kucharski <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08hugetlbfs: convert hugetlb_delete_from_page_cache() to use foliosSidhartha Kumar1-1/+0
Remove the last caller of delete_from_page_cache() by converting the code to its folio equivalent. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Sidhartha Kumar <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Colin Cross <[email protected]> Cc: David Howells <[email protected]> Cc: "Eric W . Biederman" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Muchun Song <[email protected]> Cc: Peter Xu <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: William Kucharski <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08mm/hugetlb: add hugetlb_folio_subpool() helpersSidhartha Kumar1-2/+13
Allow hugetlbfs_migrate_folio to check and read subpool information by passing in a folio. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Sidhartha Kumar <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Colin Cross <[email protected]> Cc: David Howells <[email protected]> Cc: "Eric W . Biederman" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: kernel test robot <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Muchun Song <[email protected]> Cc: Peter Xu <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: William Kucharski <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08mm: add private field of first tail to struct page and struct folioSidhartha Kumar1-0/+14
Allow struct folio to store hugetlb metadata that is contained in the private field of the first tail page. On 32-bit, _private_1 aligns with page[1].private. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Sidhartha Kumar <[email protected]> Acked-by: Mike Kravetz <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Colin Cross <[email protected]> Cc: David Howells <[email protected]> Cc: "Eric W . Biederman" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: kernel test robot <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Muchun Song <[email protected]> Cc: Peter Xu <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: William Kucharski <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08mm/hugetlb: add folio support to hugetlb specific flag macrosSidhartha Kumar1-0/+24
Patch series "begin converting hugetlb code to folios", v4. This patch series starts the conversion of the hugetlb code to operate on struct folios rather than struct pages. This removes the ambiguitiy of whether functions are operating on head pages, tail pages of compound pages, or base pages. This series passes the linux test project hugetlb test cases. Patch 1 adds hugeltb specific page macros that can operate on folios. Patch 2 adds the private field of the first tail page to struct page. For 32-bit, _private_1 alinging with page[1].private was confirmed by using pahole. Patch 3 introduces hugetlb subpool helper functions which operate on struct folios. These patches were tested using the hugepage-mmap.c selftest along with the migratepages command. Patch 4 converts hugetlb_delete_from_page_cache() to use folios. Patch 5 adds a folio_hstate() function to get hstate information from a folio and adds a user of folio_hstate(). Bpftrace was used to track time spent in the free_huge_pages function during the ltp test cases as it is a caller of the hugetlb subpool functions. From the histogram, the performance is similar before and after the patch series. Time spent in 'free_huge_page' 6.0.0-rc2.master.20220823 @nsecs: [256, 512) 14770 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ |@@@@@@@@@@@@@@@@@@@@@@@@@ | [512, 1K) 155 | | [1K, 2K) 169 | | [2K, 4K) 50 | | [4K, 8K) 14 | | [8K, 16K) 3 | | [16K, 32K) 3 | | 6.0.0-rc2.master.20220823 + patch series @nsecs: [256, 512) 13678 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ | |@@@@@@@@@@@@@@@@@@@@@@@@@ | [512, 1K) 142 | | [1K, 2K) 199 | | [2K, 4K) 44 | | [4K, 8K) 13 | | [8K, 16K) 4 | | [16K, 32K) 1 | | This patch (of 5): Allow the macros which test, set, and clear hugetlb specific page flags to take a hugetlb folio as an input. The macrros are generated as folio_{test, set, clear}_hugetlb_{restore_reserve, migratable, temporary, freed, vmemmap_optimized, raw_hwp_unreliable}. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Sidhartha Kumar <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Reviewed-by: Muchun Song <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Colin Cross <[email protected]> Cc: David Howells <[email protected]> Cc: "Eric W . Biederman" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: kernel test robot <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Peter Xu <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: William Kucharski <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08mm: vmscan: make rotations a secondary factor in balancing anon vs fileJohannes Weiner1-2/+3
We noticed a 2% webserver throughput regression after upgrading from 5.6. This could be tracked down to a shift in the anon/file reclaim balance (confirmed with swappiness) that resulted in worse reclaim efficiency and thus more kswapd activity for the same outcome. The change that exposed the problem is aae466b0052e ("mm/swap: implement workingset detection for anonymous LRU"). By qualifying swapins based on their refault distance, it lowered the cost of anon reclaim in this workload, in turn causing (much) more anon scanning than before. Scanning the anon list is more expensive due to the higher ratio of mmapped pages that may rotate during reclaim, and so the result was an increase in %sys time. Right now, rotations aren't considered a cost when balancing scan pressure between LRUs. We can end up with very few file refaults putting all the scan pressure on hot anon pages that are rotated en masse, don't get reclaimed, and never push back on the file LRU again. We still only reclaim file cache in that case, but we burn a lot CPU rotating anon pages. It's "fair" from an LRU age POV, but doesn't reflect the real cost it imposes on the system. Consider rotations as a secondary factor in balancing the LRUs. This doesn't attempt to make a precise comparison between IO cost and CPU cost, it just says: if reloads are about comparable between the lists, or rotations are overwhelmingly different, adjust for CPU work. This fixed the regression on our webservers. It has since been deployed to the entire Meta fleet and hasn't caused any problems. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Cc: Rik van Riel <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08hugetlb: simplify hugetlb handling in follow_page_maskMike Kravetz1-42/+8
During discussions of this series [1], it was suggested that hugetlb handling code in follow_page_mask could be simplified. At the beginning of follow_page_mask, there currently is a call to follow_huge_addr which 'may' handle hugetlb pages. ia64 is the only architecture which provides a follow_huge_addr routine that does not return error. Instead, at each level of the page table a check is made for a hugetlb entry. If a hugetlb entry is found, a call to a routine associated with that entry is made. Currently, there are two checks for hugetlb entries at each page table level. The first check is of the form: if (p?d_huge()) page = follow_huge_p?d(); the second check is of the form: if (is_hugepd()) page = follow_huge_pd(). We can replace these checks, as well as the special handling routines such as follow_huge_p?d() and follow_huge_pd() with a single routine to handle hugetlb vmas. A new routine hugetlb_follow_page_mask is called for hugetlb vmas at the beginning of follow_page_mask. hugetlb_follow_page_mask will use the existing routine huge_pte_offset to walk page tables looking for hugetlb entries. huge_pte_offset can be overwritten by architectures, and already handles special cases such as hugepd entries. [1] https://lore.kernel.org/linux-mm/[email protected]/ [[email protected]: remove vma (pmd sharing) per Peter] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: remove left over hugetlb_vma_unlock_read()] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Kravetz <[email protected]> Suggested-by: David Hildenbrand <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Baolin Wang <[email protected]> Tested-by: Baolin Wang <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Muchun Song <[email protected]> Cc: Naoya Horiguchi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08maple_tree: reorganize testing to restore module testingLiam Howlett1-0/+7
Along the development cycle, the testing code support for module/in-kernel compiles was removed. Restore this functionality by moving any internal API tests to the userspace side, as well as threading tests. Fix the lockdep issues and add a way to reduce memory usage so the tests can complete with KASAN + memleak detection. Make the tests work on 32 bit hosts where possible and detect 32 bit hosts in the radix test suite. [[email protected]: fix module export] [[email protected]: fix it some more] [[email protected]: fix compile warnings on 32bit build in check_find()] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liam R. Howlett <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08soc: mediatek: Add all settings to mtk_mmsys_ddp_dpi_fmt_config funcXinlei Lee1-0/+7
The difference between MT8186 and other ICs is that when modifying the output format, we need to modify the mmsys_base+0x400 register to take effect. So when setting the dpi output format, we need to call mtk_mmsys_ddp_dpi_fmt_config to set it to MT8186 synchronously. Commit a071e52f75d1 ("soc: mediatek: Add mmsys func to adapt to dpi output for MT8186") lacked some of the possible output formats and also had a wrong bitmask. Add the missing output formats and fix the bitmask. While at it, also update mtk_mmsys_ddp_dpi_fmt_config() to use generic formats, so that it is slightly easier to extend for other platforms. Fixes: a071e52f75d1 ("soc: mediatek: Add mmsys func to adapt to dpi output for MT8186") Signed-off-by: Xinlei Lee <[email protected]> Reviewed-by: AngeloGioacchino Del Regno <[email protected]> Reviewed-by: CK Hu <[email protected]> Reviewed-by: Nícolas F. R. A. Prado <[email protected]> Signed-off-by: Matthias Brugger <[email protected]>
2022-11-08PCI: Assign PCI domain IDs by ida_alloc()Pali Rohár1-0/+1
Replace assignment of PCI domain IDs from atomic_inc_return() to ida_alloc(). Use two IDAs, one for static domain allocations (those which are defined in device tree) and second for dynamic allocations (all other). During removal of root bus / host bridge, also release the domain ID. The released ID can be reused again, for example when dynamically loading and unloading native PCI host bridge drivers. This change also allows to mix static device tree assignment and dynamic by kernel as all static allocations are reserved in dynamic pool. [bhelgaas: set "err" if "bus->domain_nr < 0"] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Pali Rohár <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2022-11-08net: Change the udp encap_err_rcv to allow use of {ip,ipv6}_icmp_error()David Howells1-1/+2
Change the udp encap_err_rcv signature to match ip_icmp_error() and ipv6_icmp_error() so that those can be used from the called function and export them. Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected] cc: [email protected]
2022-11-08net, proc: Provide PROC_FS=n fallback for proc_create_net_single_write()David Howells1-0/+2
Provide a CONFIG_PROC_FS=n fallback for proc_create_net_single_write(). Also provide a fallback for proc_create_net_data_write(). Fixes: 564def71765c ("proc: Add a way to make network proc files writable") Reported-by: kernel test robot <[email protected]> Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected] cc: [email protected]
2022-11-08ethtool: linkstate: add a statistic for PHY down eventsJakub Kicinski2-0/+20
The previous attempt to augment carrier_down (see Link) was not met with much enthusiasm so let's do the simple thing of exposing what some devices already maintain. Add a common ethtool statistic for link going down. Currently users have to maintain per-driver mapping to extract the right stat from the vendor-specific ethtool -S stats. carrier_down does not fit the bill because it counts a lot of software related false positives. Add the statistic to the extended link state API to steer vendors towards implementing all of it. Implement for bnxt and all Linux-controlled PHYs. mlx5 and (possibly) enic also have a counter for this but I leave the implementation to their maintainers. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Michael Chan <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2022-11-08Merge git://linuxtv.org/sailus/media_tree into media_stageMauro Carvalho Chehab1-0/+3
* git://linuxtv.org/sailus/media_tree: (47 commits) media: i2c: ov4689: code cleanup media: ov9650: Drop platform data code path media: ov7670: Drop unused include media: ov2640: Drop legacy includes media: tc358746: add Toshiba TC358746 Parallel to CSI-2 bridge driver media: dt-bindings: add bindings for Toshiba TC358746 phy: dphy: add support to calculate the timing based on hs_clk_rate phy: dphy: refactor get_default_config v4l: subdev: Warn if disabling streaming failed, return success dw9768: Enable low-power probe on ACPI media: i2c: imx290: Replace GAIN control with ANALOGUE_GAIN media: i2c: imx290: Add crop selection targets support media: i2c: imx290: Factor out format retrieval to separate function media: i2c: imx290: Move registers with fixed value to init array media: i2c: imx290: Create controls for fwnode properties media: i2c: imx290: Implement HBLANK and VBLANK controls media: i2c: imx290: Split control initialization to separate function media: i2c: imx290: Fix max gain value media: i2c: imx290: Add exposure time control media: i2c: imx290: Define more register macros ...
2022-11-07mm/percpu: remove unused PERCPU_DYNAMIC_EARLY_SLOTSBaoquan He1-4/+3
Since commit 40064aeca35c ("percpu: replace area map allocator with bitmap"), there's no place to use PERCPU_DYNAMIC_EARLY_SLOTS. So clean it up. Signed-off-by: Baoquan He <[email protected]> Signed-off-by: Dennis Zhou <[email protected]>
2022-11-07net: remove explicit phylink_generic_validate() referencesRussell King (Oracle)1-0/+5
Virtually all conventional network drivers are now converted to use phylink_generic_validate() - only DSA drivers and fman_memac remain, so lets remove the necessity for network drivers to explicitly set this member, and default to phylink_generic_validate() when unset. This is possible as .validate must currently be set. Any remaining instances that have not been addressed by this patch can be fixed up later. Signed-off-by: Russell King (Oracle) <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-07mtd: nand: drop EXPORT_SYMBOL_GPL for nanddev_erase()Dario Binacchi1-1/+0
This function is only used within this module, so it is no longer necessary to use EXPORT_SYMBOL_GPL(). Signed-off-by: Dario Binacchi <[email protected]> Signed-off-by: Miquel Raynal <[email protected]> Link: https://lore.kernel.org/linux-mtd/[email protected]
2022-11-07arm_pmu: rework ACPI probingMark Rutland1-1/+0
The current ACPI PMU probing logic tries to associate PMUs with CPUs when the CPU is first brought online, in order to handle late hotplug, though PMUs are only registered during early boot, and so for late hotplugged CPUs this can only associate the CPU with an existing PMU. We tried to be clever and the have the arm_pmu_acpi_cpu_starting() callback allocate a struct arm_pmu when no matching instance is found, in order to avoid duplication of logic. However, as above this doesn't do anything useful for late hotplugged CPUs, and this requires us to allocate memory in an atomic context, which is especially problematic for PREEMPT_RT, as reported by Valentin and Pierre. This patch reworks the probing to detect PMUs for all online CPUs in the arm_pmu_acpi_probe() function, which is more aligned with how DT probing works. The arm_pmu_acpi_cpu_starting() callback only tries to associate CPUs with an existing arm_pmu instance, avoiding the problem of allocating in atomic context. Note that as we didn't previously register PMUs for late-hotplugged CPUs, this change doesn't result in a loss of existing functionality, though we will now warn when we cannot associate a CPU with a PMU. This change allows us to pull the hotplug callback registration into the arm_pmu_acpi_probe() function, as we no longer need the callbacks to be invoked shortly after probing the boot CPUs, and can register it without invoking the calls. For the moment the arm_pmu_acpi_init() initcall remains to register the SPE PMU, though in future this should probably be moved elsewhere (e.g. the arm64 ACPI init code), since this doesn't need to be tied to the regular CPU PMU code. Signed-off-by: Mark Rutland <[email protected]> Reported-by: Valentin Schneider <[email protected]> Link: https://lore.kernel.org/r/[email protected]/ Reported-by: Pierre Gondois <[email protected]> Link: https://lore.kernel.org/linux-arm-kernel/[email protected]/ Cc: Pierre Gondois <[email protected]> Cc: Valentin Schneider <[email protected]> Cc: Will Deacon <[email protected]> Reviewed-and-tested-by: Pierre Gondois <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2022-11-07ACPI: ARM Performance Monitoring Unit Table (APMT) initial supportBesar Wicaksono1-0/+19
ARM Performance Monitoring Unit Table describes the properties of PMU support in ARM-based system. The APMT table contains a list of nodes, each represents a PMU in the system that conforms to ARM CoreSight PMU architecture. The properties of each node include information required to access the PMU (e.g. MMIO base address, interrupt number) and also identification. For more detailed information, please refer to the specification below: * APMT: https://developer.arm.com/documentation/den0117/latest * ARM Coresight PMU: https://developer.arm.com/documentation/ihi0091/latest The initial support adds the detection of APMT table and generic infrastructure to create platform devices for ARM CoreSight PMUs. Similar to IORT the root pointer of APMT is preserved during runtime and each PMU platform device is given a pointer to the corresponding APMT node. Signed-off-by: Besar Wicaksono <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]> Reviewed-by: Sudeep Holla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2022-11-07can: dev: fix skb drop checkOliver Hartkopp1-0/+16
In commit a6d190f8c767 ("can: skb: drop tx skb if in listen only mode") the priv->ctrlmode element is read even on virtual CAN interfaces that do not create the struct can_priv at startup. This out-of-bounds read may lead to CAN frame drops for virtual CAN interfaces like vcan and vxcan. This patch mainly reverts the original commit and adds a new helper for CAN interface drivers that provide the required information in struct can_priv. Fixes: a6d190f8c767 ("can: skb: drop tx skb if in listen only mode") Reported-by: Dariusz Stojaczyk <[email protected]> Cc: Vincent Mailhol <[email protected]> Cc: Max Staudt <[email protected]> Signed-off-by: Oliver Hartkopp <[email protected]> Acked-by: Vincent Mailhol <[email protected]> Link: https://lore.kernel.org/all/[email protected] Cc: [email protected] # 6.0.x [mkl: patch pch_can, too] Signed-off-by: Marc Kleine-Budde <[email protected]>
2022-11-07net: mv643xx_eth: support MII/GMII/RGMII modes for KirkwoodDavid Yang1-0/+2
Support mode switch properly, which is not available before. If SoC has two Ethernet controllers, by setting both of them into MII mode, the first controller enters GMII mode, while the second controller is effectively disabled. This requires configuring (and maybe enabling) the second controller in the device tree, even though it cannot be used. Signed-off-by: David Yang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-11-04lsm: make security_socket_getpeersec_stream() sockptr_t safePaul Moore3-7/+10
Commit 4ff09db1b79b ("bpf: net: Change sk_getsockopt() to take the sockptr_t argument") made it possible to call sk_getsockopt() with both user and kernel address space buffers through the use of the sockptr_t type. Unfortunately at the time of conversion the security_socket_getpeersec_stream() LSM hook was written to only accept userspace buffers, and in a desire to avoid having to change the LSM hook the commit author simply passed the sockptr_t's userspace buffer pointer. Since the only sk_getsockopt() callers at the time of conversion which used kernel sockptr_t buffers did not allow SO_PEERSEC, and hence the security_socket_getpeersec_stream() hook, this was acceptable but also very fragile as future changes presented the possibility of silently passing kernel space pointers to the LSM hook. There are several ways to protect against this, including careful code review of future commits, but since relying on code review to catch bugs is a recipe for disaster and the upstream eBPF maintainer is "strongly against defensive programming", this patch updates the LSM hook, and all of the implementations to support sockptr_t and safely handle both user and kernel space buffers. Acked-by: Casey Schaufler <[email protected]> Acked-by: John Johansen <[email protected]> Signed-off-by: Paul Moore <[email protected]>