| Age | Commit message (Collapse) | Author | Files | Lines |
|
PMR735a is already supported in the RPMH regulator driver, but
there are cases where it's bundled with SMD RPM SoCs. Port it over
to qcom_smd-regulator to enable usage in such cases.
Signed-off-by: Konrad Dybcio <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
|
|
virt/kvm/irqchip.c is including "irq.h" from the arch-specific KVM source
directory (i.e. not from arch/*/include) for the sole purpose of retrieving
irqchip_in_kernel.
Making the function inline in a header that is already included,
such as asm/kvm_host.h, is not possible because it needs to look at
struct kvm which is defined after asm/kvm_host.h is included. So add a
kvm_arch_irqchip_in_kernel non-inline function; irqchip_in_kernel() is
only performance critical on arm64 and x86, and the non-inline function
is enough on all other architectures.
irq.h can then be deleted from all architectures except x86.
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Add the mask KVM_MSR_EXIT_REASON_VALID_MASK for the MSR exit reason
flags. This simplifies checks that validate these flags, and makes it
easier to introduce new flags in the future.
No functional change intended.
Signed-off-by: Aaron Lewis <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Add a new "interruptible" flag showing that the caller is willing to be
interrupted by signals during the __gfn_to_pfn_memslot() request. Wire it
up with a FOLL_INTERRUPTIBLE flag that we've just introduced.
This prepares KVM to be able to respond to SIGUSR1 (for QEMU that's the
SIGIPI) even during e.g. handling an userfaultfd page fault.
No functional change intended.
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Add a new pfn error to show that we've got a pending signal to handle
during hva_to_pfn_slow() procedure (of -EINTR retval).
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
We have had FAULT_FLAG_INTERRUPTIBLE but it was never applied to GUPs. One
issue with it is that not all GUP paths are able to handle signal delivers
besides SIGKILL.
That's not ideal for the GUP users who are actually able to handle these
cases, like KVM.
KVM uses GUP extensively on faulting guest pages, during which we've got
existing infrastructures to retry a page fault at a later time. Allowing
the GUP to be interrupted by generic signals can make KVM related threads
to be more responsive. For examples:
(1) SIGUSR1: which QEMU/KVM uses to deliver an inter-process IPI,
e.g. when the admin issues a vm_stop QMP command, SIGUSR1 can be
generated to kick the vcpus out of kernel context immediately,
(2) SIGINT: which can be used with interactive hypervisor users to stop a
virtual machine with Ctrl-C without any delays/hangs,
(3) SIGTRAP: which grants GDB capability even during page faults that are
stuck for a long time.
Normally hypervisor will be able to receive these signals properly, but not
if we're stuck in a GUP for a long time for whatever reason. It happens
easily with a stucked postcopy migration when e.g. a network temp failure
happens, then some vcpu threads can hang death waiting for the pages. With
the new FOLL_INTERRUPTIBLE, we can allow GUP users like KVM to selectively
enable the ability to trap these signals.
Reviewed-by: John Hubbard <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
ASSERT_STRUCT_OFFSET allows to assert during the build of
the kernel that a field in a struct have an expected offset.
KVM used to have such macro, but there is almost nothing KVM specific
in it so move it to build_bug.h, so that it can be used in other
places in KVM.
Signed-off-by: Maxim Levitsky <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Provide an implementation for debug_lockdep_rcu_enabled() when
CONFIG_DEBUG_LOCK_ALLOC is not enabled. This allows code to check
if rcu lockdep debugging is available without needing an extra
check if CONFIG_DEBUG_LOCK_ALLOC is enabled.
Signed-off-by: John Ogness <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
|
|
The callbacks in struct class namespace() and get_ownership() do not
modify the struct device passed to them, so mark the pointer as constant
and fix up all callbacks in the kernel to have the correct function
signature.
This helps make it more obvious what calls and callbacks do, and do not,
modify structures passed to them.
Cc: "Rafael J. Wysocki" <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
rxrpc changes
David Howells says:
====================
rxrpc: Increasing SACK size and moving away from softirq, part 1
AF_RXRPC has some issues that need addressing:
(1) The SACK table has a maximum capacity of 255, but for modern networks
that isn't sufficient. This is hard to increase in the upstream code
because of the way the application thread is coupled to the softirq
and retransmission side through a ring buffer. Adjustments to the rx
protocol allows a capacity of up to 8192, and having a ring
sufficiently large to accommodate that would use an excessive amount
of memory as this is per-call.
(2) Processing ACKs in softirq mode causes the ACKs get conflated, with
only the most recent being considered. Whilst this has the upside
that the retransmission algorithm only needs to deal with the most
recent ACK, it causes DATA transmission for a call to be very bursty
because DATA packets cannot be transmitted in softirq mode. Rather
transmission must be delegated to either the application thread or a
workqueue, so there tend to be sudden bursts of traffic for any
particular call due to scheduling delays.
(3) All crypto in a single call is done in series; however, each DATA
packet is individually encrypted so encryption and decryption of large
calls could be parallelised if spare CPU resources are available.
This is the first of a number of sets of patches that try and address them.
The overall aims of these changes include:
(1) To get rid of the TxRx ring and instead pass the packets round in
queues (eg. sk_buff_head). On the Tx side, each ACK packet comes with
a SACK table that can be parsed as-is, so there's no particular need
to maintain our own; we just have to refer to the ACK.
On the Rx side, we do need to maintain a SACK table with one bit per
entry - but only if packets go missing - and we don't want to have to
perform a complex transformation to get the information into an ACK
packet.
(2) To try and move almost all processing of received packets out of the
softirq handler and into a high-priority kernel I/O thread. Only the
transferral of packets would be left there. I would still use the
encap_rcv hook to receive packets as there's a noticeable performance
drop from letting the UDP socket put the packets into its own queue
and then getting them out of there.
(3) To make the I/O thread also do all the transmission. The app thread
would be responsible for packaging the data into packets and then
buffering them for the I/O thread to transmit. This would make it
easier for the app thread to run ahead of the I/O thread, and would
mean the I/O thread is less likely to have to wait around for a new
packet to come available for transmission.
(4) To logically partition the socket/UAPI/KAPI side of things from the
I/O side of things. The local endpoint, connection, peer and call
objects would belong to the I/O side. The socket side would not then
touch the private internals of calls and suchlike and would not change
their states. It would only look at the send queue, receive queue and
a way to pass a message to cause an abort.
(5) To remove as much locking, synchronisation, barriering and atomic ops
as possible from the I/O side. Exclusion would be achieved by
limiting modification of state to the I/O thread only. Locks would
still need to be used in communication with the UDP socket and the
AF_RXRPC socket API.
(6) To provide crypto offload kernel threads that, when there's slack in
the system, can see packets that need crypting and provide
parallelisation in dealing with them.
(7) To remove the use of system timers. Since each timer would then send
a poke to the I/O thread, which would then deal with it when it had
the opportunity, there seems no point in using system timers if,
instead, a list of timeouts can be sensibly consulted. An I/O thread
only then needs to schedule with a timeout when it is idle.
(8) To use zero-copy sendmsg to send packets. This would make use of the
I/O thread being the sole transmitter on the socket to manage the
dead-reckoning sequencing of the completion notifications. There is a
problem with zero-copy, though: the UDP socket doesn't handle running
out of option memory very gracefully.
With regard to this first patchset, the changes made include:
(1) Some fixes, including a fallback for proc_create_net_single_write(),
setting ack.bufferSize to 0 in ACK packets and a fix for rxrpc
congestion management, which shouldn't be saving the cwnd value
between calls.
(2) Improvements in rxrpc tracepoints, including splitting the timer
tracepoint into a set-timer and a timer-expired trace.
(3) Addition of a new proc file to display some stats.
(4) Some code cleanups, including removing some unused bits and
unnecessary header inclusions.
(5) A change to the recently added UDP encap_err_rcv hook so that it has
the same signature as {ip,ipv6}_icmp_error(), and then just have rxrpc
point its UDP socket's hook directly at those.
(6) Definition of a new struct, rxrpc_txbuf, that is used to hold
transmissible packets of DATA and ACK type in a single 2KiB block
rather than using an sk_buff. This allows the buffer to be on a
number of queues simultaneously more easily, and also guarantees that
the entire block is in a single unit for zerocopy purposes and that
the data payload is aligned for in-place crypto purposes.
(7) ACK txbufs are allocated at proposal and queued for later transmission
rather than being stored in a single place in the rxrpc_call struct,
which means only a single ACK can be pending transmission at a time.
The queue is then drained at various points. This allows the ACK
generation code to be simplified.
(8) The Rx ring buffer is removed. When a jumbo packet is received (which
comprises a number of ordinary DATA packets glued together), it used
to be pointed to by the ring multiple times, with an annotation in a
side ring indicating which subpacket was in that slot - but this is no
longer possible. Instead, the packet is cloned once for each
subpacket, barring the last, and the range of data is set in the skb
private area. This makes it easier for the subpackets in a jumbo
packet to be decrypted in parallel.
(9) The Tx ring buffer is removed. The side annotation ring that held the
SACK information is also removed. Instead, in the event of packet
loss, the SACK data attached an ACK packet is parsed.
(10) Allocate an skcipher request when needed in the rxkad security class
rather than caching one in the rxrpc_call struct. This deals with a
race between externally-driven call disconnection getting rid of the
skcipher request and sendmsg/recvmsg trying to use it because they
haven't seen the completion yet. This is also needed to support
parallelisation as the skcipher request cannot be used by two or more
threads simultaneously.
(11) Call udp_sendmsg() and udpv6_sendmsg() directly rather than going
through kernel_sendmsg() so that we can provide our own iterator
(zerocopy explicitly doesn't work with a KVEC iterator). This also
lets us avoid the overhead of the security hook.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Allow a network interface to be renamed when the interface
is up.
As described in the netconsole documentation [1], when netconsole is
used as a built-in, it will bring up the specified interface as soon as
possible. As a result, user space will not be able to rename the
interface since the kernel disallows renaming of interfaces that are
administratively up unless the 'IFF_LIVE_RENAME_OK' private flag was set
by the kernel.
The original solution [2] to this problem was to add a new parameter to
the netconsole configuration parameters that allows renaming of
the interface used by netconsole while it is administratively up.
However, during the discussion that followed, it became apparent that we
have no reason to keep the current restriction and instead we should
allow user space to rename interfaces regardless of their administrative
state:
1. The restriction was put in place over 20 years ago when renaming was
only possible via IOCTL and before rtnetlink started notifying user
space about such changes like it does today.
2. The 'IFF_LIVE_RENAME_OK' flag was added over 3 years ago in version
5.2 and no regressions were reported.
3. In-kernel listeners to 'NETDEV_CHANGENAME' do not seem to care about
the administrative state of interface.
Therefore, allow user space to rename running interfaces by removing the
restriction and the associated 'IFF_LIVE_RENAME_OK' flag. Help in
possible triage by emitting a message to the kernel log that an
interface was renamed while UP.
[1] https://www.kernel.org/doc/Documentation/networking/netconsole.rst
[2] https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Andy Ren <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Commit bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining
of some byteswap operations") added __always_inline to swab functions
and commit 283d75737837 ("uapi/linux/stddef.h: Provide __always_inline to
userspace headers") added a definition of __always_inline for use in
exported headers when the kernel's compiler.h is not available.
However, since swab.h does not include stddef.h, if the header soup does
not indirectly include it, the definition of __always_inline is missing,
resulting in a compilation failure, which was observed compiling the
perf tool using exported headers containing this commit:
In file included from /usr/include/linux/byteorder/little_endian.h:12:0,
from /usr/include/asm/byteorder.h:14,
from tools/include/uapi/linux/perf_event.h:20,
from perf.h:8,
from builtin-bench.c:18:
/usr/include/linux/swab.h:160:8: error: unknown type name `__always_inline'
static __always_inline __u16 __swab16p(const __u16 *p)
Fix this by replacing the inclusion of linux/compiler.h with
linux/stddef.h to ensure that we pick up that definition if required,
without relying on it's indirect inclusion. compiler.h is then included
indirectly, via stddef.h.
Fixes: 283d75737837 ("uapi/linux/stddef.h: Provide __always_inline to userspace headers")
Signed-off-by: Matt Redfearn <[email protected]>
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Reviewed-by: Petr Vaněk <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
|
|
Now that there are no more users of these APIs in the kernel we can
remove them.
Signed-off-by: Dmitry Torokhov <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>
|
|
The struct tty_buffer has flags which is only used for storing TTYB_NORMAL.
There is also a few quite confusing operations for checking the presense
of TTYB_NORMAL. Simplify things by converting flags to bool.
Despite the name remaining the same, the meaning of "flags" is altered
slightly by this change. Previously it referred to flags of the buffer
(only TTYB_NORMAL being used as a flag). After this change, flags tell
whether the buffer contains/should be allocated with flags array along
with character data array. It is much more suitable name that
TTYB_NORMAL was for this purpose, thus the name remains.
Signed-off-by: Ilpo Järvinen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Even though our EFI zboot decompressor is pedantically spec compliant
and idiomatic for EFI image loaders, calling LoadImage() and
StartImage() for the nested image is a bit of a burden. Not only does it
create workflow issues for the distros (as both the inner and outer
PE/COFF images need to be signed for secure boot), it also copies the
image around in memory numerous times:
- first, the image is decompressed into a buffer;
- the buffer is consumed by LoadImage(), which copies the sections into
a newly allocated memory region to hold the executable image;
- once the EFI stub is invoked by StartImage(), it will also move the
image in memory in case of KASLR, mirrored memory or if the image must
execute from a certain a priori defined address.
There are only two EFI spec compliant ways to load code into memory and
execute it:
- use LoadImage() and StartImage(),
- call ExitBootServices() and take ownership of the entire system, after
which anything goes.
Given that the EFI zboot decompressor always invokes the EFI stub, and
given that both are built from the same set of objects, let's merge the
two, so that we can avoid LoadImage()/StartImage but still load our
image into memory without breaking the above rules.
This also means we can decompress the image directly into its final
location, which could be randomized or meet other platform specific
constraints that LoadImage() does not know how to adhere to. It also
means that, even if the encapsulated image still has the EFI stub
incorporated as well, it does not need to be signed for secure boot when
wrapping it in the EFI zboot decompressor.
In the future, we might decide to retire the EFI stub attached to the
decompressed image, but for the time being, they can happily coexist.
Signed-off-by: Ard Biesheuvel <[email protected]>
|
|
Currently, arm64, RISC-V and LoongArch rely on the fact that struct
screen_info can be accessed directly, due to the fact that the EFI stub
and the core kernel are part of the same image. This will change after a
future patch, so let's ensure that the screen_info handling is able to
deal with this, by adopting the arm32 approach of passing it as a
configuration table. While at it, switch to ACPI reclaim memory to hold
the screen_info data, which is more appropriate for this kind of
allocation.
Signed-off-by: Ard Biesheuvel <[email protected]>
|
|
Currenty there is no upper limit for /proc/sys/vm/page-cluster, and it's a
bit shift value, so it could result in overflow of the 32-bit integer.
Add a reasonable upper limit for it, read-in at most 2**31 pages, which is
a large enough value for readahead.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kairui Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Currently PageHWPoison flag does not behave well when experiencing memory
hotremove/hotplug. Any data field in struct page is unreliable when the
associated memory is offlined, and the current mechanism can't tell
whether a memory block is onlined because a new memory devices is
installed or because previous failed offline operations are undone.
Especially if there's a hwpoisoned memory, it's unclear what the best
option is.
So introduce a new mechanism to make struct memory_block remember that a
memory block has hwpoisoned memory inside it. And make any online event
fail if the onlining memory block contains hwpoison. struct memory_block
is freed and reallocated over ACPI-based hotremove/hotplug, but not over
sysfs-based hotremove/hotplug. So the new counter can distinguish these
cases.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Naoya Horiguchi <[email protected]>
Reported-by: kernel test robot <[email protected]>
Reviewed-by: Miaohe Lin <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
No functional change.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Miaohe Lin <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
These interfaces will be used by drivers/base/memory.c by later patch, so
as a preparatory work move them to more common header file visible to the
file.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Miaohe Lin <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
hugepage
Patch series "mm, hwpoison: improve handling workload related to hugetlb
and memory_hotplug", v7.
This patchset tries to solve the issue among memory_hotplug, hugetlb and hwpoison.
In this patchset, memory hotplug handles hwpoison pages like below:
- hwpoison pages should not prevent memory hotremove,
- memory block with hwpoison pages should not be onlined.
This patch (of 4):
HWPoisoned page is not supposed to be accessed once marked, but currently
such accesses can happen during memory hotremove because
do_migrate_range() can be called before dissolve_free_huge_pages() is
called.
Clear HPageMigratable for hwpoisoned hugepages to prevent them from being
migrated. This should be done in hugetlb_lock to avoid race against
isolate_hugetlb().
get_hwpoison_huge_page() needs to have a flag to show it's called from
unpoison to take refcount of hwpoisoned hugepages, so add it.
[[email protected]: remove TestClearHPageMigratable and reduce to test and clear separately]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Naoya Horiguchi <[email protected]>
Reported-by: Miaohe Lin <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Reviewed-by: Miaohe Lin <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The attribute was added in GCC 12.1.
This will simplify future cleanups, and is closer to what we do
in `compiler_attributes.h`.
Link: https://godbolt.org/z/MGbT76j6G
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miguel Ojeda <[email protected]>
Acked-by: Marco Elver <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Dan Li <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Kumar Kartikeya Dwivedi <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Uros Bizjak <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The attribute was added in GCC 4.9, while the minimum GCC version
supported by the kernel is GCC 5.1.
Therefore, remove the check.
Link: https://godbolt.org/z/GrMeo6fYr
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miguel Ojeda <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Dan Li <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Kumar Kartikeya Dwivedi <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Uros Bizjak <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The attribute was added in GCC 5.1, which matches the minimum GCC version
supported by the kernel.
Therefore, remove the check.
Link: https://godbolt.org/z/vbxKejxbx
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miguel Ojeda <[email protected]>
Acked-by: Marco Elver <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Dan Li <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Kumar Kartikeya Dwivedi <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Uros Bizjak <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The attribute was added in GCC 4.8, while the minimum GCC version
supported by the kernel is GCC 5.1.
Therefore, remove the check.
Link: https://godbolt.org/z/84v56vcn8
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miguel Ojeda <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Dan Li <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Kumar Kartikeya Dwivedi <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Uros Bizjak <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "compiler-gcc: be consistent with underscores use for
`no_sanitize`".
This patch (of 5):
Other macros that define shorthands for attributes in e.g.
`compiler_attributes.h` and elsewhere use underscores.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miguel Ojeda <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Dan Li <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Kumar Kartikeya Dwivedi <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Uros Bizjak <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
This is no longer used; all callers have been converted to use folios
instead. Somehow this manages to save 11 bytes of text.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
This event is used in order to validate/debug a start address of freed VA,
number of currently outstanding and maximum allowed areas.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oleksiy Avramchenko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
It is for debug purposes to track number of freed vmap areas including a
range it occurs on.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oleksiy Avramchenko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "Add basic trace events for vmap/vmalloc (v2)", v2.
This small series add some basic trace events for the vmap/vmalloc code.
Since currently we lack any, sometimes it is hard to start debuging vmap
code if an issue is reported or occured.
For example https://lore.kernel.org/linux-mm/Y0p8BZIiDXLQbde%2F@pc636/T/
The final patch adds two reviewers for vmalloc code.
This patch (of 7):
It is for debug purposes and for validation of passed parameters.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oleksiy Avramchenko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The priority of hotplug memory callback is defined in a different file.
And there are some callers using numbers directly. Collect them together
into include/linux/memory.h for easy reading. This allows us to sort
their priorities more intuitively without additional comments.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liu Shixin <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: zefan li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Remove unused register_hotmemory_notifier().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liu Shixin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: zefan li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
There is no eprotect(), so I assume this is about mprotect().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Rolf Eike Beer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Helper function to retrieve hstate information from a hugetlb folio.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Reported-by: kernel test robot <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: David Howells <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: William Kucharski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Remove the last caller of delete_from_page_cache() by converting the code
to its folio equivalent.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: David Howells <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: William Kucharski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Allow hugetlbfs_migrate_folio to check and read subpool information by
passing in a folio.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: David Howells <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: William Kucharski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Allow struct folio to store hugetlb metadata that is contained in the
private field of the first tail page. On 32-bit, _private_1 aligns with
page[1].private.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Acked-by: Mike Kravetz <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: David Howells <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: William Kucharski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "begin converting hugetlb code to folios", v4.
This patch series starts the conversion of the hugetlb code to operate on
struct folios rather than struct pages. This removes the ambiguitiy of
whether functions are operating on head pages, tail pages of compound
pages, or base pages.
This series passes the linux test project hugetlb test cases.
Patch 1 adds hugeltb specific page macros that can operate on folios.
Patch 2 adds the private field of the first tail page to struct page. For
32-bit, _private_1 alinging with page[1].private was confirmed by using
pahole.
Patch 3 introduces hugetlb subpool helper functions which operate on
struct folios. These patches were tested using the hugepage-mmap.c
selftest along with the migratepages command.
Patch 4 converts hugetlb_delete_from_page_cache() to use folios.
Patch 5 adds a folio_hstate() function to get hstate information from a
folio and adds a user of folio_hstate().
Bpftrace was used to track time spent in the free_huge_pages function
during the ltp test cases as it is a caller of the hugetlb subpool
functions. From the histogram, the performance is similar before and
after the patch series.
Time spent in 'free_huge_page'
6.0.0-rc2.master.20220823
@nsecs:
[256, 512) 14770 |@@@@@@@@@@@@@@@@@@@@@@@@@@@
|@@@@@@@@@@@@@@@@@@@@@@@@@ |
[512, 1K) 155 | |
[1K, 2K) 169 | |
[2K, 4K) 50 | |
[4K, 8K) 14 | |
[8K, 16K) 3 | |
[16K, 32K) 3 | |
6.0.0-rc2.master.20220823 + patch series
@nsecs:
[256, 512) 13678 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
|@@@@@@@@@@@@@@@@@@@@@@@@@ |
[512, 1K) 142 | |
[1K, 2K) 199 | |
[2K, 4K) 44 | |
[4K, 8K) 13 | |
[8K, 16K) 4 | |
[16K, 32K) 1 | |
This patch (of 5):
Allow the macros which test, set, and clear hugetlb specific page flags to
take a hugetlb folio as an input. The macrros are generated as
folio_{test, set, clear}_hugetlb_{restore_reserve, migratable, temporary,
freed, vmemmap_optimized, raw_hwp_unreliable}.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: David Howells <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: William Kucharski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
We noticed a 2% webserver throughput regression after upgrading from 5.6.
This could be tracked down to a shift in the anon/file reclaim balance
(confirmed with swappiness) that resulted in worse reclaim efficiency and
thus more kswapd activity for the same outcome.
The change that exposed the problem is aae466b0052e ("mm/swap: implement
workingset detection for anonymous LRU"). By qualifying swapins based on
their refault distance, it lowered the cost of anon reclaim in this
workload, in turn causing (much) more anon scanning than before. Scanning
the anon list is more expensive due to the higher ratio of mmapped pages
that may rotate during reclaim, and so the result was an increase in %sys
time.
Right now, rotations aren't considered a cost when balancing scan pressure
between LRUs. We can end up with very few file refaults putting all the
scan pressure on hot anon pages that are rotated en masse, don't get
reclaimed, and never push back on the file LRU again. We still only
reclaim file cache in that case, but we burn a lot CPU rotating anon
pages. It's "fair" from an LRU age POV, but doesn't reflect the real cost
it imposes on the system.
Consider rotations as a secondary factor in balancing the LRUs. This
doesn't attempt to make a precise comparison between IO cost and CPU cost,
it just says: if reloads are about comparable between the lists, or
rotations are overwhelmingly different, adjust for CPU work.
This fixed the regression on our webservers. It has since been deployed
to the entire Meta fleet and hasn't caused any problems.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Johannes Weiner <[email protected]>
Cc: Rik van Riel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
During discussions of this series [1], it was suggested that hugetlb
handling code in follow_page_mask could be simplified. At the beginning
of follow_page_mask, there currently is a call to follow_huge_addr which
'may' handle hugetlb pages. ia64 is the only architecture which provides
a follow_huge_addr routine that does not return error. Instead, at each
level of the page table a check is made for a hugetlb entry. If a hugetlb
entry is found, a call to a routine associated with that entry is made.
Currently, there are two checks for hugetlb entries at each page table
level. The first check is of the form:
if (p?d_huge())
page = follow_huge_p?d();
the second check is of the form:
if (is_hugepd())
page = follow_huge_pd().
We can replace these checks, as well as the special handling routines such
as follow_huge_p?d() and follow_huge_pd() with a single routine to handle
hugetlb vmas.
A new routine hugetlb_follow_page_mask is called for hugetlb vmas at the
beginning of follow_page_mask. hugetlb_follow_page_mask will use the
existing routine huge_pte_offset to walk page tables looking for hugetlb
entries. huge_pte_offset can be overwritten by architectures, and already
handles special cases such as hugepd entries.
[1] https://lore.kernel.org/linux-mm/[email protected]/
[[email protected]: remove vma (pmd sharing) per Peter]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: remove left over hugetlb_vma_unlock_read()]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Kravetz <[email protected]>
Suggested-by: David Hildenbrand <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Tested-by: Baolin Wang <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Along the development cycle, the testing code support for module/in-kernel
compiles was removed. Restore this functionality by moving any internal
API tests to the userspace side, as well as threading tests. Fix the
lockdep issues and add a way to reduce memory usage so the tests can
complete with KASAN + memleak detection. Make the tests work on 32 bit
hosts where possible and detect 32 bit hosts in the radix test suite.
[[email protected]: fix module export]
[[email protected]: fix it some more]
[[email protected]: fix compile warnings on 32bit build in check_find()]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liam R. Howlett <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit fix from Paul Moore:
"A small audit patch to fix an instance of undefined behavior in a
shift operator caused when shifting a signed value too far, the same
case as the lsm patch merged previously.
While the fix is trivial and I can't imagine it causing a problem in a
backport, I'm not explicitly marking it for stable on the off chance
that there is some system out there which is relying on some wonky
unexpected behavior which this patch could break; *if* it does break,
IMO it's better that to happen in a minor or -rcX release and not in a
stable backport"
* tag 'audit-pr-20221107' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: fix undefined behavior in bit shift for AUDIT_BIT
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm fix from Paul Moore:
"A small capability patch to fix an instance of undefined behavior in a
shift operator caused when shifting a signed value too far.
While the fix is trivial and I can't imagine it causing a problem in a
backport, I'm not explicitly marking it for stable on the off chance
that there is some system out there which is relying on some wonky
unexpected behavior which this patch could break; *if* it does break,
IMO it's better that to happen in a minor or -rcX release and not in a
stable backport"
* tag 'lsm-pr-20221107' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
capabilities: fix undefined behavior in bit shift for CAP_TO_MASK
|
|
ACPICA commit 28fc163aa29e208678d901d98bb9030b775521b3
Version 20221020.
Link: https://github.com/acpica/acpica/commit/28fc163a
Signed-off-by: Bob Moore <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
This patch adds tracepoints for send and recv cases of dlm messages and
dlm rcom messages. In case of send and dlm message we add the dlm rsb
resource name this dlm messages belongs to. This has the advantage to
follow dlm messages on a per lock basis. In case of recv message the
resource name can be extracted by follow the send message sequence
number.
The dlm message DLM_MSG_PURGE doesn't belong to a lock request and will
not set the resource name in a dlm_message trace. The same for all rcom
messages.
There is additional handling required for this debugging functionality
which is tried to be small as possible. Also the midcomms layer gets
aware of lock resource names, for now this is required to make a
connection between sequence number and lock resource names. It is for
debugging purpose only.
Signed-off-by: Alexander Aring <[email protected]>
Signed-off-by: David Teigland <[email protected]>
|
|
|
|
The difference between MT8186 and other ICs is that when modifying the
output format, we need to modify the mmsys_base+0x400 register to take
effect. So when setting the dpi output format, we need to call
mtk_mmsys_ddp_dpi_fmt_config to set it to MT8186 synchronously.
Commit a071e52f75d1 ("soc: mediatek: Add mmsys func to adapt to dpi
output for MT8186") lacked some of the possible output formats and also
had a wrong bitmask.
Add the missing output formats and fix the bitmask.
While at it, also update mtk_mmsys_ddp_dpi_fmt_config() to use generic
formats, so that it is slightly easier to extend for other platforms.
Fixes: a071e52f75d1 ("soc: mediatek: Add mmsys func to adapt to dpi output for MT8186")
Signed-off-by: Xinlei Lee <[email protected]>
Reviewed-by: AngeloGioacchino Del Regno <[email protected]>
Reviewed-by: CK Hu <[email protected]>
Reviewed-by: Nícolas F. R. A. Prado <[email protected]>
Signed-off-by: Matthias Brugger <[email protected]>
|
|
Commit d4c639990036 ("vmlinux.lds.h: Avoid orphan section with !SMP")
fixed an orphan section warning by adding the '.data..decrypted' section
to the linker script under the PERCPU_DECRYPTED_SECTION define but that
placement introduced a panic with !SMP, as the percpu sections are not
instantiated with that configuration so attempting to access variables
defined with DEFINE_PER_CPU_DECRYPTED() will result in a page fault.
Move the '.data..decrypted' section to the DATA_MAIN define so that the
variables in it are properly instantiated at boot time with
CONFIG_SMP=n.
Cc: [email protected]
Fixes: d4c639990036 ("vmlinux.lds.h: Avoid orphan section with !SMP")
Link: https://lore.kernel.org/[email protected]/
Debugged-by: Ard Biesheuvel <[email protected]>
Reported-by: Zhao Wenhui <[email protected]>
Tested-by: xiafukun <[email protected]>
Signed-off-by: Nathan Chancellor <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Replace assignment of PCI domain IDs from atomic_inc_return() to
ida_alloc().
Use two IDAs, one for static domain allocations (those which are defined in
device tree) and second for dynamic allocations (all other).
During removal of root bus / host bridge, also release the domain ID. The
released ID can be reused again, for example when dynamically loading and
unloading native PCI host bridge drivers.
This change also allows to mix static device tree assignment and dynamic by
kernel as all static allocations are reserved in dynamic pool.
[bhelgaas: set "err" if "bus->domain_nr < 0"]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Pali Rohár <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
rxrpc has a problem in its congestion management in that it saves the
congestion window size (cwnd) from one call to another, but if this is 0 at
the time is saved, then the next call may not actually manage to ever
transmit anything.
To this end:
(1) Don't save cwnd between calls, but rather reset back down to the
initial cwnd and re-enter slow-start if data transmission is idle for
more than an RTT.
(2) Preserve ssthresh instead, as that is a handy estimate of pipe
capacity. Knowing roughly when to stop slow start and enter
congestion avoidance can reduce the tendency to overshoot and drop
larger amounts of packets when probing.
In future, cwind growth also needs to be constrained when the window isn't
being filled due to being application limited.
Reported-by: Simon Wilkinson <[email protected]>
cc: Marc Dionne <[email protected]>
cc: [email protected]
|