blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2022-11-09	regulator: qcom_smd: Add PMR735a regulators	Konrad Dybcio	1	-0/+2
	PMR735a is already supported in the RPMH regulator driver, but there are cases where it's bundled with SMD RPM SoCs. Port it over to qcom_smd-regulator to enable usage in such cases. Signed-off-by: Konrad Dybcio <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2022-11-09	KVM: replace direct irq.h inclusion	Paolo Bonzini	1	-0/+2
	virt/kvm/irqchip.c is including "irq.h" from the arch-specific KVM source directory (i.e. not from arch/*/include) for the sole purpose of retrieving irqchip_in_kernel. Making the function inline in a header that is already included, such as asm/kvm_host.h, is not possible because it needs to look at struct kvm which is defined after asm/kvm_host.h is included. So add a kvm_arch_irqchip_in_kernel non-inline function; irqchip_in_kernel() is only performance critical on arm64 and x86, and the non-inline function is enough on all other architectures. irq.h can then be deleted from all architectures except x86. Signed-off-by: Paolo Bonzini <[email protected]>
2022-11-09	KVM: x86: Add a VALID_MASK for the MSR exit reason flags	Aaron Lewis	1	-0/+3
	Add the mask KVM_MSR_EXIT_REASON_VALID_MASK for the MSR exit reason flags. This simplifies checks that validate these flags, and makes it easier to introduce new flags in the future. No functional change intended. Signed-off-by: Aaron Lewis <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-11-09	kvm: Add interruptible flag to __gfn_to_pfn_memslot()	Peter Xu	1	-2/+2
	Add a new "interruptible" flag showing that the caller is willing to be interrupted by signals during the __gfn_to_pfn_memslot() request. Wire it up with a FOLL_INTERRUPTIBLE flag that we've just introduced. This prepares KVM to be able to respond to SIGUSR1 (for QEMU that's the SIGIPI) even during e.g. handling an userfaultfd page fault. No functional change intended. Signed-off-by: Peter Xu <[email protected]> Reviewed-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-11-09	kvm: Add KVM_PFN_ERR_SIGPENDING	Peter Xu	1	-0/+10
	Add a new pfn error to show that we've got a pending signal to handle during hva_to_pfn_slow() procedure (of -EINTR retval). Signed-off-by: Peter Xu <[email protected]> Reviewed-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-11-09	mm/gup: Add FOLL_INTERRUPTIBLE	Peter Xu	1	-0/+1
	We have had FAULT_FLAG_INTERRUPTIBLE but it was never applied to GUPs. One issue with it is that not all GUP paths are able to handle signal delivers besides SIGKILL. That's not ideal for the GUP users who are actually able to handle these cases, like KVM. KVM uses GUP extensively on faulting guest pages, during which we've got existing infrastructures to retry a page fault at a later time. Allowing the GUP to be interrupted by generic signals can make KVM related threads to be more responsive. For examples: (1) SIGUSR1: which QEMU/KVM uses to deliver an inter-process IPI, e.g. when the admin issues a vm_stop QMP command, SIGUSR1 can be generated to kick the vcpus out of kernel context immediately, (2) SIGINT: which can be used with interactive hypervisor users to stop a virtual machine with Ctrl-C without any delays/hangs, (3) SIGTRAP: which grants GDB capability even during page faults that are stuck for a long time. Normally hypervisor will be able to receive these signals properly, but not if we're stuck in a GUP for a long time for whatever reason. It happens easily with a stucked postcopy migration when e.g. a network temp failure happens, then some vcpu threads can hang death waiting for the pages. With the new FOLL_INTERRUPTIBLE, we can allow GUP users like KVM to selectively enable the ability to trap these signals. Reviewed-by: John Hubbard <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Peter Xu <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-11-09	bug: introduce ASSERT_STRUCT_OFFSET	Maxim Levitsky	1	-0/+9
	ASSERT_STRUCT_OFFSET allows to assert during the build of the kernel that a field in a struct have an expected offset. KVM used to have such macro, but there is almost nothing KVM specific in it so move it to build_bug.h, so that it can be used in other places in KVM. Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-11-09	rcu: Implement lockdep_rcu_enabled for !CONFIG_DEBUG_LOCK_ALLOC	John Ogness	1	-0/+5
	Provide an implementation for debug_lockdep_rcu_enabled() when CONFIG_DEBUG_LOCK_ALLOC is not enabled. This allows code to check if rcu lockdep debugging is available without needing an extra check if CONFIG_DEBUG_LOCK_ALLOC is enabled. Signed-off-by: John Ogness <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2022-11-09	driver core: class: make namespace and get_ownership take const *	Greg Kroah-Hartman	1	-2/+2
	The callbacks in struct class namespace() and get_ownership() do not modify the struct device passed to them, so mark the pointer as constant and fix up all callbacks in the kernel to have the correct function signature. This helps make it more obvious what calls and callbacks do, and do not, modify structures passed to them. Cc: "Rafael J. Wysocki" <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2022-11-09	Merge tag 'rxrpc-next-20221108' of ↵	David S. Miller	4	-88/+282
	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs rxrpc changes David Howells says: ==================== rxrpc: Increasing SACK size and moving away from softirq, part 1 AF_RXRPC has some issues that need addressing: (1) The SACK table has a maximum capacity of 255, but for modern networks that isn't sufficient. This is hard to increase in the upstream code because of the way the application thread is coupled to the softirq and retransmission side through a ring buffer. Adjustments to the rx protocol allows a capacity of up to 8192, and having a ring sufficiently large to accommodate that would use an excessive amount of memory as this is per-call. (2) Processing ACKs in softirq mode causes the ACKs get conflated, with only the most recent being considered. Whilst this has the upside that the retransmission algorithm only needs to deal with the most recent ACK, it causes DATA transmission for a call to be very bursty because DATA packets cannot be transmitted in softirq mode. Rather transmission must be delegated to either the application thread or a workqueue, so there tend to be sudden bursts of traffic for any particular call due to scheduling delays. (3) All crypto in a single call is done in series; however, each DATA packet is individually encrypted so encryption and decryption of large calls could be parallelised if spare CPU resources are available. This is the first of a number of sets of patches that try and address them. The overall aims of these changes include: (1) To get rid of the TxRx ring and instead pass the packets round in queues (eg. sk_buff_head). On the Tx side, each ACK packet comes with a SACK table that can be parsed as-is, so there's no particular need to maintain our own; we just have to refer to the ACK. On the Rx side, we do need to maintain a SACK table with one bit per entry - but only if packets go missing - and we don't want to have to perform a complex transformation to get the information into an ACK packet. (2) To try and move almost all processing of received packets out of the softirq handler and into a high-priority kernel I/O thread. Only the transferral of packets would be left there. I would still use the encap_rcv hook to receive packets as there's a noticeable performance drop from letting the UDP socket put the packets into its own queue and then getting them out of there. (3) To make the I/O thread also do all the transmission. The app thread would be responsible for packaging the data into packets and then buffering them for the I/O thread to transmit. This would make it easier for the app thread to run ahead of the I/O thread, and would mean the I/O thread is less likely to have to wait around for a new packet to come available for transmission. (4) To logically partition the socket/UAPI/KAPI side of things from the I/O side of things. The local endpoint, connection, peer and call objects would belong to the I/O side. The socket side would not then touch the private internals of calls and suchlike and would not change their states. It would only look at the send queue, receive queue and a way to pass a message to cause an abort. (5) To remove as much locking, synchronisation, barriering and atomic ops as possible from the I/O side. Exclusion would be achieved by limiting modification of state to the I/O thread only. Locks would still need to be used in communication with the UDP socket and the AF_RXRPC socket API. (6) To provide crypto offload kernel threads that, when there's slack in the system, can see packets that need crypting and provide parallelisation in dealing with them. (7) To remove the use of system timers. Since each timer would then send a poke to the I/O thread, which would then deal with it when it had the opportunity, there seems no point in using system timers if, instead, a list of timeouts can be sensibly consulted. An I/O thread only then needs to schedule with a timeout when it is idle. (8) To use zero-copy sendmsg to send packets. This would make use of the I/O thread being the sole transmitter on the socket to manage the dead-reckoning sequencing of the completion notifications. There is a problem with zero-copy, though: the UDP socket doesn't handle running out of option memory very gracefully. With regard to this first patchset, the changes made include: (1) Some fixes, including a fallback for proc_create_net_single_write(), setting ack.bufferSize to 0 in ACK packets and a fix for rxrpc congestion management, which shouldn't be saving the cwnd value between calls. (2) Improvements in rxrpc tracepoints, including splitting the timer tracepoint into a set-timer and a timer-expired trace. (3) Addition of a new proc file to display some stats. (4) Some code cleanups, including removing some unused bits and unnecessary header inclusions. (5) A change to the recently added UDP encap_err_rcv hook so that it has the same signature as {ip,ipv6}_icmp_error(), and then just have rxrpc point its UDP socket's hook directly at those. (6) Definition of a new struct, rxrpc_txbuf, that is used to hold transmissible packets of DATA and ACK type in a single 2KiB block rather than using an sk_buff. This allows the buffer to be on a number of queues simultaneously more easily, and also guarantees that the entire block is in a single unit for zerocopy purposes and that the data payload is aligned for in-place crypto purposes. (7) ACK txbufs are allocated at proposal and queued for later transmission rather than being stored in a single place in the rxrpc_call struct, which means only a single ACK can be pending transmission at a time. The queue is then drained at various points. This allows the ACK generation code to be simplified. (8) The Rx ring buffer is removed. When a jumbo packet is received (which comprises a number of ordinary DATA packets glued together), it used to be pointed to by the ring multiple times, with an annotation in a side ring indicating which subpacket was in that slot - but this is no longer possible. Instead, the packet is cloned once for each subpacket, barring the last, and the range of data is set in the skb private area. This makes it easier for the subpackets in a jumbo packet to be decrypted in parallel. (9) The Tx ring buffer is removed. The side annotation ring that held the SACK information is also removed. Instead, in the event of packet loss, the SACK data attached an ACK packet is parsed. (10) Allocate an skcipher request when needed in the rxkad security class rather than caching one in the rxrpc_call struct. This deals with a race between externally-driven call disconnection getting rid of the skcipher request and sendmsg/recvmsg trying to use it because they haven't seen the completion yet. This is also needed to support parallelisation as the skcipher request cannot be used by two or more threads simultaneously. (11) Call udp_sendmsg() and udpv6_sendmsg() directly rather than going through kernel_sendmsg() so that we can provide our own iterator (zerocopy explicitly doesn't work with a KVEC iterator). This also lets us avoid the overhead of the security hook. ==================== Signed-off-by: David S. Miller <[email protected]>
2022-11-09	net/core: Allow live renaming when an interface is up	Andy Ren	1	-3/+1
	Allow a network interface to be renamed when the interface is up. As described in the netconsole documentation [1], when netconsole is used as a built-in, it will bring up the specified interface as soon as possible. As a result, user space will not be able to rename the interface since the kernel disallows renaming of interfaces that are administratively up unless the 'IFF_LIVE_RENAME_OK' private flag was set by the kernel. The original solution [2] to this problem was to add a new parameter to the netconsole configuration parameters that allows renaming of the interface used by netconsole while it is administratively up. However, during the discussion that followed, it became apparent that we have no reason to keep the current restriction and instead we should allow user space to rename interfaces regardless of their administrative state: 1. The restriction was put in place over 20 years ago when renaming was only possible via IOCTL and before rtnetlink started notifying user space about such changes like it does today. 2. The 'IFF_LIVE_RENAME_OK' flag was added over 3 years ago in version 5.2 and no regressions were reported. 3. In-kernel listeners to 'NETDEV_CHANGENAME' do not seem to care about the administrative state of interface. Therefore, allow user space to rename running interfaces by removing the restriction and the associated 'IFF_LIVE_RENAME_OK' flag. Help in possible triage by emitting a message to the kernel log that an interface was renamed while UP. [1] https://www.kernel.org/doc/Documentation/networking/netconsole.rst [2] https://lore.kernel.org/netdev/[email protected]/ Signed-off-by: Andy Ren <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-11-09	include/uapi/linux/swab: Fix potentially missing __always_inline	Matt Redfearn	1	-1/+1
	Commit bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of some byteswap operations") added __always_inline to swab functions and commit 283d75737837 ("uapi/linux/stddef.h: Provide __always_inline to userspace headers") added a definition of __always_inline for use in exported headers when the kernel's compiler.h is not available. However, since swab.h does not include stddef.h, if the header soup does not indirectly include it, the definition of __always_inline is missing, resulting in a compilation failure, which was observed compiling the perf tool using exported headers containing this commit: In file included from /usr/include/linux/byteorder/little_endian.h:12:0, from /usr/include/asm/byteorder.h:14, from tools/include/uapi/linux/perf_event.h:20, from perf.h:8, from builtin-bench.c:18: /usr/include/linux/swab.h:160:8: error: unknown type name `__always_inline' static __always_inline __u16 __swab16p(const __u16 *p) Fix this by replacing the inclusion of linux/compiler.h with linux/stddef.h to ensure that we pick up that definition if required, without relying on it's indirect inclusion. compiler.h is then included indirectly, via stddef.h. Fixes: 283d75737837 ("uapi/linux/stddef.h: Provide __always_inline to userspace headers") Signed-off-by: Matt Redfearn <[email protected]> Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]> Tested-by: Nathan Chancellor <[email protected]> Reviewed-by: Petr Vaněk <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]>
2022-11-09	gpiolib: remove devm_fwnode_get_[index_]gpiod_from_child()	Dmitry Torokhov	1	-21/+0
	Now that there are no more users of these APIs in the kernel we can remove them. Signed-off-by: Dmitry Torokhov <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Reviewed-by: Linus Walleij <[email protected]> Signed-off-by: Bartosz Golaszewski <[email protected]>
2022-11-09	tty: Convert tty_buffer flags to bool	Ilpo Järvinen	2	-6/+3
	The struct tty_buffer has flags which is only used for storing TTYB_NORMAL. There is also a few quite confusing operations for checking the presense of TTYB_NORMAL. Simplify things by converting flags to bool. Despite the name remaining the same, the meaning of "flags" is altered slightly by this change. Previously it referred to flags of the buffer (only TTYB_NORMAL being used as a flag). After this change, flags tell whether the buffer contains/should be allocated with flags array along with character data array. It is much more suitable name that TTYB_NORMAL was for this purpose, thus the name remains. Signed-off-by: Ilpo Järvinen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2022-11-09	efi: libstub: Merge zboot decompressor with the ordinary stub	Ard Biesheuvel	1	-1/+0
	Even though our EFI zboot decompressor is pedantically spec compliant and idiomatic for EFI image loaders, calling LoadImage() and StartImage() for the nested image is a bit of a burden. Not only does it create workflow issues for the distros (as both the inner and outer PE/COFF images need to be signed for secure boot), it also copies the image around in memory numerous times: - first, the image is decompressed into a buffer; - the buffer is consumed by LoadImage(), which copies the sections into a newly allocated memory region to hold the executable image; - once the EFI stub is invoked by StartImage(), it will also move the image in memory in case of KASLR, mirrored memory or if the image must execute from a certain a priori defined address. There are only two EFI spec compliant ways to load code into memory and execute it: - use LoadImage() and StartImage(), - call ExitBootServices() and take ownership of the entire system, after which anything goes. Given that the EFI zboot decompressor always invokes the EFI stub, and given that both are built from the same set of objects, let's merge the two, so that we can avoid LoadImage()/StartImage but still load our image into memory without breaking the above rules. This also means we can decompress the image directly into its final location, which could be randomized or meet other platform specific constraints that LoadImage() does not know how to adhere to. It also means that, even if the encapsulated image still has the EFI stub incorporated as well, it does not need to be signed for secure boot when wrapping it in the EFI zboot decompressor. In the future, we might decide to retire the EFI stub attached to the decompressed image, but for the time being, they can happily coexist. Signed-off-by: Ard Biesheuvel <[email protected]>
2022-11-09	efi: libstub: Move screen_info handling to common code	Ard Biesheuvel	1	-1/+1
	Currently, arm64, RISC-V and LoongArch rely on the fact that struct screen_info can be accessed directly, due to the fact that the EFI stub and the core kernel are part of the same image. This will change after a future patch, so let's ensure that the screen_info handling is able to deal with this, by adopting the arm32 approach of passing it as a configuration table. While at it, switch to ACPI reclaim memory to hold the screen_info data, which is more appropriate for this kind of allocation. Signed-off-by: Ard Biesheuvel <[email protected]>
2022-11-08	swap: add a limit for readahead page-cluster value	Kairui Song	1	-0/+1
	Currenty there is no upper limit for /proc/sys/vm/page-cluster, and it's a bit shift value, so it could result in overflow of the 32-bit integer. Add a reasonable upper limit for it, read-in at most 2**31 pages, which is a large enough value for readahead. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kairui Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08	mm/hwpoison: introduce per-memory_block hwpoison counter	Naoya Horiguchi	2	-1/+22
	Currently PageHWPoison flag does not behave well when experiencing memory hotremove/hotplug. Any data field in struct page is unreliable when the associated memory is offlined, and the current mechanism can't tell whether a memory block is onlined because a new memory devices is installed or because previous failed offline operations are undone. Especially if there's a hwpoisoned memory, it's unclear what the best option is. So introduce a new mechanism to make struct memory_block remember that a memory block has hwpoisoned memory inside it. And make any online event fail if the onlining memory block contains hwpoison. struct memory_block is freed and reallocated over ACPI-based hotremove/hotplug, but not over sysfs-based hotremove/hotplug. So the new counter can distinguish these cases. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Naoya Horiguchi <[email protected]> Reported-by: kernel test robot <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Jane Chu <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08	mm/hwpoison: pass pfn to num_poisoned_pages_*()	Naoya Horiguchi	1	-2/+2
	No functional change. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Naoya Horiguchi <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Jane Chu <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08	mm/hwpoison: move definitions of num_poisoned_pages_* to memory-failure.c	Naoya Horiguchi	2	-22/+7
	These interfaces will be used by drivers/base/memory.c by later patch, so as a preparatory work move them to more common header file visible to the file. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Naoya Horiguchi <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Jane Chu <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08	mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned ↵	Naoya Horiguchi	2	-6/+10
	hugepage Patch series "mm, hwpoison: improve handling workload related to hugetlb and memory_hotplug", v7. This patchset tries to solve the issue among memory_hotplug, hugetlb and hwpoison. In this patchset, memory hotplug handles hwpoison pages like below: - hwpoison pages should not prevent memory hotremove, - memory block with hwpoison pages should not be onlined. This patch (of 4): HWPoisoned page is not supposed to be accessed once marked, but currently such accesses can happen during memory hotremove because do_migrate_range() can be called before dissolve_free_huge_pages() is called. Clear HPageMigratable for hwpoisoned hugepages to prevent them from being migrated. This should be done in hugetlb_lock to avoid race against isolate_hugetlb(). get_hwpoison_huge_page() needs to have a flag to show it's called from unpoison to take refcount of hwpoisoned hugepages, so add it. [[email protected]: remove TestClearHPageMigratable and reduce to test and clear separately] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Naoya Horiguchi <[email protected]> Reported-by: Miaohe Lin <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Jane Chu <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08	compiler-gcc: document minimum version for `__no_sanitize_coverage__`	Miguel Ojeda	1	-0/+3
	The attribute was added in GCC 12.1. This will simplify future cleanups, and is closer to what we do in `compiler_attributes.h`. Link: https://godbolt.org/z/MGbT76j6G Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miguel Ojeda <[email protected]> Acked-by: Marco Elver <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Dan Li <[email protected]> Cc: Kees Cook <[email protected]> Cc: Kumar Kartikeya Dwivedi <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Uros Bizjak <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08	compiler-gcc: remove attribute support check for `__no_sanitize_undefined__`	Miguel Ojeda	1	-4/+0
	The attribute was added in GCC 4.9, while the minimum GCC version supported by the kernel is GCC 5.1. Therefore, remove the check. Link: https://godbolt.org/z/GrMeo6fYr Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miguel Ojeda <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Dan Li <[email protected]> Cc: Kees Cook <[email protected]> Cc: Kumar Kartikeya Dwivedi <[email protected]> Cc: Marco Elver <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Uros Bizjak <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08	compiler-gcc: remove attribute support check for `__no_sanitize_thread__`	Miguel Ojeda	1	-1/+1
	The attribute was added in GCC 5.1, which matches the minimum GCC version supported by the kernel. Therefore, remove the check. Link: https://godbolt.org/z/vbxKejxbx Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miguel Ojeda <[email protected]> Acked-by: Marco Elver <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Dan Li <[email protected]> Cc: Kees Cook <[email protected]> Cc: Kumar Kartikeya Dwivedi <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Uros Bizjak <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08	compiler-gcc: remove attribute support check for `__no_sanitize_address__`	Miguel Ojeda	1	-4/+0
	The attribute was added in GCC 4.8, while the minimum GCC version supported by the kernel is GCC 5.1. Therefore, remove the check. Link: https://godbolt.org/z/84v56vcn8 Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miguel Ojeda <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Dan Li <[email protected]> Cc: Kees Cook <[email protected]> Cc: Kumar Kartikeya Dwivedi <[email protected]> Cc: Marco Elver <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Uros Bizjak <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08	compiler-gcc: be consistent with underscores use for `no_sanitize`	Miguel Ojeda	1	-4/+4
	Patch series "compiler-gcc: be consistent with underscores use for `no_sanitize`". This patch (of 5): Other macros that define shorthands for attributes in e.g. `compiler_attributes.h` and elsewhere use underscores. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miguel Ojeda <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Cc: Marco Elver <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Dan Li <[email protected]> Cc: Kees Cook <[email protected]> Cc: Kumar Kartikeya Dwivedi <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Uros Bizjak <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08	mm: remove FGP_HEAD	Matthew Wilcox (Oracle)	1	-3/+2
	This is no longer used; all callers have been converted to use folios instead. Somehow this manages to save 11 bytes of text. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08	mm: vmalloc: add free_vmap_area_noflush trace event	Uladzislau Rezki (Sony)	1	-0/+34
	This event is used in order to validate/debug a start address of freed VA, number of currently outstanding and maximum allowed areas. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Uladzislau Rezki (Sony) <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Oleksiy Avramchenko <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08	mm: vmalloc: add purge_vmap_area_lazy trace event	Uladzislau Rezki (Sony)	1	-0/+33
	It is for debug purposes to track number of freed vmap areas including a range it occurs on. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Uladzislau Rezki (Sony) <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Oleksiy Avramchenko <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08	mm: vmalloc: add alloc_vmap_area trace event	Uladzislau Rezki (Sony)	1	-0/+56
	Patch series "Add basic trace events for vmap/vmalloc (v2)", v2. This small series add some basic trace events for the vmap/vmalloc code. Since currently we lack any, sometimes it is hard to start debuging vmap code if an issue is reported or occured. For example https://lore.kernel.org/linux-mm/Y0p8BZIiDXLQbde%2F@pc636/T/ The final patch adds two reviewers for vmalloc code. This patch (of 7): It is for debug purposes and for validation of passed parameters. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Uladzislau Rezki (Sony) <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Oleksiy Avramchenko <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08	memory: move hotplug memory notifier priority to same file for easy sorting	Liu Shixin	2	-3/+7
	The priority of hotplug memory callback is defined in a different file. And there are some callers using numbers directly. Collect them together into include/linux/memory.h for easy reading. This allows us to sort their priorities more intuitively without additional comments. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liu Shixin <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Waiman Long <[email protected]> Cc: zefan li <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08	memory: remove unused register_hotmemory_notifier()	Liu Shixin	1	-6/+0
	Remove unused register_hotmemory_notifier(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liu Shixin <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Waiman Long <[email protected]> Cc: zefan li <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08	mm: fix typo in struct vm_operations_struct comments	Rolf Eike Beer	1	-1/+1
	There is no eprotect(), so I assume this is about mprotect(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Rolf Eike Beer <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08	mm/hugetlb: add folio_hstate()	Sidhartha Kumar	1	-2/+12
	Helper function to retrieve hstate information from a hugetlb folio. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Sidhartha Kumar <[email protected]> Reported-by: kernel test robot <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Colin Cross <[email protected]> Cc: David Howells <[email protected]> Cc: "Eric W . Biederman" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Muchun Song <[email protected]> Cc: Peter Xu <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: William Kucharski <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08	hugetlbfs: convert hugetlb_delete_from_page_cache() to use folios	Sidhartha Kumar	1	-1/+0
	Remove the last caller of delete_from_page_cache() by converting the code to its folio equivalent. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Sidhartha Kumar <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Colin Cross <[email protected]> Cc: David Howells <[email protected]> Cc: "Eric W . Biederman" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Muchun Song <[email protected]> Cc: Peter Xu <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: William Kucharski <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08	mm/hugetlb: add hugetlb_folio_subpool() helpers	Sidhartha Kumar	1	-2/+13
	Allow hugetlbfs_migrate_folio to check and read subpool information by passing in a folio. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Sidhartha Kumar <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Colin Cross <[email protected]> Cc: David Howells <[email protected]> Cc: "Eric W . Biederman" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: kernel test robot <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Muchun Song <[email protected]> Cc: Peter Xu <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: William Kucharski <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08	mm: add private field of first tail to struct page and struct folio	Sidhartha Kumar	1	-0/+14
	Allow struct folio to store hugetlb metadata that is contained in the private field of the first tail page. On 32-bit, _private_1 aligns with page[1].private. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Sidhartha Kumar <[email protected]> Acked-by: Mike Kravetz <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Colin Cross <[email protected]> Cc: David Howells <[email protected]> Cc: "Eric W . Biederman" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: kernel test robot <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Muchun Song <[email protected]> Cc: Peter Xu <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: William Kucharski <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08	mm/hugetlb: add folio support to hugetlb specific flag macros	Sidhartha Kumar	1	-0/+24
	Patch series "begin converting hugetlb code to folios", v4. This patch series starts the conversion of the hugetlb code to operate on struct folios rather than struct pages. This removes the ambiguitiy of whether functions are operating on head pages, tail pages of compound pages, or base pages. This series passes the linux test project hugetlb test cases. Patch 1 adds hugeltb specific page macros that can operate on folios. Patch 2 adds the private field of the first tail page to struct page. For 32-bit, _private_1 alinging with page[1].private was confirmed by using pahole. Patch 3 introduces hugetlb subpool helper functions which operate on struct folios. These patches were tested using the hugepage-mmap.c selftest along with the migratepages command. Patch 4 converts hugetlb_delete_from_page_cache() to use folios. Patch 5 adds a folio_hstate() function to get hstate information from a folio and adds a user of folio_hstate(). Bpftrace was used to track time spent in the free_huge_pages function during the ltp test cases as it is a caller of the hugetlb subpool functions. From the histogram, the performance is similar before and after the patch series. Time spent in 'free_huge_page' 6.0.0-rc2.master.20220823 @nsecs: [256, 512) 14770 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@ \|@@@@@@@@@@@@@@@@@@@@@@@@@ \| [512, 1K) 155 \| \| [1K, 2K) 169 \| \| [2K, 4K) 50 \| \| [4K, 8K) 14 \| \| [8K, 16K) 3 \| \| [16K, 32K) 3 \| \| 6.0.0-rc2.master.20220823 + patch series @nsecs: [256, 512) 13678 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@ \| \|@@@@@@@@@@@@@@@@@@@@@@@@@ \| [512, 1K) 142 \| \| [1K, 2K) 199 \| \| [2K, 4K) 44 \| \| [4K, 8K) 13 \| \| [8K, 16K) 4 \| \| [16K, 32K) 1 \| \| This patch (of 5): Allow the macros which test, set, and clear hugetlb specific page flags to take a hugetlb folio as an input. The macrros are generated as folio_{test, set, clear}_hugetlb_{restore_reserve, migratable, temporary, freed, vmemmap_optimized, raw_hwp_unreliable}. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Sidhartha Kumar <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Reviewed-by: Muchun Song <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Colin Cross <[email protected]> Cc: David Howells <[email protected]> Cc: "Eric W . Biederman" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: kernel test robot <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Peter Xu <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: William Kucharski <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08	mm: vmscan: make rotations a secondary factor in balancing anon vs file	Johannes Weiner	1	-2/+3
	We noticed a 2% webserver throughput regression after upgrading from 5.6. This could be tracked down to a shift in the anon/file reclaim balance (confirmed with swappiness) that resulted in worse reclaim efficiency and thus more kswapd activity for the same outcome. The change that exposed the problem is aae466b0052e ("mm/swap: implement workingset detection for anonymous LRU"). By qualifying swapins based on their refault distance, it lowered the cost of anon reclaim in this workload, in turn causing (much) more anon scanning than before. Scanning the anon list is more expensive due to the higher ratio of mmapped pages that may rotate during reclaim, and so the result was an increase in %sys time. Right now, rotations aren't considered a cost when balancing scan pressure between LRUs. We can end up with very few file refaults putting all the scan pressure on hot anon pages that are rotated en masse, don't get reclaimed, and never push back on the file LRU again. We still only reclaim file cache in that case, but we burn a lot CPU rotating anon pages. It's "fair" from an LRU age POV, but doesn't reflect the real cost it imposes on the system. Consider rotations as a secondary factor in balancing the LRUs. This doesn't attempt to make a precise comparison between IO cost and CPU cost, it just says: if reloads are about comparable between the lists, or rotations are overwhelmingly different, adjust for CPU work. This fixed the regression on our webservers. It has since been deployed to the entire Meta fleet and hasn't caused any problems. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Cc: Rik van Riel <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08	hugetlb: simplify hugetlb handling in follow_page_mask	Mike Kravetz	1	-42/+8
	During discussions of this series [1], it was suggested that hugetlb handling code in follow_page_mask could be simplified. At the beginning of follow_page_mask, there currently is a call to follow_huge_addr which 'may' handle hugetlb pages. ia64 is the only architecture which provides a follow_huge_addr routine that does not return error. Instead, at each level of the page table a check is made for a hugetlb entry. If a hugetlb entry is found, a call to a routine associated with that entry is made. Currently, there are two checks for hugetlb entries at each page table level. The first check is of the form: if (p?d_huge()) page = follow_huge_p?d(); the second check is of the form: if (is_hugepd()) page = follow_huge_pd(). We can replace these checks, as well as the special handling routines such as follow_huge_p?d() and follow_huge_pd() with a single routine to handle hugetlb vmas. A new routine hugetlb_follow_page_mask is called for hugetlb vmas at the beginning of follow_page_mask. hugetlb_follow_page_mask will use the existing routine huge_pte_offset to walk page tables looking for hugetlb entries. huge_pte_offset can be overwritten by architectures, and already handles special cases such as hugepd entries. [1] https://lore.kernel.org/linux-mm/[email protected]/ [[email protected]: remove vma (pmd sharing) per Peter] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: remove left over hugetlb_vma_unlock_read()] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Kravetz <[email protected]> Suggested-by: David Hildenbrand <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Baolin Wang <[email protected]> Tested-by: Baolin Wang <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Muchun Song <[email protected]> Cc: Naoya Horiguchi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08	maple_tree: reorganize testing to restore module testing	Liam Howlett	1	-0/+7
	Along the development cycle, the testing code support for module/in-kernel compiles was removed. Restore this functionality by moving any internal API tests to the userspace side, as well as threading tests. Fix the lockdep issues and add a way to reduce memory usage so the tests can complete with KASAN + memleak detection. Make the tests work on 32 bit hosts where possible and detect 32 bit hosts in the radix test suite. [[email protected]: fix module export] [[email protected]: fix it some more] [[email protected]: fix compile warnings on 32bit build in check_find()] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liam R. Howlett <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08	Merge tag 'audit-pr-20221107' of ↵	Linus Torvalds	1	-1/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit fix from Paul Moore: "A small audit patch to fix an instance of undefined behavior in a shift operator caused when shifting a signed value too far, the same case as the lsm patch merged previously. While the fix is trivial and I can't imagine it causing a problem in a backport, I'm not explicitly marking it for stable on the off chance that there is some system out there which is relying on some wonky unexpected behavior which this patch could break; if it does break, IMO it's better that to happen in a minor or -rcX release and not in a stable backport" * tag 'audit-pr-20221107' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: fix undefined behavior in bit shift for AUDIT_BIT
2022-11-08	Merge tag 'lsm-pr-20221107' of ↵	Linus Torvalds	1	-1/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull lsm fix from Paul Moore: "A small capability patch to fix an instance of undefined behavior in a shift operator caused when shifting a signed value too far. While the fix is trivial and I can't imagine it causing a problem in a backport, I'm not explicitly marking it for stable on the off chance that there is some system out there which is relying on some wonky unexpected behavior which this patch could break; if it does break, IMO it's better that to happen in a minor or -rcX release and not in a stable backport" * tag 'lsm-pr-20221107' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: capabilities: fix undefined behavior in bit shift for CAP_TO_MASK
2022-11-08	ACPICA: Update version to 20221020	Bob Moore	1	-1/+1
	ACPICA commit 28fc163aa29e208678d901d98bb9030b775521b3 Version 20221020. Link: https://github.com/acpica/acpica/commit/28fc163a Signed-off-by: Bob Moore <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2022-11-08	fd: dlm: trace send/recv of dlm message and rcom	Alexander Aring	1	-0/+297
	This patch adds tracepoints for send and recv cases of dlm messages and dlm rcom messages. In case of send and dlm message we add the dlm rsb resource name this dlm messages belongs to. This has the advantage to follow dlm messages on a per lock basis. In case of recv message the resource name can be extracted by follow the send message sequence number. The dlm message DLM_MSG_PURGE doesn't belong to a lock request and will not set the resource name in a dlm_message trace. The same for all rcom messages. There is additional handling required for this debugging functionality which is tried to be small as possible. Also the midcomms layer gets aware of lock resource names, for now this is required to make a connection between sequence number and lock resource names. It is for debugging purpose only. Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David Teigland <[email protected]>
2022-11-08	Merge branch 'for-linus/hardening' into for-next/hardening	Kees Cook	1	-1/+1

2022-11-08	soc: mediatek: Add all settings to mtk_mmsys_ddp_dpi_fmt_config func	Xinlei Lee	1	-0/+7
	The difference between MT8186 and other ICs is that when modifying the output format, we need to modify the mmsys_base+0x400 register to take effect. So when setting the dpi output format, we need to call mtk_mmsys_ddp_dpi_fmt_config to set it to MT8186 synchronously. Commit a071e52f75d1 ("soc: mediatek: Add mmsys func to adapt to dpi output for MT8186") lacked some of the possible output formats and also had a wrong bitmask. Add the missing output formats and fix the bitmask. While at it, also update mtk_mmsys_ddp_dpi_fmt_config() to use generic formats, so that it is slightly easier to extend for other platforms. Fixes: a071e52f75d1 ("soc: mediatek: Add mmsys func to adapt to dpi output for MT8186") Signed-off-by: Xinlei Lee <[email protected]> Reviewed-by: AngeloGioacchino Del Regno <[email protected]> Reviewed-by: CK Hu <[email protected]> Reviewed-by: Nícolas F. R. A. Prado <[email protected]> Signed-off-by: Matthias Brugger <[email protected]>
2022-11-08	vmlinux.lds.h: Fix placement of '.data..decrypted' section	Nathan Chancellor	1	-1/+1
	Commit d4c639990036 ("vmlinux.lds.h: Avoid orphan section with !SMP") fixed an orphan section warning by adding the '.data..decrypted' section to the linker script under the PERCPU_DECRYPTED_SECTION define but that placement introduced a panic with !SMP, as the percpu sections are not instantiated with that configuration so attempting to access variables defined with DEFINE_PER_CPU_DECRYPTED() will result in a page fault. Move the '.data..decrypted' section to the DATA_MAIN define so that the variables in it are properly instantiated at boot time with CONFIG_SMP=n. Cc: [email protected] Fixes: d4c639990036 ("vmlinux.lds.h: Avoid orphan section with !SMP") Link: https://lore.kernel.org/[email protected]/ Debugged-by: Ard Biesheuvel <[email protected]> Reported-by: Zhao Wenhui <[email protected]> Tested-by: xiafukun <[email protected]> Signed-off-by: Nathan Chancellor <[email protected]> Signed-off-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-11-08	PCI: Assign PCI domain IDs by ida_alloc()	Pali Rohár	1	-0/+1
	Replace assignment of PCI domain IDs from atomic_inc_return() to ida_alloc(). Use two IDAs, one for static domain allocations (those which are defined in device tree) and second for dynamic allocations (all other). During removal of root bus / host bridge, also release the domain ID. The released ID can be reused again, for example when dynamically loading and unloading native PCI host bridge drivers. This change also allows to mix static device tree assignment and dynamic by kernel as all static allocations are reserved in dynamic pool. [bhelgaas: set "err" if "bus->domain_nr < 0"] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Pali Rohár <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2022-11-08	rxrpc: Fix congestion management	David Howells	1	-0/+1
	rxrpc has a problem in its congestion management in that it saves the congestion window size (cwnd) from one call to another, but if this is 0 at the time is saved, then the next call may not actually manage to ever transmit anything. To this end: (1) Don't save cwnd between calls, but rather reset back down to the initial cwnd and re-enter slow-start if data transmission is idle for more than an RTT. (2) Preserve ssthresh instead, as that is a handy estimate of pipe capacity. Knowing roughly when to stop slow start and enter congestion avoidance can reduce the tendency to overshoot and drop larger amounts of packets when probing. In future, cwind growth also needs to be constrained when the window isn't being filled due to being application limited. Reported-by: Simon Wilkinson <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected]