path: root/include
Age | Commit message | Author | Files, lines changed
2023-03-30 | dt-bindings: clock: r8a7779: Add PWM module clock | Geert Uytterhoeven | 1 file, -0/+1
Add the module clock used by the PWM Timers on the Renesas R-Car H1 (R8A7779) SoC. Signed-off-by: Geert Uytterhoeven <[email protected]> Link: https://lore.kernel.org/r/1397b517fccbe716a71cfae770512ed577730a25.1679329211.git.geert+renesas@glider.be
2023-03-30 | spi: s3c64xx: add no_cs description | Jaewon Kim | 1 file, -0/+1
This patch adds the missing description of the no_cs variable. Signed-off-by: Jaewon Kim <[email protected]> Reviewed-by: Krzysztof Kozlowski <[email protected]> Reviewed-by: Andi Shyti <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2023-03-30 | net: add softnet_data.in_net_rx_action | Eric Dumazet | 1 file, -0/+1
We want to make two optimizations in napi_schedule_rps() and ____napi_schedule() which require knowing whether these helpers are called from net_rx_action() or from other contexts. sd.in_net_rx_action is only read/written by the owning cpu. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Jason Xing <[email protected]> Tested-by: Jason Xing <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-03-30 | wifi: nl80211: support advertising S1G capabilities | Kieran Frewen | 1 file, -0/+7
Include S1G capabilities in netlink band info messages. Signed-off-by: Kieran Frewen <[email protected]> Co-developed-by: Gilad Itzkovitch <[email protected]> Signed-off-by: Gilad Itzkovitch <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2023-03-30 | nfs: use vfs setgid helper | Christian Brauner | 1 file, -0/+2
We've aligned setgid behavior over multiple kernel releases. The details can be found in the following two merge messages: cf619f891971 ("Merge tag 'fs.ovl.setgid.v6.2'") 426b4ca2d6a5 ("Merge tag 'fs.setgid.v6.0'") Consistent setgid stripping behavior is now encapsulated in the setattr_should_drop_sgid() helper which is used by all filesystems that strip setgid bits outside of vfs proper. Switch nfs to rely on this helper as well. Without this patch the setgid stripping tests in xfstests will fail. Signed-off-by: Christian Brauner (Microsoft) <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-03-29 | kunit: increase KUNIT_LOG_SIZE to 2048 bytes | Heiko Carstens | 1 file, -1/+1
The s390 specific test_unwind kunit test has 39 parameterized tests. The results in debugfs are truncated since the full log doesn't fit into 1500 bytes. Therefore increase KUNIT_LOG_SIZE to 2048 bytes, similar to what was done recently with commit "kunit: fix bug in debugfs logs of parameterized tests". With that the whole test result is present. Reported-by: Alexander Egorenkov <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Reviewed-by: Rae Moar <[email protected]> Reviewed-by: David Gow <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2023-03-29 | f2fs: convert to use bitmap API | Yangtao Li | 1 file, -5/+4
Let's use BIT() and GENMASK() instead of open-coding them. Signed-off-by: Yangtao Li <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
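As an illustration of the pattern (a generic sketch with made-up names, not the actual f2fs hunks):

```c
#include <linux/bits.h>

/* Before: open-coded bit arithmetic */
#define EXAMPLE_FLAG		(1 << 3)
#define EXAMPLE_MASK		(((1 << 4) - 1) << 2)	/* bits 5..2 */

/* After: the bitmap API makes the intent explicit */
#define EXAMPLE_FLAG_NEW	BIT(3)
#define EXAMPLE_MASK_NEW	GENMASK(5, 2)
```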
2023-03-29 | power: supply: generic-adc-battery: use simple-battery API | Sebastian Reichel | 1 file, -18/+0
Constant battery data is available through power-supply's simple-battery API. This works automatically, so the manual handling can be removed without losing any features :) Note that the POWER_SUPPLY_STATUS_FULL check for the level variable can be dropped, since the variable is never written. It can be re-introduced properly once the driver gets functionality to calculate the current charge level. Apart from that, the check must be fuzzy anyway, since charge estimation usually is not precise enough to always return exactly the full charge capacity for a full battery. Reviewed-by: Linus Walleij <[email protected]> Reviewed-by: Matti Vaittinen <[email protected]> Signed-off-by: Sebastian Reichel <[email protected]>
2023-03-29 | power: supply: generic-adc-battery: drop charge now support | Sebastian Reichel | 1 file, -2/+0
Drop CHARGE_NOW support, which requires a platform specific calculation method. Reviewed-by: Linus Walleij <[email protected]> Reviewed-by: Matti Vaittinen <[email protected]> Signed-off-by: Sebastian Reichel <[email protected]>
2023-03-29 | power: supply: generic-adc-battery: drop jitter delay support | Sebastian Reichel | 1 file, -3/+0
Drop support for configuring the IRQ jitter delay and use a sufficiently large fixed value instead. Reviewed-by: Linus Walleij <[email protected]> Signed-off-by: Sebastian Reichel <[email protected]>
2023-03-29 | power: supply: core: auto-exposure of simple-battery data | Sebastian Reichel | 1 file, -0/+8
Automatically expose data from the simple-battery firmware node for all battery drivers. Reviewed-by: Linus Walleij <[email protected]> Reviewed-by: Matti Vaittinen <[email protected]> Signed-off-by: Sebastian Reichel <[email protected]>
2023-03-29 | Merge tag 'f2fs-fix-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs | Linus Torvalds | 1 file, -1/+1
Pull f2fs fix from Jaegeuk Kim: "This fixes a tracepoint field size in f2fs in preparation for stricter rules for tracing fields" * tag 'f2fs-fix-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: Fix f2fs_truncate_partial_nodes ftrace event
2023-03-29 | Merge v6.3-rc4 into drm-next | Daniel Vetter | 36 files, -45/+200
I just landed the fence deadline PR from Rob, for which a bunch of drivers want/need to apply driver-specific patches. Backmerge -rc4 so that they don't have to be stuck on -rc2 for no reason at all. Signed-off-by: Daniel Vetter <[email protected]>
2023-03-29 | Merge tag 'dma-fence-deadline' of https://gitlab.freedesktop.org/drm/msm into drm-next | Daniel Vetter | 5 files, -22/+57
This series adds a deadline hint to fences, so realtime deadlines such as vblank can be communicated to the fence signaller for power/frequency management decisions. This is partially inspired by a trick i915 does, but implemented via dma-fence for a couple of reasons: 1) To continue to be able to use the atomic helpers 2) To support cases where display and gpu are different drivers See https://patchwork.freedesktop.org/series/93035/ This does not yet add any UAPI, although this will be needed in a number of cases: 1) Workloads "ping-ponging" between CPU and GPU, where we don't want the GPU freq governor to interpret time stalled waiting for GPU as "idle" time 2) Cases where the compositor is waiting for fences to be signaled before issuing the atomic ioctl, for example to maintain 60fps cursor updates even when the GPU is not able to maintain that framerate. Signed-off-by: Daniel Vetter <[email protected]> From: Rob Clark <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGt5nDQpa6J86V1oFKPA30YcJzPhAVpmF7N1K1g2N3c=Zg@mail.gmail.com
2023-03-29 | regmap: Removed compressed cache support | Mark Brown | 1 file, -1/+0
The compressed register cache support has assumptions that make it hard to cover in testing, mainly that it requires raw register defaults to be provided. Rather than either address these assumptions or leave it untested by the forthcoming KUnit tests, let's remove it; the use case is quite thin and there are no current users. Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-03-29 | tracing/user_events: Align structs with tabs for readability | Beau Belgrave | 2 files, -19/+19
Add tabs to make struct members easier to read and unify the style of the code. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-29 | tracing/user_events: Add ioctl for disabling addresses | Beau Belgrave | 1 file, -0/+24
Enablements are now tracked by the lifetime of the task/mm. User processes need to be able to disable their addresses if tracing is requested to be turned off. Before, unmapping the page would suffice. However, we now need a stronger contract. Add an ioctl to enable this. A new flag bit, freeing, is added to user_event_enabler to ensure that if an attempt is made to remove the event while a fault is being handled, the removal is delayed until after the fault is reattempted. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
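A rough user-space sketch of the new ioctl; the struct and ioctl names follow this series' uapi, but treat the exact fields as illustrative:

```c
#include <sys/ioctl.h>
#include <linux/user_events.h>

/* Hypothetical helper: tell the kernel to stop updating the enable
 * bit at 'addr' before the process frees or unmaps that memory. */
static int user_event_disable_addr(int data_fd, void *addr)
{
	struct user_unreg unreg = { 0 };

	unreg.size = sizeof(unreg);
	unreg.disable_bit = 31;		/* bit registered earlier */
	unreg.disable_addr = (__u64)(unsigned long)addr;

	return ioctl(data_fd, DIAG_IOCSUNREG, &unreg);
}
```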
2023-03-29 | tracing/user_events: Use remote writes for event enablement | Beau Belgrave | 2 files, -4/+64
As part of the discussions for user_events aligned with user space tracers, it was determined that user programs should register an aligned value to set or clear a bit when an event becomes enabled. Currently a shared page is being used that requires mmap(). Remove the shared page implementation and move to a user registered address implementation. In this new model, user programs specify 3 new values during event registration. The first is the address to update when the event is either enabled or disabled. The second is the bit to set/clear to reflect the event being enabled. The third is the size of the value at the specified address. This allows for a local 32/64-bit value in user programs to support both kernel and user tracers. As an example, setting bit 31 for kernel tracers when the event becomes enabled allows for user tracers to use the other bits for ref counts or other flags. The kernel side updates the bit atomically; user programs need to update these values atomically as well. User provided addresses must be aligned on a natural boundary; this allows for single page checking and prevents odd behaviors such as an enable value straddling 2 pages instead of a single page. Currently page faults are only logged; future patches will handle these. Link: https://lkml.kernel.org/r/[email protected] Suggested-by: Mathieu Desnoyers <[email protected]> Signed-off-by: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
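A minimal user-space sketch of the registration model described above (uapi field names per this series; details may differ):

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/user_events.h>

static __u64 enabled;	/* naturally aligned enable value */

static int register_example_event(void)
{
	struct user_reg reg = { 0 };
	int fd = open("/sys/kernel/tracing/user_events_data", O_RDWR);

	if (fd < 0)
		return -1;

	reg.size = sizeof(reg);
	reg.enable_bit = 31;			/* bit the kernel sets/clears */
	reg.enable_size = sizeof(enabled);	/* 4- or 8-byte value */
	reg.enable_addr = (__u64)(unsigned long)&enabled;
	reg.name_args = (__u64)(unsigned long)"example_event u32 count";

	if (ioctl(fd, DIAG_IOCSREG, &reg) < 0)
		return -1;

	/* Hot path: the kernel flips bit 31 atomically on enable/disable. */
	if (__atomic_load_n(&enabled, __ATOMIC_ACQUIRE) & (1ULL << 31)) {
		/* ... write the event payload to fd ... */
	}
	return fd;
}
```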
2023-03-29 | tracing/user_events: Track fork/exec/exit for mm lifetime | Beau Belgrave | 2 files, -0/+23
During tracefs discussions it was decided instead of requiring a mapping within a user-process to track the lifetime of memory descriptors we should hook the appropriate calls. Do this by adding the minimal stubs required for task fork, exec, and exit. Currently this is just a NOP. Future patches will implement these calls fully. Link: https://lkml.kernel.org/r/[email protected] Suggested-by: Mathieu Desnoyers <[email protected]> Signed-off-by: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-29 | tracing/user_events: Split header into uapi and kernel | Beau Belgrave | 2 files, -46/+54
The UAPI parts need to be split out from the kernel parts of user_events now that other parts of the kernel will reference it. Do so by moving the existing include/linux/user_events.h into include/uapi/linux/user_events.h. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-29 | mtd: spi-nor: Enhance locking to support reads while writes | Miquel Raynal | 1 file, -0/+13
On devices featuring several banks, the Read While Write (RWW) feature is here to improve the overall performance when performing parallel reads and writes at different locations (different banks). The following constraints have to be taken into account: 1#: A single operation can be performed in a given bank. 2#: Only a single program or erase operation can happen on the entire chip (common hardware limitation to limit costs). 3#: Reads must remain serialized even though reads crossing bank boundaries are allowed. 4#: The I/O bus is unique and thus is the most constrained resource; all spi-nor operations requiring access to the spi bus (through the spi controller) must be serialized until the bus exchanges are over. So we must ensure a single operation can be "sent" at a time. 5#: Any other operation that would not be either a read or a write or an erase is considered as requiring access to the full chip and cannot be parallelized; we then need to ensure the full chip is in the idle state when this occurs. All these constraints can easily be managed with a proper locking model: 1#: Is enforced by a bitfield of the in-use banks, so that only a single operation can happen in a specific bank at any time. 2#: Is handled by the ongoing_pe boolean which is set before any write or erase, and is released only at the very end of the operation. This way, no other destructive operation on the chip can start during this time frame. 3#: An ongoing_rd boolean allows tracking the ongoing reads, so that only one can be performed at a time. 4#: An ongoing_io boolean is introduced in order to capture and serialize bus accesses. This is the one being released "sooner" than before, because we only need to protect the chip against other SPI accesses during the I/O phase, which for the destructive operations is the beginning of the operation (when we send the command cycles and possibly the data), while the second part of the operation (the erase delay or the programming delay) is when we can do something else in another bank. 5#: Is handled by the three booleans presented above; if any of them is set, the chip is not yet ready for the operation and must wait. All these internal variables are protected by the existing lock, so that changes in this structure are atomic. The serialization is handled with a wait queue. Signed-off-by: Miquel Raynal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Tudor Ambarus <[email protected]>
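The constraint-to-variable mapping can be pictured with a short sketch; the struct and helper names here are illustrative, not the actual spi-nor code:

```c
#include <linux/types.h>

/* The four state variables described above, guarded by the existing
 * lock plus a wait queue. */
struct rww_state {
	unsigned int	used_banks;	/* #1: one bit per busy bank */
	bool		ongoing_pe;	/* #2: program/erase on the chip */
	bool		ongoing_rd;	/* #3: a read in flight */
	bool		ongoing_io;	/* #4: SPI bus currently owned */
};

/* A read may start when the bus is free, no other read is running
 * and the target banks are idle (#1, #3, #4). */
static bool rww_read_can_start(struct rww_state *s, unsigned int banks)
{
	return !s->ongoing_io && !s->ongoing_rd && !(s->used_banks & banks);
}

/* Anything that is not a read/write/erase needs the whole chip idle (#5). */
static bool rww_chip_idle(struct rww_state *s)
{
	return !s->ongoing_io && !s->ongoing_rd && !s->ongoing_pe &&
	       !s->used_banks;
}
```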
2023-03-29 | cdx: add device attributes | Nipun Gupta | 1 file, -0/+27
Create sysfs entries for CDX devices. The following sysfs entries are provided for each CDX device detected by the CDX controller: vendor id, device id, remove, reset of the device, and driver override. Signed-off-by: Puneet Gupta <[email protected]> Signed-off-by: Nipun Gupta <[email protected]> Signed-off-by: Tarak Reddy <[email protected]> Reviewed-by: Pieter Jansen van Vuuren <[email protected]> Tested-by: Nikhil Agarwal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-03-29 | cdx: add the cdx bus driver | Nipun Gupta | 2 files, -0/+162
Introduce the AMD CDX bus, which provides a mechanism for scanning and probing CDX devices. These devices are memory mapped on the system bus for Application Processors (APUs). CDX devices can be changed dynamically in the fabric, and the CDX bus interacts with the CDX controller to rescan the bus and rediscover the devices. Signed-off-by: Nipun Gupta <[email protected]> Reviewed-by: Pieter Jansen van Vuuren <[email protected]> Tested-by: Nikhil Agarwal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-03-29 | tty: n_gsm: add ioctl for DLC specific parameter configuration | Daniel Starke | 1 file, -1/+16
Parameter negotiation has been introduced with commit 92f1f0c3290d ("tty: n_gsm: add parameter negotiation support"). However, means to set individual parameters per DLCI are not yet implemented. Furthermore, it is currently not possible to keep a DLCI half open until the user application sets the right parameters for it. This is required to allow a user application to set its specific parameters before the underlying link is established. Otherwise, the link is opened and re-established right afterwards if the user application sets incompatible parameters. This may be an unexpected behavior for the peer. Add parameter 'wait_config' to 'gsm_config' to support setups where the DLCI specific user application sets its specific parameters after open() and before the link gets fully established. Setting this to zero disables the user application specific DLCI configuration option. Add the ioctls 'GSMIOC_GETCONF_DLCI' and 'GSMIOC_SETCONF_DLCI' for the ldisc and virtual ttys. These get/set the DLCI specific parameters and may trigger a reconnect of the DLCI if incompatible values have been set. Only the parameters for the DLCI associated with the virtual tty can be set or retrieved if called on these. Add a remark to the documentation to introduce the new ioctls. Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Reported-by: kernel test robot <[email protected]> Signed-off-by: Daniel Starke <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
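From user space, usage would look roughly like this; a hedged sketch that assumes the config structure added by this patch and elides its individual fields:

```c
#include <sys/ioctl.h>
#include <linux/gsmmux.h>

/* Sketch: read, tweak and write back the DLCI-specific parameters on
 * a virtual tty. Struct and ioctl names as added by this patch; a
 * changed configuration may trigger a reconnect of the DLCI. */
static int reconfigure_dlci(int tty_fd)
{
	struct gsm_dlci_config dc;

	if (ioctl(tty_fd, GSMIOC_GETCONF_DLCI, &dc) < 0)
		return -1;
	/* ... adjust the DLCI parameters in 'dc' ... */
	return ioctl(tty_fd, GSMIOC_SETCONF_DLCI, &dc);
}
```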
2023-03-29 | Merge branch 'slab/for-6.4/slob-removal' into slab/for-next | Vlastimil Babka | 3 files, -45/+4
A series by myself to remove CONFIG_SLOB: The SLOB allocator was deprecated in 6.2 and there have been no complaints so far, so let's proceed with the removal. Besides the code cleanup, the main immediate benefit will be allowing the kfree() family of functions to work on kmem_cache_alloc() objects, which was incompatible with SLOB. This includes kfree_rcu(), which had no kmem_cache_free_rcu() counterpart yet; now one shouldn't be necessary anymore. Otherwise it's all straightforward removal. After this series, 'git grep slob' or 'git grep SLOB' will have 3 remaining relevant hits in non-mm code: - tomoyo - patch submitted and carried there, doesn't need to wait for this series - skbuff - patch to cleanup now-unnecessary #ifdefs will be posted to netdev after this is merged, as requested to avoid conflicts - ftrace ring_buffer - patch to remove obsolete comment is carried there The rest of 'git grep SLOB' hits are false positives, or intentional (CREDITS, and mm/Kconfig SLUB_TINY description to help those that will happen to migrate later).
2023-03-29 | Merge branch 'slab/for-6.4/trivial' into slab/for-next | Vlastimil Babka | 1 file, -1/+1
Trivial slab and slub fixes for 6.4. A comment fix, a structure constification, and a config SLUB_DEBUG help text fix.
2023-03-29 | linux/vt_buffer.h: allow either builtin or modular for macros | Randy Dunlap | 1 file, -1/+1
Fix build errors on ARCH=alpha when CONFIG_MDA_CONSOLE=m. This allows the ARCH macros to be the only ones defined. In file included from ../drivers/video/console/mdacon.c:37: ../arch/alpha/include/asm/vga.h:17:40: error: expected identifier or '(' before 'volatile' 17 | static inline void scr_writew(u16 val, volatile u16 *addr) | ^~~~~~~~ ../include/linux/vt_buffer.h:24:34: note: in definition of macro 'scr_writew' 24 | #define scr_writew(val, addr) (*(addr) = (val)) | ^~~~ ../include/linux/vt_buffer.h:24:40: error: expected ')' before '=' token 24 | #define scr_writew(val, addr) (*(addr) = (val)) | ^ ../arch/alpha/include/asm/vga.h:17:20: note: in expansion of macro 'scr_writew' 17 | static inline void scr_writew(u16 val, volatile u16 *addr) | ^~~~~~~~~~ ../arch/alpha/include/asm/vga.h:25:29: error: expected identifier or '(' before 'volatile' 25 | static inline u16 scr_readw(volatile const u16 *addr) | ^~~~~~~~ Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Randy Dunlap <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
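The usual shape of a fix for this class of error is an IS_ENABLED() check, so the arch header is included for both builtin and modular console drivers; a hedged sketch of that pattern, not the verbatim hunk:

```c
/* In <linux/vt_buffer.h>: pull in the arch accessors whenever the
 * console driver is builtin OR modular, so the generic fallbacks
 * below stay out of the way (sketch of the pattern). */
#if IS_ENABLED(CONFIG_VGA_CONSOLE) || IS_ENABLED(CONFIG_MDA_CONSOLE)
#include <asm/vga.h>
#endif

#ifndef VT_BUF_HAVE_RW
#define scr_writew(val, addr)	(*(addr) = (val))
#define scr_readw(addr)		(*(addr))
#endif
```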
2023-03-29 | mm/slab: document kfree() as allowed for kmem_cache_alloc() objects | Vlastimil Babka | 1 file, -2/+4
This will make it easier to free objects in situations when they can come from either kmalloc() or kmem_cache_alloc(), and also allow kfree_rcu() for freeing objects from kmem_cache_alloc(). For the SLAB and SLUB allocators this was always possible so with SLOB gone, we can document it as supported. Signed-off-by: Vlastimil Babka <[email protected]> Reviewed-by: Mike Rapoport (IBM) <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Neeraj Upadhyay <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Joel Fernandes <[email protected]>
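A short example of what is now documented as supported; struct foo and foo_cache are placeholders:

```c
#include <linux/slab.h>

struct foo {
	int val;
	struct rcu_head rcu;
};

/* foo_cache is a placeholder kmem_cache created elsewhere. */
static void example(struct kmem_cache *foo_cache)
{
	struct foo *f = kmem_cache_alloc(foo_cache, GFP_KERNEL);

	if (!f)
		return;
	/* ... use f ... */
	kfree(f);	/* documented as supported with SLOB gone */
	/* kfree_rcu(f, rcu) would equally be allowed here instead */
}
```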
2023-03-29 | mm/slab: remove CONFIG_SLOB code from slab common code | Vlastimil Babka | 1 file, -39/+0
CONFIG_SLOB has been removed from Kconfig. Remove code and #ifdef's specific to SLOB in the slab headers and common code. Signed-off-by: Vlastimil Babka <[email protected]> Reviewed-by: Hyeonggon Yoo <[email protected]> Acked-by: Lorenzo Stoakes <[email protected]> Acked-by: Mike Rapoport (IBM) <[email protected]>
2023-03-29 | mm, page_flags: remove PG_slob_free | Vlastimil Babka | 1 file, -4/+0
With SLOB removed we no longer need the PG_slob_free alias for PG_private. Also update tools/mm/page-types. Signed-off-by: Vlastimil Babka <[email protected]> Acked-by: Hyeonggon Yoo <[email protected]> Acked-by: Lorenzo Stoakes <[email protected]> Acked-by: Mike Rapoport (IBM) <[email protected]>
2023-03-29 | usb: gadget: Add function wakeup support | Elson Roy Serrao | 2 files, -0/+7
USB3.2 spec section 9.2.5.4 quotes that a function may signal that it wants to exit from Function Suspend by sending a Function Wake Notification to the host if it is enabled for function remote wakeup. Add an API in the composite layer that can be used by the function drivers to support this feature. Also expose a gadget op so that the composite layer can trigger a wakeup request to the UDC driver. Reviewed-by: Thinh Nguyen <[email protected]> Signed-off-by: Elson Roy Serrao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-03-29 | usb: gadget: Properly configure the device for remote wakeup | Elson Roy Serrao | 2 files, -0/+10
The wakeup bit in the bmAttributes field indicates whether the device is configured for remote wakeup. But this field should be allowed to set only if the UDC supports such wakeup mechanism. So configure this field based on UDC capability. Also inform the UDC whether the device is configured for remote wakeup by implementing a gadget op. Reviewed-by: Thinh Nguyen <[email protected]> Signed-off-by: Elson Roy Serrao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-03-29 | macvlan: Add netlink attribute for broadcast cutoff | Herbert Xu | 1 file, -0/+1
Make the broadcast cutoff configurable through netlink. Note that macvlan is weird because there is no central device for us to configure (the lowerdev could be anything). So all the options are duplicated over what could be thousands of child devices. IFLA_MACVLAN_BC_QUEUE_LEN took the approach of taking the maximum of all child device settings. This is unnecessary as we could simply store the option in the port device and take the last child device that gets updated as the value to use. Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-03-29 | ipv6: Remove in6addr_any alternatives. | Kuniyuki Iwashima | 3 files, -13/+6
Some code defines the IPv6 wildcard address as a local variable and uses it with memcmp() or ipv6_addr_equal(). Let's use in6addr_any and ipv6_addr_any() instead. Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Mark Bloch <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
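The replacement pattern, using the real helpers from include/net/ipv6.h:

```c
#include <net/ipv6.h>

static bool example_is_wildcard(const struct in6_addr *addr)
{
	/* Before: a local zero-filled in6_addr compared via memcmp().
	 * After: the shared constant/helper. */
	return ipv6_addr_any(addr);
	/* equivalently: ipv6_addr_equal(addr, &in6addr_any) */
}
```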
2023-03-29 | vsock: support sockmap | Bobby Eshleman | 2 files, -0/+18
This patch adds sockmap support for vsock sockets. It is intended to be usable by all transports, but only the virtio and loopback transports are implemented. SOCK_STREAM, SOCK_DGRAM, and SOCK_SEQPACKET are all supported. Signed-off-by: Bobby Eshleman <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Stefano Garzarella <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-03-29 | Merge branch 'mlx5-next' into wip/leon-for-next | Leon Romanovsky | 1 file, -3/+10
* mlx5-next: net/mlx5: Introduce other vport query for Q-counters
2023-03-29 | net/mlx5: Introduce other vport query for Q-counters | Patrisious Haddad | 1 file, -3/+10
These new fields in the QUERY_Q_COUNTER command allow us to access another vport's counters during the query command, which is especially useful to query representor vports. In addition, add the required caps to check whether this capability is actually supported. Signed-off-by: Patrisious Haddad <[email protected]> Reviewed-by: Michael Guralnik <[email protected]> Link: https://lore.kernel.org/r/75c73a4a0e60f18c37b35a4a11ca2e2415e4a6f3.1679566038.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
2023-03-29 | RDMA/bnxt_re: Add resize_cq support | Selvin Xavier | 1 file, -0/+4
Add resize_cq verb support for user space CQs. Resize operations for kernel CQs are not supported yet. The driver should free the current CQ only after the user library polls for all the completions and switches to the new CQ. So after resize_cq returns from the driver, the user library polls for existing completions and stores them as temporary data. Once the library reaps all completions in the current CQ, it invokes ibv_cmd_poll_cq to inform the driver about the resize_cq completion. Add a check for user CQs in the driver's poll_cq and complete the resize operation for user CQs. Update uverbs_cmd_mask with poll_cq to support this. Signed-off-by: Selvin Xavier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
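From user space, the described flow maps onto standard verbs calls; a sketch with error handling elided:

```c
#include <infiniband/verbs.h>

/* Sketch: user-side sequence; the provider library completes the
 * switch to the new CQ once the old one has been drained. */
static int resize_and_drain(struct ibv_cq *cq, int new_cqe)
{
	struct ibv_wc wc;

	if (ibv_resize_cq(cq, new_cqe))
		return -1;

	while (ibv_poll_cq(cq, 1, &wc) > 0)
		;	/* consume completions still sitting in the old CQ */

	return 0;
}
```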
2023-03-29 | xhci: use pm_ptr() instead of #ifdef for CONFIG_PM conditionals | Arnd Bergmann | 2 files, -4/+1
A recent patch caused an unused-function warning in builds with CONFIG_PM disabled, after the function became marked 'static': drivers/usb/host/xhci-pci.c:91:13: error: 'xhci_msix_sync_irqs' defined but not used [-Werror=unused-function] 91 | static void xhci_msix_sync_irqs(struct xhci_hcd *xhci) | ^~~~~~~~~~~~~~~~~~~ This could be solved by adding another #ifdef, but as there is a trend towards removing CONFIG_PM checks in favor of helper macros, do the same conversion here and use pm_ptr() to get either a function pointer or NULL but avoid the warning. As the hidden functions reference some other symbols, make sure those are visible at compile time, at the minimal cost of a few extra bytes for 'struct usb_device'. Fixes: 9abe15d55dcc ("xhci: Move xhci MSI sync function to to xhci-pci") Signed-off-by: Arnd Bergmann <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
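The general conversion pattern looks like this; pm_ptr() lives in <linux/pm.h> and the driver names below are placeholders:

```c
#include <linux/pm.h>
#include <linux/platform_device.h>

/* Compiled unconditionally; with CONFIG_PM=n nothing references the
 * ops after constant folding, so the compiler can discard them
 * without emitting an unused-function warning. */
static int foo_suspend(struct device *dev)
{
	/* ... quiesce hardware ... */
	return 0;
}

static int foo_resume(struct device *dev)
{
	/* ... restore hardware ... */
	return 0;
}

static DEFINE_SIMPLE_DEV_PM_OPS(foo_pm_ops, foo_suspend, foo_resume);

static struct platform_driver foo_driver = {
	.driver = {
		.name	= "foo",
		.pm	= pm_ptr(&foo_pm_ops),	/* NULL when !CONFIG_PM */
	},
};
```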
2023-03-28 | Merge tag 'mlx5-updates-2023-03-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux | Jakub Kicinski | 2 files, -2/+8
Saeed Mahameed says: ==================== mlx5-updates-2023-03-20 mlx5 dynamic msix This patch series adds support for dynamic msix vectors allocation in mlx5. Eli Cohen Says: ================ The following series of patches modifies mlx5_core to work with the dynamic MSIX API. Currently, mlx5_core allocates all the interrupt vectors it needs and distributes them amongst the consumers. With the introduction of dynamic MSIX support, which allows for allocation of interrupts more than once, we now allocate vectors as we need them. This allows other drivers running on top of mlx5_core to allocate interrupt vectors for their own use. An example for this is mlx5_vdpa, which uses these vectors to propagate interrupts directly from the hardware to the vCPU [1]. As a preparation for using this series, a use after free issue is fixed in lib/cpu_rmap.c and the allocator for rmap entries has been modified. A complementary API for irq_cpu_rmap_add() has also been introduced. [1] https://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git/patch/?id=0f2bf1fcae96a83b8c5581854713c9fc3407556e ================ * tag 'mlx5-updates-2023-03-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Provide external API for allocating vectors net/mlx5: Use one completion vector if eth is disabled net/mlx5: Refactor calculation of required completion vectors net/mlx5: Move devlink registration before mlx5_load net/mlx5: Use dynamic msix vectors allocation net/mlx5: Refactor completion irq request/release code net/mlx5: Improve naming of pci function vectors net/mlx5: Use newer affinity descriptor net/mlx5: Modify struct mlx5_irq to use struct msi_map net/mlx5: Fix wrong comment net/mlx5e: Coding style fix, add empty line lib: cpu_rmap: Add irq_cpu_rmap_remove to complement irq_cpu_rmap_add lib: cpu_rmap: Use allocator for rmap entries lib: cpu_rmap: Avoid use after free on rmap->obj array entries ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-29 | driver core: class: mark the struct class for sysfs callbacks as constant | Greg Kroah-Hartman | 1 file, -4/+5
struct class should never be modified in a sysfs callback as there is nothing in the structure to modify, and frankly, the structure is almost never used in a sysfs callback, so mark it as constant to allow struct class to be moved to read-only memory. While we are touching all class sysfs callbacks also mark the attribute as constant as it can not be modified. The bonding code still uses this structure so it can not be removed from the function callbacks. Cc: "David S. Miller" <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Bartosz Golaszewski <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Johannes Berg <[email protected]> Cc: Linus Walleij <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Miquel Raynal <[email protected]> Cc: Namjae Jeon <[email protected]> Cc: Paolo Abeni <[email protected]> Cc: Russ Weight <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Steve French <[email protected]> Cc: Vignesh Raghavendra <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Reviewed-by: Luis Chamberlain <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
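The resulting signature for a typical class attribute; 'version' is a placeholder attribute:

```c
#include <linux/device/class.h>
#include <linux/sysfs.h>

/* Both pointers are now const, which lets the struct class instance
 * live in read-only memory. */
static ssize_t version_show(const struct class *class,
			    const struct class_attribute *attr, char *buf)
{
	return sysfs_emit(buf, "1.0\n");
}
static CLASS_ATTR_RO(version);
```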
2023-03-28 | net: dst: Switch to rcuref_t reference counting | Thomas Gleixner | 2 files, -10/+11
Under high contention dst_entry::__refcnt becomes a significant bottleneck. atomic_inc_not_zero() is implemented with a cmpxchg() loop, which goes into high retry rates on contention. Switch the reference count to rcuref_t which results in a significant performance gain. Rename the reference count member to __rcuref to reflect the change. The gain depends on the micro-architecture and the number of concurrent operations and has been measured in the range of +25% to +130% with a localhost memtier/memcached benchmark which amplifies the problem massively. Running the memtier/memcached benchmark over a real (1Gb) network connection the conversion on top of the false sharing fix for struct dst_entry::__refcnt results in a total gain in the 2%-5% range over the upstream baseline. Reported-by: Wangyang Guo <[email protected]> Reported-by: Arjan Van De Ven <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
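A generic sketch of a conversion to rcuref (not the dst_entry diff itself); the fast path stays a plain atomic increment and only the crossings around zero take slow paths:

```c
#include <linux/rcuref.h>
#include <linux/slab.h>

struct obj {
	rcuref_t	ref;
	struct rcu_head	rcu;
};

static struct obj *obj_get(struct obj *o)
{
	/* Fails only once the count has dropped to zero ("dead"),
	 * so no cmpxchg() retry loop under contention. */
	if (!rcuref_get(&o->ref))
		return NULL;
	return o;
}

static void obj_put(struct obj *o)
{
	/* Returns true when the last reference was dropped. */
	if (rcuref_put(&o->ref))
		kfree_rcu(o, rcu);
}
```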
2023-03-28 | net: dst: Prevent false sharing vs. dst_entry::__refcnt | Wangyang Guo | 4 files, -8/+15
dst_entry::__refcnt is highly contended in scenarios where many connections happen from and to the same IP. The reference count is an atomic_t, so the reference count operations have to take the cache-line exclusive. Aside from the unavoidable reference count contention there is another significant problem which is caused by that: False sharing. perf top identified two affected read accesses: dst_entry::lwtstate and rtable::rt_genid. dst_entry::__refcnt is located at offset 64 of dst_entry, which puts it into a separate cacheline vs. the read mostly members located at the beginning of the struct. That prevents false sharing vs. the struct members in the first 64 bytes of the structure, but there is also dst_entry::lwtstate which is located after the reference count and in the same cache line. This member is read after a reference count has been acquired. struct rtable embeds a struct dst_entry at offset 0. struct dst_entry has a size of 112 bytes, which means that the struct members of rtable which follow the dst member share the same cache line as dst_entry::__refcnt. Especially rtable::rt_genid is also read by the contexts which have a reference count acquired already. When dst_entry::__refcnt is incremented or decremented via an atomic operation these read accesses stall. This was found when analysing the memtier benchmark in 1:100 mode, which amplifies the problem extremely. Move the rt[6i]_uncached[_list] members out of struct rtable and struct rt6_info into struct dst_entry to provide padding and move the lwtstate member after that so it ends up in the same cache line. The resulting improvement depends on the micro-architecture and the number of CPUs. It ranges from +20% to +120% with a localhost memtier/memcached benchmark. [ tglx: Rearrange struct ] Signed-off-by: Wangyang Guo <[email protected]> Signed-off-by: Arjan van de Ven <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-28 | Merge branch 'locking/rcuref' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Jakub Kicinski | 5 files, -11/+464
Pulling rcurefs from Peter for tglx's work. Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-28 | mm/thp: rename TRANSPARENT_HUGEPAGE_NEVER_DAX to _UNSUPPORTED | Peter Xu | 1 file, -1/+1
TRANSPARENT_HUGEPAGE_NEVER_DAX has nothing to do with DAX. It's set when has_transparent_hugepage() returns false, checked in hugepage_vma_check() and will disable THP completely if false. Rename it to TRANSPARENT_HUGEPAGE_UNSUPPORTED to reflect its real purpose. [[email protected]: fix comment, per David] Link: https://lkml.kernel.org/r/ZBMzQW674oHQJV7F@x1n Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-03-28 | mm: vmscan: add a map_nr_max field to shrinker_info | Qi Zheng | 1 file, -0/+1
Patch series "make slab shrink lockless", v5. This patch series aims to make slab shrink lockless. 1. Background ============= On our servers, we often find the following system cpu hotspots: 52.22% [kernel] [k] down_read_trylock 19.60% [kernel] [k] up_read 8.86% [kernel] [k] shrink_slab 2.44% [kernel] [k] idr_find 1.25% [kernel] [k] count_shadow_nodes 1.18% [kernel] [k] shrink lruvec 0.71% [kernel] [k] mem_cgroup_iter 0.71% [kernel] [k] shrink_node 0.55% [kernel] [k] find_next_bit And we used bpftrace to capture its calltrace as follows: @[ down_read_trylock+1 shrink_slab+128 shrink_node+371 do_try_to_free_pages+232 try_to_free_pages+243 _alloc_pages_slowpath+771 _alloc_pages_nodemask+702 pagecache_get_page+255 filemap_fault+1361 ext4_filemap_fault+44 __do_fault+76 handle_mm_fault+3543 do_user_addr_fault+442 do_page_fault+48 page_fault+62 ]: 1161690 @[ down_read_trylock+1 shrink_slab+128 shrink_node+371 balance_pgdat+690 kswapd+389 kthread+246 ret_from_fork+31 ]: 8424884 @[ down_read_trylock+1 shrink_slab+128 shrink_node+371 do_try_to_free_pages+232 try_to_free_pages+243 __alloc_pages_slowpath+771 __alloc_pages_nodemask+702 __do_page_cache_readahead+244 filemap_fault+1674 ext4_filemap_fault+44 __do_fault+76 handle_mm_fault+3543 do_user_addr_fault+442 do_page_fault+48 page_fault+62 ]: 20917631 We can see that down_read_trylock() of shrinker_rwsem is being called with high frequency at that time. Because of the poor multicore scalability of atomic operations, this can lead to a significant drop in IPC (instructions per cycle). And more, the shrinker_rwsem is a global read-write lock in shrinkers subsystem, which protects most operations such as slab shrink, registration and unregistration of shrinkers, etc. This can easily cause problems in the following cases. 1) When the memory pressure is high and there are many filesystems mounted or unmounted at the same time, slab shrink will be affected (down_read_trylock() failed). Such as the real workload mentioned by Kirill Tkhai: ``` One of the real workloads from my experience is start of an overcommitted node containing many starting containers after node crash (or many resuming containers after reboot for kernel update). In these cases memory pressure is huge, and the node goes round in long reclaim. ``` 2) If a shrinker is blocked (such as the case mentioned in [1]) and a writer comes in (such as mount a fs), then this writer will be blocked and cause all subsequent shrinker-related operations to be blocked. [1]. https://lore.kernel.org/lkml/[email protected]/ All the above cases can be solved by replacing the shrinker_rwsem trylocks with SRCU. 2. Survey ========= Before doing the code implementation, I found that there were many similar submissions in the community: a. Davidlohr Bueso submitted a patch in 2015. Subject: [PATCH -next v2] mm: srcu-ify shrinkers Link: https://lore.kernel.org/all/[email protected]/ Result: It was finally merged into the linux-next branch, but failed on arm allnoconfig (without CONFIG_SRCU) b. Tetsuo Handa submitted a patchset in 2017. Subject: [PATCH 1/2] mm,vmscan: Kill global shrinker lock. Link: https://lore.kernel.org/lkml/1510609063-3327-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp/ Result: Finally chose to use the current simple way (break when rwsem_is_contended()). And Christoph Hellwig suggested to using SRCU, but SRCU was not unconditionally enabled at the time. c. Kirill Tkhai submitted a patchset in 2018. 
Subject: [PATCH RFC 00/10] Introduce lockless shrink_slab() Link: https://lore.kernel.org/lkml/153365347929.19074.12509495712735843805.stgit@localhost.localdomain/ Result: At that time, SRCU was not unconditionally enabled, and there were some objections to enabling SRCU. Later, because Kirill's focus was moved to other things, this patchset was not continued to be updated. d. Sultan Alsawaf submitted a patch in 2021. Subject: [PATCH] mm: vmscan: Replace shrinker_rwsem trylocks with SRCU protection Link: https://lore.kernel.org/lkml/[email protected]/ Result: Rejected because SRCU was not unconditionally enabled. We can find that almost all these historical commits were abandoned because SRCU was not unconditionally enabled. But now SRCU has been unconditionally enable by Paul E. McKenney in 2023 [2], so it's time to replace shrinker_rwsem trylocks with SRCU. [2] https://lore.kernel.org/lkml/20230105003759.GA1769545@paulmck-ThinkPad-P17-Gen-1/ 3. Reproduction and testing =========================== We can reproduce the down_read_trylock() hotspot through the following script: ``` #!/bin/bash DIR="/root/shrinker/memcg/mnt" do_create() { mkdir -p /sys/fs/cgroup/memory/test mkdir -p /sys/fs/cgroup/perf_event/test echo 4G > /sys/fs/cgroup/memory/test/memory.limit_in_bytes for i in `seq 0 $1`; do mkdir -p /sys/fs/cgroup/memory/test/$i; echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs; echo $$ > /sys/fs/cgroup/perf_event/test/cgroup.procs; mkdir -p $DIR/$i; done } do_mount() { for i in `seq $1 $2`; do mount -t tmpfs $i $DIR/$i; done } do_touch() { for i in `seq $1 $2`; do echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs; echo $$ > /sys/fs/cgroup/perf_event/test/cgroup.procs; dd if=/dev/zero of=$DIR/$i/file$i bs=1M count=1 & done } case "$1" in touch) do_touch $2 $3 ;; test) do_create 4000 do_mount 0 4000 do_touch 0 3000 ;; *) exit 1 ;; esac ``` Save the above script, then run test and touch commands. Then we can use the following perf command to view hotspots: perf top -U -F 999 1) Before applying this patchset: 32.31% [kernel] [k] down_read_trylock 19.40% [kernel] [k] pv_native_safe_halt 16.24% [kernel] [k] up_read 15.70% [kernel] [k] shrink_slab 4.69% [kernel] [k] _find_next_bit 2.62% [kernel] [k] shrink_node 1.78% [kernel] [k] shrink_lruvec 0.76% [kernel] [k] do_shrink_slab 2) After applying this patchset: 27.83% [kernel] [k] _find_next_bit 16.97% [kernel] [k] shrink_slab 15.82% [kernel] [k] pv_native_safe_halt 9.58% [kernel] [k] shrink_node 8.31% [kernel] [k] shrink_lruvec 5.64% [kernel] [k] do_shrink_slab 3.88% [kernel] [k] mem_cgroup_iter At the same time, we use the following perf command to capture IPC information: perf stat -e cycles,instructions -G test -a --repeat 5 -- sleep 10 1) Before applying this patchset: Performance counter stats for 'system wide' (5 runs): 454187219766 cycles test ( +- 1.84% ) 78896433101 instructions test # 0.17 insn per cycle ( +- 0.44% ) 10.0020430 +- 0.0000366 seconds time elapsed ( +- 0.00% ) 2) After applying this patchset: Performance counter stats for 'system wide' (5 runs): 841954709443 cycles test ( +- 15.80% ) (98.69%) 527258677936 instructions test # 0.63 insn per cycle ( +- 15.11% ) (98.68%) 10.01064 +- 0.00831 seconds time elapsed ( +- 0.08% ) We can see that IPC drops very seriously when calling down_read_trylock() at high frequency. After using SRCU, the IPC is at a normal level. 
This patch (of 8): To prepare for the subsequent lockless memcg slab shrink, add a map_nr_max field to struct shrinker_info to record its own real shrinker_nr_max. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Qi Zheng <[email protected]> Suggested-by: Kirill Tkhai <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Kirill Tkhai <[email protected]> Acked-by: Roman Gushchin <[email protected]> Cc: Christian König <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Qi Zheng <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Sultan Alsawaf <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
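For reference, a sketch of the structure this patch extends; see the series itself for the authoritative definition:

```c
/* Per-memcg shrinker bookkeeping (sketch). */
struct shrinker_info {
	struct rcu_head	rcu;
	atomic_long_t	*nr_deferred;
	unsigned long	*map;
	int		map_nr_max;	/* new: this info's own shrinker_nr_max */
};
```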
2023-03-28 | mm: prefer xxx_page() alloc/free functions for order-0 pages | Lorenzo Stoakes | 1 file, -2/+2
Update instances of alloc_pages(..., 0), __get_free_pages(..., 0) and __free_pages(..., 0) to use alloc_page(), __get_free_page() and __free_page() respectively in core code. Link: https://lkml.kernel.org/r/50c48ca4789f1da2a65795f2346f5ae3eff7d665.1678710232.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Mike Rapoport (IBM) <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Uladzislau Rezki (Sony) <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
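The substitution is mechanical:

```c
#include <linux/gfp.h>

static void example(void)
{
	/* Before: explicit order-0 calls */
	struct page *page = alloc_pages(GFP_KERNEL, 0);

	if (page)
		__free_pages(page, 0);

	/* After: the dedicated single-page helpers */
	page = alloc_page(GFP_KERNEL);
	if (page)
		__free_page(page);
}
```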
2023-03-28 | kasan: remove PG_skip_kasan_poison flag | Peter Collingbourne | 3 files, -37/+15
Code inspection reveals that PG_skip_kasan_poison is redundant with kasantag, because the former is intended to be set iff the latter is the match-all tag. It can also be observed that it's basically pointless to poison pages which have kasantag=0, because any pages with this tag would have been pointed to by pointers with match-all tags, so poisoning the pages would have little to no effect in terms of bug detection. Therefore, change the condition in should_skip_kasan_poison() to check kasantag instead, and remove PG_skip_kasan_poison and associated flags. Link: https://lkml.kernel.org/r/[email protected] Link: https://linux-review.googlesource.com/id/I57f825f2eaeaf7e8389d6cf4597c8a5821359838 Signed-off-by: Peter Collingbourne <[email protected]> Reviewed-by: Andrey Konovalov <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-03-28 | io-mapping: don't disable preempt on RT in io_mapping_map_atomic_wc() | Sebastian Andrzej Siewior | 1 file, -4/+16
io_mapping_map_atomic_wc() disables preemption and pagefaults for historical reasons. The conversion to io_mapping_map_local_wc(), which only disables migration, cannot be done wholesale because quite a few call sites need to be updated to accommodate the changed semantics. On PREEMPT_RT enabled kernels the io_mapping_map_atomic_wc() semantics are problematic due to the implicit disabling of preemption, which makes it impossible to acquire 'sleeping' spinlocks within the mapped atomic sections. PREEMPT_RT has replaced the preempt_disable() with a migrate_disable() for more than a decade. It could be argued that this is a justification to do this unconditionally, but PREEMPT_RT covers only a limited number of architectures and it disables some functionality which limits the coverage further. Limit the replacement to PREEMPT_RT for now. This was also done for kmap_atomic(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Reported-by: Richard Weinberger <[email protected]> Link: https://lore.kernel.org/CAFLxGvw0WMxaMqYqJ5WgvVSbKHq2D2xcXTOgMCpgq9nDC-MWTQ@mail.gmail.com Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
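The resulting entry path presumably mirrors what was done for kmap_atomic() on PREEMPT_RT; a hedged sketch:

```c
/* Sketch of the PREEMPT_RT-conditional entry path described above. */
static inline void atomic_section_enter(void)
{
	if (IS_ENABLED(CONFIG_PREEMPT_RT))
		migrate_disable();	/* 'sleeping' spinlocks stay usable */
	else
		preempt_disable();	/* historical semantics */
	pagefault_disable();
}
```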
2023-03-28 | shmem: add support to ignore swap | Luis Chamberlain | 1 file, -0/+1
In doing experiments with shmem, having the option to avoid swap becomes a useful mechanism. One of the *raves* about brd over shmem is that you can avoid swap, but that's not really a good reason to use brd if we can instead use shmem. Using brd has its own good reasons to exist, but just because "tmpfs" doesn't let you do that is not a great reason to avoid it if we can easily add support for it. I don't add support for reconfiguring incompatible options, but if we really wanted to, we could add support for that. To avoid swap we use mapping_set_unevictable() upon inode creation, and put a WARN_ON_ONCE() stop-gap on writepages() for reclaim. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Luis Chamberlain <[email protected]> Acked-by: Christian Brauner <[email protected]> Tested-by: Xin Hao <[email protected]> Reviewed-by: Davidlohr Bueso <[email protected]> Cc: Adam Manzanares <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Kees Cook <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Pankaj Raghav <[email protected]> Cc: Yosry Ahmed <[email protected]> Signed-off-by: Andrew Morton <[email protected]>