aboutsummaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2023-06-19mm: use pmdp_get_lockless() without surplus barrier()Hugh Dickins1-17/+0
Patch series "mm: allow pte_offset_map[_lock]() to fail", v2. What is it all about? Some mmap_lock avoidance i.e. latency reduction. Initially just for the case of collapsing shmem or file pages to THPs; but likely to be relied upon later in other contexts e.g. freeing of empty page tables (but that's not work I'm doing). mmap_write_lock avoidance when collapsing to anon THPs? Perhaps, but again that's not work I've done: a quick attempt was not as easy as the shmem/file case. I would much prefer not to have to make these small but wide-ranging changes for such a niche case; but failed to find another way, and have heard that shmem MADV_COLLAPSE's usefulness is being limited by that mmap_write_lock it currently requires. These changes (though of course not these exact patches) have been in Google's data centre kernel for three years now: we do rely upon them. What is this preparatory series about? The current mmap locking will not be enough to guard against that tricky transition between pmd entry pointing to page table, and empty pmd entry, and pmd entry pointing to huge page: pte_offset_map() will have to validate the pmd entry for itself, returning NULL if no page table is there. What to do about that varies: sometimes nearby error handling indicates just to skip it; but in many cases an ACTION_AGAIN or "goto again" is appropriate (and if that risks an infinite loop, then there must have been an oops, or pfn 0 mistaken for page table, before). Given the likely extension to freeing empty page tables, I have not limited this set of changes to a THP config; and it has been easier, and sets a better example, if each site is given appropriate handling: even where deeper study might prove that failure could only happen if the pmd table were corrupted. Several of the patches are, or include, cleanup on the way; and by the end, pmd_trans_unstable() and suchlike are deleted: pte_offset_map() and pte_offset_map_lock() then handle those original races and more. Most uses of pte_lockptr() are deprecated, with pte_offset_map_nolock() taking its place. This patch (of 32): Use pmdp_get_lockless() in preference to READ_ONCE(*pmdp), to get a more reliable result with PAE (or READ_ONCE as before without PAE); and remove the unnecessary extra barrier()s which got left behind in its callers. HOWEVER: Note the small print in linux/pgtable.h, where it was designed specifically for fast GUP, and depends on interrupts being disabled for its full guarantee: most callers which have been added (here and before) do NOT have interrupts disabled, so there is still some need for caution. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hugh Dickins <[email protected]> Acked-by: Yu Zhao <[email protected]> Acked-by: Peter Xu <[email protected]> Cc: Alistair Popple <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Lorenzo Stoakes <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Qi Zheng <[email protected]> Cc: Ralph Campbell <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Song Liu <[email protected]> Cc: Steven Price <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Thomas Hellström <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yang Shi <[email protected]> Cc: Zack Rusin <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19mm: zswap: support exclusive loadsYosry Ahmed1-1/+1
Commit 71024cb4a0bf ("frontswap: remove frontswap_tmem_exclusive_gets") removed support for exclusive loads from frontswap as it was not used. Bring back exclusive loads support to frontswap by adding an "exclusive" output parameter to frontswap_ops->load. On the zswap side, add a module parameter to enable/disable exclusive loads, and a config option to control the boot default value. Refactor zswap entry invalidation in zswap_frontswap_invalidate_page() into zswap_invalidate_entry() to reuse it in zswap_frontswap_load() if exclusive loads are enabled. With exclusive loads, we avoid having two copies of the same page in memory (compressed & uncompressed) after faulting it in from zswap. On the other hand, if the page is to be reclaimed again without being dirtied, it will be re-compressed. Compression is not usually slow, and a page that was just faulted in is less likely to be reclaimed again soon. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yosry Ahmed <[email protected]> Suggested-by: Yu Zhao <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Dan Streetman <[email protected]> Cc: Domenico Cerasuolo <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Nhat Pham <[email protected]> Cc: Seth Jennings <[email protected]> Cc: Vitaly Wool <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19mm/memory_hotplug: remove reset_node_managed_pages() in hotadd_init_pgdat()Haifeng Xu1-1/+0
managed pages has already been set to 0 in free_area_init_core_hotplug(), via zone_init_internals() on each zone. It's pointless to reset again. Furthermore, reset_node_managed_pages() no longer needs to be exposed outside of mm/memblock.c. Remove declaration in include/linux/memblock.h and define it as static. In addtion to this, the only caller of reset_node_managed_pages() is reset_all_zones_managed_pages(), which is annotated with __init, so it should be safe to also mark reset_node_managed_pages() as __init. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Haifeng Xu <[email protected]> Suggested-by: David Hildenbrand <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Oscar Salvador <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19shmem: use ramfs_kill_sb() for kill_sb method of ramfs-based tmpfsRoberto Sassu1-0/+1
As the ramfs-based tmpfs uses ramfs_init_fs_context() for the init_fs_context method, which allocates fc->s_fs_info, use ramfs_kill_sb() to free it and avoid a memory leak. Link: https://lkml.kernel.org/r/[email protected] Fixes: c3b1b1cbf002 ("ramfs: add support for "mode=" mount option") Signed-off-by: Roberto Sassu <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: David Howells <[email protected]> Cc: Al Viro <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19mm/sparse: remove unused parameters in sparse_remove_section()Yajun Deng1-3/+2
These parameters ms and map_offset are not used in sparse_remove_section(), so remove them. The __remove_section() is only called by __remove_pages(), remove it. And put the WARN_ON_ONCE() in sparse_remove_section(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yajun Deng <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: Oscar Salvador <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19mm: vmscan: mark kswapd_run() and kswapd_stop() __meminitMiaohe Lin1-2/+2
Add __meminit to kswapd_run() and kswapd_stop() to ensure they're default to __init when memory hotplug is not enabled. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Acked-by: Yu Zhao <[email protected]> Acked-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19mm: remove obsolete alloc_migrate_target()Miaohe Lin1-3/+0
There's only declaration left in the header file. Remove it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19mm: page_isolation: write proper kerneldocJohannes Weiner1-18/+6
And remove the incorrect header comments. [[email protected]: s/lower/first/, s/upper/last/, per Mike] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Cc: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19NFS: add sysfs shutdown knobBenjamin Coddington2-1/+3
Within each nfs_server sysfs tree, add an entry named "shutdown". Writing 1 to this file will set the cl_shutdown bit on the rpc_clnt structs associated with that mount. If cl_shutdown is set, the task scheduler immediately returns -EIO for new tasks. Signed-off-by: Benjamin Coddington <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2023-06-19NFS: add a sysfs link to the lockd rpc_clientBenjamin Coddington1-0/+2
After lockd is started, add a symlink for lockd's rpc_client under NFS' superblock sysfs. Signed-off-by: Benjamin Coddington <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2023-06-19NFS: Add sysfs links to sunrpc clients for nfs_clientsBenjamin Coddington1-1/+7
For the general and state management nfs_client under each mount, create symlinks to their respective rpc_client sysfs entries. Signed-off-by: Benjamin Coddington <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2023-06-19NFS: add superblock sysfs entriesBenjamin Coddington1-0/+2
Create a sysfs directory for each mount that corresponds to the mount's nfs_server struct. As the mount is being constructed, use the name "server-n", but rename it to the "MAJOR:MINOR" of the mount after assigning a device_id. The rename approach allows us to populate the mount's directory with links to the various rpc_client objects during the mount's construction. The naming convention (MAJOR:MINOR) can be used to reference a particular NFS mount's sysfs tree. Signed-off-by: Benjamin Coddington <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2023-06-19NFS: Have struct nfs_client carry a TLS policy fieldChuck Lever1-1/+2
The new field is used to match struct nfs_clients that have the same TLS policy setting. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2023-06-19SUNRPC: Add a TCP-with-TLS RPC transport classChuck Lever2-0/+3
Use the new TLS handshake API to enable the SunRPC client code to request a TLS handshake. This implements support for RFC 9289, only on TCP sockets. Upper layers such as NFS use RPC-with-TLS to protect in-transit traffic. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2023-06-19SUNRPC: Ignore data_ready callbacks during TLS handshakesChuck Lever1-0/+1
The RPC header parser doesn't recognize TLS handshake traffic, so it will close the connection prematurely with an error. To avoid that, shunt the transport's data_ready callback when there is a TLS handshake in progress. The XPRT_SOCK_IGNORE_RECV flag will be toggled by code added in a subsequent patch. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2023-06-19SUNRPC: Add RPC client support for the RPC_AUTH_TLS auth flavorChuck Lever1-0/+2
The new authentication flavor is used only to discover peer support for RPC-over-TLS. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2023-06-19ovl: enable fsnotify events on underlying real filesAmir Goldstein1-1/+3
Overlayfs creates the real underlying files with fake f_path, whose f_inode is on the underlying fs and f_path on overlayfs. Those real files were open with FMODE_NONOTIFY, because fsnotify code was not prapared to handle fsnotify hooks on files with fake path correctly and fanotify would report unexpected event->fd with fake overlayfs path, when the underlying fs was being watched. Teach fsnotify to handle events on the real files, and do not set real files to FMODE_NONOTIFY to allow operations on real file (e.g. open, access, modify, close) to generate async and permission events. Because fsnotify does not have notifications on address space operations, we do not need to worry about ->vm_file not reporting events to a watched overlayfs when users are accessing a mapped overlayfs file. Acked-by: Jan Kara <[email protected]> Signed-off-by: Amir Goldstein <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-06-19SUNRPC: Plumb an API for setting transport layer securityChuck Lever2-0/+19
Add an initial set of policies along with fields for upper layers to pass the requested policy down to the transport layer. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2023-06-19fs: use backing_file container for internal files with "fake" f_pathAmir Goldstein1-5/+28
Overlayfs uses open_with_fake_path() to allocate internal kernel files, with a "fake" path - whose f_path is not on the same fs as f_inode. Allocate a container struct backing_file for those internal files, that is used to hold the "fake" ovl path along with the real path. backing_file_real_path() can be used to access the stored real path. Signed-off-by: Amir Goldstein <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-06-19fs: use a helper for opening kernel internal filesAmir Goldstein1-0/+2
cachefiles uses kernel_open_tmpfile() to open kernel internal tmpfile without accounting for nr_files. cachefiles uses open_with_fake_path() for the same reason without the need for a fake path. Fork open_with_fake_path() to kernel_file_open() which only does the noaccount part and use it in cachefiles. Signed-off-by: Amir Goldstein <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-06-19NFSv4.2: SETXATTR should update ctimeAnna Schumaker1-0/+3
Otherwise, `stat` will report a stale value to users. Signed-off-by: Anna Schumaker <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2023-06-19fs: rename {vfs,kernel}_tmpfile_open()Amir Goldstein1-3/+4
Overlayfs and cachefiles use vfs_open_tmpfile() to open a tmpfile without accounting for nr_files. Rename this helper to kernel_tmpfile_open() to better reflect this helper is used for kernel internal users. Signed-off-by: Amir Goldstein <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-06-19regulator: pca9450: Fix LDO3OUT and LDO4OUT MASKTeresa Remmet1-2/+2
L3_OUT and L4_OUT Bit fields range from Bit 0:4 and thus the mask should be 0x1F instead of 0x0F. Fixes: 0935ff5f1f0a ("regulator: pca9450: add pca9450 pmic driver") Signed-off-by: Teresa Remmet <[email protected]> Reviewed-by: Frieder Schrempf <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2023-06-19gpiolib: Drop unused domain_ops memeber of GPIO IRQ chipAndy Shevchenko1-7/+0
It seems there is no driver that requires custom IRQ chip domain options. Drop the member and respective code. Signed-off-by: Andy Shevchenko <[email protected]> Reviewed-by: Linus Walleij <[email protected]> Acked-by: Marc Zyngier <[email protected]> Signed-off-by: Bartosz Golaszewski <[email protected]>
2023-06-19gpiolib: Fix irq_domain resource tracking for gpiochip_irqchip_add_domain()Michael Walle1-0/+8
Up until commit 6a45b0e2589f ("gpiolib: Introduce gpiochip_irqchip_add_domain()") all irq_domains were allocated by gpiolib itself and thus gpiolib also takes care of freeing it. With gpiochip_irqchip_add_domain() a user of gpiolib can associate an irq_domain with the gpio_chip. This irq_domain is not managed by gpiolib and therefore must not be freed by gpiolib. Fixes: 6a45b0e2589f ("gpiolib: Introduce gpiochip_irqchip_add_domain()") Reported-by: Jiawen Wu <[email protected]> Signed-off-by: Michael Walle <[email protected]> Reviewed-by: Linus Walleij <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Signed-off-by: Bartosz Golaszewski <[email protected]>
2023-06-19wifi: mac80211: check EHT basic MCS/NSS setJohannes Berg1-8/+20
Check that all the NSS in the EHT basic MCS/NSS set are actually supported, otherwise disable EHT for the connection. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230618214436.737827c906c9.I0c11a3cd46ab4dcb774c11a5bbc30aecfb6fce11@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-06-19wifi: update multi-link element STA reconfigJohannes Berg1-2/+6
Update the MLE STA reconfig sub-type to 802.11be D3.0 format, which includes the operation update field. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230618214436.2e1383b31f07.I8055a111c8fcf22e833e60f5587a4d8d21caca5b@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-06-19wifi: ieee80211: reorder presence checks in MLE per-STA profileJohannes Berg1-3/+2
In ieee80211_mle_sta_prof_size_ok(), the presence checks aren't ordered by field order, so that's a bit confusing. Reorder them. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230618214436.fdbf17320a37.I517cf27fdc3f6e5d6a2615182da47ba4bdf14039@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-06-19wifi: mac80211: Support link removal using Reconfiguration ML elementIlan Peer1-0/+33
Add support for handling link removal indicated by the Reconfiguration Multi-Link element. Signed-off-by: Ilan Peer <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230618214436.d8a046dc0c1a.I4dcf794da2a2d9f4e5f63a4b32158075d27c0660@changeid [use cfg80211_links_removed() API instead] Signed-off-by: Johannes Berg <[email protected]>
2023-06-19wifi: cfg80211: use structs for TBTT information accessBenjamin Berg1-3/+0
Make the data access a bit nicer overall by using structs. There is a small change here to also accept a TBTT information length of eight bytes as we do not require the 20 MHz PSD information. This also fixes a bug reading the short SSID on big endian machines. Signed-off-by: Benjamin Berg <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230618214436.4c3f8901c1bc.Ic3e94fd6e1bccff7948a252ad3bb87e322690a17@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-06-19wifi: ieee80211: add structs for TBTT information accessBenjamin Berg1-0/+22
The TBTT information can have various lengths with different elements thare are present. Add definitions for the two types that we are interested in (i.e. the ones that contain the BSSID). Signed-off-by: Benjamin Berg <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230618214436.2a6f8766a3ec.Ic962e28492212cc8ee1eb602b8f07a4ea172fc4a@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-06-19wifi: ieee80211: add definitions for RNR MLD paramsBenjamin Berg1-0/+15
Add the definitions necessary to parse the MLD parameters included in an RNR element. Signed-off-by: Benjamin Berg <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230618214436.9999842237c0.I80f00a90cb4e43071432b4158f206c73ba799618@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-06-19wifi: ieee80211: use default for medium synchronization delayBenjamin Berg1-2/+11
Default values are defined for the information included in the Medium Synchronization Delay Information subfield. The spec says to initialize the values to these defaults and only change them when included. Return the default value instead of zero so that the defaults are used when the field is not included in the association response. Signed-off-by: Benjamin Berg <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230618214435.a7725bef3795.I2d3528cf4af021c5b37f97fbe64ae9116ce9bef1@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-06-19wifi: ieee80211: add helper to validate ML element type and sizeBenjamin Berg1-19/+31
The helper functions to retrieve the EML capabilities and medium synchronization delay both assume that the type is correct. Instead of assuming the length is correct and still checking the type, add a new helper to check both and don't do any verification. Signed-off-by: Benjamin Berg <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230618214435.1b50e7a3b3cf.I9385514d8eb6d6d3c82479a6fa732ef65313e554@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-06-19wifi: ieee80211: Fix the common size calculation for reconfiguration MLIlan Peer1-4/+1
The common information length is found in the first octet of the common information. Fixes: 0f48b8b88aa9 ("wifi: ieee80211: add definitions for multi-link element") Signed-off-by: Ilan Peer <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230618214435.3c7ed4817338.I42ef706cb827b4dade6e4ffbb6e7f341eaccd398@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-06-19wifi: mac80211: Rename ieee80211_mle_sta_prof_size_ok()Ilan Peer1-2/+4
Rename it to ieee80211_mle_basic_sta_prof_size_ok() as it validates the size of the station profile included in Basic Multi-Link element. Signed-off-by: Ilan Peer <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230616094949.9bdfd263974f.I7bebd26894f33716e93cc7da576ef3215e0ba727@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-06-19wifi: cfg80211: ignore invalid TBTT info field typesBenjamin Berg1-0/+2
The TBTT information field type must be zero. This is only changed in the 802.11be draft specification where the value 1 is used to indicate that only the MLD parameters are included. Signed-off-by: Benjamin Berg <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230616094949.7865606ffe94.I7ff28afb875d1b4c39acd497df8490a7d3628e3f@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-06-19ARM: 9298/1: Drop custom mdesc->handle_irq()Linus Walleij1-11/+0
ARM exclusively uses GENERIC_IRQ_MULTI_HANDLER, so at some point set_handle_irq() needs to be called to handle system-wide interrupts. For all DT-enabled boards, this call happens down in the drivers/irqchip subsystem, after locating the target irqchip driver from the device tree. We still have a few instances of the boardfiles with machine descriptors passing a machine-specific .handle_irq() to the ARM kernel core. Get rid of this by letting the few remaining machines consistently call set_handle_irq() from the end of the .init_irq() callback instead and diet down one member from the machine descriptor. Cc: Marc Zyngier <[email protected]> Reviewed-by: Arnd Bergmann <[email protected]> Acked-by: Mark Rutland <[email protected]> Signed-off-by: Linus Walleij <[email protected]> Signed-off-by: Russell King (Oracle) <[email protected]>
2023-06-19Merge branches 'iommu/fixes', 'arm/smmu', 'ppc/pamu', 'virtio', 'x86/vt-d', ↵Joerg Roedel1-0/+6
'core' and 'x86/amd' into next
2023-06-19fbdev/media: Use GPIO descriptors for VIA GPIOLinus Walleij1-14/+0
The VIA fbdev exposes a custom GPIO chip for its GPIOs, these are in turn looked up the camera driver using a custom API. Drop the custom API, provide a look-up table and convert to GPIO descriptors. Note proper polarity on the RESET line. Cc: Jonathan Corbet <[email protected]> Signed-off-by: Linus Walleij <[email protected]> Signed-off-by: Helge Deller <[email protected]>
2023-06-19video/hdmi: Reorder fields in 'struct hdmi_avi_infoframe'Christophe JAILLET1-2/+2
Group some variables based on their sizes to reduce hole and avoid padding. On x86_64, this shrinks the size of 'struct hdmi_avi_infoframe' from 68 to 60 bytes. It saves a few bytes of memory and is more cache-line friendly. This also reduces the union hdmi_infoframe the same way. Signed-off-by: Christophe JAILLET <[email protected]> Signed-off-by: Helge Deller <[email protected]>
2023-06-19Backmerge tag 'v6.4-rc7' of ↵Dave Airlie37-84/+156
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into drm-next Linux 6.4-rc7 Need this to pull in the msm work. Signed-off-by: Dave Airlie <[email protected]>
2023-06-18posix-timers: Add sys_ni_posix_timers() prototypeArnd Bergmann1-0/+1
The sys_ni_posix_timers() definition causes a warning when the declaration is missing, so this needs to be added along with the normal syscalls, outside of the #ifdef. kernel/time/posix-stubs.c:26:17: error: no previous prototype for 'sys_ni_posix_timers' [-Werror=missing-prototypes] Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-06-18posix-timers: Ensure timer ID search-loop limit is validThomas Gleixner1-1/+1
posix_timer_add() tries to allocate a posix timer ID by starting from the cached ID which was stored by the last successful allocation. This is done in a loop searching the ID space for a free slot one by one. The loop has to terminate when the search wrapped around to the starting point. But that's racy vs. establishing the starting point. That is read out lockless, which leads to the following problem: CPU0 CPU1 posix_timer_add() start = sig->posix_timer_id; lock(hash_lock); ... posix_timer_add() if (++sig->posix_timer_id < 0) start = sig->posix_timer_id; sig->posix_timer_id = 0; So CPU1 can observe a negative start value, i.e. -1, and the loop break never happens because the condition can never be true: if (sig->posix_timer_id == start) break; While this is unlikely to ever turn into an endless loop as the ID space is huge (INT_MAX), the racy read of the start value caught the attention of KCSAN and Dmitry unearthed that incorrectness. Rewrite it so that all id operations are under the hash lock. Reported-by: [email protected] Reported-by: Dmitry Vyukov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/87bkhzdn6g.ffs@tglx
2023-06-18Merge tag 'mlx5-updates-2023-06-16' of ↵David S. Miller2-4/+10
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux mlx5-updates-2023-06-16 1) Added a new event handler to firmware sync reset, which is used to support firmware sync reset flow on smart NIC. Adding this new stage to the flow enables the firmware to ensure host PFs unload before ECPFs unload, to avoid race of PFs recovery. 2) Debugfs for mlx5 eswitch bridge offloads 3) Added two new counters for vport stats 4) Minor Fixups and cleanups for net-next branch Signed-off-by: David S. Miller <[email protected]>
2023-06-18Merge tag 'ata-6.4-rc7' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata Pull ata fix from Damien Le Moal: - Avoid deadlocks on resume from sleep by delaying scsi rescan until the scsi device is also fully resumed. * tag 'ata-6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: ata: libata-scsi: Avoid deadlock on rescan after device resume
2023-06-18tcp: Use per-vma locking for receive zerocopyArjun Roy1-0/+17
Per-VMA locking allows us to lock a struct vm_area_struct without taking the process-wide mmap lock in read mode. Consider a process workload where the mmap lock is taken constantly in write mode. In this scenario, all zerocopy receives are periodically blocked during that period of time - though in principle, the memory ranges being used by TCP are not touched by the operations that need the mmap write lock. This results in performance degradation. Now consider another workload where the mmap lock is never taken in write mode, but there are many TCP connections using receive zerocopy that are concurrently receiving. These connections all take the mmap lock in read mode, but this does induce a lot of contention and atomic ops for this process-wide lock. This results in additional CPU overhead caused by contending on the cache line for this lock. However, with per-vma locking, both of these problems can be avoided. As a test, I ran an RPC-style request/response workload with 4KB payloads and receive zerocopy enabled, with 100 simultaneous TCP connections. I measured perf cycles within the find_tcp_vma/mmap_read_lock/mmap_read_unlock codepath, with and without per-vma locking enabled. When using process-wide mmap semaphore read locking, about 1% of measured perf cycles were within this path. With per-VMA locking, this value dropped to about 0.45%. Signed-off-by: Arjun Roy <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-18sysctl: replace child with an enumerationJoel Granados1-2/+12
This is part of the effort to remove the empty element at the end of ctl_table structs. "child" was a deprecated elem in this struct and was being used to differentiate between two types of ctl_tables: "normal" and "permanently emtpy". What changed?: * Replace "child" with an enumeration that will have two values: the default (0) and the permanently empty (1). The latter is left at zero so when struct ctl_table is created with kzalloc or in a local context, it will have the zero value by default. We document the new enum with kdoc. * Remove the "empty child" check from sysctl_check_table * Remove count_subheaders function as there is no longer a need to calculate how many headers there are for every child * Remove the recursive call to unregister_sysctl_table as there is no need to traverse down the child tree any longer * Add a new SYSCTL_PERM_EMPTY_DIR binary flag * Remove the last remanence of child from partport/procfs.c Signed-off-by: Joel Granados <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>
2023-06-18ata: libata-scsi: Avoid deadlock on rescan after device resumeDamien Le Moal1-1/+1
When an ATA port is resumed from sleep, the port is reset and a power management request issued to libata EH to reset the port and rescanning the device(s) attached to the port. Device rescanning is done by scheduling an ata_scsi_dev_rescan() work, which will execute scsi_rescan_device(). However, scsi_rescan_device() takes the generic device lock, which is also taken by dpm_resume() when the SCSI device is resumed as well. If a device rescan execution starts before the completion of the SCSI device resume, the rcu locking used to refresh the cached VPD pages of the device, combined with the generic device locking from scsi_rescan_device() and from dpm_resume() can cause a deadlock. Avoid this situation by changing struct ata_port scsi_rescan_task to be a delayed work instead of a simple work_struct. ata_scsi_dev_rescan() is modified to check if the SCSI device associated with the ATA device that must be rescanned is not suspended. If the SCSI device is still suspended, ata_scsi_dev_rescan() returns early and reschedule itself for execution after an arbitrary delay of 5ms. Reported-by: Kai-Heng Feng <[email protected]> Reported-by: Joe Breuer <[email protected]> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217530 Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management") Signed-off-by: Damien Le Moal <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Tested-by: Kai-Heng Feng <[email protected]> Tested-by: Joe Breuer <[email protected]>
2023-06-17x86/hyperv: Fix hyperv_pcpu_input_arg handling when CPUs go online/offlineMichael Kelley1-0/+1
These commits a494aef23dfc ("PCI: hv: Replace retarget_msi_interrupt_params with hyperv_pcpu_input_arg") 2c6ba4216844 ("PCI: hv: Enable PCI pass-thru devices in Confidential VMs") update the Hyper-V virtual PCI driver to use the hyperv_pcpu_input_arg because that memory will be correctly marked as decrypted or encrypted for all VM types (CoCo or normal). But problems ensue when CPUs in the VM go online or offline after virtual PCI devices have been configured. When a CPU is brought online, the hyperv_pcpu_input_arg for that CPU is initialized by hv_cpu_init() running under state CPUHP_AP_ONLINE_DYN. But this state occurs after state CPUHP_AP_IRQ_AFFINITY_ONLINE, which may call the virtual PCI driver and fault trying to use the as yet uninitialized hyperv_pcpu_input_arg. A similar problem occurs in a CoCo VM if the MMIO read and write hypercalls are used from state CPUHP_AP_IRQ_AFFINITY_ONLINE. When a CPU is taken offline, IRQs may be reassigned in state CPUHP_TEARDOWN_CPU. Again, the virtual PCI driver may fault trying to use the hyperv_pcpu_input_arg that has already been freed by a higher state. Fix the onlining problem by adding state CPUHP_AP_HYPERV_ONLINE immediately after CPUHP_AP_ONLINE_IDLE (similar to CPUHP_AP_KVM_ONLINE) and before CPUHP_AP_IRQ_AFFINITY_ONLINE. Use this new state for Hyper-V initialization so that hyperv_pcpu_input_arg is allocated early enough. Fix the offlining problem by not freeing hyperv_pcpu_input_arg when a CPU goes offline. Retain the allocated memory, and reuse it if the CPU comes back online later. Signed-off-by: Michael Kelley <[email protected]> Reviewed-by: Vitaly Kuznetsov <[email protected]> Acked-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Dexuan Cui <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Wei Liu <[email protected]>