aboutsummaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2023-04-11net/mlx5: Add new WQE for updating flow tableYevgeny Kliteynik2-0/+12
Add new WQE type: FLOW_TBL_ACCESS, which will be used for writing modify header arguments. This type has specific control segment and special data segment. Signed-off-by: Yevgeny Kliteynik <[email protected]> Reviewed-by: Alex Vesker <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-04-11net/mlx5: Add mlx5_ifc bits for modify header argumentYevgeny Kliteynik1-1/+27
Add enum value for modify-header argument object and mlx5_bits for the related capabilities. Signed-off-by: Muhammad Sammar <[email protected]> Signed-off-by: Yevgeny Kliteynik <[email protected]> Reviewed-by: Alex Vesker <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-04-11net/mlx5: Create a new profile for SFsParav Pandit1-0/+1
Create a new profile for SFs in order to disable the command cache. Each function command cache consumes ~500KB of memory, when using a large number of SFs this savings is notable on memory constarined systems. Use a new profile to provide for future differences between SFs and PFs. The mr_cache not used for non-PF functions, so it is excluded from the new profile. Signed-off-by: Parav Pandit <[email protected]> Reviewed-by: Bodong Wang <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-04-11net/mlx5: Add mlx5_ifc definitions for bridge multicast supportVlad Buslov1-1/+6
Add the required hardware definitions to mlx5_ifc: fdb_uplink_hairpin, fdb_multi_path_any_table_limit_regc, fdb_multi_path_any_table. Signed-off-by: Vlad Buslov <[email protected]> Reviewed-by: Maor Dickman <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-04-11fsverity: use WARN_ON_ONCE instead of WARN_ONEric Biggers1-3/+3
As per Linus's suggestion (https://lore.kernel.org/r/CAHk-=whefxRGyNGzCzG6BVeM=5vnvgb-XhSeFJVxJyAxAF8XRA@mail.gmail.com), use WARN_ON_ONCE instead of WARN_ON. This barely adds any extra overhead, and it makes it so that if any of these ever becomes reachable (they shouldn't, but that's the point), the logs can't be flooded. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: Christian Brauner <[email protected]> Signed-off-by: Eric Biggers <[email protected]>
2023-04-11nvme: Add a nvme_pr_type enumMike Christie1-0/+9
The next patch adds support to report the reservation type, so we need to be able to convert from the NVMe PR value we get from the device to the linux block layer PR value that will be returned to callers. To prepare for that, this patch adds a nvme_pr_type enum and renames the nvme_pr_type function. Signed-off-by: Mike Christie <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2023-04-11nvme: Add pr_ops read_keys supportMike Christie1-0/+4
This patch adds support for the pr_ops read_keys callout by calling the NVMe Reservation Report helper, then parsing that info to get the controller's registered keys. Because the callout is only used in the kernel where the callers, like LIO, do not know about controller/host IDs, the callout just returns the registered keys which is required by the SCSI PR in READ KEYS command. Signed-off-by: Mike Christie <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2023-04-11nvme: Fix reservation status related structsMike Christie1-8/+30
This fixes the following issues with the reservation status structs: 1. resv10 is bytes 23:10 so it should be 14 bytes. 2. regctl_ds only supports 64 bit host IDs. These are not currently used, but will be in this patchset which adds support for the reservation report command. Signed-off-by: Mike Christie <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2023-04-11block: Rename BLK_STS_NEXUS to BLK_STS_RESV_CONFLICTMike Christie1-2/+2
BLK_STS_NEXUS is used for NVMe/SCSI reservation conflicts and DASD's locking feature which works similar to NVMe/SCSI reservations where a host can get a lock on a device and when the lock is taken it will get failures. This patch renames BLK_STS_NEXUS so it better reflects this type of use. Signed-off-by: Mike Christie <[email protected]> Link: https://lore.kernel.org/r/[email protected] Acked-by: Stefan Haberland <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2023-04-11block: Add PR callouts for read keys and reservationMike Christie1-0/+25
Add callouts for reading keys and reservations. This allows LIO to support the READ_KEYS and READ_RESERVATION commands so it can export devices to VMs for software like windows clustering. Signed-off-by: Mike Christie <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2023-04-11NFSv3: handle out-of-order write replies.NeilBrown1-0/+47
NFSv3 includes pre/post wcc attributes which allow the client to determine if all changes to the file have been made by the client itself, or if any might have been made by some other client. If there are gaps in the pre/post ctime sequence it must be assumed that some other client changed the file in that gap and the local cache must be suspect. The next time the file is opened the cache should be invalidated. Since Commit 1c341b777501 ("NFS: Add deferred cache invalidation for close-to-open consistency violations") in linux 5.3 the Linux client has been triggering this invalidation. The chunk in nfs_update_inode() in particularly triggers. Unfortunately Linux NFS assumes that all replies will be processed in the order sent, and will arrive in the order processed. This is not true in general. Consequently Linux NFS might ignore the wcc info in a WRITE reply because the reply is in response to a WRITE that was sent before some other request for which a reply has already been seen. This is detected by Linux using the gencount tests in nfs_inode_attr_cmp(). Also, when the gencount tests pass it is still possible that the request were processed on the server in a different order, and a gap seen in the ctime sequence might be filled in by a subsequent reply, so gaps should not immediately trigger delayed invalidation. The net result is that writing to a server and then reading the file back can result in going to the server for the read rather than serving it from cache - all because a couple of replies arrived out-of-order. This is a performance regression over kernels before 5.3, though the change in 5.3 is a correctness improvement. This has been seen with Linux writing to a Netapp server which occasionally re-orders requests. In testing the majority of requests were in-order, but a few (maybe 2 or three at a time) could be re-ordered. This patch addresses the problem by recording any gaps seen in the pre/post ctime sequence and not triggering invalidation until either there are too many gaps to fit in the table, or until there are no more active writes and the remaining gaps cannot be resolved. We allocate a table of 16 gaps on demand. If the allocation fails we revert to current behaviour which is of little cost as we are unlikely to be able to cache the writes anyway. In the table we store "start->end" pair when iversion is updated and "end<-start" pairs pre/post pairs reported by the server. Usually these exactly cancel out and so nothing is stored. When there are out-of-order replies we do store gaps and these will eventually be cancelled against later replies when this client is the only writer. If the final write is out-of-order there may be one gap remaining when the file is closed. This will be noticed and if there is precisely on gap and if the iversion can be advanced to match it, then we do so. This patch makes no attempt to handle directories correctly. The same problem potentially exists in the out-of-order replies to create/unlink requests can cause future lookup requires to be sent to the server unnecessarily. A similar scheme using the same primitives could be used to notice and handle out-of-order replies. Signed-off-by: NeilBrown <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2023-04-11Merge tag 'pci-v6.3-fixes-2' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci fixes from Bjorn Helgaas: - Provide pci_msix_can_alloc_dyn() stub when CONFIG_PCI_MSI unset to avoid build errors (Reinette Chatre) - Quirk AMD XHCI controller that loses MSI-X state in D3hot to avoid broken USB after hotplug or suspend/resume (Basavaraj Natikar) - Fix use-after-free in pci_bus_release_domain_nr() (Rob Herring) * tag 'pci-v6.3-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: PCI: Fix use-after-free in pci_bus_release_domain_nr() x86/PCI: Add quirk for AMD XHCI controller that loses MSI-X state in D3hot PCI/MSI: Provide missing stub for pci_msix_can_alloc_dyn()
2023-04-11NFS: Remove fscache specific trace points and NFS_INO_FSCACHE bitDave Wysochanski1-1/+0
The NFS specific trace points are no longer needed as tracing is well covered by netfs and fscache. Signed-off-by: Dave Wysochanski <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Tested-by: Daire Byrne <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2023-04-11NFS: Remove all NFSIOS_FSCACHE counters due to conversion to netfs APIDave Wysochanski1-12/+0
The old NFSIOS_FSCACHE counters are no longer accurate or useful with the conversion to the new netfs API. The new API does not have a page based interface, and so the counters in nfs_stat_fscachecounters are no longer obtainable. The new netfs the API has extensive statistics inside /proc/fs/fscache/stats so we no longer need NFS specific fscache stats. Note this also removes the 'fsc:' line from /proc/self/mountstats so it will be a user-visible change. Signed-off-by: Dave Wysochanski <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Tested-by: Daire Byrne <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2023-04-11NFS: Convert buffered read paths to use netfs when fscache is enabledDave Wysochanski2-0/+6
Convert the NFS buffered read code paths to corresponding netfs APIs, but only when fscache is configured and enabled. The netfs API defines struct netfs_request_ops which must be filled in by the network filesystem. For NFS, we only need to define 5 of the functions, the main one being the issue_read() function. The issue_read() function is called by the netfs layer when a read cannot be fulfilled locally, and must be sent to the server (either the cache is not active, or it is active but the data is not available). Once the read from the server is complete, netfs requires a call to netfs_subreq_terminated() which conveys either how many bytes were read successfully, or an error. Note that issue_read() is called with a structure, netfs_io_subrequest, which defines the IO requested, and contains a start and a length (both in bytes), and assumes the underlying netfs will return a either an error on the whole region, or the number of bytes successfully read. The NFS IO path is page based and the main APIs are the pgio APIs defined in pagelist.c. For the pgio APIs, there is no way for the caller to know how many RPCs will be sent and how the pages will be broken up into underlying RPCs, each of which will have their own completion and return code. In contrast, netfs is subrequest based, a single subrequest may contain multiple pages, and a single subrequest is initiated with issue_read() and terminated with netfs_subreq_terminated(). Thus, to utilze the netfs APIs, NFS needs some way to accommodate the netfs API requirement on the single response to the whole subrequest, while also minimizing disruptive changes to the NFS pgio layer. The approach taken with this patch is to allocate a small structure for each nfs_netfs_issue_read() call, store the final error and number of bytes successfully transferred in the structure, and update these values as each RPC completes. The refcount on the structure is used as a marker for the last RPC completion, is incremented in nfs_netfs_read_initiate(), and decremented inside nfs_netfs_read_completion(), when a nfs_pgio_header contains a valid pointer to the data. On the final put (which signals the final outstanding RPC is complete) in nfs_netfs_read_completion(), call netfs_subreq_terminated() with either the final error value (if one or more READs complete with an error) or the number of bytes successfully transferred (if all RPCs complete successfully). Note that when all RPCs complete successfully, the number of bytes transferred is capped to the length of the subrequest. Capping the transferred length to the subrequest length prevents "Subreq overread" warnings from netfs. This is due to the "aligned_len" in nfs_pageio_add_page(), and the corner case where NFS requests a full page at the end of the file, even when i_size reflects only a partial page (NFS overread). Signed-off-by: Dave Wysochanski <[email protected]> Tested-by: Daire Byrne <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2023-04-11NFS: Configure support for netfs when NFS fscache is configuredDave Wysochanski1-14/+10
As first steps for support of the netfs library when NFS_FSCACHE is configured, add NETFS_SUPPORT to Kconfig and add the required netfs_inode into struct nfs_inode. Using netfs requires we move the VFS inode structure to be stored inside struct netfs_inode, along with the fscache_cookie. Thus, if NFS_FSCACHE is configured, place netfs_inode inside an anonymous union so the vfs_inode memory is the same and we do not need to modify other non-fscache areas of NFS. In addition, inside the NFS fscache code, use the new helpers, netfs_inode() and netfs_i_cookie() helpers, and remove our own helper, nfs_i_fscache(). Later patches will convert NFS fscache to fully use netfs. Signed-off-by: Dave Wysochanski <[email protected]> Tested-by: Daire Byrne <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2023-04-11dm: add helper macro for simple DM target module init and exitYangtao Li1-0/+20
Eliminate duplicate boilerplate code for simple modules that contain a single DM target driver without any additional setup code. Add a new module_dm() macro, which replaces the module_init() and module_exit() with template functions that call dm_register_target() and dm_unregister_target() respectively. Signed-off-by: Yangtao Li <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2023-04-11bpf: Simplify internal verifier log interfaceAndrii Nakryiko1-10/+3
Simplify internal verifier log API down to bpf_vlog_init() and bpf_vlog_finalize(). The former handles input arguments validation in one place and makes it easier to change it. The latter subsumes -ENOSPC (truncation) and -EFAULT handling and simplifies both caller's code (bpf_check() and btf_parse()). For btf_parse(), this patch also makes sure that verifier log finalization happens even if there is some error condition during BTF verification process prior to normal finalization step. Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Lorenz Bauer <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-04-11bpf: Add log_true_size output field to return necessary log buffer sizeAndrii Nakryiko2-2/+2
Add output-only log_true_size and btf_log_true_size field to BPF_PROG_LOAD and BPF_BTF_LOAD commands, respectively. It will return the size of log buffer necessary to fit in all the log contents at specified log_level. This is very useful for BPF loader libraries like libbpf to be able to size log buffer correctly, but could be used by users directly, if necessary, as well. This patch plumbs all this through the code, taking into account actual bpf_attr size provided by user to determine if these new fields are expected by users. And if they are, set them from kernel on return. We refactory btf_parse() function to accommodate this, moving attr and uattr handling inside it. The rest is very straightforward code, which is split from the logging accounting changes in the previous patch to make it simpler to review logic vs UAPI changes. Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Lorenz Bauer <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-04-11bpf: Keep track of total log content size in both fixed and rolling modesAndrii Nakryiko1-9/+3
Change how we do accounting in BPF_LOG_FIXED mode and adopt log->end_pos as *logical* log position. This means that we can go beyond physical log buffer size now and be able to tell what log buffer size should be to fit entire log contents without -ENOSPC. To do this for BPF_LOG_FIXED mode, we need to remove a short-circuiting logic of not vsnprintf()'ing further log content once we filled up user-provided buffer, which is done by bpf_verifier_log_needed() checks. We modify these checks to always keep going if log->level is non-zero (i.e., log is requested), even if log->ubuf was NULL'ed out due to copying data to user-space, or if entire log buffer is physically full. We adopt bpf_verifier_vlog() routine to work correctly with log->ubuf == NULL condition, performing log formatting into temporary kernel buffer, doing all the necessary accounting, but just avoiding copying data out if buffer is full or NULL'ed out. With these changes, it's now possible to do this sort of determination of log contents size in both BPF_LOG_FIXED and default rolling log mode. We need to keep in mind bpf_vlog_reset(), though, which shrinks log contents after successful verification of a particular code path. This log reset means that log->end_pos isn't always increasing, so to return back to users what should be the log buffer size to fit all log content without causing -ENOSPC even in the presence of log resetting, we need to keep maximum over "lifetime" of logging. We do this accounting in bpf_vlog_update_len_max() helper. A related and subtle aspect is that with this logical log->end_pos even in BPF_LOG_FIXED mode we could temporary "overflow" buffer, but then reset it back with bpf_vlog_reset() to a position inside user-supplied log_buf. In such situation we still want to properly maintain terminating zero. We will eventually return -ENOSPC even if final log buffer is small (we detect this through log->len_max check). This behavior is simpler to reason about and is consistent with current behavior of verifier log. Handling of this required a small addition to bpf_vlog_reset() logic to avoid doing put_user() beyond physical log buffer dimensions. Another issue to keep in mind is that we limit log buffer size to 32-bit value and keep such log length as u32, but theoretically verifier could produce huge log stretching beyond 4GB. Instead of keeping (and later returning) 64-bit log length, we cap it at UINT_MAX. Current UAPI makes it impossible to specify log buffer size bigger than 4GB anyways, so we don't really loose anything here and keep everything consistently 32-bit in UAPI. This property will be utilized in next patch. Doing the same determination of maximum log buffer for rolling mode is trivial, as log->end_pos and log->start_pos are already logical positions, so there is nothing new there. These changes do incidentally fix one small issue with previous logging logic. Previously, if use provided log buffer of size N, and actual log output was exactly N-1 bytes + terminating \0, kernel logic coun't distinguish this condition from log truncation scenario which would end up with truncated log contents of N-1 bytes + terminating \0 as well. But now with log->end_pos being logical position that could go beyond actual log buffer size, we can distinguish these two conditions, which we do in this patch. This plays nicely with returning log_size_actual (implemented in UAPI in the next patch), as we can now guarantee that if user takes such log_size_actual and provides log buffer of that exact size, they will not get -ENOSPC in return. All in all, all these changes do conceptually unify fixed and rolling log modes much better, and allow a nice feature requested by users: knowing what should be the size of the buffer to avoid -ENOSPC. We'll plumb this through the UAPI and the code in the next patch. Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Lorenz Bauer <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-04-11bpf: Switch BPF verifier log to be a rotating log by defaultAndrii Nakryiko1-7/+26
Currently, if user-supplied log buffer to collect BPF verifier log turns out to be too small to contain full log, bpf() syscall returns -ENOSPC, fails BPF program verification/load, and preserves first N-1 bytes of the verifier log (where N is the size of user-supplied buffer). This is problematic in a bunch of common scenarios, especially when working with real-world BPF programs that tend to be pretty complex as far as verification goes and require big log buffers. Typically, it's when debugging tricky cases at log level 2 (verbose). Also, when BPF program is successfully validated, log level 2 is the only way to actually see verifier state progression and all the important details. Even with log level 1, it's possible to get -ENOSPC even if the final verifier log fits in log buffer, if there is a code path that's deep enough to fill up entire log, even if normally it would be reset later on (there is a logic to chop off successfully validated portions of BPF verifier log). In short, it's not always possible to pre-size log buffer. Also, what's worse, in practice, the end of the log most often is way more important than the beginning, but verifier stops emitting log as soon as initial log buffer is filled up. This patch switches BPF verifier log behavior to effectively behave as rotating log. That is, if user-supplied log buffer turns out to be too short, verifier will keep overwriting previously written log, effectively treating user's log buffer as a ring buffer. -ENOSPC is still going to be returned at the end, to notify user that log contents was truncated, but the important last N bytes of the log would be returned, which might be all that user really needs. This consistent -ENOSPC behavior, regardless of rotating or fixed log behavior, allows to prevent backwards compatibility breakage. The only user-visible change is which portion of verifier log user ends up seeing *if buffer is too small*. Given contents of verifier log itself is not an ABI, there is no breakage due to this behavior change. Specialized tools that rely on specific contents of verifier log in -ENOSPC scenario are expected to be easily adapted to accommodate old and new behaviors. Importantly, though, to preserve good user experience and not require every user-space application to adopt to this new behavior, before exiting to user-space verifier will rotate log (in place) to make it start at the very beginning of user buffer as a continuous zero-terminated string. The contents will be a chopped off N-1 last bytes of full verifier log, of course. Given beginning of log is sometimes important as well, we add BPF_LOG_FIXED (which equals 8) flag to force old behavior, which allows tools like veristat to request first part of verifier log, if necessary. BPF_LOG_FIXED flag is also a simple and straightforward way to check if BPF verifier supports rotating behavior. On the implementation side, conceptually, it's all simple. We maintain 64-bit logical start and end positions. If we need to truncate the log, start position will be adjusted accordingly to lag end position by N bytes. We then use those logical positions to calculate their matching actual positions in user buffer and handle wrap around the end of the buffer properly. Finally, right before returning from bpf_check(), we rotate user log buffer contents in-place as necessary, to make log contents contiguous. See comments in relevant functions for details. Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Lorenz Bauer <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-04-11bpf: Split off basic BPF verifier log into separate fileAndrii Nakryiko1-12/+7
kernel/bpf/verifier.c file is large and growing larger all the time. So it's good to start splitting off more or less self-contained parts into separate files to keep source code size (somewhat) somewhat under control. This patch is a one step in this direction, moving some of BPF verifier log routines into a separate kernel/bpf/log.c. Right now it's most low-level and isolated routines to append data to log, reset log to previous position, etc. Eventually we could probably move verifier state printing logic here as well, but this patch doesn't attempt to do that yet. Subsequent patches will add more logic to verifier log management, so having basics in a separate file will make sure verifier.c doesn't grow more with new changes. Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Lorenz Bauer <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-04-10net: piggy back on the memory barrier in bql when waking queuesJakub Kicinski1-1/+1
Drivers call netdev_tx_completed_queue() right before netif_txq_maybe_wake(). If BQL is enabled netdev_tx_completed_queue() should issue a memory barrier, so we can depend on that separating the stop check from the consumer index update, instead of adding another barrier in netif_txq_maybe_wake(). This matters more than the barriers on the xmit path, because the wake condition is almost always true. So we issue the consumer side barrier often. Wrap netdev_tx_completed_queue() in a local helper to issue the barrier even if BQL is disabled. Keep the same semantics as netdev_tx_completed_queue() (barrier only if bytes != 0) to make it clear that the barrier is conditional. Plus since macro gets pkt/byte counts as arguments now - we can skip waking if there were no packets completed. Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-10net: provide macros for commonly copied lockless queue stop/wake codeJakub Kicinski1-0/+1
A lot of drivers follow the same scheme to stop / start queues without introducing locks between xmit and NAPI tx completions. I'm guessing they all copy'n'paste each other's code. The original code dates back all the way to e1000 and Linux 2.6.19. Smaller drivers shy away from the scheme and introduce a lock which may cause deadlocks in netpoll. Provide macros which encapsulate the necessary logic. The macros do not prevent false wake ups, the extra barrier required to close that race is not worth it. See discussion in: https://lore.kernel.org/all/[email protected]/ Acked-by: Herbert Xu <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-10of: Move of_device_(add|register|unregister) to of_platform.hRob Herring2-4/+5
As of_device_(add|register|unregister) functions work on struct platform_device, they should be declared in of_platform.h instead. This move is transparent for now as both headers include each other. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Rob Herring <[email protected]>
2023-04-10of: Make devtree_lock declaration privateRob Herring1-2/+0
Sparc is the only place devtree_lock is used outside of drivers/of/. Move the devtree_lock declaration into of_private.h and Sparc's prom.h so pulling in spinlock.h to of.h can be avoided for everything besides Sparc. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Rob Herring <[email protected]>
2023-04-10f2fs: use common implementation of file typeWeizhao Ouyang1-15/+0
Use common implementation of file type conversion helpers. Signed-off-by: Weizhao Ouyang <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2023-04-10iio: light: Add gain-time-scale helpersMatti Vaittinen1-0/+206
Some light sensors can adjust both the HW-gain and integration time. There are cases where adjusting the integration time has similar impact to the scale of the reported values as gain setting has. IIO users do typically expect to handle scale by a single writable 'scale' entry. Driver should then adjust the gain/time accordingly. It however is difficult for a driver to know whether it should change gain or integration time to meet the requested scale. Usually it is preferred to have longer integration time which usually improves accuracy, but there may be use-cases where long measurement times can be an issue. Thus it can be preferable to allow also changing the integration time - but mitigate the scale impact by also changing the gain underneath. Eg, if integration time change doubles the measured values, the driver can reduce the HW-gain to half. The theory of the computations of gain-time-scale is simple. However, some people (undersigned) got that implemented wrong for more than once. Add some gain-time-scale helpers in order to not dublicate errors in all drivers needing these computations. Signed-off-by: Matti Vaittinen <[email protected]> Link: https://lore.kernel.org/r/268d418e7cffcdaa2ece6738478bbc57692c213e.1680263956.git.mazziesaccount@gmail.com Signed-off-by: Jonathan Cameron <[email protected]>
2023-04-10Merge 6.3-rc6 into usb-nextGreg Kroah-Hartman9-9/+26
We need the USB fixes in here for testing, and this resolves two merge conflicts, one pointed out by linux-next: drivers/usb/dwc3/dwc3-pci.c drivers/usb/host/xhci-pci.c Reported-by: Stephen Rothwell <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-04-10Merge 6.3-rc6 into tty-nextGreg Kroah-Hartman23-29/+130
We need the tty/serial fixes in here for testing. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-04-10Merge 6.3-rc6 into char-misc-nextGreg Kroah-Hartman23-29/+130
We need it here to apply other char/misc driver changes to. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-04-09Merge branch 'hwmon-const' of ↵Jakub Kicinski1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull in pre-requisite patches from Guenter Roeck to constify pointers to hwmon_channel_info. * 'hwmon-const' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: constify pointers to hwmon_channel_info Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-09Merge tag 'cxl-fixes-6.3-rc6' of ↵Linus Torvalds1-2/+6
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull compute express link (cxl) fixes from Dan Williams: "Several fixes for driver startup regressions that landed during the merge window as well as some older bugs. The regressions were due to a lack of testing with what the CXL specification calls Restricted CXL Host (RCH) topologies compared to the testing with Virtual Host (VH) CXL topologies. A VH topology is typical PCIe while RCH topologies map CXL endpoints as Root Complex Integrated endpoints. The impact is some driver crashes on startup. This merge window also added compatibility for range registers (the mechanism that CXL 1.1 defined for mapping memory) to treat them like HDM decoders (the mechanism that CXL 2.0 defined for mapping Host-managed Device Memory). That work collided with the new region enumeration code that was tested with CXL 2.0 setups, and fails with crashes at startup. Lastly, the DOE (Data Object Exchange) implementation for retrieving an ACPI-like data table from CXL devices is being reworked for v6.4. Several fixes fell out of that work that are suitable for v6.3. All of this has been in linux-next for a while, and all reported issues [1] have been addressed. Summary: - Fix several issues with region enumeration in RCH topologies that can trigger crashes on driver startup or shutdown. - Fix CXL DVSEC range register compatibility versus region enumeration that leads to startup crashes - Fix CDAT endiannes handling - Fix multiple buffer handling boundary conditions - Fix Data Object Exchange (DOE) workqueue usage vs CONFIG_DEBUG_OBJECTS warn splats" Link: http://lore.kernel.org/r/[email protected] [1] * tag 'cxl-fixes-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/hdm: Extend DVSEC range register emulation for region enumeration cxl/hdm: Limit emulation to the number of range registers cxl/region: Move coherence tracking into cxl_region_attach() cxl/region: Fix region setup/teardown for RCDs cxl/port: Fix find_cxl_root() for RCDs and simplify it cxl/hdm: Skip emulation when driver manages mem_enable cxl/hdm: Fix double allocation of @cxlhdm PCI/DOE: Fix memory leak with CONFIG_DEBUG_OBJECTS=y PCI/DOE: Silence WARN splat with CONFIG_DEBUG_OBJECTS=y cxl/pci: Handle excessive CDAT length cxl/pci: Handle truncated CDAT entries cxl/pci: Handle truncated CDAT header cxl/pci: Fix CDAT retrieval on big endian
2023-04-09net: dsa: replace NETDEV_PRE_CHANGE_HWTSTAMP notifier with a stubVladimir Oltean1-6/+0
There was a sort of rush surrounding commit 88c0a6b503b7 ("net: create a netdev notifier for DSA to reject PTP on DSA master"), due to a desire to convert DSA's attempt to deny TX timestamping on a DSA master to something that doesn't block the kernel-wide API conversion from ndo_eth_ioctl() to ndo_hwtstamp_set(). What was required was a mechanism that did not depend on ndo_eth_ioctl(), and what was provided was a mechanism that did not depend on ndo_eth_ioctl(), while at the same time introducing something that wasn't absolutely necessary - a new netdev notifier. There have been objections from Jakub Kicinski that using notifiers in general when they are not absolutely necessary creates complications to the control flow and difficulties to maintainers who look at the code. So there is a desire to not use notifiers. In addition to that, the notifier chain gets called even if there is no DSA in the system and no one is interested in applying any restriction. Take the model of udp_tunnel_nic_ops and introduce a stub mechanism, through which net/core/dev_ioctl.c can call into DSA even when CONFIG_NET_DSA=m. Compared to the code that existed prior to the notifier conversion, aka what was added in commits: - 4cfab3566710 ("net: dsa: Add wrappers for overloaded ndo_ops") - 3369afba1e46 ("net: Call into DSA netdevice_ops wrappers") this is different because we are not overloading any struct net_device_ops of the DSA master anymore, but rather, we are exposing a rather specific functionality which is orthogonal to which API is used to enable it - ndo_eth_ioctl() or ndo_hwtstamp_set(). Also, what is similar is that both approaches use function pointers to get from built-in code to DSA. There is no point in replicating the function pointers towards __dsa_master_hwtstamp_validate() once for every CPU port (dev->dsa_ptr). Instead, it is sufficient to introduce a singleton struct dsa_stubs, built into the kernel, which contains a single function pointer to __dsa_master_hwtstamp_validate(). I find this approach preferable to what we had originally, because dev->dsa_ptr->netdev_ops->ndo_do_ioctl() used to require going through struct dsa_port (dev->dsa_ptr), and so, this was incompatible with any attempts to add any data encapsulation and hide DSA data structures from the outside world. Link: https://lore.kernel.org/netdev/[email protected]/ Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-08kexec: remove unnecessary arch_kexec_kernel_image_load()Bjorn Helgaas1-6/+0
arch_kexec_kernel_image_load() only calls kexec_image_load_default(), and there are no arch-specific implementations. Remove the unnecessary arch_kexec_kernel_image_load() and make kexec_image_load_default() static. No functional change intended. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Simon Horman <[email protected]> Acked-by: Baoquan He <[email protected]> Cc: Borislav Petkov (AMD) <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Eric Biederman <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-08x86/kexec: remove unnecessary arch_kexec_kernel_image_load()Bjorn Helgaas1-2/+0
Patch series "kexec: Remove unnecessary arch hook", v2. There are no arch-specific things in arch_kexec_kernel_image_load(), so remove it and just use the generic version. This patch (of 2): The x86 implementation of arch_kexec_kernel_image_load() is functionally identical to the generic arch_kexec_kernel_image_load(): arch_kexec_kernel_image_load # x86 if (!image->fops || !image->fops->load) return ERR_PTR(-ENOEXEC); return image->fops->load(image, image->kernel_buf, ...) arch_kexec_kernel_image_load # generic kexec_image_load_default if (!image->fops || !image->fops->load) return ERR_PTR(-ENOEXEC); return image->fops->load(image, image->kernel_buf, ...) Remove the x86-specific version and use the generic arch_kexec_kernel_image_load(). No functional change intended. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Simon Horman <[email protected]> Acked-by: Baoquan He <[email protected]> Cc: Borislav Petkov (AMD) <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Eric Biederman <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-08kernel.h: split the hexadecimal related helpers to hex.hAndy Shevchenko2-28/+36
For the sake of cleaning up the kernel.h split the hexadecimal related helpers to own header called 'hex.h'. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Andy Shevchenko <[email protected]> Cc: Rasmus Villemoes <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-08Merge tag 'mm-hotfixes-stable-2023-04-07-16-23' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM fixes from Andrew Morton: "28 hotfixes. 23 are cc:stable and the other five address issues which were introduced during this merge cycle. 20 are for MM and the remainder are for other subsystems" * tag 'mm-hotfixes-stable-2023-04-07-16-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (28 commits) maple_tree: fix a potential concurrency bug in RCU mode maple_tree: fix get wrong data_end in mtree_lookup_walk() mm/swap: fix swap_info_struct race between swapoff and get_swap_pages() nilfs2: fix sysfs interface lifetime mm: take a page reference when removing device exclusive entries mm: vmalloc: avoid warn_alloc noise caused by fatal signal nilfs2: initialize "struct nilfs_binfo_dat"->bi_pad field nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread() zsmalloc: document freeable stats zsmalloc: document new fullness grouping fsdax: force clear dirty mark if CoW mm/hugetlb: fix uffd wr-protection for CoW optimization path mm: enable maple tree RCU mode by default maple_tree: add RCU lock checking to rcu callback functions maple_tree: add smp_rmb() to dead node detection maple_tree: fix write memory barrier of nodes once dead for RCU mode maple_tree: remove extra smp_wmb() from mas_dead_leaves() maple_tree: fix freeing of nodes in rcu mode maple_tree: detect dead nodes in mas_start() maple_tree: be more cautious about dead nodes ...
2023-04-08irqchip/gicv3: Workaround for NVIDIA erratum T241-FABRIC-4Shanker Donthineni1-0/+18
The T241 platform suffers from the T241-FABRIC-4 erratum which causes unexpected behavior in the GIC when multiple transactions are received simultaneously from different sources. This hardware issue impacts NVIDIA server platforms that use more than two T241 chips interconnected. Each chip has support for 320 {E}SPIs. This issue occurs when multiple packets from different GICs are incorrectly interleaved at the target chip. The erratum text below specifies exactly what can cause multiple transfer packets susceptible to interleaving and GIC state corruption. GIC state corruption can lead to a range of problems, including kernel panics, and unexpected behavior. >From the erratum text: "In some cases, inter-socket AXI4 Stream packets with multiple transfers, may be interleaved by the fabric when presented to ARM Generic Interrupt Controller. GIC expects all transfers of a packet to be delivered without any interleaving. The following GICv3 commands may result in multiple transfer packets over inter-socket AXI4 Stream interface: - Register reads from GICD_I* and GICD_N* - Register writes to 64-bit GICD registers other than GICD_IROUTERn* - ITS command MOVALL Multiple commands in GICv4+ utilize multiple transfer packets, including VMOVP, VMOVI, VMAPP, and 64-bit register accesses." This issue impacts system configurations with more than 2 sockets, that require multi-transfer packets to be sent over inter-socket AXI4 Stream interface between GIC instances on different sockets. GICv4 cannot be supported. GICv3 SW model can only be supported with the workaround. Single and Dual socket configurations are not impacted by this issue and support GICv3 and GICv4." Link: https://developer.nvidia.com/docs/t241-fabric-4/nvidia-t241-fabric-4-errata.pdf Writing to the chip alias region of the GICD_In{E} registers except GICD_ICENABLERn has an equivalent effect as writing to the global distributor. The SPI interrupt deactivate path is not impacted by the erratum. To fix this problem, implement a workaround that ensures read accesses to the GICD_In{E} registers are directed to the chip that owns the SPI, and disable GICv4.x features. To simplify code changes, the gic_configure_irq() function uses the same alias region for both read and write operations to GICD_ICFGR. Co-developed-by: Vikram Sethi <[email protected]> Signed-off-by: Vikram Sethi <[email protected]> Signed-off-by: Shanker Donthineni <[email protected]> Acked-by: Sudeep Holla <[email protected]> (for SMCCC/SOC ID bits) Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-04-08irqchip/gic: Drop support for board filesMarc Zyngier1-6/+0
With the last non-OF, non-ACPI user of the GIC being removed in e73307b9ebc4 ("ARM: cns3xxx: remove entire platform"), we can finally drop the entry point and do some minor cleanup. We also make the driver depend on CONFIG_OF, which is required even when CONFIG_ACPI is selected. Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-04-07thermal/of: Unexport unused OF functionsDaniel Lezcano1-17/+0
The functions thermal_of_zone_register() and thermal_of_zone_unregister() are no longer needed from the drivers as the devm_ variant is always used. Make them static in the C file and remove their declaration from thermal.h Signed-off-by: Daniel Lezcano <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-04-07regmap: allow upshifting register addresses before performing operationsMaxime Chevallier1-3/+12
Similar to the existing reg_downshift mechanism, that is used to translate register addresses on busses that have a smaller address stride, it's also possible to want to upshift register addresses. Such a case was encountered when network PHYs and PCS that usually sit on a MDIO bus (16-bits register with a stride of 1) are integrated directly as memory-mapped devices. Here, the same register layout defined in 802.3 is used, but the register now have a larger stride. Introduce a mechanism to also allow upshifting register addresses. Re-purpose reg_downshift into a more generic, signed reg_shift, whose sign indicates the direction of the shift. To avoid confusion, also introduce macros to explicitly indicate if we want to downshift or upshift. For bisectability, change any use of reg_downshift to use reg_shift. Signed-off-by: Maxime Chevallier <[email protected]> Tested-by: Colin Foster <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2023-04-07hwmon: constify pointers to hwmon_channel_infoKrzysztof Kozlowski1-1/+1
HWmon core receives an array of pointers to hwmon_channel_info and it does not modify it, thus it can be array of const pointers for safety. This allows drivers to make them also const. Signed-off-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: Guenter Roeck <[email protected]>
2023-04-07cpufreq: drivers with target_index() must set freq_tableViresh Kumar1-0/+1
Since the cpufreq core directly uses freq_table, for cpufreq drivers that set their target_index() callback, make it mandatory for them to set the same. Since this is set per policy and normally from policy->init(), do this from cpufreq_table_validate_and_sort() which gets called right after ->init(). Reported-by: Yajun Deng <[email protected]> Signed-off-by: Viresh Kumar <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-04-07thermal/core: Remove thermal_bind_params structureZhang Rui1-38/+0
Remove struct thermal_bind_params because no one is using it for thermal binding now. Signed-off-by: Zhang Rui <[email protected]> Signed-off-by: Daniel Lezcano <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-04-07net: ethernet: mtk_eth_soc: add code for offloading flows from wlan devicesFelix Fietkau1-0/+6
WED version 2 (on MT7986 and later) can offload flows originating from wireless devices. In order to make that work, ndo_setup_tc needs to be implemented on the netdevs. This adds the required code to offload flows coming in from WED, while keeping track of the incoming wed index used for selecting the correct PPE device. Signed-off-by: Felix Fietkau <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-07dma-mapping: provide a fallback dma_default_coherentJiaxun Yang1-0/+2
dma_default_coherent was decleared unconditionally at kernel/dma/mapping.c but only decleared when any of non-coherent options is enabled in dma-map-ops.h. Guard the declaration in mapping.c with non-coherent options and provide a fallback definition. Signed-off-by: Jiaxun Yang <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-04-06io_uring: reduce scheduling due to twPavel Begunkov1-1/+2
Every task_work will try to wake the task to be executed, which causes excessive scheduling and additional overhead. For some tw it's justified, but others won't do much but post a single CQE. When a task waits for multiple cqes, every such task_work will wake it up. Instead, the task may give a hint about how many cqes it waits for, io_req_local_work_add() will compare against it and skip wake ups if #cqes + #tw is not enough to satisfy the waiting condition. Task_work that uses the optimisation should be simple enough and never post more than one CQE. It's also ignored for non DEFER_TASKRUN rings. Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/d2b77e99d1e86624d8a69f7037d764b739dcd225.1680782017.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2023-04-06PCI/MSI: Provide missing stub for pci_msix_can_alloc_dyn()Reinette Chatre1-0/+2
pci_msix_can_alloc_dyn() is not declared when CONFIG_PCI_MSI is disabled. There is no existing user of pci_msix_can_alloc_dyn() but work is in progress to change this. This work encounters the following error when CONFIG_PCI_MSI is disabled: drivers/vfio/pci/vfio_pci_intrs.c:427:21: error: implicit declaration of function 'pci_msix_can_alloc_dyn' [-Werror=implicit-function-declaration] Provide definition for pci_msix_can_alloc_dyn() in preparation for users that need to compile when CONFIG_PCI_MSI is disabled. [bhelgaas: Also reported by Arnd Bergmann <[email protected]> in drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c; added his Fixes: line] Fixes: fb0a6a268dcd ("net/mlx5: Provide external API for allocating vectors") Fixes: 34026364df8e ("PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X") Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Link: https://lore.kernel.org/r/310ecc4815dae4174031062f525245f0755c70e2.1680119924.git.reinette.chatre@intel.com Reported-by: kernel test robot <[email protected]> Signed-off-by: Reinette Chatre <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]> Cc: [email protected] # v6.2+
2023-04-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski5-3/+14
Conflicts: drivers/net/ethernet/google/gve/gve.h 3ce934558097 ("gve: Secure enough bytes in the first TX desc for all TCP pkts") 75eaae158b1b ("gve: Add XDP DROP and TX support for GQI-QPL format") https://lore.kernel.org/all/[email protected]/ https://lore.kernel.org/all/[email protected]/ Adjacent changes: net/can/isotp.c 051737439eae ("can: isotp: fix race between isotp_sendsmg() and isotp_release()") 96d1c81e6a04 ("can: isotp: add module parameter for maximum pdu size") Signed-off-by: Jakub Kicinski <[email protected]>