path: root/include
Age | Commit message | Author | Files | Lines changed
2024-01-02  net-device: move gso_partial_features to net_device_read_tx  (Eric Dumazet, 1 file, -1/+1)
dev->gso_partial_features is read from the tx fast path for GSO packets. Move it to the appropriate section to avoid a cache line miss. Fixes: 43a71cd66b9c ("net-device: reorganize net_device fast path variables") Signed-off-by: Eric Dumazet <[email protected]> Cc: Coco Li <[email protected]> Cc: David Ahern <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
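For context, the technique this relies on is cache line grouping of hot struct members. A minimal illustrative sketch (the struct and members below are hypothetical, not the real net_device layout; the group markers come from the series referenced in the Fixes tag):

  #include <linux/cache.h>
  #include <linux/netdev_features.h>

  /* Members read together on the TX fast path share cache lines, so one
   * miss prefetches the neighbours; cold members stay outside the group. */
  struct example_dev {
          __cacheline_group_begin(example_read_tx);
          netdev_features_t gso_partial_features;
          unsigned int gso_max_size;
          u16 gso_max_segs;
          __cacheline_group_end(example_read_tx);

          /* config-path (cold) members */
          char name[16];
  };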
2024-01-02  platform/x86: wmi: linux/wmi.h: fix Excess kernel-doc description warning  (Randy Dunlap, 1 file, -2/+0)
Remove the "private:" comment to prevent the kernel-doc warning: include/linux/wmi.h:27: warning: Excess struct member 'setable' description in 'wmi_device' Either a struct member is documented (via kernel-doc) or it's private, but not both. Fixes: b4cc979588ee ("platform/x86: wmi: Add kernel doc comments") Signed-off-by: Randy Dunlap <[email protected]> Cc: Armin Wolf <[email protected]> Cc: Hans de Goede <[email protected]> Cc: Ilpo Järvinen <[email protected]> Cc: [email protected] Reviewed-by: Armin Wolf <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Hans de Goede <[email protected]>
2024-01-02  drm/gpuvm: fix all kernel-doc warnings in include/drm/drm_gpuvm.h  (Randy Dunlap, 1 file, -35/+45)
Update kernel-doc comments in <drm/drm_gpuvm.h> to correct all kernel-doc warnings:

drm_gpuvm.h:148: warning: Excess struct member 'addr' description in 'drm_gpuva'
drm_gpuvm.h:148: warning: Excess struct member 'offset' description in 'drm_gpuva'
drm_gpuvm.h:148: warning: Excess struct member 'obj' description in 'drm_gpuva'
drm_gpuvm.h:148: warning: Excess struct member 'entry' description in 'drm_gpuva'
drm_gpuvm.h:148: warning: Excess struct member '__subtree_last' description in 'drm_gpuva'
drm_gpuvm.h:192: warning: No description found for return value of 'drm_gpuva_invalidated'
drm_gpuvm.h:331: warning: Excess struct member 'tree' description in 'drm_gpuvm'
drm_gpuvm.h:331: warning: Excess struct member 'list' description in 'drm_gpuvm'
drm_gpuvm.h:331: warning: Excess struct member 'list' description in 'drm_gpuvm'
drm_gpuvm.h:331: warning: Excess struct member 'local_list' description in 'drm_gpuvm'
drm_gpuvm.h:331: warning: Excess struct member 'lock' description in 'drm_gpuvm'
drm_gpuvm.h:331: warning: Excess struct member 'list' description in 'drm_gpuvm'
drm_gpuvm.h:331: warning: Excess struct member 'local_list' description in 'drm_gpuvm'
drm_gpuvm.h:331: warning: Excess struct member 'lock' description in 'drm_gpuvm'
drm_gpuvm.h:352: warning: No description found for return value of 'drm_gpuvm_get'
drm_gpuvm.h:545: warning: Excess struct member 'fn' description in 'drm_gpuvm_exec'
drm_gpuvm.h:545: warning: Excess struct member 'priv' description in 'drm_gpuvm_exec'
drm_gpuvm.h:597: warning: missing initial short description on line: * drm_gpuvm_exec_resv_add_fence()
drm_gpuvm.h:616: warning: missing initial short description on line: * drm_gpuvm_exec_validate()
drm_gpuvm.h:623: warning: No description found for return value of 'drm_gpuvm_exec_validate'
drm_gpuvm.h:698: warning: Excess struct member 'gpuva' description in 'drm_gpuvm_bo'
drm_gpuvm.h:698: warning: Excess struct member 'entry' description in 'drm_gpuvm_bo'
drm_gpuvm.h:698: warning: Excess struct member 'gem' description in 'drm_gpuvm_bo'
drm_gpuvm.h:698: warning: Excess struct member 'evict' description in 'drm_gpuvm_bo'
drm_gpuvm.h:726: warning: No description found for return value of 'drm_gpuvm_bo_get'
drm_gpuvm.h:738: warning: missing initial short description on line: * drm_gpuvm_bo_gem_evict()
drm_gpuvm.h:740: warning: missing initial short description on line: * drm_gpuvm_bo_gem_evict()
drm_gpuvm.h:698: warning: Excess struct member 'evict' description in 'drm_gpuvm_bo'
drm_gpuvm.h:844: warning: Excess struct member 'addr' description in 'drm_gpuva_op_map'
drm_gpuvm.h:844: warning: Excess struct member 'range' description in 'drm_gpuva_op_map'
drm_gpuvm.h:844: warning: Excess struct member 'offset' description in 'drm_gpuva_op_map'
drm_gpuvm.h:844: warning: Excess struct member 'obj' description in 'drm_gpuva_op_map'

Signed-off-by: Randy Dunlap <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Maxime Ripard <[email protected]> Cc: Thomas Zimmermann <[email protected]> Cc: David Airlie <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: [email protected] Cc: Jonathan Corbet <[email protected]> Cc: Vegard Nossum <[email protected]> Signed-off-by: Maxime Ripard <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-01-02  HID: bpf: make bus_type const in struct hid_bpf_ops  (Greg Kroah-Hartman, 1 file, -1/+1)
The struct bus_type pointer in hid_bpf_ops just passes the pointer to the driver core, and the driver core can handle, and expects, a constant pointer, so also make the pointer constant in hid_bpf_ops. Part of the process of moving all usages of struct bus_type to be constant to move them all to read-only memory. Cc: Jiri Kosina <[email protected]> Cc: Benjamin Tissoires <[email protected]> Cc: [email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2024-01-02  HID: make hid_bus_type const  (Greg Kroah-Hartman, 1 file, -1/+1)
Now that the driver core can properly handle constant struct bus_type, move the hid_bus_type variable to be a constant structure as well, placing it into read-only memory which can not be modified at runtime. Cc: Jiri Kosina <[email protected]> Cc: Benjamin Tissoires <[email protected]> Cc: [email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
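A sketch of the resulting pattern (names hypothetical): with the driver core accepting const pointers, the instance can be declared const and the linker places it in read-only memory:

  #include <linux/device/bus.h>

  static int example_match(struct device *dev, struct device_driver *drv)
  {
          return 1; /* illustration only: match every driver */
  }

  static const struct bus_type example_bus_type = {
          .name  = "example",
          .match = example_match,
  };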
2024-01-02  Merge tag 'v6.7-rc8' into locking/core, to pick up dependent changes  (Ingo Molnar, 83 files, -275/+461)
Pick up these commits from Linus's tree:

b106bcf0f99a ("locking/osq_lock: Clarify osq_wait_next()")
563adbfc351b ("locking/osq_lock: Clarify osq_wait_next() calling convention")
7c2230982129 ("locking/osq_lock: Move the definition of optimistic_spin_node into osq_lock.c")

Signed-off-by: Ingo Molnar <[email protected]>
2024-01-02  reboot: Introduce thermal_zone_device_critical_reboot()  (Fabio Estevam, 1 file, -0/+5)
Introduce thermal_zone_device_critical_reboot() to trigger an emergency reboot. It is a counterpart of thermal_zone_device_critical() with the difference that it will force a reboot instead of a shutdown. The motivation for doing this is to allow the thermal subsystem to trigger a reboot when the temperature reaches the critical temperature. Signed-off-by: Fabio Estevam <[email protected]> Signed-off-by: Daniel Lezcano <[email protected]> Link: https://lore.kernel.org/r/[email protected]
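A hedged sketch of what such a counterpart can boil down to; the body below is an assumption rather than the merged code, and presumes hw_protection_reboot() mirrors the existing hw_protection_shutdown() helper (a reason string plus a delay in ms after which the action is forced):

  void thermal_zone_device_critical_reboot(struct thermal_zone_device *tz)
  {
          /* same emergency path as thermal_zone_device_critical(),
           * but ending in a forced reboot instead of a shutdown */
          dev_emerg(&tz->device, "%s: critical temperature reached\n", tz->type);
          hw_protection_reboot("Temperature too high", 10);
  }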
2024-01-02  thermal/core: Prepare for introduction of thermal reboot  (Fabio Estevam, 1 file, -1/+6)
Add some helper functions to make it easier to introduce support for thermal reboot. No functional change. Signed-off-by: Fabio Estevam <[email protected]> Signed-off-by: Daniel Lezcano <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-01-01  net: ethtool: Introduce a command to list PHYs on an interface  (Maxime Chevallier, 1 file, -0/+29)
As we have the ability to track the PHYs connected to a net_device through the link_topology, we can expose this list to userspace. This allows userspace to use these identifiers for phy-specific commands and to decide which PHY to target based on the link topology. Add PHY_GET and PHY_DUMP; the DUMP operation can be filtered to list the devices on only one interface. Signed-off-by: Maxime Chevallier <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-01-01  net: ethtool: Allow passing a phy index for some commands  (Maxime Chevallier, 1 file, -0/+1)
Some netlink commands are targeted towards ethernet PHYs, to control some of their features. As there are several such commands, add the ability to pass a PHY index in the ethnl request, which will populate the generic ethnl_req_info with the relevant phydev when the command targets a PHY. Signed-off-by: Maxime Chevallier <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-01-01  net: sfp: Add helper to return the SFP bus name  (Maxime Chevallier, 1 file, -0/+6)
Knowing the bus name is helpful when we want to expose the link topology to userspace, add a helper to return the SFP bus name. Signed-off-by: Maxime Chevallier <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-01-01  net: phy: add helpers to handle sfp phy connect/disconnect  (Maxime Chevallier, 1 file, -0/+2)
There are a few PHY drivers that can handle SFP modules through their sfp_upstream_ops. Introduce Phylib helpers to keep track of connected SFP PHYs in a netdevice's namespace, by adding the SFP PHY to the upstream PHY's netdev's namespace. By doing so, these SFP PHYs can be enumerated and exposed to users, which will be able to use their capabilities. Signed-off-by: Maxime Chevallier <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-01-01  net: sfp: pass the phy_device when disconnecting an sfp module's PHY  (Maxime Chevallier, 1 file, -1/+1)
Pass the phy_device as a parameter to the sfp upstream .disconnect_phy operation. This is preparatory work to help track phy devices across a net_device's link. Signed-off-by: Maxime Chevallier <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-01-01  net: phy: Introduce ethernet link topology representation  (Maxime Chevallier, 5 files, -1/+109)
Link topologies containing multiple network PHYs attached to the same net_device can be found when using a PHY as a media converter for use with an SFP connector, on which an SFP transceiver containing a PHY can be used.

With the current model, the transceiver's PHY can't be used for operations such as cable testing, timestamping, macsec offload, etc. The reason is that most of the logic for these configurations, coming from either ethtool netlink or ioctls, tends to use netdev->phydev, which in multi-phy systems will reference the PHY closest to the MAC.

Introduce a numbering scheme that allows enumerating the PHY devices that belong to any netdev, which can in turn allow userspace to take more precise decisions with regard to each PHY's configuration.

The numbering is maintained per-netdev, in a phy_device_list. The numbering works similarly to a netdevice's ifindex, with identifiers that are only recycled once INT_MAX has been reached. This prevents races that could occur between PHY listing and SFP transceiver removal/insertion. The identifiers are assigned at phy_attach time, as the numbering depends on the netdevice the phy is attached to.

Signed-off-by: Maxime Chevallier <[email protected]> Signed-off-by: David S. Miller <[email protected]>
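One plausible way to implement such ifindex-style numbering is cyclic id allocation from an XArray; the names below are hypothetical, but xa_alloc_cyclic() has exactly the needed property of not reusing a just-freed id until the id space wraps at the limit:

  #include <linux/xarray.h>

  struct link_topo {
          struct xarray phys;     /* index -> struct phy_device * */
          u32 next_index;         /* cyclic allocation cursor */
  };

  static int link_topo_add_phy(struct link_topo *topo, struct phy_device *phy)
  {
          u32 index;
          int err;

          /* ids run from 1 to INT_MAX and are recycled only after wrap */
          err = xa_alloc_cyclic(&topo->phys, &index, phy,
                                XA_LIMIT(1, INT_MAX),
                                &topo->next_index, GFP_KERNEL);
          return err < 0 ? err : index;
  }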
2024-01-01  afs: trace: Log afs_make_call(), including server address  (David Howells, 1 file, -0/+36)
Add a tracepoint to log calls to afs_make_call(), including the destination server address. Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected]
2024-01-01  afs: Fix fileserver rotation  (David Howells, 1 file, -12/+69)
Fix the fileserver rotation so that it doesn't use RTT as the basis for deciding which server and address to use, as this doesn't necessarily give a good indication of the best path. Instead, use the configurable preference list in conjunction with whatever probes have succeeded at the time of looking.

To this end, make the following changes:

(1) Keep an array of "server states" to track what addresses we've tried on each server and move the waitqueue entries there that we'll need for probing.
(2) Each afs_server_state struct is made to pin the corresponding server's endpoint state rather than the afs_operation struct carrying a pin on the server we're currently looking at.
(3) Drop the server list preference; we now always rescan the server list.
(4) afs_wait_for_probes() now uses the server state list to guide it in what it waits for (and to provide the waitqueue entries) and returns an indication of whether we'd got a response, run out of responsive addresses or the endpoint state had been superseded and we need to restart the iteration.
(5) Call afs_get_address_preferences*() occasionally to refresh the preference values.
(6) When picking a server, scan the addresses of the servers for which we have as-yet untested communications, looking for the highest priority one and use that instead of trying all the addresses for a particular server in ascending-RTT order.
(7) When a Busy or Offline state is seen across all available servers, do a short sleep.
(8) If we detect that we accessed a future RO volume version whilst it is undergoing replication, reissue the op against the older version until at least half of the servers are replicated.
(9) Whilst RO replication is ongoing, increase the frequency of Volume Location server checks for that volume to every ten minutes instead of hourly.

Also add a tracepoint to track progress through the rotation algorithm.

Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected]
2024-01-01  afs: Overhaul invalidation handling to better support RO volumes  (David Howells, 1 file, -4/+0)
Overhaul the third party-induced invalidation handling, making use of the previously added volume-level event counters (cb_scrub and cb_ro_snapshot) that are now being parsed out of the VolSync record returned by the fileserver in many of its replies.

This allows better handling of RO (and Backup) volumes. Since these are snapshots of a RW volume that are updated atomically and simultaneously across all servers that host them, they only require a single callback promise for the entire volume. The currently upstream code assumes that RO volumes operate in the same manner as RW volumes, and that each file has its own individual callback - which means that it does a status fetch for *every* file in a RO volume, whether or not the volume got "released" (volume callback breaks can occur for other reasons too, such as the volumeserver taking ownership of a volume from a fileserver).

To this end, make the following changes:

(1) Change the meaning of the volume's cb_v_break counter so that it is now a hint that we need to issue a status fetch to work out the state of a volume. cb_v_break is incremented by volume break callbacks and by server initialisation callbacks.
(2) Add a second counter, cb_v_check, to the afs_volume struct such that if this differs from cb_v_break, we need to do a check. When the check is complete, cb_v_check is advanced to what cb_v_break was at the start of the status fetch.
(3) Move the list of mmap'd vnodes to the volume and trigger removal of PTEs that map to files on a volume break rather than on a server break.
(4) When a server reinitialisation callback comes in, use the server-to-volume reverse mapping added in a preceding patch to iterate over all the volumes using that server and clear the volume callback promises for that server and the general volume promise as a whole to trigger reanalysis.
(5) Replace the AFS_VNODE_CB_PROMISED flag with an AFS_NO_CB_PROMISE (TIME64_MIN) value in the cb_expires_at field, reducing the number of checks we need to make.
(6) Change afs_check_validity() to quickly see if various event counters have been incremented or if the vnode or volume callback promise is due to expire/has expired without making any changes to the state. That is now left to afs_validate() as this may get more complicated in future as we may have to examine server records too.
(7) Overhaul afs_validate() so that it does a single status fetch if we need to check the state of either the vnode or the volume - and do so under appropriate locking. The function does the following steps:

    (A) If the vnode/volume is no longer seen as valid, then we take the vnode validation lock and, if the volume promise has expired, the volume check lock also. The latter prevents redundant checks being made to find out if a new version of the volume got released.
    (B) If a previous RPC call found that the volsync changed unexpectedly or that a RO volume was updated, then we unmap all PTEs pointing to the file to stop mmap being used for access.
    (C) If the vnode is still seen to be of uncertain validity, then we perform an FS.FetchStatus RPC op to jointly update the volume status and the vnode status. This assessment is done as part of parsing the reply: if the RO volume creation timestamp advances, cb_ro_snapshot is incremented; if either the creation or update timestamps changes in an unexpected way, the cb_scrub counter is incremented. If the Data Version returned doesn't match the copy we have locally, then we ask for the pagecache to be zapped. This takes care of handling RO update.
    (D) If cb_scrub differs between volume and vnode, the vnode's pagecache is zapped and the vnode's cb_scrub is updated unless the file is marked as having been deleted.

Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected]
2024-01-01  afs: Parse the VolSync record in the reply of a number of RPC ops  (David Howells, 1 file, -1/+29)
A number of fileserver RPC operations return a VolSync record as part of their reply that gives some information about the state of the volume being accessed, including:

(1) A volume Creation timestamp. For an RW volume, this is the time at which the volume was created; if it changes, the RW volume was presumably restored from a backup and all cached data should be scrubbed as Data Version numbers could regress on the files in the volume. For an RO volume, this is the time it was last snapshotted from the RW volume. It is expected to advance each time this happens; if it regresses, cached data should be scrubbed.
(2) A volume Update timestamp (Auristor only). For an RW volume, this is updated any time any change is made to a volume or its contents. If it regresses, all cached data must be scrubbed. For an RO volume, this is a copy of the RW volume's Update timestamp at the point of snapshotting. It can be used as a version number when checking to see if a callback on a RO volume was due to a snapshot. If it regresses, all cached data must be scrubbed.

But this is currently not made use of by the in-kernel afs filesystem. Make the afs filesystem use this by:

(1) Add an update time field to the afs_volsync struct and use a value of TIME64_MIN in both that and the creation time to indicate that they are unset.
(2) Add creation and update time fields to the afs_volume struct and use this to track the two timestamps.
(3) Add a volsync_lock mutex to the afs_volume struct to control modification access for when we detect a change in these values.
(4) Add a 'pre-op volsync' struct to the afs_operation struct to record the state of the volume tracking before the op.
(5) Add a new counter, cb_scrub, to the afs_volume struct to count events that require all data to be scrubbed. A copy is placed in the afs_vnode struct (inode) and if they no longer match, a scrub takes place.
(6) When the result of an operation is being parsed, parse the VolSync data too, if it is provided.

Note that the two timestamps are handled separately, since they don't work in quite the same way.

- If the afs_volume tracking is unset, just set it and do nothing else.
- If the result timestamps are the same as the ones in afs_volume, do nothing.
- If the timestamps regress, increment cb_scrub if not already done so.
- If the creation timestamp on a RW volume changes, increment cb_scrub if not already done so.
- If the creation timestamp on a RO volume advances, update the server list and see if the current server has been excluded, if so reissue the op. Once over half of the replication sites have been updated, increment cb_ro_snapshot to indicate updates may be required and switch over to excluding unupdated replication sites.
- If the creation timestamp on a Backup volume advances, just increment cb_ro_snapshot to trigger updates.

Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected]
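A simplified sketch of the tracking state described in the points above; the field names follow the commit text, while the comparison helper is hypothetical:

  struct afs_volsync {
          time64_t creation;      /* volume creation/snapshot time */
          time64_t update;        /* volume update time (Auristor only) */
  };

  /* TIME64_MIN doubles as the "unset" marker for both timestamps */
  static bool afs_volsync_regressed(const struct afs_volsync *old,
                                    const struct afs_volsync *new)
  {
          if (old->creation == TIME64_MIN)        /* nothing tracked yet */
                  return false;
          return new->creation < old->creation ||
                 (old->update != TIME64_MIN && new->update < old->update);
  }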
2024-01-01  afs: Apply server breaks to mmap'd files in the call processor  (David Howells, 1 file, -0/+2)
Apply server breaks to mmap'd files that are being used from that server from the call processor work function rather than punting it off to a workqueue. The work item, afs_server_init_callback(), then bumps each individual inode off to its own work item introducing a potentially lengthy delay. This reduces that delay at the cost of extending the amount of time we delay replying to the CB.InitCallBack3 notification RPC from the server. Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected]
2024-01-01  afs: Keep a record of the current fileserver endpoint state  (David Howells, 1 file, -15/+54)
Keep a record of the current fileserver endpoint state, including the probe state, and replace it when a new probe is started rather than just squelching the old state and overwriting it. Clearance of the old state can cause a race if there's another thread also currently trying to communicate with that server. It appears that this race might be the culprit for some occasions where kafs complains about invalid data in the RPC reply because the rotation algorithm fell all the way through without actually issuing an RPC call and the error return got filled in from the probe state (which has a zero error recorded). Whatever happens to be in the caller's reply buffer is then taken as the response. Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected]
2024-01-01  afs: Dispatch vlserver probes in priority order  (David Howells, 1 file, -0/+34)
When probing all the addresses for a volume location server, dispatch them in order of descending priority to try and get the highest-priority one back first. Also add a tracepoint to show the transmission and completion of the probes. Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected]
2024-01-01  afs: Dispatch fileserver probes in priority order  (David Howells, 1 file, -0/+33)
When probing all the addresses for a fileserver, dispatch them in order of descending priority to try and get the highest-priority one back first. Also add a tracepoint to show the transmission and completion of the probes. Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected]
2024-01-01  Merge tag 'nf-next-23-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next  (David S. Miller, 1 file, -4/+5)
Pablo Neira Ayuso says:
====================
netfilter pull request 23-12-22

The following patchset contains Netfilter updates for net-next:

1) Add locking for NFT_MSG_GETSETELEM_RESET requests, to address a race scenario with two concurrent processes running a dump-and-reset which exposes negative counters to userspace, from Phil Sutter.
2) Use GFP_KERNEL in pipapo GC, from Florian Westphal.
3) Reorder nf_flowtable struct members, place the read-mostly parts accessed by the datapath first. From Florian Westphal.
4) Set on dead flag for NFT_MSG_NEWSET in abort path, from Florian Westphal.
5) Support filtering zone in ctnetlink, from Felix Huettner.
6) Bail out if user tries to redefine an existing chain with different type in nf_tables.
====================

Signed-off-by: David S. Miller <[email protected]>
2024-01-01  Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next  (David S. Miller, 2 files, -14/+22)
Daniel Borkmann says:
====================
bpf-next-for-netdev

The following pull-request contains BPF updates for your *net-next* tree.

We've added 22 non-merge commits during the last 3 day(s) which contain a total of 23 files changed, 652 insertions(+), 431 deletions(-).

The main changes are:

1) Add verifier support for annotating user's global BPF subprogram arguments with a few commonly requested annotations for a better developer experience, from Andrii Nakryiko. These tags are:
   - Ability to annotate a special PTR_TO_CTX argument
   - Ability to annotate a generic PTR_TO_MEM as non-NULL
2) Support BPF verifier tracking of BPF_JNE which helps cases when the compiler transforms (unsigned) "a > 0" into "if a == 0 goto xxx" and the like, from Menglong Dong.
3) Fix a warning in bpf_mem_cache's check_obj_size() as reported by LKP, from Hou Tao.
4) Re-support uid/gid options when mounting bpffs which had to be reverted with the prior token series revert to avoid conflicts, from Daniel Borkmann.
5) Fix a libbpf NULL pointer dereference in bpf_object__collect_prog_relos() found from fuzzing the library with malformed ELF files, from Mingyi Zhang.
6) Skip DWARF sections in libbpf's linker sanity check given compiler options to generate compressed debug sections can trigger a rejection due to misalignment, from Alyssa Ross.
7) Fix an unnecessary use of the comma operator in BPF verifier, from Simon Horman.
8) Fix format specifier for unsigned long values in cpustat sample, from Colin Ian King.
====================

Signed-off-by: David S. Miller <[email protected]>
2024-01-01  net: mdio: get/put device node during (un)registration  (Luiz Angelo Daros de Luca, 1 file, -0/+3)
The __of_mdiobus_register() function was storing the device node in dev.of_node without increasing its reference count. It implicitly relied on the caller to maintain the allocated node until the mdiobus was unregistered. Now, __of_mdiobus_register() will acquire the node before assigning it, and of_mdiobus_unregister_callback() will be called at the end of mdio_unregister(). Drivers can now release the node immediately after MDIO registration. Some of them are already doing that even before this patch. Signed-off-by: Luiz Angelo Daros de Luca <[email protected]> Signed-off-by: David S. Miller <[email protected]>
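The refcounting contract, in a minimal hedged sketch (helper names hypothetical): registration pins the node, unregistration drops it, so callers no longer need to keep their own reference alive:

  #include <linux/of.h>

  static void example_mdiobus_register(struct device *dev, struct device_node *np)
  {
          dev->of_node = of_node_get(np); /* own reference for the bus lifetime */
  }

  static void example_mdiobus_unregister(struct device *dev)
  {
          of_node_put(dev->of_node);      /* balanced put in the unregister path */
          dev->of_node = NULL;
  }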
2023-12-31  Merge tag 'iio-for-6.8b' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-next  (Greg Kroah-Hartman, 2 files, -10/+3)
Jonathan writes:

IIO: 2nd set of new device support, features and cleanup for 6.8

A late/optimistic second pull request. The bots have been poking them since Wednesday without any issues. There are a few fixes in the ad7091r5 driver as part of rework to enable the ad7091r8 parts that are included at the start of that series. Includes pre-work for major changes to the DMA buffers that should land in 6.9 and will greatly improve performance and flexibility for high performance devices by enabling DMABUF based zero copy transfers when we don't need to bounce the data via user space.

New device support
------------------
adi,ad7091r8
- Major refactor of existing adi,ad7091r5 driver to separate out useful shared library code that can be used by I2C and SPI parts.
- Use that library from a new driver supporting the AD7091R-2, AD7091R-4 and AD7091R-8 12-Bit SPI ADCs.
- Series includes some late breaking fixes for the ad7091r5.
microchip,mcp4821
- New driver for MCP4801, MCP4802, MCP4811, MCP4812, MCP4821 and MCP4822 I2C single / dual channel DACs.

Cleanup
-------
buffers:
- Use IIO_SEPARATE in place of some hard-coded 0 values.
dma-buffers:
- Simplify things to not use an outgoing queue given it only ever has up to two elements and we only need to track which is first.
- Split the iio_dma_buffer_fileio_free() function up to make it easier to read and enable reuse in a series lining up for 6.9.
iio.h
- Drop some stale documentation of struct fields that don't exist.

* tag 'iio-for-6.8b' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio:
  iio: linux/iio.h: fix Excess kernel-doc description warning
  MAINTAINERS: Add MAINTAINERS entry for AD7091R
  iio: adc: Add support for AD7091R-8
  dt-bindings: iio: Add AD7091R-8
  iio: adc: Split AD7091R-5 config symbol
  iio: adc: ad7091r: Add chip_info callback to get conversion result channel
  iio: adc: ad7091r: Set device mode through chip_info callback
  iio: adc: ad7091r: Remove unneeded probe parameters
  iio: adc: ad7091r: Move chip init data to container struct
  iio: adc: ad7091r: Move generic AD7091R code to base driver and header file
  iio: adc: ad7091r: Enable internal vref if external vref is not supplied
  iio: adc: ad7091r: Allow users to configure device events
  iio: dac: driver for MCP4821
  dt-bindings: iio: dac: add MCP4821
  iio: buffer-dma: split iio_dma_buffer_fileio_free() function
  iio: buffer-dma: Get rid of outgoing queue
  iio: buffer: Use IIO_SEPARATE instead of a hard-coded 0
2023-12-30  locking/osq_lock: Move the definition of optimistic_spin_node into osq_lock.c  (David Laight, 1 file, -5/+0)
struct optimistic_spin_node is private to the implementation. Move it into the C file to ensure nothing is accessing it. Signed-off-by: David Laight <[email protected]> Acked-by: Waiman Long <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2023-12-30  ALSA: mark all struct bus_type as const  (Greg Kroah-Hartman, 2 files, -2/+2)
Now that the driver core can properly handle constant struct bus_type, mark all of the sound subsystem struct bus_type structures as const, placing them into read-only memory which cannot be modified at runtime. Note, this fixes a duplicate definition of ac97_bus_type, which somehow was declared extern in a .h file, then static as a prototype in a .c file, and then properly later on in the same .c file. Amazing that no compiler warning ever showed up for this. Cc: Jaroslav Kysela <[email protected]> Cc: Takashi Iwai <[email protected]> Cc: Dawei Li <[email protected]> Cc: Yu Liao <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Hans de Goede <[email protected]> Cc: [email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Link: https://lore.kernel.org/r/2023121945-immersion-budget-d0aa@gregkh Signed-off-by: Takashi Iwai <[email protected]>
2023-12-29  zswap: memcontrol: implement zswap writeback disabling  (Nhat Pham, 2 files, -0/+19)
During our experiment with zswap, we sometimes observe swap IOs due to occasional zswap store failures and writebacks-to-swap. These swapping IOs prevent many users who cannot tolerate swapping from adopting zswap to save memory and improve performance where possible.

This patch adds the option to disable this behavior entirely: do not write back to the backing swap device when a zswap store attempt fails, and do not write pages in the zswap pool back to the backing swap device (both when the pool is full, and when the new zswap shrinker is called).

This new behavior can be opted in/out on a per-cgroup basis via a new cgroup file. By default, writeback to the swap device is enabled, which is the previous behavior. Initially, writeback is enabled for the root cgroup, and a newly created cgroup will inherit the current setting of its parent.

Note that this is subtly different from setting memory.swap.max to 0, as it still allows for pages to be stored in the zswap pool (which itself consumes swap space in its current form).

This patch should be applied on top of the zswap shrinker series: https://lore.kernel.org/linux-mm/[email protected]/ as it also disables the zswap shrinker, a major source of zswap writebacks.

For the most part, this feature is motivated by internal parties who have already established their opinions regarding swapping - the workloads that are highly sensitive to IO, and especially those who are using servers with really slow disk performance (for instance, massive but slow HDDs). For these folks, it's impossible to convince them to even entertain zswap if swapping also comes as a packaged deal. Writeback disabling is quite a useful feature in these situations - on a mixed workloads deployment, they can disable writeback for the more IO-sensitive workloads, and enable writeback for other background workloads.

For instance, on a server with HDD, I allocate memories and populate them with random values (so that zswap store will always fail), and specify memory.high low enough to trigger reclaim. The time it takes to allocate the memories and just read through it a couple of times (doing silly things like computing the values' average etc.):

zswap.writeback disabled:
real 0m30.537s
user 0m23.687s
sys 0m6.637s
0 pages swapped in
0 pages swapped out

zswap.writeback enabled:
real 0m45.061s
user 0m24.310s
sys 0m8.892s
712686 pages swapped in
461093 pages swapped out

(the last two lines are from vmstat -s).

[[email protected]: add a comment about recurring zswap store failures leading to reclaim inefficiency]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Nhat Pham <[email protected]> Suggested-by: Johannes Weiner <[email protected]> Reviewed-by: Yosry Ahmed <[email protected]> Acked-by: Chris Li <[email protected]> Cc: Dan Streetman <[email protected]> Cc: David Heidelberg <[email protected]> Cc: Domenico Cerasuolo <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Muchun Song <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Seth Jennings <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vitaly Wool <[email protected]> Cc: Zefan Li <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-29  Merge tag 'mlx5-updates-2023-12-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux  (David S. Miller, 3 files, -6/+29)
Saeed Mahameed says:
====================
mlx5-updates-2023-12-20

mlx5 Socket direct support and management PF profile.

Tariq says:
===========
Support Socket-Direct multi-dev netdev

This series adds support for combining multiple devices (PFs) of the same port under one netdev instance. Passing traffic through different devices belonging to different NUMA sockets saves cross-numa traffic and allows apps running on the same netdev from different numas to still feel a sense of proximity to the device and achieve improved performance.

We achieve this by grouping PFs together, and creating the netdev only once all group members are probed. Symmetrically, we destroy the netdev once any of the PFs is removed.

The channels are distributed between all devices; a proper configuration would utilize the correct close numa when working on a certain app/cpu.

We pick one device to be a primary (leader), and it fills a special role. The other devices (secondaries) are disconnected from the network in the chip level (set to silent mode). All RX/TX traffic is steered through the primary to/from the secondaries.

Currently, we limit the support to PFs only, and up to two devices (sockets).
===========

Armen says:
===========
Management PF support and module integration

This patch rolls out comprehensive support for the Management Physical Function (MGMT PF) within the mlx5 driver. It involves updating the mlx5 interface header to introduce necessary definitions for MGMT PF and adding a new management PF netdev profile, which will allow the host side to communicate with the embedded linux on Blue-field devices.
===========
====================

Signed-off-by: David S. Miller <[email protected]>
2023-12-29  kdump: remove redundant DEFAULT_CRASH_KERNEL_LOW_SIZE  (Youling Tang, 1 file, -6/+0)
Remove duplicate definitions, no functional changes. Link: https://lkml.kernel.org/r/MW4PR84MB3145459ADC7EB38BBB36955B8198A@MW4PR84MB3145.NAMPRD84.PROD.OUTLOOK.COM Signed-off-by: Youling Tang <[email protected]> Reported-by: Huacai Chen <[email protected]> Acked-by: Baoquan He <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-29  lib: crc_ccitt_false() is identical to crc_itu_t()  (Mathis Marion, 2 files, -9/+2)
crc_ccitt_false() was introduced in commit 0d85adb5fbd33 ("lib/crc-ccitt: Add CCITT-FALSE CRC16 variant"), but it is redundant with crc_itu_t(). Since the latter is more used, it is the one being kept. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mathis Marion <[email protected]> Cc: Andrey Smirnov <[email protected]> Cc: Andrey Vostrikov <[email protected]> Cc: Jérôme Pouiller <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
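Both helpers compute the same non-reflected CRC-16 over polynomial 0x1021; "CCITT-FALSE" is merely the convention of seeding it with 0xFFFF. A self-contained bitwise reference (not the kernel's table-driven implementation):

  #include <stddef.h>
  #include <stdint.h>

  /* CRC-16, poly x^16 + x^12 + x^5 + 1 (0x1021), MSB first, no reflection */
  static uint16_t crc16_1021(uint16_t crc, const uint8_t *buf, size_t len)
  {
          while (len--) {
                  crc ^= (uint16_t)(*buf++) << 8;
                  for (int i = 0; i < 8; i++)
                          crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                               : (uint16_t)(crc << 1);
          }
          return crc;
  }

  /* CCITT-FALSE check value: crc16_1021(0xFFFF, (const uint8_t *)"123456789", 9) == 0x29B1 */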
2023-12-29  mm: convert page_try_share_anon_rmap() to folio_try_share_anon_rmap_[pte|pmd]()  (David Hildenbrand, 1 file, -25/+71)
Let's convert it like we converted all the other rmap functions. Don't introduce folio_try_share_anon_rmap_ptes() for now, as we don't have a user that wants rmap batching in sight. Pretty easy to add later. All users are easy to convert -- only ksm.c doesn't use folios yet but that is left for future work -- so let's just do it in a single shot. While at it, turn the BUG_ON into a WARN_ON_ONCE. Note that page_try_share_anon_rmap() so far didn't care about pte/pmd mappings (no compound parameter). We're changing that so we can perform better sanity checks and make the code actually more readable/consistent. For example, __folio_rmap_sanity_checks() will make sure that a PMD range actually falls completely into the folio. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Muchun Song <[email protected]> Cc: Muchun Song <[email protected]> Cc: Peter Xu <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: Yin Fengwei <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-29  mm/rmap: remove page_try_dup_anon_rmap()  (David Hildenbrand, 1 file, -13/+3)
All users are gone, remove page_try_dup_anon_rmap() and any remaining traces. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Muchun Song <[email protected]> Cc: Muchun Song <[email protected]> Cc: Peter Xu <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: Yin Fengwei <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-29  mm/rmap: introduce folio_try_dup_anon_rmap_[pte|ptes|pmd]()  (David Hildenbrand, 2 files, -50/+106)
The last users of page_needs_cow_for_dma() and __page_dup_rmap() are gone; remove them. Add folio_try_dup_anon_rmap_ptes() right away, we want to perform rmap batching during fork() soon. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Muchun Song <[email protected]> Cc: Muchun Song <[email protected]> Cc: Peter Xu <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: Yin Fengwei <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-29  mm/rmap: convert page_dup_file_rmap() to folio_dup_file_rmap_[pte|ptes|pmd]()  (David Hildenbrand, 1 file, -5/+54)
Let's convert page_dup_file_rmap() like the other rmap functions. As there is only a single caller, convert that single caller right away and remove page_dup_file_rmap(). Add folio_dup_file_rmap_ptes() right away, we want to perform rmap batching during fork() soon. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Muchun Song <[email protected]> Cc: Muchun Song <[email protected]> Cc: Peter Xu <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: Yin Fengwei <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-29  mm/rmap: remove page_remove_rmap()  (David Hildenbrand, 1 file, -3/+1)
All callers are gone, let's remove it and some leftover traces. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Muchun Song <[email protected]> Cc: Muchun Song <[email protected]> Cc: Peter Xu <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: Yin Fengwei <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-29  mm/rmap: introduce folio_remove_rmap_[pte|ptes|pmd]()  (David Hildenbrand, 1 file, -0/+6)
Let's mimic what we did with folio_add_file_rmap_*() and folio_add_anon_rmap_*() so we can similarly replace page_remove_rmap() next. Make the compiler always special-case on the granularity by using __always_inline. We're adding folio_remove_rmap_ptes() handling right away, as we want to use that soon for batching rmap operations when unmapping PTE-mapped large folios. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Muchun Song <[email protected]> Cc: Muchun Song <[email protected]> Cc: Peter Xu <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: Yin Fengwei <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-29  mm/rmap: remove RMAP_COMPOUND  (David Hildenbrand, 1 file, -9/+3)
No longer used, let's remove it and clarify RMAP_NONE/RMAP_EXCLUSIVE a bit. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Muchun Song <[email protected]> Cc: Muchun Song <[email protected]> Cc: Peter Xu <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: Yin Fengwei <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-29  mm/rmap: remove page_add_anon_rmap()  (David Hildenbrand, 1 file, -2/+0)
All users are gone, remove it and all traces. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Muchun Song <[email protected]> Cc: Muchun Song <[email protected]> Cc: Peter Xu <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: Yin Fengwei <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-29  mm/rmap: introduce folio_add_anon_rmap_[pte|ptes|pmd]()  (David Hildenbrand, 1 file, -0/+6)
Let's mimic what we did with folio_add_file_rmap_*() so we can similarly replace page_add_anon_rmap() next. Make the compiler always special-case on the granularity by using __always_inline. For the PageAnonExclusive sanity checks, when adding a PMD mapping, we're now also checking each individual subpage covered by that PMD, instead of only the head page. Note that the new functions ignore the RMAP_COMPOUND flag, which we will remove as soon as page_add_anon_rmap() is gone. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Yin Fengwei <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Muchun Song <[email protected]> Cc: Muchun Song <[email protected]> Cc: Peter Xu <[email protected]> Cc: Ryan Roberts <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-29  mm/rmap: remove page_add_file_rmap()  (David Hildenbrand, 1 file, -2/+0)
All users are gone, let's remove it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Yin Fengwei <[email protected]> Reviewed-by: Ryan Roberts <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Muchun Song <[email protected]> Cc: Muchun Song <[email protected]> Cc: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-29  mm/rmap: convert folio_add_file_rmap_range() into folio_add_file_rmap_[pte|ptes|pmd]()  (David Hildenbrand, 1 file, -2/+44)
Let's get rid of the compound parameter and instead define explicitly which mappings we're adding. That is more future proof, easier to read and harder to mess up.

Use an enum to express the granularity internally. Make the compiler always special-case on the granularity by using __always_inline. Replace the "compound" check by a switch-case that will be removed by the compiler completely.

Add plenty of sanity checks with CONFIG_DEBUG_VM. Replace the folio_test_pmd_mappable() check by a config check in the caller and sanity checks. Convert the single user of folio_add_file_rmap_range().

While at it, consistently use "int" instead of "unsigned int" in rmap code when dealing with mapcounts and the number of pages.

This function design can later easily be extended to PUDs and to batch PMDs. Note that for now we don't support anything bigger than PMD-sized folios (as we cleanly separated hugetlb handling). Sanity checks will catch if that ever changes.

Next up is removing page_remove_rmap() along with its "compound" parameter and similarly converting all other rmap functions.

Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Yin Fengwei <[email protected]> Reviewed-by: Ryan Roberts <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Muchun Song <[email protected]> Cc: Muchun Song <[email protected]> Cc: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
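A condensed sketch of the dispatch pattern described above (signatures simplified; the real functions also take the vma): the granularity is a compile-time constant, so the switch in the __always_inline worker folds away entirely:

  enum rmap_level {
          RMAP_LEVEL_PTE,
          RMAP_LEVEL_PMD,
  };

  static __always_inline void __folio_add_file_rmap(struct folio *folio,
                  struct page *page, int nr_pages, enum rmap_level level)
  {
          switch (level) {
          case RMAP_LEVEL_PTE:
                  /* bump per-page mapcounts for nr_pages PTE mappings */
                  break;
          case RMAP_LEVEL_PMD:
                  /* bump the entire-folio mapcount exactly once */
                  break;
          }
  }

  static inline void folio_add_file_rmap_ptes(struct folio *folio,
                  struct page *page, int nr_pages)
  {
          __folio_add_file_rmap(folio, page, nr_pages, RMAP_LEVEL_PTE);
  }

  #define folio_add_file_rmap_pte(folio, page) \
          folio_add_file_rmap_ptes(folio, page, 1)

  static inline void folio_add_file_rmap_pmd(struct folio *folio,
                  struct page *page)
  {
          __folio_add_file_rmap(folio, page, HPAGE_PMD_NR, RMAP_LEVEL_PMD);
  }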
2023-12-29  mm/rmap: introduce and use hugetlb_try_share_anon_rmap()  (David Hildenbrand, 1 file, -0/+25)
hugetlb rmap handling differs quite a lot from "ordinary" rmap code. For example, hugetlb currently only supports entire mappings, and treats any mapping as mapped using a single "logical PTE". Let's move it out of the way so we can overhaul our "ordinary" rmap implementation/interface. So let's introduce and use hugetlb_try_share_anon_rmap() to make all hugetlb handling use dedicated hugetlb_* rmap functions. Add sanity checks that we end up with the right folios in the right functions. Note that try_to_unmap_one() does not need care. Easy to spot because among all that nasty hugetlb special-casing in that function, we're not using set_huge_pte_at() on the anon path -- well, and that code assumes that we would want to swapout. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Yin Fengwei <[email protected]> Reviewed-by: Ryan Roberts <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Muchun Song <[email protected]> Cc: Muchun Song <[email protected]> Cc: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-29  mm/rmap: introduce and use hugetlb_try_dup_anon_rmap()  (David Hildenbrand, 2 files, -3/+27)
hugetlb rmap handling differs quite a lot from "ordinary" rmap code. For example, hugetlb currently only supports entire mappings, and treats any mapping as mapped using a single "logical PTE". Let's move it out of the way so we can overhaul our "ordinary" rmap implementation/interface. So let's introduce and use hugetlb_try_dup_anon_rmap() to make all hugetlb handling use dedicated hugetlb_* rmap functions. Add sanity checks that we end up with the right folios in the right functions. Note that is_device_private_page() does not apply to hugetlb. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Yin Fengwei <[email protected]> Reviewed-by: Ryan Roberts <[email protected]> Reviewed-by: Muchun Song <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Muchun Song <[email protected]> Cc: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-29  mm/rmap: introduce and use hugetlb_add_file_rmap()  (David Hildenbrand, 1 file, -0/+8)
hugetlb rmap handling differs quite a lot from "ordinary" rmap code. For example, hugetlb currently only supports entire mappings, and treats any mapping as mapped using a single "logical PTE". Let's move it out of the way so we can overhaul our "ordinary" rmap implementation/interface. Right now we're using page_dup_file_rmap() in some cases where "ordinary" rmap code would have used page_add_file_rmap(). So let's introduce and use hugetlb_add_file_rmap() instead. We won't be adding a "hugetlb_dup_file_rmap()" function for the fork() case, as it would be doing the same: "dup" is just an optimization for "add". What remains is a single page_dup_file_rmap() call in fork() code. Add sanity checks that we end up with the right folios in the right functions. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Yin Fengwei <[email protected]> Reviewed-by: Ryan Roberts <[email protected]> Reviewed-by: Muchun Song <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Muchun Song <[email protected]> Cc: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-29  mm/rmap: introduce and use hugetlb_remove_rmap()  (David Hildenbrand, 1 file, -0/+7)
hugetlb rmap handling differs quite a lot from "ordinary" rmap code. For example, hugetlb currently only supports entire mappings, and treats any mapping as mapped using a single "logical PTE". Let's move it out of the way so we can overhaul our "ordinary" rmap implementation/interface. Let's introduce and use hugetlb_remove_rmap() and remove the hugetlb code from page_remove_rmap(). This effectively removes one check on the small-folio path as well. Add sanity checks that we end up with the right folios in the right functions. Note: all possible candidates that need care are the page_remove_rmap() calls that pass compound=true. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Yin Fengwei <[email protected]> Reviewed-by: Ryan Roberts <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Muchun Song <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Muchun Song <[email protected]> Cc: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
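A sketch of what the dedicated hugetlb variant amounts to, per the description above (simplified): hugetlb folios are only ever mapped in their entirety, so only the entire mapcount moves, guarded by the new sanity check:

  static inline void hugetlb_remove_rmap(struct folio *folio)
  {
          VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);

          /* a hugetlb mapping is one "logical PTE" covering the folio */
          atomic_dec(&folio->_entire_mapcount);
  }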
2023-12-29  mm/rmap: rename hugepage_add* to hugetlb_add*  (David Hildenbrand, 1 file, -2/+2)
Patch series "mm/rmap: interface overhaul", v2. This series overhauls the rmap interface, to get rid of the "bool compound" / RMAP_COMPOUND parameter with the goal of making the interface less error prone, more future proof, and more natural to extend to "batching". Also, this converts the interface to always consume folio+subpage, which speeds up operations on large folios. Further, this series adds PTE-batching variants for 4 rmap functions, whereby only folio_add_anon_rmap_ptes() is used for batching in this series when PTE-remapping a PMD-mapped THP. folio_remove_rmap_ptes(), folio_try_dup_anon_rmap_ptes() and folio_dup_file_rmap_ptes() will soon come in handy[1,2]. This series performs a lot of folio conversion along the way. Most of the added LOC in the diff are only due to documentation. As we're moving to a pte/pmd interface where we clearly express the mapping granularity we are dealing with, we first get the remainder of hugetlb out of the way, as it is special and expected to remain special: it treats everything as a "single logical PTE" and only currently allows entire mappings. Even if we'd ever support partial mappings, I strongly assume the interface and implementation will still differ heavily: hopefull we can avoid working on subpages/subpage mapcounts completely and only add a "count" parameter for them to enable batching. New (extended) hugetlb interface that operates on entire folio: * hugetlb_add_new_anon_rmap() -> Already existed * hugetlb_add_anon_rmap() -> Already existed * hugetlb_try_dup_anon_rmap() * hugetlb_try_share_anon_rmap() * hugetlb_add_file_rmap() * hugetlb_remove_rmap() New "ordinary" interface for small folios / THP:: * folio_add_new_anon_rmap() -> Already existed * folio_add_anon_rmap_[pte|ptes|pmd]() * folio_try_dup_anon_rmap_[pte|ptes|pmd]() * folio_try_share_anon_rmap_[pte|pmd]() * folio_add_file_rmap_[pte|ptes|pmd]() * folio_dup_file_rmap_[pte|ptes|pmd]() * folio_remove_rmap_[pte|ptes|pmd]() folio_add_new_anon_rmap() will always map at the largest granularity possible (currently, a single PMD to cover a PMD-sized THP). Could be extended if ever required. In the future, we might want "_pud" variants and eventually "_pmds" variants for batching. I ran some simple microbenchmarks on an Intel(R) Xeon(R) Silver 4210R: measuring munmap(), fork(), cow, MADV_DONTNEED on each PTE ... and PTE remapping PMD-mapped THPs on 1 GiB of memory. For small folios, there is barely a change (< 1% improvement for me). For PTE-mapped THP: * PTE-remapping a PMD-mapped THP is more than 10% faster. * fork() is more than 4% faster. * MADV_DONTNEED is 2% faster * COW when writing only a single byte on a COW-shared PTE is 1% faster * munmap() barely changes (< 1%). [1] https://lkml.kernel.org/r/[email protected] [2] https://lkml.kernel.org/r/[email protected] This patch (of 40): Let's just call it "hugetlb_". Yes, it's all already inconsistent and confusing because we have a lot of "hugepage_" functions for legacy reasons. But "hugetlb" cannot possibly be confused with transparent huge pages, and it matches "hugetlb.c" and "folio_test_hugetlb()". So let's minimize confusion in rmap code. 
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Muchun Song <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Muchun Song <[email protected]> Cc: Peter Xu <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: Yin Fengwei <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-29  mm, kasan: use KASAN_TAG_KERNEL instead of 0xff  (Andrey Konovalov, 2 files, -2/+3)
Use the KASAN_TAG_KERNEL macro instead of open-coding 0xff in the mm code. This macro is provided by include/linux/kasan-tags.h, which does not include any other headers, so it's safe to include it into mm.h without causing circular include dependencies. Link: https://lkml.kernel.org/r/71db9087b0aebb6c4dccbc609cc0cd50621533c7.1703188911.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Marco Elver <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
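The change in a nutshell, as a hedged example (the wrapper below is illustrative):

  #include <linux/kasan-tags.h>   /* header-only, safe to include from mm.h */

  static inline void example_tag_page_as_kernel(struct page *page)
  {
          /* previously open-coded as 0xff */
          page_kasan_tag_set(page, KASAN_TAG_KERNEL);
  }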
2023-12-29  mm/sparsemem: fix race in accessing memory_section->usage  (Charan Teja Kalla, 1 file, -3/+11)
The below race is observed on a PFN which falls into the device memory region with the system memory configuration where PFNs are such that [ZONE_NORMAL ZONE_DEVICE ZONE_NORMAL]. Since the normal zone start and end pfns contain the device memory PFNs as well, the compaction triggered will try on the device memory PFNs too, though they end up in NOP (because pfn_to_online_page() returns NULL for ZONE_DEVICE memory sections). When, from another core, the section mappings are being removed for the ZONE_DEVICE region that the PFN in question belongs to, on which compaction is currently being operated, this results in a kernel crash with CONFIG_SPARSEMEM_VMEMMAP enabled. The crash logs can be seen at [1].

compact_zone()                    memunmap_pages
-------------                     ---------------
__pageblock_pfn_to_page
......
(a) pfn_valid():
    valid_section()  // returns true
                                  (b) __remove_pages() ->
                                      sparse_remove_section() ->
                                      section_deactivate():
                                      [Free the array ms->usage and set
                                       ms->usage = NULL]
pfn_section_valid()
[Access ms->usage which is NULL]

NOTE: From the above it can be said that the race is reduced to between the pfn_valid()/pfn_section_valid() and the section deactivation with SPARSEMEM_VMEMMAP enabled.

The commit b943f045a9af ("mm/sparse: fix kernel crash with pfn_section_valid check") tried to address the same problem by clearing the SECTION_HAS_MEM_MAP with the expectation that valid_section() returns false, thus ms->usage is not accessed.

Fix this issue by the below steps:

a) Clear SECTION_HAS_MEM_MAP before freeing the ->usage.
b) An RCU protected read side critical section will either return NULL when SECTION_HAS_MEM_MAP is cleared or can successfully access ->usage.
c) Free the ->usage with kfree_rcu() and set ms->usage = NULL. No attempt will be made to access ->usage after this as the SECTION_HAS_MEM_MAP is cleared, thus valid_section() returns false.

Thanks to David/Pavan for their inputs on this patch.

[1] https://lore.kernel.org/linux-mm/[email protected]/

On Snapdragon SoC, with the mentioned memory configuration of PFNs as [ZONE_NORMAL ZONE_DEVICE ZONE_NORMAL], we are able to see a bunch of issues daily while testing on a device farm. For this particular issue, below is the log. Though the below log is not directly pointing to the pfn_section_valid(){ ms->usage; }, when we loaded this dump on the T32 Lauterbach tool, it is pointing there.
[ 540.578056] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 540.578068] Mem abort info:
[ 540.578070]   ESR = 0x0000000096000005
[ 540.578073]   EC = 0x25: DABT (current EL), IL = 32 bits
[ 540.578077]   SET = 0, FnV = 0
[ 540.578080]   EA = 0, S1PTW = 0
[ 540.578082]   FSC = 0x05: level 1 translation fault
[ 540.578085] Data abort info:
[ 540.578086]   ISV = 0, ISS = 0x00000005
[ 540.578088]   CM = 0, WnR = 0
[ 540.579431] pstate: 82400005 (Nzcv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
[ 540.579436] pc : __pageblock_pfn_to_page+0x6c/0x14c
[ 540.579454] lr : compact_zone+0x994/0x1058
[ 540.579460] sp : ffffffc03579b510
[ 540.579463] x29: ffffffc03579b510 x28: 0000000000235800 x27: 000000000000000c
[ 540.579470] x26: 0000000000235c00 x25: 0000000000000068 x24: ffffffc03579b640
[ 540.579477] x23: 0000000000000001 x22: ffffffc03579b660 x21: 0000000000000000
[ 540.579483] x20: 0000000000235bff x19: ffffffdebf7e3940 x18: ffffffdebf66d140
[ 540.579489] x17: 00000000739ba063 x16: 00000000739ba063 x15: 00000000009f4bff
[ 540.579495] x14: 0000008000000000 x13: 0000000000000000 x12: 0000000000000001
[ 540.579501] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffff897d2cd440
[ 540.579507] x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffffffc03579b5b4
[ 540.579512] x5 : 0000000000027f25 x4 : ffffffc03579b5b8 x3 : 0000000000000001
[ 540.579518] x2 : ffffffdebf7e3940 x1 : 0000000000235c00 x0 : 0000000000235800
[ 540.579524] Call trace:
[ 540.579527]  __pageblock_pfn_to_page+0x6c/0x14c
[ 540.579533]  compact_zone+0x994/0x1058
[ 540.579536]  try_to_compact_pages+0x128/0x378
[ 540.579540]  __alloc_pages_direct_compact+0x80/0x2b0
[ 540.579544]  __alloc_pages_slowpath+0x5c0/0xe10
[ 540.579547]  __alloc_pages+0x250/0x2d0
[ 540.579550]  __iommu_dma_alloc_noncontiguous+0x13c/0x3fc
[ 540.579561]  iommu_dma_alloc+0xa0/0x320
[ 540.579565]  dma_alloc_attrs+0xd4/0x108

[[email protected]: use kfree_rcu() in place of synchronize_rcu(), per David]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: f46edbd1b151 ("mm/sparsemem: add helpers track active portions of a section at boot")
Signed-off-by: Charan Teja Kalla <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Dan Williams <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
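The reader side of the fix, following steps a)-c) above in a simplified sketch (the real check additionally runs under rcu_read_lock() in pfn_valid()): since SECTION_HAS_MEM_MAP is cleared before ->usage is freed via kfree_rcu(), a reader inside the RCU critical section sees either NULL or a still-valid pointer:

  static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
  {
          int idx = subsection_map_index(pfn);
          struct mem_section_usage *usage = READ_ONCE(ms->usage);

          return usage ? test_bit(idx, usage->subsection_map) : 0;
  }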