path: root/include
Age  Commit message  [Author]  Files, Lines (-deleted/+added)
2023-10-26  Merge branches 'acpi-ec', 'acpi-sysfs', 'acpi-misc' and 'acpi-uid'  [Rafael J. Wysocki]  2 files, -0/+6
Merge ACPI EC driver updates, ACPI sysfs interface updates, misc updates related to ACPI and changes related to ACPI _UID handling for 6.7-rc1:

 - Add EC GPE detection quirk for HP 250 G7 Notebook PC (Jonathan Denose).
 - Fix and clean up create_pnp_modalias() and create_of_modalias() (Christophe JAILLET).
 - Modify 2 pieces of code to use acpi_evaluate_dsm_typed() (Andy Shevchenko).
 - Define acpi_dev_uid_match() for matching _UID and use it in several places (Raag Jadav).
 - Use acpi_device_uid() for fetching _UID in 2 places (Raag Jadav).

* acpi-ec:
  ACPI: EC: Add quirk for HP 250 G7 Notebook PC

* acpi-sysfs:
  ACPI: sysfs: Clean up create_pnp_modalias() and create_of_modalias()
  ACPI: sysfs: Fix create_pnp_modalias() and create_of_modalias()

* acpi-misc:
  ACPI: x86: s2idle: Switch to use acpi_evaluate_dsm_typed()
  ACPI: PCI: Switch to use acpi_evaluate_dsm_typed()

* acpi-uid:
  perf: arm_cspmu: use acpi_dev_hid_uid_match() for matching _HID and _UID
  ACPI: x86: use acpi_dev_uid_match() for matching _UID
  ACPI: utils: use acpi_dev_uid_match() for matching _UID
  pinctrl: intel: use acpi_dev_uid_match() for matching _UID
  ACPI: utils: Introduce acpi_dev_uid_match() for matching _UID
  perf: qcom: use acpi_device_uid() for fetching _UID
  ACPI: sysfs: use acpi_device_uid() for fetching _UID
2023-10-26  Merge branches 'acpi-video', 'acpi-prm', 'acpi-apei' and 'acpi-pcc'  [Rafael J. Wysocki]  2 files, -0/+17
Merge ACPI backlight driver updates, ACPI APEI updates, ACPI PRM updates and changes related to ACPI PCC for 6.7-rc1:

 - Add acpi_backlight=vendor quirk for Toshiba Portégé R100 (Ondrej Zary).
 - Add "vendor" backlight quirks for 3 Lenovo x86 Android tablets (Hans de Goede).
 - Move Xiaomi Mi Pad 2 backlight quirk to its own section (Hans de Goede).
 - Annotate struct prm_module_info with __counted_by (Kees Cook).
 - Fix AER info corruption in aer_recover_queue() when error status data has multiple sections (Shiju Jose).
 - Make APEI use ERST max execution time value for slow devices (Jeshua Smith).
 - Add support for platform notification handling to the PCC mailbox driver and modify it to support shared interrupts for multiple subspaces (Huisong Li).
 - Define common macros to use when referring to various bitfields in the PCC generic communications channel command and status fields and use them in some drivers (Sudeep Holla).

* acpi-video:
  ACPI: video: Add acpi_backlight=vendor quirk for Toshiba Portégé R100
  ACPI: video: Add "vendor" quirks for 3 Lenovo x86 Android tablets
  ACPI: video: Move Xiaomi Mi Pad 2 quirk to its own section

* acpi-prm:
  ACPI: PRM: Annotate struct prm_module_info with __counted_by

* acpi-apei:
  ACPI: APEI: Use ERST timeout for slow devices
  ACPI: APEI: Fix AER info corruption when error status data has multiple sections

* acpi-pcc:
  soc: kunpeng_hccs: Migrate to use generic PCC shmem related macros
  hwmon: (xgene) Migrate to use generic PCC shmem related macros
  i2c: xgene-slimpro: Migrate to use generic PCC shmem related macros
  ACPI: PCC: Add PCC shared memory region command and status bitfields
  mailbox: pcc: Support shared interrupt for multiple subspaces
  mailbox: pcc: Add support for platform notification handling
2023-10-26  Merge branches 'acpi-utils', 'acpi-resource', 'acpi-property' and 'acpi-soc'  [Rafael J. Wysocki]  1 file, -3/+6
Merge ACPI utilities updates, ACPI resource management updates, ACPI device properties management updates and an ACPI LPSS (Intel SoC) driver update for 6.7-rc1:

 - Rework acpi_handle_list handling so as to manage it dynamically, including size computation (Rafael Wysocki).
 - Clean up ACPI utilities code so as to make it follow the kernel coding style (Jonathan Bergh).
 - Consolidate IRQ trigger-type override DMI tables and drop .ident values from dmi_system_id tables used for ACPI resources management quirks (Hans de Goede).
 - Add ACPI IRQ override for TongFang GMxXGxx (Werner Sembach).
 - Allow _DSD buffer data only for byte accessors and document the _DSD data buffer GUID (Andy Shevchenko).
 - Drop BayTrail and Lynxpoint pinctrl device IDs from the ACPI LPSS driver, because it does not need them (Raag Jadav).

* acpi-utils:
  ACPI: utils: Remove redundant braces around individual statement
  ACPI: utils: Fix up white space in a few places
  ACPI: utils: Dynamically determine acpi_handle_list size
  ACPI: thermal: Merge trip initialization functions
  ACPI: thermal: Collapse trip devices update function wrappers
  ACPI: thermal: Collapse trip devices update functions
  ACPI: thermal: Add device list to struct acpi_thermal_trip
  ACPI: thermal: Fix a small leak in acpi_thermal_add()
  ACPI: thermal: Drop valid flag from struct acpi_thermal_trip
  ACPI: thermal: Drop redundant trip point flags
  ACPI: thermal: Untangle initialization and updates of active trips
  ACPI: thermal: Untangle initialization and updates of the passive trip
  ACPI: thermal: Simplify critical and hot trips representation
  ACPI: thermal: Create and populate trip points table earlier
  ACPI: thermal: Determine the number of trip points earlier
  ACPI: thermal: Fold acpi_thermal_get_info() into its caller
  ACPI: thermal: Simplify initialization of critical and hot trips

* acpi-resource:
  ACPI: resource: Do IRQ override on TongFang GMxXGxx
  ACPI: resource: Drop .ident values from dmi_system_id tables
  ACPI: resource: Consolidate IRQ trigger-type override DMI tables

* acpi-property:
  ACPI: property: Document the _DSD data buffer GUID
  ACPI: property: Allow _DSD buffer data only for byte accessors

* acpi-soc:
  ACPI: LPSS: drop BayTrail and Lynxpoint pinctrl HIDs
2023-10-26  x86/apic/msi: Fix misconfigured non-maskable MSI quirk  [Koichiro Den]  2 files, -28/+4
Commit ef8dd01538ea ("genirq/msi: Make interrupt allocation less convoluted") reworked the code so that the x86-specific quirk for affinity setting of non-maskable PCI/MSI interrupts is no longer activated when necessary. This could be solved by restoring the original logic in the core MSI code, but after a deeper analysis it turned out that the quirk flag is not required at all. The quirk is only required when the PCI/MSI device cannot mask the MSI interrupts, which in turn also prevents reservation mode from being enabled for the affected interrupt. This allows the NOMASK quirk bit to be removed completely, as msi_set_affinity() can instead check whether reservation mode is enabled for the interrupt, which gives exactly the same answer. Even in the currently non-existent case that reservation mode is not set for a maskable MSI interrupt, this would not cause any harm: it would just cause msi_set_affinity() to go needlessly through the functionally equivalent slow path, which works perfectly fine with maskable interrupts as well. Rework msi_set_affinity() to query the reservation mode and remove all NOMASK quirk logic from the core code. [ tglx: Massaged changelog ] Fixes: ef8dd01538ea ("genirq/msi: Make interrupt allocation less convoluted") Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Koichiro Den <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
2023-10-26  Merge tag 'nf-next-23-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next  [Paolo Abeni]  3 files, -26/+38
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next. Mostly nf_tables updates with two patches for connlabel and br_netfilter.

1) Rename function name to perform on-demand GC for rbtree elements, and replace async GC in rbtree by sync GC. Patches from Florian Westphal.
2) Use commit_mutex for NFT_MSG_GETRULE_RESET to ensure that two concurrent threads invoking this command do not underrun stateful objects. Patches from Phil Sutter.
3) Use single hook to deal with IP and ARP packets in br_netfilter. Patch from Florian Westphal.
4) Use atomic_t in netns->connlabel use counter instead of using a spinlock, also patch from Florian.
5) Cleanups for stateful objects infrastructure in nf_tables. Patches from Phil Sutter.
6) Flush path uses opaque set element offered by the iterator, instead of calling pipapo_deactivate() which looks up for it again.
7) Set backend .flush interface always succeeds, make it return void instead.
8) Add struct nft_elem_priv placeholder structure and use it by replacing void * to pass opaque set element representation from backend to frontend which defeats compiler type checks.
9) Shrink memory consumption of set element transactions, by reducing struct nft_trans_elem object size and reducing stack memory usage.
10) Use struct nft_elem_priv also for set backend .insert operation too.
11) Carry reset flag in nft_set_dump_ctx structure, instead of passing it as a function argument, from Phil Sutter.

netfilter pull request 23-10-25

* tag 'nf-next-23-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_tables: Carry reset boolean in nft_set_dump_ctx
  netfilter: nf_tables: set->ops->insert returns opaque set element in case of EEXIST
  netfilter: nf_tables: shrink memory consumption of set elements
  netfilter: nf_tables: expose opaque set element as struct nft_elem_priv
  netfilter: nf_tables: set backend .flush always succeeds
  netfilter: nft_set_pipapo: no need to call pipapo_deactivate() from flush
  netfilter: nf_tables: Carry reset boolean in nft_obj_dump_ctx
  netfilter: nf_tables: nft_obj_filter fits into cb->ctx
  netfilter: nf_tables: Carry s_idx in nft_obj_dump_ctx
  netfilter: nf_tables: A better name for nft_obj_filter
  netfilter: nf_tables: Unconditionally allocate nft_obj_filter
  netfilter: nf_tables: Drop pointless memset in nf_tables_dump_obj
  netfilter: conntrack: switch connlabels to atomic_t
  br_netfilter: use single forward hook for ip and arp
  netfilter: nf_tables: Add locking for NFT_MSG_GETRULE_RESET requests
  netfilter: nf_tables: Introduce nf_tables_getrule_single()
  netfilter: nf_tables: Open-code audit log call in nf_tables_getrule()
  netfilter: nft_set_rbtree: prefer sync gc to async worker
  netfilter: nft_set_rbtree: rename gc deactivate+erase function
====================

Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2023-10-26  Merge branch 'acpica'  [Rafael J. Wysocki]  1 file, -0/+3
Merge an ACPICA change for 6.7-rc1 which adds symbol definitions related to CDAT (Dave Jiang). * acpica: ACPICA: Add defines for CDAT SSLBIS
2023-10-26  ALSA: seq: Replace with __packed attribute  [Takashi Iwai]  1 file, -2/+2
Replace the old __attribute__((packed)) with the new __packed. Only cleanup, no functional changes. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2023-10-26  ALSA: wavefront: Drop obsoleted comments and definitions  [Takashi Iwai]  1 file, -51/+0
The header file contains lots of outdated comments and definitions. Drop those as cleanup. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2023-10-26  ALSA: wavefront: Replace with __packed attribute  [Takashi Iwai]  1 file, -1/+1
Replace the old __attribute__((packed)) with the new __packed. Only cleanup, no functional changes. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2023-10-26  ALSA: opl3: Replace with __packed attribute  [Takashi Iwai]  1 file, -1/+1
Replace the old __attribute__((packed)) with the new __packed. Only cleanup, no functional changes. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2023-10-25  ipv6: drop feature RTAX_FEATURE_ALLFRAG  [Yan Zhai]  4 files, -10/+1
RTAX_FEATURE_ALLFRAG was added before the first git commit: https://www.mail-archive.com/[email protected]/msg03399.html The feature would send packets to the fragmentation path if a box received a PMTU value of less than 1280 bytes. However, since commit 9d289715eb5c ("ipv6: stop sending PTB packets for MTU < 1280"), such a message would simply be discarded. The feature flag is not supported by the iproute2 utility either. In theory one can still manipulate it with a direct netlink message, but that is not ideal because it was based on obsolete guidance from RFC 2460 (replaced by RFC 8200). The feature always tests false at the moment, so remove the related code or mark it as unused. Signed-off-by: Yan Zhai <[email protected]> Reviewed-by: Florian Westphal <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/d78e44dcd9968a252143ffe78460446476a472a1.1698156966.git.yan@cloudflare.com Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-25  mempolicy: mmap_lock is not needed while migrating folios  [Hugh Dickins]  1 file, -9/+0
mbind(2) holds down_write of current task's mmap_lock throughout (exclusive because it needs to set the new mempolicy on the vmas); migrate_pages(2) holds down_read of pid's mmap_lock throughout. They both hold mmap_lock across the internal migrate_pages(), under which all new page allocations (huge or small) are made. I'm nervous about it; and migrate_pages() certainly does not need mmap_lock itself. It's done this way for mbind(2), because its page allocator is vma_alloc_folio() or alloc_hugetlb_folio_vma(), both of which depend on vma and address. Now that we have alloc_pages_mpol(), depending on (refcounted) memory policy and interleave index, mbind(2) can be modified to use that or alloc_hugetlb_folio_nodemask(), and then not need mmap_lock across the internal migrate_pages() at all: add alloc_migration_target_by_mpol() to replace mbind's new_page(). (After that change, alloc_hugetlb_folio_vma() is used by nothing but a userfaultfd function: move it out of hugetlb.h and into the #ifdef.) migrate_pages(2) has chosen its target node before migrating, so can continue to use the standard alloc_migration_target(); but let it take and drop mmap_lock just around migrate_to_node()'s queue_pages_range(): neither the node-to-node calculations nor the page migrations need it. It seems unlikely, but it is conceivable that some userspace depends on the kernel's mmap_lock exclusion here, instead of doing its own locking: more likely in a testsuite than in real life. It is also possible, of course, that some pages on the list will be munmapped by another thread before they are migrated, or a newer memory policy applied to the range by that time: but such races could happen before, as soon as mmap_lock was dropped, so it does not appear to be a concern. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hugh Dickins <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Nhat Pham <[email protected]> Cc: Sidhartha Kumar <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun heo <[email protected]> Cc: Vishal Moola (Oracle) <[email protected]> Cc: Yang Shi <[email protected]> Cc: Yosry Ahmed <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25  mempolicy: alloc_pages_mpol() for NUMA policy without vma  [Hugh Dickins]  3 files, -5/+27
Shrink shmem's stack usage by eliminating the pseudo-vma from its folio allocation. alloc_pages_mpol(gfp, order, pol, ilx, nid) becomes the principal actor for passing mempolicy choice down to __alloc_pages(), rather than vma_alloc_folio(gfp, order, vma, addr, hugepage). vma_alloc_folio() and alloc_pages() remain, but as wrappers around alloc_pages_mpol(). alloc_pages_bulk_*() untouched, except to provide the additional args to policy_nodemask(), which subsumes policy_node(). Cleanup throughout, cutting out some unhelpful "helpers". It would all be much simpler without MPOL_INTERLEAVE, but that adds a dynamic to the constant mpol: complicated by v3.6 commit 09c231cb8bfd ("tmpfs: distribute interleave better across nodes"), which added ino bias to the interleave, hidden from mm/mempolicy.c until this commit. Hence "ilx" throughout, the "interleave index". Originally I thought it could be done just with nid, but that's wrong: the nodemask may come from the shared policy layer below a shmem vma, or it may come from the task layer above a shmem vma; and without the final nodemask then nodeid cannot be decided. And how ilx is applied depends also on page order. The interleave index is almost always irrelevant unless MPOL_INTERLEAVE: with one exception in alloc_pages_mpol(), where the NO_INTERLEAVE_INDEX passed down from vma-less alloc_pages() is also used as hint not to use THP-style hugepage allocation - to avoid the overhead of a hugepage arg (though I don't understand why we never just added a GFP bit for THP - if it actually needs a different allocation strategy from other pages of the same order). vma_alloc_folio() still carries its hugepage arg here, but it is not used, and should be removed when agreed. get_vma_policy() no longer allows a NULL vma: over time I believe we've eradicated all the places which used to need it e.g. swapoff and madvise used to pass NULL vma to read_swap_cache_async(), but now know the vma. [[email protected]: handle NULL mpol being passed to __read_swap_cache_async()] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hugh Dickins <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Huang Ying <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Nhat Pham <[email protected]> Cc: Sidhartha Kumar <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun heo <[email protected]> Cc: Vishal Moola (Oracle) <[email protected]> Cc: Yang Shi <[email protected]> Cc: Yosry Ahmed <[email protected]> Cc: Domenico Cerasuolo <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
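As a rough, hypothetical illustration of the interface described above (not the kernel's actual wrapper code), a vma-less caller can hand a resolved mempolicy plus the NO_INTERLEAVE_INDEX hint straight to alloc_pages_mpol():

    /*
     * Illustrative sketch only: a vma-less allocation expressed in terms of
     * the alloc_pages_mpol(gfp, order, pol, ilx, nid) interface named above;
     * "pol" is whatever (refcounted) mempolicy the caller has resolved.
     */
    static struct page *alloc_page_by_mpol_sketch(gfp_t gfp, unsigned int order,
    					      struct mempolicy *pol)
    {
    	/* NO_INTERLEAVE_INDEX also hints "no THP-style hugepage allocation" */
    	return alloc_pages_mpol(gfp, order, pol, NO_INTERLEAVE_INDEX,
    				numa_node_id());
    }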
2023-10-25  mempolicy: remove confusing MPOL_MF_LAZY dead code  [Hugh Dickins]  1 file, -1/+1
v3.8 commit b24f53a0bea3 ("mm: mempolicy: Add MPOL_MF_LAZY") introduced MPOL_MF_LAZY, and included it in the MPOL_MF_VALID flags; but a720094ded8 ("mm: mempolicy: Hide MPOL_NOOP and MPOL_MF_LAZY from userspace for now") immediately removed it from MPOL_MF_VALID flags, pending further review. "This will need to be revisited", but it has not been reinstated. The present state is confusing: there is dead code in mm/mempolicy.c to handle MPOL_MF_LAZY cases which can never occur. Remove that: it can be resurrected later if necessary. But keep the definition of MPOL_MF_LAZY, which must remain in the UAPI, even though it always fails with EINVAL. https://lore.kernel.org/linux-mm/[email protected]/ links to a previous request to remove MPOL_MF_LAZY. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hugh Dickins <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Yang Shi <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Nhat Pham <[email protected]> Cc: Sidhartha Kumar <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun heo <[email protected]> Cc: Vishal Moola (Oracle) <[email protected]> Cc: Yosry Ahmed <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25  mempolicy trivia: use pgoff_t in shared mempolicy tree  [Hugh Dickins]  1 file, -13/+7
Prefer the more explicit "pgoff_t" to "unsigned long" when dealing with a shared mempolicy tree. Delete confusing comment about pseudo mm vmas. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hugh Dickins <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Nhat Pham <[email protected]> Cc: Sidhartha Kumar <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun heo <[email protected]> Cc: Vishal Moola (Oracle) <[email protected]> Cc: Yang Shi <[email protected]> Cc: Yosry Ahmed <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25  mempolicy trivia: slightly more consistent naming  [Hugh Dickins]  1 file, -6/+5
Before getting down to work, do a little cleanup, mainly of inconsistent variable naming. I gave up trying to rationalize mpol versus pol versus policy, and node versus nid, but let's avoid p and nd. Remove a few superfluous blank lines, but add one; and here prefer vma->vm_policy to vma_policy(vma) - the latter being appropriate in other sources, which have to allow for !CONFIG_NUMA. That intriguing line about KERNEL_DS? should have gone in v2.6.15, when numa_policy_init() stopped using set_mempolicy(2)'s system call handler. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hugh Dickins <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Nhat Pham <[email protected]> Cc: Sidhartha Kumar <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun heo <[email protected]> Cc: Vishal Moola (Oracle) <[email protected]> Cc: Yang Shi <[email protected]> Cc: Yosry Ahmed <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25  hugetlbfs: drop shared NUMA mempolicy pretence  [Hugh Dickins]  1 file, -2/+1
Patch series "mempolicy: cleanups leading to NUMA mpol without vma", v2. Mostly cleanups in mm/mempolicy.c, but finally removing the pseudo-vma from shmem folio allocation, and removing the mmap_lock around folio migration for mbind and migrate_pages syscalls. This patch (of 12): hugetlbfs_fallocate() goes through the motions of pasting a shared NUMA mempolicy onto its pseudo-vma, but how could there ever be a shared NUMA mempolicy for this file? hugetlb_vm_ops has never offered a set_policy method, and hugetlbfs_parse_param() has never supported any mpol options for a mount-wide default policy. It's just an illusion: clean it away so as not to confuse others, giving us more freedom to adjust shmem's set_policy/get_policy implementation. But hugetlbfs_inode_info is still required, just to accommodate seals. Yes, shared NUMA mempolicy support could be added to hugetlbfs, with a set_policy method and/or mpol mount option (Andi's first posting did include an admitted-unsatisfactory hugetlb_set_policy()); but it seems that nobody has bothered to add that in the nineteen years since v2.6.7 made it possible, and there is at least one company that has invested enough into hugetlbfs, that I guess they have learnt well enough how to manage its NUMA, without needing shared mempolicy. Remove linux/mempolicy.h from linux/hugetlb.h: include linux/pagemap.h in its place, because hugetlb.h's recently added use of filemap_lock_folio() requires that (although most .configs and .c's get it in some other way). Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hugh Dickins <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Sidhartha Kumar <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun heo <[email protected]> Cc: Vishal Moola (Oracle) <[email protected]> Cc: Yang Shi <[email protected]> Cc: Nhat Pham <[email protected]> Cc: Yosry Ahmed <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25  mm/damon: implement a function for max nr_accesses safe calculation  [SeongJae Park]  1 file, -0/+7
Patch series "avoid divide-by-zero due to max_nr_accesses overflow". The maximum nr_accesses of given DAMON context can be calculated by dividing the aggregation interval by the sampling interval. Some logics in DAMON uses the maximum nr_accesses as a divisor. Hence, the value shouldn't be zero. Such case is avoided since DAMON avoids setting the agregation interval as samller than the sampling interval. However, since nr_accesses is unsigned int while the intervals are unsigned long, the maximum nr_accesses could be zero while casting. Avoid the divide-by-zero by implementing a function that handles the corner case (first patch), and replaces the vulnerable direct max nr_accesses calculations (remaining patches). Note that the patches for the replacements are divided for broken commits, to make backporting on required tres easier. Especially, the last patch is for a patch that not yet merged into the mainline but in mm tree. This patch (of 4): The maximum nr_accesses of given DAMON context can be calculated by dividing the aggregation interval by the sampling interval. Some logics in DAMON uses the maximum nr_accesses as a divisor. Hence, the value shouldn't be zero. Such case is avoided since DAMON avoids setting the agregation interval as samller than the sampling interval. However, since nr_accesses is unsigned int while the intervals are unsigned long, the maximum nr_accesses could be zero while casting. Implement a function that handles the corner case. Note that this commit is not fixing the real issue since this is only introducing the safe function that will replaces the problematic divisions. The replacements will be made by followup commits, to make backporting on stable series easier. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 198f0f4c58b9 ("mm/damon/vaddr,paddr: support pageout prioritization") Signed-off-by: SeongJae Park <[email protected]> Reported-by: Jakub Acs <[email protected]> Cc: <[email protected]> [5.16+] Signed-off-by: Andrew Morton <[email protected]>
2023-10-25  mm/khugepaged: convert alloc_charge_hpage() to use folios  [Vishal Moola (Oracle)]  1 file, -14/+0
Also remove count_memcg_page_event now that its last caller no longer uses it, and rename hpage_collapse_alloc_page() to hpage_collapse_alloc_folio(). This removes 1 call to compound_head() and helps convert khugepaged to use folios throughout. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Vishal Moola (Oracle) <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Reviewed-by: Yang Shi <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25  mm/migrate: add nr_split to trace_mm_migrate_pages stats.  [Zi Yan]  1 file, -10/+14
Add nr_split to trace_mm_migrate_pages for large folio (including THP) split events. [[email protected]: cleanup per Huang, Ying] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zi Yan <[email protected]> Reviewed-by: "Huang, Ying" <[email protected]> Reviewed-by: Baolin Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Huang Ying <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25  bootmem: use kmemleak_free_part_phys in free_bootmem_page  [Liu Shixin]  1 file, -1/+1
Since kmemleak_alloc_phys() rather than kmemleak_alloc() was called from memblock_alloc_range_nid(), kmemleak_free_part_phys() should be used to delete the kmemleak object in free_bootmem_page(). In debug mode, there is the following warning: kmemleak: Partially freeing unknown object at 0xffff97345aff7000 (size 4096) Link: https://lkml.kernel.org/r/[email protected] Fixes: 028725e73375 ("bootmem: remove the vmemmap pages from kmemleak in free_bootmem_page") Signed-off-by: Liu Shixin <[email protected]> Acked-by: Catalin Marinas <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Patrick Wang <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25  mm: remove page_cpupid_xchg_last()  [Kefeng Wang]  1 file, -12/+7
Since all calls use folio_xchg_last_cpupid(), remove page_cpupid_xchg_last(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Huang Ying <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25  mm: make finish_mkwrite_fault() static  [Kefeng Wang]  1 file, -1/+0
Make finish_mkwrite_fault static since it is not used outside of memory.c. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Huang Ying <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25  mm: add folio_xchg_last_cpupid()  [Kefeng Wang]  1 file, -0/+5
Add folio_xchg_last_cpupid() wrapper, which is required to convert page_cpupid_xchg_last() to the folio version later in the series. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Huang Ying <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
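A wrapper of this kind is essentially a thin forwarding helper; a sketch (assuming the existing page_cpupid_xchg_last() and struct folio definitions, so not necessarily the exact in-tree code) looks like:

    static inline int folio_xchg_last_cpupid(struct folio *folio, int cpupid)
    {
    	/* Operate on the folio's head page until the core helper is converted. */
    	return page_cpupid_xchg_last(&folio->page, cpupid);
    }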
2023-10-25  mm: remove xchg_page_access_time()  [Kefeng Wang]  1 file, -8/+4
Since all calls use folio_xchg_access_time(), remove xchg_page_access_time(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Huang Ying <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25  mm: add folio_xchg_access_time()  [Kefeng Wang]  1 file, -0/+5
Add folio_xchg_access_time() wrapper, which is required to convert xchg_page_access_time() to the folio version later in the series. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Huang Ying <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25  mm: remove page_cpupid_last()  [Kefeng Wang]  1 file, -11/+6
Since all calls use folio_last_cpupid(), remove page_cpupid_last(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Huang Ying <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25  mm: add folio_last_cpupid()  [Kefeng Wang]  1 file, -0/+5
Add folio_last_cpupid() wrapper, which is required to convert page_cpupid_last() to the folio version later in the series. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Huang Ying <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25  mm_types: add virtual and _last_cpupid into struct folio  [Kefeng Wang]  1 file, -4/+18
Patch series "mm: convert page cpupid functions to folios", v3. The cpupid(or access time) used by numa balancing is stored in flags or _last_cpupid(if LAST_CPUPID_NOT_IN_PAGE_FLAGS) of page, this is to convert page cpupid to folio cpupid, a new _last_cpupid is added into folio, which make us to use folio->_last_cpupid directly, and the page cpupid functions are converted to folio ones. page_cpupid_last() -> folio_last_cpupid() xchg_page_access_time() -> folio_xchg_access_time() page_cpupid_xchg_last() -> folio_xchg_last_cpupid() This patch (of 19): If WANT_PAGE_VIRTUAL and LAST_CPUPID_NOT_IN_PAGE_FLAGS defined, the 'virtual' and '_last_cpupid' are in struct page, and since _last_cpupid is used by numa balancing feature, it is better to move it before KMSAN metadata from struct page, also add them into struct folio to make us to access them from folio directly. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Huang Ying <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25  mm: kmem: reimplement get_obj_cgroup_from_current()  [Roman Gushchin]  1 file, -1/+10
Reimplement get_obj_cgroup_from_current() using current_obj_cgroup(). get_obj_cgroup_from_current() and current_obj_cgroup() share 80% of the code, so the new implementation is almost trivial. get_obj_cgroup_from_current() is a convenient function used by the bpf subsystem, so there is no reason to get rid of it completely. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin (Cruise) <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Acked-by: Shakeel Butt <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Cc: Naresh Kamboju <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25  mm: kmem: scoped objcg protection  [Roman Gushchin]  2 files, -0/+13
Switch to a scope-based protection of the objcg pointer on slab/kmem allocation paths. Instead of using the get_() semantics in the pre-allocation hook and put the reference afterwards, let's rely on the fact that objcg is pinned by the scope. It's possible because: 1) if the objcg is received from the current task struct, the task is keeping a reference to the objcg. 2) if the objcg is received from an active memcg (remote charging), the memcg is pinned by the scope and has a reference to the corresponding objcg. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin (Cruise) <[email protected]> Tested-by: Naresh Kamboju <[email protected]> Acked-by: Shakeel Butt <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25  mm: kmem: make memcg keep a reference to the original objcg  [Roman Gushchin]  1 file, -1/+7
Keep a reference to the original objcg object for the entire life of a memcg structure. This allows to simplify the synchronization on the kernel memory allocation paths: pinning a (live) memcg will also pin the corresponding objcg. The memory overhead of this change is minimal because object cgroups usually outlive their corresponding memory cgroups even without this change, so it's only an additional pointer per memcg. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin (Cruise) <[email protected]> Tested-by: Naresh Kamboju <[email protected]> Acked-by: Shakeel Butt <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25  mm: kmem: add direct objcg pointer to task_struct  [Roman Gushchin]  1 file, -0/+4
To charge a freshly allocated kernel object to a memory cgroup, the kernel needs to obtain an objcg pointer. Currently it does it indirectly by obtaining the memcg pointer first and then calling to __get_obj_cgroup_from_memcg(). Usually tasks spend their entire life belonging to the same object cgroup. So it makes sense to save the objcg pointer on task_struct directly, so it can be obtained faster. It requires some work on fork, exit and cgroup migrate paths, but these paths are way colder. To avoid any costly synchronization the following rules are applied: 1) A task sets its objcg pointer itself. 2) If a task is being migrated to another cgroup, the least significant bit of the objcg pointer is set atomically. 3) On the allocation path the objcg pointer is obtained locklessly using the READ_ONCE() macro and the least significant bit is checked. If it's set, the following procedure is used to update it locklessly: - task->objcg is zeroed using cmpxchg - new objcg pointer is obtained - task->objcg is updated using try_cmpxchg - operation is repeated if try_cmpxchg fails It guarantees that no updates will be lost if task migration is racing against objcg pointer update. It also allows to keep both read and write paths fully lockless. Because the task is keeping a reference to the objcg, it can't go away while the task is alive. This commit doesn't change the way the remote memcg charging works. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin (Cruise) <[email protected]> Tested-by: Naresh Kamboju <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Shakeel Butt <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
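A simplified sketch of that update protocol (the flag value, helper name and error handling are illustrative, not the exact kernel implementation):

    #define CURRENT_OBJCG_UPDATE_FLAG	0x1UL	/* set on cgroup migration */

    static struct obj_cgroup *current_objcg_update_sketch(struct task_struct *tsk)
    {
    	struct obj_cgroup *old, *objcg;

    	old = READ_ONCE(tsk->objcg);
    	if (!((unsigned long)old & CURRENT_OBJCG_UPDATE_FLAG))
    		return old;				/* fast path: pointer is clean */

    	do {
    		/* 1) zero the flagged value */
    		old = READ_ONCE(tsk->objcg);
    		cmpxchg(&tsk->objcg, old, NULL);
    		/* 2) obtain the objcg of the task's current memcg */
    		objcg = get_objcg_of_current_memcg();	/* hypothetical helper */
    		/*
    		 * 3) publish it; retry if a concurrent migration flagged the
    		 *    pointer again meanwhile (real code must also drop the
    		 *    reference taken in step 2 before retrying).
    		 */
    		old = NULL;
    	} while (!try_cmpxchg(&tsk->objcg, &old, objcg));

    	return objcg;
    }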
2023-10-25  mm, pcp: reduce detecting time of consecutive high order page freeing  [Huang Ying]  1 file, -1/+1
In the current PCP auto-tuning design, if the number of pages allocated is much more than that of pages freed on a CPU, the PCP high may become the maximal value even if the allocating/freeing depth is small, for example, in the sender of network workloads. If a CPU was originally used as a sender and is then used as a receiver after a context switch, we need to fill the whole PCP up to the maximal high before PCP draining is triggered for consecutive high-order freeing. This will hurt the performance of some network workloads. To solve the issue, in this patch, we track the consecutive page freeing with a counter instead of relying on PCP draining, so we can detect consecutive page freeing much earlier. On a 2-socket Intel server with 128 logical CPU, we tested SCTP_STREAM_MANY test case of netperf test suite with 64-pair processes. With the patch, the network bandwidth improves 5.0%. This restores the performance drop caused by PCP auto-tuning. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: "Huang, Ying" <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: Sudeep Holla <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25  mm, pcp: decrease PCP high if free pages < high watermark  [Huang Ying]  1 file, -0/+1
One target of PCP is to minimize pages in PCP if the system free pages is too few. To reach that target, when page reclaiming is active for the zone (ZONE_RECLAIM_ACTIVE), we will stop increasing PCP high in allocating path, decrease PCP high and free some pages in freeing path. But this may be too late because the background page reclaiming may introduce latency for some workloads. So, in this patch, during page allocation we will detect whether the number of free pages of the zone is below high watermark. If so, we will stop increasing PCP high in allocating path, decrease PCP high and free some pages in freeing path. With this, we can reduce the possibility of the premature background page reclaiming caused by too large PCP. The high watermark checking is done in allocating path to reduce the overhead in hotter freeing path. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: "Huang, Ying" <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: Sudeep Holla <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25  mm: tune PCP high automatically  [Huang Ying]  1 file, -0/+1
The target to tune PCP high automatically is as follows, - Minimize allocation/freeing from/to shared zone - Minimize idle pages in PCP - Minimize pages in PCP if the system free pages is too few To reach these target, a tuning algorithm as follows is designed, - When we refill PCP via allocating from the zone, increase PCP high. Because if we had larger PCP, we could avoid to allocate from the zone. - In periodic vmstat updating kworker (via refresh_cpu_vm_stats()), decrease PCP high to try to free possible idle PCP pages. - When page reclaiming is active for the zone, stop increasing PCP high in allocating path, decrease PCP high and free some pages in freeing path. So, the PCP high can be tuned to the page allocating/freeing depth of workloads eventually. One issue of the algorithm is that if the number of pages allocated is much more than that of pages freed on a CPU, the PCP high may become the maximal value even if the allocating/freeing depth is small. But this isn't a severe issue, because there are no idle pages in this case. One alternative choice is to increase PCP high when we drain PCP via trying to free pages to the zone, but don't increase PCP high during PCP refilling. This can avoid the issue above. But if the number of pages allocated is much less than that of pages freed on a CPU, there will be many idle pages in PCP and it is hard to free these idle pages. 1/8 (>> 3) of PCP high will be decreased periodically. The value 1/8 is kind of arbitrary. Just to make sure that the idle PCP pages will be freed eventually. On a 2-socket Intel server with 224 logical CPU, we run 8 kbuild instances in parallel (each with `make -j 28`) in 8 cgroup. This simulates the kbuild server that is used by 0-Day kbuild service. With the patch, the build time decreases 3.5%. The cycles% of the spinlock contention (mostly for zone lock) decreases from 11.0% to 0.5%. The number of PCP draining for high order pages freeing (free_high) decreases 65.6%. The number of pages allocated from zone (instead of from PCP) decreases 83.9%. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: "Huang, Ying" <[email protected]> Suggested-by: Mel Gorman <[email protected]> Suggested-by: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: Sudeep Holla <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25  mm: add framework for PCP high auto-tuning  [Huang Ying]  1 file, -1/+4
The page allocation performance requirements of different workloads are usually different. So, we need to tune PCP (per-CPU pageset) high to optimize the workload page allocation performance. Now, we have a system wide sysctl knob (percpu_pagelist_high_fraction) to tune PCP high by hand. But, it's hard to find out the best value by hand. And one global configuration may not work best for the different workloads that run on the same system. One solution to these issues is to tune PCP high of each CPU automatically. This patch adds the framework for PCP high auto-tuning. With it, pcp->high of each CPU will be changed automatically by tuning algorithm at runtime. The minimal high (pcp->high_min) is the original PCP high value calculated based on the low watermark pages. While the maximal high (pcp->high_max) is the PCP high value when percpu_pagelist_high_fraction sysctl knob is set to MIN_PERCPU_PAGELIST_HIGH_FRACTION. That is, the maximal pcp->high that can be set via sysctl knob by hand. It's possible that PCP high auto-tuning doesn't work well for some workloads. So, when PCP high is tuned by hand via the sysctl knob, the auto-tuning will be disabled. The PCP high set by hand will be used instead. This patch only adds the framework, so pcp->high will be set to pcp->high_min (original default) always. We will add actual auto-tuning algorithm in the following patches in the series. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: "Huang, Ying" <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: Sudeep Holla <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25  mm, page_alloc: scale the number of pages that are batch allocated  [Huang Ying]  1 file, -1/+2
When a task is allocating a large number of order-0 pages, it may acquire the zone->lock multiple times allocating pages in batches. This may unnecessarily contend on the zone lock when allocating very large number of pages. This patch adapts the size of the batch based on the recent pattern to scale the batch size for subsequent allocations. On a 2-socket Intel server with 224 logical CPU, we run 8 kbuild instances in parallel (each with `make -j 28`) in 8 cgroup. This simulates the kbuild server that is used by 0-Day kbuild service. With the patch, the cycles% of the spinlock contention (mostly for zone lock) decreases from 12.6% to 11.0% (with PCP size == 367). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: "Huang, Ying" <[email protected]> Suggested-by: Mel Gorman <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: Sudeep Holla <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25  mm, pcp: reduce lock contention for draining high-order pages  [Huang Ying]  2 files, -0/+7
In commit f26b3fa04611 ("mm/page_alloc: limit number of high-order pages on PCP during bulk free"), the PCP (Per-CPU Pageset) will be drained when PCP is mostly used for high-order pages freeing to improve the cache-hot pages reusing between page allocating and freeing CPUs. On system with small per-CPU data cache slice, pages shouldn't be cached before draining to guarantee cache-hot. But on a system with large per-CPU data cache slice, some pages can be cached before draining to reduce zone lock contention. So, in this patch, instead of draining without any caching, "pcp->batch" pages will be cached in PCP before draining if the size of the per-CPU data cache slice is more than "3 * batch". In theory, if the size of per-CPU data cache slice is more than "2 * batch", we can reuse cache-hot pages between CPUs. But considering the other usage of cache (code, other data accessing, etc.), "3 * batch" is used. Note: "3 * batch" is chosen to make sure the optimization works on recent x86_64 server CPUs. If you want to increase it, please check whether it breaks the optimization. On a 2-socket Intel server with 128 logical CPU, with the patch, the network bandwidth of the UNIX (AF_UNIX) test case of lmbench test suite with 16-pair processes increase 70.5%. The cycles% of the spinlock contention (mostly for zone lock) decreases from 46.1% to 21.3%. The number of PCP draining for high order pages freeing (free_high) decreases 89.9%. The cache miss rate keeps 0.2%. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: "Huang, Ying" <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Sudeep Holla <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Arjan van de Ven <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25  cacheinfo: calculate size of per-CPU data cache slice  [Huang Ying]  1 file, -0/+1
This can be used to estimate the size of the data cache slice that can be used by one CPU under ideal circumstances. Both DATA caches and UNIFIED caches are used in the calculation, so users need to consider the impact of the code cache usage. Because the cache inclusive/non-inclusive information isn't available now, we just use the size of the per-CPU slice of the LLC to make the result more predictable across architectures. This may be improved when more cache information is available in the future. A brute-force algorithm iterating over all online CPUs is used to avoid allocating an extra cpumask, especially in the offline callback. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: "Huang, Ying" <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Sudeep Holla <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Arjan van de Ven <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
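In other words, the estimate boils down to one CPU's share of the last level cache; a trivial illustration of the arithmetic (names invented for the example, not the kernel's actual helper):

    /* Per-CPU data cache slice ~= LLC size / number of CPUs sharing the LLC. */
    static unsigned int per_cpu_llc_slice_bytes(unsigned int llc_size_bytes,
    					    unsigned int nr_cpus_sharing_llc)
    {
    	return nr_cpus_sharing_llc ? llc_size_bytes / nr_cpus_sharing_llc : 0;
    }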
2023-10-25  mm, pcp: avoid to drain PCP when process exit  [Huang Ying]  1 file, -1/+11
Patch series "mm: PCP high auto-tuning", v3. The page allocation performance requirements of different workloads are often different. So, we need to tune the PCP (Per-CPU Pageset) high on each CPU automatically to optimize the page allocation performance. The list of patches in series is as follows, [1/9] mm, pcp: avoid to drain PCP when process exit [2/9] cacheinfo: calculate per-CPU data cache size [3/9] mm, pcp: reduce lock contention for draining high-order pages [4/9] mm: restrict the pcp batch scale factor to avoid too long latency [5/9] mm, page_alloc: scale the number of pages that are batch allocated [6/9] mm: add framework for PCP high auto-tuning [7/9] mm: tune PCP high automatically [8/9] mm, pcp: decrease PCP high if free pages < high watermark [9/9] mm, pcp: reduce detecting time of consecutive high order page freeing Patch [1/9], [2/9], [3/9] optimize the PCP draining for consecutive high-order pages freeing. Patch [4/9], [5/9] optimize batch freeing and allocating. Patch [6/9], [7/9], [8/9] implement and optimize a PCP high auto-tuning method. Patch [9/9] optimize the PCP draining for consecutive high order page freeing based on PCP high auto-tuning. The test results for patches with performance impact are as follows, kbuild ====== On a 2-socket Intel server with 224 logical CPU, we run 8 kbuild instances in parallel (each with `make -j 28`) in 8 cgroup. This simulates the kbuild server that is used by 0-Day kbuild service. build time lock contend% free_high alloc_zone ---------- ---------- --------- ---------- base 100.0 14.0 100.0 100.0 patch1 99.5 12.8 19.5 95.6 patch3 99.4 12.6 7.1 95.6 patch5 98.6 11.0 8.1 97.1 patch7 95.1 0.5 2.8 15.6 patch9 95.0 1.0 8.8 20.0 The PCP draining optimization (patch [1/9], [3/9]) and PCP batch allocation optimization (patch [5/9]) reduces zone lock contention a little. The PCP high auto-tuning (patch [7/9], [9/9]) reduces build time visibly. Where the tuning target: the number of pages allocated from zone reduces greatly. So, the zone contention cycles% reduces greatly. With PCP tuning patches (patch [7/9], [9/9]), the average used memory during test increases up to 18.4% because more pages are cached in PCP. But at the end of the test, the number of the used memory decreases to the same level as that of the base patch. That is, the pages cached in PCP will be released to zone after not being used actively. netperf SCTP_STREAM_MANY ======================== On a 2-socket Intel server with 128 logical CPU, we tested SCTP_STREAM_MANY test case of netperf test suite with 64-pair processes. score lock contend% free_high alloc_zone cache miss rate% ----- ---------- --------- ---------- ---------------- base 100.0 2.1 100.0 100.0 1.3 patch1 99.4 2.1 99.4 99.4 1.3 patch3 106.4 1.3 13.3 106.3 1.3 patch5 106.0 1.2 13.2 105.9 1.3 patch7 103.4 1.9 6.7 90.3 7.6 patch9 108.6 1.3 13.7 108.6 1.3 The PCP draining optimization (patch [1/9]+[3/9]) improves performance. The PCP high auto-tuning (patch [7/9]) reduces performance a little because PCP draining cannot be triggered in time sometimes. So, the cache miss rate% increases. The further PCP draining optimization (patch [9/9]) based on PCP tuning restore the performance. lmbench3 UNIX (AF_UNIX) ======================= On a 2-socket Intel server with 128 logical CPU, we tested UNIX (AF_UNIX socket) test case of lmbench3 test suite with 16-pair processes. 
        score  lock contend%  free_high  alloc_zone  cache miss rate%
        -----  -------------  ---------  ----------  ----------------
base    100.0           51.4      100.0       100.0               0.2
patch1  116.8           46.1       69.5       104.3               0.2
patch3  199.1           21.3        7.0       104.9               0.2
patch5  200.0           20.8        7.1       106.9               0.3
patch7  191.6           19.9        6.8       103.8               2.8
patch9  193.4           21.7        7.0       104.7               2.1

The PCP draining optimization (patches [1/9], [3/9]) improves performance much. The PCP tuning (patch [7/9]) reduces performance a little because PCP draining cannot always be triggered in time. The further PCP draining optimization (patch [9/9]) based on PCP tuning restores the performance partly.

The patchset adds several fields to struct per_cpu_pages. The struct layout before/after the patchset is as follows:

base
====

struct per_cpu_pages {
	spinlock_t                 lock;                 /*     0     4 */
	int                        count;                /*     4     4 */
	int                        high;                 /*     8     4 */
	int                        batch;                /*    12     4 */
	short int                  free_factor;          /*    16     2 */
	short int                  expire;               /*    18     2 */

	/* XXX 4 bytes hole, try to pack */

	struct list_head           lists[13];            /*    24   208 */

	/* size: 256, cachelines: 4, members: 7 */
	/* sum members: 228, holes: 1, sum holes: 4 */
	/* padding: 24 */
} __attribute__((__aligned__(64)));

patched
=======

struct per_cpu_pages {
	spinlock_t                 lock;                 /*     0     4 */
	int                        count;                /*     4     4 */
	int                        high;                 /*     8     4 */
	int                        high_min;             /*    12     4 */
	int                        high_max;             /*    16     4 */
	int                        batch;                /*    20     4 */
	u8                         flags;                /*    24     1 */
	u8                         alloc_factor;         /*    25     1 */
	u8                         expire;                /*    26     1 */

	/* XXX 1 byte hole, try to pack */

	short int                  free_count;           /*    28     2 */

	/* XXX 2 bytes hole, try to pack */

	struct list_head           lists[13];            /*    32   208 */

	/* size: 256, cachelines: 4, members: 11 */
	/* sum members: 237, holes: 2, sum holes: 3 */
	/* padding: 16 */
} __attribute__((__aligned__(64)));

The size of the struct doesn't change with the patchset.

This patch (of 9):

In commit f26b3fa04611 ("mm/page_alloc: limit number of high-order pages on PCP during bulk free"), the PCP (Per-CPU Pageset) will be drained when the PCP is mostly used for high-order page freeing, to improve cache-hot page reuse between the page allocating and freeing CPUs. But the PCP draining mechanism may be triggered unexpectedly when a process exits. With some customized trace points, it was found that PCP draining (free_high == true) was triggered by an order-1 page freeing with the following call stack:

 => free_unref_page_commit
 => free_unref_page
 => __mmdrop
 => exit_mm
 => do_exit
 => do_group_exit
 => __x64_sys_exit_group
 => do_syscall_64

Checking the source code, this is the page table PGD freeing (mm_free_pgd()). It is an order-1 page freeing if CONFIG_PAGE_TABLE_ISOLATION=y, which is a common configuration for security. Just before that, page freeing with the following call stack was found:

 => free_unref_page_commit
 => free_unref_page_list
 => release_pages
 => tlb_batch_pages_flush
 => tlb_finish_mmu
 => exit_mmap
 => __mmput
 => exit_mm
 => do_exit
 => do_group_exit
 => __x64_sys_exit_group
 => do_syscall_64

So, when a process exits:

 - a large number of user pages of the process will be freed without page allocation; it is highly possible that pcp->free_factor becomes > 0. In fact, this is expected behavior to improve process exit performance.
 - after freeing all user pages, the PGD will be freed, which is an order-1 page freeing, and the PCP will be drained.

All in all, when a process exits, it is highly possible that the PCP will be drained. This is an unexpected behavior. To avoid this, with this patch, PCP draining will only be triggered by two consecutive high-order page freeings.
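A minimal sketch of that check (the flag name and surrounding logic are illustrative and simplified, not the exact kernel patch):

    static bool pcp_free_high_sketch(struct per_cpu_pages *pcp, unsigned int order)
    {
    	bool free_high = false;

    	if (order && order <= PAGE_ALLOC_COSTLY_ORDER) {
    		/* Consider draining only if the previous free was also high-order. */
    		free_high = pcp->flags & PCPF_PREV_FREE_HIGH_ORDER;
    		pcp->flags |= PCPF_PREV_FREE_HIGH_ORDER;
    	} else {
    		pcp->flags &= ~PCPF_PREV_FREE_HIGH_ORDER;
    	}

    	return free_high;
    }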
On a 2-socket Intel server with 224 logical CPUs, we run 8 kbuild instances in parallel (each with `make -j 28`) in 8 cgroups. This simulates the kbuild server that is used by the 0-Day kbuild service. With the patch, the cycles% of the spinlock contention (mostly for zone lock) decreases from 14.0% to 12.8% (with PCP size == 367). The number of PCP drainings for high-order page freeing (free_high) decreases by 80.5%. This helps network workloads too via reduced zone lock contention. On a 2-socket Intel server with 128 logical CPUs, with the patch, the network bandwidth of the UNIX (AF_UNIX) test case of the lmbench test suite with 16 process pairs increases by 16.8%. The cycles% of the spinlock contention (mostly for zone lock) decreases from 51.4% to 46.1%. The number of PCP drainings for high-order page freeing (free_high) decreases by 30.5%. The cache miss rate stays at 0.2%. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: "Huang, Ying" <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: Sudeep Holla <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25buffer: remove folio_create_empty_buffers()Matthew Wilcox (Oracle)1-3/+1
With all users converted, remove the old create_empty_buffers() and rename folio_create_empty_buffers() to create_empty_buffers(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Cc: Andreas Gruenbacher <[email protected]> Cc: Pankaj Raghav <[email protected]> Cc: Ryusuke Konishi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25buffer: add get_nth_bh()Matthew Wilcox (Oracle)1-0/+22
Extract this useful helper from nilfs_page_get_nth_block(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Acked-by: Ryusuke Konishi <[email protected]> Cc: Andreas Gruenbacher <[email protected]> Cc: Pankaj Raghav <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
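Presumably the helper just walks the folio's circular buffer ring and takes a reference; a sketch of the expected shape (not necessarily the exact code added to the header):

        static inline struct buffer_head *get_nth_bh(struct buffer_head *bh,
                                                     unsigned int count)
        {
                /* Buffers on a folio are linked in a ring via b_this_page. */
                while (count--)
                        bh = bh->b_this_page;
                /* Pin it for the caller; drop later with put_bh()/brelse(). */
                get_bh(bh);
                return bh;
        }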
2023-10-25buffer: make folio_create_empty_buffers() return a buffer_headMatthew Wilcox (Oracle)1-2/+2
Patch series "Finish the create_empty_buffers() transition", v2. Pankaj recently added folio_create_empty_buffers() as the folio equivalent to create_empty_buffers(). This patch set finishes the conversion by first converting all remaining filesystems to call folio_create_empty_buffers(), then renaming it back to create_empty_buffers(). I took the opportunity to make a few simplifications like making folio_create_empty_buffers() return the head buffer and extracting get_nth_bh() from nilfs2. A few of the patches in this series aren't directly related to create_empty_buffers(), but I saw them while I was working on this and thought they'd be easy enough to add to this series. Compile-tested only, other than ext4. This patch (of 26): Almost all callers want to know the first BH that was allocated for this folio. We already have that handy, so return it. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Pankaj Raghav <[email protected]> Cc: Andreas Gruenbacher <[email protected]> Cc: Ryusuke Konishi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25Merge tag 'nf-23-10-25' of ↵Jakub Kicinski1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

This patchset contains two late Netfilter flowtable fixes for net:

1) Flowtable GC pushes back packets to the classic path in every GC run, i.e. every second. This is because NF_FLOW_HW_ESTABLISHED is only used by sched/act_ct (never set) and IPS_SEEN_REPLY might be unset by the time the flow is offloaded (this status bit is only reliable in the sched/act_ct datapath).

2) The sched/act_ct logic that pushes back packets to the classic path to reevaluate whether a UDP flow is unidirectional only applies if IPS_HW_OFFLOAD_BIT is set and no hardware offload request is pending to be handled. From Vlad Buslov.

These two patches fix two problems that were introduced in the previous 6.5 development cycle.

* tag 'nf-23-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  net/sched: act_ct: additional checks for outdated flows
  netfilter: flowtable: GC pushes back packets to classic path
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-25Merge tag 'qcom-drivers-for-6.7-2' of ↵Arnd Bergmann1-0/+6
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into soc/drivers

More Qualcomm driver updates for v6.7

The Qualcomm SMC and QSEECOM drivers are moved into a "qcom" subdirectory, to declutter the base directory. Missing include guards are added to the qseecom header file. Unneeded extern specifiers are removed from the scm call wrappers.

__counted_by is added to the apr_rx_buf structure in the APR driver; a generic sketch of that annotation pattern follows the commit list below.

Lastly, in the pmic_glink driver, the drm_bridge type is corrected to DisplayPort instead of the incorrect "USB" value, and return values are added to the error prints for the various typec set() calls.

* tag 'qcom-drivers-for-6.7-2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
  soc: qcom: pmic_glink_altmode: Print return value on error
  firmware: qcom: scm: remove unneeded 'extern' specifiers
  firmware: qcom: scm: add a missing forward declaration for struct device
  firmware: qcom: move Qualcomm code into its own directory
  soc: qcom: apr: Add __counted_by for struct apr_rx_buf and use struct_size()
  soc: qcom: pmic_glink: fix connector type to be DisplayPort
  firmware: qcom: qseecom: add missing include guards

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
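For reference, the __counted_by + struct_size() pattern mentioned above generally looks like the sketch below; the structure and field names here are generic placeholders, not the actual apr_rx_buf definition.

        struct rx_buf {
                struct list_head node;
                int len;
                u8 buf[] __counted_by(len);     /* array bounds tied to 'len' */
        };

        /* Allocate header plus payload in one go, keeping 'len' consistent. */
        struct rx_buf *abuf = kzalloc(struct_size(abuf, buf, len), GFP_KERNEL);
        if (abuf)
                abuf->len = len;

The annotation lets the compiler and fortify checks bound accesses to the flexible array by the runtime value of the counter field.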
2023-10-25file, i915: fix file reference for mmap_singleton()Christian Brauner1-0/+1
Today we got a report at [1] for rcu stalls on the i915 testsuite in [2] due to the conversion of files to SLAB_TYPESAFE_BY_RCU. Afaict, get_file_rcu() goes into an infinite loop trying to carefully verify that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems that the i915->gem.mmap_singleton pointer itself never had rcu semantics. The i915->gem.mmap_singleton is replaced in file->f_op->release::singleton_release():

        static int singleton_release(struct inode *inode, struct file *file)
        {
                struct drm_i915_private *i915 = file->private_data;

                cmpxchg(&i915->gem.mmap_singleton, file, NULL);
                drm_dev_put(&i915->drm);

                return 0;
        }

The cmpxchg() is ordered against a concurrent update of i915->gem.mmap_singleton from mmap_singleton(). IOW, when mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

        rcu_read_lock();
        file = get_file_rcu(&i915->gem.mmap_singleton);
        rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

        smp_store_mb(i915->gem.mmap_singleton, file);

So what happens in the case of this bug is that at some point fput() is called and drops the file->f_count to zero, leaving the pointer in i915->gem.mmap_singleton intact. Now, there might be delays until file->f_op->release::singleton_release() is called and i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

        rcu_read_lock();
        file = get_file_rcu(&i915->gem.mmap_singleton);
        rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero() it will try the reload from i915->gem.mmap_singleton, expecting it to be NULL, assuming it has comparable semantics to what we expect in __fget_files_rcu(). But it doesn't, so it reloads the same pointer again, tries the same atomic_inc_not_zero() again, and does so until file->f_op->release::singleton_release() of the old file has been called.

So, in contrast to __fget_files_rcu(), here we do not want to retry when atomic_inc_not_zero() has failed. We only want to retry in case we managed to get a reference but the pointer did change on reload. A sketch of that retry policy is shown below, after the splat.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@[email protected]#dmesg-warnings10963
Cc: Jann Horn <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <[email protected]>
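A minimal sketch of the retry policy described above. It is simplified and not the actual fs/file.c change; the helper name is invented, and it assumes the caller holds rcu_read_lock():

        static struct file *try_get_singleton(struct file __rcu **slot)
        {
                struct file *file;

        again:
                file = rcu_dereference(*slot);
                if (!file)
                        return NULL;

                /*
                 * f_count already hit zero: the file is being torn down.
                 * Do not spin waiting for ->release() to clear the slot.
                 */
                if (!atomic_long_inc_not_zero(&file->f_count))
                        return NULL;

                /*
                 * We got a reference, but the slot was replaced meanwhile:
                 * drop the stale reference and retry with the new file.
                 */
                if (file != rcu_dereference(*slot)) {
                        fput(file);
                        goto again;
                }

                return file;
        }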
2023-10-25tcp: define initial scaling factor value as a macroPaolo Abeni1-5/+7
So that other users could access it. Notably MPTCP will use it in the next patch. No functional change intended. Acked-by: Matthieu Baerts <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
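The change presumably amounts to lifting the open-coded initial value into a header macro; a sketch of the shape, where the macro name and the exact expression are assumptions (check the patch for the real definition):

        /* include/net/tcp.h (illustrative) */
        #define TCP_DEFAULT_SCALING_RATIO \
                ((1200 << TCP_RMEM_TO_WIN_SCALE) / SKB_TRUESIZE(4096))

        /* tcp_init_sock() then uses the shared constant ... */
        tp->scaling_ratio = TCP_DEFAULT_SCALING_RATIO;

        /* ... and MPTCP can pick up the same default in the next patch. */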
2023-10-25highmem: Add folio_release_kmap()Matthew Wilcox (Oracle)1-2/+16
This is the folio equivalent of unmap_and_put_page(), which remains as a wrapper for it. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Jan Kara <[email protected]> Message-Id: <[email protected]>
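Going by the description, the helper pairs kunmap_local() with folio_put(), with the old page-based name kept as a thin wrapper; a sketch of that relationship (assuming it lives in include/linux/highmem.h):

        static inline void folio_release_kmap(struct folio *folio, void *addr)
        {
                kunmap_local(addr);     /* drop the temporary kernel mapping */
                folio_put(folio);       /* drop the folio reference */
        }

        static inline void unmap_and_put_page(struct page *page, void *addr)
        {
                folio_release_kmap(page_folio(page), addr);
        }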
2023-10-25HID: core: remove #ifdef CONFIG_PM from hid_driverThomas Weißschuh1-2/+2
Allow HID drivers to pass ->suspend, ->resume and ->reset_resume via pm_ptr(). Through the use of pm_ptr(), the CONFIG_PM-dependent code will always be compiled, protecting against bitrot. The linker will then garbage-collect the unused functions, avoiding any overhead. The only overhead in the final kernel image and at runtime is a few extra bytes in 'struct hid_driver'. The same approach is chosen by 'struct usb_driver' and other subsystems. Signed-off-by: Thomas Weißschuh <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Benjamin Tissoires <[email protected]>
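In a driver, the resulting pattern looks roughly like this sketch (driver and callback names are invented):

        static int my_hid_suspend(struct hid_device *hdev, pm_message_t message)
        {
                /* Device-specific suspend handling. */
                return 0;
        }

        static int my_hid_resume(struct hid_device *hdev)
        {
                return 0;
        }

        static struct hid_driver my_hid_driver = {
                .name    = "my-hid",
                /*
                 * Always compiled; pm_ptr() evaluates to NULL when CONFIG_PM
                 * is disabled and the linker discards the unused callbacks.
                 */
                .suspend = pm_ptr(my_hid_suspend),
                .resume  = pm_ptr(my_hid_resume),
        };

Because the callbacks are referenced inside pm_ptr(), no __maybe_unused annotations or #ifdef CONFIG_PM guards are needed in the driver.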