path: root/include
Age | Commit message | Author | Files | Lines
2023-10-26 | drm/sched: Convert the GPU scheduler to variable number of run-queues | Luben Tuikov | 1 | -3/+6
The GPU scheduler now has a variable number of run-queues, which are set up at drm_sched_init() time. This way, each driver announces how many run-queues it requires (supports) for each GPU scheduler it creates. Note that run-queues correspond to scheduler "priorities", thus if the number of run-queues is set to 1 at drm_sched_init(), then that scheduler supports a single run-queue, i.e. a single "priority". If a driver further sets a single entity per run-queue, then this creates a 1-to-1 correspondence between a scheduler and a scheduled entity. Cc: Lucas Stach <[email protected]> Cc: Russell King <[email protected]> Cc: Qiang Yu <[email protected]> Cc: Rob Clark <[email protected]> Cc: Abhinav Kumar <[email protected]> Cc: Dmitry Baryshkov <[email protected]> Cc: Danilo Krummrich <[email protected]> Cc: Matthew Brost <[email protected]> Cc: Boris Brezillon <[email protected]> Cc: Alex Deucher <[email protected]> Cc: Christian König <[email protected]> Cc: Emma Anholt <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Luben Tuikov <[email protected]> Acked-by: Christian König <[email protected]> Link: https://lore.kernel.org/r/[email protected]
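A minimal sketch of the driver-side setup this enables, assuming the new run-queue count is passed to drm_sched_init() alongside its existing arguments (the parameter order and the surrounding variable names here are illustrative, not quoted from the patch):

    /* Create a scheduler with a single run-queue, i.e. a single
     * "priority"; pairing it with a single entity then yields the
     * 1-to-1 scheduler/entity correspondence described above. */
    ret = drm_sched_init(&sched, &my_sched_ops,
                         1,                     /* num_rqs: one run-queue */
                         hw_submission_limit, hang_limit,
                         msecs_to_jiffies(timeout_ms),
                         NULL,                  /* timeout workqueue */
                         NULL,                  /* score */
                         "my-sched", dev);
    if (ret)
        return ret;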
2023-10-26 | ASoC: cs35l41: Detect CSPL errors when sending CSPL commands | Stefan Binding | 1 | -0/+2
The existing code checks for the correct state transition after sending a command. However, the message box can also return -1, which indicates that an error has occurred in the firmware. Detect that case and return a different error. In addition, there is no recovering from a CSPL error, so the retry mechanism is not needed in that case and we can return immediately. Signed-off-by: Stefan Binding <[email protected]> Acked-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2023-10-26 | ALSA: hda: cs35l41: Force a software reset after hardware reset | Stefan Binding | 1 | -0/+1
To ensure the chip has correctly reset during probe and system suspend, we need to force a software reset on systems where the hardware reset is not available. The software reset register was labelled as volatile but not readable; however, it is readable (it just returns 0x0). Adding it to the readable registers means it will be correctly treated as volatile, and thus will not be cached. Signed-off-by: Stefan Binding <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
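Conceptually the fix is a one-case addition to the driver's regmap readable-register callback; a hedged sketch, where the register define and the callback name are assumptions rather than quotes from the patch:

    static bool cs35l41_readable_reg(struct device *dev, unsigned int reg)
    {
        switch (reg) {
        case CS35L41_SFT_RESET:     /* assumed define for the sw-reset register */
            return true;
        /* ... existing readable registers elided ... */
        default:
            return false;
        }
    }

Once a register is both readable and volatile, regmap bypasses its cache on every access, which is what the reset sequencing requires.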
2023-10-26 | pmdomain: Merge branch genpd_dt into next | Ulf Hansson | 1 | -0/+2
Merge the immutable branch genpd_dt into next, to allow the DT bindings to be tested together with new pmdomain changes that are targeted for v6.7. Signed-off-by: Ulf Hansson <[email protected]>
2023-10-26 | dt-bindings: power: qcom,rpmhpd: Add GMXC PD index | Sibi Sankar | 1 | -0/+1
Document GMXC (Graphics MXC) power domain index which will be used on SC8380XP SoCs. Signed-off-by: Sibi Sankar <[email protected]> Link: https://lore.kernel.org/r/[email protected] [Ulf: Re-based to step up the index number] Signed-off-by: Ulf Hansson <[email protected]>
2023-10-26 | dt-bindings: power: qcom,rpmpd: document the SM8650 RPMh Power Domains | Neil Armstrong | 1 | -0/+1
Document the RPMh Power Domains on the SM8650 Platform. Signed-off-by: Neil Armstrong <[email protected]> Link: https://lore.kernel.org/r/20231025-topic-sm8650-upstream-rpmpd-v1-1-f25d313104c6@linaro.org Signed-off-by: Ulf Hansson <[email protected]>
2023-10-26 | iommu/vt-d: Disallow read-only mappings to nest parent domain | Lu Baolu | 1 | -1/+11
When remapping hardware is configured by system software in scalable mode as Nested (PGTT=011b) and with the PWSNP field Set in the PASID-table-entry, it may Set the Accessed bit and Dirty bit (and Extended Access bit if enabled) in first-stage page-table entries even when second-stage mappings indicate that the corresponding first-stage page-table is Read-Only. As a result, contents of pages designated by the VMM as Read-Only can be modified by the IOMMU via PML5E (PML4E for 4-level tables) access as part of the address translation process due to DMAs issued by the guest. Disallow read-only mappings in a domain that is supposed to be used as a nested parent. For reference, see the Sapphire Rapids Specification Update [1], errata details, SPR17. Userspace can learn about this limitation by checking the IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 flag reported in the IOMMU_GET_HW_INFO ioctl. [1] https://www.intel.com/content/www/us/en/content-details/772415/content-details.html Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Kevin Tian <[email protected]> Signed-off-by: Lu Baolu <[email protected]> Signed-off-by: Yi Liu <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2023-10-26 | iommufd: Add data structure for Intel VT-d stage-1 domain allocation | Yi Liu | 1 | -0/+30
Add IOMMU_HWPT_DATA_VTD_S1 for the stage-1 hw_pagetable of Intel VT-d, and the corresponding data structure for the userspace-specified parameters for domain allocation. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Kevin Tian <[email protected]> Signed-off-by: Yi Liu <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
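The uAPI structure plausibly looks like the sketch below (field names are assumptions based on the description; include/uapi/linux/iommufd.h is authoritative):

    struct iommu_hwpt_vtd_s1 {
        __aligned_u64 flags;        /* stage-1 page table flags */
        __aligned_u64 pgtbl_addr;   /* guest physical address of the stage-1 table */
        __u32 addr_width;           /* address width of the stage-1 table */
        __u32 __reserved;
    };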
2023-10-26 | iommu: Add iommu_copy_struct_from_user helper | Nicolin Chen | 1 | -0/+40
Wrap up the data type/pointer/length sanity checks and a copy_struct_from_user() call for iommu drivers to copy driver-specific data via struct iommu_user_data. It is expected to be used in the domain_alloc_user op, for example. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Nicolin Chen <[email protected]> Co-developed-by: Yi Liu <[email protected]> Signed-off-by: Yi Liu <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
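A hedged usage sketch from a driver's domain_alloc_user path; the struct and constant names are borrowed from the neighbouring VT-d entries in this log, and the exact helper arguments are assumptions:

    struct iommu_hwpt_vtd_s1 vtd;
    int ret;

    /* Checks that user_data->type matches IOMMU_HWPT_DATA_VTD_S1 and that
     * the user-supplied length is sane, then copies the payload, zeroing
     * any trailing bytes (__reserved names the last required member). */
    ret = iommu_copy_struct_from_user(&vtd, user_data,
                                      IOMMU_HWPT_DATA_VTD_S1, __reserved);
    if (ret)
        return ERR_PTR(ret);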
2023-10-26 | iommufd: Add a nested HW pagetable object | Nicolin Chen | 1 | -2/+29
IOMMU_HWPT_ALLOC already supports iommu_domain allocation for userspace. But it can only allocate a hw_pagetable that is associated with a given IOAS, i.e. only a kernel-managed hw_pagetable of IOMMUFD_OBJ_HWPT_PAGING type. IOMMU drivers can now support user-managed hw_pagetables, for two-stage translation use cases that require user data input from user space. Add a new IOMMUFD_OBJ_HWPT_NESTED type with its abort/destroy(). Pair it with a new iommufd_hwpt_nested structure and its to_hwpt_nested() helper. Update the to_hwpt_paging() helper, so a NESTED-type hw_pagetable can be handled in the callers, for example iommufd_hw_pagetable_enforce_rr(). Screen the inputs, including the parent PAGING-type hw_pagetable, which requires a new nest_parent flag in the iommufd_hwpt_paging structure. Extend the IOMMU_HWPT_ALLOC ioctl to accept IOMMU driver-specific data input, tagged by the enum iommu_hwpt_data_type. Also, update @pt_id to accept a hwpt_id besides an ioas_id. Then, use them to allocate a hw_pagetable of IOMMUFD_OBJ_HWPT_NESTED type using the iommufd_hw_pagetable_alloc_nested() allocator. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Nicolin Chen <[email protected]> Co-developed-by: Yi Liu <[email protected]> Signed-off-by: Yi Liu <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2023-10-26 | iommu: Pass in parent domain with user_data to domain_alloc_user op | Yi Liu | 1 | -3/+24
The domain_alloc_user op already accepts user flags for domain allocation; add a parent domain pointer and driver-specific user data support as well. The user data is tagged with a type for iommu drivers to add their own driver-specific user data per hw_pagetable. Add a struct iommu_user_data as a bundle of data_ptr/data_len/type from an iommufd core uAPI structure. Make the user data opaque to the core, since a userspace driver must match the kernel driver. In the future, if drivers share some common parameter, there could be a generic parameter as well. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Lu Baolu <[email protected]> Co-developed-by: Nicolin Chen <[email protected]> Signed-off-by: Nicolin Chen <[email protected]> Signed-off-by: Yi Liu <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
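Per that description, the bundle is presumably a small struct along these lines (a sketch; the field spellings are assumptions, the commit text calls them data_ptr/data_len/type):

    struct iommu_user_data {
        unsigned int type;      /* enum iommu_hwpt_data_type */
        void __user *uptr;      /* driver-specific data from userspace */
        size_t len;             /* length of the user data in bytes */
    };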
2023-10-26 | iommu: Add IOMMU_DOMAIN_NESTED | Lu Baolu | 1 | -0/+4
Introduce a new domain type for a user I/O page table, which is nested on top of another user address space represented by a PAGING domain. This new domain can be allocated by the domain_alloc_user op, and attached to a device through the existing iommu_attach_device/group() interfaces. The mappings of a nested domain are managed by user space software, so it is not necessary to have map/unmap callbacks. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Lu Baolu <[email protected]> Signed-off-by: Nicolin Chen <[email protected]> Signed-off-by: Yi Liu <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2023-10-26 | Merge branches 'pm-sleep', 'powercap' and 'pm-tools' | Rafael J. Wysocki | 1 | -14/+29
Merge updates related to system sleep handling, one power capping update and one PM utility update for 6.7-rc1: - Use __get_safe_page() rather than touching the list in hibernation snapshot code (Brian Geffon). - Fix symbol export for _SIMPLE_ variants of _PM_OPS() (Raag Jadav). - Clean up sync_read handling in snapshot_write_next() (Brian Geffon). - Fix kerneldoc comments for swsusp_check() and swsusp_close() to better match code (Christoph Hellwig). - Downgrade BIOS locked limits pr_warn() in the Intel RAPL power capping driver to pr_debug() (Ville Syrjälä). - Change the minimum python version for the intel_pstate_tracer utility from 2.7 to 3.6 (Doug Smythies). * pm-sleep: PM: hibernate: fix the kerneldoc comment for swsusp_check() and swsusp_close() PM: hibernate: Clean up sync_read handling in snapshot_write_next() PM: sleep: Fix symbol export for _SIMPLE_ variants of _PM_OPS() PM: hibernate: Use __get_safe_page() rather than touching the list * powercap: powercap: intel_rapl: Downgrade BIOS locked limits pr_warn() to pr_debug() * pm-tools: tools/power/x86/intel_pstate_tracer: python minimum version
2023-10-26 | Merge branch 'pm-cpufreq' | Rafael J. Wysocki | 1 | -8/+0
Merge cpufreq updates for 6.7-rc1: - Add support for several Qualcomm SoC versions and other similar changes (Christian Marangi, Dmitry Baryshkov, Luca Weiss, Neil Armstrong, Richard Acayan, Robert Marko, Rohit Agarwal, Stephan Gerhold and Varadarajan Narayanan). - Clean up the tegra cpufreq driver (Sumit Gupta). - Use of_property_read_reg() to parse "reg" in pmac32 driver (Rob Herring). - Add support for TI's am62p5 SoC (Bryan Brattlof). - Make ARM_BRCMSTB_AVS_CPUFREQ depend on !ARM_SCMI_CPUFREQ (Florian Fainelli). - Update Kconfig to mention i.MX7 as well (Alexander Stein). - Revise global turbo disable check in intel_pstate (Srinivas Pandruvada). - Carry out initialization of sg_cpu in the schedutil cpufreq governor in one loop (Liao Chang). - Simplify the condition for storing 'down_threshold' in the conservative cpufreq governor (Liao Chang). - Use fine-grained mutex in the userspace cpufreq governor (Liao Chang). - Move is_managed indicator in the userspace cpufreq governor into a per-policy structure (Liao Chang). - Rebuild sched-domains when removing cpufreq driver (Pierre Gondois). - Fix buffer overflow detection in trans_stats() (Christian Marangi). * pm-cpufreq: (32 commits) dt-bindings: cpufreq: qcom-hw: document SM8650 CPUFREQ Hardware cpufreq: arm: Kconfig: Add i.MX7 to supported SoC for ARM_IMX_CPUFREQ_DT cpufreq: qcom-nvmem: add support for IPQ8064 cpufreq: qcom-nvmem: also accept operating-points-v2-krait-cpu cpufreq: qcom-nvmem: drop pvs_ver for format a fuses dt-bindings: cpufreq: qcom-cpufreq-nvmem: Document krait-cpu cpufreq: qcom-nvmem: add support for IPQ6018 dt-bindings: cpufreq: qcom-cpufreq-nvmem: document IPQ6018 cpufreq: qcom-nvmem: Add MSM8909 cpufreq: qcom-nvmem: Simplify driver data allocation cpufreq: stats: Fix buffer overflow detection in trans_stats() dt-bindings: cpufreq: cpufreq-qcom-hw: Add SDX75 compatible cpufreq: ARM_BRCMSTB_AVS_CPUFREQ cannot be used with ARM_SCMI_CPUFREQ cpufreq: ti-cpufreq: Add opp support for am62p5 SoCs cpufreq: dt-platdev: add am62p5 to blocklist cpufreq: tegra194: remove redundant AND with cpu_online_mask cpufreq: tegra194: use refclk delta based loop instead of udelay cpufreq: tegra194: save CPU data to avoid repeated SMP calls cpufreq: Rebuild sched-domains when removing cpufreq driver cpufreq: userspace: Move is_managed indicator into per-policy structure ...
2023-10-26 | Merge branch 'pm-devfreq' | Rafael J. Wysocki | 4 | -6/+52
Merge devfreq updates for 6.7-rc1: - Switch to dev_pm_opp_find_freq_(ceil/floor)_indexed() APIs to support specific devices like UFS which handle multiple clocks through the OPP (Operating Performance Point) framework (Manivannan Sadhasivam). - Add perf support to the Rockchip DFI (DDR Monitor Module) devfreq-event driver: * Generalize rockchip-dfi.c to support new RK3568/RK3588 using different DDR type (Sascha Hauer). * Convert devicetree binding document format to yaml (Sascha Hauer). * Add perf support for DFI (a unit suitable for measuring DDR utilization) to rockchip-dfi.c to extend DFI usage (Sascha Hauer). - Add locking to the OPP handling code in the Mediatek CCI devfreq driver, because the voltage of a shared OPP might be changed by multiple drivers (Mark Tseng, Dan Carpenter). - Use device_get_match_data() in the Samsung Exynos PPMU devfreq-event driver (Rob Herring). * pm-devfreq: (26 commits) dt-bindings: devfreq: event: rockchip,dfi: Add rk3588 support dt-bindings: devfreq: event: rockchip,dfi: Add rk3568 support dt-bindings: devfreq: event: convert Rockchip DFI binding to yaml PM / devfreq: rockchip-dfi: add support for RK3588 PM / devfreq: rockchip-dfi: account for multiple DDRMON_CTRL registers PM / devfreq: rockchip-dfi: make register stride SoC specific PM / devfreq: rockchip-dfi: Add perf support PM / devfreq: rockchip-dfi: give variable a better name PM / devfreq: rockchip-dfi: Prepare for multiple users PM / devfreq: rockchip-dfi: Pass private data struct to internal functions PM / devfreq: rockchip-dfi: Handle LPDDR4X PM / devfreq: rockchip-dfi: Handle LPDDR2 correctly PM / devfreq: rockchip-dfi: Add RK3568 support PM / devfreq: rockchip-dfi: Clean up DDR type register defines PM / devfreq: rk3399_dmc,dfi: generalize DDRTYPE defines PM / devfreq: rockchip-dfi: introduce channel mask PM / devfreq: rockchip-dfi: Use free running counter PM / devfreq: mediatek: unlock on error in mtk_ccifreq_target() PM / devfreq: exynos-ppmu: Use device_get_match_data() PM / devfreq: rockchip-dfi: dfi store raw values in counter struct ...
2023-10-26 | Merge branches 'acpi-ac', 'acpi-pad' and 'pnp' | Rafael J. Wysocki | 1 | -4/+4
Merge updates of the ACPI AC and ACPI PAD drivers and PNP updates for 6.7-rc1: - Switch over the ACPI AC and ACPI PAD drivers to using the platform driver interface, which is more logically consistent than binding a driver directly to an ACPI device object, and clean them up (Michal Wilczynski). - Replace strncpy() in the PNP code with either memcpy() or strscpy() as appropriate (Justin Stitt). - Clean up coding style in pnp.h (GuoHua Cheng). * acpi-ac: ACPI: AC: Rename ACPI device from device to adev ACPI: AC: Replace acpi_driver with platform_driver ACPI: AC: Use string_choices API instead of ternary operator ACPI: AC: Remove redundant checks * acpi-pad: ACPI: acpi_pad: Rename ACPI device from device to adev ACPI: acpi_pad: Use dev groups for sysfs ACPI: acpi_pad: Replace acpi_driver with platform_driver * pnp: PNP: replace deprecated strncpy() with memcpy() PNP: ACPI: replace deprecated strncpy() with strscpy() PNP: Clean up coding style in pnp.h
2023-10-26 | Merge branch 'acpi-bus' | Rafael J. Wysocki | 1 | -1/+1
Merge ACPI bus type driver updates for 6.7-rc1: - Add context argument to acpi_dev_install_notify_handler() (Rafael Wysocki). - Clarify ACPI bus concepts in the ACPI device enumeration documentation (Rafael Wysocki). * acpi-bus: ACPI: bus: Add context argument to acpi_dev_install_notify_handler() ACPI: docs: enumeration: Clarify ACPI bus concepts
2023-10-26 | Merge branches 'acpi-ec', 'acpi-sysfs', 'acpi-misc' and 'acpi-uid' | Rafael J. Wysocki | 2 | -0/+6
Merge ACPI EC driver updates, ACPI sysfs interface updates, misc updates related to ACPI and changes related to ACPI _UID handling for 6.7-rc1: - Add EC GPE detection quirk for HP 250 G7 Notebook PC (Jonathan Denose). - Fix and clean up create_pnp_modalias() and create_of_modalias() (Christophe JAILLET). - Modify 2 pieces of code to use acpi_evaluate_dsm_typed() (Andy Shevchenko). - Define acpi_dev_uid_match() for matching _UID and use it in several places (Raag Jadav). - Use acpi_device_uid() for fetching _UID in 2 places (Raag Jadav). * acpi-ec: ACPI: EC: Add quirk for HP 250 G7 Notebook PC * acpi-sysfs: ACPI: sysfs: Clean up create_pnp_modalias() and create_of_modalias() ACPI: sysfs: Fix create_pnp_modalias() and create_of_modalias() * acpi-misc: ACPI: x86: s2idle: Switch to use acpi_evaluate_dsm_typed() ACPI: PCI: Switch to use acpi_evaluate_dsm_typed() * acpi-uid: perf: arm_cspmu: use acpi_dev_hid_uid_match() for matching _HID and _UID ACPI: x86: use acpi_dev_uid_match() for matching _UID ACPI: utils: use acpi_dev_uid_match() for matching _UID pinctrl: intel: use acpi_dev_uid_match() for matching _UID ACPI: utils: Introduce acpi_dev_uid_match() for matching _UID perf: qcom: use acpi_device_uid() for fetching _UID ACPI: sysfs: use acpi_device_uid() for fetching _UID
2023-10-26 | Merge branches 'acpi-video', 'acpi-prm', 'acpi-apei' and 'acpi-pcc' | Rafael J. Wysocki | 2 | -0/+17
Merge ACPI backlight driver updates, ACPI APEI updates, ACPI PRM updates and changes related to ACPI PCC for 6.7-rc1: - Add acpi_backlight=vendor quirk for Toshiba Portégé R100 (Ondrej Zary). - Add "vendor" backlight quirks for 3 Lenovo x86 Android tablets (Hans de Goede). - Move Xiaomi Mi Pad 2 backlight quirk to its own section (Hans de Goede). - Annotate struct prm_module_info with __counted_by (Kees Cook). - Fix AER info corruption in aer_recover_queue() when error status data has multiple sections (Shiju Jose). - Make APEI use ERST max execution time value for slow devices (Jeshua Smith). - Add support for platform notification handling to the PCC mailbox driver and modify it to support shared interrupts for multiple subspaces (Huisong Li). - Define common macros to use when referring to various bitfields in the PCC generic communications channel command and status fields and use them in some drivers (Sudeep Holla). * acpi-video: ACPI: video: Add acpi_backlight=vendor quirk for Toshiba Portégé R100 ACPI: video: Add "vendor" quirks for 3 Lenovo x86 Android tablets ACPI: video: Move Xiaomi Mi Pad 2 quirk to its own section * acpi-prm: ACPI: PRM: Annotate struct prm_module_info with __counted_by * acpi-apei: ACPI: APEI: Use ERST timeout for slow devices ACPI: APEI: Fix AER info corruption when error status data has multiple sections * acpi-pcc: soc: kunpeng_hccs: Migrate to use generic PCC shmem related macros hwmon: (xgene) Migrate to use generic PCC shmem related macros i2c: xgene-slimpro: Migrate to use generic PCC shmem related macros ACPI: PCC: Add PCC shared memory region command and status bitfields mailbox: pcc: Support shared interrupt for multiple subspaces mailbox: pcc: Add support for platform notification handling
2023-10-26 | Merge branches 'acpi-utils', 'acpi-resource', 'acpi-property' and 'acpi-soc' | Rafael J. Wysocki | 1 | -3/+6
Merge ACPI utilities updates, ACPI resource management updates, ACPI device properties management updates and ACPI LPSS (Intel SoC) driver update for 6.7-rc1: - Rework acpi_handle_list handling so as to manage it dynamically, including size computation (Rafael Wysocki). - Clean up ACPI utilities code so as to make it follow the kernel coding style (Jonathan Bergh). - Consolidate IRQ trigger-type override DMI tables and drop .ident values from dmi_system_id tables used for ACPI resources management quirks (Hans de Goede). - Add ACPI IRQ override for TongFang GMxXGxx (Werner Sembach). - Allow _DSD buffer data only for byte accessors and document the _DSD data buffer GUID (Andy Shevchenko). - Drop BayTrail and Lynxpoint pinctrl device IDs from the ACPI LPSS driver, because it does not need them (Raag Jadav). * acpi-utils: ACPI: utils: Remove redundant braces around individual statement ACPI: utils: Fix up white space in a few places ACPI: utils: Dynamically determine acpi_handle_list size ACPI: thermal: Merge trip initialization functions ACPI: thermal: Collapse trip devices update function wrappers ACPI: thermal: Collapse trip devices update functions ACPI: thermal: Add device list to struct acpi_thermal_trip ACPI: thermal: Fix a small leak in acpi_thermal_add() ACPI: thermal: Drop valid flag from struct acpi_thermal_trip ACPI: thermal: Drop redundant trip point flags ACPI: thermal: Untangle initialization and updates of active trips ACPI: thermal: Untangle initialization and updates of the passive trip ACPI: thermal: Simplify critical and hot trips representation ACPI: thermal: Create and populate trip points table earlier ACPI: thermal: Determine the number of trip points earlier ACPI: thermal: Fold acpi_thermal_get_info() into its caller ACPI: thermal: Simplify initialization of critical and hot trips * acpi-resource: ACPI: resource: Do IRQ override on TongFang GMxXGxx ACPI: resource: Drop .ident values from dmi_system_id tables ACPI: resource: Consolidate IRQ trigger-type override DMI tables * acpi-property: ACPI: property: Document the _DSD data buffer GUID ACPI: property: Allow _DSD buffer data only for byte accessors * acpi-soc: ACPI: LPSS: drop BayTrail and Lynxpoint pinctrl HIDs
2023-10-26 | x86/apic/msi: Fix misconfigured non-maskable MSI quirk | Koichiro Den | 2 | -28/+4
Commit ef8dd01538ea ("genirq/msi: Make interrupt allocation less convoluted") reworked the code so that the x86-specific quirk for affinity setting of non-maskable PCI/MSI interrupts is no longer activated when necessary. This could be solved by restoring the original logic in the core MSI code, but a deeper analysis showed that the quirk flag is not required at all. The quirk is only required when the PCI/MSI device cannot mask the MSI interrupts, which in turn also prevents reservation mode from being enabled for the affected interrupt. This allows the NOMASK quirk bit to be removed completely, as msi_set_affinity() can instead check whether reservation mode is enabled for the interrupt, which gives exactly the same answer. Even in the currently non-existent case where reservation mode were not set for a maskable MSI interrupt, this would not cause any harm: it would just cause msi_set_affinity() to go needlessly through the functionally equivalent slow path, which works perfectly fine with maskable interrupts as well. Rework msi_set_affinity() to query the reservation mode and remove all NOMASK quirk logic from the core code. [ tglx: Massaged changelog ] Fixes: ef8dd01538ea ("genirq/msi: Make interrupt allocation less convoluted") Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Koichiro Den <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
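In other words, the slow-path decision can be keyed off the irqd reservation state instead of a quirk bit; a hedged sketch, in which the two helpers called are assumptions standing in for the real fast and slow paths:

    static int msi_set_affinity(struct irq_data *irqd,
                                const struct cpumask *mask, bool force)
    {
        /* Reservation mode is only enabled for maskable MSI interrupts,
         * so its absence identifies exactly the non-maskable case that
         * previously needed the NOMASK quirk flag. */
        if (!irqd_can_reserve(irqd))
            return msi_set_affinity_slow(irqd, mask, force); /* assumed helper */

        return msi_set_affinity_fast(irqd, mask, force);     /* assumed helper */
    }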
2023-10-26 | Merge tag 'nf-next-23-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next | Paolo Abeni | 3 | -26/+38
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next. Mostly nf_tables updates with two patches for connlabel and br_netfilter. 1) Rename the function that performs on-demand GC for rbtree elements, and replace async GC in rbtree with sync GC. Patches from Florian Westphal. 2) Use commit_mutex for NFT_MSG_GETRULE_RESET to ensure that two concurrent threads invoking this command do not underrun stateful objects. Patches from Phil Sutter. 3) Use a single hook to deal with IP and ARP packets in br_netfilter. Patch from Florian Westphal. 4) Use atomic_t for the netns->connlabel use counter instead of a spinlock, also a patch from Florian. 5) Cleanups for the stateful objects infrastructure in nf_tables. Patches from Phil Sutter. 6) Make the flush path use the opaque set element offered by the iterator, instead of calling pipapo_deactivate(), which looks it up again. 7) The set backend .flush interface always succeeds; make it return void instead. 8) Add a struct nft_elem_priv placeholder structure and use it to replace the void * used to pass the opaque set element representation from backend to frontend, since void * defeats compiler type checks. 9) Shrink memory consumption of set element transactions, by reducing struct nft_trans_elem object size and reducing stack memory usage. 10) Use struct nft_elem_priv for the set backend .insert operation too. 11) Carry the reset flag in the nft_set_dump_ctx structure, instead of passing it as a function argument, from Phil Sutter. netfilter pull request 23-10-25 * tag 'nf-next-23-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nf_tables: Carry reset boolean in nft_set_dump_ctx netfilter: nf_tables: set->ops->insert returns opaque set element in case of EEXIST netfilter: nf_tables: shrink memory consumption of set elements netfilter: nf_tables: expose opaque set element as struct nft_elem_priv netfilter: nf_tables: set backend .flush always succeeds netfilter: nft_set_pipapo: no need to call pipapo_deactivate() from flush netfilter: nf_tables: Carry reset boolean in nft_obj_dump_ctx netfilter: nf_tables: nft_obj_filter fits into cb->ctx netfilter: nf_tables: Carry s_idx in nft_obj_dump_ctx netfilter: nf_tables: A better name for nft_obj_filter netfilter: nf_tables: Unconditionally allocate nft_obj_filter netfilter: nf_tables: Drop pointless memset in nf_tables_dump_obj netfilter: conntrack: switch connlabels to atomic_t br_netfilter: use single forward hook for ip and arp netfilter: nf_tables: Add locking for NFT_MSG_GETRULE_RESET requests netfilter: nf_tables: Introduce nf_tables_getrule_single() netfilter: nf_tables: Open-code audit log call in nf_tables_getrule() netfilter: nft_set_rbtree: prefer sync gc to async worker netfilter: nft_set_rbtree: rename gc deactivate+erase function ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2023-10-26 | Merge branch 'acpica' | Rafael J. Wysocki | 1 | -0/+3
Merge an ACPICA change for 6.7-rc1 which adds symbol definitions related to CDAT (Dave Jiang). * acpica: ACPICA: Add defines for CDAT SSLBIS
2023-10-26 | ALSA: seq: Replace with __packed attribute | Takashi Iwai | 1 | -2/+2
Replace the old __attribute__((packed)) with the new __packed. Only cleanup, no functional changes. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2023-10-26 | ALSA: wavefront: Drop obsoleted comments and definitions | Takashi Iwai | 1 | -51/+0
The header file contains lots of outdated comments and definitions. Drop those as cleanup. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2023-10-26 | ALSA: wavefront: Replace with __packed attribute | Takashi Iwai | 1 | -1/+1
Replace the old __attribute__((packed)) with the new __packed. Only cleanup, no functional changes. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2023-10-26 | ALSA: opl3: Replace with __packed attribute | Takashi Iwai | 1 | -1/+1
Replace the old __attribute__((packed)) with the new __packed. Only cleanup, no functional changes. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2023-10-25 | ipv6: drop feature RTAX_FEATURE_ALLFRAG | Yan Zhai | 4 | -10/+1
RTAX_FEATURE_ALLFRAG was added before the first git commit: https://www.mail-archive.com/[email protected]/msg03399.html The feature would send packets to the fragmentation path if a box received a PMTU value of less than 1280 bytes. However, since commit 9d289715eb5c ("ipv6: stop sending PTB packets for MTU < 1280"), such messages are simply discarded. The feature flag is not supported by the iproute2 utility either. In theory one can still manipulate it with a direct netlink message, but that is not ideal because it was based on the obsolete guidance of RFC 2460 (replaced by RFC 8200). The feature always tests false at the moment, so remove the related code or mark it as unused. Signed-off-by: Yan Zhai <[email protected]> Reviewed-by: Florian Westphal <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/d78e44dcd9968a252143ffe78460446476a472a1.1698156966.git.yan@cloudflare.com Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-25 | mempolicy: mmap_lock is not needed while migrating folios | Hugh Dickins | 1 | -9/+0
mbind(2) holds down_write of current task's mmap_lock throughout (exclusive because it needs to set the new mempolicy on the vmas); migrate_pages(2) holds down_read of pid's mmap_lock throughout. They both hold mmap_lock across the internal migrate_pages(), under which all new page allocations (huge or small) are made. I'm nervous about it; and migrate_pages() certainly does not need mmap_lock itself. It's done this way for mbind(2), because its page allocator is vma_alloc_folio() or alloc_hugetlb_folio_vma(), both of which depend on vma and address. Now that we have alloc_pages_mpol(), depending on (refcounted) memory policy and interleave index, mbind(2) can be modified to use that or alloc_hugetlb_folio_nodemask(), and then not need mmap_lock across the internal migrate_pages() at all: add alloc_migration_target_by_mpol() to replace mbind's new_page(). (After that change, alloc_hugetlb_folio_vma() is used by nothing but a userfaultfd function: move it out of hugetlb.h and into the #ifdef.) migrate_pages(2) has chosen its target node before migrating, so can continue to use the standard alloc_migration_target(); but let it take and drop mmap_lock just around migrate_to_node()'s queue_pages_range(): neither the node-to-node calculations nor the page migrations need it. It seems unlikely, but it is conceivable that some userspace depends on the kernel's mmap_lock exclusion here, instead of doing its own locking: more likely in a testsuite than in real life. It is also possible, of course, that some pages on the list will be munmapped by another thread before they are migrated, or a newer memory policy applied to the range by that time: but such races could happen before, as soon as mmap_lock was dropped, so it does not appear to be a concern. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hugh Dickins <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Nhat Pham <[email protected]> Cc: Sidhartha Kumar <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun heo <[email protected]> Cc: Vishal Moola (Oracle) <[email protected]> Cc: Yang Shi <[email protected]> Cc: Yosry Ahmed <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25 | mempolicy: alloc_pages_mpol() for NUMA policy without vma | Hugh Dickins | 3 | -5/+27
Shrink shmem's stack usage by eliminating the pseudo-vma from its folio allocation. alloc_pages_mpol(gfp, order, pol, ilx, nid) becomes the principal actor for passing mempolicy choice down to __alloc_pages(), rather than vma_alloc_folio(gfp, order, vma, addr, hugepage). vma_alloc_folio() and alloc_pages() remain, but as wrappers around alloc_pages_mpol(). alloc_pages_bulk_*() untouched, except to provide the additional args to policy_nodemask(), which subsumes policy_node(). Cleanup throughout, cutting out some unhelpful "helpers". It would all be much simpler without MPOL_INTERLEAVE, but that adds a dynamic to the constant mpol: complicated by v3.6 commit 09c231cb8bfd ("tmpfs: distribute interleave better across nodes"), which added ino bias to the interleave, hidden from mm/mempolicy.c until this commit. Hence "ilx" throughout, the "interleave index". Originally I thought it could be done just with nid, but that's wrong: the nodemask may come from the shared policy layer below a shmem vma, or it may come from the task layer above a shmem vma; and without the final nodemask then nodeid cannot be decided. And how ilx is applied depends also on page order. The interleave index is almost always irrelevant unless MPOL_INTERLEAVE: with one exception in alloc_pages_mpol(), where the NO_INTERLEAVE_INDEX passed down from vma-less alloc_pages() is also used as hint not to use THP-style hugepage allocation - to avoid the overhead of a hugepage arg (though I don't understand why we never just added a GFP bit for THP - if it actually needs a different allocation strategy from other pages of the same order). vma_alloc_folio() still carries its hugepage arg here, but it is not used, and should be removed when agreed. get_vma_policy() no longer allows a NULL vma: over time I believe we've eradicated all the places which used to need it e.g. swapoff and madvise used to pass NULL vma to read_swap_cache_async(), but now know the vma. [[email protected]: handle NULL mpol being passed to __read_swap_cache_async()] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hugh Dickins <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Huang Ying <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Nhat Pham <[email protected]> Cc: Sidhartha Kumar <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun heo <[email protected]> Cc: Vishal Moola (Oracle) <[email protected]> Cc: Yang Shi <[email protected]> Cc: Yosry Ahmed <[email protected]> Cc: Domenico Cerasuolo <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
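The wrapper relationship described above might look roughly like this, under the stated signature alloc_pages_mpol(gfp, order, pol, ilx, nid) (a sketch; the guard details are assumptions):

    struct page *alloc_pages(gfp_t gfp, unsigned int order)
    {
        struct mempolicy *pol = &default_policy;

        if (!in_interrupt() && !(gfp & __GFP_THISNODE))
            pol = get_task_policy(current);

        /* No vma, hence no interleave index: NO_INTERLEAVE_INDEX also
         * serves as the hint not to use THP-style hugepage allocation. */
        return alloc_pages_mpol(gfp, order, pol,
                                NO_INTERLEAVE_INDEX, numa_node_id());
    }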
2023-10-25 | mempolicy: remove confusing MPOL_MF_LAZY dead code | Hugh Dickins | 1 | -1/+1
v3.8 commit b24f53a0bea3 ("mm: mempolicy: Add MPOL_MF_LAZY") introduced MPOL_MF_LAZY, and included it in the MPOL_MF_VALID flags; but a720094ded8 ("mm: mempolicy: Hide MPOL_NOOP and MPOL_MF_LAZY from userspace for now") immediately removed it from MPOL_MF_VALID flags, pending further review. "This will need to be revisited", but it has not been reinstated. The present state is confusing: there is dead code in mm/mempolicy.c to handle MPOL_MF_LAZY cases which can never occur. Remove that: it can be resurrected later if necessary. But keep the definition of MPOL_MF_LAZY, which must remain in the UAPI, even though it always fails with EINVAL. https://lore.kernel.org/linux-mm/[email protected]/ links to a previous request to remove MPOL_MF_LAZY. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hugh Dickins <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Yang Shi <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Nhat Pham <[email protected]> Cc: Sidhartha Kumar <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun heo <[email protected]> Cc: Vishal Moola (Oracle) <[email protected]> Cc: Yosry Ahmed <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25 | mempolicy trivia: use pgoff_t in shared mempolicy tree | Hugh Dickins | 1 | -13/+7
Prefer the more explicit "pgoff_t" to "unsigned long" when dealing with a shared mempolicy tree. Delete confusing comment about pseudo mm vmas. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hugh Dickins <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Nhat Pham <[email protected]> Cc: Sidhartha Kumar <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun heo <[email protected]> Cc: Vishal Moola (Oracle) <[email protected]> Cc: Yang Shi <[email protected]> Cc: Yosry Ahmed <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25 | mempolicy trivia: slightly more consistent naming | Hugh Dickins | 1 | -6/+5
Before getting down to work, do a little cleanup, mainly of inconsistent variable naming. I gave up trying to rationalize mpol versus pol versus policy, and node versus nid, but let's avoid p and nd. Remove a few superfluous blank lines, but add one; and here prefer vma->vm_policy to vma_policy(vma) - the latter being appropriate in other sources, which have to allow for !CONFIG_NUMA. That intriguing line about KERNEL_DS? should have gone in v2.6.15, when numa_policy_init() stopped using set_mempolicy(2)'s system call handler. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hugh Dickins <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Nhat Pham <[email protected]> Cc: Sidhartha Kumar <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun heo <[email protected]> Cc: Vishal Moola (Oracle) <[email protected]> Cc: Yang Shi <[email protected]> Cc: Yosry Ahmed <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25 | hugetlbfs: drop shared NUMA mempolicy pretence | Hugh Dickins | 1 | -2/+1
Patch series "mempolicy: cleanups leading to NUMA mpol without vma", v2. Mostly cleanups in mm/mempolicy.c, but finally removing the pseudo-vma from shmem folio allocation, and removing the mmap_lock around folio migration for mbind and migrate_pages syscalls. This patch (of 12): hugetlbfs_fallocate() goes through the motions of pasting a shared NUMA mempolicy onto its pseudo-vma, but how could there ever be a shared NUMA mempolicy for this file? hugetlb_vm_ops has never offered a set_policy method, and hugetlbfs_parse_param() has never supported any mpol options for a mount-wide default policy. It's just an illusion: clean it away so as not to confuse others, giving us more freedom to adjust shmem's set_policy/get_policy implementation. But hugetlbfs_inode_info is still required, just to accommodate seals. Yes, shared NUMA mempolicy support could be added to hugetlbfs, with a set_policy method and/or mpol mount option (Andi's first posting did include an admitted-unsatisfactory hugetlb_set_policy()); but it seems that nobody has bothered to add that in the nineteen years since v2.6.7 made it possible, and there is at least one company that has invested enough into hugetlbfs, that I guess they have learnt well enough how to manage its NUMA, without needing shared mempolicy. Remove linux/mempolicy.h from linux/hugetlb.h: include linux/pagemap.h in its place, because hugetlb.h's recently added use of filemap_lock_folio() requires that (although most .configs and .c's get it in some other way). Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hugh Dickins <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Sidhartha Kumar <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun heo <[email protected]> Cc: Vishal Moola (Oracle) <[email protected]> Cc: Yang Shi <[email protected]> Cc: Nhat Pham <[email protected]> Cc: Yosry Ahmed <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25 | mm/damon: implement a function for max nr_accesses safe calculation | SeongJae Park | 1 | -0/+7
Patch series "avoid divide-by-zero due to max_nr_accesses overflow". The maximum nr_accesses of given DAMON context can be calculated by dividing the aggregation interval by the sampling interval. Some logics in DAMON uses the maximum nr_accesses as a divisor. Hence, the value shouldn't be zero. Such case is avoided since DAMON avoids setting the agregation interval as samller than the sampling interval. However, since nr_accesses is unsigned int while the intervals are unsigned long, the maximum nr_accesses could be zero while casting. Avoid the divide-by-zero by implementing a function that handles the corner case (first patch), and replaces the vulnerable direct max nr_accesses calculations (remaining patches). Note that the patches for the replacements are divided for broken commits, to make backporting on required tres easier. Especially, the last patch is for a patch that not yet merged into the mainline but in mm tree. This patch (of 4): The maximum nr_accesses of given DAMON context can be calculated by dividing the aggregation interval by the sampling interval. Some logics in DAMON uses the maximum nr_accesses as a divisor. Hence, the value shouldn't be zero. Such case is avoided since DAMON avoids setting the agregation interval as samller than the sampling interval. However, since nr_accesses is unsigned int while the intervals are unsigned long, the maximum nr_accesses could be zero while casting. Implement a function that handles the corner case. Note that this commit is not fixing the real issue since this is only introducing the safe function that will replaces the problematic divisions. The replacements will be made by followup commits, to make backporting on stable series easier. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 198f0f4c58b9 ("mm/damon/vaddr,paddr: support pageout prioritization") Signed-off-by: SeongJae Park <[email protected]> Reported-by: Jakub Acs <[email protected]> Cc: <[email protected]> [5.16+] Signed-off-by: Andrew Morton <[email protected]>
2023-10-25 | mm/khugepaged: convert alloc_charge_hpage() to use folios | Vishal Moola (Oracle) | 1 | -14/+0
Also remove count_memcg_page_event now that its last caller no longer uses it, and rename hpage_collapse_alloc_page() to hpage_collapse_alloc_folio(). This removes 1 call to compound_head() and helps convert khugepaged to use folios throughout. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Vishal Moola (Oracle) <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Reviewed-by: Yang Shi <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25 | mm/migrate: add nr_split to trace_mm_migrate_pages stats. | Zi Yan | 1 | -10/+14
Add nr_split to trace_mm_migrate_pages for large folio (including THP) split events. [[email protected]: cleanup per Huang, Ying] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zi Yan <[email protected]> Reviewed-by: "Huang, Ying" <[email protected]> Reviewed-by: Baolin Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Huang Ying <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25 | bootmem: use kmemleak_free_part_phys in free_bootmem_page | Liu Shixin | 1 | -1/+1
Since kmemleak_alloc_phys() rather than kmemleak_alloc() is called from memblock_alloc_range_nid(), kmemleak_free_part_phys() should be used to delete the kmemleak object in free_bootmem_page(). In debug mode, the following warning appears: kmemleak: Partially freeing unknown object at 0xffff97345aff7000 (size 4096) Link: https://lkml.kernel.org/r/[email protected] Fixes: 028725e73375 ("bootmem: remove the vmemmap pages from kmemleak in free_bootmem_page") Signed-off-by: Liu Shixin <[email protected]> Acked-by: Catalin Marinas <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Patrick Wang <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
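The fix is essentially a one-line substitution inside free_bootmem_page(), matching the physical-address registration (a sketch; the rest of the helper is unchanged and elided):

    static inline void free_bootmem_page(struct page *page)
    {
        /* The object was registered via kmemleak_alloc_phys() in
         * memblock_alloc_range_nid(), so it must be deleted with the
         * _phys variant as well. */
        kmemleak_free_part_phys(PFN_PHYS(page_to_pfn(page)), PAGE_SIZE);
        /* ... remainder of the helper unchanged ... */
    }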
2023-10-25 | mm: remove page_cpupid_xchg_last() | Kefeng Wang | 1 | -12/+7
Since all calls use folio_xchg_last_cpupid(), remove page_cpupid_xchg_last(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Huang Ying <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25 | mm: make finish_mkwrite_fault() static | Kefeng Wang | 1 | -1/+0
Make finish_mkwrite_fault static since it is not used outside of memory.c. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Huang Ying <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25 | mm: add folio_xchg_last_cpupid() | Kefeng Wang | 1 | -0/+5
Add the folio_xchg_last_cpupid() wrapper, which is required to convert page_cpupid_xchg_last() to the folio version later in the series. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Huang Ying <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
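The wrapper is presumably the thin folio-to-page shim below; the sibling additions later in this log, folio_xchg_access_time() and folio_last_cpupid(), follow the same pattern:

    static inline int folio_xchg_last_cpupid(struct folio *folio, int cpupid)
    {
        return page_cpupid_xchg_last(&folio->page, cpupid);
    }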
2023-10-25 | mm: remove xchg_page_access_time() | Kefeng Wang | 1 | -8/+4
Since all calls use folio_xchg_access_time(), remove xchg_page_access_time(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Huang Ying <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25 | mm: add folio_xchg_access_time() | Kefeng Wang | 1 | -0/+5
Add the folio_xchg_access_time() wrapper, which is required to convert xchg_page_access_time() to the folio version later in the series. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Huang Ying <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25 | mm: remove page_cpupid_last() | Kefeng Wang | 1 | -11/+6
Since all calls use folio_last_cpupid(), remove page_cpupid_last(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Huang Ying <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25 | mm: add folio_last_cpupid() | Kefeng Wang | 1 | -0/+5
Add the folio_last_cpupid() wrapper, which is required to convert page_cpupid_last() to the folio version later in the series. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Huang Ying <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25 | mm_types: add virtual and _last_cpupid into struct folio | Kefeng Wang | 1 | -4/+18
Patch series "mm: convert page cpupid functions to folios", v3. The cpupid(or access time) used by numa balancing is stored in flags or _last_cpupid(if LAST_CPUPID_NOT_IN_PAGE_FLAGS) of page, this is to convert page cpupid to folio cpupid, a new _last_cpupid is added into folio, which make us to use folio->_last_cpupid directly, and the page cpupid functions are converted to folio ones. page_cpupid_last() -> folio_last_cpupid() xchg_page_access_time() -> folio_xchg_access_time() page_cpupid_xchg_last() -> folio_xchg_last_cpupid() This patch (of 19): If WANT_PAGE_VIRTUAL and LAST_CPUPID_NOT_IN_PAGE_FLAGS defined, the 'virtual' and '_last_cpupid' are in struct page, and since _last_cpupid is used by numa balancing feature, it is better to move it before KMSAN metadata from struct page, also add them into struct folio to make us to access them from folio directly. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Huang Ying <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25 | mm: kmem: reimplement get_obj_cgroup_from_current() | Roman Gushchin | 1 | -1/+10
Reimplement get_obj_cgroup_from_current() using current_obj_cgroup(). get_obj_cgroup_from_current() and current_obj_cgroup() share 80% of the code, so the new implementation is almost trivial. get_obj_cgroup_from_current() is a convenient function used by the bpf subsystem, so there is no reason to get rid of it completely. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin (Cruise) <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Acked-by: Shakeel Butt <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Cc: Naresh Kamboju <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
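With current_obj_cgroup() returning a borrowed pointer that is pinned for the scope, the get_ variant presumably reduces to taking a reference on top of it (a sketch):

    static inline struct obj_cgroup *get_obj_cgroup_from_current(void)
    {
        struct obj_cgroup *objcg = current_obj_cgroup();

        /* Callers such as bpf need a reference that outlives the scope,
         * so pin the borrowed pointer before returning it. */
        if (objcg)
            obj_cgroup_get(objcg);

        return objcg;
    }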
2023-10-25 | mm: kmem: scoped objcg protection | Roman Gushchin | 2 | -0/+13
Switch to a scope-based protection of the objcg pointer on slab/kmem allocation paths. Instead of using the get_() semantics in the pre-allocation hook and put the reference afterwards, let's rely on the fact that objcg is pinned by the scope. It's possible because: 1) if the objcg is received from the current task struct, the task is keeping a reference to the objcg. 2) if the objcg is received from an active memcg (remote charging), the memcg is pinned by the scope and has a reference to the corresponding objcg. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin (Cruise) <[email protected]> Tested-by: Naresh Kamboju <[email protected]> Acked-by: Shakeel Butt <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25 | mm: kmem: make memcg keep a reference to the original objcg | Roman Gushchin | 1 | -1/+7
Keep a reference to the original objcg object for the entire life of a memcg structure. This allows to simplify the synchronization on the kernel memory allocation paths: pinning a (live) memcg will also pin the corresponding objcg. The memory overhead of this change is minimal because object cgroups usually outlive their corresponding memory cgroups even without this change, so it's only an additional pointer per memcg. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin (Cruise) <[email protected]> Tested-by: Naresh Kamboju <[email protected]> Acked-by: Shakeel Butt <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-25 | mm: kmem: add direct objcg pointer to task_struct | Roman Gushchin | 1 | -0/+4
To charge a freshly allocated kernel object to a memory cgroup, the kernel needs to obtain an objcg pointer. Currently it does so indirectly, by obtaining the memcg pointer first and then calling __get_obj_cgroup_from_memcg(). Usually tasks spend their entire life belonging to the same object cgroup. So it makes sense to save the objcg pointer on task_struct directly, so it can be obtained faster. It requires some work on the fork, exit and cgroup migration paths, but these paths are way colder. To avoid any costly synchronization the following rules are applied: 1) A task sets its objcg pointer itself. 2) If a task is being migrated to another cgroup, the least significant bit of the objcg pointer is set atomically. 3) On the allocation path the objcg pointer is obtained locklessly using the READ_ONCE() macro and the least significant bit is checked. If it's set, the following procedure is used to update it locklessly: - task->objcg is zeroed using cmpxchg - a new objcg pointer is obtained - task->objcg is updated using try_cmpxchg - the operation is repeated if try_cmpxchg fails It guarantees that no updates will be lost if task migration is racing against an objcg pointer update. It also allows keeping both read and write paths fully lockless. Because the task is keeping a reference to the objcg, it can't go away while the task is alive. This commit doesn't change the way remote memcg charging works. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin (Cruise) <[email protected]> Tested-by: Naresh Kamboju <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Shakeel Butt <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
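The lockless update procedure in rule 3 might look roughly like the sketch below; apart from xchg()/try_cmpxchg(), the helper name and the flag value are assumptions:

    #define CURRENT_OBJCG_UPDATE_FLAG 0x1UL

    static struct obj_cgroup *current_objcg_update(void)
    {
        struct obj_cgroup *old, *objcg = NULL;

        do {
            /* Zero task->objcg atomically and drop the reference held
             * on the stale pointer, if any (masking off the flag bit). */
            old = xchg(&current->objcg, NULL);
            if (old) {
                old = (struct obj_cgroup *)
                    ((unsigned long)old & ~CURRENT_OBJCG_UPDATE_FLAG);
                if (old)
                    obj_cgroup_put(old);
                old = NULL;
            }

            /* Obtain a fresh objcg pointer for current's memcg. */
            objcg = get_current_objcg();        /* assumed helper */

            /* Publish it; if a concurrent cgroup migration set the flag
             * again in the meantime, try_cmpxchg() fails and we retry. */
        } while (!try_cmpxchg(&current->objcg, &old, objcg));

        return objcg;
    }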