aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2022-01-19Merge tag 'f2fs-for-5.17-rc1' of ↵Linus Torvalds1-12/+15
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we've tried to address some performance issues in f2fs_checkpoint and direct IO flows. Also, there was a work to enhance the page cache management used for compression. Other than them, we've done typical work including sysfs, code clean-ups, tracepoint, sanity check, in addition to bug fixes on corner cases. Enhancements: - use iomap for direct IO - try to avoid lock contention to improve f2fs_ckpt speed - avoid unnecessary memory allocation in compression flow - POSIX_FADV_DONTNEED drops the page cache containing compression pages - add some sysfs entries (gc_urgent_high_remaining, pending_discard) Bug fixes: - try not to expose unwritten blocks to user by DIO (this was added to avoid merge conflict; another patch is coming to address other missing case) - relax minor error condition for file pinning feature used in Android OTA - fix potential deadlock case in compression flow - should not truncate any block on pinned file In addition, we've done some code clean-ups and tracepoint/sanity check improvement" * tag 'f2fs-for-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (29 commits) f2fs: do not allow partial truncation on pinned file f2fs: remove redunant invalidate compress pages f2fs: Simplify bool conversion f2fs: don't drop compressed page cache in .{invalidate,release}page f2fs: fix to reserve space for IO align feature f2fs: fix to check available space of CP area correctly in update_ckpt_flags() f2fs: support fault injection to f2fs_trylock_op() f2fs: clean up __find_inline_xattr() with __find_xattr() f2fs: fix to do sanity check on last xattr entry in __f2fs_setxattr() f2fs: do not bother checkpoint by f2fs_get_node_info f2fs: avoid down_write on nat_tree_lock during checkpoint f2fs: compress: fix potential deadlock of compress file f2fs: avoid EINVAL by SBI_NEED_FSCK when pinning a file f2fs: add gc_urgent_high_remaining sysfs node f2fs: fix to do sanity check in is_alive() f2fs: fix to avoid panic in is_alive() if metadata is inconsistent f2fs: fix to do sanity check on inode type during garbage collection f2fs: avoid duplicate call of mark_inode_dirty f2fs: show number of pending discard commands f2fs: support POSIX_FADV_DONTNEED drop compressed page cache ...
2022-01-19Merge tag 'kbuild-v5.17' of ↵Linus Torvalds2-3/+2
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Add new kconfig target 'make mod2noconfig', which will be useful to speed up the build and test iteration. - Raise the minimum supported version of LLVM to 11.0.0 - Refactor certs/Makefile - Change the format of include/config/auto.conf to stop double-quoting string type CONFIG options. - Fix ARCH=sh builds in dash - Separate compression macros for general purposes (cmd_bzip2 etc.) and the ones for decompressors (cmd_bzip2_with_size etc.) - Misc Makefile cleanups * tag 'kbuild-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits) kbuild: add cmd_file_size arch: decompressor: remove useless vmlinux.bin.all-y kbuild: rename cmd_{bzip2,lzma,lzo,lz4,xzkern,zstd22} kbuild: drop $(size_append) from cmd_zstd sh: rename suffix-y to suffix_y doc: kbuild: fix default in `imply` table microblaze: use built-in function to get CPU_{MAJOR,MINOR,REV} certs: move scripts/extract-cert to certs/ kbuild: do not quote string values in include/config/auto.conf kbuild: do not include include/config/auto.conf from shell scripts certs: simplify $(srctree)/ handling and remove config_filename macro kbuild: stop using config_filename in scripts/Makefile.modsign certs: remove misleading comments about GCC PR certs: refactor file cleaning certs: remove unneeded -I$(srctree) option for system_certificates.o certs: unify duplicated cmd_extract_certs and improve the log certs: use $< and $@ to simplify the key generation rule kbuild: remove headers_check stub kbuild: move headers_check.pl to usr/include/ certs: use if_changed to re-generate the key when the key type is changed ...
2022-01-19Merge branch 'random-5.17-rc1-for-linus' of ↵Linus Torvalds2-38/+21
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull random number generator fixes from Jason Donenfeld: - Some Kconfig changes resulted in BIG_KEYS being unselectable, which Justin sent a patch to fix. - Geert pointed out that moving to BLAKE2s bloated vmlinux on little machines, like m68k, so we now compensate for this. - Numerous style and house cleaning fixes, meant to have a cleaner base for future changes. * 'random-5.17-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: random: simplify arithmetic function flow in account() random: selectively clang-format where it makes sense random: access input_pool_data directly rather than through pointer random: cleanup fractional entropy shift constants random: prepend remaining pool constants with POOL_ random: de-duplicate INPUT_POOL constants random: remove unused OUTPUT_POOL constants random: rather than entropy_store abstraction, use global random: remove unused extract_entropy() reserved argument random: remove incomplete last_data logic random: cleanup integer types random: cleanup poolinfo abstraction random: fix typo in comments lib/crypto: sha1: re-roll loops to reduce code size lib/crypto: blake2s: move hmac construction into wireguard lib/crypto: add prompts back to crypto libraries
2022-01-18random: rather than entropy_store abstraction, use globalJason A. Donenfeld1-35/+21
Originally, the RNG used several pools, so having things abstracted out over a generic entropy_store object made sense. These days, there's only one input pool, and then an uneven mix of usage via the abstraction and usage via &input_pool. Rather than this uneasy mixture, just get rid of the abstraction entirely and have things always use the global. This simplifies the code and makes reading it a bit easier. Reviewed-by: Dominik Brodowski <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]>
2022-01-18lib/crypto: blake2s: move hmac construction into wireguardJason A. Donenfeld1-3/+0
Basically nobody should use blake2s in an HMAC construction; it already has a keyed variant. But unfortunately for historical reasons, Noise, used by WireGuard, uses HKDF quite strictly, which means we have to use this. Because this really shouldn't be used by others, this commit moves it into wireguard's noise.c locally, so that kernels that aren't using WireGuard don't get this superfluous code baked in. On m68k systems, this shaves off ~314 bytes. Cc: Herbert Xu <[email protected]> Tested-by: Geert Uytterhoeven <[email protected]> Acked-by: Ard Biesheuvel <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]>
2022-01-18Merge tag 'dmaengine-5.17-rc1' of ↵Linus Torvalds3-1/+22
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine Pull dmaengine updates from Vinod Koul: "A bunch of new support and few updates to drivers: New support: - DMA_MEMCPY_SG support is bought back as we have a user in Xilinx driver - Support for TI J721S2 SoC in k3-udma driver - Support for Ingenic MDMA and BDMA in the JZ4760 - Support for Renesas r8a779f0 dmac Updates: - We are finally getting rid of slave_id, so this brings in the changes across tree for that - updates for idxd driver - at_xdmac driver cleanup" * tag 'dmaengine-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (60 commits) dt-bindings: dma-controller: Split interrupt fields in example dmaengine: pch_dma: Remove usage of the deprecated "pci-dma-compat.h" API dmaengine: at_xdmac: Fix race over irq_status dmaengine: at_xdmac: Remove a level of indentation in at_xdmac_tasklet() dmaengine: at_xdmac: Fix at_xdmac_lld struct definition dmaengine: at_xdmac: Fix lld view setting dmaengine: at_xdmac: Remove a level of indentation in at_xdmac_advance_work() dmaengine: at_xdmac: Fix concurrency over xfers_list dmaengine: at_xdmac: Move the free desc to the tail of the desc list dmaengine: at_xdmac: Fix race for the tx desc callback dmaengine: at_xdmac: Fix concurrency over chan's completed_cookie dmaengine: at_xdmac: Print debug message after realeasing the lock dmaengine: at_xdmac: Start transfer for cyclic channels in issue_pending dmaengine: at_xdmac: Don't start transactions at tx_submit level dmaengine: idxd: deprecate token sysfs attributes for read buffers dmaengine: idxd: change bandwidth token to read buffers dmaengine: idxd: fix wq settings post wq disable dmaengine: idxd: change MSIX allocation based on per wq activation dmaengine: idxd: fix descriptor flushing locking dmaengine: idxd: embed irq_entry in idxd_wq struct ...
2022-01-18Merge tag 'ata-5.17-rc1' of ↵Linus Torvalds2-84/+470
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata Pull ATA updates from Damien Le Moal: "A larger than usual set of changes for this cycle. The bulk of the changes are part of a rework of libata messages and debugging features from Hannes. In more detail, the changes are as follows. - Small code cleanups in the pata_ali driver (unnecessary variable initialization and simplified return statement, from Jason and Colin. - Switch to using struct_group() in the sata_fsl driver, from Kees. - Convert many sysfs attribute show functions to use sysfs_emit() instead of snprintf(), from me. - sata_dwc_460ex driver code cleanups, from Andy. - Improve DMA setup and remove superfluous error message in libahci_platform, from Andy - A small code cleanup in libata to use min() instead of open coding test, from Changcheng. - Rework of libata messages from Hannes. This is especially focused on replacing compile time defined debugging messages (DPRINTK() and VPRINTK()) with regular dynamic debugging messages (pr_debug()) and traceipoint events. Both libata-core and many drivers are updated to have a consistent debugging level control for all drivers. - Extend compile test support to as many drivers as possible in ATA Kconfig to improve compile test coverage, from me. - Fixes to avoid compile time warnings (W=1) and sparse warnings in sata_fsl and ahci_xgene drivers, from me. - Fix the interface of the read_id() port operation method to clarify that the data buffer passed as an argument is little endian. This avoids sparse warnings in the pata_netcell, pata_it821x, ahci_xgene, ahci_cevaxi and ahci_brcm drivers. From me. - Small code cleanup in the pata_octeon_cf driver, from Minghao. - Improved IRQ configuration code in pata_of_platform, from Lad. - Simplified implementation of __ata_scsi_queuecmd(), from Wenchao. - Debounce delay flag renaming, from Paul. - Add support for AMD A85 FCH (Hudson D4) AHCI adapters, from Paul" * tag 'ata-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: (106 commits) ata: pata_ali: remove redundant return statement ata: ahci: Add support for AMD A85 FCH (Hudson D4) ata: libata: Rename link flag ATA_LFLAG_NO_DB_DELAY ata: libata-scsi: simplify __ata_scsi_queuecmd() ata: pata_of_platform: Use platform_get_irq_optional() to get the interrupt ata: pata_samsung_cf: add compile test support ata: pata_pxa: add compile test support ata: pata_imx: add compile test support ata: pata_ftide010: add compile test support ata: pata_cs5535: add compile test support ata: pata_octeon_cf: remove redundant val variable ata: fix read_id() ata port operation interface ata: ahci_xgene: use correct type for port mmio address ata: sata_fsl: fix cmdhdr_tbl_entry and prde struct definitions ata: sata_fsl: fix scsi host initialization ata: pata_bk3710: add compile test support ata: ahci_seattle: add compile test support ata: ahci_xgene: add compile test support ata: ahci_tegra: add compile test support ata: ahci_sunxi: add compile test support ...
2022-01-18Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds3-8/+38
Pull virtio updates from Michael Tsirkin: "virtio,vdpa,qemu_fw_cfg: features, cleanups, and fixes. - partial support for < MAX_ORDER - 1 granularity for virtio-mem - driver_override for vdpa - sysfs ABI documentation for vdpa - multiqueue config support for mlx5 vdpa - and misc fixes, cleanups" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (42 commits) vdpa/mlx5: Fix tracking of current number of VQs vdpa/mlx5: Fix is_index_valid() to refer to features vdpa: Protect vdpa reset with cf_mutex vdpa: Avoid taking cf_mutex lock on get status vdpa/vdpa_sim_net: Report max device capabilities vdpa: Use BIT_ULL for bit operations vdpa/vdpa_sim: Configure max supported virtqueues vdpa/mlx5: Report max device capabilities vdpa: Support reporting max device capabilities vdpa/mlx5: Restore cur_num_vqs in case of failure in change_num_qps() vdpa: Add support for returning device configuration information vdpa/mlx5: Support configuring max data virtqueue vdpa/mlx5: Fix config_attr_mask assignment vdpa: Allow to configure max data virtqueues vdpa: Read device configuration only if FEATURES_OK vdpa: Sync calls set/get config/status with cf_mutex vdpa/mlx5: Distribute RX virtqueues in RQT object vdpa: Provide interface to read driver features vdpa: clean up get_config_size ret value handling virtio_ring: mark ring unused on error ...
2022-01-18Merge tag 'pm-5.17-rc1-2' of ↵Linus Torvalds2-15/+64
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management updates from Rafael Wysocki: "This is a continuation of the rework of device power management macros used for declaring device power management callbacks (Paul Cercueil)" * tag 'pm-5.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: iio: pressure: bmp280: Use new PM macros PM: runtime: Add EXPORT[_GPL]_RUNTIME_DEV_PM_OPS macros PM: runtime: Add DEFINE_RUNTIME_DEV_PM_OPS() macro PM: core: Add EXPORT[_GPL]_SIMPLE_DEV_PM_OPS macros PM: core: Remove static qualifier in DEFINE_SIMPLE_DEV_PM_OPS macro PM: core: Remove DEFINE_UNIVERSAL_DEV_PM_OPS() macro
2022-01-18Merge tag 'acpi-5.17-rc1-2' of ↵Linus Torvalds2-0/+308
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more ACPI updates from Rafael Wysocki: "The most significant item here is the Platform Firmware Runtime Update and Telemetry (PFRUT) support designed to allow certain pieces of the platform firmware to be updated on the fly, among other things. Also important is the e820 handling change on x86 that should work around PCI BAR allocation issues on some systems shipping since 2019. The rest is just a handful of assorted fixes and cleanups on top of the ACPI material merged previously. Specifics: - Add support for the the Platform Firmware Runtime Update and Telemetry (PFRUT) interface based on ACPI to allow certain pieces of the platform firmware to be updated without restarting the system and to provide a mechanism for collecting platform firmware telemetry data (Chen Yu, Dan Carpenter, Yang Yingliang). - Ignore E820 reservations covering PCI host bridge windows on sufficiently recent x86 systems to avoid issues with allocating PCI BARs on systems where the E820 reservations cover the entire PCI host bridge memory window returned by the _CRS object in the system's ACPI tables (Hans de Goede). - Fix and clean up acpi_scan_init() (Rafael Wysocki). - Add more sanity checking to ACPI SPCR tables parsing (Mark Langsdorf). - Fix up ACPI APD (AMD Soc) driver initialization (Jiasheng Jiang). - Drop unnecessary "static" from the ACPI PCC address space handling driver added recently (kernel test robot)" * tag 'acpi-5.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: PCC: pcc_ctx can be static ACPI: scan: Rename label in acpi_scan_init() ACPI: scan: Simplify initialization of power and sleep buttons ACPI: scan: Change acpi_scan_init() return value type to void ACPI: SPCR: check if table->serial_port.access_width is too wide ACPI: APD: Check for NULL pointer after calling devm_ioremap() x86/PCI: Ignore E820 reservations for bridge windows on newer systems ACPI: pfr_telemetry: Fix info leak in pfrt_log_ioctl() ACPI: pfr_update: Fix return value check in pfru_write() ACPI: tools: Introduce utility for firmware updates/telemetry ACPI: Introduce Platform Firmware Runtime Telemetry driver ACPI: Introduce Platform Firmware Runtime Update device driver efi: Introduce EFI_FIRMWARE_MANAGEMENT_CAPSULE_HEADER and corresponding structures
2022-01-18Merge tag 'slab-for-5.17-part2' of ↵Linus Torvalds2-65/+0
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull more slab updates from Vlastimil Babka: "Finish the conversion to struct slab by removing slab-specific fields from struct page. The first slab update (see merge commit ca1a46d6f506) did most of the conversion, but there was also series in iommu tree removing the iommu's usage of struct page 'freelist' field, blocking the final struct page cleanup. Now that the iommu changes have been merged, we can finish the job" * tag 'slab-for-5.17-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm: Remove slab from struct page
2022-01-17Merge branch 'acpi-pfrut'Rafael J. Wysocki2-0/+308
Merge support for the Platform Firmware Runtime Update and Telemetry interface based on ACPI. The interface provided here allows updating certain pieces of the platform firmware without restarting the system and collecting platform firmware telemetry data. This also includes a utility for accesing the new interface from user space. * acpi-pfrut: ACPI: pfr_telemetry: Fix info leak in pfrt_log_ioctl() ACPI: pfr_update: Fix return value check in pfru_write() ACPI: tools: Introduce utility for firmware updates/telemetry ACPI: Introduce Platform Firmware Runtime Telemetry driver ACPI: Introduce Platform Firmware Runtime Update device driver efi: Introduce EFI_FIRMWARE_MANAGEMENT_CAPSULE_HEADER and corresponding structures
2022-01-17Merge tag '5.17-rc-part1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds1-0/+4
Pull cifs updates from Steve French: - multichannel patches mostly related to improving reconnect behavior - minor cleanup patches * tag '5.17-rc-part1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix FILE_BOTH_DIRECTORY_INFO definition cifs: move superblock magic defitions to magic.h cifs: Fix smb311_update_preauth_hash() kernel-doc comment cifs: avoid race during socket reconnect between send and recv cifs: maintain a state machine for tcp/smb/tcon sessions cifs: fix hang on cifs_get_next_mid() cifs: take cifs_tcp_ses_lock for status checks cifs: reconnect only the connection and not smb session where possible cifs: add WARN_ON for when chan_count goes below minimum cifs: adjust DebugData to use chans_need_reconnect for conn status cifs: use the chans_need_reconnect bitmap for reconnect status cifs: track individual channel status using chans_need_reconnect cifs: remove redundant assignment to pointer p
2022-01-17devtmpfs regression fix: reconfigure on each mountNeilBrown1-0/+2
Prior to Linux v5.4 devtmpfs used mount_single() which treats the given mount options as "remount" options, so it updates the configuration of the single super_block on each mount. Since that was changed, the mount options used for devtmpfs are ignored. This is a regression which affect systemd - which mounts devtmpfs with "-o mode=755,size=4m,nr_inodes=1m". This patch restores the "remount" effect by calling reconfigure_single() Fixes: d401727ea0d7 ("devtmpfs: don't mix {ramfs,shmem}_fill_super() with mount_single()") Acked-by: Christian Brauner <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: NeilBrown <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-17Merge tag 'ntb-5.17' of git://github.com/jonmason/ntbLinus Torvalds1-2/+0
Pull NTB updates from Jon Mason: "New AMD PCI ID for NTB, and a number of bug fixes for ntb_hw_switchtec for Linux v5.17" * tag 'ntb-5.17' of git://github.com/jonmason/ntb: ntb_hw_switchtec: Fix a minor issue in config_req_id_table() ntb_hw_switchtec: Remove code for disabling ID protection ntb_hw_switchtec: Update the way of getting VEP instance ID ntb_hw_switchtec: AND with the part_map for a valid tpart_vec ntb_hw_switchtec: Fix bug with more than 32 partitions ntb_hw_switchtec: Fix pff ioread to read into mmio_part_cfg_all ntb_hw_switchtec: fix the spelling of "its" NTB/msi: Fix ntbm_msi_request_threaded_irq() kernel-doc comment ntb_hw_amd: Add NTB PCI ID for new gen CPU
2022-01-17Merge tag 'linux-watchdog-5.17-rc1' of ↵Linus Torvalds1-0/+8
git://www.linux-watchdog.org/linux-watchdog Pull watchdog updates from Wim Van Sebroeck: - New device support: - Watchdog Timer driver for RZ/G2L - Realtek Otto watchdog timer - Apple SoC watchdog driver - Fintek F81966 - Remove BCM63XX_WDT after support for this SoC was added to BCM7038_WDT - Improvements of the BCM7038_WDT and s3c2410_wdt code - Several other fixes and improvements * tag 'linux-watchdog-5.17-rc1' of git://www.linux-watchdog.org/linux-watchdog: (38 commits) watchdog: msc313e: Check if the WDT was running at boot watchdog: Add Apple SoC watchdog driver dt-bindings: watchdog: Add SM6350 and SM8250 compatible watchdog: s3c2410: Fix getting the optional clock watchdog: s3c2410: Use platform_get_irq() to get the interrupt dt-bindings: watchdog: atmel: Add missing 'interrupts' property watchdog: mtk_wdt: use platform_get_irq_optional watchdog: Add Watchdog Timer driver for RZ/G2L dt-bindings: watchdog: renesas,wdt: Add support for RZ/G2L watchdog: da9063: Add hard dependency on I2C watchdog: Add Realtek Otto watchdog timer dt-bindings: watchdog: Realtek Otto WDT binding watchdog: s3c2410: Add Exynos850 support watchdog: da9063: use atomic safe i2c transfer in reset handler watchdog: davinci: Use div64_ul instead of do_div watchdog: Remove BCM63XX_WDT MIPS: BCM63XX: Provide platform data to watchdog device watchdog: bcm7038_wdt: Add platform device id for bcm63xx-wdt watchdog: Allow building BCM7038_WDT for BCM63XX watchdog: bcm7038_wdt: Support platform data configuration ...
2022-01-17Merge branch 'modules-next' of ↵Linus Torvalds2-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull module updates from Luis Chamberlain: "The biggest change here is in-kernel support for module decompression. This change is being made to help support LSMs like LoadPin as otherwise it loses link between the source of kernel module on the disk and binary blob that is being loaded into the kernel. kmod decompression is still done by userspace even with this is done, both because there are no measurable gains in not doing so and as it adds a secondary extra check for validating the module before loading it into the kernel. The rest of the changes are minor, the only other change worth mentionin there is Jessica Yu is now bowing out of maintenance of modules as she's taking a break from work. While there were other changes posted for modules, those have not yet received much review of testing so I'm not yet comfortable in merging any of those changes yet." * 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: module: fix signature check failures when using in-kernel decompression kernel: Fix spelling mistake "compresser" -> "compressor" MAINTAINERS: add mailing lists for kmod and modules module.h: allow #define strings to work with MODULE_IMPORT_NS module: add in-kernel support for decompressing MAINTAINERS: Remove myself as modules maintainer module: Remove outdated comment
2022-01-17Merge branch 'signal-for-v5.17' of ↵Linus Torvalds9-75/+20
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull signal/exit/ptrace updates from Eric Biederman: "This set of changes deletes some dead code, makes a lot of cleanups which hopefully make the code easier to follow, and fixes bugs found along the way. The end-game which I have not yet reached yet is for fatal signals that generate coredumps to be short-circuit deliverable from complete_signal, for force_siginfo_to_task not to require changing userspace configured signal delivery state, and for the ptrace stops to always happen in locations where we can guarantee on all architectures that the all of the registers are saved and available on the stack. Removal of profile_task_ext, profile_munmap, and profile_handoff_task are the big successes for dead code removal this round. A bunch of small bug fixes are included, as most of the issues reported were small enough that they would not affect bisection so I simply added the fixes and did not fold the fixes into the changes they were fixing. There was a bug that broke coredumps piped to systemd-coredump. I dropped the change that caused that bug and replaced it entirely with something much more restrained. Unfortunately that required some rebasing. Some successes after this set of changes: There are few enough calls to do_exit to audit in a reasonable amount of time. The lifetime of struct kthread now matches the lifetime of struct task, and the pointer to struct kthread is no longer stored in set_child_tid. The flag SIGNAL_GROUP_COREDUMP is removed. The field group_exit_task is removed. Issues where task->exit_code was examined with signal->group_exit_code should been examined were fixed. There are several loosely related changes included because I am cleaning up and if I don't include them they will probably get lost. The original postings of these changes can be found at: https://lkml.kernel.org/r/[email protected] https://lkml.kernel.org/r/[email protected] https://lkml.kernel.org/r/[email protected] I trimmed back the last set of changes to only the obviously correct once. Simply because there was less time for review than I had hoped" * 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (44 commits) ptrace/m68k: Stop open coding ptrace_report_syscall ptrace: Remove unused regs argument from ptrace_report_syscall ptrace: Remove second setting of PT_SEIZED in ptrace_attach taskstats: Cleanup the use of task->exit_code exit: Use the correct exit_code in /proc/<pid>/stat exit: Fix the exit_code for wait_task_zombie exit: Coredumps reach do_group_exit exit: Remove profile_handoff_task exit: Remove profile_task_exit & profile_munmap signal: clean up kernel-doc comments signal: Remove the helper signal_group_exit signal: Rename group_exit_task group_exec_task coredump: Stop setting signal->group_exit_task signal: Remove SIGNAL_GROUP_COREDUMP signal: During coredumps set SIGNAL_GROUP_EXIT in zap_process signal: Make coredump handling explicit in complete_signal signal: Have prepare_signal detect coredumps using signal->core_state signal: Have the oom killer detect coredumps using signal->core_state exit: Move force_uaccess back into do_exit exit: Guarantee make_task_dead leaks the tsk when calling do_task_exit ...
2022-01-17Merge tag 'unicode-for-next-5.17' of ↵Linus Torvalds1-3/+46
git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode Pull unicode updates from Gabriel Krisman Bertazi: "This includes patches from Christoph Hellwig to split the large data tables of the unicode subsystem into a loadable module, which allow users to not have them around if case-insensitive filesystems are not to be used. It also includes minor code fixes to unicode and its users, from the same author. All the patches here have been on linux-next releases for the past months" * tag 'unicode-for-next-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode: unicode: only export internal symbols for the selftests unicode: Add utf8-data module unicode: cache the normalization tables in struct unicode_map unicode: move utf8cursor to utf8-selftest.c unicode: simplify utf8len unicode: remove the unused utf8{,n}age{min,max} functions unicode: pass a UNICODE_AGE() tripple to utf8_load unicode: mark the version field in struct unicode_map unsigned unicode: remove the charset field from struct unicode_map f2fs: simplify f2fs_sb_read_encoding ext4: simplify ext4_sb_read_encoding
2022-01-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds5-103/+382
Pull kvm updates from Paolo Bonzini: "RISCV: - Use common KVM implementation of MMU memory caches - SBI v0.2 support for Guest - Initial KVM selftests support - Fix to avoid spurious virtual interrupts after clearing hideleg CSR - Update email address for Anup and Atish ARM: - Simplification of the 'vcpu first run' by integrating it into KVM's 'pid change' flow - Refactoring of the FP and SVE state tracking, also leading to a simpler state and less shared data between EL1 and EL2 in the nVHE case - Tidy up the header file usage for the nvhe hyp object - New HYP unsharing mechanism, finally allowing pages to be unmapped from the Stage-1 EL2 page-tables - Various pKVM cleanups around refcounting and sharing - A couple of vgic fixes for bugs that would trigger once the vcpu xarray rework is merged, but not sooner - Add minimal support for ARMv8.7's PMU extension - Rework kvm_pgtable initialisation ahead of the NV work - New selftest for IRQ injection - Teach selftests about the lack of default IPA space and page sizes - Expand sysreg selftest to deal with Pointer Authentication - The usual bunch of cleanups and doc update s390: - fix sigp sense/start/stop/inconsistency - cleanups x86: - Clean up some function prototypes more - improved gfn_to_pfn_cache with proper invalidation, used by Xen emulation - add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery - completely remove potential TOC/TOU races in nested SVM consistency checks - update some PMCs on emulated instructions - Intel AMX support (joint work between Thomas and Intel) - large MMU cleanups - module parameter to disable PMU virtualization - cleanup register cache - first part of halt handling cleanups - Hyper-V enlightened MSR bitmap support for nested hypervisors Generic: - clean up Makefiles - introduce CONFIG_HAVE_KVM_DIRTY_RING - optimize memslot lookup using a tree - optimize vCPU array usage by converting to xarray" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (268 commits) x86/fpu: Fix inline prefix warnings selftest: kvm: Add amx selftest selftest: kvm: Move struct kvm_x86_state to header selftest: kvm: Reorder vcpu_load_state steps for AMX kvm: x86: Disable interception for IA32_XFD on demand x86/fpu: Provide fpu_sync_guest_vmexit_xfd_state() kvm: selftests: Add support for KVM_CAP_XSAVE2 kvm: x86: Add support for getting/setting expanded xstate buffer x86/fpu: Add uabi_size to guest_fpu kvm: x86: Add CPUID support for Intel AMX kvm: x86: Add XCR0 support for Intel AMX kvm: x86: Disable RDMSR interception of IA32_XFD_ERR kvm: x86: Emulate IA32_XFD_ERR for guest kvm: x86: Intercept #NM for saving IA32_XFD_ERR x86/fpu: Prepare xfd_err in struct fpu_guest kvm: x86: Add emulation for IA32_XFD x86/fpu: Provide fpu_update_guest_xfd() for IA32_XFD emulation kvm: x86: Enable dynamic xfeatures at KVM_SET_CPUID2 x86/fpu: Provide fpu_enable_guest_xfd_features() for KVM x86/fpu: Add guest support to xfd_enable_feature() ...
2022-01-16Merge tag 'hyperv-next-signed-20220114' of ↵Linus Torvalds3-2/+16
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - More patches for Hyper-V isolation VM support (Tianyu Lan) - Bug fixes and clean-up patches from various people * tag 'hyperv-next-signed-20220114' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: scsi: storvsc: Fix storvsc_queuecommand() memory leak x86/hyperv: Properly deal with empty cpumasks in hyperv_flush_tlb_multi() Drivers: hv: vmbus: Initialize request offers message for Isolation VM scsi: storvsc: Fix unsigned comparison to zero swiotlb: Add CONFIG_HAS_IOMEM check around swiotlb_mem_remap() x86/hyperv: Fix definition of hv_ghcb_pg variable Drivers: hv: Fix definition of hypercall input & output arg variables net: netvsc: Add Isolation VM support for netvsc driver scsi: storvsc: Add Isolation VM support for storvsc driver hyper-v: Enable swiotlb bounce buffer for Isolation VM x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has() swiotlb: Add swiotlb bounce buffer remap function for HV IVM
2022-01-16Merge tag 'trace-v5.17' of ↵Linus Torvalds4-2/+152
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "New: - The Real Time Linux Analysis (RTLA) tool is added to the tools directory. - Can safely filter on user space pointers with: field.ustring ~ "match-string" - eprobes can now be filtered like any other event. - trace_marker(_raw) now uses stream_open() to allow multiple threads to safely write to it. Note, this could possibly break existing user space, but we will not know until we hear about it, and then can revert the change if need be. - New field in events to display when bottom halfs are disabled. - Sorting of the ftrace functions are now done at compile time instead of at bootup. Infrastructure changes to support future efforts: - Added __rel_loc type for trace events. Similar to __data_loc but the offset to the dynamic data is based off of the location of the descriptor and not the beginning of the event. Needed for user defined events. - Some simplification of event trigger code. - Make synthetic events process its callback better to not hinder other event callbacks that are registered. Needed for user defined events. And other small fixes and cleanups" * tag 'trace-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (50 commits) tracing: Add ustring operation to filtering string pointers rtla: Add rtla timerlat hist documentation rtla: Add rtla timerlat top documentation rtla: Add rtla timerlat documentation rtla: Add rtla osnoise hist documentation rtla: Add rtla osnoise top documentation rtla: Add rtla osnoise man page rtla: Add Documentation rtla/timerlat: Add timerlat hist mode rtla: Add timerlat tool and timelart top mode rtla/osnoise: Add the hist mode rtla/osnoise: Add osnoise top mode rtla: Add osnoise tool rtla: Helper functions for rtla rtla: Real-Time Linux Analysis tool tracing/osnoise: Properly unhook events if start_per_cpu_kthreads() fails tracing: Remove duplicate warnings when calling trace_create_file() tracing/kprobes: 'nmissed' not showed correctly for kretprobe tracing: Add test for user space strings when filtering on string pointers tracing: Have syscall trace events use trace_event_buffer_lock_reserve() ...
2022-01-16Merge tag 'pci-v5.17-changes' of ↵Linus Torvalds4-128/+108
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Use pci_find_vsec_capability() instead of open-coding it (Andy Shevchenko) - Convert pci_dev_present() stub from macro to static inline to avoid 'unused variable' errors (Hans de Goede) - Convert sysfs slot attributes from default_attrs to default_groups (Greg Kroah-Hartman) - Use DWORD accesses for LTR, L1 SS to avoid BayHub OZ711LV2 erratum (Rajat Jain) - Remove unnecessary initialization of static variables (Longji Guo) Resource management: - Always write Intel I210 ROM BAR on update to work around device defect (Bjorn Helgaas) PCIe native device hotplug: - Fix pciehp lockdep errors on Thunderbolt undock (Hans de Goede) - Fix infinite loop in pciehp IRQ handler on power fault (Lukas Wunner) Power management: - Convert amd64-agp, sis-agp, via-agp from legacy PCI power management to generic power management (Vaibhav Gupta) IOMMU: - Add function 1 DMA alias quirk for Marvell 88SE9125 SATA controller so it can work with an IOMMU (Yifeng Li) Error handling: - Add PCI_ERROR_RESPONSE and related definitions for signaling and checking for transaction errors on PCI (Naveen Naidu) - Fabricate PCI_ERROR_RESPONSE data (~0) in config read wrappers, instead of in host controller drivers, when transactions fail on PCI (Naveen Naidu) - Use PCI_POSSIBLE_ERROR() to check for possible failure of config reads (Naveen Naidu) Peer-to-peer DMA: - Add Logan Gunthorpe as P2PDMA maintainer (Bjorn Helgaas) ASPM: - Calculate link L0s and L1 exit latencies when needed instead of caching them (Saheed O. Bolarinwa) - Calculate device L0s and L1 acceptable exit latencies when needed instead of caching them (Saheed O. Bolarinwa) - Remove struct aspm_latency since it's no longer needed (Saheed O. Bolarinwa) APM X-Gene PCIe controller driver: - Fix IB window setup, which was broken by the fact that IB resources are now sorted in address order instead of DT dma-ranges order (Rob Herring) Apple PCIe controller driver: - Enable clock gating to save power (Hector Martin) - Fix REFCLK1 enable/poll logic (Hector Martin) Broadcom STB PCIe controller driver: - Declare bitmap correctly for use by bitmap interfaces (Christophe JAILLET) - Clean up computation of legacy and non-legacy MSI bitmasks (Florian Fainelli) - Update suspend/resume/remove error handling to warn about errors and not fail the operation (Jim Quinlan) - Correct the "pcie" and "msi" interrupt descriptions in DT binding (Jim Quinlan) - Add DT bindings for endpoint voltage regulators (Jim Quinlan) - Split brcm_pcie_setup() into two functions (Jim Quinlan) - Add mechanism for turning on voltage regulators for connected devices (Jim Quinlan) - Turn voltage regulators for connected devices on/off when bus is added or removed (Jim Quinlan) - When suspending, don't turn off voltage regulators for wakeup devices (Jim Quinlan) Freescale i.MX6 PCIe controller driver: - Add i.MX8MM support (Richard Zhu) Freescale Layerscape PCIe controller driver: - Use DWC common ops instead of layerscape-specific link-up functions (Hou Zhiqiang) Intel VMD host bridge driver: - Honor platform ACPI _OSC feature negotiation for Root Ports below VMD (Kai-Heng Feng) - Add support for Raptor Lake SKUs (Karthik L Gopalakrishnan) - Reset everything below VMD before enumerating to work around failure to enumerate NVMe devices when guest OS reboots (Nirmal Patel) Bridge emulation (used by Marvell Aardvark and MVEBU): - Make emulated ROM BAR read-only by default (Pali Rohár) - Make some emulated legacy PCI bits read-only for PCIe devices (Pali Rohár) - Update reserved bits in emulated PCIe Capability (Pali Rohár) - Allow drivers to emulate different PCIe Capability versions (Pali Rohár) - Set emulated Capabilities List bit for all PCIe devices, since they must have at least a PCIe Capability (Pali Rohár) Marvell Aardvark PCIe controller driver: - Add bridge emulation definitions for PCIe DEVCAP2, DEVCTL2, DEVSTA2, LNKCAP2, LNKCTL2, LNKSTA2, SLTCAP2, SLTCTL2, SLTSTA2 (Pali Rohár) - Add aardvark support for DEVCAP2, DEVCTL2, LNKCAP2 and LNKCTL2 registers (Pali Rohár) - Clear all MSIs at setup to avoid spurious interrupts (Pali Rohár) - Disable bus mastering when unbinding host controller driver (Pali Rohár) - Mask all interrupts when unbinding host controller driver (Pali Rohár) - Fix memory leak in host controller unbind (Pali Rohár) - Assert PERST# when unbinding host controller driver (Pali Rohár) - Disable link training when unbinding host controller driver (Pali Rohár) - Disable common PHY when unbinding host controller driver (Pali Rohár) - Fix resource type checking to check only IORESOURCE_MEM, not IORESOURCE_MEM_64, which is a flavor of IORESOURCE_MEM (Pali Rohár) Marvell MVEBU PCIe controller driver: - Implement pci_remap_iospace() for ARM so mvebu can use devm_pci_remap_iospace() instead of the previous ARM-specific pci_ioremap_io() interface (Pali Rohár) - Use the standard pci_host_probe() instead of the device-specific mvebu_pci_host_probe() (Pali Rohár) - Replace all uses of ARM-specific pci_ioremap_io() with the ARM implementation of the standard pci_remap_iospace() interface and remove pci_ioremap_io() (Pali Rohár) - Skip initializing invalid Root Ports (Pali Rohár) - Check for errors from pci_bridge_emul_init() (Pali Rohár) - Ignore any bridges at non-zero function numbers (Pali Rohár) - Return ~0 data for invalid config read size (Pali Rohár) - Disallow mapping interrupts on emulated bridges (Pali Rohár) - Clear Root Port Memory & I/O Space Enable and Bus Master Enable at initialization (Pali Rohár) - Make type bits in Root Port I/O Base register read-only (Pali Rohár) - Disable Root Port windows when base/limit set to invalid values (Pali Rohár) - Set controller to Root Complex mode (Pali Rohár) - Set Root Port Class Code to PCI Bridge (Pali Rohár) - Update emulated Root Port secondary bus numbers to better reflect the actual topology (Pali Rohár) - Add PCI_BRIDGE_CTL_BUS_RESET support to emulated Root Ports so pci_reset_secondary_bus() can reset connected devices (Pali Rohár) - Add PCI_EXP_DEVCTL Error Reporting Enable support to emulated Root Ports (Pali Rohár) - Add PCI_EXP_RTSTA PME Status bit support to emulated Root Ports (Pali Rohár) - Add DEVCAP2, DEVCTL2 and LNKCTL2 support to emulated Root Ports on Armada XP and newer devices (Pali Rohár) - Export mvebu-mbus.c symbols to allow pci-mvebu.c to be a module (Pali Rohár) - Add support for compiling as a module (Pali Rohár) MediaTek PCIe controller driver: - Assert PERST# for 100ms to allow power and clock to stabilize (qizhong cheng) MediaTek PCIe Gen3 controller driver: - Disable Mediatek DVFSRC voltage request since lack of DVFSRC to respond to the request causes failure to exit L1 PM Substate (Jianjun Wang) MediaTek MT7621 PCIe controller driver: - Declare mt7621_pci_ops static (Sergio Paracuellos) - Give pcibios_root_bridge_prepare() access to host bridge windows (Sergio Paracuellos) - Move MIPS I/O coherency unit setup from driver to pcibios_root_bridge_prepare() (Sergio Paracuellos) - Add missing MODULE_LICENSE() (Sergio Paracuellos) - Allow COMPILE_TEST for all arches (Sergio Paracuellos) Microsoft Hyper-V host bridge driver: - Add hv-internal interfaces to encapsulate arch IRQ dependencies (Sunil Muthuswamy) - Add arm64 Hyper-V vPCI support (Sunil Muthuswamy) Qualcomm PCIe controller driver: - Undo PM setup in qcom_pcie_probe() error handling path (Christophe JAILLET) - Use __be16 type to store return value from cpu_to_be16() (Manivannan Sadhasivam) - Constify static dw_pcie_ep_ops (Rikard Falkeborn) Renesas R-Car PCIe controller driver: - Fix aarch32 abort handler so it doesn't check the wrong bus clock before accessing the host controller (Marek Vasut) TI Keystone PCIe controller driver: - Add register offset for ti,syscon-pcie-id and ti,syscon-pcie-mode DT properties (Kishon Vijay Abraham I) MicroSemi Switchtec management driver: - Add Gen4 automotive device IDs (Kelvin Cao) - Declare state_names[] as static so it's not allocated and initialized for every call (Kelvin Cao) Host controller driver cleanups: - Use of_device_get_match_data(), not of_match_device(), when we only need the device data in altera, artpec6, cadence, designware-plat, dra7xx, keystone, kirin (Fan Fei) - Drop pointless of_device_get_match_data() cast in j721e (Bjorn Helgaas) - Drop redundant struct device * from j721e since struct cdns_pcie already has one (Bjorn Helgaas) - Rename driver structs to *_pcie in intel-gw, iproc, ls-gen4, mediatek-gen3, microchip, mt7621, rcar-gen2, tegra194, uniphier, xgene, xilinx, xilinx-cpm for consistency across drivers (Fan Fei) - Fix invalid address space conversions in hisi, spear13xx (Bjorn Helgaas) Miscellaneous: - Sort Intel Device IDs by value (Andy Shevchenko) - Change Capability offsets to hex to match spec (Baruch Siach) - Correct misspellings (Krzysztof Wilczyński) - Terminate statement with semicolon in pci_endpoint_test.c (Ming Wang)" * tag 'pci-v5.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (151 commits) PCI: mt7621: Allow COMPILE_TEST for all arches PCI: mt7621: Add missing MODULE_LICENSE() PCI: mt7621: Move MIPS setup to pcibios_root_bridge_prepare() PCI: Let pcibios_root_bridge_prepare() access bridge->windows PCI: mt7621: Declare mt7621_pci_ops static PCI: brcmstb: Do not turn off WOL regulators on suspend PCI: brcmstb: Add control of subdevice voltage regulators PCI: brcmstb: Add mechanism to turn on subdev regulators PCI: brcmstb: Split brcm_pcie_setup() into two funcs dt-bindings: PCI: Add bindings for Brcmstb EP voltage regulators dt-bindings: PCI: Correct brcmstb interrupts, interrupt-map. PCI: brcmstb: Fix function return value handling PCI: brcmstb: Do not use __GENMASK PCI: brcmstb: Declare 'used' as bitmap, not unsigned long PCI: hv: Add arm64 Hyper-V vPCI support PCI: hv: Make the code arch neutral by adding arch specific interfaces PCI: pciehp: Use down_read/write_nested(reset_lock) to fix lockdep errors x86/PCI: Remove initialization of static variables to false PCI: Use DWORD accesses for LTR, L1 SS to avoid erratum misc: pci_endpoint_test: Terminate statement with semicolon ...
2022-01-16Merge tag 'exfat-for-5.17-rc1' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - Fix ->i_blocks truncation issue that still exists elsewhere. - Four cleanups & typos fixes. - Move super block magic number to magic.h - Fix missing REQ_SYNC in exfat_update_bhs(). * tag 'exfat-for-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: fix missing REQ_SYNC in exfat_update_bhs() exfat: remove argument 'sector' from exfat_get_dentry() exfat: move super block magic number to magic.h exfat: fix i_blocks for files truncated over 4 GiB exfat: reuse exfat_inode_info variable instead of calling EXFAT_I() exfat: make exfat_find_location() static exfat: fix typos in comments exfat: simplify is_valid_cluster()
2022-01-16Merge tag 'nfsd-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds5-75/+56
Pull nfsd updates from Chuck Lever: "Bruce has announced he is leaving Red Hat at the end of the month and is stepping back from his role as NFSD co-maintainer. As a result, this includes a patch removing him from the MAINTAINERS file. There is one patch in here that Jeff Layton was carrying in the locks tree. Since he had only one for this cycle, he asked us to send it to you via the nfsd tree. There continues to be 0-day reports from Robert Morris @MIT. This time we include a fix for a crash in the COPY_NOTIFY operation. Highlights: - Bruce steps down as NFSD maintainer - Prepare for dynamic nfsd thread management - More work on supporting re-exporting NFS mounts - One fs/locks patch on behalf of Jeff Layton Notable bug fixes: - Fix zero-length NFSv3 WRITEs - Fix directory cinfo on FS's that do not support iversion - Fix WRITE verifiers for stable writes - Fix crash on COPY_NOTIFY with a special state ID" * tag 'nfsd-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (51 commits) SUNRPC: Fix sockaddr handling in svcsock_accept_class trace points SUNRPC: Fix sockaddr handling in the svc_xprt_create_error trace point fs/locks: fix fcntl_getlk64/fcntl_setlk64 stub prototypes nfsd: fix crash on COPY_NOTIFY with special stateid MAINTAINERS: remove bfields NFSD: Move fill_pre_wcc() and fill_post_wcc() Revert "nfsd: skip some unnecessary stats in the v4 case" NFSD: Trace boot verifier resets NFSD: Rename boot verifier functions NFSD: Clean up the nfsd_net::nfssvc_boot field NFSD: Write verifier might go backwards nfsd: Add a tracepoint for errors in nfsd4_clone_file_range() NFSD: De-duplicate net_generic(nf->nf_net, nfsd_net_id) NFSD: De-duplicate net_generic(SVC_NET(rqstp), nfsd_net_id) NFSD: Clean up nfsd_vfs_write() nfsd: Replace use of rwsem with errseq_t NFSD: Fix verifier returned in stable WRITEs nfsd: Retry once in nfsd_open on an -EOPENSTALE return nfsd: Add errno mapping for EREMOTEIO nfsd: map EBADF ...
2022-01-16Merge tag '9p-for-5.17-rc1' of git://github.com/martinetd/linuxLinus Torvalds2-3/+1
Pull 9p updates from Dominique Martinet: "Fixes, split 9p_net_fd, and new reviewer: - fix possible uninitialized memory usage for setattr - fix fscache reading hole in a file just after it's been grown - split net/9p/trans_fd.c in its own module like other transports. The new transport module defaults to 9P_NET and is autoloaded if required so users should not be impacted - add Christian Schoenebeck to 9p reviewers - some more trivial cleanup" * tag '9p-for-5.17-rc1' of git://github.com/martinetd/linux: 9p: fix enodata when reading growing file net/9p: show error message if user 'msize' cannot be satisfied MAINTAINERS: 9p: add Christian Schoenebeck as reviewer 9p: only copy valid iattrs in 9P2000.L setattr implementation 9p: Use BUG_ON instead of if condition followed by BUG. net/p9: load default transports 9p/xen: autoload when xenbus service is available 9p/trans_fd: split into dedicated module fs: 9p: remove unneeded variable 9p/trans_virtio: Fix typo in the comment for p9_virtio_create()
2022-01-16Merge tag 'drm-next-2022-01-14' of git://anongit.freedesktop.org/drm/drmLinus Torvalds1-1/+1
Pull drm fixes from Daniel Vetter: "drivers fixes: - i915 fixes for ttm backend + one pm wakelock fix - amdgpu fixes, fairly big pile of small things all over. Note this doesn't yet containe the fixed version of the otg sync patch that blew up - small driver fixes: meson, sun4i, vga16fb probe fix drm core fixes: - cma-buf heap locking - ttm compilation - self refresh helper state check - wrong error message in atomic helpers - mipi-dbi buffer mapping" * tag 'drm-next-2022-01-14' of git://anongit.freedesktop.org/drm/drm: (49 commits) drm/mipi-dbi: Fix source-buffer address in mipi_dbi_buf_copy drm: fix error found in some cases after the patch d1af5cd86997 drm/ttm: fix compilation on ARCH=um dma-buf: cma_heap: Fix mutex locking section video: vga16fb: Only probe for EGA and VGA 16 color graphic cards drm/amdkfd: Fix ASIC name typos drm/amdkfd: Fix DQM asserts on Hawaii drm/amdgpu: Use correct VIEWPORT_DIMENSION for DCN2 drm/amd/pm: only send GmiPwrDnControl msg on master die (v3) drm/amdgpu: use spin_lock_irqsave to avoid deadlock by local interrupt drm/amdgpu: not return error on the init_apu_flags drm/amdkfd: Use prange->update_list head for remove_list drm/amdkfd: Use prange->list head for insert_list drm/amdkfd: make SPDX License expression more sound drm/amdkfd: Check for null pointer after calling kmemdup drm/amd/display: invalid parameter check in dmub_hpd_callback Revert "drm/amdgpu: Don't inherit GEM object VMAs in child process" drm/amd/display: reset dcn31 SMU mailbox on failures drm/amdkfd: use default_groups in kobj_type drm/amdgpu: use default_groups in kobj_type ...
2022-01-16Merge tag 'memblock-v5.17-rc1' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock Pull memblock cleanup from Mike Rapoport: "Remove #ifdef __KERNEL__ from memblock.h memblock.h is not a uAPI header, so __KERNEL__ guard can be deleted" * tag 'memblock-v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: memblock: Remove #ifdef __KERNEL__ from memblock.h
2022-01-15Merge branch 'akpm' (patches from Andrew)Linus Torvalds30-260/+583
Merge misc updates from Andrew Morton: "146 patches. Subsystems affected by this patch series: kthread, ia64, scripts, ntfs, squashfs, ocfs2, vfs, and mm (slab-generic, slab, kmemleak, dax, kasan, debug, pagecache, gup, shmem, frontswap, memremap, memcg, selftests, pagemap, dma, vmalloc, memory-failure, hugetlb, userfaultfd, vmscan, mempolicy, oom-kill, hugetlbfs, migration, thp, ksm, page-poison, percpu, rmap, zswap, zram, cleanups, hmm, and damon)" * emailed patches from Andrew Morton <[email protected]>: (146 commits) mm/damon: hide kernel pointer from tracepoint event mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging mm/damon/dbgfs: remove an unnecessary variable mm/damon: move the implementation of damon_insert_region to damon.h mm/damon: add access checking for hugetlb pages Docs/admin-guide/mm/damon/usage: update for schemes statistics mm/damon/dbgfs: support all DAMOS stats Docs/admin-guide/mm/damon/reclaim: document statistics parameters mm/damon/reclaim: provide reclamation statistics mm/damon/schemes: account how many times quota limit has exceeded mm/damon/schemes: account scheme actions that successfully applied mm/damon: remove a mistakenly added comment for a future feature Docs/admin-guide/mm/damon/usage: update for kdamond_pid and (mk|rm)_contexts Docs/admin-guide/mm/damon/usage: mention tracepoint at the beginning Docs/admin-guide/mm/damon/usage: remove redundant information Docs/admin-guide/mm/damon/usage: update for scheme quotas and watermarks mm/damon: convert macro functions to static inline functions mm/damon: modify damon_rand() macro to static inline function mm/damon: move damon_rand() definition into damon.h ...
2022-01-15cifs: move superblock magic defitions to magic.hJeff Layton1-0/+4
Help userland apps to identify cifs and smb2 mounts. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Steve French <[email protected]>
2022-01-15mm/damon: hide kernel pointer from tracepoint eventSeongJae Park1-4/+4
DAMON's virtual address spaces monitoring primitive uses 'struct pid *' of the target process as its monitoring target id. The kernel address is exposed as-is to the user space via the DAMON tracepoint, 'damon_aggregated'. Though primarily only privileged users are allowed to access that, it would be better to avoid unnecessarily exposing kernel pointers so. Because the trace result is only required to be able to distinguish each target, we aren't need to use the pointer as-is. This makes the tracepoint to use the index of the target in the context's targets list as its id in the tracepoint, to hide the kernel space address. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: SeongJae Park <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm/damon: move the implementation of damon_insert_region to damon.hGuoqing Jiang1-2/+11
Usually, inline function is declared static since it should sit between storage and type. And implement it in a header file if used by multiple files. And this change also fixes compile issue when backport damon to 5.10. mm/damon/vaddr.c: In function `damon_va_evenly_split_region': ./include/linux/damon.h:425:13: error: inlining failed in call to `always_inline' `damon_insert_region': function body not available 425 | inline void damon_insert_region(struct damon_region *r, | ^~~~~~~~~~~~~~~~~~~ mm/damon/vaddr.c:86:3: note: called from here 86 | damon_insert_region(n, r, next, t); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Guoqing Jiang <[email protected]> Reviewed-by: SeongJae Park <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm/damon/schemes: account how many times quota limit has exceededSeongJae Park1-0/+2
If the time/space quotas of a given DAMON-based operation scheme is too small, the scheme could show unexpectedly slow progress. However, there is no good way to notice the case in runtime. This commit extends the DAMOS stat to provide how many times the quota limits exceeded so that the users can easily notice the case and tune the scheme. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: SeongJae Park <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm/damon/schemes: account scheme actions that successfully appliedSeongJae Park1-7/+21
Patch series "mm/damon/schemes: Extend stats for better online analysis and tuning". To help online access pattern analysis and tuning of DAMON-based Operation Schemes (DAMOS), DAMOS provides simple statistics for each scheme. Introduction of DAMOS time/space quota further made the tuning easier by making the risk management easier. However, that also made understanding of the working schemes a little bit more difficult. For an example, progress of a given scheme can now be throttled by not only the aggressiveness of the target access pattern, but also the time/space quotas. So, when a scheme is showing unexpectedly slow progress, it's difficult to know by what the progress of the scheme is throttled, with currently provided statistics. This patchset extends the statistics to contain some metrics that can be helpful for such online schemes analysis and tuning (patches 1-2), exports those to users (patches 3 and 5), and add documents (patches 4 and 6). This patch (of 6): DAMON-based operation schemes (DAMOS) stats provide only the number and the amount of regions that the action of the scheme has tried to be applied. Because the action could be failed for some reasons, the currently provided information is sometimes not useful or convenient enough for schemes profiling and tuning. To improve this situation, this commit extends the DAMOS stats to provide the number and the amount of regions that the action has successfully applied. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: SeongJae Park <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm/damon: remove a mistakenly added comment for a future featureSeongJae Park1-1/+1
Due to a mistake in patches reordering, a comment for a future feature called 'arbitrary monitoring target support'[1], which is still under development, has added. Because it only introduces confusion and we don't have a plan to post the patches soon, this commit removes the mistakenly added part. [1] https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Fixes: 1f366e421c8f ("mm/damon/core: implement DAMON-based Operation Schemes (DAMOS)") Signed-off-by: SeongJae Park <[email protected]> Cc: Jonathan Corbet <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm/damon: convert macro functions to static inline functionsSeongJae Park1-6/+12
Patch series "mm/damon: Misc cleanups". This patchset contains miscellaneous cleanups for DAMON's macro functions and documentation. This patch (of 6): This commit converts macro functions in DAMON to static inline functions, for better type checking, code documentation, etc[1]. [1] https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: SeongJae Park <[email protected]> Cc: Jonathan Corbet <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm/damon: modify damon_rand() macro to static inline functionXin Hao1-1/+4
damon_rand() cannot be implemented as a macro. Example: damon_rand(a++, b); The value of 'a' will be incremented twice, This is obviously unreasonable, So there fix it. Link: https://lkml.kernel.org/r/110ffcd4e420c86c42b41ce2bc9f0fe6a4f32cd3.1638795127.git.xhao@linux.alibaba.com Fixes: b9a6ac4e4ede ("mm/damon: adaptively adjust regions") Signed-off-by: Xin Hao <[email protected]> Reported-by: Andrew Morton <[email protected]> Reviewed-by: SeongJae Park <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm/damon: move damon_rand() definition into damon.hXin Hao1-0/+4
damon_rand() is called in three files:damon/core.c, damon/ paddr.c, damon/vaddr.c, i think there is no need to redefine this twice, So move it to damon.h will be a good choice. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Xin Hao <[email protected]> Reviewed-by: SeongJae Park <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm/damon: remove some unneeded function definitions in damon.hXin Hao1-21/+0
In damon.h some func definitions about VA & PA can only be used in its own file, so there no need to define in the header file, and the header file will look cleaner. If other files later need these functions, the prototypes can be added to damon.h at that time. [[email protected]: remove unnecessary function prototype position changes] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/45fd5b3ef6cce8e28dbc1c92f9dc845ccfc949d7.1636989871.git.xhao@linux.alibaba.com Signed-off-by: Xin Hao <[email protected]> Signed-off-by: SeongJae Park <[email protected]> Reviewed-by: SeongJae Park <[email protected]> Cc: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm/damon: add 'age' of region tracepoint supportXin Hao1-2/+5
In Damon, we can get age information by analyzing the nr_access change, But short time sampling is not effective, we have to obtain enough data for analysis through long time trace, this also means that we need to consume more cpu resources and storage space. Now the region add a new 'age' variable, we only need to get the change of age value through a little time trace, for example, age has been increasing to 141, but nr_access shows a value of 0 at the same time, Through this,we can conclude that the region has a very low nr_access value for a long time. Link: https://lkml.kernel.org/r/b9def1262af95e0dc1d0caea447886434db01161.1636989871.git.xhao@linux.alibaba.com Signed-off-by: Xin Hao <[email protected]> Reviewed-by: SeongJae Park <[email protected]> Cc: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm: make some vars and functions static or __initTing Liu1-1/+0
"page_idle_ops" as a global var, but its scope of use within this document. So it should be static. "page_ext_ops" is a var used in the kernel initial phase. And other functions are aslo used in the kernel initial phase. So they should be __init or __initdata to reclaim memory. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ting Liu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm/rmap: fix potential batched TLB flush raceHuang Ying1-1/+1
In theory, the following race is possible for batched TLB flushing. CPU0 CPU1 ---- ---- shrink_page_list() unmap zap_pte_range() flush_tlb_batched_pending() flush_tlb_mm() try_to_unmap() set_tlb_ubc_flush_pending() mm->tlb_flush_batched = true mm->tlb_flush_batched = false After the TLB is flushed on CPU1 via flush_tlb_mm() and before mm->tlb_flush_batched is set to false, some PTE is unmapped on CPU0 and the TLB flushing is pended. Then the pended TLB flushing will be lost. Although both set_tlb_ubc_flush_pending() and flush_tlb_batched_pending() are called with PTL locked, different PTL instances may be used. Because the race window is really small, and the lost TLB flushing will cause problem only if a TLB entry is inserted before the unmapping in the race window, the race is only theoretical. But the fix is simple and cheap too. Syzbot has reported this too as follows: ================================================================== BUG: KCSAN: data-race in flush_tlb_batched_pending / try_to_unmap_one write to 0xffff8881072cfbbc of 1 bytes by task 17406 on cpu 1: flush_tlb_batched_pending+0x5f/0x80 mm/rmap.c:691 madvise_free_pte_range+0xee/0x7d0 mm/madvise.c:594 walk_pmd_range mm/pagewalk.c:128 [inline] walk_pud_range mm/pagewalk.c:205 [inline] walk_p4d_range mm/pagewalk.c:240 [inline] walk_pgd_range mm/pagewalk.c:277 [inline] __walk_page_range+0x981/0x1160 mm/pagewalk.c:379 walk_page_range+0x131/0x300 mm/pagewalk.c:475 madvise_free_single_vma mm/madvise.c:734 [inline] madvise_dontneed_free mm/madvise.c:822 [inline] madvise_vma mm/madvise.c:996 [inline] do_madvise+0xe4a/0x1140 mm/madvise.c:1202 __do_sys_madvise mm/madvise.c:1228 [inline] __se_sys_madvise mm/madvise.c:1226 [inline] __x64_sys_madvise+0x5d/0x70 mm/madvise.c:1226 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae write to 0xffff8881072cfbbc of 1 bytes by task 71 on cpu 0: set_tlb_ubc_flush_pending mm/rmap.c:636 [inline] try_to_unmap_one+0x60e/0x1220 mm/rmap.c:1515 rmap_walk_anon+0x2fb/0x470 mm/rmap.c:2301 try_to_unmap+0xec/0x110 shrink_page_list+0xe91/0x2620 mm/vmscan.c:1719 shrink_inactive_list+0x3fb/0x730 mm/vmscan.c:2394 shrink_list mm/vmscan.c:2621 [inline] shrink_lruvec+0x3c9/0x710 mm/vmscan.c:2940 shrink_node_memcgs+0x23e/0x410 mm/vmscan.c:3129 shrink_node+0x8f6/0x1190 mm/vmscan.c:3252 kswapd_shrink_node mm/vmscan.c:4022 [inline] balance_pgdat+0x702/0xd30 mm/vmscan.c:4213 kswapd+0x200/0x340 mm/vmscan.c:4473 kthread+0x2c7/0x2e0 kernel/kthread.c:327 ret_from_fork+0x1f/0x30 value changed: 0x01 -> 0x00 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 71 Comm: kswapd0 Not tainted 5.16.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 ================================================================== [[email protected]: tweak comments] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: "Huang, Ying" <[email protected]> Reported-by: [email protected] Cc: Nadav Amit <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yu Zhao <[email protected]> Cc: Marco Elver <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm/hwpoison: fix unpoison_memory()Naoya Horiguchi2-0/+5
After recent soft-offline rework, error pages can be taken off from buddy allocator, but the existing unpoison_memory() does not properly undo the operation. Moreover, due to the recent change on __get_hwpoison_page(), get_page_unless_zero() is hardly called for hwpoisoned pages. So __get_hwpoison_page() highly likely returns -EBUSY (meaning to fail to grab page refcount) and unpoison just clears PG_hwpoison without releasing a refcount. That does not lead to a critical issue like kernel panic, but unpoisoned pages never get back to buddy (leaked permanently), which is not good. To (partially) fix this, we need to identify "taken off" pages from other types of hwpoisoned pages. We can't use refcount or page flags for this purpose, so a pseudo flag is defined by hacking ->private field. Someone might think that put_page() is enough to cancel taken-off pages, but the normal free path contains some operations not suitable for the current purpose, and can fire VM_BUG_ON(). Note that unpoison_memory() is now supposed to be cancel hwpoison events injected only by madvise() or /sys/devices/system/memory/{hard,soft}_offline_page, not by MCE injection, so please don't try to use unpoison when testing with MCE injection. [[email protected]: report build failure for ARCH=i386] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Naoya Horiguchi <[email protected]> Reviewed-by: Yang Shi <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Ding Hui <[email protected]> Cc: Tony Luck <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm/hwpoison: remove MF_MSG_BUDDY_2ND and MF_MSG_POISONED_HUGENaoya Horiguchi2-4/+0
These action_page_types are no longer used, so remove them. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Naoya Horiguchi <[email protected]> Acked-by: Yang Shi <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Ding Hui <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Peter Xu <[email protected]> Cc: Tony Luck <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm/thp: drop unused trace events hugepage_[invalidate|splitting]Anshuman Khandual1-35/+0
The trace events hugepage_[invalidate|splitting], were added via the commit 9e813308a5c1 ("powerpc/thp: Add tracepoints to track hugepage invalidate"). Afterwards their call sites i.e trace_hugepage_[invalidate|splitting] were just dropped off, leaving these trace points unused. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm: compaction: fix the migration stats in trace_mm_compaction_migratepages()Baolin Wang1-20/+4
Now the migrate_pages() has changed to return the number of {normal page, THP, hugetlb} instead, thus we should not use the return value to calculate the number of pages migrated successfully. Instead we can just use the 'nr_succeeded' which indicates the number of normal pages migrated successfully to calculate the non-migrated pages in trace_mm_compaction_migratepages(). Link: https://lkml.kernel.org/r/b4225251c4bec068dcd90d275ab7de88a39e2bd7.1636275127.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <[email protected]> Reviewed-by: Steven Rostedt (VMware) <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm/mempolicy: wire up syscall set_mempolicy_home_nodeAneesh Kumar K.V2-1/+7
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Cc: Ben Widawsky <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Feng Tang <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Dan Williams <[email protected]> Cc: Huang Ying <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm/mempolicy: add set_mempolicy_home_node syscallAneesh Kumar K.V1-0/+1
This syscall can be used to set a home node for the MPOL_BIND and MPOL_PREFERRED_MANY memory policy. Users should use this syscall after setting up a memory policy for the specified range as shown below. mbind(p, nr_pages * page_size, MPOL_BIND, new_nodes->maskp, new_nodes->size + 1, 0); sys_set_mempolicy_home_node((unsigned long)p, nr_pages * page_size, home_node, 0); The syscall allows specifying a home node/preferred node from which kernel will fulfill memory allocation requests first. For address range with MPOL_BIND memory policy, if nodemask specifies more than one node, page allocations will come from the node in the nodemask with sufficient free memory that is closest to the home node/preferred node. For MPOL_PREFERRED_MANY if the nodemask specifies more than one node, page allocation will come from the node in the nodemask with sufficient free memory that is closest to the home node/preferred node. If there is not enough memory in all the nodes specified in the nodemask, the allocation will be attempted from the closest numa node to the home node in the system. This helps applications to hint at a memory allocation preference node and fallback to _only_ a set of nodes if the memory is not available on the preferred node. Fallback allocation is attempted from the node which is nearest to the preferred node. This helps applications to have control on memory allocation numa nodes and avoids default fallback to slow memory NUMA nodes. For example a system with NUMA nodes 1,2 and 3 with DRAM memory and 10, 11 and 12 of slow memory new_nodes = numa_bitmask_alloc(nr_nodes); numa_bitmask_setbit(new_nodes, 1); numa_bitmask_setbit(new_nodes, 2); numa_bitmask_setbit(new_nodes, 3); p = mmap(NULL, nr_pages * page_size, protflag, mapflag, -1, 0); mbind(p, nr_pages * page_size, MPOL_BIND, new_nodes->maskp, new_nodes->size + 1, 0); sys_set_mempolicy_home_node(p, nr_pages * page_size, 2, 0); This will allocate from nodes closer to node 2 and will make sure the kernel will only allocate from nodes 1, 2, and 3. Memory will not be allocated from slow memory nodes 10, 11, and 12. This differs from default MPOL_BIND behavior in that with default MPOL_BIND the allocation will be attempted from node closer to the local node. One of the reasons to specify a home node is to allow allocations from cpu less NUMA node and its nearby NUMA nodes. With MPOL_PREFERRED_MANY on the other hand will first try to allocate from the closest node to node 2 from the node list 1, 2 and 3. If those nodes don't have enough memory, kernel will allocate from slow memory node 10, 11 and 12 which ever is closer to node 2. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Cc: Ben Widawsky <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Feng Tang <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Dan Williams <[email protected]> Cc: Huang Ying <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15vmscan: make drop_slab_node staticGang Li1-1/+0
drop_slab_node is only used in drop_slab. So remove it's declaration from header file and add keyword static for it's definition. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Gang Li <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm/vmstat: add events for THP max_ptes_* exceedsYang Yang1-0/+3
There are interfaces to adjust max_ptes_none, max_ptes_swap, max_ptes_shared values, see /sys/kernel/mm/transparent_hugepage/khugepaged/. But system administrator may not know which value is the best. So Add those events to support adjusting max_ptes_* to suitable values. For example, if default max_ptes_swap value causes too much failures, and system uses zram whose IO is fast, administrator could increase max_ptes_swap until THP_SCAN_EXCEED_SWAP_PTE not increase anymore. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yang Yang <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Saravanan D <[email protected]> Cc: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>