blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2023-04-20	net/mlx5: Add vnic devlink health reporter to PFs/VFs	Maher Sanalla	1	-0/+1
	Create a vnic devlink health reporter for PFs/VFs interfaces. The reporter's diagnose callback displays the values of vNIC/vport transport debug counters of PFs/VFs, as follows: $ devlink health diagnose pci/0000:08:00.0 reporter vnic vNIC env counters: total_error_queues: 0 send_queue_priority_update_flow: 0 comp_eq_overrun: 0 async_eq_overrun: 0 cq_overrun: 0 invalid_command: 0 quota_exceeded_command: 0 nic_receive_steering_discard: 0 Moreover, add documentation on the reporter functionality and the counters description. While at it, expose the vNIC counters diagnose function to be used by the downstream patch, which will reveal the counters for representor interfaces. Signed-off-by: Maher Sanalla <[email protected]> Reviewed-by: Moshe Shemesh <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-04-20	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	3	-25/+24
	Adjacent changes: net/mptcp/protocol.h 63740448a32e ("mptcp: fix accept vs worker race") 2a6a870e44dd ("mptcp: stops worker on unaccepted sockets at listener close") ddb1a072f858 ("mptcp: move first subflow allocation at mpc access time") Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-20	Merge tag 'net-6.3-rc8' of ↵	Linus Torvalds	2	-7/+3
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter and bpf. There are a few fixes for new code bugs, including the Mellanox one noted in the last networking pull. No known regressions outstanding. Current release - regressions: - sched: clear actions pointer in miss cookie init fail - mptcp: fix accept vs worker race - bpf: fix bpf_arch_text_poke() with new_addr == NULL on s390 - eth: bnxt_en: fix a possible NULL pointer dereference in unload path - eth: veth: take into account peer device for NETDEV_XDP_ACT_NDO_XMIT xdp_features flag Current release - new code bugs: - eth: revert "net/mlx5: Enable management PF initialization" Previous releases - regressions: - netfilter: fix recent physdev match breakage - bpf: fix incorrect verifier pruning due to missing register precision taints - eth: virtio_net: fix overflow inside xdp_linearize_page() - eth: cxgb4: fix use after free bugs caused by circular dependency problem - eth: mlxsw: pci: fix possible crash during initialization Previous releases - always broken: - sched: sch_qfq: prevent slab-out-of-bounds in qfq_activate_agg - netfilter: validate catch-all set elements - bridge: don't notify FDB entries with "master dynamic" - eth: bonding: fix memory leak when changing bond type to ethernet - eth: i40e: fix accessing vsi->active_filters without holding lock Misc: - Mat is back as MPTCP co-maintainer" * tag 'net-6.3-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (33 commits) net: bridge: switchdev: don't notify FDB entries with "master dynamic" Revert "net/mlx5: Enable management PF initialization" MAINTAINERS: Resume MPTCP co-maintainer role mailmap: add entries for Mat Martineau e1000e: Disable TSO on i219-LM card to increase speed bnxt_en: fix free-runnig PHC mode net: dsa: microchip: ksz8795: Correctly handle huge frame configuration bpf: Fix incorrect verifier pruning due to missing register precision taints hamradio: drop ISA_DMA_API dependency mlxsw: pci: Fix possible crash during initialization mptcp: fix accept vs worker race mptcp: stops worker on unaccepted sockets at listener close net: rpl: fix rpl header size calculation net: vmxnet3: Fix NULL pointer dereference in vmxnet3_rq_rx_complete() bonding: Fix memory leak when changing bond type to Ethernet veth: take into account peer device for NETDEV_XDP_ACT_NDO_XMIT xdp_features flag mlxfw: fix null-ptr-deref in mlxfw_mfa2_tlv_next() bnxt_en: Fix a possible NULL pointer dereference in unload path bnxt_en: Do not initialize PTP on older P3/P4 chips netfilter: nf_tables: tighten netlink attribute requirements for catch-all elements ...
2023-04-20	PM: Add sysfs files to represent time spent in hardware sleep state	Mario Limonciello	1	-0/+8
	Userspace can't easily discover how much of a sleep cycle was spent in a hardware sleep state without using kernel tracing and vendor specific sysfs or debugfs files. To make this information more discoverable, introduce 3 new sysfs files: 1) The time spent in a hw sleep state for last cycle. 2) The time spent in a hw sleep state since the kernel booted 3) The maximum time that the hardware can report for a sleep cycle. All of these files will be present only if the system supports s2idle. Reviewed-by: Hans de Goede <[email protected]> Signed-off-by: Mario Limonciello <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-04-20	Merge branch 'for-next/perf' into for-next/core	Will Deacon	1	-0/+303
	* for-next/perf: (24 commits) KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege drivers/perf: hisi: add NULL check for name drivers/perf: hisi: Remove redundant initialized of pmu->name perf/arm-cmn: Fix port detection for CMN-700 arm64: pmuv3: dynamically map PERF_COUNT_HW_BRANCH_INSTRUCTIONS perf/arm-cmn: Validate cycles events fully Revert "ARM: mach-virt: Select PMUv3 driver by default" drivers/perf: apple_m1: Add Apple M2 support dt-bindings: arm-pmu: Add PMU compatible strings for Apple M2 cores perf: arm_cspmu: Fix variable dereference warning perf/amlogic: Fix config1/config2 parsing issue drivers/perf: Use devm_platform_get_and_ioremap_resource() kbuild, drivers/perf: remove MODULE_LICENSE in non-modules perf: qcom: Use devm_platform_get_and_ioremap_resource() perf: arm: Use devm_platform_get_and_ioremap_resource() perf/arm-cmn: Move overlapping wp_combine field ARM: mach-virt: Select PMUv3 driver by default ARM: perf: Allow the use of the PMUv3 driver on 32bit ARM ARM: Make CONFIG_CPU_V7 valid for 32bit ARMv8 implementations perf: pmuv3: Change GENMASK to GENMASK_ULL ...
2023-04-20	efi/pe: Import new BTI/IBT header flags from the spec	Ard Biesheuvel	1	-0/+4
	The latest version of your favorite fork of the PE/COFF spec includes a new type of header flag that is intended to be used in the context of EFI firmware to indicate to the image loader that the executable regions of an image can be mapped with BTI/IBT enforcement enabled. So let's import these definitions so we can use them in subsequent patches. Signed-off-by: Ard Biesheuvel <[email protected]>
2023-04-20	swiotlb: Omit total_used and used_hiwater if !CONFIG_DEBUG_FS	Petr Tesarik	1	-0/+2
	The tracking of used_hiwater adds an atomic operation to the hot path. This is acceptable only when debugging the kernel. To make sure that the fields can never be used by mistake, do not even include them in struct io_tlb_mem if CONFIG_DEBUG_FS is not set. The build fails after doing that. To fix it, it is necessary to remove all code specific to debugfs and instead provide a stub implementation of swiotlb_create_debugfs_files(). As a bonus, this change allows to remove one __maybe_unused attribute. Signed-off-by: Petr Tesarik <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-04-20	device property: make device_property functions take const device *	Guenter Roeck	1	-18/+18
	device_property functions do not modify the device pointer passed to them. The underlying of_device and fwnode_ functions actually already take const * arguments. Mark the parameter constant to simplify conversion from of_property to device_property functions, and to let the calling code use const device pointers where possible. Cc: Chris Packham <[email protected]> Reviewed-by: Chris Packham <[email protected]> Reviewed-by: Sakari Ailus <[email protected]> Signed-off-by: Guenter Roeck <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-04-20	Merge branch 'for-next/misc' into for-next/core	Will Deacon	1	-1/+0
	* for-next/misc: arm64: kexec: include reboot.h arm64: delete dead code in this_cpu_set_vectors() arm64: kernel: Fix kernel warning when nokaslr is passed to commandline arm64: kgdb: Set PSTATE.SS to 1 to re-enable single-step arm64/sme: Fix some comments of ARM SME arm64/signal: Alloc tpidr2 sigframe after checking system_supports_tpidr2() arm64/signal: Use system_supports_tpidr2() to check TPIDR2 arm64: compat: Remove defines now in asm-generic arm64: kexec: remove unnecessary (void) conversions arm64: armv8_deprecated: remove unnecessary (void) conversions firmware: arm_sdei: Fix sleep from invalid context BUG
2023-04-20	i2c: designware: Use PCI PSP driver for communication	Mario Limonciello	1	-0/+1
	Currently the PSP semaphore communication base address is discovered by using an MSR that is not architecturally guaranteed for future platforms. Also the mailbox that is utilized for communication with the PSP may have other consumers in the kernel, so it's better to make all communication go through a single driver. Signed-off-by: Mario Limonciello <[email protected]> Reviewed-by: Mark Hasemeyer <[email protected]> Acked-by: Jarkko Nikula <[email protected]> Tested-by: Mark Hasemeyer <[email protected]> Acked-by: Wolfram Sang <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2023-04-20	crypto: api - Add crypto_tfm_get	Herbert Xu	1	-0/+1
	Add a crypto_tfm_get interface to allow tfm objects to be shared. They can still be freed in the usual way. This should only be done with tfm objects with no keys. You must also not modify the tfm flags in any way once it becomes shared. Signed-off-by: Herbert Xu <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2023-04-20	USB: core: Add routines for endpoint checks in old drivers	Alan Stern	1	-0/+5
	Many of the older USB drivers in the Linux USB stack were written based simply on a vendor's device specification. They use the endpoint information in the spec and assume these endpoints will always be present, with the properties listed, in any device matching the given vendor and product IDs. While that may have been true back then, with spoofing and fuzzing it is not true any more. More and more we are finding that those old drivers need to perform at least a minimum of checking before they try to use any endpoint other than ep0. To make this checking as simple as possible, we now add a couple of utility routines to the USB core. usb_check_bulk_endpoints() and usb_check_int_endpoints() take an interface pointer together with a list of endpoint addresses (numbers and directions). They check that the interface's current alternate setting includes endpoints with those addresses and that each of these endpoints has the right type: bulk or interrupt, respectively. Although we already have usb_find_common_endpoints() and related routines meant for a similar purpose, they are not well suited for this kind of checking. Those routines find endpoints of various kinds, but only one (either the first or the last) of each kind, and they don't verify that the endpoints' addresses agree with what the caller expects. In theory the new routines could be more general: They could take a particular altsetting as their argument instead of always using the interface's current altsetting. In practice I think this won't matter too much; multiple altsettings tend to be used for transferring media (audio or visual) over isochronous endpoints, not bulk or interrupt. Drivers for such devices will generally require more sophisticated checking than these simplistic routines provide. Signed-off-by: Alan Stern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-04-19	Revert "net/mlx5: Enable management PF initialization"	Jakub Kicinski	1	-5/+0
	This reverts commit fe998a3c77b9f989a30a2a01fb00d3729a6d53a4. Paul reports that it causes a regression with IB on CX4 and FW 12.18.1000. In addition I think that the concept of "management PF" is not fully accepted and requires a discussion. Fixes: fe998a3c77b9 ("net/mlx5: Enable management PF initialization") Reported-by: Paul Moore <[email protected]> Link: https://lore.kernel.org/all/CAHC9VhQ7A4+msL38WpbOMYjAqLp0EtOjeLh4Dc6SQtD6OUvCQg@mail.gmail.com/ Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-19	Merge tag 'mm-hotfixes-stable-2023-04-19-16-36' of ↵	Linus Torvalds	1	-18/+21
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "22 hotfixes. 19 are cc:stable and the remainder address issues which were introduced during this merge cycle, or aren't considered suitable for -stable backporting. 19 are for MM and the remainder are for other subsystems" * tag 'mm-hotfixes-stable-2023-04-19-16-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits) nilfs2: initialize unused bytes in segment summary blocks mm: page_alloc: skip regions with hugetlbfs pages when allocating 1G pages mm/mmap: regression fix for unmapped_area{_topdown} maple_tree: fix mas_empty_area() search maple_tree: make maple state reusable after mas_empty_area_rev() mm: kmsan: handle alloc failures in kmsan_ioremap_page_range() mm: kmsan: handle alloc failures in kmsan_vmap_pages_range_noflush() tools/Makefile: do missed s/vm/mm/ mm: fix memory leak on mm_init error handling mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock kernel/sys.c: fix and improve control flow in __sys_setres[ug]id() Revert "userfaultfd: don't fail on unrecognized features" writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs maple_tree: fix a potential memory leak, OOB access, or other unpredictable bug tools/mm/page_owner_sort.c: fix TGID output when cull=tg is used mailmap: update jtoppins' entry to reference correct email mm/mempolicy: fix use-after-free of VMA iterator mm/huge_memory.c: warn with pr_warn_ratelimited instead of VM_WARN_ON_ONCE_FOLIO mm/mprotect: fix do_mprotect_pkey() return on error mm/khugepaged: check again on anon uffd-wp during isolation ...
2023-04-19	SUNRPC: remove the maximum number of retries in call_bind_status	Dai Ngo	1	-2/+1
	Currently call_bind_status places a hard limit of 3 to the number of retries on EACCES error. This limit was done to prevent NLM unlock requests from being hang forever when the server keeps returning garbage. However this change causes problem for cases when NLM service takes longer than 9 seconds to register with the port mapper after a restart. This patch removes this hard coded limit and let the RPC handles the retry based on the standard hard/soft task semantics. Fixes: 0b760113a3a1 ("NLM: Don't hang forever on NLM unlock requests") Reported-by: Helen Chao <[email protected]> Tested-by: Helen Chao <[email protected]> Signed-off-by: Dai Ngo <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2023-04-19	sed-opal: geometry feature reporting command	Ondrej Kozina	1	-0/+1
	Locking range start and locking range length attributes may be require to satisfy restrictions exposed by OPAL2 geometry feature reporting. Geometry reporting feature is described in TCG OPAL SSC, section 3.1.1.4 (ALIGN, LogicalBlockSize, AlignmentGranularity and LowestAlignedLBA). 4.3.5.2.1.1 RangeStart Behavior: [ StartAlignment = (RangeStart modulo AlignmentGranularity) - LowestAlignedLBA ] When processing a Set method or CreateRow method on the Locking table for a non-Global Range row, if: a) the AlignmentRequired (ALIGN above) column in the LockingInfo table is TRUE; b) RangeStart is non-zero; and c) StartAlignment is non-zero, then the method SHALL fail and return an error status code INVALID_PARAMETER. 4.3.5.2.1.2 RangeLength Behavior: If RangeStart is zero, then [ LengthAlignment = (RangeLength modulo AlignmentGranularity) - LowestAlignedLBA ] If RangeStart is non-zero, then [ LengthAlignment = (RangeLength modulo AlignmentGranularity) ] When processing a Set method or CreateRow method on the Locking table for a non-Global Range row, if: a) the AlignmentRequired (ALIGN above) column in the LockingInfo table is TRUE; b) RangeLength is non-zero; and c) LengthAlignment is non-zero, then the method SHALL fail and return an error status code INVALID_PARAMETER In userspace we stuck to logical block size reported by general block device (via sysfs or ioctl), but we can not read 'AlignmentGranularity' or 'LowestAlignedLBA' anywhere else and we need to get those values from sed-opal interface otherwise we will not be able to report or avoid locking range setup INVALID_PARAMETER errors above. Signed-off-by: Ondrej Kozina <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Christian Brauner <[email protected]> Tested-by: Milan Broz <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-19	Merge tag 'nand/for-6.4' into mtd/next	Miquel Raynal	2	-1/+2
	Raw NAND core changes: * Convert to platform remove callback returning void * Fix spelling mistake waifunc() -> waitfunc() Raw NAND controller driver changes: * imx: Remove unused is_imx51_nfc and imx53_nfc functions * omap2: Drop obsolete dependency on COMPILE_TEST * orion: Use devm_platform_ioremap_resource() * qcom: - Use of_property_present() for testing DT property presence - Use devm_platform_get_and_ioremap_resource() * stm32_fmc2: Depends on ARCH_STM32 instead of MACH_STM32MP157 * tmio: Remove reference to config MTD_NAND_TMIO in the parsers Raw NAND manufacturer driver changes: * hynix: Fix up bit 0 of sdr_timing_mode SPI-NAND changes: * Add support for ESMT F50x1G41LB Signed-off-by: Miquel Raynal <[email protected]>
2023-04-19	Merge tag 'cacheinfo-updates-6.4' of ↵	Greg Kroah-Hartman	1	-0/+8
	git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into driver-core-next Sudeep writes: cacheinfo and arch_topology updates for v6.4 The cache information can be extracted from either a Device Tree(DT), the PPTT ACPI table, or arch registers (clidr_el1 for arm64). When the DT is used but no cache properties are advertised, the current code doesn't correctly fallback to using arch information. The changes fixes the same and also assuse the that L1 data/instruction caches are private and L2/higher caches are shared when the cache information is missing in DT/ACPI and is derived form clidr_el1/arch registers. Currently the cacheinfo is built from the primary CPU prior to secondary CPUs boot, if the DT/ACPI description contains cache information. However, if not present, it still reverts to the old behavior, which allocates the cacheinfo memory on each secondary CPUs which causes RT kernels to triggers a "BUG: sleeping function called from invalid context". The changes here attempts to enable automatic detection for RT kernels when no DT/ACPI cache information is available, by pre-allocating cacheinfo memory on the primary CPU. * tag 'cacheinfo-updates-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux: cacheinfo: Add use_arch[\|_cache]_info field/function arch_topology: Remove early cacheinfo error message if -ENOENT cacheinfo: Check cache properties are present in DT cacheinfo: Check sib_leaf in cache_leaves_are_shared() cacheinfo: Allow early level detection when DT/ACPI info is missing/broken cacheinfo: Add arm64 early level initializer implementation cacheinfo: Add arch specific early level initializer
2023-04-19	Merge tag 'icc-6.4-rc1' of ↵	Greg Kroah-Hartman	1	-17/+0
	git://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc into char-misc-next Georgi writes: interconnect changes for 6.4 This pull request contains the interconnect changes for the 6.4-rc1 merge window, which this time are mostly cleanups. Core changes: interconnect: Skip call into provider if initial bw is zero interconnect: Use of_property_present() for testing DT property presence interconnect: drop racy registration API interconnect: drop unused icc_link_destroy() interface Driver changes: interconnect: qcom: Drop obsolete dependency on COMPILE_TEST interconnect: qcom: drop obsolete OSM_L3/EPSS defines interconnect: qcom: osm-l3: drop unuserd header inclusion interconnect: qcom: rpm: drop bogus pm domain attach interconnect: qcom: rpm: make QoS INVALID default interconnect: qcom: rpm: Add support for specifying channel num interconnect: qcom: Sort kerneldoc entries dt-bindings: interconnect: OSM L3: Add SM6375 CPUCP compatible dt-bindings: interconnect: qcom,msm8998-bwmon: Resolve MSM8998 support Signed-off-by: Georgi Djakov <[email protected]> * tag 'icc-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc: dt-bindings: interconnect: qcom,msm8998-bwmon: Resolve MSM8998 support dt-bindings: interconnect: OSM L3: Add SM6375 CPUCP compatible interconnect: qcom: Sort kerneldoc entries interconnect: qcom: rpm: Add support for specifying channel num interconnect: qcom: rpm: make QoS INVALID default interconnect: qcom: rpm: drop bogus pm domain attach interconnect: drop unused icc_link_destroy() interface interconnect: drop racy registration API interconnect: Use of_property_present() for testing DT property presence interconnect: qcom: osm-l3: drop unuserd header inclusion interconnect: qcom: drop obsolete OSM_L3/EPSS defines interconnect: Skip call into provider if initial bw is zero interconnect: qcom: Drop obsolete dependency on COMPILE_TEST
2023-04-19	Merge tag 'mhi-for-v6.4' of ↵	Greg Kroah-Hartman	1	-7/+0
	git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi into char-misc-next Manivannan writes: MHI Host ======== Core ---- - Removed the "mhi_poll()" API as there are no in-kernel users available at the moment. - Added range check for the CHDBOFF and ERDBOFF registers in case the device reports bad values. - Fixed the errno for the rest of the range checks to use -ERANGE. - Modified the event ring handlers to ring the doorbell only if there are any pending elements in the ring to process for the device. - Removed the check for EE (Execution Environment) while processing the SYS_ERR transition as it creates device recovery issues when SBL (Secondary Bootloader) crashes early. - Used mhi_tryset_pm_state() API to set the error state instead of open coding if the firmware loading fails. This avoids the race with other pm_state updates. pci_generic ----------- - Dropped the dedundant pci_{enable/disable}_pcie_error_reporting() calls from driver probe's error path as the PCI core itself takes care of that now. - Revered the commit 2d5253a096c6 ("bus: mhi: host: pci_generic: Add a secondary AT port to Telit FN990") as it turned out to be erroneous. This happened due to the patch adding secondary AT port for FN990 getting applied through NET and MHI trees and this caused two commits for the same functionality but one of them ended up wrong. - Added support for Foxconn T99W510 modem based on SDX24 chipset from Qualcomm. MHI Endpoint ============ - Demoted the channel not supported error log to debug as not all devices will support all channels defined in MHI spec and this may spam users. * tag 'mhi-for-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi: bus: mhi: host: Use mhi_tryset_pm_state() for setting fw error state bus: mhi: host: Remove duplicate ee check for syserr bus: mhi: host: Avoid ringing EV DB if there are no elements to process bus: mhi: pci_generic: Add Foxconn T99W510 bus: mhi: host: Use ERANGE for BHIOFF/BHIEOFF range check bus: mhi: host: Range check CHDBOFF and ERDBOFF bus: mhi: host: pci_generic: Revert "Add a secondary AT port to Telit FN990" bus: mhi: host: pci_generic: Drop redundant pci_enable_pcie_error_reporting() bus: mhi: ep: Demote unsupported channel error log to debug bus: mhi: host: Remove mhi_poll() API
2023-04-19	net: skbuff: hide nf_trace and ipvs_property	Jakub Kicinski	1	-0/+4
	Accesses to nf_trace and ipvs_property are already wrapped by ifdefs where necessary. Don't allocate the bits for those fields at all if possible. Acked-by: Florian Westphal <[email protected]> Acked-by: Simon Horman <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-19	net: skbuff: push nf_trace down the bitfield	Jakub Kicinski	1	-3/+3
	nf_trace is a debug feature, AFAIU, and yet it sits oddly high in the sk_buff bitfield. Move it down, pushing up dst_pending_confirm and inner_protocol_type. Next change will make nf_trace optional (under Kconfig) and all optional fields should be placed after 2b fields to avoid 2b fields straddling bytes. dst_pending_confirm is L3, so it makes sense next to ignore_df. inner_protocol_type goes up just to keep the balance. Acked-by: Florian Westphal <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-19	net: skbuff: move alloc_cpu into a potential hole	Jakub Kicinski	1	-1/+2
	alloc_cpu is currently between 4 byte fields, so it's almost guaranteed to create a 2B hole. It has a knock on effect of creating a 4B hole after @end (and @end and @tail being in different cachelines). None of this matters hugely, but for kernel configs which don't enable all the features there may well be a 2B hole after the bitfield. Move alloc_cpu there. Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-19	net: skbuff: hide csum_not_inet when CONFIG_IP_SCTP not set	Jakub Kicinski	1	-0/+14
	SCTP is not universally deployed, allow hiding its bit from the skb. Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-19	net: skbuff: hide wifi_acked when CONFIG_WIRELESS not set	Jakub Kicinski	1	-0/+11
	Datacenter kernel builds will very likely not include WIRELESS, so let them shave 2 bits off the skb by hiding the wifi fields. Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Acked-by: Johannes Berg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-19	net: phy: phy_device: Call into the PHY driver to set LED blinking	Andrew Lunn	1	-0/+12
	Linux LEDs can be requested to perform hardware accelerated blinking. Pass this to the PHY driver, if it implements the op. Signed-off-by: Andrew Lunn <[email protected]> Signed-off-by: Christian Marangi <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-19	net: phy: phy_device: Call into the PHY driver to set LED brightness	Andrew Lunn	1	-0/+13
	Linux LEDs can be software controlled via the brightness file in /sys. LED drivers need to implement a brightness_set function which the core will call. Implement an intermediary in phy_device, which will call into the phy driver if it implements the necessary function. Signed-off-by: Andrew Lunn <[email protected]> Signed-off-by: Christian Marangi <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-19	net: phy: Add a binding for PHY LEDs	Andrew Lunn	1	-0/+16
	Define common binding parsing for all PHY drivers with LEDs using phylib. Parse the DT as part of the phy_probe and add LEDs to the linux LED class infrastructure. For the moment, provide a dummy brightness function, which will later be replaced with a call into the PHY driver. This allows testing since the LED core might otherwise reject an LED whose brightness cannot be set. Add a dependency on LED_CLASS. It either needs to be built in, or not enabled, since a modular build can result in linker errors. Signed-off-by: Andrew Lunn <[email protected]> Signed-off-by: Christian Marangi <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-19	leds: Provide stubs for when CLASS_LED & NEW_LEDS are disabled	Andrew Lunn	1	-0/+18
	Provide stubs for devm_led_classdev_register_ext() and led_init_default_state_get() so that LED drivers embedded within other drivers such as PHYs and Ethernet switches still build when LEDS_CLASS or NEW_LEDS are disabled. This also helps with Kconfig dependencies, which are somewhat hairy for phylib and mdio and only get worse when adding a dependency on LED_CLASS. Signed-off-by: Andrew Lunn <[email protected]> Signed-off-by: Christian Marangi <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-18	bonding: add software tx timestamping support	Hangbin Liu	1	-0/+1
	Currently, bonding only obtain the timestamp (ts) information of the active slave, which is available only for modes 1, 5, and 6. For other modes, bonding only has software rx timestamping support. However, some users who use modes such as LACP also want tx timestamp support. To address this issue, let's check the ts information of each slave. If all slaves support tx timestamping, we can enable tx timestamping support for the bond. Add a note that the get_ts_info may be called with RCU, or rtnl or reference on the device in ethtool.h> Suggested-by: Miroslav Lichvar <[email protected]> Signed-off-by: Hangbin Liu <[email protected]> Acked-by: Jay Vosburgh <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-18	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf	Jakub Kicinski	1	-2/+3
	Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Unbreak br_netfilter physdev match support, from Florian Westphal. 2) Use GFP_KERNEL_ACCOUNT for stateful/policy objects, from Chen Aotian. 3) Use IS_ENABLED() in nf_reset_trace(), from Florian Westphal. 4) Fix validation of catch-all set element. 5) Tighten requirements for catch-all set elements. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: tighten netlink attribute requirements for catch-all elements netfilter: nf_tables: validate catch-all set elements netfilter: nf_tables: fix ifdef to also consider nf_tables=m netfilter: nf_tables: Modify nla_memdup's flag to GFP_KERNEL_ACCOUNT netfilter: br_netfilter: fix recent physdev match breakage ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-18	mm: ksm: support hwpoison for ksm page	Longlong Xia	2	-0/+18
	hwpoison_user_mappings() is updated to support ksm pages, and add collect_procs_ksm() to collect processes when the error hit an ksm page. The difference from collect_procs_anon() is that it also needs to traverse the rmap-item list on the stable node of the ksm page. At the same time, add_to_kill_ksm() is added to handle ksm pages. And task_in_to_kill_list() is added to avoid duplicate addition of tsk to the to_kill list. This is because when scanning the list, if the pages that make up the ksm page all come from the same process, they may be added repeatedly. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Longlong Xia <[email protected]> Tested-by: Naoya Horiguchi <[email protected]> Reviewed-by: Kefeng Wang <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Nanyong Sun <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	delayacct: track delays from IRQ/SOFTIRQ	Yang Yang	1	-0/+15
	Delay accounting does not track the delay of IRQ/SOFTIRQ. While IRQ/SOFTIRQ could have obvious impact on some workloads productivity, such as when workloads are running on system which is busy handling network IRQ/SOFTIRQ. Get the delay of IRQ/SOFTIRQ could help users to reduce such delay. Such as setting interrupt affinity or task affinity, using kernel thread for NAPI etc. This is inspired by "sched/psi: Add PSI_IRQ to track IRQ/SOFTIRQ pressure"[1]. Also fix some code indent problems of older code. And update tools/accounting/getdelays.c: / # ./getdelays -p 156 -di print delayacct stats ON printing IO accounting PID 156 CPU count real total virtual total delay total delay average 15 15836008 16218149 275700790 18.380ms IO count delay total delay average 0 0 0.000ms SWAP count delay total delay average 0 0 0.000ms RECLAIM count delay total delay average 0 0 0.000ms THRASHING count delay total delay average 0 0 0.000ms COMPACT count delay total delay average 0 0 0.000ms WPCOPY count delay total delay average 36 7586118 0.211ms IRQ count delay total delay average 42 929161 0.022ms [1] commit 52b1364ba0b1("sched/psi: Add PSI_IRQ to track IRQ/SOFTIRQ pressure") Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yang Yang <[email protected]> Cc: Jiang Xuexin <[email protected]> Cc: wangyong <[email protected]> Cc: junhua huang <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Peter Zijlstra <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	lib/rbtree: use '+' instead of '\|' for setting color.	Noah Goldstein	1	-2/+2
	This has a slight benefit for x86 and has no effect on other targets. The benefit to x86 is it change the codegen for setting a node to block from `mov %r0, %r1; or $RB_BLACK, %r1` to `lea RB_BLACK(%r0), %r1` which saves an instructions. In all other cases it just replace ALU with ALU (or -> and) which perform the same on all machines I am aware of. Total instructions in rbtree.o: Before - 802 After - 782 so it saves about 20 `mov` instructions. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Noah Goldstein <[email protected]> Cc: Michel Lespinasse <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	memfd: pass argument of memfd_fcntl as int	Luca Vizzarro	1	-2/+2
	The interface for fcntl expects the argument passed for the command F_ADD_SEALS to be of type int. The current code wrongly treats it as a long. In order to avoid access to undefined bits, we should explicitly cast the argument to int. This commit changes the signature of all the related and helper functions so that they treat the argument as int instead of long. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Luca Vizzarro <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Jeff Layton <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Kevin Brodsky <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Szabolcs Nagy <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: David Laight <[email protected]> Cc: Mark Rutland <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	mm: Multi-gen LRU: remove wait_event_killable()	Kalesh Singh	1	-6/+2
	Android 14 and later default to MGLRU [1] and field telemetry showed occasional long tail latency (>100ms) in the reclaim path. Tracing revealed priority inversion in the reclaim path. In try_to_inc_max_seq(), when high priority tasks were blocked on wait_event_killable(), the preemption of the low priority task to call wake_up_all() caused those high priority tasks to wait longer than necessary. In general, this problem is not different from others of its kind, e.g., one caused by mutex_lock(). However, it is specific to MGLRU because it introduced the new wait queue lruvec->mm_state.wait. The purpose of this new wait queue is to avoid the thundering herd problem. If many direct reclaimers rush into try_to_inc_max_seq(), only one can succeed, i.e., the one to wake up the rest, and the rest who failed might cause premature OOM kills if they do not wait. So far there is no evidence supporting this scenario, based on how often the wait has been hit. And this begs the question how useful the wait queue is in practice. Based on Minchan's recommendation, which is in line with his commit 6d4675e60135 ("mm: don't be stuck to rmap lock on reclaim path") and the rest of the MGLRU code which also uses trylock when possible, remove the wait queue. [1] https://android-review.googlesource.com/q/I7ed7fbfd6ef9ce10053347528125dd98c39e50bf Link: https://lkml.kernel.org/r/[email protected] Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Kalesh Singh <[email protected]> Suggested-by: Minchan Kim <[email protected]> Reported-by: Wei Wang <[email protected]> Acked-by: Yu Zhao <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Jan Alexander Steffens (heftig) <[email protected]> Cc: Oleksandr Natalenko <[email protected]> Cc: Suleiman Souhlal <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	mm: vmscan: refactor updating current->reclaim_state	Yosry Ahmed	1	-1/+16
	During reclaim, we keep track of pages reclaimed from other means than LRU-based reclaim through scan_control->reclaim_state->reclaimed_slab, which we stash a pointer to in current task_struct. However, we keep track of more than just reclaimed slab pages through this. We also use it for clean file pages dropped through pruned inodes, and xfs buffer pages freed. Rename reclaimed_slab to reclaimed, and add a helper function that wraps updating it through current, so that future changes to this logic are contained within include/linux/swap.h. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yosry Ahmed <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: NeilBrown <[email protected]> Cc: Peter Xu <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Tim Chen <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Yu Zhao <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	mm: kmsan: apply __must_check to non-void functions	Alexander Potapenko	1	-21/+22
	Non-void KMSAN hooks may return error codes that indicate that KMSAN failed to reflect the changed memory state in the metadata (e.g. it could not create the necessary memory mappings). In such cases the callers should handle the errors to prevent the tool from using the inconsistent metadata in the future. We mark non-void hooks with __must_check so that error handling is not skipped. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alexander Potapenko <[email protected]> Reviewed-by: Marco Elver <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dipanjan Das <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Uladzislau Rezki (Sony) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	mm: hwpoison: support recovery from HugePage copy-on-write faults	Liu Shixin	1	-3/+3
	copy-on-write of hugetlb user pages with uncorrectable errors will result in a kernel crash. This is because the copy is performed in kernel mode and in general we can not handle accessing memory with such errors while in kernel mode. Commit a873dfe1032a ("mm, hwpoison: try to recover from copy-on write faults") introduced the routine copy_user_highpage_mc() to gracefully handle copying of user pages with uncorrectable errors. However, the separate hugetlb copy-on-write code paths were not modified as part of commit a873dfe1032a. Modify hugetlb copy-on-write code paths to use copy_mc_user_highpage() so that they can also gracefully handle uncorrectable errors in user pages. This involves changing the hugetlb specific routine copy_user_large_folio() from type void to int so that it can return an error. Modify the hugetlb userfaultfd code in the same way so that it can return -EHWPOISON if it encounters an uncorrectable error. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liu Shixin <[email protected]> Acked-by: Mike Kravetz <[email protected]> Reviewed-by: Naoya Horiguchi <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Muchun Song <[email protected]> Cc: Tony Luck <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	mm/hugetlb_vmemmap: rename ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP	Aneesh Kumar K.V	1	-1/+1
	Now we use ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP config option to indicate devdax and hugetlb vmemmap optimization support. Hence rename that to a generic ARCH_WANT_OPTIMIZE_VMEMMAP Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Reviewed-by: Muchun Song <[email protected]> Cc: Joao Martins <[email protected]> Cc: Dan Williams <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Tarun Sahu <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	mm/vmemmap/devdax: fix kernel crash when probing devdax devices	Aneesh Kumar K.V	1	-0/+16
	commit 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory savings for compound devmaps") added support for using optimized vmmemap for devdax devices. But how vmemmap mappings are created are architecture specific. For example, powerpc with hash translation doesn't have vmemmap mappings in init_mm page table instead they are bolted table entries in the hardware page table vmemmap_populate_compound_pages() used by vmemmap optimization code is not aware of these architecture-specific mapping. Hence allow architecture to opt for this feature. I selected architectures supporting HUGETLB_PAGE_OPTIMIZE_VMEMMAP option as also supporting this feature. This patch fixes the below crash on ppc64. BUG: Unable to handle kernel data access on write at 0xc00c000100400038 Faulting instruction address: 0xc000000001269d90 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 7 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc5-150500.34-default+ #2 5c90a668b6bbd142599890245c2fb5de19d7d28a Hardware name: IBM,9009-42G POWER9 (raw) 0x4e0202 0xf000005 of:IBM,FW950.40 (VL950_099) hv:phyp pSeries NIP: c000000001269d90 LR: c0000000004c57d4 CTR: 0000000000000000 REGS: c000000003632c30 TRAP: 0300 Not tainted (6.3.0-rc5-150500.34-default+) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24842228 XER: 00000000 CFAR: c0000000004c57d0 DAR: c00c000100400038 DSISR: 42000000 IRQMASK: 0 .... NIP [c000000001269d90] __init_single_page.isra.74+0x14/0x4c LR [c0000000004c57d4] __init_zone_device_page+0x44/0xd0 Call Trace: [c000000003632ed0] [c000000003632f60] 0xc000000003632f60 (unreliable) [c000000003632f10] [c0000000004c5ca0] memmap_init_zone_device+0x170/0x250 [c000000003632fe0] [c0000000005575f8] memremap_pages+0x2c8/0x7f0 [c0000000036330c0] [c000000000557b5c] devm_memremap_pages+0x3c/0xa0 [c000000003633100] [c000000000d458a8] dev_dax_probe+0x108/0x3e0 [c0000000036331a0] [c000000000d41430] dax_bus_probe+0xb0/0x140 [c0000000036331d0] [c000000000cef27c] really_probe+0x19c/0x520 [c000000003633260] [c000000000cef6b4] __driver_probe_device+0xb4/0x230 [c0000000036332e0] [c000000000cef888] driver_probe_device+0x58/0x120 [c000000003633320] [c000000000cefa6c] __device_attach_driver+0x11c/0x1e0 [c0000000036333a0] [c000000000cebc58] bus_for_each_drv+0xa8/0x130 [c000000003633400] [c000000000ceefcc] __device_attach+0x15c/0x250 [c0000000036334a0] [c000000000ced458] bus_probe_device+0x108/0x110 [c0000000036334f0] [c000000000ce92dc] device_add+0x7fc/0xa10 [c0000000036335b0] [c000000000d447c8] devm_create_dev_dax+0x1d8/0x530 [c000000003633640] [c000000000d46b60] __dax_pmem_probe+0x200/0x270 [c0000000036337b0] [c000000000d46bf0] dax_pmem_probe+0x20/0x70 [c0000000036337d0] [c000000000d2279c] nvdimm_bus_probe+0xac/0x2b0 [c000000003633860] [c000000000cef27c] really_probe+0x19c/0x520 [c0000000036338f0] [c000000000cef6b4] __driver_probe_device+0xb4/0x230 [c000000003633970] [c000000000cef888] driver_probe_device+0x58/0x120 [c0000000036339b0] [c000000000cefd08] __driver_attach+0x1d8/0x240 [c000000003633a30] [c000000000cebb04] bus_for_each_dev+0xb4/0x130 [c000000003633a90] [c000000000cee564] driver_attach+0x34/0x50 [c000000003633ab0] [c000000000ced878] bus_add_driver+0x218/0x300 [c000000003633b40] [c000000000cf1144] driver_register+0xa4/0x1b0 [c000000003633bb0] [c000000000d21a0c] __nd_driver_register+0x5c/0x100 [c000000003633c10] [c00000000206a2e8] dax_pmem_init+0x34/0x48 [c000000003633c30] [c0000000000132d0] do_one_initcall+0x60/0x320 [c000000003633d00] [c0000000020051b0] kernel_init_freeable+0x360/0x400 [c000000003633de0] [c000000000013764] kernel_init+0x34/0x1d0 [c000000003633e50] [c00000000000de14] ret_from_kernel_thread+0x5c/0x64 Link: https://lkml.kernel.org/r/[email protected] Fixes: 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory savings for compound devmaps") Signed-off-by: Aneesh Kumar K.V <[email protected]> Reported-by: Tarun Sahu <[email protected]> Reviewed-by: Joao Martins <[email protected]> Cc: Muchun Song <[email protected]> Cc: Dan Williams <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	userfaultfd: convert mfill_atomic() to use a folio	ZhangPeng	1	-2/+2
	Convert mfill_atomic_pte_copy(), shmem_mfill_atomic_pte() and mfill_atomic_pte() to take in a folio pointer. Convert mfill_atomic() to use a folio. Convert page_kaddr to kaddr in mfill_atomic(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: ZhangPeng <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Muchun Song <[email protected]> Cc: Nanyong Sun <[email protected]> Cc: Sidhartha Kumar <[email protected]> Cc: Vishal Moola (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	mm: convert copy_user_huge_page() to copy_user_large_folio()	ZhangPeng	1	-4/+3
	Replace copy_user_huge_page() with copy_user_large_folio(). copy_user_large_folio() does the same as copy_user_huge_page(), but takes in folios instead of pages. Remove pages_per_huge_page from copy_user_large_folio(), because we can get that from folio_nr_pages(dst). Convert copy_user_gigantic_page() to take in folios. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: ZhangPeng <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Cc: Nanyong Sun <[email protected]> Cc: Sidhartha Kumar <[email protected]> Cc: Vishal Moola (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	userfaultfd: convert mfill_atomic_hugetlb() to use a folio	ZhangPeng	1	-2/+2
	Convert hugetlb_mfill_atomic_pte() to take in a folio pointer instead of a page pointer. Convert mfill_atomic_hugetlb() to use a folio. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: ZhangPeng <[email protected]> Reviewed-by: Sidhartha Kumar <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Muchun Song <[email protected]> Cc: Nanyong Sun <[email protected]> Cc: Vishal Moola (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	userfaultfd: convert copy_huge_page_from_user() to copy_folio_from_user()	ZhangPeng	1	-4/+3
	Replace copy_huge_page_from_user() with copy_folio_from_user(). copy_folio_from_user() does the same as copy_huge_page_from_user(), but takes in a folio instead of a page. Convert page_kaddr to kaddr in copy_folio_from_user() to do indenting cleanup. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: ZhangPeng <[email protected]> Reviewed-by: Sidhartha Kumar <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Muchun Song <[email protected]> Cc: Nanyong Sun <[email protected]> Cc: Vishal Moola (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	smaps: fix defined but not used smaps_shmem_walk_ops	Steven Price	1	-0/+7
	When !CONFIG_SHMEM smaps_shmem_walk_ops is defined but not used, triggering a compiler warning. To avoid the warning remove the #ifdef around the usage. This has no effect because shmem_mapping() is a stub returning false when !CONFIG_SHMEM so the code will be compiled out, however we now need to also provide a stub for shmem_swap_usage(). Link: https://lkml.kernel.org/r/[email protected] Fixes: 7b86ac3371b7 ("pagewalk: separate function pointers from iterator data") Signed-off-by: Steven Price <[email protected]> Reported-by: kernel test robot <[email protected]> Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Reviewed-by: Jason Gunthorpe <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Thomas Hellström <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	mm/hwpoison: introduce copy_mc_highpage	Jiaqi Yan	1	-13/+41
	Similar to how copy_mc_user_highpage is implemented for copy_user_highpage on #MC supported architecture, introduce the #MC handled version of copy_highpage. This helper has immediate usage when khugepaged wants to copy file-backed memory pages and tolerate #MC. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jiaqi Yan <[email protected]> Reviewed-by: Yang Shi <[email protected]> Cc: David Stevens <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Tong Tiangen <[email protected]> Cc: Tony Luck <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	workingset: memcg: sleep when flushing stats in workingset_refault()	Yosry Ahmed	1	-2/+2
	In workingset_refault(), we call mem_cgroup_flush_stats_atomic_ratelimited() to read accurate stats within an RCU read section and with sleeping disallowed. Move the call above the RCU read section to make it non-atomic. Flushing is an expensive operation that scales with the number of cpus and the number of cgroups in the system, so avoid doing it atomically where possible. Since workingset_refault() is the only caller of mem_cgroup_flush_stats_atomic_ratelimited(), just make it non-atomic, and rename it to mem_cgroup_flush_stats_ratelimited(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yosry Ahmed <[email protected]> Acked-by: Shakeel Butt <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Josef Bacik <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michal Koutný <[email protected]> Cc: Muchun Song <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vasily Averin <[email protected]> Cc: Zefan Li <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	memcg: sleep during flushing stats in safe contexts	Yosry Ahmed	1	-2/+7
	Currently, all contexts that flush memcg stats do so with sleeping not allowed. Some of these contexts are perfectly safe to sleep in, such as reading cgroup files from userspace or the background periodic flusher. Flushing is an expensive operation that scales with the number of cpus and the number of cgroups in the system, so avoid doing it atomically where possible. Refactor the code to make mem_cgroup_flush_stats() non-atomic (aka sleepable), and provide a separate atomic version. The atomic version is used in reclaim, refault, writeback, and in mem_cgroup_usage(). All other code paths are left to use the non-atomic version. This includes callbacks for userspace reads and the periodic flusher. Since refault is the only caller of mem_cgroup_flush_stats_ratelimited(), change it to mem_cgroup_flush_stats_atomic_ratelimited(). Reclaim and refault code paths are modified to do non-atomic flushing in separate later patches -- so it will eventually be changed back to mem_cgroup_flush_stats_ratelimited(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yosry Ahmed <[email protected]> Acked-by: Shakeel Butt <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Josef Bacik <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michal Koutný <[email protected]> Cc: Muchun Song <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vasily Averin <[email protected]> Cc: Zefan Li <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	memcg: rename mem_cgroup_flush_stats_"delayed" to "ratelimited"	Yosry Ahmed	1	-2/+2
	mem_cgroup_flush_stats_delayed() suggests his is using a delayed_work, but this is actually sometimes flushing directly from the callsite. What it's doing is ratelimited calls. A better name would be mem_cgroup_flush_stats_ratelimited(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yosry Ahmed <[email protected]> Suggested-by: Johannes Weiner <[email protected]> Acked-by: Shakeel Butt <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Josef Bacik <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michal Koutný <[email protected]> Cc: Muchun Song <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vasily Averin <[email protected]> Cc: Zefan Li <[email protected]> Signed-off-by: Andrew Morton <[email protected]>