aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-03-12Merge tag 'lsm-pr-20240312' of ↵Linus Torvalds36-1141/+1126
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull lsm updates from Paul Moore: - Promote IMA/EVM to a proper LSM This is the bulk of the diffstat, and the source of all the changes in the VFS code. Prior to the start of the LSM stacking work it was important that IMA/EVM were separate from the rest of the LSMs, complete with their own hooks, infrastructure, etc. as it was the only way to enable IMA/EVM at the same time as a LSM. However, now that the bulk of the LSM infrastructure supports multiple simultaneous LSMs, we can simplify things greatly by bringing IMA/EVM into the LSM infrastructure as proper LSMs. This is something I've wanted to see happen for quite some time and Roberto was kind enough to put in the work to make it happen. - Use the LSM hook default values to simplify the call_int_hook() macro Previously the call_int_hook() macro required callers to supply a default return value, despite a default value being specified when the LSM hook was defined. This simplifies the macro by using the defined default return value which makes life easier for callers and should also reduce the number of return value bugs in the future (we've had a few pop up recently, hence this work). - Use the KMEM_CACHE() macro instead of kmem_cache_create() The guidance appears to be to use the KMEM_CACHE() macro when possible and there is no reason why we can't use the macro, so let's use it. - Fix a number of comment typos in the LSM hook comment blocks Not much to say here, we fixed some questionable grammar decisions in the LSM hook comment blocks. * tag 'lsm-pr-20240312' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (28 commits) cred: Use KMEM_CACHE() instead of kmem_cache_create() lsm: use default hook return value in call_int_hook() lsm: fix typos in security/security.c comment headers integrity: Remove LSM ima: Make it independent from 'integrity' LSM evm: Make it independent from 'integrity' LSM evm: Move to LSM infrastructure ima: Move IMA-Appraisal to LSM infrastructure ima: Move to LSM infrastructure integrity: Move integrity_kernel_module_request() to IMA security: Introduce key_post_create_or_update hook security: Introduce inode_post_remove_acl hook security: Introduce inode_post_set_acl hook security: Introduce inode_post_create_tmpfile hook security: Introduce path_post_mknod hook security: Introduce file_release hook security: Introduce file_post_open hook security: Introduce inode_post_removexattr hook security: Introduce inode_post_setattr hook security: Align inode_setattr hook definition with EVM ...
2024-03-12Merge tag 'selinux-pr-20240312' of ↵Linus Torvalds22-731/+724
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux updates from Paul Moore: "Really only a few notable changes: - Continue the coding style/formatting fixup work This is the bulk of the diffstat in this pull request, with the focus this time around being the security/selinux/ss directory. We've only got a couple of files left to cleanup and once we're done with that we can start enabling some automatic style verfication and introduce tooling to help new folks format their code correctly. - Don't restrict xattr copy-up when SELinux policy is not loaded This helps systems that use overlayfs, or similar filesystems, preserve their SELinux labels during early boot when the SELinux policy has yet to be loaded. - Reduce the work we do during inode initialization time This isn't likely to show up in any benchmark results, but we removed an unnecessary SELinux object class lookup/calculation during inode initialization. - Correct the return values in selinux_socket_getpeersec_dgram() We had some inconsistencies with respect to our return values across selinux_socket_getpeersec_dgram() and selinux_socket_getpeersec_stream(). This provides a more uniform set of error codes across the two functions and should help make it easier for users to identify the source of a failure" * tag 'selinux-pr-20240312' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: (24 commits) selinux: fix style issues in security/selinux/ss/symtab.c selinux: fix style issues in security/selinux/ss/symtab.h selinux: fix style issues in security/selinux/ss/sidtab.c selinux: fix style issues in security/selinux/ss/sidtab.h selinux: fix style issues in security/selinux/ss/services.h selinux: fix style issues in security/selinux/ss/policydb.c selinux: fix style issues in security/selinux/ss/policydb.h selinux: fix style issues in security/selinux/ss/mls_types.h selinux: fix style issues in security/selinux/ss/mls.c selinux: fix style issues in security/selinux/ss/mls.h selinux: fix style issues in security/selinux/ss/hashtab.c selinux: fix style issues in security/selinux/ss/hashtab.h selinux: fix style issues in security/selinux/ss/ebitmap.c selinux: fix style issues in security/selinux/ss/ebitmap.h selinux: fix style issues in security/selinux/ss/context.h selinux: fix style issues in security/selinux/ss/context.h selinux: fix style issues in security/selinux/ss/constraint.h selinux: fix style issues in security/selinux/ss/conditional.c selinux: fix style issues in security/selinux/ss/conditional.h selinux: fix style issues in security/selinux/ss/avtab.c ...
2024-03-13Revert "crypto: remove CONFIG_CRYPTO_STATS"Herbert Xu32-77/+1139
This reverts commit 2beb81fbf0c01a62515a1bcef326168494ee2bd0. While removing CONFIG_CRYPTO_STATS is a worthy goal, this also removed unrelated infrastructure such as crypto_comp_alg_common. Signed-off-by: Herbert Xu <[email protected]>
2024-03-12f2fs: prevent atomic write on pinned fileDaeho Jeong1-1/+2
Since atomic write way was changed to out-place-update, we should prevent it on pinned files. Signed-off-by: Daeho Jeong <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-03-12f2fs: fix to handle error paths of {new,change}_curseg()Zhiguo Niu4-27/+45
{new,change}_curseg() may return error in some special cases, error handling should be did in their callers, and this will also facilitate subsequent error path expansion in {new,change}_curseg(). Signed-off-by: Zhiguo Niu <[email protected]> Signed-off-by: Chao Yu <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-03-12f2fs: unify the error handling of f2fs_is_valid_blkaddrZhiguo Niu7-62/+29
There are some cases of f2fs_is_valid_blkaddr not handled as ERROR_INVALID_BLKADDR,so unify the error handling about all of f2fs_is_valid_blkaddr. Do f2fs_handle_error in __f2fs_is_valid_blkaddr for cleanup. Signed-off-by: Zhiguo Niu <[email protected]> Signed-off-by: Chao Yu <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-03-12f2fs: zone: fix to remove pow2 check condition for zoned block deviceChao Yu1-5/+0
Commit 2e2c6e9b72ce ("f2fs: remove power-of-two limitation of zoned device") missed to remove pow2 check condition in init_blkz_info(), fix it. Fixes: 2e2c6e9b72ce ("f2fs: remove power-of-two limitation of zoned device") Signed-off-by: Feng Song <[email protected]> Signed-off-by: Yongpeng Yang <[email protected]> Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-03-12f2fs: fix to truncate meta inode pages forcelyChao Yu4-6/+33
Below race case can cause data corruption: Thread A GC thread - gc_data_segment - ra_data_block - locked meta_inode page - f2fs_inplace_write_data - invalidate_mapping_pages : fail to invalidate meta_inode page due to lock failure or dirty|writeback status - f2fs_submit_page_bio : write last dirty data to old blkaddr - move_data_block - load old data from meta_inode page - f2fs_submit_page_write : write old data to new blkaddr Because invalidate_mapping_pages() will skip invalidating page which has unclear status including locked, dirty, writeback and so on, so we need to use truncate_inode_pages_range() instead of invalidate_mapping_pages() to make sure meta_inode page will be dropped. Fixes: 6aa58d8ad20a ("f2fs: readahead encrypted block during GC") Fixes: e3b49ea36802 ("f2fs: invalidate META_MAPPING before IPU/DIO write") Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-03-12f2fs: compress: fix reserve_cblocks counting error when out of spaceXiuhong Wang1-8/+7
When a file only needs one direct_node, performing the following operations will cause the file to be unrepairable: unisoc # ./f2fs_io compress test.apk unisoc #df -h | grep dm-48 /dev/block/dm-48 112G 112G 1.2M 100% /data unisoc # ./f2fs_io release_cblocks test.apk 924 unisoc # df -h | grep dm-48 /dev/block/dm-48 112G 112G 4.8M 100% /data unisoc # dd if=/dev/random of=file4 bs=1M count=3 3145728 bytes (3.0 M) copied, 0.025 s, 120 M/s unisoc # df -h | grep dm-48 /dev/block/dm-48 112G 112G 1.8M 100% /data unisoc # ./f2fs_io reserve_cblocks test.apk F2FS_IOC_RESERVE_COMPRESS_BLOCKS failed: No space left on device adb reboot unisoc # df -h | grep dm-48 /dev/block/dm-48 112G 112G 11M 100% /data unisoc # ./f2fs_io reserve_cblocks test.apk 0 This is because the file has only one direct_node. After returning to -ENOSPC, reserved_blocks += ret will not be executed. As a result, the reserved_blocks at this time is still 0, which is not the real number of reserved blocks. Therefore, fsck cannot be set to repair the file. After this patch, the fsck flag will be set to fix this problem. unisoc # df -h | grep dm-48 /dev/block/dm-48 112G 112G 1.8M 100% /data unisoc # ./f2fs_io reserve_cblocks test.apk F2FS_IOC_RESERVE_COMPRESS_BLOCKS failed: No space left on device adb reboot then fsck will be executed unisoc # df -h | grep dm-48 /dev/block/dm-48 112G 112G 11M 100% /data unisoc # ./f2fs_io reserve_cblocks test.apk 924 Fixes: c75488fb4d82 ("f2fs: introduce F2FS_IOC_RESERVE_COMPRESS_BLOCKS") Signed-off-by: Xiuhong Wang <[email protected]> Signed-off-by: Zhiguo Niu <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-03-12f2fs: compress: relocate some judgments in f2fs_reserve_compress_blocksXiuhong Wang1-4/+3
The following f2fs_io test will get a "0" result instead of -EINVAL, unisoc # ./f2fs_io compress file unisoc # ./f2fs_io reserve_cblocks file 0 it's not reasonable, so the judgement of atomic_read(&F2FS_I(inode)->i_compr_blocks) should be placed after the judgement of is_inode_flag_set(inode, FI_COMPRESS_RELEASED). Fixes: c75488fb4d82 ("f2fs: introduce F2FS_IOC_RESERVE_COMPRESS_BLOCKS") Signed-off-by: Xiuhong Wang <[email protected]> Signed-off-by: Zhiguo Niu <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-03-12Merge tag 'net-next-6.9' of ↵Linus Torvalds1881-36042/+91460
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Large effort by Eric to lower rtnl_lock pressure and remove locks: - Make commonly used parts of rtnetlink (address, route dumps etc) lockless, protected by RCU instead of rtnl_lock. - Add a netns exit callback which already holds rtnl_lock, allowing netns exit to take rtnl_lock once in the core instead of once for each driver / callback. - Remove locks / serialization in the socket diag interface. - Remove 6 calls to synchronize_rcu() while holding rtnl_lock. - Remove the dev_base_lock, depend on RCU where necessary. - Support busy polling on a per-epoll context basis. Poll length and budget parameters can be set independently of system defaults. - Introduce struct net_hotdata, to make sure read-mostly global config variables fit in as few cache lines as possible. - Add optional per-nexthop statistics to ease monitoring / debug of ECMP imbalance problems. - Support TCP_NOTSENT_LOWAT in MPTCP. - Ensure that IPv6 temporary addresses' preferred lifetimes are long enough, compared to other configured lifetimes, and at least 2 sec. - Support forwarding of ICMP Error messages in IPSec, per RFC 4301. - Add support for the independent control state machine for bonding per IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled control state machine. - Add "network ID" to MCTP socket APIs to support hosts with multiple disjoint MCTP networks. - Re-use the mono_delivery_time skbuff bit for packets which user space wants to be sent at a specified time. Maintain the timing information while traversing veth links, bridge etc. - Take advantage of MSG_SPLICE_PAGES for RxRPC DATA and ACK packets. - Simplify many places iterating over netdevs by using an xarray instead of a hash table walk (hash table remains in place, for use on fastpaths). - Speed up scanning for expired routes by keeping a dedicated list. - Speed up "generic" XDP by trying harder to avoid large allocations. - Support attaching arbitrary metadata to netconsole messages. Things we sprinkled into general kernel code: - Enforce VM_IOREMAP flag and range in ioremap_page_range and introduce VM_SPARSE kind and vm_area_[un]map_pages (used by bpf_arena). - Rework selftest harness to enable the use of the full range of ksft exit code (pass, fail, skip, xfail, xpass). Netfilter: - Allow userspace to define a table that is exclusively owned by a daemon (via netlink socket aliveness) without auto-removing this table when the userspace program exits. Such table gets marked as orphaned and a restarting management daemon can re-attach/regain ownership. - Speed up element insertions to nftables' concatenated-ranges set type. Compact a few related data structures. BPF: - Add BPF token support for delegating a subset of BPF subsystem functionality from privileged system-wide daemons such as systemd through special mount options for userns-bound BPF fs to a trusted & unprivileged application. - Introduce bpf_arena which is sparse shared memory region between BPF program and user space where structures inside the arena can have pointers to other areas of the arena, and pointers work seamlessly for both user-space programs and BPF programs. - Introduce may_goto instruction that is a contract between the verifier and the program. The verifier allows the program to loop assuming it's behaving well, but reserves the right to terminate it. - Extend the BPF verifier to enable static subprog calls in spin lock critical sections. - Support registration of struct_ops types from modules which helps projects like fuse-bpf that seeks to implement a new struct_ops type. - Add support for retrieval of cookies for perf/kprobe multi links. - Support arbitrary TCP SYN cookie generation / validation in the TC layer with BPF to allow creating SYN flood handling in BPF firewalls. - Add code generation to inline the bpf_kptr_xchg() helper which improves performance when stashing/popping the allocated BPF objects. Wireless: - Add SPP (signaling and payload protected) AMSDU support. - Support wider bandwidth OFDMA, as required for EHT operation. Driver API: - Major overhaul of the Energy Efficient Ethernet internals to support new link modes (2.5GE, 5GE), share more code between drivers (especially those using phylib), and encourage more uniform behavior. Convert and clean up drivers. - Define an API for querying per netdev queue statistics from drivers. - IPSec: account in global stats for fully offloaded sessions. - Create a concept of Ethernet PHY Packages at the Device Tree level, to allow parameterizing the existing PHY package code. - Enable Rx hashing (RSS) on GTP protocol fields. Misc: - Improvements and refactoring all over networking selftests. - Create uniform module aliases for TC classifiers, actions, and packet schedulers to simplify creating modprobe policies. - Address all missing MODULE_DESCRIPTION() warnings in networking. - Extend the Netlink descriptions in YAML to cover message encapsulation or "Netlink polymorphism", where interpretation of nested attributes depends on link type, classifier type or some other "class type". Drivers: - Ethernet high-speed NICs: - Add a new driver for Marvell's Octeon PCI Endpoint NIC VF. - Intel (100G, ice, idpf): - support E825-C devices - nVidia/Mellanox: - support devices with one port and multiple PCIe links - Broadcom (bnxt): - support n-tuple filters - support configuring the RSS key - Wangxun (ngbe/txgbe): - implement irq_domain for TXGBE's sub-interrupts - Pensando/AMD: - support XDP - optimize queue submission and wakeup handling (+17% bps) - optimize struct layout, saving 28% of memory on queues - Ethernet NICs embedded and virtual: - Google cloud vNIC: - refactor driver to perform memory allocations for new queue config before stopping and freeing the old queue memory - Synopsys (stmmac): - obey queueMaxSDU and implement counters required by 802.1Qbv - Renesas (ravb): - support packet checksum offload - suspend to RAM and runtime PM support - Ethernet switches: - nVidia/Mellanox: - support for nexthop group statistics - Microchip: - ksz8: implement PHY loopback - add support for KSZ8567, a 7-port 10/100Mbps switch - PTP: - New driver for RENESAS FemtoClock3 Wireless clock generator. - Support OCP PTP cards designed and built by Adva. - CAN: - Support recvmsg() flags for own, local and remote traffic on CAN BCM sockets. - Support for esd GmbH PCIe/402 CAN device family. - m_can: - Rx/Tx submission coalescing - wake on frame Rx - WiFi: - Intel (iwlwifi): - enable signaling and payload protected A-MSDUs - support wider-bandwidth OFDMA - support for new devices - bump FW API to 89 for AX devices; 90 for BZ/SC devices - MediaTek (mt76): - mt7915: newer ADIE version support - mt7925: radio temperature sensor support - Qualcomm (ath11k): - support 6 GHz station power modes: Low Power Indoor (LPI), Standard Power) SP and Very Low Power (VLP) - QCA6390 & WCN6855: support 2 concurrent station interfaces - QCA2066 support - Qualcomm (ath12k): - refactoring in preparation for Multi-Link Operation (MLO) support - 1024 Block Ack window size support - firmware-2.bin support - support having multiple identical PCI devices (firmware needs to have ATH12K_FW_FEATURE_MULTI_QRTR_ID) - QCN9274: support split-PHY devices - WCN7850: enable Power Save Mode in station mode - WCN7850: P2P support - RealTek: - rtw88: support for more rtw8811cu and rtw8821cu devices - rtw89: support SCAN_RANDOM_SN and SET_SCAN_DWELL - rtlwifi: speed up USB firmware initialization - rtwl8xxxu: - RTL8188F: concurrent interface support - Channel Switch Announcement (CSA) support in AP mode - Broadcom (brcmfmac): - per-vendor feature support - per-vendor SAE password setup - DMI nvram filename quirk for ACEPC W5 Pro" * tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2255 commits) nexthop: Fix splat with CONFIG_DEBUG_PREEMPT=y nexthop: Fix out-of-bounds access during attribute validation nexthop: Only parse NHA_OP_FLAGS for dump messages that require it nexthop: Only parse NHA_OP_FLAGS for get messages that require it bpf: move sleepable flag from bpf_prog_aux to bpf_prog bpf: hardcode BPF_PROG_PACK_SIZE to 2MB * num_possible_nodes() selftests/bpf: Add kprobe multi triggering benchmarks ptp: Move from simple ida to xarray vxlan: Remove generic .ndo_get_stats64 vxlan: Do not alloc tstats manually devlink: Add comments to use netlink gen tool nfp: flower: handle acti_netdevs allocation failure net/packet: Add getsockopt support for PACKET_COPY_THRESH net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID selftests/bpf: Add bpf_arena_htab test. selftests/bpf: Add bpf_arena_list test. selftests/bpf: Add unit tests for bpf_arena_alloc/free_pages bpf: Add helper macro bpf_addr_space_cast() libbpf: Recognize __arena global variables. bpftool: Recognize arena map type ...
2024-03-12Merge tag 'docs-6.9' of git://git.lwn.net/linuxLinus Torvalds64-1607/+6857
Pull documentation updates from Jonathan Corbet: "A moderatly busy cycle for development this time around. - Some cleanup of the main index page for easier navigation - Rework some of the other top-level pages for better readability and, with luck, fewer merge conflicts in the future. - Submit-checklist improvements, hopefully the first of many. - New Italian translations - A fair number of kernel-doc fixes and improvements. We have also dropped the recommendation to use an old version of Sphinx. - A new document from Thorsten on bisection ... and lots of fixes and updates" * tag 'docs-6.9' of git://git.lwn.net/linux: (54 commits) docs: verify/bisect: fixes, finetuning, and support for Arch docs: Makefile: Add dependency to $(YNL_INDEX) for targets other than htmldocs docs: Move ja_JP/howto.rst to ja_JP/process/howto.rst docs: submit-checklist: use subheadings docs: submit-checklist: structure by category docs: new text on bisecting which also covers bug validation docs: drop the version constraints for sphinx and dependencies docs: kerneldoc-preamble.sty: Remove code for Sphinx <2.4 docs: Restore "smart quotes" for quotes docs/zh_CN: accurate translation of "function" docs: Include simplified link titles in main index docs: Correct formatting of title in admin-guide/index.rst docs: kernel_feat.py: fix build error for missing files MAINTAINERS: Set the field name for subsystem profile section kasan: Add documentation for CONFIG_KASAN_EXTRA_INFO Fixed case issue with 'fault-injection' in documentation kernel-doc: handle #if in enums as well Documentation: update mailing list addresses doc: kerneldoc.py: fix indentation scripts/kernel-doc: simplify signature printing ...
2024-03-12Merge tag 'audit-pr-20240312' of ↵Linus Torvalds2-4/+2
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit updates from Paul Moore: "Two small audit patches: - Use the KMEM_CACHE() macro instead of kmem_cache_create() The guidance appears to be to use the KMEM_CACHE() macro when possible and there is no reason why we can't use the macro, so let's use it. - Remove an unnecessary assignment in audit_dupe_lsm_field() A return value variable was assigned a value in its declaration, but the declaration value is overwritten before the return value variable is ever referenced; drop the assignment at declaration time" * tag 'audit-pr-20240312' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: use KMEM_CACHE() instead of kmem_cache_create() audit: remove unnecessary assignment in audit_dupe_lsm_field()
2024-03-12Merge tag 'Smack-for-6.9' of https://github.com/cschaufler/smack-nextLinus Torvalds2-47/+87
Pull smack updates from Casey Schaufler: - Improvements to the initialization of in-memory inodes - A fix in ramfs to propery ensure the initialization of in-memory inodes - Removal of duplicated code in smack_cred_transfer() * tag 'Smack-for-6.9' of https://github.com/cschaufler/smack-next: Smack: use init_task_smack() in smack_cred_transfer() ramfs: Initialize security of in-memory inodes smack: Initialize the in-memory inode in smack_inode_init_security() smack: Always determine inode labels in smack_inode_init_security() smack: Handle SMACK64TRANSMUTE in smack_inode_setsecurity() smack: Set SMACK64TRANSMUTE only for dirs in smack_inode_setxattr()
2024-03-12Merge tag 'seccomp-v6.9-rc1' of ↵Linus Torvalds3-14/+73
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp updates from Kees Cook: "There are no core kernel changes here; it's entirely selftests and samples: - Improve reliability of selftests (Terry Tritton, Kees Cook) - Fix strict-aliasing warning in samples (Arnd Bergmann)" * tag 'seccomp-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: samples: user-trap: fix strict-aliasing warning selftests/seccomp: Pin benchmark to single CPU selftests/seccomp: user_notification_addfd check nextfd is available selftests/seccomp: Change the syscall used in KILL_THREAD test selftests/seccomp: Handle EINVAL on unshare(CLONE_NEWPID)
2024-03-12cxl/region: Deal with numa nodes not enumerated by SRATDave Jiang6-1/+33
For the numa nodes that are not created by SRAT, no memory_target is allocated and is not managed by the HMAT_REPORTING code. Therefore hmat_callback() memory hotplug notifier will exit early on those NUMA nodes. The CXL memory hotplug notifier will need to call node_set_perf_attrs() directly in order to setup the access sysfs attributes. In acpi_numa_init(), the last proximity domain (pxm) id created by SRAT is stored. Add a helper function acpi_node_backed_by_real_pxm() in order to check if a NUMA node id is defined by SRAT or created by CFMWS. node_set_perf_attrs() symbol is exported to allow update of perf attribs for a node. The sysfs path of /sys/devices/system/node/nodeX/access0/initiators/* is created by node_set_perf_attrs() for the various attributes where nodeX is matched to the NUMA node of the CXL region. Cc: Rafael J. Wysocki <[email protected]> Reviewed-by: Alison Schofield <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Tested-by: Jonathan Cameron <[email protected]> Signed-off-by: Dave Jiang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dan Williams <[email protected]>
2024-03-12Merge tag 'hardening-v6.9-rc1' of ↵Linus Torvalds71-688/+1949
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: "As is pretty normal for this tree, there are changes all over the place, especially for small fixes, selftest improvements, and improved macro usability. Some header changes ended up landing via this tree as they depended on the string header cleanups. Also, a notable set of changes is the work for the reintroduction of the UBSAN signed integer overflow sanitizer so that we can continue to make improvements on the compiler side to make this sanitizer a more viable future security hardening option. Summary: - string.h and related header cleanups (Tanzir Hasan, Andy Shevchenko) - VMCI memcpy() usage and struct_size() cleanups (Vasiliy Kovalev, Harshit Mogalapalli) - selftests/powerpc: Fix load_unaligned_zeropad build failure (Michael Ellerman) - hardened Kconfig fragment updates (Marco Elver, Lukas Bulwahn) - Handle tail call optimization better in LKDTM (Douglas Anderson) - Use long form types in overflow.h (Andy Shevchenko) - Add flags param to string_get_size() (Andy Shevchenko) - Add Coccinelle script for potential struct_size() use (Jacob Keller) - Fix objtool corner case under KCFI (Josh Poimboeuf) - Drop 13 year old backward compat CAP_SYS_ADMIN check (Jingzi Meng) - Add str_plural() helper (Michal Wajdeczko, Kees Cook) - Ignore relocations in .notes section - Add comments to explain how __is_constexpr() works - Fix m68k stack alignment expectations in stackinit Kunit test - Convert string selftests to KUnit - Add KUnit tests for fortified string functions - Improve reporting during fortified string warnings - Allow non-type arg to type_max() and type_min() - Allow strscpy() to be called with only 2 arguments - Add binary mode to leaking_addresses scanner - Various small cleanups to leaking_addresses scanner - Adding wrapping_*() arithmetic helper - Annotate initial signed integer wrap-around in refcount_t - Add explicit UBSAN section to MAINTAINERS - Fix UBSAN self-test warnings - Simplify UBSAN build via removal of CONFIG_UBSAN_SANITIZE_ALL - Reintroduce UBSAN's signed overflow sanitizer" * tag 'hardening-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (51 commits) selftests/powerpc: Fix load_unaligned_zeropad build failure string: Convert helpers selftest to KUnit string: Convert selftest to KUnit sh: Fix build with CONFIG_UBSAN=y compiler.h: Explain how __is_constexpr() works overflow: Allow non-type arg to type_max() and type_min() VMCI: Fix possible memcpy() run-time warning in vmci_datagram_invoke_guest_handler() lib/string_helpers: Add flags param to string_get_size() x86, relocs: Ignore relocations in .notes section objtool: Fix UNWIND_HINT_{SAVE,RESTORE} across basic blocks overflow: Use POD in check_shl_overflow() lib: stackinit: Adjust target string to 8 bytes for m68k sparc: vdso: Disable UBSAN instrumentation kernel.h: Move lib/cmdline.c prototypes to string.h leaking_addresses: Provide mechanism to scan binary files leaking_addresses: Ignore input device status lines leaking_addresses: Use File::Temp for /tmp files MAINTAINERS: Update LEAKING_ADDRESSES details fortify: Improve buffer overflow reporting fortify: Add KUnit tests for runtime overflows ...
2024-03-12Merge tag 'execve-v6.9-rc1' of ↵Linus Torvalds3-10/+5
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull execve updates from Kees Cook: - Drop needless error path code in remove_arg_zero() (Li kunyu, Kees Cook) - binfmt_elf_efpic: Don't use missing interpreter's properties (Max Filippov) - Use /bin/bash for execveat selftests * tag 'execve-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: exec: Simplify remove_arg_zero() error path selftests/exec: Perform script checks with /bin/bash exec: Delete unnecessary statements in remove_arg_zero() fs: binfmt_elf_efpic: don't use missing interpreter's properties
2024-03-12Merge tag 'pstore-v6.9-rc1' of ↵Linus Torvalds5-16/+42
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore updates from Kees Cook: - Make PSTORE_RAM available by default on arm64 (Nícolas F R A Prado) - Allow for dynamic initialization in modular build (Guilherme G Piccoli) - Add missing allocation failure check (Kunwu Chan) - Avoid duplicate memory zeroing (Christophe JAILLET) - Avoid potential double-free during pstorefs umount * tag 'pstore-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: pstore/zone: Don't clear memory twice pstore/zone: Add a null pointer check to the psz_kmsg_read efi: pstore: Allow dynamic initialization based on module parameter arm64: defconfig: Enable PSTORE_RAM pstore/ram: Register to module device table pstore: inode: Only d_invalidate() is needed
2024-03-12Merge tag 'nfsd-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds44-753/+1763
Pull nfsd updates from Chuck Lever: "The bulk of the patches for this release are optimizations, code clean-ups, and minor bug fixes. One new feature to mention is that NFSD administrators now have the ability to revoke NFSv4 open and lock state. NFSD's NFSv3 support has had this capability for some time. As always I am grateful to NFSD contributors, reviewers, and testers" * tag 'nfsd-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (75 commits) NFSD: Clean up nfsd4_encode_replay() NFSD: send OP_CB_RECALL_ANY to clients when number of delegations reaches its limit NFSD: Document nfsd_setattr() fill-attributes behavior nfsd: Fix NFSv3 atomicity bugs in nfsd_setattr() nfsd: Fix a regression in nfsd_setattr() NFSD: OP_CB_RECALL_ANY should recall both read and write delegations NFSD: handle GETATTR conflict with write delegation NFSD: add support for CB_GETATTR callback NFSD: Document the phases of CREATE_SESSION NFSD: Fix the NFSv4.1 CREATE_SESSION operation nfsd: clean up comments over nfs4_client definition svcrdma: Add Write chunk WRs to the RPC's Send WR chain svcrdma: Post WRs for Write chunks in svc_rdma_sendto() svcrdma: Post the Reply chunk and Send WR together svcrdma: Move write_info for Reply chunks into struct svc_rdma_send_ctxt svcrdma: Post Send WR chain svcrdma: Fix retry loop in svc_rdma_send() svcrdma: Prevent a UAF in svc_rdma_send() svcrdma: Fix SQ wake-ups svcrdma: Increase the per-transport rw_ctx count ...
2024-03-12Merge tag 'erofs-for-6.9-rc1' of ↵Linus Torvalds9-317/+335
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "In this cycle, we introduce compressed inode support over fscache since a lot of native EROFS images are explicitly compressed so that EROFS over fscache can be more widely used even without Dragonfly Nydus [1]. Apart from that, there are some folio conversions for compressed inodes available as well as a lockdep false positive fix. Summary: - Some folio conversions for compressed inodes; - Add compressed inode support over fscache; - Fix lockdep false positives of erofs_pseudo_mnt" Link: https://nydus.dev [1] * tag 'erofs-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: support compressed inodes over fscache erofs: make iov_iter describe target buffers over fscache erofs: fix lockdep false positives on initializing erofs_pseudo_mnt erofs: refine managed cache operations to folios erofs: convert z_erofs_submissionqueue_endio() to folios erofs: convert z_erofs_fill_bio_vec() to folios erofs: get rid of `justfound` debugging tag erofs: convert z_erofs_do_read_page() to folios erofs: convert z_erofs_onlinepage_.* to folios
2024-03-12Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linuxLinus Torvalds3-26/+24
Pull fsverity update from Eric Biggers: "Slightly improve data verification performance by eliminating an unnecessary lock" * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux: fsverity: remove hash page spin lock
2024-03-12Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linuxLinus Torvalds4-24/+36
Pull fscrypt updates from Eric Biggers: "Fix flakiness in a test by releasing the quota synchronously when a key is removed, and other minor cleanups" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux: fscrypt: shrink the size of struct fscrypt_inode_info slightly fscrypt: write CBC-CTS instead of CTS-CBC fscrypt: clear keyring before calling key_put() fscrypt: explicitly require that inode->i_blkbits be set
2024-03-12assoc_array: fix the return value in assoc_array_insert_mid_shortcut()Roman Smirnov1-1/+1
Returning the edit variable is redundant because it is dereferenced right before it is returned. It would be better to return true. Found by Linux Verification Center (linuxtesting.org) with Svace. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Smirnov <[email protected]> Reviewed-by: Sergey Shtylyov <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-12buildid: use kmap_local_page()Peng Hao1-2/+2
Use kmap_local_page() instead of kmap_atomic() which has been deprecated. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peng Hao <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-12watchdog/core: remove sysctl handlers from public headerThomas Weißschuh2-17/+12
The functions are only used in the file where they are defined. Remove them from the header and make them static. Also guard proc_soft_watchdog with a #define-guard as it is not used otherwise. Link: https://lkml.kernel.org/r/20240306-const-sysctl-prep-watchdog-v1-1-bd45da3a41cf@weissschuh.net Signed-off-by: Thomas Weißschuh <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-12nilfs2: use div64_ul() instead of do_div()Thorsten Blum6-7/+7
Fixes Coccinelle/coccicheck warnings reported by do_div.cocci. Compared to do_div(), div64_ul() does not implicitly cast the divisor and does not unnecessarily calculate the remainder. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Thorsten Blum <[email protected]> Signed-off-by: Ryusuke Konishi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-12mul_u64_u64_div_u64: increase precision by conditionally swapping a and bUwe Kleine-König1-0/+15
As indicated in the added comment, the algorithm works better if b is big. As multiplication is commutative, a and b can be swapped. Do this if a is bigger than b. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Uwe Kleine-König <[email protected]> Tested-by: Biju Das <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-12bus: ts-nbus: Improve error reportingUwe Kleine-König1-38/+28
Using dev_err_probe() brings several improvements: - emits the symbolic error code - properly handles EPROBE_DEFER - combines error message generation and return value handling While at it add error messages to two error paths that were silent before. Signed-off-by: Uwe Kleine-König <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]>
2024-03-12selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirementsNico Pache1-1/+2
Now that run_vmtests.sh does not guarantee that the correct hugepage count is available, skip the hugetlb-madvise test if the requirements are not met rather than failing. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Nico Pache <[email protected]> Cc: Ben Hutchings <[email protected]> Cc: Muchun Song <[email protected]> Cc: Muhammad Usama Anjum <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-12selftests/mm: skip uffd hugetlb tests with insufficient hugepagesNico Pache1-0/+6
Now that run_vmtests.sh does not guarantee that the correct hugepage count is available, add a check inside the userfaultfd hugetlb test to verify the nr_hugepages count before continuing. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Nico Pache <[email protected]> Cc: Ben Hutchings <[email protected]> Cc: Muchun Song <[email protected]> Cc: Muhammad Usama Anjum <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-12selftests/mm: dont fail testsuite due to a lack of hugepagesNico Pache1-1/+0
Patch series "selftests/mm: Improve Hugepage Test Handling in MM Selftests", v2. This series addresses issues related to hugepage requirements in the MM selftests, ensuring tests are skipped rather than failing when the necessary hugepage count is not met. This adjustment allows for a more graceful handling for systems with insufficient hugepages, preventing unnecessary test failures and improving the overall robustness of the test suite. This patch (of 3): On systems that have large core counts and large page sizes, but limited memory, the userfaultfd test hugepage requirement is too large. Exiting early due to missing one test's requirements is a rather aggressive strategy, and prevents a lot of other tests from running. Remove the early exit to prevent this. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: ee00479d6702 ("selftests: vm: Try harder to allocate huge pages") Signed-off-by: Nico Pache <[email protected]> Cc: Ben Hutchings <[email protected]> Cc: Muhammad Usama Anjum <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-12mm/huge_memory: skip invalid debugfs new_order input for folio splitZi Yan1-0/+6
User can put arbitrary new_order via debugfs for folio split test. Although new_order check is added to split_huge_page_to_list_order() in the prior commit, these two additional checks can avoid unnecessary folio locking and split_folio_to_order() calls. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zi Yan <[email protected]> Reported-by: Dan Carpenter <[email protected]> Closes: https://lore.kernel.org/linux-mm/[email protected]/ Cc: David Hildenbrand <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: Yang Shi <[email protected]> Cc: Yu Zhao <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-12mm/huge_memory: check new folio order when split a folioZi Yan1-0/+3
A folio can only be split into lower orders. Since there are no new_order checks in debugfs, any new_order can be passed via debugfs into split_huge_page_to_list_to_order(). Check new_order to make sure it is smaller than input folio order. Link: https://lkml.kernel.org/r/[email protected] Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages") Signed-off-by: Zi Yan <[email protected]> Reported-by: Dan Carpenter <[email protected]> Closes: https://lore.kernel.org/linux-mm/[email protected]/ Cc: David Hildenbrand <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: Yang Shi <[email protected]> Cc: Yu Zhao <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-12mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failureByungchul Park1-1/+20
With cache_trim_mode on, reclaim logic doesn't bother reclaiming anon pages. However, it should be more careful to use the mode because it's going to prevent anon pages from being reclaimed even if there are a huge number of anon pages that are cold and should be reclaimed. Even worse, that leads kswapd_failures to reach MAX_RECLAIM_RETRIES and stopping kswapd from functioning until direct reclaim eventually works to resume kswapd. So kswapd needs to retry its scan priority loop with cache_trim_mode off again if the mode doesn't work for reclaim. The problematic behavior can be reproduced by: CONFIG_NUMA_BALANCING enabled sysctl_numa_balancing_mode set to NUMA_BALANCING_MEMORY_TIERING numa node0 (8GB local memory, 16 CPUs) numa node1 (8GB slow tier memory, no CPUs) Sequence: 1) echo 3 > /proc/sys/vm/drop_caches 2) To emulate the system with full of cold memory in local DRAM, run the following dummy program and never touch the region: mmap(0, 8 * 1024 * 1024 * 1024, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE | MAP_POPULATE, -1, 0); 3) Run any memory intensive work e.g. XSBench. 4) Check if numa balancing is working e.i. promotion/demotion. 5) Iterate 1) ~ 4) until numa balancing stops. With this, you could see that promotion/demotion are not working because kswapd has stopped due to ->kswapd_failures >= MAX_RECLAIM_RETRIES. Interesting vmstat delta's differences between before and after are like: +-----------------------+-------------------------------+ | interesting vmstat | before | after | +-----------------------+-------------------------------+ | nr_inactive_anon | 321935 | 1664772 | | nr_active_anon | 1780700 | 437834 | | nr_inactive_file | 30425 | 40882 | | nr_active_file | 14961 | 3012 | | pgpromote_success | 356 | 1293122 | | pgpromote_candidate | 21953245 | 1824148 | | pgactivate | 1844523 | 3311907 | | pgdeactivate | 50634 | 1554069 | | pgfault | 31100294 | 6518806 | | pgdemote_kswapd | 30856 | 2230821 | | pgscan_kswapd | 1861981 | 7667629 | | pgscan_anon | 1822930 | 7610583 | | pgscan_file | 39051 | 57046 | | pgsteal_anon | 386 | 2192033 | | pgsteal_file | 30470 | 38788 | | pageoutrun | 30 | 412 | | numa_hint_faults | 27418279 | 2875955 | | numa_pages_migrated | 356 | 1293122 | +-----------------------+-------------------------------+ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Byungchul Park <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Yu Zhao <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-12mm: add an explicit smp_wmb() to UFFDIO_CONTINUEJames Houghton2-4/+22
Users of UFFDIO_CONTINUE may reasonably assume that a write memory barrier is included as part of UFFDIO_CONTINUE. That is, a user may believe that all writes it has done to a page that it is now UFFDIO_CONTINUE'ing are guaranteed to be visible to anyone subsequently reading the page through the newly mapped virtual memory region. Today, such a user happens to be correct. mmget_not_zero(), for example, is called as part of UFFDIO_CONTINUE (and comes before any PTE updates), and it implicitly gives us a write barrier. To be resilient against future changes, include an explicit smp_wmb(). While we're at it, optimize the smp_wmb() that is already incidentally present for the HugeTLB case. Merely making a syscall does not generally imply the memory ordering constraints that we need (including on x86). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: James Houghton <[email protected]> Reviewed-by: Peter Xu <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-12mm: fix list corruption in put_pages_listMatthew Wilcox (Oracle)1-2/+2
My recent change to put_pages_list() dereferences folio->lru.next after returning the folio to the page allocator. Usually this is now on the pcp list with other free folios, so we try to free an already-free folio. This only happens with lists that have more than 15 entries, so it wasn't immediately discovered. Revert to using list_for_each_safe() so we dereference lru.next before disposing of the folio. Link: https://lkml.kernel.org/r/[email protected] Fixes: 24835f899c01 ("mm: use free_unref_folios() in put_pages_list()") Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reported-by: "Borah, Chaitanya Kumar" <[email protected]> Closes: https://lore.kernel.org/intel-gfx/SJ1PR11MB61292145F3B79DA58ADDDA63B9232@SJ1PR11MB6129.namprd11.prod.outlook.com/ Signed-off-by: Andrew Morton <[email protected]>
2024-03-12mm: remove folio from deferred split list before uncharging itMatthew Wilcox (Oracle)2-0/+9
When freeing a large folio, we must remove it from the deferred split list before we uncharge it as each memcg has its own deferred split list (with associated lock) and removing a folio from the deferred split list while holding the wrong lock will corrupt that list and cause various related problems. Link: https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Fixes: f77171d241e3 (mm: allow non-hugetlb large folios to be batch processed) Fixes: 29f3843026cf (mm: free folios directly in move_folios_to_lru()) Fixes: bc2ff4cbc329 (mm: free folios in a batch in shrink_folio_list()) Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Debugged-by: Ryan Roberts <[email protected]> Tested-by: Ryan Roberts <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-12bus: ts-nbus: Convert to atomic pwm APIUwe Kleine-König1-10/+7
With this change the PWM hardware is only configured once (instead of three times). Signed-off-by: Uwe Kleine-König <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]>
2024-03-12Merge tag 'affs-for-6.9' of ↵Linus Torvalds1-2/+1
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull affs update from David Sterba: "One change to AFFS that removes use of SLAB_MEM_SPREAD, which is going to be removed from MM code" * tag 'affs-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: affs: remove SLAB_MEM_SPREAD flag usage
2024-03-12cxl/region: Add memory hotplug notifier for cxl regionDave Jiang7-0/+127
When the CXL region is formed, the driver computes the performance data for the region. However this data is not available at the node data collection that has been populated by the HMAT during kernel initialization. Add a memory hotplug notifier to update the access coordinates to the 'struct memory_target' context kept by the HMAT_REPORTING code. Add CXL_CALLBACK_PRI for a memory hotplug callback priority. Set the priority number to be called before HMAT_CALLBACK_PRI. The CXL update must happen before hmat_callback(). A new HMAT_REPORTING helper hmat_update_target_coordinates() is added in order to allow CXL to update the memory_target access coordinates. A new ext_updated member is added to the memory_target to indicate that the access coordinates within the memory_target has been updated by an external agent such as CXL. This prevents data being overwritten by the hmat_update_target_attrs() triggered by hmat_callback(). Cc: Andrew Morton <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Reviewed-by: Huang, Ying <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Tested-by: Jonathan Cameron <[email protected]> Signed-off-by: Dave Jiang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dan Williams <[email protected]>
2024-03-12cxl/region: Add sysfs attribute for locality attributes of CXL regionsDave Jiang2-0/+128
Add read/write latencies and bandwidth sysfs attributes for the enabled CXL region. The bandwidth is the aggregated bandwidth of all devices that contribute to the CXL region. The latency is the worst latency of the device amongst all the devices that contribute to the CXL region. Reviewed-by: Jonathan Cameron <[email protected]> Tested-by: Jonathan Cameron <[email protected]> Signed-off-by: Dave Jiang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dan Williams <[email protected]>
2024-03-12cxl/region: Calculate performance data for a regionDave Jiang3-0/+71
Calculate and store the performance data for a CXL region. Find the worst read and write latency for all the included ranges from each of the devices that attributes to the region and designate that as the latency data. Sum all the read and write bandwidth data for each of the device region and that is the total bandwidth for the region. The perf list is expected to be constructed before the endpoint decoders are registered and thus there should be no early reading of the entries from the region assemble action. The calling of the region qos calculate function is under the protection of cxl_dpa_rwsem and will ensure that all DPA associated work has completed. Reviewed-by: Jonathan Cameron <[email protected]> Tested-by: Jonathan Cameron <[email protected]> Signed-off-by: Dave Jiang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dan Williams <[email protected]>
2024-03-12cxl: Set cxlmd->endpoint before adding port deviceDave Jiang1-1/+1
Move setting of cxlmd->endpoint to before calling add_device() on the port device. Otherwise when referencing cxlmd->endpoint in region discovery code that is triggered by the port driver probe function, the endpoint port pointer is not valid. Current code does not hit this issue yet since cxlmd->endpoint is not being referenced during region discovery. However follow on code that does performance calculations will. Tested-by: Wonjae Lee <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Tested-by: Jonathan Cameron <[email protected]> Signed-off-by: Dave Jiang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dan Williams <[email protected]>
2024-03-12cxl: Move QoS class to be calculated from the nearest CPUDave Jiang1-3/+3
Retrieve the qos_class (QTG ID) using the access coordinates from the nearest CPU rather than the nearst initiator that may not be a CPU. This may be the more appropriate number that applications care about. For most cases, access0 and access1 have the same values. Link: https://lore.kernel.org/linux-cxl/[email protected]/ Suggested-by: Jonathan Cameron <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Tested-by: Jonathan Cameron <[email protected]> Signed-off-by: Dave Jiang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dan Williams <[email protected]>
2024-03-12cxl: Split out host bridge access coordinatesDave Jiang3-9/+56
The difference between access class 0 and access class 1 for 'struct access_coordinate', if any, is that class 0 is for the distance from the target to the closest initiator and that class 1 is for the distance from the target to the closest CPU. For CXL memory, the nearest initiator may not necessarily be a CPU node. The performance path from the CXL endpoint to the host bridge should remain the same. However, the numbers extracted and stored from HMAT is the difference for the two access classes. Split out the performance numbers for the host bridge (generic target) from the calculation of the entire path in order to allow calculation of both access classes for a CXL region. Reviewed-by: Jonathan Cameron <[email protected]> Tested-by: Jonathan Cameron <[email protected]> Signed-off-by: Dave Jiang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dan Williams <[email protected]>
2024-03-12cxl: Split out combine_coordinates() for common shared usageDave Jiang3-25/+29
Refactor the common code of combining coordinates in order to reduce code. Create a new function cxl_cooordinates_combine() it combine two 'struct access_coordinate'. Reviewed-by: Jonathan Cameron <[email protected]> Tested-by: Jonathan Cameron <[email protected]> Signed-off-by: Dave Jiang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dan Williams <[email protected]>
2024-03-12ACPI: HMAT / cxl: Add retrieval of generic port coordinates for both access ↵Dave Jiang4-7/+13
classes Update acpi_get_genport_coordinates() to allow retrieval of both access classes of the 'struct access_coordinate' for a generic target. The update will allow CXL code to compute access coordinates for both access class. Cc: Rafael J. Wysocki <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Tested-by: Jonathan Cameron <[email protected]> Signed-off-by: Dave Jiang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dan Williams <[email protected]>
2024-03-12ACPI: HMAT: Introduce 2 levels of generic port access classDave Jiang1-5/+10
In order to compute access0 and access1 classes for CXL memory, 2 levels of generic port information must be stored. Access0 will indicate the generic port access coordinates to the closest initiator and access1 will indicate the generic port access coordinates to the cloest CPU. Cc: Rafael J. Wysocki <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Tested-by: Jonathan Cameron <[email protected]> Signed-off-by: Dave Jiang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dan Williams <[email protected]>
2024-03-12base/node / ACPI: Enumerate node access class for 'struct access_coordinate'Dave Jiang3-18/+32
Both generic node and HMAT handling code have been using magic numbers to indicate access classes for 'struct access_coordinate'. Introduce enums to enumerate the access0 and access1 classes shared by the two subsystems. Update the function parameters and callers as appropriate to utilize the new enum. Access0 is named to ACCESS_COORDINATE_LOCAL in order to indicate that the access class is for 'struct access_coordinate' between a target node and the nearest initiator node. Access1 is named to ACCESS_COORDINATE_CPU in order to indicate that the access class is for 'struct access_coordinate' between a target node and the nearest CPU node. Cc: Greg Kroah-Hartman <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Tested-by: Jonathan Cameron <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Dave Jiang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dan Williams <[email protected]>