blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2024-03-11	Revert "ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512"	Catalin Marinas	1	-2/+1
	This reverts commit 0499a78369adacec1af29340b71ff8dd375b4697. Enabling CPUMASK_OFFSTACK on arm64 triggers a warning in the dev_pm_opp_set_config() function followed by a failure to set the regulators and cpufreq-dt probing error. There is no apparent reason why this happens, so revert this commit until further investigation. Signed-off-by: Catalin Marinas <[email protected]> Reported-by: Marek Szyprowski <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-03-07	Merge branch 'for-next/stage1-lpa2' into for-next/core	Catalin Marinas	55	-1124/+1949
	* for-next/stage1-lpa2: (48 commits) : Add support for LPA2 and WXN and stage 1 arm64/mm: Avoid ID mapping of kpti flag if it is no longer needed arm64/mm: Use generic __pud_free() helper in pud_free() implementation arm64: gitignore: ignore relacheck arm64: Use Signed/Unsigned enums for TGRAN{4,16,64} and VARange arm64: mm: Make PUD folding check in set_pud() a runtime check arm64: mm: add support for WXN memory translation attribute mm: add arch hook to validate mmap() prot flags arm64: defconfig: Enable LPA2 support arm64: Enable 52-bit virtual addressing for 4k and 16k granule configs arm64: kvm: avoid CONFIG_PGTABLE_LEVELS for runtime levels arm64: ptdump: Deal with translation levels folded at runtime arm64: ptdump: Disregard unaddressable VA space arm64: mm: Add support for folding PUDs at runtime arm64: kasan: Reduce minimum shadow alignment and enable 5 level paging arm64: mm: Add 5 level paging support to fixmap and swapper handling arm64: Enable LPA2 at boot if supported by the system arm64: mm: add LPA2 and 5 level paging support to G-to-nG conversion arm64: mm: Add definitions to support 5 levels of paging arm64: mm: Add LPA2 support to phys<->pte conversion routines arm64: mm: Wire up TCR.DS bit to PTE shareability fields ...
2024-03-07	Merge branches 'for-next/reorg-va-space', 'for-next/rust-for-arm64', ↵	Catalin Marinas	62	-241/+2647
	'for-next/misc', 'for-next/daif-cleanup', 'for-next/kselftest', 'for-next/documentation', 'for-next/sysreg' and 'for-next/dpisa', remote-tracking branch 'arm64/for-next/perf' into for-next/core * arm64/for-next/perf: (39 commits) docs: perf: Fix build warning of hisi-pcie-pmu.rst perf: starfive: Only allow COMPILE_TEST for 64-bit architectures MAINTAINERS: Add entry for StarFive StarLink PMU docs: perf: Add description for StarFive's StarLink PMU dt-bindings: perf: starfive: Add JH8100 StarLink PMU perf: starfive: Add StarLink PMU support docs: perf: Update usage for target filter of hisi-pcie-pmu drivers/perf: hisi_pcie: Merge find_related_event() and get_event_idx() drivers/perf: hisi_pcie: Relax the check on related events drivers/perf: hisi_pcie: Check the target filter properly drivers/perf: hisi_pcie: Add more events for counting TLP bandwidth drivers/perf: hisi_pcie: Fix incorrect counting under metric mode drivers/perf: hisi_pcie: Introduce hisi_pcie_pmu_get_event_ctrl_val() drivers/perf: hisi_pcie: Rename hisi_pcie_pmu_{config,clear}_filter() drivers/perf: hisi: Enable HiSilicon Erratum 162700402 quirk for HIP09 perf/arm_cspmu: Add devicetree support dt-bindings/perf: Add Arm CoreSight PMU perf/arm_cspmu: Simplify counter reset perf/arm_cspmu: Simplify attribute groups perf/arm_cspmu: Simplify initialisation ... * for-next/reorg-va-space: : Reorganise the arm64 kernel VA space in preparation for LPA2 support : (52-bit VA/PA). arm64: kaslr: Adjust randomization range dynamically arm64: mm: Reclaim unused vmemmap region for vmalloc use arm64: vmemmap: Avoid base2 order of struct page size to dimension region arm64: ptdump: Discover start of vmemmap region at runtime arm64: ptdump: Allow all region boundaries to be defined at boot time arm64: mm: Move fixmap region above vmemmap region arm64: mm: Move PCI I/O emulation region above the vmemmap region * for-next/rust-for-arm64: : Enable Rust support for arm64 arm64: rust: Enable Rust support for AArch64 rust: Refactor the build target to allow the use of builtin targets * for-next/misc: : Miscellaneous arm64 patches ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512 arm64: Remove enable_daif macro arm64/hw_breakpoint: Directly use ESR_ELx_WNR for an watchpoint exception arm64: cpufeatures: Clean up temporary variable to simplify code arm64: Update setup_arch() comment on interrupt masking arm64: remove unnecessary ifdefs around is_compat_task() arm64: ftrace: Don't forbid CALL_OPS+CC_OPTIMIZE_FOR_SIZE with Clang arm64/sme: Ensure that all fields in SMCR_EL1 are set to known values arm64/sve: Ensure that all fields in ZCR_EL1 are set to known values arm64/sve: Document that __SVE_VQ_MAX is much larger than needed arm64: make member of struct pt_regs and it's offset macro in the same order arm64: remove unneeded BUILD_BUG_ON assertion arm64: kretprobes: acquire the regs via a BRK exception arm64: io: permit offset addressing arm64: errata: Don't enable workarounds for "rare" errata by default * for-next/daif-cleanup: : Clean up DAIF handling for EL0 returns arm64: Unmask Debug + SError in do_notify_resume() arm64: Move do_notify_resume() to entry-common.c arm64: Simplify do_notify_resume() DAIF masking * for-next/kselftest: : Miscellaneous arm64 kselftest patches kselftest/arm64: Test that ptrace takes effect in the target process * for-next/documentation: : arm64 documentation patches arm64/sme: Remove spurious 'is' in SME documentation arm64/fp: Clarify effect of setting an unsupported system VL arm64/sme: Fix cut'n'paste in ABI document arm64/sve: Remove bitrotted comment about syscall behaviour * for-next/sysreg: : sysreg updates arm64/sysreg: Update ID_AA64DFR0_EL1 register arm64/sysreg: Update ID_DFR0_EL1 register fields arm64/sysreg: Add register fields for ID_AA64DFR1_EL1 * for-next/dpisa: : Support for 2023 dpISA extensions kselftest/arm64: Add 2023 DPISA hwcap test coverage kselftest/arm64: Add basic FPMR test kselftest/arm64: Handle FPMR context in generic signal frame parser arm64/hwcap: Define hwcaps for 2023 DPISA features arm64/ptrace: Expose FPMR via ptrace arm64/signal: Add FPMR signal handling arm64/fpsimd: Support FEAT_FPMR arm64/fpsimd: Enable host kernel access to FPMR arm64/cpufeature: Hook new identification registers up to cpufeature
2024-03-07	ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512	Christoph Lameter (Ampere)	1	-1/+2
	Currently defconfig selects NR_CPUS=256, but some vendors (e.g. Ampere Computing) are planning to ship systems with 512 CPUs. So that all CPUs on these systems can be used with defconfig, we'd like to bump NR_CPUS to 512. Therefore this patch increases the default NR_CPUS from 256 to 512. As increasing NR_CPUS will increase the size of cpumasks, there's a fear that this might have a significant impact on stack usage due to code which places cpumasks on the stack. To mitigate that concern, we can select CPUMASK_OFFSTACK. As that doesn't seem to be a problem today with NR_CPUS=256, we only select this when NR_CPUS > 256. CPUMASK_OFFSTACK configures the cpumasks in the kernel to be dynamically allocated. This was used in the X86 architecture in the past to enable support for larger CPU configurations up to 8k cpus. With that is becomes possible to dynamically size the allocation of the cpu bitmaps depending on the quantity of processors detected on bootup. Memory used for cpumasks will increase if the kernel is run on a machine with more cores. Further increases may be needed if ARM processor vendors start supporting more processors. Given the current inflationary trends in core counts from multiple processor manufacturers this may occur. There are minor regressions for hackbench. The kernel data size for 512 cpus is smaller with offstack than with onstack. Benchmark results using hackbench average over 10 runs of hackbench -s 512 -l 2000 -g 15 -f 25 -P on Altra 80 Core Support for 256 CPUs on stack. Baseline 7.8564 sec Support for 512 CUs on stack. 7.8713 sec + 0.18% 512 CPUS offstack 7.8916 sec + 0.44% Kernel size comparison: text data filename Difference to onstack256 baseline 25755648 9589248 vmlinuz-6.8.0-rc4-onstack256 25755648 9607680 vmlinuz-6.8.0-rc4-onstack512 +0.19% 25755648 9603584 vmlinuz-6.8.0-rc4-offstack512 +0.14% Tested-by: Eric Mackay <[email protected]> Reviewed-by: Russell King (Oracle) <[email protected]> Signed-off-by: Christoph Lameter (Ampere) <[email protected]> Acked-by: Mark Rutland <[email protected]> Link: https://lore.kernel.org/r/[email protected] [[email protected]: use 'select' instead of duplicating 'config CPUMASK_OFFSTACK'] Signed-off-by: Catalin Marinas <[email protected]>
2024-03-07	kselftest/arm64: Add 2023 DPISA hwcap test coverage	Mark Brown	1	-0/+217
	Add the hwcaps added for the 2023 DPISA extensions to the hwcaps test program. Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2024-03-07	kselftest/arm64: Add basic FPMR test	Mark Brown	2	-0/+83
	Verify that a FPMR frame is generated on systems that support FPMR and not generated otherwise. Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2024-03-07	kselftest/arm64: Handle FPMR context in generic signal frame parser	Mark Brown	2	-0/+9
	Teach the generic signal frame parsing code about the newly added FPMR frame, avoiding warnings every time one is generated. Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2024-03-07	arm64/hwcap: Define hwcaps for 2023 DPISA features	Mark Brown	5	-0/+129
	The 2023 architecture extensions include a large number of floating point features, most of which simply add new instructions. Add hwcaps so that userspace can enumerate these features. Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2024-03-07	arm64/ptrace: Expose FPMR via ptrace	Mark Brown	2	-0/+43
	Add a new regset to expose FPMR via ptrace. It is not added to the FPSIMD registers since that structure is exposed elsewhere without any allowance for extension we don't add there. Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2024-03-07	arm64/signal: Add FPMR signal handling	Mark Brown	2	-0/+67
	Expose FPMR in the signal context on systems where it is supported. The kernel validates the exact size of the FPSIMD registers so we can't readily add it to fpsimd_context without disruption. Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2024-03-07	arm64/fpsimd: Support FEAT_FPMR	Mark Brown	8	-0/+36
	FEAT_FPMR defines a new EL0 accessible register FPMR use to configure the FP8 related features added to the architecture at the same time. Detect support for this register and context switch it for EL0 when present. Due to the sharing of responsibility for saving floating point state between the host kernel and KVM FP8 support is not yet implemented in KVM and a stub similar to that used for SVCR is provided for FPMR in order to avoid bisection issues. To make it easier to share host state with the hypervisor we store FPMR as a hardened usercopy field in uw (along with some padding). Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2024-03-07	arm64/fpsimd: Enable host kernel access to FPMR	Mark Brown	1	-1/+1
	FEAT_FPMR provides a new generally accessible architectural register FPMR. This is only accessible to EL0 and EL1 when HCRX_EL2.EnFPM is set to 1, do this when the host is running. The guest part will be done along with context switching the new register and exposing it via guest management. Acked-by: Marc Zyngier <[email protected]> Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2024-03-07	arm64/cpufeature: Hook new identification registers up to cpufeature	Mark Brown	3	-0/+34
	The 2023 architecture extensions have defined several new ID registers, hook them up to the cpufeature code so we can add feature checks and hwcaps based on their contents. Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2024-03-05	docs: perf: Fix build warning of hisi-pcie-pmu.rst	Yicong Yang	1	-0/+1
	`make htmldocs SPHINXDIRS="admin-guide"` shows below warnings: Documentation/admin-guide/perf/hisi-pcie-pmu.rst:48: ERROR: Unexpected indentation. Documentation/admin-guide/perf/hisi-pcie-pmu.rst:49: WARNING: Block quote ends without a blank line; unexpected unindent. Fix this. Closes: https://lore.kernel.org/lkml/[email protected]/ Fixes: 89a032923d4b ("docs: perf: Update usage for target filter of hisi-pcie-pmu") Signed-off-by: Yicong Yang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2024-03-05	perf: starfive: Only allow COMPILE_TEST for 64-bit architectures	Will Deacon	1	-1/+1
	The kbuild robot exploded while wasting its time building the Starfive PMU driver for the 32-bit PA-RISC and Hexagon architectures. Adjust the Kconfig dependencies so that COMPILE_TEST is only applicable for 64-bit architectures (which implement writeq()). Reported-by: kernel test robot <[email protected]> Signed-off-by: Will Deacon <[email protected]>
2024-03-04	MAINTAINERS: Add entry for StarFive StarLink PMU	Ji Sheng Teoh	1	-0/+7
	Add maintainer entry for StarFive StarLink PMU driver, and mark it as "Maintained" Signed-off-by: Ji Sheng Teoh <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2024-03-04	docs: perf: Add description for StarFive's StarLink PMU	Ji Sheng Teoh	2	-0/+47
	StarFive StarLink PMU support monitoring L3 memory system PMU events. Add documentation to describe StarFive StarLink PMU support and it's usage. Signed-off-by: Ji Sheng Teoh <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2024-03-04	dt-bindings: perf: starfive: Add JH8100 StarLink PMU	Ji Sheng Teoh	1	-0/+46
	Add device tree binding for StarFive's JH8100 StarLink PMU (Performance Monitor Unit). Signed-off-by: Ji Sheng Teoh <[email protected]> Reviewed-by: Conor Dooley <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2024-03-04	perf: starfive: Add StarLink PMU support	Ji Sheng Teoh	3	-0/+652
	This patch adds support for StarFive's StarLink PMU (Performance Monitor Unit). StarLink PMU integrates one or more CPU cores with a shared L3 memory system. The PMU supports overflow interrupt, up to 16 programmable 64bit event counters, and an independent 64bit cycle counter. StarLink PMU is accessed via MMIO. Example Perf stat output: [root@user]# perf stat -a -e /starfive_starlink_pmu/cycles/ \ -e /starfive_starlink_pmu/read_miss/ \ -e /starfive_starlink_pmu/read_hit/ \ -e /starfive_starlink_pmu/release_request/ \ -e /starfive_starlink_pmu/write_hit/ \ -e /starfive_starlink_pmu/write_miss/ \ -e /starfive_starlink_pmu/write_request/ \ -e /starfive_starlink_pmu/writeback/ \ -e /starfive_starlink_pmu/read_request/ \ -- openssl speed rsa2048 Doing 2048 bits private rsa's for 10s: 5 2048 bits private RSA's in 2.84s Doing 2048 bits public rsa's for 10s: 169 2048 bits public RSA's in 2.42s version: 3.0.11 built on: Tue Sep 19 13:02:31 2023 UTC options: bn(64,64) CPUINFO: N/A sign verify sign/s verify/s rsa 2048 bits 0.568000s 0.014320s 1.8 69.8 ///////// Performance counter stats for 'system wide': 649991998 starfive_starlink_pmu/cycles/ 1009690 starfive_starlink_pmu/read_miss/ 1079750 starfive_starlink_pmu/read_hit/ 2089405 starfive_starlink_pmu/release_request/ 129 starfive_starlink_pmu/write_hit/ 70 starfive_starlink_pmu/write_miss/ 194 starfive_starlink_pmu/write_request/ 150080 starfive_starlink_pmu/writeback/ 2089423 starfive_starlink_pmu/read_request/ 27.062755678 seconds time elapsed Signed-off-by: Ji Sheng Teoh <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2024-03-04	docs: perf: Update usage for target filter of hisi-pcie-pmu	Junhao He	1	-8/+23
	One of the "port" and "bdf" target filter interface must be set, and the related events should preferably used in the same group. Update the usage in the documentation. Signed-off-by: Junhao He <[email protected]> Signed-off-by: Yicong Yang <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2024-03-04	drivers/perf: hisi_pcie: Merge find_related_event() and get_event_idx()	Junhao He	1	-32/+19
	The function xxx_find_related_event() scan all working events to find related events. During this process, we also can find the idle counters. If not found related events, return the first idle counter to simplify the code. Signed-off-by: Junhao He <[email protected]> Signed-off-by: Yicong Yang <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2024-03-04	drivers/perf: hisi_pcie: Relax the check on related events	Junhao He	1	-6/+2
	If we use two events with the same filter and related event type (see the following example), the driver check whether they are related events and are in the same group, otherwise the function hisi_pcie_pmu_find_related_event() return -EINVAL, then the 2nd event cannot count but the 1st event is running, although the PCIe PMU has other idle counters. In this case, The perf event scheduler will make the two events to multiplex a counter, if the user use the formula (1st event_value / 2nd event_value) to calculate the bandwidth, he/she won't get the correct value, because they are not counting at the same period. This patch tries to fix this by making the related events to use different idle counters if they are not in the same event group. And finally, I'm going to say. The related events are best used in the same group [1]. There are two ways to know if they are related events. a) By event name, such as the latency events "xxx_latency, xxx_cnt" or bandwidth events "xxx_flux, xxx_time". b) By event type, such as "event=0xXXXX, event=0x1XXXX". Use group to count the related events: [1] -e "{pmu_name/xxx_latency,port=1/,pmu_name/xxx_cnt,port=1/}" example: 1st event: hisi_pcie0_core1/event=0x804,port=1 2nd event: hisi_pcie0_core1/event=0x10804,port=1 test cmd: perf stat -e hisi_pcie0_core1/event=0x804,port=1/ \ -e hisi_pcie0_core1/event=0x10804,port=1/ before patch: 25,281 hisi_pcie0_core1/event=0x804,port=1/ (49.91%) 470,598 hisi_pcie0_core1/event=0x10804,port=1/ (50.09%) after patch: 24,147 hisi_pcie0_core1/event=0x804,port=1/ 474,558 hisi_pcie0_core1/event=0x10804,port=1/ Signed-off-by: Junhao He <[email protected]> Signed-off-by: Yicong Yang <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2024-03-04	drivers/perf: hisi_pcie: Check the target filter properly	Junhao He	1	-4/+4
	The PMU can monitor traffic of certain target Root Port or downstream target Endpoint. User can specify the target filter by the "port" or "bdf" option respectively. The PMU can only monitor the Root Port or Endpoint on the same PCIe core so the value of "port" or "bdf" should be valid and will be checked by the driver. Currently at least and only one of "port" and "bdf" option must be set. If "port" filter is not set or is set explicitly to zero (default), driver will regard the user specifies a "bdf" option since "port" option is a bitmask of the target Root Ports and zero is not a valid value. If user not explicitly set "port" or "bdf" filter, the driver uses "bdf" default value (zero) to set target filter, but driver will skip the check of bdf=0, although it's a valid value (meaning 0000:000:00.0). Then the user just gets zero. Therefore, we need to check if both "port" and "bdf" are invalid, then return failure and report warning. Testing: before the patch: 0 hisi_pcie0_core1/rx_mrd_flux/ 0 hisi_pcie0_core1/rx_mrd_flux,port=0/ 24,124 hisi_pcie0_core1/rx_mrd_flux,port=1/ 0 hisi_pcie0_core1/rx_mrd_flux,bdf=0/ 0 hisi_pcie0_core1/rx_mrd_flux,port=0x800/ <not supported> hisi_pcie0_core1/rx_mrd_flux,bdf=1/ 24,132 hisi_pcie0_core1/rx_mrd_flux,bdf=0x1700/ <not supported> hisi_pcie0_core1/rx_mrd_flux,port=0x0,bdf=0x0/ <not supported> hisi_pcie0_core1/rx_mrd_flux,port=0x0,bdf=0x1/ 24,138 hisi_pcie0_core1/rx_mrd_flux,port=0x0,bdf=0x1700/ 24,126 hisi_pcie0_core1/rx_mrd_flux,port=0x1,bdf=0x0/ after the patch: <not supported> hisi_pcie0_core1/rx_mrd_flux/ <not supported> hisi_pcie0_core1/rx_mrd_flux,port=0/ 24,153 hisi_pcie0_core1/rx_mrd_flux,port=1/ 0 hisi_pcie0_core1/rx_mrd_flux,port=0x800/ <not supported> hisi_pcie0_core1/rx_mrd_flux,bdf=0/ <not supported> hisi_pcie0_core1/rx_mrd_flux,bdf=1/ 24,117 hisi_pcie0_core1/rx_mrd_flux,bdf=0x1700/ <not supported> hisi_pcie0_core1/rx_mrd_flux,port=0x0,bdf=0x0/ <not supported> hisi_pcie0_core1/rx_mrd_flux,port=0x0,bdf=0x1/ 24,120 hisi_pcie0_core1/rx_mrd_flux,port=0x0,bdf=0x1700/ 24,123 hisi_pcie0_core1/rx_mrd_flux,port=0x1,bdf=0x0/ Signed-off-by: Junhao He <[email protected]> Signed-off-by: Yicong Yang <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2024-03-04	drivers/perf: hisi_pcie: Add more events for counting TLP bandwidth	Yicong Yang	1	-0/+8
	A typical PCIe transaction is consisted of various TLP packets in both direction. For counting bandwidth only memory read events are exported currently. Add memory write and completion counting events of both direction to complete the bandwidth counting. Signed-off-by: Yicong Yang <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2024-03-04	drivers/perf: hisi_pcie: Fix incorrect counting under metric mode	Yicong Yang	1	-1/+7
	The metric counting shows incorrect results if the events in the metric group using the same event but different filter options. This is because we only judge the event code to decide whether the event in the metric group should share the same hardware counter, but ignore the settings of the filter. For example, on a platform of 2 ports 0x1 and 0x2 but only port 0x1 has a downstream PCIe NVME device. The metric counting shows both ports have the same counts because we misassign these two events to one same hardware counter: [root@localhost perf-iostat]# ./perf stat -e '{hisi_pcie0_core1/event=0x0104,port=0x2/,hisi_pcie0_core1/event=0x0104,port=0x1/}' Performance counter stats for 'system wide': 7907484924 hisi_pcie0_core1/event=0x0104,port=0x2/ 7907484924 hisi_pcie0_core1/event=0x0104,port=0x1/ 10.153863691 seconds time elapsed Fix this by using the whole config rather than the event only to judge whether two events are the same and should share the same hardware counter. With this patch, the metric counting in the above case tends to be corrected: [root@localhost perf-iostat]# ./perf stat -e '{hisi_pcie0_core1/event=0x0104,port=0x2/,hisi_pcie0_core1/event=0x0104,port=0x1/}' Performance counter stats for 'system wide': 0 hisi_pcie0_core1/event=0x0104,port=0x2/ 8123122077 hisi_pcie0_core1/event=0x0104,port=0x1/ 10.152875631 seconds time elapsed Fixes: 8404b0fbc7fb ("drivers/perf: hisi: Add driver for HiSilicon PCIe PMU") Signed-off-by: Yicong Yang <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2024-03-04	drivers/perf: hisi_pcie: Introduce hisi_pcie_pmu_get_event_ctrl_val()	Yicong Yang	1	-3/+10
	Factor out retrieving of the register value for the corresponding event from hisi_pcie_config_event_ctrl() into a new function hisi_pcie_pmu_get_event_ctrl_val() allowing future reuse. Signed-off-by: Yicong Yang <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2024-03-04	drivers/perf: hisi_pcie: Rename hisi_pcie_pmu_{config,clear}_filter()	Yicong Yang	1	-4/+4
	hisi_pcie_pmu_{config,clear}_filter() are config/clear HISI_PCIE_EVENT_CTRL register which contains not only the filter but also the event code. The function names are bit misleading. Rename it to hisi_pcie_pmu_{config,clear}_event_ctrl() to reflects their functions more accurately. Signed-off-by: Yicong Yang <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2024-03-04	drivers/perf: hisi: Enable HiSilicon Erratum 162700402 quirk for HIP09	Junhao He	1	-1/+41
	HiSilicon UC PMU v2 suffers the erratum 162700402 that the PMU counter cannot be set due to the lack of clock under power saving mode. This will lead to error or inaccurate counts. The clock can be enabled by the PMU global enabling control. This patch tries to fix this by set the UC PMU enable before set event period to turn on the clock, and then restore the UC PMU configuration. The counter register can hold its value without a clock. Signed-off-by: Junhao He <[email protected]> Reviewed-by: Yicong Yang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2024-03-01	arm64: Remove enable_daif macro	Jinjie Ruan	1	-4/+0
	Since commit bb8e93a287a5 ("arm64: entry: convert SError handlers to C"), the enable_daif assembler macro is no longer used anywhere, so remove it. Signed-off-by: Jinjie Ruan <[email protected]> Reviewed-by: Mark Brown <[email protected]> Acked-by: Mark Rutland <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2024-03-01	arm64/hw_breakpoint: Directly use ESR_ELx_WNR for an watchpoint exception	Anshuman Khandual	2	-2/+2
	Let's use existing ISS encoding for an watchpoint exception i.e ESR_ELx_WNR This represents an instruction's either writing to or reading from a memory location during an watchpoint exception. While here this drops non-standard macro AARCH64_ESR_ACCESS_MASK. Cc: Will Deacon <[email protected]> Cc: Mark Rutland <[email protected]> Cc: [email protected] Cc: [email protected] Reviewed-by: Mark Brown <[email protected]> Signed-off-by: Anshuman Khandual <[email protected]> Acked-by: Mark Rutland <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2024-03-01	arm64: cpufeatures: Clean up temporary variable to simplify code	Liao Chang	1	-6/+2
	Clean up one temporary variable to simplifiy code in capability detection. Signed-off-by: Liao Chang <[email protected]> Acked-by: Mark Rutland <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2024-03-01	arm64/mm: Avoid ID mapping of kpti flag if it is no longer needed	Ard Biesheuvel	1	-1/+1
	arm64_use_ng_mappings will be set to 'true' by the early boot code if it decides to use non-global (nG) attributes for all kernel mappings, typically when enabling KASLR on a system that does not implement E0PD. In this case, the G-to-nG update routines are never called, and so there is no reason to create the writable mapping of the associated status flag in the ID map. Signed-off-by: Ard Biesheuvel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2024-03-01	arm64/mm: Use generic __pud_free() helper in pud_free() implementation	Ard Biesheuvel	1	-2/+1
	Commit 0dd4f60a2c76 ("arm64: mm: Add support for folding PUDs at runtime") implements specialized PUD alloc/free helpers to allow the decision whether or not to fold PUDs to be made at runtime when the number of paging levels is 4 or higher. Its implementation of pud_free() is based on the generic version that existed when the patch was first written, but in the meantime, the freeing of a PUD has become a bit more involved, and so instead of simply freeing the page, we should invoke the generic __pud_free() that encapsulates whatever needs doing at this point. This fixes a reported warning emitted by the page flags self-diagnostics. Reported-by: Ryan Roberts <[email protected]> Signed-off-by: Ard Biesheuvel <[email protected]> Tested-by: Ryan Roberts <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2024-02-28	arm64: Update setup_arch() comment on interrupt masking	Ryo Takakura	1	-3/+2
	DAIF_PROCCTX_NOIRQ contains the FIQ bit. Update the comment as only asynchronous aborts are unmasked and FIQ is still masked. Signed-off-by: Ryo Takakura <[email protected]> Link: https://lore.kernel.org/r/[email protected] Acked-by: Mark Rutland <[email protected]> Signed-off-by: Catalin Marinas <[email protected]>
2024-02-28	arm64: remove unnecessary ifdefs around is_compat_task()	Leonardo Bras	4	-16/+9
	Currently some parts of the codebase will test for CONFIG_COMPAT before testing is_compat_task(). is_compat_task() is a inlined function only present on CONFIG_COMPAT. On the other hand, for !CONFIG_COMPAT, we have in linux/compat.h: #define is_compat_task() (0) Since we have this define available in every usage of is_compat_task() for !CONFIG_COMPAT, it's unnecessary to keep the ifdefs, since the compiler is smart enough to optimize-out those snippets on CONFIG_COMPAT=n This requires some regset code as well as a few other defines to be made available on !CONFIG_COMPAT, so some symbols can get resolved before getting optimized-out. Signed-off-by: Leonardo Bras <[email protected]> Reviewed-by: Arnd Bergmann <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2024-02-28	arm64: ftrace: Don't forbid CALL_OPS+CC_OPTIMIZE_FOR_SIZE with Clang	Stephen Boyd	1	-1/+1
	Per commit b3f11af9b2ce ("arm64: ftrace: forbid CALL_OPS with CC_OPTIMIZE_FOR_SIZE"), GCC is silently ignoring `-falign-functions=N` when passed `-Os`, causing functions to be improperly aligned. This doesn't seem to be a problem with Clang though, where enabling CALL_OPS with CC_OPTIMIZE_FOR_SIZE doesn't spit out any warnings at boot about misaligned patch-sites. Only forbid CALL_OPS if GCC is used and we're optimizing for size so that CALL_OPS can be used with clang optimizing for size. Cc: Jason Ling <[email protected]> Cc: Florian Fainelli <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Bill Wendling <[email protected]> Cc: Justin Stitt <[email protected]> Cc: [email protected] Fixes: b3f11af9b2ce ("arm64: ftrace: forbid CALL_OPS with CC_OPTIMIZE_FOR_SIZE") Signed-off-by: Stephen Boyd <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2024-02-22	arm64: gitignore: ignore relacheck	Bartosz Golaszewski	1	-0/+3
	Add the generated executable for relacheck to the list of ignored files. Signed-off-by: Bartosz Golaszewski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2024-02-22	arm64/sme: Ensure that all fields in SMCR_EL1 are set to known values	Mark Brown	1	-0/+3
	At present nothing in our CPU initialisation code ever sets unknown fields in SMCR_EL1 to known values, all updates to SMCR_EL1 are read/modify/write sequences. All the unknown fields are RES0, explicitly initialise them as such to avoid future surprises. Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2024-02-22	arm64/sve: Ensure that all fields in ZCR_EL1 are set to known values	Mark Brown	1	-0/+2
	At present nothing in our CPU initialisation code ever sets unknown fields in ZCR_EL1 to known values, all updates to ZCR_EL1 are read/modify/write sequences for LEN. All the unknown fields are RES0, explicitly initialise them as such to avoid future surprises. Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2024-02-22	arm64/sve: Document that __SVE_VQ_MAX is much larger than needed	Mark Brown	1	-0/+11
	__SVE_VQ_MAX is defined without comment as 512 but the actual architectural maximum is 16, a substantial difference which might not be obvious to readers especially given the several different units used for specifying vector sizes in various contexts and the fact that it's often used via macros. In an effort to minimise surprises for users who might assume the value is the architectural maximum and use it to do things like size allocations add a comment noting the difference, and add a note for SVE_VQ_MAX to aid discoverability. Signed-off-by: Mark Brown <[email protected]> Acked-by: Dave Martin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2024-02-22	arm64: make member of struct pt_regs and it's offset macro in the same order	Kemeng Shi	1	-1/+1
	In struct pt_regs, member pstate is after member pc. Move offset macro of pstate after offset macro of pc to improve readability a little. Signed-off-by: Kemeng Shi <[email protected]> Acked-by: Mark Rutland <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2024-02-22	arm64: remove unneeded BUILD_BUG_ON assertion	Dawei Li	1	-3/+0
	Since commit c02433dd6de3 ("arm64: split thread_info from task stack"), CONFIG_THREAD_INFO_IN_TASK is enabled unconditionally for arm64. So remove this always-true assertion from arch_dup_task_struct. Signed-off-by: Dawei Li <[email protected]> Acked-by: Mark Rutland <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2024-02-21	arm64/sysreg: Update ID_AA64DFR0_EL1 register	Anshuman Khandual	1	-0/+2
	This updates ID_AA64DFR0_EL1.PMSVer and ID_AA64DFR0_EL1.DebugVer register fields as per the definitions based on DDI0601 2023-12. Cc: Will Deacon <[email protected]> Cc: Mark Brown <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Anshuman Khandual <[email protected]> Reviewed-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2024-02-21	arm64/sysreg: Update ID_DFR0_EL1 register fields	Anshuman Khandual	1	-0/+2
	This updates ID_DFR0_EL1.PerfMon and ID_DFR0_EL1.CopDbg register fields as per the definitions based on DDI0601 2023-12. Cc: Will Deacon <[email protected]> Cc: Mark Brown <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Anshuman Khandual <[email protected]> Reviewed-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2024-02-21	arm64/sysreg: Add register fields for ID_AA64DFR1_EL1	Anshuman Khandual	1	-1/+30
	This adds register fields for ID_AA64DFR1_EL1 as per the definitions based on DDI0601 2023-12. Cc: Will Deacon <[email protected]> Cc: Mark Brown <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Anshuman Khandual <[email protected]> Reviewed-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2024-02-21	arm64/sme: Remove spurious 'is' in SME documentation	Mark Brown	1	-1/+1
	Just a typographical error. Reported-by: Edmund Grimley-Evans <[email protected]> Reviewed-by: Dave Martin <[email protected]> Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2024-02-21	arm64/fp: Clarify effect of setting an unsupported system VL	Mark Brown	2	-6/+4
	The documentation for system vector length configuration does not cover all cases where unsupported values are written, tighten it up. Reported-by: Edmund Grimley-Evans <[email protected]> Signed-off-by: Mark Brown <[email protected]> Reviewed-by: Dave Martin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2024-02-21	arm64/sme: Fix cut'n'paste in ABI document	Mark Brown	1	-2/+2
	The ABI for SME is very like that for SVE so bits of the ABI were copied but not adequately search and replaced, fix that. Reported-by: Edmund Grimley-Evans <[email protected]> Signed-off-by: Mark Brown <[email protected]> Reviewed-by: Dave Martin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2024-02-21	arm64/sve: Remove bitrotted comment about syscall behaviour	Mark Brown	1	-5/+0
	When we documented that we always clear state not shared with FPSIMD we didn't catch all of the places that mentioned that state might not be cleared, remove a lingering reference. Reported-by: Edmund Grimley-Evans <[email protected]> Reviewed-by: Dave Martin <[email protected]> Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2024-02-21	kselftest/arm64: Test that ptrace takes effect in the target process	Mark Brown	5	-1/+1800
	While we have test coverage for the ptrace interface in our selftests the current programs have a number of gaps. The testing is done per regset so does not cover interactions and at no point do any of the tests actually run the traced processes meaning that there is no validation that anything we read or write corresponds to register values the process actually sees. Let's add a new program which attempts to cover these gaps. Each test we do performs a single ptrace write. For each test we generate some random initial register data in memory and then fork() and trace a child. The child will load the generated data into the registers then trigger a breakpoint. The parent waits for the breakpoint then reads the entire child register state via ptrace, verifying that the values expected were actually loaded by the child. It then does the write being tested and resumes the child. Once resumed the child saves the register state it sees to memory and executes another breakpoint. The parent uses process_vm_readv() to get these values from the child and verifies that the values were as expected before cleaning up the child. We generate configurations with combinations of vector lengths and SVCR values and then try every ptrace write which will implement the transition we generated. In order to control execution time (especially in emulation) we only cover the minimum and maximum VL for each of SVE and SME, this will ensure we generate both increasing and decreasing changes in vector length. In order to provide a baseline test we also check the case where we resume the child without doing a ptrace write. In order to simplify the generation of the test count for kselftest we will report but skip a substantial number of tests that can't actually be expressed via a single ptrace write, several times more than we actually run. This is noisy and will add some overhead but is very much simpler so is probably worth the tradeoff. Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>