aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-07-12Documentation/powerpc: Mention 40x is removedChristophe Leroy2-18/+1
Commit 732b32daef80 ("powerpc: Remove core support for 40x") removed 40x. Update documentation accordingly. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/c2d64bebc514ca892a12e51a68821a6317048d3a.1720694954.git.christophe.leroy@csgroup.eu
2024-07-12powerpc: Remove 40x leftoversChristophe Leroy2-2/+1
Remove stale references to 40x. Fixes: e939da89d024 ("powerpc: Remove 40x from Kconfig and defconfig") Fixes: 548f5244f106 ("powerpc/40x: Remove EP405") Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/ab30ae302783d8617d407864b92db1b926ab5ab9.1720694914.git.christophe.leroy@csgroup.eu
2024-07-11macintosh/therm_windtunnel: fix module unload.Nick Bowler1-1/+1
The of_device_unregister call in therm_windtunnel's module_exit procedure does not fully reverse the effects of of_platform_device_create in the module_init prodedure. Once you unload this module, it is impossible to load it ever again since only the first of_platform_device_create call on the fan node succeeds. This driver predates first git commit, and it turns out back then of_platform_device_create worked differently than it does today. So this is actually an old regression. The appropriate function to undo of_platform_device_create now appears to be of_platform_device_destroy, and switching to use this makes it possible to unload and load the module as expected. Signed-off-by: Nick Bowler <[email protected]> Fixes: c6e126de43e7 ("of: Keep track of populated platform devices") Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-07-11powerpc: Check only single values are passed to CPU/MMU feature checksMichael Ellerman2-0/+2
cpu_has_feature()/mmu_has_feature() are only able to check a single feature at a time, but there is no enforcement of that. In fact, as fixed in the previous commit, there was code that was passing multiple values to cpu_has_feature(). So add a check that only a single feature is passed using popcount. Note that the test allows 0 or 1 bits to be set, because some code relies on cpu_has_feature(0) being false, the check with CPU_FTRS_POSSIBLE ensures that. See for example CPU_FTR_PPC_LE. Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-07-11powerpc/xmon: Fix disassembly CPU feature checksMichael Ellerman1-22/+11
In the xmon disassembly code there are several CPU feature checks to determine what dialects should be passed to the disassembler. The dialect controls which instructions the disassembler will recognise. Unfortunately the checks are incorrect, because instead of passing a single CPU feature they are passing a mask of feature bits. For example the code: if (cpu_has_feature(CPU_FTRS_POWER5)) dialect |= PPC_OPCODE_POWER5; Is trying to check if the system is running on a Power5 CPU. But CPU_FTRS_POWER5 is a mask of *all* the feature bits that are enabled on a Power5. In practice the test will always return true for any 64-bit CPU, because at least one bit in the mask will be present in the CPU_FTRS_ALWAYS mask. Similarly for all the other checks against CPU_FTRS_xx masks. Rather than trying to match the disassembly behaviour exactly to the current CPU, just differentiate between 32-bit and 64-bit, and Altivec, VSX and HTM. That will cause some instructions to be shown in disassembly even on a CPU that doesn't support them, but that's OK, objdump -d output has the same behaviour, and if anything it's less confusing than some instructions not being disassembled. Fixes: 897f112bb42e ("[POWERPC] Import updated version of ppc disassembly code for xmon") Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-07-11powerpc: Drop clang workaround for builtin constant checksMichael Ellerman2-4/+0
The CPU/MMU feature code has build-time checks that the feature value is a builtin constant. Back when the code was added clang wasn't able to compile the checks, so an ifdef was added to avoid the checks for clang builds. See commit b5fa0f7f88ed ("powerpc: Fix build failure with clang due to BUILD_BUG_ON()") These days clang 13 and later are able to build the checks successfully, so drop the workaround. Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-07-11powerpc64/bpf: jit support for signed division and moduloArtem Savkov2-8/+34
Add jit support for sign division and modulo. Tested using test_bpf module. Signed-off-by: Artem Savkov <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-07-11powerpc64/bpf: jit support for sign extended movArtem Savkov1-2/+8
Add jit support for sign extended mov. Tested using test_bpf module. Signed-off-by: Artem Savkov <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-07-11powerpc64/bpf: jit support for sign extended loadArtem Savkov2-19/+43
Add jit support for sign extended load. Tested using test_bpf module. Signed-off-by: Artem Savkov <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-07-11powerpc64/bpf: jit support for unconditional byte swapArtem Savkov1-1/+2
Add jit support for unconditional byte swap. Tested using BSWAP tests from test_bpf module. Signed-off-by: Artem Savkov <[email protected]> Reviewed-by: Hari Bathini <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-07-11powerpc64/bpf: jit support for 32bit offset jmp instructionArtem Savkov1-0/+3
Add jit support for JMP32_JA instruction. Tested using test_bpf module. Signed-off-by: Artem Savkov <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-07-04powerpc/pci: Hotplug driver bridge supportKrishna Kumar1-3/+32
There is an issue with the hotplug operation when it's done on the bridge/switch slot. The bridge-port and devices behind the bridge, which become offline by hot-unplug operation, don't get hot-plugged/enabled by doing hot-plug operation on that slot. Only the first port of the bridge gets enabled and the remaining port/devices remain unplugged. The hot plug/unplug operation is done by the hotplug driver (drivers/pci/ hotplug/pnv_php.c). This behavior is due to missing code for the switch/bridge. The existing driver depends on pci_hp_add_devices() function for device enablement. This function calls pci_scan_slot() on only one device-node/port of the bridge, not on all the siblings' device-node/port. The missing code needs to be added which will find all the sibling device-nodes/bridge-ports and will run explicit pci_scan_slot() on those. A new function has been added for this purpose which is invoked from pci_hp_add_devices(). This new function traverse_siblings_and_scan_slot() gets all the sibling bridge-ports by traversal and explicitly invokes pci_scan_slot() on them. Tested-by: Shawn Anastasio <[email protected]> Signed-off-by: Krishna Kumar <[email protected]> [mpe: Move the code into pci-hotplug.c] Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-07-04pci/hotplug/pnv_php: Fix hotplug driver crash on PowernvKrishna Kumar1-2/+1
The hotplug driver for powerpc (pci/hotplug/pnv_php.c) causes a kernel crash when we try to hot-unplug/disable the PCIe switch/bridge from the PHB. The crash occurs because although the MSI data structure has been released during disable/hot-unplug path and it has been assigned with NULL, still during unregistration the code was again trying to explicitly disable the MSI which causes the NULL pointer dereference and kernel crash. The patch fixes the check during unregistration path to prevent invoking pci_disable_msi/msix() since its data structure is already freed. Reported-by: Timothy Pearson <[email protected]> Closes: https://lore.kernel.org/all/1981605666.2142272.1703742465927.JavaMail.zimbra@raptorengineeringinc.com/ Acked-by: Bjorn Helgaas <[email protected]> Tested-by: Shawn Anastasio <[email protected]> Signed-off-by: Krishna Kumar <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-07-04powerpc/configs: Update defconfig with now user-visible CONFIG_FSL_IFCEsben Haabendal1-0/+2
With CONFIG_FSL_IFC now being user-visible, and thus changed from a select to depends in CONFIG_MTD_NAND_FSL_IFC, the dependencies needs to be selected in defconfigs. Depends-on: 9ba0cae3cac0 ("memory: fsl_ifc: Make FSL_IFC config visible and selectable") Signed-off-by: Esben Haabendal <[email protected]> Reviewed-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-07-04powerpc: add missing MODULE_DESCRIPTION() macrosJeff Johnson9-0/+9
With ARCH=powerpc, make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/kernel/rtas_flash.o WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/sysdev/rtc_cmos_setup.o WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/platforms/pseries/papr_scm.o WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/platforms/cell/spufs/spufs.o WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/platforms/cell/cbe_thermal.o WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/platforms/cell/cpufreq_spudemand.o WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/platforms/cell/cbe_powerbutton.o Add the missing invocation of the MODULE_DESCRIPTION() macro to all files which have a MODULE_LICENSE(). This includes 85xx/t1042rdb_diu.c and chrp/nvram.c which, although they did not produce a warning with the powerpc allmodconfig configuration, may cause this warning with other configurations. Signed-off-by: Jeff Johnson <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-07-04macintosh/mac_hid: add MODULE_DESCRIPTION()Jeff Johnson1-0/+1
Fix the make W=1 warning: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/macintosh/mac_hid.o Signed-off-by: Jeff Johnson <[email protected]> [mpe: Change the description to just "Mouse button 2+3 emulation"] Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-07-04powerpc/kexec: Use of_property_read_reg()Rob Herring (Arm)1-24/+13
Replace open-coded parsing of "reg" property with of_property_read_reg(). Signed-off-by: Rob Herring (Arm) <[email protected]> [mpe: Rename end to size, add comment to the bail out case] Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-07-04powerpc/64s/radix/kfence: map __kfence_pool at page granularityHari Bathini3-5/+93
When KFENCE is enabled, total system memory is mapped at page level granularity. But in radix MMU mode, ~3GB additional memory is needed to map 100GB of system memory at page level granularity when compared to using 2MB direct mapping.This is not desired considering KFENCE is designed to be enabled in production kernels [1]. Mapping only the memory allocated for KFENCE pool at page granularity is sufficient to enable KFENCE support. So, allocate __kfence_pool during bootup and map it at page granularity instead of mapping all system memory at page granularity. Without patch: # cat /proc/meminfo MemTotal: 101201920 kB With patch: # cat /proc/meminfo MemTotal: 104483904 kB Note that enabling KFENCE at runtime is disabled for radix MMU for now, as it depends on the ability to split page table mappings and such APIs are not currently implemented for radix MMU. All kfence_test.c testcases passed with this patch. [1] https://lore.kernel.org/all/[email protected]/ Signed-off-by: Hari Bathini <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-07-04powerpc/pseries/iommu: Define spapr_tce_table_group_ops only with ↵Shivaprasad G Bhat1-0/+8
CONFIG_IOMMU_API The patch fixes the below warning: arch/powerpc/platforms/pseries/iommu.c:1824:37: warning: 'spapr_tce_table_group_ops' defined but not used Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Fixes: b09c031d9433 ("powerpc/iommu: Move pSeries specific functions to pseries/iommu.c") Signed-off-by: Shivaprasad G Bhat <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-07-01selftests/sigaltstack: Fix ppc64 GCC buildMichael Ellerman1-1/+1
Building the sigaltstack test with GCC on 64-bit powerpc errors with: gcc -Wall sas.c -o /home/michael/linux/.build/kselftest/sigaltstack/sas In file included from sas.c:23: current_stack_pointer.h:22:2: error: #error "implement current_stack_pointer equivalent" 22 | #error "implement current_stack_pointer equivalent" | ^~~~~ sas.c: In function ‘my_usr1’: sas.c:50:13: error: ‘sp’ undeclared (first use in this function); did you mean ‘p’? 50 | if (sp < (unsigned long)sstack || | ^~ This happens because GCC doesn't define __ppc__ for 64-bit builds, only 32-bit builds. Instead use __powerpc__ to detect powerpc builds, which is defined by clang and GCC for 64-bit and 32-bit builds. Fixes: 05107edc9101 ("selftests: sigaltstack: fix -Wuninitialized") Cc: [email protected] # v6.3+ Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-06-28powerpc/prom: Add CPU info to hardware description string laterNathan Lynch1-4/+8
cur_cpu_spec->cpu_name is appended to ppc_hw_desc before cur_cpu_spec has taken on its final value. This is illustrated on pseries by comparing the CPU name as reported at boot ("POWER8E (raw)") to the contents of /proc/cpuinfo ("POWER8 (architected)"): $ dmesg | grep Hardware Hardware name: IBM,8408-E8E POWER8E (raw) 0x4b0201 0xf000004 \ of:IBM,FW860.50 (SV860_146) hv:phyp pSeries $ grep -m 1 ^cpu /proc/cpuinfo cpu : POWER8 (architected), altivec supported Some 44x models would appear to be affected as well; see identical_pvr_fixup(). This results in incorrect CPU information in stack dumps -- ppc_hw_desc is an input to dump_stack_set_arch_desc(). Delay gathering the CPU name until after all potential calls to identify_cpu(). Signed-off-by: Nathan Lynch <[email protected]> Fixes: bd649d40e0f2 ("powerpc: Add PVR & CPU name to hardware description") Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-06-28powerpc/rtas: Prevent Spectre v1 gadget construction in sys_rtas()Nathan Lynch1-0/+4
Smatch warns: arch/powerpc/kernel/rtas.c:1932 __do_sys_rtas() warn: potential spectre issue 'args.args' [r] (local cap) The 'nargs' and 'nret' locals come directly from a user-supplied buffer and are used as indexes into a small stack-based array and as inputs to copy_to_user() after they are subject to bounds checks. Use array_index_nospec() after the bounds checks to clamp these values for speculative execution. Signed-off-by: Nathan Lynch <[email protected]> Reported-by: Breno Leitao <[email protected]> Reviewed-by: Breno Leitao <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-06-28powerpc/platforms: Move files from 4xx to 44xChristophe Leroy12-32/+20
Only 44x uses 4xx now, so only keep one directory. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-06-28powerpc: Replace CONFIG_4xx with CONFIG_44xMichael Ellerman10-19/+13
Replace 4xx usage with 44x, and replace 4xx_SOC with 44x. Also, as pointed out by Christophe, if 44x || BOOKE can be simplified to just test BOOKE, because 44x always selects BOOKE. Retain the CONFIG_4xx symbol, as there are drivers that use it to mean 4xx || 44x, those will need updating before CONFIG_4xx can be removed. Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-06-28powerpc/4xx: Remove CONFIG_BOOKE_OR_40xMichael Ellerman18-26/+21
Now that 40x is gone, replace CONFIG_BOOKE_OR_40x by CONFIG_BOOKE. Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-06-28powerpc: Remove core support for 40xChristophe Leroy29-1728/+14
Now that 40x platforms have gone, remove support for 40x in the core of powerpc arch. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-06-28powerpc: Remove 40x from Kconfig and defconfigMichael Ellerman5-43/+10
Remove 40x from Kconfig, making the code unreachable. Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-06-28powerpc/boot: Remove all 40x platforms from bootChristophe Leroy17-2875/+1
Remove 40x platforms from the boot directory. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-06-28powerpc/40x: Remove 40x platforms.Christophe Leroy13-587/+0
40x platforms have been orphaned for many years. Remove them. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-06-28macintosh: Drop explicit initialization of struct i2c_device_id::driver_data ↵Uwe Kleine-König6-6/+6
to 0 These drivers don't use the driver_data member of struct i2c_device_id, so don't explicitly initialize this member. This prepares putting driver_data in an anonymous union which requires either no initialization or named designators. But it's also a nice cleanup on its own. Signed-off-by: Uwe Kleine-König <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-06-28powerpc/iommu: Reimplement the iommu_table_group_ops for pSeriesShivaprasad G Bhat4-98/+543
PPC64 IOMMU API defines iommu_table_group_ops which handles DMA windows for PEs, their ownership transfer, create/set/unset the TCE tables for the Dynamic DMA wundows(DDW). VFIOS uses these APIs for support on POWER. The commit 9d67c9433509 ("powerpc/iommu: Add "borrowing" iommu_table_group_ops") implemented partial support for this API with "borrow" mechanism wherein the DMA windows if created already by the host driver, they would be available for VFIO to use. Also, it didn't have the support to control/modify the window size or the IO page size. The current patch implements all the necessary iommu_table_group_ops APIs there by avoiding the "borrrowing". So, just the way it is on the PowerNV platform, with this patch the iommu table group ownership is transferred to the VFIO PPC subdriver, the iommu table, DMA windows creation/deletion all driven through the APIs. The pSeries uses the query-pe-dma-window, create-pe-dma-window and reset-pe-dma-window RTAS calls for DMA window creation, deletion and reset to defaul. The RTAs calls do show some minor differences to the way things are to be handled on the pSeries which are listed below. * On pSeries, the default DMA window size is "fixed" cannot be custom sized as requested by the user. For non-SRIOV VFs, It is fixed at 2GB and for SRIOV VFs, its variable sized based on the capacity assigned to it during the VF assignment to the LPAR. So, for the default DMA window alone the size if requested less than tce32_size, the smaller size is enforced using the iommu table->it_size. * The DMA start address for 32-bit window is 0, and for the 64-bit window in case of PowerNV is hardcoded to TVE select (bit 59) at 512PiB offset. This address is returned at the time of create_table() API call (even before the window is created), the subsequent set_window() call actually opens the DMA window. On pSeries, the DMA start address for 32-bit window is known from the 'ibm,dma-window' DT property. However, the 64-bit window start address is not known until the create-pe-dma RTAS call is made. So, the create_table() which returns the DMA window start address actually opens the DMA window and returns the DMA start address as returned by the Hypervisor for the create-pe-dma RTAS call. * The reset-pe-dma RTAS call resets the DMA windows and restores the default DMA window, however it does not clear the TCE table entries if there are any. In case of ownership transfer from platform domain which used direct mapping, the patch chooses remove-pe-dma instead of reset-pe for the 64-bit window intentionally so that the clear_dma_window() is called. Other than the DMA window management changes mentioned above, the patch also brings back the userspace view for the single level TCE as it existed before commit 090bad39b237a ("powerpc/powernv: Add indirect levels to it_userspace") along with the relavent refactoring. Signed-off-by: Shivaprasad G Bhat <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-06-28powerpc/iommu: Move dev_has_iommu_table() to iommu.cShivaprasad G Bhat3-16/+23
Move function dev_has_iommu_table() to powerpc/kernel/iommu.c as it is going to be used by machine specific iommu code as well in subsequent patches. Signed-off-by: Shivaprasad G Bhat <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-06-28vfio/spapr: Always clear TCEs before unsetting the windowShivaprasad G Bhat1-4/+9
The PAPR expects the TCE table to have no entries at the time of unset window(i.e. remove-pe). The TCE clear right now is done before freeing the iommu table. On pSeries, the unset window makes those entries inaccessible to the OS and the H_PUT/GET calls fail on them with H_CONSTRAINED. On PowerNV, this has no side effect as the TCE clear can be done before the DMA window removal as well. Signed-off-by: Shivaprasad G Bhat <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-06-28powerpc/pseries/iommu: Use the iommu table[0] for IOV VF's DDWShivaprasad G Bhat1-5/+7
This patch basically brings consistency with PowerNV approach to use the first freely available iommu table when the default window is removed. The pSeries iommu code convention has been that the table[0] is for the default 32 bit DMA window and the table[1] is for the 64 bit DDW. With VFs having only 1 DMA window, the default has to be removed for creating the larger DMA window. The existing code uses the table[1] for that, while marking the table[0] as NULL. This is fine as long as the host driver itself uses the device. For the VFIO user, on pSeries there is no way to skip table[0] as the VFIO subdriver uses the first freely available table. The window 0, when created as 64-bit DDW in that context would still be on table[0], as the maximum number of windows is 1. This won't have any impact for the host driver as the table is fetched from the device's iommu_table_base. Signed-off-by: Shivaprasad G Bhat <[email protected]> [mpe: Rebase and resolve conflicts] Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-06-28powerpc/pseries/iommu: Fix the VFIO_IOMMU_SPAPR_TCE_GET_INFO ioctl outputShivaprasad G Bhat1-14/+67
The ioctl VFIO_IOMMU_SPAPR_TCE_GET_INFO is not reporting the actuals on the platform as not all the details are correctly collected during the platform probe/scan into the iommu_table_group. Collect the information during the device setup time as the DMA window property is already looked up on parent nodes anyway. Signed-off-by: Shivaprasad G Bhat <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-06-28powerpc/iommu: Move pSeries specific functions to pseries/iommu.cShivaprasad G Bhat3-148/+149
The PowerNV specific table_group_ops are defined in powernv/pci-ioda.c. The pSeries specific table_group_ops are sitting in the generic powerpc file. Move it to where it actually belong(pseries/iommu.c). The functions are currently defined even for CONFIG_PPC_POWERNV which are unused on PowerNV. Only code movement, no functional changes intended. Signed-off-by: Shivaprasad G Bhat <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-06-17powerpc/kexec_file: fix cpus node update to FDTSourabh Jain1-16/+37
While updating the cpus node, commit 40c753993e3a ("powerpc/kexec_file: Use current CPU info while setting up FDT") first deletes all subnodes under the /cpus node. However, while adding sub-nodes back, it missed adding cpus subnodes whose device_type != "cpu", such as l2-cache*, l3-cache*, ibm,powerpc-cpu-features. Fix this by only deleting cpus sub-nodes of device_type == "cpus" and then adding all available nodes with device_type == "cpu". Fixes: 40c753993e3a ("powerpc/kexec_file: Use current CPU info while setting up FDT") Signed-off-by: Sourabh Jain <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-06-17powerpc/kexec_file: fix extra size calculation for kexec FDTSourabh Jain3-38/+33
While setting up the FDT for kexec, CPU nodes that are added after the system boots and reserved memory ranges are incorporated into the initial_boot_params (base FDT). However, they are not taken into account when determining the additional size needed for the kexec FDT. As a result, kexec fails to load, generating the following error: [1116.774451] Error updating memory reserve map: FDT_ERR_NOSPACE kexec_file_load failed: No such process Therefore, consider the extra size for CPU nodes added post-system boot and reserved memory ranges while preparing the kexec FDT. While adding a new parameter to the setup_new_fdt_ppc64 function, it was noticed that there were a couple of unused parameters, so they were removed. Signed-off-by: Sourabh Jain <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-06-17powerpc/perf: Set cpumode flags using sample addressAnjali K3-28/+23
Currently in some cases, when the sampled instruction address register latches to a specific address during sampling, the privilege bits captured in the sampled event register are incorrect. For example, a snippet from the perf report on a power10 system is: Overhead Address Command Shared Object Symbol ........ .................. ............ ................. ....................... 2.41% 0x7fff9f94a02c null_syscall [unknown] [k] 0x00007fff9f94a02c 2.20% 0x7fff9f94a02c null_syscall libc.so.6 [.] syscall perf_get_misc_flags() function looks at the privilege bits to return the corresponding flags to be used for the address symbol and these privilege bit details are read from the sampled event register. In the above snippet, address "0x00007fff9f94a02c" is shown as "k" (kernel) due to the incorrect privilege bits captured in the sampled event register. To address this case check whether the sampled address is in the kernel area. Since this is specific to the latest platform, a new pmu flag is added called "PPMU_P10" and is used to contain the proposed fix. PPMU_P10_DD1 marked events are also included under PPMU_P10, hence remove the code specific to PPMU_P10_DD1 marked events. Signed-off-by: Anjali K <[email protected]> Reviewed-by: Athira Rajeev <[email protected] <mailto:[email protected]>> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-06-12powerpc/configs: drop RT_GROUP_SCHED=y from ppc6xx_defconfigCeleste Liu1-1/+0
Commit 6e5f1537833a ("powerpc: Add a 6xx defconfig") said it was copied from fedore's ppc32 defconfig, but at least since 2015-06-10, Fedora has dropped this option.[1] For cgroup v1, if turned on, and there's any cgroup in the "cpu" hierarchy it needs an RT budget assigned, otherwise the processes in it will not be able to get RT at all. The problem with RT group scheduling is that it requires the budget assigned but there's no way we could assign a default budget, since the values to assign are both upper and lower time limits, are absolute, and need to be sum up to < 1 for each individal cgroup. That means we cannot really come up with values that would work by default in the general case.[1] For cgroup v2, it's almost unusable as well. If it turned on, the cpu controller can only be enabled when all RT processes are in the root cgroup. But it will lose the benefits of cgroup v2 if all RT process were placed in the same cgroup. systemd also doesn't support it.[2] [1]: https://bugzilla.redhat.com/show_bug.cgi?id=1229700 [2]: https://github.com/systemd/systemd/issues/13781#issuecomment-549164383 Signed-off-by: Celeste Liu <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-06-04powerpc/mm/drmem: Silence drmem_init() early returnNathan Lynch1-3/+1
It's not an error or noteworthy condition if the "ibm,dynamic-reconfiguration-memory" node isn't present. Drop the needless message. Signed-off-by: Nathan Lynch <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-06-04powerpc/pseries/vas: Use usleep_range() to support HCALL delayHaren Myneni1-1/+21
VAS allocate, modify and deallocate HCALLs returns H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC for busy delay and expects OS to reissue HCALL after that delay. But using msleep() will often sleep at least 20 msecs even though the hypervisor suggests OS reissue these HCALLs after 1 or 10msecs. The open and close VAS window functions hold mutex and then issue these HCALLs. So these operations can take longer than the necessary when multiple threads issue open or close window APIs simultaneously, especially might affect the performance in the case of repeat open/close APIs for each compression request. Multiple tasks can open / close VAS windows at the same time which depends on the available VAS credits. For example, 240 cores system provides 4800 VAS credits. It means 4800 tasks can execute open VAS windows HCALLs with the mutex. Since each msleep() will often sleep more than 20 msecs, some tasks are waiting more than 120 secs to acquire mutex. It can cause hung traces for these tasks in dmesg due to mutex contention around open/close HCALLs. Instead of msleep(), use usleep_range() to ensure sleep with the expected value before issuing HCALL again. So since each task sleep 10 msecs maximum, this patch allow more tasks can issue open/close VAS calls without any hung traces in the dmesg. Signed-off-by: Haren Myneni <[email protected]> Suggested-by: Nathan Lynch <[email protected]> Reviewed-by: Nathan Lynch <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-06-04powerpc/numa: Online a node if PHB is attached.Nilay Shroff2-1/+27
In the current design, a numa-node is made online only if that node is attached to cpu/memory. With this design, if any PCI/IO device is found to be attached to a numa-node which is not online then the numa-node id of the corresponding PCI/IO device is set to NUMA_NO_NODE(-1). This design may negatively impact the performance of PCIe device if the numa-node assigned to PCIe device is -1 because in such case we may not be able to accurately calculate the distance between two nodes. The multi-controller NVMe PCIe disk has an issue with calculating the node distance if the PCIe NVMe controller is attached to a PCI host bridge which has numa-node id value set to NUMA_NO_NODE. This patch helps fix this ensuring that a cpu/memory less numa node is made online if it's attached to PCI host bridge. Signed-off-by: Nilay Shroff <[email protected]> Reviewed-by: Srikar Dronamraju <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-06-04powerpc/pseries/iommu: Split Dynamic DMA Window to be used in Hybrid modeGaurav Batra2-17/+54
Dynamic DMA Window (DDW) supports TCEs that are backed by 2MB page size. In most configurations, DDW is big enough to pre-map all of LPAR memory for IO. Pre-mapping of memory for DMA results in improvements in IO performance. Persistent memory, vPMEM, can be assigned to an LPAR as well. vPMEM is not contiguous with LPAR memory and usually is assigned at high memory addresses. This makes is not possible to pre-map both vPMEM and LPAR memory in the same DDW. For a dedicated adapter this limitation is not an issue. Dedicated adapters can have both Default DMA window, which is backed by 4K page size and a DDW backed by 2MB page size TCEs. In this scenario, LPAR memory is pre-mapped in the DDW. Any DMA going to the vPMEM is routed via dynamically allocated TCEs in the default window. The issue arises with SR-IOV adapters. There is only one DMA window - either Default or DDW. If an LPAR has vPMEM assigned, memory is not pre-mapped in the DDW since TCEs needs to be allocated for vPMEM as well. In this case, DDW is created and TCEs are dynamically allocated for both vPMEM and LPAR memory. Today, DDW is only used in single mode - direct mapped TCEs or dynamically mapped TCEs. This enhancement breaks a single DDW in 2 regions - 1. First region to pre-map LPAR memory 2. Second region to dynamically allocate TCEs for IO to vPMEM The DDW is split only if it is big enough to pre-map complete LPAR memory and still have some space left to dynamically map vPMEM. Maximum size possible DDW is created as permitted by the Hypervisor. Signed-off-by: Gaurav Batra <[email protected]> Reviewed-by: Brian King <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-06-03powerpc/pseries: Remove unused cede related functionsGautam Menghani1-28/+0
Remove extended_cede_processor() and its helpers as extended_cede_processor() has no callers since commit 48f6e7f6d948("powerpc/pseries: remove cede offline state for CPUs") Signed-off-by: Gautam Menghani <[email protected]> Acked-by: Naveen N Rao <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-06-02Linux 6.10-rc2Linus Torvalds1-1/+1
2024-06-02Merge tag 'ata-6.10-rc2' of ↵Linus Torvalds2-2/+8
git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux Pull ata fixes from Niklas Cassel: - Add a quirk for three different devices that have shown issues with LPM (link power management). These devices appear to not implement LPM properly, since we see command timeouts when enabling LPM. The quirk disables LPM for these problematic devices. (Me) - Do not apply the Intel PCS quirk on Alder Lake. The quirk is not needed and was originally added by mistake when LPM support was enabled for this AHCI controller. Enabling the quirk when not needed causes the the controller to not be able to detect the connected devices on some platforms. * tag 'ata-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux: ata: libata-core: Add ATA_HORKAGE_NOLPM for Apacer AS340 ata: libata-core: Add ATA_HORKAGE_NOLPM for AMD Radeon S3 SSD ata: libata-core: Add ATA_HORKAGE_NOLPM for Crucial CT240BX500SSD1 ata: ahci: Do not apply Intel PCS quirk on Intel Alder Lake
2024-06-02Merge tag 'x86-urgent-2024-06-02' of ↵Linus Torvalds4-12/+26
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Miscellaneous topology parsing fixes: - Fix topology parsing regression on older CPUs in the new AMD/Hygon parser - Fix boot crash on odd Intel Quark and similar CPUs that do not fill out cpuinfo_x86::x86_clflush_size and zero out cpuinfo_x86::x86_cache_alignment as a result. Provide 32 bytes as a general fallback value. - Fix topology enumeration on certain rare CPUs where the BIOS locks certain CPUID leaves and the kernel unlocked them late, which broke with the new topology parsing code. Factor out this unlocking logic and move it earlier in the parsing sequence" * tag 'x86-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/topology/intel: Unlock CPUID before evaluating anything x86/cpu: Provide default cache line size if not enumerated x86/topology/amd: Evaluate SMT in CPUID leaf 0x8000001e only on family 0x17 and greater
2024-06-02Merge tag 'sched-urgent-2024-06-02' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: "Export a symbol to make life easier for instrumentation/debugging" * tag 'sched-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/x86: Export 'percpu arch_freq_scale'
2024-06-02Merge tag 'perf-urgent-2024-06-02' of ↵Linus Torvalds3-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events fix from Ingo Molnar: "Add missing MODULE_DESCRIPTION() lines" * tag 'perf-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Add missing MODULE_DESCRIPTION() lines perf/x86/rapl: Add missing MODULE_DESCRIPTION() line