Age | Commit message (Collapse) | Author | Files | Lines |
|
Commit 732b32daef80 ("powerpc: Remove core support for 40x") removed 40x.
Update documentation accordingly.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/c2d64bebc514ca892a12e51a68821a6317048d3a.1720694954.git.christophe.leroy@csgroup.eu
|
|
Remove stale references to 40x.
Fixes: e939da89d024 ("powerpc: Remove 40x from Kconfig and defconfig")
Fixes: 548f5244f106 ("powerpc/40x: Remove EP405")
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/ab30ae302783d8617d407864b92db1b926ab5ab9.1720694914.git.christophe.leroy@csgroup.eu
|
|
The of_device_unregister call in therm_windtunnel's module_exit procedure
does not fully reverse the effects of of_platform_device_create in the
module_init prodedure. Once you unload this module, it is impossible
to load it ever again since only the first of_platform_device_create
call on the fan node succeeds.
This driver predates first git commit, and it turns out back then
of_platform_device_create worked differently than it does today.
So this is actually an old regression.
The appropriate function to undo of_platform_device_create now appears
to be of_platform_device_destroy, and switching to use this makes it
possible to unload and load the module as expected.
Signed-off-by: Nick Bowler <[email protected]>
Fixes: c6e126de43e7 ("of: Keep track of populated platform devices")
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
cpu_has_feature()/mmu_has_feature() are only able to check a single
feature at a time, but there is no enforcement of that.
In fact, as fixed in the previous commit, there was code that was
passing multiple values to cpu_has_feature().
So add a check that only a single feature is passed using popcount.
Note that the test allows 0 or 1 bits to be set, because some code
relies on cpu_has_feature(0) being false, the check with
CPU_FTRS_POSSIBLE ensures that. See for example CPU_FTR_PPC_LE.
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
In the xmon disassembly code there are several CPU feature checks to
determine what dialects should be passed to the disassembler. The
dialect controls which instructions the disassembler will recognise.
Unfortunately the checks are incorrect, because instead of passing a
single CPU feature they are passing a mask of feature bits.
For example the code:
if (cpu_has_feature(CPU_FTRS_POWER5))
dialect |= PPC_OPCODE_POWER5;
Is trying to check if the system is running on a Power5 CPU. But
CPU_FTRS_POWER5 is a mask of *all* the feature bits that are enabled on
a Power5.
In practice the test will always return true for any 64-bit CPU, because
at least one bit in the mask will be present in the CPU_FTRS_ALWAYS
mask.
Similarly for all the other checks against CPU_FTRS_xx masks.
Rather than trying to match the disassembly behaviour exactly to the
current CPU, just differentiate between 32-bit and 64-bit, and Altivec,
VSX and HTM.
That will cause some instructions to be shown in disassembly even
on a CPU that doesn't support them, but that's OK, objdump -d output
has the same behaviour, and if anything it's less confusing than some
instructions not being disassembled.
Fixes: 897f112bb42e ("[POWERPC] Import updated version of ppc disassembly code for xmon")
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
The CPU/MMU feature code has build-time checks that the feature value is
a builtin constant.
Back when the code was added clang wasn't able to compile the
checks, so an ifdef was added to avoid the checks for clang builds.
See commit b5fa0f7f88ed ("powerpc: Fix build failure with clang due
to BUILD_BUG_ON()")
These days clang 13 and later are able to build the checks successfully,
so drop the workaround.
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
Add jit support for sign division and modulo. Tested using test_bpf
module.
Signed-off-by: Artem Savkov <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
Add jit support for sign extended mov. Tested using test_bpf module.
Signed-off-by: Artem Savkov <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
Add jit support for sign extended load. Tested using test_bpf module.
Signed-off-by: Artem Savkov <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
Add jit support for unconditional byte swap. Tested using BSWAP tests
from test_bpf module.
Signed-off-by: Artem Savkov <[email protected]>
Reviewed-by: Hari Bathini <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
Add jit support for JMP32_JA instruction. Tested using test_bpf module.
Signed-off-by: Artem Savkov <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
There is an issue with the hotplug operation when it's done on the
bridge/switch slot. The bridge-port and devices behind the bridge, which
become offline by hot-unplug operation, don't get hot-plugged/enabled
by doing hot-plug operation on that slot. Only the first port of the
bridge gets enabled and the remaining port/devices remain unplugged. The
hot plug/unplug operation is done by the hotplug driver (drivers/pci/
hotplug/pnv_php.c).
This behavior is due to missing code for the switch/bridge. The existing
driver depends on pci_hp_add_devices() function for device enablement.
This function calls pci_scan_slot() on only one device-node/port of the
bridge, not on all the siblings' device-node/port.
The missing code needs to be added which will find all the sibling
device-nodes/bridge-ports and will run explicit pci_scan_slot()
on those. A new function has been added for this purpose
which is invoked from pci_hp_add_devices(). This new function
traverse_siblings_and_scan_slot() gets all the sibling bridge-ports by
traversal and explicitly invokes pci_scan_slot() on them.
Tested-by: Shawn Anastasio <[email protected]>
Signed-off-by: Krishna Kumar <[email protected]>
[mpe: Move the code into pci-hotplug.c]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
The hotplug driver for powerpc (pci/hotplug/pnv_php.c) causes a kernel
crash when we try to hot-unplug/disable the PCIe switch/bridge from
the PHB.
The crash occurs because although the MSI data structure has been
released during disable/hot-unplug path and it has been assigned
with NULL, still during unregistration the code was again trying to
explicitly disable the MSI which causes the NULL pointer dereference and
kernel crash.
The patch fixes the check during unregistration path to prevent invoking
pci_disable_msi/msix() since its data structure is already freed.
Reported-by: Timothy Pearson <[email protected]>
Closes: https://lore.kernel.org/all/1981605666.2142272.1703742465927.JavaMail.zimbra@raptorengineeringinc.com/
Acked-by: Bjorn Helgaas <[email protected]>
Tested-by: Shawn Anastasio <[email protected]>
Signed-off-by: Krishna Kumar <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
With CONFIG_FSL_IFC now being user-visible, and thus changed from a select
to depends in CONFIG_MTD_NAND_FSL_IFC, the dependencies needs to be
selected in defconfigs.
Depends-on: 9ba0cae3cac0 ("memory: fsl_ifc: Make FSL_IFC config visible and selectable")
Signed-off-by: Esben Haabendal <[email protected]>
Reviewed-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
With ARCH=powerpc, make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/kernel/rtas_flash.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/sysdev/rtc_cmos_setup.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/platforms/pseries/papr_scm.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/platforms/cell/spufs/spufs.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/platforms/cell/cbe_thermal.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/platforms/cell/cpufreq_spudemand.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/platforms/cell/cbe_powerbutton.o
Add the missing invocation of the MODULE_DESCRIPTION() macro to all
files which have a MODULE_LICENSE().
This includes 85xx/t1042rdb_diu.c and chrp/nvram.c which, although
they did not produce a warning with the powerpc allmodconfig
configuration, may cause this warning with other configurations.
Signed-off-by: Jeff Johnson <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
Fix the make W=1 warning:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/macintosh/mac_hid.o
Signed-off-by: Jeff Johnson <[email protected]>
[mpe: Change the description to just "Mouse button 2+3 emulation"]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
Replace open-coded parsing of "reg" property with
of_property_read_reg().
Signed-off-by: Rob Herring (Arm) <[email protected]>
[mpe: Rename end to size, add comment to the bail out case]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
When KFENCE is enabled, total system memory is mapped at page level
granularity. But in radix MMU mode, ~3GB additional memory is needed
to map 100GB of system memory at page level granularity when compared
to using 2MB direct mapping.This is not desired considering KFENCE is
designed to be enabled in production kernels [1].
Mapping only the memory allocated for KFENCE pool at page granularity is
sufficient to enable KFENCE support. So, allocate __kfence_pool during
bootup and map it at page granularity instead of mapping all system
memory at page granularity.
Without patch:
# cat /proc/meminfo
MemTotal: 101201920 kB
With patch:
# cat /proc/meminfo
MemTotal: 104483904 kB
Note that enabling KFENCE at runtime is disabled for radix MMU for now,
as it depends on the ability to split page table mappings and such APIs
are not currently implemented for radix MMU.
All kfence_test.c testcases passed with this patch.
[1] https://lore.kernel.org/all/[email protected]/
Signed-off-by: Hari Bathini <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
CONFIG_IOMMU_API
The patch fixes the below warning:
arch/powerpc/platforms/pseries/iommu.c:1824:37: warning:
'spapr_tce_table_group_ops' defined but not used
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Fixes: b09c031d9433 ("powerpc/iommu: Move pSeries specific functions to pseries/iommu.c")
Signed-off-by: Shivaprasad G Bhat <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
Building the sigaltstack test with GCC on 64-bit powerpc errors with:
gcc -Wall sas.c -o /home/michael/linux/.build/kselftest/sigaltstack/sas
In file included from sas.c:23:
current_stack_pointer.h:22:2: error: #error "implement current_stack_pointer equivalent"
22 | #error "implement current_stack_pointer equivalent"
| ^~~~~
sas.c: In function ‘my_usr1’:
sas.c:50:13: error: ‘sp’ undeclared (first use in this function); did you mean ‘p’?
50 | if (sp < (unsigned long)sstack ||
| ^~
This happens because GCC doesn't define __ppc__ for 64-bit builds, only
32-bit builds. Instead use __powerpc__ to detect powerpc builds, which
is defined by clang and GCC for 64-bit and 32-bit builds.
Fixes: 05107edc9101 ("selftests: sigaltstack: fix -Wuninitialized")
Cc: [email protected] # v6.3+
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
cur_cpu_spec->cpu_name is appended to ppc_hw_desc before cur_cpu_spec
has taken on its final value. This is illustrated on pseries by
comparing the CPU name as reported at boot ("POWER8E (raw)") to the
contents of /proc/cpuinfo ("POWER8 (architected)"):
$ dmesg | grep Hardware
Hardware name: IBM,8408-E8E POWER8E (raw) 0x4b0201 0xf000004 \
of:IBM,FW860.50 (SV860_146) hv:phyp pSeries
$ grep -m 1 ^cpu /proc/cpuinfo
cpu : POWER8 (architected), altivec supported
Some 44x models would appear to be affected as well; see
identical_pvr_fixup().
This results in incorrect CPU information in stack dumps --
ppc_hw_desc is an input to dump_stack_set_arch_desc().
Delay gathering the CPU name until after all potential calls to
identify_cpu().
Signed-off-by: Nathan Lynch <[email protected]>
Fixes: bd649d40e0f2 ("powerpc: Add PVR & CPU name to hardware description")
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
Smatch warns:
arch/powerpc/kernel/rtas.c:1932 __do_sys_rtas() warn: potential
spectre issue 'args.args' [r] (local cap)
The 'nargs' and 'nret' locals come directly from a user-supplied
buffer and are used as indexes into a small stack-based array and as
inputs to copy_to_user() after they are subject to bounds checks.
Use array_index_nospec() after the bounds checks to clamp these values
for speculative execution.
Signed-off-by: Nathan Lynch <[email protected]>
Reported-by: Breno Leitao <[email protected]>
Reviewed-by: Breno Leitao <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
Only 44x uses 4xx now, so only keep one directory.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
Replace 4xx usage with 44x, and replace 4xx_SOC with 44x.
Also, as pointed out by Christophe, if 44x || BOOKE can be simplified to
just test BOOKE, because 44x always selects BOOKE.
Retain the CONFIG_4xx symbol, as there are drivers that use it to mean
4xx || 44x, those will need updating before CONFIG_4xx can be removed.
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
Now that 40x is gone, replace CONFIG_BOOKE_OR_40x by CONFIG_BOOKE.
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
Now that 40x platforms have gone, remove support
for 40x in the core of powerpc arch.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
Remove 40x from Kconfig, making the code unreachable.
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
Remove 40x platforms from the boot directory.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
40x platforms have been orphaned for many years.
Remove them.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
to 0
These drivers don't use the driver_data member of struct i2c_device_id,
so don't explicitly initialize this member.
This prepares putting driver_data in an anonymous union which requires
either no initialization or named designators. But it's also a nice
cleanup on its own.
Signed-off-by: Uwe Kleine-König <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
PPC64 IOMMU API defines iommu_table_group_ops which handles DMA
windows for PEs, their ownership transfer, create/set/unset the TCE
tables for the Dynamic DMA wundows(DDW). VFIOS uses these APIs for
support on POWER.
The commit 9d67c9433509 ("powerpc/iommu: Add "borrowing"
iommu_table_group_ops") implemented partial support for this API with
"borrow" mechanism wherein the DMA windows if created already by the
host driver, they would be available for VFIO to use. Also, it didn't
have the support to control/modify the window size or the IO page
size.
The current patch implements all the necessary iommu_table_group_ops
APIs there by avoiding the "borrrowing". So, just the way it is on the
PowerNV platform, with this patch the iommu table group ownership is
transferred to the VFIO PPC subdriver, the iommu table, DMA windows
creation/deletion all driven through the APIs.
The pSeries uses the query-pe-dma-window, create-pe-dma-window and
reset-pe-dma-window RTAS calls for DMA window creation, deletion and
reset to defaul. The RTAs calls do show some minor differences to the
way things are to be handled on the pSeries which are listed below.
* On pSeries, the default DMA window size is "fixed" cannot be custom
sized as requested by the user. For non-SRIOV VFs, It is fixed at 2GB
and for SRIOV VFs, its variable sized based on the capacity assigned
to it during the VF assignment to the LPAR. So, for the default DMA
window alone the size if requested less than tce32_size, the smaller
size is enforced using the iommu table->it_size.
* The DMA start address for 32-bit window is 0, and for the 64-bit
window in case of PowerNV is hardcoded to TVE select (bit 59) at 512PiB
offset. This address is returned at the time of create_table() API call
(even before the window is created), the subsequent set_window() call
actually opens the DMA window. On pSeries, the DMA start address for
32-bit window is known from the 'ibm,dma-window' DT property. However,
the 64-bit window start address is not known until the create-pe-dma
RTAS call is made. So, the create_table() which returns the DMA window
start address actually opens the DMA window and returns the DMA start
address as returned by the Hypervisor for the create-pe-dma RTAS call.
* The reset-pe-dma RTAS call resets the DMA windows and restores the
default DMA window, however it does not clear the TCE table entries
if there are any. In case of ownership transfer from platform domain
which used direct mapping, the patch chooses remove-pe-dma instead of
reset-pe for the 64-bit window intentionally so that the
clear_dma_window() is called.
Other than the DMA window management changes mentioned above, the
patch also brings back the userspace view for the single level TCE
as it existed before commit 090bad39b237a ("powerpc/powernv: Add
indirect levels to it_userspace") along with the relavent
refactoring.
Signed-off-by: Shivaprasad G Bhat <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
Move function dev_has_iommu_table() to powerpc/kernel/iommu.c
as it is going to be used by machine specific iommu code as
well in subsequent patches.
Signed-off-by: Shivaprasad G Bhat <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
The PAPR expects the TCE table to have no entries at the time of
unset window(i.e. remove-pe). The TCE clear right now is done
before freeing the iommu table. On pSeries, the unset window
makes those entries inaccessible to the OS and the H_PUT/GET calls
fail on them with H_CONSTRAINED.
On PowerNV, this has no side effect as the TCE clear can be done
before the DMA window removal as well.
Signed-off-by: Shivaprasad G Bhat <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
This patch basically brings consistency with PowerNV approach
to use the first freely available iommu table when the default
window is removed.
The pSeries iommu code convention has been that the table[0] is
for the default 32 bit DMA window and the table[1] is for the
64 bit DDW.
With VFs having only 1 DMA window, the default has to be removed
for creating the larger DMA window. The existing code uses the
table[1] for that, while marking the table[0] as NULL. This is
fine as long as the host driver itself uses the device.
For the VFIO user, on pSeries there is no way to skip table[0]
as the VFIO subdriver uses the first freely available table.
The window 0, when created as 64-bit DDW in that context would
still be on table[0], as the maximum number of windows is 1.
This won't have any impact for the host driver as the table is
fetched from the device's iommu_table_base.
Signed-off-by: Shivaprasad G Bhat <[email protected]>
[mpe: Rebase and resolve conflicts]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
The ioctl VFIO_IOMMU_SPAPR_TCE_GET_INFO is not reporting the
actuals on the platform as not all the details are correctly
collected during the platform probe/scan into the iommu_table_group.
Collect the information during the device setup time as the DMA
window property is already looked up on parent nodes anyway.
Signed-off-by: Shivaprasad G Bhat <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
The PowerNV specific table_group_ops are defined in powernv/pci-ioda.c.
The pSeries specific table_group_ops are sitting in the generic powerpc
file. Move it to where it actually belong(pseries/iommu.c).
The functions are currently defined even for CONFIG_PPC_POWERNV
which are unused on PowerNV.
Only code movement, no functional changes intended.
Signed-off-by: Shivaprasad G Bhat <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
While updating the cpus node, commit 40c753993e3a ("powerpc/kexec_file:
Use current CPU info while setting up FDT") first deletes all subnodes
under the /cpus node. However, while adding sub-nodes back, it missed
adding cpus subnodes whose device_type != "cpu", such as l2-cache*,
l3-cache*, ibm,powerpc-cpu-features.
Fix this by only deleting cpus sub-nodes of device_type == "cpus" and
then adding all available nodes with device_type == "cpu".
Fixes: 40c753993e3a ("powerpc/kexec_file: Use current CPU info while setting up FDT")
Signed-off-by: Sourabh Jain <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
While setting up the FDT for kexec, CPU nodes that are added after the
system boots and reserved memory ranges are incorporated into the
initial_boot_params (base FDT).
However, they are not taken into account when determining the additional
size needed for the kexec FDT. As a result, kexec fails to load,
generating the following error:
[1116.774451] Error updating memory reserve map: FDT_ERR_NOSPACE
kexec_file_load failed: No such process
Therefore, consider the extra size for CPU nodes added post-system boot
and reserved memory ranges while preparing the kexec FDT.
While adding a new parameter to the setup_new_fdt_ppc64 function, it was
noticed that there were a couple of unused parameters, so they were
removed.
Signed-off-by: Sourabh Jain <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
Currently in some cases, when the sampled instruction address register
latches to a specific address during sampling, the privilege bits
captured in the sampled event register are incorrect.
For example, a snippet from the perf report on a power10 system is:
Overhead Address Command Shared Object Symbol
........ .................. ............ ................. .......................
2.41% 0x7fff9f94a02c null_syscall [unknown] [k] 0x00007fff9f94a02c
2.20% 0x7fff9f94a02c null_syscall libc.so.6 [.] syscall
perf_get_misc_flags() function looks at the privilege bits to return
the corresponding flags to be used for the address symbol and these
privilege bit details are read from the sampled event register. In the
above snippet, address "0x00007fff9f94a02c" is shown as "k" (kernel) due
to the incorrect privilege bits captured in the sampled event register.
To address this case check whether the sampled address is in the kernel
area. Since this is specific to the latest platform, a new pmu flag
is added called "PPMU_P10" and is used to contain the proposed fix.
PPMU_P10_DD1 marked events are also included under PPMU_P10, hence
remove the code specific to PPMU_P10_DD1 marked events.
Signed-off-by: Anjali K <[email protected]>
Reviewed-by: Athira Rajeev <[email protected] <mailto:[email protected]>>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
Commit 6e5f1537833a ("powerpc: Add a 6xx defconfig") said it was copied
from fedore's ppc32 defconfig, but at least since 2015-06-10, Fedora has
dropped this option.[1]
For cgroup v1, if turned on, and there's any cgroup in the "cpu" hierarchy it
needs an RT budget assigned, otherwise the processes in it will not be able to
get RT at all. The problem with RT group scheduling is that it requires the
budget assigned but there's no way we could assign a default budget, since the
values to assign are both upper and lower time limits, are absolute, and need to
be sum up to < 1 for each individal cgroup. That means we cannot really come up
with values that would work by default in the general case.[1]
For cgroup v2, it's almost unusable as well. If it turned on, the cpu controller
can only be enabled when all RT processes are in the root cgroup. But it will
lose the benefits of cgroup v2 if all RT process were placed in the same cgroup.
systemd also doesn't support it.[2]
[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1229700
[2]: https://github.com/systemd/systemd/issues/13781#issuecomment-549164383
Signed-off-by: Celeste Liu <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
It's not an error or noteworthy condition if the
"ibm,dynamic-reconfiguration-memory" node isn't present.
Drop the needless message.
Signed-off-by: Nathan Lynch <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
VAS allocate, modify and deallocate HCALLs returns
H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC for busy
delay and expects OS to reissue HCALL after that delay. But using
msleep() will often sleep at least 20 msecs even though the
hypervisor suggests OS reissue these HCALLs after 1 or 10msecs.
The open and close VAS window functions hold mutex and then issue
these HCALLs. So these operations can take longer than the
necessary when multiple threads issue open or close window APIs
simultaneously, especially might affect the performance in the
case of repeat open/close APIs for each compression request.
Multiple tasks can open / close VAS windows at the same time
which depends on the available VAS credits. For example, 240
cores system provides 4800 VAS credits. It means 4800 tasks can
execute open VAS windows HCALLs with the mutex. Since each
msleep() will often sleep more than 20 msecs, some tasks are
waiting more than 120 secs to acquire mutex. It can cause hung
traces for these tasks in dmesg due to mutex contention around
open/close HCALLs.
Instead of msleep(), use usleep_range() to ensure sleep with
the expected value before issuing HCALL again. So since each
task sleep 10 msecs maximum, this patch allow more tasks can
issue open/close VAS calls without any hung traces in the
dmesg.
Signed-off-by: Haren Myneni <[email protected]>
Suggested-by: Nathan Lynch <[email protected]>
Reviewed-by: Nathan Lynch <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
In the current design, a numa-node is made online only if that node is
attached to cpu/memory. With this design, if any PCI/IO device is found
to be attached to a numa-node which is not online then the numa-node
id of the corresponding PCI/IO device is set to NUMA_NO_NODE(-1). This
design may negatively impact the performance of PCIe device if the
numa-node assigned to PCIe device is -1 because in such case we may not
be able to accurately calculate the distance between two nodes.
The multi-controller NVMe PCIe disk has an issue with calculating the
node distance if the PCIe NVMe controller is attached to a PCI host
bridge which has numa-node id value set to NUMA_NO_NODE. This patch
helps fix this ensuring that a cpu/memory less numa node is made online
if it's attached to PCI host bridge.
Signed-off-by: Nilay Shroff <[email protected]>
Reviewed-by: Srikar Dronamraju <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
Dynamic DMA Window (DDW) supports TCEs that are backed by 2MB page
size. In most configurations, DDW is big enough to pre-map all of LPAR
memory for IO. Pre-mapping of memory for DMA results in improvements in
IO performance.
Persistent memory, vPMEM, can be assigned to an LPAR as well. vPMEM is
not contiguous with LPAR memory and usually is assigned at high memory
addresses. This makes is not possible to pre-map both vPMEM and LPAR
memory in the same DDW.
For a dedicated adapter this limitation is not an issue. Dedicated
adapters can have both Default DMA window, which is backed by 4K page
size and a DDW backed by 2MB page size TCEs. In this scenario, LPAR
memory is pre-mapped in the DDW. Any DMA going to the vPMEM is routed
via dynamically allocated TCEs in the default window.
The issue arises with SR-IOV adapters. There is only one DMA window -
either Default or DDW. If an LPAR has vPMEM assigned, memory is not
pre-mapped in the DDW since TCEs needs to be allocated for vPMEM as well.
In this case, DDW is created and TCEs are dynamically allocated for both
vPMEM and LPAR memory.
Today, DDW is only used in single mode - direct mapped TCEs or
dynamically mapped TCEs. This enhancement breaks a single DDW in 2
regions -
1. First region to pre-map LPAR memory
2. Second region to dynamically allocate TCEs for IO to vPMEM
The DDW is split only if it is big enough to pre-map complete LPAR
memory and still have some space left to dynamically map vPMEM. Maximum
size possible DDW is created as permitted by the Hypervisor.
Signed-off-by: Gaurav Batra <[email protected]>
Reviewed-by: Brian King <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
Remove extended_cede_processor() and its helpers as
extended_cede_processor() has no callers since
commit 48f6e7f6d948("powerpc/pseries: remove cede offline state for CPUs")
Signed-off-by: Gautam Menghani <[email protected]>
Acked-by: Naveen N Rao <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fixes from Niklas Cassel:
- Add a quirk for three different devices that have shown issues with
LPM (link power management). These devices appear to not implement
LPM properly, since we see command timeouts when enabling LPM. The
quirk disables LPM for these problematic devices. (Me)
- Do not apply the Intel PCS quirk on Alder Lake. The quirk is not
needed and was originally added by mistake when LPM support was
enabled for this AHCI controller. Enabling the quirk when not needed
causes the the controller to not be able to detect the connected
devices on some platforms.
* tag 'ata-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: libata-core: Add ATA_HORKAGE_NOLPM for Apacer AS340
ata: libata-core: Add ATA_HORKAGE_NOLPM for AMD Radeon S3 SSD
ata: libata-core: Add ATA_HORKAGE_NOLPM for Crucial CT240BX500SSD1
ata: ahci: Do not apply Intel PCS quirk on Intel Alder Lake
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Miscellaneous topology parsing fixes:
- Fix topology parsing regression on older CPUs in the new AMD/Hygon
parser
- Fix boot crash on odd Intel Quark and similar CPUs that do not fill
out cpuinfo_x86::x86_clflush_size and zero out
cpuinfo_x86::x86_cache_alignment as a result.
Provide 32 bytes as a general fallback value.
- Fix topology enumeration on certain rare CPUs where the BIOS locks
certain CPUID leaves and the kernel unlocked them late, which broke
with the new topology parsing code. Factor out this unlocking logic
and move it earlier in the parsing sequence"
* tag 'x86-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/topology/intel: Unlock CPUID before evaluating anything
x86/cpu: Provide default cache line size if not enumerated
x86/topology/amd: Evaluate SMT in CPUID leaf 0x8000001e only on family 0x17 and greater
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
"Export a symbol to make life easier for instrumentation/debugging"
* tag 'sched-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/x86: Export 'percpu arch_freq_scale'
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events fix from Ingo Molnar:
"Add missing MODULE_DESCRIPTION() lines"
* tag 'perf-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Add missing MODULE_DESCRIPTION() lines
perf/x86/rapl: Add missing MODULE_DESCRIPTION() line
|