path: root/arch/x86/kernel/cpu
Age | Commit message | Author | Files | Lines
2024-02-27 | x86/apic: Build the x86 topology enumeration functions on UP APIC builds too | Ingo Molnar | 1 | -1/+1
These functions are mostly pointless on UP, but nevertheless the 64-bit UP APIC build already depends on the existence of topology_apply_cmdline_limits_early(), which caused a build bug. Resolve it by making them available under CONFIG_X86_LOCAL_APIC, as their prototypes already are. Reviewed-by: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2024-02-26 | x86/cpu/intel: Detect TME keyid bits before setting MTRR mask registers | Paolo Bonzini | 1 | -87/+91
MKTME repurposes the high bits of the physical address as a key ID for the encryption key and, even though MAXPHYADDR in CPUID[0x80000008] remains the same, the valid bits in the MTRR mask register are based on the reduced number of physical address bits. detect_tme() in arch/x86/kernel/cpu/intel.c detects TME and subtracts it from the total usable physical bits, but it is called too late. Move the call to early_init_intel() so that it is called in setup_arch(), before MTRRs are set up. This fixes boot on TDX-enabled systems, which until now only worked with "disable_mtrr_cleanup". Without the patch, the values written to the MTRR mask registers were 52-bit wide (e.g. 0x000fffff_80000800) and the writes failed; with the patch, the values are 46-bit wide, which matches the reduced MAXPHYADDR that is shown in /proc/cpuinfo. Reported-by: Zixi Chen <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/all/20240131230902.1867092-3-pbonzini%40redhat.com
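The arithmetic at issue is easy to demonstrate. Below is a minimal, standalone C sketch using the numbers from the message above; the keyid-bit count of 6 is a hypothetical value (the real code derives it from MSR_IA32_TME_ACTIVATE in detect_tme()):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        int maxphyaddr = 52;      /* CPUID[0x80000008], unchanged by MKTME */
        int tme_keyid_bits = 6;   /* hypothetical: high bits repurposed as key ID */
        int phys_bits = maxphyaddr - tme_keyid_bits;

        /* An MTRR mask may only cover the remaining usable physical bits. */
        uint64_t mask = ((1ULL << phys_bits) - 1) & ~0xfffULL;

        printf("usable physical bits: %d\n", phys_bits);  /* 46 */
        printf("widest valid MTRR mask: %#llx\n",
               (unsigned long long)mask);                 /* 0x3ffffffff000 */
        return 0;
    }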
2024-02-26 | x86/cpu: Allow reducing x86_phys_bits during early_identify_cpu() | Paolo Bonzini | 1 | -2/+2
In commit fbf6449f84bf ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach"), the initialization of c->x86_phys_bits was moved after this_cpu->c_early_init(c). This is incorrect because early_init_amd() expected to be able to reduce the value according to the contents of CPUID leaf 0x8000001f. Fortunately, the bug was negated by init_amd()'s call to early_init_amd(), which does reduce x86_phys_bits in the end. However, this is very late in the boot process and, most notably, the wrong value is used for x86_phys_bits when setting up MTRRs. To fix this, call get_cpu_address_sizes() as soon as X86_FEATURE_CPUID is set/cleared, and c->extended_cpuid_level is retrieved. Fixes: fbf6449f84bf ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach") Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/all/20240131230902.1867092-2-pbonzini%40redhat.com
2024-02-23 | x86, crash: wrap crash dumping code into crash related ifdefs | Baoquan He | 1 | -2/+8
Now that the crash code under the kernel/ directory has been split out from the kexec code, crash dumping can be separated from kexec reboot in config items on x86 with some adjustments. Here, also change some ifdefs or IS_ENABLED() checks to more appropriate ones, e.g. - #ifdef CONFIG_KEXEC_CORE -> #ifdef CONFIG_CRASH_DUMP - (!IS_ENABLED(CONFIG_KEXEC_CORE)) -> (!IS_ENABLED(CONFIG_CRASH_RESERVE)) [[email protected]: don't nest CONFIG_CRASH_DUMP ifdef inside CONFIG_KEXEC_CORE ifdef scope] Link: https://lore.kernel.org/all/SN6PR02MB4157931105FA68D72E3D3DB8D47B2@SN6PR02MB4157.namprd02.prod.outlook.com/T/#u Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Baoquan He <[email protected]> Cc: Al Viro <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Hari Bathini <[email protected]> Cc: Pingfan Liu <[email protected]> Cc: Klara Modin <[email protected]> Cc: Michael Kelley <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Yang Li <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
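The shape of the conversion looks roughly like this (an illustrative sketch, not the actual hunks; crash_save_cpu() is a real kexec/crash helper used here only as an example call site):

    /* Before: crash dumping code was built whenever kexec was enabled. */
    #ifdef CONFIG_KEXEC_CORE
        crash_save_cpu(regs, cpu);
    #endif

    /* After: guarded by the dedicated crash dumping config item. */
    #ifdef CONFIG_CRASH_DUMP
        crash_save_cpu(regs, cpu);
    #endif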
2024-02-22 | x86/cpu: Add a VMX flag to enumerate 5-level EPT support to userspace | Sean Christopherson | 1 | -0/+2
Add a VMX flag in /proc/cpuinfo, ept_5level, so that userspace can query whether or not the CPU supports 5-level EPT paging. EPT capabilities are enumerated via MSR, i.e. aren't accessible to userspace without help from the kernel, and knowing whether or not 5-level EPT is supported is useful for debug, triage, testing, etc. For example, when EPT is enabled, bits 51:48 of guest physical addresses are consumed by the CPU if and only if 5-level EPT is enabled. For CPUs with MAXPHYADDR > 48, KVM *can't* map all legal guest memory without 5-level EPT, making 5-level EPT support valuable information for userspace. Reported-by: Yi Lai <[email protected]> Cc: Tao Su <[email protected]> Cc: Xudong Hao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2024-02-22 | x86/resctrl: Remove lockdep annotation that triggers false positive | James Morse | 1 | -9/+0
get_domain_from_cpu() walks a list of domains to find the one that contains the specified CPU. This needs to be protected against races with CPU hotplug when the list is modified. It has recently gained a lockdep annotation to check this. The lockdep annotation causes false positives when called via IPI as the lock is held, but by another process. Remove it. [ bp: Refresh it on top of x86/cache. ] Fixes: fb700810d30b ("x86/resctrl: Separate arch and fs resctrl locks") Reported-by: Tony Luck <[email protected]> Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/all/ZdUSwOM9UUNpw84Y@agluck-desk3
2024-02-20 | x86/pat: Simplify the PAT programming protocol | Kirill A. Shutemov | 1 | -3/+4
The programming protocol for the PAT MSR follows the MTRR programming protocol. However, this protocol is cumbersome and requires disabling caching (CR0.CD=1), which is not possible on some platforms. Specifically, a TDX guest is not allowed to set CR0.CD. It triggers a #VE exception. It turns out that the requirement to follow the MTRR programming protocol for PAT programming is unnecessarily strict. The new Intel Software Developer Manual (http://www.intel.com/sdm) (December 2023) relaxes this requirement, please refer to the section titled "Programming the PAT" for more information. In short, this section provides an alternative PAT update sequence which doesn't need to disable caches around the PAT update but only to flush those caches and TLBs. The AMD documentation does not link PAT programming to MTRR and is therefore fine, too. The kernel only needs to flush the TLB after updating the PAT MSR. The set_memory code already takes care of flushing the TLB and cache when changing the memory type of a page. [ bp: Expand commit message. ] Signed-off-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Juergen Gross <[email protected]> Link: https://lore.kernel.org/r/[email protected]
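Under the relaxed rules, a PAT update reduces to something like the following kernel-style sketch (assumptions: the caller handles any cache flushing still required by the SDM section named above, and serialises against concurrent updates):

    /* No CR0.CD=1 dance: write the new PAT, then drop stale translations. */
    wrmsrl(MSR_IA32_CR_PAT, pat);
    flush_tlb_all();    /* PAT indices are cached in TLB entries */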
2024-02-19 | x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key | Pawan Gupta | 1 | -9/+6
The VERW mitigation at exit-to-user is enabled via a static branch mds_user_clear. This static branch is never toggled after boot, and can be safely replaced with an ALTERNATIVE() which is convenient to use in asm. Switch to ALTERNATIVE() to use the VERW mitigation late in exit-to-user path. Also remove the now redundant VERW in exc_nmi() and arch_exit_to_user_mode(). Signed-off-by: Pawan Gupta <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Link: https://lore.kernel.org/all/20240213-delay-verw-v8-4-a6216d83edb7%40linux.intel.com
2024-02-19 | x86/resctrl: Separate arch and fs resctrl locks | James Morse | 5 | -27/+111
resctrl has one mutex that is taken by both the architecture-specific code and the filesystem parts. The two interact via cpuhp, where the architecture code updates the domain list. Filesystem handlers that walk the domains list should not run concurrently with the cpuhp callback modifying the list. Exposing a lock from the filesystem code means the interface is not cleanly defined, and creates the possibility of cross-architecture lock ordering headaches. The interaction only exists so that certain filesystem paths are serialised against CPU hotplug. The CPU hotplug code already has a mechanism to do this using cpus_read_lock(). MPAM's monitors have an overflow interrupt, so it needs to be possible to walk the domains list in irq context. RCU is ideal for this, but some paths need to be able to sleep to allocate memory. Because resctrl_{on,off}line_cpu() take the rdtgroup_mutex as part of a cpuhp callback, cpus_read_lock() must always be taken first. rdtgroup_schemata_write() already does this. Most of the filesystem code's domain list walkers are currently protected by the rdtgroup_mutex taken in rdtgroup_kn_lock_live(). The exceptions are rdt_bit_usage_show() and the mon_config helpers which take the lock directly. Make the domain list protected by RCU. An architecture-specific lock prevents concurrent writers. rdt_bit_usage_show() could walk the domain list using RCU, but to keep all the filesystem operations the same, this is changed to call cpus_read_lock(). The mon_config helpers send multiple IPIs, so take the cpus_read_lock() in those cases. The other filesystem list walkers need to be able to sleep. Add cpus_read_lock() to rdtgroup_kn_lock_live() so that the cpuhp callbacks can't be invoked when file system operations are occurring. Add lockdep_assert_cpus_held() in the cases where the rdtgroup_kn_lock_live() call isn't obvious. Resctrl's domain online/offline calls now need to take the rdtgroup_mutex themselves. [ bp: Fold in a build fix: https://lore.kernel.org/r/87zfvwieli.ffs@tglx ] Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Babu Moger <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Babu Moger <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov (AMD) <[email protected]>
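With the list under RCU, a filesystem-side walker looks roughly like this sketch (the real walkers additionally rely on cpus_read_lock() where they need to sleep):

    struct rdt_domain *d;

    rcu_read_lock();
    list_for_each_entry_rcu(d, &r->domains, list) {
        /* e.g. find the domain containing @cpu */
        if (cpumask_test_cpu(cpu, &d->cpu_mask))
            break;
    }
    rcu_read_unlock();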
2024-02-16 | x86/resctrl: Move domain helper migration into resctrl_offline_cpu() | James Morse | 2 | -16/+18
When a CPU is taken offline the resctrl filesystem code needs to check if it was the CPU nominated to perform the periodic overflow and limbo work. If so, another CPU needs to be chosen to do this work. This is currently done in core.c, mixed in with the code that removes the CPU from the domain's mask, and potentially free()s the domain. Move the migration of the overflow and limbo helpers into the filesystem code, into resctrl_offline_cpu(). As resctrl_offline_cpu() runs before the architecture code has removed the CPU from the domain mask, the callers need to be told which CPU is being removed, to avoid picking it as the new CPU. This uses the exclude_cpu feature previously added. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Babu Moger <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Babu Moger <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov (AMD) <[email protected]>
2024-02-16 | x86/resctrl: Add CPU offline callback for resctrl work | James Morse | 2 | -20/+29
The resctrl architecture-specific code may need to free a domain when a CPU goes offline; it also needs to reset the CPU's PQR_ASSOC register. Amongst other things, the resctrl filesystem code needs to clear this CPU from the cpu_mask of any control and monitor groups. Currently, this is all done in core.c and called from resctrl_offline_cpu(), making the split between architecture and filesystem code unclear. Move the filesystem work to remove the CPU from the control and monitor groups into a filesystem helper called resctrl_offline_cpu(), and rename the one in core.c to resctrl_arch_offline_cpu(). Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Babu Moger <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Babu Moger <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov (AMD) <[email protected]>
2024-02-16 | x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but CPU | James Morse | 5 | -21/+70
When a CPU is taken offline resctrl may need to move the overflow or limbo handlers to run on a different CPU. Once the offline callbacks have been split, cqm_setup_limbo_handler() will be called while the CPU that is going offline is still present in the CPU mask. Pass the CPU to exclude to cqm_setup_limbo_handler() and mbm_setup_overflow_handler(). These functions can use a variant of cpumask_any_but() when selecting the CPU. -1 is used to indicate no CPUs need excluding. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Babu Moger <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Babu Moger <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov (AMD) <[email protected]>
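A sketch of the resulting selection logic (the helper name here is hypothetical; cpumask_any() and cpumask_any_but() are existing kernel APIs):

    static unsigned int pick_handler_cpu(const struct cpumask *mask, int exclude_cpu)
    {
        if (exclude_cpu < 0)            /* -1: no CPU needs excluding */
            return cpumask_any(mask);

        return cpumask_any_but(mask, exclude_cpu);
    }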
2024-02-16 | x86/resctrl: Add CPU online callback for resctrl work | James Morse | 2 | -4/+12
The resctrl architecture-specific code may need to create a domain when a CPU comes online; it also needs to reset the CPU's PQR_ASSOC register. The resctrl filesystem code needs to update the rdtgroup_default CPU mask when CPUs are brought online. Currently, this is all done in one function, resctrl_online_cpu(). It will need to be split into architecture and filesystem parts before resctrl can be moved to fs/. Pull the rdtgroup_default update work out as a filesystem specific cpu_online helper. resctrl_online_cpu() is the obvious name for this, which means the version in core.c needs renaming. resctrl_online_cpu() is called by the arch code once it has done the work to add the new CPU to any domains. In future patches, resctrl_online_cpu() will take the rdtgroup_mutex itself. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Babu Moger <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Babu Moger <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov (AMD) <[email protected]>
2024-02-16 | x86/resctrl: Add helpers for system wide mon/alloc capable | James Morse | 4 | -26/+24
resctrl reads rdt_alloc_capable or rdt_mon_capable to determine whether any of the resources support the corresponding features. resctrl also uses the static keys that affect the architecture's context-switch code to determine the same thing. This forces another architecture to have the same static keys. As the static key is enabled based on the capable flag, and none of the filesystem uses of these are in the scheduler path, move the capable flags behind helpers, and use these in the filesystem code instead of the static key. After this change, only the architecture code manages and uses the static keys to ensure __resctrl_sched_in() does not need runtime checks. This avoids multiple architectures having to define the same static keys. Cases where the static key implicitly tested if the resctrl filesystem was mounted all have an explicit check now. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Babu Moger <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Babu Moger <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov (AMD) <[email protected]>
2024-02-16 | x86/resctrl: Make rdt_enable_key the arch's decision to switch | James Morse | 1 | -6/+5
rdt_enable_key is switched when resctrl is mounted. It was also previously used to prevent a second mount of the filesystem. Any other architecture that wants to support resctrl has to provide identical static keys. Now that there are helpers for enabling and disabling the alloc/mon keys, resctrl doesn't need to switch this extra key, it can be done by the arch code. Use the static-key increment and decrement helpers, and change resctrl to ensure the calls are balanced. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Babu Moger <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Babu Moger <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov (AMD) <[email protected]>
2024-02-16 | x86/resctrl: Move alloc/mon static keys into helpers | James Morse | 2 | -9/+4
resctrl enables three static keys depending on the features it has enabled. Another architecture's context switch code may look different; any static keys that control it should be buried behind helpers. Move the alloc/mon logic into arch-specific helpers as a preparatory step for making the rdt_enable_key's status something the arch code decides. This means other architectures don't have to mirror the static keys. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Babu Moger <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Babu Moger <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov (AMD) <[email protected]>
2024-02-16 | x86/resctrl: Make resctrl_mounted checks explicit | James Morse | 3 | -8/+28
The rdt_enable_key is switched when resctrl is mounted, and used to prevent a second mount of the filesystem. It also enables the architecture's context switch code. This requires another architecture to have the same set of static keys, as resctrl depends on them too. The existing users of these static keys are implicitly also checking if the filesystem is mounted. Make the resctrl_mounted checks explicit: resctrl can keep track of whether it has been mounted once. This doesn't need to be combined with whether the arch code is context switching the CLOSID. rdt_mon_enable_key is never used just to test that resctrl is mounted, but does also have this implication. Add a resctrl_mounted check to all uses of rdt_mon_enable_key. This will allow the static key changing to be moved behind resctrl_arch_ calls. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Babu Moger <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Babu Moger <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov (AMD) <[email protected]>
2024-02-16 | x86/resctrl: Allow arch to allocate memory needed in resctrl_arch_rmid_read() | James Morse | 3 | -3/+40
Depending on the number of monitors available, Arm's MPAM may need to allocate a monitor prior to reading the counter value. Allocating a contended resource may involve sleeping. __check_limbo() and mon_event_count() each make multiple calls to resctrl_arch_rmid_read(); to avoid extra work on contended systems, the allocation should remain valid across multiple invocations of resctrl_arch_rmid_read(). The memory or hardware allocated is not specific to a domain. Add arch hooks for this allocation, which need calling before resctrl_arch_rmid_read(). The allocated monitor is passed to resctrl_arch_rmid_read(), then freed again afterwards. The helper can be called on any CPU, and can sleep. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Babu Moger <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Babu Moger <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov (AMD) <[email protected]>
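The resulting call pattern is roughly the following (hook names follow this series' description; treat the exact signatures as an assumption):

    void *arch_mon_ctx;

    arch_mon_ctx = resctrl_arch_mon_ctx_alloc(r, evtid);  /* may sleep */
    if (IS_ERR(arch_mon_ctx))
        return PTR_ERR(arch_mon_ctx);

    /* The context stays valid across multiple reads, e.g. in __check_limbo(). */
    ret = resctrl_arch_rmid_read(r, d, closid, rmid, evtid, &val, arch_mon_ctx);

    resctrl_arch_mon_ctx_free(r, evtid, arch_mon_ctx);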
2024-02-16 | x86/resctrl: Allow resctrl_arch_rmid_read() to sleep | James Morse | 1 | -20/+5
MPAM's cache occupancy counters can take a little while to settle once the monitor has been configured. The maximum settling time is described to the driver via a firmware table. The value could be large enough that it makes sense to sleep. To avoid exposing this to resctrl, it should be hidden behind MPAM's resctrl_arch_rmid_read(). resctrl_arch_rmid_read() may be called via IPI, meaning it is unable to sleep. In this case, it should return an error if it needs to sleep. This will only affect MPAM platforms where the cache occupancy counter isn't available immediately, nohz_full is in use, and there are no housekeeping CPUs in the necessary domain. There are three callers of resctrl_arch_rmid_read(): __mon_event_count() and __check_limbo() are both called from a non-migratable context. mon_event_read() invokes __mon_event_count() using smp_call_on_cpu(), which adds work to the target CPU's workqueue. rdtgroup_mutex is held, meaning this cannot race with the resctrl cpuhp callback. __check_limbo() is invoked via schedule_delayed_work_on(), which also adds work to a per-CPU workqueue. The remaining call is add_rmid_to_limbo() which is called in response to a user-space syscall that frees an RMID. This opportunistically reads the LLC occupancy counter on the current domain to see if the RMID is over the dirty threshold. This has to disable preemption to avoid reading the wrong domain's value. Disabling preemption here prevents resctrl_arch_rmid_read() from sleeping. add_rmid_to_limbo() walks each domain, but only reads the counter on one domain. If the system has more than one domain, the RMID will always be added to the limbo list. If the RMID's usage was not over the threshold, it will be removed from the list when __check_limbo() runs. Make this the default behaviour. Free RMIDs are always added to the limbo list for each domain. The user visible effect of this is that a clean RMID is not available for re-allocation immediately after 'rmdir()' completes. This behaviour was never portable as it never happened on a machine with multiple domains. Removing this path allows resctrl_arch_rmid_read() to sleep if it's called with interrupts unmasked. Document that this is the expected behaviour, and add a might_sleep() annotation to catch changes that won't work on arm64. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Babu Moger <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Babu Moger <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov (AMD) <[email protected]>
2024-02-16 | x86/resctrl: Queue mon_event_read() instead of sending an IPI | James Morse | 2 | -3/+25
Intel is blessed with an abundance of monitors, one per RMID, that can be read from any CPU in the domain. MPAM's monitors reside in the MMIO MSC, and the number implemented is up to the manufacturer. This means when there are fewer monitors than needed, they need to be allocated and freed. MPAM's CSU monitors are used to back the 'llc_occupancy' monitor file. The CSU counter is allowed to return 'not ready' for a small number of micro-seconds after programming. To allow one CSU hardware monitor to be used for multiple control or monitor groups, the CPU accessing the monitor needs to be able to block when configuring and reading the counter. Worse, the domain may be broken up into slices, and the MMIO accesses for each slice may need performing from different CPUs. These two details mean MPAM's monitor code needs to be able to sleep, and IPI another CPU in the domain to read from a resource that has been sliced. mon_event_read() already invokes mon_event_count() via IPI, which means this isn't possible. On systems using nohz-full, some CPUs need to be interrupted to run kernel work as they otherwise stay in user-space running realtime workloads. Interrupting these CPUs should be avoided, and scheduling work on them may never complete. Change mon_event_read() to pick a housekeeping CPU (one that is not using nohz_full), schedule mon_event_count() there, and wait. If all the CPUs in a domain are using nohz-full, then an IPI is used as the fallback. This function is only used in response to a user-space filesystem request (not the timing-sensitive overflow code). This allows MPAM to hide the slice behaviour from resctrl, and to keep the monitor-allocation in monitor.c. When the IPI fallback is used on machines where MPAM needs to make an access on multiple CPUs, the counter read will always fail. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Peter Newman <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Babu Moger <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Babu Moger <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov (AMD) <[email protected]>
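The reworked dispatch reads roughly like this sketch (tick_nohz_full_cpu(), smp_call_function_any() and smp_call_on_cpu() are existing kernel APIs; smp_mon_event_count would be a thin int-returning wrapper around mon_event_count()):

    cpu = cpumask_any_housekeeping(&d->cpu_mask);

    /* Only nohz_full CPUs left in the domain: fall back to an IPI. */
    if (tick_nohz_full_cpu(cpu))
        smp_call_function_any(&d->cpu_mask, mon_event_count, rr, 1);
    else
        smp_call_on_cpu(cpu, smp_mon_event_count, rr, false);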
2024-02-16 | x86/resctrl: Add cpumask_any_housekeeping() for limbo/overflow | James Morse | 2 | -7/+37
The limbo and overflow code picks a CPU to use from the domain's list of online CPUs. Work is then scheduled on these CPUs to maintain the limbo list and any counters that may overflow. cpumask_any() may pick a CPU that is marked nohz_full, which will either penalise the work that CPU was dedicated to, or delay the processing of limbo list or counters that may overflow. Perhaps indefinitely. Delaying the overflow handling will skew the bandwidth values calculated by mba_sc, which expects to be called once a second. Add cpumask_any_housekeeping() as a replacement for cpumask_any() that prefers housekeeping CPUs. This helper will still return a nohz_full CPU if that is the only option. The CPU to use is re-evaluated each time the limbo/overflow work runs. This ensures the work will move off a nohz_full CPU once a housekeeping CPU is available. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Babu Moger <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Babu Moger <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov (AMD) <[email protected]>
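Such a helper can be sketched as below (the merged version may differ in detail; cpumask_nth_andnot() and tick_nohz_full_mask are existing kernel symbols):

    static unsigned int cpumask_any_housekeeping(const struct cpumask *mask)
    {
        unsigned int cpu, hk_cpu;

        /* Any CPU will do, provided it isn't nohz_full... */
        cpu = cpumask_any(mask);
        if (!tick_nohz_full_cpu(cpu))
            return cpu;

        /* ...otherwise prefer a housekeeping CPU, if one exists. */
        hk_cpu = cpumask_nth_andnot(0, mask, tick_nohz_full_mask);
        if (hk_cpu < nr_cpu_ids)
            cpu = hk_cpu;

        return cpu;
    }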
2024-02-16 | x86/resctrl: Move CLOSID/RMID matching and setting to use helpers | James Morse | 1 | -24/+38
When switching tasks, the CLOSID and RMID that the new task should use are stored in struct task_struct. For x86 the CLOSID known by resctrl, the value in task_struct, and the value written to the CPU register are all the same thing. MPAM's CPU interface has two different PARTIDs - one for data accesses, the other for instruction fetch. Storing resctrl's CLOSID value in struct task_struct implies the arch code knows whether resctrl is using CDP. Move the matching and setting of the struct task_struct properties to use helpers. This allows arm64 to store the hardware format of the register, instead of having to convert it each time. __rdtgroup_move_task()'s use of READ_ONCE()/WRITE_ONCE() ensures torn values aren't seen, as another CPU may schedule the task being moved while the value is being changed. MPAM has an additional corner-case here as the PMG bits extend the PARTID space. If the scheduler sees a new-CLOSID but old-RMID, the task will dirty an RMID that the limbo code is not watching, causing an inaccurate count. x86's RMIDs are independent values, so the limbo code will still be watching the old-RMID in this circumstance. To avoid this, arm64 needs the CLOSID and RMID to be WRITE_ONCE()d together, so both values must be provided together. Because MPAM's RMID values are not unique, the CLOSID must be provided when matching the RMID. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Babu Moger <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Babu Moger <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov (AMD) <[email protected]>
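On x86 the helpers are nearly trivial, as this sketch shows (the closid/rmid fields exist in x86's struct task_struct under CONFIG_X86_CPU_RESCTRL; an MPAM implementation would convert to its hardware format here):

    static void resctrl_arch_set_closid_rmid(struct task_struct *tsk,
                                             u32 closid, u32 rmid)
    {
        /* Paired with the READ_ONCE() in the match helpers. */
        WRITE_ONCE(tsk->closid, closid);
        WRITE_ONCE(tsk->rmid, rmid);
    }

    static bool resctrl_arch_match_rmid(struct task_struct *tsk,
                                        u32 closid, u32 rmid)
    {
        /* x86 ignores the CLOSID: its RMIDs are independent values. */
        return READ_ONCE(tsk->rmid) == rmid;
    }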
2024-02-16 | x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmid | James Morse | 3 | -5/+61
MPAM's PMG bits extend its PARTID space, meaning the same PMG value can be used for different control groups. This means once a CLOSID is allocated, all its monitoring ids may still be dirty, and held in limbo. Instead of allocating the first free CLOSID, on architectures where CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID is enabled, search closid_num_dirty_rmid[] to find the cleanest CLOSID. The CLOSID found is returned to closid_alloc() for the free list to be updated. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Babu Moger <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Babu Moger <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov (AMD) <[email protected]>
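A sketch of the search (the function name is hypothetical; closid_num_dirty_rmid[] is the array introduced by the 'Track the number of dirty RMID a CLOSID has' patch further down this log):

    static int find_cleanest_closid(unsigned long *closid_free_map, u32 num_closid)
    {
        unsigned int num_dirty, cleanest = ~0U;
        int i, best = -ENOSPC;

        for_each_set_bit(i, closid_free_map, num_closid) {
            num_dirty = closid_num_dirty_rmid[i];
            if (num_dirty == 0)
                return i;       /* can't do better than no dirty RMIDs */

            if (num_dirty < cleanest) {
                cleanest = num_dirty;
                best = i;
            }
        }
        return best;
    }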
2024-02-16 | x86/resctrl: Use __set_bit()/__clear_bit() instead of open coding | James Morse | 1 | -6/+12
The resctrl CLOSID allocator uses a single 32bit word to track which CLOSIDs are free. The setting and clearing of bits is open coded. Convert the existing open coded bit manipulations of closid_free_map to use __set_bit() and friends. These don't need to be atomic as this list is protected by the mutex. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Babu Moger <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Babu Moger <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov (AMD) <[email protected]>
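The shape of the conversion (illustrative before/after; the non-atomic __set_bit()/__clear_bit() suffice because rdtgroup_mutex serialises the allocator):

    /* Before: open coded bit manipulation. */
    closid_free_map &= ~(1UL << closid);

    /* After: the intent is explicit. */
    __clear_bit(closid, &closid_free_map);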
2024-02-16 | x86/resctrl: Track the number of dirty RMID a CLOSID has | James Morse | 1 | -10/+65
MPAM's PMG bits extend its PARTID space, meaning the same PMG value can be used for different control groups. This means once a CLOSID is allocated, all its monitoring IDs may still be dirty, and held in limbo. Keep track of the number of RMIDs each CLOSID has held in limbo. This will allow a future helper to find the 'cleanest' CLOSID when allocating. The array is only needed when CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID is defined. This will never be the case on x86. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Babu Moger <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Babu Moger <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov (AMD) <[email protected]>
2024-02-16 | x86/resctrl: Allow RMID allocation to be scoped by CLOSID | James Morse | 4 | -12/+37
MPAM's RMID values are not unique unless the CLOSID is considered as well. alloc_rmid() expects the RMID to be an independent number. Pass the CLOSID in to alloc_rmid(). Use this to compare indexes when allocating. If the CLOSID is not relevant to the index, this ends up comparing the free RMID with itself, and the first free entry will be used. With MPAM the CLOSID is included in the index, so this becomes a walk of the free RMID entries until one that matches the supplied CLOSID is found. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Babu Moger <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Babu Moger <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov (AMD) <[email protected]>
2024-02-16 | x86/resctrl: Access per-rmid structures by index | James Morse | 4 | -45/+83
x86 systems identify traffic using the CLOSID and RMID. The CLOSID is used to look up the control policy; the RMID is used for monitoring. For x86 these are independent numbers. Arm's MPAM has equivalent features PARTID and PMG, where the PARTID is used to look up the control policy. The PMG in contrast is a small number of bits that are used to subdivide PARTID when monitoring. The cache-occupancy monitors require the PARTID to be specified when monitoring. This means MPAM's PMG field is not unique. There are multiple PMG-0, one per allocated CLOSID/PARTID. If PMG is treated as equivalent to RMID, it cannot be allocated as an independent number. Bitmaps like rmid_busy_llc need to be sized by the number of unique entries for this resource. Treat the combined CLOSID and RMID as an index, and provide architecture helpers to pack and unpack an index. This makes the MPAM values unique. The domain's rmid_busy_llc and rmid_ptrs[] are then sized by index, as are domain mbm_local[] and mbm_total[]. x86 can ignore the CLOSID field when packing and unpacking an index, and report as many indexes as RMIDs. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Babu Moger <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Babu Moger <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov (AMD) <[email protected]>
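On x86 the index helpers collapse to the RMID, per this sketch (names follow the series; X86_RESCTRL_EMPTY_CLOSID marks the ignored CLOSID, and the details are an approximation):

    static inline u32 resctrl_arch_rmid_idx_encode(u32 closid, u32 rmid)
    {
        /* x86: the CLOSID carries no monitoring information. */
        return rmid;
    }

    static inline void resctrl_arch_rmid_idx_decode(u32 idx, u32 *closid, u32 *rmid)
    {
        *closid = X86_RESCTRL_EMPTY_CLOSID;
        *rmid = idx;
    }

An MPAM implementation would instead fold the CLOSID into the returned index, making the PMG values globally unique.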
2024-02-16 | x86/resctrl: Track the closid with the rmid | James Morse | 4 | -35/+56
x86's RMIDs are independent of the CLOSID. An RMID can be allocated, used and freed without considering the CLOSID. MPAM's equivalent feature is PMG, which is not an independent number; it extends the CLOSID/PARTID space. For MPAM, only PMG-bits worth of 'RMID' can be allocated for a single CLOSID. i.e. if there is 1 bit of PMG space, then each CLOSID can have two monitor groups. To allow resctrl to disambiguate RMID values for different CLOSIDs, everything in resctrl that keeps an RMID value needs to know the CLOSID too. This will always be ignored on x86. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Xin Hao <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Babu Moger <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov (AMD) <[email protected]>
2024-02-16 | x86/resctrl: Move RMID allocation out of mkdir_rdt_prepare() | James Morse | 1 | -9/+26
RMIDs are allocated for each monitor or control group directory, because each of these needs its own RMID. For control groups, rdtgroup_mkdir_ctrl_mon() later goes on to allocate the CLOSID. MPAM's equivalent of RMID is not an independent number, so can't be allocated until the CLOSID is known. An RMID allocation for one CLOSID may fail, whereas another may succeed depending on how many monitor groups a control group has. The RMID allocation needs to move to be after the CLOSID has been allocated. Move the RMID allocation out of mkdir_rdt_prepare() to occur in its caller, after the mkdir_rdt_prepare() call. This allows the RMID allocator to know the CLOSID. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Babu Moger <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Babu Moger <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov (AMD) <[email protected]>
2024-02-16 | x86/resctrl: Create helper for RMID allocation and mondata dir creation | James Morse | 1 | -15/+27
When monitoring is supported, each monitor and control group is allocated an RMID. For control groups, rdtgroup_mkdir_ctrl_mon() later goes on to allocate the CLOSID. MPAM's equivalent of RMID is not an independent number, so it can't be allocated until the CLOSID is known. An RMID allocation for one CLOSID may fail, whereas another may succeed depending on how many monitor groups a control group has. The RMID allocation needs to move to be after the CLOSID has been allocated. Move the RMID allocation and mondata dir creation to a helper. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Ilpo Järvinen <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Babu Moger <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Babu Moger <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov (AMD) <[email protected]>
2024-02-16 | x86/resctrl: Free rmid_ptrs from resctrl_exit() | James Morse | 3 | -0/+22
rmid_ptrs[] is allocated from dom_data_init() but never free()d. While the exit text ends up in the linker script's DISCARD section, the direction of travel is for resctrl to be/have loadable modules. Add resctrl_put_mon_l3_config() to cleanup any memory allocated by rdt_get_mon_l3_config(). There is no reason to backport this to a stable kernel. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Babu Moger <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Babu Moger <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov (AMD) <[email protected]>
2024-02-16 | x86/cpu/topology: Get rid of cpuinfo::x86_max_cores | Thomas Gleixner | 6 | -9/+5
Now that __num_cores_per_package and __num_threads_per_package are available, cpuinfo::x86_max_cores and the related math all over the place can be replaced with the ready to consume data. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16 | x86/CPU/AMD: Do the common init on future Zens too | Borislav Petkov (AMD) | 1 | -7/+7
There's no need to enable the common Zen init stuff for each new family - just do it by default on everything >= 0x17 family. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Tom Lendacky <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-15 | x86/cpu/topology: Provide __num_[cores|threads]_per_package | Thomas Gleixner | 2 | -1/+13
Expose properly accounted information and accessors so the fiddling with other topology variables can be replaced. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-15 | x86/cpu/topology: Rename smp_num_siblings | Thomas Gleixner | 4 | -8/+8
It's really a non-intuitive name. Rename it to __max_threads_per_core which is obvious. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-15 | x86/cpu/topology: Retrieve cores per package from topology bitmaps | Thomas Gleixner | 3 | -15/+57
Similar to other sizing information the number of cores per package can be established from the topology bitmap. Provide a function for retrieving that information and replace the buggy hack in the CPUID evaluation with it. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-15 | x86/cpu/topology: Use topology logical mapping mechanism | Thomas Gleixner | 2 | -13/+4
Replace the logical package and die management functionality and retrieve the logical IDs from the topology bitmaps. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-15 | x86/cpu/topology: Provide logical pkg/die mapping | Thomas Gleixner | 1 | -0/+28
With the topology bitmaps in place the logical package and die IDs can trivially be retrieved by determining the bitmap weight of the relevant topology domain level up to and including the physical ID in question. Provide a function to that effect. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Link: https://lore.kernel.org/r/[email protected]
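The weight calculation reads roughly as follows (a sketch; the extractor that isolates the domain-level part of the APIC ID is shown as a hypothetical helper):

    /*
     * The logical ID is the number of already registered APIC IDs at this
     * domain level up to and including the given physical ID, minus one.
     */
    static int topo_logical_id(unsigned long *level_map, u32 apicid)
    {
        unsigned int lvlid = topo_level_part(apicid);  /* hypothetical extractor */

        return bitmap_weight(level_map, lvlid + 1) - 1;
    }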
2024-02-15 | x86/cpu/topology: Simplify cpu_mark_primary_thread() | Thomas Gleixner | 1 | -4/+1
No point in creating a mask via fls(). smp_num_siblings is guaranteed to be a power of 2. So just using (smp_num_siblings - 1) has the same effect. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Link: https://lore.kernel.org/r/[email protected]
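Spelled out (illustrative before/after): for a power of 2, fls(n) - 1 == log2(n), so both forms yield n - 1:

    /* Before: mask built via fls(). */
    u32 mask = (1U << (fls(smp_num_siblings) - 1)) - 1;

    /* After: smp_num_siblings is guaranteed to be a power of 2. */
    u32 mask = smp_num_siblings - 1;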
2024-02-15 | x86/cpu/topology: Mop up primary thread mask handling | Thomas Gleixner | 1 | -27/+2
The early initcall to initialize the primary thread mask is no longer required because topology_init_possible_cpus() can mark primary threads correctly when initializing the possible and present maps, as the number of SMT threads is already determined correctly. The XENPV workaround is no longer required because XENPV now registers fake APIC IDs which will just work like any other enumeration. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-15 | x86/cpu/topology: Use topology bitmaps for sizing | Thomas Gleixner | 4 | -29/+26
Now that all possible APIC IDs are tracked in the topology bitmaps, it's trivial to retrieve the real information from there. This gets rid of the guesstimates for the maximal packages and dies per package as the actual numbers can be determined before a single AP has been brought up. The number of SMT threads can now be determined correctly from the bitmaps in all situations. Up to now a system which has SMT disabled in the BIOS will still claim that it is SMT capable, because the lowest APIC ID bit is reserved for that and CPUID leaf 0xb/0x1f still enumerates the SMT domain accordingly. By calculating the bitmap weights of the SMT and the CORE domains and setting them in relation, the SMT-disabled-in-BIOS situation correctly reports that the system is not SMT capable. It also handles the situation correctly when a hybrid system's boot CPU does not have SMT as it takes the SMT capability of the APs fully into account. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-15 | x86/cpu/topology: Let XEN/PV use topology from CPUID/MADT | Thomas Gleixner | 1 | -1/+1
It turns out that XEN/PV Dom0 has halfway usable CPUID/MADT enumeration except that it cannot deal with CPUs which are enumerated as disabled in MADT. DomU has no MADT and provides at least rudimentary topology information in CPUID leaves 1 and 4. For both it's important that there are not more possible Linux CPUs than vCPUs provided by the hypervisor. This is ensured by counting the vCPUs before enumeration happens, so: - lift the restrictions in the CPUID evaluation and the MADT parser - Utilize MADT registration for Dom0 - Keep the fake APIC ID registration for DomU - Fix the XEN APIC fake so the readout of the local APIC ID works for Dom0 via the hypercall and for DomU by returning the registered fake APIC IDs. With that the XEN/PV fake approximates usefulness. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-15 | x86/cpu/topology: Assign hotpluggable CPUIDs during init | Thomas Gleixner | 1 | -11/+18
There is no point in assigning the CPU numbers during ACPI physical hotplug. The number of possible hotplug CPUs is known when the possible map is initialized, so the CPU numbers can be associated with the registered non-present APIC IDs right there. This allows more code to be put into the __init section and makes the related data __ro_after_init. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-15 | x86/cpu/topology: Reject unknown APIC IDs on ACPI hotplug | Thomas Gleixner | 1 | -0/+4
The topology bitmaps track all possible APIC IDs which have been registered during enumeration. As sizing and further topology information is going to be derived from these bitmaps, reject attempts to hotplug an APIC ID which was not registered during enumeration. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-15 | x86/topology: Add a mechanism to track topology via APIC IDs | Thomas Gleixner | 1 | -2/+46
Topology on X86 is determined by the registered APIC IDs and the segmentation information retrieved from CPUID. Depending on the granularity of the provided CPUID information the most fine grained scheme looks like this according to Intel terminology: [PKG][DIEGRP][DIE][TILE][MODULE][CORE][THREAD] Not-enumerated domain levels consume 0 bits in the APIC ID. This makes it possible to provide a consistent view of the topology and to determine other information precisely, like the number of cores in a package on hybrid systems, where the existing assumption that number of cores == number of threads / threads per core does not hold. Provide per domain level bitmaps which record the APIC ID split into the domain levels to make later evaluation of domain level specific information simple. This allows calculating e.g. the logical IDs without any further extra logic. Contrary to the existing registration mechanism this records disabled CPUs, which are subject to later hotplug as well. That's useful for boot time sizing of package or die dependent allocations without using heuristics. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-15 | x86/cpu: Detect real BSP on crash kernels | Thomas Gleixner | 1 | -38/+59
When a kdump kernel is started from a crashing CPU then there is no guarantee that this CPU is the real boot CPU (BSP). If the kdump kernel tries to online the BSP then the INIT sequence will reset the machine. There is a command line option to prevent this, but in the case of nested kdump kernels this is wrong. But that command line option is not required at all because the real BSP is enumerated as the first CPU by the firmware. Support for the only known system which was different (Voyager) was removed long ago. Detect whether the boot CPU APIC ID is the first APIC ID enumerated by the firmware. If the first APIC ID enumerated does not match the boot CPU APIC ID, skip registering it. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-15 | x86/cpu/topology: Rework possible CPU management | Thomas Gleixner | 1 | -70/+106
Managing possible CPUs is an unreadable and incomprehensible maze. Aside from that, it's backwards because it applies command line limits after registering all APICs. Rewrite it so that it: - Applies the command line limits upfront so that only the allowed amount of APIC IDs can be registered. - Applies any late restrictions in an understandable way - Uses simple min_t() calculations which are trivial to follow. - Provides a separate function for resetting to UP mode late in the bringup process. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-15 | x86/cpu/topology: Sanitize the APIC admission logic | Thomas Gleixner | 1 | -82/+77
Move the actually required content of generic_processor_info() into the call sites and use common helper functions for them. This separates the early boot registration and the ACPI hotplug mechanism completely, which allows further cleanups and improvements. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-15 | x86/cpu/topology: Use a data structure for topology info | Thomas Gleixner | 1 | -30/+29
Put the processor accounting into a data structure, which will gain more topology related information in the next steps, and sanitize the accounting. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-15 | x86/cpu/topology: Simplify APIC registration | Thomas Gleixner | 1 | -20/+3
Having the same check whether the number of assigned CPUs has reached the nr_cpu_ids limit twice in the same code path is pointless. Repeating the information that CPUs are ignored over and over is also pointless noise. Remove the redundant check and reduce the noise by using a pr_warn_once(). Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Link: https://lore.kernel.org/r/[email protected]