aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/kernel/cpu/resctrl
AgeCommit message (Collapse)AuthorFilesLines
2023-01-23x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask()Babu Moger2-29/+11
on_each_cpu_mask() runs the function on each CPU specified by cpumask, which may include the local processor. Replace smp_call_function_many() with on_each_cpu_mask() to simplify the code. Signed-off-by: Babu Moger <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-01-10x86/resctrl: Fix event counts regression in reused RMIDsPeter Newman1-16/+33
When creating a new monitoring group, the RMID allocated for it may have been used by a group which was previously removed. In this case, the hardware counters will have non-zero values which should be deducted from what is reported in the new group's counts. resctrl_arch_reset_rmid() initializes the prev_msr value for counters to 0, causing the initial count to be charged to the new group. Resurrect __rmid_read() and use it to initialize prev_msr correctly. Unlike before, __rmid_read() checks for error bits in the MSR read so that callers don't need to. Fixes: 1d81d15db39c ("x86/resctrl: Move mbm_overflow_count() into resctrl_arch_rmid_read()") Signed-off-by: Peter Newman <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Babu Moger <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
2023-01-10x86/resctrl: Fix task CLOSID/RMID update racePeter Newman1-1/+11
When the user moves a running task to a new rdtgroup using the task's file interface or by deleting its rdtgroup, the resulting change in CLOSID/RMID must be immediately propagated to the PQR_ASSOC MSR on the task(s) CPUs. x86 allows reordering loads with prior stores, so if the task starts running between a task_curr() check that the CPU hoisted before the stores in the CLOSID/RMID update then it can start running with the old CLOSID/RMID until it is switched again because __rdtgroup_move_task() failed to determine that it needs to be interrupted to obtain the new CLOSID/RMID. Refer to the diagram below: CPU 0 CPU 1 ----- ----- __rdtgroup_move_task(): curr <- t1->cpu->rq->curr __schedule(): rq->curr <- t1 resctrl_sched_in(): t1->{closid,rmid} -> {1,1} t1->{closid,rmid} <- {2,2} if (curr == t1) // false IPI(t1->cpu) A similar race impacts rdt_move_group_tasks(), which updates tasks in a deleted rdtgroup. In both cases, use smp_mb() to order the task_struct::{closid,rmid} stores before the loads in task_curr(). In particular, in the rdt_move_group_tasks() case, simply execute an smp_mb() on every iteration with a matching task. It is possible to use a single smp_mb() in rdt_move_group_tasks(), but this would require two passes and a means of remembering which task_structs were updated in the first loop. However, benchmarking results below showed too little performance impact in the simple approach to justify implementing the two-pass approach. Times below were collected using `perf stat` to measure the time to remove a group containing a 1600-task, parallel workload. CPU: Intel(R) Xeon(R) Platinum P-8136 CPU @ 2.00GHz (112 threads) # mkdir /sys/fs/resctrl/test # echo $$ > /sys/fs/resctrl/test/tasks # perf bench sched messaging -g 40 -l 100000 task-clock time ranges collected using: # perf stat rmdir /sys/fs/resctrl/test Baseline: 1.54 - 1.60 ms smp_mb() every matching task: 1.57 - 1.67 ms [ bp: Massage commit message. ] Fixes: ae28d1aae48a ("x86/resctrl: Use an IPI instead of task_work_add() to update PQR_ASSOC MSR") Fixes: 0efc89be9471 ("x86/intel_rdt: Update task closid immediately on CPU in rmdir and unmount") Signed-off-by: Peter Newman <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Babu Moger <[email protected]> Cc: <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-12-16Merge tag 'driver-core-6.2-rc1' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the set of driver core and kernfs changes for 6.2-rc1. The "big" change in here is the addition of a new macro, container_of_const() that will preserve the "const-ness" of a pointer passed into it. The "problem" of the current container_of() macro is that if you pass in a "const *", out of it can comes a non-const pointer unless you specifically ask for it. For many usages, we want to preserve the "const" attribute by using the same call. For a specific example, this series changes the kobj_to_dev() macro to use it, allowing it to be used no matter what the const value is. This prevents every subsystem from having to declare 2 different individual macros (i.e. kobj_const_to_dev() and kobj_to_dev()) and having the compiler enforce the const value at build time, which having 2 macros would not do either. The driver for all of this have been discussions with the Rust kernel developers as to how to properly mark driver core, and kobject, objects as being "non-mutable". The changes to the kobject and driver core in this pull request are the result of that, as there are lots of paths where kobjects and device pointers are not modified at all, so marking them as "const" allows the compiler to enforce this. So, a nice side affect of the Rust development effort has been already to clean up the driver core code to be more obvious about object rules. All of this has been bike-shedded in quite a lot of detail on lkml with different names and implementations resulting in the tiny version we have in here, much better than my original proposal. Lots of subsystem maintainers have acked the changes as well. Other than this change, included in here are smaller stuff like: - kernfs fixes and updates to handle lock contention better - vmlinux.lds.h fixes and updates - sysfs and debugfs documentation updates - device property updates All of these have been in the linux-next tree for quite a while with no problems" * tag 'driver-core-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (58 commits) device property: Fix documentation for fwnode_get_next_parent() firmware_loader: fix up to_fw_sysfs() to preserve const usb.h: take advantage of container_of_const() device.h: move kobj_to_dev() to use container_of_const() container_of: add container_of_const() that preserves const-ness of the pointer driver core: fix up missed drivers/s390/char/hmcdrv_dev.c class.devnode() conversion. driver core: fix up missed scsi/cxlflash class.devnode() conversion. driver core: fix up some missing class.devnode() conversions. driver core: make struct class.devnode() take a const * driver core: make struct class.dev_uevent() take a const * cacheinfo: Remove of_node_put() for fw_token device property: Add a blank line in Kconfig of tests device property: Rename goto label to be more precise device property: Move PROPERTY_ENTRY_BOOL() a bit down device property: Get rid of __PROPERTY_ENTRY_ARRAY_EL*SIZE*() kernfs: fix all kernel-doc warnings and multiple typos driver core: pass a const * into of_device_uevent() kobject: kset_uevent_ops: make name() callback take a const * kobject: kset_uevent_ops: make filter() callback take a const * kobject: make kobject_namespace take a const * ...
2022-11-27x86/resctrl: Move MSR defines into msr-index.hBorislav Petkov3-13/+3
msr-index.h should contain all MSRs for easier grepping for MSR numbers when dealing with unchecked MSR access warnings, for example. Move the resctrl ones. Prefix IA32_PQR_ASSOC with "MSR_" while at it. No functional changes. Signed-off-by: Borislav Petkov <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-11-24driver core: make struct class.devnode() take a const *Greg Kroah-Hartman1-2/+2
The devnode() in struct class should not be modifying the device that is passed into it, so mark it as a const * and propagate the function signature changes out into all relevant subsystems that use this callback. Cc: Fenghua Yu <[email protected]> Cc: Reinette Chatre <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: [email protected] Cc: "H. Peter Anvin" <[email protected]> Cc: FUJITA Tomonori <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Justin Sanders <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Sumit Semwal <[email protected]> Cc: Benjamin Gaignard <[email protected]> Cc: Liam Mark <[email protected]> Cc: Laura Abbott <[email protected]> Cc: Brian Starkey <[email protected]> Cc: John Stultz <[email protected]> Cc: "Christian König" <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Maxime Ripard <[email protected]> Cc: Thomas Zimmermann <[email protected]> Cc: David Airlie <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: Dennis Dalessandro <[email protected]> Cc: Dmitry Torokhov <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Sean Young <[email protected]> Cc: Frank Haverkamp <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Jason Wang <[email protected]> Cc: Alex Williamson <[email protected]> Cc: Cornelia Huck <[email protected]> Cc: Kees Cook <[email protected]> Cc: Anton Vorontsov <[email protected]> Cc: Colin Cross <[email protected]> Cc: Tony Luck <[email protected]> Cc: Jaroslav Kysela <[email protected]> Cc: Takashi Iwai <[email protected]> Cc: Hans Verkuil <[email protected]> Cc: Christophe JAILLET <[email protected]> Cc: Xie Yongji <[email protected]> Cc: Gautam Dawar <[email protected]> Cc: Dan Carpenter <[email protected]> Cc: Eli Cohen <[email protected]> Cc: Parav Pandit <[email protected]> Cc: Maxime Coquelin <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2022-10-24x86/resctrl: Remove arch_has_empty_bitmapsBabu Moger2-4/+1
The field arch_has_empty_bitmaps is not required anymore. The field min_cbm_bits is enough to validate the CBM (capacity bit mask) if the architecture can support the zero CBM or not. Suggested-by: Reinette Chatre <[email protected]> Signed-off-by: Babu Moger <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Fenghua Yu <[email protected]> Link: https://lore.kernel.org/r/166430979654.372014.615622285687642644.stgit@bmoger-ubuntu
2022-10-18x86/resctrl: Fix min_cbm_bits for AMDBabu Moger1-6/+2
AMD systems support zero CBM (capacity bit mask) for cache allocation. That is reflected in rdt_init_res_defs_amd() by: r->cache.arch_has_empty_bitmaps = true; However given the unified code in cbm_validate(), checking for: val == 0 && !arch_has_empty_bitmaps is not enough because of another check in cbm_validate(): if ((zero_bit - first_bit) < r->cache.min_cbm_bits) The default value of r->cache.min_cbm_bits = 1. Leading to: $ cd /sys/fs/resctrl $ mkdir foo $ cd foo $ echo L3:0=0 > schemata -bash: echo: write error: Invalid argument $ cat /sys/fs/resctrl/info/last_cmd_status Need at least 1 bits in the mask Initialize the min_cbm_bits to 0 for AMD. Also, remove the default setting of min_cbm_bits and initialize it separately. After the fix: $ cd /sys/fs/resctrl $ mkdir foo $ cd foo $ echo L3:0=0 > schemata $ cat /sys/fs/resctrl/info/last_cmd_status ok Fixes: 316e7f901f5a ("x86/resctrl: Add struct rdt_cache::arch_has_{sparse, empty}_bitmaps") Co-developed-by: Stephane Eranian <[email protected]> Signed-off-by: Stephane Eranian <[email protected]> Signed-off-by: Babu Moger <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Ingo Molnar <[email protected]> Reviewed-by: James Morse <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Fenghua Yu <[email protected]> Cc: <[email protected]> Link: https://lore.kernel.org/lkml/[email protected]
2022-09-23x86/resctrl: Make resctrl_arch_rmid_read() return values in bytesJames Morse3-19/+15
resctrl_arch_rmid_read() returns a value in chunks, as read from the hardware. This needs scaling to bytes by mon_scale, as provided by the architecture code. Now that resctrl_arch_rmid_read() performs the overflow and corrections itself, it may as well return a value in bytes directly. This allows the accesses to the architecture specific 'hw' structure to be removed. Move the mon_scale conversion into resctrl_arch_rmid_read(). mbm_bw_count() is updated to calculate bandwidth from bytes. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-23x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's boot_cpu_dataJames Morse2-3/+8
resctrl_rmid_realloc_threshold can be set by user-space. The maximum value is specified by the architecture. Currently max_threshold_occ_write() reads the maximum value from boot_cpu_data.x86_cache_size, which is not portable to another architecture. Add resctrl_rmid_realloc_limit to describe the maximum size in bytes that user-space can set the threshold to. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-23x86/resctrl: Rename and change the units of resctrl_cqm_thresholdJames Morse3-25/+28
resctrl_cqm_threshold is stored in a hardware specific chunk size, but exposed to user-space as bytes. This means the filesystem parts of resctrl need to know how the hardware counts, to convert the user provided byte value to chunks. The interface between the architecture's resctrl code and the filesystem ought to treat everything as bytes. Change the unit of resctrl_cqm_threshold to bytes. resctrl_arch_rmid_read() still returns its value in chunks, so this needs converting to bytes. As all the users have been touched, rename the variable to resctrl_rmid_realloc_threshold, which describes what the value is for. Neither r->num_rmid nor hw_res->mon_scale are guaranteed to be a power of 2, so the existing code introduces a rounding error from resctrl's theoretical fraction of the cache usage. This behaviour is kept as it ensures the user visible value matches the value read from hardware when the rmid will be reallocated. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-23x86/resctrl: Move get_corrected_mbm_count() into resctrl_arch_rmid_read()James Morse2-6/+6
resctrl_arch_rmid_read() is intended as the function that an architecture agnostic resctrl filesystem driver can use to read a value in bytes from a counter. Currently the function returns the MBM values in chunks directly from hardware. When reading a bandwidth counter, get_corrected_mbm_count() must be used to correct the value read. get_corrected_mbm_count() is architecture specific, this work should be done in resctrl_arch_rmid_read(). Move the function calls. This allows the resctrl filesystems's chunks value to be removed in favour of the architecture private version. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-23x86/resctrl: Move mbm_overflow_count() into resctrl_arch_rmid_read()James Morse2-17/+20
resctrl_arch_rmid_read() is intended as the function that an architecture agnostic resctrl filesystem driver can use to read a value in bytes from a counter. Currently the function returns the MBM values in chunks directly from hardware. When reading a bandwidth counter, mbm_overflow_count() must be used to correct for any possible overflow. mbm_overflow_count() is architecture specific, its behaviour should be part of resctrl_arch_rmid_read(). Move the mbm_overflow_count() calls into resctrl_arch_rmid_read(). This allows the resctrl filesystems's prev_msr to be removed in favour of the architecture private version. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-23x86/resctrl: Pass the required parameters into resctrl_arch_rmid_read()James Morse1-14/+17
resctrl_arch_rmid_read() is intended as the function that an architecture agnostic resctrl filesystem driver can use to read a value in bytes from a hardware register. Currently the function returns the MBM values in chunks directly from hardware. To convert this to bytes, some correction and overflow calculations are needed. These depend on the resource and domain structures. Overflow detection requires the old chunks value. None of this is available to resctrl_arch_rmid_read(). MPAM requires the resource and domain structures to find the MMIO device that holds the registers. Pass the resource and domain to resctrl_arch_rmid_read(). This makes rmid_dirty() too big. Instead merge it with its only caller, and the name is kept as a local variable. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-23x86/resctrl: Abstract __rmid_read()James Morse3-25/+42
__rmid_read() selects the specified eventid and returns the counter value from the MSR. The error handling is architecture specific, and handled by the callers, rdtgroup_mondata_show() and __mon_event_count(). Error handling should be handled by architecture specific code, as a different architecture may have different requirements. MPAM's counters can report that they are 'not ready', requiring a second read after a short delay. This should be hidden from resctrl. Make __rmid_read() the architecture specific function for reading a counter. Rename it resctrl_arch_rmid_read() and move the error handling into it. A read from a counter that hardware supports but resctrl does not now returns -EINVAL instead of -EIO from the default case in __mon_event_count(). It isn't possible for user-space to see this change as resctrl doesn't expose counters it doesn't support. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-23x86/resctrl: Allow per-rmid arch private storage to be resetJames Morse2-14/+39
To abstract the rmid counters into a helper that returns the number of bytes counted, architecture specific per-rmid state is needed. It needs to be possible to reset this hidden state, as the values may outlive the life of an rmid, or the mount time of the filesystem. mon_event_read() is called with first = true when an rmid is first allocated in mkdir_mondata_subdir(). Add resctrl_arch_reset_rmid() and call it from __mon_event_count()'s rr->first check. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-22x86/resctrl: Add per-rmid arch private storage for overflow and chunksJames Morse2-0/+49
A renamed __rmid_read() is intended as the function that an architecture agnostic resctrl filesystem driver can use to read a value in bytes from a counter. Currently the function returns the MBM values in chunks directly from hardware. For bandwidth counters the resctrl filesystem uses this to calculate the number of bytes ever seen. MPAM's scaling of counters can be changed at runtime, reducing the resolution but increasing the range. When this is changed the prev_msr values need to be converted by the architecture code. Add an array for per-rmid private storage. The prev_msr and chunks values will move here to allow resctrl_arch_rmid_read() to always return the number of bytes read by this counter without assistance from the filesystem. The values are moved in later patches when the overflow and correction calls are moved into __rmid_read(). Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-22x86/resctrl: Calculate bandwidth from the previous __mon_event_count() chunksJames Morse2-11/+18
mbm_bw_count() is only called by the mbm_handle_overflow() worker once a second. It reads the hardware register, calculates the bandwidth and updates m->prev_bw_msr which is used to hold the previous hardware register value. Operating directly on hardware register values makes it difficult to make this code architecture independent, so that it can be moved to /fs/, making the mba_sc feature something resctrl supports with no additional support from the architecture. Prior to calling mbm_bw_count(), mbm_update() reads from the same hardware register using __mon_event_count(). Change mbm_bw_count() to use the current chunks value most recently saved by __mon_event_count(). This removes an extra call to __rmid_read(). Instead of using m->prev_msr to calculate the number of chunks seen, use the rr->val that was updated by __mon_event_count(). This removes an extra call to mbm_overflow_count() and get_corrected_mbm_count(). Calculating bandwidth like this means mbm_bw_count() no longer operates on hardware register values directly. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-22x86/resctrl: Allow update_mba_bw() to update controls directlyJames Morse4-11/+26
update_mba_bw() calculates a new control value for the MBA resource based on the user provided mbps_val and the current measured bandwidth. Some control values need remapping by delay_bw_map(). It does this by calling wrmsrl() directly. This needs splitting up to be done by an architecture specific helper, so that the remainder can eventually be moved to /fs/. Add resctrl_arch_update_one() to apply one configuration value to the provided resource and domain. This avoids the staging and cross-calling that is only needed with changes made by user-space. delay_bw_map() moves to be part of the arch code, to maintain the 'percentage control' view of MBA resources in resctrl. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-22x86/resctrl: Remove architecture copy of mbps_valJames Morse3-22/+5
The resctrl arch code provides a second configuration array mbps_val[] for the MBA software controller. Since resctrl switched over to allocating and freeing its own array when needed, nothing uses the arch code version. Remove it. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-22x86/resctrl: Switch over to the resctrl mbps_val listJames Morse3-29/+52
Updates to resctrl's software controller follow the same path as other configuration updates, but they don't modify the hardware state. rdtgroup_schemata_write() uses parse_line() and the resource's parse_ctrlval() function to stage the configuration. resctrl_arch_update_domains() then updates the mbps_val[] array instead, and resctrl_arch_update_domains() skips the rdt_ctrl_update() call that would update hardware. This complicates the interface between resctrl's filesystem parts and architecture specific code. It should be possible for mba_sc to be completely implemented by the filesystem parts of resctrl. This would allow it to work on a second architecture with no additional code. resctrl_arch_update_domains() using the mbps_val[] array prevents this. Change parse_bw() to write the configuration value directly to the mbps_val[] array in the domain structure. Change rdtgroup_schemata_write() to skip the call to resctrl_arch_update_domains(), meaning all the mba_sc specific code in resctrl_arch_update_domains() can be removed. On the read-side, show_doms() and update_mba_bw() are changed to read the mbps_val[] array from the domain structure. With this, resctrl_arch_get_config() no longer needs to consider mba_sc resources. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-22x86/resctrl: Create mba_sc configuration in the rdt_domainJames Morse2-1/+39
To support resctrl's MBA software controller, the architecture must provide a second configuration array to hold the mbps_val[] from user-space. This complicates the interface between the architecture specific code and the filesystem portions of resctrl that will move to /fs/, to allow multiple architectures to support resctrl. Make the filesystem parts of resctrl create an array for the mba_sc values. The software controller can be changed to use this, allowing the architecture code to only consider the values configured in hardware. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-22x86/resctrl: Abstract and use supports_mba_mbps()James Morse1-5/+14
To determine whether the mba_MBps option to resctrl should be supported, resctrl tests the boot CPUs' x86_vendor. This isn't portable, and needs abstracting behind a helper so this check can be part of the filesystem code that moves to /fs/. Re-use the tests set_mba_sc() does to determine if the mba_sc is supported on this system. An 'alloc_capable' test is added so that support for the controls isn't implied by the 'delay_linear' property, which is always true for MPAM. Because mbm_update() only update mba_sc if the mbm_local counters are enabled, supports_mba_mbps() checks is_mbm_local_enabled(). (instead of using is_mbm_enabled(), which checks both). Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-22x86/resctrl: Remove set_mba_sc()s control array re-initialisationJames Morse1-7/+3
set_mba_sc() enables the 'software controller' to regulate the bandwidth based on the byte counters. This can be managed entirely in the parts of resctrl that move to /fs/, without any extra support from the architecture specific code. set_mba_sc() is called by rdt_enable_ctx() during mount and unmount. It currently resets the arch code's ctrl_val[] and mbps_val[] arrays. The ctrl_val[] was already reset when the domain was created, and by reset_all_ctrls() when the filesystem was last unmounted. Doing the work in set_mba_sc() is not necessary as the values are already at their defaults due to the creation of the domain, or were previously reset during umount(), or are about to reset during umount(). Add a reset of the mbps_val[] in reset_all_ctrls(), allowing the code in set_mba_sc() that reaches in to the architecture specific structures to be removed. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-22x86/resctrl: Add domain offline callback for resctrl workJames Morse3-30/+43
Because domains are exposed to user-space via resctrl, the filesystem must update its state when CPU hotplug callbacks are triggered. Some of this work is common to any architecture that would support resctrl, but the work is tied up with the architecture code to free the memory. Move the monitor subdir removal and the cancelling of the mbm/limbo works into a new resctrl_offline_domain() call. These bits are not specific to the architecture. Grouping them in one function allows that code to be moved to /fs/ and re-used by another architecture. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-22x86/resctrl: Group struct rdt_hw_domain cleanupJames Morse1-7/+10
domain_add_cpu() and domain_remove_cpu() need to kfree() the child arrays that were allocated by domain_setup_ctrlval(). As this memory is moved around, and new arrays are created, adjusting the error handling cleanup code becomes noisier. To simplify this, move all the kfree() calls into a domain_free() helper. This depends on struct rdt_hw_domain being kzalloc()d, allowing it to unconditionally kfree() all the child arrays. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-22x86/resctrl: Add domain online callback for resctrl workJames Morse3-54/+66
Because domains are exposed to user-space via resctrl, the filesystem must update its state when CPU hotplug callbacks are triggered. Some of this work is common to any architecture that would support resctrl, but the work is tied up with the architecture code to allocate the memory. Move domain_setup_mon_state(), the monitor subdir creation call and the mbm/limbo workers into a new resctrl_online_domain() call. These bits are not specific to the architecture. Grouping them in one function allows that code to be moved to /fs/ and re-used by another architecture. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-22x86/resctrl: Merge mon_capable and mon_enabledJames Morse3-9/+4
mon_enabled and mon_capable are always set as a pair by rdt_get_mon_l3_config(). There is no point having two values. Merge them together. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-22x86/resctrl: Kill off alloc_enabledJames Morse4-12/+4
rdt_resources_all[] used to have extra entries for L2CODE/L2DATA. These were hidden from resctrl by the alloc_enabled value. Now that the L2/L2CODE/L2DATA resources have been merged together, alloc_enabled doesn't mean anything, it always has the same value as alloc_capable which indicates allocation is supported by this resource. Remove alloc_enabled and its helpers. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-08-31x86/resctrl: Fix to restore to original value when re-enabling hardware ↵Kohei Tarumizu1-3/+9
prefetch register The current pseudo_lock.c code overwrites the value of the MSR_MISC_FEATURE_CONTROL to 0 even if the original value is not 0. Therefore, modify it to save and restore the original values. Fixes: 018961ae5579 ("x86/intel_rdt: Pseudo-lock region creation/removal core") Fixes: 443810fe6160 ("x86/intel_rdt: Create debugfs files for pseudo-locking testing") Fixes: 8a2fc0e1bc0c ("x86/intel_rdt: More precise L2 hit/miss measurements") Signed-off-by: Kohei Tarumizu <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Acked-by: Reinette Chatre <[email protected]> Link: https://lkml.kernel.org/r/eb660f3c2010b79a792c573c02d01e8e841206ad.1661358182.git.reinette.chatre@intel.com
2022-04-10x86: Replace cpumask_weight() with cpumask_empty() where appropriateYury Norov1-7/+7
In some cases, x86 code calls cpumask_weight() to check if any bit of a given cpumask is set. This can be done more efficiently with cpumask_empty() because cpumask_empty() stops traversing the cpumask as soon as it finds first set bit, while cpumask_weight() counts all bits unconditionally. Signed-off-by: Yury Norov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Steve Wahl <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-02-23kernfs: move struct kernfs_root out of the public view.Greg Kroah-Hartman1-2/+2
There is no need to have struct kernfs_root be part of kernfs.h for the whole kernel to see and poke around it. Move it internal to kernfs code and provide a helper function, kernfs_root_to_node(), to handle the one field that kernfs users were directly accessing from the structure. Cc: Imran Khan <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2021-12-09x86/resctrl: Remove redundant assignment to variable chunksColin Ian King1-1/+1
The variable chunks is being shifted right and re-assinged the shifted value which is then returned. Since chunks is not being read afterwards the assignment is redundant and the >>= operator can be replaced with a shift >> operator instead. Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Fenghua Yu <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-10-06x86/resctrl: Fix kfree() of the wrong type in domain_add_cpu()James Morse1-2/+2
Commit in Fixes separated the architecture specific and filesystem parts of the resctrl domain structures. This left the error paths in domain_add_cpu() kfree()ing the memory with the wrong type. This will cause a problem if someone adds a new member to struct rdt_hw_domain meaning d_resctrl is no longer the first member. Fixes: 792e0f6f789b ("x86/resctrl: Split struct rdt_domain") Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Acked-by: Reinette Chatre <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-10-06x86/resctrl: Free the ctrlval arrays when domain_setup_mon_state() failsJames Morse1-0/+2
domain_add_cpu() is called whenever a CPU is brought online. The earlier call to domain_setup_ctrlval() allocates the control value arrays. If domain_setup_mon_state() fails, the control value arrays are not freed. Add the missing kfree() calls. Fixes: 1bd2a63b4f0de ("x86/intel_rdt/mba_sc: Add initialization support") Fixes: edf6fa1c4a951 ("x86/intel_rdt/cqm: Add RMID (Resource monitoring ID) management") Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Acked-by: Reinette Chatre <[email protected]> Cc: <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-08-30Merge tag 'x86_cache_for_v5.15' of ↵Linus Torvalds6-595/+592
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: "A first round of changes towards splitting the arch-specific bits from the filesystem bits of resctrl, the ultimate goal being to support ARM's equivalent technology MPAM, with the same fs interface (James Morse)" * tag 'x86_cache_for_v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) x86/resctrl: Make resctrl_arch_get_config() return its value x86/resctrl: Merge the CDP resources x86/resctrl: Expand resctrl_arch_update_domains()'s msr_param range x86/resctrl: Remove rdt_cdp_peer_get() x86/resctrl: Merge the ctrl_val arrays x86/resctrl: Calculate the index from the configuration type x86/resctrl: Apply offset correction when config is staged x86/resctrl: Make ctrlval arrays the same size x86/resctrl: Pass configuration type to resctrl_arch_get_config() x86/resctrl: Add a helper to read a closid's configuration x86/resctrl: Rename update_domains() to resctrl_arch_update_domains() x86/resctrl: Allow different CODE/DATA configurations to be staged x86/resctrl: Group staged configuration into a separate struct x86/resctrl: Move the schemata names into struct resctrl_schema x86/resctrl: Add a helper to read/set the CDP configuration x86/resctrl: Swizzle rdt_resource and resctrl_schema in pseudo_lock_region x86/resctrl: Pass the schema to resctrl filesystem functions x86/resctrl: Add resctrl_arch_get_num_closid() x86/resctrl: Store the effective num_closid in the schema x86/resctrl: Walk the resctrl schema list instead of an arch list ...
2021-08-22x86/resctrl: Fix a maybe-uninitialized build warning treated as errorBabu Moger1-0/+6
The recent commit 064855a69003 ("x86/resctrl: Fix default monitoring groups reporting") caused a RHEL build failure with an uninitialized variable warning treated as an error because it removed the default case snippet. The RHEL Makefile uses '-Werror=maybe-uninitialized' to force possibly uninitialized variable warnings to be treated as errors. This is also reported by smatch via the 0day robot. The error from the RHEL build is: arch/x86/kernel/cpu/resctrl/monitor.c: In function ‘__mon_event_count’: arch/x86/kernel/cpu/resctrl/monitor.c:261:12: error: ‘m’ may be used uninitialized in this function [-Werror=maybe-uninitialized] m->chunks += chunks; ^~ The upstream Makefile does not build using '-Werror=maybe-uninitialized'. So, the problem is not seen there. Fix the problem by putting back the default case snippet. [ bp: note that there's nothing wrong with the code and other compilers do not trigger this warning - this is being done just so the RHEL compiler is happy. ] Fixes: 064855a69003 ("x86/resctrl: Fix default monitoring groups reporting") Reported-by: Terry Bowman <[email protected]> Reported-by: kernel test robot <[email protected]> Signed-off-by: Babu Moger <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/162949631908.23903.17090272726012848523.stgit@bmoger-ubuntu
2021-08-12x86/resctrl: Fix default monitoring groups reportingBabu Moger1-14/+13
Creating a new sub monitoring group in the root /sys/fs/resctrl leads to getting the "Unavailable" value for mbm_total_bytes and mbm_local_bytes on the entire filesystem. Steps to reproduce: 1. mount -t resctrl resctrl /sys/fs/resctrl/ 2. cd /sys/fs/resctrl/ 3. cat mon_data/mon_L3_00/mbm_total_bytes 23189832 4. Create sub monitor group: mkdir mon_groups/test1 5. cat mon_data/mon_L3_00/mbm_total_bytes Unavailable When a new monitoring group is created, a new RMID is assigned to the new group. But the RMID is not active yet. When the events are read on the new RMID, it is expected to report the status as "Unavailable". When the user reads the events on the default monitoring group with multiple subgroups, the events on all subgroups are consolidated together. Currently, if any of the RMID reads report as "Unavailable", then everything will be reported as "Unavailable". Fix the issue by discarding the "Unavailable" reads and reporting all the successful RMID reads. This is not a problem on Intel systems as Intel reports 0 on Inactive RMIDs. Fixes: d89b7379015f ("x86/intel_rdt/cqm: Add mon_data") Reported-by: Paweł Szulik <[email protected]> Signed-off-by: Babu Moger <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Acked-by: Reinette Chatre <[email protected]> Cc: [email protected] Link: https://bugzilla.kernel.org/show_bug.cgi?id=213311 Link: https://lkml.kernel.org/r/162793309296.9224.15871659871696482080.stgit@bmoger-ubuntu
2021-08-11x86/resctrl: Make resctrl_arch_get_config() return its valueJames Morse3-16/+19
resctrl_arch_get_config() has no return, but does pass a single value back via one of its arguments. Return the value instead. Suggested-by: Borislav Petkov <[email protected]> Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-08-11x86/resctrl: Merge the CDP resourcesJames Morse3-239/+85
resctrl uses struct rdt_resource to describe the available hardware resources. The domains of the CDP aliases share a single ctrl_val[] array. The only differences between the struct rdt_hw_resource aliases is the name and conf_type. The name from struct rdt_hw_resource is visible to user-space. To support another architecture, as many user-visible details should be handled in the filesystem parts of the code that is common to all architectures. The name and conf_type go together. Remove conf_type and the CDP aliases. When CDP is supported and enabled, schemata_list_create() can create two schemata using the single resource, generating the CODE/DATA suffix to the schema name itself. This allows the alloc_ctrlval_array() and complications around free()ing the ctrl_val arrays to be removed. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Babu Moger <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-08-11x86/resctrl: Expand resctrl_arch_update_domains()'s msr_param rangeJames Morse1-3/+9
resctrl_arch_update_domains() specifies the one closid that has been modified and needs copying to the hardware. resctrl_arch_update_domains() takes a struct rdt_resource and a closid as arguments, but copies all the staged configurations for that closid into the ctrl_val[] array. resctrl_arch_update_domains() is called once per schema, but once the resources and domains are merged, the second call of a L2CODE/L2DATA pair will find no staged configurations, as they were previously applied. The msr_param of the first call only has one index, so would only have update the hardware for the last staged configuration. To avoid a second round of IPIs when changing L2CODE and L2DATA in one go, expand the range of the msr_param if multiple staged configurations are found. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Babu Moger <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-08-11x86/resctrl: Remove rdt_cdp_peer_get()James Morse1-85/+14
When CDP is enabled, rdt_cdp_peer_get() finds the alternative CODE/DATA resource and returns the alternative domain. This is used to determine if bitmaps overlap when there are aliased entries in the two struct rdt_hw_resources. Now that the ctrl_val[] used by the CODE/DATA resources is the same, the search for an alternate resource/domain is not needed. Replace rdt_cdp_peer_get() with resctrl_peer_type(), which returns the alternative type. This can be passed to resctrl_arch_get_config() with the same resource and domain. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Babu Moger <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-08-11x86/resctrl: Merge the ctrl_val arraysJames Morse1-4/+61
Each struct rdt_hw_resource has its own ctrl_val[] array. When CDP is enabled, two resources are in use, each with its own ctrl_val[] array that holds half of the configuration used by hardware. One uses the odd slots, the other the even. rdt_cdp_peer_get() is the helper to find the alternate resource, its domain, and corresponding entry in the other ctrl_val[] array. Once the CDP resources are merged there will be one struct rdt_hw_resource and one ctrl_val[] array for each hardware resource. This will include changes to rdt_cdp_peer_get(), making it hard to bisect any issue. Merge the ctrl_val[] arrays for three CODE/DATA/NONE resources first. Doing this before merging the resources temporarily complicates allocating and freeing the ctrl_val arrays. Add a helper to allocate the ctrl_val array, that returns the value on the L2 or L3 resource if it already exists. This gets removed once the resources are merged, and there really is only one ctrl_val[] array. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Babu Moger <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-08-11x86/resctrl: Calculate the index from the configuration typeJames Morse2-22/+15
resctrl uses cbm_idx() to map a closid to an index in the configuration array. This is based on a multiplier and offset that are held in the resource. To merge the resources, the resctrl arch code needs to calculate the index from something else, as there will only be one resource. Decide based on the staged configuration type. This makes the static mult and offset parameters redundant. [ bp: Remove superfluous brackets in get_config_index() ] Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Babu Moger <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-08-11x86/resctrl: Apply offset correction when config is stagedJames Morse4-31/+27
When resctrl comes to copy the CAT MSR values from the ctrl_val[] array into hardware, it applies an offset adjustment based on the type of the resource. CODE and DATA resources have their closid mapped into an odd/even range. This mapping is based on a property of the resource. This happens once the new control value has been written to the ctrl_val[] array. Once the CDP resources are merged, there will only be a single property that needs to cover both odd/even mappings to the single ctrl_val[] array. The offset adjustment must be applied before the new value is written to the array. Move the logic from cat_wrmsr() to resctrl_arch_update_domains(). The value provided to apply_config() is now an index in the array, not the closid. The parameters provided via struct msr_param are now indexes too. As resctrl's use of closid is a u32, struct msr_param's type is changed to match. With this, the CODE and DATA resources only use the odd or even indexes in the array. This allows the temporary num_closid/2 fixes in domain_setup_ctrlval() and reset_all_ctrls() to be removed. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Babu Moger <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-08-11x86/resctrl: Make ctrlval arrays the same sizeJames Morse2-1/+18
The CODE and DATA resources report a num_closid that is half the actual size supported by the hardware. This behaviour is visible to user-space when CDP is enabled. The CODE and DATA resources have their own ctrlval arrays which are half the size of the underlying hardware because num_closid was already adjusted. One holds the odd configurations values, the other even. Before the CDP resources can be merged, the 'half the closids' behaviour needs to be implemented by schemata_list_create(), but this causes the ctrl_val[] array to be full sized. Remove the logic from the architecture specific rdt_get_cdp_config() setup, and add it to schemata_list_create(). Functions that walk all the configurations, such as domain_setup_ctrlval() and reset_all_ctrls(), take num_closid directly from struct rdt_hw_resource also have to halve num_closid as only the lower half of each array is in use. domain_setup_ctrlval() and reset_all_ctrls() both copy struct rdt_hw_resource's num_closid to a struct msr_param. Correct the value here. This is temporary as a subsequent patch will merge all three ctrl_val[] arrays such that when CDP is in use, the CODA/DATA layout in the array matches the hardware. reset_all_ctrls()'s loop over the whole of ctrl_val[] is not touched as this is harmless, and will be required as it is once the resources are merged. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Babu Moger <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-08-11x86/resctrl: Pass configuration type to resctrl_arch_get_config()James Morse3-15/+27
The ctrl_val[] array for a struct rdt_hw_resource only holds configurations of one type. The type is implicit. Once the CDP resources are merged, the ctrl_val[] array will hold all the configurations for the hardware resource. When a particular type of configuration is needed, it must be specified explicitly. Pass the expected type from the schema into resctrl_arch_get_config(). Nothing uses this yet, but once a single ctrl_val[] array is used for the three struct rdt_hw_resources that share hardware, the type will be used to return the correct configuration value from the shared array. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Babu Moger <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-08-11x86/resctrl: Add a helper to read a closid's configurationJames Morse3-30/+35
Functions like show_doms() reach into the architecture's private structure to retrieve the configuration from the struct rdt_hw_resource. The hardware configuration may look completely different to the values resctrl gets from user-space. The staged configuration and resctrl_arch_update_domains() allow the architecture to convert or translate these values. Resctrl shouldn't read or write the ctrl_val[] values directly. Add a helper to read the current configuration. This will allow another architecture to scale the bitmaps if necessary, and possibly use controls that don't take the user-space control format at all. Of the remaining functions that access ctrl_val[] directly, apply_config() is part of the architecture-specific code, and is called via resctrl_arch_update_domains(). reset_all_ctrls() will be an architecture specific helper. update_mba_bw() manipulates both ctrl_val[], mbps_val[] and the hardware. The mbps_val[] that matches the mba_sc state of the resource is changed, but the other is left unchanged. Abstracting this is the subject of later patches that affect set_mba_sc() too. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Babu Moger <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-08-11x86/resctrl: Rename update_domains() to resctrl_arch_update_domains()James Morse3-4/+3
update_domains() merges the staged configuration changes into the arch codes configuration array. Rename to make it clear it is part of the arch code interface to resctrl. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Babu Moger <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-08-11x86/resctrl: Allow different CODE/DATA configurations to be stagedJames Morse2-8/+17
Before the CDP resources can be merged, struct rdt_domain will need an array of struct resctrl_staged_config, one per type of configuration. Use the type as an index to the array to ensure that a schema configuration string can't specify the same domain twice. This will allow two schemata to apply configuration changes to one resource. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Babu Moger <[email protected]> Link: https://lkml.kernel.org/r/[email protected]