Age | Commit message (Collapse) | Author | Files | Lines |
|
When configuration changes are made, the new value is written to struct
rdt_domain's new_ctrl field and the have_new_ctrl flag is set. Later
new_ctrl is copied to hardware by a call to update_domains().
Once the CDP resources are merged, there will be one new_ctrl field in
use by two struct resctrl_schema requiring a per-schema IPI to copy the
value to hardware.
Move new_ctrl and have_new_ctrl into a new struct resctrl_staged_config.
Before the CDP resources can be merged, struct rdt_domain will need an
array of these, one per type of configuration. Using the type as an
index to the array will ensure that a schema configuration string can't
specify the same domain twice.
Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Babu Moger <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
resctrl 'info' directories and schema parsing use the schema name.
This lives in the struct rdt_resource, and is specified by the
architecture code.
Once the CDP resources are merged, there will only be one resource (and
one name) in use by two schemata. To allow the CDP CODE/DATA property to
be the type of configuration the schema uses, the name should also be
per-schema.
Add a name field to struct resctrl_schema, and use this wherever
the schema name is exposed (or read from) user-space. Calculating
max_name_width for padding the schemata file also moves as this is
visible to user-space. As the names in struct rdt_resource already
include the CDP information, schemata_list_create() copies them.
schemata_list_create() includes the length of the CDP suffix when
calculating max_name_width in preparation for CDP resources being
merged.
Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Babu Moger <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Whether CDP is enabled for a hardware resource like the L3 cache can be
found by inspecting the alloc_enabled flags of the L3CODE/L3DATA struct
rdt_hw_resources, even if they aren't in use.
Once these resources are merged, the flags can't be compared. Whether
CDP is enabled needs tracking explicitly. If another architecture is
emulating CDP the behaviour may not be per-resource. 'cdp_capable' needs
to be visible to resctrl, even if its not in use, as this affects the
padding of the schemata table visible to user-space.
Add cdp_enabled to struct rdt_hw_resource and cdp_capable to struct
rdt_resource. Add resctrl_arch_set_cdp_enabled() to let resctrl enable
or disable CDP on a resource. resctrl_arch_get_cdp_enabled() lets it
read the current state.
Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Babu Moger <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
struct pseudo_lock_region points to the rdt_resource.
Once the resources are merged, this won't be unique. The resource name
is moving into the schema, so that the filesystem portions of resctrl can
generate it.
Swap pseudo_lock_region's rdt_resource pointer for a schema pointer.
Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Babu Moger <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Once the CDP resources are merged, there will be two struct
resctrl_schema for one struct rdt_resource. CDP becomes a type of
configuration that belongs to the schema.
Helpers like rdtgroup_cbm_overlaps() need access to the schema to query
the configuration (or configurations) based on schema properties.
Change these functions to take a struct schema instead of the struct
rdt_resource. All the modified functions are part of the filesystem code
that will move to /fs/resctrl once it is possible to support a second
architecture.
Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Babu Moger <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
To initialise struct resctrl_schema's num_closid, schemata_list_create()
reaches into the architectures private structure to retrieve num_closid
from the struct rdt_hw_resource. The 'half the closids' behaviour should
be part of the filesystem parts of resctrl that are the same on any
architecture. struct resctrl_schema's num_closid should include any
correction for CDP.
Having two properties called num_closid is likely to be confusing when
they have different values.
Add a helper to read the resource's num_closid from the arch code.
This should return the number of closid that the resource supports,
regardless of whether CDP is in use. Once the CDP resources are merged,
schemata_list_create() can apply the correction itself.
Using a type with an obvious size for the arch helper means changing the
type of num_closid to u32, which matches the type already used by struct
rdtgroup.
reset_all_ctrls() does not use resctrl_arch_get_num_closid(), even
though it sets up a structure for modifying the hardware. This function
will be part of the architecture code, the maximum closid should be the
maximum value the hardware has, regardless of the way resctrl is using
it. All the uses of num_closid in core.c are naturally part of the
architecture specific code.
Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Babu Moger <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Struct resctrl_schema holds properties that vary with the style of
configuration that resctrl applies to a resource. There are already
two values for the hardware's num_closid, depending on whether the
architecture presents the L3 or L3CODE/L3DATA resources.
As the way CDP changes the number of control groups that resctrl can
create is part of the user-space interface, it should be managed by the
filesystem parts of resctrl. This allows the architecture code to only
describe the value the hardware supports.
Add num_closid to resctrl_schema. This is the value seen by the
filesystem, which may be different to the maximum value described by the
arch code when CDP is enabled.
These functions operate on the num_closid value that is exposed to
user-space:
* rdtgroup_parse_resource()
* rdtgroup_schemata_show()
* rdt_num_closids_show()
* closid_init()
Change them to use the schema value instead. schemata_list_create() sets
this value, and reaches into the architecture-specific structure to get
the value. This will eventually be replaced with a helper.
Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Babu Moger <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
When parsing a schema configuration value from user-space, resctrl walks
the architectures rdt_resources_all[] array to find a matching struct
rdt_resource.
Once the CDP resources are merged there will be one resource in use
by two schemata. Anything walking rdt_resources_all[] on behalf of a
user-space request should walk the list of struct resctrl_schema
instead.
Change the users of for_each_alloc_enabled_rdt_resource() to walk the
schema instead. Schemata were only created for alloc_enabled resources
so these two lists are currently equivalent.
schemata_list_create() and rdt_kill_sb() are ignored. The first
creates the schema list, and will eventually loop over the resource
indexes using an arch helper to retrieve the resource. rdt_kill_sb()
will eventually make use of an arch 'reset everything' helper.
After the filesystem code is moved, rdtgroup_pseudo_locked_in_hierarchy()
remains part of the x86 specific hooks to support pseudo lock. This
code walks each domain, and still does this after the separate resources
are merged.
Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Babu Moger <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The names of resources are used for the schema name presented to
user-space. The name used is rooted in a structure provided by the
architecture code because the names are different when CDP is enabled.
x86 implements this by swapping between two sets of resource structures
based on their alloc_enabled flag. The type of configuration in-use is
encoded in the name (and cbm_idx_offset).
Once the CDP behaviour is moved into the parts of resctrl that will
move to /fs/, there will be two struct resctrl_schema for one struct
rdt_resource. The schema describes the type of configuration being
applied to the resource. The name of the schema should be generated
by resctrl, base on the type of configuration. To do this struct
resctrl_schema needs to store the type of configuration in use for a
schema.
Create an enum resctrl_conf_type describing the options, and add it to
struct resctrl_schema. The underlying resources are still separate, as
cbm_idx_offset is still in use.
Temporarily label all the entries in rdt_resources_all[] and copy that
value to struct resctrl_schema. Copying the value ensures there is no
mismatch while the filesystem parts of resctrl are modified to use the
schema. Once the resources are merged, the filesystem code can assign
this value based on the schema being created.
Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Babu Moger <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Many of resctrl's per-schema files return a value from struct
rdt_resource, which they take as their 'priv' pointer.
Moving properties that resctrl exposes to user-space into the core 'fs'
code, (e.g. the name of the schema), means some of the functions that
back the filesystem need the schema struct (to where the properties are
moved), but currently take struct rdt_resource. For example, once the
CDP resources are merged, struct rdt_resource no longer reflects all the
properties of the schema.
For the info dirs that represent a control, the information needed
will be accessed via struct resctrl_schema, as this is how the resource
is being used. For the monitors, its still struct rdt_resource as the
monitors aren't described as schema.
This difference means the type of the private pointers varies between
control and monitor info dirs.
Change the 'priv' pointer to point to struct resctrl_schema for
the per-schema files that represent a control. The type can be
determined from the fflags field. If the flags are RF_MON_INFO, its
a struct rdt_resource. If the flags are RF_CTRL_INFO, its a struct
resctrl_schema. No entry in res_common_files[] has both flags.
Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Babu Moger <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Resctrl exposes schemata to user-space, which allow the control values
to be specified for a group of tasks.
User-visible properties of the interface, (such as the schemata names
and how the values are parsed) are rooted in a struct provided by the
architecture code. (struct rdt_hw_resource). Once a second architecture
uses resctrl, this would allow user-visible properties to diverge
between architectures.
These properties should come from the resctrl code that will be common
to all architectures. Resctrl has no per-schema structure, only struct
rdt_{hw_,}resource. Create a struct resctrl_schema to hold the
rdt_resource. Before a second architecture can be supported, this
structure will also need to hold the schema name visible to user-space
and the type of configuration values for resctrl.
Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Babu Moger <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
resctrl is the defacto Linux ABI for SoC resource partitioning features.
To support it on another architecture, it needs to be abstracted from
the features provided by Intel RDT and AMD PQoS, and moved to /fs/.
struct rdt_domain contains a mix of architecture private details and
properties of the filesystem interface user-space uses.
Continue by splitting struct rdt_domain, into an architecture private
'hw' struct, which contains the common resctrl structure that would be
used by any architecture. The hardware values in ctrl_val and mbps_val
need to be accessed via helpers to allow another architecture to convert
these into a different format if necessary. After this split, filesystem
code paths touching a 'hw' struct indicates where an abstraction is
needed.
Splitting this structure only moves types around, and should not lead
to any change in behaviour.
Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Babu Moger <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
resctrl is the defacto Linux ABI for SoC resource partitioning features.
To support it on another architecture, it needs to be abstracted from
the features provided by Intel RDT and AMD PQoS, and moved to /fs/.
struct rdt_resource contains a mix of architecture private details
and properties of the filesystem interface user-space uses.
Start by splitting struct rdt_resource, into an architecture private
'hw' struct, which contains the common resctrl structure that would be
used by any architecture. The foreach helpers are most commonly used by
the filesystem code, and should return the common resctrl structure.
for_each_rdt_resource() is changed to walk the common structure in its
parent arch private structure.
Move as much of the structure as possible into the common structure
in the core code's header file. The x86 hardware accessors remain
part of the architecture private code, as do num_closid, mon_scale
and mbm_width.
mon_scale and mbm_width are used to detect overflow of the hardware
counters, and convert them from their native size to bytes. Any
cross-architecture abstraction should be in terms of bytes, making
these properties private.
The hardware's num_closid is kept in the private structure to force the
filesystem code to use a helper to access it. MPAM would return a single
value for the system, regardless of the resource. Using the helper
prevents this field from being confused with the version of num_closid
that is being exposed to user-space (added in a later patch).
After this split, filesystem code touching a 'hw' struct indicates
where an abstraction is needed.
Splitting this structure only moves types around, and should not lead
to any change in behaviour.
Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Babu Moger <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Add description of undocumented parameters. Issues detected by
scripts/kernel-doc.
Signed-off-by: Fabio M. De Francesco <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Add undocumented parameters detected by scripts/kernel-doc.
Signed-off-by: Fabio M. De Francesco <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
const variable must be initconst, not initdata.
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
This reverts commit cd544fd1dc9293c6702fab6effa63dac1cc67e99.
As discussed in [1] this commit was a no-op because the mapping type was
checked in vma_to_resize before move_vma is ever called. This meant that
vm_ops->mremap() would never be called on such mappings. Furthermore,
we've since expanded support of MREMAP_DONTUNMAP to non-anonymous
mappings, and these special mappings are still protected by the existing
check of !VM_DONTEXPAND and !VM_PFNMAP which will result in a -EINVAL.
1. https://lkml.org/lkml/2020/12/28/2340
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Brian Geffon <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Reviewed-by: Dmitry Safonov <[email protected]>
Cc: Alejandro Colomar <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: "Kirill A . Shutemov" <[email protected]>
Cc: Lokesh Gidra <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Cc: "Michael S . Tsirkin" <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Sonny Rao <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Fix another ~42 single-word typos in arch/x86/ code comments,
missed a few in the first pass, in particular in .S files.
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: [email protected]
|
|
Fix ~144 single-word typos in arch/x86/ code comments.
Doing this in a single commit should reduce the churn.
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: [email protected]
|
|
A CPU's current task can have its {closid, rmid} fields read locally
while they are being concurrently written to from another CPU.
This can happen anytime __resctrl_sched_in() races with either
__rdtgroup_move_task() or rdt_move_group_tasks().
Prevent load / store tearing for those accesses by giving them the
READ_ONCE() / WRITE_ONCE() treatment.
Signed-off-by: Valentin Schneider <[email protected]>
Signed-off-by: Reinette Chatre <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/9921fda88ad81afb9885b517fbe864a2bc7c35a9.1608243147.git.reinette.chatre@intel.com
|
|
unnecessary IPI
James reported in [1] that there could be two tasks running on the same CPU
with task_struct->on_cpu set. Using task_struct->on_cpu as a test if a task
is running on a CPU may thus match the old task for a CPU while the
scheduler is running and IPI it unnecessarily.
task_curr() is the correct helper to use. While doing so move the #ifdef
check of the CONFIG_SMP symbol to be a C conditional used to determine
if this helper should be used to ensure the code is always checked for
correctness by the compiler.
[1] https://lore.kernel.org/lkml/[email protected]
Reported-by: James Morse <[email protected]>
Signed-off-by: Reinette Chatre <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/e9e68ce1441a73401e08b641cc3b9a3cf13fe6d4.1608243147.git.reinette.chatre@intel.com
|
|
Mark the function with the __printf attribute to allow the compiler to
more thoroughly typecheck its arguments against a format string with
-Wformat and similar flags.
[ bp: Massage commit message. ]
Signed-off-by: Tom Rix <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Reinette Chatre <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Shakeel Butt reported in [1] that a user can request a task to be moved
to a resource group even if the task is already in the group. It just
wastes time to do the move operation which could be costly to send IPI
to a different CPU.
Add a sanity check to ensure that the move operation only happens when
the task is not already in the resource group.
[1] https://lore.kernel.org/lkml/CALvZod7E9zzHwenzf7objzGKsdBmVwTgEJ0nPgs0LUFU3SN5Pw@mail.gmail.com/
Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files")
Reported-by: Shakeel Butt <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Reinette Chatre <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/962ede65d8e95be793cb61102cca37f7bb018e66.1608243147.git.reinette.chatre@intel.com
|
|
Currently, when moving a task to a resource group the PQR_ASSOC MSR is
updated with the new closid and rmid in an added task callback. If the
task is running, the work is run as soon as possible. If the task is not
running, the work is executed later in the kernel exit path when the
kernel returns to the task again.
Updating the PQR_ASSOC MSR as soon as possible on the CPU a moved task
is running is the right thing to do. Queueing work for a task that is
not running is unnecessary (the PQR_ASSOC MSR is already updated when
the task is scheduled in) and causing system resource waste with the way
in which it is implemented: Work to update the PQR_ASSOC register is
queued every time the user writes a task id to the "tasks" file, even if
the task already belongs to the resource group.
This could result in multiple pending work items associated with a
single task even if they are all identical and even though only a single
update with most recent values is needed. Specifically, even if a task
is moved between different resource groups while it is sleeping then it
is only the last move that is relevant but yet a work item is queued
during each move.
This unnecessary queueing of work items could result in significant
system resource waste, especially on tasks sleeping for a long time.
For example, as demonstrated by Shakeel Butt in [1] writing the same
task id to the "tasks" file can quickly consume significant memory. The
same problem (wasted system resources) occurs when moving a task between
different resource groups.
As pointed out by Valentin Schneider in [2] there is an additional issue
with the way in which the queueing of work is done in that the task_struct
update is currently done after the work is queued, resulting in a race with
the register update possibly done before the data needed by the update is
available.
To solve these issues, update the PQR_ASSOC MSR in a synchronous way
right after the new closid and rmid are ready during the task movement,
only if the task is running. If a moved task is not running nothing
is done since the PQR_ASSOC MSR will be updated next time the task is
scheduled. This is the same way used to update the register when tasks
are moved as part of resource group removal.
[1] https://lore.kernel.org/lkml/CALvZod7E9zzHwenzf7objzGKsdBmVwTgEJ0nPgs0LUFU3SN5Pw@mail.gmail.com/
[2] https://lore.kernel.org/lkml/[email protected]
[ bp: Massage commit message and drop the two update_task_closid_rmid()
variants. ]
Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files")
Reported-by: Shakeel Butt <[email protected]>
Reported-by: Valentin Schneider <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Reinette Chatre <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Reviewed-by: James Morse <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/17aa2fb38fc12ce7bb710106b3e7c7b45acb9e94.1608243147.git.reinette.chatre@intel.com
|
|
Merge misc updates from Andrew Morton:
- a few random little subsystems
- almost all of the MM patches which are staged ahead of linux-next
material. I'll trickle to post-linux-next work in as the dependents
get merged up.
Subsystems affected by this patch series: kthread, kbuild, ide, ntfs,
ocfs2, arch, and mm (slab-generic, slab, slub, dax, debug, pagecache,
gup, swap, shmem, memcg, pagemap, mremap, hmm, vmalloc, documentation,
kasan, pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction,
oom-kill, migration, cma, page-poison, userfaultfd, zswap, zsmalloc,
uaccess, zram, and cleanups).
* emailed patches from Andrew Morton <[email protected]>: (200 commits)
mm: cleanup kstrto*() usage
mm: fix fall-through warnings for Clang
mm: slub: convert sysfs sprintf family to sysfs_emit/sysfs_emit_at
mm: shmem: convert shmem_enabled_show to use sysfs_emit_at
mm:backing-dev: use sysfs_emit in macro defining functions
mm: huge_memory: convert remaining use of sprintf to sysfs_emit and neatening
mm: use sysfs_emit for struct kobject * uses
mm: fix kernel-doc markups
zram: break the strict dependency from lzo
zram: add stat to gather incompressible pages since zram set up
zram: support page writeback
mm/process_vm_access: remove redundant initialization of iov_r
mm/zsmalloc.c: rework the list_add code in insert_zspage()
mm/zswap: move to use crypto_acomp API for hardware acceleration
mm/zswap: fix passing zero to 'PTR_ERR' warning
mm/zswap: make struct kernel_param_ops definitions const
userfaultfd/selftests: hint the test runner on required privilege
userfaultfd/selftests: fix retval check for userfaultfd_open()
userfaultfd/selftests: always dump something in modes
userfaultfd: selftests: make __{s,u}64 format specifiers portable
...
|
|
As kernel expect to see only one of such mappings, any further operations
on the VMA-copy may be unexpected by the kernel. Maybe it's being on the
safe side, but there doesn't seem to be any expected use-case for this, so
restrict it now.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: commit e346b3813067 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()")
Signed-off-by: Dmitry Safonov <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Brian Geffon <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Russell King <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cache resource control updates from Borislav Petkov:
- add logic to correct MBM total and local values fixing errata SKX99
and BDF102 (Fenghua Yu)
- cleanups
* tag 'x86_cache_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/resctrl: Clean up unused function parameter in rmdir path
x86/resctrl: Constify kernfs_ops
x86/resctrl: Correct MBM total and local values
Documentation/x86: Rename resctrl_ui.rst and add two errata to the file
|
|
The MBA software controller (mba_sc) is a feedback loop which
periodically reads MBM counters and tries to restrict the bandwidth
below a user-specified value. It tags along the MBM counter overflow
handler to do the updates with 1s interval in mbm_update() and
update_mba_bw().
The purpose of mbm_update() is to periodically read the MBM counters to
make sure that the hardware counter doesn't wrap around more than once
between user samplings. mbm_update() calls __mon_event_count() for local
bandwidth updating when mba_sc is not enabled, but calls mbm_bw_count()
instead when mba_sc is enabled. __mon_event_count() will not be called
for local bandwidth updating in MBM counter overflow handler, but it is
still called when reading MBM local bandwidth counter file
'mbm_local_bytes', the call path is as below:
rdtgroup_mondata_show()
mon_event_read()
mon_event_count()
__mon_event_count()
In __mon_event_count(), m->chunks is updated by delta chunks which is
calculated from previous MSR value (m->prev_msr) and current MSR value.
When mba_sc is enabled, m->chunks is also updated in mbm_update() by
mistake by the delta chunks which is calculated from m->prev_bw_msr
instead of m->prev_msr. But m->chunks is not used in update_mba_bw() in
the mba_sc feedback loop.
When reading MBM local bandwidth counter file, m->chunks was changed
unexpectedly by mbm_bw_count(). As a result, the incorrect local
bandwidth counter which calculated from incorrect m->chunks is shown to
the user.
Fix this by removing incorrect m->chunks updating in mbm_bw_count() in
MBM counter overflow handler, and always calling __mon_event_count() in
mbm_update() to make sure that the hardware local bandwidth counter
doesn't wrap around.
Test steps:
# Run workload with aggressive memory bandwidth (e.g., 10 GB/s)
git clone https://github.com/intel/intel-cmt-cat && cd intel-cmt-cat
&& make
./tools/membw/membw -c 0 -b 10000 --read
# Enable MBA software controller
mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl
# Create control group c1
mkdir /sys/fs/resctrl/c1
# Set MB throttle to 6 GB/s
echo "MB:0=6000;1=6000" > /sys/fs/resctrl/c1/schemata
# Write PID of the workload to tasks file
echo `pidof membw` > /sys/fs/resctrl/c1/tasks
# Read local bytes counters twice with 1s interval, the calculated
# local bandwidth is not as expected (approaching to 6 GB/s):
local_1=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
sleep 1
local_2=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
echo "local b/w (bytes/s):" `expr $local_2 - $local_1`
Before fix:
local b/w (bytes/s): 11076796416
After fix:
local b/w (bytes/s): 5465014272
Fixes: ba0f26d8529c (x86/intel_rdt/mba_sc: Prepare for feedback loop)
Signed-off-by: Xiaochen Shen <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Cc: <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Commit
fd8d9db3559a ("x86/resctrl: Remove superfluous kernfs_get() calls to prevent refcount leak")
removed superfluous kernfs_get() calls in rdtgroup_ctrl_remove() and
rdtgroup_rmdir_ctrl(). That change resulted in an unused function
parameter to these two functions.
Clean up the unused function parameter in rdtgroup_ctrl_remove(),
rdtgroup_rmdir_mon() and their callers rdtgroup_rmdir_ctrl() and
rdtgroup_rmdir().
Signed-off-by: Xiaochen Shen <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Merge -rc6 tag to pick up dependent changes.
Signed-off-by: Borislav Petkov <[email protected]>
|
|
When the AMD QoS feature CDP (code and data prioritization) is enabled
or disabled, the CDP bit in MSR 0000_0C81 is written on one of the CPUs
in an L3 domain (core complex). That is not correct - the CDP bit needs
to be updated on all the logical CPUs in the domain.
This was not spelled out clearly in the spec earlier. The specification
has been updated and the updated document, "AMD64 Technology Platform
Quality of Service Extensions Publication # 56375 Revision: 1.02 Issue
Date: October 2020" is available now. Refer the section: Code and Data
Prioritization.
Fix the issue by adding a new flag arch_has_per_cpu_cfg in rdt_cache
data structure.
The documentation can be obtained at:
https://developer.amd.com/wp-content/resources/56375.pdf
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
[ bp: Massage commit message. ]
Fixes: 4d05bf71f157 ("x86/resctrl: Introduce AMD QOS feature")
Signed-off-by: Babu Moger <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Link: https://lkml.kernel.org/r/160675180380.15628.3309402017215002347.stgit@bmoger-ubuntu
|
|
On resource group creation via a mkdir an extra kernfs_node reference is
obtained by kernfs_get() to ensure that the rdtgroup structure remains
accessible for the rdtgroup_kn_unlock() calls where it is removed on
deletion. Currently the extra kernfs_node reference count is only
dropped by kernfs_put() in rdtgroup_kn_unlock() while the rdtgroup
structure is removed in a few other locations that lack the matching
reference drop.
In call paths of rmdir and umount, when a control group is removed,
kernfs_remove() is called to remove the whole kernfs nodes tree of the
control group (including the kernfs nodes trees of all child monitoring
groups), and then rdtgroup structure is freed by kfree(). The rdtgroup
structures of all child monitoring groups under the control group are
freed by kfree() in free_all_child_rdtgrp().
Before calling kfree() to free the rdtgroup structures, the kernfs node
of the control group itself as well as the kernfs nodes of all child
monitoring groups still take the extra references which will never be
dropped to 0 and the kernfs nodes will never be freed. It leads to
reference count leak and kernfs_node_cache memory leak.
For example, reference count leak is observed in these two cases:
(1) mount -t resctrl resctrl /sys/fs/resctrl
mkdir /sys/fs/resctrl/c1
mkdir /sys/fs/resctrl/c1/mon_groups/m1
umount /sys/fs/resctrl
(2) mkdir /sys/fs/resctrl/c1
mkdir /sys/fs/resctrl/c1/mon_groups/m1
rmdir /sys/fs/resctrl/c1
The same reference count leak issue also exists in the error exit paths
of mkdir in mkdir_rdt_prepare() and rdtgroup_mkdir_ctrl_mon().
Fix this issue by following changes to make sure the extra kernfs_node
reference on rdtgroup is dropped before freeing the rdtgroup structure.
(1) Introduce rdtgroup removal helper rdtgroup_remove() to wrap up
kernfs_put() and kfree().
(2) Call rdtgroup_remove() in rdtgroup removal path where the rdtgroup
structure is about to be freed by kfree().
(3) Call rdtgroup_remove() or kernfs_put() as appropriate in the error
exit paths of mkdir where an extra reference is taken by kernfs_get().
Fixes: f3cbeacaa06e ("x86/intel_rdt/cqm: Add rmdir support")
Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files")
Fixes: 60cf5e101fd4 ("x86/intel_rdt: Add mkdir to resctrl file system")
Reported-by: Willem de Bruijn <[email protected]>
Signed-off-by: Xiaochen Shen <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
Willem reported growing of kernfs_node_cache entries in slabtop when
repeatedly creating and removing resctrl subdirectories as well as when
repeatedly mounting and unmounting the resctrl filesystem.
On resource group (control as well as monitoring) creation via a mkdir
an extra kernfs_node reference is obtained to ensure that the rdtgroup
structure remains accessible for the rdtgroup_kn_unlock() calls where it
is removed on deletion. The kernfs_node reference count is dropped by
kernfs_put() in rdtgroup_kn_unlock().
With the above explaining the need for one kernfs_get()/kernfs_put()
pair in resctrl there are more places where a kernfs_node reference is
obtained without a corresponding release. The excessive amount of
reference count on kernfs nodes will never be dropped to 0 and the
kernfs nodes will never be freed in the call paths of rmdir and umount.
It leads to reference count leak and kernfs_node_cache memory leak.
Remove the superfluous kernfs_get() calls and expand the existing
comments surrounding the remaining kernfs_get()/kernfs_put() pair that
remains in use.
Superfluous kernfs_get() calls are removed from two areas:
(1) In call paths of mount and mkdir, when kernfs nodes for "info",
"mon_groups" and "mon_data" directories and sub-directories are
created, the reference count of newly created kernfs node is set to 1.
But after kernfs_create_dir() returns, superfluous kernfs_get() are
called to take an additional reference.
(2) kernfs_get() calls in rmdir call paths.
Fixes: 17eafd076291 ("x86/intel_rdt: Split resource group removal in two")
Fixes: 4af4a88e0c92 ("x86/intel_rdt/cqm: Add mount,umount support")
Fixes: f3cbeacaa06e ("x86/intel_rdt/cqm: Add rmdir support")
Fixes: d89b7379015f ("x86/intel_rdt/cqm: Add mon_data")
Fixes: c7d9aac61311 ("x86/intel_rdt/cqm: Add mkdir support for RDT monitoring")
Fixes: 5dc1d5c6bac2 ("x86/intel_rdt: Simplify info and base file lists")
Fixes: 60cf5e101fd4 ("x86/intel_rdt: Add mkdir to resctrl file system")
Fixes: 4e978d06dedb ("x86/intel_rdt: Add "info" files to resctrl file system")
Reported-by: Willem de Bruijn <[email protected]>
Signed-off-by: Xiaochen Shen <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Willem de Bruijn <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
The only usage of the kf_ops field in the rftype struct is to pass
it as argument to __kernfs_create_file(), which accepts a pointer to
const. Make it a pointer to const. This makes it possible to make
rdtgroup_kf_single_ops and kf_mondata_ops const, which allows the
compiler to put them in read-only memory.
Signed-off-by: Rikard Falkeborn <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Reinette Chatre <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Intel Memory Bandwidth Monitoring (MBM) counters may report system
memory bandwidth incorrectly on some Intel processors. The errata SKX99
for Skylake server, BDF102 for Broadwell server, and the correction
factor table are documented in Documentation/x86/resctrl.rst.
Intel MBM counters track metrics according to the assigned Resource
Monitor ID (RMID) for that logical core. The IA32_QM_CTR register
(MSR 0xC8E) used to report these metrics, may report incorrect system
bandwidth for certain RMID values.
Due to the errata, system memory bandwidth may not match what is
reported.
To work around the errata, correct MBM total and local readings using a
correction factor table. If rmid > rmid threshold, MBM total and local
values should be multiplied by the correction factor.
[ bp: Mark mbm_cf_table[] __initdata. ]
Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
A previous commit changed the notification mode from true/false to an
int, allowing notify-no, notify-yes, or signal-notify. This was
backwards compatible in the sense that any existing true/false user
would translate to either 0 (on notification sent) or 1, the latter
which mapped to TWA_RESUME. TWA_SIGNAL was assigned a value of 2.
Clean this up properly, and define a proper enum for the notification
mode. Now we have:
- TWA_NONE. This is 0, same as before the original change, meaning no
notification requested.
- TWA_RESUME. This is 1, same as before the original change, meaning
that we use TIF_NOTIFY_RESUME.
- TWA_SIGNAL. This uses TIF_SIGPENDING/JOBCTL_TASK_WORK for the
notification.
Clean up all the callers, switching their 0/1/false/true to using the
appropriate TWA_* mode for notifications.
Fixes: e91b48162332 ("task_work: teach task_work_add() to do signal_wake_up()")
Reviewed-by: Thomas Gleixner <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cache resource control updates from Borislav Petkov:
- Misc cleanups to the resctrl code in preparation for the ARM side
(James Morse)
- Add support for controlling per-thread memory bandwidth throttling
delay values on hw which supports it (Fenghua Yu)
* tag 'x86_cache_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/resctrl: Enable user to view thread or core throttling mode
x86/resctrl: Enumerate per-thread MBA controls
cacheinfo: Move resctrl's get_cache_id() to the cacheinfo header file
x86/resctrl: Add struct rdt_cache::arch_has_{sparse, empty}_bitmaps
x86/resctrl: Merge AMD/Intel parse_bw() calls
x86/resctrl: Add struct rdt_membw::arch_needs_linear to explain AMD/Intel MBA difference
x86/resctrl: Use is_closid_match() in more places
x86/resctrl: Include pid.h
x86/resctrl: Use container_of() in delayed_work handlers
x86/resctrl: Fix stale comment
x86/resctrl: Remove struct rdt_membw::max_delay
x86/resctrl: Remove unused struct mbm_state::chunks_bw
|
|
Fix spelling mistake "Could't" -> "Couldn't" in user-visible warning
messages.
[ bp: Massage commit message; s/cpu/CPU/g ]
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Early Intel hardware implementations of Memory Bandwidth Allocation (MBA)
could only control bandwidth at the processor core level. This meant that
when two processes with different bandwidth allocations ran simultaneously
on the same core the hardware had to resolve this difference. It did so by
applying the higher throttling value (lower bandwidth) to both processes.
Newer implementations can apply different throttling values to each
thread on a core.
Introduce a new resctrl file, "thread_throttle_mode", on Intel systems
that shows to the user how throttling values are allocated, per-core or
per-thread.
On systems that support per-core throttling, the file will display "max".
On newer systems that support per-thread throttling, the file will display
"per-thread".
AMD confirmed in [1] that AMD bandwidth allocation is already at thread
level but that the AMD implementation does not use a memory delay
throttle mode. So to avoid confusion the thread throttling mode would be
UNDEFINED on AMD systems and the "thread_throttle_mode" file will not be
visible.
Originally-by: Reinette Chatre <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Link: [1] https://lore.kernel.org/lkml/[email protected]
|
|
resctrl/core.c defines get_cache_id() for use in its cpu-hotplug
callbacks. This gets the id attribute of the cache at the corresponding
level of a CPU.
Later rework means this private function needs to be shared. Move
it to the header file.
The name conflicts with a different definition in intel_cacheinfo.c,
name it get_cpu_cacheinfo_id() to show its relation with
get_cpu_cacheinfo().
Now this is visible on other architectures, check the id attribute
has actually been set.
Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Intel CPUs expect the cache bitmap provided by user-space to have on a
single span of 1s, whereas AMD can support bitmaps like 0xf00f. Arm's
MPAM support also allows sparse bitmaps.
Similarly, Intel CPUs check at least one bit set, whereas AMD CPUs are
quite happy with an empty bitmap. Arm's MPAM allows an empty bitmap.
To move resctrl out to /fs/, platform differences like this need to be
explained.
Add two resource properties arch_has_{empty,sparse}_bitmaps. Test these
around the relevant parts of cbm_validate().
Merging the validate calls causes AMD to gain the min_cbm_bits test
needed for Haswell, but as it always sets this value to 1, it will never
match.
[ bp: Massage commit message. ]
Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Now after arch_needs_linear has been added, the parse_bw() calls are
almost the same between AMD and Intel.
The difference is '!is_mba_sc()', which is not checked on AMD. This
will always be true on AMD CPUs as mba_sc cannot be enabled as
is_mba_linear() is false.
Removing this duplication means user-space visible behaviour and
error messages are not validated or generated in different places.
Reviewed-by : Babu Moger <[email protected]>
Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
MBA difference
The configuration values user-space provides to the resctrl filesystem
are ABI. To make this work on another architecture, all the ABI bits
should be moved out of /arch/x86 and under /fs.
To do this, the differences between AMD and Intel CPUs needs to be
explained to resctrl via resource properties, instead of function
pointers that let the arch code accept subtly different values on
different platforms/architectures.
For MBA, Intel CPUs reject configuration attempts for non-linear
resources, whereas AMD ignore this field as its MBA resource is never
linear. To merge the parse/validate functions, this difference needs to
be explained.
Add struct rdt_membw::arch_needs_linear to indicate the arch code needs
the linear property to be true to configure this resource. AMD can set
this and delay_linear to false. Intel can set arch_needs_linear to
true to keep the existing "No support for non-linear MB domains" error
message for affected platforms.
[ bp: convert "we" etc to passive voice. ]
Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
rdtgroup_tasks_assigned() and show_rdt_tasks() loop over threads testing
for a CTRL/MON group match by closid/rmid with the provided rdtgrp.
Further down the file are helpers to do this, move these further up and
make use of them here.
These helpers additionally check for alloc/mon capable. This is harmless
as rdtgroup_mkdir() tests these capable flags before allowing the config
directories to be created.
Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
mbm_handle_overflow() and cqm_handle_limbo() are both provided with
the domain's work_struct when called, but use get_domain_from_cpu()
to find the domain, along with the appropriate error handling.
container_of() saves some list walking and bitmap testing, use that
instead.
Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The comment in rdtgroup_init() refers to the non existent function
rdt_mount(), which has now been renamed rdt_get_tree(). Fix the
comment.
Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
max_delay is used by x86's __get_mem_config_intel() as a local variable.
Remove it, replacing it with a local variable.
Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Nothing reads struct mbm_states's chunks_bw value, its a copy of
chunks. Remove it.
Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The callers don't expect *d_cdp to be set to an error pointer, they only
check for NULL. This leads to a static checker warning:
arch/x86/kernel/cpu/resctrl/rdtgroup.c:2648 __init_one_rdt_domain()
warn: 'd_cdp' could be an error pointer
This would not trigger a bug in this specific case because
__init_one_rdt_domain() calls it with a valid domain that would not have
a negative id and thus not trigger the return of the ERR_PTR(). If this
was a negative domain id then the call to rdt_find_domain() in
domain_add_cpu() would have returned the ERR_PTR() much earlier and the
creation of the domain with an invalid id would have been prevented.
Even though a bug is not triggered currently the right and safe thing to
do is to set the pointer to NULL because that is what can be checked for
when the caller is handling the CDP and non-CDP cases.
Fixes: 52eb74339a62 ("x86/resctrl: Fix rdt_find_domain() return value and checks")
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Reinette Chatre <[email protected]>
Acked-by: Fenghua Yu <[email protected]>
Link: https://lkml.kernel.org/r/20200602193611.GA190851@mwanda
|
|
Memory bandwidth is calculated reading the monitoring counter
at two intervals and calculating the delta. It is the software’s
responsibility to read the count often enough to avoid having
the count roll over _twice_ between reads.
The current code hardcodes the bandwidth monitoring counter's width
to 24 bits for AMD. This is due to default base counter width which
is 24. Currently, AMD does not implement the CPUID 0xF.[ECX=1]:EAX
to adjust the counter width. But, the AMD hardware supports much
wider bandwidth counter with the default width of 44 bits.
Kernel reads these monitoring counters every 1 second and adjusts the
counter value for overflow. With 24 bits and scale value of 64 for AMD,
it can only measure up to 1GB/s without overflowing. For the rates
above 1GB/s this will fail to measure the bandwidth.
Fix the issue setting the default width to 44 bits by adjusting the
offset.
AMD future products will implement CPUID 0xF.[ECX=1]:EAX.
[ bp: Let the line stick out and drop {}-brackets around a single
statement. ]
Fixes: 4d05bf71f157 ("x86/resctrl: Introduce AMD QOS feature")
Signed-off-by: Babu Moger <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|