| Age | Commit message (Collapse) | Author | Files | Lines |
|
Kernel-doc markups should use this format:
identifier - description
While here, fix a kernel-doc tag that was using, instead,
a normal comment block.
[[email protected]: coding style fixes]
Link: https://lkml.kernel.org/r/c5e38e1070f8dbe2f9607a10b44afe2875bd966c.1605521731.git.mchehab+huawei@kernel.org
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Cc: "Jonathan Corbet" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Now we have for 'other' and 'type' variables
other type return
0 0 REGION_DISJOINT
0 x REGION_INTERSECTS
x 0 REGION_DISJOINT
x x REGION_MIXED
Obviously it's easier to check 'type' for 0 first instead of
currently checked 'other'.
Signed-off-by: Andy Shevchenko <[email protected]>
Reviewed-by: Hanjun Guo <[email protected]>
Tested-by: Hanjun Guo <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
"mem" in the name already indicates the root, similar to
release_mem_region() and devm_request_mem_region(). Make it implicit.
The only single caller always passes iomem_resource, other parents are not
applicable.
Suggested-by: Wei Yang <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Cc: Baoquan He <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
resources
Some add_memory*() users add memory in small, contiguous memory blocks.
Examples include virtio-mem, hyper-v balloon, and the XEN balloon.
This can quickly result in a lot of memory resources, whereby the actual
resource boundaries are not of interest (e.g., it might be relevant for
DIMMs, exposed via /proc/iomem to user space). We really want to merge
added resources in this scenario where possible.
Let's provide a flag (MEMHP_MERGE_RESOURCE) to specify that a resource
either created within add_memory*() or passed via add_memory_resource()
shall be marked mergeable and merged with applicable siblings.
To implement that, we need a kernel/resource interface to mark selected
System RAM resources mergeable (IORESOURCE_SYSRAM_MERGEABLE) and trigger
merging.
Note: We really want to merge after the whole operation succeeded, not
directly when adding a resource to the resource tree (it would break
add_memory_resource() and require splitting resources again when the
operation failed - e.g., due to -ENOMEM).
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Pankaj Gupta <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Wei Liu <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: Roger Pau Monné <[email protected]>
Cc: Julien Grall <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Leonardo Bras <[email protected]>
Cc: Libor Pechacek <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Nathan Lynch <[email protected]>
Cc: "Oliver O'Halloran" <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Pingfan Liu <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vishal Verma <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "selective merging of system ram resources", v4.
Some add_memory*() users add memory in small, contiguous memory blocks.
Examples include virtio-mem, hyper-v balloon, and the XEN balloon.
This can quickly result in a lot of memory resources, whereby the actual
resource boundaries are not of interest (e.g., it might be relevant for
DIMMs, exposed via /proc/iomem to user space). We really want to merge
added resources in this scenario where possible.
Resources are effectively stored in a list-based tree. Having a lot of
resources not only wastes memory, it also makes traversing that tree more
expensive, and makes /proc/iomem explode in size (e.g., requiring
kexec-tools to manually merge resources when creating a kdump header. The
current kexec-tools resource count limit does not allow for more than
~100GB of memory with a memory block size of 128MB on x86-64).
Let's allow to selectively merge system ram resources by specifying a new
flag for add_memory*(). Patch #5 contains a /proc/iomem example. Only
tested with virtio-mem.
This patch (of 8):
Let's make sure splitting a resource on memory hotunplug will never fail.
This will become more relevant once we merge selected System RAM resources
- then, we'll trigger that case more often on memory hotunplug.
In general, this function is already unlikely to fail. When we remove
memory, we free up quite a lot of metadata (memmap, page tables, memory
block device, etc.). The only reason it could really fail would be when
injecting allocation errors.
All other error cases inside release_mem_region_adjustable() seem to be
sanity checks if the function would be abused in different context - let's
add WARN_ON_ONCE() in these cases so we can catch them.
[[email protected]: fix use of ternary condition in release_mem_region_adjustable]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://github.com/ClangBuiltLinux/linux/issues/1159
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Nathan Chancellor <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Julien Grall <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Leonardo Bras <[email protected]>
Cc: Libor Pechacek <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Nathan Lynch <[email protected]>
Cc: "Oliver O'Halloran" <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Pingfan Liu <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Roger Pau Monn <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Wei Liu <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In support of detecting whether a resource might have been been claimed,
report the parent to the walk_iomem_res_desc() callback. For example, the
ACPI HMAT parser publishes "hmem" platform devices per target range.
However, if the HMAT is disabled / missing a fallback driver can attach
devices to the raw memory ranges as a fallback if it sees unclaimed /
orphan "Soft Reserved" resources in the resource tree.
Otherwise, find_next_iomem_res() returns a resource with garbage data from
the stack allocation in __walk_iomem_res_desc() for the res->parent field.
There are currently no users that expect ->child and ->sibling to be
valid, and the resource_lock would be needed to traverse them. Use a
compound literal to implicitly zero initialize the fields that are not
being returned in addition to setting ->parent.
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Ben Skeggs <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brice Goglin <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: David Airlie <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Jia He <[email protected]>
Cc: Joao Martins <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Hulk Robot <[email protected]>
Cc: Jason Yan <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: Vivek Goyal <[email protected]>
Link: https://lkml.kernel.org/r/159643097166.4062302.11875688887228572793.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Close the hole of holding a mapping over kernel driver takeover event of
a given address range.
Commit 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
introduced CONFIG_IO_STRICT_DEVMEM with the goal of protecting the
kernel against scenarios where a /dev/mem user tramples memory that a
kernel driver owns. However, this protection only prevents *new* read(),
write() and mmap() requests. Established mappings prior to the driver
calling request_mem_region() are left alone.
Especially with persistent memory, and the core kernel metadata that is
stored there, there are plentiful scenarios for a /dev/mem user to
violate the expectations of the driver and cause amplified damage.
Teach request_mem_region() to find and shoot down active /dev/mem
mappings that it believes it has successfully claimed for the exclusive
use of the driver. Effectively a driver call to request_mem_region()
becomes a hole-punch on the /dev/mem device.
The typical usage of unmap_mapping_range() is part of
truncate_pagecache() to punch a hole in a file, but in this case the
implementation is only doing the "first half" of a hole punch. Namely it
is just evacuating current established mappings of the "hole", and it
relies on the fact that /dev/mem establishes mappings in terms of
absolute physical address offsets. Once existing mmap users are
invalidated they can attempt to re-establish the mapping, or attempt to
continue issuing read(2) / write(2) to the invalidated extent, but they
will then be subject to the CONFIG_IO_STRICT_DEVMEM checking that can
block those subsequent accesses.
Cc: Arnd Bergmann <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Russell King <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Fixes: 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
Signed-off-by: Dan Williams <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/159009507306.847224.8502634072429766747.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Patch series "mm/memory_hotplug: online_pages() cleanups", v2.
Some cleanups (+ one fix for a special case) in the context of
online_pages().
This patch (of 5):
This makes it clearer that we will never call func() with duplicate PFNs
in case we have multiple sub-page memory resources. All unaligned parts
of PFNs are completely discarded.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Arun KS <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Factor out the guts of devm_request_free_mem_region so that we can
implement both a device managed and a manually release version as tiny
wrappers around it.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Tested-by: Bharata B Rao <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
find_next_iomem_res() shows up to be a source for overhead in dax
benchmarks.
Improve performance by not considering children of the tree if the top
level does not match. Since the range of the parents should include the
range of the children such check is redundant.
Running sysbench on dax (pmem emulation, with write_cache disabled):
sysbench fileio --file-total-size=3G --file-test-mode=rndwr \
--file-io-mode=mmap --threads=4 --file-fsync-mode=fdatasync run
Provides the following results:
events (avg/stddev)
-------------------
5.2-rc3: 1247669.0000/16075.39
w/patch: 1286320.5000/16402.72 (+3%)
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Nadav Amit <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Since resources can be removed, locking should ensure that the resource
is not removed while accessing it. However, find_next_iomem_res() does
not hold the lock while copying the data of the resource.
Keep holding the lock while the data is copied. While at it, change the
return value to a more informative value. It is disregarded by the
callers.
[[email protected]: fix find_next_iomem_res() documentation]
Link: http://lkml.kernel.org/r/[email protected]
Fixes: ff3cc952d3f00 ("resource: Add remove_resource interface")
Signed-off-by: Nadav Amit <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Keep the physical address allocation that hmm_add_device does with the
rest of the resource code, and allow future reuse of it without the hmm
wrapper.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: John Hubbard <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Add SPDX license identifiers to all files which:
- Have no license information of any form
- Have EXPORT_.*_SYMBOL_GPL inside which was used in the
initial scan/conversion to ignore the file
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
The three checks in region_intersects() are basically an open-coded version
of resource_overlaps() - so use the real thing.
Also fix typos in comments while at it.
Signed-off-by: Wei Yang <[email protected]>
Reviewed-by: Like Xu <[email protected]>
Reviewed-by: Yuan Yao <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
[ Rewrote the changelog. ]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull device-dax updates from Dan Williams:
"New device-dax infrastructure to allow persistent memory and other
"reserved" / performance differentiated memories, to be assigned to
the core-mm as "System RAM".
Some users want to use persistent memory as additional volatile
memory. They are willing to cope with potential performance
differences, for example between DRAM and 3D Xpoint, and want to use
typical Linux memory management apis rather than a userspace memory
allocator layered over an mmap() of a dax file. The administration
model is to decide how much Persistent Memory (pmem) to use as System
RAM, create a device-dax-mode namespace of that size, and then assign
it to the core-mm. The rationale for device-dax is that it is a
generic memory-mapping driver that can be layered over any "special
purpose" memory, not just pmem. On subsequent boots udev rules can be
used to restore the memory assignment.
One implication of using pmem as RAM is that mlock() no longer keeps
data off persistent media. For this reason it is recommended to enable
NVDIMM Security (previously merged for 5.0) to encrypt pmem contents
at rest. We considered making this recommendation an actively enforced
requirement, but in the end decided to leave it as a distribution /
administrator policy to allow for emulation and test environments that
lack security capable NVDIMMs.
Summary:
- Replace the /sys/class/dax device model with /sys/bus/dax, and
include a compat driver so distributions can opt-in to the new ABI.
- Allow for an alternative driver for the device-dax address-range
- Introduce the 'kmem' driver to hotplug / assign a device-dax
address-range to the core-mm.
- Arrange for the device-dax target-node to be onlined so that the
newly added memory range can be uniquely referenced by numa apis"
NOTE! I'm not entirely happy with the whole "PMEM as RAM" model because
we currently have special - and very annoying rules in the kernel about
accessing PMEM only with the "MC safe" accessors, because machine checks
inside the regular repeat string copy functions can be fatal in some
(not described) circumstances.
And apparently the PMEM modules can cause that a lot more than regular
RAM. The argument is that this happens because PMEM doesn't necessarily
get scrubbed at boot like RAM does, but that is planned to be added for
the user space tooling.
Quoting Dan from another email:
"The exposure can be reduced in the volatile-RAM case by scanning for
and clearing errors before it is onlined as RAM. The userspace tooling
for that can be in place before v5.1-final. There's also runtime
notifications of errors via acpi_nfit_uc_error_notify() from
background scrubbers on the DIMM devices. With that mechanism the
kernel could proactively clear newly discovered poison in the volatile
case, but that would be additional development more suitable for v5.2.
I understand the concern, and the need to highlight this issue by
tapping the brakes on feature development, but I don't see PMEM as RAM
making the situation worse when the exposure is also there via DAX in
the PMEM case. Volatile-RAM is arguably a safer use case since it's
possible to repair pages where the persistent case needs active
application coordination"
* tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
device-dax: "Hotplug" persistent memory for use like normal RAM
mm/resource: Let walk_system_ram_range() search child resources
mm/memory-hotplug: Allow memory resources to be children
mm/resource: Move HMM pr_debug() deeper into resource code
mm/resource: Return real error codes from walk failures
device-dax: Add a 'modalias' attribute to DAX 'bus' devices
device-dax: Add a 'target_node' attribute
device-dax: Auto-bind device after successful new_id
acpi/nfit, device-dax: Identify differentiated memory with a unique numa-node
device-dax: Add /sys/class/dax backwards compatibility
device-dax: Add support for a dax override driver
device-dax: Move resource pinning+mapping into the common driver
device-dax: Introduce bus + driver model
device-dax: Start defining a dax bus model
device-dax: Remove multi-resource infrastructure
device-dax: Kill dax_region base
device-dax: Kill dax_region ida
|
|
In the process of onlining memory, we use walk_system_ram_range()
to find the actual RAM areas inside of the area being onlined.
However, it currently only finds memory resources which are
"top-level" iomem_resources. Children are not currently
searched which causes it to skip System RAM in areas like this
(in the format of /proc/iomem):
a0000000-bfffffff : Persistent Memory (legacy)
a0000000-afffffff : System RAM
Changing the true->false here allows children to be searched
as well. We need this because we add a new "System RAM"
resource underneath the "persistent memory" resource when
we use persistent memory in a volatile mode.
Signed-off-by: Dave Hansen <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Huang Ying <[email protected]>
Cc: Fengguang Wu <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Yaowei Bai <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Jerome Glisse <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
|
|
HMM consumes physical address space for its own use, even
though nothing is mapped or accessible there. It uses a
special resource description (IORES_DESC_DEVICE_PRIVATE_MEMORY)
to uniquely identify these areas.
When HMM consumes address space, it makes a best guess about
what to consume. However, it is possible that a future memory
or device hotplug can collide with the reserved area. In the
case of these conflicts, there is an error message in
register_memory_resource().
Later patches in this series move register_memory_resource()
from using request_resource_conflict() to __request_region().
Unfortunately, __request_region() does not return the conflict
like the previous function did, which makes it impossible to
check for IORES_DESC_DEVICE_PRIVATE_MEMORY in a conflicting
resource.
Instead of warning in register_memory_resource(), move the
check into the core resource code itself (__request_region())
where the conflicting resource _is_ available. This has the
added bonus of producing a warning in case of HMM conflicts
with devices *or* RAM address space, as opposed to the RAM-
only warnings that were there previously.
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Jerome Glisse <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Huang Ying <[email protected]>
Cc: Fengguang Wu <[email protected]>
Cc: Keith Busch <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
|
|
walk_system_ram_range() can return an error code either becuase
*it* failed, or because the 'func' that it calls returned an
error. The memory hotplug does the following:
ret = walk_system_ram_range(..., func);
if (ret)
return ret;
and 'ret' makes it out to userspace, eventually. The problem
s, walk_system_ram_range() failues that result from *it* failing
(as opposed to 'func') return -1. That leads to a very odd
-EPERM (-1) return code out to userspace.
Make walk_system_ram_range() return -EINVAL for internal
failures to keep userspace less confused.
This return code is compatible with all the callers that I
audited.
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Bjorn Helgaas <[email protected]>
Acked-by: Michael Ellerman <[email protected]> (powerpc)
Cc: Dan Williams <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Huang Ying <[email protected]>
Cc: Fengguang Wu <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Yaowei Bai <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: [email protected]
Cc: Keith Busch <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
|
|
Since commit c40dd2f76644 ("powerpc: Add System RAM to /proc/iomem")
it is possible to use the generic walk_system_ram_range() and
the generic page_is_ram().
To enable the use of walk_system_ram_range() by the IBM EHEA ethernet
driver, we still need an export of the generic function.
As powerpc was the only user of CONFIG_ARCH_HAS_WALK_MEMORY, the
ifdef around the generic walk_system_ram_range() has become useless
and can be dropped.
Fixes: c40dd2f76644 ("powerpc: Add System RAM to /proc/iomem")
Signed-off-by: Christophe Leroy <[email protected]>
[mpe: Keep the EXPORT_SYMBOL_GPL in powerpc code]
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This is a preparation for the next patch.
Currently, we only call release_mem_region_adjustable() in __remove_pages
if the zone is not ZONE_DEVICE, because resources that belong to HMM/devm
are being released by themselves with devm_release_mem_region.
Since we do not want to touch any zone/page stuff during the removing of
the memory (but during the offlining), we do not want to check for the
zone here. So we need another way to tell release_mem_region_adjustable()
to not realease the resource in case it belongs to HMM/devm.
HMM/devm acquires/releases a resource through
devm_request_mem_region/devm_release_mem_region.
These resources have the flag IORESOURCE_MEM, while resources acquired by
hot-add memory path (register_memory_resource()) contain
IORESOURCE_SYSTEM_RAM.
So, we can check for this flag in release_mem_region_adjustable, and if
the resource does not contain such flag, we know that we are dealing with
a HMM/devm resource, so we can back off.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Oscar Salvador <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add the missing kernel-doc style function parameters documentation.
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Fixes: b69c2e20f6e4 ("resource: Clean it up a bit")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The first group of warnings is caused by a "/**" kernel-doc notation
marker but the function comments are not in kernel-doc format.
Also add another error return value here.
../kernel/resource.c:337: warning: Function parameter or member 'start' not described in 'find_next_iomem_res'
../kernel/resource.c:337: warning: Function parameter or member 'end' not described in 'find_next_iomem_res'
../kernel/resource.c:337: warning: Function parameter or member 'flags' not described in 'find_next_iomem_res'
../kernel/resource.c:337: warning: Function parameter or member 'desc' not described in 'find_next_iomem_res'
../kernel/resource.c:337: warning: Function parameter or member 'first_lvl' not described in 'find_next_iomem_res'
../kernel/resource.c:337: warning: Function parameter or member 'res' not described in 'find_next_iomem_res'
Add the missing function parameter documentation for the other warnings:
../kernel/resource.c:409: warning: Function parameter or member 'arg' not described in 'walk_iomem_res_desc'
../kernel/resource.c:409: warning: Function parameter or member 'func' not described in 'walk_iomem_res_desc'
Signed-off-by: Randy Dunlap <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Fixes: b69c2e20f6e4 ("resource: Clean it up a bit")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
- Drop BUG_ON()s and do normal error handling instead, in
find_next_iomem_res().
- Align function arguments on opening braces.
- Get rid of local var sibling_only in find_next_iomem_res().
- Shorten unnecessarily long first_level_children_only arg name.
Signed-off-by: Borislav Petkov <[email protected]>
CC: Andrew Morton <[email protected]>
CC: Bjorn Helgaas <[email protected]>
CC: Brijesh Singh <[email protected]>
CC: Dan Williams <[email protected]>
CC: H. Peter Anvin <[email protected]>
CC: Lianbo Jiang <[email protected]>
CC: Takashi Iwai <[email protected]>
CC: Thomas Gleixner <[email protected]>
CC: Tom Lendacky <[email protected]>
CC: Vivek Goyal <[email protected]>
CC: Yaowei Bai <[email protected]>
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
Link: <new submission>
|
|
Previously find_next_iomem_res() used "*res" as both an input parameter for
the range to search and the type of resource to search for, and an output
parameter for the resource we found, which makes the interface confusing.
The current callers use find_next_iomem_res() incorrectly because they
allocate a single struct resource and use it for repeated calls to
find_next_iomem_res(). When find_next_iomem_res() returns a resource, it
overwrites the start, end, flags, and desc members of the struct. If we
call find_next_iomem_res() again, we must update or restore these fields.
The previous code restored res.start and res.end, but not res.flags or
res.desc.
Since the callers did not restore res.flags, if they searched for flags
IORESOURCE_MEM | IORESOURCE_BUSY and found a resource with flags
IORESOURCE_MEM | IORESOURCE_BUSY | IORESOURCE_SYSRAM, the next search would
incorrectly skip resources unless they were also marked as
IORESOURCE_SYSRAM.
Fix this by restructuring the interface so it takes explicit "start, end,
flags" parameters and uses "*res" only as an output parameter.
Based on a patch by Lianbo Jiang <[email protected]>.
[ bp: While at it:
- make comments kernel-doc style.
-
Originally-by: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Bjorn Helgaas <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
CC: Andrew Morton <[email protected]>
CC: Brijesh Singh <[email protected]>
CC: Dan Williams <[email protected]>
CC: H. Peter Anvin <[email protected]>
CC: Lianbo Jiang <[email protected]>
CC: Takashi Iwai <[email protected]>
CC: Thomas Gleixner <[email protected]>
CC: Tom Lendacky <[email protected]>
CC: Vivek Goyal <[email protected]>
CC: Yaowei Bai <[email protected]>
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: x86-ml <[email protected]>
Link: http://lkml.kernel.org/r/153805812916.1157.177580438135143788.stgit@bhelgaas-glaptop.roam.corp.google.com
|
|
find_next_iomem_res() finds an iomem resource that covers part of a range
described by "start, end". All callers expect that range to be inclusive,
i.e., both start and end are included, but find_next_iomem_res() doesn't
handle the end address correctly.
If it finds an iomem resource that contains exactly the end address, it
skips it, e.g., if "start, end" is [0x0-0x10000] and there happens to be an
iomem resource [mem 0x10000-0x10000] (the single byte at 0x10000), we skip
it:
find_next_iomem_res(...)
{
start = 0x0;
end = 0x10000;
for (p = next_resource(...)) {
# p->start = 0x10000;
# p->end = 0x10000;
# we *should* return this resource, but this condition is false:
if ((p->end >= start) && (p->start < end))
break;
Adjust find_next_iomem_res() so it allows a resource that includes the
single byte at the end of the range. This is a corner case that we
probably don't see in practice.
Fixes: 58c1b5b07907 ("[PATCH] memory hotadd fixes: find_next_system_ram catch range fix")
Signed-off-by: Bjorn Helgaas <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
CC: Andrew Morton <[email protected]>
CC: Brijesh Singh <[email protected]>
CC: Dan Williams <[email protected]>
CC: H. Peter Anvin <[email protected]>
CC: Lianbo Jiang <[email protected]>
CC: Takashi Iwai <[email protected]>
CC: Thomas Gleixner <[email protected]>
CC: Tom Lendacky <[email protected]>
CC: Vivek Goyal <[email protected]>
CC: Yaowei Bai <[email protected]>
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: x86-ml <[email protected]>
Link: http://lkml.kernel.org/r/153805812254.1157.16736368485811773752.stgit@bhelgaas-glaptop.roam.corp.google.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
"This adds a user for the new 'bytes-remaining' updates to
memcpy_mcsafe() that you already received through Ingo via the
x86-dax- for-linus pull.
Not included here, but still targeting this cycle, is support for
handling memory media errors (poison) consumed via userspace dax
mappings.
Summary:
- DAX broke a fundamental assumption of truncate of file mapped
pages. The truncate path assumed that it is safe to disconnect a
pinned page from a file and let the filesystem reclaim the physical
block. With DAX the page is equivalent to the filesystem block.
Introduce dax_layout_busy_page() to enable filesystems to wait for
pinned DAX pages to be released. Without this wait a filesystem
could allocate blocks under active device-DMA to a new file.
- DAX arranges for the block layer to be bypassed and uses
dax_direct_access() + copy_to_iter() to satisfy read(2) calls.
However, the memcpy_mcsafe() facility is available through the pmem
block driver. In order to safely handle media errors, via the DAX
block-layer bypass, introduce copy_to_iter_mcsafe().
- Fix cache management policy relative to the ACPI NFIT Platform
Capabilities Structure to properly elide cache flushes when they
are not necessary. The table indicates whether CPU caches are
power-fail protected. Clarify that a deep flush is always performed
on REQ_{FUA,PREFLUSH} requests"
* tag 'libnvdimm-for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (21 commits)
dax: Use dax_write_cache* helpers
libnvdimm, pmem: Do not flush power-fail protected CPU caches
libnvdimm, pmem: Unconditionally deep flush on *sync
libnvdimm, pmem: Complete REQ_FLUSH => REQ_PREFLUSH
acpi, nfit: Remove ecc_unit_size
dax: dax_insert_mapping_entry always succeeds
libnvdimm, e820: Register all pmem resources
libnvdimm: Debug probe times
linvdimm, pmem: Preserve read-only setting for pmem devices
x86, nfit_test: Add unit test for memcpy_mcsafe()
pmem: Switch to copy_to_iter_mcsafe()
dax: Report bytes remaining in dax_iomap_actor()
dax: Introduce a ->copy_to_iter dax operation
uio, lib: Fix CONFIG_ARCH_HAS_UACCESS_MCSAFE compilation
xfs, dax: introduce xfs_break_dax_layouts()
xfs: prepare xfs_break_layouts() for another layout type
xfs: prepare xfs_break_layouts() to be called with XFS_MMAPLOCK_EXCL
mm, fs, dax: handle layout changes to pinned dax mappings
mm: fix __gup_device_huge vs unmap
mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS
...
|
|
There is currently a mismatch between the resources that will trigger
the e820_pmem driver to register/load and the resources that will
actually be surfaced as pmem ranges. register_e820_pmem() uses
walk_iomem_res_desc() which includes children and siblings. In contrast,
e820_pmem_probe() only considers top level resources. For example the
following resource tree results in the driver being loaded, but no
resources being registered:
398000000000-39bfffffffff : PCI Bus 0000:ae
39be00000000-39bf07ffffff : PCI Bus 0000:af
39be00000000-39beffffffff : 0000:af:00.0
39be10000000-39beffffffff : Persistent Memory (legacy)
Fix this up to allow definitions of "legacy" pmem ranges anywhere in
system-physical address space. Not that it is a recommended or safe to
define a pmem range in PCI space, but it is useful for debug /
experimentation, and the restriction on being a top-level resource was
arbitrary.
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
|
|
And use the root resource directly from the proc private data.
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
We've got a bug report indicating a kernel panic at booting on an x86-32
system, and it turned out to be the invalid PCI resource assigned after
reallocation. __find_resource() first aligns the resource start address
and resets the end address with start+size-1 accordingly, then checks
whether it's contained. Here the end address may overflow the integer,
although resource_contains() still returns true because the function
validates only start and end address. So this ends up with returning an
invalid resource (start > end).
There was already an attempt to cover such a problem in the commit
47ea91b4052d ("Resource: fix wrong resource window calculation"), but
this case is an overseen one.
This patch adds the validity check of the newly calculated resource for
avoiding the integer overflow problem.
Bugzilla: http://bugzilla.opensuse.org/show_bug.cgi?id=1086739
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 23c570a67448 ("resource: ability to resize an allocated resource")
Signed-off-by: Takashi Iwai <[email protected]>
Reported-by: Michael Henders <[email protected]>
Tested-by: Michael Henders <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Ram Pai <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Merge misc updates from Andrew Morton:
- kasan updates
- procfs
- lib/bitmap updates
- other lib/ updates
- checkpatch tweaks
- rapidio
- ubsan
- pipe fixes and cleanups
- lots of other misc bits
* emailed patches from Andrew Morton <[email protected]>: (114 commits)
Documentation/sysctl/user.txt: fix typo
MAINTAINERS: update ARM/QUALCOMM SUPPORT patterns
MAINTAINERS: update various PALM patterns
MAINTAINERS: update "ARM/OXNAS platform support" patterns
MAINTAINERS: update Cortina/Gemini patterns
MAINTAINERS: remove ARM/CLKDEV SUPPORT file pattern
MAINTAINERS: remove ANDROID ION pattern
mm: docs: add blank lines to silence sphinx "Unexpected indentation" errors
mm: docs: fix parameter names mismatch
mm: docs: fixup punctuation
pipe: read buffer limits atomically
pipe: simplify round_pipe_size()
pipe: reject F_SETPIPE_SZ with size over UINT_MAX
pipe: fix off-by-one error when checking buffer limits
pipe: actually allow root to exceed the pipe buffer limits
pipe, sysctl: remove pipe_proc_fn()
pipe, sysctl: drop 'min' parameter from pipe-max-size converter
kasan: rework Kconfig settings
crash_dump: is_kdump_kernel can be boolean
kernel/mutex: mutex_is_locked can be boolean
...
|
|
Make iomem_is_exclusive return bool due to this particular function only
using either one or zero as its return value.
No functional change.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yaowei Bai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Set resource structs inserted by __reserve_region_with_split() to have the
correct type. Setting the type doesn't fix any functional problem but
makes %pR on the resource work better.
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
When we reserve regions because the user specified a "reserve=" parameter,
set the resource type to either IORESOURCE_IO (for regions below 0x10000)
or IORESOURCE_MEM. The test for 0x10000 is just a heuristic; obviously
there can be memory below 0x10000 as well.
Improve documentation of the "reserve=" parameter.
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
In order for memory pages to be properly mapped when SEV is active, it's
necessary to use the PAGE_KERNEL protection attribute as the base
protection. This ensures that memory mapping of, e.g. ACPI tables,
receives the proper mapping attributes.
Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Tested-by: Borislav Petkov <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: [email protected]
Cc: Jérôme Glisse <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
In preperation for a new function that will need additional resource
information during the resource walk, update the resource walk callback to
pass the resource structure. Since the current callback start and end
arguments are pulled from the resource structure, the callback functions
can obtain them from the resource structure directly.
Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Tested-by: Borislav Petkov <[email protected]>
Cc: [email protected]
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
The walk_iomem_res_desc(), walk_system_ram_res() and walk_system_ram_range()
functions each have much of the same code.
Create a new function that consolidates the common code from these
functions in one place to reduce the amount of duplicated code.
Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Tested-by: Borislav Petkov <[email protected]>
Cc: [email protected]
Cc: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
In commit c4004b02f8e5b ("x86: remove the kernel code/data/bss resources
from /proc/iomem") I was hoping to remove the phyiscal kernel address
data from /proc/iomem entirely, but that had to be reverted because some
system programs actually use it.
This limits all the detailed resource information to properly
credentialed users instead.
Signed-off-by: Linus Torvalds <[email protected]>
|
|
insert_resource() and remove_resouce() are called by producers
of resources, such as FW modules and bus drivers. These modules
may be implemented as loadable modules.
Export insert_resource() and remove_resouce() so that they can
be called from such modules.
link: https://lkml.org/lkml/2016/3/8/872
Signed-off-by: Toshi Kani <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Dan Williams <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
|
|
insert_resource() and insert_resource_conflict() are called
by resource producers to insert a new resource. When there
is any conflict, they move conflicting resources down to the
children of the new resource. There is no destructor of these
interfaces, however.
Add remove_resource(), which removes a resource previously
inserted by insert_resource() or insert_resource_conflict(),
and moves the children up to where they were before.
__release_resource() is changed to have @release_child, so
that this function can be used for remove_resource() as well.
Also add comments to clarify that these functions are intended
for producers of resources to avoid any confusion with
request/release_resource() for consumers.
Signed-off-by: Toshi Kani <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Dan Williams <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
|
|
__request_region() sets 'flags' of a new resource from @parent
as it inherits the parent's attribute. When a target resource
has a conflict, this function inserts the new resource entry
under the conflicted entry by updating @parent. In this case,
the new resource entry needs to inherit attribute from the updated
parent. This conflict is a typical case since __request_region()
is used to allocate a new resource from a specific resource range.
For instance, request_mem_region() calls __request_region() with
@parent set to &iomem_resource, which is the root entry of the
whole iomem range. When this request results in inserting a new
entry "DEV-A" under "BUS-1", "DEV-A" needs to inherit from the
immediate parent "BUS-1" as it holds specific attribute for the
range.
root (&iomem_resource)
:
+ "BUS-1"
+ "DEV-A"
Change __request_region() to set 'flags' and 'desc' of a new entry
from the immediate parent.
Signed-off-by: Toshi Kani <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Dan Williams <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
In __request_region, if a conflict with a BUSY and MUXED resource is
detected, then the caller goes to sleep and waits for the resource to be
released. A pointer on the conflicting resource is kept. At wake-up
this pointer is used as a parent to retry to request the region.
A first problem is that this pointer might well be invalid (if for
example the conflicting resource have already been freed). Another
problem is that the next call to __request_region() fails to detect a
remaining conflict. The previously conflicting resource is passed as a
parameter and __request_region() will look for a conflict among the
children of this resource and not at the resource itself. It is likely
to succeed anyway, even if there is still a conflict.
Instead, the parent of the conflicting resource should be passed to
__request_region().
As a fix, this patch doesn't update the parent resource pointer in the
case we have to wait for a muxed region right after.
Reported-and-tested-by: Vincent Pelletier <[email protected]>
Signed-off-by: Simon Guinot <[email protected]>
Tested-by: Vincent Donnefort <[email protected]>
Cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
walk_iomem_res_desc() replaced walk_iomem_res() and there is no
caller to walk_iomem_res() any more. Kill it. Also remove @name
from find_next_iomem_res() as it is no longer used.
Signed-off-by: Toshi Kani <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Dave Young <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Hanjun Guo <[email protected]>
Cc: Jakub Sitnicki <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Luis R. Rodriguez <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Vinod Koul <[email protected]>
Cc: [email protected]
Cc: linux-mm <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Add a new interface, walk_iomem_res_desc(), which walks through
the iomem table by identifying a target with @flags and @desc.
This interface provides the same functionality as
walk_iomem_res(), but does not use strcmp() to @name for better
efficiency.
walk_iomem_res() is deprecated and will be removed in a later
patch.
Requested-by: Borislav Petkov <[email protected]>
Signed-off-by: Toshi Kani <[email protected]>
[ Fixup comments. ]
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Hanjun Guo <[email protected]>
Cc: Jakub Sitnicki <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Luis R. Rodriguez <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: [email protected]
Cc: linux-mm <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Change region_intersects() to identify a target with @flags and
@desc, instead of @name with strcmp().
Change the callers of region_intersects(), memremap() and
devm_memremap(), to set IORESOURCE_SYSTEM_RAM in @flags and
IORES_DESC_NONE in @desc when searching System RAM.
Also, export region_intersects() so that the ACPI EINJ error
injection driver can call this function in a later patch.
Signed-off-by: Toshi Kani <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Dan Williams <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Jakub Sitnicki <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Luis R. Rodriguez <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: [email protected]
Cc: linux-mm <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Now that all System RAM resource entries have been initialized
to IORESOURCE_SYSTEM_RAM type, change walk_system_ram_res() and
walk_system_ram_range() to call find_next_iomem_res() by setting
@res.flags to IORESOURCE_SYSTEM_RAM and @name to NULL. With this
change, they walk through the iomem table to find System RAM
ranges without the need to do strcmp() on the resource names.
No functional change is made to the interfaces.
Signed-off-by: Toshi Kani <[email protected]>
[ Boris: fixup comments. ]
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Jakub Sitnicki <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Luis R. Rodriguez <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: [email protected]
Cc: linux-mm <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
walk_iomem_res() and region_intersects() still need to use
strcmp() for searching a resource entry by @name in the iomem
table.
This patch introduces I/O resource descriptor 'desc' in struct
resource for the iomem search interfaces. Drivers can assign
their unique descriptor to a range when they support the search
interfaces.
Otherwise, 'desc' is set to IORES_DESC_NONE (0). This avoids
changing most of the drivers as they typically allocate resource
entries statically, or by calling alloc_resource(), kzalloc(),
or alloc_bootmem_low(), which set the field to zero by default.
A later patch will address some drivers that use kmalloc()
without zero'ing the field.
Also change release_mem_region_adjustable() to set 'desc' when
its resource entry gets separated. Other resource interfaces are
also changed to initialize 'desc' explicitly although
alloc_resource() sets it to 0.
Signed-off-by: Toshi Kani <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Jakub Sitnicki <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Luis R. Rodriguez <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: [email protected]
Cc: linux-mm <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
I/O resource flags consist of I/O resource types and modifier
bits. Therefore, checking an I/O resource type in 'flags' must
be performed with a bitwise operation.
Fix find_next_iomem_res() and region_intersects() that simply
compare 'flags' against a given value.
Also change __request_region() to set 'res->flags' from
resource_type() and resource_ext_type() of the parent, so that
children nodes will inherit the extended I/O resource type.
Signed-off-by: Toshi Kani <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Jakub Sitnicki <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Luis R. Rodriguez <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Vinod Koul <[email protected]>
Cc: [email protected]
Cc: linux-mm <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
This effectively promotes IORESOURCE_BUSY to IORESOURCE_EXCLUSIVE
semantics by default. If userspace really believes it is safe to access
the memory region it can also perform the extra step of disabling an
active driver. This protects device address ranges with read side
effects and otherwise directs userspace to use the driver.
Persistent memory presents a large "mistake surface" to /dev/mem as now
accidental writes can corrupt a filesystem.
In general if a device driver is busily using a memory region it already
informs other parts of the kernel to not touch it via
request_mem_region(). /dev/mem should honor the same safety restriction
by default. Debugging a device driver from userspace becomes more
difficult with this enabled. Any application using /dev/mem or mmap of
sysfs pci resources will now need to perform the extra step of either:
1/ Disabling the driver, for example:
echo <device id> > /dev/bus/<parent bus>/drivers/<driver name>/unbind
2/ Rebooting with "iomem=relaxed" on the command line
3/ Recompiling with CONFIG_IO_STRICT_DEVMEM=n
Traditional users of /dev/mem like dosemu are unaffected because the
first 1MB of memory is not subject to the IO_STRICT_DEVMEM restriction.
Legacy X configurations use /dev/mem to talk to graphics hardware, but
that functionality has since moved to kernel graphics drivers.
Cc: Arnd Bergmann <[email protected]>
Cc: Russell King <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Acked-by: Kees Cook <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
|
|
region_is_ram() is used to prevent the establishment of aliased mappings
to physical "System RAM" with incompatible cache settings. However, it
uses "-1" to indicate both "unknown" memory ranges (ranges not described
by platform firmware) and "mixed" ranges (where the parameters describe
a range that partially overlaps "System RAM").
Fix this up by explicitly tracking the "unknown" vs "mixed" resource
cases and returning REGION_INTERSECTS, REGION_MIXED, or REGION_DISJOINT.
This re-write also adds support for detecting when the requested region
completely eclipses all of a resource. Note, the implementation treats
overlaps between "unknown" and the requested memory type as
REGION_INTERSECTS.
Finally, other memory types can be passed in by name, for now the only
usage "System RAM".
Suggested-by: Luis R. Rodriguez <[email protected]>
Reviewed-by: Toshi Kani <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
|