Age | Commit message (Collapse) | Author | Files | Lines |
|
rio_mport_add_riodev() misses to call put_device() when the device already
exists. Add the missed function call to fix it.
Fixes: e8de370188d0 ("rapidio: add mport char device driver")
Signed-off-by: Jing Xiangfeng <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Dan Carpenter <[email protected]>
Cc: Matt Porter <[email protected]>
Cc: Alexandre Bounine <[email protected]>
Cc: Gustavo A. R. Silva <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Madhuparna Bhowmik <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
rio_dma_transfer() attempts to clamp the return value of
pin_user_pages_fast() to be >= 0. However, the attempt fails because
nr_pages is overridden a few lines later, and restored to the undesirable
-ERRNO value.
The return value is ultimately stored in nr_pages, which in turn is passed
to unpin_user_pages(), which expects nr_pages >= 0, else, disaster.
Fix this by fixing the nesting of the assignment to nr_pages: nr_pages
should be clamped to zero if pin_user_pages_fast() returns -ERRNO, or set
to the return value of pin_user_pages_fast(), otherwise.
[[email protected]: new changelog]
Fixes: e8de370188d09 ("rapidio: add mport char device driver")
Signed-off-by: Souptick Joarder <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Reviewed-by: John Hubbard <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Matt Porter <[email protected]>
Cc: Alexandre Bounine <[email protected]>
Cc: Gustavo A. R. Silva <[email protected]>
Cc: Madhuparna Bhowmik <[email protected]>
Cc: Dan Carpenter <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The preceding patches have ensured that core dumping properly takes the
mmap_lock. Thanks to that, we can now remove mmget_still_valid() and all
its users.
Signed-off-by: Jann Horn <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Hugh Dickins <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
If we fail to decompress in zram it's a pretty serious problem. We were
entrusted to be able to decompress the old data but we failed. Either
we've got some crazy bug in the compression code or we've got memory
corruption.
At the moment, when this happens the log looks like this:
ERR kernel: [ 1833.099861] zram: Decompression failed! err=-22, page=336112
ERR kernel: [ 1833.099881] zram: Decompression failed! err=-22, page=336112
ALERT kernel: [ 1833.099886] Read-error on swap-device (253:0:2688896)
It is true that we have an "ALERT" level log in there, but (at least to
me) it feels like even this isn't enough to impart the seriousness of this
error. Let's convert to a WARN_ON. Note that WARN_ON is automatically
"unlikely" so we can simply replace the old annotation with the new one.
Signed-off-by: Douglas Anderson <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Sonny Rao <[email protected]>
Cc: Jens Axboe <[email protected]>
Link: https://lkml.kernel.org/r/20200917174059.1.If09c882545dbe432268f7a67a4d4cfcb6caace4f@changeid
Signed-off-by: Linus Torvalds <[email protected]>
|
|
At boot time, or when doing memory hot-add operations, if the links in
sysfs can't be created, the system is still able to run, so just report
the error in the kernel log rather than BUG_ON and potentially make system
unusable because the callpath can be called with locks held.
Since the number of memory blocks managed could be high, the messages are
rate limited.
As a consequence, link_mem_sections() has no status to report anymore.
Signed-off-by: Laurent Dufour <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Nathan Lynch <[email protected]>
Cc: "Rafael J . Wysocki" <[email protected]>
Cc: Scott Cheloha <[email protected]>
Cc: Tony Luck <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Let's try to merge system ram resources we add, to minimize the number of
resources in /proc/iomem. We don't care about the boundaries of
individual chunks we added.
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Wei Liu <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Wei Liu <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Julien Grall <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Leonardo Bras <[email protected]>
Cc: Libor Pechacek <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Nathan Lynch <[email protected]>
Cc: "Oliver O'Halloran" <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Pingfan Liu <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Roger Pau Monné <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vishal Verma <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Let's try to merge system ram resources we add, to minimize the number of
resources in /proc/iomem. We don't care about the boundaries of
individual chunks we added.
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: Roger Pau Monné <[email protected]>
Cc: Julien Grall <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Leonardo Bras <[email protected]>
Cc: Libor Pechacek <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Nathan Lynch <[email protected]>
Cc: "Oliver O'Halloran" <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Pingfan Liu <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Wei Liu <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
virtio-mem adds memory in memory block granularity, to be able to remove
it in the same granularity again later, and to grow slowly on demand.
This, however, results in quite a lot of resources when adding a lot of
memory. Resources are effectively stored in a list-based tree. Having a
lot of resources not only wastes memory, it also makes traversing that
tree more expensive, and makes /proc/iomem explode in size (e.g.,
requiring kexec-tools to manually merge resources later when e.g., trying
to create a kdump header).
Before this patch, we get (/proc/iomem) when hotplugging 2G via virtio-mem
on x86-64:
[...]
100000000-13fffffff : System RAM
140000000-33fffffff : virtio0
140000000-147ffffff : System RAM (virtio_mem)
148000000-14fffffff : System RAM (virtio_mem)
150000000-157ffffff : System RAM (virtio_mem)
158000000-15fffffff : System RAM (virtio_mem)
160000000-167ffffff : System RAM (virtio_mem)
168000000-16fffffff : System RAM (virtio_mem)
170000000-177ffffff : System RAM (virtio_mem)
178000000-17fffffff : System RAM (virtio_mem)
180000000-187ffffff : System RAM (virtio_mem)
188000000-18fffffff : System RAM (virtio_mem)
190000000-197ffffff : System RAM (virtio_mem)
198000000-19fffffff : System RAM (virtio_mem)
1a0000000-1a7ffffff : System RAM (virtio_mem)
1a8000000-1afffffff : System RAM (virtio_mem)
1b0000000-1b7ffffff : System RAM (virtio_mem)
1b8000000-1bfffffff : System RAM (virtio_mem)
3280000000-32ffffffff : PCI Bus 0000:00
With this patch, we get (/proc/iomem):
[...]
fffc0000-ffffffff : Reserved
100000000-13fffffff : System RAM
140000000-33fffffff : virtio0
140000000-1bfffffff : System RAM (virtio_mem)
3280000000-32ffffffff : PCI Bus 0000:00
Of course, with more hotplugged memory, it gets worse. When unplugging
memory blocks again, try_remove_memory() (via offline_and_remove_memory())
will properly split the resource up again.
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Pankaj Gupta <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Julien Grall <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Leonardo Bras <[email protected]>
Cc: Libor Pechacek <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nathan Lynch <[email protected]>
Cc: "Oliver O'Halloran" <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Pingfan Liu <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Roger Pau Monné <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Wei Liu <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We soon want to pass flags, e.g., to mark added System RAM resources.
mergeable. Prepare for that.
This patch is based on a similar patch by Oscar Salvador:
https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Juergen Gross <[email protected]> # Xen related part
Reviewed-by: Pankaj Gupta <[email protected]>
Acked-by: Wei Liu <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Wei Liu <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: "Oliver O'Halloran" <[email protected]>
Cc: Pingfan Liu <[email protected]>
Cc: Nathan Lynch <[email protected]>
Cc: Libor Pechacek <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Leonardo Bras <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Julien Grall <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Roger Pau Monné <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Wei Yang <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The conversion to request_mem_region() is broken because it assumes that
the range is marked busy prior to release. However, due to the way that
the kmem driver manipulates the IORESOURCE_BUSY flag (clears it to let
{add,remove}_memory() handle busy) it requires a manual release_resource()
to perform cleanup.
Given that the actual 'struct resource *' needs to be recalled, not just
the range, add that tracking to the kmem driver-data.
Fixes: 0513bd5bb114 ("device-dax/kmem: replace release_resource() with release_mem_region()")
Reported-by: David Hildenbrand <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Brice Goglin <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jia He <[email protected]>
Cc: Joao Martins <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Link: https://lkml.kernel.org/r/160272252925.3136502.17220638073995895400.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <[email protected]>
|
|
ucma_free_ctx() should call to __destroy_id() on all the connection requests
that have not been delivered to user space. Currently it calls on the
context itself and cause to use after free.
Fixes the trace:
BUG: Unable to handle kernel data access on write at 0x5deadbeef0000108
Faulting instruction address: 0xc0080000002428f4
Oops: Kernel access of bad area, sig: 11 [#1]
Call Trace:
[c000000207f2b680] [c00800000024280c] .__destroy_id+0x28c/0x610 [rdma_ucm] (unreliable)
[c000000207f2b750] [c0080000002429c4] .__destroy_id+0x444/0x610 [rdma_ucm]
[c000000207f2b820] [c008000000242c24] .ucma_close+0x94/0xf0 [rdma_ucm]
[c000000207f2b8c0] [c00000000046fbdc] .__fput+0xac/0x330
[c000000207f2b960] [c00000000015d48c] .task_work_run+0xbc/0x110
[c000000207f2b9f0] [c00000000012fb00] .do_exit+0x430/0xc50
[c000000207f2bae0] [c0000000001303ec] .do_group_exit+0x5c/0xd0
[c000000207f2bb70] [c000000000144a34] .get_signal+0x194/0xe30
[c000000207f2bc60] [c00000000001f6b4] .do_notify_resume+0x124/0x470
[c000000207f2bd60] [c000000000032484] .interrupt_exit_user_prepare+0x1b4/0x240
[c000000207f2be20] [c000000000010034] interrupt_return+0x14/0x1c0
Rename listen_ctx to conn_req_ctx as the poor name was the cause of this
bug.
Fixes: a1d33b70dbbc ("RDMA/ucma: Rework how new connections are passed through event delivery")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Maor Gottlieb <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
If skb_clone() is unable to allocate memory for a new sk_buff this is not
detected by the current code.
Check for a NULL return and continue. This is similar to other errors in
this loop over QPs attached to the multicast address and consistent with
the unreliable UD transport.
Fixes: e7ec96fc7932f ("RDMA/rxe: Fix skb lifetime in rxe_rcv_mcast_pkt()")
Addresses-Coverity-ID: 1497804: Null pointer dereferences (NULL_RETURNS)
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bob Pearson <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
RXE was wrongly using an internal kernel enum as part of its uAPI, split
this out into a dedicated uAPI enum just for RXE. It only uses the IPv4
and IPv6 values.
This was exposed by changing the internal kernel enum definition which
broke RXE.
Fixes: 1c15b4f2a42f ("RDMA/core: Modify enum ib_gid_type and enum rdma_network_type")
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
The code in setup_dma_device has become rather convoluted, move all of
this to the drivers. Drives now pass in a DMA capable struct device which
will be used to setup DMA, or drivers must fully configure the ibdev for
DMA and pass in NULL.
Other than setting the masks in rvt all drivers were doing this already
anyhow.
mthca, mlx4 and mlx5 were already setting up maximum DMA segment size for
DMA based on their hardweare limits in:
__mthca_init_one()
dma_set_max_seg_size (1G)
__mlx4_init_one()
dma_set_max_seg_size (1G)
mlx5_pci_init()
set_dma_caps()
dma_set_max_seg_size (2G)
Other non software drivers (except usnic) were extended to UINT_MAX [1, 2]
instead of 2G as was before.
[1] https://lore.kernel.org/linux-rdma/[email protected]/
[2] https://lore.kernel.org/linux-rdma/[email protected]/
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/6b2ed339933d066622d5715903870676d8cc523a.1602590106.git.mchehab+huawei@kernel.org
Suggested-by: Christoph Hellwig <[email protected]>
Signed-off-by: Parav Pandit <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
intel_dp_ycbcr420_config() is rather pointless. Just inline it
directly into intel_dp_compute_config(). This gets rid of the
ugly double assignment of output_format.
Not really sure what the best policy would be when the user
supplies a mode classified by the display as "YCbCr 4:2:0
only", but we know that we can't do YCbCr 4:2:0 output. For
now keep the current behaviour of just silently upgrade
it to RGB 4:4:4.
Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Uma Shankar <[email protected]>
|
|
Remove the lspcon special case from intel_dp_compute_config() and
just treat it like any other DFP than can do 4:4:4->4:2:0 conversion.
The only difference between the two codepaths was that the lspcon
code tried to already halve port_clock. That was just total nonsense
as we hadn't even computed the base port_clock at that time.
All that stuff happens intel_dp_compute_link_config*() and it
already takes care of the 4:2:0 clock reduction.
Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Uma Shankar <[email protected]>
|
|
crtc_state->lspcon_downsampling isn't particularly useful at
the moment since we can't even do proper readout for it.
Let's get rid of it. Will help with unifying the LSPCON with
the regular DFP YCbCr output support.
Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Uma Shankar <[email protected]>
|
|
during fbdev init
Currently we leave the cache_level of the initial fb obj
set to NONE. This means on eLLC machines the first pin_to_display()
will try to switch it to WT which requires a vma unbind+bind.
If that happens during the fbdev initialization rcu does not
seem operational which causes the unbind to get stuck. To
most appearances this looks like a dead machine on boot.
Avoid the unbind by already marking the object cache_level
as WT when creating it. We still do an excplicit ggtt pin
which will rewrite the PTEs anyway, so they will match whatever
cache level we set.
Cc: <[email protected]> # v5.7+
Suggested-by: Chris Wilson <[email protected]>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2381
Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Chris Wilson <[email protected]>
|
|
WAC6entrylatency is trying to fix excessive rc6 entry latency caused
by the extra delay from FBC_LLC_READ_CTRL, which is there for some
extra sync with uncore for frame buffer caching in LLC.
Reading through the hsd the recommendation was to set the FBC_LLC_FULLY_OPEN
bit to disable this extra delay entirely. This can be done whenever fb LLC
caching is not used. The alternative suggestion was to reduce the delay to
eg. 0x5 via updated BIOS programming instructions. But all the kbl/cfl
machines I've seen still have the default 0xff programmed. As we never use
fb LLC caching let's just apply the w/a to all skl derivatives to get
consistent rc6 latencies.
I was able to measure the effect of FBC_LLC_READ_CTRL to rc6 latency
via forcewake. Here's a graph of some of the results:
sleep;fw_req=1;wait fw_ack==1;sleep;fw_req=0;wait fw_ack==0
fw_ack==1 duration
160us +----------------------------------------------------------------+
| + + $$+ + + |
| $$ $ $ ******$$ ** $ $**$* #########$$######|
140us |-$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$*$$$$$$$$$$$$$$$$ $$$$$$|
| $ * # |
| $ * # |
120us |$+ * # +-|
|$ * # |
|$ * # # |
100us |$+ ************######################## +-|
|$ * *# |
|$ ***** ######### |
80us |$+ * # #### ## +-|
|$ **** ### # # |
| ** #### FBC_LLC_READ_CTRL: 0x8000 ******* |
60us |-###### FBC_LLC_READ_CTRL: 0xffff #######-|
|## + + FBC_LLC_READ_CTRL: 0x400000ff $$$$$$$ |
+----------------------------------------------------------------+
0ms 10ms 20ms 30ms 40ms 50ms 60ms
sleep duration
The default FBC_LLC_READ_CTRL value of 0xff is documented to give us
a 170usec delay. That tracks well with the knees at 0xffff->~44msec and
0x8000->~22msec we see in the graph.
We can see that if we sleep longer than the FBC_LLC_READ_CTRL delay
we always observe the full (~145usec) rc6 wakeup latency. But if we sleep
for less than the FBC_LLC_READ_CTRL delay we see a quicker fw wakeup,
presumably due the hardware not having yet entered rc6 fully.
The other plateaus in the graph I suspect correspond to some shallower
internal rc states.
v2: s/usec/msec/ typo in commit msg
Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Chris Wilson <[email protected]>
|
|
The avs drivers are all SoC specific drivers that doesn't share any code.
Instead they are located in a directory, mostly to keep similar
functionality together. From a maintenance point of view, it makes better
sense to collect SoC specific drivers like these, into the SoC specific
directories.
Therefore, let's move the smartreflex driver for OMAP to the ti directory.
Signed-off-by: Ulf Hansson <[email protected]>
Reviewed-by: Nishanth Menon <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
The avs drivers are all SoC specific drivers that doesn't share any code.
Instead they are located in a directory, mostly to keep similar
functionality together. From a maintenance point of view, it makes better
sense to collect SoC specific drivers like these, into the SoC specific
directories.
Therefore, let's move the rockchip-io driver to the rockchip directory.
Signed-off-by: Ulf Hansson <[email protected]>
Acked-by: Heiko Stuebner <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
Previously we computed L1.2 parameters in the enumeration path, saved them
in struct pcie_link_state.l1ss, and programmed them into the devices
whenever we enabled or disabled L1.2 on the link. But these parameters are
constant and don't need to be updated when enabling/disabling L1.2.
Compute and program the L1.2 parameters once during enumeration and remove
the struct pcie_link_state.l1ss member. No functional change intended.
[bhelgaas: rework to program L1.2 parameters during enumeration]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Saheed O. Bolarinwa <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
Previously we stored the L1SS Capabilities value in the struct
aspm_register_info.
We only need this information in one place, so read it there and remove
struct aspm_register_info completely, since it's now empty. No functional
change intended.
[bhelgaas: split up, don't cache l1ss_cap in pci_dev]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Saheed O. Bolarinwa <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
aspm_calc_l1ss_info() needs only the L1SS Capabilities. It doesn't need
anything else from struct aspm_register_info, so pass only the Capabilities
value. No functional change intended.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
Previously we stored the L1SS Control 1 register in the struct
aspm_register_info.
We only need this information in one place, so read it there and remove it
from struct aspm_register_info. No functional change intended.
[bhelgaas: split ctl1/ctl2]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Saheed O. Bolarinwa <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
We never use the aspm_register_info.l1ss_ctl2 value, so remove it. No
functional change intended.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
Save the L1 Substates Capability pointer in struct pci_dev. Then we don't
have to keep track of it in the struct aspm_register_info and struct
pcie_link_state, which makes the code easier to read. No functional change
intended.
[bhelgaas: split to a separate patch]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Saheed O. Bolarinwa <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
Previously we stored L0s and L1 Exit Latency information from the Link
Capabilities register in the struct aspm_register_info.
We only need these latencies when we already have the Link Capabilities
values, so use those directly and remove the latencies from struct
aspm_register_info. No functional change intended.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Saheed O. Bolarinwa <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
Previously we stored the "ASPM Control" bits from the Link Control register
in the struct aspm_register_info.
Read PCI_EXP_LNKCTL directly when needed. This means we can use the
PCI_EXP_LNKCTL_ASPM_* bits directly instead of the similar but different
PCIE_LINK_STATE_* bits. No functional change intended.
[bhelgaas: drop get_aspm_enable() and read LNKCTL once directly]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Saheed O. Bolarinwa <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
Previously we stored the "ASPM Support" field from the Link Capabilities
register in the struct aspm_register_info.
Read the Link Capabilities directly when needed and remove it from the
struct aspm_register_info. No functional change intended.
[bhelgaas: remove pci_dev cached copy since LNKCAP isn't truly read-only,
add PCI_EXP_LNKCAP_ASPM_L0S & PCI_EXP_LNKCAP_ASPM_L1, check them directly
instead of adding aspm_support()]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Saheed O. Bolarinwa <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
Other users of link->pdev and link->downstream, e.g., pcie_aspm_cap_init(),
pcie_config_aspm_l1ss(), and pcie_config_aspm_link(), use "parent" and
"child" as local names.
Do the same in aspm_calc_l1ss_info() for readability. No functional change
intended.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
pcie_get_aspm_reg() mostly reads ASPM-related registers, but in some cases
it also updates the value read from PCI_L1SS_CAP based on LTR properties.
Move this update to the point where the value is used to make the code more
readable.
No functional change intended, although previously we could clear
PCI_L1SS_CAP_ASPM_L1_2 for both ends of the link, and now we'll only do it
for the downstream end of a link. This shouldn't matter because we always
test that bit by ANDing l1ss_cap for the upstream and downstream ends.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
Move pci_clear_and_set_dword() earlier in file to prepare for future patch.
No functional change intended.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
Add a Kconfig menu for Intel DPTF (Dynamic Platform and Thermal
Framework), put both the existing participant drivers in it and set
them to be built as modules by default.
While at it, do a few assorted cleanups for a good measure.
Signed-off-by: Rafael J. Wysocki <[email protected]>
Reviewed-by: Srinivas Pandruvada <[email protected]>
Acked-by: Borislav Petkov <[email protected]>
|
|
Change the names of DPTF participant drivers to adhere to the
sysfs file naming conventions (no spaces present in the name in
particular).
Fixes: 2ce6324eadb0 ("ACPI: DPTF: Add PCH FIVR participant driver")
Fixes: 6256ebd5daf9 ("ACPI / DPTF: Add DPTF power participant driver")
Signed-off-by: Rafael J. Wysocki <[email protected]>
Reviewed-by: Srinivas Pandruvada <[email protected]>
Acked-by: Borislav Petkov <[email protected]>
|
|
ACPI 6.3 Errata A no longer allows _UID to return a string except for
Itanium (for historical reasons) as stated in section 5.2.12:
"From ACPI Specification 6.3 onward, all processor objects for all
architectures except Itanium must now use Device() objects with an
_HID of ACPI0007, and use only integer _UID values."
Therefore, the "we don't handle string _UIDs yet" comment, which
implies a missing feature, is redundant, so drop it.
Signed-off-by: Alex Hung <[email protected]>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
According to the ACPI spec, "The system must reset immediately following
the write to the ACPI RESET_REG register.", but there are cases that the
system does not follow this and results in racing with the subsequetial
reboot mechanism, which brings unexpected behavior.
Fix this by adding a 15ms delay after writing to the ACPI RESET_REG.
Reported-by: Ghorai, Sukumar <[email protected]>
Signed-off-by: Zhang Rui <[email protected]>
[ rjw: Edit comment in the code and subject ]
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
If ACPI is disabled then loading the acpi_dbg module will result in the
following splat when lock debugging is enabled.
DEBUG_LOCKS_WARN_ON(lock->magic != lock)
WARNING: CPU: 0 PID: 1 at kernel/locking/mutex.c:938 __mutex_lock+0xa10/0x1290
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc8+ #103
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x4d8
show_stack+0x34/0x48
dump_stack+0x174/0x1f8
panic+0x360/0x7a0
__warn+0x244/0x2ec
report_bug+0x240/0x398
bug_handler+0x50/0xc0
call_break_hook+0x160/0x1d8
brk_handler+0x30/0xc0
do_debug_exception+0x184/0x340
el1_dbg+0x48/0xb0
el1_sync_handler+0x170/0x1c8
el1_sync+0x80/0x100
__mutex_lock+0xa10/0x1290
mutex_lock_nested+0x6c/0xc0
acpi_register_debugger+0x40/0x88
acpi_aml_init+0xc4/0x114
do_one_initcall+0x24c/0xb10
kernel_init_freeable+0x690/0x728
kernel_init+0x20/0x1e8
ret_from_fork+0x10/0x18
This is because acpi_debugger.lock has not been initialized as
acpi_debugger_init() is not called when ACPI is disabled. Fail module
loading to avoid this and any subsequent problems that might arise by
trying to debug AML when ACPI is disabled.
Fixes: 8cfb0cdf07e2 ("ACPI / debugger: Add IO interface to access debugger functionalities")
Reviewed-by: Hanjun Guo <[email protected]>
Signed-off-by: Jamie Iles <[email protected]>
Cc: 4.10+ <[email protected]> # 4.10+
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
To enable better debug of PM domains, keep a track of successful
and failing attempts to enter each domain idle state.
This statistics are exported in debugfs when reading the
idle_states node associated with each PM domain.
Signed-off-by: Lina Iyer <[email protected]>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
A recent bspec update has provided a new cdclk table for RKL. All of
the cdclk values are the same as those we've been using on ICL, TGL,
etc., but we obtain them by doubling both the PLL ratio and CD2X divider
numbers.
Bspec: 49202
Cc: Anusha Srivatsa <[email protected]>
Signed-off-by: Matt Roper <[email protected]>
Reviewed-by: Anusha Srivatsa <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
There is not strict need to group a comment and a single statement in an
if block, as comments are stripped by the pre-processor. However,
adding curly braces does make the code easier to read, and may avoid
mistakes when changing the code later.
Signed-off-by: Geert Uytterhoeven <[email protected]>
Acked-by: Ulf Hansson <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
From Maor Gottlieb says:
====================
This series extends __sg_alloc_table_from_pages to allow chaining of new
pages to an already initialized SG table.
This allows for drivers to utilize the optimization of merging contiguous
pages without a need to pre allocate all the pages and hold them in a very
large temporary buffer prior to the call to SG table initialization.
The last patch changes the Infiniband core to use the new API. It removes
duplicate functionality from the code and benefits from the optimization
of allocating dynamic SG table from pages.
In huge pages system of 2MB page size, without this change, the SG table
would contain x512 SG entries.
====================
* branch 'dynamic_sg':
RDMA/umem: Move to allocate SG table from pages
lib/scatterlist: Add support in dynamic allocation of SG table from pages
tools/testing/scatterlist: Show errors in human readable form
tools/testing/scatterlist: Rejuvenate bit-rotten test
|
|
A device may have specific HW constraints that must be obeyed to, before
its corresponding PM domain (genpd) can be powered off - and vice verse at
power on. These constraints can't be managed through the regular runtime PM
based deployment for a device, because the access pattern for it, isn't
always request based. In other words, using the runtime PM callbacks to
deal with the constraints doesn't work for these cases.
For these reasons, let's instead add a PM domain power on/off notification
mechanism to genpd. To add/remove a notifier for a device, the device must
already have been attached to the genpd, which also means that it needs to
be a part of the PM domain topology.
To add/remove a notifier, let's introduce two genpd specific functions:
- dev_pm_genpd_add|remove_notifier()
Note that, to further clarify when genpd power on/off notifiers may be
used, one can compare with the existing CPU_CLUSTER_PM_ENTER|EXIT
notifiers. In the long run, the genpd power on/off notifiers should be able
to replace them, but that requires additional genpd based platform support
for the current users.
Signed-off-by: Ulf Hansson <[email protected]>
Tested-by: Lina Iyer <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
domain
On multi-package systems, the Psys MSR is only valid for CPUs on
specific package (master package). The current code makes the
assumption that package 0 is the master package, but this is not
true on new platforms like SPR.
Fix the problem by emuerating the Psys RAPL domain for every
package, so CPUs in slave packages will read 0 for the Psys energy
counter and only CPUs in master packages can get a valid reading
and register the Psys RAPL domain.
The sysfs I/F for the Psys RAPL domain is not changed.
Signed-off-by: Zhang Rui <[email protected]>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
As only the low 32 bits of the RAPL_DOMAIN_REG_STATUS register
represents the energy counter, and the high 32 bits are reserved,
detect the existence of a RAPL domain by checking the low 32 bits only.
Signed-off-by: Zhang Rui <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
Before commit 9495b7e92f716ab2 ("driver core: platform: Initialize
dma_parms for platform devices"), the R-Car SATA device didn't have DMA
parameters. Hence the DMA boundary mask supplied by its driver was
silently ignored, as __scsi_init_queue() doesn't check the return value
of dma_set_seg_boundary(), and the default value of 0xffffffff was used.
Now the device has gained DMA parameters, the driver-supplied value is
used, and the following warning is printed on Salvator-XS:
DMA-API: sata_rcar ee300000.sata: mapping sg segment across boundary [start=0x00000000ffffe000] [end=0x00000000ffffefff] [boundary=0x000000001ffffffe]
WARNING: CPU: 5 PID: 38 at kernel/dma/debug.c:1233 debug_dma_map_sg+0x298/0x300
(the range of start/end values depend on whether IOMMU support is
enabled or not)
The issue here is that SATA_RCAR_DMA_BOUNDARY doesn't have bit 0 set, so
any typical end value, which is odd, will trigger the check.
Fix this by increasing the DMA boundary value by 1.
This also fixes the following WRITE DMA EXT timeout issue:
# dd if=/dev/urandom of=/mnt/de1/file1-1024M bs=1M count=1024
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata1.00: failed command: WRITE DMA EXT
ata1.00: cmd 35/00:00:00:e6:0c/00:0a:00:00:00/e0 tag 0 dma 1310720 out
res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
as seen by Shimoda-san since commit 429120f3df2dba2b ("block: fix
splitting segments on boundary masks").
Fixes: 8bfbeed58665dbbf ("sata_rcar: correct 'sata_rcar_sht'")
Fixes: 9495b7e92f716ab2 ("driver core: platform: Initialize dma_parms for platform devices")
Fixes: 429120f3df2dba2b ("block: fix splitting segments on boundary masks")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Tested-by: Lad Prabhakar <[email protected]>
Tested-by: Yoshihiro Shimoda <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Reviewed-by: Sergei Shtylyov <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
There is an off-by-one array check that can lead to a out-of-bounds
write to devices->info[i]. Fix this by checking by using >= rather
than > for the size check. Also replace hard-coded array size limit
with ARRAY_SIZE on the array.
Addresses-Coverity: ("Out-of-bounds write")
Fixes: cd9e9808d18f ("lightnvm: Support for Open-Channel SSDs")
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
e6d4f08a6776 ("intel_idle: Use ACPI _CST on server systems") avoids
enabling c-states that have been disabled by the platform with the
exception of C1E.
Unfortunately, BIOS implementations are not always consistent in terms
of how capabilities are advertised and control cannot always be handed
over. If control cannot be handed over then intel_idle reports that "ACPI
_CST not found or not usable" but does not clear acpi_state_table.count
meaning the information is still partially used.
This patch ignores ACPI information if CST control cannot be requested from
the platform. This was only observed on a number of Haswell platforms that
had identical CPUs but not identical BIOS versions. While this problem
may be rare overall, 24 separate test cases bisected to this specific
commit across 4 separate test machines and is worth addressing. If the
situation occurs, the kernel behaves as it did before commit e6d4f08a6776
and uses any c-states that are discovered.
The affected test cases were all ones that involved a small number of
processes -- exec microbenchmark, pipe microbenchmark, git test suite,
netperf, tbench with one client and system call microbenchmark. Each
case benefits from being able to use turboboost which is prevented if the
lower c-states are unavailable. This may mask real regressions specific
to older hardware so it is worth addressing.
C-state status before and after the patch
5.9.0-vanilla POLL latency:0 disabled:0 default:enabled
5.9.0-vanilla C1 latency:2 disabled:0 default:enabled
5.9.0-vanilla C1E latency:10 disabled:0 default:enabled
5.9.0-vanilla C3 latency:33 disabled:1 default:disabled
5.9.0-vanilla C6 latency:133 disabled:1 default:disabled
5.9.0-ignore-cst-v1r1 POLL latency:0 disabled:0 default:enabled
5.9.0-ignore-cst-v1r1 C1 latency:2 disabled:0 default:enabled
5.9.0-ignore-cst-v1r1 C1E latency:10 disabled:0 default:enabled
5.9.0-ignore-cst-v1r1 C3 latency:33 disabled:0 default:enabled
5.9.0-ignore-cst-v1r1 C6 latency:133 disabled:0 default:enabled
Patch enables C3/C6.
Netperf UDP_STREAM
netperf-udp
5.5.0 5.9.0
vanilla ignore-cst-v1r1
Hmean send-64 193.41 ( 0.00%) 226.54 * 17.13%*
Hmean send-128 392.16 ( 0.00%) 450.54 * 14.89%*
Hmean send-256 769.94 ( 0.00%) 881.85 * 14.53%*
Hmean send-1024 2994.21 ( 0.00%) 3468.95 * 15.85%*
Hmean send-2048 5725.60 ( 0.00%) 6628.99 * 15.78%*
Hmean send-3312 8468.36 ( 0.00%) 10288.02 * 21.49%*
Hmean send-4096 10135.46 ( 0.00%) 12387.57 * 22.22%*
Hmean send-8192 17142.07 ( 0.00%) 19748.11 * 15.20%*
Hmean send-16384 28539.71 ( 0.00%) 30084.45 * 5.41%*
Fixes: e6d4f08a6776 ("intel_idle: Use ACPI _CST on server systems")
Signed-off-by: Mel Gorman <[email protected]>
Cc: 5.6+ <[email protected]> # 5.6+
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
Intel SDM does not explicitly say that entering a C-state via MWAIT will
implicitly flush CPU caches as appropriate for that C-state. However,
documentation for individual Intel CPU generations does mention this
behavior.
Since intel_idle binds to any Intel CPU with MWAIT, list this assumption
of MWAIT behavior.
In passing, reword opening comment to make it clear that the driver can
load on any old and future Intel CPU with MWAIT.
Signed-off-by: Alexander Monakov <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
driver
There is a corner case that if the intel_pstate driver fails to be
registered (might be due to invalid MSR access) and acpi_cpufreq
takse over, the intel_pstate sysfs interface is still populated
which may confuse user space (turbostat for example):
grep . /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
acpi-cpufreq
grep . /sys/devices/system/cpu/intel_pstate/*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:0
/sys/devices/system/cpu/intel_pstate/min_perf_pct:0
grep: /sys/devices/system/cpu/intel_pstate/no_turbo: Resource temporarily unavailable
grep: /sys/devices/system/cpu/intel_pstate/num_pstates: Resource temporarily unavailable
/sys/devices/system/cpu/intel_pstate/status:off
grep: /sys/devices/system/cpu/intel_pstate/turbo_pct: Resource temporarily unavailable
The mere presence of the intel_pstate sysfs interface does not mean
that intel_pstate is in use (for example, echo "off" to "status"),
but it should not be created in the failing case.
Fix this issue by deleting the intel_pstate sysfs if the driver
registration fails.
Reported-by: Wendy Wang <[email protected]>
Suggested-by: Zhang Rui <[email protected]>
Signed-off-by: Chen Yu <[email protected]>
Acked-by: Srinivas Pandruvada <[email protected]
[ rjw: Refactor code to avoid jumps, change function name, changelog edits ]
Signed-off-by: Rafael J. Wysocki <[email protected]>
|