Age | Commit message | Author | Files | Lines |
|
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"This tree includes four core perf fixes for misc bugs, three fixes to
x86 PMU drivers, and two updates to old email addresses"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Do not send exit event twice
perf/x86/intel: Fix INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_NA macro
perf/x86/intel: Make L1D_PEND_MISS.FB_FULL not constrained on Haswell
perf: Fix PERF_EVENT_IOC_PERIOD deadlock
treewide: Remove old email address
perf/x86: Fix LBR call stack save/restore
perf: Update email address in MAINTAINERS
perf/core: Robustify the perf_cgroup_from_task() RCU checks
perf/core: Fix RCU problem with cgroup context switching code
|
|
The module couldn't release its resources properly during initialization. To
fix this issue, clean up the allocated resources before returning.
Signed-off-by: Minfei Huang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
eof"
Sure, it's better to bail out of past-the-eof read and return 0 than return
a bogus negative value on such. Only we'd better make sure we are bailing out
with 0 and not -ENOMEM...
Cc: [email protected]
Signed-off-by: Al Viro <[email protected]>
|
|
For block devices the pagecache is associated with the inode
on bdevfs, not with the aliasing ones on the mountable filesystems.
The latter have their own ->i_data empty and ->i_mapping pointing
to the (unique per major/minor) bdevfs inode. That guarantees
cache coherence between all block device inodes with the same
device number.
Eviction of an alias inode has no business trying to evict the
pages belonging to bdevfs one; moreover, ->i_mapping is only
safe to access when the thing is opened. At the time of
->evict_inode() the victim is definitely *not* opened. We are
about to kill the address space embedded into struct inode
(inode->i_data) and that's what we need to empty of any pages.
9p instance tries to empty inode->i_mapping instead, which is
both unsafe and bogus - if we have several device nodes with
the same device number in different places, closing one of them
should not try to empty the (shared) page cache.
Fortunately, other instances in the tree are OK; they are
evicting from &inode->i_data instead, as the 9p one should.
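As an illustration only (not the actual 9p patch), an evict path along the
lines described truncates the address space embedded in the inode rather than
->i_mapping; the helper names below are the usual VFS ones:
	/* illustrative sketch, not the 9p code itself */
	static void example_evict_inode(struct inode *inode)
	{
		/* empty the address space embedded in this inode ... */
		truncate_inode_pages_final(&inode->i_data);
		clear_inode(inode);
		/* ... never inode->i_mapping, which may point at the shared
		 * bdevfs inode for a device node alias */
	}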
Cc: [email protected] # v2.6.32+, ones prior to 2.6.36 need only half of that
Reported-by: "Suzuki K. Poulose" <[email protected]>
Tested-by: "Suzuki K. Poulose" <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
The driver now exposes sufficient limits so we can
avoid having an mlx4-specific work-around.
Signed-off-by: Sagi Grimberg <[email protected]>
Reviewed-by: Steve Wise <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
mlx4 devices (ConnectX-2, ConnectX-3) have a limitation
where rdma read work queue entries cannot exceed 512 bytes.
An rdma_read wqe needs to fit in 512 bytes:
- wqe control segment (16 bytes)
- rdma segment (16 bytes)
- scatter elements (16 bytes each)
So max_sge_rd should be: (512 - 16 - 16) / 16 = 30.
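As a worked example of that arithmetic (the macro names below are made up for
illustration, they are not taken from the mlx4 driver):
	#define MLX4_RD_WQE_MAX_BYTES	512	/* hardware limit described above */
	#define MLX4_CTRL_SEG_BYTES	16
	#define MLX4_RADDR_SEG_BYTES	16
	#define MLX4_DATA_SEG_BYTES	16	/* one scatter element */

	/* (512 - 16 - 16) / 16 = 30 */
	static const int mlx4_max_sge_rd =
		(MLX4_RD_WQE_MAX_BYTES - MLX4_CTRL_SEG_BYTES -
		 MLX4_RADDR_SEG_BYTES) / MLX4_DATA_SEG_BYTES;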
Signed-off-by: Sagi Grimberg <[email protected]>
Reviewed-by: Steve Wise <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Receipt of CM MAD with other than the Send method for an attribute
other than the ClassPortInfo attribute is invalid.
CM attributes other than ClassPortInfo only use the send method.
The SRP initiator does not maintain a timeout policy for CM connect
requests and relies on the CM layer to do that. The result was that
the SRP initiator hung as the connect request never completed.
A new SRP target has been observed to respond to Send CM REQ
with GetResp of CM REQ with bad status. This is non conformant
with IBA spec but exposes a vulnerability in the current MAD/CM
code which will respond to the incoming GetResp of CM REQ as if
it was a valid incoming Send of CM REQ rather than tossing
this on the floor. It also causes the MAD layer not to
retransmit the original REQ even though it has not received a REP.
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Hal Rosenstock <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Ensure that validate_ipv4_net_dev() calls rcu_read_unlock() if
fib_lookup() fails. Detected by sparse. Compile-tested only.
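A minimal sketch of the error-path pattern being fixed (variable names are
assumptions, not the exact cma.c code):
	rcu_read_lock();
	err = fib_lookup(net, &fl4, &res, 0);
	if (err) {
		rcu_read_unlock();	/* the unlock this fix adds */
		return false;
	}
	ret = FIB_RES_DEV(res) == net_dev;
	rcu_read_unlock();
	return ret;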
Fixes: "IB/cma: Validate routing of incoming requests" (commit f887f2ac87c2).
Cc: Haggai Eran <[email protected]>
Cc: stable <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Haggai Eran <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
__unflatten_device_tree() calls unflatten_dt_node(), which declares
a static variable. It is therefore not reentrant.
One of the callers of __unflatten_device_tree(), unflatten_device_tree(),
is only called once during early initialization and does not need to be
protected. The other caller, of_fdt_unflatten_tree(), can be called at
any time, possibly multiple times in parallel. This can happen, for
example, if multiple devicetree overlays have to be loaded and installed.
Without this protection, errors such as the following may be seen.
kernel: End of tree marker overwritten: e6a3a458
kernel: find_target_node: Failed to find target-indirect node at /fragment@0
kernel: __of_overlay_create: of_build_overlay_info() failed for tree@/
Add a mutex to of_fdt_unflatten_tree() to make the call reentrant.
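A rough sketch of that serialization, assuming the usual devicetree entry
points (the mutex name, allocator and exact __unflatten_device_tree()
signature are approximations):
	static DEFINE_MUTEX(of_fdt_unflatten_mutex);

	void of_fdt_unflatten_tree(const unsigned long *blob,
				   struct device_node **mynodes)
	{
		mutex_lock(&of_fdt_unflatten_mutex);
		__unflatten_device_tree(blob, mynodes, &kernel_tree_alloc);
		mutex_unlock(&of_fdt_unflatten_mutex);
	}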
Cc: Pantelis Antoniou <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Cc: [email protected] # v4.1+
Signed-off-by: Rob Herring <[email protected]>
|
|
76e0da3 "usb-gadget/uvc: use per-attribute show and store methods"
removed write permission for writeable attributes. Correct attribute
permissions.
Fixes: 76e0da3 ("usb-gadget/uvc: use per-attribute show and store methods")
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Laurent Pinchart <[email protected]>
Signed-off-by: Mian Yousaf Kaukab <[email protected]>
Signed-off-by: Felipe Balbi <[email protected]>
|
|
If musb_init_controller fails at musb_platform_init, we have already
called pm_runtime_irq_safe for musb and that causes the pm runtime count
to be enabled for parent before the parent has completed initialization.
This causes pm to stop working as on unload nothing gets idled.
This issue can be reproduced at least with:
# modprobe omap2430
HS USB OTG: no transceiver configured
musb-hdrc musb-hdrc.0.auto: musb_init_controller failed with status -517
# modprobe phy-twl4030-usb
# rmmod omap2430
And after the steps above omap2430 will block deeper idle states on
omap3.
To fix this, let's not enable pm runtime until we need to and the
parent has been initialized. Note that this does not fix the issue of
PM being broken for musb during runtime.
Signed-off-by: Tony Lindgren <[email protected]>
Signed-off-by: Felipe Balbi <[email protected]>
|
|
of_match_device() could return NULL, and so cause a NULL pointer
dereference later.
Even if the probability of this case is very low, fixing it makes
static analyzers happy.
Solving this with of_device_get_match_data() also makes the code simpler.
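A hedged before/after sketch of that simplification (the driver data and
match-table names are placeholders):
	/* before: of_match_device() result must be checked for NULL */
	const struct of_device_id *match =
		of_match_device(example_of_match, &pdev->dev);
	if (!match)
		return -ENODEV;
	data = match->data;

	/* after: of_device_get_match_data() folds the NULL handling */
	data = of_device_get_match_data(&pdev->dev);
	if (!data)
		return -ENODEV;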
Reported-by: coverity (CID 1324133)
Signed-off-by: LABBE Corentin <[email protected]>
Signed-off-by: Felipe Balbi <[email protected]>
|
|
Fixes native drm clients like Fedora 23 Wayland which now appears to
be able to use cursor hotspots without strange cursor offsets.
Also fixes a couple of ignored error paths.
Since the core drm cursor hotspot is incompatible with the legacy vmwgfx
hotspot (the core drm hotspot is reset when the drm_mode_cursor ioctl
is used), we need to keep track of both and add them when the device
hotspot is set. We assume that either is always zero.
Signed-off-by: Thomas Hellstrom <[email protected]>
Reviewed-by: Sinclair Yeh <[email protected]>
|
|
We have two of the latest Thinkpad laptop models, both based on the
Intel Skylake platform, and both have the alc293 codec on them. When
the machines boot to the desktop, a greeting dialogue shows up with
the notification sound. But on these two models, there is noise with
the notification sound. We have 3 SKUs for each of the models, and all
of them have this problem.
So far, this problem is specific to these two Thinkpad models; we did
not find this problem on older Thinkpad models with the alc293 or
alc292 codec.
A workaround for this problem is disabling the aamix.
Cc: [email protected]
BugLink: https://bugs.launchpad.net/bugs/1523517
Signed-off-by: Hui Wang <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
|
|
A process element (defined in CAIA) keeps track of the endianness of
contexts through the Little Endian (LE) bit of the State Register. It
is currently set for user contexts, but was somehow forgotten for
kernel contexts, so this patch fixes it.
It could lead to erratic behavior from an AFU when the context is
attached through the kernel API.
Fixes: 2f663527bd6a ("cxl: Configure PSL for kernel contexts and merge code")
Cc: [email protected] # 4.2+
Signed-off-by: Frederic Barrat <[email protected]>
Suggested-by: Michael Neuling <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The OPAL event calls return a mask of events that are active in big
endian format. This is checked when unmasking the events in the
irqchip by comparison with a cached value. The cached value was stored
in big endian format but should've been converted to CPU endian
first.
This bug leads to OPAL event delivery being delayed or dropped on some
systems. Symptoms may include a non-functional console.
The bug is fixed by calling opal_handle_events(...) instead of
duplicating code in opal_event_unmask(...).
Fixes: 9f0fd0499d30 ("powerpc/powernv: Add a virtual irqchip for opal events")
Cc: [email protected] # v4.2+
Reported-by: Douglas L Lehr <[email protected]>
Signed-off-by: Alistair Popple <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
On 12/03/2015 01:18 AM, Christoph Hellwig wrote:
> The patch looks good to me, but while we touch this area, how about
> throwing in a few cosmetic fixes as well?
How about the patch below? In this version of the ib_sg_to_pages() fix
these concerns have been addressed, and additionally two more bugs have been fixed.
------------
[PATCH] IB core: Fix ib_sg_to_pages()
Fix the code for detecting gaps. A gap occurs not only if the
second or later scatterlist element is not aligned but also if
any scatterlist element other than the last does not end at a
page boundary.
In the code for coalescing contiguous elements, ensure that
mr->length is correct and that last_page_addr is up-to-date.
Ensure that this function returns a negative
error code instead of zero if the first set_page() call fails.
Fixes: commit 4c67e2bfc8b7 ("IB/core: Introduce new fast registration API")
Reported-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
After dma_map_sg() has been called the return value of that function
must be used as the number of elements in the scatterlist instead of
scsi_sg_count().
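A minimal sketch of the rule (names follow common SRP code but are
assumptions here):
	count = ib_dma_map_sg(ibdev, scsi_sglist(scmnd),
			      scsi_sg_count(scmnd), scmnd->sc_data_direction);
	if (unlikely(count == 0))
		return -EIO;

	/* from here on, iterate over 'count' mapped entries,
	 * not over scsi_sg_count(scmnd) */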
Fixes: commit f7f7aab1a5c0 ("IB/srp: Convert to new registration API")
Reported-by: Christoph Hellwig <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
Cc: stable <[email protected]> # v4.4+
Cc: Sagi Grimberg <[email protected]>
Cc: Sebastian Parschauer <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Detected by sparse.
Fixes: commit 330179f2fa93 ("IB/srp: Register the indirect data buffer descriptor")
Signed-off-by: Bart Van Assche <[email protected]>
Cc: stable <[email protected]> # v4.3+
Cc: Sagi Grimberg <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Sebastian Parschauer <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Without this sg_dma_len will return 0 on architectures that have
the dma_length field.
Fixes: commit f7f7aab1a5c0 ("IB/srp: Convert to new registration API")
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
When using work request based memory registration (fast_reg)
we must reserve SQ entries for registration and invalidation
in addition to send operations. Each IO consumes 3 SQ entries
(registration, send, invalidation) so we need to allocate 3x
larger send-queue instead of 2x.
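As a small sketch of that sizing (the symbol names are illustrative only):
	/* one REG_MR + one SEND + one LOCAL_INV per IO */
	init_attr.cap.max_send_wr = 3 * cmds_max;	/* was 2 * cmds_max */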
Signed-off-by: Sagi Grimberg <[email protected]>
CC: Stable <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
If srp_connect_ch() returns a positive value then that is considered
by its caller as a connection failure but this does not result in a
scsi_host_put() call and additionally causes the srp_create_target()
function to return a positive value while it should return a negative
value. Avoid all this confusion and additionally fix a memory leak by
ensuring that srp_connect_ch() always returns a value that is <= 0.
This patch avoids that a rejected login triggers the following memory
leak:
unreferenced object 0xffff88021b24a220 (size 8):
comm "srp_daemon", pid 56421, jiffies 4295006762 (age 4240.750s)
hex dump (first 8 bytes):
68 6f 73 74 35 38 00 a5 host58..
backtrace:
[<ffffffff8151014a>] kmemleak_alloc+0x7a/0xc0
[<ffffffff81165c1e>] __kmalloc_track_caller+0xfe/0x160
[<ffffffff81260d2b>] kvasprintf+0x5b/0x90
[<ffffffff81260e2d>] kvasprintf_const+0x8d/0xb0
[<ffffffff81254b0c>] kobject_set_name_vargs+0x3c/0xa0
[<ffffffff81337e3c>] dev_set_name+0x3c/0x40
[<ffffffff81355757>] scsi_host_alloc+0x327/0x4b0
[<ffffffffa03edc8e>] srp_create_target+0x4e/0x8a0 [ib_srp]
[<ffffffff8133778b>] dev_attr_store+0x1b/0x20
[<ffffffff811f27fa>] sysfs_kf_write+0x4a/0x60
[<ffffffff811f1e8e>] kernfs_fop_write+0x14e/0x180
[<ffffffff81176eef>] __vfs_write+0x2f/0xf0
[<ffffffff811771e4>] vfs_write+0xa4/0x100
[<ffffffff81177c64>] SyS_write+0x54/0xc0
[<ffffffff8151b257>] entry_SYSCALL_64_fastpath+0x12/0x6f
Signed-off-by: Bart Van Assche <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Cc: Sebastian Parschauer <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
It was found by Saurabh Sengar that the netlink code tried to allocate
memory with GFP_KERNEL while holding a spinlock. While it is possible
to fix the issue by replacing GFP_KERNEL with GFP_ATOMIC, it is better
to get rid of the spinlock while sending the packet. However, in order
to protect against a race condition that a quick response may be received
before the request is put on the request list, we need to put the request
on the list first.
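A hedged sketch of the reordering described (the lock, list and helper names
are assumptions):
	/* add the request first, so an early reply can always find it */
	spin_lock_irqsave(&request_lock, flags);
	list_add_tail(&query->list, &request_list);
	spin_unlock_irqrestore(&request_lock, flags);

	/* the netlink send may sleep; no spinlock is held here */
	ret = send_netlink_request(query, GFP_KERNEL);
	if (ret) {
		spin_lock_irqsave(&request_lock, flags);
		list_del(&query->list);
		spin_unlock_irqrestore(&request_lock, flags);
	}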
Signed-off-by: Kaike Wan <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Reported-by: Saurabh Sengar <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
do_div is the wrong way to divide a sector_t, as it is less
efficient when sector_t is 32-bit wide. With the upcoming
do_div optimizations, the kernel starts warning about this:
drivers/infiniband/ulp/iser/iser_verbs.c:1296:4: note: in expansion of macro 'do_div'
include/asm-generic/div64.h:224:22: warning: passing argument 1 of '__div64_32' from incompatible pointer type
This changes the code to use sector_div instead, which always
produces optimal code.
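For illustration, the corresponding call pattern (the variable names are
placeholders):
	sector_t off = nbytes;
	sector_div(off, block_size);	/* instead of do_div(off, block_size) */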
Signed-off-by: Arnd Bergmann <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
The current implementation gets a spin_lock, and at any scale with
qib and hfi1 post send, the lock contention grows exponentially
with the number of QPs.
idr_find() is RCU compatible, so reads don't need the lock.
Change to use rcu_read_lock() and rcu_read_unlock() in
__idr_get_uobj().
kfree_rcu() is used to ensure a grace period between the
idr removal and actual free.
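A minimal sketch of the lookup after the change; the surrounding uverbs
context is simplified and the exact signature is an assumption:
	static struct ib_uobject *__idr_get_uobj(struct idr *idr, int id)
	{
		struct ib_uobject *uobj;

		rcu_read_lock();
		uobj = idr_find(idr, id);
		if (uobj)
			kref_get(&uobj->ref);	/* pin before leaving RCU */
		rcu_read_unlock();

		return uobj;
	}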
Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: Mike Marciniszyn <[email protected]>
Reviewed-By: Jason Gunthorpe <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Minor errors found via code inspection during future development.
SFF 8636 defines bit position 2 to hold the status indication of
QSFP memory paging. The mask used to test for the value was
incorrect and is fixed in this patch. Additionally, the dump
function had a mismatch between the field being printed out and
the field used to source the data which was fixed.
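For reference, a sketch of the corrected test (the offset and symbol names
are assumptions; SFF 8636 puts the flat-memory/paging indication in bit 2 of
the status byte):
	#define QSFP_STATUS_BYTE	2
	#define QSFP_PAGING_BIT		0x04	/* bit 2, not 0x02 */

	static bool qsfp_memory_is_flat(const u8 *cache)
	{
		return cache[QSFP_STATUS_BYTE] & QSFP_PAGING_BIT;
	}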
Reviewed-by: Mitko Haralanov <[email protected]>
Reviewed-by: Mike Marciniszyn <[email protected]>
Reported-by: Easwar Hariharan <[email protected]>
Signed-off-by: Easwar Hariharan <[email protected]>
Signed-off-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Commit e622f2f4ad21 ("IB: split struct ib_send_wr")
introduced a regression for HCAs whose user mode post
sends go through ib_uverbs_post_send().
The code didn't account for the fact that the first sge is
offset by an operation dependent length. The allocation did,
but the pointer to the destination sge list is computed without
that knowledge. The sge list copy_from_user() then corrupts
fields in the work request.
Store the operation dependent length in a local variable and
compute the sge list copy_from_user() destination using that length.
Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: Mike Marciniszyn <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
struct qib_mr requires the mr member be the last because struct
qib_mregion contains a dynamic array at the end. The additions
of members should have been placed before this structure as the
comment noted.
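The layout rule in question, sketched from the commit text (the field list is
abridged):
	struct qib_mr {
		struct ib_mr ibmr;
		struct ib_umem *umem;
		/* any new members belong here, before 'mr' ... */
		struct qib_mregion mr;	/* must be last: ends in a dynamic array */
	};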
Failure to do so was causing random memory corruption. Reproducing
this bug was easy to do by running the client and server of
ib_write_bw -s 8 -n 5 on the same node.
This BUG() was tripped in a slab debug kernel:
kernel BUG at mm/slab.c:2572!
Fixes: 38071a461f0a ("IB/qib: Support the new memory registration API")
Reviewed-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
The NFSv4.1 callback channel is currently broken because the receive
message will keep shrinking because the backchannel receive buffer size
never gets reset.
The easiest solution to this problem is instead of changing the receive
buffer, to rather adjust the copied request.
Fixes: 38b7631fbe42 ("nfs4: limit callback decoding to received bytes")
Cc: Benjamin Coddington <[email protected]>
Cc: [email protected]
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Pull virtio fixes from Michael Tsirkin:
"This includes some fixes and cleanups in virtio and vhost code.
Most notably, shadowing the index fixes the excessive cacheline
bouncing observed on AMD platforms"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio_ring: shadow available ring flags & index
virtio: Do not drop __GFP_HIGH in alloc_indirect
vhost: replace % with & on data path
tools/virtio: fix byteswap logic
tools/virtio: move list macro stubs
virtio: fix memory leak of virtio ida cache layers
vhost: relax log address alignment
virtio-net: Stop doing DMA from the stack
|
|
Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Ext4 bug fixes for v4.4, including fixes for post-2038 time encodings,
some endian conversion problems with ext4 encryption, potential memory
leaks after truncate in data=journal mode, and an ocfs2 regression
caused by a jbd2 performance improvement"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
jbd2: fix null committed data return in undo_access
ext4: add "static" to ext4_seq_##name##_fops struct
ext4: fix an endianness bug in ext4_encrypted_follow_link()
ext4: fix an endianness bug in ext4_encrypted_zeroout()
jbd2: Fix unreclaimed pages after truncate in data=journal mode
ext4: Fix handling of extended tv_sec
|
|
Bring the linker script in line with the recent increase of
L1_CACHE_BYTES to 128. Replace the hardcoded value of 64 with the
symbolic constant.
Signed-off-by: Ard Biesheuvel <[email protected]>
Acked-by: Mark Rutland <[email protected]>
[[email protected]: fix up RW_DATA_SECTION as well]
Signed-off-by: Catalin Marinas <[email protected]>
|
|
The LightNVM module exposes a debug interface when CONFIG_NVM_DEBUG is
set. This interface takes a string to configure media managers and
targets. Make sure this interface is only exposed when chosen
deliberately.
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
After the gennvm module has been initialized, it might be attached to
one or several devices. In that case, the module is in use. Make sure
that it cannot be unloaded.
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This patch fixes two issues during media manager registration.
1. The ppa pool can be used at media manager registration. Allocate the
ppa pool before that.
2. If a media manager can't be found, this should not lead to the
device being unallocated. A media manager that can manage the device
can be registered later. Only warn if a media manager fails
initialization.
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In the case where a request queue is passed to the low level lightnvm
device driver integration, the device driver might pass its admin
commands through another queue. Instead pass nvm_dev, and let the
low level driver choose the appropriate queue.
Reported-by: Christoph Hellwig <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
It is not obvious what NVM_IO_* and NVM_BLK_T_* are used for. Make sure
to comment them appropriately, as is done for the other constants.
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The core may issue I/Os before a media manager is registered with
the lightnvm subsystem. Make sure that we don't call the media manager
->end_io prematurely with a null pointer.
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The spin_unlock is duplicated multiple times. Jump to a single unlock
to improve the code flow.
Signed-off-by: Wenwei Tao <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Put the allocated blocks back on the free list
when configuring the luns fails, to make these
blocks usable to others.
Signed-off-by: Wenwei Tao <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
rrpc_get_blk uses the constant 0 as the input parameter
to nvm_get_blk; this may result in getting a gc block
failing unexpectedly.
Signed-off-by: Wenwei Tao <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Improves cacheline transfer flow of available ring header.
Virtqueues are implemented as a pair of rings, one producer->consumer
avail ring and one consumer->producer used ring; preceding the
avail ring in memory are two contiguous u16 fields -- avail->flags
and avail->idx. A producer posts work by writing to avail->idx and
a consumer reads avail->idx.
The flags and idx fields only need to be written by a producer CPU
and only read by a consumer CPU; when the producer and consumer are
running on different CPUs and the virtio_ring code is structured to
only have source writes/sink reads, we can continuously transfer the
avail header cacheline between 'M' states between cores. This flow
optimizes core -> core bandwidth on certain CPUs.
(see: "Software Optimization Guide for AMD Family 15h Processors",
Section 11.6; similar language appears in the 10h guide and should
apply to CPUs w/ exclusive caches, using LLC as a transfer cache)
Unfortunately the existing virtio_ring code issued reads to the
avail->idx and read-modify-writes to avail->flags on the producer.
This change shadows the flags and index fields in producer memory;
the vring code now reads from the shadows and only ever writes to
avail->flags and avail->idx, allowing the cacheline to transfer
core -> core optimally.
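A hedged sketch of the shadowing idea as producer-path fragments (the field
names mirror the virtio_ring code but the details are simplified):
	/* publishing a new available index: bump the private shadow, then
	 * issue a single write to the shared ring */
	vq->avail_idx_shadow++;
	vq->vring.avail->idx = cpu_to_virtio16(vq->vq.vdev, vq->avail_idx_shadow);

	/* suppressing interrupts: read-modify-write the shadow, never the ring */
	if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
		vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
		vq->vring.avail->flags =
			cpu_to_virtio16(vq->vq.vdev, vq->avail_flags_shadow);
	}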
In a concurrent version of vring_bench, the time required for
10,000,000 buffer checkout/returns was reduced by ~2% (average
across many runs) on an AMD Piledriver (15h) CPU:
(w/o shadowing):
Performance counter stats for './vring_bench':
5,451,082,016 L1-dcache-loads
...
2.221477739 seconds time elapsed
(w/ shadowing):
Performance counter stats for './vring_bench':
5,405,701,361 L1-dcache-loads
...
2.168405376 seconds time elapsed
The further away (in a NUMA sense) virtio producers and consumers are
from each other, the more we expect to benefit. Physical implementations
of virtio devices and implementations of virtio where the consumer polls
vring avail indexes (vhost) should also benefit.
Signed-off-by: Venkatesh Srinivas <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
b92b1b89a33c ("virtio: force vring descriptors to be allocated from
lowmem") tried to exclude highmem pages for descriptors so it cleared
__GFP_HIGHMEM from a given gfp mask. The patch also cleared __GFP_HIGH
which doesn't make much sense for this fix because __GFP_HIGH only
controls access to memory reserves and it doesn't have any influence
on the zone selection. Some of the call paths use GFP_ATOMIC and
dropping __GFP_HIGH will reduce their chances of success because of the
lack of access to memory reserves.
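A minimal sketch of the masking after the fix (the function context is
simplified):
	/* only the zone choice is restricted; __GFP_HIGH is left intact so
	 * GFP_ATOMIC callers keep their access to reserves */
	gfp &= ~__GFP_HIGHMEM;	/* was: gfp & ~(__GFP_HIGHMEM | __GFP_HIGH) */
	desc = kmalloc(total_sg * sizeof(struct vring_desc), gfp);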
Signed-off-by: Michal Hocko <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Acked-by: Will Deacon <[email protected]>
Reviewed-by: Mel Gorman <[email protected]>
|
|
We know vring num is a power of 2, so use &
to mask the high bits.
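A one-line illustration of the identity being used (num is the ring size, a
power of two; the variable names are generic):
	idx = val & (num - 1);	/* same as val % num when num is a power of 2 */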
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
commit cf561f0d2eb74574ad9985a2feab134267a9d298 ("virtio: introduce
virtio_is_little_endian() helper") changed byteswap logic to
skip feature bit checks for LE platforms, but didn't
update tools/virtio, so vring_bench started failing.
Update the copy under tools/virtio/ (TODO: find a way to avoid this code
duplication).
Cc: Greg Kurz <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
Makes them more generally available.
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
The virtio core uses a static ida named virtio_index_ida for
assigning index numbers to virtio devices during registration.
The ida core may allocate some internal idr cache layers and
an ida bitmap upon any ida allocation, and all these layers are
truly freed only upon the ida destruction. The virtio_index_ida
is not destroyed at present, leading to a memory leak when using
the virtio core as a module and at least one virtio device is
registered and unregistered.
Fix this by invoking ida_destroy() in the virtio core module
exit.
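A sketch of the module-exit change described (the rest of the exit function
body is an assumption):
	static void __exit virtio_exit(void)
	{
		bus_unregister(&virtio_bus);
		ida_destroy(&virtio_index_ida);	/* frees the cached layers/bitmap */
	}
	module_exit(virtio_exit);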
Cc: [email protected]
Signed-off-by: Suman Anna <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
commit 5d9a07b0de512b77bf28d2401e5fe3351f00a240 ("vhost: relax used
address alignment") fixed the alignment for the used virtual address,
but not for the physical address used for logging.
That's a mistake: alignment should clearly be the same for virtual and
physical addresses.
Cc: [email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
page reads
Every attempt to issue a read log page command locks up the controller.
The command is currently sent if the sata device includes the devslp feature
to read out the timing data.
This attempt to read the data locks up the controller and the device
is not recognized correctly (failed to set xfermode) and cannot be accessed.
This was found on Freescale P1013/P1022 and T4240 CPUs
using a ATP IG mSATA 4GB with the devslp feature.
fsl-sata ff718000.sata: Sata FSL Platform/CSB Driver init
[ 1.254195] scsi0 : sata_fsl
[ 1.256004] ata1: SATA max UDMA/133 irq 74
[ 1.370666] fsl-gianfar ethernet.3: enabled errata workarounds, flags: 0x4
[ 1.470671] fsl-gianfar ethernet.4: enabled errata workarounds, flags: 0x4
[ 1.775584] ata1: Signature Update detected @ 504 msecs
[ 1.947594] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1.948366] ata1.00: ATA-8: ATP IG mSATA, 20150311, max UDMA/133
[ 1.948371] ata1.00: 7732368 sectors, multi 0: LBA
[ 1.948843] ata1.00: failed to get Identify Device Data, Emask 0x1
[ 1.948857] ata1.00: failed to set xfermode (err_mask=0x40)
[ 7.467557] ata1: Signature Update detected @ 504 msecs
[ 7.639560] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 7.651320] ata1.00: failed to get Identify Device Data, Emask 0x1
[ 7.651360] ata1.00: failed to set xfermode (err_mask=0x40)
[ 7.655628] ata1: limiting SATA link speed to 1.5 Gbps
[ 7.659458] ata1.00: limiting speed to UDMA/133:PIO3
[ 13.163554] ata1: Signature Update detected @ 504 msecs
[ 13.335558] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 13.347298] ata1.00: failed to get Identify Device Data, Emask 0x1
[ 13.347334] ata1.00: failed to set xfermode (err_mask=0x40)
[ 13.351601] ata1.00: disabled
[ 13.353278] ata1: exception Emask 0x50 SAct 0x0 SErr 0x800 action 0x6 frozen t4
[ 13.359281] ata1: SError: { HostInt }
[ 13.361644] ata1: hard resetting link
Signed-off-by: Andreas Werner <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|
|
log page
Some controllers lock up on an ata_read_log_page.
Add a new ata port flag ATA_FLAG_NO_LOG_PAGE which can be used
to blacklist a controller.
If this flag is set, any attempt to read a log page returns an error
without actually issuing the command.
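A hedged sketch of the guard, roughly as it could appear at the top of the
read-log-page path (the exact error handling is an assumption):
	if (dev->link->ap->flags & ATA_FLAG_NO_LOG_PAGE)
		return AC_ERR_DEV;	/* refuse without touching the device */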
Signed-off-by: Andreas Werner <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|