Age | Commit message (Collapse) | Author | Files | Lines |
|
A minor version number increase should not break backwards
compatibility.
Fixes: 3cb98f84d368b ("lightnvm: add minor version to generic geometry")
Reviewed-by: Javier González <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Pull NVMe changes from Christoph:
"This contains the support for TP4004, Asymmetric Namespace Access,
which makes NVMe multipathing usable in practice."
* 'nvme-4.19' of git://git.infradead.org/nvme:
nvmet: use Retain Async Event bit to clear AEN
nvmet: support configuring ANA groups
nvmet: add minimal ANA support
nvmet: track and limit the number of namespaces per subsystem
nvmet: keep a port pointer in nvmet_ctrl
nvme: add ANA support
nvme: remove nvme_req_needs_failover
nvme: simplify the API for getting log pages
nvme.h: add ANA definitions
nvme.h: add support for the log specific field
Signed-off-by: Jens Axboe <[email protected]>
|
|
Pull in 4.18-rc6 to get the NVMe core AEN change to avoid a
merge conflict down the line.
Signed-of-by: Jens Axboe <[email protected]>
|
|
To avoid introducing problems like those fixed in commit f7068114d45e
("sr: pass down correctly sized SCSI sense buffer"), this creates a macro
wrapper for scsi_execute() that verifies the size of the sense buffer
similar to what was done for command string sizes in commit 3756f6401c30
("exec: avoid gcc-8 warning for get_task_comm").
Another solution could be to add a length argument to scsi_execute(),
but this function already takes a lot of arguments and Jens was not fond
of that approach.
Additionally, this moves the SCSI_SENSE_BUFFERSIZE definition into
scsi_device.h, and removes a redundant include for scsi_device.h from
scsi_cmnd.h.
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
To support future compile-time sizeof() checks that will be able to
validate the length of sense buffers, this removes the only dynamically
allocated sense buffers in the tree by putting the 96 byte sense buffers
on the stack.
Reviewed-by: Christoph Hellwig <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This removes more casts of struct request_sense and uses the standard
struct scsi_sense_hdr instead. This also fixes any possible stale values
since the prior code did not check the sense length.
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This is already able to process the sense buffer, so remove the redundant
parsing during the failure path. This also fixes any possible stale values
since the prior code did not check the sense length.
Acked-by: David S. Miller <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
There is a lot of needless struct request_sense usage in the CDROM
code. These can all be struct scsi_sense_hdr instead, to avoid any
confusion over their respective structure sizes. This patch is a lot
of noise changing "sense" to "sshdr", but the final code is more
readable to distinguish between "sense" meaning "struct request_sense"
and "sshdr" meaning "struct scsi_sense_hdr".
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The core target code only needs code from scsi_common.c, which is now
separately selectable.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Split scsi_common.o out of SCSI so that non-SCSI users can pull it in
easily for future sense buffer helper usage.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This removes the unused sense buffer in read_cap16() and write_same16().
Reviewed-by: Christoph Hellwig <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This drops unused sense buffers from:
cdrom_eject()
cdrom_read_capacity()
cdrom_read_tocentry()
ide_cd_lockdoor()
ide_cd_read_toc()
Acked-by: David S. Miller <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The passed 'nr' from userspace represents the total depth, meantime
inside 'struct blk_mq_tags', 'nr_tags' stores the total tag depth,
and 'nr_reserved_tags' stores the reserved part.
There are two issues in blk_mq_tag_update_depth() now:
1) for growing tags, we should have used the passed 'nr', and keep the
number of reserved tags not changed.
2) the passed 'nr' should have been used for checking against
'tags->nr_tags', instead of number of the normal part.
This patch fixes the above two cases, and avoids kernel crash caused
by wrong resizing sbitmap queue.
Cc: "Ewan D. Milne" <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Bart Van Assche <[email protected]>
Cc: Omar Sandoval <[email protected]>
Tested by: Marco Patalano <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Runtime PM isn't ready for blk-mq yet, and commit 765e40b675a9 ("block:
disable runtime-pm for blk-mq") tried to disable it. Unfortunately,
it can't take effect in that way since user space still can switch
it on via 'echo auto > /sys/block/sdN/device/power/control'.
This patch disables runtime-pm for blk-mq really by pm_runtime_disable()
and fixes all kinds of PM related kernel crash.
Cc: Tomas Janousek <[email protected]>
Cc: Przemek Socha <[email protected]>
Cc: Alan Stern <[email protected]>
Cc: <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Tested-by: Patrick Steinhardt <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Addresses-Coverity-ID: 114722 ("Missing break in switch")
Signed-off-by: Gustavo A. R. Silva <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Currently, avg_lat is calculated by accumulating the mean of every
window in a long running cumulative average. As time goes on, the metric
becomes less and less useful due to the accumulated history.
This patch reuses the same calculation done in load averages to make the
avg_lat metric more lively. Unlike load averages, the avg only advances
when a window elapses (due to an io). Idle periods extend the most
recent window. Bucketing is used to limit the history of avg_lat by
binding it to the window size. So, the window range for 1/exp (decay
rate) is [1 min, 2.5 min) when windows elapse immediately.
The current sample window size is exposed in the debug info to enable
calculation of the window range.
Signed-off-by: Dennis Zhou <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Josef Bacik <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
We were hitting a panic in production where we put too many times on the
request queue. This is because we'd get the throttle_queue of the
parent if we fork()'ed while we needed to be throttled, but we didn't
have a reference on it. Instead just clear these flags on fork so the
child doesn't pay for the sins of its father.
Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The blkg lifetime is protected by the queue lifetime, so we need to put
the queue _after_ we're done using the blkg.
Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
At this point we have a ref on the blkg, we need to drop it if we don't
have a iolat.
Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Simplify the code by using the PTR_ERR_OR_ZERO, instead of the
open code. It is better.
Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: zhong jiang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Fixes a link failure whtn BLK_DEV_INTEGRITY isn't defined.
Fixes: 10c41ddd6132 ("block: move dif_prepare/dif_complete functions to block layer")
Signed-off-by: Jens Axboe <[email protected]>
|
|
We find the memory use-after-free issue in __blk_drain_queue()
on the kernel 4.14. After read the latest kernel 4.18-rc6 we
think it has the same problem.
Memory is allocated for q->fq in the blk_init_allocated_queue().
If the elevator init function called with error return, it will
run into the fail case to free the q->fq.
Then the __blk_drain_queue() uses the same memory after the free
of the q->fq, it will lead to the unpredictable event.
The patch is to set q->fq as NULL in the fail case of
blk_init_allocated_queue().
Fixes: commit 7c94e1c157a2 ("block: introduce blk_flush_queue to drive flush machinery")
Cc: <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Signed-off-by: xiao jin <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Also moved the logic of the remapping to the nvme core driver instead
of implementing it in the nvme pci driver. This way all the other nvme
transport drivers will benefit from it (in case they'll implement metadata
support).
Suggested-by: Christoph Hellwig <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Acked-by: Keith Busch <[email protected]>
Signed-off-by: Max Gurtovoy <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Currently these functions are implemented in the scsi layer, but their
actual place should be the block layer since T10-PI is a general data
integrity feature that is used in the nvme protocol as well. Also, use
the tuple size from the integrity profile since it may vary between
integrity types.
Suggested-by: Christoph Hellwig <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Signed-off-by: Max Gurtovoy <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Currently this function is implemented in the scsi layer, but it's
actual place should be the block layer since T10-PI is a general
data integrity feature that is used in the nvme protocol as well.
Suggested-by: Christoph Hellwig <[email protected]>
Cc: Martin K. Petersen <[email protected]>
Signed-off-by: Max Gurtovoy <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
We need to check in blkcg_bio_issue_check if the bio is flagged as
QUEUE_ENTERED, because if it is then we've already accounted for the
size of the IO in the cgroup stats. We can still however account for
the extra IO since it'll be another request.
Reported-by: Tejun Heo <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
User controls @dev_minor which to be used as index of pkt_devs.
So, It can be exploited via Spectre-like attack. (speculative execution)
This kind of attack leaks address of pkt_devs, [1]
It leads an attacker to bypass security mechanism such as KASLR.
So sanitize @dev_minor before using it to prevent attack.
[1] https://github.com/jinb-park/linux-exploit/
tree/master/exploit-remaining-spectre-gadget/leak_pkt_devs.c
Signed-off-by: Jinbum Park <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In the current implementation, we clear the AEN bit when we get the
"get log page" command if given log page is associated with AEN.
This patch allows optionally retaining the AEN for the ctrl
under consideration when Retain Asynchronous Event (RAE) bit is set
as a part of "get log page" command.
This allows the host to read the Log page and optionally retaining the
AEN associated with this log page when using userspace tools like
nvme-cli.
Signed-off-by: Chaitanya Kulkarni <[email protected]>
[hch: also use the new helper in the just merged ANA code]
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
Allow creating non-default ANA groups (group ID > 1). Groups are created
either by assigning the group ID to a namespace, or by creating a configfs
group object under a specific port. All namespaces assigned to a group
that doesn't have a configfs object for a given port are marked as
inaccessible.
Allow changing the ANA state on a per-port basis by creating an
ana_groups directory under each port, and another directory with an
ana_state file in it. The default ANA group 1 directory is created
automatically for each port.
For all changes in ANA configuration the ANA change AEN is sent. We only
keep a global changecount instead of additional per-group changecounts to
keep the implementation as simple as possible.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
|
|
Add support for Asynchronous Namespace Access as specified in NVMe 1.3
TP 4004.
Just add a default ANA group 1 that is optimized on all ports. This is
(and will remain) the default assignment for any namespace not epxlicitly
assigned to another ANA group. The ANA state can be manually changed
through the configfs interface, including the change state.
Includes fixes and improvements from Hannes Reinecke.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
|
|
TP 4004 introduces a new 'Maximum Number of Allocated Namespaces' field
in the Identify controller data to help the host size resources. Put
an upper limit on the supported namespaces to be able to support this
value as supporting 32-bits worth of namespaces would lead to very
large buffers. The limit is completely arbitrary at this point.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
|
|
This will be needed for the ANA AEN code.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
|
|
Add support for Asynchronous Namespace Access as specified in NVMe 1.3
TP 4004. With ANA each namespace attached to a controller belongs to an
ANA group that describes the characteristics of accessing the namespaces
through this controller. In the optimized and non-optimized states
namespaces can be accessed regularly, although in a multi-pathing
environment we should always prefer to access a namespace through a
controller where an optimized relationship exists. Namespaces in
Inaccessible, Permanent-Loss or Change state for a given controller
should not be accessed.
The states are updated through reading the ANA log page, which is read
once during controller initialization, whenever the ANA change notice
AEN is received, or when one of the ANA specific status codes that
signal a state change is received on a command.
The ANA state is kept in the nvme_ns structure, which makes the checks in
the fast path very simple. Updating the ANA state when reading the log
page is also very simple, the only downside is that finding the initial
ANA state when scanning for namespaces is a bit cumbersome.
The gendisk for a ns_head is only registered once a live path for it
exists. Without that the kernel would hang during partition scanning.
Includes fixes and improvements from Hannes Reinecke.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
|
|
Now that we just call out to blk_path_error there isn't really any good
reason to not merge it into the only caller.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
|
|
Merge nvme_get_log and nvme_get_log_ext into a single helper, which takes
a plain nsid instead of the nvme_ns pointer. Also add support for the
log specific field while we're at it.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
|
|
Add various defintions from NVMe 1.3 TP 4004.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
|
|
NVMe 1.3 added a new log specific field to the get log page CQ
defintion, add it to our get_log_page SQ structure.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
|
|
Even if properly initialized, the lvname array (i.e., strings)
is read from disk, and might contain corrupt data (e.g., lack
the null terminating character for strings).
So, make sure the partition name string used in pr_warn() has
the null terminating character.
Fixes: 6ceea22bbbc8 ("partitions: add aix lvm partition support files")
Suggested-by: Daniel J. Axtens <[email protected]>
Signed-off-by: Mauricio Faria de Oliveira <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The if-block that sets a successful return value in aix_partition()
uses 'lvip[].pps_per_lv' and 'n[].name' potentially uninitialized.
For example, if 'numlvs' is zero or alloc_lvn() fails, neither is
initialized, but are used anyway if alloc_pvd() succeeds after it.
So, make the alloc_pvd() call conditional on their initialization.
This has been hit when attaching an apparently corrupted/stressed
AIX LUN, misleading the kernel to pr_warn() invalid data and hang.
[...] partition (null) (11 pp's found) is not contiguous
[...] partition (null) (2 pp's found) is not contiguous
[...] partition (null) (3 pp's found) is not contiguous
[...] partition (null) (64 pp's found) is not contiguous
Fixes: 6ceea22bbbc8 ("partitions: add aix lvm partition support files")
Signed-off-by: Mauricio Faria de Oliveira <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The get_seconds function is deprecated now since it returns a 32-bit
value that will eventually overflow, and we are replacing it throughout
the kernel with ktime_get_seconds() or ktime_get_real_seconds() that
return a time64_t.
bcache uses get_seconds() to read the current system time and store it in
the superblock as well as in uuid_entry structures that are user visible.
Unfortunately, the two structures in are still limited to 32 bits, so this
won't fix any real problems but will still overflow in year 2106. Let's
at least document that properly, in case we get an updated format in the
future it can be fixed. We still have a long time before the overflow
and checking the tools at https://github.com/koverstreet/bcache-tools
reveals no access to any of them.
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Coly Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Fixes an error condition reported by checkpatch.pl which is caused by
assigning a variable in an if condition.
Signed-off-by: Florian Schmaus <[email protected]>
Signed-off-by: Coly Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Fixes an error condition reported by checkpatch.pl which is caused by
assigning a variable in an if condition.
Signed-off-by: Florian Schmaus <[email protected]>
Signed-off-by: Coly Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Free the cache_set->flush_bree heap memory on journal free.
Signed-off-by: Wang Sheng-Hui <[email protected]>
Signed-off-by: Coly Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Fixes an error condition reported by checkpatch.pl which is caused by
assigning a variable in an if condition.
Signed-off-by: Florian Schmaus <[email protected]>
Signed-off-by: Coly Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
I attached several backend devices in the same cache set, and produced lots
of dirty data by running small rand I/O writes in a long time, then I
continue run I/O in the others cached devices, and stopped a cached device,
after a mean while, I register the stopped device again, I see the running
I/O in the others cached devices dropped significantly, sometimes even
jumps to zero.
In currently code, bcache would traverse each keys and btree node to count
the dirty data under read locker, and the writes threads can not get the
btree write locker, and when there is a lot of keys and btree node in the
registering device, it would last several seconds, so the write I/Os in
others cached device are blocked and declined significantly.
In this patch, when a device registering to a ache set, which exist others
cached devices with running I/Os, we get the amount of dirty data of the
device in an incremental way, and do not block other cached devices all the
time.
Patch v2: Rename some variables and macros name as Coly suggested.
Signed-off-by: Tang Junhui <[email protected]>
Signed-off-by: Coly Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
of btree nodes
This patch base on "[PATCH] bcache: finish incremental GC".
Since incremental GC would stop 100ms when front side I/O comes, so when
there are many btree nodes, if GC only processes constant (100) nodes each
time, GC would last a long time, and the front I/Os would run out of the
buckets (since no new bucket can be allocated during GC), and I/Os be
blocked again.
So GC should not process constant nodes, but varied nodes according to the
number of btree nodes. In this patch, GC is divided into constant (100)
times, so when there are many btree nodes, GC can process more nodes each
time, otherwise GC will process less nodes each time (but no less than
MIN_GC_NODES).
Signed-off-by: Tang Junhui <[email protected]>
Signed-off-by: Coly Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In GC thread, we record the latest GC key in gc_done, which is expected
to be used for incremental GC, but in currently code, we didn't realize
it. When GC runs, front side IO would be blocked until the GC over, it
would be a long time if there is a lot of btree nodes.
This patch realizes incremental GC, the main ideal is that, when there
are front side I/Os, after GC some nodes (100), we stop GC, release locker
of the btree node, and go to process the front side I/Os for some times
(100 ms), then go back to GC again.
By this patch, when we doing GC, I/Os are not blocked all the time, and
there is no obvious I/Os zero jump problem any more.
Patch v2: Rename some variables and macros name as Coly suggested.
Signed-off-by: Tang Junhui <[email protected]>
Signed-off-by: Coly Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Currently we calculate the total amount of flash only devices dirty data
by adding the dirty data of each flash only device under registering
locker. It is very inefficient.
In this patch, we add a member flash_dev_dirty_sectors in struct cache_set
to record the total amount of flash only devices dirty data in real time,
so we didn't need to calculate the total amount of dirty data any more.
Signed-off-by: Tang Junhui <[email protected]>
Signed-off-by: Coly Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
ondemand_readahead() checks bdi->io_pages to cap the maximum pages
that need to be processed. This works until the readit section. If
we would do an async only readahead (async size = sync size) and
target is at beginning of window we expand the pages by another
get_next_ra_size() pages. Btrace for large reads shows that kernel
always issues a doubled size read at the beginning of processing.
Add an additional check for io_pages in the lower part of the func.
The fix helps devices that hard limit bio pages and rely on proper
handling of max_hw_read_sectors (e.g. older FusionIO cards). For
that reason it could qualify for stable.
Fixes: 9491ae4a ("mm: don't cap request size based on read-ahead setting")
Cc: [email protected]
Signed-off-by: Markus Stockhausen [email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
When the underlying device is a 4 KiB logical block size device with a
protection interval exponent of 0, i.e. 4096 bytes data + 8 bytes PI, the
driver miscalculates the pi_bytes{out,in} by a factor of 8x (64 bytes).
This leads to errors on all reads and writes on 4 KiB logical block size
devices when CONFIG_BLK_DEV_INTEGRITY is enabled and the
VIRTIO_SCSI_F_T10_PI feature bit has been negotiated.
Fixes: e6dc783a38ec0 ("virtio-scsi: Enable DIF/DIX modes in SCSI host LLD")
Acked-by: Martin K. Petersen <[email protected]>
Signed-off-by: Greg Edwards <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|