Age | Commit message (Collapse) | Author | Files | Lines |
|
bio_integrity_add_page() returns the add length if successful, else 0,
just as bio_add_page. Simply check return value checking in
bio_integrity_prep to not deal with a > 0 but < len case that can't
happen.
Cc: Christoph Hellwig <[email protected]>
Cc: Martin K. Petersen <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jinyoung Choi <[email protected]>
Tested-by: "Martin K. Petersen" <[email protected]>
Reviewed-by: "Martin K. Petersen" <[email protected]>
Link: https://lore.kernel.org/r/20230803025058epcms2p5a4d0db5da2ad967668932d463661c633@epcms2p5
Signed-off-by: Jens Axboe <[email protected]>
|
|
Previously, the bip's bi_size has been set before an integrity pages
were added. If a problem occurs in the process of adding pages for
bip, the bi_size mismatch problem must be dealt with.
When the page is successfully added to bvec, the bi_size is updated.
The parts affected by the change were also contained in this commit.
Cc: Christoph Hellwig <[email protected]>
Cc: Martin K. Petersen <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jinyoung Choi <[email protected]>
Tested-by: "Martin K. Petersen" <[email protected]>
Reviewed-by: "Martin K. Petersen" <[email protected]>
Link: https://lore.kernel.org/r/20230803024956epcms2p38186a17392706650c582d38ef3dbcd32@epcms2p3
Signed-off-by: Jens Axboe <[email protected]>
|
|
This will be used for multi-page configuration for integrity payload.
Cc: Christoph Hellwig <[email protected]>
Cc: Martin K. Petersen <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jinyoung Choi <[email protected]>
Tested-by: "Martin K. Petersen" <[email protected]>
Reviewed-by: "Martin K. Petersen" <[email protected]>
Link: https://lore.kernel.org/r/20230803024827epcms2p838d9e9131492c86a159fff25d195658f@epcms2p8
Signed-off-by: Jens Axboe <[email protected]>
|
|
The original formula was inaccurate:
dd->async_depth = max(1UL, 3 * q->nr_requests / 4);
For write requests, when we assign a tags from sched_tags,
data->shallow_depth will be passed to sbitmap_find_bit,
see the following code:
nr = sbitmap_find_bit_in_word(&sb->map[index],
min_t (unsigned int,
__map_depth(sb, index),
depth),
alloc_hint, wrap);
The smaller of data->shallow_depth and __map_depth(sb, index)
will be used as the maximum range when allocating bits.
For a mmc device (one hw queue, deadline I/O scheduler):
q->nr_requests = sched_tags = 128, so according to the previous
calculation method, dd->async_depth = data->shallow_depth = 96,
and the platform is 64bits with 8 cpus, sched_tags.bitmap_tags.sb.shift=5,
sb.maps[]=32/32/32/32, 32 is smaller than 96, whether it is a read or
a write I/O, tags can be allocated to the maximum range each time,
which has not throttling effect.
In addition, refer to the methods of bfg/kyber I/O scheduler,
limit ratiois are calculated base on sched_tags.bitmap_tags.sb.shift.
This patch can throttle write requests really.
Fixes: 07757588e507 ("block/mq-deadline: Reserve 25% of scheduler tags for synchronous requests")
Signed-off-by: Zhiguo Niu <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Add zoned storage support to ublk: report_zones and operations:
- REQ_OP_ZONE_OPEN
- REQ_OP_ZONE_CLOSE
- REQ_OP_ZONE_FINISH
- REQ_OP_ZONE_RESET
- REQ_OP_ZONE_APPEND
The zone append feature uses the `addr` field of `struct ublksrv_io_cmd` to
communicate ALBA back to the kernel. Therefore ublk must be used with the
user copy feature (UBLK_F_USER_COPY) for zoned storage support to be
available. Without this feature, ublk will not allow zoned storage support.
Signed-off-by: Andreas Hindborg <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Tested-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
In preparation for zoned storage support, move the check for empty `addr`
field into the command handler case statement. Note that the check makes no
sense for `UBLK_IO_NEED_GET_DATA` because the `addr` field must always be
set for this command.
Signed-off-by: Andreas Hindborg <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
This will be used by ublk zoned storage support.
Signed-off-by: Andreas Hindborg <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
The iocg can have three throttled metrics: wait, debt, delay. This patch
add missing wait_ms to IocgStat to show the latest wait_ms of iocg.
As we are here, group iocg usage percents "inflt%" and "usage%" together,
and group iocg throttled metrics "wait", "debt" and "delay" together.
Effect after changes:
nvme0n1 RUN per=50.0ms cur_per=177105.713:v1053528.587 busy= +0 vrate=135.00%:270.00% params=ssd_dfl(CQ)
active weight hweight% inflt% usage% wait debt delay
InterfererGroup0 * 100/ 100 54.28/ 9.09 0.34 24.07 0.00 0.00 0.00
interfered * 84/ 1000 45.72/ 90.91 0.48 41.09 0.00 0.00 0.00
Signed-off-by: Chengming Zhou <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
The real vrate iocost inuse is not base_vrate, but the atomic vtime_rate.
We need iocost_monitor tool to display this real vrate that iocost use,
to check if the boosted compensated vrate is normal.
Effect after change:
nvme0n1 RUN per=50.0ms cur_per=172116.580:v1040587.433 busy= +0 \
vrate=135.00%:270.00% params=ssd_dfl(CQ)
^
|
this is real vrate inuse
Signed-off-by: Chengming Zhou <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
When I use iocost_monitor on nvme0n1, this error shows up:
"Could not find ioc for nvme0n1"
There is no kobj in struct queue in recent kernel, it seems that the commit
2bd85221a625 ("block: untangle request_queue refcounting from sysfs")
move the queue kobj to struct gendisk.
Fix it by using mq_kobj which is at the same level with queue kobj.
Signed-off-by: Chengming Zhou <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
There are some compile errors reported by kernel test robot:
arm-linux-gnueabi-ld: fs/romfs/storage.o: in function `romfs_dev_read':
storage.c:(.text+0x64): undefined reference to `__brelse'
arm-linux-gnueabi-ld: storage.c:(.text+0x9c): undefined reference to `__bread_gfp'
arm-linux-gnueabi-ld: fs/romfs/storage.o: in function `romfs_dev_strnlen':
storage.c:(.text+0x128): undefined reference to `__brelse'
arm-linux-gnueabi-ld: storage.c:(.text+0x16c): undefined reference to `__bread_gfp'
arm-linux-gnueabi-ld: fs/romfs/storage.o: in function `romfs_dev_strcmp':
storage.c:(.text+0x22c): undefined reference to `__bread_gfp'
arm-linux-gnueabi-ld: storage.c:(.text+0x27c): undefined reference to `__brelse'
arm-linux-gnueabi-ld: storage.c:(.text+0x2a8): undefined reference to `__bread_gfp'
arm-linux-gnueabi-ld: storage.c:(.text+0x2bc): undefined reference to `__brelse'
arm-linux-gnueabi-ld: storage.c:(.text+0x2d4): undefined reference to `__brelse'
arm-linux-gnueabi-ld: storage.c:(.text+0x2f4): undefined reference to `__brelse'
arm-linux-gnueabi-ld: storage.c:(.text+0x304): undefined reference to `__brelse'
The reason for the problem is that the commit
"925c86a19bac" ("fs: add CONFIG_BUFFER_HEAD") has added a new config
"CONFIG_BUFFER_HEAD" that controls building the buffer_head code, and
romfs needs to use the buffer_head API, but no corresponding config has
beed added. Select the config "CONFIG_BUFFER_HEAD" in romfs Kconfig to
resolve the problem.
Fixes: 925c86a19bac ("fs: add CONFIG_BUFFER_HEAD")
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Reviewed-by: Luis Chamberlain <[email protected]>
Tested-by: Li Zetao <[email protected]>
Signed-off-by: Li Zetao <[email protected]>
[axboe: fold in Christoph's incremental]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Add a new config option that controls building the buffer_head code, and
select it from all file systems and stacking drivers that need it.
For the block device nodes and alternative iomap based buffered I/O path
is provided when buffer_head support is not enabled, and iomap needs a
a small tweak to define the IOMAP_F_BUFFER_HEAD flag to 0 to not call
into the buffer_head code when it doesn't exist.
Otherwise this is just Kconfig and ifdef changes.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Use iomap in buffer_head compat mode to write to block devices.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Reviewed-by: Pankaj Raghav <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Christian Brauner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Direct I/O on block devices now nevers goes through aops->direct_IO.
Stop setting it and set the FMODE_CAN_ODIRECT in ->open instead.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Open code __generic_file_write_iter to remove the indirect call into
->direct_IO and to prepare using the iomap based write code.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Christian Brauner <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
block_page_mkwrite_return is neither block nor mkwrite specific, and
should not be under CONFIG_BLOCK. Move it to mm.h and rename it to
vmf_fs_error.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Christian Brauner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Fold emergency_thaw_bdev into it's only caller, to prepare for buffer.c
to be built only when buffer_head support is enabled.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Christian Brauner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.6/block
Pull MD updates from Song:
"1. Deprecate bitmap file support, by Christoph Hellwig;
2. Fix deadlock with md sync thread, by Yu Kuai;
3. Refactor md io accounting, by Yu Kuai;
4. Various non-urgent fixes by Li Nan, Yu Kuai, and Jack Wang."
* tag 'md-next-20230729' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: (36 commits)
md/md-bitmap: hold 'reconfig_mutex' in backlog_store()
md/md-bitmap: remove unnecessary local variable in backlog_store()
md/raid10: use dereference_rdev_and_rrdev() to get devices
md/raid10: factor out dereference_rdev_and_rrdev()
md/raid10: check replacement and rdev to prevent submit the same io twice
md/raid1: Avoid lock contention from wake_up()
md: restore 'noio_flag' for the last mddev_resume()
md: don't quiesce in mddev_suspend()
md: remove redundant check in fix_read_error()
md/raid10: optimize fix_read_error
md/raid1: prioritize adding disk to 'removed' mirror
md/md-faulty: enable io accounting
md/md-linear: enable io accounting
md/md-multipath: enable io accounting
md/raid10: switch to use md_account_bio() for io accounting
md/raid1: switch to use md_account_bio() for io accounting
raid5: fix missing io accounting in raid5_align_endio()
md: also clone new io if io accounting is disabled
md: move initialization and destruction of 'io_acct_set' to md.c
md: deprecate bitmap file support
...
|
|
Several reasons why 'reconfig_mutex' should be held:
1) rdev_for_each() is not safe to be called without the lock, because
rdev can be removed concurrently.
2) mddev_destroy_serial_pool() and mddev_create_serial_pool() should not
be called concurrently.
3) mddev_suspend() from mddev_destroy/create_serial_pool() should be
protected by the lock.
Fixes: 10c92fca636e ("md-bitmap: create and destroy wb_info_pool with the change of backlog")
Signed-off-by: Yu Kuai <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Song Liu <[email protected]>
|
|
Local variable is definied first in the beginning of backlog_store(),
there is no need to define it again.
Fixes: 8c13ab115b57 ("md/bitmap: don't set max_write_behind if there is no write mostly device")
Signed-off-by: Yu Kuai <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Song Liu <[email protected]>
|
|
Commit 2ae6aaf76912 ("md/raid10: fix io loss while replacement replace
rdev") reads replacement first to prevent io loss. However, there are same
issue in wait_blocked_dev() and raid10_handle_discard(), too. Fix it by
using dereference_rdev_and_rrdev() to get devices.
Fixes: d30588b2731f ("md/raid10: improve raid10 discard request")
Fixes: f2e7e269a752 ("md/raid10: pull the code that wait for blocked dev into one function")
Signed-off-by: Li Nan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Song Liu <[email protected]>
|
|
Factor out a helper to get 'rdev' and 'replacement' from config->mirrors.
Just to make code cleaner and prepare to fix the bug of io loss while
'replacement' replace 'rdev'.
There is no functional change.
Signed-off-by: Li Nan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Song Liu <[email protected]>
|
|
After commit 4ca40c2ce099 ("md/raid10: Allow replacement device to be
replace old drive."), 'rdev' and 'replacement' could appear to be
identical. There are already checks for that in wait_blocked_dev() and
raid10_write_request(). Add check for raid10_handle_discard() now.
Signed-off-by: Li Nan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Song Liu <[email protected]>
|
|
wake_up is called unconditionally in a few paths such as make_request(),
which cause lock contention under high concurrency workload like below
raid1_end_write_request
wake_up
__wake_up_common_lock
spin_lock_irqsave
Improve performance by only call wake_up() if waitqueue is not empty
Fio test script:
[global]
name=random reads and writes
ioengine=libaio
direct=1
readwrite=randrw
rwmixread=70
iodepth=64
buffered=0
filename=/dev/md0
size=1G
runtime=30
time_based
randrepeat=0
norandommap
refill_buffers
ramp_time=10
bs=4k
numjobs=400
group_reporting=1
[job1]
Test result with 2 ramdisk in raid1 on a Intel Broadwell 56 cores server.
Before this patch With this patch
READ BW=4621MB/s BW=7337MB/s
WRITE BW=1980MB/s BW=3144MB/s
The patch is inspired by Yu Kuai's change for raid10:
https://lore.kernel.org/r/[email protected]
Cc: Yu Kuai <[email protected]>
Signed-off-by: Jack Wang <[email protected]>
Reviewed-by: Yu Kuai <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Song Liu <[email protected]>
|
|
memalloc_noio_save() is called for the first mddev_suspend(), and
repeated mddev_suspend() only increase 'suspended'. However,
memalloc_noio_restore() is also called for the first mddev_resume(),
which means that memory reclaim will be enabled before the last
mddev_resume() is called, while the array is still suspended.
Fix this problem by restore 'noio_flag' for the last mddev_resume().
Fixes: 78f57ef9d50a ("md: use memalloc scope APIs in mddev_suspend()/mddev_resume()")
Signed-off-by: Yu Kuai <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Song Liu <[email protected]>
|
|
Some levels doesn't implement "pers->quiesce", for example
raid0_quiesce() is empty, and now that all levels will drop 'active_io'
until io is done, wait for 'active_io' to be 0 is enough to make sure all
normal io is done, and percpu_ref_kill() for 'active_io' will make sure
no new normal io can be dispatched. There is no need to call
"pers->quiesce" anymore from mddev_suspend().
Signed-off-by: Yu Kuai <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Song Liu <[email protected]>
|
|
In fix_read_error(), 'success' will be checked immediately after assigning
it, if it is set to 1 then the loop will break. Checking it again in
condition of loop is redundant. Clean it up.
Signed-off-by: Li Nan <[email protected]>
Reviewed-by: Yu Kuai <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Song Liu <[email protected]>
|
|
We dereference r10_bio->read_slot too many times in fix_read_error().
Optimize it by using a variable to store read_slot.
Signed-off-by: Li Nan <[email protected]>
Reviewed-by: Yu Kuai <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Song Liu <[email protected]>
|
|
New disk should be added to "removed" position first instead of to be a
replacement. Commit 6090368abcb4 ("md/raid10: prioritize adding disk to
'removed' mirror") has fixed this issue for raid10. Fix it for raid1 now.
Signed-off-by: Li Nan <[email protected]>
Reviewed-by: Yu Kuai <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Song Liu <[email protected]>
|
|
use md_account_bio() to enable io accounting, also make sure
mddev_suspend() will wait for all io to be done.
Signed-off-by: Yu Kuai <[email protected]>
Reviewed-by: Xiao Ni <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
use md_account_bio() to enable io accounting, also make sure
mddev_suspend() will wait for all io to be done.
Signed-off-by: Yu Kuai <[email protected]>
Reviewed-by: Xiao Ni <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
use md_account_bio() to enable io accounting, also make sure
mddev_suspend() will wait for all io to be done.
Signed-off-by: Yu Kuai <[email protected]>
Reviewed-by: Xiao Ni <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Make sure that 'active_io' will represent inflight io instead of io that
is dispatching, and io accounting from all levels will be consistent.
Signed-off-by: Yu Kuai <[email protected]>
Reviewed-by: Xiao Ni <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Two problems can be fixed this way:
1) 'active_io' will represent inflight io instead of io that is
dispatching.
2) If io accounting is enabled or disabled while io is still inflight,
bio_start_io_acct() and bio_end_io_acct() is not balanced and io
inflight counter will be leaked.
Signed-off-by: Yu Kuai <[email protected]>
Reviewed-by: Xiao Ni <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Io will only be accounted as done from raid5_align_endio() if the io
succeeded, and io inflight counter will be leaked if such io failed.
Fix this problem by switching to use md_account_bio() for io accounting.
Signed-off-by: Yu Kuai <[email protected]>
Reviewed-by: Xiao Ni <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Currently, 'active_io' is grabbed before make_reqeust() is called, and
it's dropped immediately make_reqeust() returns. Hence 'active_io'
actually means io is dispatching, not io is inflight.
For raid0 and raid456 that io accounting is enabled, 'active_io' will
also be grabbed when bio is cloned for io accounting, and this 'active_io'
is dropped until io is done.
Always clone new bio so that 'active_io' will mean that io is inflight,
raid1 and raid10 will switch to use this method in later patches.
Now that bio will be cloned even if io accounting is disabled, also
rename related structure from '*_acct_*' to '*_clone_*'.
Signed-off-by: Yu Kuai <[email protected]>
Reviewed-by: Xiao Ni <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
'io_acct_set' is only used for raid0 and raid456, prepare to use it for
raid1 and raid10, so that io accounting from different levels can be
consistent.
By the way, follow up patches will also use this io clone mechanism to
make sure 'active_io' represents in flight io, not io that is dispatching,
so that mddev_suspend will wait for io to be done as designed.
Signed-off-by: Yu Kuai <[email protected]>
Reviewed-by: Xiao Ni <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The support for bitmaps on files is a very bad idea abusing various kernel
APIs, and fundamentally requires the file to not be on the actual array
without a way to check that this is actually the case. Add a deprecation
warning to see if we might be able to eventually drop it.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Himanshu Madhani <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The support for write intent bitmaps in files on an external files in md
is a hot mess that abuses ->bmap to map file offsets into physical device
objects, and also abuses buffer_heads in a creative way.
Make this code optional so that MD can be built into future kernels
without buffer_head support, and so that we can eventually deprecate it.
Note this does not affect the internal bitmap support, which has none of
the problems.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Himanshu Madhani <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The md driver allocates pages for storing the bitmap file data, which
are not page cache pages, and then stores the page granularity file
offset in page->index, which is a field that isn't really valid except
for page cache pages.
Use a separate index for the superblock, and use the scheme used at
read size to recalculate the index for the bitmap pages instead.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Diretly apply mddev->bitmap_info.offset to the sector number to read
instead of doing that in both callers.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Himanshu Madhani <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Convert read_sb_page to the normal kernel coding style, calculate the
target sector only once, and add a local iosize variable to make the call
to sync_page_io more readable.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Himanshu Madhani <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Split the confusing loop in md_bitmap_init_from_disk that iterates over
all chunks but also needs to read and map the pages into three separate
loops: one that iterates over the pages to read them, a second optional
one to iterate over the pages to mark them invalid if the bitmaps are
out of date, and a final one that actually iterates over the chunks.
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Make the difference to read_sb_page clear.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Himanshu Madhani <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Split the file write code out of write_page into a separate helper.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Himanshu Madhani <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Don't bother allocating an extra buffer in the I/O failure handler and
instead use the printk built-in format to print the last 4 path name
components.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Himanshu Madhani <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Just a small tidyup to prepare for bigger changes.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Himanshu Madhani <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Set BITMAP_WRITE_ERROR directly in write_sb_page instead of propagating
the error to the caller and setting it there.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Himanshu Madhani <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
For md_check_recovery():
1) if 'MD_RECOVERY_RUNING' is not set, register new sync_thread.
2) if 'MD_RECOVERY_RUNING' is set:
a) if 'MD_RECOVERY_DONE' is not set, don't do anything, wait for
md_do_sync() to be done.
b) if 'MD_RECOVERY_DONE' is set, unregister sync_thread. Current code
expects that sync_thread is not NULL, otherwise new sync_thread will
be registered, which will corrupt the array.
Make sure md_check_recovery() won't register new sync_thread if
'MD_RECOVERY_RUNING' is still set, and a new WARN_ON_ONCE() is added for
the above corruption,
Signed-off-by: Yu Kuai <[email protected]>
Reviewed-by: Xiao Ni <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
md_reap_sync_thread() is just replaced with wait_event(resync_wait, ...)
from action_store(), just make sure action_store() will still wait for
everything to be done in md_reap_sync_thread().
Signed-off-by: Yu Kuai <[email protected]>
Reviewd-by: Xiao Ni <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|