->free_disk is only called after the disk has been added successfully, so
drop the ublk device reference ourselves if add_disk() fails.
Fixes: 6d9e6dfdf3b2 ("ublk: defer disk allocation")
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Each ublk queue is started before the disk is added, so we have to cancel
the queues in ublk_stop_dev() so that the ubq daemon can exit; otherwise the
DEL_DEV command may hang forever.
Also avoid canceling the queues twice by checking whether the queue is ready,
otherwise a use-after-free on io_uring may be triggered because ublk_stop_dev()
is called from ublk_remove() too.
Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver")
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
The size being added to a bio from an iov is aligned to a block size after
the pages have been obtained. If the newly aligned size truncates the last
page, its reference was being leaked. Ensure all pages that were not added
to the bio have their reference released.
Since this essentially requires doing the same thing as bio_put_pages(), and
that function had only one caller, make the put_page() loop common for
everyone.
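A minimal sketch of the idea, with a hypothetical helper name (not the
actual patch): once the aligned size is known, every obtained page that will
not be added to the bio gets its reference dropped.

	#include <linux/mm.h>	/* put_page() */

	/*
	 * Hypothetical helper: release the references of pages[from..to)
	 * that were obtained from the iov but will not be added to the bio.
	 */
	static void put_unused_pages(struct page **pages, size_t from, size_t to)
	{
		while (from < to)
			put_page(pages[from++]);
	}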
Fixes: b1a000d3b8ec5 ("block: relax direct io memory alignment")
Reported-by: Al Viro <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Adding the page could fail on the bio_full() condition, which checks for
either exceeding the bio's max segments or total size exceeding
UINT_MAX. We already ensure the max segments can't be exceeded, so just
ensure the total size won't reach the limit. This simplifies error
handling and removes unnecessary repeated bio_full() checks.
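As a rough illustration (not the actual patch), the per-page loop then only
has to guard the running total before adding 'len' more bytes to the bio:

	/*
	 * Sketch: the segment limit was enforced when sizing the page array,
	 * so only the total byte count needs checking here.
	 */
	if (bio->bi_iter.bi_size > UINT_MAX - len)
		return -EINVAL;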
Signed-off-by: Keith Busch <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
There are cases where a bio may not accept additional pages, and the iov
needs to advance to the last data length that was accepted. The zone append
path used to handle this correctly, but it was inadvertently broken when the
setup was made common with the normal r/w case.
Fixes: 576ed9135489c ("block: use bio_add_page in bio_iov_iter_get_pages")
Fixes: c58c0074c54c2 ("block/bio: remove duplicate append pages code")
Signed-off-by: Keith Busch <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
In line 2884, "raid5_release_stripe(sh);" drops the reference to sh and may
cause sh to be released. However, sh is subsequently used on line 2886 in
"if (sh->batch_head && sh != sh->batch_head)". This may result in a
use-after-free bug.
Fix it by moving "raid5_release_stripe(sh);" to the bottom of the function.
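A simplified illustration of the pattern, with a hypothetical use of sh
standing in for the real code: keep every dereference of sh before the
reference is dropped, and make raid5_release_stripe() the last step.

	if (sh->batch_head && sh != sh->batch_head)
		handle_batch(sh);		/* hypothetical use of sh */
	raid5_release_stripe(sh);		/* sh may be freed after this */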
Signed-off-by: Wentao_Liang <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
A race condition exists where raid5_quiesce() deadlocks if it is called in
the middle of a request that has set batch_last.
batch_last will hold a reference to a stripe when raid5_quiesce() is called.
This causes the next raid5_get_active_stripe() call to sleep waiting for the
quiesce to finish, but the raid5_quiesce() thread waits for active_stripes
to go to zero, which will never happen because the request thread is waiting
for the quiesce to stop.
Fix this by creating a special __raid5_get_active_stripe() function which
takes the request context and clears batch_last before sleeping.
While we're at it, change the arguments of raid5_get_active_stripe() to
bools.
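A rough sketch of the batch_last release described above (simplified, not
the exact patch; the surrounding locking is omitted): before sleeping for
the quiesce to finish, the reference held in the request context is dropped
so active_stripes can reach zero.

	if (ctx && ctx->batch_last) {
		raid5_release_stripe(ctx->batch_last);
		ctx->batch_last = NULL;
	}
	/* ...now it is safe to sleep waiting for the quiesce to finish */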
Fixes: 3312e6c887fe ("md/raid5: Keep a reference to last stripe_head for batch")
Reported-by: David Sloan <[email protected]>
Signed-off-by: Logan Gunthorpe <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Move stripe_request_ctx up. No functional changes intended.
This will be necessary in the next patch to release the batch_last
in the context before sleeping.
Signed-off-by: Logan Gunthorpe <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Now that raid5_get_active_stripe() has been refactored, it is apparent
that r5c_check_stripe_cache_usage() doesn't need to be called in
the wait_for_stripe branch.
r5c_check_stripe_cache_usage() will only conditionally call
r5l_wake_reclaim(), but that function is called two lines later.
Drop the call for cleanup.
Reported-by: Martin Oliveira <[email protected]>
Signed-off-by: Logan Gunthorpe <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The wait_for_stripe logic is difficult to parse, spread across many lines
and with confusing operator precedence. Move it to a helper function to
make it easier to read.
No functional changes intended.
Signed-off-by: Logan Gunthorpe <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Refactor raid5_get_active_stripe() to read more linearly in the order it is
typically executed.
init_stripe() is now called when a free stripe is found and the function
exits early, which removes a lot of if (sh) checks and unindents the
following code.
Remove the while loop in favour of the 'goto retry' pattern, which reduces
indentation further, and use a 'goto wait_for_stripe' instead of an
additional indent, seeing as it is the unusual path; this makes the code
easier to read.
No functional changes intended. This will make the subsequent patches
easier to understand.
Signed-off-by: Logan Gunthorpe <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Allow using the splitting helpers on just a queue_limits instead of
a full request_queue structure. This will eventually allow file systems
or remapping drivers to split REQ_OP_ZONE_APPEND bios based on limits
calculated as the minimum common capabilities over multiple devices.
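A sketch of the interface direction, with a hypothetical prototype (the real
names and parameters may differ): the split helper takes the limits directly
instead of a whole request_queue, so a stacking driver or file system can
pass limits computed as the minimum over several devices.

	/* Hypothetical prototype for a limits-based split helper. */
	struct bio *split_bio_to_limits(struct bio *bio,
					const struct queue_limits *lim,
					unsigned int *nr_segs,
					struct bio_set *bs);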
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Move this helper into the only file where it is used.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Prepare for reusing blk_bio_segment_split for (file system controlled)
splits of REQ_OP_ZONE_APPEND bios by letting the caller control the
maximum size of the bio.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Only non-passthrough requests are split by the block layer and use the
->bio_split bio_set. Move it from the request_queue to the gendisk.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
The double indirect bio leads to somewhat suboptimal code generation.
Instead return the (original or split) bio, and make sure the request_queue
argument to the lower level helpers is passed after the bio to avoid
constant reshuffling of the argument passing registers.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
The double indirect bio leads to somewhat suboptimal code generation.
Instead return the (original or split) bio, and make sure the request_queue
argument to the lower level helpers is passed after the bio to avoid
constant reshuffling of the argument passing registers.
Also give it and the helpers used to implement it more descriptive names.
Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Add the common subdirectory and match all nvme* headers in
include/linux/.
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
We probably need nvmet_tcp_wq to have WQ_MEM_RECLAIM set, as we are sending
to and receiving from the socket from work items running on this workqueue.
This also eliminates lockdep complaints:
--
[ 6174.010200] workqueue: WQ_MEM_RECLAIM
nvmet-wq:nvmet_tcp_release_queue_work [nvmet_tcp] is flushing
!WQ_MEM_RECLAIM nvmet_tcp_wq:nvmet_tcp_io_work [nvmet_tcp]
[ 6174.010216] WARNING: CPU: 20 PID: 14456 at kernel/workqueue.c:2628
check_flush_dependency+0x110/0x14c
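A minimal sketch of the change implied above (any flags other than
WQ_MEM_RECLAIM are illustrative only, not taken from the patch):

	nvmet_tcp_wq = alloc_workqueue("nvmet_tcp_wq",
				       WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
	if (!nvmet_tcp_wq)
		return -ENOMEM;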
Reported-by: Yi Zhang <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Extend nvme_alloc_ns() and nvme_validate_ns() for unknown command-set as
well. Both are made to use a new helper (nvme_update_ns_info_cs_indep)
which is similar to nvme_update_ns_info but performs fewer operations
to get the generic interface up.
Suggested-by: Christoph Hellwig <[email protected]>
Signed-off-by: Joel Granados <[email protected]>
Signed-off-by: Kanchan Joshi <[email protected]>
[hch: rebased on other refactoring patches]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Javier González <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Add a little helper to check if a namespace should be marked read-only
that uses a new is_readonly flag in the nvme_ns_info structure.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Javier González <[email protected]>
Reviewed-by: Joel Granados <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Kanchan Joshi <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Change nvme_ns_scan to gather all information needed for generic
namespace setup into a nvme_ns_info structure. This structure is filled
from the Command Set Independent Identify Namespace data structure if
it is available or else the legacy Identify namespace structure.
With that everything related to the NVM command set (and the ZNS command
set derived from it) can be encapsulated in the nvme_update_ns_info_block
function while keeping the rest of the namespace probing generic.
The downside is that we now always issue two Identify Namespace calls for
each probed namespace instead of usually just a single one previously.
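As an illustration of the kind of data being gathered (the member list is an
assumption based on the description above and the related patches, not
necessarily the exact structure):

	/* Hypothetical shape of the gathered per-namespace information. */
	struct nvme_ns_info {
		struct nvme_ns_ids ids;
		u32 nsid;
		__le32 anagrpid;
		bool is_shared;
		bool is_readonly;
		bool is_ready;
	};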
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Javier González <[email protected]>
Reviewed-by: Joel Granados <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Check for multiple command set support early on and error out if it is not
supported when a !NVM command set namespace is found. This prepares for
adding command set independent passthrough support.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Javier González <[email protected]>
Reviewed-by: Joel Granados <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Kanchan Joshi <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This shorter name much better fits what this function does in
the scanning process.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Javier González <[email protected]>
Reviewed-by: Joel Granados <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Kanchan Joshi <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
nvme_revalidate_zones can also return -ENODEV if e.g. zone sizes aren't
constant or not a power of two. In that case we should jump to marking
the gendisk hidden and only support pass through.
Fixes: 602e57c9799c ("nvme: also mark passthrough-only namespaces ready in nvme_update_ns_info")
Reported-by: Joel Granados <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Joel Granados <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Commit 71ebe3842ebe ("nvmet-auth: Diffie-Hellman key exchange support")
intends to select 'Support for RFC 7919 FFDHE group parameters' for using
FFDHE groups for NVMe In-Band Authentication.
It however selects CRYPTO_DH_GROUPS_RFC7919, instead of the intended
CRYPTO_DH_RFC7919_GROUPS; notice the swapping of words here.
Correct the select to the intended config option.
Signed-off-by: Lukas Bulwahn <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
nvmet_auth_challenge() has an int return type, but nvmet_execute_auth_receive()
currently stores its result in the status variable, which is a u16.
Capture the return value of nvmet_auth_challenge() in an int and set the
status variable to NVME_SC_INTERNAL before jumping to the error path.
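A minimal sketch of the pattern (variable and label names are illustrative,
not copied from the patch):

	u16 status = 0;
	int ret;

	ret = nvmet_auth_challenge(req, d, al);	/* arguments illustrative */
	if (ret) {
		status = NVME_SC_INTERNAL;
		goto done;
	}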
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
nvmet_setup_auth() has an int return type, but nvmet_execute_auth_send()
currently stores its result in the status variable, which is a u16.
Capture the return value of nvmet_setup_auth() in an int and set the status
variable to NVME_SC_INTERNAL before jumping to the error path.
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
There are a couple of spelling mistakes in pr_warn and pr_debug messages.
Fix them.
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
dh_keysize is a size_t, use the proper format specifier for printing it.
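For example (the message text here is hypothetical), size_t values are
printed with the %zu conversion specifier:

	pr_debug("DH keysize %zu\n", dh_keysize);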
Reported-by: Guenter Roeck <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Guenter Roeck <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
And add an empty line after the variable declaration.
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Casting function pointers breaks control flow enforcement and is
generally a horrible coding style.
Add two wrappers to get rid of these casts.
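A generic illustration of the pattern (all names here are hypothetical, not
taken from the driver): rather than casting a typed free function to the
void-pointer prototype a callback table expects, add a trivial wrapper with
the exact expected signature.

	struct widget { int refcount; };

	static void widget_free(struct widget *w)	/* typed API */
	{
		w->refcount = 0;
	}

	struct ops { void (*release)(void *obj); };

	/*
	 * Casting widget_free to void (*)(void *) would defeat control flow
	 * enforcement; the wrapper below avoids the cast entirely.
	 */
	static void widget_release(void *obj)
	{
		widget_free(obj);
	}

	static const struct ops widget_ops = { .release = widget_release };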
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Sven Peter <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Split nvme_tcp_alloc_tagset into one helper for the admin tag_set and
one for the I/O tag set.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Split nvme_rdma_alloc_tagset into one helper for the admin tag_set and
one for the I/O tag set.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Split nvme_dev_add into a helper to actually allocate the tag set, and one
that just updates the number of queues. Add a local variable for the
tag_set to clean up the code a bit.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Split nvme_alloc_admin_tags into a helper to actually allocate the
tag set, and one that just restarts the admin queue. Add a local
variable for the tag_set to clean up the code a bit.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
To allow for slightly better debugging, print the command name when
aborting a command.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
If prp_list is NULL, nvme_unmap_sg will be performed, and the assignment
to first_dma is meaningless, so remove it.
Signed-off-by: Liu Song <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
A couple of the early error gotos call kfree_sensitive(transformed_key);
before "transformed_key" has been initialized.
Fixes: db1312dd9548 ("nvmet: implement basic In-Band Authentication")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The > ARRAY_SIZE() checks need to be >= ARRAY_SIZE() to prevent reading
one element beyond the end of the arrays.
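A self-contained illustration of the off-by-one (array and function names
are hypothetical): the last valid index of an array with ARRAY_SIZE(tbl)
entries is ARRAY_SIZE(tbl) - 1, so the bounds check must reject
idx == ARRAY_SIZE(tbl).

	#include <linux/errno.h>
	#include <linux/kernel.h>	/* ARRAY_SIZE() */

	static const int tbl[4] = { 1, 2, 3, 4 };

	static int lookup(unsigned int idx)
	{
		if (idx >= ARRAY_SIZE(tbl))	/* '>' would read one past the end */
			return -EINVAL;
		return tbl[idx];
	}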
Fixes: db1312dd9548 ("nvmet: implement basic In-Band Authentication")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Commit 89b3d6e60550 ("nvme: simplify the compat ioctl handling") removed
the initialization of compat_ioctl from the nvme block_device_operations
structures.
Presumably the expectation was that 32-bit ioctls would be directed
through the regular handler but this is not the case: failing to assign
.compat_ioctl actually means that the compat case is disabled entirely,
and any attempt to submit nvme ioctls from 32-bit userspace fails
outright with -ENOTTY.
For example:
% smartctl -x /dev/nvme0n1
[...]
Read NVMe Identify Controller failed: NVME_IOCTL_ADMIN_CMD: Inappropriate ioctl for device
The blkdev_compat_ptr_ioctl helper can be used to direct compat calls
through the main ioctl handler and makes things work again.
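A sketch of the resulting ops table (the surrounding members are abbreviated
and the exact set of callbacks is not taken from the patch):

	static const struct block_device_operations nvme_bdev_ops = {
		.owner		= THIS_MODULE,
		.ioctl		= nvme_ioctl,
		.compat_ioctl	= blkdev_compat_ptr_ioctl,	/* route 32-bit ioctls */
		.open		= nvme_open,
		.release	= nvme_release,
	};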
Fixes: 89b3d6e60550 ("nvme: simplify the compat ioctl handling")
Signed-off-by: Nick Bowler <[email protected]>
Reviewed-by: Guixin Liu <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The entire content of constants.c is guarded by an ifdef, so switch to just
building the file conditionally instead.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Use command_id instead of req->tag in trace_nvme_complete_rq(). Since
commit e7006de6c238 ("nvme: code command_id with a genctr for use
authentication after release"), cmd->common.command_id is set to
((genctr & 0xf) << 12 | req->tag) and is no longer equal to req->tag, which
made the cid reported by trace_nvme_complete_rq and trace_nvme_setup_cmd
disagree.
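Conceptually (a sketch of the encoding described above, not actual driver
code), the traced cid and the tag relate as follows:

	u16 command_id = ((genctr & 0xf) << 12) | (tag & 0xfff);
	u16 tag_only   = command_id & 0xfff;	/* what req->tag corresponds to */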
Fixes: e7006de6c238 ("nvme: code command_id with a genctr for use authentication after release")
Signed-off-by: Bean Huo <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
There's a KASAN warning in raid10_remove_disk when running the lvm
test lvconvert-raid-reshape.sh. We fix this warning by verifying that the
value "number" is valid.
BUG: KASAN: slab-out-of-bounds in raid10_remove_disk+0x61/0x2a0 [raid10]
Read of size 8 at addr ffff889108f3d300 by task mdX_raid10/124682
CPU: 3 PID: 124682 Comm: mdX_raid10 Not tainted 5.19.0-rc6 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x44
print_report.cold+0x45/0x57a
? __lock_text_start+0x18/0x18
? raid10_remove_disk+0x61/0x2a0 [raid10]
kasan_report+0xa8/0xe0
? raid10_remove_disk+0x61/0x2a0 [raid10]
raid10_remove_disk+0x61/0x2a0 [raid10]
Buffer I/O error on dev dm-76, logical block 15344, async page read
? __mutex_unlock_slowpath.constprop.0+0x1e0/0x1e0
remove_and_add_spares+0x367/0x8a0 [md_mod]
? super_written+0x1c0/0x1c0 [md_mod]
? mutex_trylock+0xac/0x120
? _raw_spin_lock+0x72/0xc0
? _raw_spin_lock_bh+0xc0/0xc0
md_check_recovery+0x848/0x960 [md_mod]
raid10d+0xcf/0x3360 [raid10]
? sched_clock_cpu+0x185/0x1a0
? rb_erase+0x4d4/0x620
? var_wake_function+0xe0/0xe0
? psi_group_change+0x411/0x500
? preempt_count_sub+0xf/0xc0
? _raw_spin_lock_irqsave+0x78/0xc0
? __lock_text_start+0x18/0x18
? raid10_sync_request+0x36c0/0x36c0 [raid10]
? preempt_count_sub+0xf/0xc0
? _raw_spin_unlock_irqrestore+0x19/0x40
? del_timer_sync+0xa9/0x100
? try_to_del_timer_sync+0xc0/0xc0
? _raw_spin_lock_irqsave+0x78/0xc0
? __lock_text_start+0x18/0x18
? _raw_spin_unlock_irq+0x11/0x24
? __list_del_entry_valid+0x68/0xa0
? finish_wait+0xa3/0x100
md_thread+0x161/0x260 [md_mod]
? unregister_md_personality+0xa0/0xa0 [md_mod]
? _raw_spin_lock_irqsave+0x78/0xc0
? prepare_to_wait_event+0x2c0/0x2c0
? unregister_md_personality+0xa0/0xa0 [md_mod]
kthread+0x148/0x180
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>
Allocated by task 124495:
kasan_save_stack+0x1e/0x40
__kasan_kmalloc+0x80/0xa0
setup_conf+0x140/0x5c0 [raid10]
raid10_run+0x4cd/0x740 [raid10]
md_run+0x6f9/0x1300 [md_mod]
raid_ctr+0x2531/0x4ac0 [dm_raid]
dm_table_add_target+0x2b0/0x620 [dm_mod]
table_load+0x1c8/0x400 [dm_mod]
ctl_ioctl+0x29e/0x560 [dm_mod]
dm_compat_ctl_ioctl+0x7/0x20 [dm_mod]
__do_compat_sys_ioctl+0xfa/0x160
do_syscall_64+0x90/0xc0
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Last potentially related work creation:
kasan_save_stack+0x1e/0x40
__kasan_record_aux_stack+0x9e/0xc0
kvfree_call_rcu+0x84/0x480
timerfd_release+0x82/0x140
__fput+0xfa/0x400
task_work_run+0x80/0xc0
exit_to_user_mode_prepare+0x155/0x160
syscall_exit_to_user_mode+0x12/0x40
do_syscall_64+0x42/0xc0
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Second to last potentially related work creation:
kasan_save_stack+0x1e/0x40
__kasan_record_aux_stack+0x9e/0xc0
kvfree_call_rcu+0x84/0x480
timerfd_release+0x82/0x140
__fput+0xfa/0x400
task_work_run+0x80/0xc0
exit_to_user_mode_prepare+0x155/0x160
syscall_exit_to_user_mode+0x12/0x40
do_syscall_64+0x42/0xc0
entry_SYSCALL_64_after_hwframe+0x46/0xb0
The buggy address belongs to the object at ffff889108f3d200
which belongs to the cache kmalloc-256 of size 256
The buggy address is located 0 bytes to the right of
256-byte region [ffff889108f3d200, ffff889108f3d300)
The buggy address belongs to the physical page:
page:000000007ef2a34c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1108f3c
head:000000007ef2a34c order:2 compound_mapcount:0 compound_pincount:0
flags: 0x4000000000010200(slab|head|zone=2)
raw: 4000000000010200 0000000000000000 dead000000000001 ffff889100042b40
raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff889108f3d200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff889108f3d280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff889108f3d300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff889108f3d380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff889108f3d400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Signed-off-by: Mikulas Patocka <[email protected]>
Cc: [email protected]
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
When we ran the lvm test "shell/integrity-blocksize-3.sh" on a kernel with
kasan, we got a failure in write_page.
The reason for the failure is that md_bitmap_destroy is called before
destroying the thread and the thread may be waiting in the function
write_page for the bio to complete. When the thread finishes waiting, it
executes "if (test_bit(BITMAP_WRITE_ERROR, &bitmap->flags))", which
triggers the kasan warning.
Note that commit 48df498daf62, which caused this bug, claims that it is
needed for md-cluster; md-cluster should be checked and possibly needs
another bugfix for this.
BUG: KASAN: use-after-free in write_page+0x18d/0x680 [md_mod]
Read of size 8 at addr ffff889162030c78 by task mdX_raid1/5539
CPU: 10 PID: 5539 Comm: mdX_raid1 Not tainted 5.19.0-rc2 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x44
print_report.cold+0x45/0x57a
? __lock_text_start+0x18/0x18
? write_page+0x18d/0x680 [md_mod]
kasan_report+0xa8/0xe0
? write_page+0x18d/0x680 [md_mod]
kasan_check_range+0x13f/0x180
write_page+0x18d/0x680 [md_mod]
? super_sync+0x4d5/0x560 [dm_raid]
? md_bitmap_file_kick+0xa0/0xa0 [md_mod]
? rs_set_dev_and_array_sectors+0x2e0/0x2e0 [dm_raid]
? mutex_trylock+0x120/0x120
? preempt_count_add+0x6b/0xc0
? preempt_count_sub+0xf/0xc0
md_update_sb+0x707/0xe40 [md_mod]
md_reap_sync_thread+0x1b2/0x4a0 [md_mod]
md_check_recovery+0x533/0x960 [md_mod]
raid1d+0xc8/0x2a20 [raid1]
? var_wake_function+0xe0/0xe0
? psi_group_change+0x411/0x500
? preempt_count_sub+0xf/0xc0
? _raw_spin_lock_irqsave+0x78/0xc0
? __lock_text_start+0x18/0x18
? raid1_end_read_request+0x2a0/0x2a0 [raid1]
? preempt_count_sub+0xf/0xc0
? _raw_spin_unlock_irqrestore+0x19/0x40
? del_timer_sync+0xa9/0x100
? try_to_del_timer_sync+0xc0/0xc0
? _raw_spin_lock_irqsave+0x78/0xc0
? __lock_text_start+0x18/0x18
? __list_del_entry_valid+0x68/0xa0
? finish_wait+0xa3/0x100
md_thread+0x161/0x260 [md_mod]
? unregister_md_personality+0xa0/0xa0 [md_mod]
? _raw_spin_lock_irqsave+0x78/0xc0
? prepare_to_wait_event+0x2c0/0x2c0
? unregister_md_personality+0xa0/0xa0 [md_mod]
kthread+0x148/0x180
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>
Allocated by task 5522:
kasan_save_stack+0x1e/0x40
__kasan_kmalloc+0x80/0xa0
md_bitmap_create+0xa8/0xe80 [md_mod]
md_run+0x777/0x1300 [md_mod]
raid_ctr+0x249c/0x4a30 [dm_raid]
dm_table_add_target+0x2b0/0x620 [dm_mod]
table_load+0x1c8/0x400 [dm_mod]
ctl_ioctl+0x29e/0x560 [dm_mod]
dm_compat_ctl_ioctl+0x7/0x20 [dm_mod]
__do_compat_sys_ioctl+0xfa/0x160
do_syscall_64+0x90/0xc0
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Freed by task 5680:
kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x40
kasan_set_free_info+0x20/0x40
__kasan_slab_free+0xf7/0x140
kfree+0x80/0x240
md_bitmap_free+0x1c3/0x280 [md_mod]
__md_stop+0x21/0x120 [md_mod]
md_stop+0x9/0x40 [md_mod]
raid_dtr+0x1b/0x40 [dm_raid]
dm_table_destroy+0x98/0x1e0 [dm_mod]
__dm_destroy+0x199/0x360 [dm_mod]
dev_remove+0x10c/0x160 [dm_mod]
ctl_ioctl+0x29e/0x560 [dm_mod]
dm_compat_ctl_ioctl+0x7/0x20 [dm_mod]
__do_compat_sys_ioctl+0xfa/0x160
do_syscall_64+0x90/0xc0
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Signed-off-by: Mikulas Patocka <[email protected]>
Cc: [email protected]
Fixes: 48df498daf62 ("md: move bitmap_destroy to the beginning of __md_stop")
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Two callers of md_alloc want to use the newly allocated device, so return
it instead of letting them look it up cumbersomely after the allocation.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-and-tested-by: Logan Gunthorpe <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
autorun_devices should not be limited to the controls for the legacy
probe on open, so just call md_alloc directly.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-and-tested-by: Logan Gunthorpe <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Eliminate the following coccicheck warning:
./drivers/md/md.c:8208:2-3: Unneeded semicolon
Reported-by: Abaci Robot <[email protected]>
Signed-off-by: Yang Li <[email protected]>
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This driver is for fairly obscure hardware, and has only seen random
drive-by changes after the maintainer stopped working on it in 2005
(about a year and a half after it was introduced). It has some
"interesting" block layer interactions, so let's just drop it unless
anyone complains.
Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[axboe: fix date typo, it was in 2005, not 2015]
Signed-off-by: Jens Axboe <[email protected]>
|
|
After merging the block tree, today's linux-next build (x86_64
allmodconfig) failed like this:
drivers/md/md.c:717:22: error: 'mddev_find' defined but not used [-Werror=unused-function]
717 | static struct mddev *mddev_find(dev_t unit)
| ^~~~~~~~~~
cc1: all warnings being treated as errors
Caused by commit
4500d5c17910 ("md: simplify md_open")
Make mddev_find() available only for non-modular builds.
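One way to express this (a sketch, not necessarily the exact patch) is to
compile the function only for built-in configurations:

	#ifndef MODULE
	static struct mddev *mddev_find(dev_t unit)
	{
		/* ... existing lookup code ... */
	}
	#endif /* !MODULE */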
Signed-off-by: Stephen Rothwell <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|