Age | Commit message (Collapse) | Author | Files | Lines |
|
More efficient way to iterate an array due to prefetching (makes use of
the new dm_btree_cursor_* api).
Signed-off-by: Joe Thornber <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
This uses prefetching to speed up iteration through a btree.
Signed-off-by: Joe Thornber <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
For smq the 32 bit 'hint' stores the multiqueue level that the entry
should be stored in. If a different policy has been used previously,
and then switched to smq, the hints will be invalid. In which case we
used to put all entries in the bottom level of the multiqueue, and then
redistribute. Redistribution is faster if we put entries with invalid
hints in random levels initially.
Signed-off-by: Joe Thornber <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
It's far quicker to always delete the hint array and recreate with
dm_array_new() because we avoid the copying caused by mutation.
Also simplifies the policy interface, replacing the walk_hints() with
the simpler get_hint().
Signed-off-by: Joe Thornber <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
dm_array_new() creates a new, populated array more efficiently than
starting with an empty one and resizing.
Signed-off-by: Joe Thornber <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
Return DM_MAPIO_DELAY_REQUEUE from .clone_and_map_rq. Also, return
false from .busy, if all paths are down, so that blk-mq requests get
mapped via .clone_and_map_rq -- which results in DM_MAPIO_DELAY_REQUEUE
being returned to dm-rq.
This change allows for a noticeable reduction in cpu utilization
(reduced kworker load) while all paths are down, e.g.:
system CPU idleness (as measured by fio's --idle-prof=system):
before: system: 86.58%
after: system: 98.60%
Signed-off-by: Mike Snitzer <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
|
|
When reinstating a path the blk-mq request_queue's requeue_list should
get kicked. It makes sense to kick the requeue_list as part of the
existing hook (previously only used by bio-based support).
Rename process_queued_bios_list to process_queued_io_list.
Signed-off-by: Mike Snitzer <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
|
|
Make it possible for a request-based target to kick the DM device's
blk-mq request_queue's requeue_list.
Signed-off-by: Mike Snitzer <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
|
|
dm_requeue_original_request()
Signed-off-by: Mike Snitzer <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
|
|
Otherwise blk-mq will immediately dispatch requests that are requeued
via a BLK_MQ_RQ_QUEUE_BUSY return from blk_mq_ops .queue_rq.
Delayed requeue is implemented using blk_mq_delay_kick_requeue_list()
with a delay of 5 secs. In the context of DM multipath (all paths down)
it doesn't make any sense to requeue more quickly.
Signed-off-by: Mike Snitzer <[email protected]>
|
|
Use autoremove_wake_function() instead of default_wake_function()
to make the dm wait loops more similar to other wait loops in the
kernel. This patch does not change any functionality.
Signed-off-by: Bart Van Assche <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
Use signal_pending_state() instead of open-coding it. This patch does
not change any functionality but makes it possible to pass TASK_KILLABLE
as the second argument of dm_wait_for_completion(). See also commit
16882c1e962b ("sched: fix TASK_WAKEKILL vs SIGKILL race").
Signed-off-by: Bart Van Assche <[email protected]>.
Signed-off-by: Mike Snitzer <[email protected]>
|
|
Rename 'interruptible' into 'task_state' to make it clear that this
argument is a task state instead of a boolean. Also, change type from
int to long.
Signed-off-by: Bart Van Assche <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
Document the locking assumptions for the __bind() and __dm_suspend()
functions.
Signed-off-by: Bart Van Assche <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
If pg_init_retries is set and a request is queued against a multipath
device with all underlying block device request_queues in the "dying"
state then an infinite loop is triggered because activate_path() never
succeeds and hence never calls pg_init_done().
This change avoids that device removal triggers an infinite loop by
failing the activate_path() which causes the "dying" path to be failed.
Reported-by: Bart Van Assche <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Cc: [email protected]
|
|
Every call of queue_flag_clear_unlocked() after block device
initialization has finished is wrong if blk_cleanup_queue() can be
called concurrently. Convert queue_flag_clear_unlocked() into
queue_flag_clear() and protect it by the block layer queue lock.
Also, factor out dm_mq_start_queue().
Reported-by: Bart Van Assche <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Cc: [email protected]
|
|
Also, check that the blk-mq request_queue isn't already stopped.
Signed-off-by: Bart Van Assche <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
This avoids that new requests are queued while __dm_destroy() is in
progress.
Signed-off-by: Bart Van Assche <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Cc: [email protected]
|
|
dm_resume() will return success (0) rather than -EINVAL if
!dm_suspended_md() upon retry within dm_resume().
Reset the error code at the start of dm_resume()'s retry loop.
Also, remove a useless assignment at the end of dm_resume().
Fixes: ffcc393641 ("dm: enhance internal suspend and resume interface")
Cc: [email protected] # 3.19+
Signed-off-by: Minfei Huang <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
blk_mq_delay_kick_requeue_list() provides the ability to kick the
q->requeue_list after a specified time. To do this the request_queue's
'requeue_work' member was changed to a delayed_work.
blk_mq_delay_kick_requeue_list() allows DM to defer processing requeued
requests while it doesn't make sense to immediately requeue them
(e.g. when all paths in a DM multipath have failed).
Signed-off-by: Mike Snitzer <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Since REQ_OP_BITS == 3 and __REQ_NR_BITS == 30 it is not that hard
to pass an op_flags argument to bio_set_op_attrs() that is larger
than the number of bits reserved for the op_flags argument. Complain
if this happens. Additionally, ensure that negative arguments trigger
a complaint (1 << ... is signed while 1U << ... is unsigned; adding
0U to an integer expression causes it to be promoted to an unsigned
type).
Signed-off-by: Bart Van Assche <[email protected]>
Cc: Mike Christie <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Damien Le Moal <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Introduce the bio_flags() macro. Ensure that the second argument of
bio_set_op_attrs() only contains flags and no operation. This patch
does not change any functionality.
Signed-off-by: Bart Van Assche <[email protected]>
Cc: Mike Christie <[email protected]>
Cc: Chris Mason <[email protected]> (maintainer:BTRFS FILE SYSTEM)
Cc: Josef Bacik <[email protected]> (maintainer:BTRFS FILE SYSTEM)
Cc: Mike Snitzer <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Damien Le Moal <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Make it clear that the sizeof(unsigned int) expression in BIO_OP_SHIFT
refers to the bi_opf member of struct bio.
Signed-off-by: Bart Van Assche <[email protected]>
Cc: Mike Christie <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Damien Le Moal <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
commit e1defc4ff0cf57aca6c5e3ff99fa503f5943c1f1
"block: Do away with the notion of hardsect_size"
removed the notion of "hardware sector size" from
the kernel in favor of logical block size, but
references remain in comments and documentation.
Update the remaining sites mentioning hardsect.
Signed-off-by: Linus Walleij <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The blk_mq_alloc_single_hw_queue() is a prototype artifact that
should have been removed with
commit cdef54dd85ad66e77262ea57796a3e81683dd5d6
"blk-mq: remove alloc_hctx and free_hctx methods" where the last
users of it were deleted.
Fixes: cdef54dd85ad ("blk-mq: remove alloc_hctx and free_hctx methods")
Signed-off-by: Linus Walleij <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
DAX support for block devices was removed in commits 03cdad
("block: disable block device DAX by default") and 99a01cd
("block: remove BLK_DEV_DAX config option"), but we still kept a call to
dax_do_io and some uneeded i_flags manipulations introduced in commit
bbab37 ("block: Add support for DAX reads/writes to block devices").
Remove those leftovers.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Acked-by: Dan Williams <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Allow the io_poll statistics to be zeroed to make for easier logging
of polling event.
Signed-off-by: Stephen Bates <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In order to help determine the effectiveness of polling in a running
system it is usful to determine the ratio of how often the poll
function is called vs how often the completion is checked. For this
reason we add a poll_considered variable and add it to the sysfs entry
for io_poll.
Signed-off-by: Stephen Bates <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Instead of rolling our own timer, just utilize the blk mq req timeout and do the
disconnect if any of our commands timeout.
Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In preparation for some future changes, change a few of the state bools over to
normal bits to set/clear properly.
Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
We hit a warning when shutting down the nbd connection because we have irq's
disabled. We don't really need to do the shutdown under the lock, just clear
the nbd->sock. So do the shutdown outside of the irq. This gets rid of the
warning.
Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This moves NBD over to using blkmq, which allows us to get rid of the NBD
wide queue lock and the async submit kthread. We will start with 1 hw
queue for now, but I plan to add multiple tcp connection support in the
future and we'll fix how we set the hwqueue's.
Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
We get 1 warning when biuld kernel with W=1:
drivers/block/mtip32xx/mtip32xx.c:3689:6: warning: no previous prototype for
'mtip_block_release' [-Wmissing-prototypes]
In fact, this function is only used in the file in which it is declared
and don't need a declaration, but can be made static.
so this patch marks it 'static'.
Signed-off-by: Baoyou Xie <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
When drivers or the core calls this function, they usually
dereference the request shortly there after. Prefetch the first
cache line.
Profiling IO workloads shows that this is the most common cache
miss on the block side of things.
Signed-off-by: Jens Axboe <[email protected]>
|
|
Various cache line optimizations:
- Move delay_work towards the end. It's huge, and we don't use it
a lot (only SCSI).
- Move the atomic state into the same cacheline as the the dispatch
list and lock.
- Rearrange a few members to pack it better.
- Shrink the max-order for dispatch accounting from 10 to 7. This
means that ->dispatched[] and ->run now take up their own
cacheline.
This shrinks struct blk_mq_hw_ctx down to 8 cachelines.
Signed-off-by: Jens Axboe <[email protected]>
|
|
We don't need the larger delayed work struct, since we always run it
immediately.
Signed-off-by: Jens Axboe <[email protected]>
|
|
Add a helper to schedule a regular struct work on a particular CPU.
Signed-off-by: Jens Axboe <[email protected]>
|
|
Like cancel_delayed_work(), but for regular work.
Signed-off-by: Jens Axboe <[email protected]>
Mehed-by: Tejun Heo <[email protected]>
Acked-by: Tejun Heo <[email protected]>
|
|
|
|
Pull drm fixes from Dave Airlie:
"A bunch of fixes covering i915, amdgpu, one tegra and some core DRM
ones. Nothing too strange at this point"
* tag 'drm-fixes-for-4.8-rc4' of git://people.freedesktop.org/~airlied/linux: (21 commits)
drm/atomic: Don't potentially reset color_mgmt_changed on successive property updates.
drm: Protect fb_defio in drivers with CONFIG_KMS_FBDEV_EMULATION
drm/amdgpu: skip TV/CV in display parsing
drm/amdgpu: avoid a possible array overflow
drm/amdgpu: fix lru size grouping v2
drm/tegra: dsi: Enhance runtime power management
drm/i915: Fix botched merge that downgrades CSR versions.
drm/i915/skl: Ensure pipes with changed wms get added to the state
drm/i915/gen9: Only copy WM results for changed pipes to skl_hw
drm/i915/skl: Add support for the SAGV, fix underrun hangs
drm/i915/gen6+: Interpret mailbox error flags
drm/i915: Reattach comment, complete type specification
drm/i915: Unconditionally flush any chipset buffers before execbuf
drm/i915/gen9: Drop invalid WARN() during data rate calculation
drm/i915/gen9: Initialize intel_state->active_crtcs during WM sanitization (v2)
drm: Reject page_flip for !DRIVER_MODESET
drm/amdgpu: fix timeout value check in amd_sched_job_recovery
drm/amdgpu: fix sdma_v2_4_ring_test_ib
drm/amdgpu: fix amdgpu_move_blit on 32bit systems
drm/radeon: fix radeon_move_blit on 32bit systems
...
|
|
property updates.
Due to assigning the 'replaced' value instead of or'ing it,
if drm_atomic_crtc_set_property() gets called multiple times,
the last call will define the color_mgmt_changed flag, so
a non-updating call to a property can reset the flag and
prevent actual hw state updates required by preceding
property updates.
Signed-off-by: Mario Kleiner <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: <[email protected]> # v4.6+
Reviewed-by: Daniel Vetter <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"A few fixes from the perf departement
- prevent a imbalanced preemption disable in the events teardown code
- prevent out of bound acces in perf userspace
- make perf tools compile with UCLIBC again
- a fix for the userspace unwinder utility"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Use this_cpu_ptr() when stopping AUX events
perf evsel: Do not access outside hw cache name arrays
tools lib: Reinstate strlcpy() header guard with __UCLIBC__
perf unwind: Use addr_location::addr instead of ip for entries
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
"A single bugfix to prevent irq remapping when the ioapic is disabled"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Do not init irq remapping if ioapic is disabled
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"This lot provides:
- plug a hotplug race in the new affinity infrastructure
- a fix for the trigger type of chained interrupts
- plug a potential memory leak in the core code
- a few fixes for ARM and MIPS GICs"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/mips-gic: Implement activate op for device domain
irqchip/mips-gic: Cleanup chip and handler setup
genirq/affinity: Use get/put_online_cpus around cpumask operations
genirq: Fix potential memleak when failing to get irq pm
irqchip/gicv3-its: Disable the ITS before initializing it
irqchip/gicv3: Remove disabling redistributor and group1 non-secure interrupts
irqchip/gic: Allow self-SGIs for SMP on UP configurations
genirq: Correctly configure the trigger on chained interrupts
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"A few updates for timers & co:
- prevent a livelock in the timekeeping code when debugging is
enabled
- prevent out of bounds access in the timekeeping debug code
- various fixes in clocksource drivers
- a new maintainers entry"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/drivers/sun4i: Clear interrupts after stopping timer in probe function
drivers/clocksource/pistachio: Fix memory corruption in init
clocksource/drivers/timer-atmel-pit: Enable mck clock
clocksource/drivers/pxa: Fix include files for compilation
MAINTAINERS: Add ARM ARCHITECTED TIMER entry
timekeeping: Cap array access in timekeeping_debug
timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING
|
|
Pull KVM fixes from Paolo Bonzini:
"ARM:
- fixes for ITS init issues, error handling, IRQ leakage, race
conditions
- an erratum workaround for timers
- some removal of misleading use of errors and comments
- a fix for GICv3 on 32-bit guests
MIPS:
- fix for where the guest could wrongly map the first page of
physical memory
x86:
- nested virtualization fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
MIPS: KVM: Check for pfn noslot case
kvm: nVMX: fix nested tsc scaling
KVM: nVMX: postpone VMCS changes on MSR_IA32_APICBASE write
KVM: nVMX: fix msr bitmaps to prevent L2 from accessing L0 x2APIC
arm64: KVM: report configured SRE value to 32-bit world
arm64: KVM: remove misleading comment on pmu status
KVM: arm/arm64: timer: Workaround misconfigured timer interrupt
arm64: Document workaround for Cortex-A72 erratum #853709
KVM: arm/arm64: Change misleading use of is_error_pfn
KVM: arm64: ITS: avoid re-mapping LPIs
KVM: arm64: check for ITS device on MSI injection
KVM: arm64: ITS: move ITS registration into first VCPU run
KVM: arm64: vgic-its: Make updates to propbaser/pendbaser atomic
KVM: arm64: vgic-its: Plug race in vgic_put_irq
KVM: arm64: vgic-its: Handle errors from vgic_add_lpi
KVM: arm64: ITS: return 1 on successful MSI injection
|