Age | Commit message (Collapse) | Author | Files | Lines |
|
Overflowed requests in io_uring_cancel_files() should be shed only of
inflight and overflowed refs. All other left references are owned by
someone else.
If refcount_sub_and_test() fails, it will go further and put put extra
ref, don't do that. Also, don't need to do io_wq_cancel_work()
for overflowed reqs, they will be let go shortly anyway.
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Offset timeouts wait not for sqe->off non-timeout CQEs, but rather
sqe->off + number of prior inflight requests. Wait exactly for
sqe->off non-timeout completions
Reported-by: Jens Axboe <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Separate flushing offset timeouts io_commit_cqring() by moving it into a
helper. Just a preparation, makes following patches clearer.
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The io_uring interfaces have been replaced by do_statx() and are no
longer needed.
Signed-off-by: Bijan Mottahedeh <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Calling statx directly both simplifies the interface and avoids potential
incompatibilities between sync and async invokations.
Signed-off-by: Bijan Mottahedeh <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This is a prepatory patch to allow io_uring to invoke statx directly.
Signed-off-by: Bijan Mottahedeh <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Separate statx data from open in io_kiocb. No functional changes.
Signed-off-by: Bijan Mottahedeh <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
io_close() was punting async manually to skip grabbing files. Use
REQ_F_NO_FILE_TABLE instead, and pass it through the generic path
with -EAGAIN.
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
io_commit_cqring() assembly doesn't look good with extra code handling
drained requests. IOSQE_IO_DRAIN is slow and discouraged to be used in
a hot path, so try to minimise its impact by putting it into a helper
and doing a fast check.
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
SQEs are user writable, don't read sqe->off twice in io_timeout_prep()
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Move spin_lock_irq() earlier to have only 1 call site of it in
io_timeout(). It makes the flow easier.
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In io_uring_cancel_files(), after refcount_sub_and_test() leaves 0
req->refs, it calls io_put_req(), which would also put a ref. Call
io_free_req() instead.
Cc: [email protected]
Fixes: 2ca10259b418 ("io_uring: prune request from overflow list on flush")
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
When IORING_SETUP_SQPOLL is enabled, io_ring_ctx_wait_and_kill() will wait
for sq thread to idle by busy loop:
while (ctx->sqo_thread && !wq_has_sleeper(&ctx->sqo_wait))
cond_resched();
Above loop isn't very CPU friendly, it may introduce a short cpu burst on
the current cpu.
If ctx->refs is dying, we forbid sq_thread from submitting any further
SQEs. Instead they just get discarded when we exit.
Signed-off-by: Xiaoguang Wang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
If the request is still hashed in io_async_task_func(), then it cannot
have been canceled and it's pointless to check. So save that check.
Signed-off-by: Jens Axboe <[email protected]>
|
|
Add IORING_OP_TEE implementing tee(2) support. Almost identical to
splice bits, but without offsets.
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
export do_tee() for use in io_uring
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
req->flags stores all sqe->flags. After checking that sqe->flags are
valid set if IOSQE* flags, no need to double check it, just forward them
all.
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
io_file_put() deals with flushing state's file refs, adding "state" to
its name makes it a bit clearer. Also, avoid double check of
state->file in __io_file_get() in some cases.
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
A submission is "async" IIF it's done by SQPOLL thread. Instead of
passing @async flag into io_submit_sqes(), deduce it from ctx->flags.
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
We only need apoll in the one section, do the juggling with the work
restoration there. This removes a special case further down as well.
No functional changes in this patch.
Signed-off-by: Jens Axboe <[email protected]>
|
|
There's no point in using list_del_init() on entries that are going
away, and the associated lock is always used in process context so
let's not use the IRQ disabling+saving variant of the spinlock.
Signed-off-by: Jens Axboe <[email protected]>
|
|
This new flag should be set/clear from the application to
disable/enable eventfd notifications when a request is completed
and queued to the CQ ring.
Before this patch, notifications were always sent if an eventfd is
registered, so IORING_CQ_EVENTFD_DISABLED is not set during the
initialization.
It will be up to the application to set the flag after initialization
if no notifications are required at the beginning.
Signed-off-by: Stefano Garzarella <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This patch adds the new 'cq_flags' field that should be written by
the application and read by the kernel.
This new field is available to the userspace application through
'cq_off.flags'.
We are using 4-bytes previously reserved and set to zero. This means
that if the application finds this field to zero, then the new
functionality is not supported.
In the next patch we will introduce the first flag available.
Signed-off-by: Stefano Garzarella <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Some file descriptors use separate waitqueues for their f_ops->poll()
handler, most commonly one for read and one for write. The io_uring
poll implementation doesn't work with that, as the 2nd poll_wait()
call will cause the io_uring poll request to -EINVAL.
This affects (at least) tty devices and /dev/random as well. This is a
big problem for event loops where some file descriptors work, and others
don't.
With this fix, io_uring handles multiple waitqueues.
Signed-off-by: Jens Axboe <[email protected]>
|
|
We currently embed and queue a work item per fixed_file_ref_node that
we update, but if the workload does a lot of these, then the associated
kworker-events overhead can become quite noticeable.
Since we rarely need to wait on these, batch them at 1 second intervals
instead. If we do need to wait for them, we just flush the pending
delayed work.
Signed-off-by: Jens Axboe <[email protected]>
|
|
We used to have three completions, now we just have two. With the two,
let's not allocate them dynamically, just embed then in the ctx and
name them appropriately.
Signed-off-by: Jens Axboe <[email protected]>
|
|
Remove duplicate semicolon at the end of line in io_file_from_index()
Signed-off-by: Xiaoming Ni <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The "struct io_submit_state *state" parameter is not used, remove it.
Signed-off-by: Xiaoguang Wang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The attempt protecting us from closing the ring itself wasn't really
complete, and we actually don't need it. The referencing of requests
themselve, and the references they hold on the ring, ensures that the
life time of the ring is sane. With the check removed, we can also
remove the need to have the close operation fget() the file.
Reported-by: Al Viro <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
We currently make some guesses as when to open this fd, but in reality
we have no business (or need) to do so at all. In fact, it makes certain
things fail, like O_PATH.
Remove the fd lookup from these opcodes, we're just passing the 'fd' to
generic helpers anyway. With that, we can also remove the special casing
of fd values in io_req_needs_file(), and the 'fd_non_neg' check that
we have. And we can ensure that we only read sqe->fd once.
This fixes O_PATH usage with openat/openat2, and ditto statx path side
oddities.
Cc: [email protected]: # v5.6
Reported-by: Max Kellermann <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
do_splice() is used by io_uring, as will be do_tee(). Move f_mode
checks from sys_{splice,tee}() to do_{splice,tee}(), so they're
enforced for io_uring as well.
Fixes: 7d67af2c0134 ("io_uring: add splice(2) support")
Reported-by: Jann Horn <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
If copy_to_user() in io_uring_setup() failed, we'll leak many kernel
resources, which will be recycled until process terminates. This bug
can be reproduced by using mprotect to set params to PROT_READ. To fix
this issue, refactor io_uring_create() a bit to add a new 'struct
io_uring_params __user *params' parameter and move the copy_to_user()
in io_uring_setup() to io_uring_setup(), if copy_to_user() failed,
we can free kernel resource properly.
Suggested-by: Jens Axboe <[email protected]>
Signed-off-by: Xiaoguang Wang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The prepare_to_wait() and finish_wait() calls in io_uring_cancel_files()
are mismatched. Currently I don't see any issues related this bug, just
find it by learning codes.
Signed-off-by: Xiaoguang Wang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull more btrfs fixes from David Sterba:
"A few more stability fixes, minor build warning fixes and git url
fixup:
- fix partial loss of prealloc extent past i_size after fsync
- fix potential deadlock due to wrong transaction handle passing via
journal_info
- fix gcc 4.8 struct intialization warning
- update git URL in MAINTAINERS entry"
* tag 'for-5.7-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
MAINTAINERS: btrfs: fix git repo URL
btrfs: fix gcc-4.8 build warning for struct initializer
btrfs: transaction: Avoid deadlock due to bad initialization timing of fs_info::journal_info
btrfs: fix partial loss of prealloc extent past i_size after fsync
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
- Fix a memory leak when dev_iommu gets freed and a sub-pointer does
not
- Build dependency fixes for Mediatek, spapr_tce, and Intel IOMMU
driver
- Export iommu_group_get_for_dev() only for GPLed modules
- Fix AMD IOMMU interrupt remapping when x2apic is enabled
- Fix error path in the QCOM IOMMU driver probe function
* tag 'iommu-fixes-v5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/qcom: Fix local_base status check
iommu: Properly export iommu_group_get_for_dev()
iommu/vt-d: Use right Kconfig option name
iommu/amd: Fix legacy interrupt remapping for x2APIC-enabled system
iommu: spapr_tce: Disable compile testing to fix build on book3s_32 config
iommu/mediatek: Fix MTK_IOMMU dependencies
iommu: Fix the memory leak in dev_iommu_free()
|
|
The git repo listed for btrfs hasn't been updated in over a year.
List the current one instead.
Signed-off-by: Eric Biggers <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
- prevent the intel_pstate driver from printing excessive diagnostic
messages in some cases (Chris Wilson)
- make the hibernation restore kernel freeze kernel threads as well as
user space tasks (Dexuan Cui)
- fix the ACPI device PM disagnostic messages to include the correct
power state name (Kai-Heng Feng).
* tag 'pm-5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: ACPI: Output correct message on target power state
PM: hibernate: Freeze kernel threads in software_resume()
cpufreq: intel_pstate: Only mention the BIOS disabling turbo mode once
|
|
* pm-cpufreq:
cpufreq: intel_pstate: Only mention the BIOS disabling turbo mode once
* pm-sleep:
PM: hibernate: Freeze kernel threads in software_resume()
|
|
Pull iomap fix from Darrick Wong:
"Hoist the check for an unrepresentable FIBMAP return value into
ioctl_fibmap.
The internal kernel function can handle 64-bit values (and is needed
to fix a regression on ext4 + jbd2). It is only the userspace ioctl
that is so old that it cannot deal"
* tag 'iomap-5.7-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
fibmap: Warn and return an error in case of block > INT_MAX
|
|
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:
Stable fixes:
- fix handling of backchannel binding in BIND_CONN_TO_SESSION
Bugfixes:
- Fix a credential use-after-free issue in pnfs_roc()
- Fix potential posix_acl refcnt leak in nfs3_set_acl
- defer slow parts of rpc_free_client() to a workqueue
- Fix an Oopsable race in __nfs_list_for_each_server()
- Fix trace point use-after-free race
- Regression: the RDMA client no longer responds to server disconnect
requests
- Fix return values of xdr_stream_encode_item_{present, absent}
- _pnfs_return_layout() must always wait for layoutreturn completion
Cleanups:
- Remove unreachable error conditions"
* tag 'nfs-for-5.7-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Fix a race in __nfs_list_for_each_server()
NFSv4.1: fix handling of backchannel binding in BIND_CONN_TO_SESSION
SUNRPC: defer slow parts of rpc_free_client() to a workqueue.
NFSv4: Remove unreachable error condition due to rpc_run_task()
SUNRPC: Remove unreachable error condition
xprtrdma: Fix use of xdr_stream_encode_item_{present, absent}
xprtrdma: Fix trace point use-after-free race
xprtrdma: Restore wake-up-all to rpcrdma_cm_event_handler()
nfs: Fix potential posix_acl refcnt leak in nfs3_set_acl
NFS/pnfs: Fix a credential use-after-free issue in pnfs_roc()
NFS/pnfs: Ensure that _pnfs_return_layout() waits for layoutreturn completion
|
|
git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine fixes from Vinod Koul:
"Core:
- Documentation typo fixes
- fix the channel indexes
- dmatest: fixes for process hang and iterations
Drivers:
- hisilicon: build error fix without PCI_MSI
- ti-k3: deadlock fix
- uniphier-xdmac: fix for reg region
- pch: fix data race
- tegra: fix clock state"
* tag 'dmaengine-fix-5.7-rc4' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: dmatest: Fix process hang when reading 'wait' parameter
dmaengine: dmatest: Fix iteration non-stop logic
dmaengine: tegra-apb: Ensure that clock is enabled during of DMA synchronization
dmaengine: fix channel index enumeration
dmaengine: mmp_tdma: Reset channel error on release
dmaengine: mmp_tdma: Do not ignore slave config validation errors
dmaengine: pch_dma.c: Avoid data race between probe and irq handler
dt-bindings: dma: uniphier-xdmac: switch to single reg region
include/linux/dmaengine: Typos fixes in API documentation
dmaengine: xilinx_dma: Add missing check for empty list
dmaengine: ti: k3-psil: fix deadlock on error path
dmaengine: hisilicon: Fix build error without PCI_MSI
|
|
Pull VFIO fixes from Alex Williamson:
- copy_*_user validity check for new vfio_dma_rw interface (Yan Zhao)
- Fix a potential math overflow (Yan Zhao)
- Use follow_pfn() for calculating PFNMAPs (Sean Christopherson)
* tag 'vfio-v5.7-rc4' of git://github.com/awilliam/linux-vfio:
vfio/type1: Fix VA->PA translation for PFNMAP VMAs in vaddr_get_pfn()
vfio: avoid possible overflow in vfio_iommu_type1_pin_pages
vfio: checking of validity of user vaddr in vfio_dma_rw
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fix from Catalin Marinas:
"Add -fasynchronous-unwind-tables to the vDSO CFLAGS"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: vdso: Add -fasynchronous-unwind-tables to cflags
|
|
Pull io_uring fixes from Jens Axboe:
- Fix for statx not grabbing the file table, making AT_EMPTY_PATH fail
- Cover a few cases where async poll can handle retry, eliminating the
need for an async thread
- fallback request busy/free fix (Bijan)
- syzbot reported SQPOLL thread exit fix for non-preempt (Xiaoguang)
- Fix extra put of req for sync_file_range (Pavel)
- Always punt splice async. We'll improve this for 5.8, but wanted to
eliminate the inode mutex lock from the non-blocking path for 5.7
(Pavel)
* tag 'io_uring-5.7-2020-05-01' of git://git.kernel.dk/linux-block:
io_uring: punt splice async because of inode mutex
io_uring: check non-sync defer_list carefully
io_uring: fix extra put in sync_file_range()
io_uring: use cond_resched() in io_ring_ctx_wait_and_kill()
io_uring: use proper references for fallback_req locking
io_uring: only force async punt if poll based retry can't handle it
io_uring: enable poll retry for any file with ->read_iter / ->write_iter
io_uring: statx must grab the file table for valid fd
|
|
Pull block fixes from Jens Axboe:
"A few fixes for this release:
- NVMe pull request from Christoph, with a single fix for a double
free in the namespace error handling.
- Kill the bd_openers check in blk_drop_partitions(), fixing a
regression in this merge window (Christoph)"
* tag 'block-5.7-2020-05-01' of git://git.kernel.dk/linux-block:
block: remove the bd_openers checks in blk_drop_partitions
nvme: prevent double free in nvme_alloc_ns() error handling
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Three driver bugfixes, and two reverts because the original patches
revealed underlying problems which the Tegra guys are now working on"
* 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: aspeed: Avoid i2c interrupt status clear race condition.
i2c: amd-mp2-pci: Fix Oops in amd_mp2_pci_init() error handling
Revert "i2c: tegra: Better handle case where CPU0 is busy for a long time"
Revert "i2c: tegra: Synchronize DMA before termination"
i2c: iproc: generate stop event for slave writes
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Just a collection of small fixes around this time:
- One more try for fixing PCM OSS regression
- HD-audio: a new quirk for Lenovo, the improved driver blacklisting,
a lock fix in the minor error path, and a fix for the possible race
at monitor notifiaction
- USB-audio: a quirk ID fix, a fix for POD HD500 workaround"
* tag 'sound-5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Correct a typo of NuPrime DAC-10 USB ID
ALSA: opti9xx: shut up gcc-10 range warning
ALSA: hda/hdmi: fix without unlocked before return
ALSA: hda/hdmi: fix race in monitor detection during probe
ALSA: hda/realtek - Two front mics on a Lenovo ThinkCenter
ALSA: line6: Fix POD HD500 audio playback
ALSA: pcm: oss: Place the plugin buffer overflow checks correctly (for 5.7)
ALSA: pcm: oss: Place the plugin buffer overflow checks correctly
ALSA: hda: Match both PCI ID and SSID for driver blacklist
|
|
Pull drm fixes from Dave Airlie:
"Regular scheduled fixes for graphics. Nothing to extreme bunch of
amdgpu fixes, i915 and qxl fixes, along with some misc ones.
All seems to be progressing normally.
core:
- EDID off by one DTD fix
- DP mst write return code fix
dma-buf:
- fix SET_NAME ioctl uapi
- doc fixes
amdgpu:
- Fix a green screen on resume issue
- PM fixes for SR-IOV SDMA fix for navi
- Renoir display fixes
- Cursor and pageflip stuttering fixes
- Misc additional display fixes
- (uapi) Add additional DCC tiling flags for navi1x
i915:
- Fix selftest refcnt leak (Xiyu)
- Fix gem vma lock (Chris)
- Fix gt's i915_request.timeline acquire by checking if cacheline is
valid (Chris)
- Fix IRQ postinistall fault masks (Matt)
qxl:
- use after gree fix
- fix lost kunmap
- release leak fix
virtio:
- context destruction fix"
* tag 'drm-fixes-2020-05-01' of git://anongit.freedesktop.org/drm/drm: (26 commits)
dma-buf: fix documentation build warnings
drm/qxl: qxl_release use after free
drm/qxl: lost qxl_bo_kunmap_atomic_page in qxl_image_init_helper()
drm/i915: Use proper fault mask in interrupt postinstall too
drm/amd/display: Use cursor locking to prevent flip delays
drm/amd/display: Update downspread percent to match spreadsheet for DCN2.1
drm/amd/display: Defer cursor update around VUPDATE for all ASIC
drm/amd/display: fix rn soc bb update
drm/amd/display: check if REFCLK_CNTL register is present
drm/amdgpu: bump version for invalidate L2 before SDMA IBs
drm/amdgpu: invalidate L2 before SDMA IBs (v2)
drm/amdgpu: add tiling flags from Mesa
drm/amd/powerplay: avoid using pm_en before it is initialized revised
Revert "drm/amd/powerplay: avoid using pm_en before it is initialized"
drm/qxl: qxl_release leak in qxl_hw_surface_alloc()
drm/qxl: qxl_release leak in qxl_draw_dirty_fb()
drm/virtio: only destroy created contexts
drm/dp_mst: Fix drm_dp_send_dpcd_write() return code
drm/i915/gt: Check cacheline is valid before acquiring
drm/i915/gem: Hold obj->vma.lock over for_each_ggtt_vma()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Four minor fixes: three in drivers and one in the core.
The core one allows an additional state change that fixes a regression
introduced by an update to the aacraid driver in the previous merge
window"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: target/iblock: fix WRITE SAME zeroing
scsi: qla2xxx: check UNLOADING before posting async work
scsi: qla2xxx: set UNLOADING before waiting for session deletion
scsi: core: Allow the state change from SDEV_QUIESCE to SDEV_BLOCK
|