Age | Commit message (Collapse) | Author | Files | Lines |
|
Nothing in bcache actually uses the ->queuedata field.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Instead of setting up the queuedata as well just use one private data
field.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Instead of setting up the queuedata as well just use one private data
field.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Instead of setting up the queuedata as well just use one private data
field.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Instead of setting up the queuedata as well just use one private data
field.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Instead of setting up the queuedata as well just use one private data
field.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Instead of setting up the queuedata as well just use one private data
field.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Instead of setting up the queuedata as well just use one private data
field.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Instead of setting up the queuedata as well just use one private data
field.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Just use rq directly, the usage of list_entry_rq() doesn't make any
sense.
Signed-off-by: Hou Tao <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
There are several copies of get_eflags() and set_eflags() and they all are
buggy. Consolidate them and fix them. The fixes are:
Add memory clobbers. These are probably unnecessary but they make sure
that the compiler doesn't move something past one of these calls when it
shouldn't.
Respect the redzone on x86_64. There has no failure been observed related
to this, but it's definitely a bug.
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/982ce58ae8dea2f1e57093ee894760e35267e751.1593191971.git.luto@kernel.org
|
|
Clear the weird flags before logging to improve strace output --
logging results while, say, TF is set does no one any favors.
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/907bfa5a42d4475b8245e18b67a04b13ca51ffdb.1593191971.git.luto@kernel.org
|
|
Add EFLAGS.AC to the mix.
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/12924e2fe2c5826568b7fc9436d85ca7f5eb1743.1593191971.git.luto@kernel.org
|
|
The SYSENTER frame setup was nonsense. It worked by accident because the
normal code into which the Xen asm jumped (entry_SYSENTER_32/compat) threw
away SP without touching the stack. entry_SYSENTER_compat was recently
modified such that it relied on having a valid stack pointer, so now the
Xen asm needs to invoke it with a valid stack.
Fix it up like SYSCALL: use the Xen-provided frame and skip the bare
metal prologue.
Fixes: 1c3e5d3f60e2 ("x86/entry: Make entry_64_compat.S objtool clean")
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]>
Link: https://lkml.kernel.org/r/947880c41ade688ff4836f665d0c9fcaa9bd1201.1593191971.git.luto@kernel.org
|
|
The SYSENTER asm (32-bit and compat) contains fixups for regs->sp and
regs->flags. Move the fixups into C and fix some comments while at it.
This is a valid cleanup all by itself, and it also simplifies the
subsequent patch that will fix Xen PV SYSENTER.
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/fe62bef67eda7fac75b8f3dbafccf571dc4ece6b.1593191971.git.luto@kernel.org
|
|
Now that the entry stack is a full page, it's too easy to regress the
system call entry code and end up on the wrong stack without noticing.
Assert that all system calls (SYSCALL64, SYSCALL32, SYSENTER, and INT80)
are on the right stack and have pt_regs in the right place.
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/52059e42bb0ab8551153d012d68f7be18d72ff8e.1593191971.git.luto@kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes
Two fixups
- It fixes wrong return value by returing proper error value instead of
fixed one.
- It fixes ref count leak in mic_pre_enable.
One cleanup
- It removes dev_err() call on platform_get_irq() failure because
platform_get_irq() call dev_err() itself on failure.
Signed-off-by: Dave Airlie <[email protected]>
From: Inki Dae <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
https://gitlab.freedesktop.org/drm/msm into drm-fixes
A few fixes, mostly fallout from the address space refactor and dpu
color processing.
Signed-off-by: Dave Airlie <[email protected]>
From: Rob Clark <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/ <CAF6AEGv0SSXArdYs=mOLqJPJdkvk8CpxaJGecqgbOGazQ2n5og@mail.gmail.com
|
|
[Why]
Changes that are fast don't require updating DLG parameters making
this call unnecessary. Considering this is an expensive call it should
not be done on every flip.
DML touches clocks, p-state support, DLG params and a few other DC
internal flags and these aren't expected during fast. A hang has been
reported with this change when called on every flip which suggests that
modifying these fields is not recommended behavior on fast updates.
[How]
Guard the validation to only happen if update type isn't FAST.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1191
Fixes: a24eaa5c51255b ("drm/amd/display: Revalidate bandwidth before commiting DC updates")
Signed-off-by: Nicholas Kazlauskas <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Reviewed-by: Roman Li <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
|
|
Else there will be memory leak if alloc_disk() fails.
Fixes: 6a27b656fc02 ("block: virtio-blk: support multi virt queues per virtio-blk device")
Signed-off-by: Hou Tao <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Commit 6ae72bfa656e ("PCI: Unify pcie_find_root_port() and
pci_find_pcie_root_port()") broke acpi_pci_bridge_d3() because calling
pcie_find_root_port() on a Root Port returned NULL when it should return
the Root Port, which in turn broke power management of PCIe hierarchies.
Rework pcie_find_root_port() so it returns its argument when it is already
a Root Port.
[bhelgaas: test device only once, test for PCIe]
Fixes: 6ae72bfa656e ("PCI: Unify pcie_find_root_port() and pci_find_pcie_root_port()")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mika Westerberg <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
Pull exfat fixes from Namjae Jeon:
- Zero out unused characters of FileName field to avoid a complaint
from some fsck tool.
- Fix memory leak on error paths.
- Fix unnecessary VOL_DIRTY set when calling rmdir on non-empty
directory.
- Call sync_filesystem() for read-only remount (Fix generic/452 test in
xfstests)
- Add own fsync() to flush dirty metadata.
* tag 'exfat-for-5.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
exfat: flush dirty metadata in fsync
exfat: move setting VOL_DIRTY over exfat_remove_entries()
exfat: call sync_filesystem for read-only remount
exfat: add missing brelse() calls on error paths
exfat: Set the unused characters of FileName field to the value 0000h
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem fixes from James Morris:
"Two simple fixes for v5.8:
- Fix hook iteration and default value for inode_copy_up_xattr
(KP Singh)
- Fix the key_permission LSM hook function type (Sami Tolvanen)"
* tag 'fixes-v5.8-rc3-a' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
security: Fix hook iteration and default value for inode_copy_up_xattr
security: fix the key_permission LSM hook function type
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity updates from Mimi Zohar:
"Include PCRs 8 & 9 in per TPM 2.0 bank boot_aggregate calculation.
Prior to Linux 5.8 the SHA1 "boot_aggregate" value was padded with 0's
and extended into the other TPM 2.0 banks.
Included in the Linux 5.8 open window, TPM 2.0 PCR bank specific
"boot_aggregate" values (PCRs 0 - 7) are calculated and extended into the TPM banks.
Distro releases are now shipping grub2 with TPM support, which extend
PCRs 8 & 9. I'd like for PCRs 8 & 9 to be included in the new
"boot_aggregate" calculations.
For backwards compatibility, if the hash is SHA1, these new PCRs are
not included in the boot aggregate"
* tag 'integrity-v5.8-fix-2' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: extend boot_aggregate with kernel measurements
|
|
There is a statement that is indented one level too deeply, fix it
by removing a tab.
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Move .nr_active update and request assignment into blk_mq_get_driver_tag(),
all are good to do during getting driver tag.
Meantime blk-flush related code is simplified and flush request needn't
to update the request table manually any more.
Signed-off-by: Ming Lei <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
It is used by blk-mq.c only, so move it to the source file.
Suggested-by: Christoph Hellwig <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
blk_mq_get_driver_tag() is only used by blk-mq.c and is supposed to
stay in blk-mq.c, so move it and preparing for cleanup code of
get/put driver tag.
Meantime hctx_may_queue() is moved to header file and it is fine
since it is defined as inline always.
No functional change.
Signed-off-by: Ming Lei <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Since 5.7, we've been using task_work to trigger async running of
requests in the context of the original task. This generally works
great, but there's a case where if the task is currently blocked
in the kernel waiting on a condition to become true, it won't process
task_work. Even though the task is woken, it just checks whatever
condition it's waiting on, and goes back to sleep if it's still false.
This is a problem if that very condition only becomes true when that
task_work is run. An example of that is the task registering an eventfd
with io_uring, and it's now blocked waiting on an eventfd read. That
read could depend on a completion event, and that completion event
won't get trigged until task_work has been run.
Use the TWA_SIGNAL notification for task_work, so that we ensure that
the task always runs the work when queued.
Cc: [email protected] # v5.7
Signed-off-by: Jens Axboe <[email protected]>
|
|
So that the target task will exit the wait_event_interruptible-like
loop and call task_work_run() asap.
The patch turns "bool notify" into 0,TWA_RESUME,TWA_SIGNAL enum, the
new TWA_SIGNAL flag implies signal_wake_up(). However, it needs to
avoid the race with recalc_sigpending(), so the patch also adds the
new JOBCTL_TASK_WORK bit included in JOBCTL_PENDING_MASK.
TODO: once this patch is merged we need to change all current users
of task_work_add(notify = true) to use TWA_RESUME.
Cc: [email protected] # v5.7
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Tiger Lake's new unique ACPI device ID for Fan is not valid
because of missing 'C' in the ID. Use correct fan device ID.
Fixes: c248dfe7e0ca ("ACPI: fan: Add Tiger Lake ACPI device ID")
Signed-off-by: Sumeet Pawnikar <[email protected]>
Cc: 5.6+ <[email protected]> # 5.6+
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
Adjust the reg property to fix the following warning seen with
'make dt_binding_check':
Documentation/devicetree/bindings/thermal/ti,am654-thermal.example.dt.yaml: example-0: thermal@42050000:reg:0: [0, 1107623936, 0, 604] is too long
Signed-off-by: Fabio Estevam <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>
|
|
Remove the soc unit address to fix the following warnings seen with
'make dt_binding_check':
Documentation/devicetree/bindings/thermal/thermal-sensor.example.dts:22.20-49.11: Warning (unit_address_vs_reg): /example-0/soc@0: node has a unit name, but no reg or ranges property
Documentation/devicetree/bindings/thermal/thermal-zones.example.dts:23.20-50.11: Warning (unit_address_vs_reg): /example-0/soc@0: node has a unit name, but no reg or ranges property
Signed-off-by: Fabio Estevam <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[robh: also fix thermal-zones.yaml example]
Signed-off-by: Rob Herring <[email protected]>
|
|
Pass the sysreg unit name to fix the following warning seen with
'make dt_binding_check':
Warning (unit_address_vs_reg): /example-0/sysreg: node has a reg or ranges property, but no unit name
Signed-off-by: Fabio Estevam <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>
|
|
Remove the leading zeroes to fix the following warning seen with
'make dt_binding_check':
Documentation/devicetree/bindings/usb/aspeed,usb-vhub.example.dts:37.33-42.23: Warning (unit_address_format): /example-0/usb-vhub@1e6a0000/vhub-strings/string@0409: unit name should not have leading 0s
Reviewed-by: Tao Ren <[email protected]>
Signed-off-by: Fabio Estevam <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>
|
|
There are two processed schema files:
- processed-schema-examples.yaml
Used for 'make dt_binding_check'. This is always a full schema.
- processed-schema.yaml
Used for 'make dtbs_check'. This may be a full schema, or a smaller
subset if DT_SCHEMA_FILES is given by a user.
If DT_SCHEMA_FILES is not specified, they are the same. You can copy
the former to the latter instead of running dt-mk-schema twice. This
saves the cpu time a lot when you do 'make dt_binding_check dtbs_check'
because building the full schema takes a couple of seconds.
If DT_SCHEMA_FILES is specified, processed-schema.yaml is generated
based on the specified yaml files.
Signed-off-by: Masahiro Yamada <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>
|
|
Currently, processed-schema.yaml is always built, but it is actually
used only for 'make dtbs_check'.
'make dt_binding_check' uses processed-schema-example.yaml instead.
Build processed-schema.yaml only for 'make dtbs_check'.
Signed-off-by: Masahiro Yamada <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>
|
|
We are having more and more schema files.
Commit 8b6b80218b01 ("dt-bindings: Fix command line length limit
calling dt-mk-schema") fixed the 'Argument list too long' error of
the schema checks, but the same error happens while cleaning too.
'make clean' after 'make dt_binding_check' fails as follows:
$ make dt_binding_check
[ snip ]
$ make clean
make[2]: execvp: /bin/sh: Argument list too long
make[2]: *** [scripts/Makefile.clean:52: __clean] Error 127
make[1]: *** [scripts/Makefile.clean:66: Documentation/devicetree/bindings] Error 2
make: *** [Makefile:1763: _clean_Documentation] Error 2
'make dt_binding_check' generates so many .example.dts, .dt.yaml files,
which are passed to the 'rm' command when you run 'make clean'.
I added a small hack to use the 'find' command to clean up most of the
build artifacts before they are processed by scripts/Makefile.clean
Signed-off-by: Masahiro Yamada <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>
|
|
Fix unit address to match the first address specified in the reg
property of the node in example.
Signed-off-by: Kangmin Park <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>
|
|
Since commit e69f5dc623f9 ("dt-bindings: serial: Convert 8250 to
json-schema"), the schema for "ns16550a" is checked.
'make dt_binding_check' emits the following warning:
uart@5,00200000: $nodename:0: 'uart@5,00200000' does not match '^serial(@[0-9a-f,]+)*$'
Rename the node to follow the pattern defined in
Documentation/devicetree/bindings/serial/serial.yaml
While I was here, I removed leading zeros from unit names.
Signed-off-by: Masahiro Yamada <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>
|
|
Sync with upstream dtc primarily to pickup the I2C bus check fixes. The
interrupt_provider check is noisy, so turn it off for now.
This adds the following commits from upstream:
9d7888cbf19c dtc: Consider one-character strings as strings
8259d59f59de checks: Improve i2c reg property checking
fdabcf2980a4 checks: Remove warning for I2C_OWN_SLAVE_ADDRESS
2478b1652c8d libfdt: add extern "C" for C++
f68bfc2668b2 libfdt: trivial typo fix
7be250b4d059 libfdt: Correct condition for reordering blocks
81e0919a3e21 checks: Add interrupt provider test
85e5d839847a Makefile: when building libfdt only, do not add unneeded deps
b28464a550c5 Fix some potential unaligned accesses in dtc
Signed-off-by: Rob Herring <[email protected]>
|
|
More and more drivers want to get batching requests queued from
block layer, such as mmc, and tcp based storage drivers. Also
current in-tree users have virtio-scsi, virtio-blk and nvme.
For none, we already support batching dispatch.
But for io scheduler, every time we just take one request from scheduler
and pass the single request to blk_mq_dispatch_rq_list(). This way makes
batching dispatch not possible when io scheduler is applied. One reason
is that we don't want to hurt sequential IO performance, becasue IO
merge chance is reduced if more requests are dequeued from scheduler
queue.
Try to support batching dispatch for io scheduler by starting with the
following simple approach:
1) still make sure we can get budget before dequeueing request
2) use hctx->dispatch_busy to evaluate if queue is busy, if it is busy
we fackback to non-batching dispatch, otherwise dequeue as many as
possible requests from scheduler, and pass them to blk_mq_dispatch_rq_list().
Wrt. 2), we use similar policy for none, and turns out that SCSI SSD
performance got improved much.
In future, maybe we can develop more intelligent algorithem for batching
dispatch.
Baolin has tested this patch and found that MMC performance is improved[3].
[1] https://lore.kernel.org/linux-block/20200512075501.GF1531898@T590/#r
[2] https://lore.kernel.org/linux-block/[email protected]/
[3] https://lore.kernel.org/linux-block/CADBw62o9eTQDJ9RvNgEqSpXmg6Xcq=2TxH0Hfxhp29uF2W=TXA@mail.gmail.com/
Signed-off-by: Ming Lei <[email protected]>
Tested-by: Baolin Wang <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Sagi Grimberg <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Pass obtained budget count to blk_mq_dispatch_rq_list(), and prepare
for supporting fully batching submission.
With the obtained budget count, it is easier to put extra budgets
in case of .queue_rq failure.
Meantime remove the old 'got_budget' parameter.
Signed-off-by: Ming Lei <[email protected]>
Tested-by: Baolin Wang <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Sagi Grimberg <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
When BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE is returned from
.queue_rq, the 'list' variable always holds this rq which isn't
queued to LLD successfully.
So blk_mq_dispatch_rq_list() always returns false from the branch
of '!list_empty(list)'.
No functional change.
Signed-off-by: Ming Lei <[email protected]>
Tested-by: Baolin Wang <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Sagi Grimberg <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Move code for getting driver tag and budget into one helper, so
blk_mq_dispatch_rq_list gets a bit simplified, and easier to read.
Meantime move updating of 'no_tag' and 'no_budget_available' into
the branch for handling partial dispatch because that is exactly
consumer of the two local variables.
Also rename the parameter of 'got_budget' as 'ask_budget'.
No functional change.
Signed-off-by: Ming Lei <[email protected]>
Tested-by: Baolin Wang <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Cc: Sagi Grimberg <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
All requests in the 'list' of blk_mq_dispatch_rq_list belong to same
hctx, so it is better to pass hctx instead of request queue, because
blk-mq's dispatch target is hctx instead of request queue.
Signed-off-by: Ming Lei <[email protected]>
Tested-by: Baolin Wang <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Cc: Sagi Grimberg <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
blk-mq budget is abstract from scsi's device queue depth, and it is
always per-request-queue instead of hctx.
It can be quite absurd to get a budget from one hctx, then dequeue a
request from scheduler queue, and this request may not belong to this
hctx, at least for bfq and deadline.
So fix the mess and always pass request queue to get/put budget
callback.
Signed-off-by: Ming Lei <[email protected]>
Tested-by: Baolin Wang <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Douglas Anderson <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Cc: Sagi Grimberg <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Douglas Anderson <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Choo! Choo! All aboard the Split Lock Express, with direct service to
Wreckage!
Skip split_lock_verify_msr() if the CPU isn't whitelisted as a possible
SLD-enabled CPU model to avoid writing MSR_TEST_CTRL. MSR_TEST_CTRL
exists, and is writable, on many generations of CPUs. Writing the MSR,
even with '0', can result in bizarre, undocumented behavior.
This fixes a crash on Haswell when resuming from suspend with a live KVM
guest. Because APs use the standard SMP boot flow for resume, they will
go through split_lock_init() and the subsequent RDMSR/WRMSR sequence,
which runs even when sld_state==sld_off to ensure SLD is disabled. On
Haswell (at least, my Haswell), writing MSR_TEST_CTRL with '0' will
succeed and _may_ take the SMT _sibling_ out of VMX root mode.
When KVM has an active guest, KVM performs VMXON as part of CPU onlining
(see kvm_starting_cpu()). Because SMP boot is serialized, the resulting
flow is effectively:
on_each_ap_cpu() {
WRMSR(MSR_TEST_CTRL, 0)
VMXON
}
As a result, the WRMSR can disable VMX on a different CPU that has
already done VMXON. This ultimately results in a #UD on VMPTRLD when
KVM regains control and attempt run its vCPUs.
The above voodoo was confirmed by reworking KVM's VMXON flow to write
MSR_TEST_CTRL prior to VMXON, and to serialize the sequence as above.
Further verification of the insanity was done by redoing VMXON on all
APs after the initial WRMSR->VMXON sequence. The additional VMXON,
which should VM-Fail, occasionally succeeded, and also eliminated the
unexpected #UD on VMPTRLD.
The damage done by writing MSR_TEST_CTRL doesn't appear to be limited
to VMX, e.g. after suspend with an active KVM guest, subsequent reboots
almost always hang (even when fudging VMXON), a #UD on a random Jcc was
observed, suspend/resume stability is qualitatively poor, and so on and
so forth.
kernel BUG at arch/x86/kvm/x86.c:386!
CPU: 1 PID: 2592 Comm: CPU 6/KVM Tainted: G D
Hardware name: ASUS Q87M-E/Q87M-E, BIOS 1102 03/03/2014
RIP: 0010:kvm_spurious_fault+0xf/0x20
Call Trace:
vmx_vcpu_load_vmcs+0x1fb/0x2b0
vmx_vcpu_load+0x3e/0x160
kvm_arch_vcpu_load+0x48/0x260
finish_task_switch+0x140/0x260
__schedule+0x460/0x720
_cond_resched+0x2d/0x40
kvm_arch_vcpu_ioctl_run+0x82e/0x1ca0
kvm_vcpu_ioctl+0x363/0x5c0
ksys_ioctl+0x88/0xa0
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x4c/0x170
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: dbaba47085b0c ("x86/split_lock: Rework the initialization flow of split lock detection")
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
In flush_delete_work, instead of flushing each individual pending
delayed work item, cancel and re-queue them for immediate execution.
The waiting isn't needed here because we're already waiting for all
queued work items to complete in gfs2_flush_delete_work. This makes the
code more efficient, but more importantly, it avoids sleeping during a
rhashtable walk, inside rcu_read_lock().
Signed-off-by: Andreas Gruenbacher <[email protected]>
|
|
Log flush operations (gfs2_log_flush()) can target a specific transaction.
But if the function encounters errors (e.g. io errors) and withdraws,
the transaction was only freed it if was queued to one of the ail lists.
If the withdraw occurred before the transaction was queued to the ail1
list, function ail_drain never freed it. The result was:
BUG gfs2_trans: Objects remaining in gfs2_trans on __kmem_cache_shutdown()
This patch makes log_flush() add the targeted transaction to the ail1
list so that function ail_drain() will find and free it properly.
Cc: [email protected] # v5.7+
Signed-off-by: Bob Peterson <[email protected]>
Signed-off-by: Andreas Gruenbacher <[email protected]>
|