Age | Commit message (Collapse) | Author | Files | Lines |
|
The hisi_sas_slot.is_internal member is not set properly for ATA commands
which the driver sends directly. A TMF struct pointer is normally used as a
test to set this, but it is NULL for those commands. It's not ideal, but
pass an empty TMF struct to set that member properly.
Link: https://lore.kernel.org/r/[email protected]
Fixes: dc313f6b125b ("scsi: hisi_sas: Factor out task prep and delivery code")
Reported-by: Xiang Chen <[email protected]>
Signed-off-by: John Garry <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
|
|
Bounds checking when parsing init scripts embedded in the BIOS reject
access to the last byte. This causes driver initialization to fail on
Apple eMac's with GeForce 2 MX GPUs, leaving the system with no working
console.
This is probably only seen on OpenFirmware machines like PowerPC Macs
because the BIOS image provided by OF is only the used parts of the ROM,
not a power-of-two blocks read from PCI directly so PCs always have
empty bytes at the end that are never accessed.
Signed-off-by: Nick Lopez <[email protected]>
Fixes: 4d4e9907ff572 ("drm/nouveau/bios: guard against out-of-bounds accesses to image")
Cc: <[email protected]> # v4.10+
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
Signed-off-by: Karol Herbst <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Currently a use-after-free may occur if a sas_task is aborted by the upper
layer before we handle the I/O completion in mpi_ssp_completion() or
mpi_sata_completion().
In this case, the following are the two steps in handling those I/O
completions:
- Call complete() to inform the upper layer handler of completion of
the I/O.
- Release driver resources associated with the sas_task in
pm8001_ccb_task_free() call.
When complete() is called, the upper layer may free the sas_task. As such,
we should not touch the associated sas_task afterwards, but we do so in the
pm8001_ccb_task_free() call.
Fix by swapping the complete() and pm8001_ccb_task_free() calls ordering.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Damien Le Moal <[email protected]>
Acked-by: Jack Wang <[email protected]>
Signed-off-by: John Garry <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
|
|
Currently a use-after-free may occur if a TMF sas_task is aborted before we
handle the IO completion in mpi_ssp_completion(). The abort occurs due to
timeout.
When the timeout occurs, the SAS_TASK_STATE_ABORTED flag is set and the
sas_task is freed in pm8001_exec_internal_tmf_task().
However, if the I/O completion occurs later, the I/O completion still
thinks that the sas_task is available. Fix this by clearing the ccb->task
if the TMF times out - the I/O completion handler does nothing if this
pointer is cleared.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Damien Le Moal <[email protected]>
Acked-by: Jack Wang <[email protected]>
Signed-off-by: John Garry <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
|
|
make W=1 complains of an undescribed function parameter:
drivers/scsi/pm8001/pm80xx_hwi.c:3938: warning: Function parameter or member 'circularQ' not described in 'process_one_iomb'
Fix it.
Link: https://lore.kernel.org/r/[email protected]
Reported-by: Damien Le Moal <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Acked-by: Jack Wang <[email protected]>
Signed-off-by: John Garry <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
|
|
There was an AQ error I40E_AQ_RC_EINVAL when trying
to reset bw limit as part of bw allocation setup.
This was caused by trying to reset bw limit with
DCB enabled. Bw limit should not be reset when
DCB is enabled. The code was relying on the pf->flags
to check if DCB is enabled but if only 1 TC is available
this flag will not be set even though DCB is enabled.
Add a check for number of TC and if it is 1
don't try to reset bw limit even if pf->flags shows
DCB as disabled.
Fixes: fa38e30ac73f ("i40e: Fix for Tx timeouts when interface is brought up if DCB is enabled")
Suggested-by: Alexander Lobakin <[email protected]> # Flatten the condition
Signed-off-by: Sylwester Dziedziuch <[email protected]>
Signed-off-by: Jedrzej Jagielski <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Tested-by: Imam Hassan Reza Biswas <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Syzbot tripped over the following complaint from the kernel:
WARNING: CPU: 2 PID: 15402 at mm/util.c:597 kvmalloc_node+0x11e/0x125 mm/util.c:597
While trying to run XFS_IOC_GETBMAP against the following structure:
struct getbmap fubar = {
.bmv_count = 0x22dae649,
};
Obviously, this is a crazy huge value since the next thing that the
ioctl would do is allocate 37GB of memory. This is enough to make
kvmalloc mad, but isn't large enough to trip the validation functions.
In other words, I'm fussing with checks that were **already sufficient**
because that's easier than dealing with 644 internal bug reports. Yes,
that's right, six hundred and forty-four.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Allison Henderson <[email protected]>
Reviewed-by: Catherine Hoang <[email protected]>
|
|
ACPI core now requires crc32() but the kernel build can fail when
CRC32 is not set/enabled, so select it in the ACPI Kconfig entry.
Fixes this build error:
ia64-linux-ld: drivers/acpi/scan.o: in function `acpi_store_pld_crc':
include/acpi/platform/aclinuxex.h:62: undefined reference to `crc32_le'
Fixes: 882c982dada4 ("acpi: Store CRC-32 hash of the _PLD in struct acpi_device")
Signed-off-by: Randy Dunlap <[email protected]>
Reported-by: Guenter Roeck <[email protected]>
Reviewed-by: Guenter Roeck <[email protected]>
Tested-by: Guenter Roeck <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
We currently use ->cmd_per_lun as initial queue depth for setting up the
budget_map. Martin Wilck reported that it is common for the queue_depth to
be subsequently updated in slave_configure() based on detected hardware
characteristics.
As a result, for some drivers, the static host template settings for
cmd_per_lun and can_queue won't actually get used in practice. And if the
default values are used to allocate the budget_map, memory may be consumed
unnecessarily.
Fix the issue by reallocating the budget_map after ->slave_configure()
returns. At that time the device queue_depth should accurately reflect what
the hardware needs.
Link: https://lore.kernel.org/r/[email protected]
Cc: Bart Van Assche <[email protected]>
Reported-by: Martin Wilck <[email protected]>
Suggested-by: Martin Wilck <[email protected]>
Tested-by: Martin Wilck <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
|
|
vdso_test_abi contains a batch of tests that verify the validity of the
vDSO ABI.
When a vDSO symbol is not found the relevant test is skipped reporting
KSFT_SKIP. All the tests return values are then added in a single
variable which is checked to verify failures. This approach can have
side effects which result in reporting the wrong kselftest exit status.
Fix vdso_test_abi verifying the return code of each test separately.
Cc: Shuah Khan <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Reported-by: Cristian Marussi <[email protected]>
Signed-off-by: Vincenzo Frascino <[email protected]>
Signed-off-by: Shuah Khan <[email protected]>
|
|
Running tests with a debug kernel shows that bnx2fc_recv_frame() is
modifying the per_cpu lport stats counters in a non-mpsafe way. Just boot
a debug kernel and run the bnx2fc driver with the hardware enabled.
[ 1391.699147] BUG: using smp_processor_id() in preemptible [00000000] code: bnx2fc_
[ 1391.699160] caller is bnx2fc_recv_frame+0xbf9/0x1760 [bnx2fc]
[ 1391.699174] CPU: 2 PID: 4355 Comm: bnx2fc_l2_threa Kdump: loaded Tainted: G B
[ 1391.699180] Hardware name: HP ProLiant DL120 G7, BIOS J01 07/01/2013
[ 1391.699183] Call Trace:
[ 1391.699188] dump_stack_lvl+0x57/0x7d
[ 1391.699198] check_preemption_disabled+0xc8/0xd0
[ 1391.699205] bnx2fc_recv_frame+0xbf9/0x1760 [bnx2fc]
[ 1391.699215] ? do_raw_spin_trylock+0xb5/0x180
[ 1391.699221] ? bnx2fc_npiv_create_vports.isra.0+0x4e0/0x4e0 [bnx2fc]
[ 1391.699229] ? bnx2fc_l2_rcv_thread+0xb7/0x3a0 [bnx2fc]
[ 1391.699240] bnx2fc_l2_rcv_thread+0x1af/0x3a0 [bnx2fc]
[ 1391.699250] ? bnx2fc_ulp_init+0xc0/0xc0 [bnx2fc]
[ 1391.699258] kthread+0x364/0x420
[ 1391.699263] ? _raw_spin_unlock_irq+0x24/0x50
[ 1391.699268] ? set_kthread_struct+0x100/0x100
[ 1391.699273] ret_from_fork+0x22/0x30
Restore the old get_cpu/put_cpu code with some modifications to reduce the
size of the critical section.
Link: https://lore.kernel.org/r/[email protected]
Fixes: d576a5e80cd0 ("bnx2fc: Improve stats update mechanism")
Tested-by: Guangwu Zhang <[email protected]>
Acked-by: Saurav Kashyap <[email protected]>
Signed-off-by: John Meneghini <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
|
|
Current code handles completions for SATA devices in mpi_sata_completion()
and mpi_sata_event().
However, at the time when any SATA event happens, for almost all the event
types, the command is still in the target. It is therefore incorrect to
complete the task in sata_event().
There are some events for which we get sata_completions, some need recovery
procedure and others abort. All the tasks must be completed via
sata_completion() path.
Removed the task done related code from sata_events(). For tasks where we
don't get completions, let top layer call abort() to abort the command post
timeout.
Link: https://lore.kernel.org/r/[email protected]
Acked-by: Jack Wang <[email protected]>
Co-developed-by: Viswas G <[email protected]>
Signed-off-by: Viswas G <[email protected]>
Signed-off-by: Ajish Koshy <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
|
|
After commit e3beca48a45b ("irqdomain/treewide: Keep firmware node
unconditionally allocated"). For tear down scenario, fn is only freed
after fail to allocate ir_domain, though it also should be freed in case
dmar_enable_qi returns error.
Besides free fn, irq_domain and ir_msi_domain need to be removed as well
if intel_setup_irq_remapping fails to enable queued invalidation.
Improve the rewinding path by add out_free_ir_domain and out_free_fwnode
lables per Baolu's suggestion.
Fixes: e3beca48a45b ("irqdomain/treewide: Keep firmware node unconditionally allocated")
Suggested-by: Lu Baolu <[email protected]>
Signed-off-by: Guoqing Jiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lu Baolu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Joerg Roedel <[email protected]>
|
|
The code is mostly free of W=1 warning, so fix the following:
drivers/iommu/iommu.c:996: warning: expecting prototype for iommu_group_for_each_dev(). Prototype was for __iommu_group_for_each_dev() instead
drivers/iommu/iommu.c:3048: warning: Function parameter or member 'drvdata' not described in 'iommu_sva_bind_device'
drivers/iommu/ioasid.c:354: warning: Function parameter or member 'ioasid' not described in 'ioasid_get'
drivers/iommu/omap-iommu.c:1098: warning: expecting prototype for omap_iommu_suspend_prepare(). Prototype was for omap_iommu_prepare() instead
Signed-off-by: John Garry <[email protected]>
Reviewed-by: Robin Murphy <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Joerg Roedel <[email protected]>
|
|
Kasan has reported the following use after free on dev->iommu.
when a device probe fails and it is in process of freeing dev->iommu
in dev_iommu_free function, a deferred_probe_work_func runs in parallel
and tries to access dev->iommu->fwspec in of_iommu_configure path thus
causing use after free.
BUG: KASAN: use-after-free in of_iommu_configure+0xb4/0x4a4
Read of size 8 at addr ffffff87a2f1acb8 by task kworker/u16:2/153
Workqueue: events_unbound deferred_probe_work_func
Call trace:
dump_backtrace+0x0/0x33c
show_stack+0x18/0x24
dump_stack_lvl+0x16c/0x1e0
print_address_description+0x84/0x39c
__kasan_report+0x184/0x308
kasan_report+0x50/0x78
__asan_load8+0xc0/0xc4
of_iommu_configure+0xb4/0x4a4
of_dma_configure_id+0x2fc/0x4d4
platform_dma_configure+0x40/0x5c
really_probe+0x1b4/0xb74
driver_probe_device+0x11c/0x228
__device_attach_driver+0x14c/0x304
bus_for_each_drv+0x124/0x1b0
__device_attach+0x25c/0x334
device_initial_probe+0x24/0x34
bus_probe_device+0x78/0x134
deferred_probe_work_func+0x130/0x1a8
process_one_work+0x4c8/0x970
worker_thread+0x5c8/0xaec
kthread+0x1f8/0x220
ret_from_fork+0x10/0x18
Allocated by task 1:
____kasan_kmalloc+0xd4/0x114
__kasan_kmalloc+0x10/0x1c
kmem_cache_alloc_trace+0xe4/0x3d4
__iommu_probe_device+0x90/0x394
probe_iommu_group+0x70/0x9c
bus_for_each_dev+0x11c/0x19c
bus_iommu_probe+0xb8/0x7d4
bus_set_iommu+0xcc/0x13c
arm_smmu_bus_init+0x44/0x130 [arm_smmu]
arm_smmu_device_probe+0xb88/0xc54 [arm_smmu]
platform_drv_probe+0xe4/0x13c
really_probe+0x2c8/0xb74
driver_probe_device+0x11c/0x228
device_driver_attach+0xf0/0x16c
__driver_attach+0x80/0x320
bus_for_each_dev+0x11c/0x19c
driver_attach+0x38/0x48
bus_add_driver+0x1dc/0x3a4
driver_register+0x18c/0x244
__platform_driver_register+0x88/0x9c
init_module+0x64/0xff4 [arm_smmu]
do_one_initcall+0x17c/0x2f0
do_init_module+0xe8/0x378
load_module+0x3f80/0x4a40
__se_sys_finit_module+0x1a0/0x1e4
__arm64_sys_finit_module+0x44/0x58
el0_svc_common+0x100/0x264
do_el0_svc+0x38/0xa4
el0_svc+0x20/0x30
el0_sync_handler+0x68/0xac
el0_sync+0x160/0x180
Freed by task 1:
kasan_set_track+0x4c/0x84
kasan_set_free_info+0x28/0x4c
____kasan_slab_free+0x120/0x15c
__kasan_slab_free+0x18/0x28
slab_free_freelist_hook+0x204/0x2fc
kfree+0xfc/0x3a4
__iommu_probe_device+0x284/0x394
probe_iommu_group+0x70/0x9c
bus_for_each_dev+0x11c/0x19c
bus_iommu_probe+0xb8/0x7d4
bus_set_iommu+0xcc/0x13c
arm_smmu_bus_init+0x44/0x130 [arm_smmu]
arm_smmu_device_probe+0xb88/0xc54 [arm_smmu]
platform_drv_probe+0xe4/0x13c
really_probe+0x2c8/0xb74
driver_probe_device+0x11c/0x228
device_driver_attach+0xf0/0x16c
__driver_attach+0x80/0x320
bus_for_each_dev+0x11c/0x19c
driver_attach+0x38/0x48
bus_add_driver+0x1dc/0x3a4
driver_register+0x18c/0x244
__platform_driver_register+0x88/0x9c
init_module+0x64/0xff4 [arm_smmu]
do_one_initcall+0x17c/0x2f0
do_init_module+0xe8/0x378
load_module+0x3f80/0x4a40
__se_sys_finit_module+0x1a0/0x1e4
__arm64_sys_finit_module+0x44/0x58
el0_svc_common+0x100/0x264
do_el0_svc+0x38/0xa4
el0_svc+0x20/0x30
el0_sync_handler+0x68/0xac
el0_sync+0x160/0x180
Fix this by setting dev->iommu to NULL first and
then freeing dev_iommu structure in dev_iommu_free
function.
Suggested-by: Robin Murphy <[email protected]>
Signed-off-by: Vijayanand Jitta <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Joerg Roedel <[email protected]>
|
|
In some case, like after a transfer timeout, master->cur_msg pointer
is NULL which led to a kernel crash when trying to use master->cur_msg->spi.
mtk_spi_can_dma(), pointed by master->can_dma, doesn't use this parameter
avoid the problem by setting NULL as second parameter.
Fixes: a568231f46322 ("spi: mediatek: Add spi bus for Mediatek MT8173")
Signed-off-by: Benjamin Gaignard <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
|
|
After the recent changes made by commit c2e39305299f01 ("btrfs: clear
extent buffer uptodate when we fail to write it") and its followup fix,
commit 651740a5024117 ("btrfs: check WRITE_ERR when trying to read an
extent buffer"), we can now end up not cleaning up space reservations of
log tree extent buffers after a transaction abort happens, as well as not
cleaning up still dirty extent buffers.
This happens because if writeback for a log tree extent buffer failed,
then we have cleared the bit EXTENT_BUFFER_UPTODATE from the extent buffer
and we have also set the bit EXTENT_BUFFER_WRITE_ERR on it. Later on,
when trying to free the log tree with free_log_tree(), which iterates
over the tree, we can end up getting an -EIO error when trying to read
a node or a leaf, since read_extent_buffer_pages() returns -EIO if an
extent buffer does not have EXTENT_BUFFER_UPTODATE set and has the
EXTENT_BUFFER_WRITE_ERR bit set. Getting that -EIO means that we return
immediately as we can not iterate over the entire tree.
In that case we never update the reserved space for an extent buffer in
the respective block group and space_info object.
When this happens we get the following traces when unmounting the fs:
[174957.284509] BTRFS: error (device dm-0) in cleanup_transaction:1913: errno=-5 IO failure
[174957.286497] BTRFS: error (device dm-0) in free_log_tree:3420: errno=-5 IO failure
[174957.399379] ------------[ cut here ]------------
[174957.402497] WARNING: CPU: 2 PID: 3206883 at fs/btrfs/block-group.c:127 btrfs_put_block_group+0x77/0xb0 [btrfs]
[174957.407523] Modules linked in: btrfs overlay dm_zero (...)
[174957.424917] CPU: 2 PID: 3206883 Comm: umount Tainted: G W 5.16.0-rc5-btrfs-next-109 #1
[174957.426689] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[174957.428716] RIP: 0010:btrfs_put_block_group+0x77/0xb0 [btrfs]
[174957.429717] Code: 21 48 8b bd (...)
[174957.432867] RSP: 0018:ffffb70d41cffdd0 EFLAGS: 00010206
[174957.433632] RAX: 0000000000000001 RBX: ffff8b09c3848000 RCX: ffff8b0758edd1c8
[174957.434689] RDX: 0000000000000001 RSI: ffffffffc0b467e7 RDI: ffff8b0758edd000
[174957.436068] RBP: ffff8b0758edd000 R08: 0000000000000000 R09: 0000000000000000
[174957.437114] R10: 0000000000000246 R11: 0000000000000000 R12: ffff8b09c3848148
[174957.438140] R13: ffff8b09c3848198 R14: ffff8b0758edd188 R15: dead000000000100
[174957.439317] FS: 00007f328fb82800(0000) GS:ffff8b0a2d200000(0000) knlGS:0000000000000000
[174957.440402] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[174957.441164] CR2: 00007fff13563e98 CR3: 0000000404f4e005 CR4: 0000000000370ee0
[174957.442117] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[174957.443076] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[174957.443948] Call Trace:
[174957.444264] <TASK>
[174957.444538] btrfs_free_block_groups+0x255/0x3c0 [btrfs]
[174957.445238] close_ctree+0x301/0x357 [btrfs]
[174957.445803] ? call_rcu+0x16c/0x290
[174957.446250] generic_shutdown_super+0x74/0x120
[174957.446832] kill_anon_super+0x14/0x30
[174957.447305] btrfs_kill_super+0x12/0x20 [btrfs]
[174957.447890] deactivate_locked_super+0x31/0xa0
[174957.448440] cleanup_mnt+0x147/0x1c0
[174957.448888] task_work_run+0x5c/0xa0
[174957.449336] exit_to_user_mode_prepare+0x1e5/0x1f0
[174957.449934] syscall_exit_to_user_mode+0x16/0x40
[174957.450512] do_syscall_64+0x48/0xc0
[174957.450980] entry_SYSCALL_64_after_hwframe+0x44/0xae
[174957.451605] RIP: 0033:0x7f328fdc4a97
[174957.452059] Code: 03 0c 00 f7 (...)
[174957.454320] RSP: 002b:00007fff13564ec8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[174957.455262] RAX: 0000000000000000 RBX: 00007f328feea264 RCX: 00007f328fdc4a97
[174957.456131] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000560b8ae51dd0
[174957.457118] RBP: 0000560b8ae51ba0 R08: 0000000000000000 R09: 00007fff13563c40
[174957.458005] R10: 00007f328fe49fc0 R11: 0000000000000246 R12: 0000000000000000
[174957.459113] R13: 0000560b8ae51dd0 R14: 0000560b8ae51cb0 R15: 0000000000000000
[174957.460193] </TASK>
[174957.460534] irq event stamp: 0
[174957.461003] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[174957.461947] hardirqs last disabled at (0): [<ffffffffb0e94214>] copy_process+0x934/0x2040
[174957.463147] softirqs last enabled at (0): [<ffffffffb0e94214>] copy_process+0x934/0x2040
[174957.465116] softirqs last disabled at (0): [<0000000000000000>] 0x0
[174957.466323] ---[ end trace bc7ee0c490bce3af ]---
[174957.467282] ------------[ cut here ]------------
[174957.468184] WARNING: CPU: 2 PID: 3206883 at fs/btrfs/block-group.c:3976 btrfs_free_block_groups+0x330/0x3c0 [btrfs]
[174957.470066] Modules linked in: btrfs overlay dm_zero (...)
[174957.483137] CPU: 2 PID: 3206883 Comm: umount Tainted: G W 5.16.0-rc5-btrfs-next-109 #1
[174957.484691] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[174957.486853] RIP: 0010:btrfs_free_block_groups+0x330/0x3c0 [btrfs]
[174957.488050] Code: 00 00 00 ad de (...)
[174957.491479] RSP: 0018:ffffb70d41cffde0 EFLAGS: 00010206
[174957.492520] RAX: ffff8b08d79310b0 RBX: ffff8b09c3848000 RCX: 0000000000000000
[174957.493868] RDX: 0000000000000001 RSI: fffff443055ee600 RDI: ffffffffb1131846
[174957.495183] RBP: ffff8b08d79310b0 R08: 0000000000000000 R09: 0000000000000000
[174957.496580] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8b08d7931000
[174957.498027] R13: ffff8b09c38492b0 R14: dead000000000122 R15: dead000000000100
[174957.499438] FS: 00007f328fb82800(0000) GS:ffff8b0a2d200000(0000) knlGS:0000000000000000
[174957.500990] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[174957.502117] CR2: 00007fff13563e98 CR3: 0000000404f4e005 CR4: 0000000000370ee0
[174957.503513] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[174957.504864] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[174957.506167] Call Trace:
[174957.506654] <TASK>
[174957.507047] close_ctree+0x301/0x357 [btrfs]
[174957.507867] ? call_rcu+0x16c/0x290
[174957.508567] generic_shutdown_super+0x74/0x120
[174957.509447] kill_anon_super+0x14/0x30
[174957.510194] btrfs_kill_super+0x12/0x20 [btrfs]
[174957.511123] deactivate_locked_super+0x31/0xa0
[174957.511976] cleanup_mnt+0x147/0x1c0
[174957.512610] task_work_run+0x5c/0xa0
[174957.513309] exit_to_user_mode_prepare+0x1e5/0x1f0
[174957.514231] syscall_exit_to_user_mode+0x16/0x40
[174957.515069] do_syscall_64+0x48/0xc0
[174957.515718] entry_SYSCALL_64_after_hwframe+0x44/0xae
[174957.516688] RIP: 0033:0x7f328fdc4a97
[174957.517413] Code: 03 0c 00 f7 d8 (...)
[174957.521052] RSP: 002b:00007fff13564ec8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[174957.522514] RAX: 0000000000000000 RBX: 00007f328feea264 RCX: 00007f328fdc4a97
[174957.523950] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000560b8ae51dd0
[174957.525375] RBP: 0000560b8ae51ba0 R08: 0000000000000000 R09: 00007fff13563c40
[174957.526763] R10: 00007f328fe49fc0 R11: 0000000000000246 R12: 0000000000000000
[174957.528058] R13: 0000560b8ae51dd0 R14: 0000560b8ae51cb0 R15: 0000000000000000
[174957.529404] </TASK>
[174957.529843] irq event stamp: 0
[174957.530256] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[174957.531061] hardirqs last disabled at (0): [<ffffffffb0e94214>] copy_process+0x934/0x2040
[174957.532075] softirqs last enabled at (0): [<ffffffffb0e94214>] copy_process+0x934/0x2040
[174957.533083] softirqs last disabled at (0): [<0000000000000000>] 0x0
[174957.533865] ---[ end trace bc7ee0c490bce3b0 ]---
[174957.534452] BTRFS info (device dm-0): space_info 4 has 1070841856 free, is not full
[174957.535404] BTRFS info (device dm-0): space_info total=1073741824, used=2785280, pinned=0, reserved=49152, may_use=0, readonly=65536 zone_unusable=0
[174957.537029] BTRFS info (device dm-0): global_block_rsv: size 0 reserved 0
[174957.537859] BTRFS info (device dm-0): trans_block_rsv: size 0 reserved 0
[174957.538697] BTRFS info (device dm-0): chunk_block_rsv: size 0 reserved 0
[174957.539552] BTRFS info (device dm-0): delayed_block_rsv: size 0 reserved 0
[174957.540403] BTRFS info (device dm-0): delayed_refs_rsv: size 0 reserved 0
This also means that in case we have log tree extent buffers that are
still dirty, we can end up not cleaning them up in case we find an
extent buffer with EXTENT_BUFFER_WRITE_ERR set on it, as in that case
we have no way for iterating over the rest of the tree.
This issue is very often triggered with test cases generic/475 and
generic/648 from fstests.
The issue could almost be fixed by iterating over the io tree attached to
each log root which keeps tracks of the range of allocated extent buffers,
log_root->dirty_log_pages, however that does not work and has some
inconveniences:
1) After we sync the log, we clear the range of the extent buffers from
the io tree, so we can't find them after writeback. We could keep the
ranges in the io tree, with a separate bit to signal they represent
extent buffers already written, but that means we need to hold into
more memory until the transaction commits.
How much more memory is used depends a lot on whether we are able to
allocate contiguous extent buffers on disk (and how often) for a log
tree - if we are able to, then a single extent state record can
represent multiple extent buffers, otherwise we need multiple extent
state record structures to track each extent buffer.
In fact, my earlier approach did that:
https://lore.kernel.org/linux-btrfs/3aae7c6728257c7ce2279d6660ee2797e5e34bbd.1641300250.git.fdmanana@suse.com/
However that can cause a very significant negative impact on
performance, not only due to the extra memory usage but also because
we get a larger and deeper dirty_log_pages io tree.
We got a report that, on beefy machines at least, we can get such
performance drop with fsmark for example:
https://lore.kernel.org/linux-btrfs/20220117082426.GE32491@xsang-OptiPlex-9020/
2) We would be doing it only to deal with an unexpected and exceptional
case, which is basically failure to read an extent buffer from disk
due to IO failures. On a healthy system we don't expect transaction
aborts to happen after all;
3) Instead of relying on iterating the log tree or tracking the ranges
of extent buffers in the dirty_log_pages io tree, using the radix
tree that tracks extent buffers (fs_info->buffer_radix) to find all
log tree extent buffers is not reliable either, because after writeback
of an extent buffer it can be evicted from memory by the release page
callback of the btree inode (btree_releasepage()).
Since there's no way to be able to properly cleanup a log tree without
being able to read its extent buffers from disk and without using more
memory to track the logical ranges of the allocated extent buffers do
the following:
1) When we fail to cleanup a log tree, setup a flag that indicates that
failure;
2) Trigger writeback of all log tree extent buffers that are still dirty,
and wait for the writeback to complete. This is just to cleanup their
state, page states, page leaks, etc;
3) When unmounting the fs, ignore if the number of bytes reserved in a
block group and in a space_info is not 0 if, and only if, we failed to
cleanup a log tree. Also ignore only for metadata block groups and the
metadata space_info object.
This is far from a perfect solution, but it serves to silence test
failures such as those from generic/475 and generic/648. However having
a non-zero value for the reserved bytes counters on unmount after a
transaction abort, is not such a terrible thing and it's completely
harmless, it does not affect the filesystem integrity in any way.
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Clang static analysis reports this problem
ioctl.c:3333:8: warning: 3rd function call argument is an
uninitialized value
ret = exclop_start_or_cancel_reloc(fs_info,
cancel is only set in one branch of an if-check and is always used. So
initialize to false.
Fixes: 1a15eb724aae ("btrfs: use btrfs_get_dev_args_from_path in dev removal ioctls")
Reviewed-by: Filipe Manana <[email protected]>
Reviewed-by: Anand Jain <[email protected]>
Signed-off-by: Tom Rix <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
At ioctl.c:create_snapshot(), we allocate a pending snapshot structure and
then attach it to the transaction's list of pending snapshots. After that
we call btrfs_commit_transaction(), and if that returns an error we jump
to 'fail' label, where we kfree() the pending snapshot structure. This can
result in a later use-after-free of the pending snapshot:
1) We allocated the pending snapshot and added it to the transaction's
list of pending snapshots;
2) We call btrfs_commit_transaction(), and it fails either at the first
call to btrfs_run_delayed_refs() or btrfs_start_dirty_block_groups().
In both cases, we don't abort the transaction and we release our
transaction handle. We jump to the 'fail' label and free the pending
snapshot structure. We return with the pending snapshot still in the
transaction's list;
3) Another task commits the transaction. This time there's no error at
all, and then during the transaction commit it accesses a pointer
to the pending snapshot structure that the snapshot creation task
has already freed, resulting in a user-after-free.
This issue could actually be detected by smatch, which produced the
following warning:
fs/btrfs/ioctl.c:843 create_snapshot() warn: '&pending_snapshot->list' not removed from list
So fix this by not having the snapshot creation ioctl directly add the
pending snapshot to the transaction's list. Instead add the pending
snapshot to the transaction handle, and then at btrfs_commit_transaction()
we add the snapshot to the list only when we can guarantee that any error
returned after that point will result in a transaction abort, in which
case the ioctl code can safely free the pending snapshot and no one can
access it anymore.
CC: [email protected] # 5.10+
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Check item size before accessing the device item to avoid out of bound
access, similar to inode_item check.
Signed-off-by: Su Yue <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
while mounting the crafted image, out-of-bounds access happens:
[350.429619] UBSAN: array-index-out-of-bounds in fs/btrfs/struct-funcs.c:161:1
[350.429636] index 1048096 is out of range for type 'page *[16]'
[350.429650] CPU: 0 PID: 9 Comm: kworker/u8:1 Not tainted 5.16.0-rc4 #1
[350.429652] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[350.429653] Workqueue: btrfs-endio-meta btrfs_work_helper [btrfs]
[350.429772] Call Trace:
[350.429774] <TASK>
[350.429776] dump_stack_lvl+0x47/0x5c
[350.429780] ubsan_epilogue+0x5/0x50
[350.429786] __ubsan_handle_out_of_bounds+0x66/0x70
[350.429791] btrfs_get_16+0xfd/0x120 [btrfs]
[350.429832] check_leaf+0x754/0x1a40 [btrfs]
[350.429874] ? filemap_read+0x34a/0x390
[350.429878] ? load_balance+0x175/0xfc0
[350.429881] validate_extent_buffer+0x244/0x310 [btrfs]
[350.429911] btrfs_validate_metadata_buffer+0xf8/0x100 [btrfs]
[350.429935] end_bio_extent_readpage+0x3af/0x850 [btrfs]
[350.429969] ? newidle_balance+0x259/0x480
[350.429972] end_workqueue_fn+0x29/0x40 [btrfs]
[350.429995] btrfs_work_helper+0x71/0x330 [btrfs]
[350.430030] ? __schedule+0x2fb/0xa40
[350.430033] process_one_work+0x1f6/0x400
[350.430035] ? process_one_work+0x400/0x400
[350.430036] worker_thread+0x2d/0x3d0
[350.430037] ? process_one_work+0x400/0x400
[350.430038] kthread+0x165/0x190
[350.430041] ? set_kthread_struct+0x40/0x40
[350.430043] ret_from_fork+0x1f/0x30
[350.430047] </TASK>
[350.430077] BTRFS warning (device loop0): bad eb member start: ptr 0xffe20f4e start 20975616 member offset 4293005178 size 2
check_leaf() is checking the leaf:
corrupt leaf: root=4 block=29396992 slot=1, bad key order, prev (16140901064495857664 1 0) current (1 204 12582912)
leaf 29396992 items 6 free space 3565 generation 6 owner DEV_TREE
leaf 29396992 flags 0x1(WRITTEN) backref revision 1
fs uuid a62e00e8-e94e-4200-8217-12444de93c2e
chunk uuid cecbd0f7-9ca0-441e-ae9f-f782f9732bd8
item 0 key (16140901064495857664 INODE_ITEM 0) itemoff 3955 itemsize 40
generation 0 transid 0 size 0 nbytes 17592186044416
block group 0 mode 52667 links 33 uid 0 gid 2104132511 rdev 94223634821136
sequence 100305 flags 0x2409000(none)
atime 0.0 (1970-01-01 08:00:00)
ctime 2973280098083405823.4294967295 (-269783007-01-01 21:37:03)
mtime 18446744071572723616.4026825121 (1902-04-16 12:40:00)
otime 9249929404488876031.4294967295 (622322949-04-16 04:25:58)
item 1 key (1 DEV_EXTENT 12582912) itemoff 3907 itemsize 48
dev extent chunk_tree 3
chunk_objectid 256 chunk_offset 12582912 length 8388608
chunk_tree_uuid cecbd0f7-9ca0-441e-ae9f-f782f9732bd8
The corrupted leaf of device tree has an inode item. The leaf passed
checksum and others checks in validate_extent_buffer until check_leaf_item().
Because of the key type BTRFS_INODE_ITEM, check_inode_item() is called even we
are in the device tree. Since the
item offset + sizeof(struct btrfs_inode_item) > eb->len, out-of-bounds access
is triggered.
The item end vs leaf boundary check has been done before
check_leaf_item(), so fix it by checking item size in check_inode_item()
before access of the inode item in extent buffer.
Other check functions except check_dev_item() in check_leaf_item()
have their item size checks.
The commit for check_dev_item() is followed.
No regression observed during running fstests.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215299
CC: [email protected] # 5.10+
CC: Wenqing Liu <[email protected]>
Signed-off-by: Su Yue <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Quota disable ioctl starts a transaction before waiting for the qgroup
rescan worker completes. However, this wait can be infinite and results
in deadlock because of circular dependency among the quota disable
ioctl, the qgroup rescan worker and the other task with transaction such
as block group relocation task.
The deadlock happens with the steps following:
1) Task A calls ioctl to disable quota. It starts a transaction and
waits for qgroup rescan worker completes.
2) Task B such as block group relocation task starts a transaction and
joins to the transaction that task A started. Then task B commits to
the transaction. In this commit, task B waits for a commit by task A.
3) Task C as the qgroup rescan worker starts its job and starts a
transaction. In this transaction start, task C waits for completion
of the transaction that task A started and task B committed.
This deadlock was found with fstests test case btrfs/115 and a zoned
null_blk device. The test case enables and disables quota, and the
block group reclaim was triggered during the quota disable by chance.
The deadlock was also observed by running quota enable and disable in
parallel with 'btrfs balance' command on regular null_blk devices.
An example report of the deadlock:
[372.469894] INFO: task kworker/u16:6:103 blocked for more than 122 seconds.
[372.479944] Not tainted 5.16.0-rc8 #7
[372.485067] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[372.493898] task:kworker/u16:6 state:D stack: 0 pid: 103 ppid: 2 flags:0x00004000
[372.503285] Workqueue: btrfs-qgroup-rescan btrfs_work_helper [btrfs]
[372.510782] Call Trace:
[372.514092] <TASK>
[372.521684] __schedule+0xb56/0x4850
[372.530104] ? io_schedule_timeout+0x190/0x190
[372.538842] ? lockdep_hardirqs_on+0x7e/0x100
[372.547092] ? _raw_spin_unlock_irqrestore+0x3e/0x60
[372.555591] schedule+0xe0/0x270
[372.561894] btrfs_commit_transaction+0x18bb/0x2610 [btrfs]
[372.570506] ? btrfs_apply_pending_changes+0x50/0x50 [btrfs]
[372.578875] ? free_unref_page+0x3f2/0x650
[372.585484] ? finish_wait+0x270/0x270
[372.591594] ? release_extent_buffer+0x224/0x420 [btrfs]
[372.599264] btrfs_qgroup_rescan_worker+0xc13/0x10c0 [btrfs]
[372.607157] ? lock_release+0x3a9/0x6d0
[372.613054] ? btrfs_qgroup_account_extent+0xda0/0xda0 [btrfs]
[372.620960] ? do_raw_spin_lock+0x11e/0x250
[372.627137] ? rwlock_bug.part.0+0x90/0x90
[372.633215] ? lock_is_held_type+0xe4/0x140
[372.639404] btrfs_work_helper+0x1ae/0xa90 [btrfs]
[372.646268] process_one_work+0x7e9/0x1320
[372.652321] ? lock_release+0x6d0/0x6d0
[372.658081] ? pwq_dec_nr_in_flight+0x230/0x230
[372.664513] ? rwlock_bug.part.0+0x90/0x90
[372.670529] worker_thread+0x59e/0xf90
[372.676172] ? process_one_work+0x1320/0x1320
[372.682440] kthread+0x3b9/0x490
[372.687550] ? _raw_spin_unlock_irq+0x24/0x50
[372.693811] ? set_kthread_struct+0x100/0x100
[372.700052] ret_from_fork+0x22/0x30
[372.705517] </TASK>
[372.709747] INFO: task btrfs-transacti:2347 blocked for more than 123 seconds.
[372.729827] Not tainted 5.16.0-rc8 #7
[372.745907] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[372.767106] task:btrfs-transacti state:D stack: 0 pid: 2347 ppid: 2 flags:0x00004000
[372.787776] Call Trace:
[372.801652] <TASK>
[372.812961] __schedule+0xb56/0x4850
[372.830011] ? io_schedule_timeout+0x190/0x190
[372.852547] ? lockdep_hardirqs_on+0x7e/0x100
[372.871761] ? _raw_spin_unlock_irqrestore+0x3e/0x60
[372.886792] schedule+0xe0/0x270
[372.901685] wait_current_trans+0x22c/0x310 [btrfs]
[372.919743] ? btrfs_put_transaction+0x3d0/0x3d0 [btrfs]
[372.938923] ? finish_wait+0x270/0x270
[372.959085] ? join_transaction+0xc75/0xe30 [btrfs]
[372.977706] start_transaction+0x938/0x10a0 [btrfs]
[372.997168] transaction_kthread+0x19d/0x3c0 [btrfs]
[373.013021] ? btrfs_cleanup_transaction.isra.0+0xfc0/0xfc0 [btrfs]
[373.031678] kthread+0x3b9/0x490
[373.047420] ? _raw_spin_unlock_irq+0x24/0x50
[373.064645] ? set_kthread_struct+0x100/0x100
[373.078571] ret_from_fork+0x22/0x30
[373.091197] </TASK>
[373.105611] INFO: task btrfs:3145 blocked for more than 123 seconds.
[373.114147] Not tainted 5.16.0-rc8 #7
[373.120401] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[373.130393] task:btrfs state:D stack: 0 pid: 3145 ppid: 3141 flags:0x00004000
[373.140998] Call Trace:
[373.145501] <TASK>
[373.149654] __schedule+0xb56/0x4850
[373.155306] ? io_schedule_timeout+0x190/0x190
[373.161965] ? lockdep_hardirqs_on+0x7e/0x100
[373.168469] ? _raw_spin_unlock_irqrestore+0x3e/0x60
[373.175468] schedule+0xe0/0x270
[373.180814] wait_for_commit+0x104/0x150 [btrfs]
[373.187643] ? test_and_set_bit+0x20/0x20 [btrfs]
[373.194772] ? kmem_cache_free+0x124/0x550
[373.201191] ? btrfs_put_transaction+0x69/0x3d0 [btrfs]
[373.208738] ? finish_wait+0x270/0x270
[373.214704] ? __btrfs_end_transaction+0x347/0x7b0 [btrfs]
[373.222342] btrfs_commit_transaction+0x44d/0x2610 [btrfs]
[373.230233] ? join_transaction+0x255/0xe30 [btrfs]
[373.237334] ? btrfs_record_root_in_trans+0x4d/0x170 [btrfs]
[373.245251] ? btrfs_apply_pending_changes+0x50/0x50 [btrfs]
[373.253296] relocate_block_group+0x105/0xc20 [btrfs]
[373.260533] ? mutex_lock_io_nested+0x1270/0x1270
[373.267516] ? btrfs_wait_nocow_writers+0x85/0x180 [btrfs]
[373.275155] ? merge_reloc_roots+0x710/0x710 [btrfs]
[373.283602] ? btrfs_wait_ordered_extents+0xd30/0xd30 [btrfs]
[373.291934] ? kmem_cache_free+0x124/0x550
[373.298180] btrfs_relocate_block_group+0x35c/0x930 [btrfs]
[373.306047] btrfs_relocate_chunk+0x85/0x210 [btrfs]
[373.313229] btrfs_balance+0x12f4/0x2d20 [btrfs]
[373.320227] ? lock_release+0x3a9/0x6d0
[373.326206] ? btrfs_relocate_chunk+0x210/0x210 [btrfs]
[373.333591] ? lock_is_held_type+0xe4/0x140
[373.340031] ? rcu_read_lock_sched_held+0x3f/0x70
[373.346910] btrfs_ioctl_balance+0x548/0x700 [btrfs]
[373.354207] btrfs_ioctl+0x7f2/0x71b0 [btrfs]
[373.360774] ? lockdep_hardirqs_on_prepare+0x410/0x410
[373.367957] ? lockdep_hardirqs_on_prepare+0x410/0x410
[373.375327] ? btrfs_ioctl_get_supported_features+0x20/0x20 [btrfs]
[373.383841] ? find_held_lock+0x2c/0x110
[373.389993] ? lock_release+0x3a9/0x6d0
[373.395828] ? mntput_no_expire+0xf7/0xad0
[373.402083] ? lock_is_held_type+0xe4/0x140
[373.408249] ? vfs_fileattr_set+0x9f0/0x9f0
[373.414486] ? selinux_file_ioctl+0x349/0x4e0
[373.420938] ? trace_raw_output_lock+0xb4/0xe0
[373.427442] ? selinux_inode_getsecctx+0x80/0x80
[373.434224] ? lockdep_hardirqs_on+0x7e/0x100
[373.440660] ? force_qs_rnp+0x2a0/0x6b0
[373.446534] ? lock_is_held_type+0x9b/0x140
[373.452763] ? __blkcg_punt_bio_submit+0x1b0/0x1b0
[373.459732] ? security_file_ioctl+0x50/0x90
[373.466089] __x64_sys_ioctl+0x127/0x190
[373.472022] do_syscall_64+0x3b/0x90
[373.477513] entry_SYSCALL_64_after_hwframe+0x44/0xae
[373.484823] RIP: 0033:0x7f8f4af7e2bb
[373.490493] RSP: 002b:00007ffcbf936178 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[373.500197] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f8f4af7e2bb
[373.509451] RDX: 00007ffcbf936220 RSI: 00000000c4009420 RDI: 0000000000000003
[373.518659] RBP: 00007ffcbf93774a R08: 0000000000000013 R09: 00007f8f4b02d4e0
[373.527872] R10: 00007f8f4ae87740 R11: 0000000000000246 R12: 0000000000000001
[373.537222] R13: 00007ffcbf936220 R14: 0000000000000000 R15: 0000000000000002
[373.546506] </TASK>
[373.550878] INFO: task btrfs:3146 blocked for more than 123 seconds.
[373.559383] Not tainted 5.16.0-rc8 #7
[373.565748] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[373.575748] task:btrfs state:D stack: 0 pid: 3146 ppid: 2168 flags:0x00000000
[373.586314] Call Trace:
[373.590846] <TASK>
[373.595121] __schedule+0xb56/0x4850
[373.600901] ? __lock_acquire+0x23db/0x5030
[373.607176] ? io_schedule_timeout+0x190/0x190
[373.613954] schedule+0xe0/0x270
[373.619157] schedule_timeout+0x168/0x220
[373.625170] ? usleep_range_state+0x150/0x150
[373.631653] ? mark_held_locks+0x9e/0xe0
[373.637767] ? do_raw_spin_lock+0x11e/0x250
[373.643993] ? lockdep_hardirqs_on_prepare+0x17b/0x410
[373.651267] ? _raw_spin_unlock_irq+0x24/0x50
[373.657677] ? lockdep_hardirqs_on+0x7e/0x100
[373.664103] wait_for_completion+0x163/0x250
[373.670437] ? bit_wait_timeout+0x160/0x160
[373.676585] btrfs_quota_disable+0x176/0x9a0 [btrfs]
[373.683979] ? btrfs_quota_enable+0x12f0/0x12f0 [btrfs]
[373.691340] ? down_write+0xd0/0x130
[373.696880] ? down_write_killable+0x150/0x150
[373.703352] btrfs_ioctl+0x3945/0x71b0 [btrfs]
[373.710061] ? find_held_lock+0x2c/0x110
[373.716192] ? lock_release+0x3a9/0x6d0
[373.722047] ? __handle_mm_fault+0x23cd/0x3050
[373.728486] ? btrfs_ioctl_get_supported_features+0x20/0x20 [btrfs]
[373.737032] ? set_pte+0x6a/0x90
[373.742271] ? do_raw_spin_unlock+0x55/0x1f0
[373.748506] ? lock_is_held_type+0xe4/0x140
[373.754792] ? vfs_fileattr_set+0x9f0/0x9f0
[373.761083] ? selinux_file_ioctl+0x349/0x4e0
[373.767521] ? selinux_inode_getsecctx+0x80/0x80
[373.774247] ? __up_read+0x182/0x6e0
[373.780026] ? count_memcg_events.constprop.0+0x46/0x60
[373.787281] ? up_write+0x460/0x460
[373.792932] ? security_file_ioctl+0x50/0x90
[373.799232] __x64_sys_ioctl+0x127/0x190
[373.805237] do_syscall_64+0x3b/0x90
[373.810947] entry_SYSCALL_64_after_hwframe+0x44/0xae
[373.818102] RIP: 0033:0x7f1383ea02bb
[373.823847] RSP: 002b:00007fffeb4d71f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[373.833641] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1383ea02bb
[373.842961] RDX: 00007fffeb4d7210 RSI: 00000000c0109428 RDI: 0000000000000003
[373.852179] RBP: 0000000000000003 R08: 0000000000000003 R09: 0000000000000078
[373.861408] R10: 00007f1383daec78 R11: 0000000000000202 R12: 00007fffeb4d874a
[373.870647] R13: 0000000000493099 R14: 0000000000000001 R15: 0000000000000000
[373.879838] </TASK>
[373.884018]
Showing all locks held in the system:
[373.894250] 3 locks held by kworker/4:1/58:
[373.900356] 1 lock held by khungtaskd/63:
[373.906333] #0: ffffffff8945ff60 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260
[373.917307] 3 locks held by kworker/u16:6/103:
[373.923938] #0: ffff888127b4f138 ((wq_completion)btrfs-qgroup-rescan){+.+.}-{0:0}, at: process_one_work+0x712/0x1320
[373.936555] #1: ffff88810b817dd8 ((work_completion)(&work->normal_work)){+.+.}-{0:0}, at: process_one_work+0x73f/0x1320
[373.951109] #2: ffff888102dd4650 (sb_internal#2){.+.+}-{0:0}, at: btrfs_qgroup_rescan_worker+0x1f6/0x10c0 [btrfs]
[373.964027] 2 locks held by less/1803:
[373.969982] #0: ffff88813ed56098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x24/0x80
[373.981295] #1: ffffc90000b3b2e8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x9e2/0x1060
[373.992969] 1 lock held by btrfs-transacti/2347:
[373.999893] #0: ffff88813d4887a8 (&fs_info->transaction_kthread_mutex){+.+.}-{3:3}, at: transaction_kthread+0xe3/0x3c0 [btrfs]
[374.015872] 3 locks held by btrfs/3145:
[374.022298] #0: ffff888102dd4460 (sb_writers#18){.+.+}-{0:0}, at: btrfs_ioctl_balance+0xc3/0x700 [btrfs]
[374.034456] #1: ffff88813d48a0a0 (&fs_info->reclaim_bgs_lock){+.+.}-{3:3}, at: btrfs_balance+0xfe5/0x2d20 [btrfs]
[374.047646] #2: ffff88813d488838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: btrfs_relocate_block_group+0x354/0x930 [btrfs]
[374.063295] 4 locks held by btrfs/3146:
[374.069647] #0: ffff888102dd4460 (sb_writers#18){.+.+}-{0:0}, at: btrfs_ioctl+0x38b1/0x71b0 [btrfs]
[374.081601] #1: ffff88813d488bb8 (&fs_info->subvol_sem){+.+.}-{3:3}, at: btrfs_ioctl+0x38fd/0x71b0 [btrfs]
[374.094283] #2: ffff888102dd4650 (sb_internal#2){.+.+}-{0:0}, at: btrfs_quota_disable+0xc8/0x9a0 [btrfs]
[374.106885] #3: ffff88813d489800 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_disable+0xd5/0x9a0 [btrfs]
[374.126780] =============================================
To avoid the deadlock, wait for the qgroup rescan worker to complete
before starting the transaction for the quota disable ioctl. Clear
BTRFS_FS_QUOTA_ENABLE flag before the wait and the transaction to
request the worker to complete. On transaction start failure, set the
BTRFS_FS_QUOTA_ENABLE flag again. These BTRFS_FS_QUOTA_ENABLE flag
changes can be done safely since the function btrfs_quota_disable is not
called concurrently because of fs_info->subvol_sem.
Also check the BTRFS_FS_QUOTA_ENABLE flag in qgroup_rescan_init to avoid
another qgroup rescan worker to start after the previous qgroup worker
completed.
CC: [email protected] # 5.4+
Suggested-by: Nikolay Borisov <[email protected]>
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Shin'ichiro Kawasaki <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
[BUG]
The following super simple script would crash btrfs at unmount time, if
CONFIG_BTRFS_ASSERT() is set.
mkfs.btrfs -f $dev
mount $dev $mnt
xfs_io -f -c "pwrite 0 4k" $mnt/file
umount $mnt
mount -r ro $dev $mnt
btrfs scrub start -Br $mnt
umount $mnt
This will trigger the following ASSERT() introduced by commit
0a31daa4b602 ("btrfs: add assertion for empty list of transactions at
late stage of umount").
That patch is definitely not the cause, it just makes enough noise for
developers.
[CAUSE]
We will start transaction for the following call chain during scrub:
scrub_enumerate_chunks()
|- btrfs_inc_block_group_ro()
|- btrfs_join_transaction()
However for RO mount, there is no running transaction at all, thus
btrfs_join_transaction() will start a new transaction.
Furthermore, since it's read-only mount, btrfs_sync_fs() will not call
btrfs_commit_super() to commit the new but empty transaction.
And leads to the ASSERT().
The bug has been there for a long time. Only the new ASSERT() makes it
noisy enough to be noticed.
[FIX]
For read-only scrub on read-only mount, there is no need to start a
transaction nor to allocate new chunks in btrfs_inc_block_group_ro().
Just do extra read-only mount check in btrfs_inc_block_group_ro(), and
if it's read-only, skip all chunk allocation and go inc_block_group_ro()
directly.
CC: [email protected] # 5.4+
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
This way we can more easily find the next free IOCTL number when
adding new IOCTLs.
Fixes: be50b2065dfa ("kvm: x86: Add support for getting/setting expanded xstate buffer")
Signed-off-by: Janosch Frank <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Since the commit in the Fixes tag below, 'wm->input_dev' is a managed
resource that doesn't need to be explicitly unregistered or freed (see
devm_input_allocate_device() documentation)
So, remove some unless line of code to slightly simplify it.
Fixes: c72f61e74073 ("Input: wm97xx: split out touchscreen registering")
Signed-off-by: Christophe JAILLET <[email protected]>
Acked-by: Charles Keepax <[email protected]>
Link: https://lore.kernel.org/r/87dce7e80ea9b191843fa22415ca3aef5f3cc2e6.1643529968.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Mark Brown <[email protected]>
|
|
Initially the driver accessed the registers using u32 __iomem but then
in the blamed commit it changed it to use regmap. The problem is that now
the offset of the registers is not calculated anymore at word offset but
at byte offset. Therefore make sure to multiply the offset with word size.
Acked-by: Steen Hegelund <[email protected]>
Reviewed-by: Colin Foster <[email protected]>
Fixes: 2afbbab45c261a ("pinctrl: microchip-sgpio: update to support regmap")
Signed-off-by: Horatiu Vultur <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Linus Walleij <[email protected]>
|
|
When we replace TCP with SMC and a fallback occurs, there may be
some socket waitqueue entries remaining in smc socket->wq, such
as eppoll_entries inserted by userspace applications.
After the fallback, data flows over TCP/IP and only clcsocket->wq
will be woken up. Applications can't be notified by the entries
which were inserted in smc socket->wq before fallback. So we need
a mechanism to wake up smc socket->wq at the same time if some
entries remaining in it.
The current workaround is to transfer the entries from smc socket->wq
to clcsock->wq during the fallback. But this may cause a crash
like this:
general protection fault, probably for non-canonical address 0xdead000000000100: 0000 [#1] PREEMPT SMP PTI
CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Tainted: G E 5.16.0+ #107
RIP: 0010:__wake_up_common+0x65/0x170
Call Trace:
<IRQ>
__wake_up_common_lock+0x7a/0xc0
sock_def_readable+0x3c/0x70
tcp_data_queue+0x4a7/0xc40
tcp_rcv_established+0x32f/0x660
? sk_filter_trim_cap+0xcb/0x2e0
tcp_v4_do_rcv+0x10b/0x260
tcp_v4_rcv+0xd2a/0xde0
ip_protocol_deliver_rcu+0x3b/0x1d0
ip_local_deliver_finish+0x54/0x60
ip_local_deliver+0x6a/0x110
? tcp_v4_early_demux+0xa2/0x140
? tcp_v4_early_demux+0x10d/0x140
ip_sublist_rcv_finish+0x49/0x60
ip_sublist_rcv+0x19d/0x230
ip_list_rcv+0x13e/0x170
__netif_receive_skb_list_core+0x1c2/0x240
netif_receive_skb_list_internal+0x1e6/0x320
napi_complete_done+0x11d/0x190
mlx5e_napi_poll+0x163/0x6b0 [mlx5_core]
__napi_poll+0x3c/0x1b0
net_rx_action+0x27c/0x300
__do_softirq+0x114/0x2d2
irq_exit_rcu+0xb4/0xe0
common_interrupt+0xba/0xe0
</IRQ>
<TASK>
The crash is caused by privately transferring waitqueue entries from
smc socket->wq to clcsock->wq. The owners of these entries, such as
epoll, have no idea that the entries have been transferred to a
different socket wait queue and still use original waitqueue spinlock
(smc socket->wq.wait.lock) to make the entries operation exclusive,
but it doesn't work. The operations to the entries, such as removing
from the waitqueue (now is clcsock->wq after fallback), may cause a
crash when clcsock waitqueue is being iterated over at the moment.
This patch tries to fix this by no longer transferring wait queue
entries privately, but introducing own implementations of clcsock's
callback functions in fallback situation. The callback functions will
forward the wakeup to smc socket->wq if clcsock->wq is actually woken
up and smc socket->wq has remaining entries.
Fixes: 2153bd1e3d3d ("net/smc: Transfer remaining wait queue entries during fallback")
Suggested-by: Karsten Graul <[email protected]>
Signed-off-by: Wen Gu <[email protected]>
Acked-by: Karsten Graul <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The TCSS_DDI_STATUS register is indexed by tc_port not by the FIA port
index, fix this up. This only caused an issue on TC#3/4 ports in legacy
mode, as in all other cases the two indices either match (on TC#1/2) or
the TCSS_DDI_STATUS_READY flag is set regardless of something being
connected or not (on TC#1/2/3/4 in dp-alt and tbt-alt modes).
Reported-and-tested-by: Chia-Lin Kao (AceLan) <[email protected]>
Fixes: 55ce306c2aa1 ("drm/i915/adl_p: Implement TC sequences")
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/4698
Cc: José Roberto de Souza <[email protected]>
Cc: <[email protected]> # v5.14+
Signed-off-by: Imre Deak <[email protected]>
Reviewed-by: José Roberto de Souza <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 516b33460c5bee78b2055637b0547bdb0e6af754)
Signed-off-by: Tvrtko Ursulin <[email protected]>
|
|
All timestamps returned by GuC for GuC PMU busyness are captured from
GUC PM TIMESTAMP. Since this timestamp does not tick when GuC goes idle,
kmd uses RING_TIMESTAMP to measure busyness of an engine with an active
context. In further stress testing, the MMIO read of the RING_TIMESTAMP
is seen to cause a rare hang. Resolve the issue by using gt specific
timestamp from PM which is in sync with the GuC PM timestamp.
Fixes: 77cdd054dd2c ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu")
Signed-off-by: Umesh Nerlige Ramappa <[email protected]>
Reviewed-by: Alan Previn <[email protected]>
Signed-off-by: John Harrison <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 721fd84ea1fe957453587efad5fdc44dfba58e04)
Signed-off-by: Tvrtko Ursulin <[email protected]>
|
|
Smatch detected a divide by zero bug in check_overlay_scaling().
drivers/gpu/drm/i915/display/intel_overlay.c:976 check_overlay_scaling()
error: potential divide by zero bug '/ rec->dst_height'.
drivers/gpu/drm/i915/display/intel_overlay.c:980 check_overlay_scaling()
error: potential divide by zero bug '/ rec->dst_width'.
Prevent this by ensuring that the dst height and width are non-zero.
Fixes: 02e792fbaadb ("drm/i915: implement drmmode overlay support v4")
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/20220124122409.GA31673@kili
(cherry picked from commit cf5b64f7f10b28bebb9b7c9d25e7aee5cbe43918)
Signed-off-by: Tvrtko Ursulin <[email protected]>
|
|
Don't use the interruptable version of the timeline mutex lock in the
error path of eb_pin_timeline as the cleanup must always happen.
v2:
(John Harrison)
- Don't check for interrupt during mutex lock
v3:
(Tvrtko)
- A comment explaining why lock helper isn't used
Fixes: 544460c33821 ("drm/i915: Multi-BB execbuf")
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: John Harrison <[email protected]>
Signed-off-by: John Harrison <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit cb935c4618bd2ff9058feee4af7088446da6a763)
Signed-off-by: Tvrtko Ursulin <[email protected]>
|
|
Allocate intel_engine_coredump_alloc with ALLOW_FAIL rather than
GFP_KERNEL to fully decouple the error capture from fence signalling.
v2:
(John Harrison)
- Fix typo in commit message (s/do/to)
Fixes: 8b91cdd4f8649 ("drm/i915: Use __GFP_KSWAPD_RECLAIM in the capture code")
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: John Harrison <[email protected]>
Signed-off-by: John Harrison <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 4f72fc3c7f3d9f29a438bb0e17c7773f2fc8242a)
Signed-off-by: Tvrtko Ursulin <[email protected]>
|
|
The ASUS GU603 (Zephyrus M16 - SSID 1043:16b2) requires a quirk similar to
other ASUS devices for correctly routing the 4 integrated speakers. This
fixes it by adding a corresponding quirk entry, which connects the bass
speakers to the proper DAC.
Signed-off-by: Albert Geantă <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
|
|
reboot from Windows
This commit switches the Gigabyte X570 Aorus Xtreme from using the
ALC1220_FIXUP_CLEVO_P950 to the ALC1220_FIXUP_GB_X570 quirk. This fixes
the no-audio after reboot from windows problem.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=205275
Signed-off-by: Christian Lachner <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
|
|
chipset)
Newer versions of the X570 Master come with a newer revision of the
mainboard chipset - the X570S. These boards have the same ALC1220 codec
but seem to initialize the codec with a different parameter in Coef 0x7
which causes the output audio to be very low. We therefore write a
known-good value to Coef 0x7 to fix that. As the value is the exact same
as on the other X570(non-S) boards the same quirk-function can be shared
between both generations.
This commit adds the Gigabyte X570S Aorus Master to the list of boards
using the ALC1220_FIXUP_GB_X570 quirk. This fixes both, the silent output
and the no-audio after reboot from windows problems.
This work has been tested by the folks over at the level1techs forum here:
https://forum.level1techs.com/t/has-anybody-gotten-audio-working-in-linux-on-aorus-x570-master/154072
Signed-off-by: Christian Lachner <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
|
|
quirks
The initial commit of the new Gigabyte X570 ALC1220 quirks lacked the
fixup-model entry in alc882_fixup_models[]. It seemed not to cause any ill
effects but for completeness sake this commit makes up for that.
Signed-off-by: Christian Lachner <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
|
|
The COEF access is done with two steps: setting the index then read or
write the data. When multiple COEF accesses are performed
concurrently, the index and data might be paired unexpectedly.
In most cases, this isn't a big problem as the COEF setup is done at
the initialization, but some dynamic changes like the mute LED may hit
such a race.
For avoiding the racy COEF accesses, this patch introduces a new
mutex coef_mutex to alc_spec, and wrap the COEF accessing functions
with it.
Reported-by: Alexander Sergeyev <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
|
|
Now that the VFS will do something with the return values from
->sync_fs, make ours pass on error codes.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Acked-by: Christian Brauner <[email protected]>
|
|
Strangely, dquot_quota_sync ignores the return code from the ->sync_fs
call, which means that quotacalls like Q_SYNC never see the error. This
doesn't seem right, so fix that.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Acked-by: Christian Brauner <[email protected]>
|
|
Strangely, sync_filesystem ignores the return code from the ->sync_fs
call, which means that syscalls like syncfs(2) never see the error.
This doesn't seem right, so fix that.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Acked-by: Christian Brauner <[email protected]>
|
|
If we fail to synchronize the filesystem while preparing to freeze the
fs, abort the freeze.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Acked-by: Christian Brauner <[email protected]>
|
|
|
|
This reverts commit 478ba09edc1f2f2ee27180a06150cb2d1a686f9c.
That commit was meant as a fix for setattrs with by fd (e.g. ftruncate)
to use an open fid instead of the first fid it found on lookup.
The proper fix for that is to use the fid associated with the open file
struct, available in iattr->ia_file for such operations, and was
actually done just before in 66246641609b ("9p: retrieve fid from file
when file instance exist.")
As such, this commit is no longer required.
Furthermore, changing lookup to return open fids first had unwanted side
effects, as it turns out the protocol forbids the use of open fids for
further walks (e.g. clone_fid) and we broke mounts for some servers
enforcing this rule.
Note this only reverts to the old working behaviour, but it's still
possible for lookup to return open fids if dentry->d_fsdata is not set,
so more work is needed to make sure we respect this rule in the future,
for example by adding a flag to the lookup functions to only match
certain fid open modes depending on caller requirements.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 478ba09edc1f ("fs/9p: search open fids first")
Cc: [email protected] # v5.11+
Reported-by: ron minnich <[email protected]>
Reported-by: [email protected]
Signed-off-by: Dominique Martinet <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:
- Drop an unused private data field in the AIC driver
- Various fixes to the realtek-rtl driver
- Make the GICv3 ITS driver compile again in !SMP configurations
- Force reset of the GICv3 ITSs at probe time to avoid issues during kexec
- Yet another kfree/bitmap_free conversion
- Various DT updates (Renesas, SiFive)
* tag 'irq_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
dt-bindings: interrupt-controller: sifive,plic: Group interrupt tuples
dt-bindings: interrupt-controller: sifive,plic: Fix number of interrupts
dt-bindings: irqchip: renesas-irqc: Add R-Car V3U support
irqchip/gic-v3-its: Reset each ITS's BASERn register before probe
irqchip/gic-v3-its: Fix build for !SMP
irqchip/loongson-pch-ms: Use bitmap_free() to free bitmap
irqchip/realtek-rtl: Service all pending interrupts
irqchip/realtek-rtl: Fix off-by-one in routing
irqchip/realtek-rtl: Map control data to virq
irqchip/apple-aic: Drop unused ipi_hwirq field
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Prevent accesses to the per-CPU cgroup context list from another CPU
except the one it belongs to, to avoid list corruption
- Make sure parent events are always woken up to avoid indefinite hangs
in the traced workload
* tag 'perf_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix cgroup event list management
perf: Always wake the parent event
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Borislav Petkov:
"Make sure the membarrier-rseq fence commands are part of the reported
set when querying membarrier(2) commands through MEMBARRIER_CMD_QUERY"
* tag 'sched_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/membarrier: Fix membarrier-rseq fence command missing from query bitmask
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Add another Intel CPU model to the list of CPUs supporting the
processor inventory unique number
- Allow writing to MCE thresholding sysfs files again - a previous
change had accidentally disabled it and no one noticed. Goes to show
how much is this stuff used
* tag 'x86_urgent_for_v5.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Add Xeon Icelake-D to list of CPUs that support PPIN
x86/MCE/AMD: Allow thresholding interface updates after init
|
|
Merge misc fixes from Andrew Morton:
"12 patches.
Subsystems affected by this patch series: sysctl, binfmt, ia64, mm
(memory-failure, folios, kasan, and psi), selftests, and ocfs2"
* emailed patches from Andrew Morton <[email protected]>:
ocfs2: fix a deadlock when commit trans
jbd2: export jbd2_journal_[grab|put]_journal_head
psi: fix "defined but not used" warnings when CONFIG_PROC_FS=n
psi: fix "no previous prototype" warnings when CONFIG_CGROUPS=n
mm, kasan: use compare-exchange operation to set KASAN page tag
kasan: test: fix compatibility with FORTIFY_SOURCE
tools/testing/scatterlist: add missing defines
mm: page->mapping folio->mapping should have the same offset
memory-failure: fetch compound_head after pgmap_pfn_valid()
ia64: make IA64_MCA_RECOVERY bool instead of tristate
binfmt_misc: fix crash when load/unload module
include/linux/sysctl.h: fix register_sysctl_mount_point() return type
|
|
commit 6f1b228529ae introduces a regression which can deadlock as
follows:
Task1: Task2:
jbd2_journal_commit_transaction ocfs2_test_bg_bit_allocatable
spin_lock(&jh->b_state_lock) jbd_lock_bh_journal_head
__jbd2_journal_remove_checkpoint spin_lock(&jh->b_state_lock)
jbd2_journal_put_journal_head
jbd_lock_bh_journal_head
Task1 and Task2 lock bh->b_state and jh->b_state_lock in different
order, which finally result in a deadlock.
So use jbd2_journal_[grab|put]_journal_head instead in
ocfs2_test_bg_bit_allocatable() to fix it.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 6f1b228529ae ("ocfs2: fix race between searching chunks and release journal_head from buffer_head")
Signed-off-by: Joseph Qi <[email protected]>
Reported-by: Gautham Ananthakrishna <[email protected]>
Tested-by: Gautham Ananthakrishna <[email protected]>
Reported-by: Saeed Mirzamohammadi <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Jun Piao <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "ocfs2: fix a deadlock case".
This fixes a deadlock case in ocfs2. We firstly export jbd2 symbols
jbd2_journal_[grab|put]_journal_head as preparation and later use them
in ocfs2 insread of jbd_[lock|unlock]_bh_journal_head to fix the
deadlock.
This patch (of 2):
This exports symbols jbd2_journal_[grab|put]_journal_head, which will be
used outside modules, e.g. ocfs2.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: Gautham Ananthakrishna <[email protected]>
Cc: Saeed Mirzamohammadi <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|