Age | Commit message (Collapse) | Author | Files | Lines |
|
A duplicated line 'select DRM_KMS_HELPER' was introduced in Kconfig file
by commit 09717af7d13d ("drm: Remove CONFIG_DRM_KMS_CMA_HELPER option"),
so remove it.
Fixes: 09717af7d13d ("drm: Remove CONFIG_DRM_KMS_CMA_HELPER option")
Signed-off-by: Liu Ying <[email protected]>
Reviewed-by: Philipp Zabel <[email protected]>
Signed-off-by: Philipp Zabel <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Dell Vostro 5568 laptop has lis3lv02d, but its i2c address is not known
to the kernel. Add this address.
Output of "cat /sys/devices/platform/lis3lv02d/position" on Dell Vostro
5568 laptop:
- Horizontal: (-18,0,1044)
- Front elevated: (522,-18,1080)
- Left elevated: (-18,-360,1080)
- Upside down: (36,108,-1134)
Signed-off-by: Nam Cao <[email protected]>
Reviewed-by: Jean Delvare <[email protected]>
Reviewed-by: Pali Rohár <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
|
|
When the I2C controllers are running in DMA mode, it is the DMA engine
that performs the memory accesses rather than the I2C controller. Pass
the DMA engine's struct device pointer to the DMA API to make sure the
correct DMA operations are used.
This fixes an issue where the DMA engine's SMMU stream ID needs to be
misleadingly set for the I2C controllers in device tree.
Suggested-by: Robin Murphy <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
|
|
In piix4_probe(), the piix4 adapter will be registered in:
piix4_probe()
piix4_add_adapters_sb800() / piix4_add_adapter()
i2c_add_adapter()
Based on the probed device type, piix4_add_adapters_sb800() or single
piix4_add_adapter() will be called.
For the former case, piix4_adapter_count is set as the number of adapters,
while for antoher case it is not set and kept default *zero*.
When piix4 is removed, piix4_remove() removes the adapters added in
piix4_probe(), basing on the piix4_adapter_count value.
Because the count is zero for the single adapter case, the adapter won't
be removed and makes the sources allocated for adapter leaked, such as
the i2c client and device.
These sources can still be accessed by i2c or bus and cause problems.
An easily reproduced case is that if a new adapter is registered, i2c
will get the leaked adapter and try to call smbus_algorithm, which was
already freed:
Triggered by: rmmod i2c_piix4 && modprobe max31730
BUG: unable to handle page fault for address: ffffffffc053d860
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 3752 Comm: modprobe Tainted: G
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
RIP: 0010:i2c_default_probe (drivers/i2c/i2c-core-base.c:2259) i2c_core
RSP: 0018:ffff888107477710 EFLAGS: 00000246
...
<TASK>
i2c_detect (drivers/i2c/i2c-core-base.c:2302) i2c_core
__process_new_driver (drivers/i2c/i2c-core-base.c:1336) i2c_core
bus_for_each_dev (drivers/base/bus.c:301)
i2c_for_each_dev (drivers/i2c/i2c-core-base.c:1823) i2c_core
i2c_register_driver (drivers/i2c/i2c-core-base.c:1861) i2c_core
do_one_initcall (init/main.c:1296)
do_init_module (kernel/module/main.c:2455)
...
</TASK>
---[ end trace 0000000000000000 ]---
Fix this problem by correctly set piix4_adapter_count as 1 for the
single adapter so it can be normally removed.
Fixes: 528d53a1592b ("i2c: piix4: Fix probing of reserved ports on AMD Family 16h Model 30h")
Signed-off-by: Chen Zhongjin <[email protected]>
Reviewed-by: Jean Delvare <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
|
|
When thermnal zones are defined, trip points definitions are mandatory.
Define a couple of critical trip points for monitoring of existing
PMIC and SOC thermal zones.
This was lost between txt to yaml conversion and was re-enforced recently
via the commit 8c596324232d ("dt-bindings: thermal: Fix missing required property")
Cc: Rob Herring <[email protected]>
Cc: Krzysztof Kozlowski <[email protected]>
Cc: [email protected]
Signed-off-by: Cristian Marussi <[email protected]>
Fixes: f7b636a8d83c ("arm64: dts: juno: add thermal zones for scpi sensors")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sudeep Holla <[email protected]>
|
|
Use devres to allocate the dedicated deferred_tx_wq polling workqueue so
as to automatically trigger the proper resource release on error path.
Reported-by: Dan Carpenter <[email protected]>
Fixes: 5a3b7185c47c ("firmware: arm_scmi: Add atomic mode support to virtio transport")
Signed-off-by: Cristian Marussi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sudeep Holla <[email protected]>
|
|
SCMI virtio transport device managed allocations must use the main
platform device in devres operations instead of the channel devices.
Cc: Peter Hilber <[email protected]>
Fixes: 46abe13b5e3d ("firmware: arm_scmi: Add virtio transport")
Signed-off-by: Cristian Marussi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sudeep Holla <[email protected]>
|
|
SCMI Rx channels are optional and they can fail to be setup when not
present but anyway channels setup routines must bail-out on memory errors.
Make channels setup, and related probing, fail when memory errors are
reported on Rx channels.
Fixes: 5c8a47a5a91d ("firmware: arm_scmi: Make scmi core independent of the transport type")
Signed-off-by: Cristian Marussi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sudeep Holla <[email protected]>
|
|
SCMI transports based on shared memory, at start of transmissions, have
to wait for the shared Tx channel area to be eventually freed by the
SCMI platform before accessing the channel. In fact the channel is owned
by the SCMI platform until marked as free by the platform itself and,
as such, cannot be used by the agent until relinquished.
As a consequence a badly misbehaving SCMI platform firmware could lock
the channel indefinitely and make the kernel side SCMI stack loop
forever waiting for such channel to be freed, possibly hanging the
whole boot sequence.
Add a timeout to the existent Tx waiting spin-loop so that, when the
system ends up in this situation, the SCMI stack can at least bail-out,
nosily warn the user, and abort the transmission.
Reported-by: YaxiongTian <[email protected]>
Suggested-by: YaxiongTian <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Etienne Carriere <[email protected]>
Cc: Florian Fainelli <[email protected]>
Signed-off-by: Cristian Marussi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sudeep Holla <[email protected]>
|
|
Suppress the capability to unbind the core SCMI driver since all the
SCMI stack protocol drivers depend on it.
Fixes: aa4f886f3893 ("firmware: arm_scmi: add basic driver infrastructure for SCMI")
Signed-off-by: Cristian Marussi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sudeep Holla <[email protected]>
|
|
Platform drivers .remove callbacks are not supposed to fail and report
errors. Such errors are indeed ignored by the core platform drivers
and the driver unbind process is anyway completed.
The SCMI core platform driver as it is now, instead, bails out reporting
an error in case of an explicit unbind request.
Fix the removal path by adding proper device links between the core SCMI
device and the SCMI protocol devices so that a full SCMI stack unbind is
triggered when the core driver is removed. The remove process does not
bail out anymore on the anomalous conditions triggered by an explicit
unbind but the user is still warned.
Reported-by: Uwe Kleine-König <[email protected]>
Signed-off-by: Cristian Marussi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sudeep Holla <[email protected]>
|
|
Add Jay Fang as the maintainer of the HiSilicon LPC BUS Driver, replacing
John Garry.
Signed-off-by: Jay Fang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]'
Signed-off-by: Arnd Bergmann <[email protected]>
|
|
Recent changes to the thermal framework has made the trip
points (trips) for thermal zones compulsory, which made
the Ux500 DTS files break validation and also stopped
probing because of similar changes to the code.
Fix this by adding an "outer bounding box": battery thermal
zones should not get warmer than 70 degress, then we will
shut down.
Fixes: 8c596324232d ("dt-bindings: thermal: Fix missing required property")
Fixes: 3fd6d6e2b4e8 ("thermal/of: Rework the thermal device tree initialization")
Signed-off-by: Linus Walleij <[email protected]>
Cc: Daniel Lezcano <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]'
Signed-off-by: Arnd Bergmann <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes
i.MX fixes for 6.1:
- Fix imx93-pd driver to release resources when error occurs in probe.
- A series from Ioana Ciornei to add missing clock frequencies for MDIO
controllers on LayerScape SoCs, so that the kernel driver can work
independently from bootloader.
- A series from Li Jun to fix USB power domain setup in i.MX8MM/N device
trees.
- Fix CPLD_Dn pull configuration for MX8Menlo board to avoid interfering
with CPLD power off functionality.
- Fix ctrl_sleep_moci GPIO setup for verdin-imx8mp board.
- Fix DT schema check warnings on uSDHC clocks for imx8-ss-conn device
tree.
- Fix up gpcv2 DT bindings to have an optional `power-domains` property.
- A couple of i.MX93 device tree fixes on S4MU interrupt and gpio-ranges
of GPIO controllers.
- Keep PU regulator on for Quad and QuadPlus based imx6dl-yapp4 boards to
work around a hardware design flaw in supply voltage distribution.
- Fix user push-button GPIO offset on imx6qdl-gw59 boards.
* tag 'imx-fixes-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
arm64: dts: ls208xa: specify clock frequencies for the MDIO controllers
arm64: dts: ls1088a: specify clock frequencies for the MDIO controllers
arm64: dts: lx2160a: specify clock frequencies for the MDIO controllers
soc: imx: imx93-pd: Fix the error handling path of imx93_pd_probe()
arm64: dts: imx93: correct gpio-ranges
arm64: dts: imx93: correct s4mu interrupt names
dt-bindings: power: gpcv2: add power-domains property
arm64: dts: imx8: correct clock order
ARM: dts: imx6dl-yapp4: Do not allow PM to switch PU regulator off on Q/QP
ARM: dts: imx6qdl-gw59{10,13}: fix user pushbutton GPIO offset
arm64: dts: imx8mn: Correct the usb power domain
arm64: dts: imx8mn: remove otg1 power domain dependency on hsio
arm64: dts: imx8mm: correct usb power domains
arm64: dts: imx8mm: remove otg1/2 power domain dependency on hsio
arm64: dts: verdin-imx8mp: fix ctrl_sleep_moci
arm64: dts: imx8mm: Enable CPLD_Dn pull down resistor on MX8Menlo
Link: https://lore.kernel.org/r/20221101031547.GB125525@dragon
Signed-off-by: Arnd Bergmann <[email protected]>
|
|
No need to postpone this to the commit release path, since no packets
are walking over this object, this is accessed from control plane only.
This helped uncovered UAF triggered by races with the netlink notifier.
Fixes: 9dd732e0bdf5 ("netfilter: nf_tables: memleak flow rule from commit path")
Reported-by: [email protected]
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
commit release path is invoked via call_rcu and it runs lockless to
release the objects after rcu grace period. The netlink notifier handler
might win race to remove objects that the transaction context is still
referencing from the commit release path.
Call rcu_barrier() to ensure pending rcu callbacks run to completion
if the list of transactions to be destroyed is not empty.
Fixes: 6001a930ce03 ("netfilter: nftables: introduce table ownership")
Reported-by: [email protected]
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
On 32-bit kernels, 64-bit syscall arguments are split into two
registers. For that to work with syscall wrappers, the prototype of the
syscall must have the argument split so that the wrapper macro properly
unpacks the arguments from pt_regs.
The fanotify_mark() syscall is one such syscall, which already has a
split prototype, guarded behind ARCH_SPLIT_ARG64.
So select ARCH_SPLIT_ARG64 to get that prototype and fix fanotify_mark()
on 32-bit kernels with syscall wrappers.
Note also that fanotify_mark() is the only usage of ARCH_SPLIT_ARG64.
Fixes: 7e92e01b7245 ("powerpc: Provide syscall wrapper")
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Recently, we got two syzkaller problems because of oversize packet
when napi frags enabled.
One of the problems is because the first seg size of the iov_iter
from user space is very big, it is 2147479538 which is bigger than
the threshold value for bail out early in __alloc_pages(). And
skb->pfmemalloc is true, __kmalloc_reserve() would use pfmemalloc
reserves without __GFP_NOWARN flag. Thus we got a warning as following:
========================================================
WARNING: CPU: 1 PID: 17965 at mm/page_alloc.c:5295 __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295
...
Call trace:
__alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295
__alloc_pages_node include/linux/gfp.h:550 [inline]
alloc_pages_node include/linux/gfp.h:564 [inline]
kmalloc_large_node+0x94/0x350 mm/slub.c:4038
__kmalloc_node_track_caller+0x620/0x8e4 mm/slub.c:4545
__kmalloc_reserve.constprop.0+0x1e4/0x2b0 net/core/skbuff.c:151
pskb_expand_head+0x130/0x8b0 net/core/skbuff.c:1654
__skb_grow include/linux/skbuff.h:2779 [inline]
tun_napi_alloc_frags+0x144/0x610 drivers/net/tun.c:1477
tun_get_user+0x31c/0x2010 drivers/net/tun.c:1835
tun_chr_write_iter+0x98/0x100 drivers/net/tun.c:2036
The other problem is because odd IPv6 packets without NEXTHDR_NONE
extension header and have big packet length, it is 2127925 which is
bigger than ETH_MAX_MTU(65535). After ipv6_gso_pull_exthdrs() in
ipv6_gro_receive(), network_header offset and transport_header offset
are all bigger than U16_MAX. That would trigger skb->network_header
and skb->transport_header overflow error, because they are all '__u16'
type. Eventually, it would affect the value for __skb_push(skb, value),
and make it be a big value. After __skb_push() in ipv6_gro_receive(),
skb->data would less than skb->head, an out of bounds memory bug occurred.
That would trigger the problem as following:
==================================================================
BUG: KASAN: use-after-free in eth_type_trans+0x100/0x260
...
Call trace:
dump_backtrace+0xd8/0x130
show_stack+0x1c/0x50
dump_stack_lvl+0x64/0x7c
print_address_description.constprop.0+0xbc/0x2e8
print_report+0x100/0x1e4
kasan_report+0x80/0x120
__asan_load8+0x78/0xa0
eth_type_trans+0x100/0x260
napi_gro_frags+0x164/0x550
tun_get_user+0xda4/0x1270
tun_chr_write_iter+0x74/0x130
do_iter_readv_writev+0x130/0x1ec
do_iter_write+0xbc/0x1e0
vfs_writev+0x13c/0x26c
To fix the problems, restrict the packet size less than
(ETH_MAX_MTU - NET_SKB_PAD - NET_IP_ALIGN) which has considered reserved
skb space in napi_alloc_skb() because transport_header is an offset from
skb->head. Add len check in tun_napi_alloc_frags() simply.
Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver")
Signed-off-by: Ziyang Xuan <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Changed maintainers for vnic driver, since Dany has new responsibilities.
Also added Nick Child as reviewer.
Signed-off-by: Rick Lindsley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
blk_mq_flush_plug_list() empties ->mq_list and request we'd peeked there
before that call is gone; in any case, we are not dealing with a mix
of requests for different queues now - there's no requests left in the
plug.
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
With the introduction of syscall wrappers all wrappers for syscalls with
64-bit arguments must be handled specially, not only those that have
unaligned 64-bit arguments. This left out the fallocate() and
sync_file_range2() syscalls.
Fixes: 7e92e01b7245 ("powerpc: Provide syscall wrapper")
Fixes: e23750623835 ("powerpc/32: fix syscall wrappers with 64-bit arguments of unaligned register-pairs")
Signed-off-by: Andreas Schwab <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The macros are defined backwards.
This affects the following compat syscalls:
- compat_sys_truncate64()
- compat_sys_ftruncate64()
- compat_sys_fallocate()
- compat_sys_sync_file_range()
- compat_sys_fadvise64_64()
- compat_sys_readahead()
- compat_sys_pread64()
- compat_sys_pwrite64()
Fixes: 43d5de2b67d7 ("asm-generic: compat: Support BE for long long args in 32-bit ABIs")
Signed-off-by: Andreas Schwab <[email protected]>
[mpe: Add list of affected syscalls]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more fixes and regression fixes:
- fix a corner case when handling tree-mod-log chagnes in reallocated
notes
- fix crash on raid0 filesystems created with <5.4 mkfs.btrfs that
could lead to division by zero
- add missing super block checksum verification after thawing
filesystem
- handle one more case in send when dealing with orphan files
- fix parameter type mismatch for generation when reading dentry
- improved error handling in raid56 code
- better struct bio packing after recent cleanups"
* tag 'for-6.1-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: don't use btrfs_chunk::sub_stripes from disk
btrfs: fix type of parameter generation in btrfs_get_dentry
btrfs: send: fix send failure of a subcase of orphan inodes
btrfs: make thaw time super block check to also verify checksum
btrfs: fix tree mod log mishandling of reallocated nodes
btrfs: reorder btrfs_bio for better packing
btrfs: raid56: avoid double freeing for rbio if full_stripe_write() failed
btrfs: raid56: properly handle the error when unable to find the missing stripe
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull LSM fix from Paul Moore:
"A single patch to the capabilities code to fix a potential memory leak
in the xattr allocation error handling"
* tag 'lsm-pr-20221031' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
capabilities: fix potential memleak on error path from vfs_getxattr_alloc()
|
|
There are two capabilities related to ring-based dirty page tracking:
KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL. Both are
supported by x86. However, arm64 supports KVM_CAP_DIRTY_LOG_RING_ACQ_REL
only when the feature is supported on arm64. The userspace doesn't have
to enable the advertised capability, meaning KVM_CAP_DIRTY_LOG_RING can
be enabled on arm64 by userspace and it's wrong.
Fix it by double checking if the capability has been advertised prior to
enabling it. It's rejected to enable the capability if it hasn't been
advertised.
Fixes: 17601bfed909 ("KVM: Add KVM_CAP_DIRTY_LOG_RING_ACQ_REL capability and config option")
Reported-by: Sean Christopherson <[email protected]>
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Oliver Upton <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.1-fixes
xfs: fix various problems with log intent item recovery
Starting with 6.1-rc1, CONFIG_FORTIFY_SOURCE checks became smart enough
to detect memcpy() callers that copy beyond what seems to be the end of
a struct. Unfortunately, gcc has a bug wherein it cannot reliably
compute the size of a struct containing another struct containing a flex
array at the end. This is the case with the xfs log item format
structures, which means that -rc1 starts complaining all over the place.
Fix these problems by memcpying the struct head and the flex arrays
separately. Although it's tempting to use the FLEX_ARRAY macros, the
structs involved are part of the ondisk log format. Some day we're
going to want to make the ondisk log contents endian-safe, which means
that we will have to stop using memcpy entirely.
While we're at it, fix some deficiencies in the validation of recovered
log intent items -- if the size of the recovery buffer is not even large
enough to cover the flex array record count in the head, we should abort
the recovery of that item immediately.
The last patch of this series changes the EFI/EFD sizeof functions names
and behaviors to be consistent with the similarly named sizeof helpers
for other log intent items.
v2: fix more inadequate log intent done recovery validation and dump
corrupt recovered items
Signed-off-by: Darrick J. Wong <[email protected]>
* tag 'fix-log-recovery-misuse-6.1_2022-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: dump corrupt recovered log intent items to dmesg consistently
xfs: actually abort log recovery on corrupt intent-done log items
xfs: refactor all the EFI/EFD log item sizeof logic
xfs: fix memcpy fortify errors in EFI log format copying
xfs: fix memcpy fortify errors in RUI log format copying
xfs: fix memcpy fortify errors in CUI log format copying
xfs: fix memcpy fortify errors in BUI log format copying
xfs: fix validation in attr log item recovery
|
|
We've been (ab)using XFS_REFC_COW_START as both an integer quantity and
a bit flag, even though it's *only* a bit flag. Rename the variable to
reflect its nature and update the cast target since we're not supposed
to be comparing it to xfs_agblock_t now.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
|
|
We're supposed to initialize the list head of an object before adding it
to another list. Fix that, and stop using the kmem_{alloc,free} calls
from the Irix days.
Fixes: 174edb0e46e5 ("xfs: store in-progress CoW allocations in the refcount btree")
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
|
|
As we've seen, refcount records use the upper bit of the rc_startblock
field to ensure that all the refcount records are at the right side of
the refcount btree. This works because an AG is never allowed to have
more than (1U << 31) blocks in it. If we ever encounter a filesystem
claiming to have that many blocks, we absolutely do not want reflink
touching it at all.
However, this test at the start of xfs_refcount_recover_cow_leftovers is
slightly incorrect -- it /should/ be checking that agblocks isn't larger
than the XFS_MAX_CRC_AG_BLOCKS constant, and it should check that the
constant is never large enough to conflict with that CoW flag.
Note that the V5 superblock verifier has not historically rejected
filesystems where agblocks >= XFS_MAX_CRC_AG_BLOCKS, which is why this
ended up in the COW recovery routine.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
|
|
Now that we've separated the startblock and CoW/shared extent domain in
the incore refcount record structure, check the domain whenever we
retrieve a record to ensure that it's still in the domain that we want.
Depending on the circumstances, a change in domain either means we're
done processing or that we've found a corruption and need to fail out.
The refcount check in xchk_xref_is_cow_staging is redundant since
_get_rec has done that for a long time now, so we can get rid of it.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
|
|
Now that we have an explicit enum for shared and CoW staging extents, we
can get rid of the old FIND_RCEXT flags. Omit a couple of conversions
that disappear in the next patches.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
|
|
Create a helper function to ensure that CoW staging extent records have
a single refcount and that shared extent records have more than 1
refcount. We'll put this to more use in the next patch.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
|
|
Now that we've broken out the startblock and shared/cow domain in the
incore refcount extent record structure, update the tracepoints to
report the domain.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
|
|
Just prior to committing the reflink code into upstream, the xfs
maintainer at the time requested that I find a way to shard the refcount
records into two domains -- one for records tracking shared extents, and
a second for tracking CoW staging extents. The idea here was to
minimize mount time CoW reclamation by pushing all the CoW records to
the right edge of the keyspace, and it was accomplished by setting the
upper bit in rc_startblock. We don't allow AGs to have more than 2^31
blocks, so the bit was free.
Unfortunately, this was a very late addition to the codebase, so most of
the refcount record processing code still treats rc_startblock as a u32
and pays no attention to whether or not the upper bit (the cow flag) is
set. This is a weakness is theoretically exploitable, since we're not
fully validating the incoming metadata records.
Fuzzing demonstrates practical exploits of this weakness. If the cow
flag of a node block key record is corrupted, a lookup operation can go
to the wrong record block and start returning records from the wrong
cow/shared domain. This causes the math to go all wrong (since cow
domain is still implicit in the upper bit of rc_startblock) and we can
crash the kernel by tricking xfs into jumping into a nonexistent AG and
tripping over xfs_perag_get(mp, <nonexistent AG>) returning NULL.
To fix this, start tracking the domain as an explicit part of struct
xfs_refcount_irec, adjust all refcount functions to check the domain
of a returned record, and alter the function definitions to accept them
where necessary.
Found by fuzzing keys[2].cowflag = add in xfs/464.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
|
|
Consolidate the open-coded xfs_refcount_irec fields into an actual
struct and use the existing _btrec_to_irec to decode the ondisk record.
This will reduce code churn in the next patch.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
|
|
If log recovery decides that an intent item is corrupt and wants to
abort the mount, capture a hexdump of the corrupt log item in the kernel
log for further analysis. Some of the log item code already did this,
so we're fixing the rest to do it consistently.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
|
|
Structure definitions for incore objects do not belong in the ondisk
format header. Move them to the incore types header where they belong.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
|
|
If log recovery picks up intent-done log items that are not of the
correct size it needs to abort recovery and fail the mount. Debug
assertions are not good enough.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
|
|
If we're in the middle of a deferred refcount operation and decide to
roll the transaction to avoid overflowing the transaction space, we need
to check the new agbno/aglen parameters that we're about to record in
the new intent. Specifically, we need to check that the new extent is
completely within the filesystem, and that continuation does not put us
into a different AG.
If the keys of a node block are wrong, the lookup to resume an
xfs_refcount_adjust_extents operation can put us into the wrong record
block. If this happens, we might not find that we run out of aglen at
an exact record boundary, which will cause the loop control to do the
wrong thing.
The previous patch should take care of that problem, but let's add this
extra sanity check to stop corruption problems sooner than later.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
|
|
Refactor all the open-coded sizeof logic for EFI/EFD log item and log
format structures into common helper functions whose names reflect the
struct names.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Allison Henderson <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
|
|
Create a predicate function to verify that a given agbno/blockcount pair
fit entirely within a single allocation group and don't suffer
mathematical overflows. Refactor the existng open-coded logic; we're
going to add more calls to this function in the next patch.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
|
|
Starting in 6.1, CONFIG_FORTIFY_SOURCE checks the length parameter of
memcpy. Since we're already fixing problems with BUI item copying, we
should fix it everything else.
An extra difficulty here is that the ef[id]_extents arrays are declared
as single-element arrays. This is not the convention for flex arrays in
the modern kernel, and it causes all manner of problems with static
checking tools, since they often cannot tell the difference between a
single element array and a flex array.
So for starters, change those array[1] declarations to array[]
declarations to signal that they are proper flex arrays and adjust all
the "size-1" expressions to fit the new declaration style.
Next, refactor the xfs_efi_copy_format function to handle the copying of
the head and the flex array members separately. While we're at it, fix
a minor validation deficiency in the recovery function.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Allison Henderson <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
|
|
Prior to calling xfs_refcount_adjust_extents, we trimmed agbno/aglen
such that the end of the range would not be in the middle of a refcount
record. If this is no longer the case, something is seriously wrong
with the btree. Bail out with a corruption error.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
|
|
Starting in 6.1, CONFIG_FORTIFY_SOURCE checks the length parameter of
memcpy. Since we're already fixing problems with BUI item copying, we
should fix it everything else.
Refactor the xfs_rui_copy_format function to handle the copying of the
head and the flex array members separately. While we're at it, fix a
minor validation deficiency in the recovery function.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Allison Henderson <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
|
|
Starting in 6.1, CONFIG_FORTIFY_SOURCE checks the length parameter of
memcpy. Since we're already fixing problems with BUI item copying, we
should fix it everything else.
Refactor the xfs_cui_copy_format function to handle the copying of the
head and the flex array members separately. While we're at it, fix a
minor validation deficiency in the recovery function.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Allison Henderson <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
|
|
Starting in 6.1, CONFIG_FORTIFY_SOURCE checks the length parameter of
memcpy. Unfortunately, it doesn't handle flex arrays correctly:
------------[ cut here ]------------
memcpy: detected field-spanning write (size 48) of single field "dst_bui_fmt" at fs/xfs/xfs_bmap_item.c:628 (size 16)
Fix this by refactoring the xfs_bui_copy_format function to handle the
copying of the head and the flex array members separately. While we're
at it, fix a minor validation deficiency in the recovery function.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Allison Henderson <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
|
|
Before we start fixing all the complaints about memcpy'ing log items
around, let's fix some inadequate validation in the xattr log item
recovery code and get rid of the (now trivial) copy_format function.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Allison Henderson <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
|
|
When doing a direct IO write using a iocb with nowait and dsync set, we
end up not syncing the file once the write completes.
This is because we tell iomap to not call generic_write_sync(), which
would result in calling btrfs_sync_file(), in order to avoid a deadlock
since iomap can call it while we are holding the inode's lock and
btrfs_sync_file() needs to acquire the inode's lock. The deadlock happens
only if the write happens synchronously, when iomap_dio_rw() calls
iomap_dio_complete() before it returns. Instead we do the sync ourselves
at btrfs_do_write_iter().
For a nowait write however we can end up not doing the sync ourselves at
at btrfs_do_write_iter() because the write could have been queued, and
therefore we get -EIOCBQUEUED returned from iomap in such case. That makes
us skip the sync call at btrfs_do_write_iter(), as we don't do it for
any error returned from btrfs_direct_write(). We can't simply do the call
even if -EIOCBQUEUED is returned, since that would block the task waiting
for IO, both for the data since there are bios still in progress as well
as potentially blocking when joining a log transaction and when syncing
the log (writing log trees, super blocks, etc).
So let iomap do the sync call itself and in order to avoid deadlocks for
the case of synchronous writes (without nowait), use __iomap_dio_rw() and
have ourselves call iomap_dio_complete() after unlocking the inode.
A test case will later be sent for fstests, after this is fixed in Linus'
tree.
Fixes: 51bd9563b678 ("btrfs: fix deadlock due to page faults during direct IO reads and writes")
Reported-by: Марк Коренберг <[email protected]>
Link: https://lore.kernel.org/linux-btrfs/CAEmTpZGRKbzc16fWPvxbr6AfFsQoLmz-Lcg-7OgJOZDboJ+SGQ@mail.gmail.com/
CC: [email protected] # 6.0+
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
The kernel robot complained about this:
>> fs/xfs/xfs_file.c:1266:31: sparse: sparse: incorrect type in return expression (different base types) @@ expected int @@ got restricted vm_fault_t @@
fs/xfs/xfs_file.c:1266:31: sparse: expected int
fs/xfs/xfs_file.c:1266:31: sparse: got restricted vm_fault_t
fs/xfs/xfs_file.c:1314:21: sparse: sparse: incorrect type in assignment (different base types) @@ expected restricted vm_fault_t [usertype] ret @@ got int @@
fs/xfs/xfs_file.c:1314:21: sparse: expected restricted vm_fault_t [usertype] ret
fs/xfs/xfs_file.c:1314:21: sparse: got int
Fix the incorrect return type for these two functions.
While we're at it, make the !fsdax version return VM_FAULT_SIGBUS
because a zero return value will cause some callers to try to lock
vmf->page, which we never set here.
Fixes: ea6c49b784f0 ("xfs: support CoW in fsdax mode")
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
|
|
After allocation 'dip' is tested instead of 'dip->csums'. Fix it.
Fixes: 642c5d34da53 ("btrfs: allocate the btrfs_dio_private as part of the iomap dio bio")
CC: [email protected] # 5.19+
Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: Christophe JAILLET <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|