blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2023-08-27	tls: allocate the fallback aead after checking that the cipher is valid	Sabrina Dubroca	1	-10/+10
	No need to allocate the aead if we're going to fail afterwards. Signed-off-by: Sabrina Dubroca <[email protected]> Link: https://lore.kernel.org/r/335e32511ed55a0b30f3f81a78fa8f323b3bdf8f.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	tls: expand use of tls_cipher_desc in tls_set_device_offload	Sabrina Dubroca	1	-18/+4
	tls_set_device_offload is already getting iv and rec_seq sizes from tls_cipher_desc. We can now also check if the cipher_type coming from userspace is valid and can be offloaded. We can also remove the runtime check on rec_seq, since we validate it at compile time. Signed-off-by: Sabrina Dubroca <[email protected]> Link: https://lore.kernel.org/r/8ab71b8eca856c7aaf981a45fe91ac649eb0e2e9.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	tls: validate cipher descriptions at compile time	Sabrina Dubroca	1	-0/+18
	Signed-off-by: Sabrina Dubroca <[email protected]> Link: https://lore.kernel.org/r/b38fb8cf60e099e82ae9979c3c9c92421042417c.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	tls: extend tls_cipher_desc to fully describe the ciphers	Sabrina Dubroca	2	-9/+64
	- add nonce, usually equal to iv_size but not for chacha - add offsets into the crypto_info for each field - add algorithm name - add offloadable flag Also add helpers to access each field of a crypto_info struct described by a tls_cipher_desc. Signed-off-by: Sabrina Dubroca <[email protected]> Link: https://lore.kernel.org/r/39d5f476d63c171097764e8d38f6f158b7c109ae.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	tls: rename tls_cipher_size_desc to tls_cipher_desc	Sabrina Dubroca	4	-52/+52
	We're going to add other fields to it to fully describe a cipher, so the "_size" name won't match the contents. Signed-off-by: Sabrina Dubroca <[email protected]> Link: https://lore.kernel.org/r/76ca6c7686bd6d1534dfa188fb0f1f6fabebc791.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	tls: reduce size of tls_cipher_size_desc	Sabrina Dubroca	4	-9/+20
	tls_cipher_size_desc indexes ciphers by their type, but we're not using indices 0..50 of the array. Each struct tls_cipher_size_desc is 20B, so that's a lot of unused memory. We can reindex the array starting at the lowest used cipher_type. Introduce the get_cipher_size_desc helper to find the right item and avoid out-of-bounds accesses, and make tls_cipher_size_desc's size explicit so that gcc reminds us to update TLS_CIPHER_MIN/MAX when we add a new cipher. Signed-off-by: Sabrina Dubroca <[email protected]> Link: https://lore.kernel.org/r/5e054e370e240247a5d37881a1cd93a67c15f4ca.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	tls: add TLS_CIPHER_ARIA_GCM_* to tls_cipher_size_desc	Sabrina Dubroca	1	-0/+2
	Signed-off-by: Sabrina Dubroca <[email protected]> Link: https://lore.kernel.org/r/b2e0fb79e6d0a4478be9bf33781dc9c9281c9d56.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	tls: move tls_cipher_size_desc to net/tls/tls.h	Sabrina Dubroca	2	-10/+10
	It's only used in net/tls/*, no need to bloat include/net/tls.h. Signed-off-by: Sabrina Dubroca <[email protected]> Link: https://lore.kernel.org/r/dd9fad80415e5b3575b41f56b331871038362eab.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	selftests: tls: test some invalid inputs for setsockopt	Sabrina Dubroca	1	-0/+25
	This test will need to be updated if new ciphers are added. Signed-off-by: Sabrina Dubroca <[email protected]> Link: https://lore.kernel.org/r/bfcfa9cffda56d2064296ab7c99a05775dd4c28e.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	selftests: tls: add getsockopt test	Sabrina Dubroca	1	-0/+35
	The kernel accepts fetching either just the version and cipher type, or exactly the per-cipher struct. Also check that getsockopt returns what we just passed to the kernel. Signed-off-by: Sabrina Dubroca <[email protected]> Link: https://lore.kernel.org/r/81a007ca13de9a74f4af45635d06682cdb385a54.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	selftests: tls: add test variants for aria-gcm	Sabrina Dubroca	2	-0/+25
	Only supported for TLS1.2. Signed-off-by: Sabrina Dubroca <[email protected]> Link: https://lore.kernel.org/r/ccf4a4d3f3820f8ff30431b7629f5210cb33fa89.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	Merge branch 'tools-net-ynl-add-support-for-netlink-raw-families'	Jakub Kicinski	15	-72/+2632
	Donald Hunter says: ==================== tools/net/ynl: Add support for netlink-raw families This patchset adds support for netlink-raw families such as rtnetlink. Patch 1 fixes a typo in existing schemas Patch 2 contains the schema definition Patches 3 & 4 update the schema documentation Patches 5 - 9 extends ynl Patches 10 - 12 add several netlink-raw specs The netlink-raw schema is very similar to genetlink-legacy and I thought about making the changes there and symlinking to it. On balance I thought that might be problematic for accurate schema validation. rtnetlink doesn't seem to fit into unified or directional message enumeration models. It seems like an 'explicit' model would be useful, to force the schema author to specify the message ids directly. There is not yet support for notifications because ynl currently doesn't support defining 'event' properties on a 'do' operation. The message ids are shared so ops need to be both sync and async. I plan to look at this in a future patch. The link and route messages contain different nested attributes dependent on the type of link or route. Decoding these will need some kind of attr-space selection that uses the value of another attribute as the selector key. These nested attributes have been left with type 'binary' for now. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	doc/netlink: Add spec for rt route messages	Donald Hunter	1	-0/+327
	Add schema for rt route with support for getroute, newroute and delroute. Routes can be dumped with filter attributes like this: ./tools/net/ynl/cli.py \ --spec Documentation/netlink/specs/rt_route.yaml \ --dump getroute --json '{"rtm-family": 2, "rtm-table": 254}' Signed-off-by: Donald Hunter <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	doc/netlink: Add spec for rt link messages	Donald Hunter	1	-0/+1432
	Add schema for rt link with support for newlink, dellink, getlink, setlink and getstats. A dummy link can be created like this: sudo ./tools/net/ynl/cli.py \ --spec Documentation/netlink/specs/rt_link.yaml \ --do newlink --create \ --json '{"ifname": "dummy0", "linkinfo": {"kind": "dummy"}}' For example, offload stats can be fetched like this: ./tools/net/ynl/cli.py \ --spec Documentation/netlink/specs/rt_link.yaml \ --dump getstats --json '{ "filter-mask": 8 }' Signed-off-by: Donald Hunter <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	doc/netlink: Add spec for rt addr messages	Donald Hunter	1	-0/+179
	Add schema for rt addr with support for: - newaddr, deladdr, getaddr (dump) Signed-off-by: Donald Hunter <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	tools/net/ynl: Add support for create flags	Donald Hunter	3	-8/+22
	Add support for using NLM_F_REPLACE, _EXCL, _CREATE and _APPEND flags in requests. Signed-off-by: Donald Hunter <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	tools/net/ynl: Implement nlattr array-nest decoding in ynl	Donald Hunter	1	-0/+13
	Add support for the 'array-nest' attribute type that is used by several netlink-raw families. Signed-off-by: Donald Hunter <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	tools/net/ynl: Add support for netlink-raw families	Donald Hunter	1	-33/+91
	Refactor the ynl code to encapsulate protocol specifics into NetlinkProtocol and GenlProtocol. Signed-off-by: Donald Hunter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	tools/net/ynl: Fix extack parsing with fixed header genlmsg	Donald Hunter	1	-25/+40
	Move decode_fixed_header into YnlFamily and add a _fixed_header_size method to allow extack decoding to skip the fixed header. Signed-off-by: Donald Hunter <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	tools/ynl: Add mcast-group schema parsing to ynl	Donald Hunter	1	-0/+31
	Add a SpecMcastGroup class to the nlspec lib. Signed-off-by: Donald Hunter <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	doc/netlink: Document the netlink-raw schema extensions	Donald Hunter	2	-0/+59
	Add a doc page for netlink-raw that describes the schema attributes needed for netlink-raw. Signed-off-by: Donald Hunter <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	doc/netlink: Update genetlink-legacy documentation	Donald Hunter	3	-13/+35
	Add documentation for recently added genetlink-legacy schema attributes. Remove statements about 'work in progress' and 'todo'. Signed-off-by: Donald Hunter <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	doc/netlink: Add a schema for netlink-raw families	Donald Hunter	1	-0/+410
	This schema is largely a copy of the genetlink-legacy schema with the following modifications: - change the schema id to netlink-raw - add a top-level protonum property, e.g. 0 (for NETLINK_ROUTE) - change the protocol enumeration to netlink-raw, removing the genetlink options. - replace doc references to generic netlink with raw netlink - add a value property to mcast-group definitions Signed-off-by: Donald Hunter <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	doc/netlink: Fix typo in genetlink-* schemas	Donald Hunter	2	-2/+2
	Fix typo verion -> version in genetlink-c and genetlink-legacy. Signed-off-by: Donald Hunter <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	Merge branch 'devlink-mlx5-add-port-function-attributes-for-ipsec'	Jakub Kicinski	15	-108/+898
	Saeed Mahameed says: ==================== {devlink,mlx5}: Add port function attributes for ipsec From Dima: Introduce hypervisor-level control knobs to set the functionality of PCI VF devices passed through to guests. The administrator of a hypervisor host may choose to change the settings of a port function from the defaults configured by the device firmware. The software stack has two types of IPsec offload - crypto and packet. Specifically, the ip xfrm command has sub-commands for "state" and "policy" that have an "offload" parameter. With ip xfrm state, both crypto and packet offload types are supported, while ip xfrm policy can only be offloaded in packet mode. The series introduces two new boolean attributes of a port function: ipsec_crypto and ipsec_packet. The goal is to provide a similar level of granularity for controlling VF IPsec offload capabilities, which would be aligned with the software model. This will allow users to decide if they want both types of offload enabled for a VF, just one of them, or none at all (which is the default). At a high level, the difference between the two knobs is that with ipsec_crypto, only XFRM state can be offloaded. Specifically, only the crypto operation (Encrypt/Decrypt) is offloaded. With ipsec_packet, both XFRM state and policy can be offloaded. Furthermore, in addition to crypto operation offload, IPsec encapsulation is also offloaded. For XFRM state, choosing between crypto and packet offload types is possible. From the HW perspective, different resources may be required for each offload type. Examples of when a user prefers to enable IPsec packet offload for a VF when using switchdev mode: $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable migratable disable ipsec_crypto disable ipsec_packet disable $ devlink port function set pci/0000:06:00.0/1 ipsec_packet enable $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable migratable disable ipsec_crypto disable ipsec_packet enable This enables the corresponding IPsec capability of the function before it's enumerated, so when the driver reads the capability from the device firmware, it is enabled. The driver is then able to configure corresponding features and ops of the VF net device to support IPsec state and policy offloading. v2: https://lore.kernel.org/netdev/[email protected]/ ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	net/mlx5: Implement devlink port function cmds to control ipsec_packet	Dima Chumak	6	-4/+173
	Implement devlink port function commands to enable / disable IPsec packet offloads. This is used to control the IPsec capability of the device. When ipsec_offload is enabled for a VF, it prevents adding IPsec packet offloads on the PF, because the two cannot be active simultaneously due to HW constraints. Conversely, if there are any active IPsec packet offloads on the PF, it's not allowed to enable ipsec_packet on a VF, until PF IPsec offloads are cleared. Signed-off-by: Dima Chumak <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	net/mlx5: Implement devlink port function cmds to control ipsec_crypto	Dima Chumak	7	-1/+431
	Implement devlink port function commands to enable / disable IPsec crypto offloads. This is used to control the IPsec capability of the device. When ipsec_crypto is enabled for a VF, it prevents adding IPsec crypto offloads on the PF, because the two cannot be active simultaneously due to HW constraints. Conversely, if there are any active IPsec crypto offloads on the PF, it's not allowed to enable ipsec_crypto on a VF, until PF IPsec offloads are cleared. Signed-off-by: Dima Chumak <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	net/mlx5: Provide an interface to block change of IPsec capabilities	Leon Romanovsky	4	-1/+61
	mlx5 HW can't perform IPsec offload operation simultaneously both on PF and VFs at the same time. While the previous patches added devlink knobs to change IPsec capabilities dynamically, there is a need to add a logic to block such IPsec capabilities for the cases when IPsec is already configured. Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	net/mlx5: Add IFC bits to support IPsec enable/disable	Leon Romanovsky	1	-0/+3
	Add hardware definitions to allow to control IPSec capabilities. Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	net/mlx5e: Rewrite IPsec vs. TC block interface	Leon Romanovsky	3	-93/+38
	In the commit 366e46242b8e ("net/mlx5e: Make IPsec offload work together with eswitch and TC"), new API to block IPsec vs. TC creation was introduced. Internally, that API used devlink lock to avoid races with userspace, but it is not really needed as dev->priv.eswitch is stable and can't be changed. So remove dependency on devlink lock and move block encap code back to its original place. Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	net/mlx5: Drop extra layer of locks in IPsec	Leon Romanovsky	1	-14/+4
	There is no need in holding devlink lock as it gives nothing compared to already used write mode_lock. Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	devlink: Expose port function commands to control IPsec packet offloads	Dima Chumak	4	-0/+97
	Expose port function commands to enable / disable IPsec packet offloads, this is used to control the port IPsec capabilities. When IPsec packet is disabled for a function of the port (default), function cannot offload IPsec packet operations (encapsulation and XFRM policy offload). When enabled, IPsec packet operations can be offloaded by the function of the port, which includes crypto operation (Encrypt/Decrypt), IPsec encapsulation and XFRM state and policy offload. Example of a PCI VF port which supports IPsec packet offloads: $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable ipsec_packet disable $ devlink port function set pci/0000:06:00.0/1 ipsec_packet enable $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable ipsec_packet enable Signed-off-by: Dima Chumak <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	devlink: Expose port function commands to control IPsec crypto offloads	Dima Chumak	4	-0/+96
	Expose port function commands to enable / disable IPsec crypto offloads, this is used to control the port IPsec capabilities. When IPsec crypto is disabled for a function of the port (default), function cannot offload any IPsec crypto operations (Encrypt/Decrypt and XFRM state offloading). When enabled, IPsec crypto operations can be offloaded by the function of the port. Example of a PCI VF port which supports IPsec crypto offloads: $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable ipsec_crypto disable $ devlink port function set pci/0000:06:00.0/1 ipsec_crypto enable $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable ipsec_crypto enable Signed-off-by: Dima Chumak <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-27	Linux 6.5	Linus Torvalds	1	-1/+1

2023-08-27	dt-bindings: PCI: qcom: Fix SDX65 compatible	Krzysztof Kozlowski	1	-5/+7
	Commit c0aba9f32801 ("dt-bindings: PCI: qcom: Add SDX65 SoC") adding SDX65 was never tested and is clearly bogus. The qcom,sdx65-pcie-ep compatible is followed by a fallback in DTS, and there is no driver matched by this compatible. Driver matches by its fallback qcom,sdx55-pcie-ep. This also fixes dtbs_check warnings like: qcom-sdx65-mtp.dtb: pcie-ep@1c00000: compatible: ['qcom,sdx65-pcie-ep', 'qcom,sdx55-pcie-ep'] is too long [kwilczynski: commit log] Fixes: c0aba9f32801 ("dt-bindings: PCI: qcom: Add SDX65 SoC") Link: https://lore.kernel.org/linux-pci/[email protected] Signed-off-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: Krzysztof Wilczyński <[email protected]> Acked-by: Conor Dooley <[email protected]> Cc: [email protected]
2023-08-27	ext4: fix slab-use-after-free in ext4_es_insert_extent()	Baokun Li	1	-14/+30
	Yikebaer reported an issue: ================================================================== BUG: KASAN: slab-use-after-free in ext4_es_insert_extent+0xc68/0xcb0 fs/ext4/extents_status.c:894 Read of size 4 at addr ffff888112ecc1a4 by task syz-executor/8438 CPU: 1 PID: 8438 Comm: syz-executor Not tainted 6.5.0-rc5 #1 Call Trace: [...] kasan_report+0xba/0xf0 mm/kasan/report.c:588 ext4_es_insert_extent+0xc68/0xcb0 fs/ext4/extents_status.c:894 ext4_map_blocks+0x92a/0x16f0 fs/ext4/inode.c:680 ext4_alloc_file_blocks.isra.0+0x2df/0xb70 fs/ext4/extents.c:4462 ext4_zero_range fs/ext4/extents.c:4622 [inline] ext4_fallocate+0x251c/0x3ce0 fs/ext4/extents.c:4721 [...] Allocated by task 8438: [...] kmem_cache_zalloc include/linux/slab.h:693 [inline] __es_alloc_extent fs/ext4/extents_status.c:469 [inline] ext4_es_insert_extent+0x672/0xcb0 fs/ext4/extents_status.c:873 ext4_map_blocks+0x92a/0x16f0 fs/ext4/inode.c:680 ext4_alloc_file_blocks.isra.0+0x2df/0xb70 fs/ext4/extents.c:4462 ext4_zero_range fs/ext4/extents.c:4622 [inline] ext4_fallocate+0x251c/0x3ce0 fs/ext4/extents.c:4721 [...] Freed by task 8438: [...] kmem_cache_free+0xec/0x490 mm/slub.c:3823 ext4_es_try_to_merge_right fs/ext4/extents_status.c:593 [inline] __es_insert_extent+0x9f4/0x1440 fs/ext4/extents_status.c:802 ext4_es_insert_extent+0x2ca/0xcb0 fs/ext4/extents_status.c:882 ext4_map_blocks+0x92a/0x16f0 fs/ext4/inode.c:680 ext4_alloc_file_blocks.isra.0+0x2df/0xb70 fs/ext4/extents.c:4462 ext4_zero_range fs/ext4/extents.c:4622 [inline] ext4_fallocate+0x251c/0x3ce0 fs/ext4/extents.c:4721 [...] ================================================================== The flow of issue triggering is as follows: 1. remove es raw es es removed es1 \|-------------------\| -> \|----\|.......\|------\| 2. insert es es insert es1 merge with es es1 merge with es and free es1 \|----\|.......\|------\| -> \|------------\|------\| -> \|-------------------\| es merges with newes, then merges with es1, frees es1, then determines if es1->es_len is 0 and triggers a UAF. The code flow is as follows: ext4_es_insert_extent es1 = __es_alloc_extent(true); es2 = __es_alloc_extent(true); __es_remove_extent(inode, lblk, end, NULL, es1) __es_insert_extent(inode, &newes, es1) ---> insert es1 to es tree __es_insert_extent(inode, &newes, es2) ext4_es_try_to_merge_right ext4_es_free_extent(inode, es1) ---> es1 is freed if (es1 && !es1->es_len) // Trigger UAF by determining if es1 is used. We determine whether es1 or es2 is used immediately after calling __es_remove_extent() or __es_insert_extent() to avoid triggering a UAF if es1 or es2 is freed. Reported-by: Yikebaer Aizezi <[email protected]> Closes: https://lore.kernel.org/lkml/CALcu4raD4h9coiyEBL4Bm0zjDwxC2CyPiTwsP3zFuhot6y9Beg@mail.gmail.com Fixes: 2a69c450083d ("ext4: using nofail preallocation in ext4_es_insert_extent()") Cc: [email protected] Signed-off-by: Baokun Li <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2023-08-27	libfs: remove redundant checks of s_encoding	Eric Biggers	1	-12/+2
	Now that neither ext4 nor f2fs allows inodes with the casefold flag to be instantiated when unsupported, it's unnecessary to repeatedly check for support later on during random filesystem operations. Signed-off-by: Eric Biggers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2023-08-27	ext4: remove redundant checks of s_encoding	Eric Biggers	2	-4/+4
	Now that ext4 does not allow inodes with the casefold flag to be instantiated when unsupported, it's unnecessary to repeatedly check for support later on during random filesystem operations. Signed-off-by: Eric Biggers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2023-08-27	ext4: reject casefold inode flag without casefold feature	Eric Biggers	1	-1/+4
	It is invalid for the casefold inode flag to be set without the casefold superblock feature flag also being set. e2fsck already considers this case to be invalid and handles it by offering to clear the casefold flag on the inode. __ext4_iget() also already considered this to be invalid, sort of, but it only got so far as logging an error message; it didn't actually reject the inode. Make it reject the inode so that other code doesn't have to handle this case. This matches what f2fs does. Note: we could check 's_encoding != NULL' instead of ext4_has_feature_casefold(). This would make the check robust against the casefold feature being enabled by userspace writing to the page cache of the mounted block device. However, it's unsolvable in general for filesystems to be robust against concurrent writes to the page cache of the mounted block device. Though this very particular scenario involving the casefold feature is solvable, we should not pretend that we can support this model, so let's just check the casefold feature. tune2fs already forbids enabling casefold on a mounted filesystem. Signed-off-by: Eric Biggers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2023-08-27	ext4: use LIST_HEAD() to initialize the list_head in mballoc.c	Ruan Jinjie	1	-13/+5
	Use LIST_HEAD() to initialize the list_head instead of open-coding it. Signed-off-by: Ruan Jinjie <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2023-08-27	ext4: do not mark inode dirty every time when appending using delalloc	Liu Song	1	-26/+62
	In the delalloc append write scenario, if inode's i_size is extended due to buffer write, there are delalloc writes pending in the range up to i_size, and no need to touch i_disksize since writeback will push i_disksize up to i_size eventually. Offers significant performance improvement in high-frequency append write scenarios. I conducted tests in my 32-core environment by launching 32 concurrent threads to append write to the same file. Each write operation had a length of 1024 bytes and was repeated 100000 times. Without using this patch, the test was completed in 7705 ms. However, with this patch, the test was completed in 5066 ms, resulting in a performance improvement of 34%. Moreover, in test scenarios of Kafka version 2.6.2, using packet size of 2K, with this patch resulted in a 10% performance improvement. Signed-off-by: Liu Song <[email protected]> Suggested-by: Jan Kara <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2023-08-27	ext4: rename s_error_work to s_sb_upd_work	Theodore Ts'o	2	-19/+22
	The most common use that s_error_work will get scheduled is now the periodic update of the superblock. So rename it to s_sb_upd_work. Also rename the function flush_stashed_error_work() to update_super_work(). Signed-off-by: Theodore Ts'o <[email protected]>
2023-08-27	ext4: add periodic superblock update check	Vitaliy Kuznetsov	1	-1/+61
	This patch introduces a mechanism to periodically check and update the superblock within the ext4 file system. The main purpose of this patch is to keep the disk superblock up to date. The update will be performed if more than one hour has passed since the last update, and if more than 16MB of data have been written to disk. This check and update is performed within the ext4_journal_commit_callback function, ensuring that the superblock is written while the disk is active, rather than based on a timer that may trigger during disk idle periods. Discussion https://www.spinics.net/lists/linux-ext4/msg85865.html Signed-off-by: Vitaliy Kuznetsov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2023-08-27	ext4: drop dio overwrite only flag and associated warning	Brian Foster	1	-15/+10
	The commit referenced below opened up concurrent unaligned dio under shared locking for pure overwrites. In doing so, it enabled use of the IOMAP_DIO_OVERWRITE_ONLY flag and added a warning on unexpected -EAGAIN returns as an extra precaution, since ext4 does not retry writes in such cases. The flag itself is advisory in this case since ext4 checks for unaligned I/Os and uses appropriate locking up front, rather than on a retry in response to -EAGAIN. As it turns out, the warning check is susceptible to false positives because there are scenarios where -EAGAIN can be expected from lower layers without necessarily having IOCB_NOWAIT set on the iocb. For example, one instance of the warning has been seen where io_uring sets IOCB_HIPRI, which in turn results in REQ_POLLED\|REQ_NOWAIT on the bio. This results in -EAGAIN if the block layer is unable to allocate a request, etc. [Note that there is an outstanding patch to untangle REQ_POLLED and REQ_NOWAIT such that the latter relies on IOCB_NOWAIT, which would also address this instance of the warning.] Another instance of the warning has been reproduced by syzbot. A dio write is interrupted down in __get_user_pages_locked() waiting on the mm lock and returns -EAGAIN up the stack. If the iomap dio iteration layer has made no progress on the write to this point, -EAGAIN returns up to the filesystem and triggers the warning. This use of the overwrite flag in ext4 is precautionary and half-baked. I.e., ext4 doesn't actually implement overwrite checking in the iomap callbacks when the flag is set, so the only extra verification it provides are i_size checks in the generic iomap dio layer. Combined with the tendency for false positives, the added verification is not worth the extra trouble. Remove the flag, associated warning, and update the comments to document when concurrent unaligned dio writes are allowed and why said flag is not used. Cc: [email protected] Reported-by: [email protected] Reported-by: Pengfei Xu <[email protected]> Fixes: 310ee0902b8d ("ext4: allow concurrent unaligned dio overwrites") Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2023-08-27	ext4: add correct group descriptors and reserved GDT blocks to system zone	Wang Jianjian	3	-8/+17
	When setup_system_zone, flex_bg is not initialized so it is always 1. Use a new helper function, ext4_num_base_meta_blocks() which does not depend on sbi->s_log_groups_per_flex being initialized. [ Squashed two patches in the Link URL's below together into a single commit, which is simpler to review/understand. Also fix checkpatch warnings. --TYT ] Cc: [email protected] Signed-off-by: Wang Jianjian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2023-08-27	ext4: remove unused function declaration	Cai Xinchen	1	-6/+0
	These functions do not have its function implementation. So those function declaration is useless. Remove these Signed-off-by: Cai Xinchen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2023-08-27	ext4: mballoc: avoid garbage value from err	Su Hui	1	-1/+1
	clang's static analysis warning: fs/ext4/mballoc.c line 4178, column 6, Branch condition evaluates to a garbage value. err is uninitialized and will be judged when 'len <= 0' or it first enters the loop while the condition "!ext4_sb_block_valid()" is true. Although this can't make problems now, it's better to correct it. Signed-off-by: Su Hui <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2023-08-27	ext4: use sbi instead of EXT4_SB(sb) in ext4_mb_new_blocks_simple()	Lu Hongfei	1	-1/+1
	Signed-off-by: Lu Hongfei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2023-08-27	ext4: change the type of blocksize in ext4_mb_init_cache()	Lu Hongfei	1	-1/+1
	The return value type of i_blocksize() is 'unsigned int', so the type of blocksize has been modified from 'int' to 'unsigned int' to ensure data type consistency. Signed-off-by: Lu Hongfei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2023-08-27	ext4: fix unttached inode after power cut with orphan file feature enabled	Zhihao Cheng	1	-0/+3
	Running generic/475(filesystem consistent tests after power cut) could easily trigger unattached inode error while doing fsck: Unattached zero-length inode 39405. Clear? no Unattached inode 39405 Connect to /lost+found? no Above inconsistence is caused by following process: P1 P2 ext4_create inode = ext4_new_inode_start_handle // itable records nlink=1 ext4_add_nondir err = ext4_add_entry // ENOSPC ext4_append ext4_bread ext4_getblk ext4_map_blocks // returns ENOSPC drop_nlink(inode) // won't be updated into disk inode ext4_orphan_add(handle, inode) ext4_orphan_file_add ext4_journal_stop(handle) jbd2_journal_commit_transaction // commit success >> power cut << ext4_fill_super ext4_load_and_init_journal // itable records nlink=1 ext4_orphan_cleanup ext4_process_orphan if (inode->i_nlink) // true, inode won't be deleted Then, allocated inode will be reserved on disk and corresponds to no dentries, so e2fsck reports 'unattached inode' problem. The problem won't happen if orphan file feature is disabled, because ext4_orphan_add() will update disk inode in orphan list mode. There are several places not updating disk inode while putting inode into orphan area, such as ext4_add_nondir(), ext4_symlink() and whiteout in ext4_rename(). Fix it by updating inode into disk in all error branches of these places. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217605 Fixes: 02f310fcf47f ("ext4: Speedup ext4 orphan inode handling") Signed-off-by: Zhihao Cheng <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>