aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-02-22bnxt_en: Use the new VNIC to create ntuple filtersPavan Chebbi1-6/+27
The newly created vnic (BNXT_VNIC_NTUPLE) is ready to be used to create ntuple filters when supported by firmware. All RX rings can be used regardless of the RSS indirection setting on the default VNIC. Reviewed-by: Somnath Kotur <[email protected]> Reviewed-by: Kalesh AP <[email protected]> Signed-off-by: Pavan Chebbi <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-02-22bnxt_en: Create and setup the additional VNIC for adding ntuple filtersPavan Chebbi2-10/+36
Allocate and setup the additional VNIC for ntuple filters if this new method is supported by the firmware. Even though this VNIC is only used for ntuple filters with direct ring destinations, we still setup the RSS hash to be identical to the default VNIC so that each RX packet will have the correct hash in the RX completion. This VNIC is always at VNIC index BNXT_VNIC_NTUPLE. Reviewed-by: Somnath Kotur <[email protected]> Signed-off-by: Pavan Chebbi <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-02-22bnxt_en: Provision for an additional VNIC for ntuple filtersPavan Chebbi2-5/+32
On newer chips that support the ring table index method for ntuple filters, the current scheme of using the same VNIC for both RSS and ntuple filters will not work in all cases. An ntuple filter can only be directed to a destination ring if that destination ring is also in the RSS indirection table. To support ntuple filters with any arbitratry RSS indirection table that may only include a subset of the rings, we need to use a separate VNIC for ntuple filters. This patch provisions the additional VNIC. The next patch will allocate additional VNIC from firmware and set it up. Reviewed-by: Somnath Kotur <[email protected]> Reviewed-by: Kalesh AP <[email protected]> Signed-off-by: Pavan Chebbi <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-02-22bnxt_en: Define BNXT_VNIC_DEFAULT for the default vnic indexPavan Chebbi3-21/+24
Replace hard coded 0 index with more meaningful BNXT_VNIC_DEFAULT. Reviewed-by: Kalesh AP <[email protected]> Reviewed-by: Somnath Kotur <[email protected]> Signed-off-by: Pavan Chebbi <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-02-22bnxt_en: Refactor bnxt_set_features()Pavan Chebbi1-7/+12
Refactor bnxt_set_features() function to have a common function to re-init. We'll need this to reinitialize when ntuple configuration changes. Reviewed-by: Kalesh AP <[email protected]> Reviewed-by: Andy Gospodarek <[email protected]> Signed-off-by: Pavan Chebbi <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-02-22bnxt_en: Add bnxt_get_total_vnics() to calculate number of VNICsVenkat Duvvuru1-11/+17
Refactor the code by adding a new function to calculate the number of required VNICs. This is used in multiple places when reserving or checking resources. Reviewed-by: Pavan Chebbi <[email protected]> Signed-off-by: Venkat Duvvuru <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-02-22bnxt_en: Check additional resources in bnxt_check_rings()Michael Chan1-0/+2
bnxt_check_rings() is called to check if we have enough resource assets to satisfy the new number of ethtool channels. If the asset test fails, the ethtool operation will fail gracefully. Otherwise we will proceed and commit to use the new number of channels. If it fails to allocate any resources, the chip will fail to come up. For completeness, check all possible resources before committing to the new settings. Add the missing ring group and RSS context asset tests in bnxt_check_rings(). Reviewed-by: Pavan Chebbi <[email protected]> Reviewed-by: Somnath Kotur <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-02-22bnxt_en: Improve RSS context reservation infrastructurePavan Chebbi2-23/+36
Add RSS context fields to struct bnxt_hw_rings and struct bnxt_hw_resc. With these, we can now specific the exact number of RSS contexts to reserve and store the reserved value. The original code relies on other resources to infer the number of RSS contexts to reserve and the reserved value is not stored. This improved infrastructure will make the RSS context accounting more complete and is needed by later patches. Signed-off-by: Pavan Chebbi <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-02-22bnxt_en: Explicitly specify P5 completion rings to reserveMichael Chan2-7/+13
The current code assumes that every RX ring group and every TX ring requires a completion ring on P5_PLUS chips. Now that we have the bnxt_hw_rings structure, add the cp_p5 field so that it can be explicitly specified. This makes the logic more clear. Reviewed-by: Ajit Khaparde <[email protected]> Reviewed-by: Pavan Chebbi <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-02-22bnxt_en: Refactor ring reservation functionsMichael Chan2-130/+132
The current functions to reserve hardware rings pass in 6 different ring or resource types as parameters. Add a structure bnxt_hw_rings to consolidate all these parameters and pass the structure pointer instead to these functions. Add 2 related helper functions also. This makes the code cleaner and makes it easier to add new resources to be reserved. Reviewed-by: Ajit Khaparde <[email protected]> Reviewed-by: Pavan Chebbi <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-02-22selftests/iommu: fix the config fragmentMuhammad Usama Anjum1-2/+3
The config fragment doesn't follow the correct format to enable those config options which make the config options getting missed while merging with other configs. ➜ merge_config.sh -m .config tools/testing/selftests/iommu/config Using .config as base Merging tools/testing/selftests/iommu/config ➜ make olddefconfig .config:5295:warning: unexpected data: CONFIG_IOMMUFD .config:5296:warning: unexpected data: CONFIG_IOMMUFD_TEST While at it, add CONFIG_FAULT_INJECTION as well which is needed for CONFIG_IOMMUFD_TEST. If CONFIG_FAULT_INJECTION isn't present in base config (such as x86 defconfig), CONFIG_IOMMUFD_TEST doesn't get enabled. Fixes: 57f0988706fe ("iommufd: Add a selftest") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Muhammad Usama Anjum <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2024-02-22drm/syncobj: handle NULL fence in syncobj_eventfd_entry_funcErik Kurzinger1-1/+12
During syncobj_eventfd_entry_func, dma_fence_chain_find_seqno may set the fence to NULL if the given seqno is signaled and a later seqno has already been submitted. In that case, the eventfd should be signaled immediately which currently does not happen. This is a similar issue to the one addressed by commit b19926d4f3a6 ("drm/syncobj: Deal with signalled fences in drm_syncobj_find_fence."). As a fix, if the return value of dma_fence_chain_find_seqno indicates success but it sets the fence to NULL, we will assign a stub fence to ensure the following code still signals the eventfd. v1 -> v2: assign a stub fence instead of signaling the eventfd Signed-off-by: Erik Kurzinger <[email protected]> Fixes: c7a472297169 ("drm/syncobj: add IOCTL to register an eventfd") Signed-off-by: Simon Ser <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-02-22iommu/arm-smmu-v3: Do not use GFP_KERNEL under as spinlockJason Gunthorpe1-26/+12
If the SMMU is configured to use a two level CD table then arm_smmu_write_ctx_desc() allocates a CD table leaf internally using GFP_KERNEL. Due to recent changes this is being done under a spinlock to iterate over the device list - thus it will trigger a sleeping while atomic warning: arm_smmu_sva_set_dev_pasid() mutex_lock(&sva_lock); __arm_smmu_sva_bind() arm_smmu_mmu_notifier_get() spin_lock_irqsave() arm_smmu_write_ctx_desc() arm_smmu_get_cd_ptr() arm_smmu_alloc_cd_leaf_table() dmam_alloc_coherent(GFP_KERNEL) This is a 64K high order allocation and really should not be done atomically. At the moment the rework of the SVA to follow the new API is half finished. Recently the CD table memory was moved from the domain to the master, however we have the confusing situation where the SVA code is wrongly using the RID domains device's list to track which CD tables the SVA is installed in. Remove the logic to replicate the CD across all the domain's masters during attach. We know which master and which CD table the PASID should be installed in. Right now SVA only works when dma-iommu.c is in control of the RID translation, which means we have a single iommu_domain shared across the entire group and that iommu_domain is not shared outside the group. Critically this means that the iommu_group->devices list and RID's smmu_domain->devices list describe the same set of masters. For PCI cases the core code also insists on singleton groups so there is only one entry in the smmu_domain->devices list that is equal to the master being passed in to arm_smmu_sva_set_dev_pasid(). Only non-PCI cases may have multi-device groups. However, the core code will repeat the calls to arm_smmu_sva_set_dev_pasid() across the entire iommu_group->devices list. Instead of having arm_smmu_mmu_notifier_get() indirectly loop over all the devices in the group via the RID's smmu_domain, rely on __arm_smmu_sva_bind() to be called for each device in the group and install the repeated CD entry that way. This avoids taking the spinlock to access the devices list and permits the arm_smmu_write_ctx_desc() to use a sleeping allocation. Leave the arm_smmu_mm_release() as a confusing situation, this requires tracking attached masters inside the SVA domain. Removing the loop allows arm_smmu_write_ctx_desc() to be called outside the spinlock and thus is safe to use GFP_KERNEL. Move the clearing of the CD into arm_smmu_sva_remove_dev_pasid() so that arm_smmu_mmu_notifier_get/put() remain paired functions. Fixes: 24503148c545 ("iommu/arm-smmu-v3: Refactor write_ctx_desc") Reported-by: Dan Carpenter <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jason Gunthorpe <[email protected]> Reviewed-by: Michael Shavit <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2024-02-22Merge branch 'mctp-core-protocol-updates-minor-fixes-tests'Paolo Abeni9-55/+630
Jeremy Kerr says: ==================== MCTP core protocol updates, minor fixes & tests This series implements some procotol improvements for AF_MCTP, particularly for systems with multiple MCTP networks defined. For those, we need to add the network ID to the tag lookups, which then suggests an updated version of the tag allocate / drop ioctl to allow the net ID to be specified there too. The ioctl change affects uabi, so might warrant some extra attention. There are also a couple of new kunit tests for multiple-net configurations. We have a fix for populating the flow data when fragmenting, and a testcase for that too. Of course, any queries/comments/etc., please let me know! ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-02-22net: mctp: tests: Add a test for proper tag creation on local outputJeremy Kerr1-0/+75
Ensure we have the correct key parameters on sending a message. Signed-off-by: Jeremy Kerr <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-02-22net: mctp: tests: Test that outgoing skbs have flow data populatedJeremy Kerr3-0/+138
When CONFIG_MCTP_FLOWS is enabled, outgoing skbs should have their SKB_EXT_MCTP extension set for drivers to consume. Add two tests for local-to-output routing that check for the flow extensions: one for the simple single-packet case, and one for fragmentation. We now make MCTP_TEST select MCTP_FLOWS, so we always get coverage of these flow tests. The tests are skippable if MCTP_FLOWS is (otherwise) disabled, but that would need manual config tweaking. Signed-off-by: Jeremy Kerr <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-02-22net: mctp: copy skb ext data when fragmentingJeremy Kerr2-0/+11
If we're fragmenting on local output, the original packet may contain ext data for the MCTP flows. We'll want this in the resulting fragment skbs too. So, do a skb_ext_copy() in the fragmentation path, and implement the MCTP-specific parts of an ext copy operation. Fixes: 67737c457281 ("mctp: Pass flow data & flow release events to drivers") Reported-by: Jian Zhang <[email protected]> Signed-off-by: Jeremy Kerr <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-02-22net: mctp: tests: Add MCTP net isolation testsJeremy Kerr1-0/+161
Add a couple of tests that excersise the new net-specific sk_key and bind lookups Signed-off-by: Jeremy Kerr <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-02-22net: mctp: tests: Add netid argument to __mctp_route_test_initJeremy Kerr1-4/+7
We'll want to create net-specific test setups in an upcoming change, so allow the caller to provide a non-default netid. Signed-off-by: Jeremy Kerr <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-02-22net: mctp: provide a more specific tag allocation ioctlJeremy Kerr2-20/+129
Now that we have net-specific tags, extend the tag allocation ioctls (SIOCMCTPALLOCTAG / SIOCMCTPDROPTAG) to allow a network parameter to be passed to the tag allocation. We also add a local_addr member to the ioc struct, to allow for a future finer-grained tag allocation using local EIDs too. We don't add any specific support for that now though, so require MCTP_ADDR_ANY or MCTP_ADDR_NULL for those at present. The old ioctls will still work, but allocate for the default MCTP net. These are now marked as deprecated in the header. Signed-off-by: Jeremy Kerr <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-02-22net: mctp: separate key correlation across netsJeremy Kerr4-16/+40
Currently, we lookup sk_keys from the entire struct net_namespace, which may contain multiple MCTP net IDs. In those cases we want to distinguish between endpoints with the same EID but different net ID. Add the net ID data to the struct mctp_sk_key, populate on add and filter on this during route lookup. For the ioctl interface, we use a default net of MCTP_INITIAL_DEFAULT_NET (ie., what will be in use for single-net configurations), but we'll extend the ioctl interface to provide net-specific tag allocation in an upcoming change. Signed-off-by: Jeremy Kerr <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-02-22net: mctp: tests: create test skbs with the correct net and deviceJeremy Kerr2-8/+17
In our test skb creation functions, we're not setting up the net and device data. This doesn't matter at the moment, but we will want to add support for distinct net IDs in future. Set the ->net identifier on the test MCTP device, and ensure that test skbs are set up with the correct device-related data on creation. Create a helper for setting skb->dev and mctp_skb_cb->net. We have a few cases where we're calling __mctp_cb() to initialise the cb (which we need for the above) separately, so integrate this into the skb creation helpers. Signed-off-by: Jeremy Kerr <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-02-22net: mctp: make key lookups match the ANY address on either local or peerJeremy Kerr1-3/+11
We may have an ANY address in either the local or peer address of a sk_key, and may want to match on an incoming daddr or saddr being ANY. Do this by altering the conflicting-tag lookup to also accept ANY as the local/peer address. We don't want mctp_address_matches to match on the requested EID being ANY, as that is a specific lookup case on packet input. Reported-by: Eric Chuang <[email protected]> Reported-by: Anthony <[email protected]> Signed-off-by: Jeremy Kerr <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-02-22net: mctp: Add some detail on the key allocation implementationJeremy Kerr1-0/+37
We could do with a little more comment on where MCTP_ADDR_ANY will match in the key allocations. Signed-off-by: Jeremy Kerr <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-02-22net: mctp: avoid confusion over local/peer dest/source addressesJeremy Kerr3-11/+11
We have a double-swap of local and peer addresses in mctp_alloc_local_tag; the arguments in both call sites are swapped, but there is also a swap in the implementation of alloc_local_tag. This is opaque because we're using source/dest address references, which don't match the local/peer semantics. Avoid this confusion by naming the arguments as 'local' and 'peer', and remove the double swap. The calling order now matches mctp_key_alloc. Signed-off-by: Jeremy Kerr <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-02-22spi: cadence-qspi: add system-wide suspend and resume callbacksThéo Lebrun1-2/+18
Each SPI controller is expected to call the spi_controller_suspend() and spi_controller_resume() callbacks at system-wide suspend and resume. It (1) handles the kthread worker for queued controllers and (2) marks the controller as suspended to have spi_sync() fail while the controller is unavailable. Those two operations do not require the controller to be active, we do not need to increment the runtime PM usage counter. Signed-off-by: Théo Lebrun <[email protected]> Link: https://msgid.link/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2024-02-22spi: cadence-qspi: put runtime in runtime PM hooks namesThéo Lebrun1-4/+4
Follow kernel naming convention with regards to power-management callback function names. The convention in the kernel is: - prefix_suspend means the system-wide suspend callback; - prefix_runtime_suspend means the runtime PM suspend callback. The same applies to resume callbacks. Signed-off-by: Théo Lebrun <[email protected]> Reviewed-by: Dhruva Gole <[email protected]> Link: https://msgid.link/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2024-02-22spi: cadence-qspi: remove system-wide suspend helper calls from runtime PM hooksThéo Lebrun1-7/+2
The ->runtime_suspend() and ->runtime_resume() callbacks are not expected to call spi_controller_suspend() and spi_controller_resume(). Remove calls to those in the cadence-qspi driver. Those helpers have two roles currently: - They stop/start the queue, including dealing with the kworker. - They toggle the SPI controller SPI_CONTROLLER_SUSPENDED flag. It requires acquiring ctlr->bus_lock_mutex. Step one is irrelevant because cadence-qspi is not queued. Step two however has two implications: - A deadlock occurs, because ->runtime_resume() is called in a context where the lock is already taken (in the ->exec_op() callback, where the usage count is incremented). - It would disallow all operations once the device is auto-suspended. Here is a brief call tree highlighting the mutex deadlock: spi_mem_exec_op() ... spi_mem_access_start() mutex_lock(&ctlr->bus_lock_mutex) cqspi_exec_mem_op() pm_runtime_resume_and_get() cqspi_resume() spi_controller_resume() mutex_lock(&ctlr->bus_lock_mutex) ... spi_mem_access_end() mutex_unlock(&ctlr->bus_lock_mutex) ... Fixes: 0578a6dbfe75 ("spi: spi-cadence-quadspi: add runtime pm support") Signed-off-by: Théo Lebrun <[email protected]> Link: https://msgid.link/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2024-02-22spi: cadence-qspi: fix pointer reference in runtime PM hooksThéo Lebrun1-4/+2
dev_get_drvdata() gets used to acquire the pointer to cqspi and the SPI controller. Neither embed the other; this lead to memory corruption. On a given platform (Mobileye EyeQ5) the memory corruption is hidden inside cqspi->f_pdata. Also, this uninitialised memory is used as a mutex (ctlr->bus_lock_mutex) by spi_controller_suspend(). Fixes: 2087e85bb66e ("spi: cadence-quadspi: fix suspend-resume implementations") Reviewed-by: Dhruva Gole <[email protected]> Signed-off-by: Théo Lebrun <[email protected]> Link: https://msgid.link/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2024-02-22drm/syncobj: call drm_syncobj_fence_add_wait when WAIT_AVAILABLE flag is setErik Kurzinger1-2/+4
When waiting for a syncobj timeline point whose fence has not yet been submitted with the WAIT_FOR_SUBMIT flag, a callback is registered using drm_syncobj_fence_add_wait and the thread is put to sleep until the timeout expires. If the fence is submitted before then, drm_syncobj_add_point will wake up the sleeping thread immediately which will proceed to wait for the fence to be signaled. However, if the WAIT_AVAILABLE flag is used instead, drm_syncobj_fence_add_wait won't get called, meaning the waiting thread will always sleep for the full timeout duration, even if the fence gets submitted earlier. If it turns out that the fence *has* been submitted by the time it eventually wakes up, it will still indicate to userspace that the wait completed successfully (it won't return -ETIME), but it will have taken much longer than it should have. To fix this, we must call drm_syncobj_fence_add_wait if *either* the WAIT_FOR_SUBMIT flag or the WAIT_AVAILABLE flag is set. The only difference being that with WAIT_FOR_SUBMIT we will also wait for the fence to be signaled after it has been submitted while with WAIT_AVAILABLE we will return immediately. IGT test patch: https://lists.freedesktop.org/archives/igt-dev/2024-January/067537.html v1 -> v2: adjust lockdep_assert_none_held_once condition (cherry picked from commit 8c44ea81634a4a337df70a32621a5f3791be23df) Fixes: 01d6c3578379 ("drm/syncobj: add support for timeline point wait v8") Signed-off-by: Erik Kurzinger <[email protected]> Signed-off-by: Simon Ser <[email protected]> Reviewed-by: Daniel Vetter <[email protected]> Reviewed-by: Simon Ser <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-02-22btrfs: fix data race at btrfs_use_block_rsv() when accessing block reserveFilipe Manana2-1/+17
At btrfs_use_block_rsv() we read the size of a block reserve without locking its spinlock, which makes KCSAN complain because the size of a block reserve is always updated while holding its spinlock. The report from KCSAN is the following: [653.313148] BUG: KCSAN: data-race in btrfs_update_delayed_refs_rsv [btrfs] / btrfs_use_block_rsv [btrfs] [653.314755] read to 0x000000017f5871b8 of 8 bytes by task 7519 on cpu 0: [653.314779] btrfs_use_block_rsv+0xe4/0x2f8 [btrfs] [653.315606] btrfs_alloc_tree_block+0xdc/0x998 [btrfs] [653.316421] btrfs_force_cow_block+0x220/0xe38 [btrfs] [653.317242] btrfs_cow_block+0x1ac/0x568 [btrfs] [653.318060] btrfs_search_slot+0xda2/0x19b8 [btrfs] [653.318879] btrfs_del_csums+0x1dc/0x798 [btrfs] [653.319702] __btrfs_free_extent.isra.0+0xc24/0x2028 [btrfs] [653.320538] __btrfs_run_delayed_refs+0xd3c/0x2390 [btrfs] [653.321340] btrfs_run_delayed_refs+0xae/0x290 [btrfs] [653.322140] flush_space+0x5e4/0x718 [btrfs] [653.322958] btrfs_preempt_reclaim_metadata_space+0x102/0x2f8 [btrfs] [653.323781] process_one_work+0x3b6/0x838 [653.323800] worker_thread+0x75e/0xb10 [653.323817] kthread+0x21a/0x230 [653.323836] __ret_from_fork+0x6c/0xb8 [653.323855] ret_from_fork+0xa/0x30 [653.323887] write to 0x000000017f5871b8 of 8 bytes by task 576 on cpu 3: [653.323906] btrfs_update_delayed_refs_rsv+0x1a4/0x250 [btrfs] [653.324699] btrfs_add_delayed_data_ref+0x468/0x6d8 [btrfs] [653.325494] btrfs_free_extent+0x76/0x120 [btrfs] [653.326280] __btrfs_mod_ref+0x6a8/0x6b8 [btrfs] [653.327064] btrfs_dec_ref+0x50/0x70 [btrfs] [653.327849] walk_up_proc+0x236/0xa50 [btrfs] [653.328633] walk_up_tree+0x21c/0x448 [btrfs] [653.329418] btrfs_drop_snapshot+0x802/0x1328 [btrfs] [653.330205] btrfs_clean_one_deleted_snapshot+0x184/0x238 [btrfs] [653.330995] cleaner_kthread+0x2b0/0x2f0 [btrfs] [653.331781] kthread+0x21a/0x230 [653.331800] __ret_from_fork+0x6c/0xb8 [653.331818] ret_from_fork+0xa/0x30 So add a helper to get the size of a block reserve while holding the lock. Reading the field while holding the lock instead of using the data_race() annotation is used in order to prevent load tearing. Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-02-22btrfs: fix data races when accessing the reserved amount of block reservesFilipe Manana2-13/+29
At space_info.c we have several places where we access the ->reserved field of a block reserve without taking the block reserve's spinlock first, which makes KCSAN warn about a data race since that field is always updated while holding the spinlock. The reports from KCSAN are like the following: [117.193526] BUG: KCSAN: data-race in btrfs_block_rsv_release [btrfs] / need_preemptive_reclaim [btrfs] [117.195148] read to 0x000000017f587190 of 8 bytes by task 6303 on cpu 3: [117.195172] need_preemptive_reclaim+0x222/0x2f0 [btrfs] [117.195992] __reserve_bytes+0xbb0/0xdc8 [btrfs] [117.196807] btrfs_reserve_metadata_bytes+0x4c/0x120 [btrfs] [117.197620] btrfs_block_rsv_add+0x78/0xa8 [btrfs] [117.198434] btrfs_delayed_update_inode+0x154/0x368 [btrfs] [117.199300] btrfs_update_inode+0x108/0x1c8 [btrfs] [117.200122] btrfs_dirty_inode+0xb4/0x140 [btrfs] [117.200937] btrfs_update_time+0x8c/0xb0 [btrfs] [117.201754] touch_atime+0x16c/0x1e0 [117.201789] filemap_read+0x674/0x728 [117.201823] btrfs_file_read_iter+0xf8/0x410 [btrfs] [117.202653] vfs_read+0x2b6/0x498 [117.203454] ksys_read+0xa2/0x150 [117.203473] __s390x_sys_read+0x68/0x88 [117.203495] do_syscall+0x1c6/0x210 [117.203517] __do_syscall+0xc8/0xf0 [117.203539] system_call+0x70/0x98 [117.203579] write to 0x000000017f587190 of 8 bytes by task 11 on cpu 0: [117.203604] btrfs_block_rsv_release+0x2e8/0x578 [btrfs] [117.204432] btrfs_delayed_inode_release_metadata+0x7c/0x1d0 [btrfs] [117.205259] __btrfs_update_delayed_inode+0x37c/0x5e0 [btrfs] [117.206093] btrfs_async_run_delayed_root+0x356/0x498 [btrfs] [117.206917] btrfs_work_helper+0x160/0x7a0 [btrfs] [117.207738] process_one_work+0x3b6/0x838 [117.207768] worker_thread+0x75e/0xb10 [117.207797] kthread+0x21a/0x230 [117.207830] __ret_from_fork+0x6c/0xb8 [117.207861] ret_from_fork+0xa/0x30 So add a helper to get the reserved amount of a block reserve while holding the lock. The value may be not be up to date anymore when used by need_preemptive_reclaim() and btrfs_preempt_reclaim_metadata_space(), but that's ok since the worst it can do is cause more reclaim work do be done sooner rather than later. Reading the field while holding the lock instead of using the data_race() annotation is used in order to prevent load tearing. Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-02-22btrfs: send: don't issue unnecessary zero writes for trailing holeFilipe Manana1-4/+13
If we have a sparse file with a trailing hole (from the last extent's end to i_size) and then create an extent in the file that ends before the file's i_size, then when doing an incremental send we will issue a write full of zeroes for the range that starts immediately after the new extent ends up to i_size. While this isn't incorrect because the file ends up with exactly the same data, it unnecessarily results in using extra space at the destination with one or more extents full of zeroes instead of having a hole. In same cases this results in using megabytes or even gigabytes of unnecessary space. Example, reproducer: $ cat test.sh #!/bin/bash DEV=/dev/sdh MNT=/mnt/sdh mkfs.btrfs -f $DEV mount $DEV $MNT # Create 1G sparse file. xfs_io -f -c "truncate 1G" $MNT/foobar # Create base snapshot. btrfs subvolume snapshot -r $MNT $MNT/mysnap1 # Create send stream (full send) for the base snapshot. btrfs send -f /tmp/1.snap $MNT/mysnap1 # Now write one extent at the beginning of the file and one somewhere # in the middle, leaving a gap between the end of this second extent # and the file's size. xfs_io -c "pwrite -S 0xab 0 128K" \ -c "pwrite -S 0xcd 512M 128K" \ $MNT/foobar # Now create a second snapshot which is going to be used for an # incremental send operation. btrfs subvolume snapshot -r $MNT $MNT/mysnap2 # Create send stream (incremental send) for the second snapshot. btrfs send -p $MNT/mysnap1 -f /tmp/2.snap $MNT/mysnap2 # Now recreate the filesystem by receiving both send streams and # verify we get the same content that the original filesystem had # and file foobar has only two extents with a size of 128K each. umount $MNT mkfs.btrfs -f $DEV mount $DEV $MNT btrfs receive -f /tmp/1.snap $MNT btrfs receive -f /tmp/2.snap $MNT echo -e "\nFile fiemap in the second snapshot:" # Should have: # # 128K extent at file range [0, 128K[ # hole at file range [128K, 512M[ # 128K extent file range [512M, 512M + 128K[ # hole at file range [512M + 128K, 1G[ xfs_io -r -c "fiemap -v" $MNT/mysnap2/foobar # File should be using 256K of data (two 128K extents). echo -e "\nSpace used by the file: $(du -h $MNT/mysnap2/foobar | cut -f 1)" umount $MNT Running the test, we can see with fiemap that we get an extent for the range [512M, 1G[, while in the source filesystem we have an extent for the range [512M, 512M + 128K[ and a hole for the rest of the file (the range [512M + 128K, 1G[): $ ./test.sh (...) File fiemap in the second snapshot: /mnt/sdh/mysnap2/foobar: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..255]: 26624..26879 256 0x0 1: [256..1048575]: hole 1048320 2: [1048576..2097151]: 2156544..3205119 1048576 0x1 Space used by the file: 513M This happens because once we finish processing an inode, at finish_inode_if_needed(), we always issue a hole (write operations full of zeros) if there's a gap between the end of the last processed extent and the file's size, even if that range is already a hole in the parent snapshot. Fix this by issuing the hole only if the range is not already a hole. After this change, running the test above, we get the expected layout: $ ./test.sh (...) File fiemap in the second snapshot: /mnt/sdh/mysnap2/foobar: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..255]: 26624..26879 256 0x0 1: [256..1048575]: hole 1048320 2: [1048576..1048831]: 26880..27135 256 0x1 3: [1048832..2097151]: hole 1048320 Space used by the file: 256K A test case for fstests will follow soon. CC: [email protected] # 6.1+ Reported-by: Dorai Ashok S A <[email protected]> Link: https://lore.kernel.org/linux-btrfs/[email protected]/ Reviewed-by: Sweet Tea Dorminy <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-02-22btrfs: dev-replace: properly validate device namesDavid Sterba1-4/+20
There's a syzbot report that device name buffers passed to device replace are not properly checked for string termination which could lead to a read out of bounds in getname_kernel(). Add a helper that validates both source and target device name buffers. For devid as the source initialize the buffer to empty string in case something tries to read it later. This was originally analyzed and fixed in a different way by Edward Adam Davis (see links). Link: https://lore.kernel.org/linux-btrfs/[email protected]/ Link: https://lore.kernel.org/linux-btrfs/[email protected]/ CC: [email protected] # 4.19+ CC: Edward Adam Davis <[email protected]> Reported-and-tested-by: [email protected] Reviewed-by: Boris Burkov <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-02-22btrfs: zoned: don't skip block group profile checks on conventional zonesJohannes Thumshirn1-0/+9
On a zoned filesystem with conventional zones, we're skipping the block group profile checks for the conventional zones. This allows converting a zoned filesystem's data block groups to RAID when all of the zones backing the chunk are on conventional zones. But this will lead to problems, once we're trying to allocate chunks backed by sequential zones. So also check for conventional zones when loading a block group's profile on them. Reported-by: HAN Yuwei <[email protected]> Link: https://lore.kernel.org/all/[email protected]/#t Reviewed-by: Boris Burkov <[email protected]> Reviewed-by: Naohiro Aota <[email protected]> Signed-off-by: Johannes Thumshirn <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-02-22Merge tag 'ath-next-20240222' of ↵Kalle Valo41-412/+2335
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath ath.git patches for v6.9 We have support for QCA2066 now and also several new features in ath12k. Major changes: ath12k * firmware-2.bin support * support having multiple identical PCI devices (firmware needs to have ATH12K_FW_FEATURE_MULTI_QRTR_ID) * QCN9274: support split-PHY devices * WCN7850: enable Power Save Mode in station mode * WCN7850: P2P support ath11k: * QCA6390 & WCN6855: support 2 concurrent station interfaces * QCA2066 support
2024-02-22drm/ttm: Fix an invalid freeing on already freed page in error pathThomas Hellström1-1/+1
If caching mode change fails due to, for example, OOM we free the allocated pages in a two-step process. First the pages for which the caching change has already succeeded. Secondly the pages for which a caching change did not succeed. However the second step was incorrectly freeing the pages already freed in the first step. Fix. Signed-off-by: Thomas Hellström <[email protected]> Fixes: 379989e7cbdc ("drm/ttm/pool: Fix ttm_pool_alloc error path") Cc: Christian König <[email protected]> Cc: Dave Airlie <[email protected]> Cc: Christian Koenig <[email protected]> Cc: Huang Rui <[email protected]> Cc: [email protected] Cc: <[email protected]> # v6.4+ Reviewed-by: Matthew Auld <[email protected]> Reviewed-by: Christian König <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-02-22ARM: dts: renesas: rcar-gen2: Add missing #interrupt-cells to DA9063 nodesGeert Uytterhoeven8-0/+8
make dtbs_check W=2: arch/arm/boot/dts/renesas/r8a7790-lager.dts:444.11-458.5: Warning (interrupt_provider): /i2c-mux4/pmic@58: Missing '#interrupt-cells' in interrupt provider ... Fix this by adding the missing #interrupt-cells properties. Reported-by: Rob Herring <[email protected]> Signed-off-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Rob Herring <[email protected]> Link: https://lore.kernel.org/r/a351e503ea97fb1af68395843f513925ff1bdf26.1707922460.git.geert+renesas@glider.be
2024-02-22l2tp: pass correct message length to ip6_append_dataTom Parkin1-1/+1
l2tp_ip6_sendmsg needs to avoid accounting for the transport header twice when splicing more data into an already partially-occupied skbuff. To manage this, we check whether the skbuff contains data using skb_queue_empty when deciding how much data to append using ip6_append_data. However, the code which performed the calculation was incorrect: ulen = len + skb_queue_empty(&sk->sk_write_queue) ? transhdrlen : 0; ...due to C operator precedence, this ends up setting ulen to transhdrlen for messages with a non-zero length, which results in corrupted packets on the wire. Add parentheses to correct the calculation in line with the original intent. Fixes: 9d4c75800f61 ("ipv4, ipv6: Fix handling of transhdrlen in __ip{,6}_append_data()") Cc: David Howells <[email protected]> Cc: [email protected] Signed-off-by: Tom Parkin <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-02-22Revert "ACPI: EC: Use a spin lock without disabing interrupts"Rafael J. Wysocki1-46/+66
Commit eb9299beadbd ("ACPI: EC: Use a spin lock without disabing interrupts") introduced an unexpected user-visible change in behavior, which is a significant CPU load increase when the EC is in use. This most likely happens due to increased spinlock contention and so reducing this effect would require a major rework of the EC driver locking. There is no time for this in the current cycle, so revert commit eb9299beadbd. Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218511 Reported-by: Dieter Mummenschanz <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2024-02-22Merge tag 'nf-24-02-22' of ↵Paolo Abeni3-43/+57
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) If user requests to wake up a table and hook fails, restore the dormant flag from the error path, from Florian Westphal. 2) Reset dst after transferring it to the flow object, otherwise dst gets released twice from the error path. 3) Release dst in case the flowtable selects a direct xmit path, eg. transmission to bridge port. Otherwise, dst is memleaked. 4) Register basechain and flowtable hooks at the end of the command. Error path releases these datastructure without waiting for the rcu grace period. 5) Use kzalloc() to initialize struct nft_hook to fix a KMSAN report on access to hook type, also from Florian Westphal. netfilter pull request 24-02-22 * tag 'nf-24-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: use kzalloc for hook allocation netfilter: nf_tables: register hooks last when adding new chain/flowtable netfilter: nft_flow_offload: release dst in case direct xmit path is used netfilter: nft_flow_offload: reset dst in route object after setting up flow netfilter: nf_tables: set dormant flag on hook register failure ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-02-22Merge tag 'for-netdev' of ↵Paolo Abeni15-17/+217
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-02-22 The following pull-request contains BPF updates for your *net* tree. We've added 11 non-merge commits during the last 24 day(s) which contain a total of 15 files changed, 217 insertions(+), 17 deletions(-). The main changes are: 1) Fix a syzkaller-triggered oops when attempting to read the vsyscall page through bpf_probe_read_kernel and friends, from Hou Tao. 2) Fix a kernel panic due to uninitialized iter position pointer in bpf_iter_task, from Yafang Shao. 3) Fix a race between bpf_timer_cancel_and_free and bpf_timer_cancel, from Martin KaFai Lau. 4) Fix a xsk warning in skb_add_rx_frag() (under CONFIG_DEBUG_NET) due to incorrect truesize accounting, from Sebastian Andrzej Siewior. 5) Fix a NULL pointer dereference in sk_psock_verdict_data_ready, from Shigeru Yoshida. 6) Fix a resolve_btfids warning when bpf_cpumask symbol cannot be resolved, from Hari Bathini. bpf-for-netdev * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf, sockmap: Fix NULL pointer dereference in sk_psock_verdict_data_ready() selftests/bpf: Add negtive test cases for task iter bpf: Fix an issue due to uninitialized bpf_iter_task selftests/bpf: Test racing between bpf_timer_cancel_and_free and bpf_timer_cancel bpf: Fix racing between bpf_timer_cancel_and_free and bpf_timer_cancel selftest/bpf: Test the read of vsyscall page under x86-64 x86/mm: Disallow vsyscall page read for copy_from_kernel_nofault() x86/mm: Move is_vsyscall_vaddr() into asm/vsyscall.h bpf, scripts: Correct GPL license name xsk: Add truesize to skb_add_rx_frag(). bpf: Fix warning for bpf_cpumask in verifier ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-02-22net: phy: realtek: Fix rtl8211f_config_init() for RTL8211F(D)(I)-VD-CG PHYSiddharth Vadapalli1-1/+3
Commit bb726b753f75 ("net: phy: realtek: add support for RTL8211F(D)(I)-VD-CG") extended support of the driver from the existing support for RTL8211F(D)(I)-CG PHY to the newer RTL8211F(D)(I)-VD-CG PHY. While that commit indicated that the RTL8211F_PHYCR2 register is not supported by the "VD-CG" PHY model and therefore updated the corresponding section in rtl8211f_config_init() to be invoked conditionally, the call to "genphy_soft_reset()" was left as-is, when it should have also been invoked conditionally. This is because the call to "genphy_soft_reset()" was first introduced by the commit 0a4355c2b7f8 ("net: phy: realtek: add dt property to disable CLKOUT clock") since the RTL8211F guide indicates that a PHY reset should be issued after setting bits in the PHYCR2 register. As the PHYCR2 register is not applicable to the "VD-CG" PHY model, fix the rtl8211f_config_init() function by invoking "genphy_soft_reset()" conditionally based on the presence of the "PHYCR2" register. Fixes: bb726b753f75 ("net: phy: realtek: add support for RTL8211F(D)(I)-VD-CG") Signed-off-by: Siddharth Vadapalli <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-02-22Merge branch 'ioam6-fix-write-to-cloned-skb-s'Paolo Abeni3-67/+76
Justin Iurman says: ==================== ioam6: fix write to cloned skb's Make sure the IOAM data insertion is not applied on cloned skb's. As a consequence, ioam selftests needed a refactoring. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-02-22selftests: ioam: refactoring to align with the fixJustin Iurman2-67/+66
ioam6_parser uses a packet socket. After the fix to prevent writing to cloned skb's, the receiver does not see its IOAM data anymore, which makes input/forward ioam-selftests to fail. As a workaround, ioam6_parser now uses an IPv6 raw socket and leverages ancillary data to get hop-by-hop options. As a consequence, the hook is "after" the IOAM data insertion by the receiver and all tests are working again. Signed-off-by: Justin Iurman <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-02-22Fix write to cloned skb in ipv6_hop_ioam()Justin Iurman1-0/+10
ioam6_fill_trace_data() writes inside the skb payload without ensuring it's writeable (e.g., not cloned). This function is called both from the input and output path. The output path (ioam6_iptunnel) already does the check. This commit provides a fix for the input path, inside ipv6_hop_ioam(). It also updates ip6_parse_tlv() to refresh the network header pointer ("nh") when returning from ipv6_hop_ioam(). Fixes: 9ee11f0fff20 ("ipv6: ioam: Data plane support for Pre-allocated Trace") Reported-by: Paolo Abeni <[email protected]> Signed-off-by: Justin Iurman <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-02-22phonet/pep: fix racy skb_queue_empty() useRémi Denis-Courmont1-9/+32
The receive queues are protected by their respective spin-lock, not the socket lock. This could lead to skb_peek() unexpectedly returning NULL or a pointer to an already dequeued socket buffer. Fixes: 9641458d3ec4 ("Phonet: Pipe End Point for Phonet Pipes protocol") Signed-off-by: Rémi Denis-Courmont <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-02-22phonet: take correct lock to peek at the RX queueRémi Denis-Courmont1-2/+2
The receive queue is protected by its embedded spin-lock, not the socket lock, so we need the former lock here (and only that one). Fixes: 107d0d9b8d9a ("Phonet: Phonet datagram transport protocol") Reported-by: Luosili <[email protected]> Signed-off-by: Rémi Denis-Courmont <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-02-22erofs: fix refcount on the metabuf used for inode lookupSandeep Dhavale1-14/+14
In erofs_find_target_block() when erofs_dirnamecmp() returns 0, we do not assign the target metabuf. This causes the caller erofs_namei()'s erofs_put_metabuf() at the end to be not effective leaving the refcount on the page. As the page from metabuf (buf->page) is never put, such page cannot be migrated or reclaimed. Fix it now by putting the metabuf from previous loop and assigning the current metabuf to target before returning so caller erofs_namei() can do the final put as it was intended. Fixes: 500edd095648 ("erofs: use meta buffers for inode lookup") Cc: <[email protected]> # 5.18+ Signed-off-by: Sandeep Dhavale <[email protected]> Reviewed-by: Gao Xiang <[email protected]> Reviewed-by: Jingbo Xu <[email protected]> Reviewed-by: Chao Yu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Gao Xiang <[email protected]>
2024-02-21PPPoL2TP: Add more code snippetsSamuel Thibault1-6/+129
The existing documentation was not telling that one has to create a PPP channel and a PPP interface to get PPPoL2TP data offloading working. Also, tunnel switching was not mentioned, so that people were thinking it was not supported, while it actually is. Signed-off-by: Samuel Thibault <[email protected]> Acked-by: Tom Parkin <[email protected]> Link: https://lore.kernel.org/r/20240217211425.qj576u3jmaa6yidf@begin Signed-off-by: Jakub Kicinski <[email protected]>