Age | Commit message (Collapse) | Author | Files | Lines |
|
The objects which use cache handling should reference their own handler
object not the internal data structure it uses to track the nodes.
Have the "users" of the mmu notifier code pass opaque objects which can
then be properly used in the mmu callbacks depending on the owners needs.
This patch has the additional benefit that operations no longer require a
look up in a list to find the handlers.
Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: Dean Luick <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
The hfi1 driver registers a mmu_notifier callback when /dev/hfi1_* is
opened, and unregisters it when the device is closed. The driver
incorrectly assumes that the close will always happen from the same
context as the open. In particular, closes due to SIGKILL or OOM killer
activity may happen from a different context. In these cases, the wrong
mm is passed to mmu_notifier_unregister(), which causes improper reference
counting for the victim mm, and eventual memory corruption.
Preserve the mm for all open file descriptors and use this mm rather than
current->mm for memory operations for the lifetime of that fd. Note: this
patch leaves 1 use of current->mm in place. This use is removed in a
follow on patch because other functional changes were required prior to
that use being removed.
If registration fails, there is no reason to keep the handler object
around. Free the handler object rather than add it to the list to
prevent any mmu_notifier operations, including unregister, when
registration fails.
Suggested-by: Jim Foraker <[email protected]>
Reviewed-by: Dean Luick <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
The user SDMA in-use claim bit is in the structure that gets zeroed out
once the claim is made. Move the request in-use flag into its own bit
array and use that for atomic claims. This cleans up the claim code and
removes any race possibility.
Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: Dean Luick <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
If input validation fails, properly free the request before returning.
Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: Dean Luick <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
If unable to insert node into the RB tree cache, node will be freed
before returning from the function. Null out iovec's pointer to node
so iovec does not try to free it later.
Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: Dean Luick <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: Dean Luick <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: Dean Luick <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Save the current capability state at user context creation
time. Report this saved value for all shared contexts.
Also get rid of unnecessary hfi1_get_base_kinfo function.
Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: Dean Luick <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
If a context has not been assigned or assignment failed, pq may be NULL.
Move the unregister within the protection of the null check.
Reviewed-by: Dean Luick <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Clarify the names of the TID mmu functions.
Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: Dean Luick <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Checking if the rb tree is empty is redundant with the while loop which is
emptying the rb tree.
Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: Dean Luick <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Rearrange the file open call in prep for new changes.
Reviewed-by: Dean Luick <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: Dean Luick <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
For bool parameters "false" should be used
Reviewed-by: Dean Luick <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
subctxt is not used, just remove it.
Reviewed-by: Dean Luick <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
__mmu_rb_remove was called in only 1 place which was a very simple
call site. Combine this function into its caller.
Reviewed-by: Dean Luick <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Remove, insert, and invalidate are always provided. No
need to test.
Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: Dean Luick <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
This makes it more clear what these functions are
operating on.
Reviewed-by: Dean Luick <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Parameter names to function declarations make it more clear
what those parameters do.
Reviewed-by: Dean Luick <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: Dean Luick <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
These are no longer needed.
Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: Dean Luick <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Brackets should be on the next line of a function
Reviewed-by: Dean Luick <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Expand the serial number space by using more bits
from the GUID.
Reviewed-by: Jubin John <[email protected]>
Signed-off-by: Dean Luick <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
The driver pads non-double word multiple message sizes but it doesn't
account for this padding when the packet length is calculated. Also, the
data length is miscalculated for message sizes less than 4 bytes due to
the bit representation in LRH. And there's a check for non-double word
multiple message sizes that prevents these messages from being sent.
This patch fixes length miscalculations and enables the functionality to
send non-double word multiple message sizes.
Reviewed-by: Harish Chegondi <[email protected]>
Signed-off-by: Sebastian Sanchez <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
The use of the specific opcode test is redundant since
all ack entry users correctly manipulate the mr pointer
to selectively trigger the reference clearing.
The overly specific test hinders the use of implementation
specific operations.
The change needs to get rid of the union to insure that
an atomic value is not seen as an MR pointer.
Reviewed-by: Ashutosh Dixit <[email protected]>
Signed-off-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Checking the return value of the memory allocation call in
init_pervl_scs() was missed. Recently the kmalloc() was changed to
kzalloc() which identified the problem.
While fixing this issue 2 other bugs were noticed. First, the array
being allocated is accessed in the nomem path which can be reached before
it is allocated. Second, kernel_send_context was not released on error.
Fix both of these by creating a more common memory unwind label structure.
Fixes: 35f6befc8441 ("staging/rdma/hfi1: Add qp to send context mapping for PIO")
Reported-by: Leon Romanovsky <[email protected]>
Reviewed-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Rename the deferred bmap-free to extent_free and make them only
trigger when we're really running deferred ops.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
|
|
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
|
|
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
|
|
Nothing ever uses the extent array in the rmap update done redo
item, so remove it before it is fixed in the on-disk log format.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
|
|
We only need the temporary cursor in _btree_lshift if we're shifting
in an overlapped btree. Therefore, factor that into a single block
of code so we avoid unnecessary cursor duplication.
Also fix use of the wrong cursor when checking for corruption in
xfs_btree_rshift().
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
|
|
In the lshift/rshift functions we don't use the key variable for
anything now, so remove the variable and its initializer. The
update_keys functions figure out the key for a block on their own.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
|
|
These are internal btree functions; we don't need them to be
dispatched via function pointers. Make them static again and
just check the overlapped flag to figure out what we need to
do. The strategy behind this patch was suggested by Christoph.
Signed-off-by: Darrick J. Wong <[email protected]>
Suggested-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
|
|
Originally-From: Dave Chinner <[email protected]>
Add the feature flag to the supported matrix so that the kernel can
mount and use rmap btree enabled filesystems
Signed-off-by: Dave Chinner <[email protected]>
[[email protected]: move the experimental tag]
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
|
|
Allow a caller of xfs_alloc_fix_freelist to disable rmapbt updates
when fixing the AG freelist. xfs_repair needs this during phase 5
to be able to adjust the freelist while it's reconstructing the rmap
btree; the missing entries will be added back at the very end of
phase 5 once the AGFL contents settle down.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
|
|
Swapping extents between two inodes requires the owner to be updated
in the rmap tree for all the extents that are swapped. This code
does not yet exist, so switch off the XFS_IOC_SWAPEXT ioctl until
support has been implemented. This will need to be done before the
rmap btree code can have the experimental tag removed.
This functionality will be provided in a (much) later patch, using
some of the reflink deferred block remapping functionality to
accomlish extent swapping with rmap updates.
Signed-off-by: Dave Chinner <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
|
|
Originally-From: Dave Chinner <[email protected]>
So such blocks can be correctly identified and have their operations
structures attached to validate recovery has not resulted in a
correct block.
Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
|
|
Originally-From: Dave Chinner <[email protected]>
So xfs_info and other userspace utilities know the filesystem is
using this feature.
Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
|
|
When we map, unmap, or convert an extent in a file's data or attr
fork, schedule a respective update in the rmapbt. Previous versions
of this patch required a 1:1 correspondence between bmap and rmap,
but this is no longer true as we now have ability to make interval
queries against the rmapbt.
We use the deferred operations code to handle redo operations
atomically and deadlock free. This plumbs in all five rmap actions
(map, unmap, convert extent, alloc, free); we'll use the first three
now for file data, and reflink will want the last two. We also add
an error injection site to test log recovery.
Finally, we need to fix the bmap shift extent code to adjust the
rmaps correctly.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
|
|
Connect the xfs_defer mechanism with the pieces that we'll need to
handle deferred rmap updates. We'll wire up the existing code to
our new deferred mechanism later.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
|
|
Provide a mechanism for higher levels to create RUI/RUD items, submit
them to the log, and a stub function to deal with recovered RUI items.
These parts will be connected to the rmapbt in a later patch.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
|
|
Create rmap update intent/done log items to record redo information in
the log. Because we need to roll transactions between updating the
bmbt mapping and updating the reverse mapping, we also have to track
the status of the metadata updates that will be recorded in the
post-roll transactions, just in case we crash before committing the
final transaction. This mechanism enables log recovery to finish what
was already started.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
|
|
Add a couple of helper functions to encapsulate rmap btree insert and
delete operations. Add tracepoints to the update function.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
|
|
Provide a function to convert an unwritten rmap extent to a real one
and vice versa.
[ dchinner: Note that this algorithm and code was derived from the
existing bmapbt unwritten extent conversion code in
xfs_bmap_add_extent_unwritten_real(). ]
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
|
|
Originally-From: Dave Chinner <[email protected]>
Now that we have records in the rmap btree, we need to remove them
when extents are freed. This needs to find the relevant record in
the btree and remove/trim/split it accordingly.
[[email protected]: make rmap routines handle the enlarged keyspace]
[dchinner: remove remaining unused debug printks]
[darrick: fix a bug when growfs in an AG with an rmap ending at EOFS]
Signed-off-by: Dave Chinner <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
|
|
Originally-From: Dave Chinner <[email protected]>
Now all the btree, free space and transaction infrastructure is in
place, we can finally add the code to insert reverse mappings to the
rmap btree. Freeing will be done in a separate patch, so just the
addition operation can be focussed on here.
[darrick: handle owner offsets when adding rmaps]
[dchinner: remove remaining debug printk statements]
[darrick: move unwritten bit to rm_offset]
Signed-off-by: Dave Chinner <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
|
|
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
|
|
Now that the generic btree code supports querying all records within a
range of keys, use that functionality to allow us to ask for all the
extents mapped to a range of physical blocks.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
|
|
Now that the generic btree code supports overlapping intervals, plug
in the rmap btree to this functionality. We will need it to find
potential left neighbors in xfs_rmap_{alloc,free} later in the patch
set.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
|
|
Originally-From: Dave Chinner <[email protected]>
Implement the generic btree operations needed to manipulate rmap
btree blocks. This is very similar to the per-ag freespace btree
implementation, and uses the AGFL for allocation and freeing of
blocks.
Adapt the rmap btree to store owner offsets within each rmap record,
and to handle the primary key being redefined as the tuple
[agblk, owner, offset]. The expansion of the primary key is crucial
to allowing multiple owners per extent.
[darrick: adapt the btree ops to deal with offsets]
[darrick: remove init_rec_from_key]
[darrick: move unwritten bit to rm_offset]
Signed-off-by: Dave Chinner <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
|