Age | Commit message (Collapse) | Author | Files | Lines |
|
Pull VFIO updates from Alex Williamson:
- Block accesses to disabled MMIO space (Alex Williamson)
- VFIO device migration API (Kirti Wankhede)
- type1 IOMMU dirty bitmap API and implementation (Kirti Wankhede)
- PCI NULL capability masking (Alex Williamson)
- Memory leak fixes (Qian Cai)
- Reference leak fix (Qiushi Wu)
* tag 'vfio-v5.8-rc1' of git://github.com/awilliam/linux-vfio:
vfio iommu: typecast corrections
vfio iommu: Use shift operation for 64-bit integer division
vfio/mdev: Fix reference count leak in add_mdev_supported_type
vfio: Selective dirty page tracking if IOMMU backed device pins pages
vfio iommu: Add migration capability to report supported features
vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap
vfio iommu: Implementation of ioctl for dirty pages tracking
vfio iommu: Add ioctl definition for dirty pages tracking
vfio iommu: Cache pgsize_bitmap in struct vfio_iommu
vfio iommu: Remove atomicity of ref_count of pinned pages
vfio: UAPI for migration interface for device state
vfio/pci: fix memory leaks of eventfd ctx
vfio/pci: fix memory leaks in alloc_perm_bits()
vfio-pci: Mask cap zero
vfio-pci: Invalidate mmaps and block MMIO access on disabled memory
vfio-pci: Fault mmaps to enable vma tracking
vfio/type1: Support faulting PFNMAP vmas
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull READ_IMPLIES_EXEC changes from Borislav Petkov:
"Split the old READ_IMPLIES_EXEC workaround from executable
PT_GNU_STACK now that toolchains long support PT_GNU_STACK marking and
there's no need anymore to force modern programs into having all its
user mappings executable instead of only the stack and the PROT_EXEC
ones.
Disable that automatic READ_IMPLIES_EXEC forcing on x86-64 and
arm64.
Add tables documenting how READ_IMPLIES_EXEC is handled on x86-64, arm
and arm64.
By Kees Cook"
* tag 'core_core_updates_for_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
arm64/elf: Disable automatic READ_IMPLIES_EXEC for 64-bit address spaces
arm32/64/elf: Split READ_IMPLIES_EXEC from executable PT_GNU_STACK
arm32/64/elf: Add tables to document READ_IMPLIES_EXEC
x86/elf: Disable automatic READ_IMPLIES_EXEC on 64-bit
x86/elf: Split READ_IMPLIES_EXEC from executable PT_GNU_STACK
x86/elf: Add table to document READ_IMPLIES_EXEC
|
|
Use kfree(buf) in blocked_fl_read() because the memory is allocated with
kzalloc(). Use kfree(t) in blocked_fl_write() because the memory is
allocated with kcalloc().
Signed-off-by: Denis Efremov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This fixes a crash introduced by recent is_kdump_kernel() check.
The source of the crash is that kdump kernel can be loaded on a
system with already created VFs. But for such VFs, it will follow
a logic path of PF and eventually crash.
Thus, we are partially reverting back previous changes and instead
use is_kdump_kernel is a single init point of PF init, where we
disable SRIOV explicitly.
Fixes: 37d4f8a6b41f ("net: qed: Disable SRIOV functionality inside kdump kernel")
Cc: Bhupesh Sharma <[email protected]>
Signed-off-by: Igor Russkikh <[email protected]>
Signed-off-by: Alok Prasad <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Fix the following gcc-9.3 warning when building with 'make W=1':
net/vmw_vsock/vmci_transport.c:2058:6: warning: no previous prototype
for ‘vmci_vsock_transport_cb’ [-Wmissing-prototypes]
2058 | void vmci_vsock_transport_cb(bool is_host)
| ^~~~~~~~~~~~~~~~~~~~~~~
Fixes: b1bba80a4376 ("vsock/vmci: register vmci_transport only when VMCI guest/host are active")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Stefano Garzarella <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This has no code changes, but it's a typo noticed in other clean-ups,
so we might as well fix it. IS_ENABLED() takes full names, and should
have the "CONFIG_" prefix.
Reported-by: Joe Perches <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When converting the MSCC PHY driver to shared PHY packages, the Serdes
configuration in vsc8584_config_init was modified to use 'base_addr'
instead of 'base' as the port number. But 'base_addr' isn't equal to
'addr' for all PHYs inside the package, which leads to the Serdes still
being enabled on those ports. This patch fixes it.
Fixes: deb04e9c0ff2 ("net: phy: mscc: use phy_package_shared")
Signed-off-by: Antoine Tenart <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Dan Murphy says:
====================
Fixes for OF_MDIO flag
There are some residual drivers that check the CONFIG_OF_MDIO flag using the
if defs. Using this check does not work when the OF_MDIO is configured as a
module. Using the IS_ENABLED macro checks if the flag is declared as built-in
or as a module.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
When CONFIG_OF_MDIO is set to be a module the code block is not
compiled. Use the IS_ENABLED macro that checks for both built in as
well as module.
Fixes: 4f58e6dceb0e4 ("net: phy: Cleanup the Edge-Rate feature in Microsemi PHYs.")
Signed-off-by: Dan Murphy <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When CONFIG_OF_MDIO is set to be a module the code block is not
compiled. Use the IS_ENABLED macro that checks for both built in as
well as module.
Fixes: cf41a51db8985 ("of/phylib: Use device tree properties to initialize Marvell PHYs.")
Signed-off-by: Dan Murphy <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When CONFIG_OF_MDIO is set to be a module the code block is not
compiled. Use the IS_ENABLED macro that checks for both built in as
well as module.
Fixes: 2a10154abcb75 ("net: phy: dp83867: Add TI dp83867 phy")
Signed-off-by: Dan Murphy <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When CONFIG_OF_MDIO is set to be a module the code block is not
compiled. Use the IS_ENABLED macro that checks for both built in as
well as module.
Fixes: 01db923e83779 ("net: phy: dp83869: Add TI dp83869 phy")
Signed-off-by: Dan Murphy <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Commit ca23cb0bc50f ("mvneta: MVNETA_SKB_HEADROOM set last 3 bits to zero")
added headroom alignment check against 8.
Hovewer (if we imagine that NET_SKB_PAD or XDP_PACKET_HEADROOM is not
aligned to cacheline size), it actually aligns headroom down, while
skb/xdp_buff headroom should be *at least* equal to one of the values
(depending on XDP prog presence).
So, fix the check to align the value up. This satisfies both
hardware/driver and network stack requirements.
Fixes: ca23cb0bc50f ("mvneta: MVNETA_SKB_HEADROOM set last 3 bits to zero")
Signed-off-by: Alexander Lobakin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This code generates a Smatch warning:
net/ethtool/linkinfo.c:143 ethnl_set_linkinfo()
warn: variable dereferenced before check 'info' (see line 119)
Fortunately, the "info" pointer is never NULL so the check can be
removed.
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Michal Kubecek <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Support for userspace to send requests directly to the on-chip GZIP
accelerator on Power9.
- Rework of our lockless page table walking (__find_linux_pte()) to
make it safe against parallel page table manipulations without
relying on an IPI for serialisation.
- A series of fixes & enhancements to make our machine check handling
more robust.
- Lots of plumbing to add support for "prefixed" (64-bit) instructions
on Power10.
- Support for using huge pages for the linear mapping on 8xx (32-bit).
- Remove obsolete Xilinx PPC405/PPC440 support, and an associated sound
driver.
- Removal of some obsolete 40x platforms and associated cruft.
- Initial support for booting on Power10.
- Lots of other small features, cleanups & fixes.
Thanks to: Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan,
Andrey Abramov, Aneesh Kumar K.V, Balamuruhan S, Bharata B Rao, Bulent
Abali, Cédric Le Goater, Chen Zhou, Christian Zigotzky, Christophe
JAILLET, Christophe Leroy, Dmitry Torokhov, Emmanuel Nicolet, Erhard F.,
Gautham R. Shenoy, Geoff Levand, George Spelvin, Greg Kurz, Gustavo A.
R. Silva, Gustavo Walbon, Haren Myneni, Hari Bathini, Joel Stanley,
Jordan Niethe, Kajol Jain, Kees Cook, Leonardo Bras, Madhavan
Srinivasan., Mahesh Salgaonkar, Markus Elfring, Michael Neuling, Michal
Simek, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin,
Oliver O'Halloran, Paul Mackerras, Pingfan Liu, Qian Cai, Ram Pai,
Raphael Moreira Zinsly, Ravi Bangoria, Sam Bobroff, Sandipan Das, Segher
Boessenkool, Stephen Rothwell, Sukadev Bhattiprolu, Tyrel Datwyler,
Wolfram Sang, Xiongfeng Wang.
* tag 'powerpc-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (299 commits)
powerpc/pseries: Make vio and ibmebus initcalls pseries specific
cxl: Remove dead Kconfig options
powerpc: Add POWER10 architected mode
powerpc/dt_cpu_ftrs: Add MMA feature
powerpc/dt_cpu_ftrs: Enable Prefixed Instructions
powerpc/dt_cpu_ftrs: Advertise support for ISA v3.1 if selected
powerpc: Add support for ISA v3.1
powerpc: Add new HWCAP bits
powerpc/64s: Don't set FSCR bits in INIT_THREAD
powerpc/64s: Save FSCR to init_task.thread.fscr after feature init
powerpc/64s: Don't let DT CPU features set FSCR_DSCR
powerpc/64s: Don't init FSCR_DSCR in __init_FSCR()
powerpc/32s: Fix another build failure with CONFIG_PPC_KUAP_DEBUG
powerpc/module_64: Use special stub for _mcount() with -mprofile-kernel
powerpc/module_64: Simplify check for -mprofile-kernel ftrace relocations
powerpc/module_64: Consolidate ftrace code
powerpc/32: Disable KASAN with pages bigger than 16k
powerpc/uaccess: Don't set KUEP by default on book3s/32
powerpc/uaccess: Don't set KUAP by default on book3s/32
powerpc/8xx: Reduce time spent in allow_user_access() and friends
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Pull module updates from Jessica Yu:
- Harden CONFIG_STRICT_MODULE_RWX by rejecting any module that has
SHF_WRITE|SHF_EXECINSTR sections
- Remove and clean up nested #ifdefs, as it makes code hard to read
* tag 'modules-for-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
module: Harden STRICT_MODULE_RWX
module: break nested ARCH_HAS_STRICT_MODULE_RWX and STRICT_MODULE_RWX #ifdefs
|
|
|
|
Before this patch, transactions could be merged into the system
transaction by function gfs2_merge_trans(), but the transaction ail
lists were never merged. Because the ail flushing mechanism can run
separately, bd elements can be attached to the transaction's buffer
list during the transaction (trans_add_meta, etc) but quickly moved
to its ail lists. Later, in function gfs2_trans_end, the transaction
can be freed (by gfs2_trans_end) while it still has bd elements
queued to its ail lists, which can cause it to either lose track of
the bd elements altogether (memory leak) or worse, reference the bd
elements after the parent transaction has been freed.
Although I've not seen any serious consequences, the problem becomes
apparent with the previous patch's addition of:
gfs2_assert_warn(sdp, list_empty(&tr->tr_ail1_list));
to function gfs2_trans_free().
This patch adds logic into gfs2_merge_trans() to move the merged
transaction's ail lists to the sdp transaction. This prevents the
use-after-free. To do this properly, we need to hold the ail lock,
so we pass sdp into the function instead of the transaction itself.
Signed-off-by: Bob Peterson <[email protected]>
Signed-off-by: Andreas Gruenbacher <[email protected]>
|
|
This patch adds a new slab for gfs2 transactions. That allows us to
reduce kernel memory fragmentation, have better organization of data
for analysis of vmcore dumps. A new centralized function is added to
free the slab objects, and it exposes use-after-free by giving
warnings if a transaction is freed while it still has bd elements
attached to its buffers or ail lists. We make sure to initialize
those transaction ail lists so we can check their integrity when freeing.
At a later time, we should add a slab initialization function to
make it more efficient, but for this initial patch I wanted to
minimize the impact.
Signed-off-by: Bob Peterson <[email protected]>
Signed-off-by: Andreas Gruenbacher <[email protected]>
|
|
Since transactions may be freed shortly after they're created, before
a log_flush occurs, we need to initialize their ail1 and ail2 lists
earlier. Before this patch, the ail1 list was initialized in gfs2_log_flush().
This moves the initialization to the point when the transaction is first
created.
Signed-off-by: Bob Peterson <[email protected]>
Signed-off-by: Andreas Gruenbacher <[email protected]>
|
|
queue_limits::logical_block_size got changed from unsigned short to
unsigned int, but it was forgotten to update crypt_io_hints() to use the
new type. Fix it.
Fixes: ad6bf88a6c19 ("block: fix an integer overflow in logical block size")
Cc: [email protected]
Signed-off-by: Eric Biggers <[email protected]>
Reviewed-by: Mikulas Patocka <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
When there are many DM multipath devices it really helps to have
additional context for which DM device a failed or reinstated path is
part of.
Signed-off-by: Mike Snitzer <[email protected]>
|
|
Add more DMDEBUG that shows arguments passed and caller, and another
that shows state of related flags at end of queue_if_no_path().
Also add queue_if_no_path DMDEBUG to multipath_resume().
Signed-off-by: Mike Snitzer <[email protected]>
|
|
Do not allow saving disabled queue_if_no_path if already saved as
enabled; implies multiple suspends (which shouldn't ever happen). Log
if this unlikely scenario is ever triggered.
Also, only write MPATHF_SAVED_QUEUE_IF_NO_PATH during presuspend or if
"fail_if_no_path" message. MPATHF_SAVED_QUEUE_IF_NO_PATH is no longer
always modified, e.g.: even if queue_if_no_path()'s save_old_value
argument wasn't set. This just implies a bit tighter control over
the management of MPATHF_SAVED_QUEUE_IF_NO_PATH. Side-effect is
multipath_resume() doesn't reset MPATHF_QUEUE_IF_NO_PATH unless
MPATHF_SAVED_QUEUE_IF_NO_PATH was set (during presuspend); and at that
time the MPATHF_SAVED_QUEUE_IF_NO_PATH bit gets cleared. So
MPATHF_SAVED_QUEUE_IF_NO_PATH's use is much more narrow in scope.
Last, but not least, do _not_ disable queue_if_no_path during noflush
suspend. There is no need/benefit to saving off queue_if_no_path via
MPATHF_SAVED_QUEUE_IF_NO_PATH and clearing MPATHF_QUEUE_IF_NO_PATH for
noflush suspend -- by avoiding this needless queue_if_no_path flag
churn there is less potential for MPATHF_QUEUE_IF_NO_PATH to get lost.
Which avoids potential for IOs to be errored back up to userspace
during DM multipath's handling of path failures.
That said, this last change papers over a reported issue concerning
request-based dm-multipath's interaction with blk-mq, relative to
suspend and resume: multipath_endio is being called _before_
multipath_resume. This should never happen if DM suspend's
blk_mq_quiesce_queue() + dm_wait_for_completion() is genuinely waiting
for all inflight blk-mq requests to complete. Similarly:
drivers/md/dm.c:__dm_resume() clearly calls dm_table_resume_targets()
_before_ dm_start_queue()'s blk_mq_unquiesce_queue() is called. If
the queue isn't even restarted until after multipath_resume(); the BIG
question that still needs answering is: how can multipath_end_io beat
multipath_resume in a race!?
Signed-off-by: Mike Snitzer <[email protected]>
|
|
Remove micro-optimization that infers device is between presuspend and
resume (was done purely to avoid call to dm_noflush_suspending, which
isn't expensive anyway).
Remove flags argument since they are no longer checked.
And remove must_push_back_bio() since it was simply a call to
__must_push_back().
Signed-off-by: Mike Snitzer <[email protected]>
|
|
When specifying several devices the superblock location must be
checked to ensure the devices are specified in the correct order.
Signed-off-by: Hannes Reinecke <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
Prefer full zones when selecting the next zone for reclaim.
Signed-off-by: Hannes Reinecke <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
per-device reclaim should select zones on that device only.
Signed-off-by: Hannes Reinecke <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
When allocating a zone, pass in an indicator on which device the zone
should be allocated; this increases performance for a multi-device
setup because reclaim will now allocate zones on the device for which
reclaim is running.
Signed-off-by: Hannes Reinecke <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
Remove the hard-coded limit of two devices and support an unlimited
number of additional zoned devices.
Signed-off-by: Hannes Reinecke <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
Random and sequential zones should be part of the respective
device structure to make arbitration between devices possible.
Signed-off-by: Hannes Reinecke <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
Instead of having one reclaim workqueue for the entire set we should
be allocating a reclaim workqueue per device; doing so will reduce
contention and should boost performance for a multi-device setup.
Signed-off-by: Hannes Reinecke <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
Add a metadata pointer within struct dmz_dev and use it as argument
for blkdev_report_zones() instead of the metadata itself.
Signed-off-by: Hannes Reinecke <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
Add a pointer, to the containing device, within struct dm_zone and
kill dmz_zone_to_dev().
Signed-off-by: Hannes Reinecke <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
Checking the tertiary superblock just consists of validating UUIDs,
crcs, and the generation number; it doesn't have contents which would
be required during the actual operation.
So allocate a temporary superblock when checking tertiary devices to
avoid having to store it together with the 'real' superblocks.
Signed-off-by: Hannes Reinecke <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
The zones array is getting really large, and large arrays tend to
wreak havoc with the CPU caches. So convert it to xarray to become
more cache friendly.
Signed-off-by: Hannes Reinecke <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Signed-off-by: Colin Ian King <[email protected]> # fix leak in dmz_insert
Signed-off-by: Mike Snitzer <[email protected]>
|
|
Instead of counting the number of reserved zones in dmz_free_zone(),
mark the zone as 'reserved' during allocation and simplify
dmz_free_zone().
Signed-off-by: Hannes Reinecke <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
Instead of just reporting the errno, add some more verbose debugging
message in the reclaim path.
Signed-off-by: Hannes Reinecke <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
The secondary superblock must reside on the same device as the primary
superblock, so there is no need to re-calculate the device.
Signed-off-by: Hannes Reinecke <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
Signed-off-by: Hannes Reinecke <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
Use dm_bufio_forget_buffers instead of a block-by-block loop that
calls dm_bufio_forget. dm_bufio_forget_buffers is faster than the loop
because it searches for used buffers using rb-tree.
Signed-off-by: Mikulas Patocka <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
Introduce a function forget_buffer_locked that forgets a range of
buffers. It is more efficient than calling forget_buffer in a loop.
Signed-off-by: Mikulas Patocka <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
dm-bufio uses unnatural ordering in the rb-tree - blocks with smaller
numbers were put to the right node and blocks with bigger numbers were
put to the left node.
Reverse that logic so that it's natural.
Signed-off-by: Mikulas Patocka <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
Signed-off-by: Mikulas Patocka <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
When trying to upgrade the iopen glock from a shared to an exclusive lock in
gfs2_evict_inode, abort the wait if there is contention on the corresponding
inode glock: in that case, the inode must still be in active use on another
node, and we're not guaranteed to get the iopen glock anytime soon.
To make this work even better, when we notice contention on the iopen glock and
we can't evict the corresponsing inode and release the iopen glock immediately,
poke the inode glock. The other node(s) trying to acquire the lock can then
abort instead of timing out.
Thanks to Heinz Mauelshagen for pointing out a locking bug in a previous
version of this patch.
Signed-off-by: Andreas Gruenbacher <[email protected]>
|
|
Wake up the sdp->sd_async_glock_wait wait queue when setting the GLF_DEMOTE
flag.
Signed-off-by: Andreas Gruenbacher <[email protected]>
|
|
In delete_work_func, if the iopen glock still has an inode attached,
limit the inode lookup to that specific generation number: in the likely
case that the inode was deleted on the node on which the inode's link
count dropped to zero, we can skip verifying the on-disk block type and
reading in the inode. The same applies if another node that had the
inode open managed to delete the inode before us.
Signed-off-by: Andreas Gruenbacher <[email protected]>
|
|
Move the inode generation number check from gfs2_lookup_by_inum into
gfs2_inode_lookup: gfs2_inode_lookup may be able to decide that an inode with
the given inode generation number cannot exist without having to verify the
block type or reading the inode from disk.
Signed-off-by: Andreas Gruenbacher <[email protected]>
|
|
Use a zero no_formal_ino instead of a NULL pointer to indicate that any inode
generation number will qualify: a valid inode never has a zero no_formal_ino.
Signed-off-by: Andreas Gruenbacher <[email protected]>
|
|
When an inode's link count drops to zero and the inode is cached on
other nodes, the current behavior of gfs2 is to immediately give up and
to rely on the other node(s) to delete the inode if there is iopen glock
contention. This leads to resource group glock bouncing and the loss of
caching. With the previous patches in place, we can fix that by not
giving up immediately.
When the inode is still open on other nodes, those nodes won't be able
to evict the inode and give up the iopen glock. In that case, our lock
conversion request will time out. The unlink system call will block for
the duration of the iopen lock conversion request. We're also holding
the inode glock in EX mode for an extended duration, so other nodes
won't be able to make progress on the inode, either.
This is worse than what we had before, but we can prevent other nodes
from getting stuck by aborting our iopen locking request if there is
contention on the inode glock. This will the the subject of a future
patch.
Signed-off-by: Andreas Gruenbacher <[email protected]>
|