Age | Commit message | Author | Files | Lines |
|
Signed-off-by: Marcin Slusarz <[email protected]>
Cc: Christine Caulfield <[email protected]>
Cc: David Teigland <[email protected]>
Cc: [email protected]
Signed-off-by: David Teigland <[email protected]>
|
|
The semaphore connections_lock is used as a mutex. Convert it to the mutex
API.
Signed-off-by: Matthias Kaehlcke <[email protected]>
Cc: Christine Caulfield <[email protected]>
Cc: David Teigland <[email protected]>
Cc: Steven Whitehouse <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: David Teigland <[email protected]>
|
|
This patch addresses a problem introduced with the last round of
lowcomms patches where the 'othercon' connections do not get freed when
the DLM shuts down.
This results in the error message
"slab error in kmem_cache_destroy(): cache `dlm_conn': Can't free all
objects"
and the DLM cannot be restarted without a system reboot.
See bz#428119
Signed-off-by: Patrick Caulfield <[email protected]>
Signed-off-by: Fabio M. Di Nitto <[email protected]>
Signed-off-by: David Teigland <[email protected]>
|
|
A common problem occurs when multiple IP addresses within the same
subnet are assigned to the same NIC. If we make a connection attempt to
another address on the same subnet as one of those addresses, the
connection attempt will not necessarily be routed from the address we
want.
In the case of the DLM, the other nodes will quickly drop the connection
attempt, causing problems.
This patch makes the DLM bind to the local address it acquired from the
cluster manager when using TCP prior to making a connection, obviating
the need for administrators to "fix" their systems or use clever routing
tricks.
Signed-off-by: Lon Hohberger <[email protected]>
Signed-off-by: Patrick Caulfield <[email protected]>
Signed-off-by: David Teigland <[email protected]>
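
For illustration only, the bind-before-connect idea described above can be sketched in userspace C. This is not the kernel code from the patch; the helper name connect_from() and the IPv4-only handling are assumptions made for this example. Binding to the chosen local address with port 0 fixes the source address while leaving the port choice to the kernel.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not from the patch): open a TCP connection to
 * `remote`, forcing the source address to `local_ip`.
 * Returns the connected socket, or -1 on failure. */
int connect_from(const char *local_ip, const struct sockaddr_in *remote)
{
    struct sockaddr_in src;
    int fd;

    assert(local_ip != NULL && remote != NULL);

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Bind to the wanted local address first; port 0 lets the kernel
     * pick an ephemeral port. */
    memset(&src, 0, sizeof(src));
    src.sin_family = AF_INET;
    src.sin_port = 0;
    if (inet_pton(AF_INET, local_ip, &src.sin_addr) != 1 ||
        bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0 ||
        connect(fd, (const struct sockaddr *)remote, sizeof(*remote)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

After a successful call, getsockname() on the returned socket reports the address passed as local_ip, whatever source address the routing table would otherwise have selected.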
|
|
Use SO_RCVBUFFORCE instead.
Signed-off-by: David S. Miller <[email protected]>
|
|
Under high recovery loads, dlm_sendd can monopolise the CPU and cause soft
lockups. Adding one cond_resched() call and moving another makes it yield a
little more often during such times, keeping work moving.
Signed-Off-By: Patrick Caulfield <[email protected]>
Signed-off-by: Steven Whitehouse <[email protected]>
|
|
This patch fixes the slight mess made in lowcomms closing by previous patches
and fixes all sorts of DLM hangs.
Signed-Off-By: Patrick Caulfield <[email protected]>
Signed-off-by: Steven Whitehouse <[email protected]>
|
|
The last patch to clean out 'othercon' structures only fixed half the problem.
The attached patch addresses the other situations too, and fixes bz#238490.
Signed-Off-By: Patrick Caulfield <[email protected]>
Signed-off-by: Steven Whitehouse <[email protected]>
|
|
When we build a sockaddr_storage for an IP address, clear the unused parts as
they could be used for node comparisons.
I have seen this occasionally make sctp connections fail.
Signed-Off-By: Patrick Caulfield <[email protected]>
Signed-off-by: Steven Whitehouse <[email protected]>
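
A minimal userspace sketch of why the clearing matters, assuming addresses are compared byte-for-byte with memcmp(). The helper names make_ipv4_addr() and same_addr() are hypothetical and are not taken from the DLM source. Without the memset(), whatever was left in the unused tail of the sockaddr_storage would take part in the comparison.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: build a sockaddr_storage for an IPv4 address.
 * Returns 0 on success, -1 if `ip` is not a valid dotted-quad string. */
int make_ipv4_addr(struct sockaddr_storage *out, const char *ip,
                   unsigned short port)
{
    struct sockaddr_in *in4 = (struct sockaddr_in *)out;

    assert(out != NULL && ip != NULL);

    /* Clear the whole structure, including the part IPv4 never uses. */
    memset(out, 0, sizeof(*out));
    in4->sin_family = AF_INET;
    in4->sin_port = htons(port);
    return inet_pton(AF_INET, ip, &in4->sin_addr) == 1 ? 0 : -1;
}

/* Hypothetical comparison over the full storage, as assumed above. */
int same_addr(const struct sockaddr_storage *a,
              const struct sockaddr_storage *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}
```

With the memset() in place, two structures built from the same address compare equal even if they previously held different garbage.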
|
|
This patch clears the othercon pointer and frees the memory when a connection
is closed. Without this, a small memory leak could occur when nodes leave the
cluster.
Signed-Off-By: Patrick Caulfield <[email protected]>
Signed-off-by: Steven Whitehouse <[email protected]>
|
|
Slab destructors were no longer supported after Christoph's
c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
BUGs for both slab and slub, and slob never supported them
either.
This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).
Signed-off-by: Paul Mundt <[email protected]>
|
|
Cc: Steven Whitehouse <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Steven Whitehouse <[email protected]>
|
|
This patch fixes Red Hat bz#245892.
Opening a TCP connection from one cluster member to another, targeting the
DLM port, is enough to stop every DLM operation in the cluster.
This means that GFS and rgmanager will hang.
Signed-Off-By: Patrick Caulfield <[email protected]>
Signed-off-by: Steven Whitehouse <[email protected]>
|
|
This patch clears the user_data of active sockets as part of cleanup.
This prevents any late-arriving data from trying to add jobs to the work
queue while we are tidying up.
Signed-Off-By: Patrick Caulfield <[email protected]>
Signed-Off-By: David Teigland <[email protected]>
Signed-off-by: Steven Whitehouse <[email protected]>
|
|
Replace some printk calls with log_print, and fix some simple cases of lines
over 80 characters. Also, return -ENOTCONN if lowcomms_start fails due to no
local IP address being available.
Signed-off-by: David Teigland <[email protected]>
Signed-off-by: Steven Whitehouse <[email protected]>
|
|
Fix a few range & initialization bugs in lowcomms.
- max_nodeid is really the highest nodeid encountered, so all loops must include
it in their iterations.
- clean dlm_local_count & connection_idr so we can do a clean restart.
- Remove a spurious BUG_ON
Signed-Off-By: Patrick Caulfield <[email protected]>
Signed-off-by: Steven Whitehouse <[email protected]>
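
The range point can be shown with a small sketch. The array layout and the name count_active() are assumptions made for this example, not code from lowcomms. Because max_nodeid is itself a valid node ID, the loop bound has to be inclusive.

```c
#include <assert.h>

/* Hypothetical example: active[i] is non-zero when node ID i has a
 * connection. max_nodeid is the highest node ID seen so far, so it is
 * a valid index and must be visited. */
int count_active(const int active[], int max_nodeid)
{
    int nodeid, count = 0;

    assert(active != 0 && max_nodeid >= 0);

    /* "<=" rather than "<": a loop that stops at max_nodeid - 1 would
     * silently skip the highest-numbered node. */
    for (nodeid = 0; nodeid <= max_nodeid; nodeid++) {
        if (active[nodeid])
            count++;
    }
    return count;
}
```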
|
|
When you attempt to release a lockspace in DLM, it will hang trying to down a
semaphore that has already been downed. The attached patch fixes the problem.
Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Steven Whitehouse <[email protected]>
Cc: Patrick Caulfield <[email protected]>
|
|
This patch consolidates the TCP & SCTP protocols for the DLM into a single file
and makes it switchable at run-time (well, at least before the DLM actually
starts up!)
For RHEL5 this patch requires Neil Horman's patch that expands the in-kernel
socket API but that has already been twice ACKed so it should be OK.
The patch adds a new lowcomms.c file that replaces the existing lowcomms-sctp.c
& lowcomms-tcp.c files.
Signed-off-By: Patrick Caulfield <[email protected]>
Signed-off-by: Steven Whitehouse <[email protected]>
|
|
The following patch adds a TCP based communications layer
to the DLM which is compile time selectable. The existing SCTP
layer gives the advantage of allowing multihoming, whereas
the TCP layer has been heavily tested in previous versions of
the DLM and is known to be robust and therefore can be used as
a baseline for performance testing.
Signed-off-by: Patrick Caulfield <[email protected]>
Signed-off-by: Steven Whitehouse <[email protected]>
|
|
I didn't spot that the msg_iovlen was set to 2 if there
were two elements in the iovec but left at zero if not :(
I think this might be why bob was still seeing trouble.
Signed-Off-By: Patrick Caulfield <[email protected]>
Signed-off-by: Steven Whitehouse <[email protected]>
|
|
The DLM always passes the iovec length as 1; this is wrong when the circular
buffer wraps round.
Signed-Off-By: Patrick Caulfield <[email protected]>
Signed-off-by: Steven Whitehouse <[email protected]>
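
As an illustration of the wrap-around case, here is a userspace sketch that describes a region of a circular buffer with one or two iovec entries and returns the matching count for msg_iovlen. The helper name cbuf_to_iov() and its parameters are assumptions for this example and do not mirror the actual DLM code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical helper: describe `len` bytes starting at offset `start`
 * of a circular buffer `buf` of capacity `cap`.
 * Returns the number of iovec entries filled in (1 or 2), which is the
 * value that belongs in msg_iovlen. */
int cbuf_to_iov(char *buf, size_t cap, size_t start, size_t len,
                struct iovec iov[2])
{
    size_t first;

    assert(buf != NULL && cap > 0 && start < cap && len <= cap);

    first = cap - start;            /* bytes available before the wrap */
    iov[0].iov_base = buf + start;
    if (len <= first) {
        /* Region is contiguous: a single entry is enough. */
        iov[0].iov_len = len;
        return 1;
    }
    /* Region wraps: the second entry restarts at the buffer head. */
    iov[0].iov_len = first;
    iov[1].iov_base = buf;
    iov[1].iov_len = len - first;
    return 2;
}
```

Returning the count from the same function that fills in the entries keeps the entry count and the entries from drifting apart, whether the region wraps or not.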
|
|
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Doing the kmap() while holding the spinlock was causing recursive spinlock
problems. It seems the kmap was scheduling, although there was no warning
as I'd expect. Patrick, do we need locking around the kmap?
Signed-off-by: David Teigland <[email protected]>
Signed-off-by: Steven Whitehouse <[email protected]>
|
|
The nodeinfo_lock rwsem needs to be initialized when the module is loaded
instead of when the dlm is first used.
Signed-off-by: David Teigland <[email protected]>
Signed-off-by: Steven Whitehouse <[email protected]>
|
|
Rename local_nodeid to dlm_local_nodeid to prevent a
namespace collision. Rename the other local variable to match.
Cc: David Teigland <[email protected]>
Signed-off-by: Steven Whitehouse <[email protected]>
|
|
When a node is removed from a lockspace configuration, close our
connection to it, clearing any remaining messages for it.
Signed-off-by: David Teigland <[email protected]>
Signed-off-by: Patrick Caulfield <[email protected]>
Signed-off-by: Steven Whitehouse <[email protected]>
|
|
This is the core of the distributed lock manager which is required
to use GFS2 as a cluster filesystem. It is also used by CLVM and
can be used as a standalone lock manager independently of either
of these two projects.
It implements VAX-style locking modes.
Signed-off-by: David Teigland <[email protected]>
Signed-off-by: Steve Whitehouse <[email protected]>
|