Age | Commit message | Author | Files | Lines
2010-05-21 | drbd: Fix: Do not detach, if a bio with a barrier fails | Philipp Reisner | 1 | -1/+1
Introduced a few days ago: commit 45bb912bd5ea4d2b3a270a93cbdf767a0e2df6f5 Author: Lars Ellenberg <[email protected]> Date: Fri May 14 17:10:48 2010 +0200 Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2010-05-21 | drbd: Ensure to not trigger late-new-UUID creation multiple times | Philipp Reisner | 1 | -7/+11
Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2010-05-21 | drbd: Do not Oops when C_STANDALONE when uuid gets generated | Philipp Reisner | 1 | -1/+4
This was introduced by commit 0c3f34516e8c5a1a0ba3585a7777d32bbbdf4ecb (Author: Philipp Reisner <[email protected]>, Date: Mon May 17 16:10:43 2010 +0200), "drbd: Create new current UUID as late as possible". Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2010-05-21 | writeback: fix mixed up arguments to bdi_start_writeback() | Jens Axboe | 1 | -1/+1
The laptop mode timer had the nr_pages and sb_locked arguments mixed up. Signed-off-by: Jens Axboe <[email protected]>
2010-05-21 | writeback: fix problem with !CONFIG_BLOCK compilation | Jens Axboe | 3 | -0/+7
When CONFIG_BLOCK isn't enabled:

mm/page-writeback.c: In function 'laptop_mode_timer_fn':
mm/page-writeback.c:708: error: dereferencing pointer to incomplete type
mm/page-writeback.c:709: error: dereferencing pointer to incomplete type

Fix this by essentially eliminating the laptop sync handlers when CONFIG_BLOCK isn't set, as most are only used from the block layer code. The exception is laptop_sync_completion() which is used from sys_sync(), make that an empty declaration in that case.
Reported-by: Randy Dunlap <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2010-05-21 | block: improve automatic native capacity unlocking | Tejun Heo | 2 | -14/+60
Currently, native capacity unlocking is initiated only when a recognized partition extends beyond the end of the disk. However, there are several other unhandled cases where truncated capacity can lead to misdetection of partitions.

* Partition table is fully beyond EOD.
* Partition table is partially beyond EOD (daisy chained ones).
* Recognized partition starts beyond EOD.

This patch updates generic partition check code such that all the above three cases are handled too. For the first two, @state tracks whether low level partition check code tried to read beyond EOD during partition scan and triggers native capacity unlocking accordingly. The third is now handled similarly to the original unlocking case.
Signed-off-by: Tejun Heo <[email protected]> Cc: Ben Hutchings <[email protected]> Acked-by: David S. Miller <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
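The decision described above can be sketched in user-space C. This is only an illustration under stated assumptions: `struct part`, `struct scan_state`, and `should_unlock_native_capacity()` are hypothetical, simplified stand-ins and are not the kernel's actual structures or functions. The flag models "the low level code tried to read beyond EOD" (the first two cases), and the loop models partitions that start or extend beyond EOD.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a recognized partition. */
struct part {
    uint64_t start; /* first sector of the partition */
    uint64_t nr;    /* length in sectors; 0 means "unused slot" */
};

/* Hypothetical stand-in for the per-scan state mentioned in the commit. */
struct scan_state {
    bool access_beyond_eod;   /* scanner tried to read past the end of disk */
    const struct part *parts; /* partitions recognized so far */
    size_t nr_parts;
};

/*
 * Return true if the scan result suggests that the visible capacity is
 * truncated and unlocking the native capacity is worth trying.
 */
bool should_unlock_native_capacity(const struct scan_state *st,
                                   uint64_t disk_sectors)
{
    /* Cases 1 and 2: a partition table lies fully or partially beyond EOD. */
    if (st->access_beyond_eod)
        return true;

    for (size_t i = 0; i < st->nr_parts; i++) {
        const struct part *p = &st->parts[i];

        if (p->nr == 0)
            continue;
        /* Case 3: the partition starts beyond EOD. */
        if (p->start >= disk_sectors)
            return true;
        /* Original case: the partition extends beyond EOD
         * (written so that the subtraction cannot underflow). */
        if (p->nr > disk_sectors - p->start)
            return true;
    }
    return false;
}
```

Comparing `p->nr` against `disk_sectors - p->start` rather than adding `start + nr` avoids integer overflow for bogus table entries.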
2010-05-21 | block: use struct parsed_partitions *state universally in partition check code | Tejun Heo | 30 | -239/+225
Make the following changes to partition check code.

* Add ->bdev to struct parsed_partitions.
* Introduce read_part_sector() which is a simple wrapper around read_dev_sector() which takes struct parsed_partitions *state instead of @bdev.
* For functions which used to take @state and @bdev, drop @bdev. For functions which used to take @bdev, replace it with @state.
* While updating, drop superfluous checks on NULL state/bdev in ldm.c.

This cleans up the API a bit and enables better handling of IO errors during partition check as the generic partition check code now has much better visibility into what went wrong in the low level code paths.
Signed-off-by: Tejun Heo <[email protected]> Cc: Ben Hutchings <[email protected]> Acked-by: David S. Miller <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2010-05-21 | block,ide: simplify bdops->set_capacity() to ->unlock_native_capacity() | Tejun Heo | 5 | -36/+24
bdops->set_capacity() is unnecessarily generic. All that's required is a simple one way notification to lower level driver telling it to try to unlock native capacity. There's no reason to pass in target capacity or return the new capacity. The former is always the inherent native capacity and the latter can be handled via the usual device resize / revalidation path. In fact, the current API is always used that way. Replace ->set_capacity() with ->unlock_native_capacity() which takes only @disk and doesn't return anything. IDE which is the only current user of the API is converted accordingly. Signed-off-by: Tejun Heo <[email protected]> Cc: Ben Hutchings <[email protected]> Cc: Bartlomiej Zolnierkiewicz <[email protected]> Acked-by: David S. Miller <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
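A minimal user-space sketch of the API shape this message describes, assuming a toy disk model: the callback takes only the disk and returns nothing, and the new size is picked up later by a separate revalidation step. `struct disk`, `struct disk_ops`, `demo_unlock()`, and `demo_revalidate()` are hypothetical names for illustration and are not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy disk; the real kernel type is struct gendisk. */
struct disk {
    uint64_t capacity;        /* capacity currently visible, in sectors */
    uint64_t native_capacity; /* full capacity once unlocked */
    bool unlocked;            /* set by the driver's unlock callback */
};

/* Hypothetical ops table with the simplified one-way notification. */
struct disk_ops {
    void (*unlock_native_capacity)(struct disk *d);
};

/* Driver side: only try to unlock; do not report a size back. */
static void demo_unlock(struct disk *d)
{
    d->unlocked = true;
}

/* Generic side: the usual resize / revalidation path picks up the new size. */
void demo_revalidate(struct disk *d)
{
    if (d->unlocked)
        d->capacity = d->native_capacity;
}

const struct disk_ops demo_ops = {
    .unlock_native_capacity = demo_unlock,
};
```

Because the callback neither receives a target capacity nor returns one, the caller cannot depend on driver-specific size reporting; it just revalidates afterwards.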
2010-05-21 | block: restart partition scan after resizing a device | Tejun Heo | 1 | -10/+6
Device resize via ->set_capacity() can reveal new partitions (e.g. in chained partition table formats such as dos extended parts). Restart partition scan from the beginning after resizing a device. This change also makes libata always revalidate the disk after resize which makes lower layer native capacity unlocking implementation simpler and more robust as resize can be handled in the usual path. Signed-off-by: Tejun Heo <[email protected]> Reported-by: Ben Hutchings <[email protected]> Acked-by: David S. Miller <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2010-05-21 | buffer: make invalidate_bdev() drain all percpu LRU add caches | Tejun Heo | 1 | -0/+1
invalidate_bdev() should release all page cache pages which are clean and not being used; however, if some pages are still in the percpu LRU add caches on other cpus, those pages are considered in use and don't get released. Fix it by calling lru_add_drain_all() before trying to invalidate pages. This problem was discovered while testing block automatic native capacity unlocking. Null pages which were read before automatic unlocking didn't get released by invalidate_bdev() and ended up interfering with partition scan after unlocking. Signed-off-by: Tejun Heo <[email protected]> Acked-by: David S. Miller <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2010-05-21 | block: remove all rcu head initializations | Paul E. McKenney | 2 | -2/+0
Remove all rcu head inits. We don't care about the RCU head state before passing it to call_rcu() anyway. Only leave the "on_stack" variants so debugobjects can keep track of objects on stack. Signed-off-by: Mathieu Desnoyers <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2010-05-21 | writeback: fixups for !dirty_writeback_centisecs | Jens Axboe | 3 | -5/+12
Commit 69b62d01 fixed up most of the places where we would enter busy schedule() spins when disabling the periodic background writeback. This fixes up the sb timer so that it doesn't get hammered on with the delay disabled, and ensures that it gets rearmed if needed when /proc/sys/vm/dirty_writeback_centisecs gets modified. bdi_forker_task() also needs to check for !dirty_writeback_centisecs and use schedule() appropriately, fix that up too. Signed-off-by: Jens Axboe <[email protected]>
2010-05-21 | writeback: bdi_writeback_task() must set task state before calling schedule() | Jens Axboe | 1 | -2/+7
Calling schedule without setting the task state to non-running will return immediately, so ensure that we set it properly and check our sleep conditions after doing so. This is a fixup for commit 69b62d01. Signed-off-by: Jens Axboe <[email protected]>
2010-05-21 | writeback: ensure that WB_SYNC_NONE writeback with sb pinned is sync | Jens Axboe | 1 | -5/+11
Even if the writeout itself isn't a data integrity operation, we need to ensure that the caller doesn't drop the sb umount sem before we have actually done the writeback. This is a fixup for commit e913fc82. Signed-off-by: Jens Axboe <[email protected]>
2010-05-18 | drivers/block/drbd: Use kzalloc | Julia Lawall | 1 | -2/+1
Use kzalloc rather than the combination of kmalloc and memset. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression x,size,flags;
statement S;
@@

-x = kmalloc(size,flags);
+x = kzalloc(size,flags);
 if (x == NULL) S
-memset(x, 0, size);
// </smpl>

Signed-off-by: Julia Lawall <[email protected]> Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
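The same transformation can be shown with a user-space analogy, since kmalloc() and kzalloc() are only available inside the kernel. The sketch below swaps in malloc()/memset() and calloc() as stand-ins; `struct item` and both helper functions are hypothetical and exist only for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure used only for this example. */
struct item {
    int id;
    char name[16];
};

/* Before: allocate, check for failure, then zero in a separate step. */
struct item *item_alloc_old(void)
{
    struct item *x = malloc(sizeof(*x));

    if (x == NULL)
        return NULL;
    memset(x, 0, sizeof(*x));
    return x;
}

/* After: a single zeroing allocation (calloc stands in for kzalloc). */
struct item *item_alloc_new(void)
{
    return calloc(1, sizeof(struct item));
}
```

Both helpers return zero-filled memory on success; the second form simply cannot forget the memset() or pass it a mismatched size.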
2010-05-18 | drbd: Create new current UUID as late as possible | Philipp Reisner | 3 | -6/+39
The choice was to either delay creation of the new UUID until IO got thawed or to delay it until the first IO request. Both are correct; the latter is more friendly to users of dual-primary setups that actually only write on one side. Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-18 | drbd: If we detect late that IO got frozen, retry after we thawed. | Philipp Reisner | 2 | -9/+28
If we detect late (= after grabbing mdev->req_lock) that IO got frozen, we return 1 to generic_make_request(), which simply will retry to make a request for that bio. In the subsequent call of generic_make_request() into drbd_make_request_26() we sleep in inc_ap_bio(). Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-18 | drbd: always use_bmbv, ignore setting | Lars Ellenberg | 3 | -5/+6
Now that the peer may handle multi-bio EEs, we can ignore the peer's limit, and concentrate on the limits of the local IO stack. This is safe across drbd protocol versions, as our queue_max_sectors() will be adjusted accordingly. Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-18 | drbd: allow resync requests to be larger than max_segment_size | Lars Ellenberg | 1 | -7/+6
This should allow for better background resync performance. Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-18 | drbd: Allow drbd_epoch_entries to use multiple bios. | Lars Ellenberg | 6 | -324/+480
This should allow for better performance if the lower level IO stack of the peers differs in limits exposed either via the queue, or via some merge_bvec_fn. Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-18 | drbd: reduce sizeof struct drbd_epoch_entry by 8 byte by aligning members | Lars Ellenberg | 1 | -5/+1
Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-18 | drbd: Fixes to the new delay_probes code | Philipp Reisner | 3 | -9/+8
* Only send delay_probes with protocol 93 or newer
* drbd_send_delay_probes() is called only from worker context, no atomic_t needed for delay_seq

Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-18 | drbd: A fixes to the new resync speed code | Philipp Reisner | 2 | -4/+6
* Mention P_DELAY_PROBE in the packet naming array
* Do not corrupt the mdev->data.work list in case the timer goes off before delay_probe_work got handled by the worker
* Do not mod_timer() twice for a single delay_probe pair

Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-18 | drbd: Proc bits of new resync speed stuff | Philipp Reisner | 1 | -2/+18
Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-18 | drbd: Control the actual resync rate based on the queuing delay of data packets | Philipp Reisner | 1 | -1/+14
In a setup with a high bandwidth and high latency network, possibly involving deep queues in routers, it is beneficial to fill those queues with resync data only up to a limited extent. Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-18 | drbd: Actually send delay probes | Philipp Reisner | 2 | -2/+47
Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-18 | drbd: Four new configuration settings for resync speed control | Philipp Reisner | 3 | -0/+24
To reasonably control resync speed over drbd-proxy connections, drbd has to measure the current delay of packets transmitted over the (possibly congested) data socket vs the meta-data socket. Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-18 | drbd: Sending of delay_probes | Philipp Reisner | 2 | -0/+34
Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-18 | drbd: Receiving of delay_probes | Philipp Reisner | 3 | -1/+118
Delay_probes are new packets in the DRBD protocol, which allow DRBD to know the current delay that packets have on the data socket (relative to the meta-data socket). Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-18 | drbd: Fixed bitmap in case of online-grow without resync | Philipp Reisner | 1 | -4/+1
The "surplus" bits of the old (smaller) bitmap must be clean in case of online-grow without resync. Note: Reverted 67ae8b80d4a116ab3b7094eb3723506b20c06dff as well, since the lines added by this patch are redundant. The bits get set by the bm_set_surplus(b) call before that. Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-18 | drbd: Added transmission faults to the fault injection code | Philipp Reisner | 3 | -2/+10
Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-18 | drbd: bugfix: Make resize work, if remote's size was limiting and increased in the meantime | Philipp Reisner | 1 | -5/+2
Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-18 | drbd: Implemented the --assume-clean option for drbdsetup resize | Philipp Reisner | 3 | -3/+12
Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-18 | drbd: Added some missing statics | Philipp Reisner | 1 | -4/+4
Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-18 | drbd: Make sure to resync all of the new storage upon online resize | Philipp Reisner | 1 | -0/+6
Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-18 | drbd: Implemented flags for the resize packet | Philipp Reisner | 4 | -18/+25
Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-18 | drbd: Implemented the set_new_bits parameter for drbd_bm_resize() | Philipp Reisner | 4 | -6/+10
Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-18 | drbd: made determin_dev_size's parameter an flag enum | Philipp Reisner | 2 | -4/+8
Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-18 | drbd: New handler: initial-split-brain | Adam Gandelman | 1 | -1/+4
Some wish to be notified of all instances of split brain, not just those that go unresolved. The initial-split-brain handler is called to notify someone upon detection of all split brain conditions even if auto-recovery policies are configured. Signed-off-by: Adam Gandelman <[email protected]> Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-18 | drbd: fail_requests_early: remove incorrect and unnecessary optimization | Lars Ellenberg | 1 | -5/+0
The condition does not fit the comment (I may well be Primary, even if I lost the disk earlier and now the connection). And this is caught below anyway, where it also gets logged. Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-18 | drbd: check for corrupt or malicous sector addresses when receiving data | Lars Ellenberg | 1 | -0/+10
Even if it should never happen if the peer does behave, we need to double check, and not even attempt access beyond end of device. It usually would be caught by lower layers, resulting in "IO error", but may also end up in the internal meta data area. Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
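A bounds check of this kind can be sketched in user-space C. This is an assumption-laden illustration, not DRBD's actual code: `request_within_device()` is a hypothetical helper, and the 512-byte sector size is assumed. The point is to reject a peer-supplied sector address before any access is attempted, without letting the arithmetic overflow.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_SECTOR_SHIFT 9 /* assumed 512-byte sectors */

/*
 * Return true if a request of size_bytes starting at sector lies entirely
 * within a device of capacity_sectors. Hypothetical helper for illustration.
 */
bool request_within_device(uint64_t sector, uint32_t size_bytes,
                           uint64_t capacity_sectors)
{
    /* Round the byte count up to whole sectors. */
    uint64_t nr_sectors =
        ((uint64_t)size_bytes + (1u << DEMO_SECTOR_SHIFT) - 1)
        >> DEMO_SECTOR_SHIFT;

    /* A start address at or beyond the end is never valid. */
    if (sector >= capacity_sectors)
        return false;

    /* Compare against the remaining space; sector + nr_sectors could wrap. */
    return nr_sectors <= capacity_sectors - sector;
}
```

Checking the start first makes the subtraction safe, so even a corrupt address close to the maximum 64-bit value is rejected rather than wrapping around to a small number.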
2010-05-18 | drbd: cleanup: This code path to trigger a resync is no longer needed | Philipp Reisner | 1 | -17/+0
Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-18 | drbd: don't start a resync without access to up-to-date Data | Lars Ellenberg | 2 | -1/+4
In case both nodes are "inconsistent", invalidate would have started a resync anyways, without a chance to ever succeed, just filling the logs with warning messages. Simply disallow that state change, re-using the SS_NO_UP_TO_DATE_DISK return value. This also changes the corresponding error string to "Need access to UpToDate Data" -- I found the "Refusing to be Primary without at least one UpToDate disk" answer misleading in some situations anyways. Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-18 | drbd: fix potential protocol error | Lars Ellenberg | 1 | -1/+4
Don't forget to drain the digest in case we cannot satisfy a checksum based resync or online-verify request. Otherwise it would additionally cause a protocol error, dropping the connection. Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-18 | drbd: remove bogus ASSERT | Lars Ellenberg | 1 | -1/+0
block_id may be ID_SYNCER, as well as checksum based resync request magic, or online verify magic. Let's just drop that ASSERT. Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-18 | drbd: fix regression: attach while connected failed | Lars Ellenberg | 1 | -1/+6
commit e4f925e12ea5daaa9baf2dd5af9c4951721dae95 Author: Philipp Reisner <[email protected]> Date: Wed Mar 17 14:18:41 2010 +0100 drbd: Do not upgrade state to Outdated if already Inconsistent prevented the necessary state transition for attaching while connected (Diskless -> Consistent respectively Outdated). This is the fix for the fix. Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-18 | drbd: Do not upgrade state to Outdated if already Inconsistent [Bugz 277] | Philipp Reisner | 1 | -1/+1
There was a race condition in a setup with a SyncSource+Primary node, a SyncTarget+Secondary node, and a resync dependency on some other device. After both nodes decided to do the resync, the other device finishes its resync process. At that time SyncSource has already sent the P_SYNC_UUID packet and already updated its peer disk state to Inconsistent. The SyncTarget node waits for the P_SYNC_UUID and sends a state packet to report the resync dependency change. That packet still carries a disk state of Outdated.

Impact: If application writes come in on the Primary node during that time, they do not get replicated, and the out-of-sync counter gets increased. As a result, the completion of the resync is not detected on the Primary node and the resync stalls. Those blocks get resynced with the next resync, since they are marked as out-of-sync in the bitmap.

In order to fix this, we filter out that wrong state change in the sanitize_state() function.
Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-18 | drbd: use proc_create_data with explicit NULL argument | Lars Ellenberg | 1 | -1/+1
To document that we know about deprecation of proc_create, even though we are not affected, as we don't use the ->data member, open code proc_create_data(..., NULL); Signed-off-by: Philipp Reisner <[email protected]> Signed-off-by: Lars Ellenberg <[email protected]>
2010-05-17 | writeback: Update dirty flags in two steps | Dmitry Monakhov | 1 | -4/+11
Filesystems with delalloc support may dirty the inode during writepages. As a result, the inode will have dirty metadata flags even after write_inode. In fact we have two dedicated functions for proper data and metadata writeback. It is reasonable to separate the flag updates into two stages. https://bugzilla.kernel.org/show_bug.cgi?id=15906 Signed-off-by: Dmitry Monakhov <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2010-05-17 | writeback: fix WB_SYNC_NONE writeback from umount | Jens Axboe | 5 | -15/+51
When umount calls sync_filesystem(), we first do a WB_SYNC_NONE writeback to kick off writeback of pending dirty inodes, then follow that up with a WB_SYNC_ALL to wait for it. Since umount already holds the sb s_umount mutex, WB_SYNC_NONE ends up doing nothing and all writeback happens as WB_SYNC_ALL. This can greatly slow down umount, since WB_SYNC_ALL writeback is a data integrity operation and thus a bigger hammer than simple WB_SYNC_NONE. For barrier aware file systems it's a lot slower. Signed-off-by: Jens Axboe <[email protected]>