aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2009-06-22fusion: mptsas, fix lock imbalanceJiri Slaby1-2/+2
Fix two typos in mptsas_not_responding_devices. It was mutex_lock instead of unlock. Signed-off-by: Jiri Slaby <[email protected]> Acked-by: "Desai, Kashyap" <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2009-06-22Merge branch 'next-s3c' of git://aeryn.fluff.org.uk/bjdooks/linux into develRussell King4-7/+13
2009-06-22[ARM] S3C: Fix gpio-config off-by-one bugMarek Szyprowski1-1/+1
Fix gpio-config off-by-one bug. Without this patch, touching GPA0 pin on S3C64XX platform causes kernel oops. Reviewed-by: Kyungmin Park <[email protected]> Signed-off-by: Marek Szyprowski <[email protected]> Signed-off-by: Ben Dooks <[email protected]>
2009-06-22[ARM] S3C64XX: add to_irq() support for EINT() GPIOMarek Szyprowski1-0/+6
N group Add to_irq() function to onvert gpio to irq for external interrupt group (GPN). Signed-off-by: Marek Szyprowski <[email protected]> Signed-off-by: Kyungmin Park <[email protected]> Signed-off-by: Ben Dooks <[email protected]>
2009-06-22[ARM] S3C64XX: clock.c: fix typo in usb-host clock ctrlbitPeter Korsgaard1-1/+1
The usb-host clock was using the wrong define (the SCLK enable for the usb-host-bus) to change the HCLK register instead of the HCLK_UHOST bit. Signed-off-by: Peter Korsgaard <[email protected]> Signed-off-by: Ben Dooks <[email protected]>
2009-06-22[ARM] S3C64XX: fix HCLK gate definesPeter Korsgaard1-5/+5
A few typos seems to have sneaked into the HCLK gate defines, causing the usb host clock to not get enabled. Fix them according to the reference manual and throw in the 3d accel bit for good measure. Signed-off-by: Peter Korsgaard <[email protected]> Signed-off-by: Ben Dooks <[email protected]>
2009-06-22ALSA: ctxfi - Add PM supportWai Yew CHAY9-107/+354
Added the suspend/resume support to ctxfi driver. The team tested on the following seems ok: AMD Athlon 64 3500+ / ASUS A8N-E / 512MB DDR ATI / Radeon X1300 20k1 & 20k2 cards Signed-off-by: Wai Yew CHAY <[email protected]> Singed-off-by: Ryan RICHARDS <[email protected]> Signed-off-by: Takashi Iwai <[email protected]>
2009-06-22netfilter: xt_rateest: fix comparison with selfPatrick McHardy1-1/+1
As noticed by T�r�k Edwin <[email protected]>: Compiling the kernel with clang has shown this warning: net/netfilter/xt_rateest.c:69:16: warning: self-comparison always results in a constant value ret &= pps2 == pps2; ^ Looking at the code: if (info->flags & XT_RATEEST_MATCH_BPS) ret &= bps1 == bps2; if (info->flags & XT_RATEEST_MATCH_PPS) ret &= pps2 == pps2; Judging from the MATCH_BPS case it seems to be a typo, with the intention of comparing pps1 with pps2. http://bugzilla.kernel.org/show_bug.cgi?id=13535 Signed-off-by: Patrick McHardy <[email protected]>
2009-06-22netfilter: xt_quota: fix incomplete initializationJan Engelhardt1-0/+1
Commit v2.6.29-rc5-872-gacc738f ("xtables: avoid pointer to self") forgot to copy the initial quota value supplied by iptables into the private structure, thus counting from whatever was in the memory kmalloc returned. Signed-off-by: Jan Engelhardt <[email protected]> Signed-off-by: Patrick McHardy <[email protected]>
2009-06-22netfilter: nf_log: fix direct userspace memory access in proc handlerPatrick McHardy1-5/+11
Signed-off-by: Patrick McHardy <[email protected]>
2009-06-22netfilter: fix some sparse endianess warningsPatrick McHardy2-8/+8
net/netfilter/xt_NFQUEUE.c:46:9: warning: incorrect type in assignment (different base types) net/netfilter/xt_NFQUEUE.c:46:9: expected unsigned int [unsigned] [usertype] ipaddr net/netfilter/xt_NFQUEUE.c:46:9: got restricted unsigned int net/netfilter/xt_NFQUEUE.c:68:10: warning: incorrect type in assignment (different base types) net/netfilter/xt_NFQUEUE.c:68:10: expected unsigned int [unsigned] <noident> net/netfilter/xt_NFQUEUE.c:68:10: got restricted unsigned int net/netfilter/xt_NFQUEUE.c:69:10: warning: incorrect type in assignment (different base types) net/netfilter/xt_NFQUEUE.c:69:10: expected unsigned int [unsigned] <noident> net/netfilter/xt_NFQUEUE.c:69:10: got restricted unsigned int net/netfilter/xt_NFQUEUE.c:70:10: warning: incorrect type in assignment (different base types) net/netfilter/xt_NFQUEUE.c:70:10: expected unsigned int [unsigned] <noident> net/netfilter/xt_NFQUEUE.c:70:10: got restricted unsigned int net/netfilter/xt_NFQUEUE.c:71:10: warning: incorrect type in assignment (different base types) net/netfilter/xt_NFQUEUE.c:71:10: expected unsigned int [unsigned] <noident> net/netfilter/xt_NFQUEUE.c:71:10: got restricted unsigned int net/netfilter/xt_cluster.c:20:55: warning: incorrect type in return expression (different base types) net/netfilter/xt_cluster.c:20:55: expected unsigned int net/netfilter/xt_cluster.c:20:55: got restricted unsigned int const [usertype] ip net/netfilter/xt_cluster.c:20:55: warning: incorrect type in return expression (different base types) net/netfilter/xt_cluster.c:20:55: expected unsigned int net/netfilter/xt_cluster.c:20:55: got restricted unsigned int const [usertype] ip Signed-off-by: Patrick McHardy <[email protected]>
2009-06-22netfilter: nf_conntrack: fix conntrack lookup racePatrick McHardy1-2/+4
The RCU protected conntrack hash lookup only checks whether the entry has a refcount of zero to decide whether it is stale. This is not sufficient, entries are explicitly removed while there is at least one reference left, possibly more. Explicitly check whether the entry has been marked as dying to fix this. Signed-off-by: Patrick McHardy <[email protected]>
2009-06-22netfilter: nf_conntrack: fix confirmation race conditionPatrick McHardy1-1/+8
New connection tracking entries are inserted into the hash before they are fully set up, namely the CONFIRMED bit is not set and the timer not started yet. This can theoretically lead to a race with timer, which would set the timeout value to a relative value, most likely already in the past. Perform hash insertion as the final step to fix this. Signed-off-by: Patrick McHardy <[email protected]>
2009-06-22netfilter: nf_conntrack: death_by_timeout() fixEric Dumazet1-2/+8
death_by_timeout() might delete a conntrack from hash list and insert it in dying list. nf_ct_delete_from_lists(ct); nf_ct_insert_dying_list(ct); I believe a (lockless) reader could *catch* ct while doing a lookup and miss the end of its chain. (nulls lookup algo must check the null value at the end of lookup and should restart if the null value is not the expected one. cf Documentation/RCU/rculist_nulls.txt for details) We need to change nf_conntrack_init_net() and use a different "null" value, guaranteed not being used in regular lists. Choose very large values, since hash table uses [0..size-1] null values. Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: Patrick McHardy <[email protected]>
2009-06-22[S390] Update default configuration.Martin Schwidefsky1-18/+54
Signed-off-by: Martin Schwidefsky <[email protected]>
2009-06-22[S390] kprobes: defer setting of ctlblk stateHeiko Carstens1-11/+7
get_krobe_ctlblk returns a per cpu kprobe control block which holds the state of the current cpu wrt to kprobe. When inserting/removing a kprobe the state of the cpu which replaces the code is changed to KPROBE_SWAP_INST. This however is done when preemption is still enabled. So the state of the current cpu doesn't necessarily reflect the real state. To fix this move the code that changes the state to non-preemptible context. Reported-by: Ananth N Mavinakayanahalli <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
2009-06-22[S390] Enable tick based perf_counter on s390.Martin Schwidefsky3-0/+15
Signed-off-by: Martin Schwidefsky <[email protected]>
2009-06-22[S390] dasd: fix refcounting in dasd_change_stateSebastian Ott1-3/+5
To set a dasd online dasd_change_state is called twice. The first cycle will schedule initial analysis of the device, set the rc to -EAGAIN and will not touch the device state any more. The initial analysis will in turn call dasd_change_state to increase the state to the final DASD_STATE_ONLINE. If the dasd_change_state on the second thread outruns the other one both finish with the state set to DASD_STATE_ONLINE and the device refcount will be decreased by 2. Fix this by leaving dasd_change_state on rc == -EAGAIN so that the refcount will always be decreased by 1. Signed-off-by: Sebastian Ott <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
2009-06-22[S390] lockless idle time accountingMartin Schwidefsky3-17/+35
Replace the spinlock used in the idle time accounting with a sequence counter mechanism analog to seqlock. Signed-off-by: Martin Schwidefsky <[email protected]>
2009-06-22[S390] driver_data accessMartin Schwidefsky7-13/+13
Replace the remaining direct accesses to the driver_data pointer with calls to the dev_get_drvdata() and dev_set_drvdata() functions. Signed-off-by: Martin Schwidefsky <[email protected]>
2009-06-22[S390] pm: fix build error for !SMPHeiko Carstens1-2/+4
Fix build error for !SMP: arch/s390/power/built-in.o: In function `swsusp_arch_resume': (.text+0x1b4): undefined reference to `smp_get_phys_cpu_id' arch/s390/power/built-in.o: In function `swsusp_arch_resume': (.text+0x288): undefined reference to `smp_switch_boot_cpu_in_resume' make: *** [.tmp_vmlinux1] Error 1 Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
2009-06-22[S390] dasd_pm: fix stop flag handlingStefan Haberland2-10/+12
The stop flags are handled in the generic restore function so the stop flag is removed also for FBA and DIAG devices. Signed-off-by: Stefan Haberland <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
2009-06-22[S390] ap/zcrypt: Suspend/Resume ap bus and zcryptFelix Beck1-2/+83
Add Suspend/Resume support to ap bus and zcrypt. All enhancements are done in the ap bus. No changes in the crypto card specific part are necessary. Signed-off-by: Felix Beck <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
2009-06-22[S390] qdio: Sanitize do_QDIO sanity checksJan Glauber2-8/+3
Remove unneeded sanity checks from do_QDIO since this is the hot path. Change the type of bufnr and count to unsigned int so the check for the maximum value works. Reported-by: Roel Kluin <[email protected]> Signed-off-by: Jan Glauber <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
2009-06-22[S390] qdio: leave inbound SBALs primedJan Glauber1-7/+0
It is not required to change the state of primed SBALs. Leaving them primed saves a SQBS instruction under z/VM. Signed-off-by: Jan Glauber <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
2009-06-22[S390] qdio: merge AI tasklet into interrupt handlerJan Glauber1-44/+21
Since the adapter interrupt tasklet only schedules the queue tasklets and contains no code that requires serialization in can be merged with the adapter interrupt handler. That possibly safes some CPU cycles. Signed-off-by: Jan Glauber <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
2009-06-22[S390] qdio: extract all primed SBALs at onceJan Glauber1-28/+6
For devices without QIOASSIST primed SBALS were extracted in a loop. Remove the loop since get_buf_states can already return more than one primed SBAL. Signed-off-by: Jan Glauber <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
2009-06-22[S390] qdio: fix check for running under z/VMJan Glauber1-35/+13
The check whether qdio runs under z/VM was incorrect since SIGA-Sync is not set if the device runs with QIOASSIST. Use MACHINE_IS_VM instead to prevent polling under z/VM. Merge qdio_inbound_q_done and tiqdio_is_inbound_q_done. Signed-off-by: Jan Glauber <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
2009-06-22[S390] qdio: move adapter interrupt tasklet codeJan Glauber4-80/+75
Move the adapter interrupt tasklet function to the qdio main code since all the functions used by the tasklet are located there. Signed-off-by: Jan Glauber <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
2009-06-22[S390] Use del_timer instead of del_timer_syncMichael Holzheu1-1/+1
When syncing the sclp console queue, we call del_timer_sync() while holding the "sclp_con_lock" spinlock. This lock is also taken in the timer function "sclp_console_timeout". Therefore the sync version of del_timer() cannot be used here. Because the synchronous deletion of the timer is only needed in the suspend callback and in that case only one CPU is remaining and therefore it is not possible that the timer function is running in parallel, we can safely use del_timer() instead of del_timer_sync(). Signed-off-by: Michael Holzheu <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
2009-06-22[S390] s390: remove DEBUG_MALLOCPekka Enberg1-9/+0
The kernel now has kmemleak and kmemtrace so there's no reason to keep this ugly s390 hack around. I am not sure how it's supposed to work on SMP anyway as it uses a global variable to temporarily store the return value of all kmalloc() calls: void *b; #define kmalloc(x...) (PRINT_INFO(" kmalloc %p\n",b=kmalloc(x)),b) Cc: <[email protected]> Cc: Heiko Carstens <[email protected]> Signed-off-by: Pekka Enberg <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
2009-06-22[S390] vt220 console: convert from bootmem to slabHeiko Carstens1-13/+5
The slab allocator is earlier available so convert the bootmem allocations to slab/gfp allocations. Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
2009-06-22[S390] sclp console: convert from bootmem to slabHeiko Carstens1-3/+2
The slab allocator is earlier available so convert the bootmem allocations to slab/gfp allocations. Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
2009-06-22[S390] 3270 console: convert from bootmem to slabHeiko Carstens2-38/+7
The slab allocator is earlier available so convert the bootmem allocations to slab/gfp allocations. Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
2009-06-22[S390] 3215 console: convert from bootmem to slabHeiko Carstens1-11/+7
The slab allocator is earlier available so convert the bootmem allocations to slab/gfp allocations. Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
2009-06-22[S390] time: convert from bootmem to slabHeiko Carstens2-16/+5
The slab allocator is earlier available so convert the bootmem allocations to slab/gfp allocations. Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
2009-06-22xtensa: enable m41t80 driver in s6105_defconfigDaniel Glockner1-3/+53
Signed-off-by: Daniel Glockner <[email protected]> Cc: David Brownell <[email protected]> Cc: Alessandro Zummo <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2009-06-22xtensa: add m41t62 rtc to s6105 platformDaniel Glockner1-0/+3
Signed-off-by: Daniel Glockner <[email protected]> Cc: David Brownell <[email protected]> Cc: Alessandro Zummo <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2009-06-22xtensa: enable s6gmac in s6105_defconfigDaniel Glockner1-1/+48
Signed-off-by: Daniel Glockner <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Oskar Schirmer <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2009-06-22xtensa: s6105 specific configuration for s6gmacOskar Schirmer2-0/+100
Platform-specific configuration for the s6gmac driver, including the PHY interrupt line. Signed-off-by: Daniel Glockner <[email protected]> Signed-off-by: Oskar Schirmer <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2009-06-22s6gmac: xtensa s6000 on-chip ethernet driverOskar Schirmer3-0/+1085
The s6000 on-chip MAC supports 10/100/1000Mbit and is connected to an external PHY via MII or RGMII interface. [[email protected]: don't use device->bus_id directly] Signed-off-by: Oskar Schirmer <[email protected]> Signed-off-by: Daniel Glockner <[email protected]> Acked-by: "David S. Miller" <[email protected]> Signed-off-by: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Chris Zankel <[email protected]>
2009-06-22xtensa: support s6000 gpio irqs and alternate function selectionDaniel Glöckner5-10/+171
Implement an irq chip to handle interrupts via gpio. The GPIO chip initialization function now takes a bitmask denoting pins that should be configured for their alternate function. changes compared to v1: - fixed bug on edge interrupt configuration - accommodated to function name change - moved definition of VARIANT_NR_IRQS to this patch - renamed __XTENSA_S6000_IRQ_H to _XTENSA_S6000_IRQ_H as requested Signed-off-by: Daniel Glöckner <[email protected]> Signed-off-by: Johannes Weiner <[email protected]> Signed-off-by: Chris Zankel <[email protected]>
2009-06-22xtensa: s6000 dma engine supportOskar Schirmer3-1/+561
There are four slightly different dma engines on the s6000 family. One for memory-memory transfers, the other three for memory-device. This patch implements a platform-specific kernel-API to control these engines. It is needed for the network, video, audio peripherals on s6000. Signed-off-by: Oskar Schirmer <[email protected]> Signed-off-by: Daniel Glockner <[email protected]> Signed-off-by: Fabian Godehardt <[email protected]> Cc: Daniel Glockner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Chris Zankel <[email protected]>
2009-06-22xtensa: allow variant to initialize own irq chipsDaniel Glöckner2-1/+13
There was already a PLATFORM_NR_IRQS define, which is now accompanied by a VARIANT_NR_IRQS. To be able to initialize these interrupts, init_IRQ now calls a variant specific hook. Changes compared to v1: - adapted to new CONFIG_VARIANT_IRQ_EXT - removed definition and call of platform_init_IRQ as there already is a platform_init_irq defined in asm/platform.h with a weak default in kernel/platform.c - renamed variant_init_IRQ to variant_init_irq Note that I could not find the call site of platform_init_irq although it is stated in platform.h that it is called from init_IRQ. Signed-off-by: Daniel Glöckner <[email protected]> Signed-off-by: Chris Zankel <[email protected]>
2009-06-22xtensa: cache inquiry and unaligned cache handling functionsOskar Schirmer1-0/+95
The existing xtensa cache handling functions work on page-aligned memory regions. These functions are needed for the s6000 dma engine which can work on a byte-granularity. Signed-off-by: Oskar Schirmer <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Daniel Glockner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Chris Zankel <[email protected]>
2009-06-22dm mpath: change to be request basedKiyoshi Ueda1-65/+128
This patch converts dm-multipath target to request-based from bio-based. Basically, the patch just converts the I/O unit from struct bio to struct request. In the course of the conversion, it also changes the I/O queueing mechanism. The change in the I/O queueing is described in details as follows. I/O queueing mechanism change ----------------------------- In I/O submission, map_io(), there is no mechanism change from bio-based, since the clone request is ready for retry as it is. However, in I/O complition, do_end_io(), there is a mechanism change from bio-based, since the clone request is not ready for retry. In do_end_io() of bio-based, the clone bio has all needed memory for resubmission. So the target driver can queue it and resubmit it later without memory allocations. The mechanism has almost no overhead. On the other hand, in do_end_io() of request-based, the clone request doesn't have clone bios, so the target driver can't resubmit it as it is. To resubmit the clone request, memory allocation for clone bios is needed, and it takes some overheads. To avoid the overheads just for queueing, the target driver doesn't queue the clone request inside itself. Instead, the target driver asks dm core for queueing and remapping the original request of the clone request, since the overhead for queueing is just a freeing memory for the clone request. As a result, the target driver doesn't need to record/restore the information of the original request for resubmitting the clone request. So dm_bio_details in dm_mpath_io is removed. multipath_busy() --------------------- The target driver returns "busy", only when the following case: o The target driver will map I/Os, if map() function is called and o The mapped I/Os will wait on underlying device's queue due to their congestions, if map() function is called now. In other cases, the target driver doesn't return "busy". Otherwise, dm core will keep the I/Os and the target driver can't do what it wants. (e.g. the target driver can't map I/Os now, so wants to kill I/Os.) Signed-off-by: Kiyoshi Ueda <[email protected]> Signed-off-by: Jun'ichi Nomura <[email protected]> Acked-by: Hannes Reinecke <[email protected]> Signed-off-by: Alasdair G Kergon <[email protected]>
2009-06-22dm: disable interrupt when taking map_lockKiyoshi Ueda1-6/+9
This patch disables interrupt when taking map_lock to avoid lockdep warnings in request-based dm. request-based dm takes map_lock after taking queue_lock with disabling interrupt: spin_lock_irqsave(queue_lock) q->request_fn() == dm_request_fn() => dm_get_table() => read_lock(map_lock) while queue_lock could be (but isn't) taken in interrupt context. Signed-off-by: Kiyoshi Ueda <[email protected]> Signed-off-by: Jun'ichi Nomura <[email protected]> Acked-by: Christof Schmitt <[email protected]> Acked-by: Hannes Reinecke <[email protected]> Signed-off-by: Alasdair G Kergon <[email protected]>
2009-06-22dm: do not set QUEUE_ORDERED_DRAIN if request basedKiyoshi Ueda3-1/+16
Request-based dm doesn't have barrier support yet. So we need to set QUEUE_ORDERED_DRAIN only for bio-based dm. Since the device type is decided at the first table loading time, the flag set is deferred until then. Signed-off-by: Kiyoshi Ueda <[email protected]> Signed-off-by: Jun'ichi Nomura <[email protected]> Acked-by: Hannes Reinecke <[email protected]> Signed-off-by: Alasdair G Kergon <[email protected]>
2009-06-22dm: enable request based optionKiyoshi Ueda4-26/+285
This patch enables request-based dm. o Request-based dm and bio-based dm coexist, since there are some target drivers which are more fitting to bio-based dm. Also, there are other bio-based devices in the kernel (e.g. md, loop). Since bio-based device can't receive struct request, there are some limitations on device stacking between bio-based and request-based. type of underlying device bio-based request-based ---------------------------------------------- bio-based OK OK request-based -- OK The device type is recognized by the queue flag in the kernel, so dm follows that. o The type of a dm device is decided at the first table binding time. Once the type of a dm device is decided, the type can't be changed. o Mempool allocations are deferred to at the table loading time, since mempools for request-based dm are different from those for bio-based dm and needed mempool type is fixed by the type of table. o Currently, request-based dm supports only tables that have a single target. To support multiple targets, we need to support request splitting or prevent bio/request from spanning multiple targets. The former needs lots of changes in the block layer, and the latter needs that all target drivers support merge() function. Both will take a time. Signed-off-by: Kiyoshi Ueda <[email protected]> Signed-off-by: Jun'ichi Nomura <[email protected]> Signed-off-by: Alasdair G Kergon <[email protected]>
2009-06-22dm: prepare for request based optionKiyoshi Ueda4-4/+725
This patch adds core functions for request-based dm. When struct mapped device (md) is initialized, md->queue has an I/O scheduler and the following functions are used for request-based dm as the queue functions: make_request_fn: dm_make_request() pref_fn: dm_prep_fn() request_fn: dm_request_fn() softirq_done_fn: dm_softirq_done() lld_busy_fn: dm_lld_busy() Actual initializations are done in another patch (PATCH 2). Below is a brief summary of how request-based dm behaves, including: - making request from bio - cloning, mapping and dispatching request - completing request and bio - suspending md - resuming md bio to request ============== md->queue->make_request_fn() (dm_make_request()) calls __make_request() for a bio submitted to the md. Then, the bio is kept in the queue as a new request or merged into another request in the queue if possible. Cloning and Mapping =================== Cloning and mapping are done in md->queue->request_fn() (dm_request_fn()), when requests are dispatched after they are sorted by the I/O scheduler. dm_request_fn() checks busy state of underlying devices using target's busy() function and stops dispatching requests to keep them on the dm device's queue if busy. It helps better I/O merging, since no merge is done for a request once it is dispatched to underlying devices. Actual cloning and mapping are done in dm_prep_fn() and map_request() called from dm_request_fn(). dm_prep_fn() clones not only request but also bios of the request so that dm can hold bio completion in error cases and prevent the bio submitter from noticing the error. (See the "Completion" section below for details.) After the cloning, the clone is mapped by target's map_rq() function and inserted to underlying device's queue using blk_insert_cloned_request(). Completion ========== Request completion can be hooked by rq->end_io(), but then, all bios in the request will have been completed even error cases, and the bio submitter will have noticed the error. To prevent the bio completion in error cases, request-based dm clones both bio and request and hooks both bio->bi_end_io() and rq->end_io(): bio->bi_end_io(): end_clone_bio() rq->end_io(): end_clone_request() Summary of the request completion flow is below: blk_end_request() for a clone request => blk_update_request() => bio->bi_end_io() == end_clone_bio() for each clone bio => Free the clone bio => Success: Complete the original bio (blk_update_request()) Error: Don't complete the original bio => blk_finish_request() => rq->end_io() == end_clone_request() => blk_complete_request() => dm_softirq_done() => Free the clone request => Success: Complete the original request (blk_end_request()) Error: Requeue the original request end_clone_bio() completes the original request on the size of the original bio in successful cases. Even if all bios in the original request are completed by that completion, the original request must not be completed yet to keep the ordering of request completion for the stacking. So end_clone_bio() uses blk_update_request() instead of blk_end_request(). In error cases, end_clone_bio() doesn't complete the original bio. It just frees the cloned bio and gives over the error handling to end_clone_request(). end_clone_request(), which is called with queue lock held, completes the clone request and the original request in a softirq context (dm_softirq_done()), which has no queue lock, to avoid a deadlock issue on submission of another request during the completion: - The submitted request may be mapped to the same device - Request submission requires queue lock, but the queue lock has been held by itself and it doesn't know that The clone request has no clone bio when dm_softirq_done() is called. So target drivers can't resubmit it again even error cases. Instead, they can ask dm core for requeueing and remapping the original request in that cases. suspend ======= Request-based dm uses stopping md->queue as suspend of the md. For noflush suspend, just stops md->queue. For flush suspend, inserts a marker request to the tail of md->queue. And dispatches all requests in md->queue until the marker comes to the front of md->queue. Then, stops dispatching request and waits for the all dispatched requests to complete. After that, completes the marker request, stops md->queue and wake up the waiter on the suspend queue, md->wait. resume ====== Starts md->queue. Signed-off-by: Kiyoshi Ueda <[email protected]> Signed-off-by: Jun'ichi Nomura <[email protected]> Signed-off-by: Alasdair G Kergon <[email protected]>