Age | Commit message | Author | Files | Lines
2010-10-20 | fs/ceph/xattr.c: Use kmemdup | Julia Lawall | 1 | -2/+1
Convert a sequence of kmalloc and memcpy to use kmemdup.
The semantic patch that performs this transformation is: (http://coccinelle.lip6.fr/)
    // <smpl>
    @@
    expression a,flag,len;
    expression arg,e1,e2;
    statement S;
    @@
      a =
    -  \(kmalloc\|kzalloc\)(len,flag)
    +  kmemdup(arg,len,flag)
      <... when != a
      if (a == NULL || ...) S
      ...>
    - memcpy(a,arg,len+1);
    // </smpl>
Signed-off-by: Julia Lawall <[email protected]>
Signed-off-by: Sage Weil <[email protected]>
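The transformation above can be illustrated in userspace. This is a minimal sketch, not the kernel code: `memdup_sketch`, `copy_value_old`, and `copy_value_new` are hypothetical names, and `malloc` stands in for `kmalloc` (there is no allocation-flags argument here).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Userspace analogue of kmemdup(): allocate len bytes and copy src
 * into them in one step. Returns NULL if the allocation fails. */
void *memdup_sketch(const void *src, size_t len)
{
    void *p;

    assert(src != NULL || len == 0);
    p = malloc(len);
    if (p != NULL && len > 0)
        memcpy(p, src, len);
    return p;
}

/* Before: open-coded allocate-then-copy sequence. */
char *copy_value_old(const char *val, size_t len)
{
    char *p = malloc(len);

    if (p == NULL)
        return NULL;
    memcpy(p, val, len);
    return p;
}

/* After: the same result through the single helper. */
char *copy_value_new(const char *val, size_t len)
{
    return memdup_sketch(val, len);
}
```

Both variants return an independent copy of the input; the helper simply removes the chance of the allocation size and the copy size drifting apart.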
2010-10-20 | rbd: passing wrong variable to bvec_kunmap_irq() | Dan Carpenter | 1 | -1/+1
We should be passing "buf" here instead of "bv". This is tricky because it's not the same as kmap() and kunmap(). GCC does warn about it if you compile on i386 with CONFIG_HIGHMEM. Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Sage Weil <[email protected]>
2010-10-20 | rbd: null vs ERR_PTR | Dan Carpenter | 1 | -2/+2
ceph_alloc_page_vector() returns ERR_PTR(-ENOMEM) on errors. Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Sage Weil <[email protected]>
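Why a NULL check misses this kind of failure can be shown with a userspace sketch of the error-pointer convention. Everything below is an assumption for illustration: `err_ptr_sketch`, `is_err_sketch`, `ptr_err_sketch`, and `alloc_vector_sketch` are hypothetical stand-ins, not the kernel's `ERR_PTR()` helpers or the real `ceph_alloc_page_vector()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO_SKETCH 4095

/* Encode a negative errno value in the top 4095 pointer values. */
void *err_ptr_sketch(long err)
{
    return (void *)(intptr_t)err;
}

int is_err_sketch(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

long ptr_err_sketch(const void *p)
{
    return (long)(intptr_t)p;
}

/* Hypothetical allocator that reports failure as an error pointer,
 * never as NULL. simulate_failure forces the error path. */
void **alloc_vector_sketch(size_t n, int simulate_failure)
{
    void **v;

    assert(n > 0);
    if (simulate_failure)
        return err_ptr_sketch(-ENOMEM);
    v = calloc(n, sizeof(*v));
    if (v == NULL)
        return err_ptr_sketch(-ENOMEM);
    return v;
}

/* Caller: an "if (!v)" test would miss the failure, because the
 * error pointer is non-NULL. The is_err check catches it. */
int use_vector(size_t n, int simulate_failure)
{
    void **v = alloc_vector_sketch(n, simulate_failure);

    if (is_err_sketch(v))
        return (int)ptr_err_sketch(v);
    free(v);
    return 0;
}
```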
2010-10-20 | ceph: fix num_pages_free accounting in pagelist | Sage Weil | 1 | -0/+1
Decrement the free page counter when removing a page from the free_list. Signed-off-by: Sage Weil <[email protected]>
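The accounting rule can be sketched with a simplified free list. The types and function names here (`pagelist_sketch`, `free_list_push`, `free_list_pop`) are hypothetical and much simpler than the real ceph pagelist; the point is only that the counter must change whenever the list does.

```c
#include <assert.h>
#include <stddef.h>

struct page_sketch {
    struct page_sketch *next;
};

struct pagelist_sketch {
    struct page_sketch *free_list;
    size_t num_pages_free;   /* must always equal the list length */
};

/* Add a page to the free list and count it. */
void free_list_push(struct pagelist_sketch *pl, struct page_sketch *pg)
{
    pg->next = pl->free_list;
    pl->free_list = pg;
    pl->num_pages_free++;
}

/* Take a page off the free list. Without the decrement, the counter
 * would overstate how many preallocated pages remain. */
struct page_sketch *free_list_pop(struct pagelist_sketch *pl)
{
    struct page_sketch *pg = pl->free_list;

    if (pg == NULL)
        return NULL;
    assert(pl->num_pages_free > 0);
    pl->free_list = pg->next;
    pl->num_pages_free--;
    pg->next = NULL;
    return pg;
}
```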
2010-10-20 | ceph: add CEPH_MDS_OP_SETDIRLAYOUT and associated ioctl. | Greg Farnum | 3 | -1/+70
Signed-off-by: Sage Weil <[email protected]>
2010-10-20 | ceph: don't crash when passed bad mount options | Yehuda Sadeh | 1 | -1/+1
This only happened when parse_extra_token was not passed to ceph_parse_option() (hence, only happened in rbd). Signed-off-by: Yehuda Sadeh <[email protected]>
2010-10-20 | ceph: fix debugfs warnings | Randy Dunlap | 1 | -1/+2
Include "super.h" outside of CONFIG_DEBUG_FS to eliminate a compiler warning: fs/ceph/debugfs.c:266: warning: 'struct ceph_fs_client' declared inside parameter list fs/ceph/debugfs.c:266: warning: its scope is only this definition or declaration, which is probably not what you want fs/ceph/debugfs.c:271: warning: 'struct ceph_fs_client' declared inside parameter list Signed-off-by: Randy Dunlap <[email protected]> Signed-off-by: Yehuda Sadeh <[email protected]>
2010-10-20 | block: rbd: removing unnecessary test | Yehuda Sadeh | 1 | -4/+0
rbd_get_segment() can't return a negative value, so we don't need to check its return value. Signed-off-by: Yehuda Sadeh <[email protected]>
2010-10-20 | block: rbd: fixed many leaks | Vasiliy Kulikov | 1 | -6/+8
rbd_client_create() doesn't free rbdc, this leads to many leaks. seg_len in rbd_do_op() is unsigned, so (seg_len < 0) makes no sense. Also if fixed check fails then seg_name is leaked. Signed-off-by: Vasiliy Kulikov <[email protected]> Signed-off-by: Yehuda Sadeh <[email protected]>
2010-10-20 | ceph: switch from BKL to lock_flocks() | Sage Weil | 1 | -5/+6
Switch from using the BKL explicitly to the new lock_flocks() interface. Eventually this will turn into a spinlock. Signed-off-by: Sage Weil <[email protected]>
2010-10-20 | ceph: preallocate flock state without locks held | Greg Farnum | 2 | -15/+44
When the lock_kernel() turns into lock_flocks() and a spinlock, we won't be able to do allocations with the lock held. Preallocate space without the lock, and retry if the lock state changes out from underneath us. Signed-off-by: Greg Farnum <[email protected]> Signed-off-by: Sage Weil <[email protected]>
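The preallocate-and-retry pattern described above might look roughly like this in userspace. It is a sketch under stated assumptions: `lock_table_sketch`, `table_lock`, `table_unlock`, and `snapshot_locks` are hypothetical names, a C11 `atomic_flag` stands in for the spinlock, and the "encoding" step just fills in slot indices.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct lock_table_sketch {
    atomic_flag spin;   /* stands in for the spinlock */
    int nlocks;         /* protected by spin */
};

void table_lock(struct lock_table_sketch *t)
{
    while (atomic_flag_test_and_set_explicit(&t->spin, memory_order_acquire))
        ;   /* busy-wait, as a spinlock would */
}

void table_unlock(struct lock_table_sketch *t)
{
    atomic_flag_clear_explicit(&t->spin, memory_order_release);
}

/* Return a malloc'd array with one slot per lock and store the slot
 * count in *out_n. Returns NULL if the allocation fails. */
int *snapshot_locks(struct lock_table_sketch *t, int *out_n)
{
    assert(out_n != NULL);
    for (;;) {
        int want, i;
        int *buf;

        /* 1. Read the count under the lock, then drop the lock. */
        table_lock(t);
        want = t->nlocks;
        table_unlock(t);

        /* 2. Allocate with no lock held (allocation may block). */
        buf = malloc((size_t)(want > 0 ? want : 1) * sizeof(*buf));
        if (buf == NULL)
            return NULL;

        /* 3. Retake the lock; only use the buffer if it still fits. */
        table_lock(t);
        if (t->nlocks <= want) {
            for (i = 0; i < t->nlocks; i++)
                buf[i] = i;
            *out_n = t->nlocks;
            table_unlock(t);
            return buf;
        }
        table_unlock(t);

        /* 4. The table grew in the meantime: discard and retry. */
        free(buf);
    }
}
```

The loop only terminates with a buffer that was large enough at the moment the lock was held, which is the property the commit message relies on.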
2010-10-20 | ceph: add pagelist_reserve, pagelist_truncate, pagelist_set_cursor | Greg Farnum | 2 | -9/+118
These facilitate preallocation of pages so that we can encode into the pagelist in an atomic context. Signed-off-by: Greg Farnum <[email protected]> Signed-off-by: Sage Weil <[email protected]>
2010-10-20 | ceph: use mapping->nrpages to determine if mapping is empty | Sage Weil | 1 | -12/+1
This is simpler and faster. Signed-off-by: Sage Weil <[email protected]>
2010-10-20 | ceph: only invalidate on check_caps if we actually have pages | Sage Weil | 1 | -1/+1
The i_rdcache_gen value only implies we MAY have cached pages; actually check the mapping to see if it's worth bothering with an invalidate. Signed-off-by: Sage Weil <[email protected]>
2010-10-20 | ceph: do not hide .snap in root directory | Sage Weil | 1 | -1/+0
Snaps in the root directory are now supported by the MDS, and harmless on older versions. Signed-off-by: Sage Weil <[email protected]>
2010-10-20 | rbd: introduce rados block device (rbd), based on libceph | Yehuda Sadeh | 6 | -2/+1944
The rados block device (rbd), based on osdblk, creates a block device that is backed by objects stored in the Ceph distributed object storage cluster. Each device consists of a single metadata object and data striped over many data objects. The rbd driver supports read-only snapshots. Signed-off-by: Yehuda Sadeh <[email protected]> Signed-off-by: Sage Weil <[email protected]>
2010-10-20 | ceph: factor out libceph from Ceph file system | Yehuda Sadeh | 73 | -1838/+2566
This factors out protocol and low-level storage parts of ceph into a separate libceph module living in net/ceph and include/linux/ceph. This is mostly a matter of moving files around. However, a few key pieces of the interface change as well:
- ceph_client becomes ceph_fs_client and ceph_client, where the latter captures the mon and osd clients, and the fs_client gets the mds client and file system specific pieces.
- Mount option parsing and debugfs setup is correspondingly broken into two pieces.
- The mon client gets a generic handler callback for otherwise unknown messages (mds map, in this case).
- The basic supported/required feature bits can be expanded (and are by ceph_fs_client).
No functional change, aside from some subtle error handling cases that got cleaned up in the refactoring process.
Signed-off-by: Sage Weil <[email protected]>
2010-10-20 | ceph-rbd: osdc support for osd call and rollback operations | Yehuda Sadeh | 3 | -0/+29
This will be used for rbd snapshots administration. Signed-off-by: Yehuda Sadeh <[email protected]>
2010-10-20 | ceph: messenger and osdc changes for rbd | Yehuda Sadeh | 6 | -101/+436
Allow the messenger to send/receive data in a bio. This is added so that we wouldn't need to copy the data into pages or some other buffer when doing IO for an rbd block device. We can now have trailing variable sized data for osd ops. Also osd ops encoding is more modular. Signed-off-by: Yehuda Sadeh <[email protected]> Signed-off-by: Sage Weil <[email protected]>
2010-10-20 | ceph: refactor osdc requests creation functions | Yehuda Sadeh | 2 | -57/+155
The osd request creation functions are being decoupled from the vino parameter, allowing clients using the osd to use arbitrary object names that are not necessarily vino based. Also, calc_raw_layout now takes a snap id. Signed-off-by: Yehuda Sadeh <[email protected]> Signed-off-by: Sage Weil <[email protected]>
2010-10-20 | ceph: lookup pool in osdmap by name | Yehuda Sadeh | 2 | -0/+15
Implement a pool lookup by name. This will be used by rbd. Signed-off-by: Yehuda Sadeh <[email protected]> Signed-off-by: Sage Weil <[email protected]>
2010-10-20 | x86: Spread tlb flush vector between nodes | Shaohua Li | 1 | -1/+47
Currently, flush tlb vector allocation is based on the equation: sender = smp_processor_id() % 8. This isn't optimal: CPUs from different nodes can have the same vector, which causes a lot of lock contention. Instead, we can assign the same vectors to CPUs from the same node, while different nodes get different vectors. This has the following advantages:
a. If there is lock contention, it is between CPUs from one node. This should be much cheaper than contention between nodes.
b. Lock contention between nodes is avoided completely. This especially benefits kswapd, which is the biggest user of tlb flush, since kswapd sets its affinity to a specific node.
In my test, this could reduce CPU overhead by more than 20% in the extreme case. The test machine has 4 nodes and each node has 16 CPUs. I bound each node's kswapd to the first CPU of the node and ran a workload with 4 sequential mmap file read threads. The files are empty sparse files. This workload triggers a lot of page reclaim and tlb flushes. Binding kswapd makes it easy to trigger the extreme tlb flush lock contention, because otherwise kswapd keeps migrating between CPUs of a node and I can't get a stable result. In a real workload we won't always see such heavy tlb flush lock contention, but it's possible.
[ hpa: folded in fix from Eric Dumazet to use this_cpu_read() ]
Signed-off-by: Shaohua Li <[email protected]> LKML-Reference: <[email protected]> Cc: Eric Dumazet <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
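The difference between the two assignment schemes can be sketched as plain arithmetic. This is an illustration under assumptions, not the patch itself: the function names and the exact split (an equal share of the 8 vectors per node, with CPUs in a node spread over that share) are hypothetical.

```c
#include <assert.h>

#define NUM_FLUSH_VECTORS_SKETCH 8

/* Old scheme: vector depends only on the CPU number, so CPUs on
 * different nodes can collide on the same vector. */
int flush_vector_old(int cpu)
{
    return cpu % NUM_FLUSH_VECTORS_SKETCH;
}

/* Node-aware sketch: each node gets its own slice of the vectors and
 * the CPUs of that node are spread within the slice. */
int flush_vector_by_node(int node, int cpu_in_node, int nr_nodes)
{
    int per_node;

    assert(nr_nodes > 0 && node >= 0 && node < nr_nodes);
    assert(cpu_in_node >= 0);

    /* With at least as many nodes as vectors, nodes must share. */
    if (nr_nodes >= NUM_FLUSH_VECTORS_SKETCH)
        return node % NUM_FLUSH_VECTORS_SKETCH;

    per_node = NUM_FLUSH_VECTORS_SKETCH / nr_nodes;
    return node * per_node + cpu_in_node % per_node;
}
```

For the 4-node, 16-CPUs-per-node machine mentioned in the message, the old scheme maps CPU 0 (node 0) and CPU 16 (node 1) to the same vector, while the node-aware sketch gives node 0 the vectors 0 and 1 and node 1 the vectors 2 and 3.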
2010-10-20 | percpu: Introduce a read-mostly percpu API | Shaohua Li | 2 | -0/+13
Add a new readmostly percpu section and API. This can be used to avoid dirtying data lines which are generally not written to, which is especially important for data which may be accessed by processors other than the one to which the percpu area belongs. [ hpa: moved it *after* the page-aligned section, for obvious reasons. ] Signed-off-by: Shaohua Li <[email protected]> LKML-Reference: <[email protected]> Cc: Eric Dumazet <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-10-20 | Linux 2.6.36 | Linus Torvalds | 1 | -1/+1
2010-10-20 | Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus | Linus Torvalds | 7 | -9/+14
* 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus: MIPS: O32 compat/N32: Fix to use compat syscall wrappers for AIO syscalls. MAINTAINERS: Change list for ioc_serial to linux-serial. SERIAL: ioc3_serial: Return -ENOMEM on memory allocation failure MIPS: jz4740: Fix Kbuild Platform file. MIPS: Repair Kbuild make clean breakage.
2010-10-20 | virtio: console: Don't block entire guest if host doesn't read data | Amit Shah | 1 | -3/+14
If the host is slow in reading data or doesn't read data at all, blocking write calls not only blocked the program that called write() but the entire guest itself. To overcome this, let's not block till the host signals it has given back the virtio ring element we passed it. Instead, send the buffer to the host and return to userspace. This operation then becomes similar to how non-blocking writes work, so let's use the existing code for this path as well. This code change also ensures blocking write calls do get blocked if there's not enough room in the virtio ring as well as they don't return -EAGAIN to userspace. Signed-off-by: Amit Shah <[email protected]> Acked-by: Hans de Goede <[email protected]> CC: [email protected] Signed-off-by: Rusty Russell <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-20 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6 | Linus Torvalds | 2 | -3/+3
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6: [SCSI] bsg: fix incorrect device_status value [SCSI] Fix VPD inquiry page wrapper
2010-10-20 | x86, mm: Fix incorrect data type in vmalloc_sync_all() | Borislav Petkov | 1 | -1/+1
arch/x86/mm/fault.c: In function 'vmalloc_sync_all': arch/x86/mm/fault.c:238: warning: assignment makes integer from pointer without a cast introduced by 617d34d9e5d8326ec8f188c616aa06ac59d083fe. Signed-off-by: Borislav Petkov <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-10-20 | spi/omap2_mcspi: Verify TX reg is empty after TX only xfer with DMA | Ilkka Koskinen | 1 | -13/+26
In case of TX only with DMA, the driver assumes that the data has been transferred once the DMA callback is invoked. However, SPI's shift register may still contain data. Thus, the driver is supposed to verify that the register is empty and the end of the SPI transfer has been reached. Signed-off-by: Ilkka Koskinen <[email protected]> Tested-by: Tuomas Katila <[email protected]> Acked-by: Tony Lindgren <[email protected]> Signed-off-by: Grant Likely <[email protected]>
2010-10-20 | spi/omap2_mcspi: disable channel after TX_ONLY transfer in PIO mode | Jason Wang | 1 | -0/+6
In the TX_ONLY transfer, the SPI controller also receives data simultaneously and saves it in the rx register. After the TX_ONLY transfer, the rx register will hold the random data received during the last tx transaction. If the directly following transfer is RX_ONLY, this random data can affect that transfer as follows: when the SPI controller is changed from TX_ONLY to RX_ONLY, the random data makes the rx register full immediately and triggers a dummy write automatically (in SPI RX_ONLY transfers, we need a dummy write to trigger the first transaction). So the first data received in the RX_ONLY transfer will be that random data instead of something meaningful. We can avoid this by inserting a Disable/Re-enable toggle of the channel after the TX_ONLY transfer, since it purges the rx register. Signed-off-by: Jason Wang <[email protected]> Tested-by: Grazvydas Ignotas <[email protected]> Acked-by: Tony Lindgren <[email protected]> Signed-off-by: Grant Likely <[email protected]>
2010-10-20 | Merge branch 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm | Linus Torvalds | 3 | -44/+19
* 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: Fix fs/gs reload oops with invalid ldt
2010-10-20 | arm: remove machine_desc.io_pg_offst and .phys_io | Nicolas Pitre | 361 | -852/+1
Since we're now using addruart to establish the debug mapping, we can remove the io_pg_offst and phys_io members of struct machine_desc. The various declarations were removed using the following script: grep -rl MACHINE_START arch/arm | xargs \ sed -i '/MACHINE_START/,/MACHINE_END/ { /\.\(phys_io\|io_pg_offst\)/d }' [ Initial patch was from Jeremy Kerr, example script from Russell King ] Signed-off-by: Nicolas Pitre <[email protected]> Acked-by: Eric Miao <eric.miao at canonical.com>
2010-10-20 | arm: use addruart macro to establish debug mappings | Jeremy Kerr | 1 | -4/+19
Since we can get both physical and virtual addresses from the addruart macro, we can use this to establish the debug mappings. In the case of CONFIG_DEBUG_ICEDCC, we don't need any mappings, but may still need to setup r7 correctly. Incorporating ASM changes from Nicolas Pitre <[email protected]>. Signed-off-by: Jeremy Kerr <[email protected]> Tested-by: Kevin Hilman <[email protected]>
2010-10-20 | arm: return both physical and virtual addresses from addruart | Jeremy Kerr | 56 | -454/+440
Rather than checking the MMU status in every instance of addruart, do it once in kernel/debug.S, and change the existing addruart macros to return both physical and virtual addresses. The main debug code can then select the appropriate address to use. This will also allow us to retrieve the address of a uart for the MMU state that we're not currently in. Updated with fixes for OMAP from Jason Wang <[email protected]> and Tony Lindgren <[email protected]>, and fix for versatile express from Lorenzo Pieralisi <[email protected]>. Signed-off-by: Jeremy Kerr <[email protected]> Signed-off-by: Lorenzo Pieralisi <[email protected]> Signed-off-by: Jason Wang <[email protected]> Signed-off-by: Tony Lindgren <[email protected]> Tested-by: Kevin Hilman <[email protected]>
2010-10-20 | arm/debug: consolidate addruart macros for CONFIG_DEBUG_ICEDCC | Jeremy Kerr | 1 | -11/+2
We have the same (empty) macro for all ICEDCC flavours, so consolidate it to one. Signed-off-by: Jeremy Kerr <[email protected]>
2010-10-20 | ARM: make struct machine_desc definition coherent with its comment | Nicolas Pitre | 1 | -1/+2
As mentioned in the comment right at the top, the first four fields are directly accessed by assembly code in head.S. Move nr_irqs so the comment is true again. Signed-off-by: Nicolas Pitre <[email protected]>
2010-10-20 | apic, x86: Use BIOS settings for IBS and MCE threshold interrupt LVT offsets | Robert Richter | 4 | -41/+154
We want the BIOS to set up the EILVT APIC registers. The offsets were hardcoded and BIOS settings were overwritten by the OS. Now, the subsystems for MCE threshold and IBS determine the LVT offset from the registers the BIOS has set up. If the BIOS setup is buggy on a family 10h system, a workaround enables IBS. If the OS determines an invalid register setup, a "[Firmware Bug]: " error message is reported. We need this change also for upcoming cpu families. Signed-off-by: Robert Richter <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-10-20 | apic, x86: Check if EILVT APIC registers are available (AMD only) | Robert Richter | 2 | -9/+75
This patch implements checks for the availability of LVT entries (APIC500-530) and reserves them if used. The check becomes necessary since we want to let the BIOS provide the LVT offsets. The offsets should be determined by the subsystems using them, like those for MCE threshold or IBS. On K8 only offset 0 (APIC500) and MCE interrupts are supported. Beginning with family 10h at least 4 offsets are available. Since offsets must be consistent for all cores, we keep track of the LVT offsets in software and reserve the offset for the same vector also to be used on other cores. An offset is freed by setting the entry to APIC_EILVT_MASKED. If the BIOS is right, there should be no conflicts. Otherwise a "[Firmware Bug]: ..." error message is generated. However, if software does not properly determine the offsets, it is not necessarily a BIOS bug. Signed-off-by: Robert Richter <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
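The software bookkeeping described above can be sketched as a small reservation table. This is a simplified, single-threaded illustration with hypothetical names (`eilvt_reserve_sketch`, `EILVT_NR_SKETCH`, `EILVT_MASKED_SKETCH`); the mask bit value is an assumption and the real code has to cope with concurrent callers on several cores.

```c
#include <assert.h>

#define EILVT_NR_SKETCH     4           /* offsets tracked (family 10h) */
#define EILVT_MASKED_SKETCH (1u << 16)  /* assumed "masked" bit */

/* One slot per LVT offset; 0 means the offset is free. */
static unsigned int eilvt_reserved[EILVT_NR_SKETCH];

/* Try to reserve `offset` for `entry`. Returns 1 on success and 0 on
 * a conflict with a different, already reserved entry. Passing only
 * the masked bit frees the offset. */
int eilvt_reserve_sketch(unsigned int offset, unsigned int entry)
{
    unsigned int vector = entry & ~EILVT_MASKED_SKETCH;
    unsigned int rsvd;

    assert(offset < EILVT_NR_SKETCH);
    rsvd = eilvt_reserved[offset];

    if (vector == 0) {
        /* Masked entry: release the offset. */
        eilvt_reserved[offset] = 0;
        return 1;
    }
    if (rsvd != 0 && rsvd != vector)
        return 0;   /* another user holds it: possible firmware bug */

    /* Free, or already reserved for the same vector by another core. */
    eilvt_reserved[offset] = vector;
    return 1;
}
```

Allowing a second reservation with the same value is what keeps the offsets consistent across cores, while a different value on an occupied offset is reported as a conflict.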
2010-10-20 | Merge branch 'linus' into irq/core | Ingo Molnar | 331 | -1493/+2628
Merge reason: update to almost-final-.36 Signed-off-by: Ingo Molnar <[email protected]>
2010-10-19 | PM / Wakeup: Show wakeup sources statistics in debugfs | Rafael J. Wysocki | 1 | -0/+85
There may be wakeup sources that aren't associated with any devices and their statistics information won't be available from sysfs. Also, for debugging purposes it is convenient to have all of the wakeup sources statistics available from one place. For these reasons, introduce new file "wakeup_sources" in debugfs containing those statistics. Signed-off-by: Rafael J. Wysocki <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]>
2010-10-19 | Merge branch 'devel-stable' into devel | Russell King | 685 | -5907/+21747
2010-10-19 | Merge branch 'for-rmk' of git://git.pengutronix.de/git/imx/linux-2.6 into devel-stable | Russell King | 33 | -190/+1101
2010-10-19 | x86, mm: Hold mm->page_table_lock while doing vmalloc_sync | Jeremy Fitzhardinge | 4 | -4/+36
Take mm->page_table_lock while syncing the vmalloc region. This prevents a race with the Xen pagetable pin/unpin code, which expects that the page_table_lock is already held. If this race occurs, then Xen can see an inconsistent page type (a page can either be read/write or a pagetable page, and pin/unpin converts it between them), which will cause either the pin or the set_p[gm]d to fail; either will crash the kernel. vmalloc_sync_all() should be called rarely, so this extra use of page_table_lock should not interfere with its normal users. The mm pointer is stashed in the pgd page's index field, as that won't be otherwise used for pgds. Reported-by: Ian Campbell <[email protected]> Originally-by: Jan Beulich <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Jeremy Fitzhardinge <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-10-19 | x86, mm: Fix bogus whitespace in sync_global_pgds() | Jeremy Fitzhardinge | 1 | -22/+22
Whitespace cleanup only. Signed-off-by: Jeremy Fitzhardinge <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-10-19 | Merge branch 'for-rmk' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung into devel-stable | Russell King | 244 | -2950/+5277
Conflicts: arch/arm/mach-at91/include/mach/system.h arch/arm/mach-imx/mach-cpuimx27.c AT91 conflict resolution: Acked-by: Anders Larsen <[email protected]> IMX conflict resolution confirmed by Uwe Kleine-König.
2010-10-19 | Merge branch 'msm-core' of git://codeaurora.org/quic/kernel/dwalker/linux-msm into devel-stable | Russell King | 42 | -250/+5977
2010-10-19 | Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core | Ingo Molnar | 6 | -131/+373
2010-10-19 | MIPS: O32 compat/N32: Fix to use compat syscall wrappers for AIO syscalls. | Michel Thebeau | 2 | -6/+6
[Ralf: Michel's original patch only fixed N32; I replicated the same fix for O32.] Signed-off-by: Michel Thebeau <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Ralf Baechle <[email protected]>
2010-10-19 | MAINTAINERS: Change list for ioc_serial to linux-serial. | Ralf Baechle | 1 | -1/+1
IOC3 is also being used on SGI MIPS systems but this particular driver is only being used on IA64 systems so linux-mips made no sense as a list. Pat also thinks [email protected] is the better list. Signed-off-by: Ralf Baechle <[email protected]>
2010-10-19 | SERIAL: ioc3_serial: Return -ENOMEM on memory allocation failure | Julia Lawall | 1 | -0/+1
In this code, 0 is returned on memory allocation failure, even though other failures return -ENOMEM or other similar values.
A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/)
    // <smpl>
    @@
    expression ret;
    expression x,e1,e2,e3;
    @@
    ret = 0
    ... when != ret = e1
    *x = \(kmalloc\|kcalloc\|kzalloc\)(...)
    ... when != ret = e2
    if (x == NULL) { ... when != ret = e3
      return ret;
    }
    // </smpl>
Signed-off-by: Julia Lawall <[email protected]>
To: Pat Gefre <[email protected]>
Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected]
Patchwork: https://patchwork.linux-mips.org/patch/1704/
Signed-off-by: Ralf Baechle <[email protected]>
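The bug class that the semantic match looks for can be reproduced in a few lines of userspace C. This is a sketch with hypothetical names (`attach_buggy`, `attach_fixed`, `failing_alloc`), not the ioc3_serial code; the allocator is passed in so that the failure path can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct port_sketch {
    int id;
};

/* Buggy shape: ret is still 0 when the allocation fails, so the
 * caller sees success. */
int attach_buggy(void *(*alloc)(size_t))
{
    int ret = 0;
    struct port_sketch *p;

    assert(alloc != NULL);
    p = alloc(sizeof(*p));
    if (p == NULL)
        goto out;       /* bug: error path returns 0 */
    free(p);
out:
    return ret;
}

/* Fixed shape: set ret before leaving on the failure path. */
int attach_fixed(void *(*alloc)(size_t))
{
    int ret = 0;
    struct port_sketch *p;

    assert(alloc != NULL);
    p = alloc(sizeof(*p));
    if (p == NULL) {
        ret = -ENOMEM;
        goto out;
    }
    free(p);
out:
    return ret;
}

/* Allocator that always fails, to drive the error path. */
void *failing_alloc(size_t n)
{
    (void)n;
    return NULL;
}
```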