aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2010-10-27ext4: call mpage_da_submit_io() from mpage_da_map_blocks()Theodore Ts'o1-33/+33
Eventually we need to completely reorganize the ext4 writepage callpath, but for now, we simplify things a little by calling mpage_da_submit_io() from mpage_da_map_blocks(), since all of the places where we call mpage_da_map_blocks() it is followed up by a call to mpage_da_submit_io(). We're also a wee bit better with respect to error handling, but there are still a number of issues where it's not clear what the right thing is to do with ext4 functions deep in the writeback codepath fails. Signed-off-by: "Theodore Ts'o" <[email protected]>
2010-10-27ext4: use KMEM_CACHE instead of kmem_cache_createTheodore Ts'o2-14/+7
Also remove the SLAB_RECLAIM_ACCOUNT flag from the system zone kmem cache. This slab tends to be fairly static, so it shouldn't be marked as likely to have free pages that can be reclaimed. Signed-off-by: "Theodore Ts'o" <[email protected]>
2010-10-27ext4: use search_dirblock() in ext4_dx_find_entry()Theodore Ts'o1-21/+12
Use the search_dirblock() in ext4_dx_find_entry(). It makes the code easier to read, and it takes advantage of common code. It also saves 100 bytes or so of text space. Signed-off-by: "Theodore Ts'o" <[email protected]> Cc: Brad Spengler <[email protected]>
2010-10-27ext4: avoid uninitialized memory references in ext3_htree_next_block()Theodore Ts'o1-15/+15
If the first block of htree directory is missing '.' or '..' but is otherwise a valid directory, and we do a lookup for '.' or '..', it's possible to dereference an uninitialized memory pointer in ext4_htree_next_block(). We avoid this by moving the special case from ext4_dx_find_entry() to ext4_find_entry(); this also means we can optimize ext4_find_entry() slightly when NFS looks up "..". Thanks to Brad Spengler for pointing a Clang warning that led me to look more closely at this code. The warning was harmless, but it was useful in pointing out code that was too ugly to live. This warning was also reported by Roman Borisov. Signed-off-by: "Theodore Ts'o" <[email protected]> Cc: Brad Spengler <[email protected]>
2010-10-27ext4: remove unused ext4_sb_info membersEric Sandeen1-5/+0
Not that these take up a lot of room, but the structure is long enough as it is, and there's no need to confuse people with these various undocumented & unused structure members... Signed-off-by: Eric Sandeen <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]>
2010-10-27ext4: queue conversion after adding to inode's completed IO listEric Sandeen1-3/+3
By queuing the io end on the unwritten workqueue before adding it to our inode's list of completed IOs, I think we run the risk of the work getting completed, and the IO freed, before we try to add it to the inode's i_completed_io_list. It should be safe to add it to the inode's list of completed IOs, and -then- queue it for completion, I think. Thanks to Dave Chinner for pointing out the race. Signed-off-by: Eric Sandeen <[email protected]> Reviewed-by: Jiaying Zhang <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]>
2010-10-27ext4: don't use ext4_allocation_contexts for tracingEric Sandeen2-91/+41
Many tracepoints were populating an ext4_allocation_context to pass in, but this requires a slab allocation even when tracepoints are off. In fact, 4 of 5 of these allocations were only for tracing. In addition, we were only using a small fraction of the 144 bytes of this structure for this purpose. We can do away with all these alloc/frees of the ac and simply pass in the bits we care about, instead. I tested this by turning on tracing and running through xfstests on x86_64. I did not actually do anything with the trace output, however. Signed-off-by: Eric Sandeen <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]>
2010-10-27ext4: fix oops in trace_ext4_mb_release_group_paEric Sandeen1-3/+0
Our QA reported an oops in the ext4_mb_release_group_pa tracing, and Josef Bacik pointed out that it was because we may have a non-null but uninitialized ac_inode in the allocation context. I can reproduce it when running xfstests with ext4 tracepoints on, on a CONFIG_SLAB_DEBUG kernel. We call trace_ext4_mb_release_group_pa from 2 places, ext4_mb_discard_group_preallocations and ext4_mb_discard_lg_preallocations In both cases we allocate an ac as a container just for tracing (!) and never fill in the ac_inode. There's no reason to be assigning, testing, or printing it as far as I can see, so just remove it from the tracepoint. Signed-off-by: Eric Sandeen <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]>
2010-10-27ext4: fix potential infinite loop in ext4_da_writepages()Toshiyuki Okajima1-1/+1
On linux-2.6.36-rc2, if we execute the following script, we can hang the system when the /bin/sync command is executed: ======================================================================== #!/bin/sh echo -n "HANG UP TEST: " /bin/dd if=/dev/zero of=/tmp/img bs=1k count=1 seek=1M 2> /dev/null /sbin/mkfs.ext4 -Fq /tmp/img /bin/mount -o loop -t ext4 /tmp/img /mnt /bin/dd if=/dev/zero of=/mnt/file bs=1 count=1 \ seek=$((16*1024*1024*1024*1024-4096)) 2> /dev/null /bin/sync /bin/umount /mnt echo "DONE" exit 0 ======================================================================== We can see the following backtrace if we get the kdump when this hangup occurs: ====================================================================== kthread() => bdi_writeback_thread() => wb_do_writeback() => wb_writeback() => writeback_inodes_wb() => writeback_sb_inodes() => writeback_single_inode() => ext4_da_writepages() ---+ ^ infinite | | loop | +-------------+ ====================================================================== The reason why this hangup happens is described as follows: 1) We write the last extent block of the file whose size is the filesystem maximum size. 2) "BH_Delay" flag is set on the buffer_head of its block. 3) - the member, "m_lblk" of struct mpage_da_data is 4294967295 (UINT_MAX) - the member, "m_len" of struct mpage_da_data is 1 mpage_put_bnr_to_bhs() which is called via ext4_da_writepages() cannot clear "BH_Delay" flag of the buffer_head because the type of m_lblk is ext4_lblk_t and then m_lblk + m_len is overflow. Therefore an infinite loop occurs because ext4_da_writepages() cannot write the page (which corresponds to the block) since "BH_Delay" flag isn't cleared. ---------------------------------------------------------------------- static void mpage_put_bnr_to_bhs(struct mpage_da_data *mpd, struct ext4_map_blocks *map) { ... int blocks = map->m_len; ... do { // cur_logical = 4294967295 // map->m_lblk = 4294967295 // blocks = 1 // *** map->m_lblk + blocks == 0 (OVERFLOW!) *** // (cur_logical >= map->m_lblk + blocks) => true if (cur_logical >= map->m_lblk + blocks) break; ---------------------------------------------------------------------- NOTE: Mounting with the nodelalloc option will avoid this codepath, and thus, avoid this hang Signed-off-by: Toshiyuki Okajima <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]>
2010-10-27ext4: improve llseek error handling for overly large seek offsetsToshiyuki Okajima3-2/+45
The llseek system call should return EINVAL if passed a seek offset which results in a write error. What this maximum offset should be depends on whether or not the huge_file file system feature is set, and whether or not the file is extent based or not. If the file has no "EXT4_EXTENTS_FL" flag, the maximum size which can be written (write systemcall) is different from the maximum size which can be sought (lseek systemcall). For example, the following 2 cases demonstrates the differences between the maximum size which can be written, versus the seek offset allowed by the llseek system call: #1: mkfs.ext3 <dev>; mount -t ext4 <dev> #2: mkfs.ext3 <dev>; tune2fs -Oextent,huge_file <dev>; mount -t ext4 <dev> Table. the max file size which we can write or seek at each filesystem feature tuning and file flag setting +============+===============================+===============================+ | \ File flag| | | | \ | !EXT4_EXTENTS_FL | EXT4_EXTETNS_FL | |case \| | | +------------+-------------------------------+-------------------------------+ | #1 | write: 2194719883264 | write: -------------- | | | seek: 2199023251456 | seek: -------------- | +------------+-------------------------------+-------------------------------+ | #2 | write: 4402345721856 | write: 17592186044415 | | | seek: 17592186044415 | seek: 17592186044415 | +------------+-------------------------------+-------------------------------+ The differences exist because ext4 has 2 maxbytes which are sb->s_maxbytes (= extent-mapped maxbytes) and EXT4_SB(sb)->s_bitmap_maxbytes (= block-mapped maxbytes). Although generic_file_llseek uses only extent-mapped maxbytes. (llseek of ext4_file_operations is generic_file_llseek which uses sb->s_maxbytes.) Therefore we create ext4 llseek function which uses 2 maxbytes. The new own function originates from generic_file_llseek(). If the file flag, "EXT4_EXTENTS_FL" is not set, the function alters inode->i_sb->s_maxbytes into EXT4_SB(inode->i_sb)->s_bitmap_maxbytes. Signed-off-by: Toshiyuki Okajima <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]> Cc: Andreas Dilger <[email protected]>
2010-10-27ext4: don't update sb journal_devnum when RO devMaciej Żenczykowski1-1/+1
An ext4 filesystem on a read-only device, with an external journal which is at a different device number then recorded in the superblock will fail to honor the read-only setting of the device and trigger a superblock update (write). For example: - ext4 on a software raid which is in read-only mode - external journal on a read-write device which has changed device num - attempt to mount with -o journal_dev=<new_number> - hits BUG_ON(mddev->ro = 1) in md.c Cc: Theodore Ts'o <[email protected]> Signed-off-by: Maciej Żenczykowski <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]>
2010-10-27ext4: use sb_issue_zeroout in ext4_ext_zerooutLukas Czerner1-62/+7
Change ext4_ext_zeroout to use sb_issue_zeroout instead of its own approach to zero out extents. Signed-off-by: Lukas Czerner <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]>
2010-10-27ext4: use sb_issue_zeroout in setup_new_group_blocksLukas Czerner1-33/+13
Use sb_issue_zeroout to zero out inode table and descriptor table blocks instead of old approach which involves journaling. Signed-off-by: Lukas Czerner <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]>
2010-10-27ext4: add interface to advertise ext4 features in sysfsLukas Czerner3-11/+65
User-space should have the opportunity to check what features doest ext4 support in each particular copy. This adds easy interface by creating new "features" directory in sys/fs/ext4/. In that directory files advertising feature names can be created. Add lazy_itable_init to the feature list. Signed-off-by: Lukas Czerner <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]>
2010-10-27ext4: add support for lazy inode table initializationLukas Czerner4-3/+611
When the lazy_itable_init extended option is passed to mke2fs, it considerably speeds up filesystem creation because inode tables are not zeroed out. The fact that parts of the inode table are uninitialized is not a problem so long as the block group descriptors, which contain information regarding how much of the inode table has been initialized, has not been corrupted However, if the block group checksums are not valid, e2fsck must scan the entire inode table, and the the old, uninitialized data could potentially cause e2fsck to report false problems. Hence, it is important for the inode tables to be initialized as soon as possble. This commit adds this feature so that mke2fs can safely use the lazy inode table initialization feature to speed up formatting file systems. This is done via a new new kernel thread called ext4lazyinit, which is created on demand and destroyed, when it is no longer needed. There is only one thread for all ext4 filesystems in the system. When the first filesystem with inititable mount option is mounted, ext4lazyinit thread is created, then the filesystem can register its request in the request list. This thread then walks through the list of requests picking up scheduled requests and invoking ext4_init_inode_table(). Next schedule time for the request is computed by multiplying the time it took to zero out last inode table with wait multiplier, which can be set with the (init_itable=n) mount option (default is 10). We are doing this so we do not take the whole I/O bandwidth. When the thread is no longer necessary (request list is empty) it frees the appropriate structures and exits (and can be created later later by another filesystem). We do not disturb regular inode allocations in any way, it just do not care whether the inode table is, or is not zeroed. But when zeroing, we have to skip used inodes, obviously. Also we should prevent new inode allocations from the group, while zeroing is on the way. For that we take write alloc_sem lock in ext4_init_inode_table() and read alloc_sem in the ext4_claim_inode, so when we are unlucky and allocator hits the group which is currently being zeroed, it just has to wait. This can be suppresed using the mount option no_init_itable. Signed-off-by: Lukas Czerner <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]>
2010-10-27Add helper function for blkdev_issue_zeroout (sb_issue_discard)Lukas Czerner1-0/+8
This is done the same way as helper sb_issue_discard for blkdev_issue_discard. Signed-off-by: Lukas Czerner <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]>
2010-10-27jbd2: Add sanity check for attempts to start handle during umountTheodore Ts'o2-0/+11
An attempt to modify the file system during the call to jbd2_destroy_journal() can lead to a system lockup. So add some checking to make it much more obvious when this happens to and to determine where the offending code is located. Signed-off-by: "Theodore Ts'o" <[email protected]>
2010-10-27ext4: fix NULL pointer dereference in print_daily_error_info()Sergey Senozhatsky1-0/+1
Fix NULL pointer dereference in print_daily_error_info, when called on unmounted fs (EXT4_SB(sb) returns NULL), by removing error reporting timer in ext4_put_super. Google-Bug-Id: 3017663 Signed-off-by: Sergey Senozhatsky <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]>
2010-10-27ext4: don't hold spinlock while calling ext4_issue_discard()Lukas Czerner1-2/+2
We can't hold the block group spinlock because we ext4_issue_discard() calls wait and hence can get rescheduled. Google-Bug-Id: 3017678 Signed-off-by: Lukas Czerner <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]>
2010-10-27ext4: check for negative error code from sb_issue_discardLukas Czerner1-1/+1
sb_issue_discard() is returning negative error code, so check for -EOPNOTSUPP. Signed-off-by: Lukas Czerner <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]>
2010-10-27ext4: don't bump up LONG_MAX nr_to_write by a factor of 8Eric Sandeen1-3/+6
I'm uneasy with lots of stuff going on in ext4_da_writepages(), but bumping nr_to_write from LLONG_MAX to -8 clearly isn't making anything better, so avoid the multiplier in that case. Signed-off-by: Eric Sandeen <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]>
2010-10-27ext4: stop looping in ext4_num_dirty_pages when max_pages reachedEric Sandeen1-1/+3
Today we simply break out of the inner loop when we have accumulated max_pages; this keeps scanning forwad and doing pagevec_lookup_tag() in the while (!done) loop, this does potentially a lot of work with no net effect. When we have accumulated max_pages, just clean up and return. Signed-off-by: Eric Sandeen <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]>
2010-10-27ext4: use dedicated slab caches for group_info structuresCurt Wohlgemuth2-21/+78
ext4_group_info structures are currently allocated with kmalloc(). With a typical 4K block size, these are 136 bytes each -- meaning they'll each consume a 256-byte slab object. On a system with many ext4 large partitions, that's a lot of wasted kernel slab space. (E.g., a single 1TB partition will have about 8000 block groups, using about 2MB of slab, of which nearly 1MB is wasted.) This patch creates an array of slab pointers created as needed -- depending on the superblock block size -- and uses these slabs to allocate the group info objects. Google-Bug-Id: 2980809 Signed-off-by: Curt Wohlgemuth <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]>
2010-10-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds145-1822/+4008
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (108 commits) ehea: Fixing statistics bonding: Fix lockdep warning after bond_vlan_rx_register() tunnels: Fix tunnels change rcu protection caif-u5500: Build config for CAIF shared mem driver caif-u5500: CAIF shared memory mailbox interface caif-u5500: CAIF shared memory transport protocol caif-u5500: Adding shared memory include drivers/isdn: delete double assignment drivers/net/typhoon.c: delete double assignment drivers/net/sb1000.c: delete double assignment qlcnic: define valid vlan id range qlcnic: reduce rx ring size qlcnic: fix mac learning ehea: fix use after free inetpeer: __rcu annotations fib_rules: __rcu annotates ctarget tunnels: add __rcu annotations net: add __rcu annotations to protocol ipv4: add __rcu annotations to routes.c qlge: bugfix: Restoring the vlan setting. ...
2010-10-28drm/radeon/kms: enable unmappable vram for evergreenAlex Deucher1-0/+2
Evergreen now has blit support, but unmappable vram support was disabled in c919b371cb734f42b1130e706ecee262f8d9261d (drm/radeon/kms: avoid corner case issue with unmappable vram V2) due to merge ordering. This re-enables unmappable vram on evergreen. Signed-off-by: Alex Deucher <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
2010-10-27ext4: avoid null dereference in trace_ext4_mballoc_discardWen Congyang1-2/+3
ac->inode is set to null in function ext4_mb_release_group_pa(), and then trace_ext4_mballoc_discard(ac) is called, the kernel will panic. BUG: unable to handle kernel NULL pointer dereference at 000000a4 IP: [<f87e1714>] ftrace_raw_event_ext4__mballoc+0x54/0xc0 [ext4] *pdpt = 0000000000abd001 *pde = 0000000000000000 Oops: 0000 [#1] SMP Pid: 550, comm: flush-8:16 Not tainted 2.6.36-rc1 #1 SE7320EP2/Altos G530 EIP: 0060:[<f87e1714>] EFLAGS: 00010206 CPU: 1 EIP is at ftrace_raw_event_ext4__mballoc+0x54/0xc0 [ext4] EAX: f32ac840 EBX: f3f1cf88 ECX: f32ac840 EDX: 00000000 ESI: f32ac83c EDI: f880b9d8 EBP: 00000000 ESP: f4b77ae4 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process flush-8:16 (pid: 550, ti=f4b76000 task=f613e540 task.ti=f4b76000) Call Trace: [<f87f5ac1>] ? ext4_mb_release_group_pa+0x121/0x150 [ext4] [<f87f8356>] ? ext4_mb_discard_group_preallocations+0x336/0x400 [ext4] [<f87fb7f1>] ? ext4_mb_new_blocks+0x3d1/0x4f0 [ext4] [<c05a6c5b>] ? __make_request+0x10b/0x440 [<f87f1fb4>] ? ext4_ext_map_blocks+0x1334/0x1980 [ext4] [<c04ac78a>] ? rb_reserve_next_event+0xaa/0x3b0 [<f87d18d6>] ? ext4_map_blocks+0xd6/0x1d0 [ext4] [<f87d2da7>] ? mpage_da_map_blocks+0xc7/0x8a0 [ext4] [<c04c8a68>] ? find_get_pages_tag+0x38/0x110 [<c04d23a5>] ? __pagevec_release+0x15/0x20 [<f87d3ca5>] ? ext4_da_writepages+0x2b5/0x5d0 [ext4] [<c04cfbe0>] ? __writepage+0x0/0x30 [<c04d0e34>] ? do_writepages+0x14/0x30 [<c0526600>] ? writeback_single_inode+0xa0/0x240 [<c0526971>] ? writeback_sb_inodes+0xc1/0x180 [<c0526ab8>] ? writeback_inodes_wb+0x88/0x140 [<c0526d7b>] ? wb_writeback+0x20b/0x320 [<c045aca7>] ? lock_timer_base+0x27/0x50 [<c0526fe0>] ? wb_do_writeback+0x150/0x190 [<c05270a8>] ? bdi_writeback_thread+0x88/0x1f0 [<c043b680>] ? complete+0x40/0x60 [<c0527020>] ? bdi_writeback_thread+0x0/0x1f0 [<c0469474>] ? kthread+0x74/0x80 [<c0469400>] ? kthread+0x0/0x80 [<c040a23e>] ? kernel_thread_helper+0x6/0x10 Signed-off-by: Wen Congyang <[email protected]> Acked-by: Steven Rostedt <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]>
2010-10-28drm/radeon/kms: fix tiled db height calculation on 6xx/7xxAlex Deucher1-6/+8
Calculate height based on the slice bitfield rather than the size. Same as Dave's CB fix. Signed-off-by: Alex Deucher <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
2010-10-28drm/radeon/kms: fix handling of tex lookup disable in cs checker on r2xxAlex Deucher4-0/+7
There are cases when multiple texture units have to be enabled, but not actually used to sample. This patch checks to see if the lookup_disable bit is set and if so, skips the texture check. Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=25544 Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected] Signed-off-by: Dave Airlie <[email protected]>
2010-10-27jbd2: Fix I/O hang in jbd2_journal_release_jbd_inodeBrian King3-6/+12
This fixes a hang seen in jbd2_journal_release_jbd_inode on a lot of Power 6 systems running with ext4. When we get in the hung state, all I/O to the disk in question gets blocked where we stay indefinitely. Looking at the task list, I can see we are stuck in jbd2_journal_release_jbd_inode waiting on a wake up. I added some debug code to detect this scenario and dump additional data if we were stuck in jbd2_journal_release_jbd_inode for longer than 30 minutes. When it hit, I was able to see that i_flags was 0, suggesting we missed the wake up. This patch changes i_flags to be an unsigned long, uses bit operators to access it, and adds barriers around the accesses. Prior to applying this patch, we were regularly hitting this hang on numerous systems in our test environment. After applying the patch, the hangs no longer occur. Signed-off-by: Brian King <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]>
2010-10-27ext4: fix EOFBLOCKS_FL handlingTheodore Ts'o1-29/+69
It turns out we have several problems with how EOFBLOCKS_FL is handled. First of all, there was a fencepost error where we were not clearing the EOFBLOCKS_FL when fill in the last uninitialized block, but rather when we allocate the next block _after_ the uninitalized block. Secondly we were not testing to see if we needed to clear the EOFBLOCKS_FL when writing to the file O_DIRECT or when were converting an uninitialized block (which is the most common case). Google-Bug-Id: 2928259 Signed-off-by: "Theodore Ts'o" <[email protected]>
2010-10-27fasync: Fix placement of FASYNC flag commentLinus Torvalds1-3/+3
In commit f7347ce4ee7c ("fasync: re-organize fasync entry insertion to allow it under a spinlock") Arnd took an earlier patch of mine that had the comment about the FASYNC flag above the wrong function. When the fasync_add_entry() function was split to introduce the new fasync_insert_entry() helper function, the code that actually cares about the FASYNC bit moved to that new helper. So just move the comment to the right point. Signed-off-by: Linus Torvalds <[email protected]>
2010-10-27Merge branch 'flock' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bklLinus Torvalds10-70/+114
* 'flock' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: locks: turn lock_flocks into a spinlock fasync: re-organize fasync entry insertion to allow it under a spinlock locks/nfsd: allocate file lock outside of spinlock lockd: fix nlmsvc_notify_blocked locking lockd: push lock_flocks down
2010-10-27epoll: make epoll_wait() use the hrtimer range featureShawn Bohrer3-17/+22
This make epoll use hrtimers for the timeout value which prevents epoll_wait() from timing out up to a millisecond early. This mirrors the behavior of select() and poll(). Signed-off-by: Shawn Bohrer <[email protected]> Cc: Al Viro <[email protected]> Acked-by: Davide Libenzi <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-27select: rename estimate_accuracy() to select_estimate_accuracy()Andrew Morton1-3/+3
Make it a subsystem-specific identifier because we wish to amke it non-static in the next patch ("epoll: make epoll_wait() use the hrtimer range feature"). Cc: Shawn Bohrer <[email protected]> Cc: Al Viro <[email protected]> Cc: Davide Libenzi <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-27Remove duplicate includes from many filesZimny Lech9-10/+0
Signed-off-by: Zimny Lech <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-27ramoops: use the platform data structure instead of module paramsKyungmin Park2-2/+43
As each board and system has different memory for ramoops. It's better to define the platform data instead of module params. [[email protected]: fix ramoops_remove() return type] Signed-off-by: Kyungmin Park <[email protected]> Cc: Marco Stornelli <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-27kernel/resource.c: handle reinsertion of an already-inserted resourceHuang Shijie1-0/+2
If the same resource is inserted to the resource tree (maybe not on purpose), a dead loop will be created. In this situation, The kernel does not report any warning or error :( The command below will show a endless print. #cat /proc/iomem [[email protected]: add WARN_ON()] Signed-off-by: Huang Shijie <[email protected]> Cc: Jesse Barnes <[email protected]> Cc: Bjorn Helgaas <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-27kfifo: fix kfifo_alloc() to return a signed int valueStefani Seibold1-12/+21
Add a new __kfifo_int_must_check_helper() helper function, which is needed for kfifo_alloc() to return the right signed integer value. The origin __kfifo_must_check_helper() helper was renamed into __kfifo_uint_must_check_helper() to show the sign which is expected and returned. (And revert the temporary disabling of __kfifo_must_check_helper()) Signed-off-by: Stefani Seibold <[email protected]> Acked-by: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-27w1: don't allow arbitrary users to remove w1 devicesBrian Swetland1-4/+4
The search/pullup/add/remove device attributes were 0666 which would allow arbitrary users to affect the 1 wire bus. Change to 0664 to prevent that. I found this patch in the Android tree, apparently this has never been sent upstream so doing it now. Signed-off-by: Brian Swetland <[email protected]> Signed-off-by: Linus Walleij <[email protected]> Cc: Evgeniy Polyakov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-27alpha: remove dma64_addr_t usageFUJITA Tomonori1-2/+2
dma_addr_t is always 64 bit on alpha. So let's use dma_addr_t instead. Signed-off-by: FUJITA Tomonori <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-27mips: remove dma64_addr_t usageFUJITA Tomonori1-1/+1
dma64_addr_t looks pointless (at least there is no point that an architecture has the own dma64_addr_t typedef). dma_addr_t is set to 32 or 64 bits appropriately. You can use u64 at places where you know that 64 bit address is always necessary. Let's use u64 instead for mips. Signed-off-by: FUJITA Tomonori <[email protected]> Cc: Ralf Baechle <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-27sparc: remove dma64_addr_t usageFUJITA Tomonori1-1/+1
dma64_addr_t looks pointless (at least there is no point that an architecture has the own dma64_addr_t typedef). dma_addr_t is set to 32 or 64 bits appropriately. You can use u64 at places where you know that 64 bit address is always necessary. Let's use u64 instead for sparc32. Looks like PCI654_REQUIRED_MASK or PCI64_ADR_BASE isn't used. They can be removed? Signed-off-by: FUJITA Tomonori <[email protected]> Acked-by: David Miller <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-27fuse: use release_pages()Miklos Szeredi2-6/+2
Replace iterated page_cache_release() with release_pages(), which is faster and shorter. Needs release_pages() to be exported to modules. Suggested-by: Andrew Morton <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-27taskstats: use real microsecond granularity for CPU timesMichael Holzheu5-23/+21
The taskstats interface uses microsecond granularity for the user and system time values. The conversion from cputime to the taskstats values uses the cputime_to_msecs primitive which effectively limits the granularity to milliseconds. Add the cputime_to_usecs primitive for architectures that have better, more precise CPU time values. Remove cputime_to_msecs primitive because there are no more users left. Signed-off-by: Michael Holzheu <[email protected]> Acked-by: Balbir Singh <[email protected]> Cc: Luck Tony <[email protected]> Cc: Shailabh Nagar <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Shailabh Nagar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-27taskstats: split fill_pid functionMichael Holzheu1-29/+21
Separate the finding of a task_struct by pid or tgid from filling the taskstats data. This makes the code more readable. Signed-off-by: Michael Holzheu <[email protected]> Acked-by: Balbir Singh <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-27taskstats: separate taskstats commandsMichael Holzheu1-40/+78
Move each taskstats command into a single function. This makes the code more readable and makes it easier to add new commands. Signed-off-by: Michael Holzheu <[email protected]> Acked-by: Balbir Singh <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-27delayacct: align to 8 byte boundary on 64-bit systemsJeff Mahoney1-1/+7
prepare_reply() sets up an skb for the response. The payload contains: +--------------------------------+ | genlmsghdr - 4 bytes | +--------------------------------+ | NLA header - 4 bytes | /* Aggregate header */ +-+------------------------------+ | | NLA header - 4 bytes | /* PID header */ | +------------------------------+ | | pid/tgid - 4 bytes | | +------------------------------+ | | NLA header - 4 bytes | /* stats header */ | + -----------------------------+ <- oops. aligned on 4 byte boundary | | struct taskstats - 328 bytes | +-+------------------------------+ The start of the taskstats struct must be 8 byte aligned on IA64 (and other systems with 8 byte alignment rules for 64-bit types) or runtime alignment warnings will be issued. This patch pads the pid/tgid field out to sizeof(long), which forces the alignment of taskstats. The getdelays userspace code is ok with this since it assumes 32-bit pid/tgid and then honors that header's length field. An array is used to avoid exposing kernel memory contents to userspace in the response. Signed-off-by: Jeff Mahoney <[email protected]> Cc: Balbir Singh <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-27delay-accounting: reimplement -c for getdelays.c to report information on a ↵Mel Gorman1-2/+36
target command Task delay-accounting was identified as one means of determining how long a process spends in congestion_wait() without adding new statistics. For example, if the workload should not be doing IO, delay-accounting could reveal how long it was spending in unexpected IO or delays. Unfortunately, on closer examination it was clear that getdelays does not act as documented. Commit a3baf649 ("per-task-delay-accounting: documentation") added Documentation/accounting/getdelays.c with a -c switch that was documented to fork/exec a child and report statistics on it but for reasons that are unclear to me, commit 9e06d3f9 deleted support for this switch but did not update the documentation. It might be an oversight or it might be because the control flow of the program meant that accounting information would be printed once early in the lifetime of the program making it of limited use. This patch reimplements -c for getdelays.c to act as documented. Unlike the original version, it waits until the command completes before printing any information on it. An example of it being used looks like $ ./getdelays -d -c find /home/mel -name mel print delayacct stats ON /home/mel /home/mel/.notes-wine/drive_c/windows/profiles/mel /home/mel/.wine/drive_c/windows/profiles/mel /home/mel/git-configs/dot.kde/share/apps/konqueror/home/mel PID 5923 CPU count real total virtual total delay total 42779 5051232096 5164722692 564207988 IO count delay total 41727 97804147758 SWAP count delay total 0 0 RECLAIM count delay total 0 0 [[email protected]: coding-style fixes] Signed-off-by: Mel Gorman <[email protected]> Acked-by: Balbir Singh <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-27namespaces Kconfig: move namespace menu location after the cgroupDaniel Lezcano1-52/+52
We have the namespaces as a menuconfig like the cgroup. The cgroup and the namespace are two base bricks for the containers. It is more logical to put the namespace menu right after the cgroup menu. Signed-off-by: Daniel Lezcano <[email protected]> Cc: Li Zefan <[email protected]> Cc: "Serge E. Hallyn" <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Paul Menage <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-27namespaces Kconfig: remove the cgroup device whitelist experimental tagDaniel Lezcano1-1/+0
This subsystem is merged since a long time now, I think we can consider it mature enough. Signed-off-by: Daniel Lezcano <[email protected]> Cc: Li Zefan <[email protected]> Cc: "Serge E. Hallyn" <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Paul Menage <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>