aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2011-01-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linusLinus Torvalds15-30/+203
* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus: Squashfs: simplify CONFIG_SQUASHFS_LZO handling Squashfs: move squashfs_i() definition from squashfs.h Squashfs: get rid of default n in Kconfig Squashfs: add missing check in zlib_wrapper Squashfs: remove unnecessary variable in zlib_wrapper Squashfs: Add XZ compression configuration option Squashfs: add XZ compression support
2011-01-16ACPI: Fix boot problem related to APEI with acpi_disabled setRafael J. Wysocki2-5/+5
Commit 415e12b23792 ("PCI/ACPI: Request _OSC control once for each root bridge (v3)") put the acpi_hest_init() call in acpi_pci_root_init() into a wrong place, presumably because the author confused acpi_pci_disabled with acpi_disabled. Bring the code ordering in acpi_pci_root_init() back to sanity. Additionally, make sure that hest_disable is set when acpi_disabled is set, which is going to prevent acpi_hest_parse(), that still may be executed for acpi_disabled=1 through aer_acpi_firmware_first(), from crashing because of uninitialized hest_tab. Reported-and-tested-by: Andres Salomon <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-01-16PCI / ACPI: Fix build of the AER driver for CONFIG_ACPI unsetRafael J. Wysocki1-3/+0
After commit 415e12b23792 ("PCI/ACPI: Request _OSC control once for each root bridge (v3)") include/linux/pci-acpi.h is included by drivers/pci/pcie/aer/aerdrv.c and if CONFIG_ACPI is unset, the bogus and unnecessary alternative definition of acpi_find_root_bridge_handle() causes a build error to occur. Remove the offending piece of garbage. Reported-and-tested-by: Stephen Rothwell <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-01-16Merge branch 'for-linus' of ↵Linus Torvalds40-854/+1009
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (23 commits) sanitize vfsmount refcounting changes fix old umount_tree() breakage autofs4: Merge the remaining dentry ops tables Unexport do_add_mount() and add in follow_automount(), not ->d_automount() Allow d_manage() to be used in RCU-walk mode Remove a further kludge from __do_follow_link() autofs4: Bump version autofs4: Add v4 pseudo direct mount support autofs4: Fix wait validation autofs4: Clean up autofs4_free_ino() autofs4: Clean up dentry operations autofs4: Clean up inode operations autofs4: Remove unused code autofs4: Add d_manage() dentry operation autofs4: Add d_automount() dentry operation Remove the automount through follow_link() kludge code from pathwalk CIFS: Use d_automount() rather than abusing follow_link() NFS: Use d_automount() rather than abusing follow_link() AFS: Use d_automount() rather than abusing follow_link() Add an AT_NO_AUTOMOUNT flag to suppress terminal automount ...
2011-01-16sanitize vfsmount refcounting changesAl Viro10-117/+75
Instead of splitting refcount between (per-cpu) mnt_count and (SMP-only) mnt_longrefs, make all references contribute to mnt_count again and keep track of how many are longterm ones. Accounting rules for longterm count: * 1 for each fs_struct.root.mnt * 1 for each fs_struct.pwd.mnt * 1 for having non-NULL ->mnt_ns * decrement to 0 happens only under vfsmount lock exclusive That allows nice common case for mntput() - since we can't drop the final reference until after mnt_longterm has reached 0 due to the rules above, mntput() can grab vfsmount lock shared and check mnt_longterm. If it turns out to be non-zero (which is the common case), we know that this is not the final mntput() and can just blindly decrement percpu mnt_count. Otherwise we grab vfsmount lock exclusive and do usual decrement-and-check of percpu mnt_count. For fs_struct.c we have mnt_make_longterm() and mnt_make_shortterm(); namespace.c uses the latter in places where we don't already hold vfsmount lock exclusive and opencodes a few remaining spots where we need to manipulate mnt_longterm. Note that we mostly revert the code outside of fs/namespace.c back to what we used to have; in particular, normal code doesn't need to care about two kinds of references, etc. And we get to keep the optimization Nick's variant had bought us... Signed-off-by: Al Viro <[email protected]>
2011-01-16fix old umount_tree() breakageAl Viro1-3/+5
Expiry-related code calls umount_tree() several times with the same list to collect vfsmounts to. Which is fine, except that umount_tree() implicitly assumed that the list would be empty on each call - it moves the victims over there and then iterates through the list kicking them out. It's *almost* idempotent, so everything nearly worked. However, mnt->ghosts handling (and thus expirability checks) had been broken - that part was not idempotent... The fix is trivial - use local temporary list, splice it to the the collector list when we are through. Signed-off-by: Al Viro <[email protected]>
2011-01-16btrfs: Require CAP_SYS_ADMIN for filesystem rebalanceBen Hutchings1-0/+4
Filesystem rebalancing (BTRFS_IOC_BALANCE) affects the entire filesystem and may run uninterruptibly for a long time. This does not seem to be something that an unprivileged user should be able to do. Reported-by: Aron Xu <[email protected]> Signed-off-by: Ben Hutchings <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2011-01-16Btrfs: don't warn if we get ENOSPC in btrfs_block_rsv_checkJosef Bacik1-5/+0
If we run low on space we could get a bunch of warnings out of btrfs_block_rsv_check, but this is mostly just called via the transaction code to see if we need to end the transaction, it expects to see failures, so let's not WARN and freak everybody out for no reason. Thanks, Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2011-01-16btrfs: Fix memory leak in btrfs_read_fs_root_no_radix()Tsutomu Itoh1-0/+1
In btrfs_read_fs_root_no_radix(), 'root' is not freed if btrfs_search_slot() returns error. Signed-off-by: Tsutomu Itoh <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2011-01-16btrfs: check NULL or notTsutomu Itoh3-0/+16
Should check if functions returns NULL or not. Signed-off-by: Tsutomu Itoh <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2011-01-16btrfs: Don't pass NULL ptr to func that may deref it.Jesper Juhl1-0/+2
Hi, In fs/btrfs/inode.c::fixup_tree_root_location() we have this code: ... if (!path) { err = -ENOMEM; goto out; } ... out: btrfs_free_path(path); return err; btrfs_free_path() passes its argument on to other functions and some of them end up dereferencing the pointer. In the code above that pointer is clearly NULL, so btrfs_free_path() will eventually cause a NULL dereference. There are many ways to cut this cake (fix the bug). The one I chose was to make btrfs_free_path() deal gracefully with NULL pointers. If you disagree, feel free to come up with an alternative patch. Signed-off-by: Jesper Juhl <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2011-01-16btrfs: mount failure return value fixDave Young2-4/+8
I happened to pass swap partition as root partition in cmdline, then kernel panic and tell me about "Cannot open root device". It is not correct, in fact it is a fs type mismatch instead of 'no device'. Eventually I found btrfs mounting failed with -EIO, it should be -EINVAL. The logic in init/do_mounts.c: for (p = fs_names; *p; p += strlen(p)+1) { int err = do_mount_root(name, p, flags, root_mount_data); switch (err) { case 0: goto out; case -EACCES: flags |= MS_RDONLY; goto retry; case -EINVAL: continue; } print "Cannot open root device" panic } SO fs type after btrfs will have no chance to mount Here fix the return value as -EINVAL Signed-off-by: Dave Young <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2011-01-16btrfs: Mem leak in btrfs_get_acl()Jesper Juhl1-1/+3
It seems to me that we leak the memory allocated to 'value' in btrfs_get_acl() if the call to posix_acl_from_xattr() fails. Here's a patch that attempts to correct that problem. Signed-off-by: Jesper Juhl <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2011-01-16btrfs: fix wrong free space information of btrfsMiao Xie5-7/+286
When we store data by raid profile in btrfs with two or more different size disks, df command shows there is some free space in the filesystem, but the user can not write any data in fact, df command shows the wrong free space information of btrfs. # mkfs.btrfs -d raid1 /dev/sda9 /dev/sda10 # btrfs-show Label: none uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64 Total devices 2 FS bytes used 28.00KB devid 1 size 5.01GB used 2.03GB path /dev/sda9 devid 2 size 10.00GB used 2.01GB path /dev/sda10 # btrfs device scan /dev/sda9 /dev/sda10 # mount /dev/sda9 /mnt # dd if=/dev/zero of=tmpfile0 bs=4K count=9999999999 (fill the filesystem) # sync # df -TH Filesystem Type Size Used Avail Use% Mounted on /dev/sda9 btrfs 17G 8.6G 5.4G 62% /mnt # btrfs-show Label: none uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64 Total devices 2 FS bytes used 3.99GB devid 1 size 5.01GB used 5.01GB path /dev/sda9 devid 2 size 10.00GB used 4.99GB path /dev/sda10 It is because btrfs cannot allocate chunks when one of the pairing disks has no space, the free space on the other disks can not be used for ever, and should be subtracted from the total space, but btrfs doesn't subtract this space from the total. It is strange to the user. This patch fixes it by calcing the free space that can be used to allocate chunks. Implementation: 1. get all the devices free space, and align them by stripe length. 2. sort the devices by the free space. 3. check the free space of the devices, 3.1. if it is not zero, and then check the number of the devices that has more free space than this device, if the number of the devices is beyond the min stripe number, the free space can be used, and add into total free space. if the number of the devices is below the min stripe number, we can not use the free space, the check ends. 3.2. if the free space is zero, check the next devices, goto 3.1 This implementation is just likely fake chunk allocation. After appling this patch, df can show correct space information: # df -TH Filesystem Type Size Used Avail Use% Mounted on /dev/sda9 btrfs 17G 8.6G 0 100% /mnt Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2011-01-16btrfs: make the chunk allocator utilize the devices betterMiao Xie2-103/+300
With this patch, we change the handling method when we can not get enough free extents with default size. Implementation: 1. Look up the suitable free extent on each device and keep the search result. If not find a suitable free extent, keep the max free extent 2. If we get enough suitable free extents with default size, chunk allocation succeeds. 3. If we can not get enough free extents, but the number of the extent with default size is >= min_stripes, we just change the mapping information (reduce the number of stripes in the extent map), and chunk allocation succeeds. 4. If the number of the extent with default size is < min_stripes, sort the devices by its max free extent's size descending 5. Use the size of the max free extent on the (num_stripes - 1)th device as the stripe size to allocate the device space By this way, the chunk allocator can allocate chunks as large as possible when the devices' space is not enough and make full use of the devices. Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2011-01-16btrfs: restructure find_free_dev_extent()Miao Xie2-68/+91
- make it return the start position and length of the max free space when it can not find a suitable free space. - make it more readability Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2011-01-16btrfs: fix wrong calculation of stripe sizeMiao Xie1-2/+8
There are two tiny problem: - One is When we check the chunk size is greater than the max chunk size or not, we should take mirrors into account, but the original code didn't. - The other is btrfs shouldn't use the size of the residual free space as the length of of a dup chunk when doing chunk allocation. It is because the device space that a dup chunk needs is twice as large as the chunk size, if we use the size of the residual free space as the length of a dup chunk, we can not get enough free space. Fix it. Signed-off-by: Miao Xie <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2011-01-16btrfs: try to reclaim some space when chunk allocation failsMiao Xie1-2/+7
We cannot write data into files when when there is tiny space in the filesystem. Reproduce steps: # mkfs.btrfs /dev/sda1 # mount /dev/sda1 /mnt # dd if=/dev/zero of=/mnt/tmpfile0 bs=4K count=1 # dd if=/dev/zero of=/mnt/tmpfile1 bs=4K count=99999999999999 (fill the filesystem) # umount /mnt # mount /dev/sda1 /mnt # rm -f /mnt/tmpfile0 # dd if=/dev/zero of=/mnt/tmpfile0 bs=4K count=1 (failed with nospec) But if we do the last step again, we can write data successfully. The reason of the problem is that btrfs didn't try to commit the current transaction and reclaim some space when chunk allocation failed. This patch fixes it by committing the current transaction to reclaim some space when chunk allocation fails. Signed-off-by: Miao Xie <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2011-01-16btrfs: fix wrong data space statisticsMiao Xie1-4/+3
Josef has implemented mixed data/metadata chunks, we must add those chunks' space just like data chunks. Signed-off-by: Miao Xie <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2011-01-16fs/btrfs: Fix build of ctreeStefan Schmidt1-0/+1
CC [M] fs/btrfs/ctree.o In file included from fs/btrfs/ctree.c:21:0: fs/btrfs/ctree.h:1003:17: error: field <91>super_kobj<92> has incomplete type fs/btrfs/ctree.h:1074:17: error: field <91>root_kobj<92> has incomplete type make[2]: *** [fs/btrfs/ctree.o] Error 1 make[1]: *** [fs/btrfs] Error 2 make: *** [fs] Error 2 We need to include kobject.h here. Reported-by: Jeff Garzik <[email protected]> Fix-suggested-by: Li Zefan <[email protected]> Signed-off-by: Stefan Schmidt <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2011-01-16Merge branch 'lzo-support' of git://repo.or.cz/linux-btrfs-devel into btrfs-38Chris Mason20-385/+1051
2011-01-16Merge branch 'readonly-snapshots' of git://repo.or.cz/linux-btrfs-devel into ↵Chris Mason7-49/+195
btrfs-38
2011-01-16microblaze: Fix asm/pgtable.hMichal Simek1-2/+4
Function ptep_test_and_clear_young have had wrong the first argument. It is also necessary to add __HAVE macros for ptep_test_and_clear_young and ptep_get_and_clear functions. Error log: In file included from linux/arch/microblaze/include/asm/pgtable.h:570, from arch/microblaze/mm/pgtable.c:35: include/asm-generic/pgtable.h:23: error: conflicting types for 'ptep_test_and_clear_young' linux/arch/microblaze/include/asm/pgtable.h:449: error: previous definition of 'ptep_test_and_clear_young' was here include/asm-generic/pgtable.h:73: error: redefinition of 'ptep_get_and_clear' linux/arch/microblaze/include/asm/pgtable.h:462: error: previous definition of 'ptep_get_and_clear' was here Signed-off-by: Michal Simek <[email protected]>
2011-01-16microblaze: Fix missing pagemap.hMichal Simek1-0/+1
Add missing linux/pagemap.h to solve compilation error. Error log: In file included from linux/arch/microblaze/include/asm/tlb.h:17, from mm/pgtable-generic.c:9: include/asm-generic/tlb.h: In function 'tlb_flush_mmu': include/asm-generic/tlb.h:76: error: implicit declaration of function 'release_pages' include/asm-generic/tlb.h: In function 'tlb_remove_page': include/asm-generic/tlb.h:105: error: implicit declaration of function 'page_cache_release' Signed-off-by: Michal Simek <[email protected]>
2011-01-15dt/flattree: Return virtual address from early_init_dt_alloc_memory_arch()Grant Likely5-16/+8
The physical address is never used by the device tree code when allocating memory for unflattening. Change the architecture's alloc hook to return the virutal address instead. Signed-off-by: Grant Likely <[email protected]>
2011-01-15autofs4: Merge the remaining dentry ops tablesDavid Howells3-20/+4
Merge the remaining autofs4 dentry ops tables. It doesn't matter if d_automount and d_manage are present on something that's not mountable or holdable as these ops are only used if the appropriate flags are set in dentry->d_flags. [AV] switch to ->s_d_op, since now _everything_ on autofs4 is using the same dentry_operations. Signed-off-by: David Howells <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-01-15Unexport do_add_mount() and add in follow_automount(), not ->d_automount()David Howells8-89/+101
Unexport do_add_mount() and make ->d_automount() return the vfsmount to be added rather than calling do_add_mount() itself. follow_automount() will then do the addition. This slightly complicates things as ->d_automount() normally wants to add the new vfsmount to an expiration list and start an expiration timer. The problem with that is that the vfsmount will be deleted if it has a refcount of 1 and the timer will not repeat if the expiration list is empty. To this end, we require the vfsmount to be returned from d_automount() with a refcount of (at least) 2. One of these refs will be dropped unconditionally. In addition, follow_automount() must get a 3rd ref around the call to do_add_mount() lest it eat a ref and return an error, leaving the mount we have open to being expired as we would otherwise have only 1 ref on it. d_automount() should also add the the vfsmount to the expiration list (by calling mnt_set_expiry()) and start the expiration timer before returning, if this mechanism is to be used. The vfsmount will be unlinked from the expiration list by follow_automount() if do_add_mount() fails. This patch also fixes the call to do_add_mount() for AFS to propagate the mount flags from the parent vfsmount. Signed-off-by: David Howells <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-01-15Allow d_manage() to be used in RCU-walk modeDavid Howells5-12/+22
Allow d_manage() to be called from pathwalk when it is in RCU-walk mode as well as when it is in Ref-walk mode. This permits __follow_mount_rcu() to call d_manage() directly. d_manage() needs a parameter to indicate that it is in RCU-walk mode as it isn't allowed to sleep if in that mode (but should return -ECHILD instead). autofs4_d_manage() can then be set to retain RCU-walk mode if the daemon accesses it and otherwise request dropping back to ref-walk mode. Signed-off-by: David Howells <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-01-15Remove a further kludge from __do_follow_link()David Howells1-6/+2
Remove a further kludge from __do_follow_link() as it's no longer required with the automount code. This reverts the non-helper-function parts of 051d381259eb57d6074d02a6ba6e90e744f1a29f, which breaks union mounts. Reported-by: [email protected] Signed-off-by: David Howells <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-01-15autofs4: Bump versionIan Kent1-1/+1
Increase the autofs module sub-version so we can tell what kernel implementation is being used from user space debug logging. Signed-off-by: Ian Kent <[email protected]> Signed-off-by: David Howells <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-01-15autofs4: Add v4 pseudo direct mount supportIan Kent1-0/+58
Version 4 of autofs provides a pseudo direct mount implementation that relies on directories at the leaves of a directory tree under an indirect mount to trigger mounts. This patch adds support for that functionality. Signed-off-by: Ian Kent <[email protected]> Signed-off-by: David Howells <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-01-15autofs4: Fix wait validationIan Kent1-1/+16
It is possible for the check in wait.c:validate_request() to return an incorrect result if the dentry that was mounted upon has changed during the callback. Signed-off-by: Ian Kent <[email protected]> Signed-off-by: David Howells <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-01-15autofs4: Clean up autofs4_free_ino()Ian Kent2-22/+0
When this function is called the local reference count does't need to be updated since the dentry is going away and dput definitely must not be called here. Also the autofs info struct field inode isn't used so remove it. Signed-off-by: Ian Kent <[email protected]> Signed-off-by: David Howells <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-01-15autofs4: Clean up dentry operationsIan Kent3-29/+26
There are now two distinct dentry operations uses. One for dentrys that trigger mounts and one for dentrys that do not. Rationalize the use of these dentry operations and rename them to reflect their function. Signed-off-by: Ian Kent <[email protected]> Signed-off-by: David Howells <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-01-15autofs4: Clean up inode operationsIan Kent3-21/+1
Since the use of ->follow_link() has been eliminated there is no need to separate the indirect and direct inode operations. Signed-off-by: Ian Kent <[email protected]> Signed-off-by: David Howells <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-01-15autofs4: Remove unused codeIan Kent2-250/+0
Remove code that is not used due to the use of ->d_automount() and ->d_manage(). Signed-off-by: Ian Kent <[email protected]> Signed-off-by: David Howells <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-01-15autofs4: Add d_manage() dentry operationIan Kent4-40/+159
This patch required a previous patch to add the ->d_automount() dentry operation. Add a function to use the newly defined ->d_manage() dentry operation for blocking during mount and expire. Whether the VFS calls the dentry operations d_automount() and d_manage() is controled by the DMANAGED_AUTOMOUNT and DMANAGED_TRANSIT flags. autofs uses the d_automount() operation to callback to user space to request mount operations and the d_manage() operation to block walks into mounts that are under construction or destruction. In order to prevent these functions from being called unnecessarily the DMANAGED_* flags are cleared for cases which would cause this. In the common case the DMANAGED_AUTOMOUNT and DMANAGED_TRANSIT flags are both set for dentrys waiting to be mounted. The DMANAGED_TRANSIT flag is cleared upon successful mount request completion and set during expire runs, both during the dentry expire check, and if selected for expire, is left set until a subsequent successful mount request completes. The exception to this is the so-called rootless multi-mount which has no actual mount at its base. In this case the DMANAGED_AUTOMOUNT flag is cleared upon successful mount request completion as well and set again after a successful expire. Signed-off-by: Ian Kent <[email protected]> Signed-off-by: David Howells <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-01-15autofs4: Add d_automount() dentry operationIan Kent4-112/+189
Add a function to use the newly defined ->d_automount() dentry operation for triggering mounts instead of doing the user space callback in ->lookup() and ->d_revalidate(). Note, to be useful the subsequent patch to add the ->d_manage() dentry operation is also needed so the discussion of functionality is deferred to that patch. Signed-off-by: Ian Kent <[email protected]> Signed-off-by: David Howells <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-01-15Remove the automount through follow_link() kludge code from pathwalkDavid Howells1-14/+3
Remove the automount through follow_link() kludge code from pathwalk in favour of using d_automount(). Signed-off-by: David Howells <[email protected]> Acked-by: Ian Kent <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-01-15CIFS: Use d_automount() rather than abusing follow_link()David Howells4-67/+80
Make CIFS use the new d_automount() dentry operation rather than abusing follow_link() on directories. [NOTE: THIS IS UNTESTED!] Signed-off-by: David Howells <[email protected]> Cc: Steve French <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-01-15NFS: Use d_automount() rather than abusing follow_link()David Howells5-48/+46
Make NFS use the new d_automount() dentry operation rather than abusing follow_link() on directories. Signed-off-by: David Howells <[email protected]> Acked-by: Trond Myklebust <[email protected]> Acked-by: Ian Kent <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-01-15AFS: Use d_automount() rather than abusing follow_link()David Howells4-30/+19
Make AFS use the new d_automount() dentry operation rather than abusing follow_link() on directories. Signed-off-by: David Howells <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-01-15Add an AT_NO_AUTOMOUNT flag to suppress terminal automountDavid Howells4-1/+12
Add an AT_NO_AUTOMOUNT flag to suppress terminal automounting of automount point directories. This can be used by fstatat() users to permit the gathering of attributes on an automount point and also prevent mass-automounting of a directory of automount points by ls. Signed-off-by: David Howells <[email protected]> Acked-by: Ian Kent <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-01-15Add a dentry op to allow processes to be held during pathwalk transitDavid Howells15-50/+126
Add a dentry op (d_manage) to permit a filesystem to hold a process and make it sleep when it tries to transit away from one of that filesystem's directories during a pathwalk. The operation is keyed off a new dentry flag (DCACHE_MANAGE_TRANSIT). The filesystem is allowed to be selective about which processes it holds and which it permits to continue on or prohibits from transiting from each flagged directory. This will allow autofs to hold up client processes whilst letting its userspace daemon through to maintain the directory or the stuff behind it or mounted upon it. The ->d_manage() dentry operation: int (*d_manage)(struct path *path, bool mounting_here); takes a pointer to the directory about to be transited away from and a flag indicating whether the transit is undertaken by do_add_mount() or do_move_mount() skipping through a pile of filesystems mounted on a mountpoint. It should return 0 if successful and to let the process continue on its way; -EISDIR to prohibit the caller from skipping to overmounted filesystems or automounting, and to use this directory; or some other error code to return to the user. ->d_manage() is called with namespace_sem writelocked if mounting_here is true and no other locks held, so it may sleep. However, if mounting_here is true, it may not initiate or wait for a mount or unmount upon the parameter directory, even if the act is actually performed by userspace. Within fs/namei.c, follow_managed() is extended to check with d_manage() first on each managed directory, before transiting away from it or attempting to automount upon it. follow_down() is renamed follow_down_one() and should only be used where the filesystem deliberately intends to avoid management steps (e.g. autofs). A new follow_down() is added that incorporates the loop done by all other callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS and CIFS do use it, their use is removed by converting them to use d_automount()). The new follow_down() calls d_manage() as appropriate. It also takes an extra parameter to indicate if it is being called from mount code (with namespace_sem writelocked) which it passes to d_manage(). follow_down() ignores automount points so that it can be used to mount on them. __follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to sleep. It would be possible to enter d_manage() in rcu-walk mode too, and have that determine whether to abort or not itself. That would allow the autofs daemon to continue on in rcu-walk mode. Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't required as every tranist from that directory will cause d_manage() to be invoked. It can always be set again when necessary. ========================== WHAT THIS MEANS FOR AUTOFS ========================== Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to trigger the automounting of indirect mounts, and both of these can be called with i_mutex held. autofs knows that the i_mutex will be held by the caller in lookup(), and so can drop it before invoking the daemon - but this isn't so for d_revalidate(), since the lock is only held on _some_ of the code paths that call it. This means that autofs can't risk dropping i_mutex from its d_revalidate() function before it calls the daemon. The bug could manifest itself as, for example, a process that's trying to validate an automount dentry that gets made to wait because that dentry is expired and needs cleaning up: mkdir S ffffffff8014e05a 0 32580 24956 Call Trace: [<ffffffff885371fd>] :autofs4:autofs4_wait+0x674/0x897 [<ffffffff80127f7d>] avc_has_perm+0x46/0x58 [<ffffffff8009fdcf>] autoremove_wake_function+0x0/0x2e [<ffffffff88537be6>] :autofs4:autofs4_expire_wait+0x41/0x6b [<ffffffff88535cfc>] :autofs4:autofs4_revalidate+0x91/0x149 [<ffffffff80036d96>] __lookup_hash+0xa0/0x12f [<ffffffff80057a2f>] lookup_create+0x46/0x80 [<ffffffff800e6e31>] sys_mkdirat+0x56/0xe4 versus the automount daemon which wants to remove that dentry, but can't because the normal process is holding the i_mutex lock: automount D ffffffff8014e05a 0 32581 1 32561 Call Trace: [<ffffffff80063c3f>] __mutex_lock_slowpath+0x60/0x9b [<ffffffff8000ccf1>] do_path_lookup+0x2ca/0x2f1 [<ffffffff80063c89>] .text.lock.mutex+0xf/0x14 [<ffffffff800e6d55>] do_rmdir+0x77/0xde [<ffffffff8005d229>] tracesys+0x71/0xe0 [<ffffffff8005d28d>] tracesys+0xd5/0xe0 which means that the system is deadlocked. This patch allows autofs to hold up normal processes whilst the daemon goes ahead and does things to the dentry tree behind the automouter point without risking a deadlock as almost no locks are held in d_manage() and none in d_automount(). Signed-off-by: David Howells <[email protected]> Was-Acked-by: Ian Kent <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-01-15Add a dentry op to handle automounting rather than abusing follow_link()David Howells6-58/+198
Add a dentry op (d_automount) to handle automounting directories rather than abusing the follow_link() inode operation. The operation is keyed off a new dentry flag (DCACHE_NEED_AUTOMOUNT). This also makes it easier to add an AT_ flag to suppress terminal segment automount during pathwalk and removes the need for the kludge code in the pathwalk algorithm to handle directories with follow_link() semantics. The ->d_automount() dentry operation: struct vfsmount *(*d_automount)(struct path *mountpoint); takes a pointer to the directory to be mounted upon, which is expected to provide sufficient data to determine what should be mounted. If successful, it should return the vfsmount struct it creates (which it should also have added to the namespace using do_add_mount() or similar). If there's a collision with another automount attempt, NULL should be returned. If the directory specified by the parameter should be used directly rather than being mounted upon, -EISDIR should be returned. In any other case, an error code should be returned. The ->d_automount() operation is called with no locks held and may sleep. At this point the pathwalk algorithm will be in ref-walk mode. Within fs/namei.c itself, a new pathwalk subroutine (follow_automount()) is added to handle mountpoints. It will return -EREMOTE if the automount flag was set, but no d_automount() op was supplied, -ELOOP if we've encountered too many symlinks or mountpoints, -EISDIR if the walk point should be used without mounting and 0 if successful. The path will be updated to point to the mounted filesystem if a successful automount took place. __follow_mount() is replaced by follow_managed() which is more generic (especially with the patch that adds ->d_manage()). This handles transits from directories during pathwalk, including automounting and skipping over mountpoints (and holding processes with the next patch). __follow_mount_rcu() will jump out of RCU-walk mode if it encounters an automount point with nothing mounted on it. follow_dotdot*() does not handle automounts as you don't want to trigger them whilst following "..". I've also extracted the mount/don't-mount logic from autofs4 and included it here. It makes the mount go ahead anyway if someone calls open() or creat(), tries to traverse the directory, tries to chdir/chroot/etc. into the directory, or sticks a '/' on the end of the pathname. If they do a stat(), however, they'll only trigger the automount if they didn't also say O_NOFOLLOW. I've also added an inode flag (S_AUTOMOUNT) so that filesystems can mark their inodes as automount points. This flag is automatically propagated to the dentry as DCACHE_NEED_AUTOMOUNT by __d_instantiate(). This saves NFS and could save AFS a private flag bit apiece, but is not strictly necessary. It would be preferable to do the propagation in d_set_d_op(), but that doesn't normally have access to the inode. [AV: fixed breakage in case if __follow_mount_rcu() fails and nameidata_drop_rcu() succeeds in RCU case of do_lookup(); we need to fall through to non-RCU case after that, rather than just returning with ungrabbed *path] Signed-off-by: David Howells <[email protected]> Was-Acked-by: Ian Kent <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-01-15do_lookup() fixAl Viro1-0/+3
do_lookup() has a path leading from LOOKUP_RCU case to non-RCU crossing of mountpoints, which breaks things badly. If we hit need_revalidate: and do nothing in there, we need to come back into LOOKUP_RCU half of things, not to done: in non-RCU one. Signed-off-by: Al Viro <[email protected]>
2011-01-15Merge branch 'slab/urgent' of ↵Linus Torvalds5-7/+17
git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 * 'slab/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6: Update Pekka's email address in MAINTAINERS mm/slab.c: make local symbols static slub: Avoid use of slub_lock in show_slab_objects() memory hotplug: one more lock on memory hotplug
2011-01-15Merge branches 'core-fixes-for-linus', 'x86-fixes-for-linus', ↵Linus Torvalds18-79/+84
'timers-fixes-for-linus' and 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rcu: avoid pointless blocked-task warnings rcu: demote SRCU_SYNCHRONIZE_DELAY from kernel-parameter status rtmutex: Fix comment about why new_owner can be NULL in wake_futex_pi() * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, olpc: Add missing Kconfig dependencies x86, mrst: Set correct APB timer IRQ affinity for secondary cpu x86: tsc: Fix calibration refinement conditionals to avoid divide by zero x86, ia64, acpi: Clean up x86-ism in drivers/acpi/numa.c * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: timekeeping: Make local variables static time: Rename misnamed minsec argument of clocks_calc_mult_shift() * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing: Remove syscall_exit_fields tracing: Only process module tracepoints once perf record: Add "nodelay" mode, disabled by default perf sched: Fix list of events, dropping unsupported ':r' modifier Revert "perf tools: Emit clearer message for sys_perf_event_open ENOENT return" perf top: Fix annotate segv perf evsel: Fix order of event list deletion
2011-01-15Merge branch 'devel-stable' of master.kernel.org:/home/rmk/linux-2.6-armLinus Torvalds250-2398/+6535
* 'devel-stable' of master.kernel.org:/home/rmk/linux-2.6-arm: (161 commits) ARM: pxa: fix building issue of missing physmap.h ARM: mmp: PXA910 drive strength FAST using wrong value ARM: mmp: MMP2 drive strength FAST using wrong value ARM: pxa: fix recursive calls in pxa_low_gpio_chip AT91: Support for gsia18s board AT91: Acme Systems FOX Board G20 board files AT91: board-sam9m10g45ek.c: Remove duplicate inclusion of mach/hardware.h ARM: pxa: fix suspend/resume array index miscalculation ARM: pxa: use cpu_has_ipr() consistently in irq.c ARM: pxa: remove unused variable in clock-pxa3xx.c ARM: pxa: fix warning in zeus.c ARM: sa1111: fix typo in sa1111_retrigger_lowirq() ARM mxs: clkdev related compile fixes ARM i.MX mx31_3ds: Fix MC13783 regulator names ARM: plat-stmp3xxx: irq_data conversion. ARM: plat-spear: irq_data conversion. ARM: plat-orion: irq_data conversion. ARM: plat-omap: irq_data conversion. ARM: plat-nomadik: irq_data conversion. ARM: plat-mxc: irq_data conversion. ... Fix up trivial conflict in arch/arm/plat-omap/gpio.c (Lennert Buytenhek's irq_data conversion clashing with some omap irq updates)
2011-01-15Merge branch 'fixes' of master.kernel.org:/home/rmk/linux-2.6-armLinus Torvalds21-34/+52
* 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm: ARM: fix missing branch in __error_a ARM: fix /proc/$PID/stack on SMP ARM: Fix build regression on SA11x0, PXA, and H720x targets ARM: 6625/1: use memblock memory regions for "System RAM" I/O resources ARM: fix wrongly patched constants ARM: 6624/1: fix dependency for CONFIG_SMP_ON_UP ARM: 6623/1: Thumb-2: Fix out-of-range offset for Thumb-2 in proc-v7.S ARM: 6622/1: fix dma_unmap_sg() documentation ARM: 6621/1: bitops: remove condition code clobber for CLZ ARM: 6620/1: Change misleading warning when CONFIG_CMDLINE_FORCE is used ARM: 6619/1: nommu: avoid mapping vectors page when !CONFIG_MMU ARM: sched_clock: make minsec argument to clocks_calc_mult_shift() zero ARM: sched_clock: allow init_sched_clock() to be called early ARM: integrator: fix compile warning in cpu.c ARM: 6616/1: Fix ep93xx-fb init/exit annotations ARM: twd: fix display of twd frequency ARM: udelay: prevent math rounding resulting in short udelays