aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2013-11-06f2fs: avoid to use a NULL point in destroy_segment_managerChao Yu1-0/+2
A NULL point should avoid to be used in destroy_segment_manager after allocating memory fail for f2fs_sm_info. Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2013-11-06Merge branch 'sched/core' into core/locking, to prepare the kernel/locking/ ↵Ingo Molnar2-0/+3
file move Conflicts: kernel/Makefile There are conflicts in kernel/Makefile due to file moving in the scheduler tree - resolve them. Signed-off-by: Ingo Molnar <[email protected]>
2013-11-05audit: call audit_bprm() only once to add AUDIT_EXECVE informationRichard Guy Briggs1-4/+1
Move the audit_bprm() call from search_binary_handler() to exec_binprm(). This allows us to get rid of the mm member of struct audit_aux_data_execve since bprm->mm will equal current->mm. This also mitigates the issue that ->argc could be modified by the load_binary() call in search_binary_handler(). audit_bprm() was being called to add an AUDIT_EXECVE record to the audit context every time search_binary_handler() was recursively called. Only one reference is necessary. Reported-by: Oleg Nesterov <[email protected]> Cc: Eric Paris <[email protected]> Signed-off-by: Richard Guy Briggs <[email protected]> Signed-off-by: Eric Paris <[email protected]> --- This patch is against 3.11, but was developed on Oleg's post-3.11 patches that introduce exec_binprm().
2013-11-05audit: add child record before the create to handle case where create failsJeff Layton1-0/+1
Historically, when a syscall that creates a dentry fails, you get an audit record that looks something like this (when trying to create a file named "new" in "/tmp/tmp.SxiLnCcv63"): type=PATH msg=audit(1366128956.279:965): item=0 name="/tmp/tmp.SxiLnCcv63/new" inode=2138308 dev=fd:02 mode=040700 ouid=0 ogid=0 rdev=00:00 obj=staff_u:object_r:user_tmp_t:s15:c0.c1023 This record makes no sense since it's associating the inode information for "/tmp/tmp.SxiLnCcv63" with the path "/tmp/tmp.SxiLnCcv63/new". The recent patch I posted to fix the audit_inode call in do_last fixes this, by making it look more like this: type=PATH msg=audit(1366128765.989:13875): item=0 name="/tmp/tmp.DJ1O8V3e4f/" inode=141 dev=fd:02 mode=040700 ouid=0 ogid=0 rdev=00:00 obj=staff_u:object_r:user_tmp_t:s15:c0.c1023 While this is more correct, if the creation of the file fails, then we have no record of the filename that the user tried to create. This patch adds a call to audit_inode_child to may_create. This creates an AUDIT_TYPE_CHILD_CREATE record that will sit in place until the create succeeds. When and if the create does succeed, then this record will be updated with the correct inode info from the create. This fixes what was broken in commit bfcec708. Commit 79f6530c should also be backported to stable v3.7+. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Eric Paris <[email protected]> Signed-off-by: Richard Guy Briggs <[email protected]> Signed-off-by: Eric Paris <[email protected]>
2013-11-05audit: allow unsetting the loginuid (with priv)Eric Paris1-4/+10
If a task has CAP_AUDIT_CONTROL allow that task to unset their loginuid. This would allow a child of that task to set their loginuid without CAP_AUDIT_CONTROL. Thus when launching a new login daemon, a priviledged helper would be able to unset the loginuid and then the daemon, which may be malicious user facing, do not need priv to function correctly. Signed-off-by: Eric Paris <[email protected]> Signed-off-by: Richard Guy Briggs <[email protected]> Signed-off-by: Eric Paris <[email protected]>
2013-11-05ext2: Fix fs corruption in ext2_get_xip_mem()Jan Kara2-0/+3
Commit 8e3dffc651cb "Ext2: mark inode dirty after the function dquot_free_block_nodirty is called" unveiled a bug in __ext2_get_block() called from ext2_get_xip_mem(). That function called ext2_get_block() mistakenly asking it to map 0 blocks while 1 was intended. Before the above mentioned commit things worked out fine by luck but after that commit we started returning that we allocated 0 blocks while we in fact allocated 1 block and thus allocation was looping until all blocks in the filesystem were exhausted. Fix the problem by properly asking for one block and also add assertion in ext2_get_blocks() to catch similar problems. Reported-and-tested-by: Andiry Xu <[email protected]> Signed-off-by: Jan Kara <[email protected]>
2013-11-05fuse: writepages: protect secondary requests from fuse file releaseMaxim Patlasov1-0/+1
All async fuse requests must be supplied with extra reference to a fuse file. This is necessary to ensure that the fuse file is not released until all in-flight requests are completed. Fuse secondary writeback requests must obey this rule as well. Signed-off-by: Maxim Patlasov <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]>
2013-11-05fuse: writepages: update bdi writeout when deleting secondary requestMaxim Patlasov1-1/+4
BDI_WRITTEN counter is used to estimate bdi bandwidth. It must be incremented every time as bdi ends page writeback. No matter whether it was fulfilled by actual write or by discarding the request (e.g. due to shrunk i_size). Note that even before writepages patches, the case "Got truncated off completely" was handled in fuse_send_writepage() by calling fuse_writepage_finish() which updated BDI_WRITTEN unconditionally. Signed-off-by: Maxim Patlasov <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]>
2013-11-05fuse: writepages: crop secondary requestsMaxim Patlasov1-5/+31
If writeback happens while fuse is in FUSE_NOWRITE condition, the request will be queued but not processed immediately (see fuse_flush_writepages()). Until FUSE_NOWRITE becomes relaxed, more writebacks can happen. They will be queued as "secondary" requests to that first ("primary") request. Existing implementation crops only primary request. This is not correct because a subsequent extending write(2) may increase i_size and then secondary requests won't be cropped properly. The result would be stale data written to the server to a file offset where zeros must be. Similar problem may happen if secondary requests are attached to an in-flight request that was already cropped. The patch solves the issue by cropping all secondary requests in fuse_writepage_end(). Thanks to Miklos for idea. Signed-off-by: Maxim Patlasov <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]>
2013-11-05fuse: writepages: roll back changes if request not foundMaxim Patlasov1-2/+4
fuse_writepage_in_flight() returns false if it fails to find request with given index in fi->writepages. Then the caller proceeds with populating data->orig_pages[] and incrementing req->num_pages. Hence, fuse_writepage_in_flight() must revert changes it made in request before returning false. Signed-off-by: Maxim Patlasov <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]>
2013-11-04Revert "nfsd: remove_stid can be incorporated into nfs4_put_delegation"J. Bruce Fields1-1/+3
This reverts commit 7ebe40f20372688a627ad6c754bc0d1c05df58a9. We forgot the nfs4_put_delegation call in fs/nfsd/nfs4callback.c which should not be unhashing the stateid. This lead to warnings from the idr code when we tried to removed id's twice. Signed-off-by: J. Bruce Fields <[email protected]>
2013-11-04NFSv4.2: Remove redundant checks in nfs_setsecurity+nfs4_label_init_securityTrond Myklebust2-9/+0
We already check for nfs_server_capable(inode, NFS_CAP_SECURITY_LABEL) in nfs4_label_alloc() We check the minor version in _nfs4_server_capabilities before setting NFS_CAP_SECURITY_LABEL. Signed-off-by: Trond Myklebust <[email protected]>
2013-11-04NFSv4: Sanity check the server reply in _nfs4_server_capabilitiesTrond Myklebust1-5/+20
We don't want to be setting capabilities and/or requesting attributes that are not appropriate for the NFSv4 minor version. - Ensure that we clear the NFS_CAP_SECURITY_LABEL capability when appropriate - Ensure that we limit the attribute bitmasks to the mounted_on_fileid attribute and less for NFSv4.0 - Ensure that we limit the attribute bitmasks to suppattr_exclcreat and less for NFSv4.1 - Ensure that we limit it to change_sec_label or less for NFSv4.2 Signed-off-by: Trond Myklebust <[email protected]>
2013-11-04NFSv4.2: encode_readdir - only ask for labels when doing readdirplusTrond Myklebust1-13/+12
Currently, if the server is doing NFSv4.2 and supports labeled NFS, then our on-the-wire READDIR request ends up asking for the label information, which is then ignored unless we're doing readdirplus. This patch ensures that READDIR doesn't ask the server for label information at all unless the readdir->bitmask contains the FATTR4_WORD2_SECURITY_LABEL attribute, and the readdir->plus flag is set. While we're at it, optimise away the 3rd bitmap field if it is zero. Signed-off-by: Trond Myklebust <[email protected]>
2013-11-04nfs: set security label when revalidating inodeJeff Layton1-0/+2
Currently, we fetch the security label when revalidating an inode's attributes, but don't apply it. This is in contrast to the readdir() codepath where we do apply label changes. Cc: Dave Quigley <[email protected]> Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2013-11-04xfs: xfs_remove deadlocks due to inverted AGF vs AGI lock orderingDave Chinner1-28/+44
Removing an inode from the namespace involves removing the directory entry and dropping the link count on the inode. Removing the directory entry can result in locking an AGF (directory blocks were freed) and removing a link count can result in placing the inode on an unlinked list which results in locking an AGI. The big problem here is that we have an ordering constraint on AGF and AGI locking - inode allocation locks the AGI, then can allocate a new extent for new inodes, locking the AGF after the AGI. Similarly, freeing the inode removes the inode from the unlinked list, requiring that we lock the AGI first, and then freeing the inode can result in an inode chunk being freed and hence freeing disk space requiring that we lock an AGF. Hence the ordering that is imposed by other parts of the code is AGI before AGF. This means we cannot remove the directory entry before we drop the inode reference count and put it on the unlinked list as this results in a lock order of AGF then AGI, and this can deadlock against inode allocation and freeing. Therefore we must drop the link counts before we remove the directory entry. This is still safe from a transactional point of view - it is not until we get to xfs_bmap_finish() that we have the possibility of multiple transactions in this operation. Hence as long as we remove the directory entry and drop the link count in the first transaction of the remove operation, there are no transactional constraints on the ordering here. Change the ordering of the operations in the xfs_remove() function to align the ordering of AGI and AGF locking to match that of the rest of the code. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Ben Myers <[email protected]> Signed-off-by: Ben Myers <[email protected]>
2013-11-04Merge branch 'aio-fix' of http://evilpiepirate.org/git/linux-bcacheBenjamin LaHaise1-81/+48
2013-11-04ext4: remove unreachable code in ext4_can_extents_be_merged()Eric Sandeen1-7/+2
Commit ec22ba8e ("ext4: disable merging of uninitialized extents") ensured that if either extent under consideration is uninit, we decline to merge, and immediately return. But right after that test, we test again for an uninit extent; we can never hit this. So just remove the impossible test and associated variable. Signed-off-by: Eric Sandeen <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]> Reviewed-by: Zheng Liu <[email protected]>
2013-11-04quota: info leak in quota_getquota()Dan Carpenter1-0/+1
The if_dqblk struct has a 4 byte hole at the end of the struct so uninitialized stack information is leaked to user space. Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Jan Kara <[email protected]>
2013-11-04GFS2: Use generic list_lru for quotaSteven Whitehouse4-66/+85
By using the generic list_lru code, we can now separate the per sb quota list locking from the lru locking. The lru lock is made into the inner-most lock. As a result of this new lock order, we may occasionally see items on the per-sb quota list which are "dead" so that the two places where we traverse that list are updated to take account of that. As a result of this patch, the gfs2 quota shrinker is now NUMA zone aware, and we are also laying the foundations for further improvments in due course. Signed-off-by: Steven Whitehouse <[email protected]> Signed-off-by: Abhijith Das <[email protected]> Tested-by: Abhijith Das <[email protected]> Cc: Dave Chinner <[email protected]>
2013-11-04GFS2: Rename quota qd_lru_lock qd_lockSteven Whitehouse1-35/+35
This is a straight forward rename which is in preparation for introducing the generic list_lru infrastructure in the following patch. Signed-off-by: Steven Whitehouse <[email protected]> Signed-off-by: Abhijith Das <[email protected]> Tested-by: Abhijith Das <[email protected]>
2013-11-04GFS2: Use reflink for quota data cacheSteven Whitehouse2-15/+29
This patch adds reflink support to the quota data cache. It looks a bit strange because we still don't have a sensible split in the lookup by id and the lru list. That is coming in later patches though. The intent here is just to swap the current ref count for reflinks in all cases with as little as possible other change. Signed-off-by: Steven Whitehouse <[email protected]> Signed-off-by: Abhijith Das <[email protected]> Tested-by: Abhijith Das <[email protected]>
2013-11-04f2fs: remove unnecessary TestClearPageError when wait pages writebackChao Yu1-3/+4
In wait_on_node_pages_writeback we will test and clear error flag for all pages in radix tree, but not necessary. So we only do this for pages belong to the specified inode. Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2013-11-02Query network adapter info at mount time for debuggingSteve French1-0/+30
When CONFIG_CIFS_STATS2 enabled query adapter info for debugging It is easy now in SMB3 to query the information about the server's network interfaces (and at least Windows 8 and above do this, if not other clients) there are some useful pieces of information you can get including: - all of the network interfaces that the server advertises (not just the one you are mounting over), and with SMB3 supporting multichannel this helps with more than just failover (also aggregating multiple sockets under one mount) - whether the adapter supports RSS (useful to know if you want to estimate whether setting up two or more socket connections to the same address is going to be faster due to RSS offload in the adapter) - whether the server supports RDMA - whether the server has IPv6 interfaces (if you connected over IPv4 but prefer IPv6 e.g.) - what the link speed is (you might want to reconnect over a higher speed interface if available) (Of course we could also rerequest this on every mount cheaplly to the same server, as Windows apparently does, so we can update the adapter info on new mounts, and also on every reconnect if the network interface drops temporarily - so we don't have to rely on info from the first mount to this server) It is trivial to request this information - and certainly will be useful when we get to the point of doing multichannel (and eventually RDMA), but some of this (linkspeed etc.) info may help for debugging in the meantime. Enable this request when CONFIG_CIFS_STATS2 is on (only for smb3 mounts since it is an SMB3 or later ioctl). Signed-off-by: Steve French <[email protected]>
2013-11-02Fix unused variable warning when CIFS POSIX disabledSteve French1-1/+1
Fix unused variable warning when CONFIG_CIFS_POSIX disabled. fs/cifs/ioctl.c: In function 'cifs_ioctl': >> fs/cifs/ioctl.c:40:8: warning: unused variable 'ExtAttrMask' [-Wunused-variable] __u64 ExtAttrMask = 0; ^ Pointed out by 0-DAY kernel build testing backend Signed-off-by: Steve French <[email protected]>
2013-11-02Allow setting per-file compression via CIFS protocolSteve French5-4/+94
An earlier patch allowed setting the per-file compression flag "chattr +c filename" on an smb2 or smb3 mount, and also allowed lsattr to return whether a file on a cifs, or smb2/smb3 mount was compressed. This patch extends the ability to set the per-file compression flag to the cifs protocol, which uses a somewhat different IOCTL mechanism than SMB2, although the payload (the flags stored in the compression_state) are the same. Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Steve French <[email protected]>
2013-11-02Query File System AlignmentSteven French4-4/+76
In SMB3 it is now possible to query the file system alignment info, and the preferred (for performance) sector size and whether the underlying disk has no seek penalty (like SSD). Query this information at mount time for SMB3, and make it visible in /proc/fs/cifs/DebugData for debugging purposes. This alignment information and preferred sector size info will be helpful for the copy offload patches to setup the right chunks in the CopyChunk requests. Presumably the knowledge that the underlying disk is SSD could also help us make better readahead and writebehind decisions (something to look at in the future). Signed-off-by: Steve French <[email protected]>
2013-11-02Query device characteristics at mount time from server on SMB2/3 not just on ↵Steven French3-10/+28
cifs mounts Currently SMB2 and SMB3 mounts do not query the device information at mount time from the server as is done for cifs. These can be useful for debugging. This is a minor patch, that extends the previous one (which added ability to query file system attributes at mount time - this returns the device characteristics - also via in /proc/fs/cifs/DebugData) Signed-off-by: Steve French <[email protected]>
2013-11-02cifs: Send a logoff request before removing a smb sessionShirish Pargaonkar3-9/+37
Send a smb session logoff request before removing smb session off of the list. On a signed smb session, remvoing a session off of the list before sending a logoff request results in server returning an error for lack of smb signature. Never seen an error during smb logoff, so as per MS-SMB2 3.2.5.1, not sure how an error during logoff should be retried. So for now, if a server returns an error to a logoff request, log the error and remove the session off of the list. Signed-off-by: Shirish Pargaonkar <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Steve French <[email protected]>
2013-11-02cifs: Make big endian multiplex ID sequences monotonic on the wireTim Gardner6-10/+35
The multiplex identifier (MID) in the SMB header is only ever used by the client, in conjunction with PID, to match responses from the server. As such, the endianess of the MID is not important. However, When tracing packet sequences on the wire, protocol analyzers such as wireshark display MID as little endian. It is much more informative for the on-the-wire MID sequences to match debug information emitted by the CIFS driver. Therefore, one should write and read MID in the SMB header assuming it is always little endian. Observed from wireshark during the protocol negotiation and session setup: Multiplex ID: 256 Multiplex ID: 256 Multiplex ID: 512 Multiplex ID: 512 Multiplex ID: 768 Multiplex ID: 768 After this patch on-the-wire MID values begin at 1 and increase monotonically. Introduce get_next_mid64() for the internal consumers that use the full 64 bit multiplex identifier. Introduce the helpers get_mid() and compare_mid() to make the endian translation clear. Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Tim Gardner <[email protected]> Signed-off-by: Steve French <[email protected]>
2013-11-01sysfs: rename sysfs_assoc_lock and explain what it's aboutTejun Heo3-10/+30
sysfs_assoc_lock is an odd piece of locking. In general, whoever owns a kobject is responsible for synchronizing sysfs operations and sysfs proper assumes that, for example, removal won't race with any other operation; however, this doesn't work for symlinking because an entity performing symlink doesn't usually own the target kobject and thus has no control over its removal. sysfs_assoc_lock synchronizes symlink operations against kobj->sd disassociation so that symlink code doesn't end up dereferencing already freed sysfs_dirent by racing with removal of the target kobject. This is quite obscure and the generic name of the lock and lack of comments make it difficult to understand its role. Let's rename it to sysfs_symlink_target_lock and add comments explaining what's going on. Signed-off-by: Tejun Heo <[email protected]> Reported-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2013-11-01sysfs: use generic_file_llseek() for sysfs_file_operationsTejun Heo1-1/+1
13c589d5b0ac6 ("sysfs: use seq_file when reading regular files") converted regular sysfs files to use seq_file. The commit substituted generic_file_llseek() with seq_lseek() for llseek implementation. Before the change, all regular sysfs files were allowed to seek to any position in [0, PAGE_SIZE] as the file size is always PAGE_SIZE and generic_file_llseek() allows any seeking inside the range under file size; however, seq_lseek()'s behavior is different. It traverses the output by repeatedly invoking ->show() until it reaches the target offset or traversal indicates EOF. As seq_files are fully dynamic and may not end at all, it doesn't support seeking from the end (SEEK_END). Apparently, there are userland tools which uses SEEK_END to discover the buffer size to use and the switch to seq_lseek() disturbs them as SEEK_END fails with -EINVAL. The only benefits of using seq_lseek() instead of generic_file_llseek() are * Early failure. If traversing to certain file position should fail, seq_lseek() will report such failures on lseek(2) instead of the following read/write operations. * EOF detection. While SEEK_END is not supported, SEEK_SET/CUR + large offset can be used to detect eof - eof at the time of the seek anyway as the file size may change dynamically. Both aren't necessary for sysfs or prospect kernfs users. Revert to genefic_file_llseek() and preserve the original behavior. Signed-off-by: Tejun Heo <[email protected]> Reported-by: Heiko Carstens <[email protected]> Link: https://lkml.kernel.org/r/20131031114358.GA5551@osiris Tested-by: Heiko Carstens <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2013-11-01nfsd4: fix discarded security labels on setattrJ. Bruce Fields1-0/+1
Security labels in setattr calls are currently ignored because we forget to set label->len. Cc: [email protected] Reported-by: Jeff Layton <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2013-11-01NFS: Fix a missing initialisation when reading the SELinux labelTrond Myklebust1-3/+3
Ensure that _nfs4_do_get_security_label() also initialises the SEQUENCE call correctly, by having it call into nfs4_call_sync(). Reported-by: Jeff Layton <[email protected]> Cc: [email protected] # 3.11+ Signed-off-by: Trond Myklebust <[email protected]>
2013-11-01nfs: fix oops when trying to set SELinux labelJeff Layton1-4/+4
Chao reported the following oops when testing labeled NFS: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffffa0568703>] nfs4_xdr_enc_setattr+0x43/0x110 [nfsv4] PGD 277bbd067 PUD 2777ea067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache sg coretemp kvm_intel kvm crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul iTCO_wdt glue_helper ablk_helper cryptd iTCO_vendor_support bnx2 pcspkr serio_raw i7core_edac cdc_ether microcode usbnet edac_core mii lpc_ich i2c_i801 mfd_core shpchp ioatdma dca acpi_cpufreq mperf nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sr_mod sd_mod cdrom crc_t10dif mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ata_generic ttm pata_acpi drm ata_piix libata megaraid_sas i2c_core dm_mirror dm_region_hash dm_log dm_mod CPU: 4 PID: 25657 Comm: chcon Not tainted 3.10.0-33.el7.x86_64 #1 Hardware name: IBM System x3550 M3 -[7944OEJ]-/90Y4784 , BIOS -[D6E150CUS-1.11]- 02/08/2011 task: ffff880178397220 ti: ffff8801595d2000 task.ti: ffff8801595d2000 RIP: 0010:[<ffffffffa0568703>] [<ffffffffa0568703>] nfs4_xdr_enc_setattr+0x43/0x110 [nfsv4] RSP: 0018:ffff8801595d3888 EFLAGS: 00010296 RAX: 0000000000000000 RBX: ffff8801595d3b30 RCX: 0000000000000b4c RDX: ffff8801595d3b30 RSI: ffff8801595d38e0 RDI: ffff880278b6ec00 RBP: ffff8801595d38c8 R08: ffff8801595d3b30 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801595d38e0 R13: ffff880277a4a780 R14: ffffffffa05686c0 R15: ffff8802765f206c FS: 00007f2c68486800(0000) GS:ffff88027fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000027651a000 CR4: 00000000000007e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 ffff880277865800 ffff880278b6ec00 ffff880277a4a780 ffff8801595d3948 ffffffffa02ad926 ffff8801595d3b30 ffff8802765f206c Call Trace: [<ffffffffa02ad926>] rpcauth_wrap_req+0x86/0xd0 [sunrpc] [<ffffffffa02a1d40>] ? call_connect+0xb0/0xb0 [sunrpc] [<ffffffffa02a1d40>] ? call_connect+0xb0/0xb0 [sunrpc] [<ffffffffa02a1ecb>] call_transmit+0x18b/0x290 [sunrpc] [<ffffffffa02a1d40>] ? call_connect+0xb0/0xb0 [sunrpc] [<ffffffffa02aae14>] __rpc_execute+0x84/0x400 [sunrpc] [<ffffffffa02ac40e>] rpc_execute+0x5e/0xa0 [sunrpc] [<ffffffffa02a2ea0>] rpc_run_task+0x70/0x90 [sunrpc] [<ffffffffa02a2f03>] rpc_call_sync+0x43/0xa0 [sunrpc] [<ffffffffa055284d>] _nfs4_do_set_security_label+0x11d/0x170 [nfsv4] [<ffffffffa0558861>] nfs4_set_security_label.isra.69+0xf1/0x1d0 [nfsv4] [<ffffffff815fca8b>] ? avc_alloc_node+0x24/0x125 [<ffffffff815fcd2f>] ? avc_compute_av+0x1a3/0x1b5 [<ffffffffa055897b>] nfs4_xattr_set_nfs4_label+0x3b/0x50 [nfsv4] [<ffffffff811bc772>] generic_setxattr+0x62/0x80 [<ffffffff811bcfc3>] __vfs_setxattr_noperm+0x63/0x1b0 [<ffffffff811bd1c5>] vfs_setxattr+0xb5/0xc0 [<ffffffff811bd2fe>] setxattr+0x12e/0x1c0 [<ffffffff811a4d22>] ? final_putname+0x22/0x50 [<ffffffff811a4f2b>] ? putname+0x2b/0x40 [<ffffffff811aa1cf>] ? user_path_at_empty+0x5f/0x90 [<ffffffff8119bc29>] ? __sb_start_write+0x49/0x100 [<ffffffff811bd66f>] SyS_lsetxattr+0x8f/0xd0 [<ffffffff8160cf99>] system_call_fastpath+0x16/0x1b Code: 48 8b 02 48 c7 45 c0 00 00 00 00 48 c7 45 c8 00 00 00 00 48 c7 45 d0 00 00 00 00 48 c7 45 d8 00 00 00 00 48 c7 45 e0 00 00 00 00 <48> 8b 00 48 8b 00 48 85 c0 0f 84 ae 00 00 00 48 8b 80 b8 03 00 RIP [<ffffffffa0568703>] nfs4_xdr_enc_setattr+0x43/0x110 [nfsv4] RSP <ffff8801595d3888> CR2: 0000000000000000 The problem is that _nfs4_do_set_security_label calls rpc_call_sync() directly which fails to do any setup of the SEQUENCE call. Have it use nfs4_call_sync() instead which does the right thing. While we're at it change the name of "args" to "arg" to better match the pattern in _nfs4_do_setattr. Reported-by: Chao Ye <[email protected]> Cc: David Quigley <[email protected]> Signed-off-by: Jeff Layton <[email protected]> Cc: [email protected] # 3.11+ Signed-off-by: Trond Myklebust <[email protected]>
2013-11-01Merge branch 'linus' into sched/coreIngo Molnar31-76/+168
Resolve cherry-picking conflicts: Conflicts: mm/huge_memory.c mm/memory.c mm/mprotect.c See this upstream merge commit for more details: 52469b4fcd4f Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Signed-off-by: Ingo Molnar <[email protected]>
2013-10-31ext4: avoid bh leak in retry path of ext4_expand_extra_isize_ea()Theodore Ts'o1-0/+1
Reported-by: Dave Jones <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]> Cc: [email protected]
2013-10-31vfs: decrapify dput(), fix cache behavior under normal loadLinus Torvalds1-2/+3
We do not want to dirty the dentry->d_flags cacheline in dput() just to set the DCACHE_REFERENCED flag when it is already set in the common case anyway. This way the first cacheline of the dentry (which contains the RCU lookup information etc) can stay shared among multiple CPU's. This finishes off some of the details of all the scalability patches merged during the merge window. Also don't mark dentry_kill() for inlining, since it's the uncommon path and inlining it just makes the common path slower due to extra function entry/exit overhead. Signed-off-by: Linus Torvalds <[email protected]>
2013-10-31xfs: fix the extent count when allocating an new indirection array entryJie Liu1-5/+4
At xfs_iext_add(), if extent(s) are being appended to the last page in the indirection array and the new extent(s) don't fit in the page, the number of extents(erp->er_extcount) in a new allocated entry should be the minimum value between count and XFS_LINEAR_EXTS, instead of count. For now, there is no existing test case can demonstrates a problem with the er_extcount being set incorrectly here, but it obviously like a bug. Signed-off-by: Jie Liu <[email protected]> Reviewed-by: Ben Myers <[email protected]> Signed-off-by: Ben Myers <[email protected]>
2013-10-31jbd: Revert "jbd: remove dependency on __GFP_NOFAIL"Jan Kara1-4/+4
This reverts commit 05713082ab7690a2b22b044cfc867f346c39cd2d. The idea to remove __GFP_NOFAIL was opposed by Andrew Morton. Although mm guys do want to get rid of __GFP_NOFAIL users, opencoding the allocation retry is even worse. See emails following http://www.gossamer-threads.com/lists/linux/kernel/1809153#1809153 Signed-off-by: Jan Kara <[email protected]>
2013-10-31nfs: fix inverted test for delegation in nfs4_reclaim_open_stateJeff Layton1-1/+1
commit 6686390bab6a0e0 (NFS: remove incorrect "Lock reclaim failed!" warning.) added a test for a delegation before checking to see if any reclaimed locks failed. The test however is backward and is only doing that check when a delegation is held instead of when one isn't. Cc: NeilBrown <[email protected]> Signed-off-by: Jeff Layton <[email protected]> Fixes: 6686390bab6a: NFS: remove incorrect "Lock reclaim failed!" warning. Cc: [email protected] # 3.12 Signed-off-by: Trond Myklebust <[email protected]>
2013-10-31ext4: don't count free clusters from a corrupt block groupDarrick J. Wong1-2/+11
A bg that's been flagged "corrupt" by definition has no free blocks, so that the allocator won't be tempted to use the damaged bg. Therefore, we shouldn't count the clusters in the damaged group when calculating free counts. Signed-off-by: Darrick J. Wong <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]> Reviewed-by: Zheng Liu <[email protected]>
2013-10-31f2fs: avoid to wait all the node blocks during fsyncJaegeuk Kim3-2/+44
Previously, f2fs_sync_file() waits for all the node blocks to be written. But, we don't need to do that, but wait only the inode-related node blocks. This patch adds wait_on_node_pages_writeback() in which waits inode-related node blocks that are on writeback. Signed-off-by: Jaegeuk Kim <[email protected]>
2013-10-30NFSD: Add support for NFS v4.2 operation checkingAnna Schumaker1-3/+5
The server does allow NFS over v4.2, even if it doesn't add any new operations yet. I also switch to using constants to represent the last operation for each minor version since this makes the code cleaner and easier to understand at a quick glance. Signed-off-by: Anna Schumaker <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2013-10-30xfs: be more forgiving of a v4 secondary sb w/ junk in v5 fieldsEric Sandeen1-2/+11
Today, if xfs_sb_read_verify encounters a v4 superblock with junk past v4 fields which includes data in sb_crc, it will be treated as a failing checksum and a significant corruption. There are known prior bugs which leave junk at the end of the V4 superblock; we don't need to actually fail the verification in this case if other checks pan out ok. So if this is a secondary superblock, and the primary superblock doesn't indicate that this is a V5 filesystem, don't treat this as an actual checksum failure. We should probably check the garbage condition as we do in xfs_repair, and possibly warn about it or self-heal, but that's a different scope of work. Stable folks: This can go back to v3.10, which is what introduced the sb CRC checking that is tripped up by old, stale, incorrect V4 superblocks w/ unzeroed bits. Cc: [email protected] Signed-off-by: Eric Sandeen <[email protected]> Acked-by: Dave Chinner <[email protected]> Reviewed-by: Mark Tinguely <[email protected]> Signed-off-by: Ben Myers <[email protected]>
2013-10-30xfs: fix possible NULL dereference in xlog_verify_iclogGeyslan G. Bem1-5/+3
In xlog_verify_iclog a debug check of the incore log buffers prints an error if icptr is null and then goes on to dereference the pointer regardless. Convert this to an assert so that the intention is clear. This was reported by Coverty. Signed-off-by: Ben Myers <[email protected]> Reviewed-by: Eric Sandeen <[email protected]>
2013-10-30xfs:xfs_dir2_node.c: pointer use before check for nullDenis Efremov1-1/+0
ASSERT on args takes place after args dereference. This assertion is redundant since we are going to panic anyway. Found by Linux Driver Verification project (linuxtesting.org) - PVS-Studio analyzer. Signed-off-by: Denis Efremov <[email protected]> Reviewed-by: Ben Myers <[email protected]> Signed-off-by: Ben Myers <[email protected]>
2013-10-30xfs: prevent stack overflows from page cache allocationDave Chinner2-2/+10
Page cache allocation doesn't always go through ->begin_write and hence we don't always get the opportunity to set the allocation context to GFP_NOFS. Failing to do this means we open up the direct relcaim stack to recurse into the filesystem and consume a significant amount of stack. On RHEL6.4 kernels we are seeing ra_submit() and generic_file_splice_read() from an nfsd context recursing into the filesystem via the inode cache shrinker and evicting inodes. This is causing truncation to be run (e.g EOF block freeing) and causing bmap btree block merges and free space btree block splits to occur. These btree manipulations are occurring with the call chain already 30 functions deep and hence there is not enough stack space to complete such operations. To avoid these specific overruns, we need to prevent the page cache allocation from recursing via direct reclaim. We can do that because the allocation functions take the allocation context from that which is stored in the mapping for the inode. We don't set that right now, so the default is GFP_HIGHUSER_MOVABLE, which is effectively a GFP_KERNEL context. We need it to be the equivalent of GFP_NOFS, so when we initialise an inode, set the mapping gfp mask appropriately. This makes the use of AOP_FLAG_NOFS redundant from other parts of the XFS IO path, so get rid of it. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Ben Myers <[email protected]>
2013-10-30xfs: fix static and extern sparse warningsDave Chinner12-8/+18
The kbuild test robot indicated that there were some new sparse warnings in fs/xfs/xfs_dquot_buf.c. Actually, there were a lot more that is wasn't warning about, so fix them all up. Reported-by: kbuild test robot Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Ben Myers <[email protected]> Signed-off-by: Ben Myers <[email protected]>
2013-10-30xfs: validity check the directory block leaf entry countDave Chinner1-4/+15
The directory block format verifier fails to check that the leaf entry count is in a valid range, and so if it is corrupted then it can lead to derefencing a pointer outside the block buffer. While we can't exactly validate the count without first walking the directory block, we can ensure the count lands in the valid area within the directory block and hence avoid out-of-block references. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Ben Myers <[email protected]> Signed-off-by: Ben Myers <[email protected]>