Age | Commit message (Collapse) | Author | Files | Lines |
|
Add a new mount option which enables a new "lazytime" mode. This mode
causes atime, mtime, and ctime updates to only be made to the
in-memory version of the inode. The on-disk times will only get
updated when (a) if the inode needs to be updated for some non-time
related change, (b) if userspace calls fsync(), syncfs() or sync(), or
(c) just before an undeleted inode is evicted from memory.
This is OK according to POSIX because there are no guarantees after a
crash unless userspace explicitly requests via a fsync(2) call.
For workloads which feature a large number of random write to a
preallocated file, the lazytime mount option significantly reduces
writes to the inode table. The repeated 4k writes to a single block
will result in undesirable stress on flash devices and SMR disk
drives. Even on conventional HDD's, the repeated writes to the inode
table block will trigger Adjacent Track Interference (ATI) remediation
latencies, which very negatively impact long tail latencies --- which
is a very big deal for web serving tiers (for example).
Google-Bug-Id: 18297052
Signed-off-by: Theodore Ts'o <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
Add missing blk_finish_plug in btrfs_sync_log()
Signed-off-by: Forrest Liu <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: Chris Mason <[email protected]>
|
|
fs/xfs/xfs_ioctl.c:1146:1: sparse: symbol 'xfs_ioctl_setattr_check_projid' was not declared. Should it be static?
Also fix xfs_ioctl_setattr_check_extsize at the same time.
Signed-off-by: Fengguang Wu <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
|
|
Growfs updates the secondary superblocks using synchronous unlogged
buffer writes after committing the updates to the primary superblock.
Mark the transaction to the primary superblock as synchronous so that
we guarantee it is committed to disk before we update the secondary
superblocks.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
|
|
Pull cifs fixes from Steve French:
"Three small cifs fixes. One fixes a hang under stress, and the other
two are security related"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix MUST SecurityFlags filtering
Complete oplock break jobs before closing file handle
cifs: use memzero_explicit to clear stack buffer
|
|
If we're using NFSv4.1, then we have the ability to let the server know
whether or not we believe that returning a delegation as part of our OPEN
request would be useful.
The feature needs to be used with care, since the client sending the request
doesn't necessarily know how other clients are using that file, and how
they may be affected by the delegation.
For this reason, our initial use of the feature will be to let the server
know when the client believes that handing out a delegation would not be
useful.
The first application for this function is when opening the file using
O_DIRECT.
Signed-off-by: Trond Myklebust <[email protected]>
|
|
leaf_dealloc uses vzalloc as a fallback to kzalloc(GFP_NOFS), so
it clearly does not want any shrinker activity within the fs itself.
convert vzalloc into __vmalloc(GFP_NOFS|__GFP_ZERO) to better achieve
this goal.
Signed-off-by: Oleg Drokin <[email protected]>
Signed-off-by: Steven Whitehouse <[email protected]>
|
|
properly
Use iov_iter_kvec() there, get rid of set_fs() games - now that
rxrpc_send_data() uses iov_iter primitives, it'll handle ITER_KVEC just
fine.
Signed-off-by: Al Viro <[email protected]>
|
|
Under CONFIG_DEBUG_ATOMIC_SLEEP=y, aio_read_event_ring() will throw
warnings like the following due to being called from wait_event
context:
WARNING: CPU: 0 PID: 16006 at kernel/sched/core.c:7300 __might_sleep+0x7f/0x90()
do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff810d85a3>] prepare_to_wait_event+0x63/0x110
Modules linked in:
CPU: 0 PID: 16006 Comm: aio-dio-fcntl-r Not tainted 3.19.0-rc6-dgc+ #705
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
ffffffff821c0372 ffff88003c117cd8 ffffffff81daf2bd 000000000000d8d8
ffff88003c117d28 ffff88003c117d18 ffffffff8109beda ffff88003c117cf8
ffffffff821c115e 0000000000000061 0000000000000000 00007ffffe4aa300
Call Trace:
[<ffffffff81daf2bd>] dump_stack+0x4c/0x65
[<ffffffff8109beda>] warn_slowpath_common+0x8a/0xc0
[<ffffffff8109bf56>] warn_slowpath_fmt+0x46/0x50
[<ffffffff810d85a3>] ? prepare_to_wait_event+0x63/0x110
[<ffffffff810d85a3>] ? prepare_to_wait_event+0x63/0x110
[<ffffffff810bdfcf>] __might_sleep+0x7f/0x90
[<ffffffff81db8344>] mutex_lock+0x24/0x45
[<ffffffff81216b7c>] aio_read_events+0x4c/0x290
[<ffffffff81216fac>] read_events+0x1ec/0x220
[<ffffffff810d8650>] ? prepare_to_wait_event+0x110/0x110
[<ffffffff810fdb10>] ? hrtimer_get_res+0x50/0x50
[<ffffffff8121899d>] SyS_io_getevents+0x4d/0xb0
[<ffffffff81dba5a9>] system_call_fastpath+0x12/0x17
---[ end trace bde69eaf655a4fea ]---
There is not actually a bug here, so annotate the code to tell the
debug logic that everything is just fine and not to fire a false
positive.
Signed-off-by: Dave Chinner <[email protected]>
Signed-off-by: Benjamin LaHaise <[email protected]>
|
|
When attempting to create a gropu without attrs, the warning prints the
name of the group. However, the check for name being a NULL pointer is
wrong: it uses the pointer to the name when it's NULL. Fix it to use
the name if present, otherwise just put an empty string.
Cc: Bruno Prémont <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Javi Merino <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Fix an Oopsable condition when nsm_mon_unmon is called as part of the
namespace cleanup, which now apparently happens after the utsname
has been freed.
Link: http://lkml.kernel.org/r/[email protected]
Reported-by: Bruno Prémont <[email protected]>
Cc: [email protected] # 3.18
Signed-off-by: Trond Myklebust <[email protected]>
|
|
* flexfiles: (53 commits)
pnfs: lookup new lseg at lseg boundary
nfs41: .init_read and .init_write can be called with valid pg_lseg
pnfs: Update documentation on the Layout Drivers
pnfs/flexfiles: Add the FlexFile Layout Driver
nfs: count DIO good bytes correctly with mirroring
nfs41: wait for LAYOUTRETURN before retrying LAYOUTGET
nfs: add a helper to set NFS_ODIRECT_RESCHED_WRITES to direct writes
nfs41: add NFS_LAYOUT_RETRY_LAYOUTGET to layout header flags
nfs/flexfiles: send layoutreturn before freeing lseg
nfs41: introduce NFS_LAYOUT_RETURN_BEFORE_CLOSE
nfs41: allow async version layoutreturn
nfs41: add range to layoutreturn args
pnfs: allow LD to ask to resend read through pnfs
nfs: add nfs_pgio_current_mirror helper
nfs: only reset desc->pg_mirror_idx when mirroring is supported
nfs41: add a debug warning if we destroy an unempty layout
pnfs: fail comparison when bucket verifier not set
nfs: mirroring support for direct io
nfs: add mirroring support to pgio layer
pnfs: pass ds_commit_idx through the commit path
...
Conflicts:
fs/nfs/pnfs.c
fs/nfs/pnfs.h
|
|
Before mirroring support was added, the pageio descriptor's pg_lseg was
set to null when an RPC was sent. Because of this, pg_init was called
at lseg boundaries with pg_lseg = NULL, and it could be set to the new
lseg.
Signed-off-by: Weston Andros Adamson <[email protected]>
|
|
With pgio refactoring in v3.15, .init_read and .init_write can be
called with valid pgio->pg_lseg. file layout was fixed at that time
by commit c6194271f (pnfs: filelayout: support non page aligned
layouts). But the generic helper still needs to be fixed.
Cc: [email protected] # 3.15+
Signed-off-by: Peng Tao <[email protected]>
|
|
The flexfile layout is a new layout that extends the
file layout. It is currently being drafted as a specification at
https://datatracker.ietf.org/doc/draft-ietf-nfsv4-layout-types/
Signed-off-by: Weston Andros Adamson <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
Signed-off-by: Tao Peng <[email protected]>
|
|
When resending to MDS, we might resend multiple mirroring
requests to MDS. As a result, nfs_direct_good_bytes() ends
up counting bytes multiple times, causing application to
get wrong return results in read/write syscalls.
Fix it by tracking start of a dreq and checking the range of
pgio header.
Cc: Weston Andros Adamson <[email protected]>
Signed-off-by: Peng Tao <[email protected]>
|
|
Also take care to stop waiting if someone clears retry bit.
Signed-off-by: Peng Tao <[email protected]>
|
|
To allow pnfs LD to ask direct writes to be resend.
Signed-off-by: Peng Tao <[email protected]>
|
|
Use it to indicate that LD wants to retry layoutget. LD can set
it whenever it wants the common pnfs code to return and retry
pnfs path through a new layout.
The bit gets cleared when client does a new layoutget, when client
closes the file (ROC case), or when kernel needs to evict the inode
(non-ROC case).
Signed-off-by: Peng Tao <[email protected]>
|
|
Otherwise we'll lose error tracking information when
encoding layoutreturn.
pnfs_put_lseg may be called from rpc callbacks. So we should not
call pnfs_send_layoutreturn directly because it can deadlock in
the rpc layer.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
|
|
When it is set, generic pnfs would try to send layoutreturn right
before last close/delegation_return regard less NFS_LAYOUT_ROC is
set or not. LD can then make sure layoutreturn is always sent
rather than being omitted.
The difference against NFS_LAYOUT_RETURN is that
NFS_LAYOUT_RETURN_BEFORE_CLOSE does not block usage of the layout so
LD can set it and expect generic layer to try pnfs path at the
same time.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
|
|
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
|
|
So that callers can specify which range to return.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
|
|
If current IO cannot be completed due to some transient errors,
LD may want to ask generic layer to resend the request through
pnfs again.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
|
|
Let it return current nfs_pgio_mirror in use depending on pg_mirror_count.
For read, we always use pg_mirrors[0], so this effectively gives us freedom
to use pg_mirror_idx to track the actual mirror to read from through out the
IO stack.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
|
|
so that we don't reset desc->pg_mirror_idx for read unnecessarily.
Remove WARN_ON_ONCE from __nfs_pageio_add_request to allow LD to
set pg_mirror_idx for read where pg_mirror_count is always 1.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
|
|
So that we can detect the case if some layout segments are still
pinned which is surely a bug that we need to fix.
Signed-off-by: Peng Tao <[email protected]>
|
|
This skips the WARN_ON_ONCE, but doesnt change behavior (the memcmp would
fail).
Signed-off-by: Weston Andros Adamson <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
|
|
The current mirroring code only notices short writes to the first
mirror. This patch keeps per-mirror byte counts and only considers
a byte to be written once all mirrors report so.
Signed-off-by: Weston Andros Adamson <[email protected]>
|
|
This patch adds mirrored write support to the pgio layer. The default
is to use one mirror, but pgio callers may define callbacks to change
this to any value up to the (arbitrarily selected) limit of 16.
The basic idea is to break out members of nfs_pageio_descriptor that cannot
be shared between mirrored DSes and put them in a new structure.
Signed-off-by: Weston Andros Adamson <[email protected]>
|
|
Pass ds_commit_idx through the nfs commit path. It's used to select
the commit bucket when using pnfs and is ignored when not using pnfs.
Several functions had to be changed: nfs_retry_commit,
nfs_mark_request_commit, pnfs_mark_request_commit and the pnfs layout
driver .mark_request_commit functions.
Signed-off-by: Tom Haynes <[email protected]>
|
|
'ds_commit_idx' is a better name - it is used to select the right
commit bucket for pnfs.
Signed-off-by: Weston Andros Adamson <[email protected]>
|
|
This is needed for mirrored DS support, where multuple requests
cover the same range.
Signed-off-by: Weston Andros Adamson <[email protected]>
|
|
This is needed to support mirrored writes - the first write can't just
trash the lseg, we need to keep it around until all mirrors have
written.
Signed-off-by: Weston Andros Adamson <[email protected]>
|
|
Add a new operation to nfs_pageio_ops that is called on nfs_pageio_complete.
Signed-off-by: Weston Andros Adamson <[email protected]>
|
|
Instead of calling layoutreturn directly, call pnfs_error_mark_layout_for_return
to mark layouts for return and let generic code return layout when
layout segments are freed.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
Conflicts:
fs/nfs/filelayout/filelayout.c
|
|
So that pnfs path is not disabled for ever.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
|
|
If current lseg is the last lseg marked with NFS_LSEG_LAYOUTRETURN,
send layoutreturn.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
|
|
And if we are to return the same type of layouts, don't bother
sending more layoutgets.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
|
|
It marks all matching layout segments as NFS_LSEG_LAYOUTRETURN,
which is an indicator for pnfs_put_lseg() to send layoutreturn,
and also prevents pnfs_update_layout() from using the returning
segments. Once it is set, it never gets cleared.
It also sets proper io failure bit so that pnfs path can be retried
after PNFS_LAYOUTGET_RETRY_TIMEOUT second.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
|
|
It allows to specify different iomode to return.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
|
|
So that it is possible to return a specific iomode layouts.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
|
|
Flexfiles layout would want to use them to report DS IO status.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
|
|
Per RFC 5661 Errata 3208:
| A client MAY always forget its layout state and associated
| layout stateid at any time (See also section 12.5.5.1).
| In such case, the client MUST use a non-layout stateid for the next
| LAYOUTGET operation. This will signal the server that the client has
| no more layouts on the file and its respective layout state can be
| released before issuing a new layout in response to LAYOUTGET.
In order to make such a signal unique to server, client needs to serialize
all layoutgets using non-layout stateid. We implement this by serializing
layoutgets when client has no layout segments at hand.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
|
|
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
|
|
flexfiles needs to start layoutcommit when necessary
Signed-off-by: Peng Tao <[email protected]>
|
|
lockd assumes hostname exists otherwise kernel oops.
It can be reproduced by following steps:
1. mount flexfile MDS
2. write some files
3. mount DS via nfsv3
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8134f332>] strlen+0x2/0x20
PGD 0
Oops: 0000 [#1] SMP
Modules linked in: nfsd(F) nfs_layout_flexfiles(F) rpcsec_gss_krb5(F) auth_rpcgss(F) nfsv4(F) dns_resolver(F) nfsv3(F) nfs_acl(F) nfs(F) lockd(F) sunrpc(F) fscache(F) ebtable_nat(F) nf_conntrack_netbios_ns(F) nf_conntrack_broadcast(F) ipt_MASQUERADE(F) ip6table_nat(F) nf_nat_ipv6(F) ip6table_mangle(F) ip6t_REJECT(F) nf_conntrack_ipv6(F) nf_defrag_ipv6(F) iptable_nat(F) nf_nat_ipv4(F) nf_nat(F) iptable_mangle(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_conntrack(F) nf_conntrack(F) ebtable_filter(F) ebtables(F) ip6table_filter(F) ip6_tables(F) bnep(F) snd_ens1371(F) snd_rawmidi(F) snd_ac97_codec(F) btusb(F) ac97_bus(F) snd_seq(F) snd_seq_device(F) snd_pcm(F) ppdev(F) bluetooth(F) 6lowpan_iphc(F) rfkill(F) vmw_balloon(F) snd_timer(F) snd(F) soundcore(F) gameport(F) i2c_piix4(F) e1000(F) vmw_vmci(F) parport_pc(F) parport(F) shpchp(F) uinput(F) xfs(F) libcrc32c(F) vmwgfx(F) ttm(F) drm(F) mptspi(F) scsi_transport_spi(F) mptscsih(F) mptbase(F) i2c_core(F)
CPU: 0 PID: 10397 Comm: mount.nfs Tainted: GF 3.14.7-100.pd_client.001.fc16.x86_64 #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
task: ffff880008942600 ti: ffff880007990000 task.ti: ffff880007990000
RIP: 0010:[<ffffffff8134f332>] [<ffffffff8134f332>] strlen+0x2/0x20
RSP: 0018:ffff880007991aa0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff880038d39c20 RCX: 0000000000000004
RDX: 0000000000000006 RSI: 0000000000000010 RDI: 0000000000000000
RBP: ffff880007991b38 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000014600 R11: 0000000000000400 R12: ffffffff81cc8580
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000004
FS: 00007f90cd2ef880(0000) GS:ffff88003f600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000001710000 CR4: 00000000001407f0
Stack:
ffffffffa045f52c ffff880001782230 ffff880004141e28 0006880007991ac8
ffffffff816dc14b ffff880000000000 ffff880038d39c20 0000000000000010
0000000481cc0006 0000000000000000 ffffffffa0410be8 000000000000c014
Call Trace:
[<ffffffffa045f52c>] ? nlmclnt_lookup_host+0x4c/0x2c0 [lockd]
[<ffffffff816dc14b>] ? _raw_spin_unlock_bh+0x1b/0x20
[<ffffffffa0410be8>] ? svc_destroy+0xb8/0x140 [sunrpc]
[<ffffffffa045c323>] nlmclnt_init+0x53/0xc0 [lockd]
[<ffffffffa047d2dc>] ? nfs_get_client+0x1cc/0x340 [nfs]
[<ffffffffa047c2e7>] nfs_start_lockd+0xa7/0xd0 [nfs]
[<ffffffffa047df71>] nfs_create_server+0x181/0x5c0 [nfs]
[<ffffffffa04460f3>] nfs3_create_server+0x13/0x30 [nfsv3]
[<ffffffffa048a0bc>] nfs_try_mount+0x21c/0x300 [nfs]
[<ffffffff811ca32d>] ? __kmalloc_track_caller+0x1ad/0x240
[<ffffffffa048b677>] ? nfs_fs_mount+0xc37/0xd80 [nfs]
[<ffffffffa048ad05>] nfs_fs_mount+0x2c5/0xd80 [nfs]
[<ffffffffa048a830>] ? nfs_clone_super+0x140/0x140 [nfs]
[<ffffffffa048a240>] ? nfs_clone_sb_security+0x40/0x40 [nfs]
[<ffffffff811e7e43>] mount_fs+0x43/0x1b0
[<ffffffff81193100>] ? __alloc_percpu+0x10/0x20
[<ffffffff812026e6>] vfs_kern_mount+0x76/0x120
[<ffffffff81204917>] do_mount+0x237/0xa80
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Enable pNFS callbacks to allow flex files to work correctly with a
NFSv3-enabled data server.
Signed-off-by: Trond Myklebust <[email protected]>
|
|
so that flexfile layout client can pass in DS credential instead of
using user cred, which will be done in the next patch.
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
|
|
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Tom Haynes <[email protected]>
|