aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2017-08-18skd: Remove unneeded #include directivesBart Van Assche1-2/+0
Signed-off-by: Bart Van Assche <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Johannes Thumshirn <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-08-18skd: Update maintainer informationBart Van Assche2-1/+6
E-mails sent to [email protected] bounce. Hence remove that e-mail address from the driver. Add an entry to the MAINTAINERS file instead. Signed-off-by: Bart Van Assche <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Johannes Thumshirn <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-08-18skd: Switch to GPLv2Bart Van Assche2-23/+14
This change does not affect any skd driver version derived from a dual licensed code base but makes all code derived from future upstream skd driver versions GPLv2 only. Signed-off-by: Bart Van Assche <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Johannes Thumshirn <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-08-18skd: Submit requests to firmware before triggering the doorbellBart Van Assche1-0/+6
Ensure that the members of struct skd_msg_buf have been transferred to the PCIe adapter before the doorbell is triggered. This patch avoids that I/O fails sporadically and that the following error message is reported: (skd0:STM000196603:[0000:00:09.0]): Completion mismatch comp_id=0x0000 skreq=0x0400 new=0x0000 Signed-off-by: Bart Van Assche <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-08-18skd: Avoid that module unloading triggers a use-after-freeBart Van Assche1-8/+9
Since put_disk() triggers a disk_release() call and since that last function calls blk_put_queue() if disk->queue != NULL, clear the disk->queue pointer before calling put_disk(). This avoids that unloading the skd kernel module triggers the following use-after-free: WARNING: CPU: 8 PID: 297 at lib/refcount.c:128 refcount_sub_and_test+0x70/0x80 refcount_t: underflow; use-after-free. CPU: 8 PID: 297 Comm: kworker/8:1 Not tainted 4.11.10-300.fc26.x86_64 #1 Workqueue: events work_for_cpu_fn Call Trace: dump_stack+0x63/0x84 __warn+0xcb/0xf0 warn_slowpath_fmt+0x5a/0x80 refcount_sub_and_test+0x70/0x80 refcount_dec_and_test+0x11/0x20 kobject_put+0x1f/0x50 blk_put_queue+0x15/0x20 disk_release+0xae/0xf0 device_release+0x32/0x90 kobject_release+0x67/0x170 kobject_put+0x2b/0x50 put_disk+0x17/0x20 skd_destruct+0x5c/0x890 [skd] skd_pci_probe+0x124d/0x13a0 [skd] local_pci_probe+0x42/0xa0 work_for_cpu_fn+0x14/0x20 process_one_work+0x19e/0x470 worker_thread+0x1dc/0x4a0 kthread+0x125/0x140 ret_from_fork+0x25/0x30 Signed-off-by: Bart Van Assche <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-08-18block: Relax a check in blk_start_queue()Bart Van Assche1-1/+1
Calling blk_start_queue() from interrupt context with the queue lock held and without disabling IRQs, as the skd driver does, is safe. This patch avoids that loading the skd driver triggers the following warning: WARNING: CPU: 11 PID: 1348 at block/blk-core.c:283 blk_start_queue+0x84/0xa0 RIP: 0010:blk_start_queue+0x84/0xa0 Call Trace: skd_unquiesce_dev+0x12a/0x1d0 [skd] skd_complete_internal+0x1e7/0x5a0 [skd] skd_complete_other+0xc2/0xd0 [skd] skd_isr_completion_posted.isra.30+0x2a5/0x470 [skd] skd_isr+0x14f/0x180 [skd] irq_forced_thread_fn+0x2a/0x70 irq_thread+0x144/0x1a0 kthread+0x125/0x140 ret_from_fork+0x2a/0x40 Fixes: commit a038e2536472 ("[PATCH] blk_start_queue() must be called with irq disabled - add warning") Signed-off-by: Bart Van Assche <[email protected]> Cc: Paolo 'Blaisorblade' Giarrusso <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-08-18xen-blkfront: Avoid that gcc 7 warns about fall-through when building with W=1Bart Van Assche1-1/+1
Signed-off-by: Bart Van Assche <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Roger Pau Monn303251 <[email protected]> Cc: [email protected] Signed-off-by: Jens Axboe <[email protected]>
2017-08-18xen-blkback: Avoid that gcc 7 warns about fall-through when building with W=1Bart Van Assche2-1/+3
Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Roger Pau Monn303251 <[email protected]> Cc: [email protected] Signed-off-by: Jens Axboe <[email protected]>
2017-08-18xen-blkback: Fix indentationBart Van Assche1-2/+2
Avoid that smatch reports the following warning when building with C=2 CHECK="smatch -p=kernel": drivers/block/xen-blkback/blkback.c:710 xen_blkbk_unmap_prepare() warn: inconsistent indenting Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Roger Pau Monn303251 <[email protected]> Cc: [email protected] Signed-off-by: Jens Axboe <[email protected]>
2017-08-18virtio_blk: Use blk_rq_is_scsi()Bart Van Assche1-1/+1
This patch does not change any functionality. Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Jason Wang <[email protected]> Cc: [email protected] Signed-off-by: Jens Axboe <[email protected]>
2017-08-18ide-floppy: Use blk_rq_is_scsi()Bart Van Assche1-1/+1
This patch does not change any functionality. Signed-off-by: Bart Van Assche <[email protected]> Acked-by: David S. Miller <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Cc: [email protected] Signed-off-by: Jens Axboe <[email protected]>
2017-08-18genhd: Annotate all part and part_tbl pointer dereferencesBart Van Assche2-8/+22
Annotate gendisk.part_tbl and disk_part_tbl.part dereferences with rcu_dereference_protected(). This patch does not change the behavior of the modified code but ensures that sparse does not complain about disk->part_tbl manipulations nor about part_tbl->part accesses. Additionally, improve documentation of the locking requirements of the modified functions. Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Jan Kara <[email protected]> Cc: Dan Williams <[email protected]> Cc: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-08-18blk-mq-debugfs: Declare a local symbol staticBart Van Assche1-1/+1
This was detected by sparse. Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: Omar Sandoval <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-08-18blk-mq: Make blk_mq_reinit_tagset() calls easier to readBart Van Assche4-15/+14
Since blk_mq_ops.reinit_request is only called from inside blk_mq_reinit_tagset(), make this function pointer an argument of blk_mq_reinit_tagset() instead of a member of struct blk_mq_ops. This patch does not change any functionality but makes blk_mq_reinit_tagset() calls easier to read and to analyze. Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Sagi Grimberg <[email protected]> Cc: James Smart <[email protected]> Cc: Johannes Thumshirn <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-08-18block: Unexport blk_queue_end_tag()Bart Van Assche1-1/+0
This function is only used inside the block layer core. Hence unexport it. Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Johannes Thumshirn <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-08-18block: Fix two comments that refer to .queue_rq() return valuesBart Van Assche1-2/+2
Since patch "blk-mq: switch .queue_rq return value to blk_status_t" .queue_rq() returns a BLK_STS_* value instead of a BLK_MQ_RQ_* value. Hence refer to the former in comments about .queue_rq() return values. Fixes: commit 39a70c76b89b ("blk-mq: clarify dispatch may not be drained/blocked by stopping queue") Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Cc: Ming Lei <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Johannes Thumshirn <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-08-17nbd: change the default nbd partitionsJosef Bacik1-2/+2
There's no reason to have partitions disabled for nbd by default, it costs us nothing to have it enabled and is just confusing/obnoxious to users who try to use partitions with nbd. Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-08-17nbd: allow device creation at a specific indexJosef Bacik1-0/+9
If users really want to use a particular index for their nbd device and it doesn't already exist there's no reason we can't just create it for them. Do this instead of erroring out. Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-08-15loop: fix to a race condition due to the early registration of deviceAnton Volkov1-6/+8
The early device registration made possible a race leading to allocations of disks with wrong minors. This patch moves the device registration further down the loop_init function to make the race infeasible. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Anton Volkov <[email protected]> Reviewed-by: Ming Lei <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-08-11cfq: Give a chance for arming slice idle timer in case of group_idleRitesh Harjani1-1/+2
In below scenario blkio cgroup does not work as per their assigned weights :- 1. When the underlying device is nonrotational with a single HW queue with depth of >= CFQ_HW_QUEUE_MIN 2. When the use case is forming two blkio cgroups cg1(weight 1000) & cg2(wight 100) and two processes(file1 and file2) doing sync IO in their respective blkio cgroups. For above usecase result of fio (without this patch):- file1: (groupid=0, jobs=1): err= 0: pid=685: Thu Jan 1 19:41:49 1970 write: IOPS=1315, BW=41.1MiB/s (43.1MB/s)(1024MiB/24906msec) <...> file2: (groupid=0, jobs=1): err= 0: pid=686: Thu Jan 1 19:41:49 1970 write: IOPS=1295, BW=40.5MiB/s (42.5MB/s)(1024MiB/25293msec) <...> // both the process BW is equal even though they belong to diff. cgroups with weight of 1000(cg1) and 100(cg2) In above case (for non rotational NCQ devices), as soon as the request from cg1 is completed and even though it is provided with higher set_slice=10, because of CFQ algorithm when the driver tries to fetch the request, CFQ expires this group without providing any idle time nor weight priority and schedules another cfq group (in this case cg2). And thus both cfq groups(cg1 & cg2) keep alternating to get the disk time and hence loses the cgroup weight based scheduling. Below patch gives a chance to cfq algorithm (cfq_arm_slice_timer) to arm the slice timer in case group_idle is enabled. In case if group_idle is also not required (including for nonrotational NCQ drives), we need to explicitly set group_idle = 0 from sysfs for such cases. With this patch result of fio(for above usecase) :- file1: (groupid=0, jobs=1): err= 0: pid=690: Thu Jan 1 00:06:08 1970 write: IOPS=1706, BW=53.3MiB/s (55.9MB/s)(1024MiB/19197msec) <..> file2: (groupid=0, jobs=1): err= 0: pid=691: Thu Jan 1 00:06:08 1970 write: IOPS=1043, BW=32.6MiB/s (34.2MB/s)(1024MiB/31401msec) <..> // In this processes BW is as per their respective cgroups weight. Signed-off-by: Ritesh Harjani <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-08-11block, bfq: boost throughput with flash-based non-queueing devicesPaolo Valente1-10/+19
When a queue associated with a process remains empty, there are cases where throughput gets boosted if the device is idled to await the arrival of a new I/O request for that queue. Currently, BFQ assumes that one of these cases is when the device has no internal queueing (regardless of the properties of the I/O being served). Unfortunately, this condition has proved to be too general. So, this commit refines it as "the device has no internal queueing and is rotational". This refinement provides a significant throughput boost with random I/O, on flash-based storage without internal queueing. For example, on a HiKey board, throughput increases by up to 125%, growing, e.g., from 6.9MB/s to 15.6MB/s with two or three random readers in parallel. Signed-off-by: Paolo Valente <[email protected]> Signed-off-by: Luca Miccio <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-08-11block,bfq: refactor device-idling logicPaolo Valente2-62/+67
The logic that decides whether to idle the device is scattered across three functions. Almost all of the logic is in the function bfq_bfqq_may_idle, but (1) part of the decision is made in bfq_update_idle_window, and (2) the function bfq_bfqq_must_idle may switch off idling regardless of the output of bfq_bfqq_may_idle. In addition, both bfq_update_idle_window and bfq_bfqq_must_idle make their decisions as a function of parameters that are used, for similar purposes, also in bfq_bfqq_may_idle. This commit addresses these issues by moving all the logic into bfq_bfqq_may_idle. Signed-off-by: Paolo Valente <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-08-10block: remove unused syncfull/asyncfull queue flagsJens Axboe2-33/+29
We haven't used these in years, but somehow the definitions still remained. Kill them, and renumber the QUEUE_FLAG_ space. We had a hole in the beginning of the space, too. Signed-off-by: Jens Axboe <[email protected]>
2017-08-09blk-mq: enable checking two part inflight counts at the same timeJens Axboe2-17/+33
Modify blk_mq_in_flight() to count both a partition and root at the same time. Then we only have to call it once, instead of potentially looping the tags twice. Reviewed-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-08-09blk-mq: provide internal in-flight variantJens Axboe4-29/+77
We don't have to inc/dec some counter, since we can just iterate the tags. That makes inc/dec a noop, but means we have to iterate busy tags to get an in-flight count. Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-08-09block: make part_in_flight() take an array of two intsJens Axboe4-9/+20
Instead of returning the count that matches the partition, pass in an array of two ints. Index 0 will be filled with the inflight count for the partition in question, and index 1 will filled with the root inflight count, if the partition passed in is not the root. This is in preparation for being able to calculate both in one go. Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-08-09block: pass in queue to inflight accountingJens Axboe13-49/+64
No functional change in this patch, just in preparation for basing the inflight mechanism on the queue in question. Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-08-09blk-mq-tag: check for NULL rq when iterating tagsJens Axboe1-2/+12
Since we introduced blk-mq-sched, the tags->rqs[] array has been dynamically assigned. So we need to check for NULL when iterating, since there's a window of time where the bit is set, but we haven't dynamically assigned the tags->rqs[] array position yet. This is perfectly safe, since the memory backing of the request is never going away while the device is alive. Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-08-09dm-crypt: don't mess with BIP_BLOCK_INTEGRITYChristoph Hellwig1-3/+0
This flag is never set right after calling bio_integrity_alloc, so don't clear it and confuse the reader. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-08-09bio-integrity: move the bio integrity profile check earlier in ↵Christoph Hellwig1-3/+3
bio_integrity_prep This makes the code more obvious, and moves the most likely branch first in the function. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-08-07null_blk: make sure submit_queues > 0weiping zhang1-2/+2
set submit_queues to 1 by default, and make sure it's value > 0. Signed-off-by: weiping zhang <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-08-07null_blk: simplify logic for use_per_node_hctxweiping zhang1-5/+2
make sure submit_queues equal nr_online_nodes. Signed-off-by: weiping zhang <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-08-02block: Add comment to submit_bio_wait()Jan Kara1-0/+4
submit_bio_wait() does not consume bio reference. Add comment about that. Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-08-01blk-mq: add warning to __blk_mq_run_hw_queue() for ints disabledJens Axboe1-0/+10
We recently had a bug in the IPR SCSI driver, where it would end up making the SCSI mid layer run the mq hardware queue with interrupts disabled. This isn't legal, since the software queue locking relies on never being grabbed from interrupt context. Additionally, drivers that set BLK_MQ_F_BLOCKING may schedule from this context. Add a WARN_ON_ONCE() to catch bad users up front. Signed-off-by: Jens Axboe <[email protected]>
2017-07-29blk-mq: blk_mq_requeue_work() doesn't need to save IRQ flagsJens Axboe1-3/+2
We know we're in process context, so don't bother using the IRQ safe versions of the spin lock. Signed-off-by: Jens Axboe <[email protected]>
2017-07-29block: DAC960: shut up format-overflow warningArnd Bergmann1-4/+8
gcc-7 points out that a large controller number would overflow the string length for the procfs name and the firmware version string: drivers/block/DAC960.c: In function 'DAC960_Probe': drivers/block/DAC960.c:6591:38: warning: 'sprintf' may write a terminating nul past the end of the destination [-Wformat-overflow=] drivers/block/DAC960.c: In function 'DAC960_V1_ReadControllerConfiguration': drivers/block/DAC960.c:1681:40: error: '%02d' directive writing between 2 and 3 bytes into a region of size between 2 and 5 [-Werror=format-overflow=] drivers/block/DAC960.c:1681:40: note: directive argument in the range [0, 255] drivers/block/DAC960.c:1681:3: note: 'sprintf' output between 10 and 14 bytes into a destination of size 12 Both of these seem appropriately sized, and using snprintf() instead of sprintf() improves this by ensuring that even incorrect data won't cause undefined behavior here. Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-07-29block: use standard blktrace API to output cgroup info for debug notesShaohua Li5-24/+35
Currently cfq/bfq/blk-throttle output cgroup info in trace in their own way. Now we have standard blktrace API for this, so convert them to use it. Note, this changes the behavior a little bit. cgroup info isn't output by default, we only do this with 'blk_cgroup' option enabled. cgroup info isn't output as a string by default too, we only do this with 'blk_cgname' option enabled. Also cgroup info is output in different position of the note string. I think these behavior changes aren't a big issue (actually we make trace data shorter which is good), since the blktrace note is solely for debugging. Signed-off-by: Shaohua Li <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-07-29blktrace: add an option to allow displaying cgroup pathShaohua Li5-1/+52
By default we output cgroup id in blktrace. This adds an option to display cgroup path. Since get cgroup path is a relativly heavy operation, we don't enable it by default. with the option enabled, blktrace will output something like this: dd-1353 [007] d..2 293.015252: 8,0 /test/level D R 24 + 8 [dd] Signed-off-by: Shaohua Li <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-07-29block: always attach cgroup info into bioShaohua Li2-6/+4
blkcg_bio_issue_check() already gets blkcg for a BIO. bio_associate_blkcg() uses a percpu refcounter, so it's a very cheap operation. There is no point we don't attach the cgroup info into bio at blkcg_bio_issue_check. This also makes blktrace outputs correct cgroup info. Acked-by: Tejun Heo <[email protected]> Signed-off-by: Shaohua Li <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-07-29blktrace: export cgroup info in traceShaohua Li2-73/+161
Currently blktrace isn't cgroup aware. blktrace prints out task name of current context, but the task of current context isn't always in the cgroup where the BIO comes from. We can't use task name to find out IO cgroup. For example, Writeback BIOs always comes from flusher thread but the BIOs are for different blk cgroups. Request could be requeued and dispatched from completely different tasks. MD/DM are another examples. This patch tries to fix the gap. We print out cgroup fhandle info in blktrace. Userspace can use open_by_handle_at() syscall to find the cgroup by fhandle. Or userspace can use name_to_handle_at() syscall to find fhandle for a cgroup and use a BPF program to filter out blktrace for a specific cgroup. We add a new 'blk_cgroup' trace option for blk tracer. It's default off. Application which doesn't know the new option isn't affected. When it's on, we output fhandle info right after blk_io_trace with an extra bit set in event action. So from application point of view, blktrace with the option will output new actions. I didn't change blk trace event yet, since I'm not sure if changing the trace event output is an ABI issue. If not, I'll do it later. Acked-by: Steven Rostedt (VMware) <[email protected]> Signed-off-by: Shaohua Li <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-07-29cgroup: export fhandle info for a cgroupShaohua Li1-0/+8
Add an API to export cgroup fhandle info. We don't export a full 'struct file_handle', there are unrequired info. Sepcifically, cgroup is always a directory, so we don't need a 'FILEID_INO32_GEN_PARENT' type fhandle, we only need export the inode number and generation number just like what generic_fh_to_dentry does. And we can avoid the overhead of getting an inode too, since kernfs_node_id (ino and generation) has all the info required. Acked-by: Tejun Heo <[email protected]> Signed-off-by: Shaohua Li <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-07-29kernfs: add exportfs operationsShaohua Li3-1/+70
Now we have the facilities to implement exportfs operations. The idea is cgroup can export the fhandle info to userspace, then userspace uses fhandle to find the cgroup name. Another example is userspace can get fhandle for a cgroup and BPF uses the fhandle to filter info for the cgroup. Acked-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Shaohua Li <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-07-29kernfs: introduce kernfs_node_idShaohua Li6-13/+21
inode number and generation can identify a kernfs node. We are going to export the identification by exportfs operations, so put ino and generation into a separate structure. It's convenient when later patches use the identification. Acked-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Shaohua Li <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-07-29kernfs: don't set dentry->d_fsdataShaohua Li6-31/+27
When working on adding exportfs operations in kernfs, I found it's hard to initialize dentry->d_fsdata in the exportfs operations. Looks there is no way to do it without race condition. Look at the kernfs code closely, there is no point to set dentry->d_fsdata. inode->i_private already points to kernfs_node, and we can get inode from a dentry. So this patch just delete the d_fsdata usage. Acked-by: Tejun Heo <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Shaohua Li <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-07-29kernfs: add an API to get kernfs node from inode numberShaohua Li3-1/+69
Add an API to get kernfs node from inode number. We will need this to implement exportfs operations. This API will be used in blktrace too later, so it should be as fast as possible. To make the API lock free, kernfs node is freed in RCU context. And we depend on kernfs_node count/ino number to filter out stale kernfs nodes. Acked-by: Tejun Heo <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Shaohua Li <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-07-29kernfs: implement i_generationShaohua Li3-1/+12
Set i_generation for kernfs inode. This is required to implement exportfs operations. The generation is 32-bit, so it's possible the generation wraps up and we find stale files. To reduce the posssibility, we don't reuse inode numer immediately. When the inode number allocation wraps, we increase generation number. In this way generation/inode number consist of a 64-bit number which is unlikely duplicated. This does make the idr tree more sparse and waste some memory. Since idr manages 32-bit keys, idr uses a 6-level radix tree, each level covers 6 bits of the key. In a 100k inode kernfs, the worst case will have around 300k radix tree node. Each node is 576bytes, so the tree will use about ~150M memory. Sounds not too bad, if this really is a problem, we should find better data structure. Acked-by: Tejun Heo <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Shaohua Li <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-07-29kernfs: use idr instead of ida to manage inode numberShaohua Li2-6/+13
kernfs uses ida to manage inode number. The problem is we can't get kernfs_node from inode number with ida. Switching to use idr, next patch will add an API to get kernfs_node from inode number. Acked-by: Tejun Heo <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Shaohua Li <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-07-28Merge tag 'devicetree-fixes-for-4.13' of ↵Linus Torvalds2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull DeviceTree fixes from Rob Herring: "Two small DT fixes: - Fix error handling in of_irq_to_resource_table() due to of_irq_to_resource() error return changes. - Fix dtx_diff script due to dts include path changes" * tag 'devicetree-fixes-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: of: irq: fix of_irq_to_resource() error check scripts/dtc: dtx_diff - update include dts paths to match build
2017-07-28Merge tag 'nfs-for-4.13-3' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds2-6/+12
Pull NFS client fixes from Anna Schumaker: "More NFS client bugfixes for 4.13. Most of these fix locking bugs that Ben and Neil noticed, but I also have a patch to fix one more access bug that was reported after last week. Stable fixes: - Fix a race where CB_NOTIFY_LOCK fails to wake a waiter - Invalidate file size when taking a lock to prevent corruption Other fixes: - Don't excessively generate tiny writes with fallocate - Use the raw NFS access mask in nfs4_opendata_access()" * tag 'nfs-for-4.13-3' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFSv4.1: Fix a race where CB_NOTIFY_LOCK fails to wake a waiter NFS: Optimize fallocate by refreshing mapping when needed. NFS: invalidate file size when taking a lock. NFS: Use raw NFS access mask in nfs4_opendata_access()
2017-07-28Merge tag 'xfs-4.13-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds6-3/+39
Pull xfs fixes from Darrick Wong: - fix firstfsb variables that we left uninitialized, which could lead to locking problems. - check for NULL metadata buffer pointers before using them. - don't allow btree cursor manipulation if the btree block is corrupt. Better to just shut down. - fix infinite loop problems in quotacheck. - fix buffer overrun when validating directory blocks. - fix deadlock problem in bunmapi. * tag 'xfs-4.13-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix multi-AG deadlock in xfs_bunmapi xfs: check that dir block entries don't off the end of the buffer xfs: fix quotacheck dquot id overflow infinite loop xfs: check _alloc_read_agf buffer pointer before using xfs: set firstfsb to NULLFSBLOCK before feeding it to _bmapi_write xfs: check _btree_check_block value