Age | Commit message (Collapse) | Author | Files | Lines |
|
Fix a fat-fingered conversion to the req_op accessors, and also
use a switch statement to make it more obvious what is being checked.
Signed-off-by: Christoph Hellwig <[email protected]>
Reported-by: Dave Chinner <[email protected]>
Fixes: c2df40 ("drivers: use req op accessor");
Reviewed-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
If we fail registering any of the hardware queues, we call
into blk_mq_unregister_disk() with the hotplug mutex already
held. Since blk_mq_unregister_disk() attempts to acquire the
same mutex, we end up in a less than happy place.
Reported-by: Jinpu Wang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In include/linux/blkdev.h duplicate declarations of the request
struct exist. Cleaned up by removing the second, unneeded
declaration.
Signed-off-by: John Pittman <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
bi_rw should be using bio_set_op_attrs to set bi_rw.
Signed-off-by: Shaun Tancheff <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: David Sterba <[email protected]>
Cc: Mike Christie <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The name for a bdi of a gendisk is derived from the gendisk's devt.
However, since the gendisk is destroyed before the bdi it leaves a
window where a new gendisk could dynamically reuse the same devt while a
bdi with the same name is still live. Arrange for the bdi to hold a
reference against its "owner" disk device while it is registered.
Otherwise we can hit sysfs duplicate name collisions like the following:
WARNING: CPU: 10 PID: 2078 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x64/0x80
sysfs: cannot create duplicate filename '/devices/virtual/bdi/259:1'
Hardware name: HP ProLiant DL580 Gen8, BIOS P79 05/06/2015
0000000000000286 0000000002c04ad5 ffff88006f24f970 ffffffff8134caec
ffff88006f24f9c0 0000000000000000 ffff88006f24f9b0 ffffffff8108c351
0000001f0000000c ffff88105d236000 ffff88105d1031e0 ffff8800357427f8
Call Trace:
[<ffffffff8134caec>] dump_stack+0x63/0x87
[<ffffffff8108c351>] __warn+0xd1/0xf0
[<ffffffff8108c3cf>] warn_slowpath_fmt+0x5f/0x80
[<ffffffff812a0d34>] sysfs_warn_dup+0x64/0x80
[<ffffffff812a0e1e>] sysfs_create_dir_ns+0x7e/0x90
[<ffffffff8134faaa>] kobject_add_internal+0xaa/0x320
[<ffffffff81358d4e>] ? vsnprintf+0x34e/0x4d0
[<ffffffff8134ff55>] kobject_add+0x75/0xd0
[<ffffffff816e66b2>] ? mutex_lock+0x12/0x2f
[<ffffffff8148b0a5>] device_add+0x125/0x610
[<ffffffff8148b788>] device_create_groups_vargs+0xd8/0x100
[<ffffffff8148b7cc>] device_create_vargs+0x1c/0x20
[<ffffffff811b775c>] bdi_register+0x8c/0x180
[<ffffffff811b7877>] bdi_register_dev+0x27/0x30
[<ffffffff813317f5>] add_disk+0x175/0x4a0
Cc: <[email protected]>
Reported-by: Yi Zhang <[email protected]>
Tested-by: Yi Zhang <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Fixed up missing 0 return in bdi_register_owner().
Signed-off-by: Jens Axboe <[email protected]>
|
|
In case a submitted request gets stuck for some reason, the block layer
can prevent the request starvation by starting the scheduled timeout work.
If this stuck request occurs at the same time another thread has started
a queue freeze, the blk_mq_timeout_work will not be able to acquire the
queue reference and will return silently, thus not issuing the timeout.
But since the request is already holding a q_usage_counter reference and
is unable to complete, it will never release its reference, preventing
the queue from completing the freeze started by first thread. This puts
the request_queue in a hung state, forever waiting for the freeze
completion.
This was observed while running IO to a NVMe device at the same time we
toggled the CPU hotplug code. Eventually, once a request got stuck
requiring a timeout during a queue freeze, we saw the CPU Hotplug
notification code get stuck inside blk_mq_freeze_queue_wait, as shown in
the trace below.
[c000000deaf13690] [c000000deaf13738] 0xc000000deaf13738 (unreliable)
[c000000deaf13860] [c000000000015ce8] __switch_to+0x1f8/0x350
[c000000deaf138b0] [c000000000ade0e4] __schedule+0x314/0x990
[c000000deaf13940] [c000000000ade7a8] schedule+0x48/0xc0
[c000000deaf13970] [c0000000005492a4] blk_mq_freeze_queue_wait+0x74/0x110
[c000000deaf139e0] [c00000000054b6a8] blk_mq_queue_reinit_notify+0x1a8/0x2e0
[c000000deaf13a40] [c0000000000e7878] notifier_call_chain+0x98/0x100
[c000000deaf13a90] [c0000000000b8e08] cpu_notify_nofail+0x48/0xa0
[c000000deaf13ac0] [c0000000000b92f0] _cpu_down+0x2a0/0x400
[c000000deaf13b90] [c0000000000b94a8] cpu_down+0x58/0xa0
[c000000deaf13bc0] [c0000000006d5dcc] cpu_subsys_offline+0x2c/0x50
[c000000deaf13bf0] [c0000000006cd244] device_offline+0x104/0x140
[c000000deaf13c30] [c0000000006cd40c] online_store+0x6c/0xc0
[c000000deaf13c80] [c0000000006c8c78] dev_attr_store+0x68/0xa0
[c000000deaf13cc0] [c0000000003974d0] sysfs_kf_write+0x80/0xb0
[c000000deaf13d00] [c0000000003963e8] kernfs_fop_write+0x188/0x200
[c000000deaf13d50] [c0000000002e0f6c] __vfs_write+0x6c/0xe0
[c000000deaf13d90] [c0000000002e1ca0] vfs_write+0xc0/0x230
[c000000deaf13de0] [c0000000002e2cdc] SyS_write+0x6c/0x110
[c000000deaf13e30] [c000000000009204] system_call+0x38/0xb4
The fix is to allow the timeout work to execute in the window between
dropping the initial refcount reference and the release of the last
reference, which actually marks the freeze completion. This can be
achieved with percpu_refcount_tryget, which does not require the counter
to be alive. This way the timeout work can do it's job and terminate a
stuck request even during a freeze, returning its reference and avoiding
the deadlock.
Allowing the timeout to run is just a part of the fix, since for some
devices, we might get stuck again inside the device driver's timeout
handler, should it attempt to allocate a new request in that path -
which is a quite common action for Abort commands, which need to be sent
after a timeout. In NVMe, for instance, we call blk_mq_alloc_request
from inside the timeout handler, which will fail during a freeze, since
it also tries to acquire a queue reference.
I considered a similar change to blk_mq_alloc_request as a generic
solution for further device driver hangs, but we can't do that, since it
would allow new requests to disturb the freeze process. I thought about
creating a new function in the block layer to support unfreezable
requests for these occasions, but after working on it for a while, I
feel like this should be handled in a per-driver basis. I'm now
experimenting with changes to the NVMe timeout path, but I'm open to
suggestions of ways to make this generic.
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
Cc: Brian King <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Quentin ran into this bug:
WARNING: CPU: 64 PID: 10085 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x65/0x80
sysfs: cannot create duplicate filename '/devices/virtual/block/nbd3/pid'
Modules linked in: nbd
CPU: 64 PID: 10085 Comm: qemu-nbd Tainted: G D 4.6.0+ #7
0000000000000000 ffff8820330bba68 ffffffff814b8791 ffff8820330bbac8
0000000000000000 ffff8820330bbab8 ffffffff810d04ab ffff8820330bbaa8
0000001f00000296 0000000000017681 ffff8810380bf000 ffffffffa0001790
Call Trace:
[<ffffffff814b8791>] dump_stack+0x4d/0x6c
[<ffffffff810d04ab>] __warn+0xdb/0x100
[<ffffffff810d0574>] warn_slowpath_fmt+0x44/0x50
[<ffffffff81218c65>] sysfs_warn_dup+0x65/0x80
[<ffffffff81218a02>] sysfs_add_file_mode_ns+0x172/0x180
[<ffffffff81218a35>] sysfs_create_file_ns+0x25/0x30
[<ffffffff81594a76>] device_create_file+0x36/0x90
[<ffffffffa0000e8d>] __nbd_ioctl+0x32d/0x9b0 [nbd]
[<ffffffff814cc8e8>] ? find_next_bit+0x18/0x20
[<ffffffff810f7c29>] ? select_idle_sibling+0xe9/0x120
[<ffffffff810f6cd7>] ? __enqueue_entity+0x67/0x70
[<ffffffff810f9bf0>] ? enqueue_task_fair+0x630/0xe20
[<ffffffff810efa76>] ? resched_curr+0x36/0x70
[<ffffffff810f0078>] ? check_preempt_curr+0x78/0x90
[<ffffffff810f00a2>] ? ttwu_do_wakeup+0x12/0x80
[<ffffffff810f01b1>] ? ttwu_do_activate.constprop.86+0x61/0x70
[<ffffffff810f0c15>] ? try_to_wake_up+0x185/0x2d0
[<ffffffff810f0d6d>] ? default_wake_function+0xd/0x10
[<ffffffff81105471>] ? autoremove_wake_function+0x11/0x40
[<ffffffffa0001577>] nbd_ioctl+0x67/0x94 [nbd]
[<ffffffff814ac0fd>] blkdev_ioctl+0x14d/0x940
[<ffffffff811b0da2>] ? put_pipe_info+0x22/0x60
[<ffffffff811d96cc>] block_ioctl+0x3c/0x40
[<ffffffff811ba08d>] do_vfs_ioctl+0x8d/0x5e0
[<ffffffff811aa329>] ? ____fput+0x9/0x10
[<ffffffff810e9092>] ? task_work_run+0x72/0x90
[<ffffffff811ba627>] SyS_ioctl+0x47/0x80
[<ffffffff8185f5df>] entry_SYSCALL_64_fastpath+0x17/0x93
---[ end trace 7899b295e4f850c8 ]---
It seems fairly obvious that device_create_file() is not being protected
from being run concurrently on the same nbd.
Quentin found the following relevant commits:
1a2ad21 nbd: add locking to nbd_ioctl
90b8f28 [PATCH] end of methods switch: remove the old ones
d4430d6 [PATCH] beginning of methods conversion
08f8585 [PATCH] move block_device_operations to blkdev.h
It would seem that the race was introduced in the process of moving nbd
from BKL to unlocked ioctls.
By setting nbd->task_recv while the mutex is held, we can prevent other
processes from running concurrently (since nbd->task_recv is also checked
while the mutex is held).
Reported-and-tested-by: Quentin Casasnovas <[email protected]>
Cc: Markus Pargmann <[email protected]>
Cc: Paul Clements <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Vegard Nossum <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
I got a KASAN report of use-after-free:
==================================================================
BUG: KASAN: use-after-free in klist_iter_exit+0x61/0x70 at addr ffff8800b6581508
Read of size 8 by task trinity-c1/315
=============================================================================
BUG kmalloc-32 (Not tainted): kasan: bad access detected
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: Allocated in disk_seqf_start+0x66/0x110 age=144 cpu=1 pid=315
___slab_alloc+0x4f1/0x520
__slab_alloc.isra.58+0x56/0x80
kmem_cache_alloc_trace+0x260/0x2a0
disk_seqf_start+0x66/0x110
traverse+0x176/0x860
seq_read+0x7e3/0x11a0
proc_reg_read+0xbc/0x180
do_loop_readv_writev+0x134/0x210
do_readv_writev+0x565/0x660
vfs_readv+0x67/0xa0
do_preadv+0x126/0x170
SyS_preadv+0xc/0x10
do_syscall_64+0x1a1/0x460
return_from_SYSCALL_64+0x0/0x6a
INFO: Freed in disk_seqf_stop+0x42/0x50 age=160 cpu=1 pid=315
__slab_free+0x17a/0x2c0
kfree+0x20a/0x220
disk_seqf_stop+0x42/0x50
traverse+0x3b5/0x860
seq_read+0x7e3/0x11a0
proc_reg_read+0xbc/0x180
do_loop_readv_writev+0x134/0x210
do_readv_writev+0x565/0x660
vfs_readv+0x67/0xa0
do_preadv+0x126/0x170
SyS_preadv+0xc/0x10
do_syscall_64+0x1a1/0x460
return_from_SYSCALL_64+0x0/0x6a
CPU: 1 PID: 315 Comm: trinity-c1 Tainted: G B 4.7.0+ #62
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
ffffea0002d96000 ffff880119b9f918 ffffffff81d6ce81 ffff88011a804480
ffff8800b6581500 ffff880119b9f948 ffffffff8146c7bd ffff88011a804480
ffffea0002d96000 ffff8800b6581500 fffffffffffffff4 ffff880119b9f970
Call Trace:
[<ffffffff81d6ce81>] dump_stack+0x65/0x84
[<ffffffff8146c7bd>] print_trailer+0x10d/0x1a0
[<ffffffff814704ff>] object_err+0x2f/0x40
[<ffffffff814754d1>] kasan_report_error+0x221/0x520
[<ffffffff8147590e>] __asan_report_load8_noabort+0x3e/0x40
[<ffffffff83888161>] klist_iter_exit+0x61/0x70
[<ffffffff82404389>] class_dev_iter_exit+0x9/0x10
[<ffffffff81d2e8ea>] disk_seqf_stop+0x3a/0x50
[<ffffffff8151f812>] seq_read+0x4b2/0x11a0
[<ffffffff815f8fdc>] proc_reg_read+0xbc/0x180
[<ffffffff814b24e4>] do_loop_readv_writev+0x134/0x210
[<ffffffff814b4c45>] do_readv_writev+0x565/0x660
[<ffffffff814b8a17>] vfs_readv+0x67/0xa0
[<ffffffff814b8de6>] do_preadv+0x126/0x170
[<ffffffff814b92ec>] SyS_preadv+0xc/0x10
This problem can occur in the following situation:
open()
- pread()
- .seq_start()
- iter = kmalloc() // succeeds
- seqf->private = iter
- .seq_stop()
- kfree(seqf->private)
- pread()
- .seq_start()
- iter = kmalloc() // fails
- .seq_stop()
- class_dev_iter_exit(seqf->private) // boom! old pointer
As the comment in disk_seqf_stop() says, stop is called even if start
failed, so we need to reinitialise the private pointer to NULL when seq
iteration stops.
An alternative would be to set the private pointer to NULL when the
kmalloc() in disk_seqf_start() fails.
Cc: [email protected]
Signed-off-by: Vegard Nossum <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Merge 4fc29c1aa375 included this extra line, but it's not needed (or
useful) since we'll bio_set_op_attrs() right after to properly set
the op and flags for the bio.
Signed-off-by: Jens Axboe <[email protected]>
|
|
When a bio is cloned, the newly created bio must be associated with
the same blkcg as the original bio (if BLK_CGROUP is enabled). If
this operation is not performed, then the new bio is not associated
with any group, and the group of the current task is returned when
the group of the bio is requested.
Depending on the cloning frequency, this may cause a large
percentage of the bios belonging to a given group to be treated
as if belonging to other groups (in most cases as if belonging to
the root group). The expected group isolation may thereby be broken.
This commit adds the missing association in bio-cloning functions.
Fixes: da2f0f74cf7d ("Btrfs: add support for blkio controllers")
Cc: [email protected] # v4.3+
Signed-off-by: Paolo Valente <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Reviewed-by: Jeff Moyer <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
'nr_undestroyed_grps' in struct throtl_data was used to count
the number of throtl_grp related with throtl_data, but now
throtl_grp is tracked by blkcg_gq, so it is useless anymore.
Signed-off-by: Hou Tao <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Currently we take care to handle I_DIRTY_TIME in vfs_fsync() and
queue_io() so that inodes which have only dirty timestamps are properly
written on fsync(2) and sync(2). However there are other call sites -
most notably going through write_inode_now() - which expect inode to be
clean after WB_SYNC_ALL writeback. This is not currently true as we do
not clear I_DIRTY_TIME in __writeback_single_inode() even for
WB_SYNC_ALL writeback in all the cases. This then resulted in the
following oops because bdev_write_inode() did not clean the inode and
writeback code later stumbled over a dirty inode with detached wb.
general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
Modules linked in:
CPU: 3 PID: 32 Comm: kworker/u10:1 Not tainted 4.6.0-rc3+ #349
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: writeback wb_workfn (flush-11:0)
task: ffff88006ccf1840 ti: ffff88006cda8000 task.ti: ffff88006cda8000
RIP: 0010:[<ffffffff818884d2>] [<ffffffff818884d2>]
locked_inode_to_wb_and_lock_list+0xa2/0x750
RSP: 0018:ffff88006cdaf7d0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88006ccf2050
RDX: 0000000000000000 RSI: 000000114c8a8484 RDI: 0000000000000286
RBP: ffff88006cdaf820 R08: ffff88006ccf1840 R09: 0000000000000000
R10: 000229915090805f R11: 0000000000000001 R12: ffff88006a72f5e0
R13: dffffc0000000000 R14: ffffed000d4e5eed R15: ffffffff8830cf40
FS: 0000000000000000(0000) GS:ffff88006d500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000003301bf8 CR3: 000000006368f000 CR4: 00000000000006e0
DR0: 0000000000001ec9 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
Stack:
ffff88006a72f680 ffff88006a72f768 ffff8800671230d8 03ff88006cdaf948
ffff88006a72f668 ffff88006a72f5e0 ffff8800671230d8 ffff88006cdaf948
ffff880065b90cc8 ffff880067123100 ffff88006cdaf970 ffffffff8188e12e
Call Trace:
[< inline >] inode_to_wb_and_lock_list fs/fs-writeback.c:309
[<ffffffff8188e12e>] writeback_sb_inodes+0x4de/0x1250 fs/fs-writeback.c:1554
[<ffffffff8188efa4>] __writeback_inodes_wb+0x104/0x1e0 fs/fs-writeback.c:1600
[<ffffffff8188f9ae>] wb_writeback+0x7ce/0xc90 fs/fs-writeback.c:1709
[< inline >] wb_do_writeback fs/fs-writeback.c:1844
[<ffffffff81891079>] wb_workfn+0x2f9/0x1000 fs/fs-writeback.c:1884
[<ffffffff813bcd1e>] process_one_work+0x78e/0x15c0 kernel/workqueue.c:2094
[<ffffffff813bdc2b>] worker_thread+0xdb/0xfc0 kernel/workqueue.c:2228
[<ffffffff813cdeef>] kthread+0x23f/0x2d0 drivers/block/aoe/aoecmd.c:1303
[<ffffffff867bc5d2>] ret_from_fork+0x22/0x50 arch/x86/entry/entry_64.S:392
Code: 05 94 4a a8 06 85 c0 0f 85 03 03 00 00 e8 07 15 d0 ff 41 80 3e
00 0f 85 64 06 00 00 49 8b 9c 24 88 01 00 00 48 89 d8 48 c1 e8 03 <42>
80 3c 28 00 0f 85 17 06 00 00 48 8b 03 48 83 c0 50 48 39 c3
RIP [< inline >] wb_get include/linux/backing-dev-defs.h:212
RIP [<ffffffff818884d2>] locked_inode_to_wb_and_lock_list+0xa2/0x750
fs/fs-writeback.c:281
RSP <ffff88006cdaf7d0>
---[ end trace 986a4d314dcb2694 ]---
Fix the problem by making sure __writeback_single_inode() writes inode
only with dirty times in WB_SYNC_ALL mode.
Reported-by: Dmitry Vyukov <[email protected]>
Tested-by: Laurent Dufour <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Commit 09954bad4 ("floppy: refactor open() flags handling"), as a
side-effect, causes open(/dev/fdX, O_ACCMODE) to fail. It turns out that
this is being used setfdprm userspace for ioctl-only open().
Reintroduce back the original behavior wrt !(FMODE_READ|FMODE_WRITE)
modes, while still keeping the original O_NDELAY bug fixed.
Cc: [email protected] # v4.5+
Reported-by: Wim Osterholt <[email protected]>
Tested-by: Wim Osterholt <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The LNKGET based atomic sequence in __cmpxchg_u32 has slightly incorrect
constraints for the return value which under certain circumstances can
allow an address unit register to be used as the first operand of a CMP
instruction. This isn't a valid instruction however as the encodings
only allow a data unit to be specified. This would result in an
assembler error like the following:
Error: failed to assemble instruction: "CMP A0.2,D0Ar6"
Fix by changing the constraint from "=&da" (assigned, early clobbered,
data or address unit register) to "=&d" (data unit register only).
The constraint for the second operand, "bd" (an op2 register where op1
is a data unit register and the instruction supports O2R) is already
correct assuming the first operand is a data unit register.
Other cases of CMP in inline asm have had their constraints checked, and
appear to all be fine.
Fixes: 6006c0d8ce94 ("metag: Atomics, locks and bitops")
Signed-off-by: James Hogan <[email protected]>
Cc: [email protected]
Cc: <[email protected]> # 3.9.x-
|
|
buf[0] is an unsigned char. touch_nr is an int. The test for negative
here doesn't make sense so I have removed it.
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>
|
|
The newly added sis_i2c driver fails to link without the CRC_ITU_T
driver enabled:
drivers/input/touchscreen/sis_i2c.o: In function `sis_ts_irq_handler':
sis_i2c.c:(.text+0xc0): undefined reference to `crc_itu_t'
This adds a Kconfig select statement.
Signed-off-by: Arnd Bergmann <[email protected]>
Fixes: a485cb037fe6 ("Input: add driver for SiS 9200 family I2C touchscreen controllers")
Signed-off-by: Dmitry Torokhov <[email protected]>
|
|
|
|
Soft RoCE (RXE) - The software RoCE driver
ib_rxe implements the RDMA transport and registers to the RDMA core
device as a kernel verbs provider. It also implements the packet IO
layer. On the other hand ib_rxe registers to the Linux netdev stack
as a udp encapsulating protocol, in that case RDMA, for sending and
receiving packets over any Ethernet device. This yields a RDMA
transport over the UDP/Ethernet network layer forming a RoCEv2
compatible device.
The configuration procedure of the Soft RoCE drivers requires
binding to any existing Ethernet network device. This is done with
/sys interface.
A userspace Soft RoCE library (librxe) provides user applications
the ability to run with Soft RoCE devices. The use of rxe verbs ins
user space requires the inclusion of librxe as a device specifics
plug-in to libibverbs. librxe is packaged separately.
Architecture:
+-----------------------------------------------------------+
| Application |
+-----------------------------------------------------------+
+-----------------------------------+
| libibverbs |
User +-----------------------------------+
+----------------+ +----------------+
| librxe | | HW RoCE lib |
+----------------+ +----------------+
+---------------------------------------------------------------+
+--------------+ +------------+
| Sockets | | RDMA ULP |
+--------------+ +------------+
+--------------+ +---------------------+
| TCP/IP | | ib_core |
+--------------+ +---------------------+
+------------+ +----------------+
Kernel | ib_rxe | | HW RoCE driver |
+------------+ +----------------+
+------------------------------------+
| NIC driver |
+------------------------------------+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+-----------------------------------------------------------+
| Application |
+-----------------------------------------------------------+
+-----------------------------------+
| libibverbs |
User +-----------------------------------+
+----------------+ +----------------+
| librxe | | HW RoCE lib |
+----------------+ +----------------+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+--------------+ +------------+
| Sockets | | RDMA ULP |
+--------------+ +------------+
+--------------+ +---------------------+
| TCP/IP | | ib_core |
+--------------+ +---------------------+
+------------+ +----------------+
Kernel | ib_rxe | | HW RoCE driver |
+------------+ +----------------+
+------------------------------------+
| NIC driver |
+------------------------------------+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Soft RoCE resources:
[1[ https://github.com/SoftRoCE/librxe-dev librxe - source code in
Github
[2] https://github.com/SoftRoCE/rxe-dev/wiki/rxe-dev:-Home - Soft RoCE
Wiki page
[3] https://github.com/SoftRoCE/librxe-dev - Soft RoCE userspace library
Signed-off-by: Kamal Heib <[email protected]>
Signed-off-by: Amir Vadai <[email protected]>
Signed-off-by: Moni Shoua <[email protected]>
Reviewed-by: Haggai Eran <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
During a resynchronization, device status char 'a' is output on the raid
status line for every device of a RAID set. It changes from 'a' to 'A'
(unless device failure) when the resynchronization completes.
Interrupting and restarting a resynchronization, by reloading the DM
table, erroneously lead to status char 'A'.
Fix this by avoiding setting the MD_RECOVERY_REQUESTED flag in
raid_preresume().
Signed-off-by: Heinz Mauelshagen <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media DocBook removal and some fixups from Mauro Carvalho Chehab:
- removal of the media DocBook (since it's all in Sphinx now)
- videobuf2: Fix an allocation regression
- a few fixes related to the CEC drivers
* tag 'media/v4.8-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] cec: fix off-by-one memset
[media] staging: add MEDIA_SUPPORT dependency
[media] vivid: don't handle CEC_MSG_SET_STREAM_PATH
[media] media: adv7180: Fix broken interrupt register access
[media] vb2: Fix allocation size of dma_parms
[media] vim2m: copy the other colorspace-related fields as well
[media] adv7511: fix VIC autodetect
doc-rst: Remove the media docbook
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module updates from Rusty Russell:
"The only interesting thing here is Jessica's patch to add
ro_after_init support to modules. The rest are all trivia"
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
extable.h: add stddef.h so "NULL" definition is not implicit
modules: add ro_after_init support
jump_label: disable preemption around __module_text_address().
exceptions: fork exception table content from module.h into extable.h
modules: Add kernel parameter to blacklist modules
module: Do a WARN_ON_ONCE() for assert module mutex not held
Documentation/module-signing.txt: Note need for version info if reusing a key
module: Invalidate signatures on force-loaded modules
module: Issue warnings when tainting kernel
module: fix redundant test.
module: fix noreturn attribute for __module_put_and_exit()
|
|
Merge even more updates from Andrew Morton:
- dma-mapping API cleanup
- a few cleanups and misc things
- use jump labels in dynamic-debug
* emailed patches from Andrew Morton <[email protected]>:
dynamic_debug: add jump label support
jump_label: remove bug.h, atomic.h dependencies for HAVE_JUMP_LABEL
arm: jump label may reference text in __exit
tile: support static_key usage in non-module __exit sections
sparc: support static_key usage in non-module __exit sections
powerpc: add explicit #include <asm/asm-compat.h> for jump label
drivers/media/dvb-frontends/cxd2841er.c: avoid misleading gcc warning
MAINTAINERS: update email and list of Samsung HW driver maintainers
block: remove BLK_DEV_DAX config option
samples/kretprobe: fix the wrong type
samples/kretprobe: convert the printk to pr_info/pr_err
samples/jprobe: convert the printk to pr_info/pr_err
samples/kprobe: convert the printk to pr_info/pr_err
dma-mapping: use unsigned long for dma_attrs
media: mtk-vcodec: remove unused dma_attrs
include/linux/bitmap.h: cleanup
tree-wide: replace config_enabled() with IS_ENABLED()
drivers/fpga/Kconfig: fix build failure
|
|
Although dynamic debug is often only used for debug builds, sometimes
its enabled for production builds as well. Minimize its impact by using
jump labels. This reduces the text section by 7000+ bytes in the kernel
image below. It does increase data, but this should only be referenced
when changing the direction of the branches, and hence usually not in
cache.
text data bss dec hex filename
8194852 4879776 925696 14000324 d5a0c4 vmlinux.pre
8187337 4960224 925696 14073257 d6bda9 vmlinux.post
Link: http://lkml.kernel.org/r/d165b465e8c89bc582d973758d40be44c33f018b.1467837322.git.jbaron@akamai.com
Signed-off-by: Jason Baron <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The current jump_label.h includes bug.h for things such as WARN_ON().
This makes the header problematic for inclusion by kernel.h or any
headers that kernel.h includes, since bug.h includes kernel.h (circular
dependency). The inclusion of atomic.h is similarly problematic. Thus,
this should make jump_label.h 'includable' from most places.
Link: http://lkml.kernel.org/r/7060ce35ddd0d20b33bf170685e6b0fab816bdf2.1467837322.git.jbaron@akamai.com
Signed-off-by: Jason Baron <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The jump table can reference text found in an __exit section. Thus,
instead of discarding it at build time, include EXIT_TEXT as part of
__init and it will be released when the system boots.
Link: http://lkml.kernel.org/r/60284113bb759121e8ae3e99af1535647e52123f.1467837322.git.jbaron@akamai.com
Signed-off-by: Jason Baron <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Previously, all the __exit sections were just dropped by the link phase.
However, if there are static_key (jump label) constructs in __exit
sections that are not modules, the link fails with the message:
`.exit.text' referenced in section `__jump_table' of xxx.o:
defined in discarded section `.exit.text' of xxx.o
Support this usage by keeping the .exit.text sections in the final image
if JUMP_LABEL is defined, then discarding them once initialization is
complete.
Link: http://lkml.kernel.org/r/bfd7c107c610c30e992868ebfe2a5d796a097464.1467837322.git.jbaron@akamai.com
Signed-off-by: Jason Baron <[email protected]>
Signed-off-by: Chris Metcalf <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The jump table can reference text found in an __exit section. Thus,
instead of discarding it at build/link time, include EXIT_TEXT as part
of __init and release it at system boot time.
Without this patch the link fails with:
`.exit.text' referenced in section `__jump_table' of xxx.o:
defined in discarded section `.exit.text' of xxx.o
Link: http://lkml.kernel.org/r/d822da427ab07a02a394602eca687104ff682f83.1467837322.git.jbaron@akamai.com
Signed-off-by: Jason Baron <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The stringify_in_c() macro may not be included. Make the dependency
explicit.
Link: http://lkml.kernel.org/r/564720c5328edd53c9d56db325be7215440eec3e.1467837322.git.jbaron@akamai.com
Signed-off-by: Jason Baron <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The addition of jump label support in dynamic_debug caused an unexpected
warning in exactly one file in the kernel:
drivers/media/dvb-frontends/cxd2841er.c: In function 'cxd2841er_tune_tc':
include/linux/dynamic_debug.h:134:3: error: 'carrier_offset' may be used uninitialized in this function [-Werror=maybe-uninitialized]
__dynamic_dev_dbg(&descriptor, dev, fmt, \
^~~~~~~~~~~~~~~~~
drivers/media/dvb-frontends/cxd2841er.c:3177:11: note: 'carrier_offset' was declared here
int ret, carrier_offset;
^~~~~~~~~~~~~~
The problem seems to be that the compiler gets confused by the extra
conditionals in static_branch_unlikely, to the point where it can no
longer keep track of which branches have already been taken, and it
doesn't realize that this variable is now always initialized when it
gets used.
I have done lots of randconfig kernel builds and could not find any
other file with this behavior, so I assume it's a rare enough glitch
that we don't need to change the jump label support but instead just
work around the warning in the driver.
To achieve that, I'm moving the check for the return value into the
switch() statement, which is an obvious transformation, but is enough to
un-confuse the compiler here. The resulting code is not as nice to
read, but at least we retain the behavior of warning if it gets changed
to actually access an uninitialized carrier offset value in the future.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
Acked-by: Abylay Ospan <[email protected]>
Cc: Sergey Kozlov <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Jason Baron <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Change my email address in the MAINTAINERS file.
Add new maintainers of selected Samsung HW drivers.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Kamil Debski <[email protected]>
Reviewed-by: Jean Delvare <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: Kishon Vijay Abraham I <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Andrzej Hajda <[email protected]>
Cc: Lukasz Majewski <[email protected]>
Cc: Sylwester Nawrocki <[email protected]>
Cc: Kamil Debski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The functionality for block device DAX was already removed with commit
acc93d30d7d4 ("Revert "block: enable dax for raw block devices"")
However, we still had a config option hanging around that was always
disabled because it depended on CONFIG_BROKEN. This config option was
introduced in commit 03cdadb04077 ("block: disable block device DAX by
default")
This change reverts that commit, removing the dead config option.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ross Zwisler <[email protected]>
Cc: Dave Hansen <[email protected]>
Acked-by: Dan Williams <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The regs_return_value() returns "unsigned long" or "long" value. But the
retval is int type now, it may cause overflow, the log may becomes:
[ 2911.078869] do_brk returned -2003877888 and took 4620 ns to execute
This patch converts the retval to "unsigned long" type, and fixes the
overflow issue.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Huang Shijie <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Steve Capper <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Anil S Keshavamurthy <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We prefer to use the pr_* to print out the log now, this patch converts
the printk to pr_info. In the error path, use the pr_err to replace the
printk.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Huang Shijie <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Steve Capper <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Anil S Keshavamurthy <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We prefer to use the pr_* to print out the log now, this patch converts
the printk to pr_info. In the error path, use the pr_err to replace the
printk.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Huang Shijie <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Steve Capper <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Anil S Keshavamurthy <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We prefer to use the pr_* to print out the log now, this patch converts
the printk to pr_info. In the error path, use the pr_err to replace the
printk.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Huang Shijie <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Steve Capper <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Anil S Keshavamurthy <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The dma-mapping core and the implementations do not change the DMA
attributes passed by pointer. Thus the pointer can point to const data.
However the attributes do not have to be a bitfield. Instead unsigned
long will do fine:
1. This is just simpler. Both in terms of reading the code and setting
attributes. Instead of initializing local attributes on the stack
and passing pointer to it to dma_set_attr(), just set the bits.
2. It brings safeness and checking for const correctness because the
attributes are passed by value.
Semantic patches for this change (at least most of them):
virtual patch
virtual context
@r@
identifier f, attrs;
@@
f(...,
- struct dma_attrs *attrs
+ unsigned long attrs
, ...)
{
...
}
@@
identifier r.f;
@@
f(...,
- NULL
+ 0
)
and
// Options: --all-includes
virtual patch
virtual context
@r@
identifier f, attrs;
type t;
@@
t f(..., struct dma_attrs *attrs);
@@
identifier r.f;
@@
f(...,
- NULL
+ 0
)
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Krzysztof Kozlowski <[email protected]>
Acked-by: Vineet Gupta <[email protected]>
Acked-by: Robin Murphy <[email protected]>
Acked-by: Hans-Christian Noren Egtvedt <[email protected]>
Acked-by: Mark Salter <[email protected]> [c6x]
Acked-by: Jesper Nilsson <[email protected]> [cris]
Acked-by: Daniel Vetter <[email protected]> [drm]
Reviewed-by: Bart Van Assche <[email protected]>
Acked-by: Joerg Roedel <[email protected]> [iommu]
Acked-by: Fabien Dessenne <[email protected]> [bdisp]
Reviewed-by: Marek Szyprowski <[email protected]> [vb2-core]
Acked-by: David Vrabel <[email protected]> [xen]
Acked-by: Konrad Rzeszutek Wilk <[email protected]> [xen swiotlb]
Acked-by: Joerg Roedel <[email protected]> [iommu]
Acked-by: Richard Kuo <[email protected]> [hexagon]
Acked-by: Geert Uytterhoeven <[email protected]> [m68k]
Acked-by: Gerald Schaefer <[email protected]> [s390]
Acked-by: Bjorn Andersson <[email protected]>
Acked-by: Hans-Christian Noren Egtvedt <[email protected]> [avr32]
Acked-by: Vineet Gupta <[email protected]> [arc]
Acked-by: Robin Murphy <[email protected]> [arm64 and dma-iommu]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The local variable dma_attrs is set but never read.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Remove two unneeded `else's.
Cc: David Hildenbrand <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The use of config_enabled() against config options is ambiguous. In
practical terms, config_enabled() is equivalent to IS_BUILTIN(), but the
author might have used it for the meaning of IS_ENABLED(). Using
IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc. makes the intention
clearer.
This commit replaces config_enabled() with IS_ENABLED() where possible.
This commit is only touching bool config options.
I noticed two cases where config_enabled() is used against a tristate
option:
- config_enabled(CONFIG_HWMON)
[ drivers/net/wireless/ath/ath10k/thermal.c ]
- config_enabled(CONFIG_BACKLIGHT_CLASS_DEVICE)
[ drivers/gpu/drm/gma500/opregion.c ]
I did not touch them because they should be converted to IS_BUILTIN()
in order to keep the logic, but I was not sure it was the authors'
intention.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Masahiro Yamada <[email protected]>
Acked-by: Kees Cook <[email protected]>
Cc: Stas Sergeev <[email protected]>
Cc: Matt Redfearn <[email protected]>
Cc: Joshua Kinard <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Markos Chandras <[email protected]>
Cc: "Dmitry V. Levin" <[email protected]>
Cc: yu-cheng yu <[email protected]>
Cc: James Hogan <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Will Drewry <[email protected]>
Cc: Nikolay Martynov <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Leonid Yegoshin <[email protected]>
Cc: Rafal Milecki <[email protected]>
Cc: James Cowgill <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Alex Smith <[email protected]>
Cc: Adam Buchbinder <[email protected]>
Cc: Qais Yousef <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Mikko Rapeli <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Brian Norris <[email protected]>
Cc: Hidehiro Kawai <[email protected]>
Cc: "Luis R. Rodriguez" <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Roland McGrath <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Kalle Valo <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Tony Wu <[email protected]>
Cc: Huaitong Han <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Jason Cooper <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Andrea Gelmini <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Rabin Vincent <[email protected]>
Cc: "Maciej W. Rozycki" <[email protected]>
Cc: David Daney <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
While building m32r allmodconfig the build is failing with the error:
ERROR: "bad_dma_ops" [drivers/fpga/zynq-fpga.ko] undefined!
Xilinx Zynq FPGA is using DMA but there was no dependency while
building.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Sudip Mukherjee <[email protected]>
Acked-by: Moritz Fischer <[email protected]>
Cc: Alan Tull <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
set_pte_at(.) will set or unset the PTE_RDONLY hardware bit before
writing the entry to the table.
This can cause problems with the copy-on-write logic in hugetlb_cow:
*) hugetlb_cow(.) called to handle a write fault on read only pte,
*) Before the copy-on-write updates the new page table a call is
made to pte_same(huge_ptep_get(ptep), pte)), to check for a race,
*) Because set_pte_at(.) changed the pte, *ptep != pte, and the
hugetlb_cow(.) code erroneously assumes that it lost the race,
*) The new page is subsequently freed without being used.
On arm64 this problem only becomes apparent when we apply:
67961f9 mm/hugetlb: fix huge page reserve accounting for private
mappings
When one runs the libhugetlbfs test suite, there are allocation errors
and hugetlbfs pages become erroneously locked in memory as reserved.
(There is a high HugePages_Rsvd: count).
In this patch we introduce pte_same which ignores the PTE_RDONLY bit,
allowing for the libhugetlbfs test suite to pass as expected and
without leaking any reserved HugeTLB pages.
Reported-by: Huang Shijie <[email protected]>
Signed-off-by: Steve Capper <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
|
|
Commit 4b855078601f ("KVM: nVMX: Don't advertise single
context invalidation for invept") removed advertising
single context invalidation since the spec does not mandate it.
However, some hypervisors (such as ESX) require it to be present
before willing to use ept in a nested environment. Advertise it
and fallback to the global case.
Signed-off-by: Bandan Das <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Nested vpid is already supported and both single/global
modes are advertised to the guest
Signed-off-by: Bandan Das <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
BUG: unable to handle kernel NULL pointer dereference at 000000000000008c
IP: [<ffffffffc04e0180>] kvm_lapic_hv_timer_in_use+0x10/0x20 [kvm]
PGD 0
Oops: 0000 [#1] SMP
Call Trace:
kvm_arch_vcpu_load+0x86/0x260 [kvm]
vcpu_load+0x46/0x60 [kvm]
kvm_vcpu_ioctl+0x79/0x7c0 [kvm]
? __lock_is_held+0x54/0x70
do_vfs_ioctl+0x96/0x6a0
? __fget_light+0x2a/0x90
SyS_ioctl+0x79/0x90
do_syscall_64+0x7c/0x1e0
entry_SYSCALL64_slow_path+0x25/0x25
RIP [<ffffffffc04e0180>] kvm_lapic_hv_timer_in_use+0x10/0x20 [kvm]
RSP <ffff8800db1f3d70>
CR2: 000000000000008c
---[ end trace a55fb79d2b3b4ee8 ]---
This can be reproduced steadily by kernel_irqchip=off.
We should not access preemption timer stuff if lapic is emulated in userspace.
This patch fix it by avoiding access preemption timer stuff when kernel_irqchip=off.
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Yunhong Jiang <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
The KVM_X2APIC_API_USE_32BIT_IDS feature applies to both
KVM_SET_GSI_ROUTING and KVM_SIGNAL_MSI, but was not mentioned in the
documentation for the latter ioctl.
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/ARM Changes for v4.8 - Take 2
Includes GSI routing support to go along with the new VGIC and a small fix that
has been cooking in -next for a while.
|
|
The new simplified __pvclock_read_cycles does the same computation
as vread_pvclock, except that (because it takes the pvclock_vcpu_time_info
pointer) it has to be moved inside the loop. Since the loop is expected to
never roll, this makes no difference.
Acked-by: Andy Lutomirski <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
The version field in struct pvclock_vcpu_time_info basically implements
a seqcount. Wrap it with the usual read_begin and read_retry functions,
and use these APIs instead of peppering the code with smp_rmb()s.
While at it, change it to the more pedantically correct virt_rmb().
With this change, __pvclock_read_cycles can be simplified noticeably.
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
We want to initialise register_process_table() before ppc_md is setup,
so that it can be called as part of MMU init (at least on Radix ATM).
That no longer works because probe_machine() requires that ppc_md be
empty before it's called, and we now do probe_machine() much later.
So make register_process_table a global for now. It will probably move
into a mmu_radix_ops struct at some point in the future.
This was broken by me when applying commit 7025776ed1eb "powerpc/mm:
Move hash table ops to a separate structure" due to conflicts with other
patches.
Fixes: 7025776ed1eb ("powerpc/mm: Move hash table ops to a separate structure")
Signed-off-by: Michael Ellerman <[email protected]>
|
|
These have been changed in the hardware, update Linux's version.
Signed-off-by: Madhavan Srinivasan <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|