Age | Commit message (Collapse) | Author | Files | Lines |
|
UML is a bit special since it does not have iomem nor dma. That means a
lot of drivers will not build if they miss a dependency on HAS_IOMEM.
s390 used to have the same issues but since it gained PCI support UML is
the only stranger.
We are tired of patching dozens of new drivers after every merge window
just to un-break allmod/yesconfig UML builds. One could argue that a
decent driver has to know on what it depends and therefore a missing
HAS_IOMEM dependency is a clear driver bug. But the dependency not
obvious and not everyone does UML builds with COMPILE_TEST enabled when
developing a device driver.
A possible solution to make these builds succeed on UML would be
providing stub functions for ioremap() and friends which fail upon
runtime. Another one is simply disabling COMPILE_TEST for UML. Since
it is the least hassle and does not force use to fake iomem support
let's do the latter.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Richard Weinberger <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Suppress a bunch of warnings of the form:
fs/proc/task_mmu.c: In function 'show_smap_vma_flags':
fs/proc/task_mmu.c:635:22: warning: initialized field overwritten [-Wt override-init]
[ilog2(VM_READ)] = "rd",
^~~~
fs/proc/task_mmu.c:635:22: note: (near initialization for 'mnemonics[0]')
They happen because of the way we intentionally build the table, so
silence the warning when building with 'make W=1'.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Valdis Kletnieks <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
/proc/stat shows (among lots of other things) the current boottime (i.e.
number of seconds since boot). While a 32-bit number is sufficient for
this particular case, we want to get rid of the 'struct timespec'
suffers from a 32-bit overflow in 2038.
This changes the code to use a struct timespec64, which is known to be
safe in all cases.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This was needed before to ensure that ->signal != 0 and do_each_thread()
is safe, see commit b95c35e76b29b ("oom: fix the unsafe usage of
badness() in proc_oom_score()") for details.
Today tsk->signal can't go away and for_each_thread(tsk) is always safe.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: David Rientjes <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Salah Triki and Luis de Bethencourt are taking over maintainership of
befs.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Luis de Bethencourt <[email protected]>
Acked-by: Greg Kroah-Hartman <[email protected]>
Acked-by: Theodore Ts'o <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
cgroup's document path is changed to "cgroup-v1". update it.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: seokhoon.yoon <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
handle_object_size_mismatch() used %pk to format a kernel pointer with
pr_err(). This seemed to be a misspelling for %pK, but using this to
format a kernel pointer does not make much sence here.
Therefore use %p instead, like in handle_missaligned_access().
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Nicolas Iooss <[email protected]>
Acked-by: Andrey Ryabinin <[email protected]>
Cc: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit 53dad6d3a8e5 ("ipc: fix race with LSMs") updated ipc_rcu_putref()
to receive rcu freeing function but used generic ipc_rcu_free() instead
of msg_rcu_free() which does security cleaning.
Running LTP msgsnd06 with kmemleak gives the following:
cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff88003c0a11f8 (size 8):
comm "msgsnd06", pid 1645, jiffies 4294672526 (age 6.549s)
hex dump (first 8 bytes):
1b 00 00 00 01 00 00 00 ........
backtrace:
kmemleak_alloc+0x23/0x40
kmem_cache_alloc_trace+0xe1/0x180
selinux_msg_queue_alloc_security+0x3f/0xd0
security_msg_queue_alloc+0x2e/0x40
newque+0x4e/0x150
ipcget+0x159/0x1b0
SyS_msgget+0x39/0x40
entry_SYSCALL_64_fastpath+0x13/0x8f
Manfred Spraul suggested to fix sem.c as well and Davidlohr Bueso to
only use ipc_rcu_free in case of security allocation failure in newary()
Fixes: 53dad6d3a8e ("ipc: fix race with LSMs")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Fabian Frederick <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Manfred Spraul <[email protected]>
Cc: <[email protected]> [3.12+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We must call shrink_slab() for each memory cgroup on both global and
memcg reclaim in shrink_node_memcg(). Commit d71df22b55099 accidentally
changed that so that now shrink_slab() is only called with memcg != NULL
on memcg reclaim. As a result, memcg-aware shrinkers (including
dentry/inode) are never invoked on global reclaim. Fix that.
Fixes: b2e18757f2c9 ("mm, vmscan: begin reclaiming pages on a per-node basis")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Vladimir Davydov <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Rik van Riel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Radix trees may be used not only for storing page cache pages, so
unconditionally accounting radix tree nodes to the current memory cgroup
is bad: if a radix tree node is used for storing data shared among
different cgroups we risk pinning dead memory cgroups forever.
So let's only account radix tree nodes if it was explicitly requested by
passing __GFP_ACCOUNT to INIT_RADIX_TREE. Currently, we only want to
account page cache entries, so mark mapping->page_tree so.
Fixes: 58e698af4c63 ("radix-tree: account radix_tree_node to memory cgroup")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Vladimir Davydov <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: <[email protected]> [4.6+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
If the total amount of memory assigned to quarantine is less than the
amount of memory assigned to per-cpu quarantines, |new_quarantine_size|
may overflow. Instead, set it to zero.
[[email protected]: cleanup: use WARN_ONCE return value]
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 55834c59098d ("mm: kasan: initial memory quarantine implementation")
Signed-off-by: Alexander Potapenko <[email protected]>
Reported-by: Dmitry Vyukov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Currently we just dump stack in case of double free bug.
Let's dump all info about the object that we have.
[[email protected]: change double free message per Alexander]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The state of object currently tracked in two places - shadow memory, and
the ->state field in struct kasan_alloc_meta. We can get rid of the
latter. The will save us a little bit of memory. Also, this allow us
to move free stack into struct kasan_alloc_meta, without increasing
memory consumption. So now we should always know when the last time the
object was freed. This may be useful for long delayed use-after-free
bugs.
As a side effect this fixes following UBSAN warning:
UBSAN: Undefined behaviour in mm/kasan/quarantine.c:102:13
member access within misaligned address ffff88000d1efebc for type 'struct qlist_node'
which requires 8 byte alignment
Link: http://lkml.kernel.org/r/[email protected]
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Size of slab object already stored in cache->object_size.
Note, that kmalloc() internally rounds up size of allocation, so
object_size may be not equal to alloc_size, but, usually we don't need
to know the exact size of allocated object. In case if we need that
information, we still can figure it out from the report. The dump of
shadow memory allows to identify the end of allocated memory, and
thereby the exact allocation size.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
SLUB doesn't require disabled interrupts to call ___cache_free().
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Andrey Ryabinin <[email protected]>
Acked-by: Alexander Potapenko <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Currently we call quarantine_reduce() for ___GFP_KSWAPD_RECLAIM (implied
by __GFP_RECLAIM) allocation. So, basically we call it on almost every
allocation. quarantine_reduce() sometimes is heavy operation, and
calling it with disabled interrupts may trigger hard LOCKUP:
NMI watchdog: Watchdog detected hard LOCKUP on cpu 2irq event stamp: 1411258
Call Trace:
<NMI> dump_stack+0x68/0x96
watchdog_overflow_callback+0x15b/0x190
__perf_event_overflow+0x1b1/0x540
perf_event_overflow+0x14/0x20
intel_pmu_handle_irq+0x36a/0xad0
perf_event_nmi_handler+0x2c/0x50
nmi_handle+0x128/0x480
default_do_nmi+0xb2/0x210
do_nmi+0x1aa/0x220
end_repeat_nmi+0x1a/0x1e
<<EOE>> __kernel_text_address+0x86/0xb0
print_context_stack+0x7b/0x100
dump_trace+0x12b/0x350
save_stack_trace+0x2b/0x50
set_track+0x83/0x140
free_debug_processing+0x1aa/0x420
__slab_free+0x1d6/0x2e0
___cache_free+0xb6/0xd0
qlist_free_all+0x83/0x100
quarantine_reduce+0x177/0x1b0
kasan_kmalloc+0xf3/0x100
Reduce the quarantine_reduce iff direct reclaim is allowed.
Fixes: 55834c59098d("mm: kasan: initial memory quarantine implementation")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Andrey Ryabinin <[email protected]>
Reported-by: Dave Jones <[email protected]>
Acked-by: Alexander Potapenko <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Once an object is put into quarantine, we no longer own it, i.e. object
could leave the quarantine and be reallocated. So having set_track()
call after the quarantine_put() may corrupt slab objects.
BUG kmalloc-4096 (Not tainted): Poison overwritten
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: 0xffff8804540de850-0xffff8804540de857. First byte 0xb5 instead of 0x6b
...
INFO: Freed in qlist_free_all+0x42/0x100 age=75 cpu=3 pid=24492
__slab_free+0x1d6/0x2e0
___cache_free+0xb6/0xd0
qlist_free_all+0x83/0x100
quarantine_reduce+0x177/0x1b0
kasan_kmalloc+0xf3/0x100
kasan_slab_alloc+0x12/0x20
kmem_cache_alloc+0x109/0x3e0
mmap_region+0x53e/0xe40
do_mmap+0x70f/0xa50
vm_mmap_pgoff+0x147/0x1b0
SyS_mmap_pgoff+0x2c7/0x5b0
SyS_mmap+0x1b/0x30
do_syscall_64+0x1a0/0x4e0
return_from_SYSCALL_64+0x0/0x7a
INFO: Slab 0xffffea0011503600 objects=7 used=7 fp=0x (null) flags=0x8000000000004080
INFO: Object 0xffff8804540de848 @offset=26696 fp=0xffff8804540dc588
Redzone ffff8804540de840: bb bb bb bb bb bb bb bb ........
Object ffff8804540de848: 6b 6b 6b 6b 6b 6b 6b 6b b5 52 00 00 f2 01 60 cc kkkkkkkk.R....`.
Similarly, poisoning after the quarantine_put() leads to false positive
use-after-free reports:
BUG: KASAN: use-after-free in anon_vma_interval_tree_insert+0x304/0x430 at addr ffff880405c540a0
Read of size 8 by task trinity-c0/3036
CPU: 0 PID: 3036 Comm: trinity-c0 Not tainted 4.7.0-think+ #9
Call Trace:
dump_stack+0x68/0x96
kasan_report_error+0x222/0x600
__asan_report_load8_noabort+0x61/0x70
anon_vma_interval_tree_insert+0x304/0x430
anon_vma_chain_link+0x91/0xd0
anon_vma_clone+0x136/0x3f0
anon_vma_fork+0x81/0x4c0
copy_process.part.47+0x2c43/0x5b20
_do_fork+0x16d/0xbd0
SyS_clone+0x19/0x20
do_syscall_64+0x1a0/0x4e0
entry_SYSCALL64_slow_path+0x25/0x25
Fix this by putting an object in the quarantine after all other
operations.
Fixes: 80a9201a5965 ("mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Andrey Ryabinin <[email protected]>
Reported-by: Dave Jones <[email protected]>
Reported-by: Vegard Nossum <[email protected]>
Reported-by: Sasha Levin <[email protected]>
Acked-by: Alexander Potapenko <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We've had a report about soft lockups caused by lock bouncing in the
soft reclaim path:
BUG: soft lockup - CPU#0 stuck for 22s! [kav4proxy-kavic:3128]
RIP: 0010:[<ffffffff81469798>] [<ffffffff81469798>] _raw_spin_lock+0x18/0x20
Call Trace:
mem_cgroup_soft_limit_reclaim+0x25a/0x280
shrink_zones+0xed/0x200
do_try_to_free_pages+0x74/0x320
try_to_free_pages+0x112/0x180
__alloc_pages_slowpath+0x3ff/0x820
__alloc_pages_nodemask+0x1e9/0x200
alloc_pages_vma+0xe1/0x290
do_wp_page+0x19f/0x840
handle_pte_fault+0x1cd/0x230
do_page_fault+0x1fd/0x4c0
page_fault+0x25/0x30
There are no memcgs created so there cannot be any in the soft limit
excess obviously:
[...]
memory 0 1 1
so all this just seems to be mem_cgroup_largest_soft_limit_node trying
to get spin_lock_irq(&mctz->lock) just to find out that the soft limit
excess tree is empty. This is just pointless wasting of cycles and
cache line bouncing during heavy parallel reclaim on large machines.
The particular machine wasn't very healthy and most probably suffering
from a memory leak which just caused the memory reclaim to trash
heavily. But bouncing on the lock certainly didn't help...
Fix this by optimistic lockless check and bail out early if the tree is
empty. This is theoretically racy but that shouldn't matter all that
much. First of all soft limit is a best effort feature and it is slowly
getting deprecated and its usage should be really scarce. Bouncing on a
lock without a good reason is surely much bigger problem, especially on
large CPU machines.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Acked-by: Vladimir Davydov <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Zhong Jiang has reported a BUG_ON from huge_pte_alloc hitting when he
runs his database load with memory online and offline running in
parallel. The reason is that huge_pmd_share might detect a shared pmd
which is currently migrated and so it has migration pte which is
!pte_huge.
There doesn't seem to be any easy way to prevent from the race and in
fact seeing the migration swap entry is not harmful. Both callers of
huge_pte_alloc are prepared to handle them. copy_hugetlb_page_range
will copy the swap entry and make it COW if needed. hugetlb_fault will
back off and so the page fault is retries if the page is still under
migration and waits for its completion in hugetlb_fault.
That means that the BUG_ON is wrong and we should update it. Let's
simply check that all present ptes are pte_huge instead.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Reported-by: zhongjiang <[email protected]>
Acked-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In powerpc servers with large memory(32TB), we watched several soft
lockups for hugepage under stress tests.
The call traces are as follows:
1.
get_page_from_freelist+0x2d8/0xd50
__alloc_pages_nodemask+0x180/0xc20
alloc_fresh_huge_page+0xb0/0x190
set_max_huge_pages+0x164/0x3b0
2.
prep_new_huge_page+0x5c/0x100
alloc_fresh_huge_page+0xc8/0x190
set_max_huge_pages+0x164/0x3b0
This patch fixes such soft lockups. It is safe to call cond_resched()
there because it is out of spin_lock/unlock section.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jia He <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Apparently, the tools/testing version dates to a few flags ago, and
we've sprouted 4 new ones since. Keep in sync with the value in the
main tree...
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Valdis Kletnieks <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Every swap-in anonymous page starts from inactive lru list's head. It
should be activated unconditionally when VM decide to reclaim because
page table entry for the page always usually has marked accessed bit.
Thus, their window size for getting a new referece is 2 * NR_inactive +
NR_active while others is NR_inactive + NR_active.
It's not fair that it has more chance to be referenced compared to other
newly allocated page which starts from active lru list's head.
Johannes:
: The page can still have a valid copy on the swap device, so prefering to
: reclaim that page over a fresh one could make sense. But as you point
: out, having it start inactive instead of active actually ends up giving it
: *more* LRU time, and that seems to be without justification.
Rik:
: The reason newly read in swap cache pages start on the inactive list is
: that we do some amount of read-around, and do not know which pages will
: get used.
:
: However, immediately activating the ones that DO get used, like your patch
: does, is the right thing to do.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Minchan Kim <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
I ran into this:
BUG: sleeping function called from invalid context at mm/page_alloc.c:3784
in_atomic(): 0, irqs_disabled(): 0, pid: 1434, name: trinity-c1
2 locks held by trinity-c1/1434:
#0: (&mm->mmap_sem){......}, at: [<ffffffff810ce31e>] __do_page_fault+0x1ce/0x8f0
#1: (rcu_read_lock){......}, at: [<ffffffff81378f86>] filemap_map_pages+0xd6/0xdd0
CPU: 0 PID: 1434 Comm: trinity-c1 Not tainted 4.7.0+ #58
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
Call Trace:
dump_stack+0x65/0x84
panic+0x185/0x2dd
___might_sleep+0x51c/0x600
__might_sleep+0x90/0x1a0
__alloc_pages_nodemask+0x5b1/0x2160
alloc_pages_current+0xcc/0x370
pte_alloc_one+0x12/0x90
__pte_alloc+0x1d/0x200
alloc_set_pte+0xe3e/0x14a0
filemap_map_pages+0x42b/0xdd0
handle_mm_fault+0x17d5/0x28b0
__do_page_fault+0x310/0x8f0
trace_do_page_fault+0x18d/0x310
do_async_page_fault+0x27/0xa0
async_page_fault+0x28/0x30
The important bits from the above is that filemap_map_pages() is calling
into the page allocator while holding rcu_read_lock (sleeping is not
allowed inside RCU read-side critical sections).
According to Kirill Shutemov, the prefaulting code in do_fault_around()
is supposed to take care of this, but missing error handling means that
the allocation failure can go unnoticed.
We don't need to return VM_FAULT_OOM (or any other error) here, since we
can just let the normal fault path try again.
Fixes: 7267ec008b5c ("mm: postpone page table allocation until we have page to map")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Vegard Nossum <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: "Hillf Danton" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We found a dlm-blocked situation caused by continuous breakdown of
recovery masters described below. To solve this problem, we should
purge recovery lock once detecting recovery master goes down.
N3 N2 N1(reco master)
go down
pick up recovery lock and
begin recoverying for N2
go down
pick up recovery
lock failed, then
purge it:
dlm_purge_lockres
->DROPPING_REF is set
send deref to N1 failed,
recovery lock is not purged
find N1 go down, begin
recoverying for N1, but
blocked in dlm_do_recovery
as DROPPING_REF is set:
dlm_do_recovery
->dlm_pick_recovery_master
->dlmlock
->dlm_get_lock_resource
->__dlm_wait_on_lockres_flags(tmpres,
DLM_LOCK_RES_DROPPING_REF);
Fixes: 8c0343968163 ("ocfs2/dlm: clear DROPPING_REF flag when the master goes down")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jun Piao <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Reviewed-by: Jiufei Xue <[email protected]>
Reviewed-by: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We found a BUG situation that lockres is migrated during deref described
below. To solve the BUG, we could purge lockres directly when other
node says I did not have a ref. Additionally, we'd better purge lockres
if master goes down, as no one will response deref done.
Node 1 Node 2(old master) Node3(new master)
dlm_purge_lockres
send deref to N2
leave domain
migrate lockres to N3
finish migration
send do assert
master to N1
receive do assert msg
form N3, but can not
find lockres because
DROPPING_REF is set,
so the owner is still
N2.
receive deref from N1
and response -EINVAL
because lockres is migrated
BUG when receive -EINVAL
in dlm_drop_lockres_ref
Fixes: 842b90b62461d ("ocfs2/dlm: return in progress if master can not clear the refmap bit right now")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jun Piao <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Reviewed-by: Jiufei Xue <[email protected]>
Reviewed-by: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
dlm_deref_lockres_done_handler
We found a BUG situation in which DLM_LOCK_RES_DROPPING_REF is cleared
unexpected that described below. To solve the bug, we disable the
BUG_ON and purge lockres in dlm_do_local_recovery_cleanup.
Node 1 Node 2(master)
dlm_purge_lockres
dlm_deref_lockres_handler
DLM_LOCK_RES_SETREF_INPROG is set
response DLM_DEREF_RESPONSE_INPROG
receive DLM_DEREF_RESPONSE_INPROG
stop puring in dlm_purge_lockres
and wait for DLM_DEREF_RESPONSE_DONE
dispatch dlm_deref_lockres_worker
response DLM_DEREF_RESPONSE_DONE
receive DLM_DEREF_RESPONSE_DONE and
prepare to purge lockres
Node 2 goes down
find Node2 down and do local
clean up for Node2:
dlm_do_local_recovery_cleanup
-> clear DLM_LOCK_RES_DROPPING_REF
when purging lockres, BUG_ON happens
because DLM_LOCK_RES_DROPPING_REF is clear:
dlm_deref_lockres_done_handler
->BUG_ON(!(res->state & DLM_LOCK_RES_DROPPING_REF));
[[email protected]: fix duplicated write to `ret']
Fixes: 60d663cb5273 ("ocfs2/dlm: add DEREF_DONE message")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jun Piao <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Reviewed-by: Jiufei Xue <[email protected]>
Reviewed-by: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The testcase "mmaptruncate" in ocfs2 test suite always fails with ENOSPC
error on small volume (say less than 10G). This testcase repeatedly
performs "extend" and "truncate" on a file. Continuously, it truncates
the file to 1/2 of the size, and then extends to 100% of the size. The
main bitmap will quickly run out of space because the "truncate" code
prevent truncate log from being flushed by
ocfs2_schedule_truncate_log_flush(osb, 1), while truncate log may have
cached lots of clusters.
So retry to allocate after flushing truncate log when ENOSPC is
returned. And we cannot reuse the deleted blocks before the transaction
committed. Fortunately, we already have a function to do this -
ocfs2_try_to_free_truncate_log(). Just need to remove the "static"
modifier and put it into the right place.
The "unlock"/"lock" code isn't elegant, but there seems to be no better
option.
[[email protected]: locking fix]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Eric Ren <[email protected]>
Reviewed-by: Gang He <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Reviewed-by: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We encountered a bug from the customer, the user did a fsck.ocfs2 on the
file system and exited unusually, the lockspace (with LVB size = 32) was
left in the kernel space, next, the user mounted this file system, the
kernel module did not create a new lockspace (LVB size = 64) via calling
dlm_new_lockspace() function in mounting stage, just used the existing
lockspace, created by the user space tool, this would lead the user was
not able to mount this file system from the other nodes, with the error
message like:
dlm: 032F5......: config mismatch: 64,0 nodeid 177127961: 32,0
(mount.ocfs2,26981,46):ocfs2_dlm_init:2995 ERROR: status = -71
ocfs2_mount_volume:1881 ERROR: status = -71
ocfs2_fill_super:1236 ERROR: status = -71
The user found it very difficult to find the root cause, then, we
brought out this patch to relieve such problem.
First, we add one more flag in calling dlm_new_lockspace() function, to
make sure the lockspace is created by kernel module itself, and this
change will not affect the backward compatibility.
Second, the obvious error message is reported in the kernel log, let the
user be more easy to find the root cause.
This patch will be used to insure the dlm lockspace is created by kernel
module when mounting a ocfs2 file system. There are two ways to create
a lockspace, from user space and kernel space, but the same name
lockspaces probably have different lvblen lengths/flags.
To avoid this mix using, we add one more flag DLM_LSFL_NEWEXCL, it will
make sure the dlm lockspace is created by kernel module when mounting.
Secondly, if a user space program (ocfs2-tools) is running on a file
system, the user tries to mount this file system in the cluster, DLM
module will return a -EEXIST or -EPROTO errno, we should give the user a
obvious error message, then, the user can let that user space tool exit
before mounting the file system again.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Gang He <[email protected]>
Reviewed-by: Goldwyn Rodrigues <[email protected]>
Reviewed-by: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Joseph Qi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
"Highlights:
- ARM64 support for ACPI host bridges
- new drivers for Axis ARTPEC-6 and Marvell Aardvark
- new pci_alloc_irq_vectors() interface for MSI-X, MSI, legacy INTx
- pci_resource_to_user() cleanup (more to come)
Detailed summary:
Enumeration:
- Move ecam.h to linux/include/pci-ecam.h (Jayachandran C)
- Add parent device field to ECAM struct pci_config_window (Jayachandran C)
- Add generic MCFG table handling (Tomasz Nowicki)
- Refactor pci_bus_assign_domain_nr() for CONFIG_PCI_DOMAINS_GENERIC (Tomasz Nowicki)
- Factor DT-specific pci_bus_find_domain_nr() code out (Tomasz Nowicki)
Resource management:
- Add devm_request_pci_bus_resources() (Bjorn Helgaas)
- Unify pci_resource_to_user() declarations (Bjorn Helgaas)
- Implement pci_resource_to_user() with pcibios_resource_to_bus() (microblaze, powerpc, sparc) (Bjorn Helgaas)
- Request host bridge window resources (designware, iproc, rcar, xgene, xilinx, xilinx-nwl) (Bjorn Helgaas)
- Make PCI I/O space optional on ARM32 (Bjorn Helgaas)
- Ignore write combining when mapping I/O port space (Bjorn Helgaas)
- Claim bus resources on MIPS PCI_PROBE_ONLY set-ups (Bjorn Helgaas)
- Remove unicore32 pci=firmware command line parameter handling (Bjorn Helgaas)
- Support I/O resources when parsing host bridge resources (Jayachandran C)
- Add helpers to request/release memory and I/O regions (Johannes Thumshirn)
- Use pci_(request|release)_mem_regions (NVMe, lpfc, GenWQE, ethernet/intel, alx) (Johannes Thumshirn)
- Extend pci=resource_alignment to specify device/vendor IDs (Koehrer Mathias (ETAS/ESW5))
- Add generic pci_bus_claim_resources() (Lorenzo Pieralisi)
- Claim bus resources on ARM32 PCI_PROBE_ONLY set-ups (Lorenzo Pieralisi)
- Remove ARM32 and ARM64 arch-specific pcibios_enable_device() (Lorenzo Pieralisi)
- Add pci_unmap_iospace() to unmap I/O resources (Sinan Kaya)
- Remove powerpc __pci_mmap_set_pgprot() (Yinghai Lu)
PCI device hotplug:
- Allow additional bus numbers for hotplug bridges (Keith Busch)
- Ignore interrupts during D3cold (Lukas Wunner)
Power management:
- Enforce type casting for pci_power_t (Andy Shevchenko)
- Don't clear d3cold_allowed for PCIe ports (Mika Westerberg)
- Put PCIe ports into D3 during suspend (Mika Westerberg)
- Power on bridges before scanning new devices (Mika Westerberg)
- Runtime resume bridge before rescan (Mika Westerberg)
- Add runtime PM support for PCIe ports (Mika Westerberg)
- Remove redundant check of pcie_set_clkpm (Shawn Lin)
Virtualization:
- Add function 1 DMA alias quirk for Marvell 88SE9182 (Aaron Sierra)
- Add DMA alias quirk for Adaptec 3805 (Alex Williamson)
- Mark Atheros AR9485 and QCA9882 to avoid bus reset (Chris Blake)
- Add ACS quirk for Solarflare SFC9220 (Edward Cree)
MSI:
- Fix PCI_MSI dependencies (Arnd Bergmann)
- Add pci_msix_desc_addr() helper (Christoph Hellwig)
- Switch msix_program_entries() to use pci_msix_desc_addr() (Christoph Hellwig)
- Make the "entries" argument to pci_enable_msix() optional (Christoph Hellwig)
- Provide sensible IRQ vector alloc/free routines (Christoph Hellwig)
- Spread interrupt vectors in pci_alloc_irq_vectors() (Christoph Hellwig)
Error Handling:
- Bind DPC to Root Ports as well as Downstream Ports (Keith Busch)
- Remove DPC tristate module option (Keith Busch)
- Convert Downstream Port Containment driver to use devm_* functions (Mika Westerberg)
Generic host bridge driver:
- Select IRQ_DOMAIN (Arnd Bergmann)
- Claim bus resources on PCI_PROBE_ONLY set-ups (Lorenzo Pieralisi)
ACPI host bridge driver:
- Add ARM64 acpi_pci_bus_find_domain_nr() (Tomasz Nowicki)
- Add ARM64 ACPI support for legacy IRQs parsing and consolidation with DT code (Tomasz Nowicki)
- Implement ARM64 AML accessors for PCI_Config region (Tomasz Nowicki)
- Support ARM64 ACPI-based PCI host controller (Tomasz Nowicki)
Altera host bridge driver:
- Check link status before retrain link (Ley Foon Tan)
- Poll for link up status after retraining the link (Ley Foon Tan)
Axis ARTPEC-6 host bridge driver:
- Add PCI_MSI_IRQ_DOMAIN dependency (Arnd Bergmann)
- Add DT binding for Axis ARTPEC-6 PCIe controller (Niklas Cassel)
- Add Axis ARTPEC-6 PCIe controller driver (Niklas Cassel)
Intel VMD host bridge driver:
- Use lock save/restore in interrupt enable path (Jon Derrick)
- Select device dma ops to override (Keith Busch)
- Initialize list item in IRQ disable (Keith Busch)
- Use x86_vector_domain as parent domain (Keith Busch)
- Separate MSI and MSI-X vector sharing (Keith Busch)
Marvell Aardvark host bridge driver:
- Add DT binding for the Aardvark PCIe controller (Thomas Petazzoni)
- Add Aardvark PCI host controller driver (Thomas Petazzoni)
- Add Aardvark PCIe support for Armada 3700 (Thomas Petazzoni)
Microsoft Hyper-V host bridge driver:
- Fix interrupt cleanup path (Cathy Avery)
- Don't leak buffer in hv_pci_onchannelcallback() (Vitaly Kuznetsov)
- Handle all pending messages in hv_pci_onchannelcallback() (Vitaly Kuznetsov)
NVIDIA Tegra host bridge driver:
- Program PADS_REFCLK_CFG* always, not just on legacy SoCs (Stephen Warren)
- Program PADS_REFCLK_CFG* registers with per-SoC values (Stephen Warren)
- Use lower-case hex consistently for register definitions (Thierry Reding)
- Use generic pci_remap_iospace() rather than ARM32-specific one (Thierry Reding)
- Stop setting pcibios_min_mem (Thierry Reding)
Renesas R-Car host bridge driver:
- Drop gen2 dummy I/O port region (Bjorn Helgaas)
TI DRA7xx host bridge driver:
- Fix return value in case of error (Christophe JAILLET)
Xilinx AXI host bridge driver:
- Fix return value in case of error (Christophe JAILLET)
Miscellaneous:
- Make bus_attr_resource_alignment static (Ben Dooks)
- Include <asm/dma.h> for isa_dma_bridge_buggy (Ben Dooks)
- MAINTAINERS: Add file patterns for PCI device tree bindings (Geert Uytterhoeven)
- Make host bridge drivers explicitly non-modular (Paul Gortmaker)"
* tag 'pci-v4.8-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (125 commits)
PCI: xgene: Make explicitly non-modular
PCI: thunder-pem: Make explicitly non-modular
PCI: thunder-ecam: Make explicitly non-modular
PCI: tegra: Make explicitly non-modular
PCI: rcar-gen2: Make explicitly non-modular
PCI: rcar: Make explicitly non-modular
PCI: mvebu: Make explicitly non-modular
PCI: layerscape: Make explicitly non-modular
PCI: keystone: Make explicitly non-modular
PCI: hisi: Make explicitly non-modular
PCI: generic: Make explicitly non-modular
PCI: designware-plat: Make it explicitly non-modular
PCI: artpec6: Make explicitly non-modular
PCI: armada8k: Make explicitly non-modular
PCI: artpec: Add PCI_MSI_IRQ_DOMAIN dependency
PCI: Add ACS quirk for Solarflare SFC9220
arm64: dts: marvell: Add Aardvark PCIe support for Armada 3700
PCI: aardvark: Add Aardvark PCI host controller driver
dt-bindings: add DT binding for the Aardvark PCIe controller
PCI: tegra: Program PADS_REFCLK_CFG* registers with per-SoC values
...
|
|
Bspec says:
"Restriction : SRD must not be enabled when the PSR Setup time from DPCD
00071h is greater than the time for vertical blank minus one line."
Let's check for that and disallow PSR if we exceed the limit.
Cc: Daniel Vetter <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Signed-off-by: Ville Syrjälä <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
Add a small helper to parse the PSR setup time from the DPCD PSR
capabilities and return the value in microseconds.
v2: Don't waste so many bytes on the psr_setup_time_us[] table
Cc: Daniel Vetter <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Signed-off-by: Ville Syrjälä <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
Pull MTD updates from Brian Norris:
"NAND:
Quoting Boris:
'This pull request contains only one notable change:
- Addition of the MTK NAND controller driver
And a bunch of specific NAND driver improvements/fixes. Here are the
changes that are worth mentioning:
- A few fixes/improvements for the xway NAND controller driver
- A few fixes for the sunxi NAND controller driver
- Support for DMA in the sunxi NAND driver
- Support for the sunxi NAND controller IP embedded in A23/A33 SoCs
- Addition for bitflips detection in erased pages to the brcmnand driver
- Support for new brcmnand IPs
- Update of the OMAP-GPMC binding to support DMA channel description'
In addition, some small fixes around error handling, etc., as well
as one long-standing corner case issue (2.6.20, I think?) with
writing 1 byte less than a page.
NOR:
- rework some error handling on reads and writes, so we can better
handle (for instance) SPI controllers which have limitations on
their maximum transfer size
- add new Cadence Quad SPI flash controller driver
- add new Atmel QSPI flash controller driver
- add new Hisilicon SPI flash controller driver
- support a few new flash, and update supported features on others
- fix the logic used for detecting a fully-unlocked flash
And other miscellaneous small fixes"
* tag 'for-linus-20160801' of git://git.infradead.org/linux-mtd: (60 commits)
mtd: spi-nor: don't build Cadence QuadSPI on non-ARM
mtd: mtk-nor: remove duplicated include from mtk-quadspi.c
mtd: nand: fix bug writing 1 byte less than page size
mtd: update description of MTD_BCM47XXSFLASH symbol
mtd: spi-nor: Add driver for Cadence Quad SPI Flash Controller
mtd: spi-nor: Bindings for Cadence Quad SPI Flash Controller driver
mtd: nand: brcmnand: Change BUG_ON in brcmnand_send_cmd
mtd: pmcmsp-flash: Allocating too much in init_msp_flash()
mtd: maps: sa1100-flash: potential NULL dereference
mtd: atmel-quadspi: add driver for Atmel QSPI controller
mtd: nand: omap2: fix return value check in omap_nand_probe()
Documentation: atmel-quadspi: add binding file for Atmel QSPI driver
mtd: spi-nor: add hisilicon spi-nor flash controller driver
mtd: spi-nor: support dual, quad, and WP for Gigadevice
mtd: spi-nor: Added support for n25q00a.
memory: Update dependency of IFC for Layerscape
mtd: nand: jz4780: Update MODULE_AUTHOR email address
mtd: nand: sunxi: prevent a small memory leak
mtd: nand: sunxi: add reset line support
mtd: nand: sunxi: update DT bindings
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull misc kbuild updates from Michal Marek:
- coccicheck script improvements by Luis Rodriguez and Deepa Dinamani
- new coccinelle patches by Yann Droneaud and Vaishali Thakkar
- debian packaging fixes by Wilfried Klaebe, Henning Schild and Marcin
Mielniczuk
* 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
Fix the Debian packaging script on systems with no codename
builddeb: fix file permissions before packaging
scripts/coccinelle: require coccinelle >= 1.0.4 on device_node_continue.cocci
coccicheck: refer to Documentation/coccinelle.txt and wiki
coccicheck: add support for requring a coccinelle version
scripts: add Linux .cocciconfig for coccinelle
coccicheck: replace --very-quiet with --quiet when debugging
coccicheck: add support for DEBUG_FILE
coccicheck: enable parmap support
coccicheck: make SPFLAGS more useful
coccicheck: move spatch binary check up
builddeb: really include objtool binary in headers package
coccinelle: catch krealloc() on devm_*() allocated memory
coccinelle: recognize more devm_* memory allocation functions
coccinelle: also catch kzfree() issues
coccicheck: Allow for overriding spatch flags
Coccinelle: noderef: Add new rules and correct the old rule
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kbuild updates from Michal Marek:
- GCC plugin support by Emese Revfy from grsecurity, with a fixup from
Kees Cook. The plugins are meant to be used for static analysis of
the kernel code. Two plugins are provided already.
- reduction of the gcc commandline by Arnd Bergmann.
- IS_ENABLED / IS_REACHABLE macro enhancements by Masahiro Yamada
- bin2c fix by Michael Tautschnig
- setlocalversion fix by Wolfram Sang
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
gcc-plugins: disable under COMPILE_TEST
kbuild: Abort build on bad stack protector flag
scripts: Fix size mismatch of kexec_purgatory_size
kbuild: make samples depend on headers_install
Kbuild: don't add obj tree in additional includes
Kbuild: arch: look for generated headers in obtree
Kbuild: always prefix objtree in LINUXINCLUDE
Kbuild: avoid duplicate include path
Kbuild: don't add ../../ to include path
vmlinux.lds.h: replace config_enabled() with IS_ENABLED()
kconfig.h: allow to use IS_{ENABLE,REACHABLE} in macro expansion
kconfig.h: use already defined macros for IS_REACHABLE() define
export.h: use __is_defined() to check if __KSYM_* is defined
kconfig.h: use __is_defined() to check if MODULE is defined
kbuild: setlocalversion: print error to STDERR
Add sancov plugin
Add Cyclomatic complexity GCC plugin
GCC plugin infrastructure
Shared library support
|
|
Otherwise, there is potential for both DMF_SUSPENDED* and
DMF_NOFLUSH_SUSPENDING to not be set during dm_suspend() -- which is
definitely _not_ a valid state.
This fix, in conjuction with "dm rq: fix the starting and stopping of
blk-mq queues", addresses the potential for request-based DM multipath's
__multipath_map() to see !dm_noflush_suspending() during suspend.
Reported-by: Bart Van Assche <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Cc: [email protected]
|
|
Improve dm_stop_queue() to cancel any requeue_work. Also, have
dm_start_queue() and dm_stop_queue() clear/set the QUEUE_FLAG_STOPPED
for the blk-mq request_queue.
On suspend dm_stop_queue() handles stopping the blk-mq request_queue
BUT: even though the hw_queues are marked BLK_MQ_S_STOPPED at that point
there is still a race that is allowing block/blk-mq.c to call ->queue_rq
against a hctx that it really shouldn't. Add a check to
dm_mq_queue_rq() that guards against this rarity (albeit _not_
race-free).
Signed-off-by: Mike Snitzer <[email protected]>
Cc: [email protected] # must patch dm.c on < 4.8 kernels
|
|
Multiple flags were being tested without locking. Protect against
non-atomic bit changes in m->flags by holding m->lock (while testing or
setting the queue_if_no_path related flags).
Signed-off-by: Mike Snitzer <[email protected]>
|
|
Pull KVM updates from Paolo Bonzini:
- ARM: GICv3 ITS emulation and various fixes. Removal of the
old VGIC implementation.
- s390: support for trapping software breakpoints, nested
virtualization (vSIE), the STHYI opcode, initial extensions
for CPU model support.
- MIPS: support for MIPS64 hosts (32-bit guests only) and lots
of cleanups, preliminary to this and the upcoming support for
hardware virtualization extensions.
- x86: support for execute-only mappings in nested EPT; reduced
vmexit latency for TSC deadline timer (by about 30%) on Intel
hosts; support for more than 255 vCPUs.
- PPC: bugfixes.
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (302 commits)
KVM: PPC: Introduce KVM_CAP_PPC_HTM
MIPS: Select HAVE_KVM for MIPS64_R{2,6}
MIPS: KVM: Reset CP0_PageMask during host TLB flush
MIPS: KVM: Fix ptr->int cast via KVM_GUEST_KSEGX()
MIPS: KVM: Sign extend MFC0/RDHWR results
MIPS: KVM: Fix 64-bit big endian dynamic translation
MIPS: KVM: Fail if ebase doesn't fit in CP0_EBase
MIPS: KVM: Use 64-bit CP0_EBase when appropriate
MIPS: KVM: Set CP0_Status.KX on MIPS64
MIPS: KVM: Make entry code MIPS64 friendly
MIPS: KVM: Use kmap instead of CKSEG0ADDR()
MIPS: KVM: Use virt_to_phys() to get commpage PFN
MIPS: Fix definition of KSEGX() for 64-bit
KVM: VMX: Add VMCS to CPU's loaded VMCSs before VMPTRLD
kvm: x86: nVMX: maintain internal copy of current VMCS
KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE
KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures
KVM: arm64: vgic-its: Simplify MAPI error handling
KVM: arm64: vgic-its: Make vgic_its_cmd_handle_mapi similar to other handlers
KVM: arm64: vgic-its: Turn device_id validation into generic ID validation
...
|
|
When the corrupt_bio_byte feature was introduced it caused READ bios to
no longer be errored with -EIO during the down_interval. This had to do
with the complexity of needing to submit READs if the corrupt_bio_byte
feature was used.
Fix it so READ bios are properly errored with -EIO; doing so early in
flakey_map() as long as there isn't a match for the corrupt_bio_byte
feature.
Fixes: a3998799fb4df ("dm flakey: add corrupt_bio_byte feature")
Reported-by: Akira Hayakawa <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Cc: [email protected]
|
|
Instead of copying the actual GRH of type struct ib_grh, existing code
copies the struct ib_global_route into the sge. This patch fixes that
and constructs the actual GRH from ib_global_route and copies the GRH
into the sge.
Reviewed-by: Dennis Dalessandro <[email protected]>
Reviewed-by: Dean Luick <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: Dasaratharaman Chandramouli <[email protected]>
Signed-off-by: Don Hiatt <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
The interface is used to compute the 5-bit SC field from the
LRH and the RHF bits. Modify code to use the interface instead.
Reviewed-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Dasaratharaman Chandramouli <[email protected]>
Signed-off-by: Don Hiatt <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Cleanup hfi1_ud_rcv to not have to look at the packet
header fields multiple times. The fields are looked up
once and used throughout the function. Also fix sc
computation when validating MAD packets.
Reviewed-by: Dennis Dalessandro <[email protected]>
Reviewed-by: Dean Luick <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: Dasaratharaman Chandramouli <[email protected]>
Signed-off-by: Don Hiatt <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
hfi1_pio_header should really be called hfi1_sdma_header
as it is only used for sdma transmits.
Reviewed-by: Dennis Dalessandro <[email protected]>
Reviewed-by: Dean Luick <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: Don Hiatt <[email protected]>
Signed-off-by: Dasaratharaman Chandramouli <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
struct ahg_ib_header has no header specific information.
Rename it to struct hfi1_ahg_info
Reviewed-by: Dennis Dalessandro <[email protected]>
Reviewed-by: Dean Luick <[email protected]>
Signed-off-by: Dasaratharaman Chandramouli <[email protected]>
Signed-off-by: Don Hiatt <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
sde and hfi1_ib_header are not used anymore.
Reviewed-by: Dennis Dalessandro <[email protected]>
Reviewed-by: Dean Luick <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: Dasaratharaman Chandramouli <[email protected]>
Signed-off-by: Don Hiatt <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Active QSFP cables were reset only every alternate iteration of the
channel tuning algorithm instead of every iteration due to incorrect
reset of the flag that controlled QSFP reset, resulting in using stale
QSFP status in the channel tuning algorithm.
Fixes: 8ebd4cf1852a ("Add active and optical cable support")
Reviewed-by: Dean Luick <[email protected]>
Signed-off-by: Easwar Hariharan <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Some QSFP cables assert the interrupt line as a side effect of module
plug-in and power up. This causes the SerDes and QSFP tuning algorithm
to begin cable initialization by reading the QSFP memory map over I2C,
which fails. This patch ignores any interrupt line assertion until
the module has completed power up and voltage rails have stabilized,
which can take a maximum of 500 ms per the SFF-8679 specification.
Reviewed-by: Dean Luick <[email protected]>
Signed-off-by: Easwar Hariharan <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
QSFP CDR enablement is now controlled by determining power class
and the configuration file. We disable the DC 8051 from requesting
enablement or disabling of TX and RX CDRs by removing the code
that allowed the DC 8051 to request changes.
Reviewed-by: Dean Luick <[email protected]>
Signed-off-by: Easwar Hariharan <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Hanging has been observed while writing a file over NFSoRDMA. Dmesg on
the server contains messages like these:
[ 931.992501] svcrdma: Error -22 posting RDMA_READ
[ 952.076879] svcrdma: Error -22 posting RDMA_READ
[ 982.154127] svcrdma: Error -22 posting RDMA_READ
[ 1012.235884] svcrdma: Error -22 posting RDMA_READ
[ 1042.319194] svcrdma: Error -22 posting RDMA_READ
Here is why:
With the base memory management extension enabled, FRMR is used instead
of FMR. The xprtrdma server issues each RDMA read request as the following
bundle:
(1)IB_WR_REG_MR, signaled;
(2)IB_WR_RDMA_READ, signaled;
(3)IB_WR_LOCAL_INV, signaled & fencing.
These requests are signaled. In order to generate completion, the fast
register work request is processed by the hfi1 send engine after being
posted to the work queue, and the corresponding lkey is not valid until
the request is processed. However, the rdmavt driver validates lkey when
the RDMA read request is posted and thus it fails immediately with error
-EINVAL (-22).
This patch changes the work flow of local operations (fast register and
local invalidate) so that fast register work requests are always
processed immediately to ensure that the corresponding lkey is valid
when subsequent work requests are posted. Local invalidate requests are
processed immediately if fencing is not required and no previous local
invalidate request is pending.
To allow completion generation for signaled local operations that have
been processed before posting to the work queue, an internal send flag
RVT_SEND_COMPLETION_ONLY is added. The hfi1 send engine checks this flag
and only generates completion for such requests.
Reviewed-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Jianxin Xiong <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
This fix allows for support of in-kernel reserved operations
without impacting the ULP user.
The low level driver can register a non-zero value which
will be transparently added to the send queue size and hidden
from the ULP in every respect.
ULP post sends will never see a full queue due to a reserved
post send and reserved operations will never exceed that
registered value.
The s_avail will continue to track the ULP swqe availability
and the difference between the reserved value and the reserved
in use will track reserved availabity.
Reviewed-by: Ashutosh Dixit <[email protected]>
Signed-off-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|