aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2017-01-10zram: support BDI_CAP_STABLE_WRITESMinchan Kim1-2/+11
zram has used per-cpu stream feature from v4.7. It aims for increasing cache hit ratio of scratch buffer for compressing. Downside of that approach is that zram should ask memory space for compressed page in per-cpu context which requires stricted gfp flag which could be failed. If so, it retries to allocate memory space out of per-cpu context so it could get memory this time and compress the data again, copies it to the memory space. In this scenario, zram assumes the data should never be changed but it is not true without stable page support. So, If the data is changed under us, zram can make buffer overrun so that zsmalloc free object chain is broken so system goes crash like below https://bugzilla.suse.com/show_bug.cgi?id=997574 This patch adds BDI_CAP_STABLE_WRITES to zram for declaring "I am block device needing *stable write*". Fixes: da9556a2367c ("zram: user per-cpu compression streams") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Minchan Kim <[email protected]> Reviewed-by: Sergey Senozhatsky <[email protected]> Cc: Takashi Iwai <[email protected]> Cc: Hyeoncheol Lee <[email protected]> Cc: <[email protected]> Cc: Sangseok Lee <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: <[email protected]> [4.7+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-10zram: revalidate disk under init_lockMinchan Kim1-7/+1
Commit b4c5c60920e3 ("zram: avoid lockdep splat by revalidate_disk") moved revalidate_disk call out of init_lock to avoid lockdep false-positive splat. However, commit 08eee69fcf6b ("zram: remove init_lock in zram_make_request") removed init_lock in IO path so there is no worry about lockdep splat. So, let's restore it. This patch is needed to set BDI_CAP_STABLE_WRITES atomically in next patch. Fixes: da9556a2367c ("zram: user per-cpu compression streams") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Minchan Kim <[email protected]> Reviewed-by: Sergey Senozhatsky <[email protected]> Cc: Takashi Iwai <[email protected]> Cc: Hyeoncheol Lee <[email protected]> Cc: <[email protected]> Cc: Sangseok Lee <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: <[email protected]> [4.7+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-10mm: support anonymous stable pageMinchan Kim2-2/+21
During developemnt for zram-swap asynchronous writeback, I found strange corruption of compressed page, resulting in: Modules linked in: zram(E) CPU: 3 PID: 1520 Comm: zramd-1 Tainted: G E 4.8.0-mm1-00320-ge0d4894c9c38-dirty #3274 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 task: ffff88007620b840 task.stack: ffff880078090000 RIP: set_freeobj.part.43+0x1c/0x1f RSP: 0018:ffff880078093ca8 EFLAGS: 00010246 RAX: 0000000000000018 RBX: ffff880076798d88 RCX: ffffffff81c408c8 RDX: 0000000000000018 RSI: 0000000000000000 RDI: 0000000000000246 RBP: ffff880078093cb0 R08: 0000000000000000 R09: 0000000000000000 R10: ffff88005bc43030 R11: 0000000000001df3 R12: ffff880076798d88 R13: 000000000005bc43 R14: ffff88007819d1b8 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff88007e380000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc934048f20 CR3: 0000000077b01000 CR4: 00000000000406e0 Call Trace: obj_malloc+0x22b/0x260 zs_malloc+0x1e4/0x580 zram_bvec_rw+0x4cd/0x830 [zram] page_requests_rw+0x9c/0x130 [zram] zram_thread+0xe6/0x173 [zram] kthread+0xca/0xe0 ret_from_fork+0x25/0x30 With investigation, it reveals currently stable page doesn't support anonymous page. IOW, reuse_swap_page can reuse the page without waiting writeback completion so it can overwrite page zram is compressing. Unfortunately, zram has used per-cpu stream feature from v4.7. It aims for increasing cache hit ratio of scratch buffer for compressing. Downside of that approach is that zram should ask memory space for compressed page in per-cpu context which requires stricted gfp flag which could be failed. If so, it retries to allocate memory space out of per-cpu context so it could get memory this time and compress the data again, copies it to the memory space. In this scenario, zram assumes the data should never be changed but it is not true unless stable page supports. So, If the data is changed under us, zram can make buffer overrun because second compression size could be bigger than one we got in previous trial and blindly, copy bigger size object to smaller buffer which is buffer overrun. The overrun breaks zsmalloc free object chaining so system goes crash like above. I think below is same problem. https://bugzilla.suse.com/show_bug.cgi?id=997574 Unfortunately, reuse_swap_page should be atomic so that we cannot wait on writeback in there so the approach in this patch is simply return false if we found it needs stable page. Although it increases memory footprint temporarily, it happens rarely and it should be reclaimed easily althoug it happened. Also, It would be better than waiting of IO completion, which is critial path for application latency. Fixes: da9556a2367c ("zram: user per-cpu compression streams") Link: http://lkml.kernel.org/r/20161120233015.GA14113@bbox Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Minchan Kim <[email protected]> Acked-by: Hugh Dickins <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Takashi Iwai <[email protected]> Cc: Hyeoncheol Lee <[email protected]> Cc: <[email protected]> Cc: Sangseok Lee <[email protected]> Cc: <[email protected]> [4.7+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-10mm: add documentation for page fragment APIsAlexander Duyck1-0/+42
This is a first pass at trying to add documentation for the page_frag APIs. They may still change over time but for now I thought I would try to get these documented so that as more network drivers and stack calls make use of them we have one central spot to document how they are meant to be used. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Alexander Duyck <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-10mm: rename __page_frag functions to __page_frag_cache, drop order from drainAlexander Duyck3-11/+11
This patch does two things. First it goes through and renames the __page_frag prefixed functions to __page_frag_cache so that we can be clear that we are draining or refilling the cache, not the frags themselves. Second we drop the order parameter from __page_frag_cache_drain since we don't actually need to pass it since all fragments are either order 0 or must be a compound page. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Alexander Duyck <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-10mm: rename __alloc_page_frag to page_frag_alloc and __free_page_frag to ↵Alexander Duyck4-13/+13
page_frag_free Patch series "Page fragment updates", v4. This patch series takes care of a few cleanups for the page fragments API. First we do some renames so that things are much more consistent. First we move the page_frag_ portion of the name to the front of the functions names. Secondly we split out the cache specific functions from the other page fragment functions by adding the word "cache" to the name. Finally I added a bit of documentation that will hopefully help to explain some of this. I plan to revisit this later as we get things more ironed out in the near future with the changes planned for the DMA setup to support eXpress Data Path. This patch (of 3): This patch renames the page frag functions to be more consistent with other APIs. Specifically we place the name page_frag first in the name and then have either an alloc or free call name that we append as the suffix. This makes it a bit clearer in terms of naming. In addition we drop the leading double underscores since we are technically no longer a backing interface and instead the front end that is called from the networking APIs. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Alexander Duyck <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-10mm, memcg: fix the active list aging for lowmem requests when memcg is enabledMichal Hocko4-24/+49
Nils Holland and Klaus Ethgen have reported unexpected OOM killer invocations with 32b kernel starting with 4.8 kernels kworker/u4:5 invoked oom-killer: gfp_mask=0x2400840(GFP_NOFS|__GFP_NOFAIL), nodemask=0, order=0, oom_score_adj=0 kworker/u4:5 cpuset=/ mems_allowed=0 CPU: 1 PID: 2603 Comm: kworker/u4:5 Not tainted 4.9.0-gentoo #2 [...] Mem-Info: active_anon:58685 inactive_anon:90 isolated_anon:0 active_file:274324 inactive_file:281962 isolated_file:0 unevictable:0 dirty:649 writeback:0 unstable:0 slab_reclaimable:40662 slab_unreclaimable:17754 mapped:7382 shmem:202 pagetables:351 bounce:0 free:206736 free_pcp:332 free_cma:0 Node 0 active_anon:234740kB inactive_anon:360kB active_file:1097296kB inactive_file:1127848kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:29528kB dirty:2596kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 184320kB anon_thp: 808kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no DMA free:3952kB min:788kB low:984kB high:1180kB active_anon:0kB inactive_anon:0kB active_file:7316kB inactive_file:0kB unevictable:0kB writepending:96kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:3200kB slab_unreclaimable:1408kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB lowmem_reserve[]: 0 813 3474 3474 Normal free:41332kB min:41368kB low:51708kB high:62048kB active_anon:0kB inactive_anon:0kB active_file:532748kB inactive_file:44kB unevictable:0kB writepending:24kB present:897016kB managed:836248kB mlocked:0kB slab_reclaimable:159448kB slab_unreclaimable:69608kB kernel_stack:1112kB pagetables:1404kB bounce:0kB free_pcp:528kB local_pcp:340kB free_cma:0kB lowmem_reserve[]: 0 0 21292 21292 HighMem free:781660kB min:512kB low:34356kB high:68200kB active_anon:234740kB inactive_anon:360kB active_file:557232kB inactive_file:1127804kB unevictable:0kB writepending:2592kB present:2725384kB managed:2725384kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:800kB local_pcp:608kB free_cma:0kB the oom killer is clearly pre-mature because there there is still a lot of page cache in the zone Normal which should satisfy this lowmem request. Further debugging has shown that the reclaim cannot make any forward progress because the page cache is hidden in the active list which doesn't get rotated because inactive_list_is_low is not memcg aware. The code simply subtracts per-zone highmem counters from the respective memcg's lru sizes which doesn't make any sense. We can simply end up always seeing the resulting active and inactive counts 0 and return false. This issue is not limited to 32b kernels but in practice the effect on systems without CONFIG_HIGHMEM would be much harder to notice because we do not invoke the OOM killer for allocations requests targeting < ZONE_NORMAL. Fix the issue by tracking per zone lru page counts in mem_cgroup_per_node and subtract per-memcg highmem counts when memcg is enabled. Introduce helper lruvec_zone_lru_size which redirects to either zone counters or mem_cgroup_get_zone_lru_size when appropriate. We are losing empty LRU but non-zero lru size detection introduced by ca707239e8a7 ("mm: update_lru_size warn and reset bad lru_size") because of the inherent zone vs. node discrepancy. Fixes: f8d1a31163fc ("mm: consider whether to decivate based on eligible zones inactive ratio") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Reported-by: Nils Holland <[email protected]> Tested-by: Nils Holland <[email protected]> Reported-by: Klaus Ethgen <[email protected]> Acked-by: Minchan Kim <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Johannes Weiner <[email protected]> Reviewed-by: Vladimir Davydov <[email protected]> Cc: <[email protected]> [4.8+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-10mm: don't dereference struct page fields of invalid pagesArd Biesheuvel1-3/+3
The VM_BUG_ON() check in move_freepages() checks whether the node id of a page matches the node id of its zone. However, it does this before having checked whether the struct page pointer refers to a valid struct page to begin with. This is guaranteed in most cases, but may not be the case if CONFIG_HOLES_IN_ZONE=y. So reorder the VM_BUG_ON() with the pfn_valid_within() check. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ard Biesheuvel <[email protected]> Acked-by: Will Deacon <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Hanjun Guo <[email protected]> Cc: Yisheng Xie <[email protected]> Cc: Robert Richter <[email protected]> Cc: James Morse <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-10mailmap: add codeaurora.org names for nameless email commitsStephen Boyd1-0/+4
Some codeaurora.org emails have crept in but the names don't exist for them. Add the names for the emails so git can match everyone up. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Stephen Boyd <[email protected]> Cc: Sarangdhar Joshi <[email protected]> Cc: Subash Abhinov Kasiviswanathan <[email protected]> Cc: Subhash Jadavani <[email protected]> Cc: Thomas Pedersen <[email protected]> Cc: Andy Gross <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-10signal: protect SIGNAL_UNKILLABLE from unintentional clearing.Jamie Iles2-2/+12
Since commit 00cd5c37afd5 ("ptrace: permit ptracing of /sbin/init") we can now trace init processes. init is initially protected with SIGNAL_UNKILLABLE which will prevent fatal signals such as SIGSTOP, but there are a number of paths during tracing where SIGNAL_UNKILLABLE can be implicitly cleared. This can result in init becoming stoppable/killable after tracing. For example, running: while true; do kill -STOP 1; done & strace -p 1 and then stopping strace and the kill loop will result in init being left in state TASK_STOPPED. Sending SIGCONT to init will resume it, but init will now respond to future SIGSTOP signals rather than ignoring them. Make sure that when setting SIGNAL_STOP_CONTINUED/SIGNAL_STOP_STOPPED that we don't clear SIGNAL_UNKILLABLE. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jamie Iles <[email protected]> Acked-by: Oleg Nesterov <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-10mm: pmd dirty emulation in page fault handlerMinchan Kim1-2/+4
Andreas reported [1] made a test in jemalloc hang in THP mode in arm64: http://lkml.kernel.org/r/[email protected] The problem is currently page fault handler doesn't supports dirty bit emulation of pmd for non-HW dirty-bit architecture so that application stucks until VM marked the pmd dirty. How the emulation work depends on the architecture. In case of arm64, when it set up pte firstly, it sets pte PTE_RDONLY to get a chance to mark the pte dirty via triggering page fault when store access happens. Once the page fault occurs, VM marks the pmd dirty and arch code for setting pmd will clear PTE_RDONLY for application to proceed. IOW, if VM doesn't mark the pmd dirty, application hangs forever by repeated fault(i.e., store op but the pmd is PTE_RDONLY). This patch enables pmd dirty-bit emulation for those architectures. [1] b8d3c4c3009d, mm/huge_memory.c: don't split THP page when MADV_FREE syscall is called Fixes: b8d3c4c3009d ("mm/huge_memory.c: don't split THP page when MADV_FREE syscall is called") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Minchan Kim <[email protected]> Reported-by: Andreas Schwab <[email protected]> Tested-by: Andreas Schwab <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Jason Evans <[email protected]> Cc: Will Deacon <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: <[email protected]> [4.5+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-10ipc/sem.c: fix incorrect sem_lock pairingManfred Spraul1-1/+1
Based on the syzcaller test case from dvyukov: https://gist.githubusercontent.com/dvyukov/d0e5efefe4d7d6daed829f5c3ca26a40/raw/08d0a261fe3c987bed04fbf267e08ba04bd533ea/gistfile1.txt The slow (i.e.: failure to acquire) syscall exit from semtimedop() incorrectly assumed that the the same lock is acquired as it was at the initial syscall entry. This is wrong: - thread A: single semop semop(), sleeps - thread B: multi semop semop(), sleeps - thread A: woken up by signal/timeout With this sequence, the initial sem_lock() call locks the per-semaphore spinlock, and it is unlocked with sem_unlock(). The call at the syscall return locks the global spinlock. Because locknum is not updated, the following sem_unlock() call unlocks the per-semaphore spinlock, which is actually not locked. The fix is trivial: Use the return value from sem_lock. Fixes: 370b262c896e ("ipc/sem: avoid idr tree lookup for interrupted semop") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Manfred Spraul <[email protected]> Reported-by: Dmitry Vyukov <[email protected]> Reported-by: Johanna Abrahamsson <[email protected]> Tested-by: Johanna Abrahamsson <[email protected]> Acked-by: Davidlohr Bueso <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-10lib/Kconfig.debug: fix frv build failureSudip Mukherjee1-1/+1
The build of frv allmodconfig was failing with the errors like: /tmp/cc0JSPc3.s: Assembler messages: /tmp/cc0JSPc3.s:1839: Error: symbol `.LSLT0' is already defined /tmp/cc0JSPc3.s:1842: Error: symbol `.LASLTP0' is already defined /tmp/cc0JSPc3.s:1969: Error: symbol `.LELTP0' is already defined /tmp/cc0JSPc3.s:1970: Error: symbol `.LELT0' is already defined Commit 866ced950bcd ("kbuild: Support split debug info v4") introduced splitting the debug info and keeping that in a separate file. Somehow, the frv-linux gcc did not like that and I am guessing that instead of splitting it started copying. The first report about this is at: https://lists.01.org/pipermail/kbuild-all/2015-July/010527.html. I will try and see if this can work with frv and if still fails I will open a bug report with gcc. But meanwhile this is the easiest option to solve build failure of frv. Fixes: 866ced950bcd ("kbuild: Support split debug info v4") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Sudip Mukherjee <[email protected]> Reported-by: Fengguang Wu <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Howells <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-10mm: get rid of __GFP_OTHER_NODEMichal Hocko6-21/+9
The flag was introduced by commit 78afd5612deb ("mm: add __GFP_OTHER_NODE flag") to allow proper accounting of remote node allocations done by kernel daemons on behalf of a process - e.g. khugepaged. After "mm: fix remote numa hits statistics" we do not need and actually use the flag so we can safely remove it because all allocations which are satisfied from their "home" node are accounted properly. [[email protected]: fix build] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Taku Izumi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-10mm: fix remote numa hits statisticsMichal Hocko1-11/+4
Jia He has noticed that commit b9f00e147f27 ("mm, page_alloc: reduce branches in zone_statistics") has an unintentional side effect that remote node allocation requests are accounted as NUMA_MISS rathat than NUMA_HIT and NUMA_OTHER if such a request doesn't use __GFP_OTHER_NODE. There are many of these potentially because the flag is used very rarely while we have many users of __alloc_pages_node. Fix this by simply ignoring __GFP_OTHER_NODE (it can be removed in a follow up patch) and treat all allocations that were satisfied from the preferred zone's node as NUMA_HITS because this is the same node we requested the allocation from in most cases. If this is not the local node then we just account it as NUMA_OTHER rather than NUMA_LOCAL. One downsize would be that an allocation request for a node which is outside of the mempolicy nodemask would be reported as a hit which is a bit weird but that was the case before b9f00e147f27 already. Fixes: b9f00e147f27 ("mm, page_alloc: reduce branches in zone_statistics") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Reported-by: Jia He <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> # with cbmc[1] superpowers Acked-by: Mel Gorman <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Taku Izumi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-10mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}Dan Williams1-0/+4
Both arch_add_memory() and arch_remove_memory() expect a single threaded context. For example, arch/x86/mm/init_64.c::kernel_physical_mapping_init() does not hold any locks over this check and branch: if (pgd_val(*pgd)) { pud = (pud_t *)pgd_page_vaddr(*pgd); paddr_last = phys_pud_init(pud, __pa(vaddr), __pa(vaddr_end), page_size_mask); continue; } pud = alloc_low_page(); paddr_last = phys_pud_init(pud, __pa(vaddr), __pa(vaddr_end), page_size_mask); The result is that two threads calling devm_memremap_pages() simultaneously can end up colliding on pgd initialization. This leads to crash signatures like the following where the loser of the race initializes the wrong pgd entry: BUG: unable to handle kernel paging request at ffff888ebfff0000 IP: memcpy_erms+0x6/0x10 PGD 2f8e8fc067 PUD 0 /* <---- Invalid PUD */ Oops: 0000 [#1] SMP DEBUG_PAGEALLOC CPU: 54 PID: 3818 Comm: systemd-udevd Not tainted 4.6.7+ #13 task: ffff882fac290040 ti: ffff882f887a4000 task.ti: ffff882f887a4000 RIP: memcpy_erms+0x6/0x10 [..] Call Trace: ? pmem_do_bvec+0x205/0x370 [nd_pmem] ? blk_queue_enter+0x3a/0x280 pmem_rw_page+0x38/0x80 [nd_pmem] bdev_read_page+0x84/0xb0 Hold the standard memory hotplug mutex over calls to arch_{add,remove}_memory(). Fixes: 41e94a851304 ("add devm_memremap_pages") Link: http://lkml.kernel.org/r/148357647831.9498.12606007370121652979.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-10ocfs2: fix crash caused by stale lvb with fsdlm pluginEric Ren3-0/+19
The crash happens rather often when we reset some cluster nodes while nodes contend fiercely to do truncate and append. The crash backtrace is below: dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover_grant 1 locks on 971 resources dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover 9 generation 5 done: 4 ms ocfs2: Begin replay journal (node 318952601, slot 2) on device (253,18) ocfs2: End replay journal (node 318952601, slot 2) on device (253,18) ocfs2: Beginning quota recovery on device (253,18) for slot 2 ocfs2: Finishing quota recovery on device (253,18) for slot 2 (truncate,30154,1):ocfs2_truncate_file:470 ERROR: bug expression: le64_to_cpu(fe->i_size) != i_size_read(inode) (truncate,30154,1):ocfs2_truncate_file:470 ERROR: Inode 290321, inode i_size = 732 != di i_size = 937, i_flags = 0x1 ------------[ cut here ]------------ kernel BUG at /usr/src/linux/fs/ocfs2/file.c:470! invalid opcode: 0000 [#1] SMP Modules linked in: ocfs2_stack_user(OEN) ocfs2(OEN) ocfs2_nodemanager ocfs2_stackglue(OEN) quota_tree dlm(OEN) configfs fuse sd_mod iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi af_packet iscsi_ibft iscsi_boot_sysfs softdog xfs libcrc32c ppdev parport_pc pcspkr parport joydev virtio_balloon virtio_net i2c_piix4 acpi_cpufreq button processor ext4 crc16 jbd2 mbcache ata_generic cirrus virtio_blk ata_piix drm_kms_helper ahci syscopyarea libahci sysfillrect sysimgblt fb_sys_fops ttm floppy libata drm virtio_pci virtio_ring uhci_hcd virtio ehci_hcd usbcore serio_raw usb_common sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4 Supported: No, Unsupported modules are loaded CPU: 1 PID: 30154 Comm: truncate Tainted: G OE N 4.4.21-69-default #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20151112_172657-sheep25 04/01/2014 task: ffff88004ff6d240 ti: ffff880074e68000 task.ti: ffff880074e68000 RIP: 0010:[<ffffffffa05c8c30>] [<ffffffffa05c8c30>] ocfs2_truncate_file+0x640/0x6c0 [ocfs2] RSP: 0018:ffff880074e6bd50 EFLAGS: 00010282 RAX: 0000000000000074 RBX: 000000000000029e RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000246 RBP: ffff880074e6bda8 R08: 000000003675dc7a R09: ffffffff82013414 R10: 0000000000034c50 R11: 0000000000000000 R12: ffff88003aab3448 R13: 00000000000002dc R14: 0000000000046e11 R15: 0000000000000020 FS: 00007f839f965700(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007f839f97e000 CR3: 0000000036723000 CR4: 00000000000006e0 Call Trace: ocfs2_setattr+0x698/0xa90 [ocfs2] notify_change+0x1ae/0x380 do_truncate+0x5e/0x90 do_sys_ftruncate.constprop.11+0x108/0x160 entry_SYSCALL_64_fastpath+0x12/0x6d Code: 24 28 ba d6 01 00 00 48 c7 c6 30 43 62 a0 8b 41 2c 89 44 24 08 48 8b 41 20 48 c7 c1 78 a3 62 a0 48 89 04 24 31 c0 e8 a0 97 f9 ff <0f> 0b 3d 00 fe ff ff 0f 84 ab fd ff ff 83 f8 fc 0f 84 a2 fd ff RIP [<ffffffffa05c8c30>] ocfs2_truncate_file+0x640/0x6c0 [ocfs2] It's because ocfs2_inode_lock() get us stale LVB in which the i_size is not equal to the disk i_size. We mistakenly trust the LVB because the underlaying fsdlm dlm_lock() doesn't set lkb_sbflags with DLM_SBF_VALNOTVALID properly for us. But, why? The current code tries to downconvert lock without DLM_LKF_VALBLK flag to tell o2cb don't update RSB's LVB if it's a PR->NULL conversion, even if the lock resource type needs LVB. This is not the right way for fsdlm. The fsdlm plugin behaves different on DLM_LKF_VALBLK, it depends on DLM_LKF_VALBLK to decide if we care about the LVB in the LKB. If DLM_LKF_VALBLK is not set, fsdlm will skip recovering RSB's LVB from this lkb and set the right DLM_SBF_VALNOTVALID appropriately when node failure happens. The following diagram briefly illustrates how this crash happens: RSB1 is inode metadata lock resource with LOCK_TYPE_USES_LVB; The 1st round: Node1 Node2 RSB1: PR RSB1(master): NULL->EX ocfs2_downconvert_lock(PR->NULL, set_lvb==0) ocfs2_dlm_lock(no DLM_LKF_VALBLK) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - dlm_lock(no DLM_LKF_VALBLK) convert_lock(overwrite lkb->lkb_exflags with no DLM_LKF_VALBLK) RSB1: NULL RSB1: EX reset Node2 dlm_recover_rsbs() recover_lvb() /* The LVB is not trustable if the node with EX fails and * no lock >= PR is left. We should set RSB_VALNOTVALID for RSB1. */ if(!(kb_exflags & DLM_LKF_VALBLK)) /* This means we miss the chance to return; * to invalid the LVB here. */ The 2nd round: Node 1 Node2 RSB1(become master from recovery) ocfs2_setattr() ocfs2_inode_lock(NULL->EX) /* dlm_lock() return the stale lvb without setting DLM_SBF_VALNOTVALID */ ocfs2_meta_lvb_is_trustable() return 1 /* so we don't refresh inode from disk */ ocfs2_truncate_file() mlog_bug_on_msg(disk isize != i_size_read(inode)) /* crash! */ The fix is quite straightforward. We keep to set DLM_LKF_VALBLK flag for dlm_lock() if the lock resource type needs LVB and the fsdlm plugin is uesed. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Eric Ren <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-10bpf: do not use KMALLOC_SHIFT_MAXMichal Hocko2-2/+2
Commit 01b3f52157ff ("bpf: fix allocation warnings in bpf maps and integer overflow") has added checks for the maximum allocateable size. It (ab)used KMALLOC_SHIFT_MAX for that purpose. While this is not incorrect it is not very clean because we already have KMALLOC_MAX_SIZE for this very reason so let's change both checks to use KMALLOC_MAX_SIZE instead. The original motivation for using KMALLOC_SHIFT_MAX was to work around an incorrect KMALLOC_MAX_SIZE which could lead to allocation warnings but it is no longer needed since "slab: make sure that KMALLOC_MAX_SIZE will fit into MAX_ORDER". Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Christoph Lameter <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andrey Konovalov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-10mm, slab: make sure that KMALLOC_MAX_SIZE will fit into MAX_ORDERMichal Hocko1-2/+2
Andrey Konovalov has reported the following warning triggered by the syzkaller fuzzer. WARNING: CPU: 1 PID: 9935 at mm/page_alloc.c:3511 __alloc_pages_nodemask+0x159c/0x1e20 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 9935 Comm: syz-executor0 Not tainted 4.9.0-rc7+ #34 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __alloc_pages_slowpath mm/page_alloc.c:3511 __alloc_pages_nodemask+0x159c/0x1e20 mm/page_alloc.c:3781 alloc_pages_current+0x1c7/0x6b0 mm/mempolicy.c:2072 alloc_pages include/linux/gfp.h:469 kmalloc_order+0x1f/0x70 mm/slab_common.c:1015 kmalloc_order_trace+0x1f/0x160 mm/slab_common.c:1026 kmalloc_large include/linux/slab.h:422 __kmalloc+0x210/0x2d0 mm/slub.c:3723 kmalloc include/linux/slab.h:495 ep_write_iter+0x167/0xb50 drivers/usb/gadget/legacy/inode.c:664 new_sync_write fs/read_write.c:499 __vfs_write+0x483/0x760 fs/read_write.c:512 vfs_write+0x170/0x4e0 fs/read_write.c:560 SYSC_write fs/read_write.c:607 SyS_write+0xfb/0x230 fs/read_write.c:599 entry_SYSCALL_64_fastpath+0x1f/0xc2 The issue is caused by a lack of size check for the request size in ep_write_iter which should be fixed. It, however, points to another problem, that SLUB defines KMALLOC_MAX_SIZE too large because the its KMALLOC_SHIFT_MAX is (MAX_ORDER + PAGE_SHIFT) which means that the resulting page allocator request might be MAX_ORDER which is too large (see __alloc_pages_slowpath). The same applies to the SLOB allocator which allows even larger sizes. Make sure that they are capped properly and never request more than MAX_ORDER order. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Reported-by: Andrey Konovalov <[email protected]> Acked-by: Christoph Lameter <[email protected]> Cc: Alexei Starovoitov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-10dax: wrprotect pmd_t in dax_mapping_entry_mkcleanRoss Zwisler3-19/+38
Currently dax_mapping_entry_mkclean() fails to clean and write protect the pmd_t of a DAX PMD entry during an *sync operation. This can result in data loss in the following sequence: 1) mmap write to DAX PMD, dirtying PMD radix tree entry and making the pmd_t dirty and writeable 2) fsync, flushing out PMD data and cleaning the radix tree entry. We currently fail to mark the pmd_t as clean and write protected. 3) more mmap writes to the PMD. These don't cause any page faults since the pmd_t is dirty and writeable. The radix tree entry remains clean. 4) fsync, which fails to flush the dirty PMD data because the radix tree entry was clean. 5) crash - dirty data that should have been fsync'd as part of 4) could still have been in the processor cache, and is lost. Fix this by marking the pmd_t clean and write protected in dax_mapping_entry_mkclean(), which is called as part of the fsync operation 2). This will cause the writes in step 3) above to generate page faults where we'll re-dirty the PMD radix tree entry, resulting in flushes in the fsync that happens in step 4). Fixes: 4b4bb46d00b3 ("dax: clear dirty entry tags on cache flush") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ross Zwisler <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Jan Kara <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Dave Hansen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-10mm: add follow_pte_pmd()Ross Zwisler2-7/+32
Patch series "Write protect DAX PMDs in *sync path". Currently dax_mapping_entry_mkclean() fails to clean and write protect the pmd_t of a DAX PMD entry during an *sync operation. This can result in data loss, as detailed in patch 2. This series is based on Dan's "libnvdimm-pending" branch, which is the current home for Jan's "dax: Page invalidation fixes" series. You can find a working tree here: https://git.kernel.org/cgit/linux/kernel/git/zwisler/linux.git/log/?h=dax_pmd_clean This patch (of 2): Similar to follow_pte(), follow_pte_pmd() allows either a PTE leaf or a huge page PMD leaf to be found and returned. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ross Zwisler <[email protected]> Suggested-by: Dave Hansen <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Jan Kara <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-10mm/thp/pagecache/collapse: free the pte page table on collapse for thp page ↵Aneesh Kumar K.V1-19/+2
cache. With THP page cache, when trying to build a huge page from regular pte pages, we just clear the pmd entry. We will take another fault and at that point we will find the huge page in the radix tree, thereby using the huge page to complete the page fault The second fault path will allocate the needed pgtable_t page for archs like ppc64. So no need to deposit the same in collapse path. Depositing them in the collapse path resulting in a pgtable_t memory leak also giving errors like BUG: non-zero nr_ptes on freeing mm: 3 Fixes: 953c66c2b22a ("mm: THP page cache support for ppc64") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Michael Ellerman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-10dax: fix deadlock with DAX 4k holesRoss Zwisler1-1/+1
Currently in DAX if we have three read faults on the same hole address we can end up with the following: Thread 0 Thread 1 Thread 2 -------- -------- -------- dax_iomap_fault grab_mapping_entry lock_slot <locks empty DAX entry> dax_iomap_fault grab_mapping_entry get_unlocked_mapping_entry <sleeps on empty DAX entry> dax_iomap_fault grab_mapping_entry get_unlocked_mapping_entry <sleeps on empty DAX entry> dax_load_hole find_or_create_page ... page_cache_tree_insert dax_wake_mapping_entry_waiter <wakes one sleeper> __radix_tree_replace <swaps empty DAX entry with 4k zero page> <wakes> get_page lock_page ... put_locked_mapping_entry unlock_page put_page <sleeps forever on the DAX wait queue> The crux of the problem is that once we insert a 4k zero page, all locking from then on is done in terms of that 4k zero page and any additional threads sleeping on the empty DAX entry will never be woken. Fix this by waking all sleepers when we replace the DAX radix tree entry with a 4k zero page. This will allow all sleeping threads to successfully transition from locking based on the DAX empty entry to locking on the 4k zero page. With the test case reported by Xiong this happens very regularly in my test setup, with some runs resulting in 9+ threads in this deadlocked state. With this fix I've been able to run that same test dozens of times in a loop without issue. Fixes: ac401cc78242 ("dax: New fault locking") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ross Zwisler <[email protected]> Reported-by: Xiong Zhou <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: <[email protected]> [4.7+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-10MAINTAINERS: remove duplicate bug filling descriptionVlastimil Babka1-1/+0
I have noticed that two different descriptions for B: entries in MAINTAINERS were merged: commit 686564434e88 ("MAINTAINERS: Add bug tracking system location entry type") and 2de2bd95f456 ("MAINTAINERS: add "B:" for URI where to file bugs"). This patch keeps the description from 2de2bd95f456. There has been a discussion [1] about whether this more detailed description is useful and what it exactly implies. I find it more useful and general, and the author of 686564434e88 agreed in the end that either is fine. [1] https://lkml.org/lkml/2016/12/8/71 Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-10gro: Disable frag0 optimization on IPv6 ext headersHerbert Xu2-2/+8
The GRO fast path caches the frag0 address. This address becomes invalid if frag0 is modified by pskb_may_pull or its variants. So whenever that happens we must disable the frag0 optimization. This is usually done through the combination of gro_header_hard and gro_header_slow, however, the IPv6 extension header path did the pulling directly and would continue to use the GRO fast path incorrectly. This patch fixes it by disabling the fast path when we enter the IPv6 extension header path. Fixes: 78a478d0efd9 ("gro: Inline skb_gro_header and cache frag0 virtual address") Reported-by: Slava Shwartsman <[email protected]> Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-01-10gro: Enter slow-path if there is no tailroomHerbert Xu1-1/+2
The GRO path has a fast-path where we avoid calling pskb_may_pull and pskb_expand by directly accessing frag0. However, this should only be done if we have enough tailroom in the skb as otherwise we'll have to expand it later anyway. This patch adds the check by capping frag0_len with the skb tailroom. Fixes: cb18978cbf45 ("gro: Open-code final pskb_may_pull") Reported-by: Slava Shwartsman <[email protected]> Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-01-10mlx4: Return EOPNOTSUPP instead of ENOTSUPPMartin KaFai Lau1-1/+1
In commit b45f0674b997 ("mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrs"), it changed EOPNOTSUPP to ENOTSUPP by mistake. This patch fixes it. Fixes: b45f0674b997 ("mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrs") Signed-off-by: Martin KaFai Lau <[email protected]> Acked-by: Saeed Mahameed <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-01-10net/af_iucv: don't use paged skbs for TX on HiperSocketsJulian Wiedmann1-11/+14
With commit e53743994e21 ("af_iucv: use paged SKBs for big outbound messages"), we transmit paged skbs for both of AF_IUCV's transport modes (IUCV or HiperSockets). The qeth driver for Layer 3 HiperSockets currently doesn't support NETIF_F_SG, so these skbs would just be linearized again by the stack. Avoid that overhead by using paged skbs only for IUCV transport. cc stable, since this also circumvents a significant skb leak when sending large messages (where the skb then needs to be linearized). Signed-off-by: Julian Wiedmann <[email protected]> Signed-off-by: Ursula Braun <[email protected]> Cc: <[email protected]> # v4.8+ Fixes: e53743994e21 ("af_iucv: use paged SKBs for big outbound messages") Signed-off-by: David S. Miller <[email protected]>
2017-01-10net: add the AF_QIPCRTR entries to family name tablesAnna, Suman1-3/+3
Commit bdabad3e363d ("net: Add Qualcomm IPC router") introduced a new address family. Update the family name tables accordingly so that the lockdep initialization can use the proper names for this family. Cc: Courtney Cavin <[email protected]> Cc: Bjorn Andersson <[email protected]> Signed-off-by: Suman Anna <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-01-10net: qrtr: Mark 'buf' as little endianStephen Boyd1-2/+2
Failure to mark this pointer as __le32 causes checkers like sparse to complain: net/qrtr/qrtr.c:274:16: warning: incorrect type in assignment (different base types) net/qrtr/qrtr.c:274:16: expected unsigned int [unsigned] [usertype] <noident> net/qrtr/qrtr.c:274:16: got restricted __le32 [usertype] <noident> net/qrtr/qrtr.c:275:16: warning: incorrect type in assignment (different base types) net/qrtr/qrtr.c:275:16: expected unsigned int [unsigned] [usertype] <noident> net/qrtr/qrtr.c:275:16: got restricted __le32 [usertype] <noident> net/qrtr/qrtr.c:276:16: warning: incorrect type in assignment (different base types) net/qrtr/qrtr.c:276:16: expected unsigned int [unsigned] [usertype] <noident> net/qrtr/qrtr.c:276:16: got restricted __le32 [usertype] <noident> Silence it. Cc: Bjorn Andersson <[email protected]> Signed-off-by: Stephen Boyd <[email protected]> Acked-by: Bjorn Andersson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-01-10net: dsa: Ensure validity of dst->ds[0]Florian Fainelli1-4/+7
It is perfectly possible to have non zero indexed switches being present in a DSA switch tree, in such a case, we will be deferencing a NULL pointer while dsa_cpu_port_ethtool_{setup,restore}. Be more defensive and ensure that dst->ds[0] is valid before doing anything with it. Fixes: 0c73c523cf73 ("net: dsa: Initialize CPU port ethtool ops per tree") Signed-off-by: Florian Fainelli <[email protected]> Reviewed-by: Vivien Didelot <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-01-10drm/i915/psr: disable psr2 for resolution greater than 32X20Nagaraju, Vathsala1-7/+8
PSR2 is restricted to work with panel resolutions upto 3200x2000, move the check to intel_psr_match_conditions and fully block psr. Cc: Rodrigo Vivi <[email protected]> Cc: Jim Bride <[email protected]> Suggested-by: Rodrigo Vivi <[email protected]> Signed-off-by: Vathsala Nagaraju <[email protected]> Reviewed-by: Rodrigo Vivi <[email protected]> Signed-off-by: Rodrigo Vivi <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-01-10drm/i915/psr: program vsc header for psr2Nagaraju, Vathsala3-2/+43
Function hsw_psr_setup handles vsc header setup for psr1 and skl_psr_setup_vsc handles vsc header setup for psr2. Setup VSC header in function skl_psr_setup_vsc for psr2 support, as per edp 1.4 spec, table 6-11:VSC SDP HEADER Extension for psr2 operation. v2: (Jani) - Initialize variables to 0 - intel_dp_get_y_cord_status and intel_dp_get_y_cord_status made static - Correct indentation for continuation lines - Change DP_PSR_Y_COORDINATE to DP_PSR2_SU_Y_COORDINATE_REQUIRED - Change DPRX_FEATURE_ENUMERATION_LIST to DP_DPRX_* - Change VSC_SDP_EXT_FOR_COLORIMETRY_SUPPORTED to DP_VSC_* Cc: Rodrigo Vivi <[email protected]> Cc: Jim Bride <[email protected]> Signed-off-by: Vathsala Nagaraju <[email protected]> Signed-off-by: Patil Deepti <[email protected]> Reviewed-by: Jim Bride <[email protected]> Signed-off-by: Rodrigo Vivi <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-01-10drm : adds Y-coordinate and Colorimetry FormatNagaraju, Vathsala1-1/+12
PSR2 vsc revision number hb2( as per table 6-11)is updated to 4 or 5 based on Y cordinate and Colorimetry Format as below 04h = 3D stereo + PSR/PSR2 + Y-coordinate. 05h = -3D stereo- + PSR/PSR2 + Y-coordinate + Pixel Encoding/Colorimetry Format indication. A DP Source device is allowed to indicate the pixel encoding/colorimetry format to the DP Sink device with VSC SDP only when the DP Sink device supports it ( i.e.,VSC_SDP_EXTENSION_FOR_COLORIMETRY_SUPPORTED bit in the DPRX_FEATURE_ENUMERATION_LIST register (DPCD Address 02210h, bit 3; is set to 1). v2: (Jani) - Change DP_PSR_Y_COORDINATE to DP_PSR2_SU_Y_COORDINATE_REQUIRED. - Add DP_PSR2_SU_GRANULARITY_REQUIRED. - Change DPRX_FEATURE_ENUMERATION_LIST to DP_DPRX. - Add GTC_CAP and AV_SYNC_CAP, other bits in DPRX_FEATURE_ENUMERATION_LIST. v3: (Jani) - Add support for bits 7:4 and 1 as per DP v1.4 for DPRX_FEATURE_ENUMERATION_LIST. Cc: Rodrigo Vivi <[email protected]> Cc: Jim Bride <[email protected]> Signed-off-by: Vathsala Nagaraju <[email protected]> Signed-off-by: Patil Deepti <[email protected]> Reviewed-by: Jani Nikula <[email protected]> Signed-off-by: Rodrigo Vivi <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-01-10ARM: 8634/1: hw_breakpoint: blacklist Scorpion CPUsMark Rutland2-0/+19
On APQ8060, the kernel crashes in arch_hw_breakpoint_init, taking an undefined instruction trap within write_wb_reg. This is because Scorpion CPUs erroneously appear to set DBGPRSR.SPD when WFI is issued, even if the core is not powered down. When DBGPRSR.SPD is set, breakpoint and watchpoint registers are treated as undefined. It's possible to trigger similar crashes later on from userspace, by requesting the kernel to install a breakpoint or watchpoint, as we can go idle at any point between the reset of the debug registers and their later use. This has always been the case. Given that this has always been broken, no-one has complained until now, and there is no clear workaround, disable hardware breakpoints and watchpoints on Scorpion to avoid these issues. Signed-off-by: Mark Rutland <[email protected]> Reported-by: Linus Walleij <[email protected]> Reviewed-by: Stephen Boyd <[email protected]> Acked-by: Will Deacon <[email protected]> Cc: Russell King <[email protected]> Cc: [email protected] Signed-off-by: Russell King <[email protected]>
2017-01-10ARM: 8632/1: ftrace: fix syscall name matchingRabin Vincent1-0/+18
ARM has a few system calls (most notably mmap) for which the names of the functions which are referenced in the syscall table do not match the names of the syscall tracepoints. As a consequence of this, these tracepoints are not made available. Implement arch_syscall_match_sym_name to fix this and allow tracing even these system calls. Signed-off-by: Rabin Vincent <[email protected]> Signed-off-by: Russell King <[email protected]>
2017-01-10Merge tag 'linux-kselftest-4.10-rc4-fixes' of ↵Linus Torvalds4-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fixes from Shuah Khan: "This update consists of fixes to use shell instead of bash to run tests in embedded devices where the only shell available is the busybox ash. Also included is a typo fix to a test result message" * tag 'linux-kselftest-4.10-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests: x86/pkeys: fix spelling mistake: "itertation" -> "iteration" selftests: do not require bash to run netsocktests testcase selftests: do not require bash to run bpf tests selftests: do not require bash for the generated test
2017-01-10drm/i915: Replace 4096 with PAGE_SIZE or I915_GTT_PAGE_SIZEChris Wilson15-45/+62
Start converting over from the byte count to its semantic macro, either we want to allocate the size of a physical page in main memory or we want the size of a virtual page in the GTT. 4096 could mean either, but PAGE_SIZE and I915_GTT_PAGE_SIZE are explicit and should help improve code comprehension and future changes. In the future, we may want to use variable GTT page sizes and so have the challenge of knowing which hardcoded values were used to represent a physical page vs the virtual page. v2: Look for a few more 4096s to convert, discover IS_ALIGNED(). v3: 4096ul paranoia, make fence alignment a distinct value of 4096, keep bdw stolen w/a as 4096 until we know better. v4: Add asserts that i915_vma_insert() start/end are aligned to GTT page sizes. Signed-off-by: Chris Wilson <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Joonas Lahtinen <[email protected]>
2017-01-10drm/i915: Rename i915_gem_engine_cleanup() to engine_set_wedged()Chris Wilson1-2/+2
It has been some time since i915_gem_engine_cleanup was only called from the module unload path, and now it is only called when the GPU is wedged. Mika complained that the name is confusing, especially in light of the existence of i915_gem_cleanup_engines(). Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Mika Kuoppala <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-01-10drm/i915: Mark all incomplete requests as -EIO when wedgedChris Wilson1-0/+10
Similarly to a normal reset, after we mark the GPU as wedged (completely fubar and no more requests can be executed), set the error status on all the in flight requests. Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Mika Kuoppala <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-01-10drm/i915: Set an error status for a resubmitted requestChris Wilson1-0/+1
Let userspace know if its request was resubmitted due to it being executed at the time of a global reset. In this case, the reset was for a guilty request on another engine, and this request was an innocent victim that will be re-executed upon restarting. However, since it was running at the time of the reset, we can not guarantee that it suffered no ill-effects from the reset (e.g. some context state may be lost, or some self-modifying fragment shaders will be restarted from the final state not their initial state), to let userspace know that it has been corrupted set a special value on the fence->error, -EAGAIN. If the request does hang on resubmission, the error will be overwritten with -EIO. Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Mika Kuoppala <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-01-10drm/i915: Set guilty-flag on fence after detecting a hangChris Wilson1-0/+2
The struct dma_fence carries a status field exposed to userspace by sync_file. This is inspected after the fence is signaled and can convey whether or not the request completed successfully, or in our case if we detected a hang during the request (signaled via -EIO in SYNC_IOC_FILE_INFO). v2: Mark all cancelled requests as failed. Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Mika Kuoppala <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-01-10drm/i915: Consolidate reset_request()Chris Wilson1-4/+4
Always reset the requests of the guilty context, including the hung request that we tell the hardware to skip. This should help if the reprogram fails entirely, but more importantly makes the guilty path more uniform (and simplifies the subsequent patch to tweak the cancelled requests). Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Mika Kuoppala <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-01-10virtio_blk: fix panic in initialization error pathOmar Sandoval1-1/+2
If blk_mq_init_queue() returns an error, it gets assigned to vblk->disk->queue. Then, when we call put_disk(), we end up calling blk_put_queue() with the ERR_PTR, causing a bad dereference. Fix it by only assigning to vblk->disk->queue on success. Signed-off-by: Omar Sandoval <[email protected]> Reviewed-by: Jeff Moyer <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-10nbd: blk_mq_init_queue returns an error code on failure, not NULLJeff Moyer1-2/+4
Additionally, don't assign directly to disk->queue, otherwise blk_put_queue (called via put_disk) will choke (panic) on the errno stored there. Bug found by code inspection after Omar found a similar issue in virtio_blk. Compile-tested only. Signed-off-by: Jeff Moyer <[email protected]> Reviewed-by: Omar Sandoval <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-10virtio_blk: avoid DMA to stack for the sense bufferChristoph Hellwig1-1/+3
Most users of BLOCK_PC requests allocate the sense buffer on the stack, so to avoid DMA to the stack copy them to a field in the heap allocated virtblk_req structure. Without that any attempt at SCSI passthrough I/O, including the SG_IO ioctl from userspace will crash the kernel. Note that this includes running tools like hdparm even when the host does not have SCSI passthrough enabled. Signed-off-by: Christoph Hellwig <[email protected]> Cc: [email protected] # v4.9+ Signed-off-by: Jens Axboe <[email protected]>
2017-01-10do_direct_IO: Use inode->i_blkbits to compute block count to be cleanedChandan Rajendra1-1/+2
The code currently uses sdio->blkbits to compute the number of blocks to be cleaned. However sdio->blkbits is derived from the logical block size of the underlying block device (Refer to the definition of do_blockdev_direct_IO()). Due to this, generic/299 test would rarely fail when executed on an ext4 filesystem with 64k as the block size and when using a virtio based disk (having 512 byte as the logical block size) inside a kvm guest. This commit fixes the bug by using inode->i_blkbits to compute the number of blocks to be cleaned. Signed-off-by: Chandan Rajendra <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Fixed up by Jeff Moyer to only use/evaluate inode->i_blkbits once, to avoid issues with block size changes with IO in flight. Signed-off-by: Jens Axboe <[email protected]>
2017-01-10ARCv2: save r30 on kernel entry as gcc uses it for code-genVineet Gupta2-1/+3
This is not exposed to userspace debugers yet, which can be done independently as a seperate patch ! Signed-off-by: Vineet Gupta <[email protected]>
2017-01-10net: skb_flow_get_be16() can be staticEric Dumazet1-2/+2
Removes following sparse complain : net/core/flow_dissector.c:70:8: warning: symbol 'skb_flow_get_be16' was not declared. Should it be static? Fixes: 972d3876faa8 ("flow dissector: ICMP support") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-01-10ARM: dts: sunxi: Change node name for pwrseq pin on Olinuxino-lime2-emmcEmmanuel Vadot1-1/+1
The node name for the power seq pin is mmc2@0 like the mmc2_pins_a one. This makes the original node (mmc2_pins_a) scrapped out of the dtb and result in a unusable eMMC if U-Boot didn't configured the pins to the correct functions. Signed-off-by: Emmanuel Vadot <[email protected]> Signed-off-by: Maxime Ripard <[email protected]>