aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2015-09-08mm: add vmf_insert_pfn_pmd()Matthew Wilcox2-0/+45
Similar to vm_insert_pfn(), but for PMDs rather than PTEs. The 'vmf_' prefix instead of 'vm_' prefix is intended to indicate that it returns a VMF_ value rather than an errno (which would only have to be converted into a VMF_ value anyway). Signed-off-by: Matthew Wilcox <[email protected]> Cc: Hillf Danton <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Theodore Ts'o <[email protected]> Cc: Jan Kara <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-09-08mm: export various functions for the benefit of DAXMatthew Wilcox2-6/+11
To use the huge zero page in DAX, we need these functions exported. Signed-off-by: Matthew Wilcox <[email protected]> Cc: Hillf Danton <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Theodore Ts'o <[email protected]> Cc: Jan Kara <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-09-08mm: add a pmd_fault handlerMatthew Wilcox2-6/+26
Allow non-anonymous VMAs to provide huge pages in response to a page fault. Signed-off-by: Matthew Wilcox <[email protected]> Cc: Hillf Danton <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Theodore Ts'o <[email protected]> Cc: Jan Kara <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-09-08thp: prepare for DAX huge pagesMatthew Wilcox2-19/+33
Add a vma_is_dax() helper macro to test whether the VMA is DAX, and use it in zap_huge_pmd() and __split_huge_page_pmd(). [[email protected]: fix build] Signed-off-by: Matthew Wilcox <[email protected]> Cc: Hillf Danton <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Theodore Ts'o <[email protected]> Cc: Jan Kara <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-09-08dax: revert userfaultfd changeAndrew Morton1-1/+4
Undo the change which "userfaultfd: call handle_userfault() for userfaultfd_missing() faults" made to set_huge_zero_page(). DAX will need that return value. Cc: Andrea Arcangeli <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-09-08dax: move DAX-related functions to a new headerMatthew Wilcox9-14/+28
In order to handle the !CONFIG_TRANSPARENT_HUGEPAGES case, we need to return VM_FAULT_FALLBACK from the inlined dax_pmd_fault(), which is defined in linux/mm.h. Given that we don't want to include <linux/mm.h> in <linux/fs.h>, the easiest solution is to move the DAX-related functions to a new header, <linux/dax.h>. We could also have moved VM_FAULT_* definitions to a new header, or a different header that isn't quite such a boil-the-ocean header as <linux/mm.h>, but this felt like the best option. Signed-off-by: Matthew Wilcox <[email protected]> Cc: Hillf Danton <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Theodore Ts'o <[email protected]> Cc: Jan Kara <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-09-08thp: vma_adjust_trans_huge(): adjust file-backed VMA tooKirill A. Shutemov2-11/+2
This series of patches adds support for using PMD page table entries to map DAX files. We expect NV-DIMMs to start showing up that are many gigabytes in size and the memory consumption of 4kB PTEs will be astronomical. The patch series leverages much of the Transparant Huge Pages infrastructure, going so far as to borrow one of Kirill's patches from his THP page cache series. This patch (of 10): Since we're going to have huge pages in page cache, we need to call adjust file-backed VMA, which potentially can contain huge pages. For now we call it for all VMAs. Probably later we will need to introduce a flag to indicate that the VMA has huge pages. Signed-off-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Matthew Wilcox <[email protected]> Acked-by: Hillf Danton <[email protected]> Cc: Theodore Ts'o <[email protected]> Cc: Jan Kara <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-09-08mremap: fix the wrong !vma->vm_file check in copy_vma()Oleg Nesterov1-1/+1
Test-case: #define _GNU_SOURCE #include <stdio.h> #include <unistd.h> #include <stdlib.h> #include <string.h> #include <sys/mman.h> #include <assert.h> void *find_vdso_vaddr(void) { FILE *perl; char buf[32] = {}; perl = popen("perl -e 'open STDIN,qq|/proc/@{[getppid]}/maps|;" "/^(.*?)-.*vdso/ && print hex $1 while <>'", "r"); fread(buf, sizeof(buf), 1, perl); fclose(perl); return (void *)atol(buf); } #define PAGE_SIZE 4096 void *get_unmapped_area(void) { void *p = mmap(0, PAGE_SIZE, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1,0); assert(p != MAP_FAILED); munmap(p, PAGE_SIZE); return p; } char save[2][PAGE_SIZE]; int main(void) { void *vdso = find_vdso_vaddr(); void *page[2]; assert(vdso); memcpy(save, vdso, sizeof (save)); // force another fault on the next check assert(madvise(vdso, 2 * PAGE_SIZE, MADV_DONTNEED) == 0); page[0] = mremap(vdso, PAGE_SIZE, PAGE_SIZE, MREMAP_FIXED | MREMAP_MAYMOVE, get_unmapped_area()); page[1] = mremap(vdso + PAGE_SIZE, PAGE_SIZE, PAGE_SIZE, MREMAP_FIXED | MREMAP_MAYMOVE, get_unmapped_area()); assert(page[0] != MAP_FAILED && page[1] != MAP_FAILED); printf("match: %d %d\n", !memcmp(save[0], page[0], PAGE_SIZE), !memcmp(save[1], page[1], PAGE_SIZE)); return 0; } fails without this patch. Before the previous commit it gets the wrong page, now it segfaults (which is imho better). This is because copy_vma() wrongly assumes that if vma->vm_file == NULL is irrelevant until the first fault which will use do_anonymous_page(). This is obviously wrong for the special mapping. Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Pavel Emelyanov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-09-08mmap: fix the usage of ->vm_pgoff in special_mapping pathsOleg Nesterov1-10/+2
Test-case: #include <stdio.h> #include <unistd.h> #include <stdlib.h> #include <string.h> #include <sys/mman.h> #include <assert.h> void *find_vdso_vaddr(void) { FILE *perl; char buf[32] = {}; perl = popen("perl -e 'open STDIN,qq|/proc/@{[getppid]}/maps|;" "/^(.*?)-.*vdso/ && print hex $1 while <>'", "r"); fread(buf, sizeof(buf), 1, perl); fclose(perl); return (void *)atol(buf); } #define PAGE_SIZE 4096 int main(void) { void *vdso = find_vdso_vaddr(); assert(vdso); // of course they should differ, and they do so far printf("vdso pages differ: %d\n", !!memcmp(vdso, vdso + PAGE_SIZE, PAGE_SIZE)); // split into 2 vma's assert(mprotect(vdso, PAGE_SIZE, PROT_READ) == 0); // force another fault on the next check assert(madvise(vdso, 2 * PAGE_SIZE, MADV_DONTNEED) == 0); // now they no longer differ, the 2nd vm_pgoff is wrong printf("vdso pages differ: %d\n", !!memcmp(vdso, vdso + PAGE_SIZE, PAGE_SIZE)); return 0; } Output: vdso pages differ: 1 vdso pages differ: 0 This is because split_vma() correctly updates ->vm_pgoff, but the logic in insert_vm_struct() and special_mapping_fault() is absolutely broken, so the fault at vdso + PAGE_SIZE return the 1st page. The same happens if you simply unmap the 1st page. special_mapping_fault() does: pgoff = vmf->pgoff - vma->vm_pgoff; and this is _only_ correct if vma->vm_start mmaps the first page from ->vm_private_data array. vdso or any other user of install_special_mapping() is not anonymous, it has the "backing storage" even if it is just the array of pages. So we actually need to make vm_pgoff work as an offset in this array. Note: this also allows to fix another problem: currently gdb can't access "[vvar]" memory because in this case special_mapping_fault() doesn't work. Now that we can use ->vm_pgoff we can implement ->access() and fix this. Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Pavel Emelyanov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-09-08mm: introduce vma_is_anonymous(vma) helperOleg Nesterov2-4/+9
special_mapping_fault() is absolutely broken. It seems it was always wrong, but this didn't matter until vdso/vvar started to use more than one page. And after this change vma_is_anonymous() becomes really trivial, it simply checks vm_ops == NULL. However, I do think the helper makes sense. There are a lot of ->vm_ops != NULL checks, the helper makes the caller's code more understandable (self-documented) and this is more grep-friendly. This patch (of 3): Preparation. Add the new simple helper, vma_is_anonymous(vma), and change handle_pte_fault() to use it. It will have more users. The name is not accurate, say a hpet_mmap()'ed vma is not anonymous. Perhaps it should be named vma_has_fault() instead. But it matches the logic in mmap.c/memory.c (see next changes). "True" just means that a page fault will use do_anonymous_page(). Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Pavel Emelyanov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-09-08selftests/userfaultfd: fix compiler warnings on 32-bitGeert Uytterhoeven1-3/+6
On 32-bit: userfaultfd.c: In function 'locking_thread': userfaultfd.c:152: warning: left shift count >= width of type userfaultfd.c: In function 'uffd_poll_thread': userfaultfd.c:295: warning: cast to pointer from integer of different size userfaultfd.c: In function 'uffd_read_thread': userfaultfd.c:332: warning: cast to pointer from integer of different size Fix the shift warning by splitting the shift in two parts, and the integer/pointer warnigns by adding intermediate casts to "unsigned long". Signed-off-by: Geert Uytterhoeven <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-09-08cgroup: fix seq_show_option merge with legacy_nameKees Cook1-1/+1
When seq_show_option (commit a068acf2ee77: "fs: create and use seq_show_option for escaping") was merged, it did not correctly collide with cgroup's addition of legacy_name (commit 3e1d2eed39d8: "cgroup: introduce cgroup_subsys->legacy_name") changes. This fixes the reported name. Signed-off-by: Kees Cook <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-09-08Merge tag 'libnvdimm-for-4.3' of ↵Linus Torvalds92-745/+2142
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "This update has successfully completed a 0day-kbuild run and has appeared in a linux-next release. The changes outside of the typical drivers/nvdimm/ and drivers/acpi/nfit.[ch] paths are related to the removal of IORESOURCE_CACHEABLE, the introduction of memremap(), and the introduction of ZONE_DEVICE + devm_memremap_pages(). Summary: - Introduce ZONE_DEVICE and devm_memremap_pages() as a generic mechanism for adding device-driver-discovered memory regions to the kernel's direct map. This facility is used by the pmem driver to enable pfn_to_page() operations on the page frames returned by DAX ('direct_access' in 'struct block_device_operations'). For now, the 'memmap' allocation for these "device" pages comes from "System RAM". Support for allocating the memmap from device memory will arrive in a later kernel. - Introduce memremap() to replace usages of ioremap_cache() and ioremap_wt(). memremap() drops the __iomem annotation for these mappings to memory that do not have i/o side effects. The replacement of ioremap_cache() with memremap() is limited to the pmem driver to ease merging the api change in v4.3. Completion of the conversion is targeted for v4.4. - Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem driver, update the VFS DAX implementation and PMEM api to provide persistence guarantees for kernel operations on a DAX mapping. - Convert the ACPI NFIT 'BLK' driver to map the block apertures as cacheable to improve performance. - Miscellaneous updates and fixes to libnvdimm including support for issuing "address range scrub" commands, clarifying the optimal 'sector size' of pmem devices, a clarification of the usage of the ACPI '_STA' (status) property for DIMM devices, and other minor fixes" * tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (34 commits) libnvdimm, pmem: direct map legacy pmem by default libnvdimm, pmem: 'struct page' for pmem libnvdimm, pfn: 'struct page' provider infrastructure x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB add devm_memremap_pages mm: ZONE_DEVICE for "device memory" mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h dax: drop size parameter to ->direct_access() nd_blk: change aperture mapping from WC to WB nvdimm: change to use generic kvfree() pmem, dax: have direct_access use __pmem annotation dax: update I/O path to do proper PMEM flushing pmem: add copy_from_iter_pmem() and clear_pmem() pmem, x86: clean up conditional pmem includes pmem: remove layer when calling arch_has_wmb_pmem() pmem, x86: move x86 PMEM API to new pmem.h header libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option pmem: switch to devm_ allocations devres: add devm_memremap libnvdimm, btt: write and validate parent_uuid ...
2015-09-08Merge branch 'misc' of ↵Linus Torvalds17-61/+345
git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild Pull misc kbuild updates from Michal Marek: - deb-pkg: + module signing fix + dtb files are added to the package + do not require `hostname -f` to work during build + make deb-pkg generates a source package, bindeb-pkg has been added to only generate the binary package - rpm-pkg packages /lib/modules as well - new coccinelle patch and updates to existing ones - new stackusage & stackdelta script to collect and compare stack usage info (using gcc's -fstack-usage) - make tags understands trace_*_rcuidle() macros - .gitignore updates, misc cleanups * 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (27 commits) deb-pkg: add source package package/Makefile: move source tar creation to a function scripts: add stackdelta script kbuild: remove *.su files generated by -fstack-usage .gitignore: add *.su pattern scripts: add stackusage script kbuild: avoid listing /lib/modules in kernel spec file fallback to hostname in scripts/package/builddeb coccinelle: api: extend spatch for dropping unnecessary owner deb-pkg: simplify directory creation scripts/tags.sh: Include trace_*_rcuidle() in tags scripts/package/Makefile: rpmbuild is needed for rpm targets Kbuild: Add ID files to .gitignore gitignore: Add MIPS vmlinux.32 to the list coccinelle: simple_return: Add a blank line coccinelle: irqf_oneshot.cocci: Improve the generated commit log coccinelle: api: add vma_pages.cocci scripts/coccinelle/misc/irqf_oneshot.cocci: Fix grammar scripts/coccinelle/misc/semicolon.cocci: Use imperative mood coccinelle: simple_open: Use imperative mood ...
2015-09-08Merge branch 'kconfig' of ↵Linus Torvalds7-207/+211
git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild Pull kconfig updates from Michal Marek: - kconfig warns about junk characters in Kconfig files - merge_config.sh error handling - small cleanup * 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: merge_config.sh: exit on missing input files kconfig: Regenerate shipped zconf.{hash,lex}.c files kconfig: warn of unhandled characters in Kconfig commands kconfig: Delete unnecessary checks before the function call "sym_calc_value"
2015-09-08Merge branch 'kbuild' of ↵Linus Torvalds10-381/+384
git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild Pull core kbuild updates from Michal Marek: - modpost portability fix - linker script fix - genksyms segfault fix - fixdep cleanup - fix for clang detection * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: kbuild: Fix clang detection kbuild: fixdep: drop meaningless hash table initialization kbuild: fixdep: optimize code slightly genksyms: Regenerate parser genksyms: Duplicate function pointer type definitions segfault kbuild: Fix .text.unlikely placement Avoid conflict with host definitions when cross-compiling
2015-09-08Merge tag 'trace-v4.3' of ↵Linus Torvalds16-437/+562
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing update from Steven Rostedt: "Mostly this is just clean ups and micro optimizations. The changes with more meat are: - Allowing the trace event filters to filter on CPU number and process ids - Two new markers for trace output latency were added (10 and 100 msec latencies) - Have tracing_thresh filter function profiling time I also worked on modifying the ring buffer code for some future work, and moved the adding of the timestamp around. One of my changes caused a regression, and since other changes were built on top of it and already tested, I had to operate a revert of that change. Instead of rebasing, this change set has the code that caused a regression as well as the code to revert that change without touching the other changes that were made on top of it" * tag 'trace-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ring-buffer: Revert "ring-buffer: Get timestamp after event is allocated" tracing: Don't make assumptions about length of string on task rename tracing: Allow triggers to filter for CPU ids and process names ftrace: Format MCOUNT_ADDR address as type unsigned long tracing: Introduce two additional marks for delay ftrace: Fix function_graph duration spacing with 7-digits ftrace: add tracing_thresh to function profile tracing: Clean up stack tracing and fix fentry updates ring-buffer: Reorganize function locations ring-buffer: Make sure event has enough room for extend and padding ring-buffer: Get timestamp after event is allocated ring-buffer: Move the adding of the extended timestamp out of line ring-buffer: Add event descriptor to simplify passing data ftrace: correct the counter increment for trace_buffer data tracing: Fix for non-continuous cpu ids tracing: Prefer kcalloc over kzalloc with multiply
2015-09-08device property: Don't overwrite addr when failing in device_get_mac_addressJulien Grall1-6/+8
The function device_get_mac_address is trying different property names in order to get the mac address. To check the return value, the variable addr (which contain the buffer pass by the caller) will be re-used. This means that if the previous property is not found, the next property will be read using a NULL buffer. Therefore it's only possible to retrieve the mac if node contains a property "mac-address". Fix it by using a temporary buffer for the return value. This has been introduced by commit 4c96b7dc0d393f12c17e0d81db15aa4a820a6ab3 "Add a matching set of device_ functions for determining mac/phy" Signed-off-by: Julien Grall <[email protected]> Cc: Jeremy Linton <[email protected]> Cc: David S. Miller <[email protected]> Reviewed-by: Jeremy Linton <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-09-08Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/auditLinus Torvalds10-38/+359
Pull audit update from Paul Moore: "This is one of the larger audit patchsets in recent history, consisting of eight patches and almost 400 lines of changes. The bulk of the patchset is the new "audit by executable" functionality which allows admins to set an audit watch based on the executable on disk. Prior to this, admins could only track an application by PID, which has some obvious limitations. Beyond the new functionality we also have some refcnt fixes and a few minor cleanups" * 'upstream' of git://git.infradead.org/users/pcmoore/audit: fixup: audit: implement audit by executable audit: implement audit by executable audit: clean simple fsnotify implementation audit: use macros for unset inode and device values audit: make audit_del_rule() more robust audit: fix uninitialized variable in audit_add_rule() audit: eliminate unnecessary extra layer of watch parent references audit: eliminate unnecessary extra layer of watch references
2015-09-08usbnet: Fix a race between usbnet_stop() and the BHEugene Shatokhin1-11/+28
The race may happen when a device (e.g. YOTA 4G LTE Modem) is unplugged while the system is downloading a large file from the Net. Hardware breakpoints and Kprobes with delays were used to confirm that the race does actually happen. The race is on skb_queue ('next' pointer) between usbnet_stop() and rx_complete(), which, in turn, calls usbnet_bh(). Here is a part of the call stack with the code where the changes to the queue happen. The line numbers are for the kernel 4.1.0: *0 __skb_unlink (skbuff.h:1517) prev->next = next; *1 defer_bh (usbnet.c:430) spin_lock_irqsave(&list->lock, flags); old_state = entry->state; entry->state = state; __skb_unlink(skb, list); spin_unlock(&list->lock); spin_lock(&dev->done.lock); __skb_queue_tail(&dev->done, skb); if (dev->done.qlen == 1) tasklet_schedule(&dev->bh); spin_unlock_irqrestore(&dev->done.lock, flags); *2 rx_complete (usbnet.c:640) state = defer_bh(dev, skb, &dev->rxq, state); At the same time, the following code repeatedly checks if the queue is empty and reads these values concurrently with the above changes: *0 usbnet_terminate_urbs (usbnet.c:765) /* maybe wait for deletions to finish. */ while (!skb_queue_empty(&dev->rxq) && !skb_queue_empty(&dev->txq) && !skb_queue_empty(&dev->done)) { schedule_timeout(msecs_to_jiffies(UNLINK_TIMEOUT_MS)); set_current_state(TASK_UNINTERRUPTIBLE); netif_dbg(dev, ifdown, dev->net, "waited for %d urb completions\n", temp); } *1 usbnet_stop (usbnet.c:806) if (!(info->flags & FLAG_AVOID_UNLINK_URBS)) usbnet_terminate_urbs(dev); As a result, it is possible, for example, that the skb is removed from dev->rxq by __skb_unlink() before the check "!skb_queue_empty(&dev->rxq)" in usbnet_terminate_urbs() is made. It is also possible in this case that the skb is added to dev->done queue after "!skb_queue_empty(&dev->done)" is checked. So usbnet_terminate_urbs() may stop waiting and return while dev->done queue still has an item. Locking in defer_bh() and usbnet_terminate_urbs() was revisited to avoid this race. Signed-off-by: Eugene Shatokhin <[email protected]> Reviewed-by: Bjørn Mork <[email protected]> Acked-by: Oliver Neukum <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-09-08libceph: use keepalive2 to verify the mon session is aliveYan, Zheng6-14/+93
Signed-off-by: Yan, Zheng <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2015-09-08rbd: plug rbd_dev->header.object_prefix memory leakIlya Dryomov1-1/+4
Need to free object_prefix when rbd_dev_v2_snap_context() fails, but only if this is the first time we are reading in the header. Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2015-09-08rbd: fix double free on rbd_dev->header_nameIlya Dryomov1-1/+0
If rbd_dev_image_probe() in rbd_dev_probe_parent() fails, header_name is freed twice: once in rbd_dev_probe_parent() and then in its caller rbd_dev_image_probe() (rbd_dev_image_probe() is called recursively to handle parent images). rbd_dev_probe_parent() is responsible for probing the parent, so it shouldn't muck with clone's fields. Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2015-09-08libceph: set 'exists' flag for newly up osdYan, Zheng1-1/+1
Signed-off-by: Yan, Zheng <[email protected]> Reviewed-by: Sage Weil <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2015-09-08ceph: cleanup use of ceph_msg_getJianpeng Ma1-2/+1
Signed-off-by: Jianpeng Ma <[email protected]> Signed-off-by: Yan, Zheng <[email protected]>
2015-09-08ceph: no need to get parent inode in ceph_openJianpeng Ma1-5/+1
parent inode is needed in creating new inode case. For ceph_open, the target inode already exists. Signed-off-by: Jianpeng Ma <[email protected]> Signed-off-by: Yan, Zheng <[email protected]>
2015-09-08ceph: remove the useless judgementJianpeng Ma1-1/+1
err != 0 is already handled. So skip this. Signed-off-by: Jianpeng Ma <[email protected]> Signed-off-by: Yan, Zheng <[email protected]>
2015-09-08ceph: remove redundant test of head->safe and silence static analysis warningsBrad Hubbard1-1/+1
Signed-off-by: Brad Hubbard <[email protected]> Signed-off-by: Yan, Zheng <[email protected]>
2015-09-08ceph: fix queuing inode to mdsdir's snaprealmYan, Zheng1-7/+0
During MDS failovers, MClientSnap message may cause kclient to move some inodes from root directory's snaprealm to mdsdir's snaprealm and queue snapshots for these inodes. For a FS has never created any snapshot, both root directory's snaprealm and mdsdir's snaprealm share the same snapshot contexts (both are ceph_empty_snapc). This confuses ceph_put_wrbuffer_cap_refs(), make it unable to distinguish snapshot buffers from head buffers. The fix is do not use ceph_empty_snapc as snaprealm's cached context. Signed-off-by: Yan, Zheng <[email protected]>
2015-09-08libceph: rename con_work() to ceph_con_workfn()Ilya Dryomov1-3/+3
Even though it's static, con_work(), being a work func, shows up in various stacktraces a lot. Prefix it with ceph_. Signed-off-by: Ilya Dryomov <[email protected]>
2015-09-08libceph: Avoid holding the zero page on ceph_msgr_slab_init errorsBenoît Canet1-5/+5
ceph_msgr_slab_init may fail due to a temporary ENOMEM. Delay a bit the initialization of zero_page in ceph_msgr_init and reorder its cleanup in _ceph_msgr_exit so it's done in reverse order of setup. BUG_ON() will not suffer to be postponed in case it is triggered. Signed-off-by: Benoît Canet <[email protected]> Reviewed-by: Alex Elder <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2015-09-08libceph: remove the unused macro AES_KEY_SIZENicholas Krause1-4/+0
This removes the no longer used macro AES_KEY_SIZE as no functions use this macro anymore and thus this macro can be removed due it no longer being required. Signed-off-by: Nicholas Krause <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2015-09-08ceph: invalidate dirty pages after forced umountYan, Zheng1-0/+2
After forced umount, ceph_writepages_start() skips flushing dirty pages. To make sure inode's reference count get dropped to zero, we need to invalidate dirty pages. Signed-off-by: Yan, Zheng <[email protected]>
2015-09-08ceph: EIO all operations after forced umountYan, Zheng5-12/+54
This patch makes try_get_cap_refs() and __do_request() check if the file system was forced umount, and return -EIO if it was. This patch also adds a helper function to drops dirty caps and wakes up blocking operation. Signed-off-by: Yan, Zheng <[email protected]>
2015-09-08Merge branch 'next' of ↵Linus Torvalds76-1407/+3588
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "Highlights: - PKCS#7 support added to support signed kexec, also utilized for module signing. See comments in 3f1e1bea. ** NOTE: this requires linking against the OpenSSL library, which must be installed, e.g. the openssl-devel on Fedora ** - Smack - add IPv6 host labeling; ignore labels on kernel threads - support smack labeling mounts which use binary mount data - SELinux: - add ioctl whitelisting (see http://kernsec.org/files/lss2015/vanderstoep.pdf) - fix mprotect PROT_EXEC regression caused by mm change - Seccomp: - add ptrace options for suspend/resume" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (57 commits) PKCS#7: Add OIDs for sha224, sha284 and sha512 hash algos and use them Documentation/Changes: Now need OpenSSL devel packages for module signing scripts: add extract-cert and sign-file to .gitignore modsign: Handle signing key in source tree modsign: Use if_changed rule for extracting cert from module signing key Move certificate handling to its own directory sign-file: Fix warning about BIO_reset() return value PKCS#7: Add MODULE_LICENSE() to test module Smack - Fix build error with bringup unconfigured sign-file: Document dependency on OpenSSL devel libraries PKCS#7: Appropriately restrict authenticated attributes and content type KEYS: Add a name for PKEY_ID_PKCS7 PKCS#7: Improve and export the X.509 ASN.1 time object decoder modsign: Use extract-cert to process CONFIG_SYSTEM_TRUSTED_KEYS extract-cert: Cope with multiple X.509 certificates in a single file sign-file: Generate CMS message as signature instead of PKCS#7 PKCS#7: Support CMS messages also [RFC5652] X.509: Change recorded SKID & AKID to not include Subject or Issuer PKCS#7: Check content type and versions MAINTAINERS: The keyrings mailing list has moved ...
2015-09-08Merge branch 'nmi' of git://ftp.arm.linux.org.uk/~rmk/linux-armLinus Torvalds6-130/+196
Pull NMI backtrace update from Russell King: "These changes convert the x86 NMI handling to be a library implementation which other architectures can make use of. Thomas Gleixner has reviewed and tested these changes, and wishes me to send these rather than taking them through the tip tree. The final patch in the set adds an initial implementation using this infrastructure to ARM, even though it doesn't send the IPI at "NMI" level. Patches are in progress to add the ARM equivalent of NMI, but we still need the IRQ-level fallback for systems where the "NMI" isn't available due to secure firmware denying access to it" * 'nmi' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: ARM: add basic support for on-demand backtrace of other CPUs nmi: x86: convert to generic nmi handler nmi: create generic NMI backtrace implementation
2015-09-08Merge tag 'for-linus-4.3-rc0-tag' of ↵Linus Torvalds42-363/+2229
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from David Vrabel: "Xen features and fixes for 4.3: - Convert xen-blkfront to the multiqueue API - [arm] Support binding event channels to different VCPUs. - [x86] Support > 512 GiB in a PV guests (off by default as such a guest cannot be migrated with the current toolstack). - [x86] PMU support for PV dom0 (limited support for using perf with Xen and other guests)" * tag 'for-linus-4.3-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (33 commits) xen: switch extra memory accounting to use pfns xen: limit memory to architectural maximum xen: avoid another early crash of memory limited dom0 xen: avoid early crash of memory limited dom0 arm/xen: Remove helpers which are PV specific xen/x86: Don't try to set PCE bit in CR4 xen/PMU: PMU emulation code xen/PMU: Intercept PMU-related MSR and APIC accesses xen/PMU: Describe vendor-specific PMU registers xen/PMU: Initialization code for Xen PMU xen/PMU: Sysfs interface for setting Xen PMU mode xen: xensyms support xen: remove no longer needed p2m.h xen: allow more than 512 GB of RAM for 64 bit pv-domains xen: move p2m list if conflicting with e820 map xen: add explicit memblock_reserve() calls for special pages mm: provide early_memremap_ro to establish read-only mapping xen: check for initrd conflicting with e820 map xen: check pre-allocated page tables for conflict with memory map xen: check for kernel memory conflicting with memory layout ...
2015-09-08Merge branch 'irq-core-for-linus' of ↵Linus Torvalds4-12/+237
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more irq updates from Thomas Gleixner: "The second part of irq related updates: - Provide EOImode for GIC[V3] irq chips, which is a prerequisite for direct interrupt handling in [KVM] guests" * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/GIC: Fix EOImode setting for non-DT/ACPI systems irqchip/GIC: Don't deactivate interrupts forwarded to a guest irqchip/GIC: Convert to EOImode == 1 irqchip/GICv3: Don't deactivate interrupts forwarded to a guest irqchip/GICv3: Convert to EOImode == 1
2015-09-08Merge branch 'for-next' of ↵Linus Torvalds2-40/+35
git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu Pull m68k/colfire fixes from Greg Ungerer: "Only a couple of patches this time. One migrating the clock driver code to the new set-state interface. The other cleaning up to use the PFN_DOWN macro" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu: m68k/coldfire: use PFN_DOWN macro m68k/coldfire/pit: Migrate to new 'set-state' interface
2015-09-08Merge tag 'ecryptfs-4.3-rc1-stale-dcache' of ↵Linus Torvalds2-10/+9
git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs Pull ecryptfs fixes from Tyler Hicks: "Invalidate stale eCryptfs dcache entries caused by unlinked lower inodes" * tag 'ecryptfs-4.3-rc1-stale-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs: eCryptfs: Delete a check before the function call "key_put" eCryptfs: Invalidate dcache entries when lower i_nlink is zero
2015-09-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds1-0/+1
Pull crypto fix from Herbert Xu: "This fixes a memory corruption bug in ghash-clmulni-intel due to insufficient memory allocation" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: ghash-clmulni: specify context size for ghash async algorithm
2015-09-08userfaultfd: selftest: update userfaultfd x86 32bit syscall numberAndrea Arcangeli1-1/+1
It changed as result of other syscalls, and while the system call list itself was correctly updated, the selftest program was not. Signed-off-by: Andrea Arcangeli <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Andy Lutomirski <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-09-08xen/xenbus: Rename the variable xen_store_mfn to xen_store_gfnJulien Grall1-7/+7
The variable xen_store_mfn is effectively storing a GFN and not an MFN. Signed-off-by: Julien Grall <[email protected]> Signed-off-by: David Vrabel <[email protected]>
2015-09-08xen/privcmd: Further s/MFN/GFN/ clean-upJulien Grall6-61/+65
The privcmd code is mixing the usage of GFN and MFN within the same functions which make the code difficult to understand when you only work with auto-translated guests. The privcmd driver is only dealing with GFN so replace all the mention of MFN into GFN. The ioctl structure used to map foreign change has been left unchanged given that the userspace is using it. Nonetheless, add a comment to explain the expected value within the "mfn" field. Signed-off-by: Julien Grall <[email protected]> Reviewed-by: David Vrabel <[email protected]> Signed-off-by: David Vrabel <[email protected]>
2015-09-08hvc/xen: Further s/MFN/GFN clean-upJulien Grall1-10/+5
HVM_PARAM_CONSOLE_PFN is used to retrieved the console PFN for HVM guest. It returns a PFN (aka GFN) and not a MFN. Furthermore, use directly virt_to_gfn for both PV and HVM domain rather than doing a special case for each of the them. Signed-off-by: Julien Grall <[email protected]> Reviewed-by: David Vrabel <[email protected]> Signed-off-by: David Vrabel <[email protected]>
2015-09-08video/xen-fbfront: Further s/MFN/GFN clean-upJulien Grall1-9/+9
The PV driver xen-fbfront is only dealing with GFN and not MFN. Rename all the occurence of MFN to GFN. Also take the opportunity to replace to usage of pfn_to_gfn by xen_page_to_gfn. Signed-off-by: Julien Grall <[email protected]> Reviewed-by: David Vrabel <[email protected]> Signed-off-by: David Vrabel <[email protected]>
2015-09-08xen/tmem: Use xen_page_to_gfn rather than pfn_to_gfnJulien Grall1-16/+8
All the caller of xen_tmem_{get,put}_page have a struct page * in hand and call pfn_to_gfn for the only benefits of these 2 functions. Rather than passing the pfn in parameter, pass directly the page and use directly xen_page_to_gfn. Signed-off-by: Julien Grall <[email protected]> Reviewed-by: Stefano Stabellini <[email protected]> Signed-off-by: David Vrabel <[email protected]>
2015-09-08xen: Use correctly the Xen memory terminologiesJulien Grall20-47/+81
Based on include/xen/mm.h [1], Linux is mistakenly using MFN when GFN is meant, I suspect this is because the first support for Xen was for PV. This resulted in some misimplementation of helpers on ARM and confused developers about the expected behavior. For instance, with pfn_to_mfn, we expect to get an MFN based on the name. Although, if we look at the implementation on x86, it's returning a GFN. For clarity and avoid new confusion, replace any reference to mfn with gfn in any helpers used by PV drivers. The x86 code will still keep some reference of pfn_to_mfn which may be used by all kind of guests No changes as been made in the hypercall field, even though they may be invalid, in order to keep the same as the defintion in xen repo. Note that page_to_mfn has been renamed to xen_page_to_gfn to avoid a name to close to the KVM function gfn_to_page. Take also the opportunity to simplify simple construction such as pfn_to_mfn(page_to_pfn(page)) into xen_page_to_gfn. More complex clean up will come in follow-up patches. [1] http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=e758ed14f390342513405dd766e874934573e6cb Signed-off-by: Julien Grall <[email protected]> Reviewed-by: Stefano Stabellini <[email protected]> Acked-by: Dmitry Torokhov <[email protected]> Acked-by: Wei Liu <[email protected]> Signed-off-by: David Vrabel <[email protected]>
2015-09-08arm/xen: implement correctly pfn_to_mfnJulien Grall1-8/+0
After the commit introducing convertion between DMA and guest addresses, all the callers of pfn_to_mfn are expecting to get a GFN (Guest Frame Number). On ARM, all the guests are auto-translated so the GFN is equal to the Linux PFN (Pseudo-physical Frame Number). The current implementation may return an MFN if the caller is passing a PFN associated to a mapped foreign grant. In pratice, I haven't seen the problem on running guest but we should fix it for the sake of correctness. Correct the implementation by always returning the pfn passed in parameter. A follow-up patch will take care to rename pfn_to_mfn to a suitable name. Signed-off-by: Julien Grall <[email protected]> Reviewed-by: Stefano Stabellini <[email protected]> Signed-off-by: David Vrabel <[email protected]>
2015-09-08xen: Make clear that swiotlb and biomerge are dealing with DMA addressJulien Grall5-17/+40
The swiotlb is required when programming a DMA address on ARM when a device is not protected by an IOMMU. In this case, the DMA address should always be equal to the machine address. For DOM0 memory, Xen ensure it by have an identity mapping between the guest address and host address. However, when mapping a foreign grant reference, the 1:1 model doesn't work. For ARM guest, most of the callers of pfn_to_mfn expects to get a GFN (Guest Frame Number), i.e a PFN (Page Frame Number) from the Linux point of view given that all ARM guest are auto-translated. Even though the name pfn_to_mfn is misleading, we need to ensure that those caller get a GFN and not by mistake a MFN. In pratical, I haven't seen error related to this but we should fix it for the sake of correctness. In order to fix the implementation of pfn_to_mfn on ARM in a follow-up patch, we have to introduce new helpers to return the DMA from a PFN and the invert. On x86, the new helpers will be an alias of pfn_to_mfn and mfn_to_pfn. The helpers will be used in swiotlb and xen_biovec_phys_mergeable. This is necessary in the latter because we have to ensure that the biovec code will not try to merge a biovec using foreign page and another using Linux memory. Lastly, the helper mfn_to_local_pfn has been renamed to bfn_to_local_pfn given that the only usage was in swiotlb. Signed-off-by: Julien Grall <[email protected]> Reviewed-by: Stefano Stabellini <[email protected]> Signed-off-by: David Vrabel <[email protected]>