aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2019-07-17x86: Add "nopv" parameter to disable PV extensionsZhenzhong Duan5-0/+22
In virtualization environment, PV extensions (drivers, interrupts, timers, etc) are enabled in the majority of use cases which is the best option. However, in some cases (kexec not fully working, benchmarking) we want to disable PV extensions. We have "xen_nopv" for that purpose but only for XEN. For a consistent admin experience a common command line parameter "nopv" set across all PV guest implementations is a better choice. There are guest types which just won't work without PV extensions, like Xen PV, Xen PVH and jailhouse. add a "ignore_nopv" member to struct hypervisor_x86 set to true for those guest types and call the detect functions only if nopv is false or ignore_nopv is true. Suggested-by: Juergen Gross <[email protected]> Signed-off-by: Zhenzhong Duan <[email protected]> Reviewed-by: Juergen Gross <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Jan Kiszka <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Stefano Stabellini <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
2019-07-17x86/xen: Mark xen_hvm_need_lapic() and xen_x2apic_para_available() as __initZhenzhong Duan2-5/+4
.. as they are only called at early bootup stage. In fact, other functions in x86_hyper_xen_hvm.init.* are all marked as __init. Unexport xen_hvm_need_lapic as it's never used outside. Signed-off-by: Zhenzhong Duan <[email protected]> Reviewed-by: Juergen Gross <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
2019-07-17xen: remove tmem driverJuergen Gross8-1074/+0
The Xen tmem (transcendent memory) driver can be removed, as the related Xen hypervisor feature never made it past the "experimental" state and will be removed in future Xen versions (>= 4.13). The xen-selfballoon driver depends on tmem, so it can be removed, too. Signed-off-by: Juergen Gross <[email protected]> Acked-by: Boris Ostrovsky <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
2019-07-17Revert "x86/paravirt: Set up the virt_spin_lock_key after static keys get ↵Zhenzhong Duan2-6/+3
initialized" This reverts commit ca5d376e17072c1b60c3fee66f3be58ef018952d. Commit 8990cac6e5ea ("x86/jump_label: Initialize static branching early") adds jump_label_init() call in setup_arch() to make static keys initialized early, so we could use the original simpler code again. Signed-off-by: Zhenzhong Duan <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
2019-07-17xen/events: fix binding user event channels to cpusJuergen Gross3-4/+13
When binding an interdomain event channel to a vcpu via IOCTL_EVTCHN_BIND_INTERDOMAIN not only the event channel needs to be bound, but the affinity of the associated IRQi must be changed, too. Otherwise the IRQ and the event channel won't be moved to another vcpu in case the original vcpu they were bound to is going offline. Cc: <[email protected]> # 4.13 Fixes: c48f64ab472389df ("xen-evtchn: Bind dyn evtchn:qemu-dm interrupt to next online VCPU") Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
2019-07-16scsi: megaraid_sas: set an unlimited max_segment_sizeChristoph Hellwig1-0/+1
When using a virt_boundary_mask, as done for NVMe devices attached to megaraid_sas controllers, we require an unlimited max_segment_size as the virt boundary merging code assumes that. But we also need to propagate that to the DMA mapping layer to make dma-debug happy. The SCSI layer takes care of that when using the per-host virt_boundary setting, but given that megaraid_sas only wants to set the virt_boundary for actual NVMe devices, we can't rely on that. The DMA layer maximum segment is global to the HBA however, so we have to set it explicitly. This patch assumes that megaraid_sas does not have a segment size limitation, which seems true based on the SGL format, but will need to be verified. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2019-07-16scsi: mpt3sas: set an unlimited max_segment_size for SAS 3.0 HBAsChristoph Hellwig1-0/+1
When using a virt_boundary_mask, as done for NVMe devices attached to mpt3sas controllers, we require an unlimited max_segment_size as the virt boundary merging code assumes that. But we also need to propagate that to the DMA mapping layer to make dma-debug happy. The SCSI layer takes care of that when using the per-host virt_boundary setting, but given that mpt3sas only wants to set the virt_boundary for actual NVMe devices, we can't rely on that. The DMA layer maximum segment is global to the HBA however, so we have to set it explicitly. This patch assumes that mpt3sas does not have a segment size limitation, which seems true based on the SGL format, but will need to be verified. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Suganath Prabu <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2019-07-16scsi: IB/srp: set virt_boundary_mask in the scsi hostChristoph Hellwig1-15/+3
This ensures all proper DMA layer handling is taken care of by the SCSI midlayer. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Acked-by: Bart Van Assche <[email protected]> Reviewed-by: Max Gurtovoy <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2019-07-16scsi: IB/iser: set virt_boundary_mask in the scsi hostChristoph Hellwig1-28/+7
This ensures all proper DMA layer handling is taken care of by the SCSI midlayer. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Sagi Grimberg <[email protected]> Reviewed-by: Max Gurtovoy <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2019-07-16scsi: storvsc: set virt_boundary_mask in the scsi host templateChristoph Hellwig1-3/+2
This ensures all proper DMA layer handling is taken care of by the SCSI midlayer. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2019-07-16scsi: ufshcd: set max_segment_size in the scsi host templateChristoph Hellwig1-2/+1
We need to also mirror the value to the device to ensure IOMMU merging doesn't undo it, and the SCSI host level parameter will ensure that. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2019-07-16scsi: core: take the DMA max mapping size into accountChristoph Hellwig1-0/+2
We need to limit the device's max_sectors to what the DMA mapping implementation can support. If not, we risk running out of swiotlb buffers easily. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2019-07-16scsi: core: add a host / host template field for the virt boundaryChristoph Hellwig3-1/+8
This allows drivers setting it up easily instead of branching out to block layer calls in slave_alloc, and ensures the upgraded max_segment_size setting gets picked up by the DMA layer. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Kashyap Desai < [email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2019-07-16switch the remnants of releasing the mountpoint away from fs_pinAl Viro4-29/+26
We used to need rather convoluted ordering trickery to guarantee that dput() of ex-mountpoints happens before the final mntput() of the same. Since we don't need that anymore, there's no point playing with fs_pin for that. Signed-off-by: Al Viro <[email protected]>
2019-07-16get rid of detach_mnt()Al Viro1-34/+28
Lift getting the original mount (dentry is actually not needed at all) of the mountpoint into the callers - to do_move_mount() and pivot_root() level. That simplifies the cleanup in those and allows to get saner arguments for attach_mnt_recursive(). Signed-off-by: Al Viro <[email protected]>
2019-07-16virtio_pmem: fix sparse warningPankaj Gupta2-4/+4
This patch fixes below sparse warning related to __virtio type in virtio pmem driver. This is reported by Intel test bot on linux-next tree. nd_virtio.c:56:28: warning: incorrect type in assignment (different base types) nd_virtio.c:56:28: expected unsigned int [unsigned] [usertype] type nd_virtio.c:56:28: got restricted __virtio32 nd_virtio.c:93:59: warning: incorrect type in argument 2 (different base types) nd_virtio.c:93:59: expected restricted __virtio32 [usertype] val nd_virtio.c:93:59: got unsigned int [unsigned] [usertype] ret Reported-by: kbuild test robot <[email protected]> Signed-off-by: Pankaj Gupta <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Dan Williams <[email protected]>
2019-07-16make struct mountpoint bear the dentry reference to mountpoint, not struct mountAl Viro2-28/+25
Using dput_to_list() to shift the contributing reference from ->mnt_mountpoint to ->mnt_mp->m_dentry. Dentries are dropped (with dput_to_list()) as soon as struct mountpoint is destroyed; in cases where we are under namespace_sem we use the global list, shrinking it in namespace_unlock(). In case of detaching stuck MNT_LOCKed children at final mntput_no_expire() we use a local list and shrink it ourselves. ->mnt_ex_mountpoint crap is gone. Signed-off-by: Al Viro <[email protected]>
2019-07-16scsi: core: Fix race on creating sense cacheMing Lei1-3/+3
When scsi_init_sense_cache(host) is called concurrently from different hosts, each code path may find that no cache has been created and allocate a new one. The lack of locking can lead to potentially overriding a cache allocated by a different host. Fix the issue by moving 'mutex_lock(&scsi_sense_cache_mutex)' before scsi_select_sense_cache(). Fixes: 0a6ac4ee7c21 ("scsi: respect unchecked_isa_dma for blk-mq") Cc: Stable <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Ewan D. Milne <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2019-07-16scsi: sd_zbc: Fix compilation warningDamien Le Moal1-1/+1
kbuild test robot gets the following compilation warning using gcc 7.4 cross compilation for c6x (GCC_VERSION=7.4.0 make.cross ARCH=c6x). In file included from include/asm-generic/bug.h:18:0, from arch/c6x/include/asm/bug.h:12, from include/linux/bug.h:5, from include/linux/thread_info.h:12, from include/asm-generic/current.h:5, from ./arch/c6x/include/generated/asm/current.h:1, from include/linux/sched.h:12, from include/linux/blkdev.h:5, from drivers//scsi/sd_zbc.c:11: drivers//scsi/sd_zbc.c: In function 'sd_zbc_read_zones': >> include/linux/kernel.h:62:48: warning: 'zone_blocks' may be used uninitialized in this function [-Wmaybe-uninitialized] #define __round_mask(x, y) ((__typeof__(x))((y)-1)) ^ drivers//scsi/sd_zbc.c:464:6: note: 'zone_blocks' was declared here u32 zone_blocks; ^~~~~~~~~~~ This is a false-positive report. The variable zone_blocks is always initialized in sd_zbc_check_zones() before use. It is not initialized only and only if sd_zbc_check_zones() fails. Avoid this warning by initializing the zone_blocks variable to 0. Fixes: 5f832a395859 ("scsi: sd_zbc: Fix sd_zbc_check_zones() error checks") Cc: Stable <[email protected]> Signed-off-by: Damien Le Moal <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2019-07-16scsi: libfc: fix null pointer dereference on a null lportColin Ian King1-1/+1
Currently if lport is null then the null lport pointer is dereference when printing out debug via the FC_LPORT_DB macro. Fix this by using the more generic FC_LIBFC_DBG debug macro instead that does not use lport. Addresses-Coverity: ("Dereference after null check") Fixes: 7414705ea4ae ("libfc: Add runtime debugging with debug_logging module parameter") Signed-off-by: Colin Ian King <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2019-07-16dax: Fix missed wakeup with PMD faultsMatthew Wilcox (Oracle)1-20/+33
RocksDB can hang indefinitely when using a DAX file. This is due to a bug in the XArray conversion when handling a PMD fault and finding a PTE entry. We use the wrong index in the hash and end up waiting on the wrong waitqueue. There's actually no need to wait; if we find a PTE entry while looking for a PMD entry, we can return immediately as we know we should fall back to a PTE fault (which may not conflict with the lock held). We reuse the XA_RETRY_ENTRY to signal a conflicting entry was found. This value can never be found in an XArray while holding its lock, so it does not create an ambiguity. Cc: <[email protected]> Link: http://lkml.kernel.org/r/CAPcyv4hwHpX-MkUEqxwdTj7wCCZCN4RV-L4jsnuwLGyL_UEG4A@mail.gmail.com Fixes: b15cd800682f ("dax: Convert page fault handlers to XArray") Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Tested-by: Dan Williams <[email protected]> Reported-by: Robert Barror <[email protected]> Reported-by: Seema Pandit <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Dan Williams <[email protected]>
2019-07-16fs/select.c: use struct_size() in kmalloc()Gustavo A. R. Silva1-3/+3
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; struct boo entry[]; }; size = sizeof(struct foo) + count * sizeof(struct boo); instance = kmalloc(size, GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL); Also, notice that variable size is unnecessary, hence it is removed. This code was detected with the help of Coccinelle. Link: http://lkml.kernel.org/r/20190604164226.GA13823@embeddedor Signed-off-by: Gustavo A. R. Silva <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-16mm: add account_locked_vm utility functionDaniel Jordan7-190/+98
locked_vm accounting is done roughly the same way in five places, so unify them in a helper. Include the helper's caller in the debug print to distinguish between callsites. Error codes stay the same, so user-visible behavior does too. The one exception is that the -EPERM case in tce_account_locked_vm is removed because Alexey has never seen it triggered. [[email protected]: v3] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: fix mm/util.c] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Daniel Jordan <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]> Tested-by: Alexey Kardashevskiy <[email protected]> Acked-by: Alex Williamson <[email protected]> Cc: Alan Tull <[email protected]> Cc: Alex Williamson <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Moritz Fischer <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Steve Sistare <[email protected]> Cc: Wu Hao <[email protected]> Cc: Ira Weiny <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-16arm64: mm: implement pte_devmap supportRobin Murphy3-0/+23
In order for things like get_user_pages() to work on ZONE_DEVICE memory, we need a software PTE bit to identify device-backed PFNs. Hook this up along with the relevant helpers to join in with ARCH_HAS_PTE_DEVMAP. [[email protected]: build fixes] Link: http://lkml.kernel.org/r/13026c4e64abc17133bbfa07d7731ec6691c0bcd.1559050949.git.robin.murphy@arm.com Link: http://lkml.kernel.org/r/817d92886fc3b33bcbf6e105ee83a74babb3a5aa.1558547956.git.robin.murphy@arm.com Signed-off-by: Robin Murphy <[email protected]> Acked-by: Will Deacon <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dan Williams <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oliver O'Halloran <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-16mm: introduce ARCH_HAS_PTE_DEVMAPRobin Murphy9-14/+11
ARCH_HAS_ZONE_DEVICE is somewhat meaningless in itself, and combined with the long-out-of-date comment can lead to the impression than an architecture may just enable it (since __add_pages() now "comprehends device memory" for itself) and expect things to work. In practice, however, ZONE_DEVICE users have little chance of functioning correctly without __HAVE_ARCH_PTE_DEVMAP, so let's clean that up the same way as ARCH_HAS_PTE_SPECIAL and make it the proper dependency so the real situation is clearer. Link: http://lkml.kernel.org/r/87554aa78478a02a63f2c4cf60a847279ae3eb3b.1558547956.git.robin.murphy@arm.com Signed-off-by: Robin Murphy <[email protected]> Acked-by: Dan Williams <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Acked-by: Oliver O'Halloran <[email protected]> Reviewed-by: Anshuman Khandual <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-16mm: clean up is_device_*_page() definitionsRobin Murphy1-22/+9
Refactor is_device_{public,private}_page() with is_pci_p2pdma_page() to make them all consistent in depending on their respective config options even when CONFIG_DEV_PAGEMAP_OPS is enabled for other reasons. This allows a little more compile-time optimisation as well as the conceptual and cosmetic cleanup. Link: http://lkml.kernel.org/r/187c2ab27dea70635d375a61b2f2076d26c032b0.1558547956.git.robin.murphy@arm.com Signed-off-by: Robin Murphy <[email protected]> Suggested-by: Jerome Glisse <[email protected]> Reviewed-by: Anshuman Khandual <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dan Williams <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oliver O'Halloran <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-16mm/mmap: move common defines to mman-common.hAneesh Kumar K.V4-17/+10
Two architecture that use arch specific MMAP flags are powerpc and sparc. We still have few flag values common across them and other architectures. Consolidate this in mman-common.h. Also update the comment to indicate where to find HugeTLB specific reserved values Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-16mm: move MAP_SYNC to asm-generic/mman-common.hAneesh Kumar K.V2-2/+2
This enables support for synchronous DAX fault on powerpc The generic changes are added as part of b6fb293f2497 ("mm: Define MAP_SYNC and VM_SYNC flags") Without this, mmap returns EOPNOTSUPP for MAP_SYNC with MAP_SHARED_VALIDATE Instead of adding MAP_SYNC with same value to arch/powerpc/include/uapi/asm/mman.h, I am moving the #define to asm-generic/mman-common.h. Two architectures using mman-common.h directly are sparc and powerpc. We should be able to consloidate more #defines to mman-common.h. That can be done as a separate patch. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Dan Williams <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-16device-dax: "Hotremove" persistent memory that is used like normal RAMPavel Tatashin2-4/+39
It is now allowed to use persistent memory like a regular RAM, but currently there is no way to remove this memory until machine is rebooted. This work expands the functionality to also allows hotremoving previously hotplugged persistent memory, and recover the device for use for other purposes. To hotremove persistent memory, the management software must first offline all memory blocks of dax region, and than unbind it from device-dax/kmem driver. So, operations should look like this: echo offline > /sys/devices/system/memory/memoryN/state ... echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind Note: if unbind is done without offlining memory beforehand, it won't be possible to do dax0.0 hotremove, and dax's memory is going to be part of System RAM until reboot. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Pavel Tatashin <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: James Morris <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Dan Williams <[email protected]> Cc: Keith Busch <[email protected]> Cc: Vishal Verma <[email protected]> Cc: Dave Jiang <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Tom Lendacky <[email protected]> Cc: Huang Ying <[email protected]> Cc: Fengguang Wu <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Bjorn Helgaas <[email protected]> Cc: Yaowei Bai <[email protected]> Cc: Takashi Iwai <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: Dave Hansen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-16mm/hotplug: make remove_memory() interface usablePavel Tatashin2-23/+49
Presently the remove_memory() interface is inherently broken. It tries to remove memory but panics if some memory is not offline. The problem is that it is impossible to ensure that all memory blocks are offline as this function also takes lock_device_hotplug that is required to change memory state via sysfs. So, between calling this function and offlining all memory blocks there is always a window when lock_device_hotplug is released, and therefore, there is always a chance for a panic during this window. Make this interface to return an error if memory removal fails. This way it is safe to call this function without panicking machine, and also makes it symmetric to add_memory() which already returns an error. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Pavel Tatashin <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Bjorn Helgaas <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Dave Jiang <[email protected]> Cc: Fengguang Wu <[email protected]> Cc: Huang Ying <[email protected]> Cc: James Morris <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: Keith Busch <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Takashi Iwai <[email protected]> Cc: Tom Lendacky <[email protected]> Cc: Vishal Verma <[email protected]> Cc: Yaowei Bai <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-16device-dax: fix memory and resource leak if hotplug failsPavel Tatashin1-1/+4
Patch series ""Hotremove" persistent memory", v6. Recently, adding a persistent memory to be used like a regular RAM was added to Linux. This work extends this functionality to also allow hot removing persistent memory. We (Microsoft) have an important use case for this functionality. The requirement is for physical machines with small amount of RAM (~8G) to be able to reboot in a very short period of time (<1s). Yet, there is a userland state that is expensive to recreate (~2G). The solution is to boot machines with 2G preserved for persistent memory. Copy the state, and hotadd the persistent memory so machine still has all 8G available for runtime. Before reboot, offline and hotremove device-dax 2G, copy the memory that is needed to be preserved to pmem0 device, and reboot. The series of operations look like this: 1. After boot restore /dev/pmem0 to ramdisk to be consumed by apps. and free ramdisk. 2. Convert raw pmem0 to devdax ndctl create-namespace --mode devdax --map mem -e namespace0.0 -f 3. Hotadd to System RAM echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id echo online_movable > /sys/devices/system/memoryXXX/state 4. Before reboot hotremove device-dax memory from System RAM echo offline > /sys/devices/system/memoryXXX/state echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind 5. Create raw pmem0 device ndctl create-namespace --mode raw -e namespace0.0 -f 6. Copy the state that was stored by apps to ramdisk to pmem device 7. Do kexec reboot or reboot through firmware if firmware does not zero memory in pmem0 region (These machines have only regular volatile memory). So to have pmem0 device either memmap kernel parameter is used, or devices nodes in dtb are specified. This patch (of 3): When add_memory() fails, the resource and the memory should be freed. Link: http://lkml.kernel.org/r/[email protected] Fixes: c221c0b0308f ("device-dax: "Hotplug" persistent memory for use like normal RAM") Signed-off-by: Pavel Tatashin <[email protected]> Reviewed-by: Dave Hansen <[email protected]> Cc: Bjorn Helgaas <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Dave Jiang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Fengguang Wu <[email protected]> Cc: Huang Ying <[email protected]> Cc: James Morris <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: Keith Busch <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Takashi Iwai <[email protected]> Cc: Tom Lendacky <[email protected]> Cc: Vishal Verma <[email protected]> Cc: Yaowei Bai <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-16include/linux/lz4.h: fix spelling and copy-paste errors in documentationTom Levy1-9/+9
Fix a few spelling and grammar errors, and two places where fast/safe in the documentation did not match the function. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Tom Levy <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Jiri Kosina <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-16ipc/mqueue.c: only perform resource calculation if user validKees Cook1-9/+10
Andreas Christoforou reported: UBSAN: Undefined behaviour in ipc/mqueue.c:414:49 signed integer overflow: 9 * 2305843009213693951 cannot be represented in type 'long int' ... Call Trace: mqueue_evict_inode+0x8e7/0xa10 ipc/mqueue.c:414 evict+0x472/0x8c0 fs/inode.c:558 iput_final fs/inode.c:1547 [inline] iput+0x51d/0x8c0 fs/inode.c:1573 mqueue_get_inode+0x8eb/0x1070 ipc/mqueue.c:320 mqueue_create_attr+0x198/0x440 ipc/mqueue.c:459 vfs_mkobj+0x39e/0x580 fs/namei.c:2892 prepare_open ipc/mqueue.c:731 [inline] do_mq_open+0x6da/0x8e0 ipc/mqueue.c:771 Which could be triggered by: struct mq_attr attr = { .mq_flags = 0, .mq_maxmsg = 9, .mq_msgsize = 0x1fffffffffffffff, .mq_curmsgs = 0, }; if (mq_open("/testing", 0x40, 3, &attr) == (mqd_t) -1) perror("mq_open"); mqueue_get_inode() was correctly rejecting the giant mq_msgsize, and preparing to return -EINVAL. During the cleanup, it calls mqueue_evict_inode() which performed resource usage tracking math for updating "user", before checking if there was a valid "user" at all (which would indicate that the calculations would be sane). Instead, delay this check to after seeing a valid "user". The overflow was real, but the results went unused, so while the flaw is harmless, it's noisy for kernel fuzzers, so just fix it by moving the calculation under the non-NULL "user" where it actually gets used. Link: http://lkml.kernel.org/r/201906072207.ECB65450@keescook Signed-off-by: Kees Cook <[email protected]> Reported-by: Andreas Christoforou <[email protected]> Acked-by: "Eric W. Biederman" <[email protected]> Cc: Al Viro <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Manfred Spraul <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-16include/asm-generic/bug.h: fix "cut here" for WARN_ON for __WARN_TAINT ↵Drew Davenport1-2/+4
architectures For architectures using __WARN_TAINT, the WARN_ON macro did not print out the "cut here" string. The other WARN_XXX macros would print "cut here" inside __warn_printk, which is not called for WARN_ON since it doesn't have a message to print. Link: http://lkml.kernel.org/r/[email protected] Fixes: a7bed27af194 ("bug: fix "cut here" location for __WARN_TAINT architectures") Signed-off-by: Drew Davenport <[email protected]> Acked-by: Kees Cook <[email protected]> Tested-by: Kees Cook <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-16scripts/gdb: add helpers to find and list devicesLeonard Crestez2-0/+183
Add helper commands and functions for finding pointers to struct device by enumerating linux device bus/class infrastructure. This can be used to fetch subsystem and driver-specific structs: (gdb) p *$container_of($lx_device_find_by_class_name("net", "eth0"), "struct net_device", "dev") (gdb) p *$container_of($lx_device_find_by_bus_name("i2c", "0-004b"), "struct i2c_client", "dev") (gdb) p *(struct imx_port*)$lx_device_find_by_class_name("tty", "ttymxc1")->parent->driver_data Several generic "lx-device-list" functions are included to enumerate devices by bus and class: (gdb) lx-device-list-bus usb (gdb) lx-device-list-class (gdb) lx-device-list-tree &platform_bus Similar information is available in /sys but pointer values are deliberately hidden. Link: http://lkml.kernel.org/r/c948628041311cbf1b9b4cff3dda7d2073cb3eaa.1561492937.git.leonard.crestez@nxp.com Signed-off-by: Leonard Crestez <[email protected]> Reviewed-by: Stephen Boyd <[email protected]> Cc: Kieran Bingham <[email protected]> Cc: Jan Kiszka <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-16scripts/gdb: add lx-genpd-summary commandLeonard Crestez2-0/+84
This is like /sys/kernel/debug/pm/pm_genpd_summary except it's accessible through a debugger. This can be useful if the target crashes or hangs because power domains were not properly enabled. Link: http://lkml.kernel.org/r/f9ee627a0d4f94b894aa202fee8a98444049bed8.1561492937.git.leonard.crestez@nxp.com Signed-off-by: Leonard Crestez <[email protected]> Reviewed-by: Stephen Boyd <[email protected]> Cc: Kieran Bingham <[email protected]> Cc: Jan Kiszka <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-16drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctlMiroslav Lichvar1-0/+8
The PPS assert/clear offset corrections are set by the PPS_SETPARAMS ioctl in the pps_ktime structs, which also contain flags. The flags are not initialized by applications (using the timepps.h header) and they are not used by the kernel for anything except returning them back in the PPS_GETPARAMS ioctl. Set the flags to zero to make it clear they are unused and avoid leaking uninitialized data of the PPS_SETPARAMS caller to other applications that have a read access to the PPS device. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Miroslav Lichvar <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Acked-by: Rodolfo Giometti <[email protected]> Cc: Greg KH <[email protected]> Cc: Dan Carpenter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-16kernel/pid.c: convert struct pid count to refcount_tJoel Fernandes (Google)2-7/+7
struct pid's count is an atomic_t field used as a refcount. Use refcount_t for it which is basically atomic_t but does additional checking to prevent use-after-free bugs. For memory ordering, the only change is with the following: - if ((atomic_read(&pid->count) == 1) || - atomic_dec_and_test(&pid->count)) { + if (refcount_dec_and_test(&pid->count)) { kmem_cache_free(ns->pid_cachep, pid); Here the change is from: Fully ordered --> RELEASE + ACQUIRE (as per refcount-vs-atomic.rst) This ACQUIRE should take care of making sure the free happens after the refcount_dec_and_test(). The above hunk also removes atomic_read() since it is not needed for the code to work and it is unclear how beneficial it is. The removal lets refcount_dec_and_test() check for cases where get_pid() happened before the object was freed. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Joel Fernandes (Google) <[email protected]> Reviewed-by: Andrea Parri <[email protected]> Reviewed-by: Kees Cook <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Elena Reshetova <[email protected]> Cc: Jann Horn <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: KJ Tsanaktsidis <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-16drivers/rapidio/devices/rio_mport_cdev.c: NUL terminate some stringsDan Carpenter1-0/+2
The dev_info.name[] array has space for RIO_MAX_DEVNAME_SZ + 1 characters. But the problem here is that we don't ensure that the user put a NUL terminator on the end of the string. It could lead to an out of bounds read. Link: http://lkml.kernel.org/r/20190529110601.GB19119@mwanda Fixes: e8de370188d0 ("rapidio: add mport char device driver") Signed-off-by: Dan Carpenter <[email protected]> Acked-by: Alexandre Bounine <[email protected]> Cc: Ira Weiny <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-16select: shift restore_saved_sigmask_unless() into poll_select_copy_remaining()Oleg Nesterov1-33/+13
Now that restore_saved_sigmask_unless() is always called with the same argument right before poll_select_copy_remaining() we can move it into poll_select_copy_remaining() and make it the only caller of restore() in fs/select.c. The patch also renames poll_select_copy_remaining(), poll_select_finish() looks better after this change. kern_select() doesn't use set_user_sigmask(), so in this case poll_select_finish() does restore_saved_sigmask_unless() "for no reason". But this won't hurt, and WARN_ON(!TIF_SIGPENDING) is still valid. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Cc: Al Viro <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: David Laight <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Deepa Dinamani <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Eric Wong <[email protected]> Cc: Jason Baron <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-16select: change do_poll() to return -ERESTARTNOHAND rather than -EINTROleg Nesterov1-23/+7
do_poll() returns -EINTR if interrupted and after that all its callers have to translate it into -ERESTARTNOHAND. Change do_poll() to return -ERESTARTNOHAND and update (simplify) the callers. Note that this also unifies all users of restore_saved_sigmask_unless(), see the next patch. Linus: : The *right* return value will actually be then chosen by : poll_select_copy_remaining(), which will turn ERESTARTNOHAND to EINTR : when it can't update the timeout. : : Except for the cases that use restart_block and do that instead and : don't have the whole timeout restart issue as a result. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Linus Torvalds <[email protected]> Cc: Al Viro <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: David Laight <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Deepa Dinamani <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Eric Wong <[email protected]> Cc: Jason Baron <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-16signal: simplify set_user_sigmask/restore_user_sigmaskOleg Nesterov8-108/+57
task->saved_sigmask and ->restore_sigmask are only used in the ret-from- syscall paths. This means that set_user_sigmask() can save ->blocked in ->saved_sigmask and do set_restore_sigmask() to indicate that ->blocked was modified. This way the callers do not need 2 sigset_t's passed to set/restore and restore_user_sigmask() renamed to restore_saved_sigmask_unless() turns into the trivial helper which just calls restore_saved_sigmask(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Cc: Deepa Dinamani <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Eric Wong <[email protected]> Cc: Jason Baron <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Al Viro <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: David Laight <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-16signal: reorder struct sighand_structAlexey Dobriyan1-2/+2
struct sighand_struct::siglock field is the most used field by far, put it first so that is can be accessed without IMM8 or IMM32 encoding on x86_64. Space savings (on trimmed down VM test config): add/remove: 0/0 grow/shrink: 8/68 up/down: 49/-1147 (-1098) Function old new delta complete_signal 512 533 +21 do_signalfd4 335 346 +11 __cleanup_sighand 39 43 +4 unhandled_signal 49 52 +3 prepare_signal 692 695 +3 ignore_signals 37 40 +3 __tty_check_change.part 248 251 +3 ksys_unshare 780 781 +1 sighand_ctor 33 29 -4 ptrace_trap_notify 60 56 -4 sigqueue_free 98 91 -7 run_posix_cpu_timers 1389 1382 -7 proc_pid_status 2448 2441 -7 proc_pid_limits 344 337 -7 posix_cpu_timer_rearm 222 215 -7 posix_cpu_timer_get 249 242 -7 kill_pid_info_as_cred 243 236 -7 freeze_task 197 190 -7 flush_old_exec 1873 1866 -7 do_task_stat 3363 3356 -7 do_send_sig_info 98 91 -7 do_group_exit 147 140 -7 init_sighand 2088 2080 -8 do_notify_parent_cldstop 399 391 -8 signalfd_cleanup 50 41 -9 do_notify_parent 557 545 -12 __send_signal 1029 1017 -12 ptrace_stop 590 577 -13 get_signal 1576 1563 -13 __lock_task_sighand 112 99 -13 zap_pid_ns_processes 391 377 -14 update_rlimit_cpu 78 64 -14 tty_signal_session_leader 413 399 -14 tty_open_proc_set_tty 149 135 -14 tty_jobctrl_ioctl 936 922 -14 set_cpu_itimer 339 325 -14 ptrace_resume 226 212 -14 ptrace_notify 110 96 -14 proc_clear_tty 81 67 -14 posix_cpu_timer_del 229 215 -14 kernel_sigaction 156 142 -14 getrusage 977 963 -14 get_current_tty 98 84 -14 force_sigsegv 89 75 -14 force_sig_info 205 191 -14 flush_signals 83 69 -14 flush_itimer_signals 85 71 -14 do_timer_create 1120 1106 -14 do_sigpending 88 74 -14 do_signal_stop 537 523 -14 cgroup_init_fs_context 644 630 -14 call_usermodehelper_exec_async 402 388 -14 calculate_sigpending 58 44 -14 __x64_sys_timer_delete 248 234 -14 __set_current_blocked 80 66 -14 __ptrace_unlink 310 296 -14 __ptrace_detach.part 187 173 -14 send_sigqueue 362 347 -15 get_cpu_itimer 214 199 -15 signalfd_poll 175 159 -16 dequeue_signal 340 323 -17 do_getitimer 192 174 -18 release_task.part 1060 1040 -20 ptrace_peek_siginfo 408 387 -21 posix_cpu_timer_set 827 806 -21 exit_signals 437 416 -21 do_sigaction 541 520 -21 do_setitimer 485 464 -21 disassociate_ctty.part 545 517 -28 __x64_sys_rt_sigtimedwait 721 679 -42 __x64_sys_ptrace 1319 1277 -42 ptrace_request 1828 1782 -46 signalfd_read 507 459 -48 wait_consider_task 2027 1971 -56 do_coredump 3672 3616 -56 copy_process.part 6936 6871 -65 Link: http://lkml.kernel.org/r/20190503192800.GA18004@avx2 Signed-off-by: Alexey Dobriyan <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oleg Nesterov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-16selftests/ptrace: add a test case for PTRACE_GET_SYSCALL_INFODmitry V. Levin3-1/+273
Check whether PTRACE_GET_SYSCALL_INFO semantics implemented in the kernel matches userspace expectations. [[email protected]: coding-style fixes] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Dmitry V. Levin <[email protected]> Acked-by: Shuah Khan <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Elvira Khabirova <[email protected]> Cc: Eugene Syromyatnikov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Helge Deller <[email protected]> [parisc] Cc: James E.J. Bottomley <[email protected]> Cc: James Hogan <[email protected]> Cc: kbuild test robot <[email protected]> Cc: Kees Cook <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-16ptrace: add PTRACE_GET_SYSCALL_INFO requestElvira Khabirova4-8/+150
PTRACE_GET_SYSCALL_INFO is a generic ptrace API that lets ptracer obtain details of the syscall the tracee is blocked in. There are two reasons for a special syscall-related ptrace request. Firstly, with the current ptrace API there are cases when ptracer cannot retrieve necessary information about syscalls. Some examples include: * The notorious int-0x80-from-64-bit-task issue. See [1] for details. In short, if a 64-bit task performs a syscall through int 0x80, its tracer has no reliable means to find out that the syscall was, in fact, a compat syscall, and misidentifies it. * Syscall-enter-stop and syscall-exit-stop look the same for the tracer. Common practice is to keep track of the sequence of ptrace-stops in order not to mix the two syscall-stops up. But it is not as simple as it looks; for example, strace had a (just recently fixed) long-standing bug where attaching strace to a tracee that is performing the execve system call led to the tracer identifying the following syscall-exit-stop as syscall-enter-stop, which messed up all the state tracking. * Since the introduction of commit 84d77d3f06e7 ("ptrace: Don't allow accessing an undumpable mm"), both PTRACE_PEEKDATA and process_vm_readv become unavailable when the process dumpable flag is cleared. On such architectures as ia64 this results in all syscall arguments being unavailable for the tracer. Secondly, ptracers also have to support a lot of arch-specific code for obtaining information about the tracee. For some architectures, this requires a ptrace(PTRACE_PEEKUSER, ...) invocation for every syscall argument and return value. ptrace(2) man page: long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data); ... PTRACE_GET_SYSCALL_INFO Retrieve information about the syscall that caused the stop. The information is placed into the buffer pointed by "data" argument, which should be a pointer to a buffer of type "struct ptrace_syscall_info". The "addr" argument contains the size of the buffer pointed to by "data" argument (i.e., sizeof(struct ptrace_syscall_info)). The return value contains the number of bytes available to be written by the kernel. If the size of data to be written by the kernel exceeds the size specified by "addr" argument, the output is truncated. [[email protected]: selftests/seccomp/seccomp_bpf: update for PTRACE_GET_SYSCALL_INFO] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Elvira Khabirova <[email protected]> Co-developed-by: Dmitry V. Levin <[email protected]> Signed-off-by: Dmitry V. Levin <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Andy Lutomirski <[email protected]> Cc: Eugene Syromyatnikov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Helge Deller <[email protected]> [parisc] Cc: James E.J. Bottomley <[email protected]> Cc: James Hogan <[email protected]> Cc: kbuild test robot <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-16powerpc: define syscall_get_error()Dmitry V. Levin1-0/+10
syscall_get_error() is required to be implemented on this architecture in addition to already implemented syscall_get_nr(), syscall_get_arguments(), syscall_get_return_value(), and syscall_get_arch() functions in order to extend the generic ptrace API with PTRACE_GET_SYSCALL_INFO request. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Dmitry V. Levin <[email protected]> Acked-by: Michael Ellerman <[email protected]> Cc: Elvira Khabirova <[email protected]> Cc: Eugene Syromyatnikov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Helge Deller <[email protected]> [parisc] Cc: James E.J. Bottomley <[email protected]> Cc: James Hogan <[email protected]> Cc: kbuild test robot <[email protected]> Cc: Kees Cook <[email protected]> Cc: Paul Burton <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-16parisc: define syscall_get_error()Dmitry V. Levin1-0/+7
syscall_get_error() is required to be implemented on all architectures in addition to already implemented syscall_get_nr(), syscall_get_arguments(), syscall_get_return_value(), and syscall_get_arch() functions in order to extend the generic ptrace API with PTRACE_GET_SYSCALL_INFO request. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Dmitry V. Levin <[email protected]> Acked-by: Helge Deller <[email protected]> [parisc] Cc: James E.J. Bottomley <[email protected]> Cc: Elvira Khabirova <[email protected]> Cc: Eugene Syromyatnikov <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Greentime Hu <[email protected]> Cc: James Hogan <[email protected]> Cc: kbuild test robot <[email protected]> Cc: Kees Cook <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-16mips: define syscall_get_error()Dmitry V. Levin1-0/+6
syscall_get_error() is required to be implemented on all architectures in addition to already implemented syscall_get_nr(), syscall_get_arguments(), syscall_get_return_value(), and syscall_get_arch() functions in order to extend the generic ptrace API with PTRACE_GET_SYSCALL_INFO request. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Dmitry V. Levin <[email protected]> Acked-by: Paul Burton <[email protected]> Cc: Elvira Khabirova <[email protected]> Cc: Eugene Syromyatnikov <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: James Hogan <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Helge Deller <[email protected]> [parisc] Cc: James E.J. Bottomley <[email protected]> Cc: kbuild test robot <[email protected]> Cc: Kees Cook <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-16hexagon: define syscall_get_error() and syscall_get_return_value()Dmitry V. Levin1-0/+14
syscall_get_* functions are required to be implemented on all architectures in order to extend the generic ptrace API with PTRACE_GET_SYSCALL_INFO request. This adds remaining 2 syscall_get_* functions as documented in asm-generic/syscall.h: syscall_get_error and syscall_get_return_value. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Dmitry V. Levin <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Elvira Khabirova <[email protected]> Cc: Eugene Syromyatnikov <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Helge Deller <[email protected]> [parisc] Cc: James E.J. Bottomley <[email protected]> Cc: James Hogan <[email protected]> Cc: kbuild test robot <[email protected]> Cc: Kees Cook <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-16nds32: fix asm/syscall.hDmitry V. Levin1-10/+17
PTRACE_GET_SYSCALL_INFO is a generic ptrace API that lets ptracer obtain details of the syscall the tracee is blocked in. There are two reasons for a special syscall-related ptrace request. Firstly, with the current ptrace API there are cases when ptracer cannot retrieve necessary information about syscalls. Some examples include: * The notorious int-0x80-from-64-bit-task issue. See [1] for details. In short, if a 64-bit task performs a syscall through int 0x80, its tracer has no reliable means to find out that the syscall was, in fact, a compat syscall, and misidentifies it. * Syscall-enter-stop and syscall-exit-stop look the same for the tracer. Common practice is to keep track of the sequence of ptrace-stops in order not to mix the two syscall-stops up. But it is not as simple as it looks; for example, strace had a (just recently fixed) long-standing bug where attaching strace to a tracee that is performing the execve system call led to the tracer identifying the following syscall-exit-stop as syscall-enter-stop, which messed up all the state tracking. * Since the introduction of commit 84d77d3f06e7 ("ptrace: Don't allow accessing an undumpable mm"), both PTRACE_PEEKDATA and process_vm_readv become unavailable when the process dumpable flag is cleared. On such architectures as ia64 this results in all syscall arguments being unavailable for the tracer. Secondly, ptracers also have to support a lot of arch-specific code for obtaining information about the tracee. For some architectures, this requires a ptrace(PTRACE_PEEKUSER, ...) invocation for every syscall argument and return value. PTRACE_GET_SYSCALL_INFO returns the following structure: struct ptrace_syscall_info { __u8 op; /* PTRACE_SYSCALL_INFO_* */ __u32 arch __attribute__((__aligned__(sizeof(__u32)))); __u64 instruction_pointer; __u64 stack_pointer; union { struct { __u64 nr; __u64 args[6]; } entry; struct { __s64 rval; __u8 is_error; } exit; struct { __u64 nr; __u64 args[6]; __u32 ret_data; } seccomp; }; }; The structure was chosen according to [2], except for the following changes: * seccomp substructure was added as a superset of entry substructure * the type of nr field was changed from int to __u64 because syscall numbers are, as a practical matter, 64 bits * stack_pointer field was added along with instruction_pointer field since it is readily available and can save the tracer from extra PTRACE_GETREGS/PTRACE_GETREGSET calls * arch is always initialized to aid with tracing system calls such as execve() * instruction_pointer and stack_pointer are always initialized so they could be easily obtained for non-syscall stops * a boolean is_error field was added along with rval field, this way the tracer can more reliably distinguish a return value from an error value strace has been ported to PTRACE_GET_SYSCALL_INFO. Starting with release 4.26, strace uses PTRACE_GET_SYSCALL_INFO API as the preferred mechanism of obtaining syscall information. [1] https://lore.kernel.org/lkml/CA+55aFzcSVmdDj9Lh_gdbz1OzHyEm6ZrGPBDAJnywm2LF_eVyg@mail.gmail.com/ [2] https://lore.kernel.org/lkml/CAObL_7GM0n80N7J_DFw_eQyfLyzq+sf4y2AvsCCV88Tb3AwEHA@mail.gmail.com/ This patch (of 7): All syscall_get_*() and syscall_set_*() functions must be defined as static inline as on all other architectures, otherwise asm/syscall.h cannot be included in more than one compilation unit. This bug has to be fixed in order to extend the generic ptrace API with PTRACE_GET_SYSCALL_INFO request. Link: http://lkml.kernel.org/r/[email protected] Fixes: 1932fbe36e02 ("nds32: System calls handling") Signed-off-by: Dmitry V. Levin <[email protected]> Reported-by: kbuild test robot <[email protected]> Acked-by: Greentime Hu <[email protected]> Cc: Vincent Chen <[email protected]> Cc: Elvira Khabirova <[email protected]> Cc: Eugene Syromyatnikov <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Helge Deller <[email protected]> [parisc] Cc: James E.J. Bottomley <[email protected]> Cc: James Hogan <[email protected]> Cc: Kees Cook <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>