aboutsummaryrefslogtreecommitdiff
path: root/arch/um
AgeCommit message (Collapse)AuthorFilesLines
2016-06-22um: track 'parent' device in a local variableDan Williams1-2/+3
In preparation for the removal of 'driverfs_dev' from 'struct gendisk' use a local variable to track the parented vs un-parented case in ubd_disk_register(). Cc: Jeff Dike <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Bart Van Assche <[email protected]> Signed-off-by: Dan Williams <[email protected]>
2016-06-14um/ptrace: run seccomp after ptraceKees Cook1-5/+4
Close the hole where ptrace can change a syscall out from under seccomp. Signed-off-by: Kees Cook <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: [email protected]
2016-06-14seccomp: Add a seccomp_data parameter secure_computing()Andy Lutomirski1-1/+1
Currently, if arch code wants to supply seccomp_data directly to seccomp (which is generally much faster than having seccomp do it using the syscall_get_xyz() API), it has to use the two-phase seccomp hooks. Add it to the easy hooks, too. Cc: [email protected] Signed-off-by: Andy Lutomirski <[email protected]> Signed-off-by: Kees Cook <[email protected]>
2016-06-07GCC plugin infrastructureEmese Revfy1-0/+1
This patch allows to build the whole kernel with GCC plugins. It was ported from grsecurity/PaX. The infrastructure supports building out-of-tree modules and building in a separate directory. Cross-compilation is supported too. Currently the x86, arm, arm64 and uml architectures enable plugins. The directory of the gcc plugins is scripts/gcc-plugins. You can use a file or a directory there. The plugins compile with these options: * -fno-rtti: gcc is compiled with this option so the plugins must use it too * -fno-exceptions: this is inherited from gcc too * -fasynchronous-unwind-tables: this is inherited from gcc too * -ggdb: it is useful for debugging a plugin (better backtrace on internal errors) * -Wno-narrowing: to suppress warnings from gcc headers (ipa-utils.h) * -Wno-unused-variable: to suppress warnings from gcc headers (gcc_version variable, plugin-version.h) The infrastructure introduces a new Makefile target called gcc-plugins. It supports all gcc versions from 4.5 to 6.0. The scripts/gcc-plugin.sh script chooses the proper host compiler (gcc-4.7 can be built by either gcc or g++). This script also checks the availability of the included headers in scripts/gcc-plugins/gcc-common.h. The gcc-common.h header contains frequently included headers for GCC plugins and it has a compatibility layer for the supported gcc versions. The gcc-generate-*-pass.h headers automatically generate the registration structures for GIMPLE, SIMPLE_IPA, IPA and RTL passes. Note that 'make clean' keeps the *.so files (only the distclean or mrproper targets clean all) because they are needed for out-of-tree modules. Based on work created by the PaX Team. Signed-off-by: Emese Revfy <[email protected]> Acked-by: Kees Cook <[email protected]> Signed-off-by: Michal Marek <[email protected]>
2016-06-07block, drivers: add REQ_OP_FLUSH operationMike Christie1-1/+1
This adds a REQ_OP_FLUSH operation that is sent to request_fn based drivers by the block layer's flush code, instead of sending requests with the request->cmd_flags REQ_FLUSH bit set. Signed-off-by: Mike Christie <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2016-05-27Merge branch 'for-linus-4.7-rc1' of ↵Linus Torvalds3-9/+23
git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml Pull UML updates from Richard Weinberger: "This contains a nice FPU fixup from Eli Cooper for UML" * 'for-linus-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: um: add extended processor state save/restore support um: extend fpstate to _xstate to support YMM registers um: fix FPU state preservation around signal handlers
2016-05-23arch/defconfig: remove CONFIG_RESOURCE_COUNTERSKonstantin Khlebnikov2-2/+0
This option was replaced by PAGE_COUNTER which is selected by MEMCG. Signed-off-by: Konstantin Khlebnikov <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Acked-by: Balbir Singh <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-05-21um: add extended processor state save/restore supportEli Cooper2-1/+3
This patch extends save_fp_registers() and restore_fp_registers() to use PTRACE_GETREGSET and PTRACE_SETREGSET with the XSTATE note type, adding support for new processor state extensions between context switches. When the new ptrace requests are unavailable, it falls back to the old PTRACE_GETFPREGS and PTRACE_SETFPREGS methods, which have been renamed to save_i387_registers() and restore_i387_registers(). Now these functions expect *fp_regs to have the space of an _xstate struct. Thus, this also makes ptrace in UML responde to PTRACE_GETFPREGS/_SETFPREG requests with a user_i387_struct (thus independent from HOST_FP_SIZE), and by calling save_i387_registers() and restore_i387_registers() instead of the extended save_fp_registers() and restore_fp_registers() functions. Signed-off-by: Eli Cooper <[email protected]>
2016-05-21um: extend fpstate to _xstate to support YMM registersEli Cooper1-8/+20
Extends fpstate to _xstate, in order to hold AVX/YMM registers. To avoid oversized stack frame, the following functions have been refactored by using malloc. - sig_handler_common - timer_real_alarm_handler Signed-off-by: Eli Cooper <[email protected]>
2016-05-20exit_thread: remove empty bodiesJiri Slaby1-4/+0
Define HAVE_EXIT_THREAD for archs which want to do something in exit_thread. For others, let's define exit_thread as an empty inline. This is a cleanup before we change the prototype of exit_thread to accept a task parameter. [[email protected]: fix mips] Signed-off-by: Jiri Slaby <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Aurelien Jacquiot <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chen Liqin <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Chris Zankel <[email protected]> Cc: David Howells <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Haavard Skinnemoen <[email protected]> Cc: Hans-Christian Egtvedt <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: James Hogan <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Jesper Nilsson <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Koichi Yasutake <[email protected]> Cc: Lennox Wu <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Mark Salter <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Max Filippov <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Simek <[email protected]> Cc: Mikael Starvik <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Steven Miao <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-05-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds1-2/+2
Pull networking updates from David Miller: "Highlights: 1) Support SPI based w5100 devices, from Akinobu Mita. 2) Partial Segmentation Offload, from Alexander Duyck. 3) Add GMAC4 support to stmmac driver, from Alexandre TORGUE. 4) Allow cls_flower stats offload, from Amir Vadai. 5) Implement bpf blinding, from Daniel Borkmann. 6) Optimize _ASYNC_ bit twiddling on sockets, unless the socket is actually using FASYNC these atomics are superfluous. From Eric Dumazet. 7) Run TCP more preemptibly, also from Eric Dumazet. 8) Support LED blinking, EEPROM dumps, and rxvlan offloading in mlx5e driver, from Gal Pressman. 9) Allow creating ppp devices via rtnetlink, from Guillaume Nault. 10) Improve BPF usage documentation, from Jesper Dangaard Brouer. 11) Support tunneling offloads in qed, from Manish Chopra. 12) aRFS offloading in mlx5e, from Maor Gottlieb. 13) Add RFS and RPS support to SCTP protocol, from Marcelo Ricardo Leitner. 14) Add MSG_EOR support to TCP, this allows controlling packet coalescing on application record boundaries for more accurate socket timestamp sampling. From Martin KaFai Lau. 15) Fix alignment of 64-bit netlink attributes across the board, from Nicolas Dichtel. 16) Per-vlan stats in bridging, from Nikolay Aleksandrov. 17) Several conversions of drivers to ethtool ksettings, from Philippe Reynes. 18) Checksum neutral ILA in ipv6, from Tom Herbert. 19) Factorize all of the various marvell dsa drivers into one, from Vivien Didelot 20) Add VF support to qed driver, from Yuval Mintz" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1649 commits) Revert "phy dp83867: Fix compilation with CONFIG_OF_MDIO=m" Revert "phy dp83867: Make rgmii parameters optional" r8169: default to 64-bit DMA on recent PCIe chips phy dp83867: Make rgmii parameters optional phy dp83867: Fix compilation with CONFIG_OF_MDIO=m bpf: arm64: remove callee-save registers use for tmp registers asix: Fix offset calculation in asix_rx_fixup() causing slow transmissions switchdev: pass pointer to fib_info instead of copy net_sched: close another race condition in tcf_mirred_release() tipc: fix nametable publication field in nl compat drivers: net: Don't print unpopulated net_device name qed: add support for dcbx. ravb: Add missing free_irq() calls to ravb_close() qed: Remove a stray tab net: ethernet: fec-mpc52xx: use phy_ethtool_{get|set}_link_ksettings net: ethernet: fec-mpc52xx: use phydev from struct net_device bpf, doc: fix typo on bpf_asm descriptions stmmac: hardware TX COE doesn't work when force_thresh_dma_mode is set net: ethernet: fs-enet: use phy_ethtool_{get|set}_link_ksettings net: ethernet: fs-enet: use phydev from struct net_device ...
2016-05-04treewide: replace dev->trans_start update with helperFlorian Westphal1-2/+2
Replace all trans_start updates with netif_trans_update helper. change was done via spatch: struct net_device *d; @@ - d->trans_start = jiffies + netif_trans_update(d) Compile tested only. Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Florian Westphal <[email protected]> Acked-by: Felipe Balbi <[email protected]> Acked-by: Mugunthan V N <[email protected]> Acked-by: Antonio Quartulli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-04-12um: switch to using blk_queue_write_cache()Jens Axboe1-1/+1
Signed-off-by: Jens Axboe <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2016-03-22fs/coredump: prevent fsuid=0 dumps into user-controlled directoriesJann Horn1-1/+1
This commit fixes the following security hole affecting systems where all of the following conditions are fulfilled: - The fs.suid_dumpable sysctl is set to 2. - The kernel.core_pattern sysctl's value starts with "/". (Systems where kernel.core_pattern starts with "|/" are not affected.) - Unprivileged user namespace creation is permitted. (This is true on Linux >=3.8, but some distributions disallow it by default using a distro patch.) Under these conditions, if a program executes under secure exec rules, causing it to run with the SUID_DUMP_ROOT flag, then unshares its user namespace, changes its root directory and crashes, the coredump will be written using fsuid=0 and a path derived from kernel.core_pattern - but this path is interpreted relative to the root directory of the process, allowing the attacker to control where a coredump will be written with root privileges. To fix the security issue, always interpret core_pattern for dumps that are written under SUID_DUMP_ROOT relative to the root directory of init. Signed-off-by: Jann Horn <[email protected]> Acked-by: Kees Cook <[email protected]> Cc: Al Viro <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-03-20Merge branch 'mm-pkeys-for-linus' of ↵Linus Torvalds1-0/+14
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 protection key support from Ingo Molnar: "This tree adds support for a new memory protection hardware feature that is available in upcoming Intel CPUs: 'protection keys' (pkeys). There's a background article at LWN.net: https://lwn.net/Articles/643797/ The gist is that protection keys allow the encoding of user-controllable permission masks in the pte. So instead of having a fixed protection mask in the pte (which needs a system call to change and works on a per page basis), the user can map a (handful of) protection mask variants and can change the masks runtime relatively cheaply, without having to change every single page in the affected virtual memory range. This allows the dynamic switching of the protection bits of large amounts of virtual memory, via user-space instructions. It also allows more precise control of MMU permission bits: for example the executable bit is separate from the read bit (see more about that below). This tree adds the MM infrastructure and low level x86 glue needed for that, plus it adds a high level API to make use of protection keys - if a user-space application calls: mmap(..., PROT_EXEC); or mprotect(ptr, sz, PROT_EXEC); (note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice this special case, and will set a special protection key on this memory range. It also sets the appropriate bits in the Protection Keys User Rights (PKRU) register so that the memory becomes unreadable and unwritable. So using protection keys the kernel is able to implement 'true' PROT_EXEC on x86 CPUs: without protection keys PROT_EXEC implies PROT_READ as well. Unreadable executable mappings have security advantages: they cannot be read via information leaks to figure out ASLR details, nor can they be scanned for ROP gadgets - and they cannot be used by exploits for data purposes either. We know about no user-space code that relies on pure PROT_EXEC mappings today, but binary loaders could start making use of this new feature to map binaries and libraries in a more secure fashion. There is other pending pkeys work that offers more high level system call APIs to manage protection keys - but those are not part of this pull request. Right now there's a Kconfig that controls this feature (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) that is default enabled (like most x86 CPU feature enablement code that has no runtime overhead), but it's not user-configurable at the moment. If there's any serious problem with this then we can make it configurable and/or flip the default" * 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits) x86/mm/pkeys: Fix mismerge of protection keys CPUID bits mm/pkeys: Fix siginfo ABI breakage caused by new u64 field x86/mm/pkeys: Fix access_error() denial of writes to write-only VMA mm/core, x86/mm/pkeys: Add execute-only protection keys support x86/mm/pkeys: Create an x86 arch_calc_vm_prot_bits() for VMA flags x86/mm/pkeys: Allow kernel to modify user pkey rights register x86/fpu: Allow setting of XSAVE state x86/mm: Factor out LDT init from context init mm/core, x86/mm/pkeys: Add arch_validate_pkey() mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits() x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU x86/mm/pkeys: Add Kconfig prompt to existing config option x86/mm/pkeys: Dump pkey from VMA in /proc/pid/smaps x86/mm/pkeys: Dump PKRU with other kernel registers mm/core, x86/mm/pkeys: Differentiate instruction fetches x86/mm/pkeys: Optimize fault handling in access_error() mm/core: Do not enforce PKEY permissions on remote mm access um, pkeys: Add UML arch_*_access_permitted() methods mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys x86/mm/gup: Simplify get_user_pages() PTE bit handling ...
2016-03-17mm: cleanup *pte_alloc* interfacesKirill A. Shutemov1-1/+1
There are few things about *pte_alloc*() helpers worth cleaning up: - 'vma' argument is unused, let's drop it; - most __pte_alloc() callers do speculative check for pmd_none(), before taking ptl: let's introduce pte_alloc() macro which does the check. The only direct user of __pte_alloc left is userfaultfd, which has different expectation about atomicity wrt pmd. - pte_alloc_map() and pte_alloc_map_lock() are redefined using pte_alloc(). [[email protected]: fix build for arm64 hugetlbpage] [[email protected]: fix arch/arm/mm/mmu.c some more] Signed-off-by: Kirill A. Shutemov <[email protected]> Cc: Dave Hansen <[email protected]> Signed-off-by: Sudeep Holla <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-03-05um: Export pm_power_offRichard Weinberger1-0/+1
...modules are using this symbol. Export it like all other archs to. Signed-off-by: Richard Weinberger <[email protected]>
2016-03-05Revert "um: Fix get_signal() usage"Richard Weinberger1-1/+1
Commit db2f24dc240856fb1d78005307f1523b7b3c121b was plain wrong. I did not realize the we are allowed to loop here. In fact we have to loop and must not return to userspace before all SIGSEGVs have been delivered. Other archs do this directly in their entry code, UML does it here. Reported-by: Al Viro <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2016-02-18um, pkeys: Add UML arch_*_access_permitted() methodsDave Hansen1-0/+14
UML has a special mmu_context.h and needs updates whenever the generic one is updated. Signed-off-by: Dave Hansen <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-05um: asm/page.h: remove the pte_high member from struct pte_tNicolai Stange1-13/+10
Commit 16da306849d0 ("um: kill pfn_t") introduced a compile warning for defconfig (SUBARCH=i386): arch/um/kernel/skas/mmu.c:38:206: warning: right shift count >= width of type [-Wshift-count-overflow] Aforementioned patch changes the definition of the phys_to_pfn() macro from ((pfn_t) ((p) >> PAGE_SHIFT)) to ((p) >> PAGE_SHIFT) This effectively changes the phys_to_pfn() expansion's type from unsigned long long to unsigned long. Through the callchain init_stub_pte() => mk_pte(), the expansion of phys_to_pfn() is (indirectly) fed into the 'phys' argument of the pte_set_val(pte, phys, prot) macro, eventually leading to (pte).pte_high = (phys) >> 32; This results in the warning from above. Since UML only deals with 32 bit addresses, the upper 32 bits from 'phys' used to be always zero anyway. Also, all page protection flags defined by UML don't use any bits beyond bit 9. Since the contents of a PTE are defined within architecture scope only, the ->pte_high member can be safely removed. Remove the ->pte_high member from struct pte_t. Rename ->pte_low to ->pte. Adapt the pte helper macros in arch/um/include/asm/page.h. Noteworthy is the pte_copy() macro where a smp_wmb() gets dropped. This write barrier doesn't seem to be paired with any read barrier though and thus, was useless anyway. Fixes: 16da306849d0 ("um: kill pfn_t") Signed-off-by: Nicolai Stange <[email protected]> Cc: Dan Williams <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Nicolai Stange <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15um: kill pfn_tDan Williams3-7/+6
The core has developed a need for a "pfn_t" type [1]. Convert the usage of pfn_t by usermode-linux to an unsigned long, and update pfn_to_phys() to drop its expectation of a typed pfn. [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html Signed-off-by: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Richard Weinberger <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-12Merge branch 'work.misc' of ↵Linus Torvalds2-18/+6
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "All kinds of stuff. That probably should've been 5 or 6 separate branches, but by the time I'd realized how large and mixed that bag had become it had been too close to -final to play with rebasing. Some fs/namei.c cleanups there, memdup_user_nul() introduction and switching open-coded instances, burying long-dead code, whack-a-mole of various kinds, several new helpers for ->llseek(), assorted cleanups and fixes from various people, etc. One piece probably deserves special mention - Neil's lookup_one_len_unlocked(). Similar to lookup_one_len(), but gets called without ->i_mutex and tries to avoid ever taking it. That, of course, means that it's not useful for any directory modifications, but things like getting inode attributes in nfds readdirplus are fine with that. I really should've asked for moratorium on lookup-related changes this cycle, but since I hadn't done that early enough... I *am* asking for that for the coming cycle, though - I'm going to try and get conversion of i_mutex to rwsem with ->lookup() done under lock taken shared. There will be a patch closer to the end of the window, along the lines of the one Linus had posted last May - mechanical conversion of ->i_mutex accesses to inode_lock()/inode_unlock()/inode_trylock()/ inode_is_locked()/inode_lock_nested(). To quote Linus back then: ----- | This is an automated patch using | | sed 's/mutex_lock(&\(.*\)->i_mutex)/inode_lock(\1)/' | sed 's/mutex_unlock(&\(.*\)->i_mutex)/inode_unlock(\1)/' | sed 's/mutex_lock_nested(&\(.*\)->i_mutex,[ ]*I_MUTEX_\([A-Z0-9_]*\))/inode_lock_nested(\1, I_MUTEX_\2)/' | sed 's/mutex_is_locked(&\(.*\)->i_mutex)/inode_is_locked(\1)/' | sed 's/mutex_trylock(&\(.*\)->i_mutex)/inode_trylock(\1)/' | | with a very few manual fixups ----- I'm going to send that once the ->i_mutex-affecting stuff in -next gets mostly merged (or when Linus says he's about to stop taking merges)" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) nfsd: don't hold i_mutex over userspace upcalls fs:affs:Replace time_t with time64_t fs/9p: use fscache mutex rather than spinlock proc: add a reschedule point in proc_readfd_common() logfs: constify logfs_block_ops structures fcntl: allow to set O_DIRECT flag on pipe fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE fs: xattr: Use kvfree() [s390] page_to_phys() always returns a multiple of PAGE_SIZE nbd: use ->compat_ioctl() fs: use block_device name vsprintf helper lib/vsprintf: add %*pg format specifier fs: use gendisk->disk_name where possible poll: plug an unused argument to do_poll amdkfd: don't open-code memdup_user() cdrom: don't open-code memdup_user() rsxx: don't open-code memdup_user() mtip32xx: don't open-code memdup_user() [um] mconsole: don't open-code memdup_user_nul() [um] hostaudio: don't open-code memdup_user() ...
2016-01-10um: Use race-free temporary file creationMickaël Salaün1-0/+11
Open the memory mapped file with the O_TMPFILE flag when available. Signed-off-by: Mickaël Salaün <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Richard Weinberger <[email protected]> Acked-by: Tristan Schmelcher <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2016-01-10um: Do not set unsecure permission for temporary fileMickaël Salaün1-6/+0
Remove the insecure 0777 mode for temporary file to prohibit other users to change the executable mapped code. An attacker could gain access to the mapped file descriptor from the temporary file (before it is unlinked) in a read-only mode but it should not be accessible in write mode to avoid arbitrary code execution. To not change the hostfs behavior, the temporary file creation permission now depends on the current umask(2) and the implementation of mkstemp(3). Signed-off-by: Mickaël Salaün <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Richard Weinberger <[email protected]> Acked-by: Tristan Schmelcher <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2016-01-10um: Add seccomp supportMickaël Salaün4-0/+24
This brings SECCOMP_MODE_STRICT and SECCOMP_MODE_FILTER support through prctl(2) and seccomp(2) to User-mode Linux for i386 and x86_64 subarchitectures. secure_computing() is called first in handle_syscall() so that the syscall emulation will be aborted quickly if matching a seccomp rule. This is inspired from Meredydd Luff's patch (https://gerrit.chromium.org/gerrit/21425). Signed-off-by: Mickaël Salaün <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kees Cook <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Will Drewry <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: James Hogan <[email protected]> Cc: Meredydd Luff <[email protected]> Cc: David Drysdale <[email protected]> Signed-off-by: Richard Weinberger <[email protected]> Acked-by: Kees Cook <[email protected]>
2016-01-10um: Add full asm/syscall.h supportMickaël Salaün1-0/+138
Add subarchitecture-independent implementation of asm-generic/syscall.h allowing access to user system call parameters and results: * syscall_get_nr() * syscall_rollback() * syscall_get_error() * syscall_get_return_value() * syscall_set_return_value() * syscall_get_arguments() * syscall_set_arguments() * syscall_get_arch() provided by arch/x86/um/asm/syscall.h This provides the necessary syscall helpers needed by HAVE_ARCH_SECCOMP_FILTER plus syscall_get_error(). This is inspired from Meredydd Luff's patch (https://gerrit.chromium.org/gerrit/21425). Signed-off-by: Mickaël Salaün <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Kees Cook <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Will Drewry <[email protected]> Cc: Meredydd Luff <[email protected]> Cc: David Drysdale <[email protected]> Signed-off-by: Richard Weinberger <[email protected]> Acked-by: Kees Cook <[email protected]>
2016-01-10um: Fix ptrace GETREGS/SETREGS bugsMickaël Salaün3-20/+14
This fix two related bugs: * PTRACE_GETREGS doesn't get the right orig_ax (syscall) value * PTRACE_SETREGS can't set the orig_ax value (erased by initial value) Get rid of the now useless and error-prone get_syscall(). Fix inconsistent behavior in the ptrace implementation for i386 when updating orig_eax automatically update the syscall number as well. This is now updated in handle_syscall(). Signed-off-by: Mickaël Salaün <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Kees Cook <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Will Drewry <[email protected]> Cc: Thomas Meyer <[email protected]> Cc: Nicolas Iooss <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: Meredydd Luff <[email protected]> Cc: David Drysdale <[email protected]> Signed-off-by: Richard Weinberger <[email protected]> Acked-by: Kees Cook <[email protected]>
2016-01-10um: Update UBD to use pread/pwrite family of functionsAnton Ivanov3-22/+26
This decreases the number of syscalls per read/write by half. Signed-off-by: Anton Ivanov <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2016-01-10um: Do not change hard IRQ flags in soft IRQ processingAnton Ivanov1-0/+23
Software IRQ processing in generic architectures assumes that the exit out of hard IRQ may have re-enabled interrupts (some architectures may have an implicit EOI). It presumes them enabled and toggles the flags once more just in case unless this is turned off in the architecture specific hardirq.h by setting __ARCH_IRQ_EXIT_IRQS_DISABLED This patch adds this to UML where due to the way IRQs are handled it is an optimization (it works fine without it too). Signed-off-by: Anton Ivanov <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2016-01-10um: Prevent IRQ handler reentrancyAnton Ivanov1-1/+15
The existing IRQ handler design in UML does not prevent reentrancy This is mitigated by fd-enable/fd-disable semantics for the IO portion of the UML subsystem. The timer, however, can and is re-entered resulting in very deep stack usage and occasional stack exhaustion. This patch prevents this by checking if there is a timer interrupt in-flight before processing any pending timer interrupts. Signed-off-by: Anton Ivanov <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2016-01-10uml: flush stdout before forkingVegard Nossum1-0/+2
I was seeing some really weird behaviour where piping UML's output somewhere would cause output to get duplicated: $ ./vmlinux | head -n 40 Checking that ptrace can change system call numbers...Core dump limits : soft - 0 hard - NONE OK Checking syscall emulation patch for ptrace...Core dump limits : soft - 0 hard - NONE OK Checking advanced syscall emulation patch for ptrace...Core dump limits : soft - 0 hard - NONE OK Core dump limits : soft - 0 hard - NONE This is because these tests do a fork() which duplicates the non-empty stdout buffer, then glibc flushes the duplicated buffer as each child exits. A simple workaround is to flush before forking. Cc: [email protected] Signed-off-by: Vegard Nossum <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2016-01-04[um] mconsole: don't open-code memdup_user_nul()Al Viro1-11/+3
Signed-off-by: Al Viro <[email protected]>
2016-01-04[um] hostaudio: don't open-code memdup_user()Al Viro1-7/+3
Signed-off-by: Al Viro <[email protected]>
2015-12-08um: fix returns without va_endGeyslan G. Bem1-4/+6
When using va_list ensure that va_start will be followed by va_end. Signed-off-by: Geyslan G. Bem <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2015-12-08arch: um: fix error when linking vmlinux.Lorenzo Colitti1-1/+1
On gcc Ubuntu 4.8.4-2ubuntu1~14.04, linking vmlinux fails with: arch/um/os-Linux/built-in.o: In function `os_timer_create': /android/kernel/android/arch/um/os-Linux/time.c:51: undefined reference to `timer_create' arch/um/os-Linux/built-in.o: In function `os_timer_set_interval': /android/kernel/android/arch/um/os-Linux/time.c:84: undefined reference to `timer_settime' arch/um/os-Linux/built-in.o: In function `os_timer_remain': /android/kernel/android/arch/um/os-Linux/time.c:109: undefined reference to `timer_gettime' arch/um/os-Linux/built-in.o: In function `os_timer_one_shot': /android/kernel/android/arch/um/os-Linux/time.c:132: undefined reference to `timer_settime' arch/um/os-Linux/built-in.o: In function `os_timer_disable': /android/kernel/android/arch/um/os-Linux/time.c:145: undefined reference to `timer_settime' This is because -lrt appears in the generated link commandline after arch/um/os-Linux/built-in.o. Fix this by removing -lrt from arch/um/Makefile and adding it to the UM-specific section of scripts/link-vmlinux.sh. Signed-off-by: Lorenzo Colitti <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2015-12-08um: Fix get_signal() usageRichard Weinberger1-1/+1
If get_signal() returns us a signal to post we must not call it again, otherwise the already posted signal will be overridden. Before commit a610d6e672d this was the case as we stopped the while after a successful handle_signal(). Cc: <[email protected]> # 3.10- Fixes: a610d6e672d ("pull clearing RESTORE_SIGMASK into block_sigmask()") Signed-off-by: Richard Weinberger <[email protected]>
2015-11-06um: Switch clocksource to hrtimersAnton Ivanov14-224/+255
UML is using an obsolete itimer call for all timers and "polls" for kernel space timer firing in its userspace portion resulting in a long list of bugs and incorrect behaviour(s). It also uses ITIMER_VIRTUAL for its timer which results in the timer being dependent on it running and the cpu load. This patch fixes this by moving to posix high resolution timers firing off CLOCK_MONOTONIC and relaying the timer correctly to the UML userspace. Fixes: - crashes when hosts suspends/resumes - broken userspace timers - effecive ~40Hz instead of what they should be. Note - this modifies skas behavior by no longer setting an itimer per clone(). Timer events are relayed instead. - kernel network packet scheduling disciplines - tcp behaviour especially under load - various timer related corner cases Finally, overall responsiveness of userspace is better. Signed-off-by: Thomas Meyer <[email protected]> Signed-off-by: Anton Ivanov <[email protected]> [rw: massaged commit message] Signed-off-by: Richard Weinberger <[email protected]>
2015-11-06um: net: replace GFP_KERNEL with GFP_ATOMIC when spinlock is heldSaurabh Sengar1-8/+9
since GFP_KERNEL with GFP_ATOMIC while spinlock is held, as code while holding a spinlock should be atomic. GFP_KERNEL may sleep and can cause deadlock, where as GFP_ATOMIC may fail but certainly avoids deadlockdex f70dd54..d898f6c 100644 Signed-off-by: Saurabh Sengar <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2015-11-06um: Report host OOM more nicelyRichard Weinberger1-1/+15
If UML runs on the host side out of memory, report this condition more nicely. Signed-off-by: Richard Weinberger <[email protected]>
2015-11-06um: Get rid of open coded NR_SYSCALLSRichard Weinberger1-5/+3
We can use __NR_syscall_max. Signed-off-by: Richard Weinberger <[email protected]>
2015-11-06um: Store syscall number after syscall_trace_enter()Richard Weinberger3-13/+11
To support changing syscall numbers we have to store it after syscall_trace_enter(). Signed-off-by: Richard Weinberger <[email protected]>
2015-11-06um: Define PTRACE_OLDSETOPTIONSRichard Weinberger1-0/+2
...such that processes within UML can do a ptrace(PTRACE_OLDSETOPTIONS, ...) Signed-off-by: Richard Weinberger <[email protected]>
2015-10-19um: Fix kernel mode fault conditionRichard Weinberger1-1/+1
We have to exclude memory locations <= PAGE_SIZE from the condition and let the kernel mode fault path catch it. Otherwise a kernel NULL pointer exception will be reported as a kernel user space access. Fixes: d2313084e2c (um: Catch unprotected user memory access) Signed-off-by: Richard Weinberger <[email protected]>
2015-10-19um: Fix waitpid() usage in helper codeRichard Weinberger1-3/+3
If UML is executing a helper program it is using waitpid() with the __WCLONE flag to wait for the program as the helper is executed from a clone()'ed thread. While using __WCLONE is perfectly fine for clone()'ed childs it won't detect terminated childs if the helper has issued an execve(). We have to use __WALL to wait for both clone()'ed and regular childs to detect the termination before and after an execve(). Reported-and-tested-by: Thomas Meyer <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2015-10-19um: Fix out-of-tree buildRichard Weinberger1-2/+2
Commit 30b11ee9a (um: Remove copy&paste code from init.h) uncovered an issue wrt. out-of-tree builds. For out-of-tree builds, we must not rely on relative paths. Before 30b11ee9a it worked by chance as no host code included generated header files. Acked-by: Randy Dunlap <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2015-10-04Merge branch 'strscpy' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile Pull strscpy string copy function implementation from Chris Metcalf. Chris sent this during the merge window, but I waffled back and forth on the pull request, which is why it's going in only now. The new "strscpy()" function is definitely easier to use and more secure than either strncpy() or strlcpy(), both of which are horrible nasty interfaces that have serious and irredeemable problems. strncpy() has a useless return value, and doesn't NUL-terminate an overlong result. To make matters worse, it pads a short result with zeroes, which is a performance disaster if you have big buffers. strlcpy(), by contrast, is a mis-designed "fix" for strlcpy(), lacking the insane NUL padding, but having a differently broken return value which returns the original length of the source string. Which means that it will read characters past the count from the source buffer, and you have to trust the source to be properly terminated. It also makes error handling fragile, since the test for overflow is unnecessarily subtle. strscpy() avoids both these problems, guaranteeing the NUL termination (but not excessive padding) if the destination size wasn't zero, and making the overflow condition very obvious by returning -E2BIG. It also doesn't read past the size of the source, and can thus be used for untrusted source data too. So why did I waffle about this for so long? Every time we introduce a new-and-improved interface, people start doing these interminable series of trivial conversion patches. And every time that happens, somebody does some silly mistake, and the conversion patch to the improved interface actually makes things worse. Because the patch is mindnumbing and trivial, nobody has the attention span to look at it carefully, and it's usually done over large swatches of source code which means that not every conversion gets tested. So I'm pulling the strscpy() support because it *is* a better interface. But I will refuse to pull mindless conversion patches. Use this in places where it makes sense, but don't do trivial patches to fix things that aren't actually known to be broken. * 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: tile: use global strscpy() rather than private copy string: provide strscpy() Make asm/word-at-a-time.h available on all architectures
2015-09-01Merge branch 'timers-core-for-linus' of ↵Linus Torvalds1-24/+20
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Rather large, but nothing exiting: - new range check for settimeofday() to prevent that boot time becomes negative. - fix for file time rounding - a few simplifications of the hrtimer code - fix for the proc/timerlist code so the output of clock realtime timers is accurate - more y2038 work - tree wide conversion of clockevent drivers to the new callbacks" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (88 commits) hrtimer: Handle failure of tick_init_highres() gracefully hrtimer: Unconfuse switch_hrtimer_base() a bit hrtimer: Simplify get_target_base() by returning current base hrtimer: Drop return code of hrtimer_switch_to_hres() time: Introduce timespec64_to_jiffies()/jiffies_to_timespec64() time: Introduce current_kernel_time64() time: Introduce struct itimerspec64 time: Add the common weak version of update_persistent_clock() time: Always make sure wall_to_monotonic isn't positive time: Fix nanosecond file time rounding in timespec_trunc() timer_list: Add the base offset so remaining nsecs are accurate for non monotonic timers cris/time: Migrate to new 'set-state' interface kernel: broadcast-hrtimer: Migrate to new 'set-state' interface xtensa/time: Migrate to new 'set-state' interface unicore/time: Migrate to new 'set-state' interface um/time: Migrate to new 'set-state' interface sparc/time: Migrate to new 'set-state' interface sh/localtimer: Migrate to new 'set-state' interface score/time: Migrate to new 'set-state' interface s390/time: Migrate to new 'set-state' interface ...
2015-08-10um/time: Migrate to new 'set-state' interfaceViresh Kumar1-24/+20
Migrate um driver to the new 'set-state' interface provided by clockevents core, the earlier 'set-mode' interface is marked obsolete now. This also enables us to implement callbacks for new states of clockevent devices, for example: ONESHOT_STOPPED. Cc: Jeff Dike <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Viresh Kumar <[email protected]> Signed-off-by: Daniel Lezcano <[email protected]>
2015-07-31Merge branch 'x86/urgent' into x86/asm, before applying dependent patchesIngo Molnar2-15/+1
Signed-off-by: Ingo Molnar <[email protected]>
2015-07-17mm: clean up per architecture MM hook header filesLaurent Dufour2-15/+1
Commit 2ae416b142b6 ("mm: new mm hook framework") introduced an empty header file (mm-arch-hooks.h) for every architecture, even those which doesn't need to define mm hooks. As suggested by Geert Uytterhoeven, this could be cleaned through the use of a generic header file included via each per architecture asm/include/Kbuild file. The PowerPC architecture is not impacted here since this architecture has to defined the arch_remap MM hook. Signed-off-by: Laurent Dufour <[email protected]> Suggested-by: Geert Uytterhoeven <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> Acked-by: Vineet Gupta <[email protected]> Cc: Oleg Nesterov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>