aboutsummaryrefslogtreecommitdiff
path: root/arch/alpha
AgeCommit message (Collapse)AuthorFilesLines
2019-04-03locking/rwsem: Remove rwsem-spinlock.c & use rwsem-xadd.c for all archsWaiman Long1-7/+0
Currently, we have two different implementation of rwsem: 1) CONFIG_RWSEM_GENERIC_SPINLOCK (rwsem-spinlock.c) 2) CONFIG_RWSEM_XCHGADD_ALGORITHM (rwsem-xadd.c) As we are going to use a single generic implementation for rwsem-xadd.c and no architecture-specific code will be needed, there is no point in keeping two different implementations of rwsem. In most cases, the performance of rwsem-spinlock.c will be worse. It also doesn't get all the performance tuning and optimizations that had been implemented in rwsem-xadd.c over the years. For simplication, we are going to remove rwsem-spinlock.c and make all architectures use a single implementation of rwsem - rwsem-xadd.c. All references to RWSEM_GENERIC_SPINLOCK and RWSEM_XCHGADD_ALGORITHM in the code are removed. Suggested-by: Peter Zijlstra <[email protected]> Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Linus Torvalds <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tim Chen <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-04-03locking/rwsem: Remove arch specific rwsem filesWaiman Long1-211/+0
As the generic rwsem-xadd code is using the appropriate acquire and release versions of the atomic operations, the arch specific rwsem.h files will not be that much faster than the generic code as long as the atomic functions are properly implemented. So we can remove those arch specific rwsem.h and stop building asm/rwsem.h to reduce maintenance effort. Currently, only x86, alpha and ia64 have implemented architecture specific fast paths. I don't have access to alpha and ia64 systems for testing, but they are legacy systems that are not likely to be updated to the latest kernel anyway. By using a rwsem microbenchmark, the total locking rates on a 4-socket 56-core 112-thread x86-64 system before and after the patch were as follows (mixed means equal # of read and write locks): Before Patch After Patch # of Threads wlock rlock mixed wlock rlock mixed ------------ ----- ----- ----- ----- ----- ----- 1 29,201 30,143 29,458 28,615 30,172 29,201 2 6,807 13,299 1,171 7,725 15,025 1,804 4 6,504 12,755 1,520 7,127 14,286 1,345 8 6,762 13,412 764 6,826 13,652 726 16 6,693 15,408 662 6,599 15,938 626 32 6,145 15,286 496 5,549 15,487 511 64 5,812 15,495 60 5,858 15,572 60 There were some run-to-run variations for the multi-thread tests. For x86-64, using the generic C code fast path seems to be a little bit faster than the assembly version with low lock contention. Looking at the assembly version of the fast paths, there are assembly to/from C code wrappers that save and restore all the callee-clobbered registers (7 registers on x86-64). The assembly generated from the generic C code doesn't need to do that. That may explain the slight performance gain here. The generic asm rwsem.h can also be merged into kernel/locking/rwsem.h with no code change as no other code other than those under kernel/locking needs to access the internal rwsem macros and functions. Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Linus Torvalds <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tim Chen <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-04-03arch/tlb: Clean up simple architecturesPeter Zijlstra2-6/+1
For the architectures that do not implement their own tlb_flush() but do already use the generic mmu_gather, there are two options: 1) the platform has an efficient flush_tlb_range() and asm-generic/tlb.h doesn't need any overrides at all. 2) the platform lacks an efficient flush_tlb_range() and we select MMU_GATHER_NO_RANGE to minimize full invalidates. Convert all 'simple' architectures to one of these two forms. alpha: has no range invalidate -> 2 arc: already used flush_tlb_range() -> 1 c6x: has no range invalidate -> 2 hexagon: has an efficient flush_tlb_range() -> 1 (flush_tlb_mm() is in fact a full range invalidate, so no need to shoot down everything) m68k: has inefficient flush_tlb_range() -> 2 microblaze: has no flush_tlb_range() -> 2 mips: has efficient flush_tlb_range() -> 1 (even though it currently seems to use flush_tlb_mm()) nds32: already uses flush_tlb_range() -> 1 nios2: has inefficient flush_tlb_range() -> 2 (no limit on range iteration) openrisc: has inefficient flush_tlb_range() -> 2 (no limit on range iteration) parisc: already uses flush_tlb_range() -> 1 sparc32: already uses flush_tlb_range() -> 1 unicore32: has inefficient flush_tlb_range() -> 2 (no limit on range iteration) xtensa: has efficient flush_tlb_range() -> 1 Note this also fixes a bug in the existing code for a number platforms. Those platforms that did: tlb_end_vma() -> if (!full_mm) flush_tlb_*() tlb_flush -> if (full_mm) flush_tlb_mm() missed the case of shift_arg_pages(), which doesn't have @fullmm set, nor calls into tlb_*vma(), but still frees page-tables and thus needs an invalidate. The new code handles this by detecting a non-empty range, and either issuing the matching range invalidate or a full invalidate, depending on the capabilities. No change in behavior intended. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David S. Miller <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Helge Deller <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mark Salter <[email protected]> Cc: Max Filippov <[email protected]> Cc: Michal Simek <[email protected]> Cc: Nick Piggin <[email protected]> Cc: Paul Burton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2019-03-28KVM: export <linux/kvm_para.h> and <asm/kvm_para.h> iif KVM is supportedMasahiro Yamada2-2/+1
I do not see any consistency about headers_install of <linux/kvm_para.h> and <asm/kvm_para.h>. According to my analysis of Linux 5.1-rc1, there are 3 groups: [1] Both <linux/kvm_para.h> and <asm/kvm_para.h> are exported alpha, arm, hexagon, mips, powerpc, s390, sparc, x86 [2] <asm/kvm_para.h> is exported, but <linux/kvm_para.h> is not arc, arm64, c6x, h8300, ia64, m68k, microblaze, nios2, openrisc, parisc, sh, unicore32, xtensa [3] Neither <linux/kvm_para.h> nor <asm/kvm_para.h> is exported csky, nds32, riscv This does not match to the actual KVM support. At least, [2] is half-baked. Nor do arch maintainers look like they care about this. For example, commit 0add53713b1c ("microblaze: Add missing kvm_para.h to Kbuild") exported <asm/kvm_para.h> to user-space in order to fix an in-kernel build error. We have two ways to make this consistent: [A] export both <linux/kvm_para.h> and <asm/kvm_para.h> for all architectures, irrespective of the KVM support [B] Match the header export of <linux/kvm_para.h> and <asm/kvm_para.h> to the KVM support My first attempt was [A] because the code looks cleaner, but Paolo suggested [B]. So, this commit goes with [B]. For most architectures, <asm/kvm_para.h> was moved to the kernel-space. I changed include/uapi/linux/Kbuild so that it checks generated asm/kvm_para.h as well as check-in ones. After this commit, there will be two groups: [1] Both <linux/kvm_para.h> and <asm/kvm_para.h> are exported arm, arm64, mips, powerpc, s390, x86 [2] Neither <linux/kvm_para.h> nor <asm/kvm_para.h> is exported alpha, arc, c6x, csky, h8300, hexagon, ia64, m68k, microblaze, nds32, nios2, openrisc, parisc, riscv, sh, sparc, unicore32, xtensa Signed-off-by: Masahiro Yamada <[email protected]> Acked-by: Cornelia Huck <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-03-20syscall_get_arch: add "struct task_struct *" argumentDmitry V. Levin1-1/+1
This argument is required to extend the generic ptrace API with PTRACE_GET_SYSCALL_INFO request: syscall_get_arch() is going to be called from ptrace_request() along with syscall_get_nr(), syscall_get_arguments(), syscall_get_error(), and syscall_get_return_value() functions with a tracee as their argument. The primary intent is that the triple (audit_arch, syscall_nr, arg1..arg6) should describe what system call is being called and what its arguments are. Reverts: 5e937a9ae913 ("syscall_get_arch: remove useless function arguments") Reverts: 1002d94d3076 ("syscall.h: fix doc text for syscall_get_arch()") Reviewed-by: Andy Lutomirski <[email protected]> # for x86 Reviewed-by: Palmer Dabbelt <[email protected]> Acked-by: Paul Moore <[email protected]> Acked-by: Paul Burton <[email protected]> # MIPS parts Acked-by: Michael Ellerman <[email protected]> (powerpc) Acked-by: Kees Cook <[email protected]> # seccomp parts Acked-by: Mark Salter <[email protected]> # for the c6x bit Cc: Elvira Khabirova <[email protected]> Cc: Eugene Syromyatnikov <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Dmitry V. Levin <[email protected]> Signed-off-by: Paul Moore <[email protected]>
2019-03-17Merge tag 'kbuild-v5.1-2' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull more Kbuild updates from Masahiro Yamada: - add more Build-Depends to Debian source package - prefix header search paths with $(srctree)/ - make modpost show verbose section mismatch warnings - avoid hard-coded CROSS_COMPILE for h8300 - fix regression for Debian make-kpkg command - add semantic patch to detect missing put_device() - fix some warnings of 'make deb-pkg' - optimize NOSTDINC_FLAGS evaluation - add warnings about redundant generic-y - clean up Makefiles and scripts * tag 'kbuild-v5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kconfig: remove stale lxdialog/.gitignore kbuild: force all architectures except um to include mandatory-y kbuild: warn redundant generic-y Revert "modsign: Abort modules_install when signing fails" kbuild: Make NOSTDINC_FLAGS a simply expanded variable kbuild: deb-pkg: avoid implicit effects coccinelle: semantic code search for missing put_device() kbuild: pkg: grep include/config/auto.conf instead of $KCONFIG_CONFIG kbuild: deb-pkg: introduce is_enabled and if_enabled_echo to builddeb kbuild: deb-pkg: add CONFIG_ prefix to kernel config options kbuild: add workaround for Debian make-kpkg kbuild: source include/config/auto.conf instead of ${KCONFIG_CONFIG} unicore32: simplify linker script generation for decompressor h8300: use cc-cross-prefix instead of hardcoding h8300-unknown-linux- kbuild: move archive command to scripts/Makefile.lib modpost: always show verbose warning for section mismatch ia64: prefix header search path with $(srctree)/ libfdt: prefix header search paths with $(srctree)/ deb-pkg: generate correct build dependencies
2019-03-17kbuild: force all architectures except um to include mandatory-yMasahiro Yamada1-2/+0
Currently, every arch/*/include/uapi/asm/Kbuild explicitly includes the common Kbuild.asm file. Factor out the duplicated include directives to scripts/Makefile.asm-generic so that no architecture would opt out of the mandatory-y mechanism. um is not forced to include mandatory-y since it is a very exceptional case which does not support UAPI. Signed-off-by: Masahiro Yamada <[email protected]>
2019-03-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-1/+1
Pull networking fixes from David Miller: "More fixes in the queue: 1) Netfilter nat can erroneously register the device notifier twice, fix from Florian Westphal. 2) Use after free in nf_tables, from Pablo Neira Ayuso. 3) Parallel update of steering rule fix in mlx5 river, from Eli Britstein. 4) RX processing panic in lan743x, fix from Bryan Whitehead. 5) Use before initialization of TCP_SKB_CB, fix from Christoph Paasch. 6) Fix locking in SRIOV mode of mlx4 driver, from Jack Morgenstein. 7) Fix TX stalls in lan743x due to mishandling of interrupt ACKing modes, from Bryan Whitehead. 8) Fix infoleak in l2tp_ip6_recvmsg(), from Eric Dumazet" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits) pptp: dst_release sk_dst_cache in pptp_sock_destruct MAINTAINERS: GENET & SYSTEMPORT: Add internal Broadcom list l2tp: fix infoleak in l2tp_ip6_recvmsg() net/tls: Inform user space about send buffer availability net_sched: return correct value for *notify* functions lan743x: Fix TX Stall Issue net/mlx4_core: Fix qp mtt size calculation net/mlx4_core: Fix locking in SRIOV mode when switching between events and polling net/mlx4_core: Fix reset flow when in command polling mode mlxsw: minimal: Initialize base_mac mlxsw: core: Prevent duplication during QSFP module initialization net: dwmac-sun8i: fix a missing check of of_get_phy_mode net: sh_eth: fix a missing check of of_get_phy_mode net: 8390: fix potential NULL pointer dereferences net: fujitsu: fix a potential NULL pointer dereference net: qlogic: fix a potential NULL pointer dereference isdn: hfcpci: fix potential NULL pointer dereference Documentation: devicetree: add a new optional property for port mac address net: rocker: fix a potential NULL pointer dereference net: qlge: fix a potential NULL pointer dereference ...
2019-03-12treewide: add checks for the return value of memblock_alloc*()Mike Rapoport5-3/+42
Add check for the return value of memblock_alloc*() functions and call panic() in case of error. The panic message repeats the one used by panicing memblock allocators with adjustment of parameters to include only relevant ones. The replacement was mostly automated with semantic patches like the one below with manual massaging of format strings. @@ expression ptr, size, align; @@ ptr = memblock_alloc(size, align); + if (!ptr) + panic("%s: Failed to allocate %lu bytes align=0x%lx\n", __func__, size, align); [[email protected]: use '%pa' with 'phys_addr_t' type] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: fix format strings for panics after memblock_alloc] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: don't panic if the allocation in sparse_buffer_init fails] Link: http://lkml.kernel.org/r/20190131074018.GD28876@rapoport-lnx [[email protected]: fix xtensa printk warning] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Signed-off-by: Anders Roxell <[email protected]> Reviewed-by: Guo Ren <[email protected]> [c-sky] Acked-by: Paul Burton <[email protected]> [MIPS] Acked-by: Heiko Carstens <[email protected]> [s390] Reviewed-by: Juergen Gross <[email protected]> [Xen] Reviewed-by: Geert Uytterhoeven <[email protected]> [m68k] Acked-by: Max Filippov <[email protected]> [xtensa] Cc: Catalin Marinas <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Mark Salter <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Simek <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Rich Felker <[email protected]> Cc: Rob Herring <[email protected]> Cc: Rob Herring <[email protected]> Cc: Russell King <[email protected]> Cc: Stafford Horne <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-03-12arch: use memblock_alloc() instead of memblock_alloc_from(size, align, 0)Mike Rapoport3-4/+4
The last parameter of memblock_alloc_from() is the lower limit for the memory allocation. When it is 0, the call is equivalent to memblock_alloc(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Acked-by: Paul Burton <[email protected]> # MIPS part Cc: Catalin Marinas <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Guo Ren <[email protected]> [c-sky] Cc: Heiko Carstens <[email protected]> Cc: Juergen Gross <[email protected]> [Xen] Cc: Mark Salter <[email protected]> Cc: Matt Turner <[email protected]> Cc: Max Filippov <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Simek <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Rich Felker <[email protected]> Cc: Rob Herring <[email protected]> Cc: Rob Herring <[email protected]> Cc: Russell King <[email protected]> Cc: Stafford Horne <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-03-11y2038: fix socket.h header inclusionArnd Bergmann1-1/+1
Referencing the __kernel_long_t type caused some user space applications to stop compiling when they had not already included linux/posix_types.h, e.g. s/multicast.c -o ext/sockets/multicast.lo In file included from /builddir/build/BUILD/php-7.3.3/main/php.h:468, from /builddir/build/BUILD/php-7.3.3/ext/sockets/sockets.c:27: /builddir/build/BUILD/php-7.3.3/ext/sockets/sockets.c: In function 'zm_startup_sockets': /builddir/build/BUILD/php-7.3.3/ext/sockets/sockets.c:776:40: error: '__kernel_long_t' undeclared (first use in this function) 776 | REGISTER_LONG_CONSTANT("SO_SNDTIMEO", SO_SNDTIMEO, CONST_CS | CONST_PERSISTENT); It is safe to include that header here, since it only contains kernel internal types that do not conflict with other user space types. It's still possible that some related build failures remain, but those are likely to be for code that is not already y2038 safe. Reported-by: Laura Abbott <[email protected]> Fixes: a9beb86ae6e5 ("sock: Add SO_RCVTIMEO_NEW and SO_SNDTIMEO_NEW") Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-06Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-1/+2
Merge misc updates from Andrew Morton: - a few misc things - ocfs2 updates - most of MM * emailed patches from Andrew Morton <[email protected]>: (159 commits) tools/testing/selftests/proc/proc-self-syscall.c: remove duplicate include proc: more robust bulk read test proc: test /proc/*/maps, smaps, smaps_rollup, statm proc: use seq_puts() everywhere proc: read kernel cpu stat pointer once proc: remove unused argument in proc_pid_lookup() fs/proc/thread_self.c: code cleanup for proc_setup_thread_self() fs/proc/self.c: code cleanup for proc_setup_self() proc: return exit code 4 for skipped tests mm,mremap: bail out earlier in mremap_to under map pressure mm/sparse: fix a bad comparison mm/memory.c: do_fault: avoid usage of stale vm_area_struct writeback: fix inode cgroup switching comment mm/huge_memory.c: fix "orig_pud" set but not used mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC mm/memcontrol.c: fix bad line in comment mm/cma.c: cma_declare_contiguous: correct err handling mm/page_ext.c: fix an imbalance with kmemleak mm/compaction: pass pgdat to too_many_isolated() instead of zone mm: remove zone_lru_lock() function, access ->lru_lock directly ...
2019-03-06Merge tag 'asm-generic-5.1' of ↵Linus Torvalds1-3/+1
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "Only a few small changes this time: - Michael S. Tsirkin cleans up linux/mman.h - Mike Rapoport found a typo I had originally merged another cleanup series for I/O accessors from Hugo Lefeuvre as well, but dropped it after the discussion of the barrier semantics and some conflicts. I expect this series to get merged for a later release though" * tag 'asm-generic-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: asm-generic/page.h: fix typo in #error text requiring a real asm/page.h arch: move common mmap flags to linux/mman.h drm: tweak header name x86/mpx: tweak header name
2019-03-06Merge branch 'perf-core-for-linus' of ↵Linus Torvalds1-6/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Lots of tooling updates - too many to list, here's a few highlights: - Various subcommand updates to 'perf trace', 'perf report', 'perf record', 'perf annotate', 'perf script', 'perf test', etc. - CPU and NUMA topology and affinity handling improvements, - HW tracing and HW support updates: - Intel PT updates - ARM CoreSight updates - vendor HW event updates - BPF updates - Tons of infrastructure updates, both on the build system and the library support side - Documentation updates. - ... and lots of other changes, see the changelog for details. Kernel side updates: - Tighten up kprobes blacklist handling, reduce the number of places where developers can install a kprobe and hang/crash the system. - Fix/enhance vma address filter handling. - Various PMU driver updates, small fixes and additions. - refcount_t conversions - BPF updates - error code propagation enhancements - misc other changes" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (238 commits) perf script python: Add Python3 support to syscall-counts-by-pid.py perf script python: Add Python3 support to syscall-counts.py perf script python: Add Python3 support to stat-cpi.py perf script python: Add Python3 support to stackcollapse.py perf script python: Add Python3 support to sctop.py perf script python: Add Python3 support to powerpc-hcalls.py perf script python: Add Python3 support to net_dropmonitor.py perf script python: Add Python3 support to mem-phys-addr.py perf script python: Add Python3 support to failed-syscalls-by-pid.py perf script python: Add Python3 support to netdev-times.py perf tools: Add perf_exe() helper to find perf binary perf script: Handle missing fields with -F +.. perf data: Add perf_data__open_dir_data function perf data: Add perf_data__(create_dir|close_dir) functions perf data: Fail check_backup in case of error perf data: Make check_backup work over directories perf tools: Add rm_rf_perf_data function perf tools: Add pattern name checking to rm_rf perf tools: Add depth checking to rm_rf perf data: Add global path holder ...
2019-03-05mm: replace all open encodings for NUMA_NO_NODEAnshuman Khandual1-1/+2
Patch series "Replace all open encodings for NUMA_NO_NODE", v3. All these places for replacement were found by running the following grep patterns on the entire kernel code. Please let me know if this might have missed some instances. This might also have replaced some false positives. I will appreciate suggestions, inputs and review. 1. git grep "nid == -1" 2. git grep "node == -1" 3. git grep "nid = -1" 4. git grep "node = -1" This patch (of 2): At present there are multiple places where invalid node number is encoded as -1. Even though implicitly understood it is always better to have macros in there. Replace these open encodings for an invalid node number with the global macro NUMA_NO_NODE. This helps remove NUMA related assumptions like 'invalid node' from various places redirecting them to a common definition. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Acked-by: Jeff Kirsher <[email protected]> [ixgbe] Acked-by: Jens Axboe <[email protected]> [mtip32xx] Acked-by: Vinod Koul <[email protected]> [dmaengine.c] Acked-by: Michael Ellerman <[email protected]> [powerpc] Acked-by: Doug Ledford <[email protected]> [drivers/infiniband] Cc: Joseph Qi <[email protected]> Cc: Hans Verkuil <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-03-05Merge branch 'timers-2038-for-linus' of ↵Linus Torvalds4-28/+30
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull year 2038 updates from Thomas Gleixner: "Another round of changes to make the kernel ready for 2038. After lots of preparatory work this is the first set of syscalls which are 2038 safe: 403 clock_gettime64 404 clock_settime64 405 clock_adjtime64 406 clock_getres_time64 407 clock_nanosleep_time64 408 timer_gettime64 409 timer_settime64 410 timerfd_gettime64 411 timerfd_settime64 412 utimensat_time64 413 pselect6_time64 414 ppoll_time64 416 io_pgetevents_time64 417 recvmmsg_time64 418 mq_timedsend_time64 419 mq_timedreceiv_time64 420 semtimedop_time64 421 rt_sigtimedwait_time64 422 futex_time64 423 sched_rr_get_interval_time64 The syscall numbers are identical all over the architectures" * 'timers-2038-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) riscv: Use latest system call ABI checksyscalls: fix up mq_timedreceive and stat exceptions unicore32: Fix __ARCH_WANT_STAT64 definition asm-generic: Make time32 syscall numbers optional asm-generic: Drop getrlimit and setrlimit syscalls from default list 32-bit userspace ABI: introduce ARCH_32BIT_OFF_T config option compat ABI: use non-compat openat and open_by_handle_at variants y2038: add 64-bit time_t syscalls to all 32-bit architectures y2038: rename old time and utime syscalls y2038: remove struct definition redirects y2038: use time32 syscall names on 32-bit syscalls: remove obsolete __IGNORE_ macros y2038: syscalls: rename y2038 compat syscalls x86/x32: use time64 versions of sigtimedwait and recvmmsg timex: change syscalls to use struct __kernel_timex timex: use __kernel_timex internally sparc64: add custom adjtimex/clock_adjtime functions time: fix sys_timer_settime prototype time: Add struct __kernel_timex time: make adjtime compat handling available for 32 bit ...
2019-03-05a.out: remove core dumping supportLinus Torvalds1-81/+0
We're (finally) phasing out a.out support for good. As Borislav Petkov points out, we've supported ELF binaries for about 25 years by now, and coredumping in particular has bitrotted over the years. None of the tool chains even support generating a.out binaries any more, and the plan is to deprecate a.out support entirely for the kernel. But I want to start with just removing the core dumping code, because I can still imagine that somebody actually might want to support a.out as a simpler biinary format. Particularly if you generate some random binaries on the fly, ELF is a much more complicated format (admittedly ELF also does have a lot of toolchain support, mitigating that complexity a lot and you really should have moved over in the last 25 years). So it's at least somewhat possible that somebody out there has some workflow that still involves generating and running a.out executables. In contrast, it's very unlikely that anybody depends on debugging any legacy a.out core files. But regardless, I want this phase-out to be done in two steps, so that we can resurrect a.out support (if needed) without having to resurrect the core file dumping that is almost certainly not needed. Jann Horn pointed to the <asm/a.out-core.h> file that my first trivial cut at this had missed. And Alan Cox points out that the a.out binary loader _could_ be done in user space if somebody wants to, but we might keep just the loader in the kernel if somebody really wants it, since the loader isn't that big and has no really odd special cases like the core dumping does. Acked-by: Borislav Petkov <[email protected]> Cc: Alan Cox <[email protected]> Cc: Jann Horn <[email protected]> Cc: Richard Weinberger <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-03-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds1-9/+40
Pull networking updates from David Miller: "Here we go, another merge window full of networking and #ebpf changes: 1) Snoop DHCPACKS in batman-adv to learn MAC/IP pairs in the DHCP range without dealing with floods of ARP traffic, from Linus Lüssing. 2) Throttle buffered multicast packet transmission in mt76, from Felix Fietkau. 3) Support adaptive interrupt moderation in ice, from Brett Creeley. 4) A lot of struct_size conversions, from Gustavo A. R. Silva. 5) Add peek/push/pop commands to bpftool, as well as bash completion, from Stanislav Fomichev. 6) Optimize sk_msg_clone(), from Vakul Garg. 7) Add SO_BINDTOIFINDEX, from David Herrmann. 8) Be more conservative with local resends due to local congestion, from Yuchung Cheng. 9) Allow vetoing of unsupported VXLAN FDBs, from Petr Machata. 10) Add health buffer support to devlink, from Eran Ben Elisha. 11) Add TXQ scheduling API to mac80211, from Toke Høiland-Jørgensen. 12) Add statistics to basic packet scheduler filter, from Cong Wang. 13) Add GRE tunnel support for mlxsw Spectrum-2, from Nir Dotan. 14) Lots of new IP tunneling forwarding tests, also from Nir Dotan. 15) Add 3ad stats to bonding, from Nikolay Aleksandrov. 16) Lots of probing improvements for bpftool, from Quentin Monnet. 17) Various nfp drive #ebpf JIT improvements from Jakub Kicinski. 18) Allow #ebpf programs to access gso_segs from skb shared info, from Eric Dumazet. 19) Add sock_diag support for AF_XDP sockets, from Björn Töpel. 20) Support 22260 iwlwifi devices, from Luca Coelho. 21) Use rbtree for ipv6 defragmentation, from Peter Oskolkov. 22) Add JMP32 instruction class support to #ebpf, from Jiong Wang. 23) Add spinlock support to #ebpf, from Alexei Starovoitov. 24) Support 256-bit keys and TLS 1.3 in ktls, from Dave Watson. 25) Add device infomation API to devlink, from Jakub Kicinski. 26) Add new timestamping socket options which are y2038 safe, from Deepa Dinamani. 27) Add RX checksum offloading for various sh_eth chips, from Sergei Shtylyov. 28) Flow offload infrastructure, from Pablo Neira Ayuso. 29) Numerous cleanups, improvements, and bug fixes to the PHY layer and many drivers from Heiner Kallweit. 30) Lots of changes to try and make packet scheduler classifiers run lockless as much as possible, from Vlad Buslov. 31) Support BCM957504 chip in bnxt_en driver, from Erik Burrows. 32) Add concurrency tests to tc-tests infrastructure, from Vlad Buslov. 33) Add hwmon support to aquantia, from Heiner Kallweit. 34) Allow 64-bit values for SO_MAX_PACING_RATE, from Eric Dumazet. And I would be remiss if I didn't thank the various major networking subsystem maintainers for integrating much of this work before I even saw it. Alexei Starovoitov, Daniel Borkmann, Pablo Neira Ayuso, Johannes Berg, Kalle Valo, and many others. Thank you!" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2207 commits) net/sched: avoid unused-label warning net: ignore sysctl_devconf_inherit_init_net without SYSCTL phy: mdio-mux: fix Kconfig dependencies net: phy: use phy_modify_mmd_changed in genphy_c45_an_config_aneg net: dsa: mv88e6xxx: add call to mv88e6xxx_ports_cmode_init to probe for new DSA framework selftest/net: Remove duplicate header sky2: Disable MSI on Dell Inspiron 1545 and Gateway P-79 net/mlx5e: Update tx reporter status in case channels were successfully opened devlink: Add support for direct reporter health state update devlink: Update reporter state to error even if recover aborted sctp: call iov_iter_revert() after sending ABORT team: Free BPF filter when unregistering netdev ip6mr: Do not call __IP6_INC_STATS() from preemptible context isdn: mISDN: Fix potential NULL pointer dereference of kzalloc net: dsa: mv88e6xxx: support in-band signalling on SGMII ports with external PHYs cxgb4/chtls: Prefix adapter flags with CXGB4 net-sysfs: Switch to bitmap_zalloc() mellanox: Switch to bitmap_zalloc() bpf: add test cases for non-pointer sanitiation logic mlxsw: i2c: Extend initialization by querying resources data ...
2019-03-04get rid of legacy 'get_ds()' functionLinus Torvalds1-1/+0
Every in-kernel use of this function defined it to KERNEL_DS (either as an actual define, or as an inline function). It's an entirely historical artifact, and long long long ago used to actually read the segment selector valueof '%ds' on x86. Which in the kernel is always KERNEL_DS. Inspired by a patch from Jann Horn that just did this for a very small subset of users (the ones in fs/), along with Al who suggested a script. I then just took it to the logical extreme and removed all the remaining gunk. Roughly scripted with git grep -l '(get_ds())' -- :^tools/ | xargs sed -i 's/(get_ds())/(KERNEL_DS)/' git grep -lw 'get_ds' -- :^tools/ | xargs sed -i '/^#define get_ds()/d' plus manual fixups to remove a few unusual usage patterns, the couple of inline function cases and to fix up a comment that had become stale. The 'get_ds()' function remains in an x86 kvm selftest, since in user space it actually does something relevant. Inspired-by: Jann Horn <[email protected]> Inspired-by: Al Viro <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-02-28Merge branch 'linus' into perf/core, to pick up fixesIngo Molnar2-4/+4
Signed-off-by: Ingo Molnar <[email protected]>
2019-02-18arch: move common mmap flags to linux/mman.hMichael S. Tsirkin1-3/+1
Now that we have 3 mmap flags shared by all architectures, let's move them into the common header. This will help discourage future architectures from duplicating code. Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]>
2019-02-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2-4/+4
The netfilter conflicts were rather simple overlapping changes. However, the cls_tcindex.c stuff was a bit more complex. On the 'net' side, Cong is fixing several races and memory leaks. Whilst on the 'net-next' side we have Vlad adding the rtnl-ness support. What I've decided to do, in order to resolve this, is revert the conversion over to using a workqueue that Cong did, bringing us back to pure RCU. I did it this way because I believe that either Cong's races don't apply with have Vlad did things, or Cong will have to implement the race fix slightly differently. Signed-off-by: David S. Miller <[email protected]>
2019-02-10alpha: fix page fault handling for r16-r18 targetsSergei Trofimovich1-1/+1
Fix page fault handling code to fixup r16-r18 registers. Before the patch code had off-by-two registers bug. This bug caused overwriting of ps,pc,gp registers instead of fixing intended r16,r17,r18 (see `struct pt_regs`). More details: Initially Dmitry noticed a kernel bug as a failure on strace test suite. Test passes unmapped userspace pointer to io_submit: ```c #include <err.h> #include <unistd.h> #include <sys/mman.h> #include <asm/unistd.h> int main(void) { unsigned long ctx = 0; if (syscall(__NR_io_setup, 1, &ctx)) err(1, "io_setup"); const size_t page_size = sysconf(_SC_PAGESIZE); const size_t size = page_size * 2; void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); if (MAP_FAILED == ptr) err(1, "mmap(%zu)", size); if (munmap(ptr, size)) err(1, "munmap"); syscall(__NR_io_submit, ctx, 1, ptr + page_size); syscall(__NR_io_destroy, ctx); return 0; } ``` Running this test causes kernel to crash when handling page fault: ``` Unable to handle kernel paging request at virtual address ffffffffffff9468 CPU 3 aio(26027): Oops 0 pc = [<fffffc00004eddf8>] ra = [<fffffc00004edd5c>] ps = 0000 Not tainted pc is at sys_io_submit+0x108/0x200 ra is at sys_io_submit+0x6c/0x200 v0 = fffffc00c58e6300 t0 = fffffffffffffff2 t1 = 000002000025e000 t2 = fffffc01f159fef8 t3 = fffffc0001009640 t4 = fffffc0000e0f6e0 t5 = 0000020001002e9e t6 = 4c41564e49452031 t7 = fffffc01f159c000 s0 = 0000000000000002 s1 = 000002000025e000 s2 = 0000000000000000 s3 = 0000000000000000 s4 = 0000000000000000 s5 = fffffffffffffff2 s6 = fffffc00c58e6300 a0 = fffffc00c58e6300 a1 = 0000000000000000 a2 = 000002000025e000 a3 = 00000200001ac260 a4 = 00000200001ac1e8 a5 = 0000000000000001 t8 = 0000000000000008 t9 = 000000011f8bce30 t10= 00000200001ac440 t11= 0000000000000000 pv = fffffc00006fd320 at = 0000000000000000 gp = 0000000000000000 sp = 00000000265fd174 Disabling lock debugging due to kernel taint Trace: [<fffffc0000311404>] entSys+0xa4/0xc0 ``` Here `gp` has invalid value. `gp is s overwritten by a fixup for the following page fault handler in `io_submit` syscall handler: ``` __se_sys_io_submit ... ldq a1,0(t1) bne t0,4280 <__se_sys_io_submit+0x180> ``` After a page fault `t0` should contain -EFALUT and `a1` is 0. Instead `gp` was overwritten in place of `a1`. This happens due to a off-by-two bug in `dpf_reg()` for `r16-r18` (aka `a0-a2`). I think the bug went unnoticed for a long time as `gp` is one of scratch registers. Any kernel function call would re-calculate `gp`. Dmitry tracked down the bug origin back to 2.1.32 kernel version where trap_a{0,1,2} fields were inserted into struct pt_regs. And even before that `dpf_reg()` contained off-by-one error. Cc: Richard Henderson <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: [email protected] Cc: [email protected] Reported-and-reviewed-by: "Dmitry V. Levin" <[email protected]> Cc: [email protected] # v2.1.32+ Bug: https://bugs.gentoo.org/672040 Signed-off-by: Sergei Trofimovich <[email protected]> Signed-off-by: Matt Turner <[email protected]>
2019-02-10alpha: Fix Eiger NR_IRQS to 128Meelis Roos1-3/+3
Eiger machine vector definition has nr_irqs 128, and working 2.6.26 boot shows SCSI getting IRQ-s 64 and 65. Current kernel boot fails because Symbios SCSI fails to request IRQ-s and does not find the disks. It has been broken at least since 3.18 - the earliest I could test with my gcc-5. The headers have moved around and possibly another order of defines has worked in the past - but since 128 seems to be correct and used, fix arch/alpha/include/asm/irq.h to have NR_IRQS=128 for Eiger. This fixes 4.19-rc7 boot on my Force Flexor A264 (Eiger subarch). Cc: [email protected] # v3.18+ Signed-off-by: Meelis Roos <[email protected]> Signed-off-by: Matt Turner <[email protected]>
2019-02-07y2038: add 64-bit time_t syscalls to all 32-bit architecturesArnd Bergmann1-0/+2
This adds 21 new system calls on each ABI that has 32-bit time_t today. All of these have the exact same semantics as their existing counterparts, and the new ones all have macro names that end in 'time64' for clarification. This gets us to the point of being able to safely use a C library that has 64-bit time_t in user space. There are still a couple of loose ends to tie up in various areas of the code, but this is the big one, and should be entirely uncontroversial at this point. In particular, there are four system calls (getitimer, setitimer, waitid, and getrusage) that don't have a 64-bit counterpart yet, but these can all be safely implemented in the C library by wrapping around the existing system calls because the 32-bit time_t they pass only counts elapsed time, not time since the epoch. They will be dealt with later. Signed-off-by: Arnd Bergmann <[email protected]> Acked-by: Heiko Carstens <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> Acked-by: Catalin Marinas <[email protected]>
2019-02-07timex: use __kernel_timex internallyDeepa Dinamani1-2/+3
struct timex is not y2038 safe. Replace all uses of timex with y2038 safe __kernel_timex. Note that struct __kernel_timex is an ABI interface definition. We could define a new structure based on __kernel_timex that is only available internally instead. Right now, there isn't a strong motivation for this as the structure is isolated to a few defined struct timex interfaces and such a structure would be exactly the same as struct timex. The patch was generated by the following coccinelle script: virtual patch @depends on patch forall@ identifier ts; expression e; @@ ( - struct timex ts; + struct __kernel_timex ts; | - struct timex ts = {}; + struct __kernel_timex ts = {}; | - struct timex ts = e; + struct __kernel_timex ts = e; | - struct timex *ts; + struct __kernel_timex *ts; | (memset \| copy_from_user \| copy_to_user \)(..., - sizeof(struct timex)) + sizeof(struct __kernel_timex)) ) @depends on patch forall@ identifier ts; identifier fn; @@ fn(..., - struct timex *ts, + struct __kernel_timex *ts, ...) { ... } @depends on patch forall@ identifier ts; identifier fn; @@ fn(..., - struct timex *ts) { + struct __kernel_timex *ts) { ... } Signed-off-by: Deepa Dinamani <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Arnd Bergmann <[email protected]>
2019-02-03sock: Add SO_RCVTIMEO_NEW and SO_SNDTIMEO_NEWDeepa Dinamani1-3/+9
Add new socket timeout options that are y2038 safe. Signed-off-by: Deepa Dinamani <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: David S. Miller <[email protected]>
2019-02-03socket: Rename SO_RCVTIMEO/ SO_SNDTIMEO with _OLD suffixesDeepa Dinamani1-2/+5
SO_RCVTIMEO and SO_SNDTIMEO socket options use struct timeval as the time format. struct timeval is not y2038 safe. The subsequent patches in the series add support for new socket timeout options with _NEW suffix that will use y2038 safe data structures. Although the existing struct timeval layout is sufficiently wide to represent timeouts, because of the way libc will interpret time_t based on user defined flag, these new flags provide a way of having a structure that is the same for all architectures consistently. Rename the existing options with _OLD suffix forms so that the right option is enabled for userspace applications according to the architecture and time_t definition of libc. Signed-off-by: Deepa Dinamani <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: David S. Miller <[email protected]>
2019-02-03socket: Add SO_TIMESTAMPING_NEWDeepa Dinamani1-2/+3
Add SO_TIMESTAMPING_NEW variant of socket timestamp options. This is the y2038 safe versions of the SO_TIMESTAMPING_OLD for all architectures. Signed-off-by: Deepa Dinamani <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: David S. Miller <[email protected]>
2019-02-03socket: Add SO_TIMESTAMP[NS]_NEWDeepa Dinamani1-2/+12
Add SO_TIMESTAMP_NEW and SO_TIMESTAMPNS_NEW variants of socket timestamp options. These are the y2038 safe versions of the SO_TIMESTAMP_OLD and SO_TIMESTAMPNS_OLD for all architectures. Note that the format of scm_timestamping.ts[0] is not changed in this patch. Signed-off-by: Deepa Dinamani <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: David S. Miller <[email protected]>
2019-02-03sockopt: Rename SO_TIMESTAMP* to SO_TIMESTAMP*_OLDDeepa Dinamani1-7/+16
SO_TIMESTAMP, SO_TIMESTAMPNS and SO_TIMESTAMPING options, the way they are currently defined, are not y2038 safe. Subsequent patches in the series add new y2038 safe versions of these options which provide 64 bit timestamps on all architectures uniformly. Hence, rename existing options with OLD tag suffixes. Also note that kernel will not use the untagged SO_TIMESTAMP* and SCM_TIMESTAMP* options internally anymore. Signed-off-by: Deepa Dinamani <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: David S. Miller <[email protected]>
2019-01-25alpha: add generic get{eg,eu,g,p,u,pp}id() syscallsArnd Bergmann3-11/+8
Alpha has traditionally followed the OSF1 calling conventions here, with its getxpid, getxuid, getxgid system calls returning two different values in separate registers. Following what glibc has done here, we can define getpid, getuid and getgid to be aliases for getxpid, getxuid and getxgid respectively, and add new system call numbers for getppid, geteuid and getegid. Signed-off-by: Arnd Bergmann <[email protected]>
2019-01-25alpha: add standard statfs64/fstatfs64 syscallsArnd Bergmann1-0/+2
As Joseph Myers points out, alpha has never had a standard statfs64 interface and instead returns only 32-bit numbers here. While there is an old osf_statfs64 system call that returns additional data, this has some other quirks and does not get used in glibc. I considered making the stat64 structure layout compatible with with the one used by the kernel on most other 64 bit architecture that implement it (ia64, parisc, powerpc, and sparc), but in the end decided to stay with the one that was traditionally defined in the alpha headers but not used, since this is also what glibc exposes to user space. Signed-off-by: Arnd Bergmann <[email protected]>
2019-01-25arch: add pkey and rseq syscall numbers everywhereArnd Bergmann2-4/+4
Most architectures define system call numbers for the rseq and pkey system calls, even when they don't support the features, and perhaps never will. Only a few architectures are missing these, so just define them anyway for consistency. If we decide to add them later to one of these, the system call numbers won't get out of sync then. Signed-off-by: Arnd Bergmann <[email protected]> Acked-by: Heiko Carstens <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]>
2019-01-25ipc: rename old-style shmctl/semctl/msgctl syscallsArnd Bergmann1-3/+3
The behavior of these system calls is slightly different between architectures, as determined by the CONFIG_ARCH_WANT_IPC_PARSE_VERSION symbol. Most architectures that implement the split IPC syscalls don't set that symbol and only get the modern version, but alpha, arm, microblaze, mips-n32, mips-n64 and xtensa expect the caller to pass the IPC_64 flag. For the architectures that so far only implement sys_ipc(), i.e. m68k, mips-o32, powerpc, s390, sh, sparc, and x86-32, we want the new behavior when adding the split syscalls, so we need to distinguish between the two groups of architectures. The method I picked for this distinction is to have a separate system call entry point: sys_old_*ctl() now uses ipc_parse_version, while sys_*ctl() does not. The system call tables of the five architectures are changed accordingly. As an additional benefit, we no longer need the configuration specific definition for ipc_parse_version(), it always does the same thing now, but simply won't get called on architectures with the modern interface. A small downside is that on architectures that do set ARCH_WANT_IPC_PARSE_VERSION, we now have an extra set of entry points that are never called. They only add a few bytes of bloat, so it seems better to keep them compared to adding yet another Kconfig symbol. I considered adding new syscall numbers for the IPC_64 variants for consistency, but decided against that for now. Signed-off-by: Arnd Bergmann <[email protected]>
2019-01-25alpha: update syscall macro definitionsArnd Bergmann3-8/+7
Other architectures commonly use __NR_umount2 for sys_umount, only ia64 and alpha use __NR_umount here. In order to synchronize the generated tables, use umount2 like everyone else, and add back the old name from asm/unistd.h for compatibility. For shmat, alpha uses the osf_shmat name, we can do the same thing here, which means we don't have to add an entry in the __IGNORE list now that shmat is mandatory everywhere alarm, creat, pause, time, and utime are optional everywhere these days, no need to list them here any more. I considered also adding the regular versions of the get*id system calls that have different names and calling conventions on alpha, which would further help unify the syscall ABI, but for now I decided against that. Signed-off-by: Arnd Bergmann <[email protected]>
2019-01-25alpha: wire up io_pgetevents system callArnd Bergmann1-0/+1
The io_pgetevents system call was added in linux-4.18 but has no entry for alpha: warning: #warning syscall io_pgetevents not implemented [-Wcpp] Assign a the next system call number here. Cc: [email protected] Signed-off-by: Arnd Bergmann <[email protected]>
2019-01-21perf/core, arch/alpha: Strengthen exclusion checks with PERF_PMU_CAP_NO_EXCLUDEAndrew Murray1-6/+1
As the Alpha PMU doesn't support context exclusion let's advertise the PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will prevent us from handling events where any exclusion flags are set. Let's also remove the now unnecessary check for exclusion flags. This change means that __hw_perf_event_init will now also indicate that it doesn't support exclude_host and exclude_guest and will now implicitly return -EINVAL instead of -EPERM. This is likely more desirable as -EPERM will result in a kernel.perf_event_paranoid related warning from the perf userspace utility. Signed-off-by: Andrew Murray <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Russell King <[email protected]> Cc: Sascha Hauer <[email protected]> Cc: Shawn Guo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-01-17net: introduce SO_BINDTOIFINDEX sockoptDavid Herrmann1-0/+2
This introduces a new generic SOL_SOCKET-level socket option called SO_BINDTOIFINDEX. It behaves similar to SO_BINDTODEVICE, but takes a network interface index as argument, rather than the network interface name. User-space often refers to network-interfaces via their index, but has to temporarily resolve it to a name for a call into SO_BINDTODEVICE. This might pose problems when the network-device is renamed asynchronously by other parts of the system. When this happens, the SO_BINDTODEVICE might either fail, or worse, it might bind to the wrong device. In most cases user-space only ever operates on devices which they either manage themselves, or otherwise have a guarantee that the device name will not change (e.g., devices that are UP cannot be renamed). However, particularly in libraries this guarantee is non-obvious and it would be nice if that race-condition would simply not exist. It would make it easier for those libraries to operate even in situations where the device-name might change under the hood. A real use-case that we recently hit is trying to start the network stack early in the initrd but make it survive into the real system. Existing distributions rename network-interfaces during the transition from initrd into the real system. This, obviously, cannot affect devices that are up and running (unless you also consider moving them between network-namespaces). However, the network manager now has to make sure its management engine for dormant devices will not run in parallel to these renames. Particularly, when you offload operations like DHCP into separate processes, these might setup their sockets early, and thus have to resolve the device-name possibly running into this race-condition. By avoiding a call to resolve the device-name, we no longer depend on the name and can run network setup of dormant devices in parallel to the transition off the initrd. The SO_BINDTOIFINDEX ioctl plugs this race. Reviewed-by: Tom Gundersen <[email protected]> Signed-off-by: David Herrmann <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-01-06Merge tag 'kbuild-v4.21-3' of ↵Linus Torvalds1-7/+0
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull more Kbuild updates from Masahiro Yamada: - improve boolinit.cocci and use_after_iter.cocci semantic patches - fix alignment for kallsyms - move 'asm goto' compiler test to Kconfig and clean up jump_label CONFIG option - generate asm-generic wrappers automatically if arch does not implement mandatory UAPI headers - remove redundant generic-y defines - misc cleanups * tag 'kbuild-v4.21-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kconfig: rename generated .*conf-cfg to *conf-cfg kbuild: remove unnecessary stubs for archheader and archscripts kbuild: use assignment instead of define ... endef for filechk_* rules arch: remove redundant UAPI generic-y defines kbuild: generate asm-generic wrappers if mandatory headers are missing arch: remove stale comments "UAPI Header export list" riscv: remove redundant kernel-space generic-y kbuild: change filechk to surround the given command with { } kbuild: remove redundant target cleaning on failure kbuild: clean up rule_dtc_dt_yaml kbuild: remove UIMAGE_IN and UIMAGE_OUT jump_label: move 'asm goto' support test to Kconfig kallsyms: lower alignment on ARM scripts: coccinelle: boolinit: drop warnings on named constants scripts: coccinelle: check for redeclaration kconfig: remove unused "file" field of yylval union nds32: remove redundant kernel-space generic-y nios2: remove unneeded HAS_DMA define
2019-01-06Fix 'acccess_ok()' on alpha and SHLinus Torvalds1-3/+5
Commit 594cc251fdd0 ("make 'user_access_begin()' do 'access_ok()'") broke both alpha and SH booting in qemu, as noticed by Guenter Roeck. It turns out that the bug wasn't actually in that commit itself (which would have been surprising: it was mostly a no-op), but in how the addition of access_ok() to the strncpy_from_user() and strnlen_user() functions now triggered the case where those functions would test the access of the very last byte of the user address space. The string functions actually did that user range test before too, but they did it manually by just comparing against user_addr_max(). But with user_access_begin() doing the check (using "access_ok()"), it now exposed problems in the architecture implementations of that function. For example, on alpha, the access_ok() helper macro looked like this: #define __access_ok(addr, size) \ ((get_fs().seg & (addr | size | (addr+size))) == 0) and what it basically tests is of any of the high bits get set (the USER_DS masking value is 0xfffffc0000000000). And that's completely wrong for the "addr+size" check. Because it's off-by-one for the case where we check to the very end of the user address space, which is exactly what the strn*_user() functions do. Why? Because "addr+size" will be exactly the size of the address space, so trying to access the last byte of the user address space will fail the __access_ok() check, even though it shouldn't. As a result, the user string accessor functions failed consistently - because they literally don't know how long the string is going to be, and the max access is going to be that last byte of the user address space. Side note: that alpha macro is buggy for another reason too - it re-uses the arguments twice. And SH has another version of almost the exact same bug: #define __addr_ok(addr) \ ((unsigned long __force)(addr) < current_thread_info()->addr_limit.seg) so far so good: yes, a user address must be below the limit. But then: #define __access_ok(addr, size) \ (__addr_ok((addr) + (size))) is wrong with the exact same off-by-one case: the case when "addr+size" is exactly _equal_ to the limit is actually perfectly fine (think "one byte access at the last address of the user address space") The SH version is actually seriously buggy in another way: it doesn't actually check for overflow, even though it did copy the _comment_ that talks about overflow. So it turns out that both SH and alpha actually have completely buggy implementations of access_ok(), but they happened to work in practice (although the SH overflow one is a serious serious security bug, not that anybody likely cares about SH security). This fixes the problems by using a similar macro on both alpha and SH. It isn't trying to be clever, the end address is based on this logic: unsigned long __ao_end = __ao_a + __ao_b - !!__ao_b; which basically says "add start and length, and then subtract one unless the length was zero". We can't subtract one for a zero length, or we'd just hit an underflow instead. For a lot of access_ok() users the length is a constant, so this isn't actually as expensive as it initially looks. Reported-and-tested-by: Guenter Roeck <[email protected]> Cc: Matt Turner <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-01-06arch: remove redundant UAPI generic-y definesMasahiro Yamada1-6/+0
Now that Kbuild automatically creates asm-generic wrappers for missing mandatory headers, it is redundant to list the same headers in generic-y and mandatory-y. Suggested-by: Sam Ravnborg <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]> Acked-by: Sam Ravnborg <[email protected]>
2019-01-06arch: remove stale comments "UAPI Header export list"Masahiro Yamada1-1/+0
These comments are leftovers of commit fcc8487d477a ("uapi: export all headers under uapi directories"). Prior to that commit, exported headers must be explicitly added to header-y. Now, all headers under the uapi/ directories are exported. Signed-off-by: Masahiro Yamada <[email protected]>
2019-01-05Merge branch 'akpm' (patches from Andrew)Linus Torvalds2-5/+5
Merge more updates from Andrew Morton: - procfs updates - various misc bits - lib/ updates - epoll updates - autofs - fatfs - a few more MM bits * emailed patches from Andrew Morton <[email protected]>: (58 commits) mm/page_io.c: fix polled swap page in checkpatch: add Co-developed-by to signature tags docs: fix Co-Developed-by docs drivers/base/platform.c: kmemleak ignore a known leak fs: don't open code lru_to_page() fs/: remove caller signal_pending branch predictions mm/: remove caller signal_pending branch predictions arch/arc/mm/fault.c: remove caller signal_pending_branch predictions kernel/sched/: remove caller signal_pending branch predictions kernel/locking/mutex.c: remove caller signal_pending branch predictions mm: select HAVE_MOVE_PMD on x86 for faster mremap mm: speed up mremap by 20x on large regions mm: treewide: remove unused address argument from pte_alloc functions initramfs: cleanup incomplete rootfs scripts/gdb: fix lx-version string output kernel/kcov.c: mark write_comp_data() as notrace kernel/sysctl: add panic_print into sysctl panic: add options to print system info when panic happens bfs: extra sanity checking and static inode bitmap exec: separate MM_ANONPAGES and RLIMIT_STACK accounting ...
2019-01-04mm: treewide: remove unused address argument from pte_alloc functionsJoel Fernandes (Google)1-3/+3
Patch series "Add support for fast mremap". This series speeds up the mremap(2) syscall by copying page tables at the PMD level even for non-THP systems. There is concern that the extra 'address' argument that mremap passes to pte_alloc may do something subtle architecture related in the future that may make the scheme not work. Also we find that there is no point in passing the 'address' to pte_alloc since its unused. This patch therefore removes this argument tree-wide resulting in a nice negative diff as well. Also ensuring along the way that the enabled architectures do not do anything funky with the 'address' argument that goes unnoticed by the optimization. Build and boot tested on x86-64. Build tested on arm64. The config enablement patch for arm64 will be posted in the future after more testing. The changes were obtained by applying the following Coccinelle script. (thanks Julia for answering all Coccinelle questions!). Following fix ups were done manually: * Removal of address argument from pte_fragment_alloc * Removal of pte_alloc_one_fast definitions from m68k and microblaze. // Options: --include-headers --no-includes // Note: I split the 'identifier fn' line, so if you are manually // running it, please unsplit it so it runs for you. virtual patch @pte_alloc_func_def depends on patch exists@ identifier E2; identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; type T2; @@ fn(... - , T2 E2 ) { ... } @pte_alloc_func_proto_noarg depends on patch exists@ type T1, T2, T3, T4; identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; @@ ( - T3 fn(T1, T2); + T3 fn(T1); | - T3 fn(T1, T2, T4); + T3 fn(T1, T2); ) @pte_alloc_func_proto depends on patch exists@ identifier E1, E2, E4; type T1, T2, T3, T4; identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; @@ ( - T3 fn(T1 E1, T2 E2); + T3 fn(T1 E1); | - T3 fn(T1 E1, T2 E2, T4 E4); + T3 fn(T1 E1, T2 E2); ) @pte_alloc_func_call depends on patch exists@ expression E2; identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; @@ fn(... -, E2 ) @pte_alloc_macro depends on patch exists@ identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; identifier a, b, c; expression e; position p; @@ ( - #define fn(a, b, c) e + #define fn(a, b) e | - #define fn(a, b) e + #define fn(a) e ) Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Joel Fernandes (Google) <[email protected]> Suggested-by: Kirill A. Shutemov <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Julia Lawall <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: William Kucharski <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-01-04fls: change parameter to unsigned intMatthew Wilcox1-2/+2
When testing in userspace, UBSAN pointed out that shifting into the sign bit is undefined behaviour. It doesn't really make sense to ask for the highest set bit of a negative value, so just turn the argument type into an unsigned int. Some architectures (eg ppc) already had it declared as an unsigned int, so I don't expect too many problems. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-01-03Remove 'type' argument from access_ok() functionLinus Torvalds4-9/+9
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument of the user address range verification function since we got rid of the old racy i386-only code to walk page tables by hand. It existed because the original 80386 would not honor the write protect bit when in kernel mode, so you had to do COW by hand before doing any user access. But we haven't supported that in a long time, and these days the 'type' argument is a purely historical artifact. A discussion about extending 'user_access_begin()' to do the range checking resulted this patch, because there is no way we're going to move the old VERIFY_xyz interface to that model. And it's best done at the end of the merge window when I've done most of my merges, so let's just get this done once and for all. This patch was mostly done with a sed-script, with manual fix-ups for the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form. There were a couple of notable cases: - csky still had the old "verify_area()" name as an alias. - the iter_iov code had magical hardcoded knowledge of the actual values of VERIFY_{READ,WRITE} (not that they mattered, since nothing really used it) - microblaze used the type argument for a debug printout but other than those oddities this should be a total no-op patch. I tried to fix up all architectures, did fairly extensive grepping for access_ok() uses, and the changes are trivial, but I may have missed something. Any missed conversion should be trivially fixable, though. Signed-off-by: Linus Torvalds <[email protected]>
2018-12-31Merge branch 'for-linus' of ↵Linus Torvalds14-1043/+609
git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha Pull alpha architecture updates from Matt Turner: "A few small changes for alpha as well as the new system call table generation support from Firoz Khan" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha: alpha: Remove some unused variables alpha: rtc: simplify alpha_rtc_init alpha: Fix a typo on ptrace.h alpha: fix spelling mistake QSD_PORT_ACTUVE -> QSD_PORT_ACTIVE alpha: generate uapi header and syscall table header files alpha: add system call table generation support alpha: add __NR_syscalls along with NR_SYSCALLS alpha: remove CONFIG_OSF4_COMPAT flag from syscall table alpha: move __IGNORE* entries to non uapi header
2018-12-29Merge tag 'kconfig-v4.21-2' of ↵Linus Torvalds1-28/+12
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kconfig file consolidation from Masahiro Yamada: "Consolidation of bus (PCI, PCMCIA, EISA, RapidIO) config entries by Christoph Hellwig. Currently, every architecture that wants to provide common peripheral busses needs to add some boilerplate code and include the right Kconfig files. This series instead just selects the presence (when needed) and then handles everything in the bus-specific Kconfig file under drivers/" * tag 'kconfig-v4.21-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: pcmcia: remove per-arch PCMCIA config entry eisa: consolidate EISA Kconfig entry in drivers/eisa rapidio: consolidate RAPIDIO config entry in drivers/rapidio pcmcia: allow PCMCIA support independent of the architecture PCI: consolidate the PCI_SYSCALL symbol PCI: consolidate the PCI_DOMAINS and PCI_DOMAINS_GENERIC config options PCI: consolidate PCI config entry in drivers/pci MIPS: remove the HT_PCI config option
2018-12-28Merge tag 'dma-mapping-4.21' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds3-13/+7
Pull DMA mapping updates from Christoph Hellwig: "A huge update this time, but a lot of that is just consolidating or removing code: - provide a common DMA_MAPPING_ERROR definition and avoid indirect calls for dma_map_* error checking - use direct calls for the DMA direct mapping case, avoiding huge retpoline overhead for high performance workloads - merge the swiotlb dma_map_ops into dma-direct - provide a generic remapping DMA consistent allocator for architectures that have devices that perform DMA that is not cache coherent. Based on the existing arm64 implementation and also used for csky now. - improve the dma-debug infrastructure, including dynamic allocation of entries (Robin Murphy) - default to providing chaining scatterlist everywhere, with opt-outs for the few architectures (alpha, parisc, most arm32 variants) that can't cope with it - misc sparc32 dma-related cleanups - remove the dma_mark_clean arch hook used by swiotlb on ia64 and replace it with the generic noncoherent infrastructure - fix the return type of dma_set_max_seg_size (Niklas Söderlund) - move the dummy dma ops for not DMA capable devices from arm64 to common code (Robin Murphy) - ensure dma_alloc_coherent returns zeroed memory to avoid kernel data leaks through userspace. We already did this for most common architectures, but this ensures we do it everywhere. dma_zalloc_coherent has been deprecated and can hopefully be removed after -rc1 with a coccinelle script" * tag 'dma-mapping-4.21' of git://git.infradead.org/users/hch/dma-mapping: (73 commits) dma-mapping: fix inverted logic in dma_supported dma-mapping: deprecate dma_zalloc_coherent dma-mapping: zero memory returned from dma_alloc_* sparc/iommu: fix ->map_sg return value sparc/io-unit: fix ->map_sg return value arm64: default to the direct mapping in get_arch_dma_ops PCI: Remove unused attr variable in pci_dma_configure ia64: only select ARCH_HAS_DMA_COHERENT_TO_PFN if swiotlb is enabled dma-mapping: bypass indirect calls for dma-direct vmd: use the proper dma_* APIs instead of direct methods calls dma-direct: merge swiotlb_dma_ops into the dma_direct code dma-direct: use dma_direct_map_page to implement dma_direct_map_sg dma-direct: improve addressability error reporting swiotlb: remove dma_mark_clean swiotlb: remove SWIOTLB_MAP_ERROR ACPI / scan: Refactor _CCA enforcement dma-mapping: factor out dummy DMA ops dma-mapping: always build the direct mapping code dma-mapping: move dma_cache_sync out of line dma-mapping: move various slow path functions out of line ...