aboutsummaryrefslogtreecommitdiff
path: root/include/uapi
AgeCommit message (Collapse)AuthorFilesLines
2015-09-04userfaultfd: change the read API to return a uffd_msgAndrea Arcangeli1-15/+55
I had requests to return the full address (not the page aligned one) to userland. It's not entirely clear how the page offset could be relevant because userfaults aren't like SIGBUS that can sigjump to a different place and it actually skip resolving the fault depending on a page offset. There's currently no real way to skip the fault especially because after a UFFDIO_COPY|ZEROPAGE, the fault is optimized to be retried within the kernel without having to return to userland first (not even self modifying code replacing the .text that touched the faulting address would prevent the fault to be repeated). Userland cannot skip repeating the fault even more so if the fault was triggered by a KVM secondary page fault or any get_user_pages or any copy-user inside some syscall which will return to kernel code. The second time FAULT_FLAG_RETRY_NOWAIT won't be set leading to a SIGBUS being raised because the userfault can't wait if it cannot release the mmap_map first (and FAULT_FLAG_RETRY_NOWAIT is required for that). Still returning userland a proper structure during the read() on the uffd, can allow to use the current UFFD_API for the future non-cooperative extensions too and it looks cleaner as well. Once we get additional fields there's no point to return the fault address page aligned anymore to reuse the bits below PAGE_SHIFT. The only downside is that the read() syscall will read 32bytes instead of 8bytes but that's not going to be measurable overhead. The total number of new events that can be extended or of new future bits for already shipped events, is limited to 64 by the features field of the uffdio_api structure. If more will be needed a bump of UFFD_API will be required. [[email protected]: use __packed] Signed-off-by: Andrea Arcangeli <[email protected]> Acked-by: Pavel Emelyanov <[email protected]> Cc: Sanidhya Kashyap <[email protected]> Cc: [email protected] Cc: "Kirill A. Shutemov" <[email protected]> Cc: Andres Lagar-Cavilla <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Peter Feiner <[email protected]> Cc: "Dr. David Alan Gilbert" <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: "Huangpeng (Peter)" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-09-04userfaultfd: Rename uffd_api.bits into .featuresPavel Emelyanov1-3/+9
This is (seems to be) the minimal thing that is required to unblock standard uffd usage from the non-cooperative one. Now more bits can be added to the features field indicating e.g. UFFD_FEATURE_FORK and others needed for the latter use-case. Signed-off-by: Pavel Emelyanov <[email protected]> Signed-off-by: Andrea Arcangeli <[email protected]> Cc: Sanidhya Kashyap <[email protected]> Cc: [email protected] Cc: "Kirill A. Shutemov" <[email protected]> Cc: Andres Lagar-Cavilla <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Peter Feiner <[email protected]> Cc: "Dr. David Alan Gilbert" <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: "Huangpeng (Peter)" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-09-04userfaultfd: uAPIAndrea Arcangeli2-0/+84
Defines the uAPI of the userfaultfd, notably the ioctl numbers and protocol. Signed-off-by: Andrea Arcangeli <[email protected]> Acked-by: Pavel Emelyanov <[email protected]> Cc: Sanidhya Kashyap <[email protected]> Cc: [email protected] Cc: "Kirill A. Shutemov" <[email protected]> Cc: Andres Lagar-Cavilla <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Peter Feiner <[email protected]> Cc: "Dr. David Alan Gilbert" <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: "Huangpeng (Peter)" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-09-04capabilities: add a securebit to disable PR_CAP_AMBIENT_RAISEAndy Lutomirski1-1/+10
Per Andrew Morgan's request, add a securebit to allow admins to disable PR_CAP_AMBIENT_RAISE. This securebit will prevent processes from adding capabilities to their ambient set. For simplicity, this disables PR_CAP_AMBIENT_RAISE entirely rather than just disabling setting previously cleared bits. Signed-off-by: Andy Lutomirski <[email protected]> Acked-by: Andrew G. Morgan <[email protected]> Acked-by: Serge Hallyn <[email protected]> Cc: Kees Cook <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Serge Hallyn <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Aaron Jones <[email protected]> Cc: Ted Ts'o <[email protected]> Cc: Andrew G. Morgan <[email protected]> Cc: Mimi Zohar <[email protected]> Cc: Austin S Hemmelgarn <[email protected]> Cc: Markku Savela <[email protected]> Cc: Jarkko Sakkinen <[email protected]> Cc: Michael Kerrisk <[email protected]> Cc: James Morris <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-09-04capabilities: ambient capabilitiesAndy Lutomirski1-0/+7
Credit where credit is due: this idea comes from Christoph Lameter with a lot of valuable input from Serge Hallyn. This patch is heavily based on Christoph's patch. ===== The status quo ===== On Linux, there are a number of capabilities defined by the kernel. To perform various privileged tasks, processes can wield capabilities that they hold. Each task has four capability masks: effective (pE), permitted (pP), inheritable (pI), and a bounding set (X). When the kernel checks for a capability, it checks pE. The other capability masks serve to modify what capabilities can be in pE. Any task can remove capabilities from pE, pP, or pI at any time. If a task has a capability in pP, it can add that capability to pE and/or pI. If a task has CAP_SETPCAP, then it can add any capability to pI, and it can remove capabilities from X. Tasks are not the only things that can have capabilities; files can also have capabilities. A file can have no capabilty information at all [1]. If a file has capability information, then it has a permitted mask (fP) and an inheritable mask (fI) as well as a single effective bit (fE) [2]. File capabilities modify the capabilities of tasks that execve(2) them. A task that successfully calls execve has its capabilities modified for the file ultimately being excecuted (i.e. the binary itself if that binary is ELF or for the interpreter if the binary is a script.) [3] In the capability evolution rules, for each mask Z, pZ represents the old value and pZ' represents the new value. The rules are: pP' = (X & fP) | (pI & fI) pI' = pI pE' = (fE ? pP' : 0) X is unchanged For setuid binaries, fP, fI, and fE are modified by a moderately complicated set of rules that emulate POSIX behavior. Similarly, if euid == 0 or ruid == 0, then fP, fI, and fE are modified differently (primary, fP and fI usually end up being the full set). For nonroot users executing binaries with neither setuid nor file caps, fI and fP are empty and fE is false. As an extra complication, if you execute a process as nonroot and fE is set, then the "secure exec" rules are in effect: AT_SECURE gets set, LD_PRELOAD doesn't work, etc. This is rather messy. We've learned that making any changes is dangerous, though: if a new kernel version allows an unprivileged program to change its security state in a way that persists cross execution of a setuid program or a program with file caps, this persistent state is surprisingly likely to allow setuid or file-capped programs to be exploited for privilege escalation. ===== The problem ===== Capability inheritance is basically useless. If you aren't root and you execute an ordinary binary, fI is zero, so your capabilities have no effect whatsoever on pP'. This means that you can't usefully execute a helper process or a shell command with elevated capabilities if you aren't root. On current kernels, you can sort of work around this by setting fI to the full set for most or all non-setuid executable files. This causes pP' = pI for nonroot, and inheritance works. No one does this because it's a PITA and it isn't even supported on most filesystems. If you try this, you'll discover that every nonroot program ends up with secure exec rules, breaking many things. This is a problem that has bitten many people who have tried to use capabilities for anything useful. ===== The proposed change ===== This patch adds a fifth capability mask called the ambient mask (pA). pA does what most people expect pI to do. pA obeys the invariant that no bit can ever be set in pA if it is not set in both pP and pI. Dropping a bit from pP or pI drops that bit from pA. This ensures that existing programs that try to drop capabilities still do so, with a complication. Because capability inheritance is so broken, setting KEEPCAPS, using setresuid to switch to nonroot uids, and then calling execve effectively drops capabilities. Therefore, setresuid from root to nonroot conditionally clears pA unless SECBIT_NO_SETUID_FIXUP is set. Processes that don't like this can re-add bits to pA afterwards. The capability evolution rules are changed: pA' = (file caps or setuid or setgid ? 0 : pA) pP' = (X & fP) | (pI & fI) | pA' pI' = pI pE' = (fE ? pP' : pA') X is unchanged If you are nonroot but you have a capability, you can add it to pA. If you do so, your children get that capability in pA, pP, and pE. For example, you can set pA = CAP_NET_BIND_SERVICE, and your children can automatically bind low-numbered ports. Hallelujah! Unprivileged users can create user namespaces, map themselves to a nonzero uid, and create both privileged (relative to their namespace) and unprivileged process trees. This is currently more or less impossible. Hallelujah! You cannot use pA to try to subvert a setuid, setgid, or file-capped program: if you execute any such program, pA gets cleared and the resulting evolution rules are unchanged by this patch. Users with nonzero pA are unlikely to unintentionally leak that capability. If they run programs that try to drop privileges, dropping privileges will still work. It's worth noting that the degree of paranoia in this patch could possibly be reduced without causing serious problems. Specifically, if we allowed pA to persist across executing non-pA-aware setuid binaries and across setresuid, then, naively, the only capabilities that could leak as a result would be the capabilities in pA, and any attacker *already* has those capabilities. This would make me nervous, though -- setuid binaries that tried to privilege-separate might fail to do so, and putting CAP_DAC_READ_SEARCH or CAP_DAC_OVERRIDE into pA could have unexpected side effects. (Whether these unexpected side effects would be exploitable is an open question.) I've therefore taken the more paranoid route. We can revisit this later. An alternative would be to require PR_SET_NO_NEW_PRIVS before setting ambient capabilities. I think that this would be annoying and would make granting otherwise unprivileged users minor ambient capabilities (CAP_NET_BIND_SERVICE or CAP_NET_RAW for example) much less useful than it is with this patch. ===== Footnotes ===== [1] Files that are missing the "security.capability" xattr or that have unrecognized values for that xattr end up with has_cap set to false. The code that does that appears to be complicated for no good reason. [2] The libcap capability mask parsers and formatters are dangerously misleading and the documentation is flat-out wrong. fE is *not* a mask; it's a single bit. This has probably confused every single person who has tried to use file capabilities. [3] Linux very confusingly processes both the script and the interpreter if applicable, for reasons that elude me. The results from thinking about a script's file capabilities and/or setuid bits are mostly discarded. Preliminary userspace code is here, but it needs updating: https://git.kernel.org/cgit/linux/kernel/git/luto/util-linux-playground.git/commit/?h=cap_ambient&id=7f5afbd175d2 Here is a test program that can be used to verify the functionality (from Christoph): /* * Test program for the ambient capabilities. This program spawns a shell * that allows running processes with a defined set of capabilities. * * (C) 2015 Christoph Lameter <[email protected]> * Released under: GPL v3 or later. * * * Compile using: * * gcc -o ambient_test ambient_test.o -lcap-ng * * This program must have the following capabilities to run properly: * Permissions for CAP_NET_RAW, CAP_NET_ADMIN, CAP_SYS_NICE * * A command to equip the binary with the right caps is: * * setcap cap_net_raw,cap_net_admin,cap_sys_nice+p ambient_test * * * To get a shell with additional caps that can be inherited by other processes: * * ./ambient_test /bin/bash * * * Verifying that it works: * * From the bash spawed by ambient_test run * * cat /proc/$$/status * * and have a look at the capabilities. */ #include <stdlib.h> #include <stdio.h> #include <errno.h> #include <cap-ng.h> #include <sys/prctl.h> #include <linux/capability.h> /* * Definitions from the kernel header files. These are going to be removed * when the /usr/include files have these defined. */ #define PR_CAP_AMBIENT 47 #define PR_CAP_AMBIENT_IS_SET 1 #define PR_CAP_AMBIENT_RAISE 2 #define PR_CAP_AMBIENT_LOWER 3 #define PR_CAP_AMBIENT_CLEAR_ALL 4 static void set_ambient_cap(int cap) { int rc; capng_get_caps_process(); rc = capng_update(CAPNG_ADD, CAPNG_INHERITABLE, cap); if (rc) { printf("Cannot add inheritable cap\n"); exit(2); } capng_apply(CAPNG_SELECT_CAPS); /* Note the two 0s at the end. Kernel checks for these */ if (prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_RAISE, cap, 0, 0)) { perror("Cannot set cap"); exit(1); } } int main(int argc, char **argv) { int rc; set_ambient_cap(CAP_NET_RAW); set_ambient_cap(CAP_NET_ADMIN); set_ambient_cap(CAP_SYS_NICE); printf("Ambient_test forking shell\n"); if (execv(argv[1], argv + 1)) perror("Cannot exec"); return 0; } Signed-off-by: Christoph Lameter <[email protected]> # Original author Signed-off-by: Andy Lutomirski <[email protected]> Acked-by: Serge E. Hallyn <[email protected]> Acked-by: Kees Cook <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Aaron Jones <[email protected]> Cc: Ted Ts'o <[email protected]> Cc: Andrew G. Morgan <[email protected]> Cc: Mimi Zohar <[email protected]> Cc: Austin S Hemmelgarn <[email protected]> Cc: Markku Savela <[email protected]> Cc: Jarkko Sakkinen <[email protected]> Cc: Michael Kerrisk <[email protected]> Cc: James Morris <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-09-04Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds3-6/+55
Pull drm updates from Dave Airlie: "This is the main pull request for the drm for 4.3. Nouveau is probably the biggest amount of changes in here, since it missed 4.2. Highlights below, along with the usual bunch of fixes. All stuff outside drm should have applicable acks. Highlights: - new drivers: freescale dcu kms driver - core: more atomic fixes disable some dri1 interfaces on kms drivers drop fb panic handling, this was just getting more broken, as more locking was required. new core fbdev Kconfig support - instead of each driver enable/disabling it struct_mutex cleanups - panel: more new panels cleanup Kconfig - i915: Skylake support enabled by default legacy modesetting using atomic infrastructure Skylake fixes GEN9 workarounds - amdgpu: Fiji support CGS support for amdgpu Initial GPU scheduler - off by default Lots of bug fixes and optimisations. - radeon: DP fixes misc fixes - amdkfd: Add Carrizo support for amdkfd using amdgpu. - nouveau: long pending cleanup to complete driver, fully bisectable which makes it larger, perfmon work more reclocking improvements maxwell displayport fixes - vmwgfx: new DX device support, supports OpenGL 3.3 screen targets support - mgag200: G200eW support G200e new revision support - msm: dragonboard 410c support, msm8x94 support, msm8x74v1 support yuv format support dma plane support mdp5 rotation initial hdcp - sti: atomic support - exynos: lots of cleanups atomic modesetting/pageflipping support render node support - tegra: tegra210 support (dc, dsi, dp/hdmi) dpms with atomic modesetting support - atmel: support for 3 more atmel SoCs new input formats, PRIME support. - dwhdmi: preparing to add audio support - rockchip: yuv plane support" * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (1369 commits) drm/amdgpu: rename gmc_v8_0_init_compute_vmid drm/amdgpu: fix vce3 instance handling drm/amdgpu: remove ib test for the second VCE Ring drm/amdgpu: properly enable VM fault interrupts drm/amdgpu: fix warning in scheduler drm/amdgpu: fix buffer placement under memory pressure drm/amdgpu/cz: fix cz_dpm_update_low_memory_pstate logic drm/amdgpu: fix typo in dce11 watermark setup drm/amdgpu: fix typo in dce10 watermark setup drm/amdgpu: use top down allocation for non-CPU accessible vram drm/amdgpu: be explicit about cpu vram access for driver BOs (v2) drm/amdgpu: set MEC doorbell range for Fiji drm/amdgpu: implement burst NOP for SDMA drm/amdgpu: add insert_nop ring func and default implementation drm/amdgpu: add amdgpu_get_sdma_instance helper function drm/amdgpu: add AMDGPU_MAX_SDMA_INSTANCES drm/amdgpu: add burst_nop flag for sdma drm/amdgpu: add count field for the SDMA NOP packet v2 drm/amdgpu: use PT for VM sync on unmap drm/amdgpu: make wait_event uninterruptible in push_job ...
2015-09-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tileLinus Torvalds2-0/+5
Pull tile updates from Chris Metcalf: "This includes secure computing support as well as miscellaneous minor improvements" * git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: tile: correct some typos in opcode type names tile/vdso: emit a GNU hash as well tile: Remove finish_arch_switch tile: enable full SECCOMP support tile/time: Migrate to new 'set-state' interface
2015-09-03Merge tag 'powerpc-4.3-1' of ↵Linus Torvalds1-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - support "hybrid" iommu/direct DMA ops for coherent_mask < dma_mask from Benjamin Herrenschmidt - EEH fixes for SRIOV from Gavin - introduce rtas_get_sensor_fast() for IRQ handlers from Thomas Huth - use hardware RNG for arch_get_random_seed_* not arch_get_random_* from Paul Mackerras - seccomp filter support from Michael Ellerman - opal_cec_reboot2() handling for HMIs & machine checks from Mahesh Salgaonkar - add powerpc timebase as a trace clock source from Naveen N. Rao - misc cleanups in the xmon, signal & SLB code from Anshuman Khandual - add an inline function to update POWER8 HID0 from Gautham R. Shenoy - fix pte_pagesize_index() crash on 4K w/64K hash from Michael Ellerman - drop support for 64K local store on 4K kernels from Michael Ellerman - move dma_get_required_mask() from pnv_phb to pci_controller_ops from Andrew Donnellan - initialize distance lookup table from drconf path from Nikunj A Dadhania - enable RTC class support from Vaibhav Jain - disable automatically blocked PCI config from Gavin Shan - add LEDs driver for PowerNV platform from Vasant Hegde - fix endianness issues in the HVSI driver from Laurent Dufour - kexec endian fixes from Samuel Mendoza-Jonas - fix corrupted pdn list from Gavin Shan - fix fenced PHB caused by eeh_slot_error_detail() from Gavin Shan - Freescale updates from Scott: Highlights include 32-bit memcpy/memset optimizations, checksum optimizations, 85xx config fragments and updates, device tree updates, e6500 fixes for non-SMP, and misc cleanup and minor fixes. - a ton of cxl updates & fixes: - add explicit precision specifiers from Rasmus Villemoes - use more common format specifier from Rasmus Villemoes - destroy cxl_adapter_idr on module_exit from Johannes Thumshirn - destroy afu->contexts_idr on release of an afu from Johannes Thumshirn - compile with -Werror from Daniel Axtens - EEH support from Daniel Axtens - plug irq_bitmap getting leaked in cxl_context from Vaibhav Jain - add alternate MMIO error handling from Ian Munsie - allow release of contexts which have been OPENED but not STARTED from Andrew Donnellan - remove use of macro DEFINE_PCI_DEVICE_TABLE from Vaishali Thakkar - release irqs if memory allocation fails from Vaibhav Jain - remove racy attempt to force EEH invocation in reset from Daniel Axtens - fix + cleanup error paths in cxl_dev_context_init from Ian Munsie - fix force unmapping mmaps of contexts allocated through the kernel api from Ian Munsie - set up and enable PSL Timebase from Philippe Bergheaud * tag 'powerpc-4.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (140 commits) cxl: Set up and enable PSL Timebase cxl: Fix force unmapping mmaps of contexts allocated through the kernel api cxl: Fix + cleanup error paths in cxl_dev_context_init powerpc/eeh: Fix fenced PHB caused by eeh_slot_error_detail() powerpc/pseries: Cleanup on pci_dn_reconfig_notifier() powerpc/pseries: Fix corrupted pdn list powerpc/powernv: Enable LEDS support powerpc/iommu: Set default DMA offset in dma_dev_setup cxl: Remove racy attempt to force EEH invocation in reset cxl: Release irqs if memory allocation fails cxl: Remove use of macro DEFINE_PCI_DEVICE_TABLE powerpc/powernv: Fix mis-merge of OPAL support for LEDS driver powerpc/powernv: Reset HILE before kexec_sequence() powerpc/kexec: Reset secondary cpu endianness before kexec powerpc/hvsi: Fix endianness issues in the HVSI driver leds/powernv: Add driver for PowerNV platform powerpc/powernv: Create LED platform device powerpc/powernv: Add OPAL interfaces for accessing and modifying system LED states powerpc/powernv: Fix the log message when disabling VF cxl: Allow release of contexts which have been OPENED but not STARTED ...
2015-09-03Merge tag 'dlm-4.3' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates from David Teigland: "This set mainly includes a change to the way the dlm uses the SCTP API in the kernel, removing the direct dependency on the sctp module. Other odd SCTP-related fixes are also included. The other notable fix is for a long standing regression in the behavior of lock value blocks for user space locks" * tag 'dlm-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: print error from kernel_sendpage dlm: fix lvb copy for user locks dlm: sctp_accept_from_sock() can be static dlm: fix reconnecting but not sending data dlm: replace BUG_ON with a less severe handling dlm: use sctp 1-to-1 API dlm: fix not reconnecting on connecting error handling dlm: fix race while closing connections dlm: fix connection stealing if using SCTP
2015-09-03IB/hfi1: Add PSM2 user space header to header_installIra Weiny2-0/+3
When the hfi1 driver was added a user space header file (hfi1_user.h) was added to be shared between PSM2 and the driver. However, the file was not added to the header install. Add it now. Fixes: d4ab347005fb ("IB/core: Add core header changes needed for OPA") Signed-off-by: Ira Weiny <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2015-09-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds25-8/+267
Pull networking updates from David Miller: "Another merge window, another set of networking changes. I've heard rumblings that the lightweight tunnels infrastructure has been voted networking change of the year. But what do I know? 1) Add conntrack support to openvswitch, from Joe Stringer. 2) Initial support for VRF (Virtual Routing and Forwarding), which allows the segmentation of routing paths without using multiple devices. There are some semantic kinks to work out still, but this is a reasonably strong foundation. From David Ahern. 3) Remove spinlock fro act_bpf fast path, from Alexei Starovoitov. 4) Ignore route nexthops with a link down state in ipv6, just like ipv4. From Andy Gospodarek. 5) Remove spinlock from fast path of act_gact and act_mirred, from Eric Dumazet. 6) Document the DSA layer, from Florian Fainelli. 7) Add netconsole support to bcmgenet, systemport, and DSA. Also from Florian Fainelli. 8) Add Mellanox Switch Driver and core infrastructure, from Jiri Pirko. 9) Add support for "light weight tunnels", which allow for encapsulation and decapsulation without bearing the overhead of a full blown netdevice. From Thomas Graf, Jiri Benc, and a cast of others. 10) Add Identifier Locator Addressing support for ipv6, from Tom Herbert. 11) Support fragmented SKBs in iwlwifi, from Johannes Berg. 12) Allow perf PMUs to be accessed from eBPF programs, from Kaixu Xia. 13) Add BQL support to 3c59x driver, from Loganaden Velvindron. 14) Stop using a zero TX queue length to mean that a device shouldn't have a qdisc attached, use an explicit flag instead. From Phil Sutter. 15) Use generic geneve netdevice infrastructure in openvswitch, from Pravin B Shelar. 16) Add infrastructure to avoid re-forwarding a packet in software that was already forwarded by a hardware switch. From Scott Feldman. 17) Allow AF_PACKET fanout function to be implemented in a bpf program, from Willem de Bruijn" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1458 commits) netfilter: nf_conntrack: make nf_ct_zone_dflt built-in netfilter: nf_dup{4, 6}: fix build error when nf_conntrack disabled net: fec: clear receive interrupts before processing a packet ipv6: fix exthdrs offload registration in out_rt path xen-netback: add support for multicast control bgmac: Update fixed_phy_register() sock, diag: fix panic in sock_diag_put_filterinfo flow_dissector: Use 'const' where possible. flow_dissector: Fix function argument ordering dependency ixgbe: Resolve "initialized field overwritten" warnings ixgbe: Remove bimodal SR-IOV disabling ixgbe: Add support for reporting 2.5G link speed ixgbe: fix bounds checking in ixgbe_setup_tc for 82598 ixgbe: support for ethtool set_rxfh ixgbe: Avoid needless PHY access on copper phys ixgbe: cleanup to use cached mask value ixgbe: Remove second instance of lan_id variable ixgbe: use kzalloc for allocating one thing flow: Move __get_hash_from_flowi{4,6} into flow_dissector.c ixgbe: Remove unused PCI bus types ...
2015-09-02Merge tag 'dm-4.3-changes' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper update from Mike Snitzer: - a couple small cleanups in dm-cache, dm-verity, persistent-data's dm-btree, and DM core. - a 4.1-stable fix for dm-cache that fixes the leaking of deferred bio prison cells - a 4.2-stable fix that adds feature reporting for the dm-stats features added in 4.2 - improve DM-snapshot to not invalidate the on-disk snapshot if snapshot device write overflow occurs; but a write overflow triggered through the origin device will still invalidate the snapshot. - optimize DM-thinp's async discard submission a bit now that late bio splitting has been included in block core. - switch DM-cache's SMQ policy lock from using a mutex to a spinlock; improves performance on very low latency devices (eg. NVMe SSD). - document DM RAID 4/5/6's discard support [ I did not pull the slab changes, which weren't appropriate for this tree, and weren't obviously the right thing to do anyway. At the very least they need some discussion and explanation before getting merged. Because not pulling the actual tagged commit but doing a partial pull instead, this merge commit thus also obviously is missing the git signature from the original tag ] * tag 'dm-4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm cache: fix use after freeing migrations dm cache: small cleanups related to deferred prison cell cleanup dm cache: fix leaking of deferred bio prison cells dm raid: document RAID 4/5/6 discard support dm stats: report precise_timestamps and histogram in @stats_list output dm thin: optimize async discard submission dm snapshot: don't invalidate on-disk image on snapshot write overflow dm: remove unlikely() before IS_ERR() dm: do not override error code returned from dm_get_device() dm: test return value for DM_MAPIO_SUBMITTED dm verity: remove unused mempool dm cache: move wake_waker() from free_migrations() to where it is needed dm btree remove: remove unused function get_nr_entries() dm btree: remove unused "dm_block_t root" parameter in btree_split_sibling() dm cache policy smq: change the mutex to a spinlock
2015-09-02Merge branch 'for-4.3/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds1-0/+1
Pull block driver updates from Jens Axboe: "On top of the 4.3 core block IO changes, here are the driver related changes for 4.3. Basically just NVMe and nbd this time around: - NVMe: - PRACT PI improvement from Alok Pandey. - Cleanups and improvements on submission queue doorbell and writing, using CMB if available. From Jon Derrick. - From Keith, support for setting queue maximum segments, and reset support. - Also from Jon, fixup of u64 division issue on 32-bit archs and wiring up of the reset support through and ioctl. - Two small cleanups from Matias and Sunad - Various code cleanups and fixes from Markus Pargmann" * 'for-4.3/drivers' of git://git.kernel.dk/linux-block: NVMe: Using PRACT bit to generate and verify PI by controller NVMe:Remove unreachable code in nvme_abort_req NVMe: Add nvme subsystem reset IOCTL NVMe: Add nvme subsystem reset support NVMe: removed unused nn var from nvme_dev_add NVMe: Set queue max segments nbd: flags is a u32 variable nbd: Rename functions for clearness of recv/send path nbd: Change 'disconnect' to be boolean nbd: Add debugfs entries nbd: Remove variable 'pid' nbd: Move clear queue debug message nbd: Remove 'harderror' and propagate error properly nbd: restructure sock_shutdown nbd: sock_shutdown, remove conditional lock nbd: Fix timeout detection nvme: Fixes u64 division which breaks i386 builds NVMe: Use CMB for the IO SQes if available NVMe: Unify SQ entry writing and doorbell ringing
2015-09-02Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds2-0/+175
Pull first round of SCSI updates from James Bottomley: "This includes one new driver: cxlflash plus the usual grab bag of updates for the major drivers: qla2xxx, ipr, storvsc, pm80xx, hptiop, plus a few assorted fixes. There's another tranch coming, but I want to incubate it another few days in the checkers, plus it includes a mpt2sas separated lifetime fix, which Avago won't get done testing until Friday" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (85 commits) aic94xx: set an error code on failure storvsc: Set the error code correctly in failure conditions storvsc: Allow write_same when host is windows 10 storvsc: use storage protocol version to determine storage capabilities storvsc: use correct defaults for values determined by protocol negotiation storvsc: Untangle the storage protocol negotiation from the vmbus protocol negotiation. storvsc: Use a single value to track protocol versions storvsc: Rather than look for sets of specific protocol versions, make decisions based on ranges. cxlflash: Remove unused variable from queuecommand cxlflash: shift wrapping bug in afu_link_reset() cxlflash: off by one bug in cxlflash_show_port_status() cxlflash: Virtual LUN support cxlflash: Superpipe support cxlflash: Base error recovery support qla2xxx: Update driver version to 8.07.00.26-k qla2xxx: Add pci device id 0x2261. qla2xxx: Fix missing device login retries. qla2xxx: do not clear slot in outstanding cmd array qla2xxx: Remove decrement of sp reference count in abort handler. qla2xxx: Add support to show MPI and PEP FW version for ISP27xx. ...
2015-09-02uapi/drm/i915_drm.h: fix userspace compilation.Artem Savkov1-1/+1
commit 346add7834557b5b9628b9bf2387106d42e631d4 Author: Daniel Vetter <[email protected]> Date: Tue Jul 14 18:07:30 2015 +0200 drm/i915: Use expcitly fixed type in compat32 structs changed the type of param field in drm_i915_getparam from int to s32. This header is exported to userspace and needs to use userspace type __s32 instead. This fixes userspace compilation errors like the following: include/drm/i915_drm.h:361:2: error: unknown type name 's32' s32 param; Signed-off-by: Artem Savkov <[email protected]> Reviewed-by: Daniel Vetter <[email protected]> Signed-off-by: Jani Nikula <[email protected]>
2015-08-31Merge branch 'perf-core-for-linus' of ↵Linus Torvalds1-2/+33
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Main perf kernel side changes: - uprobes updates/fixes. (Oleg Nesterov) - Add PERF_RECORD_SWITCH to indicate context switches and use it in tooling. (Adrian Hunter) - Support BPF programs attached to uprobes and first steps for BPF tooling support. (Wang Nan) - x86 generic x86 MSR-to-perf PMU driver. (Andy Lutomirski) - x86 Intel PT, LBR and BTS updates. (Alexander Shishkin) - x86 Intel Skylake support. (Andi Kleen) - x86 Intel Knights Landing (KNL) RAPL support. (Dasaratharaman Chandramouli) - x86 Intel Broadwell-DE uncore support. (Kan Liang) - x86 hw breakpoints robustization (Andy Lutomirski) Main perf tooling side changes: - Support Intel PT in several tools, enabling the use of the processor trace feature introduced in Intel Broadwell processors: (Adrian Hunter) # dmesg | grep Performance # [0.188477] Performance Events: PEBS fmt2+, 16-deep LBR, Broadwell events, full-width counters, Intel PMU driver. # perf record -e intel_pt//u -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.216 MB perf.data ] # perf script # then navigate in the tool output to some area, like this one: 184 1030 dl_main (/usr/lib64/ld-2.17.so) => 7f21ba661440 dl_main (/usr/lib64/ld-2.17.so) 185 1457 dl_main (/usr/lib64/ld-2.17.so) => 7f21ba669f10 _dl_new_object (/usr/lib64/ld-2.17.so) 186 9f37 _dl_new_object (/usr/lib64/ld-2.17.so) => 7f21ba677b90 strlen (/usr/lib64/ld-2.17.so) 187 7ba3 strlen (/usr/lib64/ld-2.17.so) => 7f21ba677c75 strlen (/usr/lib64/ld-2.17.so) 188 7c78 strlen (/usr/lib64/ld-2.17.so) => 7f21ba669f3c _dl_new_object (/usr/lib64/ld-2.17.so) 189 9f8a _dl_new_object (/usr/lib64/ld-2.17.so) => 7f21ba65fab0 calloc@plt (/usr/lib64/ld-2.17.so) 190 fab0 calloc@plt (/usr/lib64/ld-2.17.so) => 7f21ba675e70 calloc (/usr/lib64/ld-2.17.so) 191 5e87 calloc (/usr/lib64/ld-2.17.so) => 7f21ba65fa90 malloc@plt (/usr/lib64/ld-2.17.so) 192 fa90 malloc@plt (/usr/lib64/ld-2.17.so) => 7f21ba675e60 malloc (/usr/lib64/ld-2.17.so) 193 5e68 malloc (/usr/lib64/ld-2.17.so) => 7f21ba65fa80 __libc_memalign@plt (/usr/lib64/ld-2.17.so) 194 fa80 __libc_memalign@plt (/usr/lib64/ld-2.17.so) => 7f21ba675d50 __libc_memalign (/usr/lib64/ld-2.17.so) 195 5d63 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675e20 __libc_memalign (/usr/lib64/ld-2.17.so) 196 5e40 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675d73 __libc_memalign (/usr/lib64/ld-2.17.so) 197 5d97 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675e18 __libc_memalign (/usr/lib64/ld-2.17.so) 198 5e1e __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675df9 __libc_memalign (/usr/lib64/ld-2.17.so) 199 5e10 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba669f8f _dl_new_object (/usr/lib64/ld-2.17.so) 200 9fc2 _dl_new_object (/usr/lib64/ld-2.17.so) => 7f21ba678e70 memcpy (/usr/lib64/ld-2.17.so) 201 8e8c memcpy (/usr/lib64/ld-2.17.so) => 7f21ba678ea0 memcpy (/usr/lib64/ld-2.17.so) - Add support for using several Intel PT features (CYC, MTC packets), the relevant documentation was updated in: tools/perf/Documentation/intel-pt.txt briefly describing those packets, its purposes, how to configure them in the event config terms and relevant external documentation for further reading. (Adrian Hunter) - Introduce support for probing at an absolute address, for user and kernel 'perf probe's, useful when one have the symbol maps on a developer machine but not on an embedded system. (Wang Nan) - Add Intel BTS support, with a call-graph script to show it and PT in use in a GUI using 'perf script' python scripting with postgresql and Qt. (Adrian Hunter) - Allow selecting the type of callchains per event, including disabling callchains in all but one entry in an event list, to save space, and also to ask for the callchains collected in one event to be used in other events. (Kan Liang) - Beautify more syscall arguments in 'perf trace': (Arnaldo Carvalho de Melo) * A bunch more translate file/pathnames from pointers to strings. * Convert numbers to strings for the 'keyctl' syscall 'option' arg. * Add missing 'clockid' entries. - Introduce 'srcfile' sort key: (Andi Kleen) # perf record -F 10000 usleep 1 # perf report --stdio --dsos '[kernel.vmlinux]' -s srcfile <SNIP> # Overhead Source File 26.49% copy_page_64.S 5.49% signal.c 0.51% msr.h # It can be combined with other fields, for instance, experiment with '-s srcfile,symbol'. There are some oddities in some distros and with some specific DSOs, being investigated, so your mileage may vary. - Support per-event 'freq' term: (Namhyung Kim) $ perf record -e 'cpu/instructions,freq=1234/',cycles -c 1000 sleep 1 $ perf evlist -F cpu/instructions,freq=1234/: sample_freq=1234 cycles: sample_period=1000 $ - Deref sys_enter pointer args with contents from probe:vfs_getname, showing pathnames instead of pointers in many syscalls in 'perf trace'. (Arnaldo Carvalho de Melo) - Stop collecting /proc/kallsyms in perf.data files, saving about 4.5MB on a typical x86-64 system, use the the symbol resolution routines used in all the other tools (report, top, etc) now that we can ask libtraceevent to use perf's symbol resolution code. (Arnaldo Carvalho de Melo) - Allow filtering out of perf's PID via 'perf record --exclude-perf'. (Wang Nan) - 'perf trace' now supports syscall groups, like strace, i.e: $ trace -e file touch file Will expand 'file' into multiple, file related, syscalls. More work needed to add extra groups for other syscall groups, and also to complement what was added for the 'file' group, included as a proof of concept. (Arnaldo Carvalho de Melo) - Add lock_pi stresser to 'perf bench futex', to test the kernel code related to FUTEX_(UN)LOCK_PI. (Davidlohr Bueso) - Let user have timestamps with per-thread recording in 'perf record' (Adrian Hunter) - ... and tons of other changes, see the shortlog and the Git log for details" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (240 commits) perf evlist: Add backpointer for perf_env to evlist perf tools: Rename perf_session_env to perf_env perf tools: Do not change lib/api/fs/debugfs directly perf tools: Add tracing_path and remove unneeded functions perf buildid: Introduce sysfs/filename__sprintf_build_id perf evsel: Add a backpointer to the evlist a evsel is in perf trace: Add header with copyright and background info perf scripts python: Add new compaction-times script perf stat: Get correct cpu id for print_aggr tools lib traceeveent: Allow for negative numbers in print format perf script: Add --[no-]-demangle/--[no-]-demangle-kernel tracing/uprobes: Do not print '0x (null)' when offset is 0 perf probe: Support probing at absolute address perf probe: Fix error reported when offset without function perf probe: Fix list result when address is zero perf probe: Fix list result when symbol can't be found tools build: Allow duplicate objects in the object list perf tools: Remove export.h from MANIFEST perf probe: Prevent segfault when reading probe point with absolute address perf tools: Update Intel PT documentation ...
2015-08-31Merge tag 'usb-4.3-rc1' of ↵Linus Torvalds1-0/+12
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB updates from Greg KH: "Here's the big USB and PHY patchset for 4.3-rc1. As usual, the majority of the changes are in the USB gadget portion of the tree, lots of little changes all over the place for bugs and new hardware. Other than that, the normal mix of new hardware support and bugfixes. All have been in linux-next with no reported issues" * tag 'usb-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (261 commits) USB: qcserial: add HP lt4111 LTE/EV-DO/HSPA+ Gobi 4G Module USB: ftdi_sio: Added custom PID for CustomWare products USB: usb_wwan: silence read errors on disconnect USB: option: silence interrupt errors USB: symbolserial: Correct transferred data size USB: symbolserial: Use usb_get_serial_port_data usb: misc: usbtest: format max packet size for iso transfer usb: host: ehci-sys: delete useless bus_to_hcd conversion Revert "usb: interface authorization: Declare authorized attribute" Revert "usb: interface authorization: Introduces the default interface authorization" Revert "usb: interface authorization: Control interface probing and claiming" Revert "usb: interface authorization: Introduces the USB interface authorization" Revert "usb: interface authorization: SysFS part of USB interface authorization" Revert "usb: interface authorization: Documentation part" Revert "usb: interface authorization: Use a flag for the default device authorization" usb: core: hub: Removed some warnings generated by checkpatch.pl USB: host: ohci-at91: merge loops in ohci_hcd_at91_drv_probe USB: host: ohci-at91: merge ohci_at91_of_init in ohci_hcd_at91_drv_probe USB: host: ohci-at91: depend on OF USB: host: ohci-at91: move at91_usbh_data definition in c file ...
2015-08-31Merge tag 'tty-4.3-rc1' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver updates from Greg KH: "Here is the big tty/serial driver update for 4.3-rc1. Not many major things, a number of driver updates and changes, and the 8250 driver got split up a bit to make it easier to work with by moving some functions to a new file. Full details are in the shortlog. All have been in linux-next with no reported issues" * tag 'tty-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (90 commits) serial: imx: save and restore context in the suspend path serial: imx: allow waking up on RTSD serial: imx: introduce serial_imx_enable_wakeup() serial: imx: remove unbalanced clk_prepare serial: 8250: move rx_running out of the bitfield tty: serial: 8250_omap: do not use RX DMA if pause is not supported serial:8250_dw: do not alter CTS and DCTS since AFE is enabled tty: serial: men_z135_uart.c: Don't initialize port->lock tty: serial: men_z135_uart.c: Fix race between IRQ and set_termios() serial: 8250: bind to ALi Fast Infrared Controller (ALI5123) serial: 8250: don't bind to SMSC IrCC IR port serial: mxs-auart: fix baud rate range serial: mxs-auart: keep the AUART unit in reset state when not in use serial: mxs-auart: use a function name to reflect what it really does serial: 8250_pci: fix mode after S3/S4 resume for F81504/508/512 sc16is7xx: constify devtype sc16is7xx: support multiple devices sc16is7xx: save and use per-chip line number uart: pl011: Add support to ZTE ZX296702 uart uart: pl011: Improve LCRH register access decision ...
2015-08-31fib, fib6: reject invalid feature bitsDaniel Borkmann1-4/+7
Feature bits that are invalid should not be accepted by the kernel, only the lower 4 bits may be configured, but not the remaining ones. Even from these 4, 2 of them are unused. Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-31Merge tag 'char-misc-4.3-rc1' of ↵Linus Torvalds1-0/+19
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver patches from Greg KH: "Here's the "big" char/misc driver update for 4.3-rc1. Not much really interesting here, just a number of little changes all over the place, and some nice consolidation of the nvmem drivers to a common framework. As usual, the mei drivers stand out as the largest "churn" to handle new devices and features in their hardware. All have been in linux-next for a while with no issues" * tag 'char-misc-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (136 commits) auxdisplay: ks0108: initialize local parport variable extcon: palmas: Fix build break due to devm_gpiod_get_optional API change extcon: palmas: Support GPIO based USB ID detection extcon: Fix signedness bugs about break error handling extcon: Drop owner assignment from i2c_driver extcon: arizona: Simplify pdata symantics for micd_dbtime extcon: arizona: Declare 3-pole jack if we detect open circuit on mic extcon: Add exception handling to prevent the NULL pointer access extcon: arizona: Ensure variables are set for headphone detection extcon: arizona: Use gpiod inteface to handle micd_pol_gpio gpio extcon: arizona: Add basic microphone detection DT/ACPI bindings extcon: arizona: Update to use the new device properties API extcon: palmas: Remove the mutually_exclusive array extcon: Remove optional print_state() function pointer of struct extcon_dev extcon: Remove duplicate header file in extcon.h extcon: max77843: Clear IRQ bits state before request IRQ toshiba laptop: replace ioremap_cache with ioremap misc: eeprom: max6875: clean up max6875_read() misc: eeprom: clean up eeprom_read() misc: eeprom: 93xx46: clean up eeprom_93xx46_bin_read/write ...
2015-08-31Merge tag 'kvm-4.3-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+2
Pull kvm updates from Paolo Bonzini: "A very small release for x86 and s390 KVM. - s390: timekeeping changes, cleanups and fixes - x86: support for Hyper-V MSRs to report crashes, and a bunch of cleanups. One interesting feature that was planned for 4.3 (emulating the local APIC in kernel while keeping the IOAPIC and 8254 in userspace) had to be delayed because Intel complained about my reading of the manual" * tag 'kvm-4.3-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits) x86/kvm: Rename VMX's segment access rights defines KVM: x86/vPMU: Fix unnecessary signed extension for AMD PERFCTRn kvm: x86: Fix error handling in the function kvm_lapic_sync_from_vapic KVM: s390: Fix assumption that kvm_set_irq_routing is always run successfully KVM: VMX: drop ept misconfig check KVM: MMU: fully check zero bits for sptes KVM: MMU: introduce is_shadow_zero_bits_set() KVM: MMU: introduce the framework to check zero bits on sptes KVM: MMU: split reset_rsvds_bits_mask_ept KVM: MMU: split reset_rsvds_bits_mask KVM: MMU: introduce rsvd_bits_validate KVM: MMU: move FNAME(is_rsvd_bits_set) to mmu.c KVM: MMU: fix validation of mmio page fault KVM: MTRR: Use default type for non-MTRR-covered gfn before WARN_ON KVM: s390: host STP toleration for VMs KVM: x86: clean/fix memory barriers in irqchip_in_kernel KVM: document memory barriers for kvm->vcpus/kvm->online_vcpus KVM: x86: remove unnecessary memory barriers for shared MSRs KVM: move code related to KVM_SET_BOOT_CPU_ID to x86 KVM: s390: log capability enablement and vm attribute changes ...
2015-08-31Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar1-0/+6
Signed-off-by: Ingo Molnar <[email protected]>
2015-08-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-0/+1
2015-08-30IB/netlink: Add defines for local service requests through netlinkKaike Wan1-0/+82
This patch adds netlink defines for local service client, local service group, local service operations, and related attributes. Signed-off-by: Kaike Wan <[email protected]> Signed-off-by: John Fleck <[email protected]> Signed-off-by: Ira Weiny <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2015-08-28netlink: add NETLINK_CAP_ACK socket optionChristophe Ricard1-0/+1
Since commit c05cdb1b864f ("netlink: allow large data transfers from user-space"), the kernel may fail to allocate the necessary room for the acknowledgment message back to userspace. This patch introduces a new socket option that trims off the payload of the original netlink message. The netlink message header is still included, so the user can guess from the sequence number what is the message that has triggered the acknowledgment. Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: Christophe Ricard <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-28IB/core: Add core header changes needed for OPADennis Dalessandro1-0/+427
This patch adds the value of the CNP opcode to the existing list of enumerated opcodes in ib_pack.h Add common OPA header definitions for driver build: - opa_port_info.h - opa_smi.h - hfi1_user.h Additionally, ib_mad.h, has additional definitions that are common to ib_drivers including: - trap support - cca support The qib driver has the duplication removed in favor those in ib_mad.h Reviewed-by: Mike Marciniszyn <[email protected]> Reviewed-by: John, Jubin <[email protected]> Signed-off-by: Ira Weiny <[email protected]> Signed-off-by: Dennis Dalessandro <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2015-08-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller2-1/+8
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter/IPVS updates for your net-next tree. In sum, patches to address fallout from the previous round plus updates from the IPVS folks via Simon Horman, they are: 1) Add a new scheduler to IPVS: The weighted overflow scheduling algorithm directs network connections to the server with the highest weight that is currently available and overflows to the next when active connections exceed the node's weight. From Raducu Deaconu. 2) Fix locking ordering in IPVS, always take rtnl_lock in first place. Patch from Julian Anastasov. 3) Allow to indicate the MTU to the IPVS in-kernel state sync daemon. From Julian Anastasov. 4) Enhance multicast configuration for the IPVS state sync daemon. Also from Julian. 5) Resolve sparse warnings in the nf_dup modules. 6) Fix a linking problem when CONFIG_NF_DUP_IPV6 is not set. 7) Add ICMP codes 5 and 6 to IPv6 REJECT target, they are more informative subsets of code 1. From Andreas Herz. 8) Revert the jumpstack size calculation from mark_source_chains due to chain depth miscalculations, from Florian Westphal. 9) Calm down more sparse warning around the Netfilter tree, again from Florian Westphal. ==================== Signed-off-by: David S. Miller <[email protected]>
2015-08-27NFS: Update NFS4_BITMAP_SIZEKinglong Mee1-1/+1
v4.1/v4.2 have define attributes at word2, nfs client also support security label now. v3, same as v2. Signed-off-by: Kinglong Mee <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2015-08-27geneve: Add support to collect tunnel metadata.Pravin B Shelar1-0/+1
Following patch create new tunnel flag which enable tunnel metadata collection on given device. These devices can be used by tunnel metadata based routing or by OVS. Geneve Consolidation patch get rid of collect_md_tun to simplify tunnel lookup further. Signed-off-by: Pravin B Shelar <[email protected]> Reviewed-by: Jesse Gross <[email protected]> Acked-by: Thomas Graf <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-27geneve: Make dst-port configurable.Pravin B Shelar1-0/+1
Add netlink interface to configure Geneve UDP port number. So that user can configure it for a Gevene device. Signed-off-by: Pravin B Shelar <[email protected]> Reviewed-by: Jesse Gross <[email protected]> Acked-by: Thomas Graf <[email protected]> Acked-by: John W. Linville <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-27bridge: Add netlink support for vlan_protocol attributeToshiaki Makita1-0/+1
This enables bridge vlan_protocol to be configured through netlink. When CONFIG_BRIDGE_VLAN_FILTERING is disabled, kernel behaves the same way as this feature is not implemented. Signed-off-by: Toshiaki Makita <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-27openvswitch: Allow attaching helpers to ct actionJoe Stringer1-0/+3
Add support for using conntrack helpers to assist protocol detection. The new OVS_CT_ATTR_HELPER attribute of the CT action specifies a helper to be used for this connection. If no helper is specified, then helpers will be automatically applied as per the sysctl configuration of net.netfilter.nf_conntrack_helper. The helper may be specified as part of the conntrack action, eg: ct(helper=ftp). Initial packets for related connections should be committed to allow later packets for the flow to be considered established. Example ovs-ofctl flows allowing FTP connections from ports 1->2: in_port=1,tcp,action=ct(helper=ftp,commit),2 in_port=2,tcp,ct_state=-trk,action=ct(recirc) in_port=2,tcp,ct_state=+trk-new+est,action=1 in_port=2,tcp,ct_state=+trk+rel,action=1 Signed-off-by: Joe Stringer <[email protected]> Acked-by: Thomas Graf <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-27openvswitch: Allow matching on conntrack labelJoe Stringer1-0/+10
Allow matching and setting the ct_label field. As with ct_mark, this is populated by executing the CT action. The label field may be modified by specifying a label and mask nested under the CT action. It is stored as metadata attached to the connection. Label modification occurs after lookup, and will only persist when the conntrack entry is committed by providing the COMMIT flag to the CT action. Labels are currently fixed to 128 bits in size. Signed-off-by: Joe Stringer <[email protected]> Acked-by: Thomas Graf <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-27openvswitch: Allow matching on conntrack markJoe Stringer1-0/+5
Allow matching and setting the ct_mark field. As with ct_state and ct_zone, these fields are populated when the CT action is executed. To write to this field, a value and mask can be specified as a nested attribute under the CT action. This data is stored with the conntrack entry, and is executed after the lookup occurs for the CT action. The conntrack entry itself must be committed using the COMMIT flag in the CT action flags for this change to persist. Signed-off-by: Justin Pettit <[email protected]> Signed-off-by: Joe Stringer <[email protected]> Acked-by: Thomas Graf <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-27openvswitch: Add conntrack actionJoe Stringer1-0/+40
Expose the kernel connection tracker via OVS. Userspace components can make use of the CT action to populate the connection state (ct_state) field for a flow. This state can be subsequently matched. Exposed connection states are OVS_CS_F_*: - NEW (0x01) - Beginning of a new connection. - ESTABLISHED (0x02) - Part of an existing connection. - RELATED (0x04) - Related to an established connection. - INVALID (0x20) - Could not track the connection for this packet. - REPLY_DIR (0x40) - This packet is in the reply direction for the flow. - TRACKED (0x80) - This packet has been sent through conntrack. When the CT action is executed by itself, it will send the packet through the connection tracker and populate the ct_state field with one or more of the connection state flags above. The CT action will always set the TRACKED bit. When the COMMIT flag is passed to the conntrack action, this specifies that information about the connection should be stored. This allows subsequent packets for the same (or related) connections to be correlated with this connection. Sending subsequent packets for the connection through conntrack allows the connection tracker to consider the packets as ESTABLISHED, RELATED, and/or REPLY_DIR. The CT action may optionally take a zone to track the flow within. This allows connections with the same 5-tuple to be kept logically separate from connections in other zones. If the zone is specified, then the "ct_zone" match field will be subsequently populated with the zone id. IP fragments are handled by transparently assembling them as part of the CT action. The maximum received unit (MRU) size is tracked so that refragmentation can occur during output. IP frag handling contributed by Andy Zhou. Based on original design by Justin Pettit. Signed-off-by: Joe Stringer <[email protected]> Signed-off-by: Justin Pettit <[email protected]> Signed-off-by: Andy Zhou <[email protected]> Acked-by: Thomas Graf <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-26cxlflash: Virtual LUN supportMatthew R. Ochs1-0/+34
Add support for physical LUN segmentation (virtual LUNs) to device driver supporting the IBM CXL Flash adapter. This patch allows user space applications to virtually segment a physical LUN into N virtual LUNs, taking advantage of the translation features provided by this adapter. Signed-off-by: Matthew R. Ochs <[email protected]> Signed-off-by: Manoj N. Kumar <[email protected]> Reviewed-by: Michael Neuling <[email protected]> Reviewed-by: Wen Xiong <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2015-08-26cxlflash: Superpipe supportMatthew R. Ochs2-0/+141
Add superpipe supporting infrastructure to device driver for the IBM CXL Flash adapter. This patch allows userspace applications to take advantage of the accelerated I/O features that this adapter provides and bypass the traditional filesystem stack. Signed-off-by: Matthew R. Ochs <[email protected]> Signed-off-by: Manoj N. Kumar <[email protected]> Reviewed-by: Michael Neuling <[email protected]> Reviewed-by: Wen Xiong <[email protected]> Reviewed-by: Brian King <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2015-08-26Merge tag 'ipvs2-for-v4.3' of ↵Pablo Neira Ayuso1-0/+5
https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next Simon Horman says: ==================== Second Round of IPVS Updates for v4.3 I realise these are a little late in the cycle, so if you would prefer me to repost them for v4.4 then just let me know. The updates include: * A new scheduler from Raducu Deaconu * Enhanced configurability of the sync daemon from Julian Anastasov ==================== Signed-off-by: Pablo Neira Ayuso <[email protected]>
2015-08-26netfilter: ip6t_REJECT: added missing icmpv6 codesAndreas Herz1-1/+3
RFC 4443 added two new codes values for ICMPv6 type 1: 5 - Source address failed ingress/egress policy 6 - Reject route to destination And RFC 7084 states in L-14 that IPv6 Router MUST send ICMPv6 Destination Unreachable with code 5 for packets forwarded to it that use an address from a prefix that has been invalidated. Codes 5 and 6 are more informative subsets of code 1. Signed-off-by: Andreas Herz <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2015-08-25dlm: fix lvb copy for user locksDavid Teigland1-1/+1
For a userland lock request, the previous and current lock modes are used to decide when the lvb should be copied back to the user. The wrong previous value was used, so that it always matched the current value. This caused the lvb to be copied back to the user in the wrong cases. Signed-off-by: David Teigland <[email protected]>
2015-08-24Merge tag 'v4.2-rc8' into drm-nextDave Airlie1-0/+6
Linux 4.2-rc8 Backmerge required for Intel so they can fix their -next tree up properly.
2015-08-22Merge tag 'signed-kvm-ppc-next' of git://github.com/agraf/linux-2.6 into ↵Paolo Bonzini2-0/+3
kvm-queue Patch queue for ppc - 2015-08-22 Highlights for KVM PPC this time around: - Book3S: A few bug fixes - Book3S: Allow micro-threading on POWER8
2015-08-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-0/+6
Conflicts: drivers/net/usb/qmi_wwan.c Overlapping additions of new device IDs to qmi_wwan.c Signed-off-by: David S. Miller <[email protected]>
2015-08-21ipvs: add more mcast parameters for the sync daemonJulian Anastasov1-0/+4
- mcast_group: configure the multicast address, now IPv6 is supported too - mcast_port: configure the multicast port - mcast_ttl: configure the multicast TTL/HOP_LIMIT Signed-off-by: Julian Anastasov <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2015-08-21ipvs: add sync_maxlen parameter for the sync daemonJulian Anastasov1-0/+1
Allow setups with large MTU to send large sync packets by adding sync_maxlen parameter. The default value is now based on MTU but no more than 1500 for compatibility reasons. To avoid problems if MTU changes allow fragmentation by sending packets with DF=0. Problem reported by Dan Carpenter. Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Julian Anastasov <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2015-08-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller3-1/+31
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next This is second pull request includes the conflict resolution patch that resulted from the updates that we got for the conntrack template through kmalloc. No changes with regards to the previously sent 15 patches. The following patchset contains Netfilter updates for your net-next tree, they are: 1) Rework the existing nf_tables counter expression to make it per-cpu. 2) Prepare and factor out common packet duplication code from the TEE target so it can be reused from the new dup expression. 3) Add the new dup expression for the nf_tables IPv4 and IPv6 families. 4) Convert the nf_tables limit expression to use a token-based approach with 64-bits precision. 5) Enhance the nf_tables limit expression to support limiting at packet byte. This comes after several preparation patches. 6) Add a burst parameter to indicate the amount of packets or bytes that can exceed the limiting. 7) Add netns support to nfacct, from Andreas Schultz. 8) Pass the nf_conn_zone structure instead of the zone ID in nf_tables to allow accessing more zone specific information, from Daniel Borkmann. 9) Allow to define zone per-direction to support netns containers with overlapping network addressing, also from Daniel. 10) Extend the CT target to allow setting the zone based on the skb->mark as a way to support simple mappings from iptables, also from Daniel. 11) Make the nf_tables payload expression aware of the fact that VLAN offload may have removed a vlan header, from Florian Westphal. ==================== Signed-off-by: David S. Miller <[email protected]>
2015-08-21Merge branch 'master' of ↵Pablo Neira Ayuso11-27/+73
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Resolve conflicts with conntrack template fixes. Conflicts: net/netfilter/nf_conntrack_core.c net/netfilter/nf_synproxy_core.c net/netfilter/xt_CT.c Signed-off-by: Pablo Neira Ayuso <[email protected]>
2015-08-20ipv6: route: per route IP tunnel metadata via lightweight tunnelJiri Benc1-0/+16
Allow specification of per route IP tunnel instructions also for IPv6. This complements commit 3093fbe7ff4b ("route: Per route IP tunnel metadata via lightweight tunnel"). Signed-off-by: Jiri Benc <[email protected]> CC: YOSHIFUJI Hideaki <[email protected]> Acked-by: Thomas Graf <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-20Merge tag 'sound-4.2' of ↵Linus Torvalds1-0/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Here are a small collecton of sound fix patches. The most significant one is the disablement of newly introduced topology API. Its ABI couldn't be stabilized enough, so we decided to delay for 4.3 in the end. Other than that, all oneliner fixes: a USB-audio runtime PM fix and a couple of HD-audio quirks" * tag 'sound-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda - Add dock support for Thinkpad W541 (17aa:2211) ALSA: usb-audio: Fix runtime PM unbalance ASoC: topology: Disable use from userspace ASoC: topology: Add Kconfig option for topology ALSA: hda - Fix the white noise on Dell laptop
2015-08-20Merge branch 'perf/urgent' into perf/core, to pick up fixes before adding ↵Ingo Molnar4-15/+29
more changes Signed-off-by: Ingo Molnar <[email protected]>