aboutsummaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2019-04-30mm/vmalloc: Add flag for freeing of special permsissionsRick Edgecombe1-0/+15
Add a new flag VM_FLUSH_RESET_PERMS, for enabling vfree operations to immediately clear executable TLB entries before freeing pages, and handle resetting permissions on the directmap. This flag is useful for any kind of memory with elevated permissions, or where there can be related permissions changes on the directmap. Today this is RO+X and RO memory. Although this enables directly vfreeing non-writeable memory now, non-writable memory cannot be freed in an interrupt because the allocation itself is used as a node on deferred free list. So when RO memory needs to be freed in an interrupt the code doing the vfree needs to have its own work queue, as was the case before the deferred vfree list was added to vmalloc. For architectures with set_direct_map_ implementations this whole operation can be done with one TLB flush when centralized like this. For others with directmap permissions, currently only arm64, a backup method using set_memory functions is used to reset the directmap. When arm64 adds set_direct_map_ functions, this backup can be removed. When the TLB is flushed to both remove TLB entries for the vmalloc range mapping and the direct map permissions, the lazy purge operation could be done to try to save a TLB flush later. However today vm_unmap_aliases could flush a TLB range that does not include the directmap. So a helper is added with extra parameters that can allow both the vmalloc address and the direct mapping to be flushed during this operation. The behavior of the normal vm_unmap_aliases function is unchanged. Suggested-by: Dave Hansen <[email protected]> Suggested-by: Andy Lutomirski <[email protected]> Suggested-by: Will Deacon <[email protected]> Signed-off-by: Rick Edgecombe <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-04-30mm/hibernation: Make hibernation handle unmapped pagesRick Edgecombe1-12/+6
Make hibernate handle unmapped pages on the direct map when CONFIG_ARCH_HAS_SET_ALIAS=y is set. These functions allow for setting pages to invalid configurations, so now hibernate should check if the pages have valid mappings and handle if they are unmapped when doing a hibernate save operation. Previously this checking was already done when CONFIG_DEBUG_PAGEALLOC=y was configured. It does not appear to have a big hibernating performance impact. The speed of the saving operation before this change was measured as 819.02 MB/s, and after was measured at 813.32 MB/s. Before: [ 4.670938] PM: Wrote 171996 kbytes in 0.21 seconds (819.02 MB/s) After: [ 4.504714] PM: Wrote 178932 kbytes in 0.22 seconds (813.32 MB/s) Signed-off-by: Rick Edgecombe <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Pavel Machek <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-04-30x86/mm/cpa: Add set_direct_map_*() functionsRick Edgecombe1-0/+11
Add two new functions set_direct_map_default_noflush() and set_direct_map_invalid_noflush() for setting the direct map alias for the page to its default valid permissions and to an invalid state that cannot be cached in a TLB, respectively. These functions do not flush the TLB. Note, __kernel_map_pages() does something similar but flushes the TLB and doesn't reset the permission bits to default on all architectures. Also add an ARCH config ARCH_HAS_SET_DIRECT_MAP for specifying whether these have an actual implementation or a default empty one. Signed-off-by: Rick Edgecombe <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-04-30x86/modules: Avoid breaking W^X while loading modulesNadav Amit1-0/+1
When modules and BPF filters are loaded, there is a time window in which some memory is both writable and executable. An attacker that has already found another vulnerability (e.g., a dangling pointer) might be able to exploit this behavior to overwrite kernel code. Prevent having writable executable PTEs in this stage. In addition, avoiding having W+X mappings can also slightly simplify the patching of modules code on initialization (e.g., by alternatives and static-key), as would be done in the next patch. This was actually the main motivation for this patch. To avoid having W+X mappings, set them initially as RW (NX) and after they are set as RO set them as X as well. Setting them as executable is done as a separate step to avoid one core in which the old PTE is cached (hence writable), and another which sees the updated PTE (executable), which would break the W^X protection. Suggested-by: Thomas Gleixner <[email protected]> Suggested-by: Andy Lutomirski <[email protected]> Signed-off-by: Nadav Amit <[email protected]> Signed-off-by: Rick Edgecombe <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Jessica Yu <[email protected]> Cc: Kees Cook <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Rik van Riel <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-04-30fork: Provide a function for copying init_mmNadav Amit1-0/+1
Provide a function for copying init_mm. This function will be later used for setting a temporary mm. Tested-by: Masami Hiramatsu <[email protected]> Signed-off-by: Nadav Amit <[email protected]> Signed-off-by: Rick Edgecombe <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Masami Hiramatsu <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Kees Cook <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-04-30uprobes: Initialize uprobes earlierNadav Amit1-0/+5
In order to have a separate address space for text poking, we need to duplicate init_mm early during start_kernel(). This, however, introduces a problem since uprobes functions are called from dup_mmap(), but uprobes is still not initialized in this early stage. Since uprobes initialization is necassary for fork, and since all the dependant initialization has been done when fork is initialized (percpu and vmalloc), move uprobes initialization to fork_init(). It does not seem uprobes introduces any security problem for the poking_mm. Crash and burn if uprobes initialization fails, similarly to other early initializations. Change the init_probes() name to probes_init() to match other early initialization functions name convention. Reported-by: kernel test robot <[email protected]> Signed-off-by: Nadav Amit <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Rick Edgecombe <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-04-30KVM: Introduce a 'release' method for KVM devicesCédric Le Goater1-0/+9
When a P9 sPAPR VM boots, the CAS negotiation process determines which interrupt mode to use (XICS legacy or XIVE native) and invokes a machine reset to activate the chosen mode. To be able to switch from one interrupt mode to another, we introduce the capability to release a KVM device without destroying the VM. The KVM device interface is extended with a new 'release' method which is called when the file descriptor of the device is closed. Once 'release' is called, the 'destroy' method will not be called anymore as the device is removed from the device list of the VM. Cc: Paolo Bonzini <[email protected]> Signed-off-by: Cédric Le Goater <[email protected]> Reviewed-by: David Gibson <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-04-30KVM: Introduce a 'mmap' method for KVM devicesCédric Le Goater1-0/+1
Some KVM devices will want to handle special mappings related to the underlying HW. For instance, the XIVE interrupt controller of the POWER9 processor has MMIO pages for thread interrupt management and for interrupt source control that need to be exposed to the guest when the OS has the required support. Cc: Paolo Bonzini <[email protected]> Signed-off-by: Cédric Le Goater <[email protected]> Reviewed-by: David Gibson <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-04-30Merge branch 'opp/linux-next' of ↵Rafael J. Wysocki1-0/+8
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm into pm-opp Pull operating performance points (OPP) framework changes for v5.2 from Viresh Kumar: "This pull request contains: - New helper in OPP core to find best matching frequency for a voltage value." * 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: OPP: Introduce dev_pm_opp_find_freq_ceil_by_volt()
2019-04-30USB: serial: drop unused iflag macroJohan Hovold1-3/+0
Drop the RELEVANT_IFLAG() macro which essentially hasn't been used for over a decade except in some remnant debug printks that were recently removed. Reviewed-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Johan Hovold <[email protected]>
2019-04-30USB: serial: clean up throttle handlingJohan Hovold1-4/+1
Clean up the throttle implementation by dropping the redundant throttle_req flag which was a remnant from back when there was only a single read URB. Also convert the throttled flag to an atomic bit flag. Signed-off-by: Johan Hovold <[email protected]>
2019-04-29net/mlx5: Geneve, Add flow table capabilities for Geneve decap with TLV optionsYevgeny Kliteynik2-6/+48
Introduce specification for Geneve decap flow with encapsulation options and allow creation of rules that are matching on Geneve TLV options. Reviewed-by: Oz Shlomo <[email protected]> Signed-off-by: Yevgeny Kliteynik <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-04-29net/mlx5: Geneve, Add basic Geneve encap/decap flow table capabilitiesYevgeny Kliteynik1-3/+13
Introduce support for Geneve flow specification and allow the creation of rules that are matching on basic Geneve protocol fields: VNI, OAM bit, protocol type, options length. Reviewed-by: Oz Shlomo <[email protected]> Signed-off-by: Yevgeny Kliteynik <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-04-29net/mlx5: Eswitch, enable RoCE loopback trafficMaor Gottlieb1-0/+7
When in switchdev mode, we would like to treat loopback RoCE traffic (on eswitch manager) as RDMA and not as regular Ethernet traffic In order to enable it we add flow steering rule that forward RoCE loopback traffic to the HW RoCE filter (by adding allow rule). In addition we add RoCE address in GID index 0, which will be set in the RoCE loopback packet. Signed-off-by: Maor Gottlieb <[email protected]> Reviewed-by: Mark Bloch <[email protected]> Acked-by: Leon Romanovsky <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-04-29net/mlx5: Add new miss flow table actionMaor Gottlieb1-1/+9
Flow table supports three types of miss action: 1. Default miss action - go to default miss table according to table. 2. Go to specific table. 3. Switch domain - go to the root table of an alternative steering table domain. New table miss action was added - switch_domain. The next domain for RDMA_RX namespace is the NIC RX domain. Signed-off-by: Maor Gottlieb <[email protected]> Reviewed-by: Mark Bloch <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-04-29net/mlx5: Add support in RDMA RX steeringMaor Gottlieb3-1/+8
Add new flow steering namespace - MLX5_FLOW_NAMESPACE_RDMA_RX. Flow steering rules in this namespace are used to filter RDMA traffic. Signed-off-by: Maor Gottlieb <[email protected]> Reviewed-by: Mark Bloch <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-04-29net/mlx5: Get rid of storing copy of device nameParav Pandit1-2/+1
Currently mlx5 core stores copy of the PCI device name in a mlx5_priv structure and uses pr_warn, pr_err helpers. Get rid of the copy of this name; instead store the parent device pointer that contains name as well as dma specific parameters. This also allows to use kernel's well defined dev_warn, dev_err, dev_dbg device specific print routines. This is also a preparation patch to access non PCI parent device in future. Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-04-29net/mlx5: E-Switch: Introduce prio tag modeEli Britstein1-1/+3
Current ConnectX HW is unable to perform VLAN pop in TX path and VLAN push on RX path. To workaround that limitation untagged packets will be tagged with VLAN ID 0x000 (priority tag) and pop/push actions will be replaced by VLAN re-write actions (which are supported by the HW). Introduce prio tag mode as a pre-step to controlling the workaround behavior. Signed-off-by: Eli Britstein <[email protected]> Reviewed-by: Oz Shlomo <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-04-29function_graph: Place ftrace_graph_entry_stub() prototype in ↵Steven Rostedt (VMware)1-0/+2
include/linux/ftrace.h ftrace_graph_entry_stub() is defined in generic code, its prototype should be in the generic header and not defined throughout architecture specific code in order to use it. Cc: Greentime Hu <[email protected]> Cc: Vincent Chen <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Helge Deller <[email protected]> Cc: [email protected] Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-04-29PCI: Add pci_dev_id() helperHeiner Kallweit1-0/+5
In several places in the kernel we find PCI_DEVID used like this: PCI_DEVID(dev->bus->number, dev->devfn) Add a "pci_dev_id(struct pci_dev *dev)" helper to simplify callers. Signed-off-by: Heiner Kallweit <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2019-04-29Merge tag 'reset-for-5.2' of git://git.pengutronix.de/pza/linux into arm/driversOlof Johansson1-0/+2
Reset controller changes for v5.2 This adds support for 'acquired'/'released' states to exclusive reset controls to allow drivers that usually need direct control over their reset line, to temporarily yield control to another driver, such as a power domain controller during power transitions. A fix adds missing headers to linux/reset.h, which caused build errors in linux-next, discovered by the new lima DRM driver. * tag 'reset-for-5.2' of git://git.pengutronix.de/pza/linux: reset: fix linux/reset.h errors Signed-off-by: Olof Johansson <[email protected]>
2019-04-29Merge tag 'imx-drivers-5.2' of ↵Olof Johansson1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/drivers i.MX drivers change for 5.2: - A series from Aisheng to generalize the SCU powerdomain driver for easier adding new SCU based platforms like imx8qm. - Add a generic i.MX8 SoC driver for reporting SoC and platform information. - Replace explicit polling loop with a call to regmap_read_poll_timeout() for gpcv2 driver to avoid code repetition. - Use devm_platform_ioremap_resource() to simplify gpc/gpcv2 driver code a bit. - Add general IRQ support for imx-scu driver, so that interrupt of device like RTC, thermal and watchdog can be handled. * tag 'imx-drivers-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: soc: imx: Add generic i.MX8 SoC driver firmware: imx: enable imx scu general irq function soc: imx: gpcv2: use devm_platform_ioremap_resource() to simplify code soc: imx: gpc: use devm_platform_ioremap_resource() to simplify code firmware: imx: scu-pd: decouple the SS information from domain names firmware: imx: scu-pd: add specifying the base of domain name index support firmware: imx: scu-pd: use bool to set postfix soc: imx: gpcv2: Make use of regmap_read_poll_timeout() Signed-off-by: Olof Johansson <[email protected]>
2019-04-29irqchip/gic-v3-its: fix some definitions of inner cacheability attributesHongbo Yao1-6/+6
Some definitions of Inner Cacheability attibutes need to be corrected. Fixes: 8c828a535e29f ("irqchip/gicv3-its: Restore all cacheability attributes") Signed-off-by: Hongbo Yao <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2019-04-29stacktrace: Provide common infrastructureThomas Gleixner1-0/+39
All architectures which support stacktrace carry duplicated code and do the stack storage and filtering at the architecture side. Provide a consolidated interface with a callback function for consuming the stack entries provided by the architecture specific stack walker. This removes lots of duplicated code and allows to implement better filtering than 'skip number of entries' in the future without touching any architecture specific code. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Josh Poimboeuf <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: [email protected] Cc: Steven Rostedt <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: [email protected] Cc: David Rientjes <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: [email protected] Cc: Mike Rapoport <[email protected]> Cc: Akinobu Mita <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: [email protected] Cc: Robin Murphy <[email protected]> Cc: Marek Szyprowski <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: David Sterba <[email protected]> Cc: Chris Mason <[email protected]> Cc: Josef Bacik <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Mike Snitzer <[email protected]> Cc: Alasdair Kergon <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: [email protected] Cc: Joonas Lahtinen <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: [email protected] Cc: David Airlie <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Tom Zanussi <[email protected]> Cc: Miroslav Benes <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-04-29lib/stackdepot: Remove obsolete functionsThomas Gleixner1-4/+0
No more users of the struct stack_trace based interfaces. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Josh Poimboeuf <[email protected]> Acked-by: Alexander Potapenko <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: [email protected] Cc: David Rientjes <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: [email protected] Cc: Mike Rapoport <[email protected]> Cc: Akinobu Mita <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: [email protected] Cc: Robin Murphy <[email protected]> Cc: Marek Szyprowski <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: David Sterba <[email protected]> Cc: Chris Mason <[email protected]> Cc: Josef Bacik <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Mike Snitzer <[email protected]> Cc: Alasdair Kergon <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: [email protected] Cc: Joonas Lahtinen <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: [email protected] Cc: David Airlie <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Tom Zanussi <[email protected]> Cc: Miroslav Benes <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2019-04-29stacktrace: Remove obsolete functionsThomas Gleixner1-17/+0
No more users of the struct stack_trace based interfaces. Remove them. Remove the macro stubs for !CONFIG_STACKTRACE as well as they are pointless because the storage on the call sites is conditional on CONFIG_STACKTRACE already. No point to be 'smart'. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Josh Poimboeuf <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: [email protected] Cc: David Rientjes <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: [email protected] Cc: Mike Rapoport <[email protected]> Cc: Akinobu Mita <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: [email protected] Cc: Robin Murphy <[email protected]> Cc: Marek Szyprowski <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: David Sterba <[email protected]> Cc: Chris Mason <[email protected]> Cc: Josef Bacik <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Mike Snitzer <[email protected]> Cc: Alasdair Kergon <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: [email protected] Cc: Joonas Lahtinen <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: [email protected] Cc: David Airlie <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Tom Zanussi <[email protected]> Cc: Miroslav Benes <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2019-04-29lockdep: Simplify stack trace handlingThomas Gleixner1-2/+7
Replace the indirection through struct stack_trace by using the storage array based interfaces and storing the information is a small lockdep specific data structure. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Josh Poimboeuf <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: [email protected] Cc: David Rientjes <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: [email protected] Cc: Mike Rapoport <[email protected]> Cc: Akinobu Mita <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: [email protected] Cc: Robin Murphy <[email protected]> Cc: Marek Szyprowski <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: David Sterba <[email protected]> Cc: Chris Mason <[email protected]> Cc: Josef Bacik <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Mike Snitzer <[email protected]> Cc: Alasdair Kergon <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: [email protected] Cc: Joonas Lahtinen <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: [email protected] Cc: David Airlie <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Tom Zanussi <[email protected]> Cc: Miroslav Benes <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2019-04-29lib/stackdepot: Provide functions which operate on plain storage arraysThomas Gleixner1-0/+4
The struct stack_trace indirection in the stack depot functions is a truly pointless excercise which requires horrible code at the callsites. Provide interfaces based on plain storage arrays. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Josh Poimboeuf <[email protected]> Acked-by: Alexander Potapenko <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: [email protected] Cc: David Rientjes <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: [email protected] Cc: Mike Rapoport <[email protected]> Cc: Akinobu Mita <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: [email protected] Cc: Robin Murphy <[email protected]> Cc: Marek Szyprowski <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: David Sterba <[email protected]> Cc: Chris Mason <[email protected]> Cc: Josef Bacik <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Mike Snitzer <[email protected]> Cc: Alasdair Kergon <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: [email protected] Cc: Joonas Lahtinen <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: [email protected] Cc: David Airlie <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Tom Zanussi <[email protected]> Cc: Miroslav Benes <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2019-04-29stacktrace: Provide helpers for common stack trace operationsThomas Gleixner1-0/+27
All operations with stack traces are based on struct stack_trace. That's a horrible construct as the struct is a kitchen sink for input and output. Quite some usage sites embed it into their own data structures which creates weird indirections. There is absolutely no point in doing so. For all use cases a storage array and the number of valid stack trace entries in the array is sufficient. Provide helper functions which avoid the struct stack_trace indirection so the usage sites can be cleaned up. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Josh Poimboeuf <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: [email protected] Cc: David Rientjes <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: [email protected] Cc: Mike Rapoport <[email protected]> Cc: Akinobu Mita <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: [email protected] Cc: Robin Murphy <[email protected]> Cc: Marek Szyprowski <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: David Sterba <[email protected]> Cc: Chris Mason <[email protected]> Cc: Josef Bacik <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Mike Snitzer <[email protected]> Cc: Alasdair Kergon <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: [email protected] Cc: Joonas Lahtinen <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: [email protected] Cc: David Airlie <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Tom Zanussi <[email protected]> Cc: Miroslav Benes <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2019-04-29tracing: Cleanup stack trace codeThomas Gleixner1-14/+4
- Remove the extra array member of stack_dump_trace[] along with the ARRAY_SIZE - 1 initialization for struct stack_trace :: max_entries. Both are historical leftovers of no value. The stack tracer never exceeds the array and there is no extra storage requirement either. - Make variables which are only used in trace_stack.c static. - Simplify the enable/disable logic. - Rename stack_trace_print() as it's using the stack_trace_ namespace. Free the name up for stack trace related functions. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Steven Rostedt <[email protected]> Reviewed-by: Josh Poimboeuf <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: [email protected] Cc: David Rientjes <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: [email protected] Cc: Mike Rapoport <[email protected]> Cc: Akinobu Mita <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: [email protected] Cc: Robin Murphy <[email protected]> Cc: Marek Szyprowski <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: David Sterba <[email protected]> Cc: Chris Mason <[email protected]> Cc: Josef Bacik <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Mike Snitzer <[email protected]> Cc: Alasdair Kergon <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: [email protected] Cc: Joonas Lahtinen <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: [email protected] Cc: David Airlie <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Tom Zanussi <[email protected]> Cc: Miroslav Benes <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2019-04-29s390/kexec_file: Disable kexec_load when IPLed securePhilipp Rudo1-1/+1
A kernel loaded via kexec_load cannot be verified. Thus disable kexec_load systemcall in kernels which where IPLed securely. Use the IMA mechanism to do so. Signed-off-by: Philipp Rudo <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
2019-04-28Merge tag 'ixp4xx-for-armsoc' of ↵Olof Johansson4-0/+152
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-nomadik into arm/soc This modernizes the IXP4xx platform and adds initial Device Tree Support. We migrate to MULTI_IRQ_HANDLER, bumps the IRQs to offset 16, converts to SPARSE_IRQ, then we add proper subsystem drivers in each subsystem for irqchip, GPIO and clocksource and switch over to using these new drivers. Next we modernize the NPE and QMGR drivers and push them down into drivers/soc. This has been tested on the IXP4xx NSLU2 and the Gateworks GW2358-4. * tag 'ixp4xx-for-armsoc' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-nomadik: (31 commits) ARM: dts: Add queue manager and NPE to the IXP4xx DTSI soc: ixp4xx: qmgr: Add DT probe code soc: ixp4xx: qmgr: Add DT bindings for IXP4xx qmgr soc: ixp4xx: npe: Add DT probe code soc: ixp4xx: Add DT bindings for IXP4xx NPE soc: ixp4xx: qmgr: Pass resources soc: ixp4xx: Remove unused functions soc: ixp4xx: Uninline several functions soc: ixp4xx: npe: Pass addresses as resources ARM: ixp4xx: Turn the QMGR into a platform device ARM: ixp4xx: Turn the NPE into a platform device ARM: ixp4xx: Move IXP4xx QMGR and NPE headers ARM: ixp4xx: Move NPE and QMGR to drivers/soc ARM: dts: Add some initial IXP4xx device trees ARM: ixp4xx: Add device tree boot support ARM: ixp4xx: Add DT bindings gpio: ixp4xx: Add OF probing support gpio: ixp4xx: Add DT bindings clocksource/drivers/ixp4xx: Add OF initialization support clocksource/drivers/ixp4xx: Add DT bindings ... Signed-off-by: Olof Johansson <[email protected]>
2019-04-28Merge tag 'tegra-for-5.2-firmware' of ↵Olof Johansson1-0/+96
git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into arm/soc firmware: tegra: Changes for v5.2-rc1 This set of changes includes improvements for Trusted Foundations and also moves the source files for this support into the standard location under drivers/firmware. * tag 'tegra-for-5.2-firmware' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux: firmware: Move Trusted Foundations support ARM: tegra: Sort dependencies alphabetically ARM: tegra: Add firmware calls required for suspend-resume on Tegra30 ARM: tegra: Always boot CPU in ARM-mode ARM: tegra: Don't apply CPU erratas in insecure mode ARM: tegra: Set up L2 cache using Trusted Foundations firmware ARM: trusted_foundations: Provide information about whether firmware is registered ARM: trusted_foundations: Make prepare_idle call to take mode argument ARM: trusted_foundations: Support L2 cache maintenance Signed-off-by: Olof Johansson <[email protected]>
2019-04-28Merge tag 'tegra-for-5.2-soc' of ↵Olof Johansson1-26/+87
git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into arm/drivers soc/tegra: Changes for v5.2-rc1 Besides a couple of fixes to better cope with deferred probing, this set of patches also implements the acquire/release protocol for resets used during powergate operations. This is necessary to allow these resets to be temporarily shared with other devices that may also need to control these resets. * tag 'tegra-for-5.2-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux: soc/tegra: pmc: Move powergate initialisation to probe soc/tegra: pmc: Remove reset sysfs entries on error soc/tegra: pmc: Fix reset sources and levels soc/tegra: pmc: Implement acquire/release for resets reset: Add acquire/release support for arrays reset: Add acquired flag to of_reset_control_array_get() reset: add acquired/released state for exclusive reset controls Signed-off-by: Olof Johansson <[email protected]>
2019-04-29locking/static_key: Add support for deferred static branchesJakub Kicinski1-3/+61
Add deferred static branches. We can't unfortunately use the nice trick of encapsulating the entire structure in true/false variants, because the inside has to be either struct static_key_true or struct static_key_false. Use defines to pass the appropriate members to the helpers separately. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Simon Horman <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-04-29perf/x86: Make perf callchains work without CONFIG_FRAME_POINTERKairui Song1-4/+10
Currently perf callchain doesn't work well with ORC unwinder when sampling from trace point. We'll get useless in kernel callchain like this: perf 6429 [000] 22.498450: kmem:mm_page_alloc: page=0x176a17 pfn=1534487 order=0 migratetype=0 gfp_flags=GFP_KERNEL ffffffffbe23e32e __alloc_pages_nodemask+0x22e (/lib/modules/5.1.0-rc3+/build/vmlinux) 7efdf7f7d3e8 __poll+0x18 (/usr/lib64/libc-2.28.so) 5651468729c1 [unknown] (/usr/bin/perf) 5651467ee82a main+0x69a (/usr/bin/perf) 7efdf7eaf413 __libc_start_main+0xf3 (/usr/lib64/libc-2.28.so) 5541f689495641d7 [unknown] ([unknown]) The root cause is that, for trace point events, it doesn't provide a real snapshot of the hardware registers. Instead perf tries to get required caller's registers and compose a fake register snapshot which suppose to contain enough information for start a unwinding. However without CONFIG_FRAME_POINTER, if failed to get caller's BP as the frame pointer, so current frame pointer is returned instead. We get a invalid register combination which confuse the unwinder, and end the stacktrace early. So in such case just don't try dump BP, and let the unwinder start directly when the register is not a real snapshot. Use SP as the skip mark, unwinder will skip all the frames until it meet the frame of the trace point caller. Tested with frame pointer unwinder and ORC unwinder, this makes perf callchain get the full kernel space stacktrace again like this: perf 6503 [000] 1567.570191: kmem:mm_page_alloc: page=0x16c904 pfn=1493252 order=0 migratetype=0 gfp_flags=GFP_KERNEL ffffffffb523e2ae __alloc_pages_nodemask+0x22e (/lib/modules/5.1.0-rc3+/build/vmlinux) ffffffffb52383bd __get_free_pages+0xd (/lib/modules/5.1.0-rc3+/build/vmlinux) ffffffffb52fd28a __pollwait+0x8a (/lib/modules/5.1.0-rc3+/build/vmlinux) ffffffffb521426f perf_poll+0x2f (/lib/modules/5.1.0-rc3+/build/vmlinux) ffffffffb52fe3e2 do_sys_poll+0x252 (/lib/modules/5.1.0-rc3+/build/vmlinux) ffffffffb52ff027 __x64_sys_poll+0x37 (/lib/modules/5.1.0-rc3+/build/vmlinux) ffffffffb500418b do_syscall_64+0x5b (/lib/modules/5.1.0-rc3+/build/vmlinux) ffffffffb5a0008c entry_SYSCALL_64_after_hwframe+0x44 (/lib/modules/5.1.0-rc3+/build/vmlinux) 7f71e92d03e8 __poll+0x18 (/usr/lib64/libc-2.28.so) 55a22960d9c1 [unknown] (/usr/bin/perf) 55a22958982a main+0x69a (/usr/bin/perf) 7f71e9202413 __libc_start_main+0xf3 (/usr/lib64/libc-2.28.so) 5541f689495641d7 [unknown] ([unknown]) Co-developed-by: Josh Poimboeuf <[email protected]> Signed-off-by: Kairui Song <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Young <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-04-28ARM: ep93xx: move pinctrl interfaces into include/linux/socArnd Bergmann1-0/+37
ep93xx does not have a proper pinctrl driver, but does things ad-hoc through mach/platform.h, which is also used for setting up the boards. To avoid using mach/*.h headers completely, let's move the interfaces into include/linux/soc/. This is far from great, but gets the job done here, without the need for a proper pinctrl driver. Acked-by: Alexander Sverdlin <[email protected]> Acked-by: H Hartley Sweeten <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Olof Johansson <[email protected]>
2019-04-28ARM: ep93xx: keypad: stop using mach/platform.hArnd Bergmann1-2/+2
We can communicate the clock rate using platform data rather than setting a flag to use a particular value in the driver, which is cleaner and avoids the dependency. No platform in the kernel currently defines the ep93xx keypad device structure, so this is a rather pointless excercise. Any out of tree users are probably dead now, but if not, they have to change their platform code to match the new platform_data structure. Acked-by: H Hartley Sweeten <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Olof Johansson <[email protected]>
2019-04-28ARM: ep93xx: move network platform data to separate headerArnd Bergmann1-0/+10
The header file is the only thing preventing us from building the driver in a cross-platform configuration, so move the structure we are interested in to the global platform_data location and enable compile testing. Acked-by: Alexander Sverdlin <[email protected]> Acked-by: H Hartley Sweeten <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Olof Johansson <[email protected]>
2019-04-28Merge tag 'zynqmp-soc-for-v5.2' of https://github.com/Xilinx/linux-xlnx into ↵Olof Johansson1-1/+13
arm/drivers arm64: zynqmp: SoC changes for v5.2 - Add support for ZynqMP fpga manager - Defer some probes which depends on firmware driver to be ready - Debugfs fix * tag 'zynqmp-soc-for-v5.2' of https://github.com/Xilinx/linux-xlnx: fpga manager: Adding FPGA Manager support for Xilinx zynqmp dt-bindings: fpga: Add bindings for ZynqMP fpga driver firmware: xilinx: Add fpga API's drivers: Defer probe if firmware is not ready firmware: xilinx: fix debugfs write handler Signed-off-by: Olof Johansson <[email protected]>
2019-04-28Merge tag 'omap-for-v5.2/am4-pm-v2-signed' of ↵Olof Johansson3-0/+15
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/drivers PM changes for am335x and am437x This series adds support for am437x RTC-only mode in suspend. In the RTC-only mode suspend, everything is shut down except the RTC. This makes the power consumption very low for suspend mode. To support RTC-only mode, we need to export omap_rtc_power_off_program() from the rtc driver and improve PM code to save and restore the wkup domain context. As RTC-only mode depends on the device being wired properly for things like memory, we need to also check for the machine type before we allow it. We also need to run DDR3 hardware leveling on resume. Note that there is a trivial merge conflict between the RTC branch and these changes where the RTC branch makes tm2bcd() a void function and the error handling parts can be just dropped. * tag 'omap-for-v5.2/am4-pm-v2-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: OMAP2+: sleep43xx: Run EMIF HW leveling on resume path memory: ti-emif-sram: Add ti_emif_run_hw_leveling for DDR3 hardware leveling soc: ti: pm33xx: AM437X: Add rtc_only with ddr in self-refresh support soc: ti: pm33xx: Move the am33xx_push_sram_idle to the top ARM: OMAP2+: pm33xx: Add support for rtc+ddr in self refresh mode rtc: OMAP: Add support for rtc-only mode Signed-off-by: Olof Johansson <[email protected]>
2019-04-28Merge tag 'omap-for-v5.2/ti-sysc-signed' of ↵Olof Johansson1-2/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/soc Driver changes for ti-sysc for v5.2 merge window This series of changes for ti-sysc interconnect target module driver gets us to the point where we can actually drop legacy platform data for many devices in favor of device tree data. To do this, we improve ti-sysc driver not to rely on platform data callbacks to manage module clocks, and handle more quirks needed for some devices. Also few minor fixes are needed, but were considered not needed to be sent separately as they only show up with this series. Then we drop several thousands of lines of legacy platform data for omap4, omap5, dra7, am335x and am437x. We drop platform data for mmc, i2c, gpio and uart devices to start with as those are typically easily tested on all devices. In case of unexpected issues, we can just add back the legacy platform data for a single device type if needed. Finally we add initial support for enabling and disabling some devices without legacy platform data callbacks. I was planning on sending the dropping of legacy platform data as a separate series, but already applied Roger's patch on top and pushed it out. Note that this series depends on related SoC and is based on those. * tag 'omap-for-v5.2/ti-sysc-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: (33 commits) bus: ti-sysc: Add generic enable/disable functions ARM: OMAP2+: Drop mcspi platform data for omap4 ARM: OMAP2+: Drop uart platform data for dra7 ARM: OMAP2+: Drop gpio platform data for dra7 ARM: OMAP2+: Drop i2c platform data for dra7 ARM: OMAP2+: Drop mmc platform data for dra7 ARM: OMAP2+: Drop uart platform data for omap5 ARM: OMAP2+: Drop gpio platform data for omap5 ARM: OMAP2+: Drop i2c platform data for omap5 ARM: OMAP2+: Drop mmc platform data for omap5 ARM: OMAP2+: Drop uart platform data for am33xx and am43xx ARM: OMAP2+: Drop gpio platform data for am33xx and am43xx ARM: OMAP2+: Drop i2c platform data for am33xx and am43xx ARM: OMAP2+: Drop mmc platform data for am330x and am43xx ARM: OMAP2+: Drop uart platform data for omap4 ARM: OMAP2+: Drop gpio platform data for omap4 ARM: OMAP2+: Drop i2c platform data for omap4 ARM: OMAP2+: Drop mmc platform data for omap4 Documentation: bus: ti-sysc: fix spelling mistakes "multipe" and "interconnet" bus: ti-sysc: Detect DMIC for debugging ... Signed-off-by: Olof Johansson <[email protected]>
2019-04-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller5-14/+31
Daniel Borkmann says: ==================== pull-request: bpf-next 2019-04-28 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Introduce BPF socket local storage map so that BPF programs can store private data they associate with a socket (instead of e.g. separate hash table), from Martin. 2) Add support for bpftool to dump BTF types. This is done through a new `bpftool btf dump` sub-command, from Andrii. 3) Enable BPF-based flow dissector for skb-less eth_get_headlen() calls which was currently not supported since skb was used to lookup netns, from Stanislav. 4) Add an opt-in interface for tracepoints to expose a writable context for attached BPF programs, used here for NBD sockets, from Matt. 5) BPF xadd related arm64 JIT fixes and scalability improvements, from Daniel. 6) Change the skb->protocol for bpf_skb_adjust_room() helper in order to support tunnels such as sit. Add selftests as well, from Willem. 7) Various smaller misc fixes. ==================== Signed-off-by: David S. Miller <[email protected]>
2019-04-27ipset: drop ipset_nest_start() and ipset_nest_end()Michal Kubecek1-7/+4
After the previous commit, both ipset_nest_start() and ipset_nest_end() are just aliases for nla_nest_start() and nla_nest_end() so that there is no need to keep them. Signed-off-by: Michal Kubecek <[email protected]> Acked-by: Jozsef Kadlecsik <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-04-27netlink: make nla_nest_start() add NLA_F_NESTED flagMichal Kubecek1-1/+1
Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most netlink based interfaces (including recently added ones) are still not setting it in kernel generated messages. Without the flag, message parsers not aware of attribute semantics (e.g. wireshark dissector or libmnl's mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display the structure of their contents. Unfortunately we cannot just add the flag everywhere as there may be userspace applications which check nlattr::nla_type directly rather than through a helper masking out the flags. Therefore the patch renames nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start() as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually are rewritten to use nla_nest_start(). Except for changes in include/net/netlink.h, the patch was generated using this semantic patch: @@ expression E1, E2; @@ -nla_nest_start(E1, E2) +nla_nest_start_noflag(E1, E2) @@ expression E1, E2; @@ -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED) +nla_nest_start(E1, E2) Signed-off-by: Michal Kubecek <[email protected]> Acked-by: Jiri Pirko <[email protected]> Acked-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-04-27net/tls: move definition of tls ops into net/tls.hJakub Kicinski1-22/+1
There seems to be no reason for tls_ops to be defined in netdevice.h which is included in a lot of places. Don't wrap the struct/enum declaration in ifdefs, it trickles down unnecessary ifdefs into driver code. Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-04-27bpf: Introduce bpf sk local storageMartin KaFai Lau2-0/+3
After allowing a bpf prog to - directly read the skb->sk ptr - get the fullsock bpf_sock by "bpf_sk_fullsock()" - get the bpf_tcp_sock by "bpf_tcp_sock()" - get the listener sock by "bpf_get_listener_sock()" - avoid duplicating the fields of "(bpf_)sock" and "(bpf_)tcp_sock" into different bpf running context. this patch is another effort to make bpf's network programming more intuitive to do (together with memory and performance benefit). When bpf prog needs to store data for a sk, the current practice is to define a map with the usual 4-tuples (src/dst ip/port) as the key. If multiple bpf progs require to store different sk data, multiple maps have to be defined. Hence, wasting memory to store the duplicated keys (i.e. 4 tuples here) in each of the bpf map. [ The smallest key could be the sk pointer itself which requires some enhancement in the verifier and it is a separate topic. ] Also, the bpf prog needs to clean up the elem when sk is freed. Otherwise, the bpf map will become full and un-usable quickly. The sk-free tracking currently could be done during sk state transition (e.g. BPF_SOCK_OPS_STATE_CB). The size of the map needs to be predefined which then usually ended-up with an over-provisioned map in production. Even the map was re-sizable, while the sk naturally come and go away already, this potential re-size operation is arguably redundant if the data can be directly connected to the sk itself instead of proxy-ing through a bpf map. This patch introduces sk->sk_bpf_storage to provide local storage space at sk for bpf prog to use. The space will be allocated when the first bpf prog has created data for this particular sk. The design optimizes the bpf prog's lookup (and then optionally followed by an inline update). bpf_spin_lock should be used if the inline update needs to be protected. BPF_MAP_TYPE_SK_STORAGE: ----------------------- To define a bpf "sk-local-storage", a BPF_MAP_TYPE_SK_STORAGE map (new in this patch) needs to be created. Multiple BPF_MAP_TYPE_SK_STORAGE maps can be created to fit different bpf progs' needs. The map enforces BTF to allow printing the sk-local-storage during a system-wise sk dump (e.g. "ss -ta") in the future. The purpose of a BPF_MAP_TYPE_SK_STORAGE map is not for lookup/update/delete a "sk-local-storage" data from a particular sk. Think of the map as a meta-data (or "type") of a "sk-local-storage". This particular "type" of "sk-local-storage" data can then be stored in any sk. The main purposes of this map are mostly: 1. Define the size of a "sk-local-storage" type. 2. Provide a similar syscall userspace API as the map (e.g. lookup/update, map-id, map-btf...etc.) 3. Keep track of all sk's storages of this "type" and clean them up when the map is freed. sk->sk_bpf_storage: ------------------ The main lookup/update/delete is done on sk->sk_bpf_storage (which is a "struct bpf_sk_storage"). When doing a lookup, the "map" pointer is now used as the "key" to search on the sk_storage->list. The "map" pointer is actually serving as the "type" of the "sk-local-storage" that is being requested. To allow very fast lookup, it should be as fast as looking up an array at a stable-offset. At the same time, it is not ideal to set a hard limit on the number of sk-local-storage "type" that the system can have. Hence, this patch takes a cache approach. The last search result from sk_storage->list is cached in sk_storage->cache[] which is a stable sized array. Each "sk-local-storage" type has a stable offset to the cache[] array. In the future, a map's flag could be introduced to do cache opt-out/enforcement if it became necessary. The cache size is 16 (i.e. 16 types of "sk-local-storage"). Programs can share map. On the program side, having a few bpf_progs running in the networking hotpath is already a lot. The bpf_prog should have already consolidated the existing sock-key-ed map usage to minimize the map lookup penalty. 16 has enough runway to grow. All sk-local-storage data will be removed from sk->sk_bpf_storage during sk destruction. bpf_sk_storage_get() and bpf_sk_storage_delete(): ------------------------------------------------ Instead of using bpf_map_(lookup|update|delete)_elem(), the bpf prog needs to use the new helper bpf_sk_storage_get() and bpf_sk_storage_delete(). The verifier can then enforce the ARG_PTR_TO_SOCKET argument. The bpf_sk_storage_get() also allows to "create" new elem if one does not exist in the sk. It is done by the new BPF_SK_STORAGE_GET_F_CREATE flag. An optional value can also be provided as the initial value during BPF_SK_STORAGE_GET_F_CREATE. The BPF_MAP_TYPE_SK_STORAGE also supports bpf_spin_lock. Together, it has eliminated the potential use cases for an equivalent bpf_map_update_elem() API (for bpf_prog) in this patch. Misc notes: ---------- 1. map_get_next_key is not supported. From the userspace syscall perspective, the map has the socket fd as the key while the map can be shared by pinned-file or map-id. Since btf is enforced, the existing "ss" could be enhanced to pretty print the local-storage. Supporting a kernel defined btf with 4 tuples as the return key could be explored later also. 2. The sk->sk_lock cannot be acquired. Atomic operations is used instead. e.g. cmpxchg is done on the sk->sk_bpf_storage ptr. Please refer to the source code comments for the details in synchronization cases and considerations. 3. The mem is charged to the sk->sk_omem_alloc as the sk filter does. Benchmark: --------- Here is the benchmark data collected by turning on the "kernel.bpf_stats_enabled" sysctl. Two bpf progs are tested: One bpf prog with the usual bpf hashmap (max_entries = 8192) with the sk ptr as the key. (verifier is modified to support sk ptr as the key That should have shortened the key lookup time.) Another bpf prog is with the new BPF_MAP_TYPE_SK_STORAGE. Both are storing a "u32 cnt", do a lookup on "egress_skb/cgroup" for each egress skb and then bump the cnt. netperf is used to drive data with 4096 connected UDP sockets. BPF_MAP_TYPE_HASH with a modifier verifier (152ns per bpf run) 27: cgroup_skb name egress_sk_map tag 74f56e832918070b run_time_ns 58280107540 run_cnt 381347633 loaded_at 2019-04-15T13:46:39-0700 uid 0 xlated 344B jited 258B memlock 4096B map_ids 16 btf_id 5 BPF_MAP_TYPE_SK_STORAGE in this patch (66ns per bpf run) 30: cgroup_skb name egress_sk_stora tag d4aa70984cc7bbf6 run_time_ns 25617093319 run_cnt 390989739 loaded_at 2019-04-15T13:47:54-0700 uid 0 xlated 168B jited 156B memlock 4096B map_ids 17 btf_id 6 Here is a high-level picture on how are the objects organized: sk ┌──────┐ │ │ │ │ │ │ │*sk_bpf_storage─────▶ bpf_sk_storage └──────┘ ┌───────┐ ┌───────────┤ list │ │ │ │ │ │ │ │ │ │ │ └───────┘ │ │ elem │ ┌────────┐ ├─▶│ snode │ │ ├────────┤ │ │ data │ bpf_map │ ├────────┤ ┌─────────┐ │ │map_node│◀─┬─────┤ list │ │ └────────┘ │ │ │ │ │ │ │ │ elem │ │ │ │ ┌────────┐ │ └─────────┘ └─▶│ snode │ │ ├────────┤ │ bpf_map │ data │ │ ┌─────────┐ ├────────┤ │ │ list ├───────▶│map_node│ │ │ │ └────────┘ │ │ │ │ │ │ elem │ └─────────┘ ┌────────┐ │ ┌─▶│ snode │ │ │ ├────────┤ │ │ │ data │ │ │ ├────────┤ │ │ │map_node│◀─┘ │ └────────┘ │ │ │ ┌───────┐ sk └──────────│ list │ ┌──────┐ │ │ │ │ │ │ │ │ │ │ │ │ └───────┘ │*sk_bpf_storage───────▶bpf_sk_storage └──────┘ Signed-off-by: Martin KaFai Lau <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2019-04-27Merge tag 'thunderbolt-for-v5.2' of ↵Greg Kroah-Hartman1-0/+8
git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt into char-misc-next Mika writes: thunderbolt: Changes for v5.2 merge window This improves software connection manager on older Apple systems with Thunderbolt 1 and 2 controller to support full PCIe daisy chains, Display Port tunneling and P2P networking. There are also fixes for potential NULL pointer dereferences at various places in the driver. * tag 'thunderbolt-for-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt: (44 commits) thunderbolt: Make priority unsigned in struct tb_path thunderbolt: Start firmware on Titan Ridge Apple systems thunderbolt: Reword output of tb_dump_hop() thunderbolt: Make rest of the logging to happen at debug level thunderbolt: Make __TB_[SW|PORT]_PRINT take const parameters thunderbolt: Add support for XDomain connections thunderbolt: Make tb_switch_alloc() return ERR_PTR() thunderbolt: Add support for DMA tunnels thunderbolt: Add XDomain UUID exchange support thunderbolt: Run tb_xdp_handle_request() in system workqueue thunderbolt: Do not tear down tunnels when driver is unloaded thunderbolt: Add support for Display Port tunnels thunderbolt: Rework NFC credits handling thunderbolt: Generalize port finding routines to support all port types thunderbolt: Scan only valid NULL adapter ports in hotplug thunderbolt: Add support for full PCIe daisy chains thunderbolt: Discover preboot PCIe paths the boot firmware established thunderbolt: Deactivate all paths before restarting them thunderbolt: Extend tunnel creation to more than 2 adjacent switches thunderbolt: Add helper function to iterate from one port to another ...
2019-04-26bpf: add writable context for raw tracepointsMatt Mullins3-0/+4
This is an opt-in interface that allows a tracepoint to provide a safe buffer that can be written from a BPF_PROG_TYPE_RAW_TRACEPOINT program. The size of the buffer must be a compile-time constant, and is checked before allowing a BPF program to attach to a tracepoint that uses this feature. The pointer to this buffer will be the first argument of tracepoints that opt in; the pointer is valid and can be bpf_probe_read() by both BPF_PROG_TYPE_RAW_TRACEPOINT and BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE programs that attach to such a tracepoint, but the buffer to which it points may only be written by the latter. Signed-off-by: Matt Mullins <[email protected]> Acked-by: Yonghong Song <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2019-04-26lockd: Store the lockd client credential in struct nlm_hostTrond Myklebust2-1/+4
When we create a new lockd client, we want to be able to pass the correct credential of the process that created the struct nlm_host. Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>