Age | Commit message (Collapse) | Author | Files | Lines |
|
Add a new flag VM_FLUSH_RESET_PERMS, for enabling vfree operations to
immediately clear executable TLB entries before freeing pages, and handle
resetting permissions on the directmap. This flag is useful for any kind
of memory with elevated permissions, or where there can be related
permissions changes on the directmap. Today this is RO+X and RO memory.
Although this enables directly vfreeing non-writeable memory now,
non-writable memory cannot be freed in an interrupt because the allocation
itself is used as a node on deferred free list. So when RO memory needs to
be freed in an interrupt the code doing the vfree needs to have its own
work queue, as was the case before the deferred vfree list was added to
vmalloc.
For architectures with set_direct_map_ implementations this whole operation
can be done with one TLB flush when centralized like this. For others with
directmap permissions, currently only arm64, a backup method using
set_memory functions is used to reset the directmap. When arm64 adds
set_direct_map_ functions, this backup can be removed.
When the TLB is flushed to both remove TLB entries for the vmalloc range
mapping and the direct map permissions, the lazy purge operation could be
done to try to save a TLB flush later. However today vm_unmap_aliases
could flush a TLB range that does not include the directmap. So a helper
is added with extra parameters that can allow both the vmalloc address and
the direct mapping to be flushed during this operation. The behavior of the
normal vm_unmap_aliases function is unchanged.
Suggested-by: Dave Hansen <[email protected]>
Suggested-by: Andy Lutomirski <[email protected]>
Suggested-by: Will Deacon <[email protected]>
Signed-off-by: Rick Edgecombe <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Make hibernate handle unmapped pages on the direct map when
CONFIG_ARCH_HAS_SET_ALIAS=y is set. These functions allow for setting pages
to invalid configurations, so now hibernate should check if the pages have
valid mappings and handle if they are unmapped when doing a hibernate
save operation.
Previously this checking was already done when CONFIG_DEBUG_PAGEALLOC=y
was configured. It does not appear to have a big hibernating performance
impact. The speed of the saving operation before this change was measured
as 819.02 MB/s, and after was measured at 813.32 MB/s.
Before:
[ 4.670938] PM: Wrote 171996 kbytes in 0.21 seconds (819.02 MB/s)
After:
[ 4.504714] PM: Wrote 178932 kbytes in 0.22 seconds (813.32 MB/s)
Signed-off-by: Rick Edgecombe <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Pavel Machek <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Add two new functions set_direct_map_default_noflush() and
set_direct_map_invalid_noflush() for setting the direct map alias for the
page to its default valid permissions and to an invalid state that cannot
be cached in a TLB, respectively. These functions do not flush the TLB.
Note, __kernel_map_pages() does something similar but flushes the TLB and
doesn't reset the permission bits to default on all architectures.
Also add an ARCH config ARCH_HAS_SET_DIRECT_MAP for specifying whether
these have an actual implementation or a default empty one.
Signed-off-by: Rick Edgecombe <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
When modules and BPF filters are loaded, there is a time window in
which some memory is both writable and executable. An attacker that has
already found another vulnerability (e.g., a dangling pointer) might be
able to exploit this behavior to overwrite kernel code. Prevent having
writable executable PTEs in this stage.
In addition, avoiding having W+X mappings can also slightly simplify the
patching of modules code on initialization (e.g., by alternatives and
static-key), as would be done in the next patch. This was actually the
main motivation for this patch.
To avoid having W+X mappings, set them initially as RW (NX) and after
they are set as RO set them as X as well. Setting them as executable is
done as a separate step to avoid one core in which the old PTE is cached
(hence writable), and another which sees the updated PTE (executable),
which would break the W^X protection.
Suggested-by: Thomas Gleixner <[email protected]>
Suggested-by: Andy Lutomirski <[email protected]>
Signed-off-by: Nadav Amit <[email protected]>
Signed-off-by: Rick Edgecombe <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Jessica Yu <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Rik van Riel <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Provide a function for copying init_mm. This function will be later used
for setting a temporary mm.
Tested-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Nadav Amit <[email protected]>
Signed-off-by: Rick Edgecombe <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Masami Hiramatsu <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
In order to have a separate address space for text poking, we need to
duplicate init_mm early during start_kernel(). This, however, introduces
a problem since uprobes functions are called from dup_mmap(), but
uprobes is still not initialized in this early stage.
Since uprobes initialization is necassary for fork, and since all the
dependant initialization has been done when fork is initialized (percpu
and vmalloc), move uprobes initialization to fork_init(). It does not
seem uprobes introduces any security problem for the poking_mm.
Crash and burn if uprobes initialization fails, similarly to other early
initializations. Change the init_probes() name to probes_init() to match
other early initialization functions name convention.
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Nadav Amit <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Rick Edgecombe <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
When a P9 sPAPR VM boots, the CAS negotiation process determines which
interrupt mode to use (XICS legacy or XIVE native) and invokes a
machine reset to activate the chosen mode.
To be able to switch from one interrupt mode to another, we introduce
the capability to release a KVM device without destroying the VM. The
KVM device interface is extended with a new 'release' method which is
called when the file descriptor of the device is closed.
Once 'release' is called, the 'destroy' method will not be called
anymore as the device is removed from the device list of the VM.
Cc: Paolo Bonzini <[email protected]>
Signed-off-by: Cédric Le Goater <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
Some KVM devices will want to handle special mappings related to the
underlying HW. For instance, the XIVE interrupt controller of the
POWER9 processor has MMIO pages for thread interrupt management and
for interrupt source control that need to be exposed to the guest when
the OS has the required support.
Cc: Paolo Bonzini <[email protected]>
Signed-off-by: Cédric Le Goater <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm into pm-opp
Pull operating performance points (OPP) framework changes for v5.2
from Viresh Kumar:
"This pull request contains:
- New helper in OPP core to find best matching frequency for a voltage
value."
* 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
OPP: Introduce dev_pm_opp_find_freq_ceil_by_volt()
|
|
Drop the RELEVANT_IFLAG() macro which essentially hasn't been used for
over a decade except in some remnant debug printks that were recently
removed.
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
|
|
Clean up the throttle implementation by dropping the redundant
throttle_req flag which was a remnant from back when there was only a
single read URB.
Also convert the throttled flag to an atomic bit flag.
Signed-off-by: Johan Hovold <[email protected]>
|
|
Introduce specification for Geneve decap flow with encapsulation options
and allow creation of rules that are matching on Geneve TLV options.
Reviewed-by: Oz Shlomo <[email protected]>
Signed-off-by: Yevgeny Kliteynik <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Introduce support for Geneve flow specification and allow
the creation of rules that are matching on basic Geneve
protocol fields: VNI, OAM bit, protocol type, options length.
Reviewed-by: Oz Shlomo <[email protected]>
Signed-off-by: Yevgeny Kliteynik <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
When in switchdev mode, we would like to treat loopback RoCE
traffic (on eswitch manager) as RDMA and not as regular
Ethernet traffic
In order to enable it we add flow steering rule that forward RoCE
loopback traffic to the HW RoCE filter (by adding allow rule).
In addition we add RoCE address in GID index 0, which will be
set in the RoCE loopback packet.
Signed-off-by: Maor Gottlieb <[email protected]>
Reviewed-by: Mark Bloch <[email protected]>
Acked-by: Leon Romanovsky <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Flow table supports three types of miss action:
1. Default miss action - go to default miss table according to table.
2. Go to specific table.
3. Switch domain - go to the root table of an alternative steering
table domain.
New table miss action was added - switch_domain.
The next domain for RDMA_RX namespace is the NIC RX domain.
Signed-off-by: Maor Gottlieb <[email protected]>
Reviewed-by: Mark Bloch <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Add new flow steering namespace - MLX5_FLOW_NAMESPACE_RDMA_RX.
Flow steering rules in this namespace are used to filter
RDMA traffic.
Signed-off-by: Maor Gottlieb <[email protected]>
Reviewed-by: Mark Bloch <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Currently mlx5 core stores copy of the PCI device name in a
mlx5_priv structure and uses pr_warn, pr_err helpers.
Get rid of the copy of this name; instead store the parent device
pointer that contains name as well as dma specific parameters.
This also allows to use kernel's well defined dev_warn, dev_err, dev_dbg
device specific print routines.
This is also a preparation patch to access non PCI parent device in
future.
Signed-off-by: Parav Pandit <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Current ConnectX HW is unable to perform VLAN pop in TX path and VLAN
push on RX path. To workaround that limitation untagged packets will be
tagged with VLAN ID 0x000 (priority tag) and pop/push actions will be
replaced by VLAN re-write actions (which are supported by the HW).
Introduce prio tag mode as a pre-step to controlling the workaround
behavior.
Signed-off-by: Eli Britstein <[email protected]>
Reviewed-by: Oz Shlomo <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
include/linux/ftrace.h
ftrace_graph_entry_stub() is defined in generic code, its prototype should
be in the generic header and not defined throughout architecture specific
code in order to use it.
Cc: Greentime Hu <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: [email protected]
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
|
|
In several places in the kernel we find PCI_DEVID used like this:
PCI_DEVID(dev->bus->number, dev->devfn)
Add a "pci_dev_id(struct pci_dev *dev)" helper to simplify callers.
Signed-off-by: Heiner Kallweit <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
Reset controller changes for v5.2
This adds support for 'acquired'/'released' states to exclusive reset
controls to allow drivers that usually need direct control over their
reset line, to temporarily yield control to another driver, such as a
power domain controller during power transitions. A fix adds missing
headers to linux/reset.h, which caused build errors in linux-next,
discovered by the new lima DRM driver.
* tag 'reset-for-5.2' of git://git.pengutronix.de/pza/linux:
reset: fix linux/reset.h errors
Signed-off-by: Olof Johansson <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/drivers
i.MX drivers change for 5.2:
- A series from Aisheng to generalize the SCU powerdomain driver
for easier adding new SCU based platforms like imx8qm.
- Add a generic i.MX8 SoC driver for reporting SoC and platform
information.
- Replace explicit polling loop with a call to regmap_read_poll_timeout()
for gpcv2 driver to avoid code repetition.
- Use devm_platform_ioremap_resource() to simplify gpc/gpcv2 driver
code a bit.
- Add general IRQ support for imx-scu driver, so that interrupt of
device like RTC, thermal and watchdog can be handled.
* tag 'imx-drivers-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
soc: imx: Add generic i.MX8 SoC driver
firmware: imx: enable imx scu general irq function
soc: imx: gpcv2: use devm_platform_ioremap_resource() to simplify code
soc: imx: gpc: use devm_platform_ioremap_resource() to simplify code
firmware: imx: scu-pd: decouple the SS information from domain names
firmware: imx: scu-pd: add specifying the base of domain name index support
firmware: imx: scu-pd: use bool to set postfix
soc: imx: gpcv2: Make use of regmap_read_poll_timeout()
Signed-off-by: Olof Johansson <[email protected]>
|
|
Some definitions of Inner Cacheability attibutes need to be corrected.
Fixes: 8c828a535e29f ("irqchip/gicv3-its: Restore all cacheability attributes")
Signed-off-by: Hongbo Yao <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
|
|
All architectures which support stacktrace carry duplicated code and
do the stack storage and filtering at the architecture side.
Provide a consolidated interface with a callback function for consuming the
stack entries provided by the architecture specific stack walker. This
removes lots of duplicated code and allows to implement better filtering
than 'skip number of entries' in the future without touching any
architecture specific code.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: [email protected]
Cc: Steven Rostedt <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: [email protected]
Cc: David Rientjes <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: [email protected]
Cc: Mike Rapoport <[email protected]>
Cc: Akinobu Mita <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: [email protected]
Cc: Robin Murphy <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Johannes Thumshirn <[email protected]>
Cc: David Sterba <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Mike Snitzer <[email protected]>
Cc: Alasdair Kergon <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: [email protected]
Cc: Joonas Lahtinen <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: [email protected]
Cc: David Airlie <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Tom Zanussi <[email protected]>
Cc: Miroslav Benes <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
No more users of the struct stack_trace based interfaces.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Acked-by: Alexander Potapenko <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: [email protected]
Cc: David Rientjes <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: [email protected]
Cc: Mike Rapoport <[email protected]>
Cc: Akinobu Mita <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: [email protected]
Cc: Robin Murphy <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Johannes Thumshirn <[email protected]>
Cc: David Sterba <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Mike Snitzer <[email protected]>
Cc: Alasdair Kergon <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: [email protected]
Cc: Joonas Lahtinen <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: [email protected]
Cc: David Airlie <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Tom Zanussi <[email protected]>
Cc: Miroslav Benes <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
No more users of the struct stack_trace based interfaces. Remove them.
Remove the macro stubs for !CONFIG_STACKTRACE as well as they are pointless
because the storage on the call sites is conditional on CONFIG_STACKTRACE
already. No point to be 'smart'.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: [email protected]
Cc: David Rientjes <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: [email protected]
Cc: Mike Rapoport <[email protected]>
Cc: Akinobu Mita <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: [email protected]
Cc: Robin Murphy <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Johannes Thumshirn <[email protected]>
Cc: David Sterba <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Mike Snitzer <[email protected]>
Cc: Alasdair Kergon <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: [email protected]
Cc: Joonas Lahtinen <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: [email protected]
Cc: David Airlie <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Tom Zanussi <[email protected]>
Cc: Miroslav Benes <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
Replace the indirection through struct stack_trace by using the storage
array based interfaces and storing the information is a small lockdep
specific data structure.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: [email protected]
Cc: David Rientjes <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: [email protected]
Cc: Mike Rapoport <[email protected]>
Cc: Akinobu Mita <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: [email protected]
Cc: Robin Murphy <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Johannes Thumshirn <[email protected]>
Cc: David Sterba <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Mike Snitzer <[email protected]>
Cc: Alasdair Kergon <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: [email protected]
Cc: Joonas Lahtinen <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: [email protected]
Cc: David Airlie <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Tom Zanussi <[email protected]>
Cc: Miroslav Benes <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
The struct stack_trace indirection in the stack depot functions is a truly
pointless excercise which requires horrible code at the callsites.
Provide interfaces based on plain storage arrays.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Acked-by: Alexander Potapenko <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: [email protected]
Cc: David Rientjes <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: [email protected]
Cc: Mike Rapoport <[email protected]>
Cc: Akinobu Mita <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: [email protected]
Cc: Robin Murphy <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Johannes Thumshirn <[email protected]>
Cc: David Sterba <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Mike Snitzer <[email protected]>
Cc: Alasdair Kergon <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: [email protected]
Cc: Joonas Lahtinen <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: [email protected]
Cc: David Airlie <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Tom Zanussi <[email protected]>
Cc: Miroslav Benes <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
All operations with stack traces are based on struct stack_trace. That's a
horrible construct as the struct is a kitchen sink for input and
output. Quite some usage sites embed it into their own data structures
which creates weird indirections.
There is absolutely no point in doing so. For all use cases a storage array
and the number of valid stack trace entries in the array is sufficient.
Provide helper functions which avoid the struct stack_trace indirection so
the usage sites can be cleaned up.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: [email protected]
Cc: David Rientjes <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: [email protected]
Cc: Mike Rapoport <[email protected]>
Cc: Akinobu Mita <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: [email protected]
Cc: Robin Murphy <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Johannes Thumshirn <[email protected]>
Cc: David Sterba <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Mike Snitzer <[email protected]>
Cc: Alasdair Kergon <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: [email protected]
Cc: Joonas Lahtinen <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: [email protected]
Cc: David Airlie <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Tom Zanussi <[email protected]>
Cc: Miroslav Benes <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
- Remove the extra array member of stack_dump_trace[] along with the
ARRAY_SIZE - 1 initialization for struct stack_trace :: max_entries.
Both are historical leftovers of no value. The stack tracer never exceeds
the array and there is no extra storage requirement either.
- Make variables which are only used in trace_stack.c static.
- Simplify the enable/disable logic.
- Rename stack_trace_print() as it's using the stack_trace_ namespace. Free
the name up for stack trace related functions.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Steven Rostedt <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: [email protected]
Cc: David Rientjes <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: [email protected]
Cc: Mike Rapoport <[email protected]>
Cc: Akinobu Mita <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: [email protected]
Cc: Robin Murphy <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Johannes Thumshirn <[email protected]>
Cc: David Sterba <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Mike Snitzer <[email protected]>
Cc: Alasdair Kergon <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: [email protected]
Cc: Joonas Lahtinen <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: [email protected]
Cc: David Airlie <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Tom Zanussi <[email protected]>
Cc: Miroslav Benes <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
A kernel loaded via kexec_load cannot be verified. Thus disable kexec_load
systemcall in kernels which where IPLed securely. Use the IMA mechanism to
do so.
Signed-off-by: Philipp Rudo <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-nomadik into arm/soc
This modernizes the IXP4xx platform and adds initial Device Tree
Support. We migrate to MULTI_IRQ_HANDLER, bumps the IRQs to
offset 16, converts to SPARSE_IRQ, then we add proper subsystem
drivers in each subsystem for irqchip, GPIO and clocksource and
switch over to using these new drivers.
Next we modernize the NPE and QMGR drivers and push them down
into drivers/soc.
This has been tested on the IXP4xx NSLU2 and the Gateworks
GW2358-4.
* tag 'ixp4xx-for-armsoc' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-nomadik: (31 commits)
ARM: dts: Add queue manager and NPE to the IXP4xx DTSI
soc: ixp4xx: qmgr: Add DT probe code
soc: ixp4xx: qmgr: Add DT bindings for IXP4xx qmgr
soc: ixp4xx: npe: Add DT probe code
soc: ixp4xx: Add DT bindings for IXP4xx NPE
soc: ixp4xx: qmgr: Pass resources
soc: ixp4xx: Remove unused functions
soc: ixp4xx: Uninline several functions
soc: ixp4xx: npe: Pass addresses as resources
ARM: ixp4xx: Turn the QMGR into a platform device
ARM: ixp4xx: Turn the NPE into a platform device
ARM: ixp4xx: Move IXP4xx QMGR and NPE headers
ARM: ixp4xx: Move NPE and QMGR to drivers/soc
ARM: dts: Add some initial IXP4xx device trees
ARM: ixp4xx: Add device tree boot support
ARM: ixp4xx: Add DT bindings
gpio: ixp4xx: Add OF probing support
gpio: ixp4xx: Add DT bindings
clocksource/drivers/ixp4xx: Add OF initialization support
clocksource/drivers/ixp4xx: Add DT bindings
...
Signed-off-by: Olof Johansson <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into arm/soc
firmware: tegra: Changes for v5.2-rc1
This set of changes includes improvements for Trusted Foundations and
also moves the source files for this support into the standard location
under drivers/firmware.
* tag 'tegra-for-5.2-firmware' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
firmware: Move Trusted Foundations support
ARM: tegra: Sort dependencies alphabetically
ARM: tegra: Add firmware calls required for suspend-resume on Tegra30
ARM: tegra: Always boot CPU in ARM-mode
ARM: tegra: Don't apply CPU erratas in insecure mode
ARM: tegra: Set up L2 cache using Trusted Foundations firmware
ARM: trusted_foundations: Provide information about whether firmware is registered
ARM: trusted_foundations: Make prepare_idle call to take mode argument
ARM: trusted_foundations: Support L2 cache maintenance
Signed-off-by: Olof Johansson <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into arm/drivers
soc/tegra: Changes for v5.2-rc1
Besides a couple of fixes to better cope with deferred probing, this set
of patches also implements the acquire/release protocol for resets used
during powergate operations. This is necessary to allow these resets to
be temporarily shared with other devices that may also need to control
these resets.
* tag 'tegra-for-5.2-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
soc/tegra: pmc: Move powergate initialisation to probe
soc/tegra: pmc: Remove reset sysfs entries on error
soc/tegra: pmc: Fix reset sources and levels
soc/tegra: pmc: Implement acquire/release for resets
reset: Add acquire/release support for arrays
reset: Add acquired flag to of_reset_control_array_get()
reset: add acquired/released state for exclusive reset controls
Signed-off-by: Olof Johansson <[email protected]>
|
|
Add deferred static branches. We can't unfortunately use the
nice trick of encapsulating the entire structure in true/false
variants, because the inside has to be either struct static_key_true
or struct static_key_false. Use defines to pass the appropriate
members to the helpers separately.
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Currently perf callchain doesn't work well with ORC unwinder
when sampling from trace point. We'll get useless in kernel callchain
like this:
perf 6429 [000] 22.498450: kmem:mm_page_alloc: page=0x176a17 pfn=1534487 order=0 migratetype=0 gfp_flags=GFP_KERNEL
ffffffffbe23e32e __alloc_pages_nodemask+0x22e (/lib/modules/5.1.0-rc3+/build/vmlinux)
7efdf7f7d3e8 __poll+0x18 (/usr/lib64/libc-2.28.so)
5651468729c1 [unknown] (/usr/bin/perf)
5651467ee82a main+0x69a (/usr/bin/perf)
7efdf7eaf413 __libc_start_main+0xf3 (/usr/lib64/libc-2.28.so)
5541f689495641d7 [unknown] ([unknown])
The root cause is that, for trace point events, it doesn't provide a
real snapshot of the hardware registers. Instead perf tries to get
required caller's registers and compose a fake register snapshot
which suppose to contain enough information for start a unwinding.
However without CONFIG_FRAME_POINTER, if failed to get caller's BP as the
frame pointer, so current frame pointer is returned instead. We get
a invalid register combination which confuse the unwinder, and end the
stacktrace early.
So in such case just don't try dump BP, and let the unwinder start
directly when the register is not a real snapshot. Use SP
as the skip mark, unwinder will skip all the frames until it meet
the frame of the trace point caller.
Tested with frame pointer unwinder and ORC unwinder, this makes perf
callchain get the full kernel space stacktrace again like this:
perf 6503 [000] 1567.570191: kmem:mm_page_alloc: page=0x16c904 pfn=1493252 order=0 migratetype=0 gfp_flags=GFP_KERNEL
ffffffffb523e2ae __alloc_pages_nodemask+0x22e (/lib/modules/5.1.0-rc3+/build/vmlinux)
ffffffffb52383bd __get_free_pages+0xd (/lib/modules/5.1.0-rc3+/build/vmlinux)
ffffffffb52fd28a __pollwait+0x8a (/lib/modules/5.1.0-rc3+/build/vmlinux)
ffffffffb521426f perf_poll+0x2f (/lib/modules/5.1.0-rc3+/build/vmlinux)
ffffffffb52fe3e2 do_sys_poll+0x252 (/lib/modules/5.1.0-rc3+/build/vmlinux)
ffffffffb52ff027 __x64_sys_poll+0x37 (/lib/modules/5.1.0-rc3+/build/vmlinux)
ffffffffb500418b do_syscall_64+0x5b (/lib/modules/5.1.0-rc3+/build/vmlinux)
ffffffffb5a0008c entry_SYSCALL_64_after_hwframe+0x44 (/lib/modules/5.1.0-rc3+/build/vmlinux)
7f71e92d03e8 __poll+0x18 (/usr/lib64/libc-2.28.so)
55a22960d9c1 [unknown] (/usr/bin/perf)
55a22958982a main+0x69a (/usr/bin/perf)
7f71e9202413 __libc_start_main+0xf3 (/usr/lib64/libc-2.28.so)
5541f689495641d7 [unknown] ([unknown])
Co-developed-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Kairui Song <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
ep93xx does not have a proper pinctrl driver, but does things
ad-hoc through mach/platform.h, which is also used for setting
up the boards.
To avoid using mach/*.h headers completely, let's move the interfaces
into include/linux/soc/. This is far from great, but gets the job
done here, without the need for a proper pinctrl driver.
Acked-by: Alexander Sverdlin <[email protected]>
Acked-by: H Hartley Sweeten <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Olof Johansson <[email protected]>
|
|
We can communicate the clock rate using platform data rather than setting
a flag to use a particular value in the driver, which is cleaner and
avoids the dependency.
No platform in the kernel currently defines the ep93xx keypad device
structure, so this is a rather pointless excercise. Any out of tree
users are probably dead now, but if not, they have to change their
platform code to match the new platform_data structure.
Acked-by: H Hartley Sweeten <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Olof Johansson <[email protected]>
|
|
The header file is the only thing preventing us from building the
driver in a cross-platform configuration, so move the structure
we are interested in to the global platform_data location
and enable compile testing.
Acked-by: Alexander Sverdlin <[email protected]>
Acked-by: H Hartley Sweeten <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Olof Johansson <[email protected]>
|
|
arm/drivers
arm64: zynqmp: SoC changes for v5.2
- Add support for ZynqMP fpga manager
- Defer some probes which depends on firmware driver to be ready
- Debugfs fix
* tag 'zynqmp-soc-for-v5.2' of https://github.com/Xilinx/linux-xlnx:
fpga manager: Adding FPGA Manager support for Xilinx zynqmp
dt-bindings: fpga: Add bindings for ZynqMP fpga driver
firmware: xilinx: Add fpga API's
drivers: Defer probe if firmware is not ready
firmware: xilinx: fix debugfs write handler
Signed-off-by: Olof Johansson <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/drivers
PM changes for am335x and am437x
This series adds support for am437x RTC-only mode in suspend. In the
RTC-only mode suspend, everything is shut down except the RTC. This
makes the power consumption very low for suspend mode.
To support RTC-only mode, we need to export omap_rtc_power_off_program()
from the rtc driver and improve PM code to save and restore the wkup
domain context. As RTC-only mode depends on the device being wired
properly for things like memory, we need to also check for the machine
type before we allow it. We also need to run DDR3 hardware leveling on
resume.
Note that there is a trivial merge conflict between the RTC branch
and these changes where the RTC branch makes tm2bcd() a void function
and the error handling parts can be just dropped.
* tag 'omap-for-v5.2/am4-pm-v2-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: OMAP2+: sleep43xx: Run EMIF HW leveling on resume path
memory: ti-emif-sram: Add ti_emif_run_hw_leveling for DDR3 hardware leveling
soc: ti: pm33xx: AM437X: Add rtc_only with ddr in self-refresh support
soc: ti: pm33xx: Move the am33xx_push_sram_idle to the top
ARM: OMAP2+: pm33xx: Add support for rtc+ddr in self refresh mode
rtc: OMAP: Add support for rtc-only mode
Signed-off-by: Olof Johansson <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/soc
Driver changes for ti-sysc for v5.2 merge window
This series of changes for ti-sysc interconnect target module driver
gets us to the point where we can actually drop legacy platform data
for many devices in favor of device tree data.
To do this, we improve ti-sysc driver not to rely on platform data
callbacks to manage module clocks, and handle more quirks needed for
some devices. Also few minor fixes are needed, but were considered
not needed to be sent separately as they only show up with this series.
Then we drop several thousands of lines of legacy platform data for
omap4, omap5, dra7, am335x and am437x. We drop platform data for mmc,
i2c, gpio and uart devices to start with as those are typically
easily tested on all devices. In case of unexpected issues, we can just
add back the legacy platform data for a single device type if needed.
Finally we add initial support for enabling and disabling some devices
without legacy platform data callbacks. I was planning on sending the
dropping of legacy platform data as a separate series, but already
applied Roger's patch on top and pushed it out.
Note that this series depends on related SoC and is based on those.
* tag 'omap-for-v5.2/ti-sysc-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: (33 commits)
bus: ti-sysc: Add generic enable/disable functions
ARM: OMAP2+: Drop mcspi platform data for omap4
ARM: OMAP2+: Drop uart platform data for dra7
ARM: OMAP2+: Drop gpio platform data for dra7
ARM: OMAP2+: Drop i2c platform data for dra7
ARM: OMAP2+: Drop mmc platform data for dra7
ARM: OMAP2+: Drop uart platform data for omap5
ARM: OMAP2+: Drop gpio platform data for omap5
ARM: OMAP2+: Drop i2c platform data for omap5
ARM: OMAP2+: Drop mmc platform data for omap5
ARM: OMAP2+: Drop uart platform data for am33xx and am43xx
ARM: OMAP2+: Drop gpio platform data for am33xx and am43xx
ARM: OMAP2+: Drop i2c platform data for am33xx and am43xx
ARM: OMAP2+: Drop mmc platform data for am330x and am43xx
ARM: OMAP2+: Drop uart platform data for omap4
ARM: OMAP2+: Drop gpio platform data for omap4
ARM: OMAP2+: Drop i2c platform data for omap4
ARM: OMAP2+: Drop mmc platform data for omap4
Documentation: bus: ti-sysc: fix spelling mistakes "multipe" and "interconnet"
bus: ti-sysc: Detect DMIC for debugging
...
Signed-off-by: Olof Johansson <[email protected]>
|
|
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-04-28
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Introduce BPF socket local storage map so that BPF programs can store
private data they associate with a socket (instead of e.g. separate hash
table), from Martin.
2) Add support for bpftool to dump BTF types. This is done through a new
`bpftool btf dump` sub-command, from Andrii.
3) Enable BPF-based flow dissector for skb-less eth_get_headlen() calls which
was currently not supported since skb was used to lookup netns, from Stanislav.
4) Add an opt-in interface for tracepoints to expose a writable context
for attached BPF programs, used here for NBD sockets, from Matt.
5) BPF xadd related arm64 JIT fixes and scalability improvements, from Daniel.
6) Change the skb->protocol for bpf_skb_adjust_room() helper in order to
support tunnels such as sit. Add selftests as well, from Willem.
7) Various smaller misc fixes.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
After the previous commit, both ipset_nest_start() and ipset_nest_end() are
just aliases for nla_nest_start() and nla_nest_end() so that there is no
need to keep them.
Signed-off-by: Michal Kubecek <[email protected]>
Acked-by: Jozsef Kadlecsik <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
netlink based interfaces (including recently added ones) are still not
setting it in kernel generated messages. Without the flag, message parsers
not aware of attribute semantics (e.g. wireshark dissector or libmnl's
mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
the structure of their contents.
Unfortunately we cannot just add the flag everywhere as there may be
userspace applications which check nlattr::nla_type directly rather than
through a helper masking out the flags. Therefore the patch renames
nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
are rewritten to use nla_nest_start().
Except for changes in include/net/netlink.h, the patch was generated using
this semantic patch:
@@ expression E1, E2; @@
-nla_nest_start(E1, E2)
+nla_nest_start_noflag(E1, E2)
@@ expression E1, E2; @@
-nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
+nla_nest_start(E1, E2)
Signed-off-by: Michal Kubecek <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Acked-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
There seems to be no reason for tls_ops to be defined in netdevice.h
which is included in a lot of places. Don't wrap the struct/enum
declaration in ifdefs, it trickles down unnecessary ifdefs into
driver code.
Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
After allowing a bpf prog to
- directly read the skb->sk ptr
- get the fullsock bpf_sock by "bpf_sk_fullsock()"
- get the bpf_tcp_sock by "bpf_tcp_sock()"
- get the listener sock by "bpf_get_listener_sock()"
- avoid duplicating the fields of "(bpf_)sock" and "(bpf_)tcp_sock"
into different bpf running context.
this patch is another effort to make bpf's network programming
more intuitive to do (together with memory and performance benefit).
When bpf prog needs to store data for a sk, the current practice is to
define a map with the usual 4-tuples (src/dst ip/port) as the key.
If multiple bpf progs require to store different sk data, multiple maps
have to be defined. Hence, wasting memory to store the duplicated
keys (i.e. 4 tuples here) in each of the bpf map.
[ The smallest key could be the sk pointer itself which requires
some enhancement in the verifier and it is a separate topic. ]
Also, the bpf prog needs to clean up the elem when sk is freed.
Otherwise, the bpf map will become full and un-usable quickly.
The sk-free tracking currently could be done during sk state
transition (e.g. BPF_SOCK_OPS_STATE_CB).
The size of the map needs to be predefined which then usually ended-up
with an over-provisioned map in production. Even the map was re-sizable,
while the sk naturally come and go away already, this potential re-size
operation is arguably redundant if the data can be directly connected
to the sk itself instead of proxy-ing through a bpf map.
This patch introduces sk->sk_bpf_storage to provide local storage space
at sk for bpf prog to use. The space will be allocated when the first bpf
prog has created data for this particular sk.
The design optimizes the bpf prog's lookup (and then optionally followed by
an inline update). bpf_spin_lock should be used if the inline update needs
to be protected.
BPF_MAP_TYPE_SK_STORAGE:
-----------------------
To define a bpf "sk-local-storage", a BPF_MAP_TYPE_SK_STORAGE map (new in
this patch) needs to be created. Multiple BPF_MAP_TYPE_SK_STORAGE maps can
be created to fit different bpf progs' needs. The map enforces
BTF to allow printing the sk-local-storage during a system-wise
sk dump (e.g. "ss -ta") in the future.
The purpose of a BPF_MAP_TYPE_SK_STORAGE map is not for lookup/update/delete
a "sk-local-storage" data from a particular sk.
Think of the map as a meta-data (or "type") of a "sk-local-storage". This
particular "type" of "sk-local-storage" data can then be stored in any sk.
The main purposes of this map are mostly:
1. Define the size of a "sk-local-storage" type.
2. Provide a similar syscall userspace API as the map (e.g. lookup/update,
map-id, map-btf...etc.)
3. Keep track of all sk's storages of this "type" and clean them up
when the map is freed.
sk->sk_bpf_storage:
------------------
The main lookup/update/delete is done on sk->sk_bpf_storage (which
is a "struct bpf_sk_storage"). When doing a lookup,
the "map" pointer is now used as the "key" to search on the
sk_storage->list. The "map" pointer is actually serving
as the "type" of the "sk-local-storage" that is being
requested.
To allow very fast lookup, it should be as fast as looking up an
array at a stable-offset. At the same time, it is not ideal to
set a hard limit on the number of sk-local-storage "type" that the
system can have. Hence, this patch takes a cache approach.
The last search result from sk_storage->list is cached in
sk_storage->cache[] which is a stable sized array. Each
"sk-local-storage" type has a stable offset to the cache[] array.
In the future, a map's flag could be introduced to do cache
opt-out/enforcement if it became necessary.
The cache size is 16 (i.e. 16 types of "sk-local-storage").
Programs can share map. On the program side, having a few bpf_progs
running in the networking hotpath is already a lot. The bpf_prog
should have already consolidated the existing sock-key-ed map usage
to minimize the map lookup penalty. 16 has enough runway to grow.
All sk-local-storage data will be removed from sk->sk_bpf_storage
during sk destruction.
bpf_sk_storage_get() and bpf_sk_storage_delete():
------------------------------------------------
Instead of using bpf_map_(lookup|update|delete)_elem(),
the bpf prog needs to use the new helper bpf_sk_storage_get() and
bpf_sk_storage_delete(). The verifier can then enforce the
ARG_PTR_TO_SOCKET argument. The bpf_sk_storage_get() also allows to
"create" new elem if one does not exist in the sk. It is done by
the new BPF_SK_STORAGE_GET_F_CREATE flag. An optional value can also be
provided as the initial value during BPF_SK_STORAGE_GET_F_CREATE.
The BPF_MAP_TYPE_SK_STORAGE also supports bpf_spin_lock. Together,
it has eliminated the potential use cases for an equivalent
bpf_map_update_elem() API (for bpf_prog) in this patch.
Misc notes:
----------
1. map_get_next_key is not supported. From the userspace syscall
perspective, the map has the socket fd as the key while the map
can be shared by pinned-file or map-id.
Since btf is enforced, the existing "ss" could be enhanced to pretty
print the local-storage.
Supporting a kernel defined btf with 4 tuples as the return key could
be explored later also.
2. The sk->sk_lock cannot be acquired. Atomic operations is used instead.
e.g. cmpxchg is done on the sk->sk_bpf_storage ptr.
Please refer to the source code comments for the details in
synchronization cases and considerations.
3. The mem is charged to the sk->sk_omem_alloc as the sk filter does.
Benchmark:
---------
Here is the benchmark data collected by turning on
the "kernel.bpf_stats_enabled" sysctl.
Two bpf progs are tested:
One bpf prog with the usual bpf hashmap (max_entries = 8192) with the
sk ptr as the key. (verifier is modified to support sk ptr as the key
That should have shortened the key lookup time.)
Another bpf prog is with the new BPF_MAP_TYPE_SK_STORAGE.
Both are storing a "u32 cnt", do a lookup on "egress_skb/cgroup" for
each egress skb and then bump the cnt. netperf is used to drive
data with 4096 connected UDP sockets.
BPF_MAP_TYPE_HASH with a modifier verifier (152ns per bpf run)
27: cgroup_skb name egress_sk_map tag 74f56e832918070b run_time_ns 58280107540 run_cnt 381347633
loaded_at 2019-04-15T13:46:39-0700 uid 0
xlated 344B jited 258B memlock 4096B map_ids 16
btf_id 5
BPF_MAP_TYPE_SK_STORAGE in this patch (66ns per bpf run)
30: cgroup_skb name egress_sk_stora tag d4aa70984cc7bbf6 run_time_ns 25617093319 run_cnt 390989739
loaded_at 2019-04-15T13:47:54-0700 uid 0
xlated 168B jited 156B memlock 4096B map_ids 17
btf_id 6
Here is a high-level picture on how are the objects organized:
sk
┌──────┐
│ │
│ │
│ │
│*sk_bpf_storage─────▶ bpf_sk_storage
└──────┘ ┌───────┐
┌───────────┤ list │
│ │ │
│ │ │
│ │ │
│ └───────┘
│
│ elem
│ ┌────────┐
├─▶│ snode │
│ ├────────┤
│ │ data │ bpf_map
│ ├────────┤ ┌─────────┐
│ │map_node│◀─┬─────┤ list │
│ └────────┘ │ │ │
│ │ │ │
│ elem │ │ │
│ ┌────────┐ │ └─────────┘
└─▶│ snode │ │
├────────┤ │
bpf_map │ data │ │
┌─────────┐ ├────────┤ │
│ list ├───────▶│map_node│ │
│ │ └────────┘ │
│ │ │
│ │ elem │
└─────────┘ ┌────────┐ │
┌─▶│ snode │ │
│ ├────────┤ │
│ │ data │ │
│ ├────────┤ │
│ │map_node│◀─┘
│ └────────┘
│
│
│ ┌───────┐
sk └──────────│ list │
┌──────┐ │ │
│ │ │ │
│ │ │ │
│ │ └───────┘
│*sk_bpf_storage───────▶bpf_sk_storage
└──────┘
Signed-off-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt into char-misc-next
Mika writes:
thunderbolt: Changes for v5.2 merge window
This improves software connection manager on older Apple systems with
Thunderbolt 1 and 2 controller to support full PCIe daisy chains,
Display Port tunneling and P2P networking. There are also fixes for
potential NULL pointer dereferences at various places in the driver.
* tag 'thunderbolt-for-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt: (44 commits)
thunderbolt: Make priority unsigned in struct tb_path
thunderbolt: Start firmware on Titan Ridge Apple systems
thunderbolt: Reword output of tb_dump_hop()
thunderbolt: Make rest of the logging to happen at debug level
thunderbolt: Make __TB_[SW|PORT]_PRINT take const parameters
thunderbolt: Add support for XDomain connections
thunderbolt: Make tb_switch_alloc() return ERR_PTR()
thunderbolt: Add support for DMA tunnels
thunderbolt: Add XDomain UUID exchange support
thunderbolt: Run tb_xdp_handle_request() in system workqueue
thunderbolt: Do not tear down tunnels when driver is unloaded
thunderbolt: Add support for Display Port tunnels
thunderbolt: Rework NFC credits handling
thunderbolt: Generalize port finding routines to support all port types
thunderbolt: Scan only valid NULL adapter ports in hotplug
thunderbolt: Add support for full PCIe daisy chains
thunderbolt: Discover preboot PCIe paths the boot firmware established
thunderbolt: Deactivate all paths before restarting them
thunderbolt: Extend tunnel creation to more than 2 adjacent switches
thunderbolt: Add helper function to iterate from one port to another
...
|
|
This is an opt-in interface that allows a tracepoint to provide a safe
buffer that can be written from a BPF_PROG_TYPE_RAW_TRACEPOINT program.
The size of the buffer must be a compile-time constant, and is checked
before allowing a BPF program to attach to a tracepoint that uses this
feature.
The pointer to this buffer will be the first argument of tracepoints
that opt in; the pointer is valid and can be bpf_probe_read() by both
BPF_PROG_TYPE_RAW_TRACEPOINT and BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE
programs that attach to such a tracepoint, but the buffer to which it
points may only be written by the latter.
Signed-off-by: Matt Mullins <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
When we create a new lockd client, we want to be able to pass the
correct credential of the process that created the struct nlm_host.
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|