| Age | Commit message (Collapse) | Author | Files | Lines |
|
When CONFIG_KASAN_HW_TAGS is enabled we currently increase the minimum
slab alignment to 16. This happens even if MTE is not supported in
hardware or disabled via kasan=off, which creates an unnecessary memory
overhead in those cases. Eliminate this overhead by making the minimum
slab alignment a runtime property and only aligning to 16 if KASAN is
enabled at runtime.
On a DragonBoard 845c (non-MTE hardware) with a kernel built with
CONFIG_KASAN_HW_TAGS, waiting for quiescence after a full Android boot I
see the following Slab measurements in /proc/meminfo (median of 3
reboots):
Before: 169020 kB
After: 167304 kB
[[email protected]: make slab alignment type `unsigned int' to avoid casting]
Link: https://linux-review.googlesource.com/id/I752e725179b43b144153f4b6f584ceb646473ead
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Collingbourne <[email protected]>
Reviewed-by: Andrey Konovalov <[email protected]>
Reviewed-by: Hyeonggon Yoo <[email protected]>
Tested-by: Hyeonggon Yoo <[email protected]>
Acked-by: David Rientjes <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: Kees Cook <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
An inclusion of cache.h in printk.h was added in 2014 in commit
c28aa1f0a847 ("printk/cache: mark printk_once test variable
__read_mostly") in order to bring in the definition of __read_mostly. The
usage of __read_mostly was later removed in commit 3ec25826ae33 ("printk:
Tie printk_once / printk_deferred_once into .data.once for reset") which
made the inclusion of cache.h unnecessary, so remove it.
We have a small amount of code that depended on the inclusion of cache.h
from printk.h; fix that code to include the appropriate header.
This fixes a circular inclusion on arm64 (linux/printk.h -> linux/cache.h
-> asm/cache.h -> linux/kasan-enabled.h -> linux/static_key.h ->
linux/jump_label.h -> linux/bug.h -> asm/bug.h -> linux/printk.h) that
would otherwise be introduced by the next patch.
Build tested using {allyesconfig,defconfig} x {arm64,x86_64}.
Link: https://linux-review.googlesource.com/id/I8fd51f72c9ef1f2d6afd3b2cbc875aa4792c1fba
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Collingbourne <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Hyeonggon Yoo <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
monitoring
Patch series "support fixed virtual address ranges monitoring".
The monitoring operations set for virtual address spaces automatically
updates the monitoring target regions to cover entire mappings of the
virtual address spaces as much as possible. Some users could have more
information about their programs than kernel and therefore have interest
in not entire regions but only specific regions. For such cases, the
automatic monitoring target regions updates are only unnecessary overhead
or distractions.
This patchset adds supports for the use case on DAMON's kernel API
(DAMON_OPS_FVADDR) and sysfs interface ('fvaddr' keyword for 'operations'
sysfs file).
This patch (of 3):
The monitoring operations set for virtual address spaces automatically
updates the monitoring target regions to cover entire mappings of the
virtual address spaces as much as possible. Some users could have more
information about their programs than kernel and therefore have interest
in not entire regions but only specific regions. For such cases, the
automatic monitoring target regions updates are only unnecessary overheads
or distractions.
For such cases, DAMON's API users can simply set the '->init()' and
'->update()' of the DAMON context's '->ops' NULL, and set the target
monitoring regions when creating the context. But, that would be a dirty
hack. Worse yet, the hack is unavailable for DAMON user space interface
users.
To support the use case in a clean way that can easily exported to the
user space, this commit adds another monitoring operations set called
'fvaddr', which is same to 'vaddr' but does not automatically update the
monitoring regions. Instead, it will only respect the virtual address
regions which have explicitly passed at the initial context creation.
Note that this commit leave sysfs interface not supporting the feature
yet. The support will be made in a following commit.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: SeongJae Park <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "mm/damon: allow users know which monitoring ops are available".
DAMON users can configure it for vaious address spaces including virtual
address spaces and the physical address space by setting its monitoring
operations set with appropriate one for their purpose. However, there is
no celan and simple way to know exactly which monitoring operations sets
are available on the currently running kernel.
This patchset adds functions for the purpose on DAMON's kernel API
('damon_is_registered_ops()') and sysfs interface ('avail_operations' file
under each context directory).
This patch (of 4):
To know if a specific 'damon_operations' is registered, users need to
check the kernel config or try 'damon_select_ops()' with the ops of the
question, and then see if it successes. In the latter case, the user
should also revert the change. To make the process simple and convenient,
this commit adds a function for checking if a specific 'damon_operations'
is registered or not.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: SeongJae Park <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Add VM_BUG_ON() bounds checking to make sure that, if "offset + len>
PAGE_SIZE", memset() does not corrupt data in adjacent pages.
Mainly to match all the similar functions in highmem.h.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Fabio M. De Francesco <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Peter Collingbourne <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Calls to change_protection_range() on THP can trigger, at least on x86,
two TLB flushes for one page: one immediately, when pmdp_invalidate() is
called by change_huge_pmd(), and then another one later (that can be
batched) when change_protection_range() finishes.
The first TLB flush is only necessary to prevent the dirty bit (and with a
lesser importance the access bit) from changing while the PTE is modified.
However, this is not necessary as the x86 CPUs set the dirty-bit
atomically with an additional check that the PTE is (still) present. One
caveat is Intel's Knights Landing that has a bug and does not do so.
Leverage this behavior to eliminate the unnecessary TLB flush in
change_huge_pmd(). Introduce a new arch specific pmdp_invalidate_ad()
that only invalidates the access and dirty bit from further changes.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Nadav Amit <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Andrew Cooper <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yu Zhao <[email protected]>
Cc: Nick Piggin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Currently, using mprotect() to unprotect a memory region or uffd to
unprotect a memory region causes a TLB flush. However, in such cases the
PTE is often not modified (i.e., remain RO) and therefore not TLB flush is
needed.
Add an arch-specific pte_needs_flush() which tells whether a TLB flush is
needed based on the old PTE and the new one. Implement an x86
pte_needs_flush().
Always flush the TLB when it is architecturally needed even when skipping
a TLB flush might only result in a spurious page-faults by skipping the
flush.
Even with such conservative manner, we can in the future further refine
the checks to test whether a PTE is present by only considering the
architectural _PAGE_PRESENT flag instead of {pte|pmd}_preesnt(). For not
be careful and use the latter.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Nadav Amit <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yu Zhao <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Andrew Cooper <[email protected]>
Cc: Peter Xu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "mm/mprotect: avoid unnecessary TLB flushes", v6.
This patchset is intended to remove unnecessary TLB flushes during
mprotect() syscalls. Once this patch-set make it through, similar and
further optimizations for MADV_COLD and userfaultfd would be possible.
Basically, there are 3 optimizations in this patch-set:
1. Use TLB batching infrastructure to batch flushes across VMAs and do
better/fewer flushes. This would also be handy for later userfaultfd
enhancements.
2. Avoid unnecessary TLB flushes. This optimization is the one that
provides most of the performance benefits. Unlike previous versions,
we now only avoid flushes that would not result in spurious
page-faults.
3. Avoiding TLB flushes on change_huge_pmd() that are only needed to
prevent the A/D bits from changing.
Andrew asked for some benchmark numbers. I do not have an easy
determinate macrobenchmark in which it is easy to show benefit. I
therefore ran a microbenchmark: a loop that does the following on
anonymous memory, just as a sanity check to see that time is saved by
avoiding TLB flushes. The loop goes:
mprotect(p, PAGE_SIZE, PROT_READ)
mprotect(p, PAGE_SIZE, PROT_READ|PROT_WRITE)
*p = 0; // make the page writable
The test was run in KVM guest with 1 or 2 threads (the second thread was
busy-looping). I measured the time (cycles) of each operation:
1 thread 2 threads
mmots +patch mmots +patch
PROT_READ 3494 2725 (-22%) 8630 7788 (-10%)
PROT_READ|WRITE 3952 2724 (-31%) 9075 2865 (-68%)
[ mmots = v5.17-rc6-mmots-2022-03-06-20-38 ]
The exact numbers are really meaningless, but the benefit is clear. There
are 2 interesting results though.
(1) PROT_READ is cheaper, while one can expect it not to be affected.
This is presumably due to TLB miss that is saved
(2) Without memory access (*p = 0), the speedup of the patch is even
greater. In that scenario mprotect(PROT_READ) also avoids the TLB flush.
As a result both operations on the patched kernel take roughly ~1500
cycles (with either 1 or 2 threads), whereas on mmotm their cost is as
high as presented in the table.
This patch (of 3):
change_pXX_range() currently does not use mmu_gather, but instead
implements its own deferred TLB flushes scheme. This both complicates the
code, as developers need to be aware of different invalidation schemes,
and prevents opportunities to avoid TLB flushes or perform them in finer
granularity.
The use of mmu_gather for modified PTEs has benefits in various scenarios
even if pages are not released. For instance, if only a single page needs
to be flushed out of a range of many pages, only that page would be
flushed. If a THP page is flushed, on x86 a single TLB invlpg instruction
can be used instead of 512 instructions (or a full TLB flush, which would
Linux would actually use by default). mprotect() over multiple VMAs
requires a single flush.
Use mmu_gather in change_pXX_range(). As the pages are not released, only
record the flushed range using tlb_flush_pXX_range().
Handle THP similarly and get rid of flush_cache_range() which becomes
redundant since tlb_start_vma() calls it when needed.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Nadav Amit <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Andrew Cooper <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yu Zhao <[email protected]>
Cc: Nick Piggin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
As domain->force_snooping only impacts the devices attached with the
domain, there's no need to check against all IOMMU units. On the other
hand, force_snooping could be set on a domain no matter whether it has
been attached or not, and once set it is an immutable flag. If no
device attached, the operation always succeeds. Then this empty domain
can be only attached to a device of which the IOMMU supports snoop
control.
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Joerg Roedel <[email protected]>
|
|
PRQ overflow may cause I/O throughput congestion, resulting in unnecessary
degradation of I/O performance. Appropriately increasing the length of PRQ
can greatly reduce the occurrence of PRQ overflow. The count of maximum
page requests that can be generated in parallel by a PCIe device is
statically defined in the Outstanding Page Request Capacity field of the
PCIe ATS configure space.
The new length of PRQ is calculated by summing up the value of Outstanding
Page Request Capacity register across all devices where Page Requests are
supported on the real PR-capable platform (Intel Sapphire Rapids). The
result is round to the nearest higher power of 2.
The PRQ length is also double sized as the VT-d IOMMU driver only updates
the Page Request Queue Head Register (PQH_REG) after processing the entire
queue.
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Joerg Roedel <[email protected]>
|
|
Adds h264 lat and core architecture driver for mt8192,
and the decode mode is frame based for stateless decoder.
Signed-off-by: Yunfei Dong <[email protected]>
Tested-by: Nícolas F. R. A. Prado <[email protected]>
Signed-off-by: Hans Verkuil <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
|
|
Currently to setup a fully sparse descriptor space upfront, the app needs
to alloate an array of the full size and memset it to -1 and then pass
that in. Make this a bit easier by allowing a flag that simply does
this internally rather than needing to copy each slot separately.
This works with IORING_REGISTER_FILES2 as the flag is set in struct
io_uring_rsrc_register, and is only allow when the type is
IORING_RSRC_FILE as this doesn't make sense for registered buffers.
Reviewed-by: Hao Xu <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
If the application passes in IORING_FILE_INDEX_ALLOC as the file_slot,
then that's a hint to allocate a fixed file descriptor rather than have
one be passed in directly.
This can be useful for having io_uring manage the direct descriptor space.
Normal open direct requests will complete with 0 for success, and < 0
in case of error. If io_uring is asked to allocated the direct descriptor,
then the direct descriptor is returned in case of success.
Reviewed-by: Hao Xu <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux into arm/soc
mmsys:
- add SW reset to MT8192
- add support for MT8195
pmic wrapper:
- update binding description needed for future MT8195 support
mutex:
- add support for MT8195
cmdq helper:
- remove legacy callback
* tag 'v5.18-next-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux:
soc: mediatek: mutex: remove mt8195 MOD0 and SOF0 definition
dt-bindings: pwrap: mediatek: Update pwrap document for mt8195
soc: mediatek: add DDP_DOMPONENT_DITHER0 enum for mt8195 vdosys0
soc: mediatek: add mtk-mutex support for mt8195 vdosys0
soc: mediatek: add mtk-mmsys support for mt8195 vdosys0
soc: mediatek: cmdq: Use mailbox rx_callback instead of cmdq_task_cb
dt-bindings: arm: mediatek: mmsys: add mt8195 SoC binding
dt-bindings: arm: mediatek: mmsys: add power and gce properties
soc: mediatek: mmsys: Add sw0_rst_offset for MT8192
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
|
|
Drivers that want to remove registered conflicting framebuffers prior to
register their own framebuffer, call to remove_conflicting_framebuffers().
This function takes the registration_lock mutex, to prevent a race when
drivers register framebuffer devices. But if a conflicting framebuffer
device is found, the underlaying platform device is unregistered and this
will lead to the platform driver .remove callback to be called. Which in
turn will call to unregister_framebuffer() that takes the same lock.
To prevent this, a struct fb_info.forced_out field was used as indication
to unregister_framebuffer() whether the mutex has to be grabbed or not.
But this could be unsafe, since the fbdev core is making assumptions about
what drivers may or may not do in their .remove callbacks. Allowing to run
these callbacks with the registration_lock held can cause deadlocks, since
the fbdev core has no control over what drivers do in their removal path.
A better solution is to drop the lock before platform_device_unregister(),
so unregister_framebuffer() can take it when called from the fbdev driver.
The lock is acquired again after the device has been unregistered and at
this point the removal loop can be restarted.
Since the conflicting framebuffer device has already been removed, the
loop would just finish when no more conflicting framebuffers are found.
Suggested-by: Daniel Vetter <[email protected]>
Signed-off-by: Javier Martinez Canillas <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
This reverts commits:
0dad4087a86a2cbe177404dc73f18ada26a2c390 ("tcp/dccp: get rid of inet_twsk_purge()")
d507204d3c5cc57d9a8bdf0a477615bb59ea1611 ("tcp/dccp: add tw->tw_bslot")
As Leonard pointed out, a newly allocated netns can happen
to reuse a freed 'struct net'.
While TCP TW timers were covered by my patches, other things were not:
1) Lookups in rx path (INET_MATCH() and INET6_MATCH()), as they look
at 4-tuple plus the 'struct net' pointer.
2) /proc/net/tcp[6] and inet_diag, same reason.
3) hashinfo->bhash[], same reason.
Fixing all this seems risky, lets instead revert.
In the future, we might have a per netns tcp hash table, or
a per netns list of timewait sockets...
Fixes: 0dad4087a86a ("tcp/dccp: get rid of inet_twsk_purge()")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Leonard Crestez <[email protected]>
Tested-by: Leonard Crestez <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
INET_MATCH() runs without holding a lock on the socket.
We probably need to annotate most reads.
This patch makes INET_MATCH() an inline function
to ease our changes.
v2:
We remove the 32bit version of it, as modern compilers
should generate the same code really, no need to
try to be smarter.
Also make 'struct net *net' the first argument.
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This adds HCI_QUIRK_BROKEN_ENHANCED_SETUP_SYNC_CONN quirk which can be
used to mark HCI_Enhanced_Setup_Synchronous_Connection as broken even
if its support command bit are set since some controller report it as
supported but the command don't work properly with some configurations
(e.g. BT_VOICE_TRANSPARENT/mSBC).
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
Using an activation counter to decide when the enable or disable the
cec adapter is not the best approach and can lead to race conditions.
Change this to determining the current status of the adapter, and
enable or disable the adapter accordingly.
It now only needs to be called whenever there is a chance that the
state changes, and it can handle enabling/disabling monitoring as
well if needed.
This simplifies the code and it should be a more robust approach as well.
Signed-off-by: Hans Verkuil <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
|
|
If the physical address changes (i.e. becomes invalid, then valid again)
while the adapter is still claiming free logical addresses, then trigger
a reconfiguration since any claimed LAs may now be stale.
Signed-off-by: Hans Verkuil <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
|
|
vb2 queue ownership is managed by the ioctl handler helpers
(vb2_ioctl_*). There are however use cases where drivers can benefit
from checking queue ownership, for instance when open-coding an ioctl
handler that needs to perform additional checks before calling the
corresponding vb2 operation.
Expose the vb2_queue_is_busy() function in the videobuf2-v4l2.h header,
and change its first argument to a struct vb2_queue pointer as the
function name implies it operates on a queue, not a video_device.
Signed-off-by: Laurent Pinchart <[email protected]>
Reviewed-by: Kieran Bingham <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
|
|
A series from Masahiro Yamada to clean up the uapi headers, making
sure they can actually be included from user space without additional
dependencies on either kernel headers or specific libc versions.
* asm-generic-headers-cleanup:
sparc: add asm/stat.h to UAPI compile-test coverage
powerpc: add asm/stat.h to UAPI compile-test coverage
mips: add asm/stat.h to UAPI compile-test coverage
riscv: add linux/bpf_perf_event.h to UAPI compile-test coverage
kbuild: prevent exported headers from including <stdlib.h>, <stdbool.h>
agpgart.h: do not include <stdlib.h> from exported header
|
|
Commit 35d0f1d54ecd ("include/uapi/linux/agpgart.h: include stdlib.h in
userspace") included <stdlib.h> to fix the unknown size_t error, but
I do not think it is the right fix.
This header already uses __kernel_size_t a few lines below.
Replace the remaining size_t, and stop including <stdlib.h>.
Signed-off-by: Masahiro Yamada <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
|
|
The extcon_get_extcon_dev() function returns error pointers on error,
NULL when it's a -EPROBE_DEFER defer situation, and ERR_PTR(-ENODEV)
when the CONFIG_EXTCON option is disabled. This is very complicated for
the callers to handle and a number of them had bugs that would lead to
an Oops.
In real life, there are two things which prevented crashes. First,
error pointers would only be returned if there was bug in the caller
where they passed a NULL "extcon_name" and none of them do that.
Second, only two out of the eight drivers will build when CONFIG_EXTCON
is disabled.
The normal way to write this would be to return -EPROBE_DEFER directly
when appropriate and return NULL when CONFIG_EXTCON is disabled. Then
the error handling is simple and just looks like:
dev->edev = extcon_get_extcon_dev(acpi_dev_name(adev));
if (IS_ERR(dev->edev))
return PTR_ERR(dev->edev);
For the two drivers which can build with CONFIG_EXTCON disabled, then
extcon_get_extcon_dev() will now return NULL which is not treated as an
error and the probe will continue successfully. Those two drivers are
"typec_fusb302" and "max8997-battery". In the original code, the
typec_fusb302 driver had an 800ms hang in tcpm_get_current_limit() but
now that function is a no-op. For the max8997-battery driver everything
should continue working as is.
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Hans de Goede <[email protected]>
Reviewed-by: Heikki Krogerus <[email protected]>
Reviewed-by: Guenter Roeck <[email protected]>
Acked-by: Sebastian Reichel <[email protected]>
Signed-off-by: Chanwoo Choi <[email protected]>
|
|
This is very theoretical compile failure:
ELF_ST_TYPE(st_info = A)
Cast will bind first and st_info will stop being lvalue:
error: lvalue required as left operand of assignment
Given that the only use of this macro is
ELF_ST_TYPE(sym->st_info)
where st_info is "unsigned char" I've decided to remove cast especially
given that companion macro ELF_ST_BIND doesn't use cast.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexey Dobriyan <[email protected]>
Acked-by: Kees Cook <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
https://gitlab.freedesktop.org/drm/tegra into drm-next
drm/tegra: Changes for v5.19-rc1
Only a few fixes this time, and some debuggability improvements.
Signed-off-by: Dave Airlie <[email protected]>
From: Thierry Reding <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
The listen sk is currently stored in two hash tables,
listening_hash (hashed by port) and lhash2 (hashed by port and address).
After commit 0ee58dad5b06 ("net: tcp6: prefer listeners bound to an address")
and commit d9fbc7f6431f ("net: tcp: prefer listeners bound to an address"),
the TCP-SYN lookup fast path does not use listening_hash.
The commit 05c0b35709c5 ("tcp: seq_file: Replace listening_hash with lhash2")
also moved the seq_file (/proc/net/tcp) iteration usage from
listening_hash to lhash2.
There are still a few listening_hash usages left.
One of them is inet_reuseport_add_sock() which uses the listening_hash
to search a listen sk during the listen() system call. This turns
out to be very slow on use cases that listen on many different
VIPs at a popular port (e.g. 443). [ On top of the slowness in
adding to the tail in the IPv6 case ]. The latter patch has a
selftest to demonstrate this case.
This patch takes this chance to move all remaining listening_hash
usages to lhash2 and then retire listening_hash.
Since most changes need to be done together, it is hard to cut
the listening_hash to lhash2 switch into small patches. The
changes in this patch is highlighted here for the review
purpose.
1. Because of the listening_hash removal, lhash2 can use the
sk->sk_nulls_node instead of the icsk->icsk_listen_portaddr_node.
This will also keep the sk_unhashed() check to work as is
after stop adding sk to listening_hash.
The union is removed from inet_listen_hashbucket because
only nulls_head is needed.
2. icsk->icsk_listen_portaddr_node and its helpers are removed.
3. The current lhash2 users needs to iterate with sk_nulls_node
instead of icsk_listen_portaddr_node.
One case is in the inet[6]_lhash2_lookup().
Another case is the seq_file iterator in tcp_ipv4.c.
One thing to note is sk_nulls_next() is needed
because the old inet_lhash2_for_each_icsk_continue()
does a "next" first before iterating.
4. Move the remaining listening_hash usage to lhash2
inet_reuseport_add_sock() which this series is
trying to improve.
inet_diag.c and mptcp_diag.c are the final two
remaining use cases and is moved to lhash2 now also.
Signed-off-by: Martin KaFai Lau <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
After commit 0ee58dad5b06 ("net: tcp6: prefer listeners bound to an address")
and commit d9fbc7f6431f ("net: tcp: prefer listeners bound to an address"),
the count is no longer used. This patch removes it.
Signed-off-by: Martin KaFai Lau <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Currently the ocelot switch lib is unaware of the index of a struct
ocelot_port, since that is kept in the encapsulating structures of outer
drivers (struct dsa_port :: index, struct ocelot_port_private :: chip_port).
With the upcoming increase in complexity associated with assigning DSA
tag_8021q CPU ports to certain user ports, it becomes necessary for the
switch lib to be able to retrieve the index of a certain ocelot_port.
Therefore, introduce a new u8 to ocelot_port (same size as the chip_port
used by the ocelot switchdev driver) and rework the existing code to
populate and use it.
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Reorder members of struct ocelot_port to eliminate holes and reduce
structure size. Pahole says:
Before:
struct ocelot_port {
struct ocelot * ocelot; /* 0 8 */
struct regmap * target; /* 8 8 */
bool vlan_aware; /* 16 1 */
/* XXX 7 bytes hole, try to pack */
const struct ocelot_bridge_vlan * pvid_vlan; /* 24 8 */
unsigned int ptp_skbs_in_flight; /* 32 4 */
u8 ptp_cmd; /* 36 1 */
/* XXX 3 bytes hole, try to pack */
struct sk_buff_head tx_skbs; /* 40 96 */
/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
u8 ts_id; /* 136 1 */
/* XXX 3 bytes hole, try to pack */
phy_interface_t phy_mode; /* 140 4 */
bool is_dsa_8021q_cpu; /* 144 1 */
bool learn_ena; /* 145 1 */
/* XXX 6 bytes hole, try to pack */
struct net_device * bond; /* 152 8 */
bool lag_tx_active; /* 160 1 */
/* XXX 1 byte hole, try to pack */
u16 mrp_ring_id; /* 162 2 */
/* XXX 4 bytes hole, try to pack */
struct net_device * bridge; /* 168 8 */
int bridge_num; /* 176 4 */
u8 stp_state; /* 180 1 */
/* XXX 3 bytes hole, try to pack */
int speed; /* 184 4 */
/* size: 192, cachelines: 3, members: 18 */
/* sum members: 161, holes: 7, sum holes: 27 */
/* padding: 4 */
};
After:
struct ocelot_port {
struct ocelot * ocelot; /* 0 8 */
struct regmap * target; /* 8 8 */
struct net_device * bond; /* 16 8 */
struct net_device * bridge; /* 24 8 */
const struct ocelot_bridge_vlan * pvid_vlan; /* 32 8 */
phy_interface_t phy_mode; /* 40 4 */
unsigned int ptp_skbs_in_flight; /* 44 4 */
struct sk_buff_head tx_skbs; /* 48 96 */
/* --- cacheline 2 boundary (128 bytes) was 16 bytes ago --- */
u16 mrp_ring_id; /* 144 2 */
u8 ptp_cmd; /* 146 1 */
u8 ts_id; /* 147 1 */
u8 stp_state; /* 148 1 */
bool vlan_aware; /* 149 1 */
bool is_dsa_8021q_cpu; /* 150 1 */
bool learn_ena; /* 151 1 */
bool lag_tx_active; /* 152 1 */
/* XXX 3 bytes hole, try to pack */
int bridge_num; /* 156 4 */
int speed; /* 160 4 */
/* size: 168, cachelines: 3, members: 18 */
/* sum members: 161, holes: 1, sum holes: 3 */
/* padding: 4 */
/* last cacheline: 40 bytes */
};
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
This is no longer used since commit 7c4bb540e917 ("net: dsa: tag_ocelot:
create separate tagger for Seville").
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
DSA has not supported (and probably will not support in the future
either) independent tagging protocols per CPU port.
Different switch drivers have different requirements, some may need to
replicate some settings for each CPU port, some may need to apply some
settings on a single CPU port, while some may have to configure some
global settings and then some per-CPU-port settings.
In any case, the current model where DSA calls ->change_tag_protocol for
each CPU port turns out to be impractical for drivers where there are
global things to be done. For example, felix calls dsa_tag_8021q_register(),
which makes no sense per CPU port, so it suppresses the second call.
Let drivers deal with replication towards all CPU ports, and remove the
CPU port argument from the function prototype.
Signed-off-by: Vladimir Oltean <[email protected]>
Acked-by: Luiz Angelo Daros de Luca <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
At the time - commit 7569459a52c9 ("net: dsa: manage flooding on the CPU
ports") - not introducing a dedicated switch callback for host flooding
made sense, because for the only user, the felix driver, there was
nothing different to do for the CPU port than set the flood flags on the
CPU port just like on any other bridge port.
There are 2 reasons why this approach is not good enough, however.
(1) Other drivers, like sja1105, support configuring flooding as a
function of {ingress port, egress port}, whereas the DSA
->port_bridge_flags() function only operates on an egress port.
So with that driver we'd have useless host flooding from user ports
which don't need it.
(2) Even with the felix driver, support for multiple CPU ports makes it
difficult to piggyback on ->port_bridge_flags(). The way in which
the felix driver is going to support host-filtered addresses with
multiple CPU ports is that it will direct these addresses towards
both CPU ports (in a sort of multicast fashion), then restrict the
forwarding to only one of the two using the forwarding masks.
Consequently, flooding will also be enabled towards both CPU ports.
However, ->port_bridge_flags() gets passed the index of a single CPU
port, and that leaves the flood settings out of sync between the 2
CPU ports.
This is to say, it's better to have a specific driver method for host
flooding, which takes the user port as argument. This solves problem (1)
by allowing the driver to do different things for different user ports,
and problem (2) by abstracting the operation and letting the driver do
whatever, rather than explicitly making the DSA core point to the CPU
port it thinks needs to be touched.
This new method also creates a problem, which is that cross-chip setups
are not handled. However I don't have hardware right now where I can
test what is the proper thing to do, and there isn't hardware compatible
with multi-switch trees that supports host flooding. So it remains a
problem to be tackled in the future.
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Similar to dsa_user_ports() which retrieves a port mask of all user
ports, introduce dsa_cpu_ports() which retrieves the mask of all CPU
ports of a switch.
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Very few drivers actually have Kconfig knobs for adding
-DDEBUG. 8 according to a quick grep, while there are
93 users of skb_checksum_none_assert(). Switch to the
new DEBUG_NET_WARN_ON_ONCE() to catch bad skbs.
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
No conflicts.
Build issue in drivers/net/ethernet/sfc/ptp.c
54fccfdd7c66 ("sfc: efx_default_channel_type APIs can be static")
49e6123c65da ("net: sfc: fix memory leak due to ptp channel")
https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
No functional change.
- remove checkpoint=disable check for f2fs_write_checkpoint
- get sec_freed all the time
Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless, and bluetooth.
No outstanding fires.
Current release - regressions:
- eth: atlantic: always deep reset on pm op, fix null-deref
Current release - new code bugs:
- rds: use maybe_get_net() when acquiring refcount on TCP sockets
[refinement of a previous fix]
- eth: ocelot: mark traps with a bool instead of guessing type based
on list membership
Previous releases - regressions:
- net: fix skipping features in for_each_netdev_feature()
- phy: micrel: fix null-derefs on suspend/resume and probe
- bcmgenet: check for Wake-on-LAN interrupt probe deferral
Previous releases - always broken:
- ipv4: drop dst in multicast routing path, prevent leaks
- ping: fix address binding wrt vrf
- net: fix wrong network header length when BPF protocol translation
is used on skbs with a fraglist
- bluetooth: fix the creation of hdev->name
- rfkill: uapi: fix RFKILL_IOCTL_MAX_SIZE ioctl request definition
- wifi: iwlwifi: iwl-dbg: use del_timer_sync() before freeing
- wifi: ath11k: reduce the wait time of 11d scan and hw scan while
adding an interface
- mac80211: fix rx reordering with non explicit / psmp ack policy
- mac80211: reset MBSSID parameters upon connection
- nl80211: fix races in nl80211_set_tx_bitrate_mask()
- tls: fix context leak on tls_device_down
- sched: act_pedit: really ensure the skb is writable
- batman-adv: don't skb_split skbuffs with frag_list
- eth: ocelot: fix various issues with TC actions (null-deref; bad
stats; ineffective drops; ineffective filter removal)"
* tag 'net-5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits)
tls: Fix context leak on tls_device_down
net: sfc: ef10: fix memory leak in efx_ef10_mtd_probe()
net/smc: non blocking recvmsg() return -EAGAIN when no data and signal_pending
net: dsa: bcm_sf2: Fix Wake-on-LAN with mac_link_down()
mlxsw: Avoid warning during ip6gre device removal
net: bcmgenet: Check for Wake-on-LAN interrupt probe deferral
net: ethernet: mediatek: ppe: fix wrong size passed to memset()
Bluetooth: Fix the creation of hdev->name
i40e: i40e_main: fix a missing check on list iterator
net/sched: act_pedit: really ensure the skb is writable
s390/lcs: fix variable dereferenced before check
s390/ctcm: fix potential memory leak
s390/ctcm: fix variable dereferenced before check
net: atlantic: verify hw_head_ lies within TX buffer ring
net: atlantic: add check for MAX_SKB_FRAGS
net: atlantic: reduce scope of is_rsc_complete
net: atlantic: fix "frag[0] not initialized"
net: stmmac: fix missing pci_disable_device() on error in stmmac_pci_probe()
net: phy: micrel: Fix incorrect variable type in micrel
decnet: Use container_of() for struct dn_neigh casts
...
|
|
In commit ca321ec74322 ("module.h: allow #define strings to work with
MODULE_IMPORT_NS") I fixed up the MODULE_IMPORT_NS() macro to allow
defined strings to work with it. Unfortunatly I did it in a two-stage
process, when it could just be done with the __stringify() macro as
pointed out by Masahiro Yamada.
Clean this up to only be one macro instead of two steps to achieve the
same end result.
Fixes: ca321ec74322 ("module.h: allow #define strings to work with MODULE_IMPORT_NS")
Reported-by: Masahiro Yamada <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Jessica Yu <[email protected]>
Cc: Matthias Maennich <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
|
|
KUnit's test-managed resources can be created in two ways:
- Using the kunit_add_resource() family of functions, which accept a
struct kunit_resource pointer, typically allocated statically or on
the stack during the test.
- Using the kunit_alloc_resource() family of functions, which allocate a
struct kunit_resource using kzalloc() behind the scenes.
Both of these families of functions accept a 'free' function to be
called when the resource is finally disposed of.
At present, KUnit will kfree() the resource if this 'free' function is
specified, and will not if it is NULL. However, this can lead
kunit_alloc_resource() to leak memory (if no 'free' function is passed
in), or kunit_add_resource() to incorrectly kfree() memory which was
allocated by some other means (on the stack, as part of a larger
allocation, etc), if a 'free' function is provided.
Instead, always kfree() if the resource was allocated with
kunit_alloc_resource(), and never kfree() if it was passed into
kunit_add_resource() by the user. (If the user of kunit_add_resource()
wishes the resource be kfree()ed, they can call kfree() on the resource
from within the 'free' function.
This is implemented by adding a 'should_free' member to
struct kunit_resource and setting it appropriately. To facilitate this,
the various resource add/alloc functions have been refactored somewhat,
making them all call a __kunit_add_resource() helper after setting the
'should_free' member appropriately. In the process, all other functions
have been made static inline functions.
Signed-off-by: David Gow <[email protected]>
Tested-by: Daniel Latypov <[email protected]>
Reviewed-by: Brendan Higgins <[email protected]>
Signed-off-by: Shuah Khan <[email protected]>
|
|
Current atomic write has three major issues like below.
- keeps the updates in non-reclaimable memory space and they are even
hard to be migrated, which is not good for contiguous memory
allocation.
- disk spaces used for atomic files cannot be garbage collected, so
this makes it difficult for the filesystem to be defragmented.
- If atomic write operations hit the threshold of either memory usage
or garbage collection failure count, All the atomic write operations
will fail immediately.
To resolve the issues, I will keep a COW inode internally for all the
updates to be flushed from memory, when we need to flush them out in a
situation like high memory pressure. These COW inodes will be tagged
as orphan inodes to be reclaimed in case of sudden power-cut or system
failure during atomic writes.
Signed-off-by: Daeho Jeong <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
|
|
Don't hand-initialize the struct here, create a macro to initialize it
so new fields added don't get forgotten in places.
Signed-off-by: Corey Minyard <[email protected]>
|
|
There was a "type" element added to this structure, but some static
values were missed. The default value will be zero, which is correct,
but create an initializer for the type and initialize the type properly
in the initializer to avoid future issues.
Reported-by: Joe Wiese <[email protected]>
Signed-off-by: Corey Minyard <[email protected]>
|
|
This fixes the corresponding warnings during building the docs.
Signed-off-by: Siddh Raman Pant <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
|
|
Add tracing support which may be useful for debugging systems that fail to complete
In Field Scan tests.
Acked-by: Steven Rostedt (Google) <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
Acked-by: Hans de Goede <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Hans de Goede <[email protected]>
|
|
Hardware core level testing features require near simultaneous execution
of WRMSR instructions on all threads of a core to initiate a test.
Provide a customized cut down version of stop_machine_cpuslocked() that
just operates on the threads of a single core.
Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Hans de Goede <[email protected]>
|
|
Commit ef295ecf090d modified the Linux kernel such that the bottom bits
of the bi_opf member contain the operation instead of the topmost bits.
That commit did not update the comment next to bi_opf. Hence this patch.
From commit ef295ecf090d:
-#define bio_op(bio) ((bio)->bi_opf >> BIO_OP_SHIFT)
+#define bio_op(bio) ((bio)->bi_opf & REQ_OP_MASK)
Cc: Christoph Hellwig <[email protected]>
Cc: Ming Lei <[email protected]>
Fixes: ef295ecf090d ("block: better op and flags encoding")
Signed-off-by: Bart Van Assche <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Keep the op-specific flag last so that they are clearly separate from
the generic flags. Various recent commits just kept adding new flags
at the end.
Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Fix arg ordering in TP_ARGS macro, which fixes the output.
Fixes: 502c87d65564c ("io-uring: Make tracepoints consistent.")
Signed-off-by: Dylan Yudaken <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
It has been observed with certain PCIe USB cards (like Inateck connected
to AM64 EVM or J7200 EVM) that as soon as the primary roothub is
registered, port status change is handled even before xHC is running
leading to cold plug USB devices not detected. For such cases, registering
both the root hubs along with the second HCD is required. Add support for
deferring roothub registration in usb_add_hcd(), so that both primary and
secondary roothubs are registered along with the second HCD.
This patch has been added and reverted earier as it triggered a race
in usb device enumeration.
That race is now fixed in 5.16-rc3, and in stable back to 5.4
commit 6cca13de26ee ("usb: hub: Fix locking issues with address0_mutex")
commit 6ae6dc22d2d1 ("usb: hub: Fix usb enumeration issue due to address0
race")
CC: [email protected] # 5.4+
Suggested-by: Mathias Nyman <[email protected]>
Tested-by: Chris Chiu <[email protected]>
Acked-by: Alan Stern <[email protected]>
Signed-off-by: Kishon Vijay Abraham I <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|