There is no user of these callbacks. The motivation for this change is
to stop returning an error code from the remove callback.
This is a preparation for making platform remove callbacks return void.
Signed-off-by: Uwe Kleine-König <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>
|
|
There is no machine providing a teardown callback, so drop the unused
code.
This is a preparation for making platform remove callbacks return void.
Signed-off-by: Uwe Kleine-König <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>
|
|
The last user, which was in fact dead code, went away a year ago, and the
previous one three years ago. On top of that we want to drop the
legacy GPIO APIs from the kernel, so take the chance to get rid of the
unused devm_gpio_free() and its accompanying code.
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>
|
|
Add the IOMMU callback for the DMA mapping API dma_opt_mapping_size(), which
allows drivers to know the optimal mapping limit and thus limit the
requested IOVA lengths.
This value is based on the IOVA rcache range limit, as IOVAs allocated
above this limit must always be newly allocated, which may be quite slow.
Signed-off-by: John Garry <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Acked-by: Robin Murphy <[email protected]>
Acked-by: Martin K. Petersen <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
Streaming DMA mappings involving an IOMMU may be much slower for larger
total mapping sizes. This is because every IOMMU DMA mapping requires an
IOVA to be allocated and freed. IOVA sizes above a certain limit are not
cached, which can have a big impact on DMA mapping performance.
Provide an API for device drivers to know this "optimal" limit, such that
they may try to produce mappings which don't exceed it.
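A sketch of how a driver might consume the new limit (the SCSI host usage
and the foo_* naming here are illustrative, not taken from this patch):
```
#include <linux/blkdev.h>
#include <linux/dma-mapping.h>
#include <scsi/scsi_host.h>

/* Illustrative consumer: clamp the host's maximum transfer size to the
 * optimal DMA mapping size so streaming mappings stay within the cached
 * IOVA range. dma_opt_mapping_size() reports the limit in bytes. */
static void foo_clamp_max_sectors(struct Scsi_Host *shost, struct device *dev)
{
        size_t opt = dma_opt_mapping_size(dev);

        if (opt)
                shost->max_sectors = min_t(unsigned int, shost->max_sectors,
                                           opt >> SECTOR_SHIFT);
}
```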
Signed-off-by: John Garry <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Acked-by: Martin K. Petersen <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
Remove locked versions of functions that are no longer used by anyone.
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Prepare for devlink reload being called with devlink->lock held and
convert the netdevsim driver to use the unlocked devlink API during the
init and fini flows. In the meantime, take devl_lock() in the
reload_down() and reload_up() ops, until the reload command is converted
to take the lock itself.
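A minimal sketch of the interim pattern (devl_lock()/devl_unlock() are the
existing instance-lock helpers; the foo_* driver pieces are hypothetical):
```
#include <net/devlink.h>

static int foo_reload_down(struct devlink *devlink, bool netns_change,
                           enum devlink_reload_action action,
                           enum devlink_reload_limit limit,
                           struct netlink_ext_ack *extack)
{
        struct foo *foo = devlink_priv(devlink);

        /* Interim: take the instance lock here until the reload command
         * itself is converted to hold devlink->lock. */
        devl_lock(devlink);
        foo_destroy_ports(foo);         /* hypothetical teardown */
        devl_unlock(devlink);
        return 0;
}
```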
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Add unlocked variants of the devlink_region_create/destroy() functions
to be used by drivers in call paths where devlink->lock is already held.
Signed-off-by: Jiri Pirko <[email protected]>
Reviewed-by: Moshe Shemesh <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Add unlocked variants of the devlink_dpipe*() functions to be used
by drivers in call paths where devlink->lock is already held.
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Add unlocked variants of the devlink_sb*() functions to be used
by drivers in call paths where devlink->lock is already held.
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Add unlocked variants of the devlink_resource*() functions to be used
by drivers in call paths where devlink->lock is already held.
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Add unlocked variants of the devl_trap*() functions to be used by drivers
in call paths where devlink->lock is already held.
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
We don't want to list every single ubuf_info callback in
skb_orphan_frags(), so add a flag controlling the behaviour.
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
MSM8909 has the same power domains as MSM8916, so just define them
as aliases for the existing definitions, as sketched below.
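The change amounts to alias defines of this shape (a sketch; the exact set
of reused MSM8916 indices is assumed):
```
/* Sketch: MSM8909 power domains reuse the MSM8916 indices. */
#define MSM8909_VDDCX           MSM8916_VDDCX
#define MSM8909_VDDCX_AO        MSM8916_VDDCX_AO
#define MSM8909_VDDMX           MSM8916_VDDMX
#define MSM8909_VDDMX_AO        MSM8916_VDDMX_AO
```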
Signed-off-by: Stephan Gerhold <[email protected]>
Acked-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: Bjorn Andersson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
FSDAX page refcounts are 1-based, rather than 0-based: if the refcount is
1, then the page is free. FSDAX pages can be pinned through GUP and are
later unpinned via unpin_user_page(), which uses a folio variant to put
the page. However, the folio variants did not consider this special case,
so a wakeup event can be missed (e.g. for a user of
__fuse_dax_break_layouts()), leaving a task permanently stuck in the
TASK_INTERRUPTIBLE state.
Since FSDAX pages can only be obtained by GUP users, fix GUP instead of
folio_put() to keep the overhead low.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: d8ddc099c6b3 ("mm/gup: Add gup_put_folio()")
Signed-off-by: Muchun Song <[email protected]>
Suggested-by: Matthew Wilcox <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: William Kucharski <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Starting with VCN4, AMDGPU_HW_IP_VCN_ENC is reused to support
both encoding and decoding jobs.
Link: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/245/commits
Reviewed-by: Christian König <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
Signed-off-by: Ruijing Dong <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
We need the USB fixes in here as well.
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Make it clear that DMA_RESV_USAGE_BOOKKEEP can be used for explicitly
synced user-space submissions as well, and document the rules around
adding the same fence with different usages.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
The ublk protocol has no mechanism to actually transfer the integrity
metadata, so don't define this flag, which requires that an integrity
payload is attached to a bio.
Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Instead of registering the callback to process sensor events right at
initialization time, wait for the sensor to be registered in the IIO
subsystem.
Events can come in at probe time (in case the kernel rebooted abruptly
without switching the sensor off, for instance) and be sent to the IIO
core before the sensor is fully registered.
Fixes: aa984f1ba4a4 ("iio: cros_ec: Register to cros_ec_sensorhub when EC supports FIFO")
Reported-by: Douglas Anderson <[email protected]>
Signed-off-by: Gwendal Grignou <[email protected]>
Reviewed-by: Douglas Anderson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jonathan Cameron <[email protected]>
|
|
When RDRAND was introduced, there was much discussion on whether it
should be trusted and how the kernel should handle that. Initially, two
mechanisms cropped up, CONFIG_ARCH_RANDOM, a compile time switch, and
"nordrand", a boot-time switch.
Later the thinking evolved. With a properly designed RNG, using RDRAND
values alone won't harm anything, even if the outputs are malicious.
Rather, the issue is whether those values are being *trusted* to be good
or not. And so a new set of options was introduced as the real
ones that people use -- CONFIG_RANDOM_TRUST_CPU and "random.trust_cpu".
With these options, RDRAND is used, but it's not always credited. So in
the worst case, it does nothing, and in the best case, maybe it helps.
Along the way, CONFIG_ARCH_RANDOM's meaning got sort of pulled into the
center and became something certain platforms force-select.
The old options don't really help with much, and it's a bit odd to have
special handling for these instructions when the kernel can deal fine
with the existence or untrusted existence or broken existence or
non-existence of that CPU capability.
Simplify the situation by removing CONFIG_ARCH_RANDOM and using the
ordinary asm-generic fallback pattern instead, keeping the two options
that are actually used. "nordrand" is left in place for now, as its
removal will take a different route.
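The asm-generic fallback pattern boils down to stubs of this shape (a
sketch; the merged header may differ in detail):
```
/* Without the CPU instruction, report that no architectural entropy
 * is available; callers fall back to other entropy sources. */
static inline bool __must_check arch_get_random_long(unsigned long *v)
{
        return false;
}

static inline bool __must_check arch_get_random_seed_long(unsigned long *v)
{
        return false;
}
```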
Acked-by: Michael Ellerman <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Acked-by: Borislav Petkov <[email protected]>
Acked-by: Heiko Carstens <[email protected]>
Acked-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
|
|
While reading sysctl_tcp_notsent_lowat, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.
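The pattern used throughout this series looks as follows (a sketch;
foo_get_notsent_lowat() is illustrative):
```
/* A lockless reader annotates the access with READ_ONCE() so it pairs
 * with WRITE_ONCE() on the sysctl writer side and cannot be torn or
 * re-loaded by the compiler. */
static u32 foo_get_notsent_lowat(const struct net *net)
{
        return READ_ONCE(net->ipv4.sysctl_tcp_notsent_lowat);
}
```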
Fixes: c9bee3b7fdec ("tcp: TCP_NOTSENT_LOWAT socket option")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
While reading these sysctl knobs, they can be changed concurrently.
Thus, we need to add READ_ONCE() to their readers.
- tcp_retries1
- tcp_retries2
- tcp_orphan_retries
- tcp_fin_timeout
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
While reading sysctl_tcp_keepalive_(time|probes|intvl), they can be changed
concurrently. Thus, we need to add READ_ONCE() to their readers.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Async crypto currently benefits from the fact that we decrypt
in place. When we allow input and output to be different skbs
we will have to hang onto the input while we move to the next
record. Clone the inputs and keep them on a list.
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
We no longer allow a decrypted skb to remain linked to ctx->recv_pkt.
Anything on the list is decrypted, anything on ctx->recv_pkt needs
to be decrypted.
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
recvmsg() in TLS gets data from the skb list (rx_list) or fresh
skbs we read from TCP via strparser. The former holds skbs which were
already decrypted for peek or decrypted and partially consumed.
tls_wait_data() only notices the appearance of fresh skbs coming out
of TCP (or psock). It is possible, with concurrent calls to peek() and
recv(), that peek() will move the data from the input to rx_list
without recv() noticing. recv() will then read data out of order or
never wake up.
This is not a practical use case/concern, but it makes the self
tests less reliable. This patch solves the problem by allowing
only one reader in.
Because having multiple processes calling read()/peek() is not
normal, avoid adding a lock and try to fast-path the single-reader
case.
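A generic sketch of the single-reader gate (all names illustrative; this
shows the idea, not the merged TLS code):
```
#include <linux/atomic.h>
#include <linux/errno.h>
#include <linux/wait.h>

struct foo_rx {
        atomic_t                reader_busy;    /* 0 == free, set at init */
        wait_queue_head_t       wq;
};

static int foo_rx_reader_lock(struct foo_rx *rx)
{
        /* Fast path: a single, uncontended reader takes one atomic op. */
        while (atomic_cmpxchg(&rx->reader_busy, 0, 1) != 0) {
                /* Slow path (abnormal usage): wait for the other reader. */
                if (wait_event_interruptible(rx->wq,
                                             !atomic_read(&rx->reader_busy)))
                        return -ERESTARTSYS;
        }
        return 0;
}

static void foo_rx_reader_unlock(struct foo_rx *rx)
{
        atomic_set(&rx->reader_busy, 0);
        wake_up(&rx->wq);
}
```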
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Extend SMC-R link group netlink attribute SMC_GEN_LGR_SMCR.
Introduce SMC_NLA_LGR_R_BUF_TYPE to show the buffer type of
SMC-R link group.
Signed-off-by: Wen Gu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch introduces the sysctl smcr_buf_type for setting
the type of SMC-R sndbufs and RMBs.
Valid values include:
- SMCR_PHYS_CONT_BUFS, which means use physically contiguous
buffers for better performance; this is the default value.
- SMCR_VIRT_CONT_BUFS, which means use virtually contiguous
buffers, for when physically contiguous memory is scarce.
- SMCR_MIXED_BUFS, which means first try to use physically
contiguous buffers and, if none are available, fall back to
virtually contiguous buffers.
A sketch of the corresponding values follows the list.
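(The numeric assignments below are assumptions, not taken from the patch.)
```
enum {
        SMCR_PHYS_CONT_BUFS     = 0,    /* default: physically contiguous */
        SMCR_VIRT_CONT_BUFS     = 1,    /* virtually contiguous */
        SMCR_MIXED_BUFS         = 2,    /* physical first, then virtual */
};
```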
Signed-off-by: Wen Gu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
With the current polling-mode implementation, there is a high chance of
hitting a timeout error when running phc2sys. Hence, update the
hardware cross-timestamping implementation to use the MAC interrupt
service routine instead of polling for the TSIS bit in the MAC Timestamp
Interrupt Status register to be set.
Cc: Richard Cochran <[email protected]>
Signed-off-by: Wong Vee Khee <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add FourCCs for two missing permutations of the packed YUV 4:4:4 color
components, namely AVUY and XVUY.
These formats are needed by the NXP i.MX8 ISI. While the ISI is
supported by a V4L2 device (corresponding formats have been submitted to
V4L2), it is handled in userspace by libcamera, which uses DRM FourCCs
for pixel formats.
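The additions follow the usual drm_fourcc.h shape (a sketch; the fourcc
character codes below are assumptions, not the merged values):
```
#include <drm/drm_fourcc.h>

#define DRM_FORMAT_XVUY8888 fourcc_code('X', 'V', '2', '4') /* [31:0] X:V:U:Y 8:8:8:8 little endian */
#define DRM_FORMAT_AVUY8888 fourcc_code('A', 'V', '2', '4') /* [31:0] A:V:U:Y 8:8:8:8 little endian */
```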
Signed-off-by: Laurent Pinchart <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Signed-off-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
No need to expose this structure definition in the header.
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
Commit 20347fca71a3 ("swiotlb: split up the global swiotlb lock") splits
io_tlb_mem into multiple areas. Each area has its own lock and index. The
global ones are not used so remove them.
Signed-off-by: Chao Gao <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
The extern specifier is not needed for this declaration, so drop it. The
function also depends only on the input parameters, and has no side
effects, so it can be marked __pure like other functions in cpumask.h.
Link: https://lkml.kernel.org/r/72ab755695b74bb5fbaa756ae4c0edd708d172f1.1656777646.git.sander@svanheule.net
Signed-off-by: Sander Vanheule <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Yury Norov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
On uniprocessor builds, any CPU mask is assumed to contain exactly one CPU
(cpu0). This assumption ignores the existence of empty masks, resulting
in incorrect behaviour.
cpumask_first_zero(), cpumask_next_zero(), and for_each_cpu_not() don't
provide behaviour matching the assumption that a UP mask is always "1",
and instead provide behaviour matching the empty mask.
Drop the incorrectly optimised code and use the generic implementations in
all cases.
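The corner case can be demonstrated as follows (foo_empty_mask_demo() is
illustrative):
```
#include <linux/cpumask.h>
#include <linux/gfp.h>

static void foo_empty_mask_demo(void)
{
        cpumask_var_t mask;

        if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
                return;
        /* mask is empty: the generic code correctly reports "no CPU"
         * (nr_cpu_ids), while the removed UP shortcuts returned cpu0. */
        WARN_ON(cpumask_first(mask) != nr_cpu_ids);
        free_cpumask_var(mask);
}
```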
Link: https://lkml.kernel.org/r/86bf3f005abba2d92120ddd0809235cab4f759a6.1656777646.git.sander@svanheule.net
Signed-off-by: Sander Vanheule <[email protected]>
Suggested-by: Yury Norov <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Valentin Schneider <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
On uniprocessor builds, the following loops will always run over a mask
that contains one enabled CPU (cpu0):
- for_each_possible_cpu
- for_each_online_cpu
- for_each_present_cpu
Provide uniprocessor-specific macros for these loops that always run
exactly once, as sketched below.
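A sketch of the uniprocessor specialisation (the merged macros may differ
in detail):
```
/* On UP these masks always contain exactly cpu0, so iterate once. */
#define for_each_possible_cpu(cpu)      for ((cpu) = 0; (cpu) < 1; (cpu)++)
#define for_each_online_cpu(cpu)        for ((cpu) = 0; (cpu) < 1; (cpu)++)
#define for_each_present_cpu(cpu)       for ((cpu) = 0; (cpu) < 1; (cpu)++)
```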
Link: https://lkml.kernel.org/r/3a92869b902a075b97be5d1452c9c6badbbff0df.1656777646.git.sander@svanheule.net
Signed-off-by: Sander Vanheule <[email protected]>
Acked-by: Yury Norov <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Valentin Schneider <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The kfifo_to_user() macro is supposed to return zero for success or
negative error codes. Unfortunately, there is a signedness bug so it
returns unsigned int. This only affects callers which try to save the
result in ssize_t and as far as I can see the only place which does that
is line6_hwdep_read().
TL;DR: s/_uint/_int/.
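Why the signedness matters, as a stand-alone illustration (userspace C,
runnable):
```
#include <stdio.h>

int main(void)
{
        /* What the buggy macro yields for -EFAULT (-14): the negative
         * error is laundered through unsigned int... */
        unsigned int uret = (unsigned int)-14;
        /* ...so when the caller stores it in a signed ssize_t-like type,
         * it arrives as a huge positive value and `ret < 0` never fires. */
        long ret = uret;

        printf("ret = %ld, ret < 0: %d\n", ret, ret < 0);
        return 0;
}
```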
Link: https://lkml.kernel.org/r/YrVL3OJVLlNhIMFs@kili
Fixes: 144ecf310eb5 ("kfifo: fix kfifo_alloc() to return a signed int value")
Signed-off-by: Dan Carpenter <[email protected]>
Cc: Stefani Seibold <[email protected]>
Cc: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The workaround for 'asm goto' miscompilation introduces a compiler barrier
quirk that inhibits many useful compiler optimizations. For example,
__try_cmpxchg_user compiles to:
```
11375:  41 8b 4d 00             mov    0x0(%r13),%ecx
11379:  41 8b 02                mov    (%r10),%eax
1137c:  f0 0f b1 0a             lock cmpxchg %ecx,(%rdx)
11380:  0f 94 c2                sete   %dl
11383:  84 d2                   test   %dl,%dl
11385:  75 c4                   jne    1134b <...>
11387:  41 89 02                mov    %eax,(%r10)
```
where the barrier inhibits flags propagation from asm when compiled with
gcc-12.
When the mentioned quirk is removed, the following code is generated:
```
11553:  41 8b 4d 00             mov    0x0(%r13),%ecx
11557:  41 8b 02                mov    (%r10),%eax
1155a:  f0 0f b1 0a             lock cmpxchg %ecx,(%rdx)
1155e:  74 c9                   je     11529 <...>
11560:  41 89 02                mov    %eax,(%r10)
```
The referenced compiler bug:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670
was fixed in gcc 4.8.2.
The current minimum required version of GCC is 5.1, which has the above
'asm goto' miscompilation fixed, so remove the workaround.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Uros Bizjak <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
DO_ONCE(func, ...) calls func while holding a spinlock acquired by
spin_lock_irqsave() in __do_once_start(). But get_random_once_wait() can
sleep in get_random_bytes_wait() -> wait_for_random_bytes().
Fortunately, there are no users of {net_,}get_random_once_wait(), so we
can simply remove them.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: wuchi <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Paolo Abeni <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
bdi_sched_wait() is no longer used since commit 839a8e8660b6 ("writeback:
replace custom worker pool implementation with unbound workqueue"), so
remove it.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Xiu Jianfeng <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Acked-by: Jens Axboe <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
show_free_areas() allows filtering out node-specific data which is
irrelevant to the allocation request. But hugetlb_show_meminfo() still
shows hugetlb on all nodes, which is redundant and unnecessary.
Use show_mem_node_skip() to skip irrelevant nodes, and replace
hugetlb_show_meminfo() with hugetlb_show_meminfo_node(nid).
before-and-after sample output of OOM:
before:
```
[ 214.362453] Node 1 active_anon:148kB inactive_anon:4050920kB active_file:112kB inactive_file:100kB
[ 214.375429] Node 1 Normal free:45100kB boost:0kB min:45576kB low:56968kB high:68360kB reserved_hig
[ 214.388334] lowmem_reserve[]: 0 0 0 0 0
[ 214.390251] Node 1 Normal: 423*4kB (UE) 320*8kB (UME) 187*16kB (UE) 117*32kB (UE) 57*64kB (UME) 20
[ 214.397626] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 214.401518] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
```
after:
```
[ 145.069705] Node 1 active_anon:128kB inactive_anon:4049412kB active_file:56kB inactive_file:84kB u
[ 145.110319] Node 1 Normal free:45424kB boost:0kB min:45576kB low:56968kB high:68360kB reserved_hig
[ 145.152315] lowmem_reserve[]: 0 0 0 0 0
[ 145.155244] Node 1 Normal: 470*4kB (UME) 373*8kB (UME) 247*16kB (UME) 168*32kB (UE) 86*64kB (UME)
[ 145.164119] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
```
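A sketch of the reworked dump path (context inside show_free_areas()
assumed; show_mem_node_skip() is the existing static helper there):
```
int nid;

for_each_online_node(nid) {
        if (show_mem_node_skip(filter, nid, nodemask))
                continue;
        hugetlb_show_meminfo_node(nid);
}
```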
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Gang Li <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The current comment is confusing: if the global or memcg deferred list
in the second tail page were occupied by compound_head, why would we
still use page[2].deferred_list here? What it means to say is that the
global or memcg deferred list in the first tail page is occupied by
compound_mapcount and compound_pincount, so we use the second tail
page's deferred_list instead.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zach O'Keefe <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
When a pmd is non-present, pmd_pfn() returns a meaningless value. So we
should check pmd_present() first, to avoid reading such a value and also
to avoid touching the possibly cold huge_zero_pfn cache line when the
pmd isn't present.
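A sketch of the reordered check, shown on a huge_zero_pmd-style helper
(the exact function touched is not named above): && short-circuits, so
pmd_pfn() never sees a non-present pmd and huge_zero_pfn stays untouched
in that case.
```
static inline bool is_huge_zero_pmd(pmd_t pmd)
{
        return pmd_present(pmd) &&
               READ_ONCE(huge_zero_pfn) == pmd_pfn(pmd);
}
```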
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zach O'Keefe <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Now all the platforms enable ARCH_HAS_VM_GET_PAGE_PROT. They define and
export their own vm_get_page_prot(), whether custom or via the standard
DECLARE_VM_GET_PAGE_PROT. Hence there is no need for the default generic
fallback for vm_get_page_prot(). Just drop this fallback, and the
ARCH_HAS_VM_GET_PAGE_PROT mechanism along with it.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Brian Cain <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Dinh Nguyen <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: WANG Xuerui <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
protection_map[] has now been moved inside those platforms that enable
ARCH_HAS_VM_GET_PAGE_PROT. Hence the generic protection_map[] array can
now be protected with CONFIG_ARCH_HAS_VM_GET_PAGE_PROT instead of
__P000.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Brian Cain <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Dinh Nguyen <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: WANG Xuerui <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
This just converts the generic vm_get_page_prot() implementation into a
new macro, DECLARE_VM_GET_PAGE_PROT, which can later be used across
platforms as they enable ARCH_HAS_VM_GET_PAGE_PROT. This does
not create any functional change.
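The macro amounts to something like this (a sketch of the generic
lookup it wraps):
```
#define DECLARE_VM_GET_PAGE_PROT                                \
pgprot_t vm_get_page_prot(unsigned long vm_flags)               \
{                                                               \
        return protection_map[vm_flags &                        \
                (VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)];    \
}                                                               \
EXPORT_SYMBOL(vm_get_page_prot);
```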
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
Suggested-by: Christoph Hellwig <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Brian Cain <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Dinh Nguyen <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: WANG Xuerui <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "mm/mmap: Drop __SXXX/__PXXX macros from across platforms",
v7.
__SXXX/__PXXX macros are an unnecessary abstraction layer in creating the
generic protection_map[] array which is used for vm_get_page_prot(). This
abstraction layer can be avoided if the platforms just define the array
protection_map[] for all possible vm_flags access permission combinations
and also export a vm_get_page_prot() implementation.
This series drops the __SXXX/__PXXX macros from across platforms in the
tree. First it build-protects the generic protection_map[] array with
'#ifdef __P000' and moves it inside the platforms which enable
ARCH_HAS_VM_GET_PAGE_PROT. Later it build-protects the same array with
'#ifdef ARCH_HAS_VM_GET_PAGE_PROT' and moves it inside the remaining
platforms while enabling ARCH_HAS_VM_GET_PAGE_PROT. This adds a new
macro, DECLARE_VM_GET_PAGE_PROT, defining the current generic
vm_get_page_prot(), in order for it to be reused on platforms that do not
require a custom implementation. Finally, ARCH_HAS_VM_GET_PAGE_PROT can
just be dropped, as all platforms now define and export
vm_get_page_prot(), via looking up a private and static protection_map[]
array. The protection_map[] data type has been changed to 'static const'
on all platforms that do not change it during boot.
This patch (of 26):
Build-protect the generic protection_map[] array with __P000, so that it
can be moved inside all the platforms one after the other. Otherwise
there will be build failures during this process.
CONFIG_ARCH_HAS_VM_GET_PAGE_PROT cannot be used for this purpose, as only
certain platforms enable this config now.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
Suggested-by: Christophe Leroy <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Brian Cain <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Dinh Nguyen <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: WANG Xuerui <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Currently the PCP lists are protected by using local_lock_irqsave to
prevent migration and IRQ reentrancy, but this is inconvenient. Remote
draining of the lists is impossible, a workqueue is required instead, and
every allocation/free must disable and then re-enable interrupts, which
is expensive.
As preparation for dealing with both of those problems, protect the
lists with a spinlock. The IRQ-unsafe version of the lock is used
because IRQs are already disabled by local_lock_irqsave. spin_trylock
is used in combination with local_lock_irqsave() but will later be
replaced with spin_trylock_irqsave when the local_lock is removed.
The per_cpu_pages still fits within the same number of cache lines after
this patch relative to before the series.
```
struct per_cpu_pages {
        spinlock_t lock;                /*     0     4 */
        int count;                      /*     4     4 */
        int high;                       /*     8     4 */
        int batch;                      /*    12     4 */
        short int free_factor;          /*    16     2 */
        short int expire;               /*    18     2 */

        /* XXX 4 bytes hole, try to pack */

        struct list_head lists[13];     /*    24   208 */

        /* size: 256, cachelines: 4, members: 7 */
        /* sum members: 228, holes: 1, sum holes: 4 */
        /* padding: 24 */
} __attribute__((__aligned__(64)));
```
There is overhead in the fast path due to acquiring the spinlock even
though the spinlock is per-cpu and uncontended in the common case. The
Page Fault Test (PFT) reported the following results on a 1-socket
machine.
```
                              5.19.0-rc3          5.19.0-rc3
                                 vanilla mm-pcpspinirq-v5r16
Hmean faults/sec-1  869275.7381 ( 0.00%)  874597.5167 * 0.61%*
Hmean faults/sec-3 2370266.6681 ( 0.00%) 2379802.0362 * 0.40%*
Hmean faults/sec-5 2701099.7019 ( 0.00%) 2664889.7003 * -1.34%*
Hmean faults/sec-7 3517170.9157 ( 0.00%) 3491122.8242 * -0.74%*
Hmean faults/sec-8 3965729.6187 ( 0.00%) 3939727.0243 * -0.66%*
```
There is a small hit in the number of faults per second but given that the
results are more stable, it's borderline noise.
[[email protected]: add missing local_unlock_irqrestore() on contention path]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mel Gorman <[email protected]>
Tested-by: Yu Zhao <[email protected]>
Reviewed-by: Nicolas Saenz Julienne <[email protected]>
Tested-by: Nicolas Saenz Julienne <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The per_cpu_pages is cache-aligned on a standard x86-64 distribution
configuration but a later patch will add a new field which would push the
structure into the next cache line. Use only one list to store THP-sized
pages on the per-cpu list. This assumes that the vast majority of
THP-sized allocations are GFP_MOVABLE but even if it was another type, it
would not contribute to serious fragmentation that potentially causes a
later THP allocation failure. Align per_cpu_pages on the cacheline
boundary to ensure there is no false cache sharing.
After this patch, the structure sizing is:
```
struct per_cpu_pages {
        int count;                      /*     0     4 */
        int high;                       /*     4     4 */
        int batch;                      /*     8     4 */
        short int free_factor;          /*    12     2 */
        short int expire;               /*    14     2 */
        struct list_head lists[13];     /*    16   208 */

        /* size: 256, cachelines: 4, members: 6 */
        /* padding: 32 */
} __attribute__((__aligned__(64)));
```
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mel Gorman <[email protected]>
Tested-by: Minchan Kim <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Tested-by: Yu Zhao <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Nicolas Saenz Julienne <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "Drain remote per-cpu directly", v5.
Some setups, notably NOHZ_FULL CPUs, may be running realtime or
latency-sensitive applications that cannot tolerate interference due to
per-cpu drain work queued by __drain_all_pages(). Introduce a new
mechanism to remotely drain the per-cpu lists. It is made possible by
remotely taking the new per-cpu spinlocks in 'struct per_cpu_pages'.
This has two advantages: the time to drain is more predictable, and
other unrelated tasks are not interrupted.
This series has the same intent as Nicolas' series "mm/page_alloc: Remote
per-cpu lists drain support" -- avoid interference of a high priority task
due to a workqueue item draining per-cpu page lists. While many workloads
can tolerate a brief interruption, it may cause a real-time task running
on a NOHZ_FULL CPU to miss a deadline and at minimum, the draining is
non-deterministic.
Currently an IRQ-safe local_lock protects the page allocator per-cpu
lists. The local_lock on its own prevents migration and the IRQ disabling
protects from corruption due to an interrupt arriving while a page
allocation is in progress.
This series adjusts the locking. A spinlock is added to struct
per_cpu_pages to protect the list contents, and local_lock_irq is
ultimately replaced by just the spinlock in the final patch. This allows
a remote CPU to safely drain a remote per-cpu list. Follow-on work
should allow the spin_lock_irqsave to be converted to spin_lock to avoid
IRQs being disabled/enabled in most cases. The follow-on patch will land
one kernel release later, as it is relatively high risk and it'll make
bisections clearer if there are any problems.
Patch 1 is a cosmetic patch to clarify when page->lru is storing buddy pages
and when it is storing per-cpu pages.
Patch 2 shrinks per_cpu_pages to make room for a spin lock. Strictly speaking
this is not necessary but it avoids per_cpu_pages consuming another
cache line.
Patch 3 is a preparation patch to avoid code duplication.
Patch 4 is a minor correction.
Patch 5 uses a spin_lock to protect the per_cpu_pages contents while still
relying on local_lock to prevent migration, stabilise the pcp
lookup and prevent IRQ reentrancy.
Patch 6 remote drains per-cpu pages directly instead of using a workqueue.
Patch 7 uses a normal spinlock instead of local_lock for remote draining.
This patch (of 7):
The page allocator uses page->lru for storing pages on either buddy or PCP
lists. Create page->buddy_list and page->pcp_list as a union with
page->lru. This is simply to clarify what type of list a page is on in
the page allocator.
No functional change intended.
[[email protected]: fix page lru fields in macros]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mel Gorman <[email protected]>
Tested-by: Minchan Kim <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Reviewed-by: Nicolas Saenz Julienne <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Tested-by: Yu Zhao <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|