Age | Commit message (Collapse) | Author | Files | Lines |
|
Subprograms are calling map_poke_track(), but on program release there is no
hook to call map_poke_untrack(). However, on program release, the aux memory
(and poke descriptor table) is freed even though we still have a reference to
it in the element list of the map aux data. When we run map_poke_run(), we then
end up accessing free'd memory, triggering KASAN in prog_array_map_poke_run():
[...]
[ 402.824689] BUG: KASAN: use-after-free in prog_array_map_poke_run+0xc2/0x34e
[ 402.824698] Read of size 4 at addr ffff8881905a7940 by task hubble-fgs/4337
[ 402.824705] CPU: 1 PID: 4337 Comm: hubble-fgs Tainted: G I 5.12.0+ #399
[ 402.824715] Call Trace:
[ 402.824719] dump_stack+0x93/0xc2
[ 402.824727] print_address_description.constprop.0+0x1a/0x140
[ 402.824736] ? prog_array_map_poke_run+0xc2/0x34e
[ 402.824740] ? prog_array_map_poke_run+0xc2/0x34e
[ 402.824744] kasan_report.cold+0x7c/0xd8
[ 402.824752] ? prog_array_map_poke_run+0xc2/0x34e
[ 402.824757] prog_array_map_poke_run+0xc2/0x34e
[ 402.824765] bpf_fd_array_map_update_elem+0x124/0x1a0
[...]
The elements concerned are walked as follows:
for (i = 0; i < elem->aux->size_poke_tab; i++) {
poke = &elem->aux->poke_tab[i];
[...]
The access to size_poke_tab is a 4 byte read, verified by checking offsets
in the KASAN dump:
[ 402.825004] The buggy address belongs to the object at ffff8881905a7800
which belongs to the cache kmalloc-1k of size 1024
[ 402.825008] The buggy address is located 320 bytes inside of
1024-byte region [ffff8881905a7800, ffff8881905a7c00)
The pahole output of bpf_prog_aux:
struct bpf_prog_aux {
[...]
/* --- cacheline 5 boundary (320 bytes) --- */
u32 size_poke_tab; /* 320 4 */
[...]
In general, subprograms do not necessarily manage their own data structures.
For example, BTF func_info and linfo are just pointers to the main program
structure. This allows reference counting and cleanup to be done on the latter
which simplifies their management a bit. The aux->poke_tab struct, however,
did not follow this logic. The initial proposed fix for this use-after-free
bug further embedded poke data tracking into the subprogram with proper
reference counting. However, Daniel and Alexei questioned why we were treating
these objects special; I agree, its unnecessary. The fix here removes the per
subprogram poke table allocation and map tracking and instead simply points
the aux->poke_tab pointer at the main programs poke table. This way, map
tracking is simplified to the main program and we do not need to manage them
per subprogram.
This also means, bpf_prog_free_deferred(), which unwinds the program reference
counting and kfrees objects, needs to ensure that we don't try to double free
the poke_tab when free'ing the subprog structures. This is easily solved by
NULL'ing the poke_tab pointer. The second detail is to ensure that per
subprogram JIT logic only does fixups on poke_tab[] entries it owns. To do
this, we add a pointer in the poke structure to point at the subprogram value
so JITs can easily check while walking the poke_tab structure if the current
entry belongs to the current program. The aux pointer is stable and therefore
suitable for such comparison. On the jit_subprogs() error path, we omit
cleaning up the poke->aux field because these are only ever referenced from
the JIT side, but on error we will never make it to the JIT, so its fine to
leave them dangling. Removing these pointers would complicate the error path
for no reason. However, we do need to untrack all poke descriptors from the
main program as otherwise they could race with the freeing of JIT memory from
the subprograms. Lastly, a748c6975dea3 ("bpf: propagate poke descriptors to
subprograms") had an off-by-one on the subprogram instruction index range
check as it was testing 'insn_idx >= subprog_start && insn_idx <= subprog_end'.
However, subprog_end is the next subprogram's start instruction.
Fixes: a748c6975dea3 ("bpf: propagate poke descriptors to subprograms")
Signed-off-by: John Fastabend <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Co-developed-by: Daniel Borkmann <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Since d4a45c68dc81 ("irqdomain: Protect the linear revmap with RCU"),
any irqdomain lookup requires the RCU read lock to be held.
This assumes that the architecture code will be structured such as
irq_enter() will be called *before* the interrupt is looked up
in the irq domain. However, this isn't the case for MIPS, and a number
of drivers are structured to do it the other way around when handling
an interrupt in their root irqchip (secondary irqchips are OK by
construction).
This results in a RCU splat on a lockdep-enabled kernel when the kernel
takes an interrupt from idle, as reported by Guenter Roeck.
Note that this could have fired previously if any driver had used
tree-based irqdomain, which always had the RCU requirement.
To solve this, provide a MIPS-specific helper (do_domain_IRQ())
as the pendent of do_IRQ() that will do thing in the right order
(and maybe save some cycles in the process).
Ideally, MIPS would be moved over to using handle_domain_irq(),
but that's much more ambitious.
Reported-by: Guenter Roeck <[email protected]>
Tested-by: Guenter Roeck <[email protected]>
[maz: add dependency on CONFIG_IRQ_DOMAIN after report from the kernelci bot]
Signed-off-by: Marc Zyngier <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Serge Semin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
|
|
Make sure that we disable each of the TX and RX queues in the TDMA and
RDMA control registers. This is a correctness change to be symmetrical
with the code that enables the TX and RX queues.
Tested-by: Maxime Ripard <[email protected]>
Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file")
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Use the nice helpers to initialize and the uid/gid/cred_uid when passed as mount arguments.
Signed-off-by: Ronnie Sahlberg <[email protected]>
Acked-by: Pavel Shilovsky <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
Ivan Mikhaylov says:
====================
net/ncsi: Add NCSI Intel OEM command to keep PHY link up
Add NCSI Intel OEM command to keep PHY link up and prevents any channel
resets during the host load on i210. Also includes dummy response handler for
Intel manufacturer id.
Changes from v1:
1. sparse fixes about casts
2. put it after ncsi_dev_state_probe_cis instead of
ncsi_dev_state_probe_channel because sometimes channel is not ready
after it
3. inl -> intel
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Add the dummy response handler for Intel boards to prevent incorrect
handling of OEM commands.
Signed-off-by: Ivan Mikhaylov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This allows to keep PHY link up and prevents any channel resets during
the host load.
It is KEEP_PHY_LINK_UP option(Veto bit) in i210 datasheet which
block PHY reset and power state changes.
Signed-off-by: Ivan Mikhaylov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Sparse reports:
net/ncsi/ncsi-rsp.c:406:24: warning: cast to restricted __be32
net/ncsi/ncsi-manage.c:732:33: warning: cast to restricted __be32
net/ncsi/ncsi-manage.c:756:25: warning: cast to restricted __be32
net/ncsi/ncsi-manage.c:779:25: warning: cast to restricted __be32
Signed-off-by: Ivan Mikhaylov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
PHY_SPARX5_SERDES depends on OF so SPARX5_SWITCH should also depend
on OF since 'select' does not follow any dependencies.
WARNING: unmet direct dependencies detected for PHY_SPARX5_SERDES
Depends on [n]: (ARCH_SPARX5 || COMPILE_TEST [=n]) && OF [=n] && HAS_IOMEM [=y]
Selected by [y]:
- SPARX5_SWITCH [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_MICROCHIP [=y] && NET_SWITCHDEV [=y] && HAS_IOMEM [=y]
Fixes: 3cfa11bac9bb ("net: sparx5: add the basic sparx5 driver")
Signed-off-by: Randy Dunlap <[email protected]>
Cc: Lars Povlsen <[email protected]>
Cc: Steen Hegelund <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: "David S. Miller" <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: [email protected]
Signed-off-by: David S. Miller <[email protected]>
|
|
IRQs are requested during driver's ndo_open() and then later
freed up in disable_interrupts() during driver unload.
A race exists where driver can set the CXGB4_FULL_INIT_DONE
flag in ndo_open() after the disable_interrupts() in driver
unload path checks it, and hence misses calling free_irq().
Fix by unregistering netdevice first and sync with driver's
ndo_open(). This ensures disable_interrupts() checks the flag
correctly and frees up the IRQs properly.
Fixes: b37987e8db5f ("cxgb4: Disable interrupts and napi before unregistering netdev")
Signed-off-by: Shahjada Abul Husain <[email protected]>
Signed-off-by: Raju Rangoju <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When reboot system, no power cycles, firmware is already downloaded,
return -EIO will break driver as error:
mt7921e: probe of 0000:03:00.0 failed with error -5
Skip firmware download and continue to probe.
Signed-off-by: Aaron Ma <[email protected]>
Fixes: 1c099ab44727c ("mt76: mt7921: add MCU support")
Signed-off-by: David S. Miller <[email protected]>
|
|
Since Mikrotik 10/25G NIC MDIO op emulation is not 100% reliable,
on rare occasions it can happen that some physical functions of
the NIC do not get initialized due to timeouted early MDIO op.
This changes the atl1c probe on Mikrotik 10/25G NIC not to
depend on MDIO op emulation.
Signed-off-by: Gatis Peisenieks <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
S390's init_idle_preempt_count(p, cpu) doesn't actually let us initialize the
preempt_count of the requested CPU's idle task: it unconditionally writes
to the current CPU's. This clearly conflicts with idle_threads_init(),
which intends to initialize *all* the idle tasks, including their
preempt_count (or their CPU's, if the arch uses a per-CPU preempt_count).
Unfortunately, it seems the way s390 does things doesn't let us initialize
every possible CPU's preempt_count early on, as the pages where this
resides are only allocated when a CPU is brought up and are freed when it
is brought down.
Let the arch-specific code set a CPU's preempt_count when its lowcore is
allocated, and turn init_idle_preempt_count() into an empty stub.
Fixes: f1a0a376ca0c ("sched/core: Initialize the idle task with preemption disabled")
Reported-by: Guenter Roeck <[email protected]>
Signed-off-by: Valentin Schneider <[email protected]>
Tested-by: Guenter Roeck <[email protected]>
Reviewed-by: Heiko Carstens <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Both clang and gcc (for -march=z13 and later) align functions to 16
bytes at -O2 to benefit branch prediction.
Make asm symbols alignment consistent with that.
This also benefits potential ftrace code patching, which is currently
able to patch 8 aligned bytes at once.
With defconfig this currently increases .text size by 4104 bytes.
Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Lower case matches the call_on_stack() macro and is easier to read.
Reviewed-by: Sven Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Make sure the to be called function takes no arguments (and returns void).
Otherwise usage of CALL_ON_STACK_NORETURN() would generate broken code.
Reviewed-by: Sven Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Reviewed-by: Sven Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Reviewed-by: Sven Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Reviewed-by: Sven Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Reviewed-by: Sven Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Reviewed-by: Sven Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Reviewed-by: Sven Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Reviewed-by: Sven Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
The existing CALL_ON_STACK() macro allows for subtle bugs:
- There is no type checking of the function that is being called. That
is: missing or too many arguments do not cause any compile error or
warning. The same is true if the return type of the called function
changes. This can lead to quite random bugs.
- Sign and zero extension of arguments is missing. Given that the s390
C ABI requires that the caller of a function performs proper sign
and zero extension this can also lead to subtle bugs.
- If arguments to the CALL_ON_STACK() macros contain functions calls
register corruption can happen due to register asm constructs being
used.
Therefore introduce a new call_on_stack() macro which is supposed to
fix all these problems.
Reviewed-by: Sven Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Make on_async_stack() a bit more readable, even though as usual it
depends if one considers "!!!" readable or not.
At least the new construct to check if the async stack is in use or
not is a bit shorter and generates slightly better code.
Reviewed-by: Sven Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Move do_softirq_own_stack() to proper header file so it can be
inlined; saving a few cycles.
Reviewed-by: Sven Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
do_softirq_own_stack() is always called from task context and
therefore it is not necessary to check if the async stack is
currently used.
Remove the check and directly switch to async stack.
Reviewed-by: Sven Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
This is the second part of the cleanup for the header file ap.h
to remove the register asm statements. This patch deals with
the inline ap_dqap() function where within the assembler code
an odd register of an register pair is to be addressed.
[[email protected]: this intentionally breaks compilation with any
clang compilers prior to llvm-project commit 458eac257377 ("[SystemZ]
Support the 'N' code for the odd register in inline-asm.").
This is hopefully the last clang kernel compile breakage caused by
incompatibilities between gcc and clang.]
Signed-off-by: Harald Freudenberger <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
PIF_SYSCALL_RESTART is now only used to restart execve when loading
PGSTE binaries. Rename the flag to reflect that, and avoid people
thinking that this bit has anything to do with generic syscall
restarting.
Signed-off-by: Sven Schnelle <[email protected]>
Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
On s390, execve might have to be restarted for PGSTE binaries
like kvm. In the past this was done via the PIF_SYSCALL_RESTART
bit. However, with the recent changes, syscalls are now restarted
differently. Now that execve() is the only call that might get
restarted via PIF_SYSCALL_RESTART, move the loop to do_syscall().
This also has the advantage that the restart is no longer visible
to userspace.
Signed-off-by: Sven Schnelle <[email protected]>
Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
{rt_}sigreturn is now called from the vdso, so we no longer
need the svc on the stack, and therefore no hack to support that
mechanism on machines with non-executable stack.
Signed-off-by: Sven Schnelle <[email protected]>
Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
with generic entry, there's a bug when it comes to restarting of signals.
The failing sequence is:
a) a signal is coming in, and no handler is registered, so the lower
part of arch_do_signal_or_restart() in arch/s390/kernel/signal.c
sets PIF_SYSCALL_RESTART.
b) a second signal gets pending while the kernel is still in the exit
loop, and for that one, a handler exists.
c) The first part of arch_do_signal_or_restart() is called. That part
calls handle_signal(), which sets up stack + registers for handling
the signal.
d) __do_syscall() in arch/s390/kernel/syscall.c checks for
PIF_SYSCALL_RESTART right before leaving to userspace. If it is set,
it restart's the syscall. However, the registers are already setup
for handling a signal from c). The syscall is now restarted with the
wrong arguments.
Change the code to:
- use vdso for syscall_restart() instead of PIF_SYSCALL_RESTART because
we cannot rewind and go back to userspace on s390 because the system call
number might be encoded in the svc instruction.
- for all other syscalls we rewind the PSW and return to userspace.
Cc: <[email protected]> # v5.12+ d57778feb987: s390/vdso: always enable vdso
Cc: <[email protected]> # v5.12+ 686341f2548b: s390/vdso64: add sigreturn,rt_sigreturn and restart_syscall
Cc: <[email protected]> # v5.12+ 43e1f76b0b69: s390/vdso: rename VDSO64_LBASE to VDSO_LBASE
Cc: <[email protected]> # v5.12+ 779df2248739: s390/vdso: add minimal compat vdso
Cc: <[email protected]> # v5.12+
Signed-off-by: Sven Schnelle <[email protected]>
Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
We have requests like IORING_OP_FILES_UPDATE that don't go through
->iopoll_list but get completed in place under ->uring_lock, and so
after dropping the lock io_iopoll_check() should expect that some CQEs
might have get completed in a meanwhile.
Currently such events won't be accounted in @nr_events, and the loop
will continue to poll even if there is enough of CQEs. It shouldn't be a
problem as it's not likely to happen and so, but not nice either. Just
return earlier in this case, it should be enough.
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/66ef932cc66a34e3771bbae04b2953a8058e9d05.1625747741.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
|
|
An earlier commit set the pps_lookup cookie, but the line
was somehow added to the wrong code block. Correct this.
Fixes: 8602e40fc813 ("ptp: Set lookup cookie when creating a PTP PPS source.")
Signed-off-by: Jonathan Lemon <[email protected]>
Signed-off-by: Dario Binacchi <[email protected]>
Acked-by: Richard Cochran <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Pull drm fixes from Dave Airlie:
"Some fixes for rc1 that came in the past weeks, mainly a bunch of
amdgpu fixes, some i915 and the rest are misc around the place. I'm
sending this a bit early so some more stuff may show up, but I'll
probably take tomorrow off.
dma-buf:
- doc fixes
amdgpu:
- Misc Navi fixes
- Powergating fix
- Yellow Carp updates
- Beige Goby updates
- S0ix fix
- Revert overlay validation fix
- GPU reset fix for DC
- PPC64 fix
- Add new dimgrey cavefish DID
- RAS fix
- TTM fixes
amdkfd:
- SVM fixes
radeon:
- Fix missing drm_gem_object_put in error path
- NULL ptr deref fix
i915:
- display DP VSC fix
- DG1 display fix
- IRQ fixes
- IRQ demidlayering
gma500:
- bo leaks in error paths fixed"
* tag 'drm-next-2021-07-08-1' of git://anongit.freedesktop.org/drm/drm: (52 commits)
drm/i915: Drop all references to DRM IRQ midlayer
drm/i915: Use the correct IRQ during resume
drm/i915/display/dg1: Correctly map DPLLs during state readout
drm/i915/display: Do not zero past infoframes.vsc
drm/amdgpu: Conditionally reset SDMA RAS error counts
drm/amdkfd: Maintain svm_bo reference in page->zone_device_data
drm/amdkfd: add invalid pages debug at vram migration
drm/amdkfd: skip migration for pages already in VRAM
drm/amdkfd: skip invalid pages during migrations
drm/amdkfd: classify and map mixed svm range pages in GPU
drm/amdkfd: use hmm range fault to get both domain pfns
drm/amdgpu: get owner ref in validate and map
drm/amdkfd: set owner ref to svm range prefault
drm/amdkfd: add owner ref param to get hmm pages
drm/amdkfd: device pgmap owner at the svm migrate init
drm/amdkfd: inc counter on child ranges with xnack off
drm/amd/display: Extend DMUB diagnostic logging to DCN3.1
drm/amdgpu: Update NV SIMD-per-CU to 2
drm/amdgpu: add new dimgrey cavefish DID
drm/amd/pm: skip PrepareMp1ForUnload message in s0ix
...
|
|
While TCP stack scales reasonably well, there is still one part that
can be used to DDOS it.
IPv6 Packet too big messages have to lookup/insert a new route,
and if abused by attackers, can easily put hosts under high stress,
with many cpus contending on a spinlock while one is stuck in fib6_run_gc()
ip6_protocol_deliver_rcu()
icmpv6_rcv()
icmpv6_notify()
tcp_v6_err()
tcp_v6_mtu_reduced()
inet6_csk_update_pmtu()
ip6_rt_update_pmtu()
__ip6_rt_update_pmtu()
ip6_rt_cache_alloc()
ip6_dst_alloc()
dst_alloc()
ip6_dst_gc()
fib6_run_gc()
spin_lock_bh() ...
Some of our servers have been hit by malicious ICMPv6 packets
trying to _increase_ the MTU/MSS of TCP flows.
We believe these ICMPv6 packets are a result of a bug in one ISP stack,
since they were blindly sent back for _every_ (small) packet sent to them.
These packets are for one TCP flow:
09:24:36.266491 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240
09:24:36.266509 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240
09:24:36.316688 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240
09:24:36.316704 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240
09:24:36.608151 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240
TCP stack can filter some silly requests :
1) MTU below IPV6_MIN_MTU can be filtered early in tcp_v6_err()
2) tcp_v6_mtu_reduced() can drop requests trying to increase current MSS.
This tests happen before the IPv6 routing stack is entered, thus
removing the potential contention and route exhaustion.
Note that IPv6 stack was performing these checks, but too late
(ie : after the route has been added, and after the potential
garbage collect war)
v2: fix typo caught by Martin, thanks !
v3: exports tcp_mtu_to_mss(), caught by David, thanks !
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Maciej Żenczykowski <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm
Pull pwm updates from Thierry Reding:
"This contains mostly various fixes, cleanups and some conversions to
the atomic API. One noteworthy change is that PWM consumers can now
pass a hint to the PWM core about the PWM usage, enabling PWM
providers to implement various optimizations.
There's also a fair bit of simplification here with the addition of
some device-managed helpers as well as unification between the DT and
ACPI firmware interfaces"
* tag 'pwm/for-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (50 commits)
pwm: Remove redundant assignment to pointer pwm
pwm: ep93xx: Fix read of uninitialized variable ret
pwm: ep93xx: Prepare clock before using it
pwm: ep93xx: Unfold legacy callbacks into ep93xx_pwm_apply()
pwm: ep93xx: Implement .apply callback
pwm: vt8500: Only unprepare the clock after the pwmchip was removed
pwm: vt8500: Drop if with an always false condition
pwm: tegra: Assert reset only after the PWM was unregistered
pwm: tegra: Don't needlessly enable and disable the clock in .remove()
pwm: tegra: Don't modify HW state in .remove callback
pwm: tegra: Drop an if block with an always false condition
pwm: core: Simplify some devm_*pwm*() functions
pwm: core: Remove unused devm_pwm_put()
pwm: core: Unify fwnode checks in the module
pwm: core: Reuse fwnode_to_pwmchip() in ACPI case
pwm: core: Convert to use fwnode for matching
docs: firmware-guide: ACPI: Add a PWM example
dt-bindings: pwm: pwm-tiecap: Add compatible string for AM64 SoC
dt-bindings: pwm: pwm-tiecap: Convert to json schema
pwm: sprd: Don't check the return code of pwmchip_remove()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull more clk updates from Stephen Boyd:
- A handful of fixes for lmk04832 driver
- Migrate the basic clk divider to use determine rate ops
- Fix modpost build for hisilicon hi3559a driver
- Actually set the parent in k210_clk_set_parent()
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
Revert "clk: divider: Switch from .round_rate to .determine_rate by default"
clk: hisilicon: hi3559a: Drop __init markings everywhere
clk: meson: regmap: switch to determine_rate for the dividers
clk: divider: Switch from .round_rate to .determine_rate by default
clk: divider: Add re-usable determine_rate implementations
clk: k210: Fix k210_clk_set_parent()
clk: lmk04832: Fix spelling mistakes in dev_err messages and comments
clk: lmk04832: fix return value check in lmk04832_probe()
clk: stm32mp1: fix missing spin_lock_init()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Fix dsm_label_utf16s_to_utf8s() buffer overrun (Krzysztof
Wilczyński)
- Rely on lengths from scnprintf(), dsm_label_utf16s_to_utf8s()
(Krzysztof Wilczyński)
- Use sysfs_emit() and sysfs_emit_at() in "show" functions (Krzysztof
Wilczyński)
- Fix 'resource_alignment' newline issues (Krzysztof Wilczyński)
- Add 'devspec' newline (Krzysztof Wilczyński)
- Dynamically map ECAM regions (Russell King)
Resource management:
- Coalesce host bridge contiguous apertures (Kai-Heng Feng)
PCIe native device hotplug:
- Ignore Link Down/Up caused by DPC (Lukas Wunner)
Power management:
- Leave Apple Thunderbolt controllers on for s2idle or standby
(Konstantin Kharlamov)
Virtualization:
- Work around Huawei Intelligent NIC VF FLR erratum (Chiqijun)
- Clarify error message for unbound IOV devices (Moritz Fischer)
- Add pci_reset_bus_function() Secondary Bus Reset interface (Raphael
Norwitz)
Peer-to-peer DMA:
- Simplify distance calculation (Christoph Hellwig)
- Finish RCU conversion of pdev->p2pdma (Eric Dumazet)
- Rename upstream_bridge_distance() and rework doc (Logan Gunthorpe)
- Collect acs list in stack buffer to avoid sleeping (Logan
Gunthorpe)
- Use correct calc_map_type_and_dist() return type (Logan Gunthorpe)
- Warn if host bridge not in whitelist (Logan Gunthorpe)
- Refactor pci_p2pdma_map_type() (Logan Gunthorpe)
- Avoid pci_get_slot(), which may sleep (Logan Gunthorpe)
Altera PCIe controller driver:
- Add Joyce Ooi as Altera PCIe maintainer (Joyce Ooi)
Broadcom iProc PCIe controller driver:
- Fix multi-MSI base vector number allocation (Sandor Bodo-Merle)
- Support multi-MSI only on uniprocessor kernel (Sandor Bodo-Merle)
Freescale i.MX6 PCIe controller driver:
- Limit DBI register length for imx6qp PCIe (Richard Zhu)
- Add "vph-supply" for PHY supply voltage (Richard Zhu)
- Enable PHY internal regulator when supplied >3V (Richard Zhu)
- Remove imx6_pcie_probe() redundant error message (Zhen Lei)
Intel Gateway PCIe controller driver:
- Fix INTx enable (Martin Blumenstingl)
Marvell Aardvark PCIe controller driver:
- Fix checking for PIO Non-posted Request (Pali Rohár)
- Implement workaround for the readback value of VEND_ID (Pali Rohár)
MediaTek PCIe controller driver:
- Remove redundant error printing in mtk_pcie_subsys_powerup() (Zhen
Lei)
MediaTek PCIe Gen3 controller driver:
- Add missing MODULE_DEVICE_TABLE (Zou Wei)
Microchip PolarFlare PCIe controller driver:
- Make struct event_descs static (Krzysztof Wilczyński)
Microsoft Hyper-V host bridge driver:
- Fix race condition when removing the device (Long Li)
- Remove bus device removal unused refcount/functions (Long Li)
Mobiveil PCIe controller driver:
- Remove unused readl and writel functions (Krzysztof Wilczyński)
NVIDIA Tegra PCIe controller driver:
- Add missing MODULE_DEVICE_TABLE (Zou Wei)
NVIDIA Tegra194 PCIe controller driver:
- Fix tegra_pcie_ep_raise_msi_irq() ill-defined shift (Jon Hunter)
- Fix host initialization during resume (Vidya Sagar)
Rockchip PCIe controller driver:
- Register IRQ handlers after device and data are ready (Javier
Martinez Canillas)"
* tag 'pci-v5.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (48 commits)
PCI/P2PDMA: Finish RCU conversion of pdev->p2pdma
PCI: xgene: Annotate __iomem pointer
PCI: Fix kernel-doc formatting
PCI: cpcihp: Declare cpci_debug in header file
MAINTAINERS: Add Joyce Ooi as Altera PCIe maintainer
PCI: rockchip: Register IRQ handlers after device and data are ready
PCI: tegra194: Fix tegra_pcie_ep_raise_msi_irq() ill-defined shift
PCI: aardvark: Implement workaround for the readback value of VEND_ID
PCI: aardvark: Fix checking for PIO Non-posted Request
PCI: tegra194: Fix host initialization during resume
PCI: tegra: Add missing MODULE_DEVICE_TABLE
PCI: imx6: Enable PHY internal regulator when supplied >3V
dt-bindings: imx6q-pcie: Add "vph-supply" for PHY supply voltage
PCI: imx6: Limit DBI register length for imx6qp PCIe
PCI: imx6: Remove imx6_pcie_probe() redundant error message
PCI: intel-gw: Fix INTx enable
PCI: iproc: Support multi-MSI only on uniprocessor kernel
PCI: iproc: Fix multi-MSI base vector number allocation
PCI: mediatek-gen3: Add missing MODULE_DEVICE_TABLE
PCI: Dynamically map ECAM regions
...
|
|
Like syscallhdr.sh and syscalltbl.sh, add a simple script to generate
the __NR_syscalls, which should not be exported to userspace.
This script is useful to replace arch/mips/kernel/syscalls/syscallnr.sh,
refactor arch/s390/kernel/syscalls/syscalltbl, and eliminate the code
surrounded by #ifdef __KERNEL__ / #endif from exported uapi/asm/unistd_*.h
files.
Signed-off-by: Masahiro Yamada <[email protected]>
|
|
Currently, syscall{hdr,tbl}.sh sorts the entire syscall table, but you
can assume it is already sorted by the syscall number.
The generated syscall table does not work if the same syscall number
appears twice. Check it in the script.
Signed-off-by: Masahiro Yamada <[email protected]>
|
|
mremap HAVE_MOVE_PMD/PUD optimization time comparison for 1GB region:
1GB mremap - Source PTE-aligned, Destination PTE-aligned
mremap time: 2292772ns
1GB mremap - Source PMD-aligned, Destination PMD-aligned
mremap time: 1158928ns
1GB mremap - Source PUD-aligned, Destination PUD-aligned
mremap time: 63886ns
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Kalesh Singh <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
flush_tlb_range is special in that we don't specify the page size used for
the translation. Hence when flushing TLB we flush the translation cache
for all possible page sizes. The kernel also uses the same interface when
moving page tables around. Such a move requires us to flush the page walk
cache.
Instead of adding another interface to force page walk cache flush, update
flush_tlb_range to flush page walk cache if the range flushed is more than
the PMD range. A page table move will always involve an invalidate range
more than PMD_SIZE.
Running microbenchmark with mprotect and parallel memory access didn't
show any observable performance impact.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Kalesh Singh <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "Speedup mremap on ppc64", v8.
This patchset enables MOVE_PMD/MOVE_PUD support on power. This requires
the platform to support updating higher-level page tables without updating
page table entries. This also needs to invalidate the Page Walk Cache on
architecture supporting the same.
This patch (of 3):
Architectures like ppc64 support faster mremap only with radix
translation. Hence allow a runtime check w.r.t support for fast mremap.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Kalesh Singh <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: "Aneesh Kumar K . V" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
To avoid a race between rmap walk and mremap, mremap does
take_rmap_locks(). The lock was taken to ensure that rmap walk don't miss
a page table entry due to PTE moves via move_pagetables(). The kernel
does further optimization of this lock such that if we are going to find
the newly added vma after the old vma, the rmap lock is not taken. This
is because rmap walk would find the vmas in the same order and if we don't
find the page table attached to older vma we would find it with the new
vma which we would iterate later.
As explained in commit eb66ae030829 ("mremap: properly flush TLB before
releasing the page") mremap is special in that it doesn't take ownership
of the page. The optimized version for PUD/PMD aligned mremap also
doesn't hold the ptl lock. This can result in stale TLB entries as show
below.
This patch updates the rmap locking requirement in mremap to handle the race condition
explained below with optimized mremap::
Optmized PMD move
CPU 1 CPU 2 CPU 3
mremap(old_addr, new_addr) page_shrinker/try_to_unmap_one
mmap_write_lock_killable()
addr = old_addr
lock(pte_ptl)
lock(pmd_ptl)
pmd = *old_pmd
pmd_clear(old_pmd)
flush_tlb_range(old_addr)
*new_pmd = pmd
*new_addr = 10; and fills
TLB with new addr
and old pfn
unlock(pmd_ptl)
ptep_clear_flush()
old pfn is free.
Stale TLB entry
Optimized PUD move also suffers from a similar race. Both the above race
condition can be fixed if we force mremap path to take rmap lock.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 2c91bd4a4e2e ("mm: speed up mremap by 20x on large regions")
Fixes: c49dd3401802 ("mm: speedup mremap on 1GB or larger regions")
Link: https://lore.kernel.org/linux-mm/CAHk-=wgXVR04eBNtxQfevontWnP6FDm+oj5vauQXP3S-huwbPw@mail.gmail.com
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Kalesh Singh <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
pmd/pud_populate is the right interface to be used to set the respective
page table entries. Some architectures like ppc64 do assume that
set_pmd/pud_at can only be used to set a hugepage PTE. Since we are not
setting up a hugepage PTE here, use the pmd/pud_populate interface.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Kalesh Singh <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
With two level page table don't enable move_normal_pud.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Kalesh Singh <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
With TRANSPARENT_HUGEPAGE_PUD enabled the kernel can find huge PUD
entries. Add a helper to move huge PUD entries on mremap().
This will be used by a later patch to optimize mremap of PUD_SIZE aligned
level 4 PTE mapped address
This also make sure we support mremap on huge PUD entries even with
CONFIG_HAVE_MOVE_PUD disabled.
[[email protected]: fix build failure with clang-10]
Link: https://lore.kernel.org/lkml/YMuOSnJsL9qkxweY@archlinux-ax161
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Kalesh Singh <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
With a large mmap map size, we can overlap with the text area and using
MAP_FIXED results in unmapping that area. Switch to MAP_FIXED_NOREPLACE
and handle the EEXIST error.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Reviewed-by: Kalesh Singh <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mrermap fixes", v2.
This patch (of 6):
Instead of hardcoding 4K page size fetch it using sysconf(). For the
performance measurements test still assume 2M and 1G are hugepage sizes.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Reviewed-by: Kalesh Singh <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|