Age | Commit message (Collapse) | Author | Files | Lines |
|
Pull dma-mapping updates from Christoph Hellwig:
- do not zero buffer in set_memory_decrypted (Kirill A. Shutemov)
- fix return value of dma-debug __setup handlers (Randy Dunlap)
- swiotlb cleanups (Robin Murphy)
- remove most remaining users of the pci-dma-compat.h API
(Christophe JAILLET)
- share the ABI header for the DMA map_benchmark with userspace
(Tian Tao)
- update the maintainer for DMA MAPPING BENCHMARK (Xiang Chen)
- remove CONFIG_DMA_REMAP (me)
* tag 'dma-mapping-5.18' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: benchmark: extract a common header file for map_benchmark definition
dma-debug: fix return value of __setup handlers
dma-mapping: remove CONFIG_DMA_REMAP
media: v4l2-pci-skeleton: Remove usage of the deprecated "pci-dma-compat.h" API
rapidio/tsi721: Remove usage of the deprecated "pci-dma-compat.h" API
sparc: Remove usage of the deprecated "pci-dma-compat.h" API
agp/intel: Remove usage of the deprecated "pci-dma-compat.h" API
alpha: Remove usage of the deprecated "pci-dma-compat.h" API
MAINTAINERS: update maintainer list of DMA MAPPING BENCHMARK
swiotlb: simplify array allocation
swiotlb: tidy up includes
swiotlb: simplify debugfs setup
swiotlb: do not zero buffer in set_memory_decrypted()
|
|
Freescale Layerscape Lynx 28G SerDes PHYs are only present on
Freescale/NXP Layerscape SoCs.
Move PHY_FSL_LYNX_28G outside the block for ARCH_MXC, as the latter
is meant for i.MX8 SoCs, which is a different family than Layerscape.
Add a dependency on ARCH_LAYERSCAPE, to prevent asking the user about
this driver when configuring a kernel without Layerscape SoC support.
Fixes: 02e2af20f4f9f2aa ("Merge tag 'char-misc-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc")
Fixes: 8f73b37cf3fbda67 ("phy: add support for the Layerscape SerDes 28G")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This reverts commit 53d862fac4a09b9c56cca0433fa9de5732fd05a1.
It turned out that flush_kernel_vmap_range() is being called with
interrupts disabled. There's no way to flush entire cache with
interrupts disabled.
Signed-off-by: Helge Deller <[email protected]>
|
|
The io-specific memcpy/memset functions use string mmio accesses to do
their work. Under SEV, the hypervisor can't emulate these instructions
because they read/write directly from/to encrypted memory.
KVM will inject a page fault exception into the guest when it is asked
to emulate string mmio instructions for an SEV guest:
BUG: unable to handle page fault for address: ffffc90000065068
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 8000100000067 P4D 8000100000067 PUD 80001000fb067 PMD 80001000fc067 PTE 80000000fed40173
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc7 #3
As string mmio for an SEV guest can not be supported by the
hypervisor, unroll the instructions for CC_ATTR_GUEST_UNROLL_STRING_IO
enabled kernels.
This issue appears when kernels are launched in recent libvirt-managed
SEV virtual machines, because virt-install started to add a tpm-crb
device to the guest by default and proactively because, raisins:
https://github.com/virt-manager/virt-manager/commit/eb58c09f488b0633ed1eea012cd311e48864401e
and as that commit says, the default adding of a TPM can be disabled
with "virt-install ... --tpm none".
The kernel driver for tpm-crb uses memcpy_to/from_io() functions to
access MMIO memory, resulting in a page-fault injected by KVM and
crashing the kernel at boot.
[ bp: Massage and extend commit message. ]
Fixes: d8aa7eea78a1 ('x86/mm: Add Secure Encrypted Virtualization (SEV) support')
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Tom Lendacky <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
for-5.18/drivers
Pull NVMe fixes from Christoph:
"- fix multipath hang when disk goes live over reconnect (Anton Eidelman)
- fix RCU hole that allowed for endless looping in multipath round robin
(Chris Leech)
- remove redundant assignment after left shift (Colin Ian King)
- add quirks for Samsung X5 SSDs (Monish Kumar R)
- fix the read-only state for zoned namespaces with unsupposed features
(Pankaj Raghav)
- use a private workqueue instead of the system workqueue in nvmet
(Sagi Grimberg)
- allow duplicate NSIDs for private namespaces (Sungup Moon)
- expose use_threaded_interrupts read-only in sysfs (Xin Hao)"
* tag 'nvme-5.18-2022-03-29' of git://git.infradead.org/nvme:
nvme-multipath: fix hang when disk goes live over reconnect
nvme: fix RCU hole that allowed for endless looping in multipath round robin
nvme: allow duplicate NSIDs for private namespaces
nvmet: remove redundant assignment after left shift
nvmet: use a private workqueue instead of the system workqueue
nvme-pci: add quirks for Samsung X5 SSDs
nvme-pci: expose use_threaded_interrupts read-only in sysfs
nvme: fix the read-only state for zoned namespaces with unsupposed features
|
|
ioctls handled by phy_mii_ioctl() will cause a kernel oops when the
interface is down. Fix it by making sure there is a PHY attached.
Fixes: 735fec995b21 ("net: lan966x: Implement SIOCSHWTSTAMP and SIOCGHWTSTAMP")
Signed-off-by: Michael Walle <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Duoming Zhou says:
====================
Fix UAF bugs caused by ax25_release()
The first patch fixes UAF bugs in ax25_send_control, and
the second patch fixes UAF bugs in ax25 timers.
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
There are race conditions that may lead to UAF bugs in
ax25_heartbeat_expiry(), ax25_t1timer_expiry(), ax25_t2timer_expiry(),
ax25_t3timer_expiry() and ax25_idletimer_expiry(), when we call
ax25_release() to deallocate ax25_dev.
One of the UAF bugs caused by ax25_release() is shown below:
(Thread 1) | (Thread 2)
ax25_dev_device_up() //(1) |
... | ax25_kill_by_device()
ax25_bind() //(2) |
ax25_connect() | ...
ax25_std_establish_data_link() |
ax25_start_t1timer() | ax25_dev_device_down() //(3)
mod_timer(&ax25->t1timer,..) |
| ax25_release()
(wait a time) | ...
| ax25_dev_put(ax25_dev) //(4)FREE
ax25_t1timer_expiry() |
ax25->ax25_dev->values[..] //USE| ...
... |
We increase the refcount of ax25_dev in position (1) and (2), and
decrease the refcount of ax25_dev in position (3) and (4).
The ax25_dev will be freed in position (4) and be used in
ax25_t1timer_expiry().
The fail log is shown below:
==============================================================
[ 106.116942] BUG: KASAN: use-after-free in ax25_t1timer_expiry+0x1c/0x60
[ 106.116942] Read of size 8 at addr ffff88800bda9028 by task swapper/0/0
[ 106.116942] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.17.0-06123-g0905eec574
[ 106.116942] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-14
[ 106.116942] Call Trace:
...
[ 106.116942] ax25_t1timer_expiry+0x1c/0x60
[ 106.116942] call_timer_fn+0x122/0x3d0
[ 106.116942] __run_timers.part.0+0x3f6/0x520
[ 106.116942] run_timer_softirq+0x4f/0xb0
[ 106.116942] __do_softirq+0x1c2/0x651
...
This patch adds del_timer_sync() in ax25_release(), which could ensure
that all timers stop before we deallocate ax25_dev.
Signed-off-by: Duoming Zhou <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
There are UAF bugs in ax25_send_control(), when we call ax25_release()
to deallocate ax25_dev. The possible race condition is shown below:
(Thread 1) | (Thread 2)
ax25_dev_device_up() //(1) |
| ax25_kill_by_device()
ax25_bind() //(2) |
ax25_connect() | ...
ax25->state = AX25_STATE_1 |
... | ax25_dev_device_down() //(3)
(Thread 3)
ax25_release() |
ax25_dev_put() //(4) FREE |
case AX25_STATE_1: |
ax25_send_control() |
alloc_skb() //USE |
The refcount of ax25_dev increases in position (1) and (2), and
decreases in position (3) and (4). The ax25_dev will be freed
before dereference sites in ax25_send_control().
The following is part of the report:
[ 102.297448] BUG: KASAN: use-after-free in ax25_send_control+0x33/0x210
[ 102.297448] Read of size 8 at addr ffff888009e6e408 by task ax25_close/602
[ 102.297448] Call Trace:
[ 102.303751] ax25_send_control+0x33/0x210
[ 102.303751] ax25_release+0x356/0x450
[ 102.305431] __sock_release+0x6d/0x120
[ 102.305431] sock_close+0xf/0x20
[ 102.305431] __fput+0x11f/0x420
[ 102.305431] task_work_run+0x86/0xd0
[ 102.307130] get_signal+0x1075/0x1220
[ 102.308253] arch_do_signal_or_restart+0x1df/0xc00
[ 102.308253] exit_to_user_mode_prepare+0x150/0x1e0
[ 102.308253] syscall_exit_to_user_mode+0x19/0x50
[ 102.308253] do_syscall_64+0x48/0x90
[ 102.308253] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 102.308253] RIP: 0033:0x405ae7
This patch defers the free operation of ax25_dev and net_device after
all corresponding dereference sites in ax25_release() to avoid UAF.
Fixes: 9fd75b66b8f6 ("ax25: Fix refcount leaks caused by ax25_cb_del()")
Signed-off-by: Duoming Zhou <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
IPv6 nd target mask was not getting populated in flow dump.
In the function __ovs_nla_put_key the icmp code mask field was checked
instead of icmp code key field to classify the flow as neighbour discovery.
ufid:bdfbe3e5-60c2-43b0-a5ff-dfcac1c37328, recirc_id(0),dp_hash(0/0),
skb_priority(0/0),in_port(ovs-nm1),skb_mark(0/0),ct_state(0/0),
ct_zone(0/0),ct_mark(0/0),ct_label(0/0),
eth(src=00:00:00:00:00:00/00:00:00:00:00:00,
dst=00:00:00:00:00:00/00:00:00:00:00:00),
eth_type(0x86dd),
ipv6(src=::/::,dst=::/::,label=0/0,proto=58,tclass=0/0,hlimit=0/0,frag=no),
icmpv6(type=135,code=0),
nd(target=2001::2/::,
sll=00:00:00:00:00:00/00:00:00:00:00:00,
tll=00:00:00:00:00:00/00:00:00:00:00:00),
packets:10, bytes:860, used:0.504s, dp:ovs, actions:ovs-nm2
Fixes: e64457191a25 (openvswitch: Restructure datapath.c and flow.c)
Signed-off-by: Martin Varghese <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
nvme_mpath_init_identify() invoked from nvme_init_identify() fetches a
fresh ANA log from the ctrl. This is essential to have an up to date
path states for both existing namespaces and for those scan_work may
discover once the ctrl is up.
This happens in the following cases:
1) A new ctrl is being connected.
2) An existing ctrl is successfully reconnected.
3) An existing ctrl is being reset.
While in (1) ctrl->namespaces is empty, (2 & 3) may have namespaces, and
nvme_read_ana_log() may call nvme_update_ns_ana_state().
This result in a hang when the ANA state of an existing namespace changes
and makes the disk live: nvme_mpath_set_live() issues IO to the namespace
through the ctrl, which does NOT have IO queues yet.
See sample hang below.
Solution:
- nvme_update_ns_ana_state() to call set_live only if ctrl is live
- nvme_read_ana_log() call from nvme_mpath_init_identify()
therefore only fetches and parses the ANA log;
any erros in this process will fail the ctrl setup as appropriate;
- a separate function nvme_mpath_update()
is called in nvme_start_ctrl();
this parses the ANA log without fetching it.
At this point the ctrl is live,
therefore, disks can be set live normally.
Sample failure:
nvme nvme0: starting error recovery
nvme nvme0: Reconnecting in 10 seconds...
block nvme0n6: no usable path - requeuing I/O
INFO: task kworker/u8:3:312 blocked for more than 122 seconds.
Tainted: G E 5.14.5-1.el7.elrepo.x86_64 #1
Workqueue: nvme-wq nvme_tcp_reconnect_ctrl_work [nvme_tcp]
Call Trace:
__schedule+0x2a2/0x7e0
schedule+0x4e/0xb0
io_schedule+0x16/0x40
wait_on_page_bit_common+0x15c/0x3e0
do_read_cache_page+0x1e0/0x410
read_cache_page+0x12/0x20
read_part_sector+0x46/0x100
read_lba+0x121/0x240
efi_partition+0x1d2/0x6a0
bdev_disk_changed.part.0+0x1df/0x430
bdev_disk_changed+0x18/0x20
blkdev_get_whole+0x77/0xe0
blkdev_get_by_dev+0xd2/0x3a0
__device_add_disk+0x1ed/0x310
device_add_disk+0x13/0x20
nvme_mpath_set_live+0x138/0x1b0 [nvme_core]
nvme_update_ns_ana_state+0x2b/0x30 [nvme_core]
nvme_update_ana_state+0xca/0xe0 [nvme_core]
nvme_parse_ana_log+0xac/0x170 [nvme_core]
nvme_read_ana_log+0x7d/0xe0 [nvme_core]
nvme_mpath_init_identify+0x105/0x150 [nvme_core]
nvme_init_identify+0x2df/0x4d0 [nvme_core]
nvme_init_ctrl_finish+0x8d/0x3b0 [nvme_core]
nvme_tcp_setup_ctrl+0x337/0x390 [nvme_tcp]
nvme_tcp_reconnect_ctrl_work+0x24/0x40 [nvme_tcp]
process_one_work+0x1bd/0x360
worker_thread+0x50/0x3d0
Signed-off-by: Anton Eidelman <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
Make nvme_ns_remove match the assumptions elsewhere.
1) !NVME_NS_READY needs to be srcu synchronized to make sure nothing is
running in __nvme_find_path or nvme_round_robin_path that will
re-assign this ns to current_path.
2) Any matching current_path entries need to be cleared before removing
from the siblings list, to prevent calling nvme_round_robin_path with
an "old" ns that's off list.
3) Finally the list_del_rcu can happen, and then synchronize again
before releasing any reference counts.
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
A NVMe subsystem with multiple controller can have private namespaces
that use the same NSID under some conditions:
"If Namespace Management, ANA Reporting, or NVM Sets are supported, the
NSIDs shall be unique within the NVM subsystem. If the Namespace
Management, ANA Reporting, and NVM Sets are not supported, then NSIDs:
a) for shared namespace shall be unique; and
b) for private namespace are not required to be unique."
Reference: Section 6.1.6 NSID and Namespace Usage; NVM Express 1.4c spec.
Make sure this specific setup is supported in Linux.
Fixes: 9ad1927a3bc2 ("nvme: always search for namespace head")
Signed-off-by: Sungup Moon <[email protected]>
[hch: refactored and fixed the controller vs subsystem based naming
conflict]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
|
|
The left shift is followed by a re-assignment back to cc_css, the
assignment is redundant. Fix this by replacing the "<<=" operator with
"<<" instead.
This cleans up the clang scan build warning:
drivers/nvme/target/core.c:1124:10: warning: Although the value stored to 'cc_css' is used in the enclosing expression, the value is never actually read from 'cc_css' [deadcode.DeadStores]
Signed-off-by: Colin Ian King <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
Any attempt to flush kernel-global WQs has possibility of deadlock
so we should simply stop using them, instead introduce nvmet_wq
which is the generic nvmet workqueue for work elements that
don't explicitly require a dedicated workqueue (by the mere fact
that they are using the system_wq).
Changes were done using the following replaces:
- s/schedule_work(/queue_work(nvmet_wq, /g
- s/schedule_delayed_work(/queue_delayed_work(nvmet_wq, /g
- s/flush_scheduled_work()/flush_workqueue(nvmet_wq)/g
Reported-by: Tetsuo Handa <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
llvm upstream patch ([1]) added to issue warning for code like
void test() {
int j = 0;
for (int i = 0; i < 1000; i++)
j++;
return;
}
This triggered several errors in selftests/bpf build since
compilation flag -Werror is used.
...
test_lpm_map.c:212:15: error: variable 'n_matches' set but not used [-Werror,-Wunused-but-set-variable]
size_t i, j, n_matches, n_matches_after_delete, n_nodes, n_lookups;
^
test_lpm_map.c:212:26: error: variable 'n_matches_after_delete' set but not used [-Werror,-Wunused-but-set-variable]
size_t i, j, n_matches, n_matches_after_delete, n_nodes, n_lookups;
^
...
prog_tests/get_stack_raw_tp.c:32:15: error: variable 'cnt' set but not used [-Werror,-Wunused-but-set-variable]
static __u64 cnt;
^
...
For test_lpm_map.c, 'n_matches'/'n_matches_after_delete' are changed to be volatile
in order to silent the warning. I didn't remove these two declarations since
they are referenced in a commented code which might be used by people in certain
cases. For get_stack_raw_tp.c, the variable 'cnt' is removed.
[1] https://reviews.llvm.org/D122271
Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Maciej Fijalkowski says:
====================
Hello,
yet another fixes for XSK from Magnus and me.
Magnus addresses the fact that xp_alloc() can return NULL, so this needs
to be handled to avoid clearing entries in the SW ring on driver side.
Then he addresses the off-by-one problem in Tx desc cleaning routine for
ice ZC driver.
From my side, I am adding protection to ZC Rx processing loop so that
cleaning of descriptors wouldn't go over already processed entries.
Then I also fix an issue with assigning XSK pool to Tx queues.
This is directed to bpf tree.
Thanks!
Maciej Fijalkowski (2):
ice: xsk: stop Rx processing when ntc catches ntu
ice: xsk: fix indexing in ice_tx_xsk_pool()
====================
Acked-by: Alexander Lobakin <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Ice driver tries to always create XDP rings array to be
num_possible_cpus() sized, regardless of user's queue count setting that
can be changed via ethtool -L for example.
Currently, ice_tx_xsk_pool() calculates the qid by decrementing the
ring->q_index by the count of XDP queues, but ring->q_index is set to 'i
+ vsi->alloc_txq'.
When user did ethtool -L $IFACE combined 1, alloc_txq is 1, but
vsi->num_xdp_txq is still num_possible_cpus(). Then, ice_tx_xsk_pool()
will do OOB access and in the final result ring would not get xsk_pool
pointer assigned. Then, each ice_xsk_wakeup() call will fail with error
and it will not be possible to get into NAPI and do the processing from
driver side.
Fix this by decrementing vsi->alloc_txq instead of vsi->num_xdp_txq from
ring-q_index in ice_tx_xsk_pool() so the calculation is reflected to the
setting of ring->q_index.
Fixes: 22bf877e528f ("ice: introduce XDP_TX fallback path")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
This can happen with big budget values and some breakage of re-filling
descriptors as we do not clear the entry that ntu is pointing at the end
of ice_alloc_rx_bufs_zc. So if ntc is at ntu then it might be the case
that status_error0 has an old, uncleared value and ntc would go over
with processing which would result in false results.
Break Rx loop when ntc == ntu to avoid broken behavior.
Fixes: 3876ff525de7 ("ice: xsk: Handle SW XDP ring wrap and bump tail more often")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
The NIC Tx ring completion routine cleans entries from the ring in
batches. However, it processes one more batch than it is supposed
to. Note that this does not matter from a functionality point of view
since it will not find a set DD bit for the next batch and just exit
the loop. But from a performance perspective, it is faster to
terminate the loop before and not issue an expensive read over PCIe to
get the DD bit.
Fixes: 126cdfe1007a ("ice: xsk: Improve AF_XDP ZC Tx and use batching API")
Signed-off-by: Magnus Karlsson <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
For the case when xp_alloc_batch() is used but the batched allocation
cannot be used, there is a slow path that uses the non-batched
xp_alloc(). When it fails to allocate an entry, it returns NULL. The
current code wrote this NULL into the entry of the provided results
array (pointer to the driver SW ring usually) and returned. This might
not be what the driver expects and to make things simpler, just write
successfully allocated xdp_buffs into the SW ring,. The driver might
have information in there that is still important after an allocation
failure.
Note that at this point in time, there are no drivers using
xp_alloc_batch() that could trigger this slow path. But one might get
added.
Fixes: 47e4075df300 ("xsk: Batched buffer allocation for the pool")
Signed-off-by: Magnus Karlsson <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Masami Hiramatsu says:
====================
Here are the 3rd version for generic kretprobe and kretprobe on x86 for
replacing the kretprobe trampoline with rethook. The previous version
is here[1]
[1] https://lore.kernel.org/all/164821817332.2373735.12048266953420821089.stgit@devnote2/T/#u
This version fixed typo and build issues for bpf-next and CONFIG_RETHOOK=y
error. I also add temporary mitigation lines for ANNOTATE_NOENDBR macro
issue for bpf-next tree [2/4].
This will be removed after merging kernel IBT series.
Background:
This rethook came from Jiri's request of multiple kprobe for bpf[2].
He tried to solve an issue that starting bpf with multiple kprobe will
take a long time because bpf-kprobe will wait for RCU grace period for
sync rcu events.
Jiri wanted to attach a single bpf handler to multiple kprobes and
he tried to introduce multiple-probe interface to kprobe. So I asked
him to use ftrace and kretprobe-like hook if it is only for the
function entry and exit, instead of adding ad-hoc interface
to kprobes.
For this purpose, I introduced the fprobe (kprobe like interface for
ftrace) with the rethook (this is a generic return hook feature for
fprobe exit handler)[3].
[2] https://lore.kernel.org/all/[email protected]/T/#u
[3] https://lore.kernel.org/all/164191321766.806991.7930388561276940676.stgit@devnote2/T/#u
The rethook is basically same as the kretprobe trampoline. I just made
it decoupled from kprobes. Eventually, the all arch dependent kretprobe
trampolines will be replaced with the rethook trampoline instead of
cloning and set HAVE_RETHOOK=y.
When I port the rethook for all arch which supports kretprobe, the
legacy kretprobe specific code (which is for CONFIG_KRETPROBE_ON_RETHOOK=n)
will be removed eventually.
====================
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Currently the optprobe trampoline template code ganerate an
almost complete pt_regs on-stack, everything except regs->ss.
The 'regs->ss' points to the top of stack, which is not a
valid segment decriptor.
As same as the rethook does, complete the job by also pushing ss.
Suggested-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/164826166027.2455864.14759128090648961900.stgit@devnote2
|
|
Currently arch_rethook_trampoline() generates an almost complete
pt_regs on-stack, everything except regs->ss that is, that currently
points to the fake return address, which is not a valid segment
descriptor.
Since interpretation of regs->[sb]p should be done in the context of
regs->ss, and we have code actually doing that (see
arch/x86/lib/insn-eval.c for instance), complete the job by also
pushing ss.
This ensures that anybody who does do look at regs->ss doesn't
mysteriously malfunction, avoiding much future pain.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Reviewed-by: Masami Hiramatsu <[email protected]>
Link: https://lore.kernel.org/bpf/164826164851.2455864.17272661073069737350.stgit@devnote2
|
|
Replaces the kretprobe code with rethook on x86. With this patch,
kretprobe on x86 uses the rethook instead of kretprobe specific
trampoline code.
Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Tested-by: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/bpf/164826163692.2455864.13745421016848209527.stgit@devnote2
|
|
Use rethook for kretprobe function return hooking if the arch sets
CONFIG_HAVE_RETHOOK=y. In this case, CONFIG_KRETPROBE_ON_RETHOOK is
set to 'y' automatically, and the kretprobe internal data fields
switches to use rethook. If not, it continues to use kretprobe
specific function return hooks.
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/164826162556.2455864.12255833167233452047.stgit@devnote2
|
|
Arnaldo reported perf compilation fail with:
$ make -k BUILD_BPF_SKEL=1 CORESIGHT=1 PYTHON=python3
...
In file included from util/bpf_counter.c:28:
/tmp/build/perf//util/bpf_skel/bperf_leader.skel.h: In function ‘bperf_leader_bpf__assert’:
/tmp/build/perf//util/bpf_skel/bperf_leader.skel.h:351:51: error: unused parameter ‘s’ [-Werror=unused-parameter]
351 | bperf_leader_bpf__assert(struct bperf_leader_bpf *s)
| ~~~~~~~~~~~~~~~~~~~~~~~~~^
cc1: all warnings being treated as errors
If there's nothing to generate in the new assert function,
we will get unused 's' warn/error, adding 'unused' attribute to it.
Fixes: 08d4dba6ae77 ("bpftool: Bpf skeletons assert type sizes")
Reported-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
14c174633f34 ("random: remove unused tracepoints") removed all the
tracepoints from drivers/char/random.c, one of which,
random:urandom_read, was used by stacktrace_build_id selftest to trigger
stack trace capture.
Fix breakage by switching to kprobing urandom_read() function.
Suggested-by: Yonghong Song <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Since the m->arg_size array can hold up to MAX_BPF_FUNC_ARGS argument
sizes, it's ok that nargs is equal to MAX_BPF_FUNC_ARGS.
Signed-off-by: Yuntao Wang <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Commit ee2a098851bf missed updating the comments for helper bpf_get_stack
in tools/include/uapi/linux/bpf.h. Sync it.
Fixes: ee2a098851bf ("bpf: Adjust BPF stack helper functions to accommodate skip > 0")
Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Link: https://lore.kernel.org/bpf/ce54617746b7ed5e9ba3b844e55e74cb8a60e0b5.1648110794.git.geliang.tang@suse.com
|
|
Masami Hiramatsu says:
====================
Hi,
These fprobe patches are for fixing the warnings by Smatch and sparse.
This is arch independent part of the fixes.
Thank you,
---
====================
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Since ftrace_ops::local_hash::filter_hash field is an __rcu pointer,
we have to use rcu_access_pointer() to access it.
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/164802093635.1732982.4938094876018890866.stgit@devnote2
|
|
Fix the type mismatching warning of 'rethook_node vs fprobe_rethook_node'
found by Smatch.
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/164802092611.1732982.12268174743437084619.stgit@devnote2
|
|
In [1], we added a kconfig knob that can set
/proc/sys/kernel/unprivileged_bpf_disabled to 2
We now check against this value in bpftool feature probe
[1] https://lore.kernel.org/bpf/74ec548079189e4e4dffaeb42b8987bb3c852eee.1620765074.git.daniel@iogearbox.net
Signed-off-by: Milan Landaverde <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Quentin Monnet <[email protected]>
Acked-by: KP Singh <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Absolute paths in $ref should always begin with '/schemas'. The tools
mostly work with it omitted, but for correctness the path should be
everything except the hostname as that is taken from the schema's $id
value. This scheme is defined in the json-schema spec.
Cc: Hector Martin <[email protected]>
Cc: Sven Peter <[email protected]>
Cc: Andrew Lunn <[email protected]>
Cc: Vivien Didelot <[email protected]>
Cc: Florian Fainelli <[email protected]>
Cc: Vladimir Oltean <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Chunfeng Yun <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Mukesh Savaliya <[email protected]>
Cc: Akash Asthana <[email protected]>
Cc: Bayi Cheng <[email protected]>
Cc: Chuanhong Guo <[email protected]>
Cc: Min Guo <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Rob Herring <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Acked-by: Mark Brown <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
'dma-ranges' in the example is written for cell sizes of 2 cells, but
the schema and example specify sizes of 1 cell. As the h/w has a bus
address of >32-bits, cell sizes of 2 is correct. Update the schema's
'#address-cells' and '#size-cells' to be 2 and adjust the example
throughout.
There's no error currently because dtc only checks 'dma-ranges' is a
correct multiple number of cells (3) and the schema checking is based on
bracketing of entries.
Signed-off-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
PBL can be any of the following values: 1, 2, 4, 8, 16 or 32
according to the datasheet, so modify available values of PBL in
snps,dwmac.yaml.
Signed-off-by: Biao Huang <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
To avoid failure of dt_binding_check perform a slight refactoring
of the examples: the main block is kept, but that required fixing
the address and size cells, plus the inclusion of missing dt-bindings
headers, required to parse some of the values assigned to various
properties.
Fixes: 4ed545e7d100 ("dt-bindings: display: mediatek: disp: split each block to individual yaml")
Signed-off-by: AngeloGioacchino Del Regno <[email protected]>
Signed-off-by: jason-jh.lin <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Acked-by: Chun-Kuang Hu <[email protected]>
Tested-by: jason-jh.lin <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The property is called 'iommus' and not 'iommu'. Fix this typo.
Fixes: 4ed545e7d100 ("dt-bindings: display: mediatek: disp: split each block to individual yaml")
Signed-off-by: AngeloGioacchino Del Regno <[email protected]>
Signed-off-by: jason-jh.lin <[email protected]>
Acked-by: Rob Herring <[email protected]>
Acked-by: Chun-Kuang Hu <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The mediatek,gce-events property needs as value an array of uint32
corresponding to the CMDQ events to listen to, and not any phandle.
Fixes: 4ed545e7d100 ("dt-bindings: display: mediatek: disp: split each block to individual yaml")
Signed-off-by: AngeloGioacchino Del Regno <[email protected]>
Signed-off-by: jason-jh.lin <[email protected]>
Acked-by: Rob Herring <[email protected]>
Acked-by: Chun-Kuang Hu <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
This reverts commit e7dcfe64204a5cd9a74a9ca7d9c7a22434dc7fe5.
Because examples property of mediatek,ethdr.yaml should base on [1][2].
Reverting it until [1][2] are applied.
[1] dt-bindings: mediatek: mt8195: Add binding for MM IOMMU
https://patchwork.kernel.org/project/linux-mediatek/patch/[email protected]/
[2] dt-bindings: reset: mt8195: add vdosys1 reset control bit
https://patchwork.kernel.org/project/linux-mediatek/patch/[email protected]/
Signed-off-by: jason-jh.lin <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ptrace cleanups from Eric Biederman:
"This set of changes removes tracehook.h, moves modification of all of
the ptrace fields inside of siglock to remove races, adds a missing
permission check to ptrace.c
The removal of tracehook.h is quite significant as it has been a major
source of confusion in recent years. Much of that confusion was around
task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled making the
semantics clearer).
For people who don't know tracehook.h is a vestiage of an attempt to
implement uprobes like functionality that was never fully merged, and
was later superseeded by uprobes when uprobes was merged. For many
years now we have been removing what tracehook functionaly a little
bit at a time. To the point where anything left in tracehook.h was
some weird strange thing that was difficult to understand"
* tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
ptrace: Remove duplicated include in ptrace.c
ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
ptrace: Return the signal to continue with from ptrace_stop
ptrace: Move setting/clearing ptrace_message into ptrace_stop
tracehook: Remove tracehook.h
resume_user_mode: Move to resume_user_mode.h
resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume
signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
task_work: Decouple TIF_NOTIFY_SIGNAL and task_work
task_work: Call tracehook_notify_signal from get_signal on all architectures
task_work: Introduce task_work_pending
task_work: Remove unnecessary include from posix_timers.h
ptrace: Remove tracehook_signal_handler
ptrace: Remove arch_syscall_{enter,exit}_tracehook
ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h
ptrace/arm: Rename tracehook_report_syscall report_syscall
ptrace: Move ptrace_report_syscall into ptrace.h
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull shm ucounts fix from Eric Biederman:
"The introduction of a new failure mode when the code was converted to
ucounts resulted in user_shm_lock misbehaving.
The change simplifies the code to make the code easier to follow and
removes the known misbehaviors"
* tag 'ucount-rlimit-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
mm/mlock: fix two bugs in user_shm_lock()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter.
Current release - regressions:
- llc: only change llc->dev when bind() succeeds, fix null-deref
Current release - new code bugs:
- smc: fix a memory leak in smc_sysctl_net_exit()
- dsa: realtek: make interface drivers depend on OF
Previous releases - regressions:
- sched: act_ct: fix ref leak when switching zones
Previous releases - always broken:
- netfilter: egress: report interface as outgoing
- vsock/virtio: enable VQs early on probe and finish the setup before
using them
Misc:
- memcg: enable accounting for nft objects"
* tag 'net-5.18-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (39 commits)
Revert "selftests: net: Add tls config dependency for tls selftests"
net/smc: Send out the remaining data in sndbuf before close
net: move net_unlink_todo() out of the header
net: dsa: bcm_sf2_cfp: fix an incorrect NULL check on list iterator
net: bnxt_ptp: fix compilation error
selftests: net: Add tls config dependency for tls selftests
memcg: enable accounting for nft objects
net/sched: act_ct: fix ref leak when switching zones
net/smc: fix a memory leak in smc_sysctl_net_exit()
selftests: tls: skip cmsg_to_pipe tests with TLS=n
octeontx2-af: initialize action variable
net: sparx5: switchdev: fix possible NULL pointer dereference
net/x25: Fix null-ptr-deref caused by x25_disconnect
qlcnic: dcb: default to returning -EOPNOTSUPP
net: sparx5: depends on PTP_1588_CLOCK_OPTIONAL
net: hns3: fix phy can not link up when autoneg off and reset
net: hns3: add NULL pointer check for hns3_set/get_ringparam()
net: hns3: add netdev reset check for hns3_set_tunable()
net: hns3: clean residual vf config after disable sriov
net: hns3: add max order judgement for tx spare buffer
...
|
|
If there is already an entry present that is of order >= XA_CHUNK_SHIFT
when we call xas_create_range(), xas_create_range() will misinterpret
that entry as a node and dereference xa_node->parent, generally leading
to a crash that looks something like this:
general protection fault, probably for non-canonical address 0xdffffc0000000001:
0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 PID: 32 Comm: khugepaged Not tainted 5.17.0-rc8-syzkaller-00003-g56e337f2cf13 #0
RIP: 0010:xa_parent_locked include/linux/xarray.h:1207 [inline]
RIP: 0010:xas_create_range+0x2d9/0x6e0 lib/xarray.c:725
It's deterministically reproducable once you know what the problem is,
but producing it in a live kernel requires khugepaged to hit a race.
While the problem has been present since xas_create_range() was
introduced, I'm not aware of a way to hit it before the page cache was
converted to use multi-index entries.
Fixes: 6b24ca4a1a8d ("mm: Use multi-index entries in the page cache")
Reported-by: [email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
|
|
This reverts commit d9142e1cf3bbdaf21337767114ecab26fe702d47.
The test is supposed to run cleanly with TLS is disabled,
to test compatibility with TCP behavior. I can't repro
the failure [1], the problem should be debugged rather
than papered over.
Link: https://lore.kernel.org/all/20220325161203.7000698c@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/ [1]
Fixes: d9142e1cf3bb ("selftests: net: Add tls config dependency for tls selftests")
Signed-off-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The current autocork algorithms will delay the data transmission
in BH context to smc_release_cb() when sock_lock is hold by user.
So there is a possibility that when connection is being actively
closed (sock_lock is hold by user now), some corked data still
remains in sndbuf, waiting to be sent by smc_release_cb(). This
will cause:
- smc_close_stream_wait(), which is called under the sock_lock,
has a high probability of timeout because data transmission is
delayed until sock_lock is released.
- Unexpected data sends may happen after connction closed and use
the rtoken which has been deleted by remote peer through
LLC_DELETE_RKEY messages.
So this patch will try to send out the remaining corked data in
sndbuf before active close process, to ensure data integrity and
avoid unexpected data transmission after close.
Reported-by: Guangguan Wang <[email protected]>
Fixes: 6b88af839d20 ("net/smc: don't send in the BH context if sock_owned_by_user")
Signed-off-by: Wen Gu <[email protected]>
Acked-by: Karsten Graul <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Currently the way the tid (tree connection) status is tracked
is confusing. The same enum is used for structs cifs_tcon
and cifs_ses and TCP_Server_info, but each of these three has
different states that they transition among. The current
code also unnecessarily uses camelCase.
Convert from use of statusEnum to a new tid_status_enum for
tree connections. The valid states for a tid are:
TID_NEW = 0,
TID_GOOD,
TID_EXITING,
TID_NEED_RECON,
TID_NEED_TCON,
TID_IN_TCON,
TID_NEED_FILES_INVALIDATE, /* unused, considering removing in future */
TID_IN_FILES_INVALIDATE
It also removes CifsNeedTcon, CifsInTcon, CifsNeedFilesInvalidate and
CifsInFilesInvalidate from the statusEnum used for session and
TCP_Server_Info since they are not relevant for those.
A follow on patch will fix the places where we use the
tcon->need_reconnect flag to be more consistent with the tid->status.
Also fixes a bug that was:
Reported-by: kernel test robot <[email protected]>
Reviewed-by: Shyam Prasad N <[email protected]>
Reviewed-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux
Pull kgdb update from Daniel Thompson:
"Only a single patch this cycle. Fix an obvious mistake with the kdb
memory accessors.
It was a stupid mistake (to/from backwards) but it has been there for
a long time since many architectures tolerated it with surprisingly
good grace"
* tag 'kgdb-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
kdb: Fix the putarea helper function
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bcain/linux
Pull hexagon update from Brian Cain:
"Maintainer email update"
* tag 'hexagon-5.18-0' of git://git.kernel.org/pub/scm/linux/kernel/git/bcain/linux:
MAINTAINERS: update hexagon maintainer email, tree
|