Age | Commit message (Collapse) | Author | Files | Lines |
|
Heiko Carstens reported that SCM_PIDFD does not work with MSG_CMSG_COMPAT
because scm_pidfd_recv() always checks msg_controllen against sizeof(struct
cmsghdr).
We need to use sizeof(struct compat_cmsghdr) for the compat case.
Fixes: 5e2ff6704a27 ("scm: add SO_PASSPIDFD and SCM_PIDFD")
Reported-by: Heiko Carstens <[email protected]>
Closes: https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Tested-by: Heiko Carstens <[email protected]>
Reviewed-by: Alexander Mikhalitsyn <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Acked-by: Christian Brauner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Some corporate proxies block our current NIPA URLs because
they use a free / shady DNS domain. As suggested by Jesse
we got a new DNS entry from Konstantin - netdev.bots.linux.dev,
use it.
Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
|
|
|
|
The patchwork states are largely self-explanatory but small
ambiguities may still come up. Document how we interpret
the states in networking.
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
There is a race where skb's from the sk_psock_backlog can be referenced
after userspace side has already skb_consumed() the sk_buff and its refcnt
dropped to zer0 causing use after free.
The flow is the following:
while ((skb = skb_peek(&psock->ingress_skb))
sk_psock_handle_Skb(psock, skb, ..., ingress)
if (!ingress) ...
sk_psock_skb_ingress
sk_psock_skb_ingress_enqueue(skb)
msg->skb = skb
sk_psock_queue_msg(psock, msg)
skb_dequeue(&psock->ingress_skb)
The sk_psock_queue_msg() puts the msg on the ingress_msg queue. This is
what the application reads when recvmsg() is called. An application can
read this anytime after the msg is placed on the queue. The recvmsg hook
will also read msg->skb and then after user space reads the msg will call
consume_skb(skb) on it effectively free'ing it.
But, the race is in above where backlog queue still has a reference to
the skb and calls skb_dequeue(). If the skb_dequeue happens after the
user reads and free's the skb we have a use after free.
The !ingress case does not suffer from this problem because it uses
sendmsg_*(sk, msg) which does not pass the sk_buff further down the
stack.
The following splat was observed with 'test_progs -t sockmap_listen':
[ 1022.710250][ T2556] general protection fault, ...
[...]
[ 1022.712830][ T2556] Workqueue: events sk_psock_backlog
[ 1022.713262][ T2556] RIP: 0010:skb_dequeue+0x4c/0x80
[ 1022.713653][ T2556] Code: ...
[...]
[ 1022.720699][ T2556] Call Trace:
[ 1022.720984][ T2556] <TASK>
[ 1022.721254][ T2556] ? die_addr+0x32/0x80^M
[ 1022.721589][ T2556] ? exc_general_protection+0x25a/0x4b0
[ 1022.722026][ T2556] ? asm_exc_general_protection+0x22/0x30
[ 1022.722489][ T2556] ? skb_dequeue+0x4c/0x80
[ 1022.722854][ T2556] sk_psock_backlog+0x27a/0x300
[ 1022.723243][ T2556] process_one_work+0x2a7/0x5b0
[ 1022.723633][ T2556] worker_thread+0x4f/0x3a0
[ 1022.723998][ T2556] ? __pfx_worker_thread+0x10/0x10
[ 1022.724386][ T2556] kthread+0xfd/0x130
[ 1022.724709][ T2556] ? __pfx_kthread+0x10/0x10
[ 1022.725066][ T2556] ret_from_fork+0x2d/0x50
[ 1022.725409][ T2556] ? __pfx_kthread+0x10/0x10
[ 1022.725799][ T2556] ret_from_fork_asm+0x1b/0x30
[ 1022.726201][ T2556] </TASK>
To fix we add an skb_get() before passing the skb to be enqueued in the
engress queue. This bumps the skb->users refcnt so that consume_skb()
and kfree_skb will not immediately free the sk_buff. With this we can
be sure the skb is still around when we do the dequeue. Then we just
need to decrement the refcnt or free the skb in the backlog case which
we do by calling kfree_skb() on the ingress case as well as the sendmsg
case.
Before locking change from fixes tag we had the sock locked so we
couldn't race with user and there was no issue here.
Fixes: 799aa7f98d53e ("skmsg: Avoid lock_sock() in sk_psock_backlog()")
Reported-by: Jiri Olsa <[email protected]>
Signed-off-by: John Fastabend <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Tested-by: Xu Kuohai <[email protected]>
Tested-by: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Previously, the defines for phy_device flags in the Micrel driver were
ambiguous in their representation. They were intended to be bit masks
but were mistakenly defined as bit positions. This led to the following
issues:
- MICREL_KSZ8_P1_ERRATA, designated for KSZ88xx switches, overlapped
with MICREL_PHY_FXEN and MICREL_PHY_50MHZ_CLK.
- Due to this overlap, the code path for MICREL_PHY_FXEN, tailored for
the KSZ8041 PHY, was not executed for KSZ88xx PHYs.
- Similarly, the code associated with MICREL_PHY_50MHZ_CLK wasn't
triggered for KSZ88xx.
To rectify this, all three flags have now been explicitly converted to
use the `BIT()` macro, ensuring they are defined as bit masks and
preventing potential overlaps in the future.
Fixes: 49011e0c1555 ("net: phy: micrel: ksz886x/ksz8081: add cabletest support")
Signed-off-by: Oleksij Rempel <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The existing code incorrectly casted a negative value (the result of a
subtraction) to an unsigned value without checking. For example, if
/proc/sys/net/ipv6/conf/*/temp_prefered_lft was set to 1, the preferred
lifetime would jump to 4 billion seconds. On my machine and network the
shortest lifetime that avoided underflow was 3 seconds.
Fixes: 76506a986dc3 ("IPv6: fix DESYNC_FACTOR")
Signed-off-by: Alex Henrie <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The veth_xmit function returns NETDEV_TX_OK even when packets are dropped.
This behavior leads to incorrect calculations of statistics counts, as
well as things like txq->trans_start updates.
Fixes: e314dbdc1c0d ("[NET]: Virtual ethernet device driver.")
Signed-off-by: Liang Chen <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
gve_rx_append_frags() is able to build skbs chained with frag_list,
like GRO engine.
Problem is that shinfo->frag_list should only be used
for the head of the chain.
All other links should use skb->next pointer.
Otherwise, built skbs are not valid and can cause crashes.
Equivalent code in GRO (skb_gro_receive()) is:
if (NAPI_GRO_CB(p)->last == p)
skb_shinfo(p)->frag_list = skb;
else
NAPI_GRO_CB(p)->last->next = skb;
NAPI_GRO_CB(p)->last = skb;
Fixes: 9b8dd5e5ea48 ("gve: DQO: Add RX path")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Bailey Forrest <[email protected]>
Cc: Willem de Bruijn <[email protected]>
Cc: Catherine Sullivan <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Blamed commit changed:
ptr = kmalloc(size);
if (ptr)
size = ksize(ptr);
to:
size = kmalloc_size_roundup(size);
ptr = kmalloc(size);
This allowed various crash as reported by syzbot [1]
and Kyle Zeng.
Problem is that if @size is bigger than 0x80000001,
kmalloc_size_roundup(size) returns 2^32.
kmalloc_reserve() uses a 32bit variable (obj_size),
so 2^32 is truncated to 0.
kmalloc(0) returns ZERO_SIZE_PTR which is not handled by
skb allocations.
Following trace can be triggered if a netdev->mtu is set
close to 0x7fffffff
We might in the future limit netdev->mtu to more sensible
limit (like KMALLOC_MAX_SIZE).
This patch is based on a syzbot report, and also a report
and tentative fix from Kyle Zeng.
[1]
BUG: KASAN: user-memory-access in __build_skb_around net/core/skbuff.c:294 [inline]
BUG: KASAN: user-memory-access in __alloc_skb+0x3c4/0x6e8 net/core/skbuff.c:527
Write of size 32 at addr 00000000fffffd10 by task syz-executor.4/22554
CPU: 1 PID: 22554 Comm: syz-executor.4 Not tainted 6.1.39-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/03/2023
Call trace:
dump_backtrace+0x1c8/0x1f4 arch/arm64/kernel/stacktrace.c:279
show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:286
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x120/0x1a0 lib/dump_stack.c:106
print_report+0xe4/0x4b4 mm/kasan/report.c:398
kasan_report+0x150/0x1ac mm/kasan/report.c:495
kasan_check_range+0x264/0x2a4 mm/kasan/generic.c:189
memset+0x40/0x70 mm/kasan/shadow.c:44
__build_skb_around net/core/skbuff.c:294 [inline]
__alloc_skb+0x3c4/0x6e8 net/core/skbuff.c:527
alloc_skb include/linux/skbuff.h:1316 [inline]
igmpv3_newpack+0x104/0x1088 net/ipv4/igmp.c:359
add_grec+0x81c/0x1124 net/ipv4/igmp.c:534
igmpv3_send_cr net/ipv4/igmp.c:667 [inline]
igmp_ifc_timer_expire+0x1b0/0x1008 net/ipv4/igmp.c:810
call_timer_fn+0x1c0/0x9f0 kernel/time/timer.c:1474
expire_timers kernel/time/timer.c:1519 [inline]
__run_timers+0x54c/0x710 kernel/time/timer.c:1790
run_timer_softirq+0x28/0x4c kernel/time/timer.c:1803
_stext+0x380/0xfbc
____do_softirq+0x14/0x20 arch/arm64/kernel/irq.c:79
call_on_irq_stack+0x24/0x4c arch/arm64/kernel/entry.S:891
do_softirq_own_stack+0x20/0x2c arch/arm64/kernel/irq.c:84
invoke_softirq kernel/softirq.c:437 [inline]
__irq_exit_rcu+0x1c0/0x4cc kernel/softirq.c:683
irq_exit_rcu+0x14/0x78 kernel/softirq.c:695
el0_interrupt+0x7c/0x2e0 arch/arm64/kernel/entry-common.c:717
__el0_irq_handler_common+0x18/0x24 arch/arm64/kernel/entry-common.c:724
el0t_64_irq_handler+0x10/0x1c arch/arm64/kernel/entry-common.c:729
el0t_64_irq+0x1a0/0x1a4 arch/arm64/kernel/entry.S:584
Fixes: 12d6c1d3a2ad ("skbuff: Proactively round up to kmalloc bucket size")
Reported-by: syzbot <[email protected]>
Reported-by: Kyle Zeng <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In current packed virtqueue implementation, the avail_wrap_counter won't
flip, in the case when the driver supplies a descriptor chain with a
length equals to the queue size; total_sg == vq->packed.vring.num.
Let’s assume the following situation:
vq->packed.vring.num=4
vq->packed.next_avail_idx: 1
vq->packed.avail_wrap_counter: 0
Then the driver adds a descriptor chain containing 4 descriptors.
We expect the following result with avail_wrap_counter flipped:
vq->packed.next_avail_idx: 1
vq->packed.avail_wrap_counter: 1
But, the current implementation gives the following result:
vq->packed.next_avail_idx: 1
vq->packed.avail_wrap_counter: 0
To reproduce the bug, you can set a packed queue size as small as
possible, so that the driver is more likely to provide a descriptor
chain with a length equal to the packed queue size. For example, in
qemu run following commands:
sudo qemu-system-x86_64 \
-enable-kvm \
-nographic \
-kernel "path/to/kernel_image" \
-m 1G \
-drive file="path/to/rootfs",if=none,id=disk \
-device virtio-blk,drive=disk \
-drive file="path/to/disk_image",if=none,id=rwdisk \
-device virtio-blk,drive=rwdisk,packed=on,queue-size=4,\
indirect_desc=off \
-append "console=ttyS0 root=/dev/vda rw init=/bin/bash"
Inside the VM, create a directory and mount the rwdisk device on it. The
rwdisk will hang and mount operation will not complete.
This commit fixes the wrap counter error by flipping the
packed.avail_wrap_counter, when start of descriptor chain equals to the
end of descriptor chain (head == i).
Fixes: 1ce9e6055fa0 ("virtio_ring: introduce packed ring support")
Signed-off-by: Yuan Yao <[email protected]>
Message-Id: <[email protected]>
Acked-by: Jason Wang <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
We try to build affinity mask via create_affinity_masks()
unconditionally which may lead several issues:
- the affinity mask is not used for parent without affinity support
(only VDUSE support the affinity now)
- the logic of create_affinity_masks() might not work for devices
other than block. For example it's not rare in the networking device
where the number of queues could exceed the number of CPUs. Such
case breaks the current affinity logic which is based on
group_cpus_evenly() who assumes the number of CPUs are not less than
the number of groups. This can trigger a warning[1]:
if (ret >= 0)
WARN_ON(nr_present + nr_others < numgrps);
Fixing this by only build the affinity masks only when
- Driver passes affinity descriptor, driver like virtio-blk can make
sure to limit the number of queues when it exceeds the number of CPUs
- Parent support affinity setting config ops
This help to avoid the warning. More optimizations could be done on
top.
[1]
[ 682.146655] WARNING: CPU: 6 PID: 1550 at lib/group_cpus.c:400 group_cpus_evenly+0x1aa/0x1c0
[ 682.146668] CPU: 6 PID: 1550 Comm: vdpa Not tainted 6.5.0-rc5jason+ #79
[ 682.146671] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[ 682.146673] RIP: 0010:group_cpus_evenly+0x1aa/0x1c0
[ 682.146676] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 cc cc cc cc e8 1b c4 74 ff 48 89 ef e8 13 ac 98 ff 4c 89 e7 45 31 e4 e8 08 ac 98 ff eb c2 <0f> 0b eb b6 e8 fd 05 c3 00 45 31 e4 eb e5 cc cc cc cc cc cc cc cc
[ 682.146679] RSP: 0018:ffffc9000215f498 EFLAGS: 00010293
[ 682.146682] RAX: 000000000001f1e0 RBX: 0000000000000041 RCX: 0000000000000000
[ 682.146684] RDX: ffff888109922058 RSI: 0000000000000041 RDI: 0000000000000030
[ 682.146686] RBP: ffff888109922058 R08: ffffc9000215f498 R09: ffffc9000215f4a0
[ 682.146687] R10: 00000000000198d0 R11: 0000000000000030 R12: ffff888107e02800
[ 682.146689] R13: 0000000000000030 R14: 0000000000000030 R15: 0000000000000041
[ 682.146692] FS: 00007fef52315740(0000) GS:ffff888237380000(0000) knlGS:0000000000000000
[ 682.146695] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 682.146696] CR2: 00007fef52509000 CR3: 0000000110dbc004 CR4: 0000000000370ee0
[ 682.146698] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 682.146700] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 682.146701] Call Trace:
[ 682.146703] <TASK>
[ 682.146705] ? __warn+0x7b/0x130
[ 682.146709] ? group_cpus_evenly+0x1aa/0x1c0
[ 682.146712] ? report_bug+0x1c8/0x1e0
[ 682.146717] ? handle_bug+0x3c/0x70
[ 682.146721] ? exc_invalid_op+0x14/0x70
[ 682.146723] ? asm_exc_invalid_op+0x16/0x20
[ 682.146727] ? group_cpus_evenly+0x1aa/0x1c0
[ 682.146729] ? group_cpus_evenly+0x15c/0x1c0
[ 682.146731] create_affinity_masks+0xaf/0x1a0
[ 682.146735] virtio_vdpa_find_vqs+0x83/0x1d0
[ 682.146738] ? __pfx_default_calc_sets+0x10/0x10
[ 682.146742] virtnet_find_vqs+0x1f0/0x370
[ 682.146747] virtnet_probe+0x501/0xcd0
[ 682.146749] ? vp_modern_get_status+0x12/0x20
[ 682.146751] ? get_cap_addr.isra.0+0x10/0xc0
[ 682.146754] virtio_dev_probe+0x1af/0x260
[ 682.146759] really_probe+0x1a5/0x410
Fixes: 3dad56823b53 ("virtio-vdpa: Support interrupt affinity spreading mechanism")
Signed-off-by: Jason Wang <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
Currently, the virtio core will perform a dma operation for each
buffer. Although, the same page may be operated multiple times.
This patch, the driver does the dma operation and manages the dma
address based the feature premapped of virtio core.
This way, we can perform only one dma operation for the pages of the
alloc frag. This is beneficial for the iommu device.
kernel command line: intel_iommu=on iommu.passthrough=0
| strict=0 | strict=1
Before | 775496pps | 428614pps
After | 1109316pps | 742853pps
Signed-off-by: Xuan Zhuo <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
These API has been introduced:
* virtqueue_dma_need_sync
* virtqueue_dma_sync_single_range_for_cpu
* virtqueue_dma_sync_single_range_for_device
These APIs can be used together with the premapped mechanism to sync the
DMA address.
Signed-off-by: Xuan Zhuo <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
Added virtqueue_dma_map_api* to map DMA addresses for virtual memory in
advance. The purpose is to keep memory mapped across multiple add/get
buf operations.
Signed-off-by: Xuan Zhuo <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
Introduce virtqueue_reset() to release all buffer inside vq.
Signed-off-by: Xuan Zhuo <[email protected]>
Acked-by: Jason Wang <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
The subsequent reset function will reuse these logic.
Signed-off-by: Xuan Zhuo <[email protected]>
Acked-by: Jason Wang <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
Modify the "useless" to a more accurate "unused".
Signed-off-by: Xuan Zhuo <[email protected]>
Acked-by: Jason Wang <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
Now we add a case where we skip dma unmap, the vq->premapped is true.
We can't just rely on use_dma_api to determine whether to skip the dma
operation. For convenience, I introduced the "do_unmap". By default, it
is the same as use_dma_api. If the driver is configured with premapped,
then do_unmap is false.
So as long as do_unmap is false, for addr of desc, we should skip dma
unmap operation.
Signed-off-by: Xuan Zhuo <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
Added virtqueue_dma_dev() to get DMA device for virtio. Then the
caller can do dma operation in advance. The purpose is to keep memory
mapped across multiple add/get buf operations.
Signed-off-by: Xuan Zhuo <[email protected]>
Acked-by: Jason Wang <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
If the vq is the premapped mode, use the sg_dma_address() directly.
Signed-off-by: Xuan Zhuo <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
This helper allows the driver change the dma mode to premapped mode.
Under the premapped mode, the virtio core do not do dma mapping
internally.
This just work when the use_dma_api is true. If the use_dma_api is false,
the dma options is not through the DMA APIs, that is not the standard
way of the linux kernel.
Signed-off-by: Xuan Zhuo <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
This patch put the dma addr error check in vring_map_one_sg().
The benefits of doing this:
1. reduce one judgment of vq->use_dma_api.
2. make vring_map_one_sg more simple, without calling
vring_mapping_error to check the return value. simplifies subsequent
code
Signed-off-by: Xuan Zhuo <[email protected]>
Acked-by: Jason Wang <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
Inside detach_buf_split(), if use_dma_api is false,
vring_unmap_one_split_indirect will be called many times, but actually
nothing is done. So this patch check use_dma_api firstly.
Signed-off-by: Xuan Zhuo <[email protected]>
Acked-by: Jason Wang <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
Start offering the feature in the simulator. Other parent drivers can
follow this code to offer it too.
Signed-off-by: Eugenio Pérez <[email protected]>
Acked-by: Shannon Nelson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
This operation allow vdpa parent to expose its own backend feature bits.
Next patches introduce a feature not compatible with all parent drivers:
the ability to enable vq after driver_ok. Each parent must declare if
it allows it or not.
Signed-off-by: Eugenio Pérez <[email protected]>
Acked-by: Shannon Nelson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
Accepting VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK backend feature if
userland sets it.
Signed-off-by: Eugenio Pérez <[email protected]>
Acked-by: Shannon Nelson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
This feature flag allows the driver enabling virtqueues both before and
after DRIVER_OK.
This is needed for software assisted live migration, so userland can
restore the device status in devices with control virtqueue before the
dataplane is enabled.
Signed-off-by: Eugenio Pérez <[email protected]>
Acked-by: Shannon Nelson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
Commit 29064bfdabd5 ("vdpa/mlx5: Add support library for mlx5 VDPA implementation")
declared but never implemented these.
Signed-off-by: Yue Haibing <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Pull dmaengine updates from Vinod Koul:
"New controller support and updates to drivers.
New support:
- Qualcomm SM6115 and QCM2290 dmaengine support
- at_xdma support for microchip,sam9x7 controller
Updates:
- idxd updates for wq simplification and ats knob updates
- fsl edma updates for v3 support
- Xilinx AXI4-Stream control support
- Yaml conversion for bcm dma binding"
* tag 'dmaengine-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (53 commits)
dmaengine: fsl-edma: integrate v3 support
dt-bindings: fsl-dma: fsl-edma: add edma3 compatible string
dmaengine: fsl-edma: move tcd into struct fsl_dma_chan
dmaengine: fsl-edma: refactor chan_name setup and safety
dmaengine: fsl-edma: move clearing of register interrupt into setup_irq function
dmaengine: fsl-edma: refactor using devm_clk_get_enabled
dmaengine: fsl-edma: simply ATTR_DSIZE and ATTR_SSIZE by using ffs()
dmaengine: fsl-edma: move common IRQ handler to common.c
dmaengine: fsl-edma: Remove enum edma_version
dmaengine: fsl-edma: transition from bool fields to bitmask flags in drvdata
dmaengine: fsl-edma: clean up EXPORT_SYMBOL_GPL in fsl-edma-common.c
dmaengine: fsl-edma: fix build error when arch is s390
dmaengine: idxd: Fix issues with PRS disable sysfs knob
dmaengine: idxd: Allow ATS disable update only for configurable devices
dmaengine: xilinx_dma: Program interrupt delay timeout
dmaengine: xilinx_dma: Use tasklet_hi_schedule for timing critical usecase
dmaengine: xilinx_dma: Freeup active list based on descriptor completion bit
dmaengine: xilinx_dma: Increase AXI DMA transaction segment count
dmaengine: xilinx_dma: Pass AXI4-Stream control words to dma client
dt-bindings: dmaengine: xilinx_dma: Add xlnx,irq-delay property
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy
Pull phy updates from Vinod Koul:
"As usual a couple of new drivers, a bunch of new device support and
few updates to existing drivers
New Support:
- Starfive dphy rx, JH7110 usb and pcie support
- Rockchip rv1126 inno-dsi phy, rk3588 usb and pcie support
- Qualcomm sa8775p PCIe support, M31 USB PHY driver
- Samsung Exynos850 usb support
Updates:
- Mediatek dsi driver clock updates
- Qualcomm sm8150 combo phy with reworking of qmp pcie driver
- Xilinx zynqmp runtime PM support"
* tag 'phy-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: (83 commits)
phy: exynos5-usbdrd: Add Exynos850 support
phy: exynos5-usbdrd: Add 26MHz ref clk support
phy: exynos5-usbdrd: Make it possible to pass custom phy ops
dt-bindings: phy: samsung,usb3-drd-phy: Add Exynos850 support
phy: qcom-qmp-combo: fix clock probing
phy: qcom-qmp-pcie: support SM8150 PCIe QMP PHYs
phy: qcom-qmp-pcie: populate offsets configuration
phy: qcom-qmp-pcie: simplify clock handling
phy: qcom-qmp-pcie: keep offset tables sorted
phy: qcom-qmp-pcie: drop ln_shrd from v5_20 config
dt-bindings: phy: qcom,qmp-pcie: describe SM8150 PCIe PHYs
dt-bindings: phy: migrate QMP PCIe PHY bindings to qcom,sc8280xp-qmp-pcie-phy.yaml
phy: fsl-imx8mq-usb: add dev_err_probe if getting vbus failed
phy: qcom: Introduce M31 USB PHY driver
dt-bindings: phy: qcom,m31: Document qcom,m31 USB phy
phy: rockchip: inno-dsidphy: Add rv1126 support
dt-bindings: phy: rockchip-inno-dsidphy: Document rv1126
dt-bindings: phy: mediatek,tphy: allow simple nodename pattern
phy: amlogic: meson-g12a-usb2: fix Wvoid-pointer-to-enum-cast warning
phy: marvell pxa-usb: fix Wvoid-pointer-to-enum-cast warning
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire
Pull soundwire updates from Vinod Koul:
"Device numbering and intel driver changes are main features:
- Core support for soundwire device number allocation
- intel driver updates for adding hw_params for DAI ops, hybrid
number allocation and power managemnt callback updates
- DT header include changes for subsystem"
* tag 'soundwire-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
soundwire: intel_ace2x: add DAI hw_params/prepare/hw_free callbacks
soundwire: intel_auxdevice: add hybrid IDA-based device_number allocation
soundwire: bus: add callbacks for device_number allocation
soundwire: extend parameters of new_peripheral_assigned() callback
soundWire: intel_auxdevice: resume 'sdw-master' on startup and system resume
soundwire: intel_auxdevice: enable pm_runtime earlier on startup
soundwire: Explicitly include correct DT includes
|
|
Currently the Kconfig fragments in kernel/configs and arch/*/configs
that aren't used internally aren't discoverable through "make help",
which consists of hard-coded lists of config fragments. Instead, list
all the fragment targets that have a "# Help: " comment prefix so the
targets can be generated dynamically.
Add logic to the Makefile to search for and display the fragment and
comment. Add comments to fragments that are intended to be direct targets.
Signed-off-by: Kees Cook <[email protected]>
Co-developed-by: Masahiro Yamada <[email protected]>
Acked-by: Michael Ellerman <[email protected]> (powerpc)
Reviewed-by: Nicolas Schier <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull MTD updates from Miquel Raynal:
"Core MTD changes:
- Use refcount to prevent corruption
- Call external _get and _put in right order
- Fix use-after-free in mtd release
- Explicitly include correct DT includes
- Clean refcounting with MTD_PARTITIONED_MASTER
- mtdblock: make warning messages ratelimited
- dt-bindings: Add SEAMA partition bindings
Device driver changes:
- Use devm helper functions
- Fix questionable cast, remove pointless ones.
- error handling fixes
- add support for new chip versions
- update DT bindings
- misc cleanups - fix typos, whitespace, indentation"
* tag 'mtd/for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (105 commits)
dt-bindings: mtd: amlogic,meson-nand: drop unneeded quotes
mtd: spear_smi: Use helper function devm_clk_get_enabled()
mtd: rawnand: orion: Use helper function devm_clk_get_optional_enabled()
mtd: rawnand: vf610_nfc: Use helper function devm_clk_get_enabled()
mtd: rawnand: sunxi: Use helper function devm_clk_get_enabled()
mtd: rawnand: stm32_fmc2: Use helper function devm_clk_get_enabled()
mtd: rawnand: mtk: Use helper function devm_clk_get_enabled()
mtd: rawnand: mpc5121: Use helper function devm_clk_get_enabled()
mtd: rawnand: lpc32xx_slc: Use helper function devm_clk_get_enabled()
mtd: rawnand: intel: Use helper function devm_clk_get_enabled()
mtd: rawnand: fsmc: Use helper function devm_clk_get_enabled()
mtd: rawnand: arasan: Use helper function devm_clk_get_enabled()
mtd: rawnand: qcom: Add read/read_start ops in exec_op path
mtd: rawnand: qcom: Clear buf_count and buf_start in raw read
mtd: maps: fix -Wvoid-pointer-to-enum-cast warning
mtd: rawnand: fix -Wvoid-pointer-to-enum-cast warning
mtd: rawnand: fsmc: handle clk prepare error in fsmc_nand_resume()
mtd: rawnand: Propagate error and simplify ternary operators for brcmstb_nand_wait_for_completion()
mtd: rawnand: qcom: Sort includes alphabetically
mtd: rawnand: qcom: Do not override the error no of submit_descs()
...
|
|
Disable virtualization features on 82580 just as on i210/i211.
This avoids that virt functions are acidentally called on 82850.
Fixes: 55cac248caa4 ("igb: Add full support for 82580 devices")
Signed-off-by: Corinna Vinschen <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this cycle, we don't have a highlighted feature enhancement, but
mostly have fixed issues mainly in two parts: 1) zoned block device,
and 2) compression support.
For zoned block device, we've tried to improve the power-off recovery
flow as much as possible. For compression, we found some corner cases
caused by wrong compression policy and logics. Other than them, there
were some reverts and stat corrections.
Bug fixes:
- use finish zone command when closing a zone
- check zone type before sending async reset zone command
- fix to assign compress_level for lz4 correctly
- fix error path of f2fs_submit_page_read()
- don't {,de}compress non-full cluster
- send small discard commands during checkpoint back
- flush inode if atomic file is aborted
- correct to account gc/cp stats
And, there are minor bug fixes, avoiding false lockdep warning, and
clean-ups"
* tag 'f2fs-for-6-6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (25 commits)
f2fs: use finish zone command when closing a zone
f2fs: compress: fix to assign compress_level for lz4 correctly
f2fs: fix error path of f2fs_submit_page_read()
f2fs: clean up error handling in sanity_check_{compress_,}inode()
f2fs: avoid false alarm of circular locking
Revert "f2fs: do not issue small discard commands during checkpoint"
f2fs: doc: fix description of max_small_discards
f2fs: should update REQ_TIME for direct write
f2fs: fix to account cp stats correctly
f2fs: fix to account gc stats correctly
f2fs: remove unneeded check condition in __f2fs_setxattr()
f2fs: fix to update i_ctime in __f2fs_setxattr()
Revert "f2fs: fix to do sanity check on extent cache correctly"
f2fs: increase usage of folio_next_index() helper
f2fs: Only lfs mode is allowed with zoned block device feature
f2fs: check zone type before sending async reset zone command
f2fs: compress: don't {,de}compress non-full cluster
f2fs: allow f2fs_ioc_{,de}compress_file to be interrupted
f2fs: don't reopen the main block device in f2fs_scan_devices
f2fs: fix to avoid mmap vs set_compress_option case
...
|
|
Commit bde5f6bc68db ("kmemleak: add scheduling point to kmemleak_scan()")
added a cond_resched() call to the struct page scanning loop to prevent
soft lockup from happening. However, soft lockup can still happen in that
loop in some corner cases when the pages that satisfy the "!(pfn & 63)"
check are skipped for some reasons.
Fix this corner case by moving up the cond_resched() check so that it will
be called every 64 pages unconditionally.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: bde5f6bc68db ("kmemleak: add scheduling point to kmemleak_scan()")
Signed-off-by: Waiman Long <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Yisheng Xie <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
In the past, movable allocations could be disallowed from CMA through
PF_MEMALLOC_PIN. As CMA pages are funneled through the MOVABLE pcplist,
this required filtering that cornercase during allocations, such that
pinnable allocations wouldn't accidentally get a CMA page.
However, since 8e3560d963d2 ("mm: honor PF_MEMALLOC_PIN for all movable
pages"), PF_MEMALLOC_PIN automatically excludes __GFP_MOVABLE. Once
again, MOVABLE implies CMA is allowed.
Remove the stale filtering code. Also remove a stale comment that was
introduced as part of the filtering code, because the filtering let
order-0 pages fall through to the buddy allocator. See 1d91df85f399
("mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore}
APIs") for context. The comment's been obsolete since the introduction of
the explicit ALLOC_HIGHATOMIC flag in eb2e2b425c69 ("mm/page_alloc:
explicitly record high-order atomic allocations in alloc_flags").
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Johannes Weiner <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Make it easier to figure out where to send patches for this file.
Link: https://lkml.kernel.org/r/efbc7689d35a48ff402644d696aa9a8d8bb6333a.1692877089.git.baruch@tkos.co.il
Signed-off-by: Baruch Siach <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
anon_vma_link() is unused since commit 5beb49305251 ("mm: change anon_vma
linking to fix multi-process server scalability issue").
Link: https://lkml.kernel.org/r/cdce9b00c9ab15f6d02eddf40dcad537d3e9676f.1692877089.git.baruch@tkos.co.il
Signed-off-by: Baruch Siach <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
With madvise and prctl KSM can be enabled for different VMA's. Once it is
enabled we can query how effective KSM is overall. However we cannot
easily query if an individual VMA benefits from KSM.
This commit adds a KSM section to the /prod/<pid>/smaps file. It reports
how many of the pages are KSM pages. Note that KSM-placed zeropages are
not included, only actual KSM pages.
Here is a typical output:
7f420a000000-7f421a000000 rw-p 00000000 00:00 0
Size: 262144 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 51212 kB
Pss: 8276 kB
Shared_Clean: 172 kB
Shared_Dirty: 42996 kB
Private_Clean: 196 kB
Private_Dirty: 7848 kB
Referenced: 15388 kB
Anonymous: 51212 kB
KSM: 41376 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
FilePmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 202016 kB
SwapPss: 3882 kB
Locked: 0 kB
THPeligible: 0
ProtectionKey: 0
ksm_state: 0
ksm_skip_base: 0
ksm_skip_count: 0
VmFlags: rd wr mr mw me nr mg anon
This information also helps with the following workflow:
- First enable KSM for all the VMA's of a process with prctl.
- Then analyze with the above smaps report which VMA's benefit the most
- Change the application (if possible) to add the corresponding madvise
calls for the VMA's that benefit the most
[[email protected]: v5]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Stefan Roesch <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Rik van Riel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
In the discussion of "Improve hugetlbfs read on HWPOISON hugepages" [1],
Matthew Wilcox suggests hwp is a bad abbreviation of hwpoison, as hwp is
already used as "an acronym by acpi, intel_pstate, some clock drivers, an
ethernet driver, and a scsi driver"[1].
So rename hwp_walk and hwp_walk_ops to hwpoison_walk and
hwpoison_walk_ops respectively.
raw_hwp_(page|list), *_raw_hwp, and raw_hwp_unreliable flag are other
major appearances of "hwp". However, given the "raw" hint in the name, it
is easy to differentiate them from other "hwp" acronyms. Since renaming
them is not as straightforward as renaming hwp_walk*, they are not covered
by this commit.
[1] https://lore.kernel.org/lkml/[email protected]/T/#me6fecb8ce1ad4d5769199c9e162a44bc88f7bdec
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jiaqi Yan <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Acked-by: Naoya Horiguchi <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Miaohe Lin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Memory failure is not interested in logically offlined pages. Skip this
type of page.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Acked-by: Naoya Horiguchi <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Pull SCSI updates from James Bottomley:
"Updates to the usual drivers (ufs, lpfc, qla2xxx, mpi3mr, libsas) and
the usual minor updates and bug fixes but no significant core changes"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (116 commits)
scsi: storvsc: Handle additional SRB status values
scsi: libsas: Delete sas_ata_task.retry_count
scsi: libsas: Delete sas_ata_task.stp_affil_pol
scsi: libsas: Delete sas_ata_task.set_affil_pol
scsi: libsas: Delete sas_ssp_task.task_prio
scsi: libsas: Delete sas_ssp_task.enable_first_burst
scsi: libsas: Delete sas_ssp_task.retry_count
scsi: libsas: Delete struct scsi_core
scsi: libsas: Delete enum sas_phy_type
scsi: libsas: Delete enum sas_class
scsi: libsas: Delete sas_ha_struct.lldd_module
scsi: target: Fix write perf due to unneeded throttling
scsi: lpfc: Do not abuse UUID APIs and LPFC_COMPRESS_VMID_SIZE
scsi: pm8001: Remove unused declarations
scsi: fcoe: Fix potential deadlock on &fip->ctlr_lock
scsi: elx: sli4: Remove code duplication
scsi: bfa: Replace one-element array with flexible-array member in struct fc_rscn_pl_s
scsi: qla2xxx: Remove unused declarations
scsi: pmcraid: Use pci_dev_id() to simplify the code
scsi: pm80xx: Set RETFIS when requested by libsas
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes updates from Masami Hiramatsu:
- kprobes: use struct_size() for variable size kretprobe_instance data
structure.
- eprobe: Simplify trace_eprobe list iteration.
- probe events: Data structure field access support on BTF argument.
- Update BTF argument support on the functions in the kernel
loadable modules (only loaded modules are supported).
- Move generic BTF access function (search function prototype and
get function parameters) to a separated file.
- Add a function to search a member of data structure in BTF.
- Support accessing BTF data structure member from probe args by
C-like arrow('->') and dot('.') operators. e.g.
't sched_switch next=next->pid vruntime=next->se.vruntime'
- Support accessing BTF data structure member from $retval. e.g.
'f getname_flags%return +0($retval->name):string'
- Add string type checking if BTF type info is available. This will
reject if user specify ":string" type for non "char pointer"
type.
- Automatically assume the fprobe event as a function return event
if $retval is used.
- selftests/ftrace: Add BTF data field access test cases.
- Documentation: Update fprobe event example with BTF data field.
* tag 'probes-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
Documentation: tracing: Update fprobe event example with BTF field
selftests/ftrace: Add BTF fields access testcases
tracing/fprobe-event: Assume fprobe is a return event by $retval
tracing/probes: Add string type check with BTF
tracing/probes: Support BTF field access from $retval
tracing/probes: Support BTF based data structure field access
tracing/probes: Add a function to search a member of a struct/union
tracing/probes: Move finding func-proto API and getting func-param API to trace_btf
tracing/probes: Support BTF argument on module functions
tracing/eprobe: Iterate trace_eprobe directly
kernel: kprobes: Use struct_size()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull more tracing updates from Steven Rostedt:
"Tracing fixes and clean ups:
- Replace strlcpy() with strscpy()
- Initialize the pipe cpumask to zero on allocation
- Use within_module() instead of open coding it
- Remove extra space in hwlat_detectory/mode output
- Use LIST_HEAD() instead of open coding it
- A bunch of clean ups and fixes for the cpumask filter
- Set local da_mon_##name to static
- Fix race in snapshot buffer between cpu write and swap"
* tag 'trace-v6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/filters: Fix coding style issues
tracing/filters: Change parse_pred() cpulist ternary into an if block
tracing/filters: Fix double-free of struct filter_pred.mask
tracing/filters: Fix error-handling of cpulist parsing buffer
tracing: Zero the pipe cpumask on alloc to avoid spurious -EBUSY
ftrace: Use LIST_HEAD to initialize clear_hash
ftrace: Use within_module to check rec->ip within specified module.
tracing: Replace strlcpy with strscpy in trace/events/task.h
tracing: Fix race issue between cpu buffer write and swap
tracing: Remove extra space at the end of hwlat_detector/mode
rv: Set variable 'da_mon_##name' to static
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull pstore fix from Kees Cook:
- Adjust sizes of buffers just avoid uncompress failures (Ard
Biesheuvel)
* tag 'pstore-v6.6-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
pstore: Base compression input buffer size on estimated compressed size
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 selftest fix from Ingo Molnar:
"Fix the __NR_map_shadow_stack syscall-renumbering fallout in the x86
self-test code.
[ Arguably the existing code was unnecessarily fragile, and tooling
should have picked up the new syscall number, and a wider fix is
being worked on - but meanwhile, let's not have the old syscall
number in the kernel tree. ]"
* tag 'x86-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftests/x86: Update map_shadow_stack syscall nr
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
"Fix false positive 'softirq work is pending' messages on -rt kernels,
caused by a buggy factoring-out of existing code"
* tag 'timers-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/rcu: Fix false positive "softirq work is pending" messages
|