| Age | Commit message (Collapse) | Author | Files | Lines |
|
The HDR10 standard compound controls were missing the corresponding
pointers in videodev2.h. Add these and document them.
Signed-off-by: Hans Verkuil <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
|
|
The p_h264_pps pointer in struct v4l2_ext_control was missing the
__user annotation. Add this.
Signed-off-by: Hans Verkuil <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
|
|
Use one of the struct v4l2_create_buffers reserved bytes to report
the maximum possible number of buffers for the queue.
V4l2 framework set V4L2_BUF_CAP_SUPPORTS_MAX_NUM_BUFFERS flags in queue
capabilities so userland can know when the field is valid.
Does the same change in v4l2_create_buffers32 structure.
Signed-off-by: Benjamin Gaignard <[email protected]>
Signed-off-by: Hans Verkuil <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
|
|
Add the UAPI implementation for the PowerVR driver.
Changes from v8:
- Fixed documentation for unmapping, which previously suggested the
size was not used
- Corrected license identifier
Changes from v7:
- Remove prefixes from DRM_PVR_BO_* flags
- Improve struct drm_pvr_ioctl_create_hwrt_dataset_args documentation
- Remove references to static area carveouts
- CREATE_BO ioctl now returns an error if provided size isn't page aligned
- Clarify documentation for DRM_PVR_STATIC_DATA_AREA_EOT
Changes from v6:
- Add padding to struct drm_pvr_dev_query_gpu_info
- Improve BYPASS_CACHE flag documentation
- Add SUBMIT_JOB_FRAG_CMD_DISABLE_PIXELMERGE flag
Changes from v4:
- Remove CREATE_ZEROED flag for BO creation (all buffers are now zeroed)
Co-developed-by: Frank Binns <[email protected]>
Signed-off-by: Frank Binns <[email protected]>
Co-developed-by: Boris Brezillon <[email protected]>
Signed-off-by: Boris Brezillon <[email protected]>
Co-developed-by: Matt Coster <[email protected]>
Signed-off-by: Matt Coster <[email protected]>
Co-developed-by: Donald Robson <[email protected]>
Signed-off-by: Donald Robson <[email protected]>
Signed-off-by: Sarah Walker <[email protected]>
Reviewed-by: Faith Ekstrand <[email protected]>
Link: https://lore.kernel.org/r/8c95a3a1d685e2b44d361b95a19eae5a478fb9d1.1700668843.git.donald.robson@imgtec.com
Signed-off-by: Maxime Ripard <[email protected]>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-11-21
We've added 85 non-merge commits during the last 12 day(s) which contain
a total of 63 files changed, 4464 insertions(+), 1484 deletions(-).
The main changes are:
1) Huge batch of verifier changes to improve BPF register bounds logic
and range support along with a large test suite, and verifier log
improvements, all from Andrii Nakryiko.
2) Add a new kfunc which acquires the associated cgroup of a task within
a specific cgroup v1 hierarchy where the latter is identified by its id,
from Yafang Shao.
3) Extend verifier to allow bpf_refcount_acquire() of a map value field
obtained via direct load which is a use-case needed in sched_ext,
from Dave Marchevsky.
4) Fix bpf_get_task_stack() helper to add the correct crosstask check
for the get_perf_callchain(), from Jordan Rome.
5) Fix BPF task_iter internals where lockless usage of next_thread()
was wrong. The rework also simplifies the code, from Oleg Nesterov.
6) Fix uninitialized tail padding via LIBBPF_OPTS_RESET, and another
fix for certain BPF UAPI structs to fix verifier failures seen
in bpf_dynptr usage, from Yonghong Song.
7) Add BPF selftest fixes for map_percpu_stats flakes due to per-CPU BPF
memory allocator not being able to allocate per-CPU pointer successfully,
from Hou Tao.
8) Add prep work around dynptr and string handling for kfuncs which
is later going to be used by file verification via BPF LSM and fsverity,
from Song Liu.
9) Improve BPF selftests to update multiple prog_tests to use ASSERT_*
macros, from Yuran Pereira.
10) Optimize LPM trie lookup to check prefixlen before walking the trie,
from Florian Lehner.
11) Consolidate virtio/9p configs from BPF selftests in config.vm file
given they are needed consistently across archs, from Manu Bretelle.
12) Small BPF verifier refactor to remove register_is_const(),
from Shung-Hsi Yu.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits)
selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in vmlinux
selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in bpf_obj_id
selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in bind_perm
selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in bpf_tcp_ca
selftests/bpf: reduce verboseness of reg_bounds selftest logs
bpf: bpf_iter_task_next: use next_task(kit->task) rather than next_task(kit->pos)
bpf: bpf_iter_task_next: use __next_thread() rather than next_thread()
bpf: task_group_seq_get_next: use __next_thread() rather than next_thread()
bpf: emit frameno for PTR_TO_STACK regs if it differs from current one
bpf: smarter verifier log number printing logic
bpf: omit default off=0 and imm=0 in register state log
bpf: emit map name in register state if applicable and available
bpf: print spilled register state in stack slot
bpf: extract register state printing
bpf: move verifier state printing code to kernel/bpf/log.c
bpf: move verbose_linfo() into kernel/bpf/log.c
bpf: rename BPF_F_TEST_SANITY_STRICT to BPF_F_TEST_REG_INVARIANTS
bpf: Remove test for MOVSX32 with offset=32
selftests/bpf: add iter test requiring range x range logic
veristat: add ability to set BPF_F_TEST_SANITY_STRICT flag with -r flag
...
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The EXT_external_objects extension is a bit awkward as it doesn't pass
explicit modifiers, leaving the importer to guess with incomplete
information. In the case of vk (turnip) exporting and gl (freedreno)
importing, the "OPTIMAL_TILING_EXT" layout depends on VkImageCreateInfo
flags (among other things), which the importer does not know. Which
unfortunately leaves us with the need for a metadata back-channel.
The contents of the metadata are defined by userspace. The
EXT_external_objects extension is only required to work between
compatible versions of gl and vk drivers, as defined by device and
driver UUIDs.
v2: add missing metadata kfree
v3: Rework to move copy_from/to_user out from under gem obj lock
to avoid angering lockdep about deadlocks against fs-reclaim
Signed-off-by: Rob Clark <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/566157/
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for 6.8:
UAPI Changes:
- drm: Introduce CLOSE_FB ioctl
- drm/dp-mst: Documentation for the PATH property
- fdinfo: Do not align to a MB if the size is larger than 1MiB
- virtio-gpu: add explicit virtgpu context debug name
Cross-subsystem Changes:
- dma-buf: Add dma_fence_timestamp helper
Core Changes:
- client: Do not acquire module reference
- edid: split out drm_eld, add SAD helpers
- format-helper: Cache format conversion buffers
- sched: Move from a kthread to a workqueue, rename some internal
functions to make it clearer, implement dynamic job-flow control
- gpuvm: Provide more features to handle GEM objects
- tests: Remove slow kunit tests
Driver Changes:
- ivpu: Update FW API, new debugfs file, a new NOP job submission test
mode, improve suspend/resume, PM improvements, MMU PT optimizations,
firmware profiling frequency support, support for uncached buffers,
switch to gem shmem helpers, replace kthread with threaded
interrupts
- panfrost: PM improvements
- qaic: Allow to run with a single MSI, support host/device time
synchronization, misc improvements
- simplefb: Support memory-regions, support power-domains
- ssd130x: Unitialized variable fixes
- omapdrm: dma-fence lockdep annotation fix
- tidss: dma-fence lockdep annotation fix
- v3d: Support BCM2712 (RaspberryPi5), Support fdinfo and gputop
- panel:
- edp: Support AUO B116XTN02, BOE NT116WHM-N21,836X2, NV116WHM-N49
V8.0, plus a whole bunch of panels used on Mediatek chromebooks.
Note that the one missing s-o-b for 0da611a87021 ("dma-buf: add
dma_fence_timestamp helper") has been supplied here, and rebasing the
entire tree with upsetting committers didn't seem worth the trouble:
https://lore.kernel.org/dri-devel/[email protected]/
Signed-off-by: Daniel Vetter <[email protected]>
From: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/y4awn5vcfy2lr2hpauo7rc4nfpnc6kksr7btmnwaz7zk63pwoi@gwwef5iqpzva
|
|
Revert following commits:
commit acec05fb78ab ("net_tstamp: Add TIMESTAMPING SOFTWARE and HARDWARE mask")
commit 11d55be06df0 ("net: ethtool: Add a command to expose current time stamping layer")
commit bb8645b00ced ("netlink: specs: Introduce new netlink command to get current timestamp")
commit d905f9c75329 ("net: ethtool: Add a command to list available time stamping layers")
commit aed5004ee7a0 ("netlink: specs: Introduce new netlink command to list available time stamping layers")
commit 51bdf3165f01 ("net: Replace hwtstamp_source by timestamping layer")
commit 0f7f463d4821 ("net: Change the API of PHY default timestamp to MAC")
commit 091fab122869 ("net: ethtool: ts: Update GET_TS to reply the current selected timestamp")
commit 152c75e1d002 ("net: ethtool: ts: Let the active time stamping layer be selectable")
commit ee60ea6be0d3 ("netlink: specs: Introduce time stamping set command")
They need more time for reviews.
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.open-mesh.org/linux-merge
This feature/cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
- Implement new multicast packet type, including its transmission,
forwarding and parsing, by Linus Lüssing (3 patches)
- Switch to new headers for sprintf and array size,
by Sven Eckelmann (2 patches)
Signed-off-by: David S. Miller <[email protected]>
|
|
Now that the current timestamp is saved in a variable lets add the
ETHTOOL_MSG_TS_SET ethtool netlink socket to make it selectable.
Signed-off-by: Kory Maincent <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Introduce a new netlink message that lists all available time stamping
layers on a given interface.
Signed-off-by: Kory Maincent <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Time stamping on network packets may happen either in the MAC or in
the PHY, but not both. In preparation for making the choice
selectable, expose both the current layers via ethtool.
In accordance with the kernel implementation as it stands, the current
layer will always read as "phy" when a PHY time stamping device is
present. Future patches will allow changing the current layer
administratively.
Signed-off-by: Kory Maincent <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Timestamping software or hardware flags are often used as a group,
therefore adding these masks will easier future use.
I did not use SOF_TIMESTAMPING_SYS_HARDWARE flag as it is deprecated and
not use at all.
Signed-off-by: Kory Maincent <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
If a mount is released then its mnt_id can immediately be reused. This is
bad news for user interfaces that want to uniquely identify a mount.
Implementing a unique mount ID is trivial (use a 64bit counter).
Unfortunately userspace assumes 32bit size and would overflow after the
counter reaches 2^32.
Introduce a new 64bit ID alongside the old one. Initialize the counter to
2^32, this guarantees that the old and new IDs are never mixed up.
Signed-off-by: Miklos Szeredi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Ian Kent <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
When vfs_getattr_nosec() calls a filesystem's getattr interface function
then the 'nosec' should propagate into this function so that
vfs_getattr_nosec() can again be called from the filesystem's gettattr
rather than vfs_getattr(). The latter would add unnecessary security
checks that the initial vfs_getattr_nosec() call wanted to avoid.
Therefore, introduce the getattr flag GETATTR_NOSEC and allow to pass
with the new getattr_flags parameter to the getattr interface function.
In overlayfs and ecryptfs use this flag to determine which one of the
two functions to call.
In a recent code change introduced to IMA vfs_getattr_nosec() ended up
calling vfs_getattr() in overlayfs, which in turn called
security_inode_getattr() on an exiting process that did not have
current->fs set anymore, which then caused a kernel NULL pointer
dereference. With this change the call to security_inode_getattr() can
be avoided, thus avoiding the NULL pointer dereference.
Reported-by: <[email protected]>
Fixes: db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the i_version")
Cc: Alexander Viro <[email protected]>
Cc: <[email protected]>
Cc: Miklos Szeredi <[email protected]>
Cc: Amir Goldstein <[email protected]>
Cc: Tyler Hicks <[email protected]>
Cc: Mimi Zohar <[email protected]>
Suggested-by: Christian Brauner <[email protected]>
Co-developed-by: Amir Goldstein <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Amir Goldstein <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
Rename verifier internal flag BPF_F_TEST_SANITY_STRICT to more neutral
BPF_F_TEST_REG_INVARIANTS. This is a follow up to [0].
A few selftests and veristat need to be adjusted in the same patch as
well.
[0] https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
By default, VXLAN encapsulation over IPv6 sets the flow label to 0, with
an option for a fixed value. This commits add the ability to inherit the
flow label from the inner packet, like for other tunnel implementations.
This enables devices using only L3 headers for ECMP to correctly balance
VXLAN-encapsulated IPv6 packets.
```
$ ./ip/ip link add dummy1 type dummy
$ ./ip/ip addr add 2001:db8::2/64 dev dummy1
$ ./ip/ip link set up dev dummy1
$ ./ip/ip link add vxlan1 type vxlan id 100 flowlabel inherit remote 2001:db8::1 local 2001:db8::2
$ ./ip/ip link set up dev vxlan1
$ ./ip/ip addr add 2001:db8:1::2/64 dev vxlan1
$ ./ip/ip link set arp off dev vxlan1
$ ping -q 2001:db8:1::1 &
$ tshark -d udp.port==8472,vxlan -Vpni dummy1 -c1
[...]
Internet Protocol Version 6, Src: 2001:db8::2, Dst: 2001:db8::1
0110 .... = Version: 6
.... 0000 0000 .... .... .... .... .... = Traffic Class: 0x00 (DSCP: CS0, ECN: Not-ECT)
.... 0000 00.. .... .... .... .... .... = Differentiated Services Codepoint: Default (0)
.... .... ..00 .... .... .... .... .... = Explicit Congestion Notification: Not ECN-Capable Transport (0)
.... 1011 0001 1010 1111 1011 = Flow Label: 0xb1afb
[...]
Virtual eXtensible Local Area Network
Flags: 0x0800, VXLAN Network ID (VNI)
Group Policy ID: 0
VXLAN Network Identifier (VNI): 100
[...]
Internet Protocol Version 6, Src: 2001:db8:1::2, Dst: 2001:db8:1::1
0110 .... = Version: 6
.... 0000 0000 .... .... .... .... .... = Traffic Class: 0x00 (DSCP: CS0, ECN: Not-ECT)
.... 0000 00.. .... .... .... .... .... = Differentiated Services Codepoint: Default (0)
.... .... ..00 .... .... .... .... .... = Explicit Congestion Notification: Not ECN-Capable Transport (0)
.... 1011 0001 1010 1111 1011 = Flow Label: 0xb1afb
```
Signed-off-by: Alce Lafranque <[email protected]>
Co-developed-by: Vincent Bernat <[email protected]>
Signed-off-by: Vincent Bernat <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The problem is this line here from subdev_do_ioctl().
client_cap->capabilities &= ~V4L2_SUBDEV_CLIENT_CAP_STREAMS;
The "client_cap->capabilities" variable is a u64. The AND operation
is supposed to clear out the V4L2_SUBDEV_CLIENT_CAP_STREAMS flag. But
because it's a 32 bit variable it accidentally clears out the high 32
bits as well.
Currently we only use the first bit and none of the upper bits so this
doesn't affect runtime behavior.
Fixes: f57fa2959244 ("media: v4l2-subdev: Add new ioctl for client capabilities")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Tomi Valkeinen <[email protected]>
Signed-off-by: Hans Verkuil <[email protected]>
|
|
Pull virtio fixes from Michael Tsirkin:
"Bugfixes all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost-vdpa: fix use after free in vhost_vdpa_probe()
virtio_pci: Switch away from deprecated irq_set_affinity_hint
riscv, qemu_fw_cfg: Add support for RISC-V architecture
vdpa_sim_blk: allocate the buffer zeroed
virtio_pci: move structure to a header
|
|
Add simple sanity checks that validate well-formed ranges (min <= max)
across u64, s64, u32, and s32 ranges. Also for cases when the value is
constant (either 64-bit or 32-bit), we validate that ranges and tnums
are in agreement.
These bounds checks are performed at the end of BPF_ALU/BPF_ALU64
operations, on conditional jumps, and for LDX instructions (where subreg
zero/sign extension is probably the most important to check). This
covers most of the interesting cases.
Also, we validate the sanity of the return register when manually
adjusting it for some special helpers.
By default, sanity violation will trigger a warning in verifier log and
resetting register bounds to "unbounded" ones. But to aid development
and debugging, BPF_F_TEST_SANITY_STRICT flag is added, which will
trigger hard failure of verification with -EFAULT on register bounds
violations. This allows selftests to catch such issues. veristat will
also gain a CLI option to enable this behavior.
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: Shung-Hsi Yu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Let's kickstart the v6.8 release cycle.
Signed-off-by: Maxime Ripard <[email protected]>
|
|
Avoid conflicts, base on fixes.
Signed-off-by: Peter Zijlstra <[email protected]>
|
|
Introduce several new KVM uAPIs to ultimately create a guest-first memory
subsystem within KVM, a.k.a. guest_memfd. Guest-first memory allows KVM
to provide features, enhancements, and optimizations that are kludgly
or outright impossible to implement in a generic memory subsystem.
The core KVM ioctl() for guest_memfd is KVM_CREATE_GUEST_MEMFD, which
similar to the generic memfd_create(), creates an anonymous file and
returns a file descriptor that refers to it. Again like "regular"
memfd files, guest_memfd files live in RAM, have volatile storage,
and are automatically released when the last reference is dropped.
The key differences between memfd files (and every other memory subystem)
is that guest_memfd files are bound to their owning virtual machine,
cannot be mapped, read, or written by userspace, and cannot be resized.
guest_memfd files do however support PUNCH_HOLE, which can be used to
convert a guest memory area between the shared and guest-private states.
A second KVM ioctl(), KVM_SET_MEMORY_ATTRIBUTES, allows userspace to
specify attributes for a given page of guest memory. In the long term,
it will likely be extended to allow userspace to specify per-gfn RWX
protections, including allowing memory to be writable in the guest
without it also being writable in host userspace.
The immediate and driving use case for guest_memfd are Confidential
(CoCo) VMs, specifically AMD's SEV-SNP, Intel's TDX, and KVM's own pKVM.
For such use cases, being able to map memory into KVM guests without
requiring said memory to be mapped into the host is a hard requirement.
While SEV+ and TDX prevent untrusted software from reading guest private
data by encrypting guest memory, pKVM provides confidentiality and
integrity *without* relying on memory encryption. In addition, with
SEV-SNP and especially TDX, accessing guest private memory can be fatal
to the host, i.e. KVM must be prevent host userspace from accessing
guest memory irrespective of hardware behavior.
Long term, guest_memfd may be useful for use cases beyond CoCo VMs,
for example hardening userspace against unintentional accesses to guest
memory. As mentioned earlier, KVM's ABI uses userspace VMA protections to
define the allow guest protection (with an exception granted to mapping
guest memory executable), and similarly KVM currently requires the guest
mapping size to be a strict subset of the host userspace mapping size.
Decoupling the mappings sizes would allow userspace to precisely map
only what is needed and with the required permissions, without impacting
guest performance.
A guest-first memory subsystem also provides clearer line of sight to
things like a dedicated memory pool (for slice-of-hardware VMs) and
elimination of "struct page" (for offload setups where userspace _never_
needs to DMA from or into guest memory).
guest_memfd is the result of 3+ years of development and exploration;
taking on memory management responsibilities in KVM was not the first,
second, or even third choice for supporting CoCo VMs. But after many
failed attempts to avoid KVM-specific backing memory, and looking at
where things ended up, it is quite clear that of all approaches tried,
guest_memfd is the simplest, most robust, and most extensible, and the
right thing to do for KVM and the kernel at-large.
The "development cycle" for this version is going to be very short;
ideally, next week I will merge it as is in kvm/next, taking this through
the KVM tree for 6.8 immediately after the end of the merge window.
The series is still based on 6.6 (plus KVM changes for 6.7) so it
will require a small fixup for changes to get_file_rcu() introduced in
6.7 by commit 0ede61d8589c ("file: convert to SLAB_TYPESAFE_BY_RCU").
The fixup will be done as part of the merge commit, and most of the text
above will become the commit message for the merge.
Pending post-merge work includes:
- hugepage support
- looking into using the restrictedmem framework for guest memory
- introducing a testing mechanism to poison memory, possibly using
the same memory attributes introduced here
- SNP and TDX support
There are two non-KVM patches buried in the middle of this series:
fs: Rename anon_inode_getfile_secure() and anon_inode_getfd_secure()
mm: Add AS_UNMOVABLE to mark mapping as completely unmovable
The first is small and mostly suggested-by Christian Brauner; the second
a bit less so but it was written by an mm person (Vlastimil Babka).
|
|
Add a new x86 VM type, KVM_X86_SW_PROTECTED_VM, to serve as a development
and testing vehicle for Confidential (CoCo) VMs, and potentially to even
become a "real" product in the distant future, e.g. a la pKVM.
The private memory support in KVM x86 is aimed at AMD's SEV-SNP and
Intel's TDX, but those technologies are extremely complex (understatement),
difficult to debug, don't support running as nested guests, and require
hardware that's isn't universally accessible. I.e. relying SEV-SNP or TDX
for maintaining guest private memory isn't a realistic option.
At the very least, KVM_X86_SW_PROTECTED_VM will enable a variety of
selftests for guest_memfd and private memory support without requiring
unique hardware.
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Fuad Tabba <[email protected]>
Tested-by: Fuad Tabba <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Add support for resolving page faults on guest private memory for VMs
that differentiate between "shared" and "private" memory. For such VMs,
KVM_MEM_GUEST_MEMFD memslots can include both fd-based private memory and
hva-based shared memory, and KVM needs to map in the "correct" variant,
i.e. KVM needs to map the gfn shared/private as appropriate based on the
current state of the gfn's KVM_MEMORY_ATTRIBUTE_PRIVATE flag.
For AMD's SEV-SNP and Intel's TDX, the guest effectively gets to request
shared vs. private via a bit in the guest page tables, i.e. what the guest
wants may conflict with the current memory attributes. To support such
"implicit" conversion requests, exit to user with KVM_EXIT_MEMORY_FAULT
to forward the request to userspace. Add a new flag for memory faults,
KVM_MEMORY_EXIT_FLAG_PRIVATE, to communicate whether the guest wants to
map memory as shared vs. private.
Like KVM_MEMORY_ATTRIBUTE_PRIVATE, use bit 3 for flagging private memory
so that KVM can use bits 0-2 for capturing RWX behavior if/when userspace
needs such information, e.g. a likely user of KVM_EXIT_MEMORY_FAULT is to
exit on missing mappings when handling guest page fault VM-Exits. In
that case, userspace will want to know RWX information in order to
correctly/precisely resolve the fault.
Note, private memory *must* be backed by guest_memfd, i.e. shared mappings
always come from the host userspace page tables, and private mappings
always come from a guest_memfd instance.
Co-developed-by: Yu Zhang <[email protected]>
Signed-off-by: Yu Zhang <[email protected]>
Signed-off-by: Chao Peng <[email protected]>
Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Fuad Tabba <[email protected]>
Tested-by: Fuad Tabba <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Introduce an ioctl(), KVM_CREATE_GUEST_MEMFD, to allow creating file-based
memory that is tied to a specific KVM virtual machine and whose primary
purpose is to serve guest memory.
A guest-first memory subsystem allows for optimizations and enhancements
that are kludgy or outright infeasible to implement/support in a generic
memory subsystem. With guest_memfd, guest protections and mapping sizes
are fully decoupled from host userspace mappings. E.g. KVM currently
doesn't support mapping memory as writable in the guest without it also
being writable in host userspace, as KVM's ABI uses VMA protections to
define the allow guest protection. Userspace can fudge this by
establishing two mappings, a writable mapping for the guest and readable
one for itself, but that’s suboptimal on multiple fronts.
Similarly, KVM currently requires the guest mapping size to be a strict
subset of the host userspace mapping size, e.g. KVM doesn’t support
creating a 1GiB guest mapping unless userspace also has a 1GiB guest
mapping. Decoupling the mappings sizes would allow userspace to precisely
map only what is needed without impacting guest performance, e.g. to
harden against unintentional accesses to guest memory.
Decoupling guest and userspace mappings may also allow for a cleaner
alternative to high-granularity mappings for HugeTLB, which has reached a
bit of an impasse and is unlikely to ever be merged.
A guest-first memory subsystem also provides clearer line of sight to
things like a dedicated memory pool (for slice-of-hardware VMs) and
elimination of "struct page" (for offload setups where userspace _never_
needs to mmap() guest memory).
More immediately, being able to map memory into KVM guests without mapping
said memory into the host is critical for Confidential VMs (CoCo VMs), the
initial use case for guest_memfd. While AMD's SEV and Intel's TDX prevent
untrusted software from reading guest private data by encrypting guest
memory with a key that isn't usable by the untrusted host, projects such
as Protected KVM (pKVM) provide confidentiality and integrity *without*
relying on memory encryption. And with SEV-SNP and TDX, accessing guest
private memory can be fatal to the host, i.e. KVM must be prevent host
userspace from accessing guest memory irrespective of hardware behavior.
Attempt #1 to support CoCo VMs was to add a VMA flag to mark memory as
being mappable only by KVM (or a similarly enlightened kernel subsystem).
That approach was abandoned largely due to it needing to play games with
PROT_NONE to prevent userspace from accessing guest memory.
Attempt #2 to was to usurp PG_hwpoison to prevent the host from mapping
guest private memory into userspace, but that approach failed to meet
several requirements for software-based CoCo VMs, e.g. pKVM, as the kernel
wouldn't easily be able to enforce a 1:1 page:guest association, let alone
a 1:1 pfn:gfn mapping. And using PG_hwpoison does not work for memory
that isn't backed by 'struct page', e.g. if devices gain support for
exposing encrypted memory regions to guests.
Attempt #3 was to extend the memfd() syscall and wrap shmem to provide
dedicated file-based guest memory. That approach made it as far as v10
before feedback from Hugh Dickins and Christian Brauner (and others) led
to it demise.
Hugh's objection was that piggybacking shmem made no sense for KVM's use
case as KVM didn't actually *want* the features provided by shmem. I.e.
KVM was using memfd() and shmem to avoid having to manage memory directly,
not because memfd() and shmem were the optimal solution, e.g. things like
read/write/mmap in shmem were dead weight.
Christian pointed out flaws with implementing a partial overlay (wrapping
only _some_ of shmem), e.g. poking at inode_operations or super_operations
would show shmem stuff, but address_space_operations and file_operations
would show KVM's overlay. Paraphrashing heavily, Christian suggested KVM
stop being lazy and create a proper API.
Link: https://lore.kernel.org/all/[email protected]
Link: https://lore.kernel.org/all/[email protected]
Link: https://lore.kernel.org/all/[email protected]
Link: https://lore.kernel.org/all/[email protected]
Link: https://lore.kernel.org/all/[email protected]
Link: https://lore.kernel.org/all/[email protected]
Link: https://lore.kernel.org/all/20230418-anfallen-irdisch-6993a61be10b@brauner
Link: https://lore.kernel.org/all/[email protected]
Link: https://lore.kernel.org/linux-mm/20230306191944.GA15773@monkey
Link: https://lore.kernel.org/linux-mm/[email protected]
Cc: Fuad Tabba <[email protected]>
Cc: Vishal Annapurve <[email protected]>
Cc: Ackerley Tng <[email protected]>
Cc: Jarkko Sakkinen <[email protected]>
Cc: Maciej Szmigiero <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Quentin Perret <[email protected]>
Cc: Michael Roth <[email protected]>
Cc: Wang <[email protected]>
Cc: Liam Merwick <[email protected]>
Cc: Isaku Yamahata <[email protected]>
Co-developed-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Co-developed-by: Yu Zhang <[email protected]>
Signed-off-by: Yu Zhang <[email protected]>
Co-developed-by: Chao Peng <[email protected]>
Signed-off-by: Chao Peng <[email protected]>
Co-developed-by: Ackerley Tng <[email protected]>
Signed-off-by: Ackerley Tng <[email protected]>
Co-developed-by: Isaku Yamahata <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
Co-developed-by: Paolo Bonzini <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Co-developed-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Fuad Tabba <[email protected]>
Tested-by: Fuad Tabba <[email protected]>
Reviewed-by: Xiaoyao Li <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Implement functionality to receive and forward a new TVLV capable
multicast packet type.
The new batman-adv multicast packet type allows to contain several
originator destination addresses within a TVLV. Routers on the way will
potentially split the batman-adv multicast packet and adjust its tracker
TVLV contents.
Routing decisions are still based on the selected BATMAN IV or BATMAN V
routing algorithm. So this new batman-adv multicast packet type retains
the same loop-free properties.
Also a new OGM multicast TVLV flag is introduced to signal to other
nodes that we are capable of handling a batman-adv multicast packet and
multicast tracker TVLV. And that all of our hard interfaces have an MTU
of at least 1280 bytes (IPv6 minimum MTU), as a simple solution for now
to avoid MTU issues while forwarding.
Signed-off-by: Linus Lüssing <[email protected]>
Signed-off-by: Simon Wunderlich <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix potential overflow in returned value from SEARCH_TREE_V2
ioctl on 32bit architecture
- zoned mode fixes:
- drop unnecessary write pointer check for RAID0/RAID1/RAID10
profiles, now it works because of raid-stripe-tree
- wait for finishing the zone when direct IO needs a new
allocation
- simple quota fixes:
- pass correct owning root pointer when cleaning up an
aborted transaction
- fix leaking some structures when processing delayed refs
- change key type number of BTRFS_EXTENT_OWNER_REF_KEY,
reorder it before inline refs that are supposed to be
sorted, keeping the original number would complicate a lot
of things; this change needs an updated version of
btrfs-progs to work and filesystems need to be recreated
- fix error pointer dereference after failure to allocate fs
devices
- fix race between accounting qgroup extents and removing a
qgroup
* tag 'for-6.7-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: make OWNER_REF_KEY type value smallest among inline refs
btrfs: fix qgroup record leaks when using simple quotas
btrfs: fix race between accounting qgroup extents and removing a qgroup
btrfs: fix error pointer dereference after failure to allocate fs devices
btrfs: make found_logical_ret parameter mandatory for function queue_scrub_stripe()
btrfs: get correct owning_root when dropping snapshot
btrfs: zoned: wait for data BG to be finished on direct IO allocation
btrfs: zoned: drop no longer valid write pointer check
btrfs: directly return 0 on no error code in btrfs_insert_raid_extent()
btrfs: use u64 for buffer sizes in the tree search ioctls
|
|
In confidential computing usages, whether a page is private or shared is
necessary information for KVM to perform operations like page fault
handling, page zapping etc. There are other potential use cases for
per-page memory attributes, e.g. to make memory read-only (or no-exec,
or exec-only, etc.) without having to modify memslots.
Introduce the KVM_SET_MEMORY_ATTRIBUTES ioctl, advertised by
KVM_CAP_MEMORY_ATTRIBUTES, to allow userspace to set the per-page memory
attributes to a guest memory range.
Use an xarray to store the per-page attributes internally, with a naive,
not fully optimized implementation, i.e. prioritize correctness over
performance for the initial implementation.
Use bit 3 for the PRIVATE attribute so that KVM can use bits 0-2 for RWX
attributes/protections in the future, e.g. to give userspace fine-grained
control over read, write, and execute protections for guest memory.
Provide arch hooks for handling attribute changes before and after common
code sets the new attributes, e.g. x86 will use the "pre" hook to zap all
relevant mappings, and the "post" hook to track whether or not hugepages
can be used to map the range.
To simplify the implementation wrap the entire sequence with
kvm_mmu_invalidate_{begin,end}() even though the operation isn't strictly
guaranteed to be an invalidation. For the initial use case, x86 *will*
always invalidate memory, and preventing arch code from creating new
mappings while the attributes are in flux makes it much easier to reason
about the correctness of consuming attributes.
It's possible that future usages may not require an invalidation, e.g.
if KVM ends up supporting RWX protections and userspace grants _more_
protections, but again opt for simplicity and punt optimizations to
if/when they are needed.
Suggested-by: Sean Christopherson <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Cc: Fuad Tabba <[email protected]>
Cc: Xu Yilun <[email protected]>
Cc: Mickaël Salaün <[email protected]>
Signed-off-by: Chao Peng <[email protected]>
Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Add a new KVM exit type to allow userspace to handle memory faults that
KVM cannot resolve, but that userspace *may* be able to handle (without
terminating the guest).
KVM will initially use KVM_EXIT_MEMORY_FAULT to report implicit
conversions between private and shared memory. With guest private memory,
there will be two kind of memory conversions:
- explicit conversion: happens when the guest explicitly calls into KVM
to map a range (as private or shared)
- implicit conversion: happens when the guest attempts to access a gfn
that is configured in the "wrong" state (private vs. shared)
On x86 (first architecture to support guest private memory), explicit
conversions will be reported via KVM_EXIT_HYPERCALL+KVM_HC_MAP_GPA_RANGE,
but reporting KVM_EXIT_HYPERCALL for implicit conversions is undesriable
as there is (obviously) no hypercall, and there is no guarantee that the
guest actually intends to convert between private and shared, i.e. what
KVM thinks is an implicit conversion "request" could actually be the
result of a guest code bug.
KVM_EXIT_MEMORY_FAULT will be used to report memory faults that appear to
be implicit conversions.
Note! To allow for future possibilities where KVM reports
KVM_EXIT_MEMORY_FAULT and fills run->memory_fault on _any_ unresolved
fault, KVM returns "-EFAULT" (-1 with errno == EFAULT from userspace's
perspective), not '0'! Due to historical baggage within KVM, exiting to
userspace with '0' from deep callstacks, e.g. in emulation paths, is
infeasible as doing so would require a near-complete overhaul of KVM,
whereas KVM already propagates -errno return codes to userspace even when
the -errno originated in a low level helper.
Report the gpa+size instead of a single gfn even though the initial usage
is expected to always report single pages. It's entirely possible, likely
even, that KVM will someday support sub-page granularity faults, e.g.
Intel's sub-page protection feature allows for additional protections at
128-byte granularity.
Link: https://lore.kernel.org/all/[email protected]
Link: https://lore.kernel.org/all/[email protected]
Cc: Anish Moorthy <[email protected]>
Cc: David Matlack <[email protected]>
Suggested-by: Sean Christopherson <[email protected]>
Co-developed-by: Yu Zhang <[email protected]>
Signed-off-by: Yu Zhang <[email protected]>
Signed-off-by: Chao Peng <[email protected]>
Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Fuad Tabba <[email protected]>
Tested-by: Fuad Tabba <[email protected]>
Reviewed-by: Xiaoyao Li <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Introduce a "version 2" of KVM_SET_USER_MEMORY_REGION so that additional
information can be supplied without setting userspace up to fail. The
padding in the new kvm_userspace_memory_region2 structure will be used to
pass a file descriptor in addition to the userspace_addr, i.e. allow
userspace to point at a file descriptor and map memory into a guest that
is NOT mapped into host userspace.
Alternatively, KVM could simply add "struct kvm_userspace_memory_region2"
without a new ioctl(), but as Paolo pointed out, adding a new ioctl()
makes detection of bad flags a bit more robust, e.g. if the new fd field
is guarded only by a flag and not a new ioctl(), then a userspace bug
(setting a "bad" flag) would generate out-of-bounds access instead of an
-EINVAL error.
Cc: Jarkko Sakkinen <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Xiaoyao Li <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Fuad Tabba <[email protected]>
Tested-by: Fuad Tabba <[email protected]>
Message-Id: <[email protected]>
Acked-by: Kai Huang <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Rounding up the queue depth to power of two is not a hardware requirement.
In order to optimize the per connection memory usage, removing drivers
implementation which round up to the queue depths to the power of 2.
Implements a mask to maintain backward compatibility with older
library.
Signed-off-by: Chandramohan Akula <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
When IMA becomes a proper LSM we will reintroduce an appropriate
LSM ID, but drop it from the userspace API for now in an effort
to put an end to debates around the naming of the LSM ID macro.
Reviewed-by: Roberto Sassu <[email protected]>
Signed-off-by: Paul Moore <[email protected]>
|
|
Wireup lsm_get_self_attr, lsm_set_self_attr and lsm_list_modules
system calls.
Signed-off-by: Casey Schaufler <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Cc: [email protected]
Reviewed-by: Mickaël Salaün <[email protected]>
[PM: forward ported beyond v6.6 due merge window changes]
Signed-off-by: Paul Moore <[email protected]>
|
|
Create a system call lsm_get_self_attr() to provide the security
module maintained attributes of the current process.
Create a system call lsm_set_self_attr() to set a security
module maintained attribute of the current process.
Historically these attributes have been exposed to user space via
entries in procfs under /proc/self/attr.
The attribute value is provided in a lsm_ctx structure. The structure
identifies the size of the attribute, and the attribute value. The format
of the attribute value is defined by the security module. A flags field
is included for LSM specific information. It is currently unused and must
be 0. The total size of the data, including the lsm_ctx structure and any
padding, is maintained as well.
struct lsm_ctx {
__u64 id;
__u64 flags;
__u64 len;
__u64 ctx_len;
__u8 ctx[];
};
Two new LSM hooks are used to interface with the LSMs.
security_getselfattr() collects the lsm_ctx values from the
LSMs that support the hook, accounting for space requirements.
security_setselfattr() identifies which LSM the attribute is
intended for and passes it along.
Signed-off-by: Casey Schaufler <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Serge Hallyn <[email protected]>
Reviewed-by: John Johansen <[email protected]>
Signed-off-by: Paul Moore <[email protected]>
|
|
Create a struct lsm_id to contain identifying information about Linux
Security Modules (LSMs). At inception this contains the name of the
module and an identifier associated with the security module. Change
the security_add_hooks() interface to use this structure. Change the
individual modules to maintain their own struct lsm_id and pass it to
security_add_hooks().
The values are for LSM identifiers are defined in a new UAPI
header file linux/lsm.h. Each existing LSM has been updated to
include it's LSMID in the lsm_id.
The LSM ID values are sequential, with the oldest module
LSM_ID_CAPABILITY being the lowest value and the existing modules
numbered in the order they were included in the main line kernel.
This is an arbitrary convention for assigning the values, but
none better presents itself. The value 0 is defined as being invalid.
The values 1-99 are reserved for any special case uses which may
arise in the future. This may include attributes of the LSM
infrastructure itself, possibly related to namespacing or network
attribute management. A special range is identified for such attributes
to help reduce confusion for developers unfamiliar with LSMs.
LSM attribute values are defined for the attributes presented by
modules that are available today. As with the LSM IDs, The value 0
is defined as being invalid. The values 1-99 are reserved for any
special case uses which may arise in the future.
Cc: linux-security-module <[email protected]>
Signed-off-by: Casey Schaufler <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Serge Hallyn <[email protected]>
Reviewed-by: Mickael Salaun <[email protected]>
Reviewed-by: John Johansen <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Nacked-by: Tetsuo Handa <[email protected]>
[PM: forward ported beyond v6.6 due merge window changes]
Signed-off-by: Paul Moore <[email protected]>
|
|
Currently we only support configuration for number of channels and
sample rate.
Reviewed-by: Péter Ujfalusi <[email protected]>
Reviewed-by: Iuliana Prodan <[email protected]>
Signed-off-by: Daniel Baluta <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
|
|
There are two problems with the current method of determining the
virtio-gpu debug name.
1) TASK_COMM_LEN is defined to be 16 bytes only, and this is a
Linux kernel idiom (see PR_SET_NAME + PR_GET_NAME). Though,
Android/FreeBSD get around this via setprogname(..)/getprogname(..)
in libc.
On Android, names longer than 16 bytes are common. For example,
one often encounters a program like "com.android.systemui".
The virtio-gpu spec allows the debug name to be up to 64 bytes, so
ideally userspace should be able to set debug names up to 64 bytes.
2) The current implementation determines the debug name using whatever
task initiated virtgpu. This is could be a "RenderThread" of a
larger program, when we actually want to propagate the debug name
of the program.
To fix these issues, add a new CONTEXT_INIT param that allows userspace
to set the debug name when creating a context.
It takes a null-terminated C-string as the param value. The length of the
string (excluding the terminator) **should** be <= 64 bytes. Otherwise,
the debug_name will be truncated to 64 bytes.
Link to open-source userspace:
https://android-review.googlesource.com/c/platform/hardware/google/gfxstream/+/2787176
Signed-off-by: Gurchetan Singh <[email protected]>
Reviewed-by: Josh Simonot <[email protected]>
Reviewed-by: Dmitry Osipenko <[email protected]>
Signed-off-by: Dmitry Osipenko <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Currently get_perf_callchain only supports user stack walking for
the current task. Passing the correct *crosstask* param will return
0 frames if the task passed to __bpf_get_stack isn't the current
one instead of a single incorrect frame/address. This change
passes the correct *crosstask* param but also does a preemptive
check in __bpf_get_stack if the task is current and returns
-EOPNOTSUPP if it is not.
This issue was found using bpf_get_task_stack inside a BPF
iterator ("iter/task"), which iterates over all tasks.
bpf_get_task_stack works fine for fetching kernel stacks
but because get_perf_callchain relies on the caller to know
if the requested *task* is the current one (via *crosstask*)
it was failing in a confusing way.
It might be possible to get user stacks for all tasks utilizing
something like access_process_vm but that requires the bpf
program calling bpf_get_task_stack to be sleepable and would
therefore be a breaking change.
Fixes: fa28dcb82a38 ("bpf: Introduce helper bpf_get_task_stack()")
Signed-off-by: Jordan Rome <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Martin and Vadim reported a verifier failure with bpf_dynptr usage.
The issue is mentioned but Vadim workarounded the issue with source
change ([1]). The below describes what is the issue and why there
is a verification failure.
int BPF_PROG(skb_crypto_setup) {
struct bpf_dynptr algo, key;
...
bpf_dynptr_from_mem(..., ..., 0, &algo);
...
}
The bpf program is using vmlinux.h, so we have the following definition in
vmlinux.h:
struct bpf_dynptr {
long: 64;
long: 64;
};
Note that in uapi header bpf.h, we have
struct bpf_dynptr {
long: 64;
long: 64;
} __attribute__((aligned(8)));
So we lost alignment information for struct bpf_dynptr by using vmlinux.h.
Let us take a look at a simple program below:
$ cat align.c
typedef unsigned long long __u64;
struct bpf_dynptr_no_align {
__u64 :64;
__u64 :64;
};
struct bpf_dynptr_yes_align {
__u64 :64;
__u64 :64;
} __attribute__((aligned(8)));
void bar(void *, void *);
int foo() {
struct bpf_dynptr_no_align a;
struct bpf_dynptr_yes_align b;
bar(&a, &b);
return 0;
}
$ clang --target=bpf -O2 -S -emit-llvm align.c
Look at the generated IR file align.ll:
...
%a = alloca %struct.bpf_dynptr_no_align, align 1
%b = alloca %struct.bpf_dynptr_yes_align, align 8
...
The compiler dictates the alignment for struct bpf_dynptr_no_align is 1 and
the alignment for struct bpf_dynptr_yes_align is 8. So theoretically compiler
could allocate variable %a with alignment 1 although in reallity the compiler
may choose a different alignment by considering other local variables.
In [1], the verification failure happens because variable 'algo' is allocated
on the stack with alignment 4 (fp-28). But the verifer wants its alignment
to be 8.
To fix the issue, the RFC patch ([1]) tried to add '__attribute__((aligned(8)))'
to struct bpf_dynptr plus other similar structs. Andrii suggested that
we could directly modify uapi struct with named fields like struct 'bpf_iter_num':
struct bpf_iter_num {
/* opaque iterator state; having __u64 here allows to preserve correct
* alignment requirements in vmlinux.h, generated from BTF
*/
__u64 __opaque[1];
} __attribute__((aligned(8)));
Indeed, adding named fields for those affected structs in this patch can preserve
alignment when bpf program references them in vmlinux.h. With this patch,
the verification failure in [1] can also be resolved.
[1] https://lore.kernel.org/bpf/[email protected]/
[2] https://lore.kernel.org/bpf/[email protected]/
Cc: Vadim Fedorenko <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Suggested-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Yonghong Song <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter and bpf.
Current release - regressions:
- sched: fix SKB_NOT_DROPPED_YET splat under debug config
Current release - new code bugs:
- tcp:
- fix usec timestamps with TCP fastopen
- fix possible out-of-bounds reads in tcp_hash_fail()
- fix SYN option room calculation for TCP-AO
- tcp_sigpool: fix some off by one bugs
- bpf: fix compilation error without CGROUPS
- ptp:
- ptp_read() should not release queue
- fix tsevqs corruption
Previous releases - regressions:
- llc: verify mac len before reading mac header
Previous releases - always broken:
- bpf:
- fix check_stack_write_fixed_off() to correctly spill imm
- fix precision tracking for BPF_ALU | BPF_TO_BE | BPF_END
- check map->usercnt after timer->timer is assigned
- dsa: lan9303: consequently nested-lock physical MDIO
- dccp/tcp: call security_inet_conn_request() after setting IP addr
- tg3: fix the TX ring stall due to incorrect full ring handling
- phylink: initialize carrier state at creation
- ice: fix direction of VF rules in switchdev mode
Misc:
- fill in a bunch of missing MODULE_DESCRIPTION()s, more to come"
* tag 'net-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits)
net: ti: icss-iep: fix setting counter value
ptp: fix corrupted list in ptp_open
ptp: ptp_read should not release queue
net_sched: sch_fq: better validate TCA_FQ_WEIGHTS and TCA_FQ_PRIOMAP
net: kcm: fill in MODULE_DESCRIPTION()
net/sched: act_ct: Always fill offloading tuple iifidx
netfilter: nat: fix ipv6 nat redirect with mapped and scoped addresses
netfilter: xt_recent: fix (increase) ipv6 literal buffer length
ipvs: add missing module descriptions
netfilter: nf_tables: remove catchall element in GC sync path
netfilter: add missing module descriptions
drivers/net/ppp: use standard array-copy-function
net: enetc: shorten enetc_setup_xdp_prog() error message to fit NETLINK_MAX_FMTMSG_LEN
virtio/vsock: Fix uninit-value in virtio_transport_recv_pkt()
r8169: respect userspace disabling IFF_MULTICAST
selftests/bpf: get trusted cgrp from bpf_iter__cgroup directly
bpf: Let verifier consider {task,cgroup} is trusted in bpf_iter_reg
net: phylink: initialize carrier state at creation
test/vsock: add dobule bind connect test
test/vsock: refactor vsock_accept
...
|
|
BTRFS_EXTENT_OWNER_REF_KEY is the type of simple quotas extent owner
refs. This special inline ref goes in front of all other inline refs.
In general, inline refs have a required sorted order s.t. type never
decreases (among other requirements). This was recently reified into a
tree-checker and fsck rule, which broke simple quotas. To be fair,
though, in a sense, the new owner ref item had also violated that not
yet fully enforced requirement.
This fix brings the owner ref item into compliance with the requirement
that inline ref type never decrease.
btrfs/301 exercises this behavior and should pass again with this fix.
Fixes: d9a620f77e33 ("btrfs: new inline ref storing owning subvol of data extents")
Signed-off-by: Boris Burkov <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Usages of DRM_IVPU_BO_UNCACHED should be replaced by DRM_IVPU_BO_WC.
There is no functional benefit from DRM_IVPU_BO_UNCACHED if these
buffers are never mapped to host VM.
This allows to cut the buffer handling code in the kernel driver
by half.
Usage of DRM_IVPU_BO_UNCACHED buffers was removed from user-space
driver and will not be part of first UMD release.
Signed-off-by: Jacek Lawrynowicz <[email protected]>
Reviewed-by: Jeffrey Hugo <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Pull more drm updates from Dave Airlie:
"Geert pointed out I missed the renesas reworks in my main pull, so
this pull contains the renesas next work for atomic conversion and DT
support.
It also contains a bunch of amdgpu and some small ssd13xx fixes.
renesas:
- atomic conversion
- DT support
ssd13xx:
- dt binding fix for ssd132x
- Initialize ssd130x crtc_state to NULL.
amdgpu:
- Fix RAS support check
- RAS fixes
- MES fixes
- SMU13 fixes
- Contiguous memory allocation fix
- BACO fixes
- GPU reset fixes
- Min power limit fixes
- GFX11 fixes
- USB4/TB hotplug fixes
- ARM regression fix
- GFX9.4.3 fixes
- KASAN/KCSAN stack size check fixes
- SR-IOV fixes
- SMU14 fixes
- PSP13 fixes
- Display blend fixes
- Flexible array size fixes
amdkfd:
- GPUVM fix
radeon:
- Flexible array size fixes"
* tag 'drm-next-2023-11-07' of git://anongit.freedesktop.org/drm/drm: (83 commits)
drm/amd/display: Enable fast update on blendTF change
drm/amd/display: Fix blend LUT programming
drm/amd/display: Program plane color setting correctly
drm/amdgpu: Query and report boot status
drm/amdgpu: Add psp v13 function to query boot status
drm/amd/swsmu: remove fw version check in sw_init.
drm/amd/swsmu: update smu v14_0_0 driver if and metrics table
drm/amdgpu: Add C2PMSG_109/126 reg field shift/masks
drm/amdgpu: Optimize the asic type fix code
drm/amdgpu: fix GRBM read timeout when do mes_self_test
drm/amdgpu: check recovery status of xgmi hive in ras_reset_error_count
drm/amd/pm: only check sriov vf flag once when creating hwmon sysfs
drm/amdgpu: Attach eviction fence on alloc
drm/amdkfd: Improve amdgpu_vm_handle_moved
drm/amd/display: Increase frame warning limit with KASAN or KCSAN in dml2
drm/amd/display: Avoid NULL dereference of timing generator
drm/amdkfd: Update cache info for GFX 9.4.3
drm/amdkfd: Populate cache info for GFX 9.4.3
drm/amdgpu: don't put MQDs in VRAM on ARM | ARM64
drm/amdgpu/smu13: drop compute workload workaround
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- the old V4L2 core videobuf kAPI was finally removed. All media
drivers should now be using VB2 kAPI
- new automotive driver: mgb4
- new platform video driver: npcm-video
- new sensor driver: mt9m114
- new TI driver used in conjunction with Cadence CSI2RX IP to bridge
TI-specific parts
- ir-rx51 was removed and the N900 DT binding was moved to the
pwm-ir-tx generic driver
- drop atomisp-specific ov5693, using the upstream driver instead
- the camss driver has gained RDI3 support for VFE 17x
- the atomisp driver now detects ISP2400 or ISP2401 at run time. No
need to set it up at build time anymore
- lots of driver fixes, cleanups and improvements
* tag 'media/v6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (377 commits)
media: nuvoton: VIDEO_NPCM_VCD_ECE should depend on ARCH_NPCM
media: venus: Fix firmware path for resources
media: venus: hfi_cmds: Replace one-element array with flex-array member and use __counted_by
media: venus: hfi_parser: Add check to keep the number of codecs within range
media: venus: hfi: add checks to handle capabilities from firmware
media: venus: hfi: fix the check to handle session buffer requirement
media: venus: hfi: add checks to perform sanity on queue pointers
media: platform: cadence: select MIPI_DPHY dependency
media: MAINTAINERS: Fix path for J721E CSI2RX bindings
media: cec: meson: always include meson sub-directory in Makefile
media: videobuf2: Fix IS_ERR checking in vb2_dc_put_userptr()
media: platform: mtk-mdp3: fix uninitialized variable in mdp_path_config()
media: mediatek: vcodec: using encoder device to alloc/free encoder memory
media: imx-jpeg: notify source chagne event when the first picture parsed
media: cx231xx: Use EP5_BUF_SIZE macro
media: siano: Drop unnecessary error check for debugfs_create_dir/file()
media: mediatek: vcodec: Handle invalid encoder vsi
media: aspeed: Drop unnecessary error check for debugfs_create_file()
Documentation: media: buffer.rst: fix V4L2_BUF_FLAG_PREPARED
Documentation: media: gen-errors.rst: fix confusing ENOTTY description
...
|
|
Commit 8cea95b0bd79 ("tools: ynl-gen: handle do ops with no input attrs")
added support for some of the previously-skipped ops in nfsd.
Regenerate the user space parsers to fill them in.
Signed-off-by: Jakub Kicinski <[email protected]>
Acked-by: Chuck Lever <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Pull virtio updates from Michael Tsirkin:
"vhost,virtio,vdpa: features, fixes, cleanups.
vdpa/mlx5:
- VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK
- new maintainer
vdpa:
- support for vq descriptor mappings
- decouple reset of iotlb mapping from device reset
and fixes, cleanups all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (34 commits)
vdpa_sim: implement .reset_map support
vdpa/mlx5: implement .reset_map driver op
vhost-vdpa: clean iotlb map during reset for older userspace
vdpa: introduce .compat_reset operation callback
vhost-vdpa: introduce IOTLB_PERSIST backend feature bit
vhost-vdpa: reset vendor specific mapping to initial state in .release
vdpa: introduce .reset_map operation callback
virtio_pci: add check for common cfg size
virtio-blk: fix implicit overflow on virtio_max_dma_size
virtio_pci: add build offset check for the new common cfg items
virtio: add definition of VIRTIO_F_NOTIF_CONFIG_DATA feature bit
vduse: make vduse_class constant
vhost-scsi: Spelling s/preceeding/preceding/g
virtio: kdoc for struct virtio_pci_modern_device
vdpa: Update sysfs ABI documentation
MAINTAINERS: Add myself as mlx5_vdpa driver
virtio-balloon: correct the comment of virtballoon_migratepage()
mlx5_vdpa: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK
vdpa/mlx5: Update cvq iotlb mapping on ASID change
vdpa/mlx5: Make iotlb helper functions more generic
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs
Pull UBI and UBIFS updates from Richard Weinberger:
- UBI Fastmap improvements
- Minor issues found by static analysis bots in both UBI and UBIFS
- Fix for wrong dentry length UBIFS in fscrypt mode
* tag 'ubifs-for-linus-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
ubifs: ubifs_link: Fix wrong name len calculating when UBIFS is encrypted
ubi: block: Fix use-after-free in ubiblock_cleanup
ubifs: fix possible dereference after free
ubi: fastmap: Add control in 'UBI_IOCATT' ioctl to reserve PEBs for filling pools
ubi: fastmap: Add module parameter to control reserving filling pool PEBs
ubi: fastmap: Fix lapsed wear leveling for first 64 PEBs
ubi: fastmap: Get wl PEB even ec beyonds the 'max' if free PEBs are run out
ubi: fastmap: may_reserve_for_fm: Don't reserve PEB if fm_anchor exists
ubi: fastmap: Remove unneeded break condition while filling pools
ubi: fastmap: Wait until there are enough free PEBs before filling pools
ubi: fastmap: Use free pebs reserved for bad block handling
ubi: Replace erase_block() with sync_erase()
ubi: fastmap: Allocate memory with GFP_NOFS in ubi_update_fastmap
ubi: fastmap: erase_block: Get erase counter from wl_entry rather than flash
ubi: fastmap: Fix missed ec updating after erasing old fastmap data block
ubifs: Fix missing error code err
ubifs: Fix memory leak of bud->log_hash
ubifs: Fix some kernel-doc comments
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux
Pull unified attestation reporting from Dan Williams:
"In an ideal world there would be a cross-vendor standard attestation
report format for confidential guests along with a common device
definition to act as the transport.
In the real world the situation ended up with multiple platform
vendors inventing their own attestation report formats with the
SEV-SNP implementation being a first mover to define a custom
sev-guest character device and corresponding ioctl(). Later, this
configfs-tsm proposal intercepted an attempt to add a tdx-guest
character device and a corresponding new ioctl(). It also anticipated
ARM and RISC-V showing up with more chardevs and more ioctls().
The proposal takes for granted that Linux tolerates the vendor report
format differentiation until a standard arrives. From talking with
folks involved, it sounds like that standardization work is unlikely
to resolve anytime soon. It also takes the position that kernfs ABIs
are easier to maintain than ioctl(). The result is a shared configfs
mechanism to return per-vendor report-blobs with the option to later
support a standard when that arrives.
Part of the goal here also is to get the community into the
"uncomfortable, but beneficial to the long term maintainability of the
kernel" state of talking to each other about their differentiation and
opportunities to collaborate. Think of this like the device-driver
equivalent of the common memory-management infrastructure for
confidential-computing being built up in KVM.
As for establishing an "upstream path for cross-vendor
confidential-computing device driver infrastructure" this is something
I want to discuss at Plumbers. At present, the multiple vendor
proposals for assigning devices to confidential computing VMs likely
needs a new dedicated repository and maintainer team, but that is a
discussion for v6.8.
For now, Greg and Thomas have acked this approach and this is passing
is AMD, Intel, and Google tests.
Summary:
- Introduce configfs-tsm as a shared ABI for confidential computing
attestation reports
- Convert sev-guest to additionally support configfs-tsm alongside
its vendor specific ioctl()
- Added signed attestation report retrieval to the tdx-guest driver
forgoing a new vendor specific ioctl()
- Misc cleanups and a new __free() annotation for kvfree()"
* tag 'tsm-for-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux:
virt: tdx-guest: Add Quote generation support using TSM_REPORTS
virt: sevguest: Add TSM_REPORTS support for SNP_GET_EXT_REPORT
mm/slab: Add __free() support for kvfree
virt: sevguest: Prep for kernel internal get_ext_report()
configfs-tsm: Introduce a shared ABI for attestation reports
virt: coco: Add a coco/Makefile and coco/Kconfig
virt: sevguest: Fix passing a stack buffer as a scatterlist target
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Pull dmaengine updates from Vinod Koul:
- Big pile of __counted_by attribute annotations to several structures
for bounds checking of flexible arrays at run-time
- Another big pile platform remove callback returning void changes
- Device tree device_get_match_data() usage and dropping
of_match_device() calls
- Minor driver updates to pxa, idxd fsl, hisi etc drivers
* tag 'dmaengine-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (106 commits)
dmaengine: stm32-mdma: correct desc prep when channel running
dmaengine: dw-axi-dmac: Add support DMAX_NUM_CHANNELS > 16
dmaengine: xilinx: xilinx_dma: Fix kernel doc about xilinx_dma_remove()
dmaengine: mmp_tdma: drop unused variable 'of_id'
MAINTAINERS: Add entries for NXP(Freescale) eDMA drivers
dmaengine: xilinx: xdma: Support cyclic transfers
dmaengine: xilinx: xdma: Prepare the introduction of cyclic transfers
dmaengine: Drop unnecessary of_match_device() calls
dmaengine: Use device_get_match_data()
dmaengine: pxa_dma: Annotate struct pxad_desc_sw with __counted_by
dmaengine: pxa_dma: Remove an erroneous BUG_ON() in pxad_free_desc()
dmaengine: xilinx: xdma: Use resource_size() in xdma_probe()
dmaengine: fsl-dpaa2-qdma: Remove redundant initialization owner in dpaa2_qdma_driver
dmaengine: Remove unused declaration dma_chan_cleanup()
dmaengine: mmp: fix Wvoid-pointer-to-enum-cast warning
dmaengine: qcom: fix Wvoid-pointer-to-enum-cast warning
dmaengine: fsl-edma: Remove redundant dev_err() for platform_get_irq()
dmaengine: ep93xx_dma: Annotate struct ep93xx_dma_engine with __counted_by
dmaengine: idxd: add wq driver name support for accel-config user tool
dmaengine: fsl-edma: Annotate struct struct fsl_edma_engine with __counted_by
...
|