| Age | Commit message (Collapse) | Author | Files | Lines |
|
The 8250 BCM7271 UART is not a direct match to PORT_16550A and other
generic ports do not match its hardware capabilities. PORT_ALTR matches
the rx trigger levels, but its vendor configurations are not compatible.
Unfortunately this means we need to create another port to fully capture
the hardware capabilities of the BCM7271 UART.
To alleviate some latency pressures, we default the rx trigger level to 8.
Signed-off-by: Justin Chen <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Acked-by: Doug Berger <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Currently, changing the parameters of the n_gsm mux gives no direct control
to the user whether this should trigger a mux reset or not. The decision is
solely made by the driver based on the assumption which parameter changes
are compatible or not. Therefore, the user has no means to perform an
automatic mux reset after parameter configuration for non-conflicting
changes.
Add the parameter 'flags' to 'gsm_config_ext' to force a mux reset after
ioctl setting regardless of whether the changes made require this or not
by setting this to 'GSM_FL_RESTART'. This is done similar to
'GSM_FL_RESTART' in gsm_dlci_config.flags.
Note that 'GSM_FL_RESTART' is currently the only allowed flag to allow
additions here.
Signed-off-by: Daniel Starke <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Currently, all available structure fields in gsmmux.h except those
for gsm_config are commented. Furthermore, no kernel doc comments are used.
Fix this by adding appropriate comments to the not commented fields of
gsm_config. Convert the comments of the other structs to kernel doc format.
Note that 'mru' and 'mtu' refer to the size without basic/advanced option
mode header and byte stuffing as defined in the standard in chapter 5.7.2.
Link: https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=1516
Signed-off-by: Daniel Starke <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Currently, changing the parameters of a DLCI gives no direct control to the
user whether this should trigger a channel reset or not. The decision is
solely made by the driver based on the assumption which parameter changes
are compatible or not. Therefore, the user has no means to perform an
automatic channel reset after parameter configuration for non-conflicting
changes.
Add the parameter 'flags' to 'gsm_dlci_config' to force a channel reset
after ioctl setting regardless of whether the changes made require this or
not by setting this to 'GSM_FL_RESTART'.
Note that 'GSM_FL_RESTART' is currently the only allow flag to allow
additions here.
Signed-off-by: Daniel Starke <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
As suggested by Kees[1], replace the old-style 0-element array members
of multiple structs in ebtables.h with modern C99 flexible array.
[1]: https://lore.kernel.org/all/[email protected]/
[ [email protected]:
keep struct ebt_entry_target as-is, causes compiler warning:
"variable sized type 'struct ebt_entry_target' not at the end of a
struct or class is a GNU extension" ]
Link: https://github.com/KSPP/linux/issues/21
Signed-off-by: GONG, Ruiqi <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
|
|
When compiling with gcc 13 and CONFIG_FORTIFY_SOURCE=y, the following
warning appears:
In function ‘fortify_memcpy_chk’,
inlined from ‘size_entry_mwt’ at net/bridge/netfilter/ebtables.c:2118:2:
./include/linux/fortify-string.h:592:25: error: call to ‘__read_overflow2_field’
declared with attribute warning: detected read beyond size of field (2nd parameter);
maybe use struct_group()? [-Werror=attribute-warning]
592 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The compiler is complaining:
memcpy(&offsets[1], &entry->watchers_offset,
sizeof(offsets) - sizeof(offsets[0]));
where memcpy reads beyong &entry->watchers_offset to copy
{watchers,target,next}_offset altogether into offsets[]. Silence the
warning by wrapping these three up via struct_group().
Signed-off-by: GONG, Ruiqi <[email protected]>
Reviewed-by: Gustavo A. R. Silva <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
|
|
For the last couple of years Linux kernel got rid of a few architectures
and many platforms. Hence some PORT_* definitions in the serial_core.h
become unused and redundant. Remove them for good.
Removed IDs are checked for users against Debian Code Search engine.
Hence safe to remove as there are no consumers found (only providers).
While at it, add a note about 0-13, that are defined in the other file.
Signed-off-by: Andy Shevchenko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Adding support to specify pid for uprobe_multi link and the uprobes
are created only for task with given pid value.
Using the consumer.filter filter callback for that, so the task gets
filtered during the uprobe installation.
We still need to check the task during runtime in the uprobe handler,
because the handler could get executed if there's another system
wide consumer on the same uprobe (thanks Oleg for the insight).
Cc: Oleg Nesterov <[email protected]>
Reviewed-by: Oleg Nesterov <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Adding support to specify cookies array for uprobe_multi link.
The cookies array share indexes and length with other uprobe_multi
arrays (offsets/ref_ctr_offsets).
The cookies[i] value defines cookie for i-the uprobe and will be
returned by bpf_get_attach_cookie helper when called from ebpf
program hooked to that specific uprobe.
Acked-by: Andrii Nakryiko <[email protected]>
Acked-by: Yafang Shao <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Adding new multi uprobe link that allows to attach bpf program
to multiple uprobes.
Uprobes to attach are specified via new link_create uprobe_multi
union:
struct {
__aligned_u64 path;
__aligned_u64 offsets;
__aligned_u64 ref_ctr_offsets;
__u32 cnt;
__u32 flags;
} uprobe_multi;
Uprobes are defined for single binary specified in path and multiple
calling sites specified in offsets array with optional reference
counters specified in ref_ctr_offsets array. All specified arrays
have length of 'cnt'.
The 'flags' supports single bit for now that marks the uprobe as
return probe.
Acked-by: Andrii Nakryiko <[email protected]>
Acked-by: Yafang Shao <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Switching BPF_F_KPROBE_MULTI_RETURN macro to anonymous enum,
so it'd show up in vmlinux.h. There's not functional change
compared to having this as macro.
Acked-by: Yafang Shao <[email protected]>
Suggested-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
The v0 extent item has been deprecated for a long time, and we don't have
any report from the community either.
So it's time to remove the v0 extent specific error handling, and just
treat them as regular extent tree corruption.
This patch would remove the btrfs_print_v0_err() helper, and enhance the
involved error handling to treat them just as any extent tree
corruption. No reports regarding v0 extents have been seen since the
graceful handling was added in 2018.
This involves:
- btrfs_backref_add_tree_node()
This change is a little tricky, the new code is changed to only handle
BTRFS_TREE_BLOCK_REF_KEY and BTRFS_SHARED_BLOCK_REF_KEY.
But this is safe, as we have rejected any unknown inline refs through
btrfs_get_extent_inline_ref_type().
For keyed backrefs, we're safe to skip anything we don't know (that's
if it can pass tree-checker in the first place).
- btrfs_lookup_extent_info()
- lookup_inline_extent_backref()
- run_delayed_extent_op()
- __btrfs_free_extent()
- add_tree_block()
Regular error handling of unexpected extent tree item, and abort
transaction (if we have a trans handle).
- remove_extent_data_ref()
It's pretty much the same as the regular rejection of unknown backref
key.
But for this particular case, we can also remove a BUG_ON().
- extent_data_ref_count()
We can remove the BTRFS_EXTENT_REF_V0_KEY BUG_ON(), as it would be
rejected by the only caller.
- btrfs_print_leaf()
Remove the handling for BTRFS_EXTENT_REF_V0_KEY.
Signed-off-by: Qu Wenruo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
There isn't any reason to not support REQ_OP_ZONE_RESET_ALL given everything
is actually handled in userspace, not mention it is pretty easy to support
RESET_ALL.
So enable REQ_OP_ZONE_RESET_ALL and let userspace handle it.
Verified by 'tools/zbc_reset_zone -all /dev/ublkb0' in libzbc[1] with
libublk-rs based ublk-zoned target prototype[2], follows command line
for creating ublk-zoned:
cargo run --example zoned -- add -1 1024 # add $dev_id $DEV_SIZE
[1] https://github.com/westerndigitalcorporation/libzbc
[2] https://github.com/ming1/libublk-rs/tree/zoned.v2
Cc: Niklas Cassel <[email protected]>
Cc: Damien Le Moal <[email protected]>
Cc: Andreas Hindborg <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Add SMC_NLA_LGR_R_V2_MAX_CONNS and SMC_NLA_LGR_R_V2_MAX_LINKS
to SMCR v2 linkgroup netlink attribute SMC_NLA_LGR_R_V2 for
linkgroup's detail info showing.
Signed-off-by: Guangguan Wang <[email protected]>
Reviewed-by: Jan Karcher <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Update the userfaultfd API to advertise this feature as part of feature
flags and supported ioctls (returned upon registration).
Add basic documentation describing the new feature.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Axel Rasmussen <[email protected]>
Acked-by: Peter Xu <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Brian Geffon <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Gaosheng Cui <[email protected]>
Cc: Huang, Ying <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: James Houghton <[email protected]>
Cc: Jan Alexander Steffens (heftig) <[email protected]>
Cc: Jiaqi Yan <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Liam R. Howlett <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Suleiman Souhlal <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: T.J. Alumbaugh <[email protected]>
Cc: Yu Zhao <[email protected]>
Cc: ZhangPeng <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The basic idea here is to "simulate" memory poisoning for VMs. A VM
running on some host might encounter a memory error, after which some
page(s) are poisoned (i.e., future accesses SIGBUS). They expect that
once poisoned, pages can never become "un-poisoned". So, when we live
migrate the VM, we need to preserve the poisoned status of these pages.
When live migrating, we try to get the guest running on its new host as
quickly as possible. So, we start it running before all memory has been
copied, and before we're certain which pages should be poisoned or not.
So the basic way to use this new feature is:
- On the new host, the guest's memory is registered with userfaultfd, in
either MISSING or MINOR mode (doesn't really matter for this purpose).
- On any first access, we get a userfaultfd event. At this point we can
communicate with the old host to find out if the page was poisoned.
- If so, we can respond with a UFFDIO_POISON - this places a swap marker
so any future accesses will SIGBUS. Because the pte is now "present",
future accesses won't generate more userfaultfd events, they'll just
SIGBUS directly.
UFFDIO_POISON does not handle unmapping previously-present PTEs. This
isn't needed, because during live migration we want to intercept all
accesses with userfaultfd (not just writes, so WP mode isn't useful for
this). So whether minor or missing mode is being used (or both), the PTE
won't be present in any case, so handling that case isn't needed.
Similarly, UFFDIO_POISON won't replace existing PTE markers. This might
be okay to do, but it seems to be safer to just refuse to overwrite any
existing entry (like a UFFD_WP PTE marker).
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Axel Rasmussen <[email protected]>
Acked-by: Peter Xu <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Brian Geffon <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Gaosheng Cui <[email protected]>
Cc: Huang, Ying <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: James Houghton <[email protected]>
Cc: Jan Alexander Steffens (heftig) <[email protected]>
Cc: Jiaqi Yan <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Liam R. Howlett <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Suleiman Souhlal <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: T.J. Alumbaugh <[email protected]>
Cc: Yu Zhao <[email protected]>
Cc: ZhangPeng <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Add intel_iommu_hw_info() to report cap_reg and ecap_reg information.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lu Baolu <[email protected]>
Acked-by: Lu Baolu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Under nested IOMMU translation, userspace owns the stage-1 translation
table (e.g. the stage-1 page table of Intel VT-d or the context table of
ARM SMMUv3, and etc.). Stage-1 translation tables are vendor specific, and
need to be compatible with the underlying IOMMU hardware. Hence, userspace
should know the IOMMU hardware capability before creating and configuring
the stage-1 translation table to kernel.
This adds IOMMU_GET_HW_INFO ioctl to query the IOMMU hardware information
(a.k.a capability) for a given device. The returned data is vendor
specific, userspace needs to decode it with the structure by the output
@out_data_type field.
As only physical devices have IOMMU hardware, so this will return error if
the given device is not a physical device.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Lu Baolu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Co-developed-by: Nicolin Chen <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Introduce a new iommu op to get the IOMMU hardware capabilities for
iommufd. This information will be used by any vIOMMU driver which is owned
by userspace.
This op chooses to make the special parameters opaque to the core. This
suits the current usage model where accessing any of the IOMMU device
special parameters does require a userspace driver that matches the kernel
driver. If a need for common parameters, implemented similarly by several
drivers, arises then there's room in the design to grow a generic
parameter set as well. No wrapper API is added as it is supposed to be
used by iommufd only.
Different IOMMU hardware would have different hardware information. So the
information reported differs as well. To let the external user understand
the difference, enum iommu_hw_info_type is defined. For the iommu drivers
that are capable to report hardware information, it should have a unique
iommu_hw_info_type and return to caller. For the driver doesn't report
hardware information, caller just uses IOMMU_HW_INFO_TYPE_NONE if a type
is required.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Co-developed-by: Nicolin Chen <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Add prng attribute to struct netem_sched_data and
allows setting the seed of the PRNG through netlink
using the new TCA_NETEM_PRNG_SEED attribute.
The PRNG attribute is not actually used yet.
Signed-off-by: François Michel <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Acked-by: Stephen Hemminger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
GCC and Clang's current RFCs name this attribute "counted_by", and have
moved away from using a string for the member name. Update the kernel's
macros to match. Additionally provide a UAPI no-op macro for UAPI structs
that will gain annotations.
Cc: Miguel Ojeda <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Fixes: dd06e72e68bc ("Compiler Attributes: Add __counted_by macro")
Acked-by: Miguel Ojeda <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>
|
|
The VFIO_DEVICE_GET_INFO, VFIO_DEVICE_GET_REGION_INFO, and
VFIO_IOMMU_GET_INFO ioctls fill in an info struct followed by capability
structs:
+------+---------+---------+-----+
| info | caps[0] | caps[1] | ... |
+------+---------+---------+-----+
Both the info and capability struct sizes are not always multiples of
sizeof(u64), leaving u64 fields in later capability structs misaligned.
Userspace applications currently need to handle misalignment manually in
order to support CPU architectures and programming languages with strict
alignment requirements.
Make life easier for userspace by ensuring alignment in the kernel. This
is done by padding info struct definitions and by copying out zeroes
after capability structs that are not aligned.
The new layout is as follows:
+------+---------+---+---------+-----+
| info | caps[0] | 0 | caps[1] | ... |
+------+---------+---+---------+-----+
In this example caps[0] has a size that is not multiples of sizeof(u64),
so zero padding is added to align the subsequent structure.
Adding zero padding between structs does not break the uapi. The memory
layout is specified by the info.cap_offset and caps[i].next fields
filled in by the kernel. Applications use these field values to locate
structs and are therefore unaffected by the addition of zero padding.
Note that code that copies out info structs with padding is updated to
always zero the struct and copy out as many bytes as userspace
requested. This makes the code shorter and avoids potential information
leaks by ensuring padding is initialized.
Originally-by: Alex Williamson <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Jason Gunthorpe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alex Williamson <[email protected]>
|
|
Use the same structure as statx.
Signed-off-by: Miklos Szeredi <[email protected]>
|
|
FOPEN_DIRECT_IO is usually set by fuse daemon to indicate need of strong
coherency, e.g. network filesystems. Thus shared mmap is disabled since it
leverages page cache and may write to it, which may cause inconsistence.
But FOPEN_DIRECT_IO can be used not for coherency but to reduce memory
footprint as well, e.g. reduce guest memory usage with virtiofs.
Therefore, add a new fuse init flag FUSE_DIRECT_IO_RELAX to relax
restrictions in that mode, currently, it allows shared mmap. One thing to
note is to make sure it doesn't break coherency in your use case.
Signed-off-by: Hao Xu <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
|
|
The use of the "class" argument name in the ioprio_value() inline
function in include/uapi/linux/ioprio.h confuses C++ compilers
resulting in compilation errors such as:
/usr/include/linux/ioprio.h:110:43: error: expected primary-expression before ‘int’
110 | static __always_inline __u16 ioprio_value(int class, int level, int hint)
| ^~~
for user C++ programs including linux/ioprio.h.
Avoid these errors by renaming the arguments of the ioprio_value()
function to prioclass, priolevel and priohint. For consistency, the
arguments of the IOPRIO_PRIO_VALUE() and IOPRIO_PRIO_VALUE_HINT() macros
are also renamed in the same manner.
Reported-by: Igor Pylypiv <[email protected]>
Fixes: 01584c1e2337 ("scsi: block: Improve ioprio value validity checks")
Signed-off-by: Damien Le Moal <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Tested-by: Igor Pylypiv <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Bart Van Assche <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Required for following patches.
Resolve merge conflict by using the hunk from the for-next branch and
shifting the iommufd_object_deref_user() into iommufd_hw_pagetable_put()
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
We need the USB and Thunderbolt fixes in here to build on.
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Summary
=======
This introduces FSCONFIG_CMD_CREATE_EXCL which will allows userspace to
implement something like mount -t ext4 --exclusive /dev/sda /B which
fails if a superblock for the requested filesystem does already exist:
Before this patch
-----------------
$ sudo ./move-mount -f xfs -o source=/dev/sda4 /A
Requesting filesystem type xfs
Mount options requested: source=/dev/sda4
Attaching mount at /A
Moving single attached mount
Setting key(source) with val(/dev/sda4)
$ sudo ./move-mount -f xfs -o source=/dev/sda4 /B
Requesting filesystem type xfs
Mount options requested: source=/dev/sda4
Attaching mount at /B
Moving single attached mount
Setting key(source) with val(/dev/sda4)
After this patch with --exclusive as a switch for FSCONFIG_CMD_CREATE_EXCL
--------------------------------------------------------------------------
$ sudo ./move-mount -f xfs --exclusive -o source=/dev/sda4 /A
Requesting filesystem type xfs
Request exclusive superblock creation
Mount options requested: source=/dev/sda4
Attaching mount at /A
Moving single attached mount
Setting key(source) with val(/dev/sda4)
$ sudo ./move-mount -f xfs --exclusive -o source=/dev/sda4 /B
Requesting filesystem type xfs
Request exclusive superblock creation
Mount options requested: source=/dev/sda4
Attaching mount at /B
Moving single attached mount
Setting key(source) with val(/dev/sda4)
Device or resource busy | move-mount.c: 300: do_fsconfig: i xfs: reusing existing filesystem not allowed
Details
=======
As mentioned on the list (cf. [1]-[3]) mount requests like
mount -t ext4 /dev/sda /A are ambigous for userspace. Either a new
superblock has been created and mounted or an existing superblock has
been reused and a bind-mount has been created.
This becomes clear in the following example where two processes create
the same mount for the same block device:
P1 P2
fd_fs = fsopen("ext4"); fd_fs = fsopen("ext4");
fsconfig(fd_fs, FSCONFIG_SET_STRING, "source", "/dev/sda"); fsconfig(fd_fs, FSCONFIG_SET_STRING, "source", "/dev/sda");
fsconfig(fd_fs, FSCONFIG_SET_STRING, "dax", "always"); fsconfig(fd_fs, FSCONFIG_SET_STRING, "resuid", "1000");
// wins and creates superblock
fsconfig(fd_fs, FSCONFIG_CMD_CREATE, ...)
// finds compatible superblock of P1
// spins until P1 sets SB_BORN and grabs a reference
fsconfig(fd_fs, FSCONFIG_CMD_CREATE, ...)
fd_mnt1 = fsmount(fd_fs); fd_mnt2 = fsmount(fd_fs);
move_mount(fd_mnt1, "/A") move_mount(fd_mnt2, "/B")
Not just does P2 get a bind-mount but the mount options that P2
requestes are silently ignored. The VFS itself doesn't, can't and
shouldn't enforce filesystem specific mount option compatibility. It
only enforces incompatibility for read-only <-> read-write transitions:
mount -t ext4 /dev/sda /A
mount -t ext4 -o ro /dev/sda /B
The read-only request will fail with EBUSY as the VFS can't just
silently transition a superblock from read-write to read-only or vica
versa without risking security issues.
To userspace this silent superblock reuse can become a security issue in
because there is currently no straightforward way for userspace to know
that they did indeed manage to create a new superblock and didn't just
reuse an existing one.
This adds a new FSCONFIG_CMD_CREATE_EXCL command to fsconfig() that
returns EBUSY if an existing superblock would be reused. Userspace that
needs to be sure that it did create a new superblock with the requested
mount options can request superblock creation using this command. If the
command succeeds they can be sure that they did create a new superblock
with the requested mount options.
This requires the new mount api. With the old mount api it would be
necessary to plumb this through every legacy filesystem's
file_system_type->mount() method. If they want this feature they are
most welcome to switch to the new mount api.
Following is an analysis of the effect of FSCONFIG_CMD_CREATE_EXCL on
each high-level superblock creation helper:
(1) get_tree_nodev()
Always allocate new superblock. Hence, FSCONFIG_CMD_CREATE and
FSCONFIG_CMD_CREATE_EXCL are equivalent.
The binderfs or overlayfs filesystems are examples.
(4) get_tree_keyed()
Finds an existing superblock based on sb->s_fs_info. Hence,
FSCONFIG_CMD_CREATE would reuse an existing superblock whereas
FSCONFIG_CMD_CREATE_EXCL would reject it with EBUSY.
The mqueue or nfsd filesystems are examples.
(2) get_tree_bdev()
This effectively works like get_tree_keyed().
The ext4 or xfs filesystems are examples.
(3) get_tree_single()
Only one superblock of this filesystem type can ever exist.
Hence, FSCONFIG_CMD_CREATE would reuse an existing superblock
whereas FSCONFIG_CMD_CREATE_EXCL would reject it with EBUSY.
The securityfs or configfs filesystems are examples.
Note that some single-instance filesystems never destroy the
superblock once it has been created during the first mount. For
example, if securityfs has been mounted at least onces then the
created superblock will never be destroyed again as long as there is
still an LSM making use it. Consequently, even if securityfs is
unmounted and the superblock seemingly destroyed it really isn't
which means that FSCONFIG_CMD_CREATE_EXCL will continue rejecting
reusing an existing superblock.
This is acceptable thugh since special purpose filesystems such as
this shouldn't have a need to use FSCONFIG_CMD_CREATE_EXCL anyway
and if they do it's probably to make sure that mount options aren't
ignored.
Following is an analysis of the effect of FSCONFIG_CMD_CREATE_EXCL on
filesystems that make use of the low-level sget_fc() helper directly.
They're all effectively variants on get_tree_keyed(), get_tree_bdev(),
or get_tree_nodev():
(5) mtd_get_sb()
Similar logic to get_tree_keyed().
(6) afs_get_tree()
Similar logic to get_tree_keyed().
(7) ceph_get_tree()
Similar logic to get_tree_keyed().
Already explicitly allows forcing the allocation of a new superblock
via CEPH_OPT_NOSHARE. This turns it into get_tree_nodev().
(8) fuse_get_tree_submount()
Similar logic to get_tree_nodev().
(9) fuse_get_tree()
Forces reuse of existing FUSE superblock.
Forces reuse of existing superblock if passed in file refers to an
existing FUSE connection.
If FSCONFIG_CMD_CREATE_EXCL is specified together with an fd
referring to an existing FUSE connections this would cause the
superblock reusal to fail. If reusing is the intent then
FSCONFIG_CMD_CREATE_EXCL shouldn't be specified.
(10) fuse_get_tree()
-> get_tree_nodev()
Same logic as in get_tree_nodev().
(11) fuse_get_tree()
-> get_tree_bdev()
Same logic as in get_tree_bdev().
(12) virtio_fs_get_tree()
Same logic as get_tree_keyed().
(13) gfs2_meta_get_tree()
Forces reuse of existing gfs2 superblock.
Mounting gfs2meta enforces that a gf2s superblock must already
exist. If not, it will error out. Consequently, mounting gfs2meta
with FSCONFIG_CMD_CREATE_EXCL would always fail. If reusing is the
intent then FSCONFIG_CMD_CREATE_EXCL shouldn't be specified.
(14) kernfs_get_tree()
Similar logic to get_tree_keyed().
(15) nfs_get_tree_common()
Similar logic to get_tree_keyed().
Already explicitly allows forcing the allocation of a new superblock
via NFS_MOUNT_UNSHARED. This effectively turns it into
get_tree_nodev().
Link: [1] https://lore.kernel.org/linux-block/20230704-fasching-wertarbeit-7c6ffb01c83d@brauner
Link: [2] https://lore.kernel.org/linux-block/20230705-pumpwerk-vielversprechend-a4b1fd947b65@brauner
Link: [3] https://lore.kernel.org/linux-fsdevel/20230725-einnahmen-warnschilder-17779aec0a97@brauner
Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Aleksa Sarai <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
From: Eric Garver <[email protected]>
This adds an explicit drop action. This is used by OVS to drop packets
for which it cannot determine what to do. An explicit action in the
kernel allows passing the reason _why_ the packet is being dropped or
zero to indicate no particular error happened (i.e: OVS intentionally
dropped the packet).
Since the error codes coming from userspace mean nothing for the kernel,
we squash all of them into only two drop reasons:
- OVS_DROP_EXPLICIT_WITH_ERROR to indicate a non-zero value was passed
- OVS_DROP_EXPLICIT to indicate a zero value was passed (no error)
e.g. trace all OVS dropped skbs
# perf trace -e skb:kfree_skb --filter="reason >= 0x30000"
[..]
106.023 ping/2465 skb:kfree_skb(skbaddr: 0xffffa0e8765f2000, \
location:0xffffffffc0d9b462, protocol: 2048, reason: 196611)
reason: 196611 --> 0x30003 (OVS_DROP_EXPLICIT)
Also, this patch allows ovs-dpctl.py to add explicit drop actions as:
"drop" -> implicit empty-action drop
"drop(0)" -> explicit non-error action drop
"drop(42)" -> explicit error action drop
Signed-off-by: Eric Garver <[email protected]>
Co-developed-by: Adrian Moreno <[email protected]>
Signed-off-by: Adrian Moreno <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
We need the char/misc fixes in here as well to build on top of.
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joel/fsi into char-misc-next
Joen writes:
FSI changes for v6.6
* New drivers for the I2C Responder master and SCOM device
* Misc janitor fixes
* tag 'fsi-for-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/joel/fsi:
fsi: fix some spelling mistakes in comment
fsi: master-ast-cf: Add MODULE_FIRMWARE macro
docs: ABI: fix spelling/grammar in SBEFIFO timeout interface
fsi: Add I2C Responder SCOM driver
fsi: Add IBM I2C Responder virtual FSI master
dt-bindings: fsi: Document the IBM I2C Responder virtual FSI master
fsi: Lock mutex for master device registration
fsi: Improve master indexing
fsi: core: Switch to ida_alloc/free
fsi: core: Fix legacy minor numbering
fsi: core: Add trace events for scan and unregister
fsi: aspeed: Reset master errors after CFAM reset
fsi: sbefifo: Remove limits on user-specified read timeout
fsi: sbefifo: Add configurable in-command timeout
fsi: sbefifo: Don't check status during probe
fsi: Use of_match_table for bus matching if specified
fsi: Add aliased device numbering
fsi: Move fsi_slave structure definition to header
fsi: Use of_property_read_reg() to parse "reg"
fsi: Explicitly include correct DT includes
|
|
Define one uncompressed capture format V4L2_PIX_FMT_MT2110R in order to
support 10bit for H264 in mt8195.
Signed-off-by: Mingjia Zhang <[email protected]>
Co-developed-by: Yunfei Dong <[email protected]>
Signed-off-by: Yunfei Dong <[email protected]>
Signed-off-by: Hans Verkuil <[email protected]>
|
|
Define one uncompressed capture format V4L2_PIX_FMT_MT2110T in order to
support 10bit for AV1/VP9/HEVC in mt8195.
Signed-off-by: Mingjia Zhang <[email protected]>
Co-developed-by: Yunfei Dong <[email protected]>
Signed-off-by: Yunfei Dong <[email protected]>
Signed-off-by: Hans Verkuil <[email protected]>
|
|
Enable io_uring commands on network sockets. Create two new
SOCKET_URING_OP commands that will operate on sockets.
In order to call ioctl on sockets, use the file_operations->io_uring_cmd
callbacks, and map it to a uring socket function, which handles the
SOCKET_URING_OP accordingly, and calls socket ioctls.
This patches was tested by creating a new test case in liburing.
Link: https://github.com/leitao/liburing/tree/io_uring_cmd
Signed-off-by: Breno Leitao <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Wireless USB has long been defunct, and kernel support for it was
removed in 2020 by commit caa6772db4c1 ("Staging: remove wusbcore and
UWB from the kernel tree.").
Nevertheless, some vestiges of the old implementation still clutter up
the USB subsystem and one or two other places. Let's get rid of them
once and for all.
The only parts still left are the user-facing APIs in
include/uapi/linux/usb/ch9.h. (There are also a couple of misleading
instances, such as the Sierra Wireless USB modem, which is a USB modem
made by Sierra Wireless.)
Signed-off-by: Alan Stern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Add new shmem quota format, its quota_format_ops together with
dquot_operations
Signed-off-by: Lukas Czerner <[email protected]>
Signed-off-by: Carlos Maiolino <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
A new use case for the SBEFIFO requires a long in-command timeout
as the SBE processes each part of the command before clearing the
upstream FIFO for the next part of the command.
Signed-off-by: Eddie James <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Joel Stanley <[email protected]>
|
|
Add zoned storage support to ublk: report_zones and operations:
- REQ_OP_ZONE_OPEN
- REQ_OP_ZONE_CLOSE
- REQ_OP_ZONE_FINISH
- REQ_OP_ZONE_RESET
- REQ_OP_ZONE_APPEND
The zone append feature uses the `addr` field of `struct ublksrv_io_cmd` to
communicate ALBA back to the kernel. Therefore ublk must be used with the
user copy feature (UBLK_F_USER_COPY) for zoned storage support to be
available. Without this feature, ublk will not allow zoned storage support.
Signed-off-by: Andreas Hindborg <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Tested-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Adding support for bpf_get_func_ip helper for uprobe program to return
probed address for both uprobe and return uprobe.
We discussed this in [1] and agreed that uprobe can have special use
of bpf_get_func_ip helper that differs from kprobe.
The kprobe bpf_get_func_ip returns:
- address of the function if probe is attach on function entry
for both kprobe and return kprobe
- 0 if the probe is not attach on function entry
The uprobe bpf_get_func_ip returns:
- address of the probe for both uprobe and return uprobe
The reason for this semantic change is that kernel can't really tell
if the probe user space address is function entry.
The uprobe program is actually kprobe type program attached as uprobe.
One of the consequences of this design is that uprobes do not have its
own set of helpers, but share them with kprobes.
As we need different functionality for bpf_get_func_ip helper for uprobe,
I'm adding the bool value to the bpf_trace_run_ctx, so the helper can
detect that it's executed in uprobe context and call specific code.
The is_uprobe bool is set as true in bpf_prog_run_array_sleepable, which
is currently used only for executing bpf programs in uprobe.
Renaming bpf_prog_run_array_sleepable to bpf_prog_run_array_uprobe
to address that it's only used for uprobes and that it sets the
run_ctx.is_uprobe as suggested by Yafang Shao.
Suggested-by: Andrii Nakryiko <[email protected]>
Tested-by: Alan Maguire <[email protected]>
[1] https://lore.kernel.org/bpf/CAEf4BzZ=xLVkG5eurEuvLU79wAMtwho7ReR+XJAgwhFF4M-7Cg@mail.gmail.com/
Signed-off-by: Jiri Olsa <[email protected]>
Tested-by: Viktor Malik <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
Fixes the warning:
include/uapi/linux/sync_file.h:77: warning: Function parameter or member 'num_fences' not described in 'sync_file_info'
Fixes: 2d75c88fefb2 ("staging/android: refactor SYNC IOCTLs")
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Randy Dunlap <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Martin KaFai Lau says:
====================
pull-request: bpf-next 2023-08-03
We've added 54 non-merge commits during the last 10 day(s) which contain
a total of 84 files changed, 4026 insertions(+), 562 deletions(-).
The main changes are:
1) Add SO_REUSEPORT support for TC bpf_sk_assign from Lorenz Bauer,
Daniel Borkmann
2) Support new insns from cpu v4 from Yonghong Song
3) Non-atomically allocate freelist during prefill from YiFei Zhu
4) Support defragmenting IPv(4|6) packets in BPF from Daniel Xu
5) Add tracepoint to xdp attaching failure from Leon Hwang
6) struct netdev_rx_queue and xdp.h reshuffling to reduce
rebuild time from Jakub Kicinski
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (54 commits)
net: invert the netdevice.h vs xdp.h dependency
net: move struct netdev_rx_queue out of netdevice.h
eth: add missing xdp.h includes in drivers
selftests/bpf: Add testcase for xdp attaching failure tracepoint
bpf, xdp: Add tracepoint to xdp attaching failure
selftests/bpf: fix static assert compilation issue for test_cls_*.c
bpf: fix bpf_probe_read_kernel prototype mismatch
riscv, bpf: Adapt bpf trampoline to optimized riscv ftrace framework
libbpf: fix typos in Makefile
tracing: bpf: use struct trace_entry in struct syscall_tp_t
bpf, devmap: Remove unused dtab field from bpf_dtab_netdev
bpf, cpumap: Remove unused cmap field from bpf_cpu_map_entry
netfilter: bpf: Only define get_proto_defrag_hook() if necessary
bpf: Fix an array-index-out-of-bounds issue in disasm.c
net: remove duplicate INDIRECT_CALLABLE_DECLARE of udp[6]_ehashfn
docs/bpf: Fix malformed documentation
bpf: selftests: Add defrag selftests
bpf: selftests: Support custom type and proto for client sockets
bpf: selftests: Support not connecting client socket
netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link
...
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
net/dsa/port.c
9945c1fb03a3 ("net: dsa: fix older DSA drivers using phylink")
a88dd7538461 ("net: dsa: remove legacy_pre_march2020 detection")
https://lore.kernel.org/all/[email protected]/
net/xdp/xsk.c
3c5b4d69c358 ("net: annotate data-races around sk->sk_mark")
b7f72a30e9ac ("xsk: introduce wrappers and helpers for supporting multi-buffer in Tx path")
https://lore.kernel.org/all/[email protected]/
drivers/net/ethernet/broadcom/bnxt/bnxt.c
37b61cda9c16 ("bnxt: don't handle XDP in netpoll")
2b56b3d99241 ("eth: bnxt: handle invalid Tx completions more gracefully")
https://lore.kernel.org/all/[email protected]/
Adjacent changes:
drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c
62da08331f1a ("net/mlx5e: Set proper IPsec source port in L4 selector")
fbd517549c32 ("net/mlx5e: Add function to get IPsec offload namespace")
drivers/net/ethernet/sfc/selftest.c
55c1528f9b97 ("sfc: fix field-spanning memcpy in selftest")
ae9d445cd41f ("sfc: Miscellaneous comment removals")
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf and wireless.
Nothing scary here. Feels like the first wave of regressions from v6.5
is addressed - one outstanding fix still to come in TLS for the
sendpage rework.
Current release - regressions:
- udp: fix __ip_append_data()'s handling of MSG_SPLICE_PAGES
- dsa: fix older DSA drivers using phylink
Previous releases - regressions:
- gro: fix misuse of CB in udp socket lookup
- mlx5: unregister devlink params in case interface is down
- Revert "wifi: ath11k: Enable threaded NAPI"
Previous releases - always broken:
- sched: cls_u32: fix match key mis-addressing
- sched: bind logic fixes for cls_fw, cls_u32 and cls_route
- add bound checks to a number of places which hand-parse netlink
- bpf: disable preemption in perf_event_output helpers code
- qed: fix scheduling in a tasklet while getting stats
- avoid using APIs which are not hardirq-safe in couple of drivers,
when we may be in a hard IRQ (netconsole)
- wifi: cfg80211: fix return value in scan logic, avoid page
allocator warning
- wifi: mt76: mt7615: do not advertise 5 GHz on first PHY of MT7615D
(DBDC)
Misc:
- drop handful of inactive maintainers, put some new in place"
* tag 'net-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (98 commits)
MAINTAINERS: update TUN/TAP maintainers
test/vsock: remove vsock_perf executable on `make clean`
tcp_metrics: fix data-race in tcpm_suck_dst() vs fastopen
tcp_metrics: annotate data-races around tm->tcpm_net
tcp_metrics: annotate data-races around tm->tcpm_vals[]
tcp_metrics: annotate data-races around tm->tcpm_lock
tcp_metrics: annotate data-races around tm->tcpm_stamp
tcp_metrics: fix addr_same() helper
prestera: fix fallback to previous version on same major version
udp: Fix __ip_append_data()'s handling of MSG_SPLICE_PAGES
net/mlx5e: Set proper IPsec source port in L4 selector
net/mlx5: fs_core: Skip the FTs in the same FS_TYPE_PRIO_CHAINS fs_prio
net/mlx5: fs_core: Make find_closest_ft more generic
wifi: brcmfmac: Fix field-spanning write in brcmf_scan_params_v2_to_v1()
vxlan: Fix nexthop hash size
ip6mr: Fix skb_under_panic in ip6mr_cache_report()
s390/qeth: Don't call dev_close/dev_open (DOWN/UP)
net: tap_open(): set sk_uid from current_fsuid()
net: tun_chr_open(): set sk_uid from current_fsuid()
net: dcb: choose correct policy to parse DCB_ATTR_BCN
...
|
|
Some applications (like GDB) would like to tweak shadow stack state via
ptrace. This allows for existing functionality to continue to work for
seized shadow stack applications. Provide a regset interface for
manipulating the shadow stack pointer (SSP).
There is already ptrace functionality for accessing xstate, but this
does not include supervisor xfeatures. So there is not a completely
clear place for where to put the shadow stack state. Adding it to the
user xfeatures regset would complicate that code, as it currently shares
logic with signals which should not have supervisor features.
Don't add a general supervisor xfeature regset like the user one,
because it is better to maintain flexibility for other supervisor
xfeatures to define their own interface. For example, an xfeature may
decide not to expose all of it's state to userspace, as is actually the
case for shadow stack ptrace functionality. A lot of enum values remain
to be used, so just put it in dedicated shadow stack regset.
The only downside to not having a generic supervisor xfeature regset,
is that apps need to be enlightened of any new supervisor xfeature
exposed this way (i.e. they can't try to have generic save/restore
logic). But maybe that is a good thing, because they have to think
through each new xfeature instead of encountering issues when a new
supervisor xfeature was added.
By adding a shadow stack regset, it also has the effect of including the
shadow stack state in a core dump, which could be useful for debugging.
The shadow stack specific xstate includes the SSP, and the shadow stack
and WRSS enablement status. Enabling shadow stack or WRSS in the kernel
involves more than just flipping the bit. The kernel is made aware that
it has to do extra things when cloning or handling signals. That logic
is triggered off of separate feature enablement state kept in the task
struct. So the flipping on HW shadow stack enforcement without notifying
the kernel to change its behavior would severely limit what an application
could do without crashing, and the results would depend on kernel
internal implementation details. There is also no known use for controlling
this state via ptrace today. So only expose the SSP, which is something
that userspace already has indirect control over.
Co-developed-by: Yu-cheng Yu <[email protected]>
Signed-off-by: Yu-cheng Yu <[email protected]>
Signed-off-by: Rick Edgecombe <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Acked-by: Mike Rapoport (IBM) <[email protected]>
Tested-by: Pengfei Xu <[email protected]>
Tested-by: John Allen <[email protected]>
Tested-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/all/20230613001108.3040476-41-rick.p.edgecombe%40intel.com
|
|
tc flower rules support to classify ESP/AH
packets matching SPI field.
Signed-off-by: Ratheesh Kannoth <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add interrupt_coalesce config in send_queue and receive_queue to cache user
config.
Send per virtqueue interrupt moderation config to underlying device in
order to have more efficient interrupt moderation and cpu utilization of
guest VM.
Additionally, address all the VQs when updating the global configuration,
as now the individual VQs configuration can diverge from the global
configuration.
Signed-off-by: Gavin Li <[email protected]>
Reviewed-by: Dragos Tatulea <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Heng Qi <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
This commit adds support for enabling IP defrag using pre-existing
netfilter defrag support. Basically all the flag does is bump a refcnt
while the link the active. Checks are also added to ensure the prog
requesting defrag support is run _after_ netfilter defrag hooks.
We also take care to avoid any issues w.r.t. module unloading -- while
defrag is active on a link, the module is prevented from unloading.
Signed-off-by: Daniel Xu <[email protected]>
Reviewed-by: Florian Westphal <[email protected]>
Link: https://lore.kernel.org/r/5cff26f97e55161b7d56b09ddcf5f8888a5add1d.1689970773.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
accept_ra_min_rtr_lft only considered the lifetime of the default route
and discarded entire RAs accordingly.
This change renames accept_ra_min_rtr_lft to accept_ra_min_lft, and
applies the value to individual RA sections; in particular, router
lifetime, PIO preferred lifetime, and RIO lifetime. If any of those
lifetimes are lower than the configured value, the specific RA section
is ignored.
In order for the sysctl to be useful to Android, it should really apply
to all lifetimes in the RA, since that is what determines the minimum
frequency at which RAs must be processed by the kernel. Android uses
hardware offloads to drop RAs for a fraction of the minimum of all
lifetimes present in the RA (some networks have very frequent RAs (5s)
with high lifetimes (2h)). Despite this, we have encountered networks
that set the router lifetime to 30s which results in very frequent CPU
wakeups. Instead of disabling IPv6 (and dropping IPv6 ethertype in the
WiFi firmware) entirely on such networks, it seems better to ignore the
misconfigured routers while still processing RAs from other IPv6 routers
on the same network (i.e. to support IoT applications).
The previous implementation dropped the entire RA based on router
lifetime. This turned out to be hard to expand to the other lifetimes
present in the RA in a consistent manner; dropping the entire RA based
on RIO/PIO lifetimes would essentially require parsing the whole thing
twice.
Fixes: 1671bcfd76fd ("net: add sysctl accept_ra_min_rtr_lft")
Cc: Lorenzo Colitti <[email protected]>
Signed-off-by: Patrick Rohr <[email protected]>
Reviewed-by: Maciej Żenczykowski <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Pull block fixes from Jens Axboe:
"A few fixes that should go into the current kernel release, mainly:
- Set of fixes for dasd (Stefan)
- Handle interruptible waits returning because of a signal for ublk
(Ming)"
* tag 'block-6.5-2023-07-28' of git://git.kernel.dk/linux:
ublk: return -EINTR if breaking from waiting for existed users in DEL_DEV
ublk: fail to recover device if queue setup is interrupted
ublk: fail to start device if queue setup is interrupted
block: Fix a source code comment in include/uapi/linux/blkzoned.h
s390/dasd: print copy pair message only for the correct error
s390/dasd: fix hanging device after request requeue
s390/dasd: use correct number of retries for ERP requests
s390/dasd: fix hanging device after quiesce/resume
|
|
Also add support to pass topdir to ynl-regen.sh (Jakub) and call
it from the makefile to update the UAPI headers.
Signed-off-by: Stanislav Fomichev <[email protected]>
Co-developed-by: Jakub Kicinski <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|