| Age | Commit message (Collapse) | Author | Files | Lines |
|
Add interpreter/jit support for new sign-extension load insns
which adds a new mode (BPF_MEMSX).
Also add verifier support to recognize these insns and to
do proper verification with new insns. In verifier, besides
to deduce proper bounds for the dst_reg, probed memory access
is also properly handled.
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
This is not used, so can remove it.
Signed-off-by: YueHaibing <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts or adjacent changes.
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
mm->mm_lock_seq effectively functions as a read/write lock; therefore it
must be used with acquire/release semantics.
A specific example is the interaction between userfaultfd_register() and
lock_vma_under_rcu().
userfaultfd_register() does the following from the point where it changes
a VMA's flags to the point where concurrent readers are permitted again
(in a simple scenario where only a single private VMA is accessed and no
merging/splitting is involved):
userfaultfd_register
userfaultfd_set_vm_flags
vm_flags_reset
vma_start_write
down_write(&vma->vm_lock->lock)
vma->vm_lock_seq = mm_lock_seq [marks VMA as busy]
up_write(&vma->vm_lock->lock)
vm_flags_init
[sets VM_UFFD_* in __vm_flags]
vma->vm_userfaultfd_ctx.ctx = ctx
mmap_write_unlock
vma_end_write_all
WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1) [unlocks VMA]
There are no memory barriers in between the __vm_flags update and the
mm->mm_lock_seq update that unlocks the VMA, so the unlock can be
reordered to above the `vm_flags_init()` call, which means from the
perspective of a concurrent reader, a VMA can be marked as a userfaultfd
VMA while it is not VMA-locked. That's bad, we definitely need a
store-release for the unlock operation.
The non-atomic write to vma->vm_lock_seq in vma_start_write() is mostly
fine because all accesses to vma->vm_lock_seq that matter are always
protected by the VMA lock. There is a racy read in vma_start_read()
though that can tolerate false-positives, so we should be using
WRITE_ONCE() to keep things tidy and data-race-free (including for KCSAN).
On the other side, lock_vma_under_rcu() works as follows in the relevant
region for locking and userfaultfd check:
lock_vma_under_rcu
vma_start_read
vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [early bailout]
down_read_trylock(&vma->vm_lock->lock)
vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [main check]
userfaultfd_armed
checks vma->vm_flags & __VM_UFFD_FLAGS
Here, the interesting aspect is how far down the mm->mm_lock_seq read can
be reordered - if this read is reordered down below the vma->vm_flags
access, this could cause lock_vma_under_rcu() to partly operate on
information that was read while the VMA was supposed to be locked. To
prevent this kind of downwards bleeding of the mm->mm_lock_seq read, we
need to read it with a load-acquire.
Some of the comment wording is based on suggestions by Suren.
BACKPORT WARNING: One of the functions changed by this patch (which I've
written against Linus' tree) is vma_try_start_write(), but this function
no longer exists in mm/mm-everything. I don't know whether the merged
version of this patch will be ordered before or after the patch that
removes vma_try_start_write(). If you're backporting this patch to a tree
with vma_try_start_write(), make sure this patch changes that function.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 5e31275cc997 ("mm: add per-VMA lock and helper functions to control it")
Signed-off-by: Jann Horn <[email protected]>
Reviewed-by: Suren Baghdasaryan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Command stats is an array with more than 2K entries, which amounts to
~180KB. This is way more than actually needed, as only ~190 entries
are being used.
Therefore, replace the array with xarray.
Signed-off-by: Shay Drory <[email protected]>
Reviewed-by: Moshe Shemesh <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Downstream patch will split mlx5_cmd_init() to probe and reload
routines. As a preparation, organize mlx5_cmd struct so that any
field that will be used in the reload routine are grouped at new
nested struct.
Signed-off-by: Shay Drory <[email protected]>
Reviewed-by: Moshe Shemesh <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Update devcom infrastructure to be more generic, without
depending on max supported ports definition or a device guid,
and also more encapsulated so callers don't need to pass
the register devcom component id per event call.
Signed-off-by: Eli Cohen <[email protected]>
Signed-off-by: Roi Dayan <[email protected]>
Reviewed-by: Shay Drory <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
When seq_show_option_n() is used, it is for non-string memory that
happens to be printable bytes. As such, we must use memcpy() to copy the
bytes and then explicitly NUL-terminate the result.
Cc: Andy Shevchenko <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Muchun Song <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>
|
|
gcc gets confused when -ftrivial-auto-var-init=pattern is used on sparse
bit fields such as 'struct spi_mem_op', which caused the previous false
positive warning about an uninitialized variable:
drivers/mtd/spi-nor/spansion.c: error: 'op' is used uninitialized [-Werror=uninitialized]
In fact, the variable is fully initialized and gcc does not see it being
used, so the warning is entirely bogus. The problem appears to be
a misoptimization in the initialization of single bit fields when the
rest of the bytes are not initialized.
A previous workaround added another initialization, which ended up
shutting up the warning in spansion.c, though it apparently still happens
in other files as reported by Peter Foley in the gcc bugzilla. The
workaround of adding a fake initialization seems particularly bad
because it would set values that can never be correct but prevent the
compiler from warning about actually missing initializations.
Revert the broken workaround and instead pad the structure to only
have bitfields that add up to full bytes, which should avoid this
behavior in all drivers.
I also filed a new bug against gcc with what I found, so this can
hopefully be addressed in future gcc releases. At the moment, only
gcc-12 and gcc-13 are affected.
Cc: Peter Foley <[email protected]>
Cc: Pedro Falcato <[email protected]>
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110743
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108402
Link: https://godbolt.org/z/efMMsG1Kx
Fixes: 420c4495b5e56 ("mtd: spi-nor: spansion: make sure local struct does not contain garbage")
Signed-off-by: Arnd Bergmann <[email protected]>
Acked-by: Mark Brown <[email protected]>
Acked-by: Tudor Ambarus <[email protected]>
Signed-off-by: Miquel Raynal <[email protected]>
Link: https://lore.kernel.org/linux-mtd/[email protected]
|
|
The corgi_lcd_limit_intensity() function is called from platform
and defined in a driver, but the driver does not see the declaration:
drivers/video/backlight/corgi_lcd.c:434:6: error: no previous prototype for 'corgi_lcd_limit_intensity' [-Werror=missing-prototypes]
434 | void corgi_lcd_limit_intensity(int limit)
Move the prototype into a header that can be included from both
sides to shut up the warning.
Reviewed-by: Daniel Thompson <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
|
|
On the userspace side fchmodat(3) is implemented as a wrapper
function which implements the POSIX-specified interface. This
interface differs from the underlying kernel system call, which does not
have a flags argument. Most implementations require procfs [1][2].
There doesn't appear to be a good userspace workaround for this issue
but the implementation in the kernel is pretty straight-forward.
The new fchmodat2() syscall allows to pass the AT_SYMLINK_NOFOLLOW flag,
unlike existing fchmodat.
[1] https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/fchmodat.c;h=17eca54051ee28ba1ec3f9aed170a62630959143;hb=a492b1e5ef7ab50c6fdd4e4e9879ea5569ab0a6c#l35
[2] https://git.musl-libc.org/cgit/musl/tree/src/stat/fchmodat.c?id=718f363bc2067b6487900eddc9180c84e7739f80#n28
Co-developed-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Alexey Gladkov <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Message-Id: <f2a846ef495943c5d101011eebcf01179d0c7b61.1689092120.git.legion@kernel.org>
[brauner: pre reviews, do flag conversion in do_fchmodat() directly]
Signed-off-by: Christian Brauner <[email protected]>
|
|
Add a mitigation for the speculative return address stack overflow
vulnerability found on AMD processors.
The mitigation works by ensuring all RET instructions speculate to
a controlled location, similar to how speculation is controlled in the
retpoline sequence. To accomplish this, the __x86_return_thunk forces
the CPU to mispredict every function return using a 'safe return'
sequence.
To ensure the safety of this mitigation, the kernel must ensure that the
safe return sequence is itself free from attacker interference. In Zen3
and Zen4, this is accomplished by creating a BTB alias between the
untraining function srso_untrain_ret_alias() and the safe return
function srso_safe_ret_alias() which results in evicting a potentially
poisoned BTB entry and using that safe one for all function returns.
In older Zen1 and Zen2, this is accomplished using a reinterpretation
technique similar to Retbleed one: srso_untrain_ret() and
srso_safe_ret().
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
|
|
Microchip LAN8740/LAN8742 PHYs support basic unicast, broadcast, and
Magic Packet WoL. They have one pattern filter matching up to 128 bytes
of frame data, which can be used to implement ARP or multicast WoL.
ARP WoL matches any ARP frame with broadcast address.
Multicast WoL matches any multicast frame.
Signed-off-by: Tristram Ha <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Fix these htmldoc build warnings:
include/linux/of.h:115: warning: cannot understand function prototype: 'const struct kobj_type of_node_ktype; '
include/linux/of.h:118: warning: Excess function parameter 'phandle_name' description in 'of_node_init'
Reported-by: Stephen Rothwell <[email protected]>
Reported-by: Randy Dunlap <[email protected]>
Fixes: d9194e009efe ("of: dynamic: add lifecycle docbook info to node creation functions")
Signed-off-by: Stephen Rothwell <[email protected]>
Reviewed-by: Randy Dunlap <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>
|
|
Add support for handling MMIO based devices via platform driver. We need to
make sure that :
1) The APB clock, if present is enabled at probe and via runtime_pm ops
2) Use the ETM4x architecture or CoreSight architecture registers to
identify a device as CoreSight ETM4x, instead of relying a white list of
"Peripheral IDs"
The driver doesn't get to handle the devices yet, until we wire the ACPI
changes to move the devices to be handled via platform driver than the
etm4_amba driver.
Cc: Mathieu Poirier <[email protected]>
Cc: Suzuki K Poulose <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Acked-by: Sudeep Holla <[email protected]>
Signed-off-by: Anshuman Khandual <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Coresight device pid can be retrieved from its iomem base address, which is
stored in 'struct etm4x_drvdata'. This drops pid argument from etm4_probe()
and 'struct etm4_init_arg'. Instead etm4_check_arch_features() derives the
coresight device pid with a new helper coresight_get_pid(), right before it
is consumed in etm4_hisi_match_pid().
Cc: Mathieu Poirier <[email protected]>
Cc: Suzuki K Poulose <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Allow the selftest to call the function on the mock idev, add some tests
to exercise it.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Kevin Tian <[email protected]>
Tested-by: Nicolin Chen <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
commit 8af26be06272 ("perf/core: Fix arch_perf_get_page_size()")
left behind this.
Signed-off-by: YueHaibing <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Since commit bd2756811766 ("perf: Rewrite core context handling") the
relationship between perf_event_context and PMUs has changed so that
the error scenario that PERF_PMU_CAP_HETEROGENEOUS_CPUS originally
silenced no longer exists.
Remove the capability to avoid confusion that it actually influences
any perf core behavior and shift down the following capability bits to
fill in the unused space. This change should be a no-op.
Signed-off-by: James Clark <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Add PERF_MEM_LVLNUM_NA wherever PERF_MEM_NA is used to set default values.
Signed-off-by: Ravi Bangoria <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Remove unused HAVE_HW_TIME_STAMP feature define (introduced by
commit ac45f602ee3d ("net: infrastructure for hardware time stamping").
Signed-off-by: Peter Seiderer <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In Realtek SoC, the parameter of usb phy is designed to can dynamic
tuning base on port status. Therefore, add a notify callback of phy
driver when usb port status change.
The Realtek phy driver is designed to dynamically adjust disconnection
level and calibrate phy parameters. When the device connected bit changes
and when the disconnected bit changes, do port status change notification:
Check if portstatus is USB_PORT_STAT_CONNECTION and portchange is
USB_PORT_STAT_C_CONNECTION.
1. The device is connected, the driver lowers the disconnection level and
calibrates the phy parameters.
2. The device disconnects, the driver increases the disconnect level and
calibrates the phy parameters.
When controller to notify connect that device is already ready. If we
adjust the disconnection level in notify_connect, the disconnect may have
been triggered at this stage. So we need to change that as early as
possible. The status change of connection is before port reset.
Therefore, we add an api to notify phy the port status changes. In this
stage, the device is not port enable, and it will not trigger
disconnection.
Signed-off-by: Stanley Chang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Both the character and flag are 8-bit values. So switch from unsigned
ints to u8s. The drivers will be cleaned up in the next round.
Signed-off-by: Jiri Slaby (SUSE) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Propagate u8 from the sysrq code further up to serial's
uart_handle_sysrq_char() and friends.
Signed-off-by: Jiri Slaby (SUSE) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Propagate u8 more from the bottom to the interface, so that sysrq
callers (usually drivers) see that u8 is expected.
Signed-off-by: Jiri Slaby (SUSE) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
The passed parameter to sysrq handlers is a key (a character). So change
the type from 'int' to 'u8'. Let it specifically be 'u8' for two
reasons:
* unsigned: unsigned values come from the upper layers (devices) and the
tty layer assumes unsigned on most places, and
* 8-bit: as that what's supposed to be one day in all the layers built
on the top of tty. (Currently, we use mostly 'unsigned char' and
somewhere still only 'char'. (But that also translates to the former
thanks to -funsigned-char.))
Signed-off-by: Jiri Slaby (SUSE) <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: WANG Xuerui <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Jason Wessel <[email protected]>
Cc: Daniel Thompson <[email protected]>
Cc: Douglas Anderson <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Zqiang <[email protected]>
Acked-by: Thomas Zimmermann <[email protected]> # DRM
Acked-by: WANG Xuerui <[email protected]> # loongarch
Acked-by: Paul E. McKenney <[email protected]>
Acked-by: Daniel Thompson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
vfio_group is not needed for vfio device cdev, so with vfio device cdev
introduced, the vfio_group infrastructures can be compiled out if only
cdev is needed.
Reviewed-by: Jason Gunthorpe <[email protected]>
Tested-by: Nicolin Chen <[email protected]>
Tested-by: Matthew Rosato <[email protected]>
Tested-by: Yanting Jiang <[email protected]>
Tested-by: Shameer Kolothum <[email protected]>
Tested-by: Terrence Xu <[email protected]>
Tested-by: Zhenzhong Duan <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alex Williamson <[email protected]>
|
|
This adds ioctl for userspace to bind device cdev fd to iommufd.
VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain DMA
control provided by the iommufd. open_device
op is called after bind_iommufd op.
Tested-by: Nicolin Chen <[email protected]>
Tested-by: Matthew Rosato <[email protected]>
Tested-by: Yanting Jiang <[email protected]>
Tested-by: Shameer Kolothum <[email protected]>
Tested-by: Terrence Xu <[email protected]>
Tested-by: Zhenzhong Duan <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alex Williamson <[email protected]>
|
|
It's common to get a reference to the iommufd context from a given file
descriptor. So adds an API for it. Existing users of this API are compiled
only when IOMMUFD is enabled, so no need to have a stub for the IOMMUFD
disabled case.
Tested-by: Yanting Jiang <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alex Williamson <[email protected]>
|
|
This adds cdev support for vfio_device. It allows the user to directly
open a vfio device w/o using the legacy container/group interface, as a
prerequisite for supporting new iommu features like nested translation
and etc.
The device fd opened in this manner doesn't have the capability to access
the device as the fops open() doesn't open the device until the successful
VFIO_DEVICE_BIND_IOMMUFD ioctl which will be added in a later patch.
With this patch, devices registered to vfio core would have both the legacy
group and the new device interfaces created.
- group interface : /dev/vfio/$groupID
- device interface: /dev/vfio/devices/vfioX - normal device
("X" is a unique number across vfio devices)
For a given device, the user can identify the matching vfioX by searching
the vfio-dev folder under the sysfs path of the device. Take PCI device
(0000:6a:01.0) as an example, /sys/bus/pci/devices/0000\:6a\:01.0/vfio-dev/vfioX
implies the matching vfioX under /dev/vfio/devices/, and vfio-dev/vfioX/dev
contains the major:minor number of the matching /dev/vfio/devices/vfioX.
The user can get device fd by opening the /dev/vfio/devices/vfioX.
The vfio_device cdev logic in this patch:
*) __vfio_register_dev() path ends up doing cdev_device_add() for each
vfio_device if VFIO_DEVICE_CDEV configured.
*) vfio_unregister_group_dev() path does cdev_device_del();
cdev interface does not support noiommu devices, so VFIO only creates the
legacy group interface for the physical devices that do not have IOMMU.
noiommu users should use the legacy group interface.
Reviewed-by: Kevin Tian <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Tested-by: Terrence Xu <[email protected]>
Tested-by: Nicolin Chen <[email protected]>
Tested-by: Matthew Rosato <[email protected]>
Tested-by: Yanting Jiang <[email protected]>
Tested-by: Shameer Kolothum <[email protected]>
Tested-by: Zhenzhong Duan <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alex Williamson <[email protected]>
|
|
This prepares for adding DETACH ioctl for emulated VFIO devices.
Reviewed-by: Kevin Tian <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Tested-by: Terrence Xu <[email protected]>
Tested-by: Nicolin Chen <[email protected]>
Tested-by: Matthew Rosato <[email protected]>
Tested-by: Yanting Jiang <[email protected]>
Tested-by: Shameer Kolothum <[email protected]>
Tested-by: Zhenzhong Duan <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alex Williamson <[email protected]>
|
|
Previously, the detach routine is only done by the destroy(). And it was
called by vfio_iommufd_emulated_unbind() when the device runs close(), so
all the mappings in iopt were cleaned in that setup, when the call trace
reaches this detach() routine.
Now, there's a need of a detach uAPI, meaning that it does not only need
a new iommufd_access_detach() API, but also requires access->ops->unmap()
call as a cleanup. So add one.
However, leaving that unprotected can introduce some potential of a race
condition during the pin_/unpin_pages() call, where access->ioas->iopt is
getting referenced. So, add an ioas_lock to protect the context of iopt
referencings.
Also, to allow the iommufd_access_unpin_pages() callback to happen via
this unmap() call, add an ioas_unpin pointer, so the unpin routine won't
be affected by the "access->ioas = NULL" trick.
Reviewed-by: Kevin Tian <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Tested-by: Terrence Xu <[email protected]>
Tested-by: Nicolin Chen <[email protected]>
Tested-by: Matthew Rosato <[email protected]>
Tested-by: Yanting Jiang <[email protected]>
Tested-by: Shameer Kolothum <[email protected]>
Tested-by: Zhenzhong Duan <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alex Williamson <[email protected]>
|
|
This prepares for adding DETACH ioctl for physical VFIO devices.
Reviewed-by: Kevin Tian <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Tested-by: Terrence Xu <[email protected]>
Tested-by: Nicolin Chen <[email protected]>
Tested-by: Matthew Rosato <[email protected]>
Tested-by: Yanting Jiang <[email protected]>
Tested-by: Shameer Kolothum <[email protected]>
Tested-by: Zhenzhong Duan <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alex Williamson <[email protected]>
|
|
This prepares for making the below kAPIs to accept both group file
and device file instead of only vfio group file.
bool vfio_file_enforced_coherent(struct file *file);
void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
Reviewed-by: Kevin Tian <[email protected]>
Reviewed-by: Eric Auger <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Tested-by: Terrence Xu <[email protected]>
Tested-by: Nicolin Chen <[email protected]>
Tested-by: Matthew Rosato <[email protected]>
Tested-by: Yanting Jiang <[email protected]>
Tested-by: Shameer Kolothum <[email protected]>
Tested-by: Zhenzhong Duan <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alex Williamson <[email protected]>
|
|
This allows VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl use the iommufd_ctx
of the cdev device to check the ownership of the other affected devices.
When VFIO_DEVICE_GET_PCI_HOT_RESET_INFO is called on an IOMMUFD managed
device, the new flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is reported to indicate
the values returned are IOMMUFD devids rather than group IDs as used when
accessing vfio devices through the conventional vfio group interface.
Additionally the flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED will be reported
in this mode if all of the devices affected by the hot-reset are owned by
either virtue of being directly bound to the same iommufd context as the
calling device, or implicitly owned via a shared IOMMU group.
Suggested-by: Jason Gunthorpe <[email protected]>
Suggested-by: Alex Williamson <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Tested-by: Yanting Jiang <[email protected]>
Tested-by: Zhenzhong Duan <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alex Williamson <[email protected]>
|
|
There are drivers that need to search vfio_device within a given dev_set.
e.g. vfio-pci. So add a helper.
vfio_pci_is_device_in_set() now returns -EBUSY in commit a882c16a2b7e
("vfio/pci: Change vfio_pci_try_bus_reset() to use the dev_set") where
it was trying to preserve the return of vfio_pci_try_zap_and_vma_lock_cb().
However, it makes more sense to return -ENODEV.
Suggested-by: Alex Williamson <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Tested-by: Yanting Jiang <[email protected]>
Tested-by: Terrence Xu <[email protected]>
Tested-by: Zhenzhong Duan <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alex Williamson <[email protected]>
|
|
This can be used to differentiate whether to report group_id or devid in
the revised VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl. At this moment, no
cdev path yet, so the vfio_device_cdev_opened() helper always returns false.
Reviewed-by: Kevin Tian <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Tested-by: Yanting Jiang <[email protected]>
Tested-by: Terrence Xu <[email protected]>
Tested-by: Zhenzhong Duan <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alex Williamson <[email protected]>
|
|
This is needed by the vfio-pci driver to report affected devices in the
hot-reset for a given device.
Reviewed-by: Jason Gunthorpe <[email protected]>
Tested-by: Yanting Jiang <[email protected]>
Tested-by: Terrence Xu <[email protected]>
Tested-by: Zhenzhong Duan <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alex Williamson <[email protected]>
|
|
This adds the helper to check if any device within the given iommu_group
has been bound with the iommufd_ctx. This is helpful for the checking on
device ownership for the devices which have not been bound but cannot be
bound to any other iommufd_ctx as the iommu_group has been bound.
Reviewed-by: Jason Gunthorpe <[email protected]>
Tested-by: Yanting Jiang <[email protected]>
Tested-by: Terrence Xu <[email protected]>
Tested-by: Zhenzhong Duan <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alex Williamson <[email protected]>
|
|
Provide an ability to check if flow steering supports UDP
encapsulation and decapsulation of IPsec ESP packets.
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Strip out all the pre-March 2020 legacy code from phylink now that the
last user of it is gone.
Reviewed-by: Daniel Golle <[email protected]>
Tested-by: Daniel Golle <[email protected]>
Tested-by: Frank Wunderlich <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Right now the regulator helpers expect raw register values for the range
selectors. This is different from the voltage selectors, which are
normalized as bitfield values. This leads to a bit of confusion. Also,
raw values are harder to copy from datasheets or match up with them,
as datasheets will typically have bitfield values.
Make the helpers expect bitfield values, and convert existing users. The
field in regulator_desc is renamed to |linear_range_selectors_bitfield|.
This is intended to cause drivers added in the same merge window and
out-of-tree drivers using the incorrect variable and values to break,
preventing incorrect values being used on actual hardware and potentially
producing magic smoke.
Also include bitops.h explicitly for ffs(), and reorder the header include
statements. While at it, also replace module.h with export.h, since the
only use is EXPORT_SYMBOL_GPL.
Signed-off-by: Chen-Yu Tsai <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
|
|
load_nls() take a char * parameter, use it to find nls module in list or
construct the module name to load it.
This change make load_nls() take a const parameter, so we don't need do
some cast like this:
ses->local_nls = load_nls((char *)ctx->local_nls->charset);
Suggested-by: Stephen Rothwell <[email protected]>
Signed-off-by: Winston Wen <[email protected]>
Reviewed-by: Paulo Alcantara <[email protected]>
Reviewed-by: Christian Brauner <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
When filesystem blocksize is less than folio size (either with
mapping_large_folio_support() or with blocksize < pagesize) and when the
folio is uptodate in pagecache, then even a byte write can cause
an entire folio to be written to disk during writeback. This happens
because we currently don't have a mechanism to track per-block dirty
state within struct iomap_folio_state. We currently only track uptodate
state.
This patch implements support for tracking per-block dirty state in
iomap_folio_state->state bitmap. This should help improve the filesystem
write performance and help reduce write amplification.
Performance testing of below fio workload reveals ~16x performance
improvement using nvme with XFS (4k blocksize) on Power (64K pagesize)
FIO reported write bw scores improved from around ~28 MBps to ~452 MBps.
1. <test_randwrite.fio>
[global]
ioengine=psync
rw=randwrite
overwrite=1
pre_read=1
direct=0
bs=4k
size=1G
dir=./
numjobs=8
fdatasync=1
runtime=60
iodepth=64
group_reporting=1
[fio-run]
2. Also our internal performance team reported that this patch improves
their database workload performance by around ~83% (with XFS on Power)
Reported-by: Aravinda Herle <[email protected]>
Reported-by: Brian Foster <[email protected]>
Signed-off-by: Ritesh Harjani (IBM) <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
|
|
Cited commit under 'Fixes' tag introduced new member to struct
net_device without providing description of it - fix it.
Reported-by: Stephen Rothwell <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Fixes: 13ce2daa259a ("xsk: add new netlink attribute dedicated for ZC max frags")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Simon Horman <[email protected]> # build-tested
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Use the size of the write as a hint for the size of the folio to create.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
|
|
Allow callers of __filemap_get_folio() to specify a preferred folio
order in the FGP flags. This is only honoured in the FGP_CREATE path;
if there is already a folio in the page cache that covers the index,
we will return it, no matter what its order is. No create-around is
attempted; we will only create folios which start at the specified index.
Unmodified callers will continue to allocate order 0 folios.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
|
|
Similarly to gfp_t, define fgf_t as its own type to prevent various
misuses and confusion. Leave the flags as FGP_* for now to reduce the
size of this patch; they will be converted to FGF_* later. Move the
documentation to the definition of the type insted of burying it in the
__filemap_get_folio() documentation.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Kent Overstreet <[email protected]>
|
|
Add a folio wrapper around copy_page_from_iter_atomic().
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
|
|
Trying to restrict the '$'-prefix change to RISC-V caused some fallout,
so let's just treat all those symbols as special.
Fixes: c05780ef3c190 ("module: Ignore RISC-V mapping symbols too")
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Palmer Dabbelt <[email protected]>
Reviewed-by: Masahiro Yamada <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
|