Age | Commit message (Collapse) | Author | Files | Lines |
|
Pull SCSI updates from James Bottomley:
"Updates to the usual drivers (ufs, lpfc, qla2xxx, mpi3mr) plus some
misc small fixes.
The only core changes are to both bsg and scsi to pass in the device
instead of setting it afterwards as q->queuedata, so no functional
change"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (69 commits)
scsi: aha152x: Use DECLARE_COMPLETION_ONSTACK for non-constant completion
scsi: qla2xxx: Convert comma to semicolon
scsi: qla2xxx: Update version to 10.02.09.300-k
scsi: qla2xxx: Use QP lock to search for bsg
scsi: qla2xxx: Reduce fabric scan duplicate code
scsi: qla2xxx: Fix optrom version displayed in FDMI
scsi: qla2xxx: During vport delete send async logout explicitly
scsi: qla2xxx: Complete command early within lock
scsi: qla2xxx: Fix flash read failure
scsi: qla2xxx: Return ENOBUFS if sg_cnt is more than one for ELS cmds
scsi: qla2xxx: Fix for possible memory corruption
scsi: qla2xxx: validate nvme_local_port correctly
scsi: qla2xxx: Unable to act on RSCN for port online
scsi: ufs: exynos: Add support for Flash Memory Protector (FMP)
scsi: ufs: core: Add UFSHCD_QUIRK_KEYS_IN_PRDT
scsi: ufs: core: Add fill_crypto_prdt variant op
scsi: ufs: core: Add UFSHCD_QUIRK_BROKEN_CRYPTO_ENABLE
scsi: ufs: core: fold ufshcd_clear_keyslot() into its caller
scsi: ufs: core: Add UFSHCD_QUIRK_CUSTOM_CRYPTO_PROFILE
scsi: ufs: mcq: Make .get_hba_mac() optional
...
|
|
Pull rdma updates from Jason Gunthorpe:
"Usual collection of small improvements and fixes:
- Bug fixes and minor improvments in efa, irdma, mlx4, mlx5, rxe,
hf1, qib, ocrdma
- bnxt_re support for MSN, which is a new retransmit logic
- Initial mana support for RC qps
- Use after free bug and cleanups in iwcm
- Reduce resource usage in mlx5 when RDMA verbs features are not used
- New verb to drain shared recieve queues, similar to normal recieve
queues. This is necessary to allow ULPs a clean shutdown. Used in
the iscsi rdma target
- mlx5 support for more than 16 bits of doorbell indexes
- Doorbell moderation support for bnxt_re
- IB multi-plane support for mlx5
- New EFA adaptor PCI IDs
- RDMA_NAME_ASSIGN_TYPE_USER to hint to userspace that it shouldn't
rename the device
- A collection of hns bugs
- Fix long standing bug in bnxt_re with incorrect endian handling of
immediate data"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (65 commits)
IB/hfi1: Constify struct flag_table
RDMA/mana_ib: Set correct device into ib
bnxt_re: Fix imm_data endianness
RDMA: Fix netdev tracker in ib_device_set_netdev
RDMA/hns: Fix mbx timing out before CMD execution is completed
RDMA/hns: Fix insufficient extend DB for VFs.
RDMA/hns: Fix undifined behavior caused by invalid max_sge
RDMA/hns: Fix shift-out-bounds when max_inline_data is 0
RDMA/hns: Fix missing pagesize and alignment check in FRMR
RDMA/hns: Fix unmatch exception handling when init eq table fails
RDMA/hns: Fix soft lockup under heavy CEQE load
RDMA/hns: Check atomic wr length
RDMA/ocrdma: Don't inline statistics functions
RDMA/core: Introduce "name_assign_type" for an IB device
RDMA/qib: Fix truncation compilation warnings in qib_verbs.c
RDMA/qib: Fix truncation compilation warnings in qib_init.c
RDMA/efa: Add EFA 0xefa3 PCI ID
RDMA/mlx5: Support per-plane port IB counters by querying PPCNT register
net/mlx5: mlx5_ifc update for accessing ppcnt register of plane ports
RDMA/mlx5: Add plane index support when querying PTYS registers
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull iommufd updates from Jason Gunthorpe:
- The iova_bitmap logic for efficiently reporting dirty pages back to
userspace has a few more tricky corner case bugs that have been
resolved and backed with new tests.
The revised version has simpler logic.
- Shared branch with iommu for handle support when doing domain attach.
Handles allow the domain owner to include additional private data on
a per-device basis.
- IO Page Fault Reporting to userspace via iommufd. Page faults can be
generated on fault capable HWPTs when a translation is not present.
Routing them to userspace would allow a VMM to be able to virtualize
them into an emulated vIOMMU. This is the next step to fully enabling
vSVA support.
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (26 commits)
iommufd: Put constants for all the uAPI enums
iommufd: Fix error pointer checking
iommufd: Add check on user response code
iommufd: Remove IOMMUFD_PAGE_RESP_FAILURE
iommufd: Require drivers to supply the cache_invalidate_user ops
iommufd/selftest: Add coverage for IOPF test
iommufd/selftest: Add IOPF support for mock device
iommufd: Associate fault object with iommufd_hw_pgtable
iommufd: Fault-capable hwpt attach/detach/replace
iommufd: Add iommufd fault object
iommufd: Add fault and response message definitions
iommu: Extend domain attach group with handle support
iommu: Add attach handle to struct iopf_group
iommu: Remove sva handle list
iommu: Introduce domain attachment handle
iommufd/iova_bitmap: Remove iterator logic
iommufd/iova_bitmap: Dynamic pinning on iova_bitmap_set()
iommufd/iova_bitmap: Consolidate iova_bitmap_set exit conditionals
iommufd/iova_bitmap: Move initial pinning to iova_bitmap_for_each()
iommufd/iova_bitmap: Cache mapped length in iova_bitmap_map struct
...
|
|
Pull NFS client updates from Anna Schumaker:
"New Features:
- Add support for large folios
- Implement rpcrdma generic device removal notification
- Add client support for attribute delegations
- Use a LAYOUTRETURN during reboot recovery to report layoutstats
and errors
- Improve throughput for random buffered writes
- Add NVMe support to pnfs/blocklayout
Bugfixes:
- Fix rpcrdma_reqs_reset()
- Avoid soft lockups when using UDP
- Fix an nfs/blocklayout premature PR key unregestration
- Another fix for EXCHGID4_FLAG_USE_PNFS_DS for DS server
- Do not extend writes to the entire folio
- Pass explicit offset and count values to tracepoints
- Fix a race to wake up sleeping SUNRPC sync tasks
- Fix gss_status tracepoint output
Cleanups:
- Add missing MODULE_DESCRIPTION() macros
- Add blocklayout / SCSI layout tracepoints
- Remove asm-generic headers from xprtrdma verbs.c
- Remove unused 'struct mnt_fhstatus'
- Other delegation related cleanups
- Other folio related cleanups
- Other pNFS related cleanups
- Other xprtrdma cleanups"
* tag 'nfs-for-6.11-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (63 commits)
SUNRPC: Fixup gss_status tracepoint error output
SUNRPC: Fix a race to wake a sync task
nfs: split nfs_read_folio
nfs: pass explicit offset/count to trace events
nfs: do not extend writes to the entire folio
nfs/blocklayout: add support for NVMe
nfs: remove nfs_page_length
nfs: remove the unused max_deviceinfo_size field from struct pnfs_layoutdriver_type
nfs: don't reuse partially completed requests in nfs_lock_and_join_requests
nfs: move nfs_wait_on_request to write.c
nfs: fold nfs_page_group_lock_subrequests into nfs_lock_and_join_requests
nfs: fold nfs_folio_find_and_lock_request into nfs_lock_and_join_requests
nfs: simplify nfs_folio_find_and_lock_request
nfs: remove nfs_folio_private_request
nfs: remove dead code for the old swap over NFS implementation
NFSv4.1 another fix for EXCHGID4_FLAG_USE_PNFS_DS for DS server
nfs: Block on write congestion
nfs: Properly initialize server->writeback
nfs: Drop pointless check from nfs_commit_release_pages()
nfs/blocklayout: SCSI layout trace points for reservation key reg/unreg
...
|
|
PVC, Xe2 and later platforms have 16-wide EUs. We were implicitly
reporting for PVC the number of 16-wide EUs without giving userspace any
hint that they were different than for other platforms. Xe2 and later
also have 16-wide, but in those cases the reported number would
correspond to the 8-wide count.
To avoid confusion and make sure the right number is used by userspace
depending on the platform, add a new item to the topology query and drop
the one that is not available. The new mask reported for both PVC and
Xe2 should now match the numbers reported via hwconfig.
v2: Use a different topo item with EU type in its name to report the
new mask instead of adding the type itself as the item (Matt Roper)
Reviewed-by: Matt Roper <[email protected]>
Acked-by: José Roberto de Souza <[email protected]>
Acked-by: Mateusz Jablonski <[email protected]>
Acked-by: Wenbin Lu <[email protected]>
Acked-by: Effie Yu <[email protected]>
Acked-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes updates from Masami Hiramatsu:
"Uprobes:
- x86/shstk: Make return uprobe work with shadow stack
- Add uretprobe syscall which speeds up the uretprobe 10-30% faster.
This syscall is automatically used from user-space trampolines
which are generated by the uretprobe. If this syscall is used by
normal user program, it will cause SIGILL. Note that this is
currently only implemented on x86_64.
(This also has two fixes for adjusting the syscall number to avoid
conflict with new *attrat syscalls.)
- uprobes/perf: fix user stack traces in the presence of pending
uretprobe. This corrects the uretprobe's trampoline address in the
stacktrace with correct return address
- selftests/x86: Add a return uprobe with shadow stack test
- selftests/bpf: Add uretprobe syscall related tests.
- test case for register integrity check
- test case with register changing case
- test case for uretprobe syscall without uprobes (expected to fail)
- test case for uretprobe with shadow stack
- selftests/bpf: add test validating uprobe/uretprobe stack traces
- MAINTAINERS: Add uprobes entry. This does not specify the tree but
to clarify who maintains and reviews the uprobes
Kprobes:
- tracing/kprobes: Test case cleanups.
Replace redundant WARN_ON_ONCE() + pr_warn() with WARN_ONCE() and
remove unnecessary code from selftest
- tracing/kprobes: Add symbol counting check when module loads.
This checks the uniqueness of the probed symbol on modules. The
same check has already done for kernel symbols
(This also has a fix for build error with CONFIG_MODULES=n)
Cleanup:
- Add MODULE_DESCRIPTION() macros for fprobe and kprobe examples"
* tag 'probes-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
MAINTAINERS: Add uprobes entry
selftests/bpf: Change uretprobe syscall number in uprobe_syscall test
uprobe: Change uretprobe syscall scope and number
tracing/kprobes: Fix build error when find_module() is not available
tracing/kprobes: Add symbol counting check when module loads
selftests/bpf: add test validating uprobe/uretprobe stack traces
perf,uprobes: fix user stack traces in the presence of pending uretprobes
tracing/kprobe: Remove cleanup code unrelated to selftest
tracing/kprobe: Integrate test warnings into WARN_ONCE
selftests/bpf: Add uretprobe shadow stack test
selftests/bpf: Add uretprobe syscall call from user space test
selftests/bpf: Add uretprobe syscall test for regs changes
selftests/bpf: Add uretprobe syscall test for regs integrity
selftests/x86: Add return uprobe shadow stack test
uprobe: Add uretprobe syscall to speed up return probe
uprobe: Wire up uretprobe system call
x86/shstk: Make return uprobe work with shadow stack
samples: kprobes: add missing MODULE_DESCRIPTION() macros
fprobe: add missing MODULE_DESCRIPTION() macro
|
|
Pull drm updates from Dave Airlie:
"There's a lot of stuff in here, amd, i915 and xe have new platform
work, lots of core rework around EDID handling, some new COMPILE_TEST
options, maintainer changes and a lots of other stuff. Summary:
core:
- deprecate DRM data and return 0 date
- connector: Create a set of helpers to help with HDMI support
- Remove driver owner assignments
- Allow more drivers to compile with COMPILE_TEST
- Conversions to drm_edid
- Sprinkle MODULE_DESCRIPTIONS everywhere they are missing
- Remove drm_mm_replace_node
- print: Add a drm prefix to warn level messages too, remove
___drm_dbg, consolidate prefix handling
- New monochrome TV mode variant
ttm:
- improve number of page faults on some platforms
- fix test builds under PREEMPT_RT
- more test coverage
ci:
- Require a more recent version of mesa
- improve farm setup and test generation
dma-buf:
- warn if reserving 0 fence slots
- internal API heap enhancements
fbdev:
- Create memory manager optimized fbdev emulation
panic:
- Allow to select fonts
- improve drm_fb_dma_get_scanout_buffer
- Allow to dump kmsg to the screen
bridge:
- Remove redundant checks on bridge->encoder
- Remove drm_bridge_chain_mode_fixup
- bridge-connector: Plumb in the new HDMI helper
- analogix_dp: Various improvements, handle AUX transfers timeout
- samsung-dsim: Fix timings calculation
- tc358767: Plenty of small fixes, fix no connector attach, fix
clocks
- sii902x: state validation improvements
panels:
- Switch panels from register table initialization to proper code
- Now that the panel code tracks the panel state, remove every ad-hoc
implementation in the panel drivers
- More cleanup of prepare / enable state tracking in drivers
- edp: Drop legacy panel compatibles
- simple-bridge: Switch to devm_drm_bridge_add
- New panels: Lincoln Tech Sol LCD185-101CT, Microtips Technology
13-101HIEBCAF0-C, Microtips Technology MF-103HIEB0GA0,
BOE nv110wum-l60, IVO t109nw41, WL-355608-A8, PrimeView
PM070WL4, Lincoln Technologies LCD197, Ortustech
COM35H3P70ULC, AUO G104STN01, K&d kd101ne3-40ti
amdgpu:
- DCN 4.0.x support
- GC 12.0 support
- GMC 12.0 support
- SDMA 7.0 support
- MES12 support
- MMHUB 4.1 support
- GFX12 modifier and DCC support
- lots of IP fixes/updates
amdkfd:
- Contiguous VRAM allocations
- GC 12.0 support
- SDMA 7.0 support
- SR-IOV fixes
- KFD GFX ALU exceptions
i915:
- Battlemage Xe2 HPD display enablement
- Panel Replay enabling
- DP AUX-less ALPM/LOBF
- Enable link training failure fallback for DP MST links
- CMRR (Content Match Refresh Rate) enabling
- Increase ADL-S/ADL-P/DG2+ max TMDS bitrate to 6 Gbps
- Enable eDP AUX based HDR backlight
- Support replaying GPU hangs with captured context image
- Automate CCS Mode setting during engine resets
- lots of refactoring
- Support replaying GPU hangs with captured context image
- Increase FLR timeout from 3s to 9s
- Enable w/a 16021333562 for DG2, MTL and ARL [guc]
xe:
- update MAINATINERS
- New uapi adding OA functionality to Xe
- expose l3 bank mask
- fix display detect on ADL-N
- runtime PM Fixes
- Fix silent backmerge issues
- More prep for SR-IOV
- HWmon additions
- per client usage info
- Rework GPU page fault handling
- Drop EXEC_QUEUE_FLAG_BANNED
- Add BMG PCI IDs
- Scheduler fixes and improvements
- Rename xe_exec_queue::compute to xe_exec_queue::lr
- Use ttm_uncached for BO with NEEDS_UC flag
- Rename xe perf layer as xe observation layer
- lots of refactoring
radeon:
- Backlight workaround for iMac
- Silence UBSAN flex array warnings
msm:
- Validate registers XML description against schema in CI
- core/dpu: SM7150 support
- mdp5: Add support for MSM8937
- gpu: Add param for userspace to know if raytracing is supported
- gpu: X185 support (aka gpu in X1 laptop chips)
- gpu: a505 support
ivpu:
- hardware scheduler support
- profiling support
- improvements to the platform support layer
- firmware handling improvements
- clocks/power mgmt improvements
- scheduler/logging improvements
habanalabs:
- Gradual sleep in polling memory macro
- Reduce Gaudi2 MSI-X interrupt count to 128
- Add Gaudi2-D revision support
- Add timestamp to CPLD info
- Gaudi2: Assume hard-reset by firmware upon MC SEI severe error
- Align Gaudi2 interrupt names
- Check for errors after preboot is ready
- Change habanalabs maintainer and git repo path
mgag200:
- refactoring and improvements
- Add BMC output
- enable polling
nouveau:
- add registry command line
v3d:
- perf counters improvements
zynqmp:
- irq and debugfs improvements
atmel-hlcdc:
- Support XLCDC in sam9x7
mipi-dbi:
- Remove mipi_dbi_machine_little_endian
- make SPI bits per word configurable
- support RGB888
- allow pixel formats to be specified in the DT
sun4i:
- Rework the blender setup for DE2
panfrost:
- Enable MT8188 support
vc4:
- Monochrome TV support
exynos:
- fix fallback mode regression
- fix memory leak
- Use drm_edid_duplicate() instead of kmemdup()
etnaviv:
- fix i.MX8MP NPU clock gating
- workaround FE register cdc issues on some cores
- fix DMA sync handling for cached buffers
- fix job timeout handling
- keep TS enabled on MMUv2 cores for improved performance
mediatek:
- Convert to platform remove callback returning void-
- Drop chain_mode_fixup call in mode_valid()
- Fixes the errors of MediaTek display driver found by IGT
- Add display support for the MT8365-EVK board
- Fix bit depth overwritten for mtk_ovl_set bit_depth()
- Fix possible_crtcs calculation
- Fix spurious kfree()
ast:
- refactor mode setting code
stm:
- Add LVDS support
- DSI PHY updates"
* tag 'drm-next-2024-07-18' of https://gitlab.freedesktop.org/drm/kernel: (2501 commits)
drm/amdgpu/mes12: add missing opcode string
drm/amdgpu/mes11: update opcode strings
Revert "drm/amd/display: Reset freesync config before update new state"
drm/omap: Restrict compile testing to PAGE_SIZE less than 64KB
drm/xe: Drop trace_xe_hw_fence_free
drm/xe/uapi: Rename xe perf layer as xe observation layer
drm/amdgpu: remove exp hw support check for gfx12
drm/amdgpu: timely save bad pages to eeprom after gpu ras reset is completed
drm/amdgpu: flush all cached ras bad pages to eeprom
drm/amdgpu: select compute ME engines dynamically
drm/amd/display: Allow display DCC for DCN401
drm/amdgpu: select compute ME engines dynamically
drm/amdgpu/job: Replace DRM_INFO/ERROR logging
drm/amdgpu: select compute ME engines dynamically
drm/amd/pm: Ignore initial value in smu response register
drm/amdgpu: Initialize VF partition mode
drm/amd/amdgpu: fix SDMA IRQ client ID <-> req mapping
MAINTAINERS: fix Xinhui's name
MAINTAINERS: update powerplay and swsmu
drm/qxl: Pin buffer objects for internal mappings
...
|
|
* Fix some typos, incomplete or confusing phrases.
* Split paragraphs where appropriate.
* List the same error code multiple times,
if it has multiple possible causes.
* Bring wording closer to the man page wording,
which has undergone more thorough review
(esp. for LANDLOCK_ACCESS_FS_WRITE_FILE).
* Small semantic clarifications
* Call the ephemeral port range "ephemeral"
* Clarify reasons for EFAULT in landlock_add_rule()
* Clarify @rule_type doc for landlock_add_rule()
This is a collection of small fixes which I collected when preparing the
corresponding man pages [1].
Cc: Alejandro Colomar <[email protected]>
Cc: Konstantin Meskhidze <[email protected]>
Link: https://lore.kernel.org/r/[email protected] [1]
Signed-off-by: Günther Noack <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[mic: Add label to link, fix formatting spotted by make htmldocs,
synchronize userspace-api documentation's date]
Signed-off-by: Mickaël Salaün <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- New sensor drivers: gc05a2, gc08a3 and imx283
- New serializer/deserializer drivers: max96714 and max96717
- New JPEG encoder driver: e5010
- Support for Raspberry Pi PiSP Backend (BE) ISP driver
- Old documentation for av7110 driver removed, as a new version was
added as Documentation/userspace-api/media/dvb/legacy*.rst
- atompisp: Linux firmwares are now available, so drop firmware-related
task from TODO and update firmware logic
- The imx258 driver has gained several improvements
- wave5 driver has gained support for HEVC decoding
- em28xx gained support for MyGica UTV3
- av7110 budget-patch driver removed
- Lots of other cleanups, improvements and fixes
* tag 'media/v6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (301 commits)
media: raspberrypi: Switch to remove_new
media: uapi: pisp_be_config: Add extra config fields
media: uapi: pisp_be_config: Re-sort pisp_be_tiles_config
media: uapi: pisp_common: Capitalize all macros
media: uapi: pisp_common: Add 32 bpp format test
media: uapi: pisp_be_config: Drop BIT() from uAPI
media: stm32: dcmipp: correct error handling in dcmipp_create_subdevs
media: atomisp: Fix spelling mistakes in sh_css_sp.c
media: atomisp: Fix spelling mistake in ia_css_debug.c
media: atomisp: Fix spelling mistake in hmm_bo.c
media: atomisp: Fix spelling mistake in ia_css_eed1_8.host.c
media: atomisp: Fix spelling mistake in sh_css_internal.h
media: atomisp: Fix spelling mistake "pipline" -> "pipeline"
media: atomisp: Remove unused GPIO related defines and APIs
media: atomisp: Replace COMPILATION_ERROR_IF() by static_assert()
media: atomisp: Clean up unused macros from math_support.h
media: atomisp: csi2-bridge: Add DMI quirk for OV5693 on Xiaomi Mipad2
media: atomisp: Update TODO
media: atomisp: Prefix firmware paths with "intel/ipu/"
media: atomisp: Remove firmware_name module parameter
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"The highlights are new logic behind background block group reclaim,
automatic removal of qgroup after removing a subvolume and new
'rescue=' mount options.
The rest is optimizations, cleanups and refactoring.
User visible features:
- dynamic block group reclaim:
- tunable framework to avoid situations where eager data
allocations prevent creating new metadata chunks due to lack of
unallocated space
- reuse sysfs knob bg_reclaim_threshold (otherwise used only in
zoned mode) for a fixed value threshold
- new on/off sysfs knob "dynamic_reclaim" calculating the value
based on heuristics, aiming to keep spare working space for
relocating chunks but not to needlessly relocate partially
utilized block groups or reclaim newly allocated ones
- stats are exported in sysfs per block group type, files
"reclaim_*"
- this may increase IO load at unexpected times but the corner
case of no allocatable block groups is known to be worse
- automatically remove qgroup of deleted subvolumes:
- adjust qgroup removal conditions, make sure all related
subvolume data are already removed, or return EBUSY, also take
into account setting of sysfs drop_subtree_threshold
- also works in squota mode
- mount option updates: new modes of 'rescue=' that allow to mount
images (read-only) that could have been partially converted by user
space tools
- ignoremetacsums - invalid metadata checksums are ignored
- ignoresuperflags - super block flags that track conversion in
progress (like UUID or checksums)
Core:
- size of struct btrfs_inode is now below 1024 (on a release config),
improved memory packing and other secondary effects
- switch tracking of open inodes from rb-tree to xarray, minor
performance improvement
- reduce number of empty transaction commits when there are no dirty
data/metadata
- memory allocation optimizations (reduced numbers, reordering out of
critical sections)
- extent map structure optimizations and refactoring, more sanity
checks
- more subpage in zoned mode preparations or fixes
- general snapshot code cleanups, improvements and documentation
- tree-checker updates: more file extent ram_bytes fixes, continued
- raid-stripe-tree update (not backward compatible):
- remove extent encoding field from the structure, can be inferred
from other information
- requires btrfs-progs 6.9.1 or newer
- cleanups and refactoring
- error message updates
- error handling improvements
- return type and parameter cleanups and improvements"
* tag 'for-6.11-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (152 commits)
btrfs: fix extent map use-after-free when adding pages to compressed bio
btrfs: fix bitmap leak when loading free space cache on duplicate entry
btrfs: remove the BUG_ON() inside extent_range_clear_dirty_for_io()
btrfs: move extent_range_clear_dirty_for_io() into inode.c
btrfs: enhance compression error messages
btrfs: fix data race when accessing the last_trans field of a root
btrfs: rename the extra_gfp parameter of btrfs_alloc_page_array()
btrfs: remove the extra_gfp parameter from btrfs_alloc_folio_array()
btrfs: introduce new "rescue=ignoresuperflags" mount option
btrfs: introduce new "rescue=ignoremetacsums" mount option
btrfs: output the unrecognized super block flags as hex
btrfs: remove unused Opt enums
btrfs: tree-checker: add extra ram_bytes and disk_num_bytes check
btrfs: fix the ram_bytes assignment for truncated ordered extents
btrfs: make validate_extent_map() catch ram_bytes mismatch
btrfs: ignore incorrect btrfs_file_extent_item::ram_bytes
btrfs: cleanup the bytenr usage inside btrfs_extent_item_to_extent_map()
btrfs: fix typo in error message in btrfs_validate_super()
btrfs: move the direct IO code into its own file
btrfs: pass a btrfs_inode to btrfs_set_prop()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
- New flag DLM_LSFL_SOFTIRQ_SAFE can be set by code using dlm to
indicate callbacks can be run from softirq
- Change md-cluster to set DLM_LSFL_SOFTIRQ_SAFE
- Clean up for previous changes, e.g. unused code and parameters
- Remove custom pre-allocation of rsb structs which is unnecessary with
kmem caches
- Change idr to xarray for lkb structs in use
- Change idr to xarray for rsb structs being recovered
- Change outdated naming related to internal rsb states
- Fix some incorrect add/remove of rsb on scan list
- Use rcu to free rsb structs
* tag 'dlm-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
dlm: add rcu_barrier before destroy kmem cache
dlm: remove DLM_LSFL_SOFTIRQ from exflags
fs: dlm: remove unused struct 'dlm_processed_nodes'
md-cluster: use DLM_LSFL_SOFTIRQ for dlm_new_lockspace()
dlm: implement LSFL_SOFTIRQ_SAFE
dlm: introduce DLM_LSFL_SOFTIRQ_SAFE
dlm: use LSFL_FS to check for kernel lockspace
dlm: use rcu to avoid an extra rsb struct lookup
dlm: fix add_scan and del_scan usage
dlm: change list and timer names
dlm: move recover idr to xarray datastructure
dlm: move lkb idr to xarray datastructure
dlm: drop own rsb pre allocation mechanism
dlm: remove ls_local_handle from struct dlm_ls
dlm: remove unused parameter in dlm_midcomms_addr
dlm: don't kref_init rsbs created for toss list
dlm: remove scand leftovers
|
|
Pull nfsd updates from Chuck Lever:
"This is a light release containing optimizations, code clean-ups, and
minor bug fixes.
This development cycle focused on work outside of upstream kernel
development:
- Continuing to build upstream CI for NFSD based on kdevops
- Continuing to focus on the quality of NFSD in LTS kernels
- Participation in IETF nfsv4 WG discussions about NFSv4 ACLs,
directory delegation, and NFSv4.2 COPY offload
Notable features for v6.11 that do not come through the NFSD tree
include NFS server-side support for the new pNFS NVMe layout type
[RFC9561]. Functional testing for pNFS block layouts like this one has
been introduced to our kdevops CI harness. Work on improving the
resolution of file attribute time stamps in local filesystems is also
ongoing tree-wide.
As always I am grateful to NFSD contributors, reviewers, testers, and
bug reporters who participated during this cycle"
* tag 'nfsd-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: nfsd_file_lease_notifier_call gets a file_lease as an argument
gss_krb5: Fix the error handling path for crypto_sync_skcipher_setkey
MAINTAINERS: Add a bugzilla link for NFSD
nfsd: new netlink ops to get/set server pool_mode
sunrpc: refactor pool_mode setting code
nfsd: allow passing in array of thread counts via netlink
nfsd: make nfsd_svc take an array of thread counts
sunrpc: fix up the special handling of sv_nrpools == 1
SUNRPC: Add a trace point in svc_xprt_deferred_close
NFSD: Support write delegations in LAYOUTGET
lockd: Use *-y instead of *-objs in Makefile
NFSD: Fix nfsdcld warning
svcrdma: Handle ADDR_CHANGE CM event properly
svcrdma: Refactor the creation of listener CMA ID
NFSD: remove unused structs 'nfsd3_voidargs'
NFSD: harden svcxdr_dupstr() and svcxdr_tmpalloc() against integer overflows
|
|
When requesting an attestation report a guest is able to specify whether
it wants SNP firmware to sign the report using either a Versioned Chip
Endorsement Key (VCEK), which is derived from chip-unique secrets, or a
Versioned Loaded Endorsement Key (VLEK) which is obtained from an AMD
Key Derivation Service (KDS) and derived from seeds allocated to
enrolled cloud service providers (CSPs).
For VLEK keys, an SNP_VLEK_LOAD SNP firmware command is used to load
them into the system after obtaining them from the KDS. Add a
corresponding userspace interface so to allow the loading of VLEK keys
into the system.
See SEV-SNP Firmware ABI 1.54, SNP_VLEK_LOAD for more details.
Reviewed-by: Tom Lendacky <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Not much excitement - a handful of large patchsets (devmem among them)
did not make it in time.
Core & protocols:
- Use local_lock in addition to local_bh_disable() to protect per-CPU
resources in networking, a step closer for local_bh_disable() not
to act as a big lock on PREEMPT_RT
- Use flex array for netdevice priv area, ensure its cache alignment
- Add a sysctl knob to allow user to specify a default rto_min at
socket init time. Bit of a big hammer but multiple companies were
independently carrying such patch downstream so clearly it's useful
- Support scheduling transmission of packets based on CLOCK_TAI
- Un-pin TCP TIMEWAIT timer to avoid it firing on CPUs later cordoned
off using cpusets
- Support multiple L2TPv3 UDP tunnels using the same 5-tuple address
- Allow configuration of multipath hash seed, to both allow
synchronizing hashing of two routers, and preventing partial
accidental sync
- Improve TCP compliance with RFC 9293 for simultaneous connect()
- Support sending NAT keepalives in IPsec ESP in UDP states.
Userspace IKE daemon had to do this before, but the kernel can
better keep track of it
- Support sending supervision HSR frames with MAC addresses stored in
ProxyNodeTable when RedBox (i.e. HSR-SAN) is enabled
- Introduce IPPROTO_SMC for selecting SMC when socket is created
- Allow UDP GSO transmit from devices with no checksum offload
- openvswitch: add packet sampling via psample, separating the
sampled traffic from "upcall" packets sent to user space for
forwarding
- nf_tables: shrink memory consumption for transaction objects
Things we sprinkled into general kernel code:
- Power Sequencing subsystem (used by Qualcomm Bluetooth driver for
QCA6390) [ Already merged separately - Linus ]
- Add IRQ information in sysfs for auxiliary bus
- Introduce guard definition for local_lock
- Add aligned flavor of __cacheline_group_{begin, end}() markings for
grouping fields in structures
BPF:
- Notify user space (via epoll) when a struct_ops object is getting
detached/unregistered
- Add new kfuncs for a generic, open-coded bits iterator
- Enable BPF programs to declare arrays of kptr, bpf_rb_root, and
bpf_list_head
- Support resilient split BTF which cuts down on duplication and
makes BTF as compact as possible WRT BTF from modules
- Add support for dumping kfunc prototypes from BTF which enables
both detecting as well as dumping compilable prototypes for kfuncs
- riscv64 BPF JIT improvements in particular to add 12-argument
support for BPF trampolines and to utilize bpf_prog_pack for the
latter
- Add the capability to offload the netfilter flowtable in XDP layer
through kfuncs
Driver API:
- Allow users to configure IRQ tresholds between which automatic IRQ
moderation can choose
- Expand Power Sourcing (PoE) status with power, class and failure
reason. Support setting power limits
- Track additional RSS contexts in the core, make sure configuration
changes don't break them
- Support IPsec crypto offload for IPv6 ESP and IPv4 UDP-encapsulated
ESP data paths
- Support updating firmware on SFP modules
Tests and tooling:
- mptcp: use net/lib.sh to manage netns
- TCP-AO and TCP-MD5: replace debug prints used by tests with
tracepoints
- openvswitch: make test self-contained (don't depend on OvS CLI
tools)
Drivers:
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- increase the max total outstanding PTP TX packets to 4
- add timestamping statistics support
- implement netdev_queue_mgmt_ops
- support new RSS context API
- Intel (100G, ice, idpf):
- implement FEC statistics and dumping signal quality indicators
- support E825C products (with 56Gbps PHYs)
- nVidia/Mellanox:
- support HW-GRO
- mlx4/mlx5: support per-queue statistics via netlink
- obey the max number of EQs setting in sub-functions
- AMD/Solarflare:
- support new RSS context API
- AMD/Pensando:
- ionic: rework fix for doorbell miss to lower overhead and
skip it on new HW
- Wangxun:
- txgbe: support Flow Director perfect filters
- Ethernet NICs consumer, embedded and virtual:
- Add driver for Tehuti Networks TN40xx chips
- Add driver for Meta's internal NIC chips
- Add driver for Ethernet MAC on Airoha EN7581 SoCs
- Add driver for Renesas Ethernet-TSN devices
- Google cloud vNIC:
- flow steering support
- Microsoft vNIC:
- support page sizes other than 4KB on ARM64
- vmware vNIC:
- support latency measurement (update to version 9)
- VirtIO net:
- support for Byte Queue Limits
- support configuring thresholds for automatic IRQ moderation
- support for AF_XDP Rx zero-copy
- Synopsys (stmmac):
- support for STM32MP13 SoC
- let platforms select the right PCS implementation
- TI:
- icssg-prueth: add multicast filtering support
- icssg-prueth: enable PTP timestamping and PPS
- Renesas:
- ravb: improve Rx performance 30-400% by using page pool,
theaded NAPI and timer-based IRQ coalescing
- ravb: add MII support for R-Car V4M
- Cadence (macb):
- macb: add ARP support to Wake-On-LAN
- Cortina:
- use phylib for RX and TX pause configuration
- Ethernet switches:
- nVidia/Mellanox:
- support configuration of multipath hash seed
- report more accurate max MTU
- use page_pool to improve Rx performance
- MediaTek:
- mt7530: add support for bridge port isolation
- Qualcomm:
- qca8k: add support for bridge port isolation
- Microchip:
- lan9371/2: add 100BaseTX PHY support
- NXP:
- vsc73xx: implement VLAN operations
- Ethernet PHYs:
- aquantia: enable support for aqr115c
- aquantia: add support for PHY LEDs
- realtek: add support for rtl8224 2.5Gbps PHY
- xpcs: add memory-mapped device support
- add BroadR-Reach link mode and support in Broadcom's PHY driver
- CAN:
- add document for ISO 15765-2 protocol support
- mcp251xfd: workaround for erratum DS80000789E, use timestamps to
catch when device returns incorrect FIFO status
- WiFi:
- mac80211/cfg80211:
- parse Transmit Power Envelope (TPE) data in mac80211 instead
of in drivers
- improvements for 6 GHz regulatory flexibility
- multi-link improvements
- support multiple radios per wiphy
- remove DEAUTH_NEED_MGD_TX_PREP flag
- Intel (iwlwifi):
- bump FW API to 91 for BZ/SC devices
- report 64-bit radiotap timestamp
- enable P2P low latency by default
- handle Transmit Power Envelope (TPE) advertised by AP
- remove support for older FW for new devices
- fast resume (keeping the device configured)
- mvm: re-enable Multi-Link Operation (MLO)
- aggregation (A-MSDU) optimizations
- MediaTek (mt76):
- mt7925 Multi-Link Operation (MLO) support
- Qualcomm (ath10k):
- LED support for various chipsets
- Qualcomm (ath12k):
- remove unsupported Tx monitor handling
- support channel 2 in 6 GHz band
- support Spatial Multiplexing Power Save (SMPS) in 6 GHz band
- supprt multiple BSSID (MBSSID) and Enhanced Multi-BSSID
Advertisements (EMA)
- support dynamic VLAN
- add panic handler for resetting the firmware state
- DebugFS support for datapath statistics
- WCN7850: support for Wake on WLAN
- Microchip (wilc1000):
- read MAC address during probe to make it visible to user space
- suspend/resume improvements
- TI (wl18xx):
- support newer firmware versions
- RealTek (rtw89):
- preparation for RTL8852BE-VT support
- Wake on WLAN support for WiFi 6 chips
- 36-bit PCI DMA support
- RealTek (rtlwifi):
- RTL8192DU support
- Broadcom (brcmfmac):
- Management Frame Protection support (to enable WPA3)
- Bluetooth:
- qualcomm: use the power sequencer for QCA6390
- btusb: mediatek: add ISO data transmission functions
- hci_bcm4377: add BCM4388 support
- btintel: add support for BlazarU core
- btintel: add support for Whale Peak2
- btnxpuart: add support for AW693 A1 chipset
- btnxpuart: add support for IW615 chipset
- btusb: add Realtek RTL8852BE support ID 0x13d3:0x3591"
* tag 'net-next-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1589 commits)
eth: fbnic: Fix spelling mistake "tiggerring" -> "triggering"
tcp: Replace strncpy() with strscpy()
wifi: ath12k: fix build vs old compiler
tcp: Don't access uninit tcp_rsk(req)->ao_keyid in tcp_create_openreq_child().
eth: fbnic: Write the TCAM tables used for RSS control and Rx to host
eth: fbnic: Add L2 address programming
eth: fbnic: Add basic Rx handling
eth: fbnic: Add basic Tx handling
eth: fbnic: Add link detection
eth: fbnic: Add initial messaging to notify FW of our presence
eth: fbnic: Implement Rx queue alloc/start/stop/free
eth: fbnic: Implement Tx queue alloc/start/stop/free
eth: fbnic: Allocate a netdevice and napi vectors with queues
eth: fbnic: Add FW communication mechanism
eth: fbnic: Add message parsing for FW messages
eth: fbnic: Add register init to set PCIe/Ethernet device config
eth: fbnic: Allocate core device specific structures and devlink interface
eth: fbnic: Add scaffolding for Meta's NIC driver
PCI: Add Meta Platforms vendor ID
net/sched: cls_flower: propagate tca[TCA_OPTIONS] to NL_REQ_ATTR_CHECK
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull performance events updates from Ingo Molnar:
- Intel PT support enhancements & fixes
- Fix leaked SIGTRAP events
- Improve and fix the Intel uncore driver
- Add support for Intel HBM and CXL uncore counters
- Add Intel Lake and Arrow Lake support
- AMD uncore driver fixes
- Make SIGTRAP and __perf_pending_irq() work on RT
- Micro-optimizations
- Misc cleanups and fixes
* tag 'perf-core-2024-07-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
perf/x86/intel: Add a distinct name for Granite Rapids
perf/x86/intel/ds: Fix non 0 retire latency on Raptorlake
perf/x86/intel: Hide Topdown metrics events if the feature is not enumerated
perf/x86/intel/uncore: Fix the bits of the CHA extended umask for SPR
perf: Split __perf_pending_irq() out of perf_pending_irq()
perf: Don't disable preemption in perf_pending_task().
perf: Move swevent_htable::recursion into task_struct.
perf: Shrink the size of the recursion counter.
perf: Enqueue SIGTRAP always via task_work.
task_work: Add TWA_NMI_CURRENT as an additional notify mode.
perf: Move irq_work_queue() where the event is prepared.
perf: Fix event leak upon exec and file release
perf: Fix event leak upon exit
task_work: Introduce task_work_cancel() again
task_work: s/task_work_cancel()/task_work_cancel_func()/
perf/x86/amd/uncore: Fix DF and UMC domain identification
perf/x86/amd/uncore: Avoid PMU registration if counters are unavailable
perf/x86/intel: Support Perfmon MSRs aliasing
perf/x86/intel: Support PERFEVTSEL extension
perf/x86: Add config_mask to represent EVENTSEL bitmask
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"Most of this is part of my ongoing work to clean up the system call
tables. In this bit, all of the newer architectures are converted to
use the machine readable syscall.tbl format instead in place of
complex macros in include/uapi/asm-generic/unistd.h.
This follows an earlier series that fixed various API mismatches and
in turn is used as the base for planned simplifications.
The other two patches are dead code removal and a warning fix"
* tag 'asm-generic-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
vmlinux.lds.h: catch .bss..L* sections into BSS")
fixmap: Remove unused set_fixmap_offset_io()
riscv: convert to generic syscall table
openrisc: convert to generic syscall table
nios2: convert to generic syscall table
loongarch: convert to generic syscall table
hexagon: use new system call table
csky: convert to generic syscall table
arm64: rework compat syscall macros
arm64: generate 64-bit syscall.tbl
arm64: convert unistd_32.h to syscall.tbl format
arc: convert to generic syscall table
clone3: drop __ARCH_WANT_SYS_CLONE3 macro
kbuild: add syscall table generation to scripts/Makefile.asm-headers
kbuild: verify asm-generic header list
loongarch: avoid generating extra header files
um: don't generate asm/bpf_perf_event.h
csky: drop asm/gpio.h wrapper
syscalls: add generic scripts/syscall.tbl
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k
Pull m68k updates from Geert Uytterhoeven:
- Fix bootup lock-ups on Warp1260, Atari TT, and MegaSTe
- Miscellaneous fixes and improvements
- defconfig updates
* tag 'm68k-for-v6.11-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k: cmpxchg: Fix return value for default case in __arch_xchg()
m68k: defconfig: Update defconfigs for v6.10-rc1
m68k: atari: Fix TT bootup freeze / unexpected (SCU) interrupt messages
zorro: Use str_plural() in amiga_zorro_probe()
m68k: emu: Add missing MODULE_DESCRIPTION() macros
m68k: amiga: Turn off Warp1260 interrupts during boot
|
|
The GHCB 2.0 specification defines 2 GHCB request types to allow SNP guests
to send encrypted messages/requests to firmware: SNP Guest Requests and SNP
Extended Guest Requests. These encrypted messages are used for things like
servicing attestation requests issued by the guest. Implementing support for
these is required to be fully GHCB-compliant.
For the most part, KVM only needs to handle forwarding these requests to
firmware (to be issued via the SNP_GUEST_REQUEST firmware command defined
in the SEV-SNP Firmware ABI), and then forwarding the encrypted response to
the guest.
However, in the case of SNP Extended Guest Requests, the host is also
able to provide the certificate data corresponding to the endorsement key
used by firmware to sign attestation report requests. This certificate data
is provided by userspace because:
1) It allows for different keys/key types to be used for each particular
guest with requiring any sort of KVM API to configure the certificate
table in advance on a per-guest basis.
2) It provides additional flexibility with how attestation requests might
be handled during live migration where the certificate data for
source/dest might be different.
3) It allows all synchronization between certificates and firmware/signing
key updates to be handled purely by userspace rather than requiring
some in-kernel mechanism to facilitate it. [1]
To support fetching certificate data from userspace, a new KVM exit type will
be needed to handle fetching the certificate from userspace. An attempt to
define a new KVM_EXIT_COCO/KVM_EXIT_COCO_REQ_CERTS exit type to handle this
was introduced in v1 of this patchset, but is still being discussed by
community, so for now this patchset only implements a stub version of SNP
Extended Guest Requests that does not provide certificate data, but is still
enough to provide compliance with the GHCB 2.0 spec.
|
|
Version 2 of GHCB specification added support for the SNP Guest Request
Message NAE event. The event allows for an SEV-SNP guest to make
requests to the SEV-SNP firmware through the hypervisor using the
SNP_GUEST_REQUEST API defined in the SEV-SNP firmware specification.
This is used by guests primarily to request attestation reports from
firmware. There are other request types are available as well, but the
specifics of what guest requests are being made generally does not
affect how they are handled by the hypervisor, which only serves as a
proxy for the guest requests and firmware responses.
Implement handling for these events.
When an SNP Guest Request is issued, the guest will provide its own
request/response pages, which could in theory be passed along directly
to firmware. However, these pages would need special care:
- Both pages are from shared guest memory, so they need to be
protected from migration/etc. occurring while firmware reads/writes
to them. At a minimum, this requires elevating the ref counts and
potentially needing an explicit pinning of the memory. This places
additional restrictions on what type of memory backends userspace
can use for shared guest memory since there would be some reliance
on using refcounted pages.
- The response page needs to be switched to Firmware-owned state
before the firmware can write to it, which can lead to potential
host RMP #PFs if the guest is misbehaved and hands the host a
guest page that KVM is writing to for other reasons (e.g. virtio
buffers).
Both of these issues can be avoided completely by using
separately-allocated bounce pages for both the request/response pages
and passing those to firmware instead. So that's the approach taken
here.
Signed-off-by: Brijesh Singh <[email protected]>
Co-developed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Co-developed-by: Ashish Kalra <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Reviewed-by: Tom Lendacky <[email protected]>
Reviewed-by: Liam Merwick <[email protected]>
[mdr: ensure FW command failures are indicated to guest, drop extended
request handling to be re-written as separate patch, massage commit]
Signed-off-by: Michael Roth <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
The explanation for @handled_access_fs and @handled_access_net has
significant overlap and is better explained together.
* Explain the commonalities in structure-level documentation.
* Clarify some wording and break up longer sentences.
* Put emphasis on the word "handled" to make it clearer that "handled"
is a term with special meaning in the context of Landlock.
I'd like to transfer this wording into the man pages as well.
Signed-off-by: Günther Noack <[email protected]>
Cc: Alejandro Colomar <[email protected]>
Cc: Konstantin Meskhidze <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
[mic: Format commit message]
Signed-off-by: Mickaël Salaün <[email protected]>
|
|
KVM x86 misc changes for 6.11
- Add a global struct to consolidate tracking of host values, e.g. EFER, and
move "shadow_phys_bits" into the structure as "maxphyaddr".
- Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC
bus frequency, because TDX.
- Print the name of the APICv/AVIC inhibits in the relevant tracepoint.
- Clean up KVM's handling of vendor specific emulation to consistently act on
"compatible with Intel/AMD", versus checking for a specific vendor.
- Misc cleanups
|
|
KVM generic changes for 6.11
- Enable halt poll shrinking by default, as Intel found it to be a clear win.
- Setup empty IRQ routing when creating a VM to avoid having to synchronize
SRCU when creating a split IRQCHIP on x86.
- Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag
that arch code can use for hooking both sched_in() and sched_out().
- Take the vCPU @id as an "unsigned long" instead of "u32" to avoid
truncating a bogus value from userspace, e.g. to help userspace detect bugs.
- Mark a vCPU as preempted if and only if it's scheduled out while in the
KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest
memory when retrieving guest state during live migration blackout.
- A few minor cleanups
|
|
Pull block updates from Jens Axboe:
- NVMe updates via Keith:
- Device initialization memory leak fixes (Keith)
- More constants defined (Weiwen)
- Target debugfs support (Hannes)
- PCIe subsystem reset enhancements (Keith)
- Queue-depth multipath policy (Redhat and PureStorage)
- Implement get_unique_id (Christoph)
- Authentication error fixes (Gaosheng)
- MD updates via Song
- sync_action fix and refactoring (Yu Kuai)
- Various small fixes (Christoph Hellwig, Li Nan, and Ofir Gal, Yu
Kuai, Benjamin Marzinski, Christophe JAILLET, Yang Li)
- Fix loop detach/open race (Gulam)
- Fix lower control limit for blk-throttle (Yu)
- Add module descriptions to various drivers (Jeff)
- Add support for atomic writes for block devices, and statx reporting
for same. Includes SCSI and NVMe (John, Prasad, Alan)
- Add IO priority information to block trace points (Dongliang)
- Various zone improvements and tweaks (Damien)
- mq-deadline tag reservation improvements (Bart)
- Ignore direct reclaim swap writes in writeback throttling (Baokun)
- Block integrity improvements and fixes (Anuj)
- Add basic support for rust based block drivers. Has a dummy null_blk
variant for now (Andreas)
- Series converting driver settings to queue limits, and cleanups and
fixes related to that (Christoph)
- Cleanup for poking too deeply into the bvec internals, in preparation
for DMA mapping API changes (Christoph)
- Various minor tweaks and fixes (Jiapeng, John, Kanchan, Mikulas,
Ming, Zhu, Damien, Christophe, Chaitanya)
* tag 'for-6.11/block-20240710' of git://git.kernel.dk/linux: (206 commits)
floppy: add missing MODULE_DESCRIPTION() macro
loop: add missing MODULE_DESCRIPTION() macro
ublk_drv: add missing MODULE_DESCRIPTION() macro
xen/blkback: add missing MODULE_DESCRIPTION() macro
block/rnbd: Constify struct kobj_type
block: take offset into account in blk_bvec_map_sg again
block: fix get_max_segment_size() warning
loop: Don't bother validating blocksize
virtio_blk: Don't bother validating blocksize
null_blk: Don't bother validating blocksize
block: Validate logical block size in blk_validate_limits()
virtio_blk: Fix default logical block size fallback
nvmet-auth: fix nvmet_auth hash error handling
nvme: implement ->get_unique_id
block: pass a phys_addr_t to get_max_segment_size
block: add a bvec_phys helper
blk-lib: check for kill signal in ioctl BLKZEROOUT
block: limit the Write Zeroes to manually writing zeroes fallback
block: refacto blkdev_issue_zeroout
block: move read-only and supported checks into (__)blkdev_issue_zeroout
...
|
|
Pull io_uring updates from Jens Axboe:
"Here are the io_uring updates queued up for 6.11.
Nothing major this time around, various minor improvements and
cleanups/fixes. This contains:
- Add bind/listen opcodes. Main motivation is to support direct
descriptors, to avoid needing a regular fd just for doing these two
operations (Gabriel)
- Probe fixes (Gabriel)
- Treat io-wq work flags as atomics. Not fixing a real issue, but may
as well and it silences a KCSAN warning (me)
- Cleanup of rsrc __set_current_state() usage (me)
- Add 64-bit for {m,f}advise operations (me)
- Improve performance of data ring messages (me)
- Fix for ring message overflow posting (Pavel)
- Fix for freezer interaction with TWA_NOTIFY_SIGNAL. Not strictly an
io_uring thing, but since TWA_NOTIFY_SIGNAL was originally added
for faster task_work signaling for io_uring, bundling it with this
pull (Pavel)
- Add Pavel as a co-maintainer
- Various cleanups (me, Thorsten)"
* tag 'for-6.11/io_uring-20240714' of git://git.kernel.dk/linux: (28 commits)
io_uring/net: check socket is valid in io_bind()/io_listen()
kernel: rerun task_work while freezing in get_signal()
io_uring/io-wq: limit retrying worker initialisation
io_uring/napi: Remove unnecessary s64 cast
io_uring/net: cleanup io_recv_finish() bundle handling
io_uring/msg_ring: fix overflow posting
MAINTAINERS: change Pavel Begunkov from io_uring reviewer to maintainer
io_uring/msg_ring: use kmem_cache_free() to free request
io_uring/msg_ring: check for dead submitter task
io_uring/msg_ring: add an alloc cache for io_kiocb entries
io_uring/msg_ring: improve handling of target CQE posting
io_uring: add io_add_aux_cqe() helper
io_uring: add remote task_work execution helper
io_uring/msg_ring: tighten requirement for remote posting
io_uring: Allocate only necessary memory in io_probe
io_uring: Fix probe of disabled operations
io_uring: Introduce IORING_OP_LISTEN
io_uring: Introduce IORING_OP_BIND
net: Split a __sys_listen helper for io_uring
net: Split a __sys_bind helper for io_uring
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull pidfs updates from Christian Brauner:
"This contains work to make it possible to derive namespace file
descriptors from pidfd file descriptors.
Right now it is already possible to use a pidfd with setns() to
atomically change multiple namespaces at the same time. In other
words, it is possible to switch to the namespace context of a process
using a pidfd. There is no need to first open namespace file
descriptors via procfs.
The work included here is an extension of these abilities by allowing
to open namespace file descriptors using a pidfd. This means it is now
possible to interact with namespaces without ever touching procfs.
To this end a new set of ioctls() on pidfds is introduced covering all
supported namespace types"
* tag 'vfs-6.11.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
pidfs: allow retrieval of namespace file descriptors
nsfs: add open_namespace()
nsproxy: add helper to go from arbitrary namespace to ns_common
nsproxy: add a cleanup helper for nsproxy
file: add take_fd() cleanup helper
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull namespace-fs updates from Christian Brauner:
"This adds ioctls allowing to translate PIDs between PID namespaces.
The motivating use-case comes from LXCFS which is a tiny fuse
filesystem used to virtualize various aspects of procfs. LXCFS is run
on the host. The files and directories it creates can be bind-mounted
by e.g. a container at startup and mounted over the various procfs
files the container wishes to have virtualized.
When e.g. a read request for uptime is received, LXCFS will receive
the pid of the reader. In order to virtualize the corresponding read,
LXCFS needs to know the pid of the init process of the reader's pid
namespace.
In order to do this, LXCFS first needs to fork() two helper processes.
The first helper process setns() to the readers pid namespace. The
second helper process is needed to create a process that is a proper
member of the pid namespace.
The second helper process then creates a ucred message with ucred.pid
set to 1 and sends it back to LXCFS. The kernel will translate the
ucred.pid field to the corresponding pid number in LXCFS's pid
namespace. This way LXCFS can learn the init pid number of the
reader's pid namespace and can go on to virtualize.
Since these two forks() are costly LXCFS maintains an init pid cache
that caches a given pid for a fixed amount of time. The cache is
pruned during new read requests. However, even with the cache the hit
of the two forks() is singificant when a very large number of
containers are running.
So this adds a simple set of ioctls that let's a caller translate PIDs
from and into a given PID namespace. This significantly improves
performance with a very simple change.
To protect against races pidfds can be used to check whether the
process is still valid"
* tag 'vfs-6.11.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
nsfs: add pid translation ioctls
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount query updates from Christian Brauner:
"This contains work to extend the abilities of listmount() and
statmount() and various fixes and cleanups.
Features:
- Allow iterating through mounts via listmount() from newest to
oldest. This makes it possible for mount(8) to keep iterating the
mount table in reverse order so it gets newest mounts first.
- Relax permissions on listmount() and statmount().
It's not necessary to have capabilities in the initial namespace:
it is sufficient to have capabilities in the owning namespace of
the mount namespace we're located in to list unreachable mounts in
that namespace.
- Extend both listmount() and statmount() to list and stat mounts in
foreign mount namespaces.
Currently the only way to iterate over mount entries in mount
namespaces that aren't in the caller's mount namespace is by
crawling through /proc in order to find /proc/<pid>/mountinfo for
the relevant mount namespace.
This is both very clumsy and hugely inefficient. So extend struct
mnt_id_req with a new member that allows to specify the mount
namespace id of the mount namespace we want to look at.
Luckily internally we already have most of the infrastructure for
this so we just need to expose it to userspace. Give userspace a
way to retrieve the id of a mount namespace via statmount() and
through a new nsfs ioctl() on mount namespace file descriptor.
This comes with appropriate selftests.
- Expose mount options through statmount().
Currently if userspace wants to get mount options for a mount and
with statmount(), they still have to open /proc/<pid>/mountinfo to
parse mount options. Simply the information through statmount()
directly.
Afterwards it's possible to only rely on statmount() and
listmount() to retrieve all and more information than
/proc/<pid>/mountinfo provides.
This comes with appropriate selftests.
Fixes:
- Avoid copying to userspace under the namespace semaphore in
listmount.
Cleanups:
- Simplify the error handling in listmount by relying on our newly
added cleanup infrastructure.
- Refuse invalid mount ids early for both listmount and statmount"
* tag 'vfs-6.11.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: reject invalid last mount id early
fs: refuse mnt id requests with invalid ids early
fs: find rootfs mount of the mount namespace
fs: only copy to userspace on success in listmount()
sefltests: extend the statmount test for mount options
fs: use guard for namespace_sem in statmount()
fs: export mount options via statmount()
fs: rename show_mnt_opts -> show_vfsmnt_opts
selftests: add a test for the foreign mnt ns extensions
fs: add an ioctl to get the mnt ns id from nsfs
fs: Allow statmount() in foreign mount namespace
fs: Allow listmount() in foreign mount namespace
fs: export the mount ns id via statmount
fs: keep an index of current mount namespaces
fs: relax permissions for statmount()
listmount: allow listing in reverse order
fs: relax permissions for listmount()
fs: simplify error handling
fs: don't copy to userspace under namespace semaphore
path: add cleanup helper
|
|
Pull drm fixes from Dave Airlie:
"Oh I screwed up last week's fixes pull, and forgot to send..
Back to work, thanks to Sima for last week, not too many fixes as
expected getting close to release [ sic - Linus ], amdgpu and xe have
a couple each, and then some other misc ones.
amdgpu:
- PSR-SU fix
- Reseved VMID fix
xe:
- Use write-back caching mode for system memory on DGFX
- Do not leak object when finalizing hdcp gsc
bridge:
- adv7511 EDID irq fix
gma500:
- NULL mode fixes.
meson:
- fix resource leak"
* tag 'drm-fixes-2024-07-12' of https://gitlab.freedesktop.org/drm/kernel:
Revert "drm/amd/display: Reset freesync config before update new state"
drm/xe/display/xe_hdcp_gsc: Free arbiter on driver removal
drm/xe: Use write-back caching mode for system memory on DGFX
drm/amdgpu: reject gang submit on reserved VMIDs
drm/gma500: fix null pointer dereference in cdv_intel_lvds_get_modes
drm/gma500: fix null pointer dereference in psb_intel_lvds_get_modes
drm/meson: fix canvas release in bind function
drm/bridge: adv7511: Fix Intermittent EDID failures
|
|
This patch changes how TCA_FLOWER_KEY_ENC_FLAGS is used, so that
it is used with TCA_FLOWER_KEY_FLAGS_* flags, in the same way as
TCA_FLOWER_KEY_FLAGS is currently used.
Where TCA_FLOWER_KEY_FLAGS uses {key,mask}->control.flags, then
TCA_FLOWER_KEY_ENC_FLAGS now uses {key,mask}->enc_control.flags,
therefore {key,mask}->enc_flags is now unused.
As the generic fl_set_key_flags/fl_dump_key_flags() is used with
encap set to true, then fl_{set,dump}_key_enc_flags() is removed.
This breaks unreleased userspace API (net-next since 2024-06-04).
Signed-off-by: Asbjørn Sloth Tønnesen <[email protected]>
Tested-by: Davide Caratti <[email protected]>
Reviewed-by: Davide Caratti <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Define new TCA_FLOWER_KEY_FLAGS_* flags for use in struct
flow_dissector_key_control, covering the same flags as
currently exposed through TCA_FLOWER_KEY_ENC_FLAGS.
Put the new flags under FLOW_DIS_F_*. The idea is that we can
later, move the existing flags under FLOW_DIS_F_* as well.
The ynl flag names have been taken from the RFC iproute2 patch.
Signed-off-by: Asbjørn Sloth Tønnesen <[email protected]>
Reviewed-by: Donald Hunter <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Redefine the flower control flags as an enum, so they are
included in BTF info.
Make the kernel-side enum a more explicit superset of
TCA_FLOWER_KEY_FLAGS_*, new flags still need to be added to
both enums, but at least the bit position only has to be
defined once.
FLOW_DIS_ENCAPSULATION is never set for mask, so it can't be
exposed to userspace in an unsupported flags mask error message,
so it will be placed one bit position above the last uAPI flag.
Suggested-by: Alexander Lobakin <[email protected]>
Suggested-by: Jakub Kicinski <[email protected]>
Signed-off-by: Asbjørn Sloth Tønnesen <[email protected]>
Reviewed-by: Davide Caratti <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Updates for for v6.11
There are a lot of changes in here, though the big bulk of things is
cleanups and simplifications of various kinds which are internally
rather than externally visible. A good chunk of those are DT schema
conversions, but there's also a lot of changes in the code.
Highlights:
- Syncing of features between simple-audio-card and the two
audio-graph cards so there is no reason to stick with an older
driver.
- Support for specifying the order of operations for components within
cards to allow quirking for unusual systems.
- New support for Asahi Kasei AK4619, Cirrus Logic CS530x, Everest
Semiconductors ES8311, NXP i.MX95 and LPC32xx, Qualcomm LPASS v2.5
and WCD937x, Realtek RT1318 and RT1320 and Texas Instruments PCM5242.
|
|
Relying on position in the enum makes it subtly harder when doing merge
resolutions or backporting as it is easy to grab a patch and not notice it
is a uAPI change with a differently ordered enum. This may become a bigger
problem in next cycles when iommu_hwpt_invalidate_data_type and other
per-driver enums have patches flowing through different trees.
So lets start including constants for all the uAPI enums to make this
safer.
No functional change.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Kevin Tian <[email protected]>
Reviewed-by: Yi Liu <[email protected]>
Tested-by: Yi Liu <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Introduce a new link mode necessary for 10 MBit single-pair
connection in BroadR-Reach mode on bcm5481x PHY by Broadcom.
This new link mode, 10baseT1BRR, is known as 1BR10 in the Broadcom
terminology. Another link mode to be used is 1BR100 and it is already
present as 100baseT1, because Broadcom's 1BR100 became 100baseT1
(IEEE 802.3bw).
Signed-off-by: Kamil Horák (2N) <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2024-07-13
1) Support sending NAT keepalives in ESP in UDP states.
Userspace IKE daemon had to do this before, but the
kernel can better keep track of it.
From Eyal Birger.
2) Support IPsec crypto offload for IPv6 ESP and IPv4 UDP-encapsulated
ESP data paths. Currently, IPsec crypto offload is enabled for GRO
code path only. This patchset support UDP encapsulation for the non
GRO path. From Mike Yu.
* tag 'ipsec-next-2024-07-13' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: Support crypto offload for outbound IPv4 UDP-encapsulated ESP packet
xfrm: Support crypto offload for inbound IPv4 UDP-encapsulated ESP packet
xfrm: Allow UDP encapsulation in crypto offload control path
xfrm: Support crypto offload for inbound IPv6 ESP packets not in GRO path
xfrm: support sending NAT keepalives in ESP in UDP states
====================
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c
f7ce5eb2cb79 ("bnxt_en: Fix crash in bnxt_get_max_rss_ctx_ring()")
20c8ad72eb7f ("eth: bnxt: use the RSS context XArray instead of the local list")
Adjacent changes:
net/ethtool/ioctl.c
503757c80928 ("net: ethtool: Fix RSS setting")
eac9122f0c41 ("net: ethtool: record custom RSS contexts in the XArray")
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The need to get ELF build ID reliably is an important aspect when dealing
with profiling and stack trace symbolization, and /proc/<pid>/maps textual
representation doesn't help with this.
To get backing file's ELF build ID, application has to first resolve VMA,
then use it's start/end address range to follow a special
/proc/<pid>/map_files/<start>-<end> symlink to open the ELF file (this is
necessary because backing file might have been removed from the disk or
was already replaced with another binary in the same file path.
Such approach, beyond just adding complexity of having to do a bunch of
extra work, has extra security implications. Because application opens
underlying ELF file and needs read access to its entire contents (as far
as kernel is concerned), kernel puts additional capable() checks on
following /proc/<pid>/map_files/<start>-<end> symlink. And that makes
sense in general.
But in the case of build ID, profiler/symbolizer doesn't need the contents
of ELF file, per se. It's only build ID that is of interest, and ELF
build ID itself doesn't provide any sensitive information.
So this patch adds a way to request backing file's ELF build ID along the
rest of VMA information in the same API. User has control over whether
this piece of information is requested or not by either setting
build_id_size field to zero or non-zero maximum buffer size they provided
through build_id_addr field (which encodes user pointer as __u64 field).
This is a completely optional piece of information, and so has no
performance implications for user cases that don't care about build ID,
while improving performance and simplifying the setup for those
application that do need it.
Kernel already implements build ID fetching, which is used from BPF
subsystem. We are reusing this code here, but plan a follow up changes to
make it work better under more relaxed assumption (compared to what
existing code assumes) of being called from user process context, in which
page faults are allowed. BPF-specific implementation currently bails out
if necessary part of ELF file is not paged in, all due to extra
BPF-specific restrictions (like the need to fetch build ID in restrictive
contexts such as NMI handler).
[[email protected]: fix integer to pointer cast warning in do_procmap_query()]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: Liam R. Howlett <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
/proc/<pid>/maps file is extremely useful in practice for various tasks
involving figuring out process memory layout, what files are backing any
given memory range, etc. One important class of applications that
absolutely rely on this are profilers/stack symbolizers (perf tool being
one of them). Patterns of use differ, but they generally would fall into
two categories.
In on-demand pattern, a profiler/symbolizer would normally capture stack
trace containing absolute memory addresses of some functions, and would
then use /proc/<pid>/maps file to find corresponding backing ELF files
(normally, only executable VMAs are of interest), file offsets within
them, and then continue from there to get yet more information (ELF
symbols, DWARF information) to get human-readable symbolic information.
This pattern is used by Meta's fleet-wide profiler, as one example.
In preprocessing pattern, application doesn't know the set of addresses of
interest, so it has to fetch all relevant VMAs (again, probably only
executable ones), store or cache them, then proceed with profiling and
stack trace capture. Once done, it would do symbolization based on stored
VMA information. This can happen at much later point in time. This
patterns is used by perf tool, as an example.
In either case, there are both performance and correctness requirement
involved. This address to VMA information translation has to be done as
efficiently as possible, but also not miss any VMA (especially in the case
of loading/unloading shared libraries). In practice, correctness can't be
guaranteed (due to process dying before VMA data can be captured, or
shared library being unloaded, etc), but any effort to maximize the chance
of finding the VMA is appreciated.
Unfortunately, for all the /proc/<pid>/maps file universality and
usefulness, it doesn't fit the above use cases 100%.
First, it's main purpose is to emit all VMAs sequentially, but in practice
captured addresses would fall only into a smaller subset of all process'
VMAs, mainly containing executable text. Yet, library would need to parse
most or all of the contents to find needed VMAs, as there is no way to
skip VMAs that are of no use. Efficient library can do the linear pass
and it is still relatively efficient, but it's definitely an overhead that
can be avoided, if there was a way to do more targeted querying of the
relevant VMA information.
Second, it's a text based interface, which makes its programmatic use from
applications and libraries more cumbersome and inefficient due to the need
to handle text parsing to get necessary pieces of information. The
overhead is actually payed both by kernel, formatting originally binary
VMA data into text, and then by user space application, parsing it back
into binary data for further use.
For the on-demand pattern of usage, described above, another problem when
writing generic stack trace symbolization library is an unfortunate
performance-vs-correctness tradeoff that needs to be made. Library has to
make a decision to either cache parsed contents of /proc/<pid>/maps (after
initial processing) to service future requests (if application requests to
symbolize another set of addresses (for the same process), captured at
some later time, which is typical for periodic/continuous profiling cases)
to avoid higher costs of re-parsing this file. Or it has to choose to
cache the contents in memory to speed up future requests. In the former
case, more memory is used for the cache and there is a risk of getting
stale data if application loads or unloads shared libraries, or otherwise
changed its set of VMAs somehow, e.g., through additional mmap() calls.
In the latter case, it's the performance hit that comes from re-opening
the file and re-parsing its contents all over again.
This patch aims to solve this problem by providing a new API built on top
of /proc/<pid>/maps. It's meant to address both non-selectiveness and
text nature of /proc/<pid>/maps, by giving user more control of what sort
of VMA(s) needs to be queried, and being binary-based interface eliminates
the overhead of text formatting (on kernel side) and parsing (on user
space side).
It's also designed to be extensible and forward/backward compatible by
including required struct size field, which user has to provide. We use
established copy_struct_from_user() approach to handle extensibility.
User has a choice to pick either getting VMA that covers provided address
or -ENOENT if none is found (exact, least surprising, case). Or, with an
extra query flag (PROCMAP_QUERY_COVERING_OR_NEXT_VMA), they can get either
VMA that covers the address (if there is one), or the closest next VMA
(i.e., VMA with the smallest vm_start > addr). The latter allows more
efficient use, but, given it could be a surprising behavior, requires an
explicit opt-in.
There is another query flag that is useful for some use cases.
PROCMAP_QUERY_FILE_BACKED_VMA instructs this API to only return
file-backed VMAs. Combining this with PROCMAP_QUERY_COVERING_OR_NEXT_VMA
makes it possible to efficiently iterate only file-backed VMAs of the
process, which is what profilers/symbolizers are normally interested in.
All the above querying flags can be combined with (also optional) set of
desired VMA permissions flags. This allows to, for example, iterate only
an executable subset of VMAs, which is what preprocessing pattern, used by
perf tool, would benefit from, as the assumption is that captured stack
traces would have addresses of executable code. This saves time by
skipping non-executable VMAs altogether efficienty.
All these querying flags (modifiers) are orthogonal and can be combined in
a semantically meaningful and natural way.
Basing this ioctl()-based API on top of /proc/<pid>/maps's FD makes sense
given it's querying the same set of VMA data. It's also benefitial
because permission checks for /proc/<pid>/maps is performed at open time
once, and the actual data read of text contents of /proc/<pid>/maps is
done without further permission checks. We piggyback on this pattern with
ioctl()-based API as well, as that's a desired property. Both for
performance reasons, but also for security and flexibility reasons.
Allowing application to open an FD for /proc/self/maps without any extra
capabilities, and then passing it to some sort of profiling agent through
Unix-domain socket, would allow such profiling agent to not require some
of the capabilities that are otherwise expected when opening
/proc/<pid>/maps file for *another* process. This is a desirable property
for some more restricted setups.
This new ioctl-based implementation doesn't interfere with seq_file-based
implementation of /proc/<pid>/maps textual interface, and so could be used
together or independently without paying any price for that.
Note also, that fetching VMA name (e.g., backing file path, or special
hard-coded or user-provided names) is optional just like build ID. If
user sets vma_name_size to zero, kernel code won't attempt to retrieve it,
saving resources.
Earlier versions of this patch set were adding per-VMA locking, which is
why we have a code structure that is ready for abstracting mmap_lock vs
vm_lock differences (query_vma_setup(), query_vma_teardown(), and
query_vma_find_by_addr()), but given anon_vma_name() is not yet compatible
with per-VMA locking, initial implementation sticks to using only
mmap_lock for now. It will be easy to add back per-VMA locking once all
the pieces are ready later on. Which is why we keep existing code
structure with setup/teardown/query helper functions.
[[email protected]: improve PROCMAP_QUERY's compat mode handling]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: Liam R. Howlett <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc driver fixes from Greg KH:
"Here are some small remaining driver fixes for 6.10-final that have
all been in linux-next for a while and resolve reported issues.
Included in here are:
- mei driver fixes (and a spelling fix at the end just to be clean)
- iio driver fixes for reported problems
- fastrpc bugfixes
- nvmem small fixes"
* tag 'char-misc-6.10-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
mei: vsc: Fix spelling error
mei: vsc: Enhance SPI transfer of IVSC ROM
mei: vsc: Utilize the appropriate byte order swap function
mei: vsc: Prevent timeout error with added delay post-firmware download
mei: vsc: Enhance IVSC chipset stability during warm reboot
nvmem: core: limit cell sysfs permissions to main attribute ones
nvmem: core: only change name to fram for current attribute
nvmem: meson-efuse: Fix return value of nvmem callbacks
nvmem: rmem: Fix return value of rmem_read()
misc: microchip: pci1xxxx: Fix return value of nvmem callbacks
hpet: Support 32-bit userspace
misc: fastrpc: Restrict untrusted app to attach to privileged PD
misc: fastrpc: Fix ownership reassignment of remote heap
misc: fastrpc: Fix memory leak in audio daemon attach operation
misc: fastrpc: Avoid updating PD type for capability request
misc: fastrpc: Copy the complete capability structure to user
misc: fastrpc: Fix DSP capabilities request
iio: light: apds9306: Fix error handing
iio: trigger: Fix condition for own trigger
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.11
1. Add ParaVirt steal time support.
2. Add some VM migration enhancement.
3. Add perf kvm-stat support for loongarch.
|
|
Add a new ioctl KVM_PRE_FAULT_MEMORY in the KVM common code. It iterates on the
memory range and calls the arch-specific function. The implementation is
optional and enabled by a Kconfig symbol.
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
Reviewed-by: Rick Edgecombe <[email protected]>
Message-ID: <819322b8f25971f2b9933bfa4506e618508ad782.1712785629.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Complete the pisp_be_config strcture by adding fields that even if not
written to the HW are relevant to complete the uAPI and put it in par
with the BSP driver.
Fixes: c6c49bac8770 ("media: uapi: Add Raspberry Pi PiSP Back End uAPI")
Signed-off-by: Jacopo Mondi <[email protected]>
Signed-off-by: Sakari Ailus <[email protected]>
Signed-off-by: Hans Verkuil <[email protected]>
|
|
The order of the members of pisp_be_tiles_config is relevant
as the driver logic assumes 'config' to be at offset 0.
Re-sort the member to match the driver's expectations.
Fixes: c6c49bac8770 ("media: uapi: Add Raspberry Pi PiSP Back End uAPI")
Signed-off-by: Jacopo Mondi <[email protected]>
Acked-by: Naushir Patuck <[email protected]>
Signed-off-by: Sakari Ailus <[email protected]>
Signed-off-by: Hans Verkuil <[email protected]>
|
|
The macro used to inspect an image format characteristic use a mixture
of capitalized and non-capitalized letters, which is rather unusual for
the Linux kernel style.
Capitalize all identifiers.
Fixes: c6c49bac8770 ("media: uapi: Add Raspberry Pi PiSP Back End uAPI")
Signed-off-by: Jacopo Mondi <[email protected]>
Signed-off-by: Sakari Ailus <[email protected]>
Signed-off-by: Hans Verkuil <[email protected]>
|
|
Add definition and test for 32-bits image formats to the pisp_common.h
uAPI header.
Fixes: c6c49bac8770 ("media: uapi: Add Raspberry Pi PiSP Back End uAPI")
Signed-off-by: Jacopo Mondi <[email protected]>
Acked-by: David Plowman <[email protected]>
Signed-off-by: Sakari Ailus <[email protected]>
Signed-off-by: Hans Verkuil <[email protected]>
|
|
The pisp_be_config.h uAPI header file contains a bit-field definition
that uses the BIT() helper macro.
As the BIT() identifier is not defined in userspace, drop it from the
uAPI header.
Fixes: c6c49bac8770 ("media: uapi: Add Raspberry Pi PiSP Back End uAPI")
Signed-off-by: Jacopo Mondi <[email protected]>
Reviewed-by: Tomi Valkeinen <[email protected]>
Signed-off-by: Sakari Ailus <[email protected]>
Signed-off-by: Hans Verkuil <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v6.11
Most likely the last "new features" pull request for v6.11 with
changes both in stack and in drivers. The big thing is the multiple
radios for wiphy feature which makes it possible to better advertise
radio capabilities to user space. mt76 enabled MLO and iwlwifi
re-enabled MLO, ath12k and rtw89 Wi-Fi 6 devices got WoWLAN support.
Major changes:
cfg80211/mac80211
* remove DEAUTH_NEED_MGD_TX_PREP flag
* multiple radios per wiphy support
mac80211_hwsim
* multi-radio wiphy support
ath12k
* DebugFS support for datapath statistics
* WCN7850: support for WoW (Wake on WLAN)
* WCN7850: device-tree bindings
ath11k
* QCA6390: device-tree bindings
iwlwifi
* mvm: re-enable Multi-Link Operation (MLO)
* aggregation (A-MSDU) optimisations
rtw89
* preparation for RTL8852BE-VT support
* WoWLAN support for WiFi 6 chips
* 36-bit PCI DMA support
mt76
* mt7925 Multi-Link Operation (MLO) support
* tag 'wireless-next-2024-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (204 commits)
wifi: mac80211: fix AP chandef capturing in CSA
wifi: iwlwifi: correctly reference TSO page information
wifi: mt76: mt792x: fix scheduler interference in drv own process
wifi: mt76: mt7925: enabling MLO when the firmware supports it
wifi: mt76: mt7925: remove the unused mt7925_mcu_set_chan_info
wifi: mt76: mt7925: update mt7925_mac_link_bss_add for MLO
wifi: mt76: mt7925: update mt7925_mcu_bss_basic_tlv for MLO
wifi: mt76: mt7925: update mt7925_mcu_set_timing for MLO
wifi: mt76: mt7925: update mt7925_mcu_sta_phy_tlv for MLO
wifi: mt76: mt7925: update mt7925_mcu_sta_rate_ctrl_tlv for MLO
wifi: mt76: mt7925: add mt7925_mcu_sta_eht_mld_tlv for MLO
wifi: mt76: mt7925: update mt7925_mcu_sta_update for MLO
wifi: mt76: mt7925: update mt7925_mcu_add_bss_info for MLO
wifi: mt76: mt7925: update mt7925_mcu_bss_mld_tlv for MLO
wifi: mt76: mt7925: update mt7925_mcu_sta_mld_tlv for MLO
wifi: mt76: mt7925: add mt7925_[assign,unassign]_vif_chanctx
wifi: mt76: add def_wcid to struct mt76_wcid
wifi: mt76: mt7925: report link information in rx status
wifi: mt76: mt7925: update rate index according to link id
wifi: mt76: mt7925: add link handling in the mt7925_ipv6_addr_change
...
====================
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The response code of IOMMUFD_PAGE_RESP_FAILURE was defined to be
equivalent to the "Response Failure" in PCI spec, section 10.4.2.1.
This response code indicates that one or more pages within the
associated request group have encountered or caused an unrecoverable
error. Therefore, this response disables the PRI at the function.
Modern I/O virtualization technologies, like SR-IOV, share PRI among
the assignable device units. Therefore, a response failure on one unit
might cause I/O failure on other units.
Remove this response code so that user space can only respond with
SUCCESS or INVALID. The VMM is recommended to emulate a failure response
as a PRI reset, or PRI disable and changing to a non-PRI domain.
Fixes: c714f15860fc ("iommufd: Add fault and response message definitions")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
net/sched/act_ct.c
26488172b029 ("net/sched: Fix UAF when resolving a clash")
3abbd7ed8b76 ("act_ct: prepare for stolen verdict coming from conntrack and nat engine")
No adjacent changes.
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The caching mode for buffer objects with VRAM as a possible
placement was forced to write-combined, regardless of placement.
However, write-combined system memory is expensive to allocate and
even though it is pooled, the pool is expensive to shrink, since
it involves global CPU TLB flushes.
Moreover write-combined system memory from TTM is only reliably
available on x86 and DGFX doesn't have an x86 restriction.
So regardless of the cpu caching mode selected for a bo,
internally use write-back caching mode for system memory on DGFX.
Coherency is maintained, but user-space clients may perceive a
difference in cpu access speeds.
v2:
- Update RB- and Ack tags.
- Rephrase wording in xe_drm.h (Matt Roper)
v3:
- Really rephrase wording.
Signed-off-by: Thomas Hellström <[email protected]>
Fixes: 622f709ca629 ("drm/xe/uapi: Add support for CPU caching mode")
Cc: Pallavi Mishra <[email protected]>
Cc: Matthew Auld <[email protected]>
Cc: [email protected]
Cc: Joonas Lahtinen <[email protected]>
Cc: Effie Yu <[email protected]>
Cc: Matthew Brost <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Jose Souza <[email protected]>
Cc: Michal Mrozek <[email protected]>
Cc: <[email protected]> # v6.8+
Acked-by: Matthew Auld <[email protected]>
Acked-by: José Roberto de Souza <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Fixes: 622f709ca629 ("drm/xe/uapi: Add support for CPU caching mode")
Acked-by: Michal Mrozek <[email protected]>
Acked-by: Effie Yu <[email protected]> #On chat
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 01e0cfc994be484ddcb9e121e353e51d8bb837c0)
Signed-off-by: Lucas De Marchi <[email protected]>
|