Age | Commit message (Collapse) | Author | Files | Lines |
|
Make the logic of searching average fragment list of a given order reusable
by abstracting it out to a differnet function. This will also avoid
code duplication in upcoming patches.
No functional changes.
Signed-off-by: Ojaswin Mujoo <[email protected]>
Reviewed-by: Ritesh Harjani (IBM) <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/028c11d95b17ce0285f45456709a0ca922df1b83.1685449706.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Before this patch, the call stack in ext4_run_li_request is as follows:
/*
* nr = no. of BGs we want to fetch (=s_mb_prefetch)
* prefetch_ios = no. of BGs not uptodate after
* ext4_read_block_bitmap_nowait()
*/
next_group = ext4_mb_prefetch(sb, group, nr, prefetch_ios);
ext4_mb_prefetch_fini(sb, next_group prefetch_ios);
ext4_mb_prefetch_fini() will only try to initialize buddies for BGs in
range [next_group - prefetch_ios, next_group). This is incorrect since
sometimes (prefetch_ios < nr), which causes ext4_mb_prefetch_fini() to
incorrectly ignore some of the BGs that might need initialization. This
issue is more notable now with the previous patch enabling "fetching" of
BLOCK_UNINIT BGs which are marked buffer_uptodate by default.
Fix this by passing nr to ext4_mb_prefetch_fini() instead of
prefetch_ios so that it considers the right range of groups.
Similarly, make sure we don't pass nr=0 to ext4_mb_prefetch_fini() in
ext4_mb_regular_allocator() since we might have prefetched BLOCK_UNINIT
groups that would need buddy initialization.
Signed-off-by: Ojaswin Mujoo <[email protected]>
Reviewed-by: Ritesh Harjani (IBM) <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/05e648ae04ec5b754207032823e9c1de9a54f87a.1685449706.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Currently, ext4_mb_prefetch() and ext4_mb_prefetch_fini() skip
BLOCK_UNINIT groups since fetching their bitmaps doesn't need disk IO.
As a consequence, we end not initializing the buddy structures and CR0/1
lists for these BGs, even though it can be done without any disk IO
overhead. Hence, don't skip such BGs during prefetch and prefetch_fini.
This improves the accuracy of CR0/1 allocation as earlier, we could have
essentially empty BLOCK_UNINIT groups being ignored by CR0/1 due to their buddy
not being initialized, leading to slower CR2 allocations. With this patch CR0/1
will be able to discover these groups as well, thus improving performance.
Signed-off-by: Ojaswin Mujoo <[email protected]>
Reviewed-by: Ritesh Harjani (IBM) <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/dc3130b8daf45ffe63d8a3c1edcf00eb8ba70e1f.1685449706.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
When we are inside ext4_mb_complex_scan_group() in CR1, we can be sure
that this group has atleast 1 big enough continuous free extent to satisfy
our request because (free / fragments) > goal length.
Hence, instead of wasting time looping over smaller free extents, only
try to consider the free extent if we are sure that it has enough
continuous free space to satisfy goal length. This is particularly
useful when scanning highly fragmented BGs in CR1 as, without this
patch, the allocator might stop scanning early before reaching the big
enough free extent (due to ac_found > mb_max_to_scan) which causes us to
uncessarily trim the request.
Signed-off-by: Ojaswin Mujoo <[email protected]>
Reviewed-by: Ritesh Harjani (IBM) <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/a5473df4517c53ec940bc9b603ef83a547032a32.1685449706.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Track number of allocations where the length of blocks allocated is equal to the
length of goal blocks (post normalization). This metric could be useful if
making changes to the allocator logic in the future as it could give us
visibility into how often do we trim our requests.
PS: ac_b_ex.fe_len might get modified due to preallocation efforts and
hence we use ac_f_ex.fe_len instead since we want to compare how much the
allocator was able to actually find.
Signed-off-by: Ojaswin Mujoo <[email protected]>
Reviewed-by: Ritesh Harjani (IBM) <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/343620e2be8a237239ea2613a7a866ee8607e973.1685449706.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
This gives better visibility into the number of extents scanned in each
particular CR. For example, this information can be used to see how out
block group scanning logic is performing when the BG is fragmented.
Signed-off-by: Ojaswin Mujoo <[email protected]>
Reviewed-by: Ritesh Harjani (IBM) <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/55bb6d80f6e22ed2a5a830aa045572bdffc8b1b9.1685449706.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Convert criteria to be an enum so it easier to maintain and
update the tracefiles to use enum names. This change also makes
it easier to insert new criterias in the future.
There is no functional change in this patch.
Signed-off-by: Ojaswin Mujoo <[email protected]>
Reviewed-by: Ritesh Harjani (IBM) <[email protected]>
Link: https://lore.kernel.org/r/5d82fd467bdf70ea45bdaef810af3b146013946c.1685449706.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
ext4_mb_stats & ext4_mb_max_to_scan are never used. We use
sbi->s_mb_stats and sbi->s_mb_max_to_scan instead.
Hence kill these extern declarations.
Signed-off-by: Ritesh Harjani (IBM) <[email protected]>
Signed-off-by: Ojaswin Mujoo <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/928b3142062172533b6d1b5a94de94700590fef3.1685449706.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
There will be changes coming in future patches which will introduce a new
criteria for block allocation. This removes the useless setting of ac_criteria.
AFAIU, this might be only used to differentiate between whether a preallocated
blocks was allocated or was regular allocator called for allocating blocks.
Hence this also adds the debug prints to identify what type of block allocation
was done in ext4_mb_show_ac().
Signed-off-by: Ritesh Harjani (IBM) <[email protected]>
Signed-off-by: Ojaswin Mujoo <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/1dbae05617519cb6202f1b299c9d1be3e7cda763.1685449706.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Function ext4_free_blocks_simple needs count in cluster. Function
ext4_free_blocks accepts count in block. Convert count to cluster
to fix the mismatch.
Signed-off-by: Kemeng Shi <[email protected]>
Cc: [email protected]
Reviewed-by: Ojaswin Mujoo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Function ext4_issue_discard need count in cluster. Pass count_clusters
instead of count to fix the mismatch.
Signed-off-by: Kemeng Shi <[email protected]>
Cc: [email protected]
Reviewed-by: Ojaswin Mujoo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Two cleanups for ext4_mb_new_blocks_simple:
Remove unused parameter handle of ext4_mb_new_blocks_simple.
Move ext4_mb_new_blocks_simple definition before ext4_mb_new_blocks to
remove unnecessary forward declaration of ext4_mb_new_blocks_simple.
Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: Ojaswin Mujoo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 tdx updates from Dave Hansen:
- Fix a race window where load_unaligned_zeropad() could cause a fatal
shutdown during TDX private<=>shared conversion
The race has never been observed in practice but might allow
load_unaligned_zeropad() to catch a TDX page in the middle of its
conversion process which would lead to a fatal and unrecoverable
guest shutdown.
- Annotate sites where VM "exit reasons" are reused as hypercall
numbers.
* tag 'x86_tdx_for_6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Fix enc_status_change_finish_noop()
x86/tdx: Fix race between set_memory_encrypted() and load_unaligned_zeropad()
x86/mm: Allow guest.enc_status_change_prepare() to fail
x86/tdx: Wrap exit reason with hcall_func()
|
|
When CONFIG_MODULES is disabled for ARCH=um, 'make (bin)deb-pkg' fails
with an error like follows:
cp: cannot create regular file 'debian/linux-image/usr/lib/uml/modules/6.4.0-rc2+/System.map': No such file or directory
Remove the CONFIG_MODULES check completely so ${pdir}/usr/lib/uml/modules
will always be created and modules.builtin.(modinfo) will be installed
under it for ARCH=um.
Fixes: b611daae5efc ("kbuild: deb-pkg: split image and debug objects staging out into functions")
Signed-off-by: Masahiro Yamada <[email protected]>
|
|
Even for a non-modular kernel, the kernel builds modules.builtin and
modules.builtin.modinfo, with information about the built-in modules.
Tools such as initramfs-tools need these files to build a working
initramfs on some systems, such as those requiring firmware.
Now that `make modules_install` works even in non-modular kernels and
installs these files, unconditionally invoke it when building a Debian
package.
Signed-off-by: Josh Triplett <[email protected]>
Reviewed-by: Nicolas Schier <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux into clk-microchip
Pull Microchip clk driver updates from Claudiu Beznea:
It contains support for parent_data, parent_hw in AT91 clock drivers
used by SAMA7G5 SoC (e.g. main, master, generic, peripheral, programmable,
system, utmi, slow clocks) and also the update of SAMA7G5 to use
this new support.
* tag 'clk-microchip-6.5-2' of https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux:
clk: at91: sama7g5: s/ep_chg_chg_id/ep_chg_id
clk: at91: sama7g5: switch to parent_hw and parent_data
clk: at91: sckc: switch to parent_data/parent_hw
clk: at91: clk-sam9x60-pll: add support for parent_hw
clk: at91: clk-utmi: add support for parent_hw
clk: at91: clk-system: add support for parent_hw
clk: at91: clk-programmable: add support for parent_hw
clk: at91: clk-peripheral: add support for parent_hw
clk: at91: clk-master: add support for parent_hw
clk: at91: clk-generated: add support for parent_hw
clk: at91: clk-main: add support for parent_data/parent_hw
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform updates from Dave Hansen:
"Allow CPUs in SGX/HPE Ultraviolet to start using Sub-NUMA clustering
(SNC) mode. SNC has been around outside the UV world for a while but
evidently never worked on UV systems.
SNC is rather notorious for breaking bad assumptions of a 1:1
relationship between physical sockets and NUMA nodes. The UV code was
rather prolific with these assumptions and took quite a bit of
refactoring to remove them"
* tag 'x86_platform_for_6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/platform/uv: Update UV[23] platform code for SNC
x86/platform/uv: Remove remaining BUG_ON() and BUG() calls
x86/platform/uv: UV support for sub-NUMA clustering
x86/platform/uv: Helper functions for allocating and freeing conversion tables
x86/platform/uv: When searching for minimums, start at INT_MAX not 99999
x86/platform/uv: Fix printed information in calc_mmioh_map
x86/platform/uv: Introduce helper function uv_pnode_to_socket.
x86/platform/uv: Add platform resolving #defines for misc GAM_MMIOH_REDIRECT*
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 irq updates from Dave Hansen:
"Add Hyper-V interrupts to /proc/stat"
* tag 'x86_irq_for_6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/irq: Add hardcoded hypervisor interrupts to /proc/stat
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Borislav Petkov:
- Compute the purposeful misalignment of zen_untrain_ret automatically
and assert __x86_return_thunk's alignment so that future changes to
the symbol macros do not accidentally break them.
- Remove CONFIG_X86_FEATURE_NAMES Kconfig option as its existence is
pointless
* tag 'x86_cpu_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/retbleed: Add __x86_return_thunk alignment checks
x86/cpu: Remove X86_FEATURE_NAMES
x86/Kconfig: Make X86_FEATURE_NAMES non-configurable in prompt
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 confidential computing update from Borislav Petkov:
- Add support for unaccepted memory as specified in the UEFI spec v2.9.
The gist of it all is that Intel TDX and AMD SEV-SNP confidential
computing guests define the notion of accepting memory before using
it and thus preventing a whole set of attacks against such guests
like memory replay and the like.
There are a couple of strategies of how memory should be accepted -
the current implementation does an on-demand way of accepting.
* tag 'x86_cc_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
virt: sevguest: Add CONFIG_CRYPTO dependency
x86/efi: Safely enable unaccepted memory in UEFI
x86/sev: Add SNP-specific unaccepted memory support
x86/sev: Use large PSC requests if applicable
x86/sev: Allow for use of the early boot GHCB for PSC requests
x86/sev: Put PSC struct on the stack in prep for unaccepted memory support
x86/sev: Fix calculation of end address based on number of pages
x86/tdx: Add unaccepted memory support
x86/tdx: Refactor try_accept_one()
x86/tdx: Make _tdx_hypercall() and __tdx_module_call() available in boot stub
efi/unaccepted: Avoid load_unaligned_zeropad() stepping into unaccepted memory
efi: Add unaccepted memory support
x86/boot/compressed: Handle unaccepted memory
efi/libstub: Implement support for unaccepted memory
efi/x86: Get full memory map in allocate_e820()
mm: Add support for unaccepted memory
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 resource control updates from Borislav Petkov:
- Implement a rename operation in resctrlfs to facilitate handling of
application containers with dynamically changing task lists
- When reading the tasks file, show the tasks' pid which are only in
the current namespace as opposed to showing the pids from the init
namespace too
- Other fixes and improvements
* tag 'x86_cache_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Documentation/x86: Documentation for MON group move feature
x86/resctrl: Implement rename op for mon groups
x86/resctrl: Factor rdtgroup lock for multi-file ops
x86/resctrl: Only show tasks' pid in current pid namespace
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build update from Borislav Petkov:
- Remove relocation information from vmlinux as it is not needed by
other tooling and thus a slimmer binary is generated.
This is important for distros who have to distribute vmlinux blobs
with their kernel packages too and that extraneous unnecessary data
bloats them for no good reason
* tag 'x86_build_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/build: Avoid relocation information in final vmlinux
|
|
Prepare input updates for 6.5 merge window.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 instruction alternatives updates from Borislav Petkov:
- Up until now the Fast Short Rep Mov optimizations implied the
presence of the ERMS CPUID flag. AMD decoupled them with a BIOS
setting so decouple that dependency in the kernel code too
- Teach the alternatives machinery to handle relocations
- Make debug_alternative accept flags in order to see only that set of
patching done one is interested in
- Other fixes, cleanups and optimizations to the patching code
* tag 'x86_alternatives_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/alternative: PAUSE is not a NOP
x86/alternatives: Add cond_resched() to text_poke_bp_batch()
x86/nospec: Shorten RESET_CALL_DEPTH
x86/alternatives: Add longer 64-bit NOPs
x86/alternatives: Fix section mismatch warnings
x86/alternative: Optimize returns patching
x86/alternative: Complicate optimize_nops() some more
x86/alternative: Rewrite optimize_nops() some
x86/lib/memmove: Decouple ERMS from FSRM
x86/alternative: Support relocations in alternatives
x86/alternative: Make debug-alternative selective
|
|
ext4_free_blocks will retrieve block from bh if block parameter is zero.
Retrieve block before ext4_free_blocks_simple to avoid potentially
passing wrong block to ext4_free_blocks_simple.
Signed-off-by: Kemeng Shi <[email protected]>
Cc: [email protected]
Reviewed-by: Ojaswin Mujoo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Borislav Petkov:
- Add initial support for RAS hardware found on AMD server GPUs (MI200).
Those GPUs and CPUs are connected together through the coherent
fabric and the GPU memory controllers report errors through x86's MCA
so EDAC needs to support them. The amd64_edac driver supports now HBM
(High Bandwidth Memory) and thus such heterogeneous memory controller
systems
- Other small cleanups and improvements
* tag 'ras_core_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
EDAC/amd64: Cache and use GPU node map
EDAC/amd64: Add support for AMD heterogeneous Family 19h Model 30h-3Fh
EDAC/amd64: Document heterogeneous system enumeration
x86/MCE/AMD, EDAC/mce_amd: Decode UMC_V2 ECC errors
x86/amd_nb: Re-sort and re-indent PCI defines
x86/amd_nb: Add MI200 PCI IDs
ras/debugfs: Fix error checking for debugfs_create_dir()
x86/MCE: Check a hw error's address to determine proper recovery action
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC updates from Borislav Petkov:
- amd64_edac: Add support for Zen4 client hardware
- amd64_edac: Remove the version string as it is useless and actively
confusing when looking at backported versions of the driver
- Add a driver for the Nuvoton NPCM memory controller
- A debugfs error checking cleanup
* tag 'edac_updates_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/npcm: Add NPCM memory controller driver
dt-bindings: memory-controllers: nuvoton: Add NPCM memory controller
EDAC/thunderx: Check debugfs file creation retval properly
EDAC/amd64: Add support for ECC on family 19h model 60h-7Fh
EDAC/amd64: Remove module version string
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into clk-qcom
Pull Qualcomm clk driver updates from Bjorn Andersson:
This introduces Global Clock Controller for SDX75, LPASS clock
controllers for SC8280XP, video clock controller for SM8350, SM8450 and
SM8550, GPU clock controller for SM8450 and SM8550, RPMH clock support
for SDX75 and IPQ9574 support in APSS IPQ PLL driver.
Support for branch2 clocks with inverted off-bit is introduced and a
couple of fixes to Alpha PLLs handling of TEST_CTL updates.
The handling of active-only clocks in SMD RPM is improved, to ensure
votes are appropriately placed.
SC7180 camera GDSCs are made children of the titan_top GDSC.
A couple of fixes to the display clocks on QCM2290 and shared RCGs in
GCC are marked as such.
SDCC clocks for IPQ6018 and IPQ5332 are corrected to use floor ops, and
network-related resets on IPQ6018 are updated to cover all bits of each
reset.
Crypto clocks are added to IPQ9574 global clock controller, together
with a few cleanups.
Runtime PM is enabeld for SC8280XP GCC and GPUCC, and SM6375 GPUCC.
A few fixes for MSM8974 multi-media clock controller.
Support for some RCG clocks to be automatically controlled by downstream
branches, and added to SM8450 GCC clocks.
Further Kconfig depdenencies are introduce to avoid building Qualcomm
clock drivers on unrelated architectures.
Lastly, related DeviceTree binding updates are made.
The tail of this is not bisectable, due to the missing DeviceTree
binding include files. Rebase at this point in time is not desirable.
* tag 'qcom-clk-for-6.5-2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: (63 commits)
clk: qcom: gcc-sc8280xp: Add runtime PM
clk: qcom: gpucc-sc8280xp: Add runtime PM
clk: qcom: mmcc-msm8974: fix MDSS_GDSC power flags
clk: qcom: gpucc-sm6375: Enable runtime pm
dt-bindings: clock: sm6375-gpucc: Add VDD_GX
clk: qcom: gcc-sm6115: Add missing PLL config properties
clk: qcom: clk-alpha-pll: Add a way to update some bits of test_ctl(_hi)
clk: qcom: gcc-ipq6018: remove duplicate initializers
clk: qcom: gcc-ipq9574: Enable crypto clocks
dt-bindings: clock: Add crypto clock and reset definitions
clk: qcom: Add lpass audio clock controller driver for SC8280XP
clk: qcom: Add lpass clock controller driver for SC8280XP
dt-bindings: clock: Add LPASS AUDIOCC and reset controller for SC8280XP
dt-bindings: clock: Add LPASSCC and reset controller for SC8280XP
dt-bindings: clock: qcom,mmcc: define clocks/clock-names for MSM8226
clk: qcom: gpucc-sm8550: Add support for graphics clock controller
clk: qcom: Add support for SM8450 GPUCC
clk: qcom: gcc-sm8450: Enable hw_clk_ctrl
clk: qcom: rcg2: Make hw_clk_ctrl toggleable
dt-bindings: clock: qcom: Add SM8550 graphics clock controller
...
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core updates from Thomas Gleixner:
"A set of fixes for kexec(), reboot and shutdown issues:
- Ensure that the WBINVD in stop_this_cpu() has been completed before
the control CPU proceedes.
stop_this_cpu() is used for kexec(), reboot and shutdown to park
the APs in a HLT loop.
The control CPU sends an IPI to the APs and waits for their CPU
online bits to be cleared. Once they all are marked "offline" it
proceeds.
But stop_this_cpu() clears the CPU online bit before issuing
WBINVD, which means there is no guarantee that the AP has reached
the HLT loop.
This was reported to cause intermittent reboot/shutdown failures
due to some dubious interaction with the firmware.
This is not only a problem of WBINVD. The code to actually "stop"
the CPU which runs between clearing the online bit and reaching the
HLT loop can cause large enough delays on its own (think
virtualization). That's especially dangerous for kexec() as kexec()
expects that all APs are in a safe state and not executing code
while the boot CPU jumps to the new kernel. There are more issues
vs kexec() which are addressed separately.
Cure this by implementing an explicit synchronization point right
before the AP reaches HLT. This guarantees that the AP has
completed the full stop proceedure.
- Fix the condition for WBINVD in stop_this_cpu().
The WBINVD in stop_this_cpu() is required for ensuring that when
switching to or from memory encryption no dirty data is left in the
cache lines which might cause a write back in the wrong more later.
This checks CPUID directly because the feature bit might have been
cleared due to a command line option.
But that CPUID check accesses leaf 0x8000001f::EAX unconditionally.
Intel CPUs return the content of the highest supported leaf when a
non-existing leaf is read, while AMD CPUs return all zeros for
unsupported leafs.
So the result of the test on Intel CPUs is lottery and on AMD its
just correct by chance.
While harmless it's incorrect and causes the conditional wbinvd()
to be issued where not required, which caused the above issue to be
unearthed.
- Make kexec() robust against AP code execution
Ashok observed triple faults when doing kexec() on a system which
had been booted with "nosmt".
It turned out that the SMT siblings which had been brought up
partially are parked in mwait_play_dead() to enable power savings.
mwait_play_dead() is monitoring the thread flags of the AP's idle
task, which has been chosen as it's unlikely to be written to.
But kexec() can overwrite the previous kernel text and data
including page tables etc. When it overwrites the cache lines
monitored by an AP that AP resumes execution after the MWAIT on
eventually overwritten text, stack and page tables, which obviously
might end up in a triple fault easily.
Make this more robust in several steps:
1) Use an explicit per CPU cache line for monitoring.
2) Write a command to these cache lines to kick APs out of MWAIT
before proceeding with kexec(), shutdown or reboot.
The APs confirm the wakeup by writing status back and then
enter a HLT loop.
3) If the system uses INIT/INIT/STARTUP for AP bringup, park the
APs in INIT state.
HLT is not a guarantee that an AP won't wake up and resume
execution. HLT is woken up by NMI and SMI. SMI puts the CPU
back into HLT (+/- firmware bugs), but NMI is delivered to the
CPU which executes the NMI handler. Same issue as the MWAIT
scenario described above.
Sending an INIT/INIT sequence to the APs puts them into wait
for STARTUP state, which is safe against NMI.
There is still an issue remaining which can't be fixed: #MCE
If the AP sits in HLT and receives a broadcast #MCE it will try to
handle it with the obvious consequences.
INIT/INIT clears CR4.MCE in the AP which will cause a broadcast
#MCE to shut down the machine.
So there is a choice between fire (HLT) and frying pan (INIT).
Frying pan has been chosen as it's at least preventing the NMI
issue.
On systems which are not using INIT/INIT/STARTUP there is not much
which can be done right now, but at least the obvious and easy to
trigger MWAIT issue has been addressed"
* tag 'x86-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/smp: Put CPUs into INIT on shutdown if possible
x86/smp: Split sending INIT IPI out into a helper function
x86/smp: Cure kexec() vs. mwait_play_dead() breakage
x86/smp: Use dedicated cache-line for mwait_play_dead()
x86/smp: Remove pointless wmb()s from native_stop_other_cpus()
x86/smp: Dont access non-existing CPUID leaf
x86/smp: Make stop_other_cpus() more robust
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"Time, timekeeping and related device driver updates:
Core:
- A set of fixes, cleanups and enhancements to the posix timer code:
- Prevent another possible live lock scenario in the exit() path,
which affects POSIX_CPU_TIMERS_TASK_WORK enabled architectures.
- Fix a loop termination issue which was reported syzcaller/KSAN
in the posix timer ID allocation code.
That triggered a deeper look into the posix-timer code which
unearthed more small issues.
- Add missing READ/WRITE_ONCE() annotations
- Fix or remove completely outdated comments
- Document places which are subtle and completely undocumented.
- Add missing hrtimer modes to the trace event decoder
- Small cleanups and enhancements all over the place
Drivers:
- Rework the Hyper-V clocksource and sched clock setup code
- Remove a deprecated clocksource driver
- Small fixes and enhancements all over the place"
* tag 'timers-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
clocksource/drivers/cadence-ttc: Fix memory leak in ttc_timer_probe
dt-bindings: timers: Add Ralink SoCs timer
clocksource/drivers/hyper-v: Rework clocksource and sched clock setup
dt-bindings: timer: brcm,kona-timer: convert to YAML
clocksource/drivers/imx-gpt: Fold <soc/imx/timer.h> into its only user
clk: imx: Drop inclusion of unused header <soc/imx/timer.h>
hrtimer: Add missing sparse annotations to hrtimer locking
clocksource/drivers/imx-gpt: Use only a single name for functions
clocksource/drivers/loongson1: Move PWM timer to clocksource framework
dt-bindings: timer: Add Loongson-1 clocksource
MIPS: Loongson32: Remove deprecated PWM timer clocksource
clocksource/drivers/ingenic-timer: Use pm_sleep_ptr() macro
tracing/timer: Add missing hrtimer modes to decode_hrtimer_mode().
posix-timers: Add sys_ni_posix_timers() prototype
tick/rcu: Fix bogus ratelimit condition
alarmtimer: Remove unnecessary (void *) cast
alarmtimer: Remove unnecessary initialization of variable 'ret'
posix-timers: Refer properly to CONFIG_HIGH_RES_TIMERS
posix-timers: Polish coding style in a few places
posix-timers: Remove pointless comments
...
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull SMP updates from Thomas Gleixner:
"A large update for SMP management:
- Parallel CPU bringup
The reason why people are interested in parallel bringup is to
shorten the (kexec) reboot time of cloud servers to reduce the
downtime of the VM tenants.
The current fully serialized bringup does the following per AP:
1) Prepare callbacks (allocate, intialize, create threads)
2) Kick the AP alive (e.g. INIT/SIPI on x86)
3) Wait for the AP to report alive state
4) Let the AP continue through the atomic bringup
5) Let the AP run the threaded bringup to full online state
There are two significant delays:
#3 The time for an AP to report alive state in start_secondary()
on x86 has been measured in the range between 350us and 3.5ms
depending on vendor and CPU type, BIOS microcode size etc.
#4 The atomic bringup does the microcode update. This has been
measured to take up to ~8ms on the primary threads depending
on the microcode patch size to apply.
On a two socket SKL server with 56 cores (112 threads) the boot CPU
spends on current mainline about 800ms busy waiting for the APs to
come up and apply microcode. That's more than 80% of the actual
onlining procedure.
This can be reduced significantly by splitting the bringup
mechanism into two parts:
1) Run the prepare callbacks and kick the AP alive for each AP
which needs to be brought up.
The APs wake up, do their firmware initialization and run the
low level kernel startup code including microcode loading in
parallel up to the first synchronization point. (#1 and #2
above)
2) Run the rest of the bringup code strictly serialized per CPU
(#3 - #5 above) as it's done today.
Parallelizing that stage of the CPU bringup might be possible
in theory, but it's questionable whether required surgery
would be justified for a pretty small gain.
If the system is large enough the first AP is already waiting at
the first synchronization point when the boot CPU finished the
wake-up of the last AP. That reduces the AP bringup time on that
SKL from ~800ms to ~80ms, i.e. by a factor ~10x.
The actual gain varies wildly depending on the system, CPU,
microcode patch size and other factors. There are some
opportunities to reduce the overhead further, but that needs some
deep surgery in the x86 CPU bringup code.
For now this is only enabled on x86, but the core functionality
obviously works for all SMP capable architectures.
- Enhancements for SMP function call tracing so it is possible to
locate the scheduling and the actual execution points. That allows
to measure IPI delivery time precisely"
* tag 'smp-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
trace,smp: Add tracepoints for scheduling remotelly called functions
trace,smp: Add tracepoints around remotelly called functions
MAINTAINERS: Add CPU HOTPLUG entry
x86/smpboot: Fix the parallel bringup decision
x86/realmode: Make stack lock work in trampoline_compat()
x86/smp: Initialize cpu_primary_thread_mask late
cpu/hotplug: Fix off by one in cpuhp_bringup_mask()
x86/apic: Fix use of X{,2}APIC_ENABLE in asm with older binutils
x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it
x86/smpboot: Support parallel startup of secondary CPUs
x86/smpboot: Implement a bit spinlock to protect the realmode stack
x86/apic: Save the APIC virtual base address
cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE
x86/apic: Provide cpu_primary_thread mask
x86/smpboot: Enable split CPU startup
cpu/hotplug: Provide a split up CPUHP_BRINGUP mechanism
cpu/hotplug: Reset task stack state in _cpu_up()
cpu/hotplug: Remove unused state functions
riscv: Switch to hotplug core state synchronization
parisc: Switch to hotplug core state synchronization
...
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Thomas Gleixner:
"Initialize FPU late.
Right now FPU is initialized very early during boot. There is no real
requirement to do so. The only requirement is to have it done before
alternatives are patched.
That's done in check_bugs() which does way more than what the function
name suggests.
So first rename check_bugs() to arch_cpu_finalize_init() which makes
it clear what this is about.
Move the invocation of arch_cpu_finalize_init() earlier in
start_kernel() as it has to be done before fork_init() which needs to
know the FPU register buffer size.
With those prerequisites the FPU initialization can be moved into
arch_cpu_finalize_init(), which removes it from the early and fragile
part of the x86 bringup"
* tag 'x86-boot-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mem_encrypt: Unbreak the AMD_MEM_ENCRYPT=n build
x86/fpu: Move FPU initialization into arch_cpu_finalize_init()
x86/fpu: Mark init functions __init
x86/fpu: Remove cpuinfo argument from init functions
x86/init: Initialize signal frame size late
init, x86: Move mem_encrypt_init() into arch_cpu_finalize_init()
init: Invoke arch_cpu_finalize_init() earlier
init: Remove check_bugs() leftovers
um/cpu: Switch to arch_cpu_finalize_init()
sparc/cpu: Switch to arch_cpu_finalize_init()
sh/cpu: Switch to arch_cpu_finalize_init()
mips/cpu: Switch to arch_cpu_finalize_init()
m68k/cpu: Switch to arch_cpu_finalize_init()
loongarch/cpu: Switch to arch_cpu_finalize_init()
ia64/cpu: Switch to arch_cpu_finalize_init()
ARM: cpu: Switch to arch_cpu_finalize_init()
x86/cpu: Switch to arch_cpu_finalize_init()
init: Provide arch_cpu_finalize_init()
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Updates for the interrupt subsystem:
Core:
- Convert the interrupt descriptor storage to a maple tree to
overcome the limitations of the radixtree + fixed size bitmap.
This allows us to handle very large servers with a huge number of
guests without imposing a huge memory overhead on everyone
- Implement optional retriggering of interrupts which utilize the
fasteoi handler to work around a GICv3 architecture issue
Drivers:
- A set of fixes and updates for the Loongson/Loongarch related
drivers
- Workaound for an ASR8601 integration hickup which ends up with CPU
numbering which can't be represented in the GIC implementation
- The usual set of boring fixes and updates all over the place"
* tag 'irq-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
Revert "irqchip/mxs: Include linux/irqchip/mxs.h"
irqchip/jcore-aic: Fix missing allocation of IRQ descriptors
irqchip/stm32-exti: Fix warning on initialized field overwritten
irqchip/stm32-exti: Add STM32MP15xx IWDG2 EXTI to GIC map
irqchip/gicv3: Add a iort_pmsi_get_dev_id() prototype
irqchip/mxs: Include linux/irqchip/mxs.h
irqchip/clps711x: Remove unused clps711x_intc_init() function
irqchip/mmp: Remove non-DT codepath
irqchip/ftintc010: Mark all function static
irqdomain: Include internals.h for function prototypes
irqchip/loongson-eiointc: Add DT init support
dt-bindings: interrupt-controller: Add Loongson EIOINTC
irqchip/loongson-eiointc: Fix irq affinity setting during resume
irqchip/loongson-liointc: Add IRQCHIP_SKIP_SET_WAKE flag
irqchip/loongson-liointc: Fix IRQ trigger polarity
irqchip/loongson-pch-pic: Fix potential incorrect hwirq assignment
irqchip/loongson-pch-pic: Fix initialization of HT vector register
irqchip/gic-v3-its: Enable RESEND_WHEN_IN_PROGRESS for LPIs
genirq: Allow fasteoi handler to resend interrupts on concurrent handling
genirq: Expand doc for PENDING and REPLAY flags
...
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull debugobjects update from Thomas Gleixner:
"A single update for debug objects:
- Recheck whether debug objects is enabled before reporting a problem
to avoid spamming the logs with messages which are caused by a
concurrent OOM"
* tag 'core-debugobjects-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debugobjects: Recheck debug_objects_enabled before reporting
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
1) Allow slightly larger IPVS connection table size from Kconfig for
64-bit arch, from Abhijeet Rastogi.
2) Since IPVS connection table might be larger than 2^20 after previous
patch, allow to limit it depending on the available memory.
Moreover, use kvmalloc. From Julian Anastasov.
3) Do not rebuild VLAN header in nft_payload when matching source and
destination MAC address.
4) Remove nested rcu read lock side in ip_set_test(), from Florian Westphal.
5) Allow to update set size, also from Florian.
6) Improve NAT tuple selection when connection is closing,
from Florian Westphal.
7) Support for resetting set element stateful expression, from Phil Sutter.
8) Use NLA_POLICY_MAX to narrow down maximum attribute value in nf_tables,
from Florian Westphal.
* tag 'nf-next-23-06-26' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nf_tables: limit allowed range via nla_policy
netfilter: nf_tables: Introduce NFT_MSG_GETSETELEM_RESET
netfilter: snat: evict closing tcp entries on reply tuple collision
netfilter: nf_tables: permit update of set size
netfilter: ipset: remove rcu_read_lock_bh pair from ip_set_test
netfilter: nft_payload: rebuild vlan header when needed
ipvs: dynamically limit the connection hash table
ipvs: increase ip_vs_conn_tab_bits range for 64BIT
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Pull block updates from Jens Axboe:
- NVMe pull request via Keith:
- Various cleanups all around (Irvin, Chaitanya, Christophe)
- Better struct packing (Christophe JAILLET)
- Reduce controller error logs for optional commands (Keith)
- Support for >=64KiB block sizes (Daniel Gomez)
- Fabrics fixes and code organization (Max, Chaitanya, Daniel
Wagner)
- bcache updates via Coly:
- Fix a race at init time (Mingzhe Zou)
- Misc fixes and cleanups (Andrea, Thomas, Zheng, Ye)
- use page pinning in the block layer for dio (David)
- convert old block dio code to page pinning (David, Christoph)
- cleanups for pktcdvd (Andy)
- cleanups for rnbd (Guoqing)
- use the unchecked __bio_add_page() for the initial single page
additions (Johannes)
- fix overflows in the Amiga partition handling code (Michael)
- improve mq-deadline zoned device support (Bart)
- keep passthrough requests out of the IO schedulers (Christoph, Ming)
- improve support for flush requests, making them less special to deal
with (Christoph)
- add bdev holder ops and shutdown methods (Christoph)
- fix the name_to_dev_t() situation and use cases (Christoph)
- decouple the block open flags from fmode_t (Christoph)
- ublk updates and cleanups, including adding user copy support (Ming)
- BFQ sanity checking (Bart)
- convert brd from radix to xarray (Pankaj)
- constify various structures (Thomas, Ivan)
- more fine grained persistent reservation ioctl capability checks
(Jingbo)
- misc fixes and cleanups (Arnd, Azeem, Demi, Ed, Hengqi, Hou, Jan,
Jordy, Li, Min, Yu, Zhong, Waiman)
* tag 'for-6.5/block-2023-06-23' of git://git.kernel.dk/linux: (266 commits)
scsi/sg: don't grab scsi host module reference
ext4: Fix warning in blkdev_put()
block: don't return -EINVAL for not found names in devt_from_devname
cdrom: Fix spectre-v1 gadget
block: Improve kernel-doc headers
blk-mq: don't insert passthrough request into sw queue
bsg: make bsg_class a static const structure
ublk: make ublk_chr_class a static const structure
aoe: make aoe_class a static const structure
block/rnbd: make all 'class' structures const
block: fix the exclusive open mask in disk_scan_partitions
block: add overflow checks for Amiga partition support
block: change all __u32 annotations to __be32 in affs_hardblocks.h
block: fix signed int overflow in Amiga partition support
block: add capacity validation in bdev_add_partition()
block: fine-granular CAP_SYS_ADMIN for Persistent Reservation
block: disallow Persistent Reservation on partitions
reiserfs: fix blkdev_put() warning from release_journal_dev()
block: fix wrong mode for blkdev_get_by_dev() from disk_scan_partitions()
block: document the holder argument to blkdev_get_by_path
...
|
|
Pull io_uring updates from Jens Axboe:
"Nothing major in this release, just a bunch of cleanups and some
optimizations around networking mostly.
- clean up file request flags handling (Christoph)
- clean up request freeing and CQ locking (Pavel)
- support for using pre-registering the io_uring fd at setup time
(Josh)
- Add support for user allocated ring memory, rather than having the
kernel allocate it. Mostly for packing rings into a huge page (me)
- avoid an unnecessary double retry on receive (me)
- maintain ordering for task_work, which also improves performance
(me)
- misc cleanups/fixes (Pavel, me)"
* tag 'for-6.5/io_uring-2023-06-23' of git://git.kernel.dk/linux: (39 commits)
io_uring: merge conditional unlock flush helpers
io_uring: make io_cq_unlock_post static
io_uring: inline __io_cq_unlock
io_uring: fix acquire/release annotations
io_uring: kill io_cq_unlock()
io_uring: remove IOU_F_TWQ_FORCE_NORMAL
io_uring: don't batch task put on reqs free
io_uring: move io_clean_op()
io_uring: inline io_dismantle_req()
io_uring: remove io_free_req_tw
io_uring: open code io_put_req_find_next
io_uring: add helpers to decode the fixed file file_ptr
io_uring: use io_file_from_index in io_msg_grab_file
io_uring: use io_file_from_index in __io_sync_cancel
io_uring: return REQ_F_ flags from io_file_get_flags
io_uring: remove io_req_ffs_set
io_uring: remove a confusing comment above io_file_get_flags
io_uring: remove the mode variable in io_file_get_flags
io_uring: remove __io_file_supports_nowait
io_uring: wait interruptibly for request completions on exit
...
|
|
Pull splice updates from Jens Axboe:
"This kills off ITER_PIPE to avoid a race between truncate,
iov_iter_revert() on the pipe and an as-yet incomplete DMA to a bio
with unpinned/unref'ed pages from an O_DIRECT splice read. This causes
memory corruption.
Instead, we either use (a) filemap_splice_read(), which invokes the
buffered file reading code and splices from the pagecache into the
pipe; (b) copy_splice_read(), which bulk-allocates a buffer, reads
into it and then pushes the filled pages into the pipe; or (c) handle
it in filesystem-specific code.
Summary:
- Rename direct_splice_read() to copy_splice_read()
- Simplify the calculations for the number of pages to be reclaimed
in copy_splice_read()
- Turn do_splice_to() into a helper, vfs_splice_read(), so that it
can be used by overlayfs and coda to perform the checks on the
lower fs
- Make vfs_splice_read() jump to copy_splice_read() to handle
direct-I/O and DAX
- Provide shmem with its own splice_read to handle non-existent pages
in the pagecache. We don't want a ->read_folio() as we don't want
to populate holes, but filemap_get_pages() requires it
- Provide overlayfs with its own splice_read to call down to a lower
layer as overlayfs doesn't provide ->read_folio()
- Provide coda with its own splice_read to call down to a lower layer
as coda doesn't provide ->read_folio()
- Direct ->splice_read to copy_splice_read() in tty, procfs, kernfs
and random files as they just copy to the output buffer and don't
splice pages
- Provide wrappers for afs, ceph, ecryptfs, ext4, f2fs, nfs, ntfs3,
ocfs2, orangefs, xfs and zonefs to do locking and/or revalidation
- Make cifs use filemap_splice_read()
- Replace pointers to generic_file_splice_read() with pointers to
filemap_splice_read() as DIO and DAX are handled in the caller;
filesystems can still provide their own alternate ->splice_read()
op
- Remove generic_file_splice_read()
- Remove ITER_PIPE and its paraphernalia as generic_file_splice_read
was the only user"
* tag 'for-6.5/splice-2023-06-23' of git://git.kernel.dk/linux: (31 commits)
splice: kdoc for filemap_splice_read() and copy_splice_read()
iov_iter: Kill ITER_PIPE
splice: Remove generic_file_splice_read()
splice: Use filemap_splice_read() instead of generic_file_splice_read()
cifs: Use filemap_splice_read()
trace: Convert trace/seq to use copy_splice_read()
zonefs: Provide a splice-read wrapper
xfs: Provide a splice-read wrapper
orangefs: Provide a splice-read wrapper
ocfs2: Provide a splice-read wrapper
ntfs3: Provide a splice-read wrapper
nfs: Provide a splice-read wrapper
f2fs: Provide a splice-read wrapper
ext4: Provide a splice-read wrapper
ecryptfs: Provide a splice-read wrapper
ceph: Provide a splice-read wrapper
afs: Provide a splice-read wrapper
9p: Add splice_read wrapper
net: Make sock_splice_read() use copy_splice_read() by default
tty, proc, kernfs, random: Use copy_splice_read()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"Mainly core changes, refactoring and optimizations.
Performance is improved in some areas, overall there may be a
cumulative improvement due to refactoring that removed lookups in the
IO path or simplified IO submission tracking.
Core:
- submit IO synchronously for fast checksums (crc32c and xxhash),
remove high priority worker kthread
- read extent buffer in one go, simplify IO tracking, bio submission
and locking
- remove additional tracking of redirtied extent buffers, originally
added for zoned mode but actually not needed
- track ordered extent pointer in bio to avoid rbtree lookups during
IO
- scrub, use recovered data stripes as cache to avoid unnecessary
read
- in zoned mode, optimize logical to physical mappings of extents
- remove PageError handling, not set by VFS nor writeback
- cleanups, refactoring, better structure packing
- lots of error handling improvements
- more assertions, lockdep annotations
- print assertion failure with the exact line where it happens
- tracepoint updates
- more debugging prints
Performance:
- speedup in fsync(), better tracking of inode logged status can
avoid transaction commit
- IO path structures track logical offsets in data structures and
does not need to look it up
User visible changes:
- don't commit transaction for every created subvolume, this can
reduce time when many subvolumes are created in a batch
- print affected files when relocation fails
- trigger orphan file cleanup during START_SYNC ioctl
Notable fixes:
- fix crash when disabling quota and relocation
- fix crashes when removing roots from drity list
- fix transacion abort during relocation when converting from newer
profiles not covered by fallback
- in zoned mode, stop reclaiming block groups if filesystem becomes
read-only
- fix rare race condition in tree mod log rewind that can miss some
btree node slots
- with enabled fsverity, drop up-to-date page bit in case the
verification fails"
* tag 'for-6.5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (194 commits)
btrfs: fix race between quota disable and relocation
btrfs: add comment to struct btrfs_fs_info::dirty_cowonly_roots
btrfs: fix race when deleting free space root from the dirty cow roots list
btrfs: fix race when deleting quota root from the dirty cow roots list
btrfs: tracepoints: also show actual number of the outstanding extents
btrfs: update i_version in update_dev_time
btrfs: make btrfs_compressed_bioset static
btrfs: add handling for RAID1C23/DUP to btrfs_reduce_alloc_profile
btrfs: scrub: remove btrfs_fs_info::scrub_wr_completion_workers
btrfs: scrub: remove scrub_ctx::csum_list member
btrfs: do not BUG_ON after failure to migrate space during truncation
btrfs: do not BUG_ON on failure to get dir index for new snapshot
btrfs: send: do not BUG_ON() on unexpected symlink data extent
btrfs: do not BUG_ON() when dropping inode items from log root
btrfs: replace BUG_ON() at split_item() with proper error handling
btrfs: do not BUG_ON() on tree mod log failures at btrfs_del_ptr()
btrfs: do not BUG_ON() on tree mod log failures at insert_ptr()
btrfs: do not BUG_ON() on tree mod log failure at insert_new_root()
btrfs: do not BUG_ON() on tree mod log failures at push_nodes_for_insert()
btrfs: abort transaction at update_ref_for_cow() when ref count is zero
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs
Pull zonefs updates from Damien Le Moal:
- Modify the synchronous direct write path to use iomap instead of
manually coding issuing zone append write BIOs (me)
- Use the FMODE_CAN_ODIRECT file flag to indicate support from direct
IO instead of using the old way with noop direct_io methods
(Christoph)
* tag 'zonefs-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
zonefs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method
zonefs: use iomap for synchronous direct writes
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"No outstanding new feature for this cycle.
Most of these commits are decompression cleanups which are part of the
ongoing development for subpage/folio compression support as well as
xattr cleanups for the upcoming xattr bloom filter optimization [1].
In addition, there are bugfixes to address some corner cases of
compressed images due to global data de-duplication and arm64 16k
pages.
Summary:
- Fix rare I/O hang on deduplicated compressed images due to loop
hooked chains
- Fix compact compression layout of 16k blocks on arm64 devices
- Fix atomic context detection of async decompression
- Decompression/Xattr code cleanups"
Link: https://lore.kernel.org/r/[email protected] [1]
* tag 'erofs-for-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: clean up zmap.c
erofs: remove unnecessary goto
erofs: Fix detection of atomic context
erofs: use separate xattr parsers for listxattr/getxattr
erofs: unify inline/shared xattr iterators for listxattr/getxattr
erofs: make the size of read data stored in buffer_ofs
erofs: unify xattr_iter structures
erofs: use absolute position in xattr iterator
erofs: fix compact 4B support for 16k block size
erofs: convert erofs_read_metabuf() to erofs_bread() for xattr
erofs: use poison pointer to replace the hard-coded address
erofs: use struct lockref to replace handcrafted approach
erofs: adapt managed inode operations into folios
erofs: kill hooked chains to avoid loops on deduplicated compressed images
erofs: avoid on-stack pagepool directly passed by arguments
erofs: allocate extra bvec pages directly instead of retrying
erofs: clean up z_erofs_pcluster_readmore()
erofs: remove the member readahead from struct z_erofs_decompress_frontend
erofs: fold in z_erofs_decompress()
|
|
- Convert platform_device .remove() callbacks to return void instead of a
mostly useless int (Uwe Kleine-König)
* pci/controller/remove-void-callbacks:
PCI: xgene-msi: Convert to platform remove callback returning void
PCI: tegra: Convert to platform remove callback returning void
PCI: rockchip-host: Convert to platform remove callback returning void
PCI: mvebu: Convert to platform remove callback returning void
PCI: mt7621: Convert to platform remove callback returning void
PCI: mediatek-gen3: Convert to platform remove callback returning void
PCI: mediatek: Convert to platform remove callback returning void
PCI: iproc: Convert to platform remove callback returning void
PCI: hisi-error: Convert to platform remove callback returning void
PCI: dwc: Convert to platform remove callback returning void
PCI: j721e: Convert to platform remove callback returning void
PCI: brcmstb: Convert to platform remove callback returning void
PCI: altera-msi: Convert to platform remove callback returning void
PCI: altera: Convert to platform remove callback returning void
PCI: aardvark: Convert to platform remove callback returning void
|
|
- Change "PCI Endpoint Virtual NTB driver" Kconfig prompt to be different
from "PCI Endpoint NTB driver" (Shunsuke Mie)
- Automatically create a function specific attributes group for endpoint
drivers to avoid reference counting issues (Damien Le Moal)
- Move and unexport pci_epf_type_add_cfs() (Damien Le Moal)
- Reinitialize EPF test DMA transfer completion before submitting it to
avoid losing the completion notification (Damien Le Moal)
- Fix EPF test DMA transfer completion detection (Damien Le Moal)
- Submit EPF test DMA transfers with dmaengine_submit(), not tx_submit()
(Damien Le Moal)
- Simplify EPF test read/write/copy functions (Damien Le Moal)
- Simplify EPF test "raise IRQ" interface (Damien Le Moal)
- Simplify EPF test IRQ command execution (Damien Le Moal)
- Improve EPF test command/status register handling (Damien Le Moal)
- Free IRQs before removing device (Damien Le Moal)
- Reinitialize IRQ completions for every test (Damien Le Moal)
- Don't write status in IRQ handler to avoid race (Damien Le Moal)
- Fix dma_chan direction in data transfer test (Yoshihiro Shimoda)
- Return pci_epf_type_add_cfs() error if EPF has no driver (Damien Le Moal)
- Add kernel-doc for pci_epc_raise_irq() and pci_epc_map_msi_irq() MSI
vector parameters (Manivannan Sadhasivam)
- Pass EPF device ID to driver probe functions (Manivannan Sadhasivam)
- Return -EALREADY if EPC has already been started/stopped (Manivannan
Sadhasivam)
- Add linkdown notifier support and use it in qcom-ep (Manivannan
Sadhasivam)
- Add Bus Master Enable event support and use it in qcom-ep (Manivannan
Sadhasivam)
- Add Qualcomm Modem Host Interface (MHI) endpoint driver (Manivannan
Sadhasivam)
- Add Layerscape PME interrupt handling to manage link-up notification
(Frank Li)
* pci/controller/endpoint:
PCI: layerscape: Add the endpoint linkup notifier support
PCI: endpoint: pci-epf-vntb: Fix typo in comments
MAINTAINERS: Add PCI MHI endpoint function driver under MHI bus
PCI: endpoint: Add PCI Endpoint function driver for MHI bus
PCI: qcom-ep: Add support for BME notification
PCI: qcom-ep: Add support for Link down notification
PCI: endpoint: Add BME notifier support
PCI: endpoint: Add linkdown notifier support
PCI: endpoint: Return error if EPC is started/stopped multiple times
PCI: endpoint: Pass EPF device ID to the probe function
PCI: endpoint: Add missing documentation about the MSI/MSI-X range
PCI: endpoint: Improve pci_epf_type_add_cfs()
PCI: endpoint: functions/pci-epf-test: Fix dma_chan direction
misc: pci_endpoint_test: Simplify pci_endpoint_test_msi_irq()
misc: pci_endpoint_test: Do not write status in IRQ handler
misc: pci_endpoint_test: Re-init completion for every test
misc: pci_endpoint_test: Free IRQs before removing the device
PCI: epf-test: Simplify transfers result print
PCI: epf-test: Simplify DMA support checks
PCI: epf-test: Cleanup request result handling
PCI: epf-test: Cleanup pci_epf_test_cmd_handler()
PCI: epf-test: Improve handling of command and status registers
PCI: epf-test: Simplify IRQ test commands execution
PCI: epf-test: Simplify pci_epf_test_raise_irq()
PCI: epf-test: Simplify read/write/copy test functions
PCI: epf-test: Use dmaengine_submit() to initiate DMA transfer
PCI: epf-test: Fix DMA transfer completion detection
PCI: epf-test: Fix DMA transfer completion initialization
PCI: endpoint: Move pci_epf_type_add_cfs() code
PCI: endpoint: Automatically create a function specific attributes group
PCI: endpoint: Fix a Kconfig prompt of vNTB driver
|
|
- Reset VMD config register between soft reboots (Nirmal Patel)
- Capture pci_reset_bus() return value instead of printing junk when it
fails (Xinghui Li)
* pci/controller/vmd:
PCI: vmd: Fix uninitialized variable usage in vmd_enable_domain()
PCI: vmd: Reset VMD config register between soft reboots
|
|
- Remove writes to unused registers (Rick Wertenbroek)
- Write endpoint Device ID using correct register (Rick Wertenbroek)
- Assert PCI Configuration Enable bit after probe so endpoint responds
instead of generating Request Retry Status messages (Rick Wertenbroek)
- Poll waiting for PHY PLLs to lock (Rick Wertenbroek)
- Update RK3399 example DT binding to be valid (Rick Wertenbroek)
- Use RK3399 PCIE_CLIENT_LEGACY_INT_CTRL to generate INTx instead of
manually generating PCIe message (Rick Wertenbroek)
- Use multiple windows to avoid address translation conflicts (Rick
Wertenbroek)
- Use u32 (not u16) when accessing 32-bit registers (Rick Wertenbroek)
- Hide MSI-X Capability, since RK3399 can't generate MSI-X (Rick
Wertenbroek)
- Set endpoint controller required alignment to 256 (Damien Le Moal)
* pci/controller/rockchip:
PCI: rockchip: Set address alignment for endpoint mode
PCI: rockchip: Don't advertise MSI-X in PCIe capabilities
PCI: rockchip: Use u32 variable to access 32-bit registers
PCI: rockchip: Fix window mapping and address translation for endpoint
PCI: rockchip: Fix legacy IRQ generation for RK3399 PCIe endpoint core
dt-bindings: PCI: Update the RK3399 example to a valid one
PCI: rockchip: Add poll and timeout to wait for PHY PLLs to be locked
PCI: rockchip: Assert PCI Configuration Enable bit after probe
PCI: rockchip: Write PCI Device ID to correct register
PCI: rockchip: Remove writes to unused registers
|
|
- Remove unused static pcie_base and pcie_dev (Geert Uytterhoeven)
* pci/controller/rcar:
PCI: rcar: Use correct product family name for Renesas R-Car
PCI: rcar-host: Remove unused static pcie_base and pcie_dev
|
|
- Disable register write access after init for IP v2.3.3, v2.9.0
(Manivannan Sadhasivam)
- Use DWC helpers for enabling/disabling writes to DBI registers
(Manivannan Sadhasivam)
- Hide slot hotplug capability for IP v1.0.0, v1.9.0, v2.1.0, v2.3.2,
v2.3.3, v2.7.0, v2.9.0 (Manivannan Sadhasivam)
- Reuse v2.3.2 post-init sequence for v2.4.0 (Manivannan Sadhasivam)
-
* pci/controller/qcom:
PCI: qcom: Do not advertise hotplug capability for IP v2.1.0
PCI: qcom: Do not advertise hotplug capability for IP v1.0.0
PCI: qcom: Use post init sequence of IP v2.3.2 for v2.4.0
PCI: qcom: Do not advertise hotplug capability for IP v2.3.2
PCI: qcom: Do not advertise hotplug capability for IPs v2.3.3 and v2.9.0
PCI: qcom: Do not advertise hotplug capability for IPs v2.7.0 and v1.9.0
PCI: qcom: Disable write access to read only registers for IP v2.9.0
PCI: qcom: Use DWC helpers for modifying the read-only DBI registers
PCI: qcom: Disable write access to read only registers for IP v2.3.3
|
|
- Release clock resources on error paths (Junyan Ye)
* pci/pci/ftpci100:
PCI: ftpci100: Release the clock resources
|
|
- Wait for link to come up only if we've initiated link training (Ajay
Agarwal)
- Save and restore imx6 Root Port MSI control to work around hardware
defect (Richard Zhu)
* pci/controller/dwc:
PCI: imx6: Save and restore root port MSI control in suspend and resume
PCI: dwc: Wait for link up only if link is started
|
|
- Wait for link retrain to complete when working around the J721E i2085
erratum with Gen2 mode (Siddharth Vadapalli)
* pci/controller/cadence:
PCI: cadence: Fix Gen2 Link Retraining process
|