Age | Commit message (Collapse) | Author | Files | Lines |
|
The xpt_flags field frequently changes between the time that
svc_xprt_ready() grabs a copy and execution flow arrives at the
tracepoint at the tail of svc_xprt_enqueue(). In fact, there's
usually a sleep/wake-up in there, so those flags are almost
guaranteed to be different.
It would be more useful to record the exact flags that were used to
decide whether the transport is ready, so move the tracepoint.
Moving it means the tracepoint can't pick up the waker's pid. That
can be added to struct svc_rqst if it turns out that is important.
Signed-off-by: Chuck Lever <[email protected]>
|
|
In addition to the benefits of using an enum rather than a set of
macros, we now have a named type that can improve static type
checking of function return values.
As part of this change, I removed a stale comment from svcauth.h;
the return values from current implementations of the
auth_ops::release method are all zero/negative errno, not the SVC_OK
enum values as the old comment suggested.
Suggested-by: NeilBrown <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
|
|
When a sequence of numbers are needed for internal-use only, an enum is
typically best. The sequence will inevitably need to be changed one
day, and having an enum means the developer doesn't need to think about
renumbering after insertion or deletion. Such patches will be easier
to review.
Suggested-by: NeilBrown <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Some swap cleanups from Ma Wupeng ("fix WARN_ON in
add_to_avail_list")
- Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which
reduces the special-case code for handling hugetlb pages in GUP. It
also speeds up GUP handling of transparent hugepages.
- Peng Zhang provides some maple tree speedups ("Optimize the fast path
of mas_store()").
- Sergey Senozhatsky has improved te performance of zsmalloc during
compaction (zsmalloc: small compaction improvements").
- Domenico Cerasuolo has developed additional selftest code for zswap
("selftests: cgroup: add zswap test program").
- xu xin has doe some work on KSM's handling of zero pages. These
changes are mainly to enable the user to better understand the
effectiveness of KSM's treatment of zero pages ("ksm: support
tracking KSM-placed zero-pages").
- Jeff Xu has fixes the behaviour of memfd's
MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl
MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED").
- David Howells has fixed an fscache optimization ("mm, netfs, fscache:
Stop read optimisation when folio removed from pagecache").
- Axel Rasmussen has given userfaultfd the ability to simulate memory
poisoning ("add UFFDIO_POISON to simulate memory poisoning with
UFFD").
- Miaohe Lin has contributed some routine maintenance work on the
memory-failure code ("mm: memory-failure: remove unneeded PageHuge()
check").
- Peng Zhang has contributed some maintenance work on the maple tree
code ("Improve the validation for maple tree and some cleanup").
- Hugh Dickins has optimized the collapsing of shmem or file pages into
THPs ("mm: free retracted page table by RCU").
- Jiaqi Yan has a patch series which permits us to use the healthy
subpages within a hardware poisoned huge page for general purposes
("Improve hugetlbfs read on HWPOISON hugepages").
- Kemeng Shi has done some maintenance work on the pagetable-check code
("Remove unused parameters in page_table_check").
- More folioification work from Matthew Wilcox ("More filesystem folio
conversions for 6.6"), ("Followup folio conversions for zswap"). And
from ZhangPeng ("Convert several functions in page_io.c to use a
folio").
- page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext").
- Baoquan He has converted some architectures to use the
GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert
architectures to take GENERIC_IOREMAP way").
- Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support
batched/deferred tlb shootdown during page reclamation/migration").
- Better maple tree lockdep checking from Liam Howlett ("More strict
maple tree lockdep"). Liam also developed some efficiency
improvements ("Reduce preallocations for maple tree").
- Cleanup and optimization to the secondary IOMMU TLB invalidation,
from Alistair Popple ("Invalidate secondary IOMMU TLB on permission
upgrade").
- Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes
for arm64").
- Kemeng Shi provides some maintenance work on the compaction code
("Two minor cleanups for compaction").
- Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle
most file-backed faults under the VMA lock").
- Aneesh Kumar contributes code to use the vmemmap optimization for DAX
on ppc64, under some circumstances ("Add support for DAX vmemmap
optimization for ppc64").
- page-ext cleanups from Kemeng Shi ("add page_ext_data to get client
data in page_ext"), ("minor cleanups to page_ext header").
- Some zswap cleanups from Johannes Weiner ("mm: zswap: three
cleanups").
- kmsan cleanups from ZhangPeng ("minor cleanups for kmsan").
- VMA handling cleanups from Kefeng Wang ("mm: convert to
vma_is_initial_heap/stack()").
- DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes:
implement DAMOS tried total bytes file"), ("Extend DAMOS filters for
address ranges and DAMON monitoring targets").
- Compaction work from Kemeng Shi ("Fixes and cleanups to compaction").
- Liam Howlett has improved the maple tree node replacement code
("maple_tree: Change replacement strategy").
- ZhangPeng has a general code cleanup - use the K() macro more widely
("cleanup with helper macro K()").
- Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for
memmap on memory feature on ppc64").
- pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list
in page_alloc"), ("Two minor cleanups for get pageblock
migratetype").
- Vishal Moola introduces a memory descriptor for page table tracking,
"struct ptdesc" ("Split ptdesc from struct page").
- memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups
for vm.memfd_noexec").
- MM include file rationalization from Hugh Dickins ("arch: include
asm/cacheflush.h in asm/hugetlb.h").
- THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text
output").
- kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use
object_cache instead of kmemleak_initialized").
- More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor
and _folio_order").
- A VMA locking scalability improvement from Suren Baghdasaryan
("Per-VMA lock support for swap and userfaults").
- pagetable handling cleanups from Matthew Wilcox ("New page table
range API").
- A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop
using page->private on tail pages for THP_SWAP + cleanups").
- Cleanups and speedups to the hugetlb fault handling from Matthew
Wilcox ("Change calling convention for ->huge_fault").
- Matthew Wilcox has also done some maintenance work on the MM
subsystem documentation ("Improve mm documentation").
* tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (489 commits)
maple_tree: shrink struct maple_tree
maple_tree: clean up mas_wr_append()
secretmem: convert page_is_secretmem() to folio_is_secretmem()
nios2: fix flush_dcache_page() for usage from irq context
hugetlb: add documentation for vma_kernel_pagesize()
mm: add orphaned kernel-doc to the rst files.
mm: fix clean_record_shared_mapping_range kernel-doc
mm: fix get_mctgt_type() kernel-doc
mm: fix kernel-doc warning from tlb_flush_rmaps()
mm: remove enum page_entry_size
mm: allow ->huge_fault() to be called without the mmap_lock held
mm: move PMD_ORDER to pgtable.h
mm: remove checks for pte_index
memcg: remove duplication detection for mem_cgroup_uncharge_swap
mm/huge_memory: work on folio->swap instead of page->private when splitting folio
mm/swap: inline folio_set_swap_entry() and folio_swap_entry()
mm/swap: use dedicated entry for swap in folio
mm/swap: stop using page->private on tail pages for THP_SWAP
selftests/mm: fix WARNING comparing pointer to 0
selftests: cgroup: fix test_kmem_memcg_deletion kernel mem check
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"Core:
- Increase size limits for to-be-sent skb frag allocations. This
allows tun, tap devices and packet sockets to better cope with
large writes operations
- Store netdevs in an xarray, to simplify iterating over netdevs
- Refactor nexthop selection for multipath routes
- Improve sched class lifetime handling
- Add backup nexthop ID support for bridge
- Implement drop reasons support in openvswitch
- Several data races annotations and fixes
- Constify the sk parameter of routing functions
- Prepend kernel version to netconsole message
Protocols:
- Implement support for TCP probing the peer being under memory
pressure
- Remove hard coded limitation on IPv6 specific info placement inside
the socket struct
- Get rid of sysctl_tcp_adv_win_scale and use an auto-estimated per
socket scaling factor
- Scaling-up the IPv6 expired route GC via a separated list of
expiring routes
- In-kernel support for the TLS alert protocol
- Better support for UDP reuseport with connected sockets
- Add NEXT-C-SID support for SRv6 End.X behavior, reducing the SR
header size
- Get rid of additional ancillary per MPTCP connection struct socket
- Implement support for BPF-based MPTCP packet schedulers
- Format MPTCP subtests selftests results in TAP
- Several new SMC 2.1 features including unique experimental options,
max connections per lgr negotiation, max links per lgr negotiation
BPF:
- Multi-buffer support in AF_XDP
- Add multi uprobe BPF links for attaching multiple uprobes and usdt
probes, which is significantly faster and saves extra fds
- Implement an fd-based tc BPF attach API (TCX) and BPF link support
on top of it
- Add SO_REUSEPORT support for TC bpf_sk_assign
- Support new instructions from cpu v4 to simplify the generated code
and feature completeness, for x86, arm64, riscv64
- Support defragmenting IPv(4|6) packets in BPF
- Teach verifier actual bounds of bpf_get_smp_processor_id() and fix
perf+libbpf issue related to custom section handling
- Introduce bpf map element count and enable it for all program types
- Add a BPF hook in sys_socket() to change the protocol ID from
IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy
- Introduce bpf_me_mcache_free_rcu() and fix OOM under stress
- Add uprobe support for the bpf_get_func_ip helper
- Check skb ownership against full socket
- Support for up to 12 arguments in BPF trampoline
- Extend link_info for kprobe_multi and perf_event links
Netfilter:
- Speed-up process exit by aborting ruleset validation if a fatal
signal is pending
- Allow NLA_POLICY_MASK to be used with BE16/BE32 types
Driver API:
- Page pool optimizations, to improve data locality and cache usage
- Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() to avoid the
need for raw ioctl() handling in drivers
- Simplify genetlink dump operations (doit/dumpit) providing them the
common information already populated in struct genl_info
- Extend and use the yaml devlink specs to [re]generate the split ops
- Introduce devlink selective dumps, to allow SF filtering SF based
on handle and other attributes
- Add yaml netlink spec for netlink-raw families, allow route, link
and address related queries via the ynl tool
- Remove phylink legacy mode support
- Support offload LED blinking to phy
- Add devlink port function attributes for IPsec
New hardware / drivers:
- Ethernet:
- Broadcom ASP 2.0 (72165) ethernet controller
- MediaTek MT7988 SoC
- Texas Instruments AM654 SoC
- Texas Instruments IEP driver
- Atheros qca8081 phy
- Marvell 88Q2110 phy
- NXP TJA1120 phy
- WiFi:
- MediaTek mt7981 support
- Can:
- Kvaser SmartFusion2 PCI Express devices
- Allwinner T113 controllers
- Texas Instruments tcan4552/4553 chips
- Bluetooth:
- Intel Gale Peak
- Qualcomm WCN3988 and WCN7850
- NXP AW693 and IW624
- Mediatek MT2925
Drivers:
- Ethernet NICs:
- nVidia/Mellanox:
- mlx5:
- support UDP encapsulation in packet offload mode
- IPsec packet offload support in eswitch mode
- improve aRFS observability by adding new set of counters
- extends MACsec offload support to cover RoCE traffic
- dynamic completion EQs
- mlx4:
- convert to use auxiliary bus instead of custom interface
logic
- Intel
- ice:
- implement switchdev bridge offload, even for LAG
interfaces
- implement SRIOV support for LAG interfaces
- igc:
- add support for multiple in-flight TX timestamps
- Broadcom:
- bnxt:
- use the unified RX page pool buffers for XDP and non-XDP
- use the NAPI skb allocation cache
- OcteonTX2:
- support Round Robin scheduling HTB offload
- TC flower offload support for SPI field
- Freescale:
- add XDP_TX feature support
- AMD:
- ionic: add support for PCI FLR event
- sfc:
- basic conntrack offload
- introduce eth, ipv4 and ipv6 pedit offloads
- ST Microelectronics:
- stmmac: maximze PTP timestamping resolution
- Virtual NICs:
- Microsoft vNIC:
- batch ringing RX queue doorbell on receiving packets
- add page pool for RX buffers
- Virtio vNIC:
- add per queue interrupt coalescing support
- Google vNIC:
- add queue-page-list mode support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add port range matching tc-flower offload
- permit enslavement to netdevices with uppers
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- convert to phylink_pcs
- Renesas:
- r8A779fx: add speed change support
- rzn1: enables vlan support
- Ethernet PHYs:
- convert mv88e6xxx to phylink_pcs
- WiFi:
- Qualcomm Wi-Fi 7 (ath12k):
- extremely High Throughput (EHT) PHY support
- RealTek (rtl8xxxu):
- enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU),
RTL8192EU and RTL8723BU
- RealTek (rtw89):
- Introduce Time Averaged SAR (TAS) support
- Connector:
- support for event filtering"
* tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1806 commits)
net: ethernet: mtk_wed: minor change in wed_{tx,rx}info_show
net: ethernet: mtk_wed: add some more info in wed_txinfo_show handler
net: stmmac: clarify difference between "interface" and "phy_interface"
r8152: add vendor/device ID pair for D-Link DUB-E250
devlink: move devlink_notify_register/unregister() to dev.c
devlink: move small_ops definition into netlink.c
devlink: move tracepoint definitions into core.c
devlink: push linecard related code into separate file
devlink: push rate related code into separate file
devlink: push trap related code into separate file
devlink: use tracepoint_enabled() helper
devlink: push region related code into separate file
devlink: push param related code into separate file
devlink: push resource related code into separate file
devlink: push dpipe related code into separate file
devlink: move and rename devlink_dpipe_send_and_alloc_skb() helper
devlink: push shared buffer related code into separate file
devlink: push port related code into separate file
devlink: push object register/unregister notifications into separate helpers
inet: fix IP_TRANSPARENT error handling
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi updates from Mark Brown:
"There's been quite a lot of generic activity here, but more
administrative than featuers. We also have a bunch of new drivers,
including one that's part of a MFD so we pulled in the core parts of
that:
- Lots of work from both Yang Yingliang and Andy Shevchenko on moving
to host/device/controller based terminology for devices.
- QuadSPI SPI support for Allwinner sun6i.
- New device support Cirrus Logic CS43L43, Longsoon, Qualcomm GENI
QuPv3 and StarFive JH7110 QSPI"
* tag 'spi-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (151 commits)
spi: at91-usart: Use PTR_ERR_OR_ZERO() to simplify code
spi: spi-sn-f-ospi: switch to use modern name
spi: sifive: switch to use modern name
spi: sh: switch to use modern name
spi: sh-sci: switch to use modern name
spi: sh-msiof: switch to use modern name
spi: sh-hspi: switch to use modern name
spi: sc18is602: switch to use modern name
spi: s3c64xx: switch to use modern name
spi: rzv2m-csi: switch to use devm_spi_alloc_host()
spi: rspi: switch to use spi_alloc_host()
spi: rockchip: switch to use modern name
spi: rockchip-sfc: switch to use modern name
spi: realtek-rtl: switch to use devm_spi_alloc_host()
spi: rb4xx: switch to use modern name
spi: qup: switch to use modern name
spi: spi-qcom-qspi: switch to use modern name
spi: pxa2xx: switch to use modern name
spi: ppc4xx: switch to use modern name
spi: spl022: switch to use modern name
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"No new features, the bulk of the changes are fixes, refactoring and
cleanups. The notable fix is the scrub performance restoration after
rewrite in 6.4, though still only partial.
Fixes:
- scrub performance drop due to rewrite in 6.4 partially restored:
- do IO grouping by blg_plug/blk_unplug again
- avoid unnecessary tree searches when processing stripes, in
extent and checksum trees
- the drop is noticeable on fast PCIe devices, -66% and restored
to -33% of the original
- backports to 6.4 planned
- handle more corner cases of transaction commit during orphan
cleanup or delayed ref processing
- use correct fsid/metadata_uuid when validating super block
- copy directory permissions and time when creating a stub subvolume
Core:
- debugging feature integrity checker deprecated, to be removed in
6.7
- in zoned mode, zones are activated just before the write, making
error handling easier, now the overcommit mechanism can be enabled
again which improves performance by avoiding more frequent flushing
- v0 extent handling completely removed, deprecated long time ago
- error handling improvements
- tests:
- extent buffer bitmap tests
- pinned extent splitting tests
- cleanups and refactoring:
- compression writeback
- extent buffer bitmap
- space flushing, ENOSPC handling"
* tag 'for-6.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (110 commits)
btrfs: zoned: skip splitting and logical rewriting on pre-alloc write
btrfs: tests: test invalid splitting when skipping pinned drop extent_map
btrfs: tests: add a test for btrfs_add_extent_mapping
btrfs: tests: add extent_map tests for dropping with odd layouts
btrfs: scrub: move write back of repaired sectors to scrub_stripe_read_repair_worker()
btrfs: scrub: don't go ordered workqueue for dev-replace
btrfs: scrub: fix grouping of read IO
btrfs: scrub: avoid unnecessary csum tree search preparing stripes
btrfs: scrub: avoid unnecessary extent tree search preparing stripes
btrfs: copy dir permission and time when creating a stub subvolume
btrfs: remove pointless empty list check when reading delayed dir indexes
btrfs: drop redundant check to use fs_devices::metadata_uuid
btrfs: compare the correct fsid/metadata_uuid in btrfs_validate_super
btrfs: use the correct superblock to compare fsid in btrfs_validate_super
btrfs: simplify memcpy either of metadata_uuid or fsid
btrfs: add a helper to read the superblock metadata_uuid
btrfs: remove v0 extent handling
btrfs: output extra debug info if we failed to find an inline backref
btrfs: move the !zoned assert into run_delalloc_cow
btrfs: consolidate the error handling in run_delalloc_nocow
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"In this cycle, a xattr bloom filter feature is introduced to speed up
negative xattr lookups, which was originally suggested by Alexander
for Composefs use cases.
Additionally, the DEFLATE algorithm is now supported, which can be
used together with hardware accelerators for our cloud workloads. Each
supported compression algorithm can be selected on a per-file basis
for specific access patterns too.
There are also some random fixes and cleanups as usual:
- Support xattr bloom filter to optimize negative xattr lookups
- Support DEFLATE compression algorithm as an alternative
- Fix a regression that ztailpacking pclusters don't release properly
- Avoid warning dedupe and fragments features anymore
- Some folio conversions and cleanups"
* tag 'erofs-for-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: release ztailpacking pclusters properly
erofs: don't warn dedupe and fragments features anymore
erofs: adapt folios for z_erofs_read_folio()
erofs: adapt folios for z_erofs_readahead()
erofs: get rid of fe->backmost for cache decompression
erofs: drop z_erofs_page_mark_eio()
erofs: tidy up z_erofs_do_read_page()
erofs: move preparation logic into z_erofs_pcluster_begin()
erofs: avoid obsolete {collector,collection} terms
erofs: simplify z_erofs_read_fragment()
erofs: remove redundant erofs_fs_type declaration in super.c
erofs: add necessary kmem_cache_create flags for erofs inode cache
erofs: clean up redundant comment and adjust code alignment
erofs: refine warning messages for zdata I/Os
erofs: boost negative xattr lookup with bloom filter
erofs: update on-disk format for xattr name filter
erofs: DEFLATE compression support
|
|
It's a straight-forward conversion and no logic changes (except that
it renames the corresponding tracepoint.)
Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The v0 extent item has been deprecated for a long time, and we don't have
any report from the community either.
So it's time to remove the v0 extent specific error handling, and just
treat them as regular extent tree corruption.
This patch would remove the btrfs_print_v0_err() helper, and enhance the
involved error handling to treat them just as any extent tree
corruption. No reports regarding v0 extents have been seen since the
graceful handling was added in 2018.
This involves:
- btrfs_backref_add_tree_node()
This change is a little tricky, the new code is changed to only handle
BTRFS_TREE_BLOCK_REF_KEY and BTRFS_SHARED_BLOCK_REF_KEY.
But this is safe, as we have rejected any unknown inline refs through
btrfs_get_extent_inline_ref_type().
For keyed backrefs, we're safe to skip anything we don't know (that's
if it can pass tree-checker in the first place).
- btrfs_lookup_extent_info()
- lookup_inline_extent_backref()
- run_delayed_extent_op()
- __btrfs_free_extent()
- add_tree_block()
Regular error handling of unexpected extent tree item, and abort
transaction (if we have a trans handle).
- remove_extent_data_ref()
It's pretty much the same as the regular rejection of unknown backref
key.
But for this particular case, we can also remove a BUG_ON().
- extent_data_ref_count()
We can remove the BTRFS_EXTENT_REF_V0_KEY BUG_ON(), as it would be
rejected by the only caller.
- btrfs_print_leaf()
Remove the handling for BTRFS_EXTENT_REF_V0_KEY.
Signed-off-by: Qu Wenruo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
After commit 6bfd0133bee2 ("btrfs: raid56: switch scrub path to use a
single function"), the raid56 implementation no longer uses different
endio functions for RMW/recover/scrub.
All read operations end in submit_read_wait_bio_list(), while all write
operations end in submit_write_bios(). This means quite some trace
events are out-of-date and no longer utilized.
This patch would unify the trace events into just two:
- trace_raid56_read()
Replaces trace_raid56_read_partial(), trace_raid56_scrub_read() and
trace_raid56_scrub_read_recover().
- trace_raid56_write()
Replaces trace_raid56_write_stripe() and
trace_raid56_scrub_write_stripe().
Signed-off-by: Qu Wenruo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Merge series from Charles Keepax <[email protected]>:
This patch chain adds support for the Cirrus Logic cs42l43 PC focused
SoundWire CODEC.
|
|
This is enabled only with radix translation and 1G hugepage size. This
will be used with devdax device memory with a namespace alignment of 1G.
Anon transparent hugepage is not supported even though we do have helpers
checking pud_trans_huge(). We should never find that return true. The
only expected pte bit combination is _PAGE_PTE | _PAGE_DEVMAP.
Some of the helpers are never expected to get called on hash translation
and hence is marked to call BUG() in such a case.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Joao Martins <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
A follow-up patch will add a pud variant for this same event. Using event
class makes that addition simpler.
No functional change in this patch.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Joao Martins <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
We need the char/misc fixes in here as well to build on top of.
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joel/fsi into char-misc-next
Joen writes:
FSI changes for v6.6
* New drivers for the I2C Responder master and SCOM device
* Misc janitor fixes
* tag 'fsi-for-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/joel/fsi:
fsi: fix some spelling mistakes in comment
fsi: master-ast-cf: Add MODULE_FIRMWARE macro
docs: ABI: fix spelling/grammar in SBEFIFO timeout interface
fsi: Add I2C Responder SCOM driver
fsi: Add IBM I2C Responder virtual FSI master
dt-bindings: fsi: Document the IBM I2C Responder virtual FSI master
fsi: Lock mutex for master device registration
fsi: Improve master indexing
fsi: core: Switch to ida_alloc/free
fsi: core: Fix legacy minor numbering
fsi: core: Add trace events for scan and unregister
fsi: aspeed: Reset master errors after CFAM reset
fsi: sbefifo: Remove limits on user-specified read timeout
fsi: sbefifo: Add configurable in-command timeout
fsi: sbefifo: Don't check status during probe
fsi: Use of_match_table for bus matching if specified
fsi: Add aliased device numbering
fsi: Move fsi_slave structure definition to header
fsi: Use of_property_read_reg() to parse "reg"
fsi: Explicitly include correct DT includes
|
|
The I2C Responder (I2CR) is an I2C device that translates I2C commands
to CFAM or SCOM operations, effectively implementing an FSI master and
bus.
Signed-off-by: Eddie James <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Joel Stanley <[email protected]>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
drivers/net/ethernet/intel/igc/igc_main.c
06b412589eef ("igc: Add lock to safeguard global Qbv variables")
d3750076d464 ("igc: Add TransmissionOverrun counter")
drivers/net/ethernet/microsoft/mana/mana_en.c
a7dfeda6fdec ("net: mana: Fix MANA VF unload when hardware is unresponsive")
a9ca9f9ceff3 ("page_pool: split types and declarations from page_pool.h")
92272ec4107e ("eth: add missing xdp.h includes in drivers")
net/mptcp/protocol.h
511b90e39250 ("mptcp: fix disconnect vs accept race")
b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning")
tools/testing/selftests/net/mptcp/mptcp_join.sh
c8c101ae390a ("selftests: mptcp: join: fix 'implicit EP' test")
03668c65d153 ("selftests: mptcp: join: rework detailed report")
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
I currently debug nfs plock handling and introduce those two tracepoints
for getting more information about what is happening there if the user
space reads plock operations from kernel and writing the result back.
Signed-off-by: Alexander Aring <[email protected]>
Signed-off-by: David Teigland <[email protected]>
|
|
Before this code is copied, add the missing family, as we did in
commit 3dd344ea84e1 ("net: tracepoint: exposing sk_family in all tcp:tracepoints")
Fixes: 15fcdf6ae116 ("tcp: Add tracepoint for tcp_set_ca_state")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Ping Gan <[email protected]>
Cc: Manjusaka <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Add more trace events for the scanning and unregistration
functions for debug purposes.
Signed-off-by: Eddie James <[email protected]>
Reviewed-by: Joel Stanley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Joel Stanley <[email protected]>
|
|
Split types and pure function declarations from page_pool.h
and add them in page_page/types.h, so that C sources can
include page_pool.h and headers should generally only include
page_pool/types.h as suggested by jakub.
Rename page_pool.h to page_pool/helpers.h to have both in
one place.
Signed-off-by: Yunsheng Lin <[email protected]>
Suggested-by: Jakub Kicinski <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Alexander Duyck <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[Jakub: change microsoft/mana, fix kdoc paths in Documentation]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Martin KaFai Lau says:
====================
pull-request: bpf-next 2023-08-03
We've added 54 non-merge commits during the last 10 day(s) which contain
a total of 84 files changed, 4026 insertions(+), 562 deletions(-).
The main changes are:
1) Add SO_REUSEPORT support for TC bpf_sk_assign from Lorenz Bauer,
Daniel Borkmann
2) Support new insns from cpu v4 from Yonghong Song
3) Non-atomically allocate freelist during prefill from YiFei Zhu
4) Support defragmenting IPv(4|6) packets in BPF from Daniel Xu
5) Add tracepoint to xdp attaching failure from Leon Hwang
6) struct netdev_rx_queue and xdp.h reshuffling to reduce
rebuild time from Jakub Kicinski
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (54 commits)
net: invert the netdevice.h vs xdp.h dependency
net: move struct netdev_rx_queue out of netdevice.h
eth: add missing xdp.h includes in drivers
selftests/bpf: Add testcase for xdp attaching failure tracepoint
bpf, xdp: Add tracepoint to xdp attaching failure
selftests/bpf: fix static assert compilation issue for test_cls_*.c
bpf: fix bpf_probe_read_kernel prototype mismatch
riscv, bpf: Adapt bpf trampoline to optimized riscv ftrace framework
libbpf: fix typos in Makefile
tracing: bpf: use struct trace_entry in struct syscall_tp_t
bpf, devmap: Remove unused dtab field from bpf_dtab_netdev
bpf, cpumap: Remove unused cmap field from bpf_cpu_map_entry
netfilter: bpf: Only define get_proto_defrag_hook() if necessary
bpf: Fix an array-index-out-of-bounds issue in disasm.c
net: remove duplicate INDIRECT_CALLABLE_DECLARE of udp[6]_ehashfn
docs/bpf: Fix malformed documentation
bpf: selftests: Add defrag selftests
bpf: selftests: Support custom type and proto for client sockets
bpf: selftests: Support not connecting client socket
netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link
...
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
xdp.h is far more specific and is included in only 67 other
files vs netdevice.h's 1538 include sites.
Make xdp.h include netdevice.h, instead of the other way around.
This decreases the incremental allmodconfig builds size when
xdp.h is touched from 5947 to 662 objects.
Move bpf_prog_run_xdp() to xdp.h, seems appropriate and filter.h
is a mega-header in its own right so it's nice to avoid xdp.h
getting included there as well.
The only unfortunate part is that the typedef for xdp_features_t
has to move to netdevice.h, since its embedded in struct netdevice.
Signed-off-by: Jakub Kicinski <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
When error happens in dev_xdp_attach(), it should have a way to tell
users the error message like the netlink approach.
To avoid breaking uapi, adding a tracepoint in bpf_xdp_link_attach() is
an appropriate way to notify users the error message.
Hence, bpf libraries are able to retrieve the error message by this
tracepoint, and then report the error message to users.
Signed-off-by: Leon Hwang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Add a new config option that controls building the buffer_head code, and
select it from all file systems and stacking drivers that need it.
For the block device nodes and alternative iomap based buffered I/O path
is provided when buffer_head support is not enabled, and iomap needs a
a small tweak to define the IOMAP_F_BUFFER_HEAD flag to 0 to not call
into the buffer_head code when it doesn't exist.
Otherwise this is just Kconfig and ifdef changes.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Add observability for the new TLS Alert infrastructure.
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Link: https://lore.kernel.org/r/169047947409.5241.14548832149596892717.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
There's several things here that will really help my CI.
|
|
strlcpy() reads the entire source buffer first.
This read may exceed the destination size limit.
This is both inefficient and can lead to linear read
overflows if a source string is not NUL-terminated [1].
In an effort to remove strlcpy() completely [2], replace
strlcpy() here with strscpy().
No return values were used, so direct replacement is safe.
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
[2] https://github.com/KSPP/linux/issues/89
Signed-off-by: Azeem Shaikh <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
strlcpy() reads the entire source buffer first.
This read may exceed the destination size limit.
This is both inefficient and can lead to linear read
overflows if a source string is not NUL-terminated [1].
In an effort to remove strlcpy() completely [2], replace
strlcpy() here with strscpy().
No return values were used, so direct replacement is safe.
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
[2] https://github.com/KSPP/linux/issues/89
Signed-off-by: Azeem Shaikh <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Fix
- spelling typos
- capitalization of acronyms
in the comments.
While at it, fix the multi-line comment style.
Signed-off-by: Andy Shevchenko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
|
|
journal_clean_one_cp_list() and journal_shrink_one_cp_list() are almost
the same, so merge them into journal_shrink_one_cp_list(), remove the
nr_to_scan parameter, always scan and try to free the whole checkpoint
list.
Signed-off-by: Zhang Yi <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, bpf and wireguard.
Current release - regressions:
- nvme-tcp: fix comma-related oops after sendpage changes
Current release - new code bugs:
- ptp: make max_phase_adjustment sysfs device attribute invisible
when not supported
Previous releases - regressions:
- sctp: fix potential deadlock on &net->sctp.addr_wq_lock
- mptcp:
- ensure subflow is unhashed before cleaning the backlog
- do not rely on implicit state check in mptcp_listen()
Previous releases - always broken:
- net: fix net_dev_start_xmit trace event vs skb_transport_offset()
- Bluetooth:
- fix use-bdaddr-property quirk
- L2CAP: fix multiple UaFs
- ISO: use hci_sync for setting CIG parameters
- hci_event: fix Set CIG Parameters error status handling
- hci_event: fix parsing of CIS Established Event
- MGMT: fix marking SCAN_RSP as not connectable
- wireguard: queuing: use saner cpu selection wrapping
- sched: act_ipt: various bug fixes for iptables <> TC interactions
- sched: act_pedit: add size check for TCA_PEDIT_PARMS_EX
- dsa: fixes for receiving PTP packets with 8021q and sja1105 tagging
- eth: sfc: fix null-deref in devlink port without MAE access
- eth: ibmvnic: do not reset dql stats on NON_FATAL err
Misc:
- xsk: honor SO_BINDTODEVICE on bind"
* tag 'net-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits)
nfp: clean mc addresses in application firmware when closing port
selftests: mptcp: pm_nl_ctl: fix 32-bit support
selftests: mptcp: depend on SYN_COOKIES
selftests: mptcp: userspace_pm: report errors with 'remove' tests
selftests: mptcp: userspace_pm: use correct server port
selftests: mptcp: sockopt: return error if wrong mark
selftests: mptcp: sockopt: use 'iptables-legacy' if available
selftests: mptcp: connect: fail if nft supposed to work
mptcp: do not rely on implicit state check in mptcp_listen()
mptcp: ensure subflow is unhashed before cleaning the backlog
s390/qeth: Fix vipa deletion
octeontx-af: fix hardware timestamp configuration
net: dsa: sja1105: always enable the send_meta options
net: dsa: tag_sja1105: fix MAC DA patching from meta frames
net: Replace strlcpy with strscpy
pptp: Fix fib lookup calls.
mlxsw: spectrum_router: Fix an IS_ERR() vs NULL check
net/sched: act_pedit: Add size check for TCA_PEDIT_PARMS_EX
xsk: Honor SO_BINDTODEVICE on bind
ptp: Make max_phase_adjustment sysfs device attribute invisible when not supported
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this cycle, we've mainly investigated the zoned block device
support along with patches such as correcting write pointers between
f2fs and storage, adding asynchronous zone reset flow, and managing
the number of open zones.
Other than them, f2fs adds another mount option, "errors=x" to specify
how to handle when it detects an unexpected behavior at runtime.
Enhancements:
- support 'errors=remount-ro|continue|panic' mount option
- enforce some inode flag policies
- allow .tmp compression given extensions
- add some ioctls to manage the f2fs compression
- improve looped node chain flow
- avoid issuing small-sized discard commands during checkpoint
- implement an asynchronous zone reset
Bug fixes:
- fix deadlock in xattr and inode page lock
- fix and add sanity check in some error paths
- fix to avoid NULL pointer dereference f2fs_write_end_io() along
with put_super
- set proper flags to quota files
- fix potential deadlock due to unpaired node_write lock use
- fix over-estimating free section during FG GC
- fix the wrong condition to determine atomic context
As usual, also there are a number of patches with code refactoring and
minor clean-ups"
* tag 'f2fs-for-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (46 commits)
f2fs: fix to do sanity check on direct node in truncate_dnode()
f2fs: only set release for file that has compressed data
f2fs: fix compile warning in f2fs_destroy_node_manager()
f2fs: fix error path handling in truncate_dnode()
f2fs: fix deadlock in i_xattr_sem and inode page lock
f2fs: remove unneeded page uptodate check/set
f2fs: update mtime and ctime in move file range method
f2fs: compress tmp files given extension
f2fs: refactor struct f2fs_attr macro
f2fs: convert to use sbi directly
f2fs: remove redundant assignment to variable err
f2fs: do not issue small discard commands during checkpoint
f2fs: check zone write pointer points to the end of zone
f2fs: add f2fs_ioc_get_compress_blocks
f2fs: cleanup MIN_INLINE_XATTR_SIZE
f2fs: add helper to check compression level
f2fs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method
f2fs: do more sanity check on inode
f2fs: compress: fix to check validity of i_compress_flag field
f2fs: add sanity compress level check for compressed file
...
|
|
strlcpy() reads the entire source buffer first.
This read may exceed the destination size limit.
This is both inefficient and can lead to linear read
overflows if a source string is not NUL-terminated [1].
In an effort to remove strlcpy() completely [2], replace
strlcpy() here with strscpy().
No return values were used, so direct replacement is safe.
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
[2] https://github.com/KSPP/linux/issues/89
Signed-off-by: Azeem Shaikh <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
After blamed commit, we must be more careful about using
skb_transport_offset(), as reminded us by syzbot:
WARNING: CPU: 0 PID: 10 at include/linux/skbuff.h:2868 skb_transport_offset include/linux/skbuff.h:2977 [inline]
WARNING: CPU: 0 PID: 10 at include/linux/skbuff.h:2868 perf_trace_net_dev_start_xmit+0x89a/0xce0 include/trace/events/net.h:14
Modules linked in:
CPU: 0 PID: 10 Comm: kworker/u4:1 Not tainted 6.1.30-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
Workqueue: bat_events batadv_iv_send_outstanding_bat_ogm_packet
RIP: 0010:skb_transport_header include/linux/skbuff.h:2868 [inline]
RIP: 0010:skb_transport_offset include/linux/skbuff.h:2977 [inline]
RIP: 0010:perf_trace_net_dev_start_xmit+0x89a/0xce0 include/trace/events/net.h:14
Code: 8b 04 25 28 00 00 00 48 3b 84 24 c0 00 00 00 0f 85 4e 04 00 00 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc e8 56 22 01 fd <0f> 0b e9 f6 fc ff ff 89 f9 80 e1 07 80 c1 03 38 c1 0f 8c 86 f9 ff
RSP: 0018:ffffc900002bf700 EFLAGS: 00010293
RAX: ffffffff8485d8ca RBX: 000000000000ffff RCX: ffff888100914280
RDX: 0000000000000000 RSI: 000000000000ffff RDI: 000000000000ffff
RBP: ffffc900002bf818 R08: ffffffff8485d5b6 R09: fffffbfff0f8fb5e
R10: 0000000000000000 R11: dffffc0000000001 R12: 1ffff110217d8f67
R13: ffff88810bec7b3a R14: dffffc0000000000 R15: dffffc0000000000
FS: 0000000000000000(0000) GS:ffff8881f6a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f96cf6d52f0 CR3: 000000012224c000 CR4: 0000000000350ef0
Call Trace:
<TASK>
[<ffffffff84715e35>] trace_net_dev_start_xmit include/trace/events/net.h:14 [inline]
[<ffffffff84715e35>] xmit_one net/core/dev.c:3643 [inline]
[<ffffffff84715e35>] dev_hard_start_xmit+0x705/0x980 net/core/dev.c:3660
[<ffffffff8471a232>] __dev_queue_xmit+0x16b2/0x3370 net/core/dev.c:4324
[<ffffffff85416493>] dev_queue_xmit include/linux/netdevice.h:3030 [inline]
[<ffffffff85416493>] batadv_send_skb_packet+0x3f3/0x680 net/batman-adv/send.c:108
[<ffffffff85416744>] batadv_send_broadcast_skb+0x24/0x30 net/batman-adv/send.c:127
[<ffffffff853bc52a>] batadv_iv_ogm_send_to_if net/batman-adv/bat_iv_ogm.c:393 [inline]
[<ffffffff853bc52a>] batadv_iv_ogm_emit net/batman-adv/bat_iv_ogm.c:421 [inline]
[<ffffffff853bc52a>] batadv_iv_send_outstanding_bat_ogm_packet+0x69a/0x840 net/batman-adv/bat_iv_ogm.c:1701
[<ffffffff8151023c>] process_one_work+0x8ac/0x1170 kernel/workqueue.c:2289
[<ffffffff81511938>] worker_thread+0xaa8/0x12d0 kernel/workqueue.c:2436
Fixes: 66e4c8d95008 ("net: warn if transport header was not set")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Pull NFS client updates from Trond Myklebust:
"Stable fixes and other bugfixes:
- nfs: don't report STATX_BTIME in ->getattr
- Revert 'NFSv4: Retry LOCK on OLD_STATEID during delegation return'
since it breaks NFSv4 state recovery.
- NFSv4.1: freeze the session table upon receiving NFS4ERR_BADSESSION
- Fix the NFSv4.2 xattr cache shrinker_id
- Force a ctime update after a NFSv4.2 SETXATTR call
Features and cleanups:
- NFS and RPC over TLS client code from Chuck Lever
- Support for use of abstract unix socket addresses with the rpcbind
daemon
- Sysfs API to allow shutdown of the kernel RPC client and prevent
umount() hangs if the server is known to be permanently down
- XDR cleanups from Anna"
* tag 'nfs-for-6.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (33 commits)
Revert "NFSv4: Retry LOCK on OLD_STATEID during delegation return"
NFS: Don't cleanup sysfs superblock entry if uninitialized
nfs: don't report STATX_BTIME in ->getattr
NFSv4.1: freeze the session table upon receiving NFS4ERR_BADSESSION
NFSv4.2: fix wrong shrinker_id
NFSv4: Clean up some shutdown loops
NFS: Cancel all existing RPC tasks when shutdown
NFS: add sysfs shutdown knob
NFS: add a sysfs link to the acl rpc_client
NFS: add a sysfs link to the lockd rpc_client
NFS: Add sysfs links to sunrpc clients for nfs_clients
NFS: add superblock sysfs entries
NFS: Make all of /sys/fs/nfs network-namespace unique
NFS: Open-code the nfs_kset kset_create_and_add()
NFS: rename nfs_client_kobj to nfs_net_kobj
NFS: rename nfs_client_kset to nfs_kset
NFS: Add an "xprtsec=" NFS mount option
NFS: Have struct nfs_client carry a TLS policy field
SUNRPC: Add a TCP-with-TLS RPC transport class
SUNRPC: Capture CMSG metadata on client-side receive
...
|
|
Pull SCSI updates from James Bottomley:
"Updates to the usual drivers (ufs, pm80xx, libata-scsi, smartpqi,
lpfc, qla2xxx).
We have a couple of major core changes impacting other systems:
- Command Duration Limits, which spills into block and ATA
- block level Persistent Reservation Operations, which touches block,
nvme, target and dm
Both of these are added with merge commits containing a cover letter
explaining what's going on"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (187 commits)
scsi: core: Improve warning message in scsi_device_block()
scsi: core: Replace scsi_target_block() with scsi_block_targets()
scsi: core: Don't wait for quiesce in scsi_device_block()
scsi: core: Don't wait for quiesce in scsi_stop_queue()
scsi: core: Merge scsi_internal_device_block() and device_block()
scsi: sg: Increase number of devices
scsi: bsg: Increase number of devices
scsi: qla2xxx: Remove unused nvme_ls_waitq wait queue
scsi: ufs: ufs-pci: Add support for Intel Arrow Lake
scsi: sd: sd_zbc: Use PAGE_SECTORS_SHIFT
scsi: ufs: wb: Add explicit flush_threshold sysfs attribute
scsi: ufs: ufs-qcom: Switch to the new ICE API
scsi: ufs: dt-bindings: qcom: Add ICE phandle
scsi: ufs: ufs-mediatek: Set UFSHCD_QUIRK_MCQ_BROKEN_RTC quirk
scsi: ufs: ufs-mediatek: Set UFSHCD_QUIRK_MCQ_BROKEN_INTR quirk
scsi: ufs: core: Add host quirk UFSHCD_QUIRK_MCQ_BROKEN_RTC
scsi: ufs: core: Add host quirk UFSHCD_QUIRK_MCQ_BROKEN_INTR
scsi: ufs: core: Remove dedicated hwq for dev command
scsi: ufs: core: mcq: Fix the incorrect OCS value for the device command
scsi: ufs: dt-bindings: samsung,exynos: Drop unneeded quotes
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Various cleanups and bug fixes in ext4's extent status tree,
journalling, and block allocator subsystems.
Also improve performance for parallel DIO overwrites"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (55 commits)
ext4: avoid updating the superblock on a r/o mount if not needed
jbd2: skip reading super block if it has been verified
ext4: fix to check return value of freeze_bdev() in ext4_shutdown()
ext4: refactoring to use the unified helper ext4_quotas_off()
ext4: turn quotas off if mount failed after enabling quotas
ext4: update doc about journal superblock description
ext4: add journal cycled recording support
jbd2: continue to record log between each mount
jbd2: remove j_format_version
jbd2: factor out journal initialization from journal_get_superblock()
jbd2: switch to check format version in superblock directly
jbd2: remove unused feature macros
ext4: ext4_put_super: Remove redundant checking for 'sbi->s_journal_bdev'
ext4: Fix reusing stale buffer heads from last failed mounting
ext4: allow concurrent unaligned dio overwrites
ext4: clean up mballoc criteria comments
ext4: make ext4_zeroout_es() return void
ext4: make ext4_es_insert_extent() return void
ext4: make ext4_es_insert_delayed_block() return void
ext4: make ext4_es_remove_extent() return void
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull mm updates from Andrew Morton:
- Yosry Ahmed brought back some cgroup v1 stats in OOM logs
- Yosry has also eliminated cgroup's atomic rstat flushing
- Nhat Pham adds the new cachestat() syscall. It provides userspace
with the ability to query pagecache status - a similar concept to
mincore() but more powerful and with improved usability
- Mel Gorman provides more optimizations for compaction, reducing the
prevalence of page rescanning
- Lorenzo Stoakes has done some maintanance work on the
get_user_pages() interface
- Liam Howlett continues with cleanups and maintenance work to the
maple tree code. Peng Zhang also does some work on maple tree
- Johannes Weiner has done some cleanup work on the compaction code
- David Hildenbrand has contributed additional selftests for
get_user_pages()
- Thomas Gleixner has contributed some maintenance and optimization
work for the vmalloc code
- Baolin Wang has provided some compaction cleanups,
- SeongJae Park continues maintenance work on the DAMON code
- Huang Ying has done some maintenance on the swap code's usage of
device refcounting
- Christoph Hellwig has some cleanups for the filemap/directio code
- Ryan Roberts provides two patch series which yield some
rationalization of the kernel's access to pte entries - use the
provided APIs rather than open-coding accesses
- Lorenzo Stoakes has some fixes to the interaction between pagecache
and directio access to file mappings
- John Hubbard has a series of fixes to the MM selftesting code
- ZhangPeng continues the folio conversion campaign
- Hugh Dickins has been working on the pagetable handling code, mainly
with a view to reducing the load on the mmap_lock
- Catalin Marinas has reduced the arm64 kmalloc() minimum alignment
from 128 to 8
- Domenico Cerasuolo has improved the zswap reclaim mechanism by
reorganizing the LRU management
- Matthew Wilcox provides some fixups to make gfs2 work better with the
buffer_head code
- Vishal Moola also has done some folio conversion work
- Matthew Wilcox has removed the remnants of the pagevec code - their
functionality is migrated over to struct folio_batch
* tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits)
mm/hugetlb: remove hugetlb_set_page_subpool()
mm: nommu: correct the range of mmap_sem_read_lock in task_mem()
hugetlb: revert use of page_cache_next_miss()
Revert "page cache: fix page_cache_next/prev_miss off by one"
mm/vmscan: fix root proactive reclaim unthrottling unbalanced node
mm: memcg: rename and document global_reclaim()
mm: kill [add|del]_page_to_lru_list()
mm: compaction: convert to use a folio in isolate_migratepages_block()
mm: zswap: fix double invalidate with exclusive loads
mm: remove unnecessary pagevec includes
mm: remove references to pagevec
mm: rename invalidate_mapping_pagevec to mapping_try_invalidate
mm: remove struct pagevec
net: convert sunrpc from pagevec to folio_batch
i915: convert i915_gpu_error to use a folio_batch
pagevec: rename fbatch_count()
mm: remove check_move_unevictable_pages()
drm: convert drm_gem_put_pages() to use a folio_batch
i915: convert shmem_sg_free_table() to use a folio_batch
scatterlist: add sg_set_folio()
...
|
|
mballoc criterias have historically been called by numbers
like CR0, CR1... however this makes it confusing to understand
what each criteria is about.
Change these criterias from numbers to symbolic names and add
relevant comments. While we are at it, also reformat and add some
comments to ext4_seq_mb_stats_show() for better readability.
Additionally, define CR_FAST which signifies the criteria
below which we can make quicker decisions like:
* quitting early if (free block < requested len)
* avoiding to scan free extents smaller than required len.
* avoiding to initialize buddy cache and work with existing cache
* limiting prefetches
Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Ojaswin Mujoo <[email protected]>
Link: https://lore.kernel.org/r/a2dc6ec5aea5e5e68cf8e788c2a964ffead9c8b0.1685449706.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
CR1_5 aims to optimize allocations which can't be satisfied in CR1. The
fact that we couldn't find a group in CR1 suggests that it would be
difficult to find a continuous extent to compleltely satisfy our
allocations. So before falling to the slower CR2, in CR1.5 we
proactively trim the the preallocations so we can find a group with
(free / fragments) big enough. This speeds up our allocation at the
cost of slightly reduced preallocation.
The patch also adds a new sysfs tunable:
* /sys/fs/ext4/<partition>/mb_cr1_5_max_trim_order
This controls how much CR1.5 can trim a request before falling to CR2.
For example, for a request of order 7 and max trim order 2, CR1.5 can
trim this upto order 5.
Suggested-by: Ritesh Harjani (IBM) <[email protected]>
Signed-off-by: Ojaswin Mujoo <[email protected]>
Reviewed-by: Ritesh Harjani (IBM) <[email protected]>
Link: https://lore.kernel.org/r/150fdf65c8e4cc4dba71e020ce0859bcf636a5ff.1685449706.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Convert criteria to be an enum so it easier to maintain and
update the tracefiles to use enum names. This change also makes
it easier to insert new criterias in the future.
There is no functional change in this patch.
Signed-off-by: Ojaswin Mujoo <[email protected]>
Reviewed-by: Ritesh Harjani (IBM) <[email protected]>
Link: https://lore.kernel.org/r/5d82fd467bdf70ea45bdaef810af3b146013946c.1685449706.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"Time, timekeeping and related device driver updates:
Core:
- A set of fixes, cleanups and enhancements to the posix timer code:
- Prevent another possible live lock scenario in the exit() path,
which affects POSIX_CPU_TIMERS_TASK_WORK enabled architectures.
- Fix a loop termination issue which was reported syzcaller/KSAN
in the posix timer ID allocation code.
That triggered a deeper look into the posix-timer code which
unearthed more small issues.
- Add missing READ/WRITE_ONCE() annotations
- Fix or remove completely outdated comments
- Document places which are subtle and completely undocumented.
- Add missing hrtimer modes to the trace event decoder
- Small cleanups and enhancements all over the place
Drivers:
- Rework the Hyper-V clocksource and sched clock setup code
- Remove a deprecated clocksource driver
- Small fixes and enhancements all over the place"
* tag 'timers-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
clocksource/drivers/cadence-ttc: Fix memory leak in ttc_timer_probe
dt-bindings: timers: Add Ralink SoCs timer
clocksource/drivers/hyper-v: Rework clocksource and sched clock setup
dt-bindings: timer: brcm,kona-timer: convert to YAML
clocksource/drivers/imx-gpt: Fold <soc/imx/timer.h> into its only user
clk: imx: Drop inclusion of unused header <soc/imx/timer.h>
hrtimer: Add missing sparse annotations to hrtimer locking
clocksource/drivers/imx-gpt: Use only a single name for functions
clocksource/drivers/loongson1: Move PWM timer to clocksource framework
dt-bindings: timer: Add Loongson-1 clocksource
MIPS: Loongson32: Remove deprecated PWM timer clocksource
clocksource/drivers/ingenic-timer: Use pm_sleep_ptr() macro
tracing/timer: Add missing hrtimer modes to decode_hrtimer_mode().
posix-timers: Add sys_ni_posix_timers() prototype
tick/rcu: Fix bogus ratelimit condition
alarmtimer: Remove unnecessary (void *) cast
alarmtimer: Remove unnecessary initialization of variable 'ret'
posix-timers: Refer properly to CONFIG_HIGH_RES_TIMERS
posix-timers: Polish coding style in a few places
posix-timers: Remove pointless comments
...
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull SMP updates from Thomas Gleixner:
"A large update for SMP management:
- Parallel CPU bringup
The reason why people are interested in parallel bringup is to
shorten the (kexec) reboot time of cloud servers to reduce the
downtime of the VM tenants.
The current fully serialized bringup does the following per AP:
1) Prepare callbacks (allocate, intialize, create threads)
2) Kick the AP alive (e.g. INIT/SIPI on x86)
3) Wait for the AP to report alive state
4) Let the AP continue through the atomic bringup
5) Let the AP run the threaded bringup to full online state
There are two significant delays:
#3 The time for an AP to report alive state in start_secondary()
on x86 has been measured in the range between 350us and 3.5ms
depending on vendor and CPU type, BIOS microcode size etc.
#4 The atomic bringup does the microcode update. This has been
measured to take up to ~8ms on the primary threads depending
on the microcode patch size to apply.
On a two socket SKL server with 56 cores (112 threads) the boot CPU
spends on current mainline about 800ms busy waiting for the APs to
come up and apply microcode. That's more than 80% of the actual
onlining procedure.
This can be reduced significantly by splitting the bringup
mechanism into two parts:
1) Run the prepare callbacks and kick the AP alive for each AP
which needs to be brought up.
The APs wake up, do their firmware initialization and run the
low level kernel startup code including microcode loading in
parallel up to the first synchronization point. (#1 and #2
above)
2) Run the rest of the bringup code strictly serialized per CPU
(#3 - #5 above) as it's done today.
Parallelizing that stage of the CPU bringup might be possible
in theory, but it's questionable whether required surgery
would be justified for a pretty small gain.
If the system is large enough the first AP is already waiting at
the first synchronization point when the boot CPU finished the
wake-up of the last AP. That reduces the AP bringup time on that
SKL from ~800ms to ~80ms, i.e. by a factor ~10x.
The actual gain varies wildly depending on the system, CPU,
microcode patch size and other factors. There are some
opportunities to reduce the overhead further, but that needs some
deep surgery in the x86 CPU bringup code.
For now this is only enabled on x86, but the core functionality
obviously works for all SMP capable architectures.
- Enhancements for SMP function call tracing so it is possible to
locate the scheduling and the actual execution points. That allows
to measure IPI delivery time precisely"
* tag 'smp-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
trace,smp: Add tracepoints for scheduling remotelly called functions
trace,smp: Add tracepoints around remotelly called functions
MAINTAINERS: Add CPU HOTPLUG entry
x86/smpboot: Fix the parallel bringup decision
x86/realmode: Make stack lock work in trampoline_compat()
x86/smp: Initialize cpu_primary_thread_mask late
cpu/hotplug: Fix off by one in cpuhp_bringup_mask()
x86/apic: Fix use of X{,2}APIC_ENABLE in asm with older binutils
x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it
x86/smpboot: Support parallel startup of secondary CPUs
x86/smpboot: Implement a bit spinlock to protect the realmode stack
x86/apic: Save the APIC virtual base address
cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE
x86/apic: Provide cpu_primary_thread mask
x86/smpboot: Enable split CPU startup
cpu/hotplug: Provide a split up CPUHP_BRINGUP mechanism
cpu/hotplug: Reset task stack state in _cpu_up()
cpu/hotplug: Remove unused state functions
riscv: Switch to hotplug core state synchronization
parisc: Switch to hotplug core state synchronization
...
|
|
Pull block updates from Jens Axboe:
- NVMe pull request via Keith:
- Various cleanups all around (Irvin, Chaitanya, Christophe)
- Better struct packing (Christophe JAILLET)
- Reduce controller error logs for optional commands (Keith)
- Support for >=64KiB block sizes (Daniel Gomez)
- Fabrics fixes and code organization (Max, Chaitanya, Daniel
Wagner)
- bcache updates via Coly:
- Fix a race at init time (Mingzhe Zou)
- Misc fixes and cleanups (Andrea, Thomas, Zheng, Ye)
- use page pinning in the block layer for dio (David)
- convert old block dio code to page pinning (David, Christoph)
- cleanups for pktcdvd (Andy)
- cleanups for rnbd (Guoqing)
- use the unchecked __bio_add_page() for the initial single page
additions (Johannes)
- fix overflows in the Amiga partition handling code (Michael)
- improve mq-deadline zoned device support (Bart)
- keep passthrough requests out of the IO schedulers (Christoph, Ming)
- improve support for flush requests, making them less special to deal
with (Christoph)
- add bdev holder ops and shutdown methods (Christoph)
- fix the name_to_dev_t() situation and use cases (Christoph)
- decouple the block open flags from fmode_t (Christoph)
- ublk updates and cleanups, including adding user copy support (Ming)
- BFQ sanity checking (Bart)
- convert brd from radix to xarray (Pankaj)
- constify various structures (Thomas, Ivan)
- more fine grained persistent reservation ioctl capability checks
(Jingbo)
- misc fixes and cleanups (Arnd, Azeem, Demi, Ed, Hengqi, Hou, Jan,
Jordy, Li, Min, Yu, Zhong, Waiman)
* tag 'for-6.5/block-2023-06-23' of git://git.kernel.dk/linux: (266 commits)
scsi/sg: don't grab scsi host module reference
ext4: Fix warning in blkdev_put()
block: don't return -EINVAL for not found names in devt_from_devname
cdrom: Fix spectre-v1 gadget
block: Improve kernel-doc headers
blk-mq: don't insert passthrough request into sw queue
bsg: make bsg_class a static const structure
ublk: make ublk_chr_class a static const structure
aoe: make aoe_class a static const structure
block/rnbd: make all 'class' structures const
block: fix the exclusive open mask in disk_scan_partitions
block: add overflow checks for Amiga partition support
block: change all __u32 annotations to __be32 in affs_hardblocks.h
block: fix signed int overflow in Amiga partition support
block: add capacity validation in bdev_add_partition()
block: fine-granular CAP_SYS_ADMIN for Persistent Reservation
block: disallow Persistent Reservation on partitions
reiserfs: fix blkdev_put() warning from release_journal_dev()
block: fix wrong mode for blkdev_get_by_dev() from disk_scan_partitions()
block: document the holder argument to blkdev_get_by_path
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"Mainly core changes, refactoring and optimizations.
Performance is improved in some areas, overall there may be a
cumulative improvement due to refactoring that removed lookups in the
IO path or simplified IO submission tracking.
Core:
- submit IO synchronously for fast checksums (crc32c and xxhash),
remove high priority worker kthread
- read extent buffer in one go, simplify IO tracking, bio submission
and locking
- remove additional tracking of redirtied extent buffers, originally
added for zoned mode but actually not needed
- track ordered extent pointer in bio to avoid rbtree lookups during
IO
- scrub, use recovered data stripes as cache to avoid unnecessary
read
- in zoned mode, optimize logical to physical mappings of extents
- remove PageError handling, not set by VFS nor writeback
- cleanups, refactoring, better structure packing
- lots of error handling improvements
- more assertions, lockdep annotations
- print assertion failure with the exact line where it happens
- tracepoint updates
- more debugging prints
Performance:
- speedup in fsync(), better tracking of inode logged status can
avoid transaction commit
- IO path structures track logical offsets in data structures and
does not need to look it up
User visible changes:
- don't commit transaction for every created subvolume, this can
reduce time when many subvolumes are created in a batch
- print affected files when relocation fails
- trigger orphan file cleanup during START_SYNC ioctl
Notable fixes:
- fix crash when disabling quota and relocation
- fix crashes when removing roots from drity list
- fix transacion abort during relocation when converting from newer
profiles not covered by fallback
- in zoned mode, stop reclaiming block groups if filesystem becomes
read-only
- fix rare race condition in tree mod log rewind that can miss some
btree node slots
- with enabled fsverity, drop up-to-date page bit in case the
verification fails"
* tag 'for-6.5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (194 commits)
btrfs: fix race between quota disable and relocation
btrfs: add comment to struct btrfs_fs_info::dirty_cowonly_roots
btrfs: fix race when deleting free space root from the dirty cow roots list
btrfs: fix race when deleting quota root from the dirty cow roots list
btrfs: tracepoints: also show actual number of the outstanding extents
btrfs: update i_version in update_dev_time
btrfs: make btrfs_compressed_bioset static
btrfs: add handling for RAID1C23/DUP to btrfs_reduce_alloc_profile
btrfs: scrub: remove btrfs_fs_info::scrub_wr_completion_workers
btrfs: scrub: remove scrub_ctx::csum_list member
btrfs: do not BUG_ON after failure to migrate space during truncation
btrfs: do not BUG_ON on failure to get dir index for new snapshot
btrfs: send: do not BUG_ON() on unexpected symlink data extent
btrfs: do not BUG_ON() when dropping inode items from log root
btrfs: replace BUG_ON() at split_item() with proper error handling
btrfs: do not BUG_ON() on tree mod log failures at btrfs_del_ptr()
btrfs: do not BUG_ON() on tree mod log failures at insert_ptr()
btrfs: do not BUG_ON() on tree mod log failure at insert_new_root()
btrfs: do not BUG_ON() on tree mod log failures at push_nodes_for_insert()
btrfs: abort transaction at update_ref_for_cow() when ref count is zero
...
|
|
Pull nfsd updates from Chuck Lever:
- Clean-ups in the READ path in anticipation of MSG_SPLICE_PAGES
- Better NUMA awareness when allocating pages and other objects
- A number of minor clean-ups to XDR encoding
- Elimination of a race when accepting a TCP socket
- Numerous observability enhancements
* tag 'nfsd-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (46 commits)
nfsd: remove redundant assignments to variable len
svcrdma: Fix stale comment
NFSD: Distinguish per-net namespace initialization
nfsd: move init of percpu reply_cache_stats counters back to nfsd_init_net
SUNRPC: Address RCU warning in net/sunrpc/svc.c
SUNRPC: Use sysfs_emit in place of strlcpy/sprintf
SUNRPC: Remove transport class dprintk call sites
SUNRPC: Fix comments for transport class registration
svcrdma: Remove an unused argument from __svc_rdma_put_rw_ctxt()
svcrdma: trace cc_release calls
svcrdma: Convert "might sleep" comment into a code annotation
NFSD: Add an nfsd4_encode_nfstime4() helper
SUNRPC: Move initialization of rq_stime
SUNRPC: Optimize page release in svc_rdma_sendto()
svcrdma: Prevent page release when nothing was received
svcrdma: Revert 2a1e4f21d841 ("svcrdma: Normalize Send page handling")
SUNRPC: Revert 579900670ac7 ("svcrdma: Remove unused sc_pages field")
SUNRPC: Revert cc93ce9529a6 ("svcrdma: Retain the page backing rq_res.head[0].iov_base")
NFSD: add encoding of op_recall flag for write delegation
NFSD: Add "official" reviewers for this subsystem
...
|
|
|
|
When commit 19343b5bdd16 ("mm/page-writeback: introduce tracepoint for
wait_on_page_writeback()") repurposed the writeback_dirty_page trace event
as a template to create its new wait_on_page_writeback trace event, it
ended up opening a window to NULL pointer dereference crashes due to the
(infrequent) occurrence of a race where an access to a page in the
swap-cache happens concurrently with the moment this page is being written
to disk and the tracepoint is enabled:
BUG: kernel NULL pointer dereference, address: 0000000000000040
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 800000010ec0a067 P4D 800000010ec0a067 PUD 102353067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 1 PID: 1320 Comm: shmem-worker Kdump: loaded Not tainted 6.4.0-rc5+ #13
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230301gitf80f052277c8-1.fc37 03/01/2023
RIP: 0010:trace_event_raw_event_writeback_folio_template+0x76/0xf0
Code: 4d 85 e4 74 5c 49 8b 3c 24 e8 06 98 ee ff 48 89 c7 e8 9e 8b ee ff ba 20 00 00 00 48 89 ef 48 89 c6 e8 fe d4 1a 00 49 8b 04 24 <48> 8b 40 40 48 89 43 28 49 8b 45 20 48 89 e7 48 89 43 30 e8 a2 4d
RSP: 0000:ffffaad580b6fb60 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff90e38035c01c RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff90e38035c044
RBP: ffff90e38035c024 R08: 0000000000000002 R09: 0000000000000006
R10: ffff90e38035c02e R11: 0000000000000020 R12: ffff90e380bac000
R13: ffffe3a7456d9200 R14: 0000000000001b81 R15: ffffe3a7456d9200
FS: 00007f2e4e8a15c0(0000) GS:ffff90e3fbc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000040 CR3: 00000001150c6003 CR4: 0000000000170ee0
Call Trace:
<TASK>
? __die+0x20/0x70
? page_fault_oops+0x76/0x170
? kernelmode_fixup_or_oops+0x84/0x110
? exc_page_fault+0x65/0x150
? asm_exc_page_fault+0x22/0x30
? trace_event_raw_event_writeback_folio_template+0x76/0xf0
folio_wait_writeback+0x6b/0x80
shmem_swapin_folio+0x24a/0x500
? filemap_get_entry+0xe3/0x140
shmem_get_folio_gfp+0x36e/0x7c0
? find_busiest_group+0x43/0x1a0
shmem_fault+0x76/0x2a0
? __update_load_avg_cfs_rq+0x281/0x2f0
__do_fault+0x33/0x130
do_read_fault+0x118/0x160
do_pte_missing+0x1ed/0x2a0
__handle_mm_fault+0x566/0x630
handle_mm_fault+0x91/0x210
do_user_addr_fault+0x22c/0x740
exc_page_fault+0x65/0x150
asm_exc_page_fault+0x22/0x30
This problem arises from the fact that the repurposed writeback_dirty_page
trace event code was written assuming that every pointer to mapping
(struct address_space) would come from a file-mapped page-cache object,
thus mapping->host would always be populated, and that was a valid case
before commit 19343b5bdd16. The swap-cache address space
(swapper_spaces), however, doesn't populate its ->host (struct inode)
pointer, thus leading to the crashes in the corner-case aforementioned.
commit 19343b5bdd16 ended up breaking the assignment of __entry->name and
__entry->ino for the wait_on_page_writeback tracepoint -- both dependent
on mapping->host carrying a pointer to a valid inode. The assignment of
__entry->name was fixed by commit 68f23b89067f ("memcg: fix a crash in
wb_workfn when a device disappears"), and this commit fixes the remaining
case, for __entry->ino.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 19343b5bdd16 ("mm/page-writeback: introduce tracepoint for wait_on_page_writeback()")
Signed-off-by: Rafael Aquini <[email protected]>
Reviewed-by: Yafang Shao <[email protected]>
Cc: Aristeu Rozanski <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|