Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
Pull squashfs update from Seth Forshee:
"This is a simple patch to enable idmapped mounts for squashfs.
All functionality squashfs needs to support idmapped mounts is already
implemented in generic VFS code, so all that is needed is to set
FS_ALLOW_IDMAP in fs_flags"
* tag 'fs.idmapped.squashfs.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
squashfs: enable idmapped mounts
|
|
When squashfs_read_table() returns an error or `sb->s_magic !=
SQUASHFS_MAGIC`, enters the error branch and calls
msblk->thread_ops->destroy(msblk) to destroy msblk. However,
msblk->thread_ops has not been initialized. Therefore, the following
problem is triggered:
==================================================================
BUG: KASAN: null-ptr-deref in squashfs_fill_super+0xe7a/0x13b0
Read of size 8 at addr 0000000000000008 by task swapper/0/1
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc3-next-20221031 #367
Call Trace:
<TASK>
dump_stack_lvl+0x73/0x9f
print_report+0x743/0x759
kasan_report+0xc0/0x120
__asan_load8+0xd3/0x140
squashfs_fill_super+0xe7a/0x13b0
get_tree_bdev+0x27b/0x450
squashfs_get_tree+0x19/0x30
vfs_get_tree+0x49/0x150
path_mount+0xaae/0x1350
init_mount+0xad/0x100
do_mount_root+0xbc/0x1d0
mount_block_root+0x173/0x316
mount_root+0x223/0x236
prepare_namespace+0x1eb/0x237
kernel_init_freeable+0x528/0x576
kernel_init+0x29/0x250
ret_from_fork+0x1f/0x30
</TASK>
==================================================================
To solve this issue, msblk->thread_ops is initialized immediately after
msblk is assigned a value.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: b0645770d3c7 ("squashfs: add the mount parameter theads=<single|multi|percpu>")
Signed-off-by: Baokun Li <[email protected]>
Reviewed-by: Xiaoming Ni <[email protected]>
Reviewed-by: Phillip Lougher <[email protected]>
Cc: Yu Kuai <[email protected]>
Cc: Zhang Yi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The maximum number of threads in the decompressor_multi.c file is fixed
and cannot be adjusted according to user needs. Therefore, the mount
parameter needs to be added to allow users to configure the number of
threads as required. The upper limit is num_online_cpus() * 2.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Xiaoming Ni <[email protected]>
Reviewed-by: Phillip Lougher <[email protected]>
Cc: Jianguo Chen <[email protected]>
Cc: Jubin Zhong <[email protected]>
Cc: Zhang Yi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series 'squashfs: Add the mount parameter "threads="'.
Currently, Squashfs supports multiple decompressor parallel modes.
However, this mode can be configured only during kernel building and does
not support flexible selection during runtime.
In the current patch set, the mount parameter "threads=" is added to allow
users to select the parallel decompressor mode and configure the number of
decompressors when mounting a file system.
"threads=<single|multi|percpu|1|2|3|...>"
The upper limit is num_online_cpus() * 2.
This patch (of 2):
Squashfs supports three decompression concurrency modes:
Single-thread mode: concurrent reads are blocked and the memory
overhead is small.
Multi-thread mode/percpu mode: reduces concurrent read blocking but
increases memory overhead.
The corresponding schema must be fixed at compile time. During mounting,
the concurrent decompression mode cannot be adjusted based on file read
blocking.
The mount parameter theads=<single|multi|percpu> is added to select
the concurrent decompression mode of a single SquashFS file system
image.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Xiaoming Ni <[email protected]>
Reviewed-by: Phillip Lougher <[email protected]>
Cc: Jianguo Chen <[email protected]>
Cc: Jubin Zhong <[email protected]>
Cc: Zhang Yi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
For squashfs all needed functionality for idmapped mounts is already
implemented by the generic handlers in the VFS. Thus, it is sufficient
to just enable the corresponding FS_ALLOW_IDMAP flag to support
idmapped mounts.
We use this for unprivileged (user namespaced) containers based on
squashfs images as rootfs in GyroidOS.
A simple test using the mount-idmapped tool executed as user with
uid=1000 looks as follows:
$ mkdir test
$ echo "test" > test/test_file
$ mksquashfs test/ fs.img
$ sudo mkdir /mnt/test
$ sudo mkdir /mnt/mapped
$ sudo mount fs.img -o loop /mnt/test/
$ sudo ./mount-idmapped --map-mount b:1000:2000:1 /mnt/test/ /mnt/mapped/
$ mount | tail -n2
fs.img on /mnt/test type squashfs (ro,relatime,errors=continue)
fs.img on /mnt/mapped type squashfs (ro,relatime,idmapped,errors=continue)
$ ls -lan /mnt/test/
total 5
drwxr-xr-x 2 1000 1000 32 Okt 24 13:36 .
drwxr-xr-x 6 0 0 4096 Okt 24 13:38 ..
-rw-r--r-- 1 1000 1000 5 Okt 24 13:36 test_file
$ ls -lan /mnt/mapped/
total 5
drwxr-xr-x 2 2000 2000 32 Okt 24 13:36 .
drwxr-xr-x 6 0 0 4096 Okt 24 13:38 ..
-rw-r--r-- 1 2000 2000 5 Okt 24 13:36 test_file
Signed-off-by: Michael Weiß <[email protected]>
Reviewed-by: Christian Brauner <[email protected]>
Reviewed-by: Phillip Lougher <[email protected]>
Signed-off-by: Christian Brauner (Microsoft) <[email protected]>
|
|
Fix a buffer release race condition, where the error value was used after
release.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: b09a7a036d20 ("squashfs: support reading fragments in readahead call")
Signed-off-by: Phillip Lougher <[email protected]>
Tested-by: Bagas Sanjaya <[email protected]>
Reported-by: Marc Miltenberger <[email protected]>
Cc: Dimitri John Ledkov <[email protected]>
Cc: Hsin-Yi Wang <[email protected]>
Cc: Mirsad Goran Todorovac <[email protected]>
Cc: Slade Watkins <[email protected]>
Cc: Thorsten Leemhuis <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The readahead code will try to extend readahead to the entire size of the
Squashfs data block.
But, it didn't take into account that the last block at the end of the
file may not be a whole block. In this case, the code would extend
readahead to beyond the end of the file, leaving trailing pages.
Fix this by only requesting the expected number of pages.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 8fc78b6fe24c ("squashfs: implement readahead")
Signed-off-by: Phillip Lougher <[email protected]>
Tested-by: Bagas Sanjaya <[email protected]>
Reported-by: Marc Miltenberger <[email protected]>
Cc: Dimitri John Ledkov <[email protected]>
Cc: Hsin-Yi Wang <[email protected]>
Cc: Mirsad Goran Todorovac <[email protected]>
Cc: Slade Watkins <[email protected]>
Cc: Thorsten Leemhuis <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "squashfs: fix some regressions introduced in the readahead
code".
This patchset fixes 3 regressions introduced by the recent readahead code
changes. The first regression is causing "snaps" to randomly fail after a
couple of hours or days, which how the regression came to light.
This patch (of 3):
If a file isn't a whole multiple of the page size, the last page will have
trailing bytes unfilled.
There was a mistake in the readahead code which did this. In particular
it incorrectly assumed that the last page in the readahead page array
(page[nr_pages - 1]) will always contain the last page in the block, which
if we're at file end, will be the page that needs to be zero filled.
But the readahead code may not return the last page in the block, which
means it is unmapped and will be skipped by the decompressors (a temporary
buffer used).
In this case the zero filling code will zero out the wrong page, leading
to data corruption.
Fix this by by extending the "page actor" to return the last page if
present, or NULL if a temporary buffer was used.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 8fc78b6fe24c ("squashfs: implement readahead")
Link: https://lore.kernel.org/lkml/[email protected]/
Signed-off-by: Phillip Lougher <[email protected]>
Reported-by: Mirsad Goran Todorovac <[email protected]>
Tested-by: Mirsad Goran Todorovac <[email protected]>
Tested-by: Slade Watkins <[email protected]>
Tested-by: Bagas Sanjaya <[email protected]>
Reported-by: Marc Miltenberger <[email protected]>
Cc: Dimitri John Ledkov <[email protected]>
Cc: Hsin-Yi Wang <[email protected]>
Cc: Thorsten Leemhuis <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The decompressors may be called while in an atomic section. So move the
kmalloc() out of this path, and into the "page actor" init function.
This fixes a regression introduced by commit
f268eedddf35 ("squashfs: extend "page actor" to handle missing pages")
Link: https://lkml.kernel.org/r/[email protected]
Fixes: f268eedddf35 ("squashfs: extend "page actor" to handle missing pages")
Reported-by: Chris Murphy <[email protected]>
Signed-off-by: Phillip Lougher <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc updates from Andrew Morton:
"Updates to various subsystems which I help look after. lib, ocfs2,
fatfs, autofs, squashfs, procfs, etc. A relatively small amount of
material this time"
* tag 'mm-nonmm-stable-2022-08-06-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (72 commits)
scripts/gdb: ensure the absolute path is generated on initial source
MAINTAINERS: kunit: add David Gow as a maintainer of KUnit
mailmap: add linux.dev alias for Brendan Higgins
mailmap: update Kirill's email
profile: setup_profiling_timer() is moslty not implemented
ocfs2: fix a typo in a comment
ocfs2: use the bitmap API to simplify code
ocfs2: remove some useless functions
lib/mpi: fix typo 'the the' in comment
proc: add some (hopefully) insightful comments
bdi: remove enum wb_congested_state
kernel/hung_task: fix address space of proc_dohung_task_timeout_secs
lib/lzo/lzo1x_compress.c: replace ternary operator with min() and min_t()
squashfs: support reading fragments in readahead call
squashfs: implement readahead
squashfs: always build "file direct" version of page actor
Revert "squashfs: provide backing_dev_info in order to disable read-ahead"
fs/ocfs2: Fix spelling typo in comment
ia64: old_rr4 added under CONFIG_HUGETLB_PAGE
proc: fix test for "vsyscall=xonly" boot option
...
|
|
Since we actually know what error happened, we can report it instead
of having the generic code return -EIO for pages that were unlocked
without being marked uptodate. Also remove a test of PageError since
we have the return value at this point.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
|
|
Add a function which can be used to read fragments in the readahead call.
This function is necessary because filesystems built with the -tailends
(or -always-use-fragments) option may have fragments present which cannot
be currently handled.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Phillip Lougher <[email protected]>
Signed-off-by: Hsin-Yi Wang <[email protected]>
Cc: Hou Tao <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miao Xie <[email protected]>
Cc: Xiongwei Song <[email protected]>
Cc: Zhang Yi <[email protected]>
Cc: Zheng Liang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Implement readahead callback for squashfs. It will read datablocks which
cover pages in readahead request. For a few cases it will not mark page
as uptodate, including:
- file end is 0.
- zero filled blocks.
- current batch of pages isn't in the same datablock.
- decompressor error.
Otherwise pages will be marked as uptodate. The unhandled pages will be
updated by readpage later.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hsin-Yi Wang <[email protected]>
Suggested-by: Matthew Wilcox <[email protected]>
Reported-by: Matthew Wilcox <[email protected]>
Reported-by: Phillip Lougher <[email protected]>
Reported-by: Xiongwei Song <[email protected]>
Reported-by: Andrew Morton <[email protected]>
Cc: Hou Tao <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Miao Xie <[email protected]>
Cc: Zhang Yi <[email protected]>
Cc: Zheng Liang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Squashfs_readahead uses the "file direct" version of the page actor, and
so build it unconditionally.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Phillip Lougher <[email protected]>
Signed-off-by: Hsin-Yi Wang <[email protected]>
Reported-by: kernel test robot <[email protected]>
Cc: Hou Tao <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miao Xie <[email protected]>
Cc: Xiongwei Song <[email protected]>
Cc: Zhang Yi <[email protected]>
Cc: Zheng Liang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "Implement readahead for squashfs", v7.
Commit 9eec1d897139("squashfs: provide backing_dev_info in order to
disable read-ahead") mitigates the performance drop issue for squashfs by
closing readahead for it.
This series implements readahead callback for squashfs.
This patch (of 4):
This reverts 9eec1d897139e5 ("squashfs: provide backing_dev_info in order
to disable read-ahead").
Revert closing the readahead to squashfs since the readahead callback for
squashfs is implemented.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hsin-Yi Wang <[email protected]>
Suggested-by: Xiongwei Song <[email protected]>
Cc: Phillip Lougher <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Zheng Liang <[email protected]>
Cc: Zhang Yi <[email protected]>
Cc: Hou Tao <[email protected]>
Cc: Miao Xie <[email protected]>
Cc: kernel test robot <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Now that the "page actor" can handle missing pages, we don't have to fall
back to using an intermediate buffer in Squashfs_readpage_block() if all
the pages necessary can't be obtained.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Phillip Lougher <[email protected]>
Cc: Hsin-Yi Wang <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Xiongwei Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "Squashfs: handle missing pages decompressing into page
cache".
This patchset enables Squashfs to handle missing pages when directly
decompressing datablocks into the page cache.
Previously if the full set of pages needed was not available, Squashfs
would have to fall back to using an intermediate buffer (the older
method), which is slower, involving a memcopy, and it introduces
contention on a shared buffer.
The first patch extends the "page actor" code to handle missing pages.
The second patch updates Squashfs_readpage_block() to use the new
functionality, and removes the code that falls back to using an
intermediate buffer.
This patchset is independent of the readahead work, and it is standalone.
It can be merged on its own.
But the readahead patch for efficiency also needs this patch-set.
This patch (of 2):
This patch extends the "page actor" code to handle missing pages.
Previously if the full set of pages needed to decompress a Squashfs
datablock was unavailable, this would cause decompression to fail on the
missing pages.
In this case direct decompression into the page cache could not be
achieved and the code would fall back to using the older intermediate
buffer method.
With this patch, direct decompression into the page cache can be achieved
with missing pages.
For "multi-shot" decompressors (zlib, xz, zstd), the page actor will
allocate a temporary buffer which is passed to the decompressor, and then
freed by the page actor.
For "single shot" decompressors (lz4, lzo) which decompress into a
contiguous "bounce buffer", and which is then copied into the page cache,
it would be pointless to allocate a temporary buffer, memcpy into it, and
then free it. For these decompressors -ENOMEM is returned, which
signifies that the memcpy for that page should be skipped.
This also happens if the data block is uncompressed.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Phillip Lougher <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Hsin-Yi Wang <[email protected]>
Cc: Xiongwei Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Pull page cache updates from Matthew Wilcox:
- Appoint myself page cache maintainer
- Fix how scsicam uses the page cache
- Use the memalloc_nofs_save() API to replace AOP_FLAG_NOFS
- Remove the AOP flags entirely
- Remove pagecache_write_begin() and pagecache_write_end()
- Documentation updates
- Convert several address_space operations to use folios:
- is_dirty_writeback
- readpage becomes read_folio
- releasepage becomes release_folio
- freepage becomes free_folio
- Change filler_t to require a struct file pointer be the first
argument like ->read_folio
* tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache: (107 commits)
nilfs2: Fix some kernel-doc comments
Appoint myself page cache maintainer
fs: Remove aops->freepage
secretmem: Convert to free_folio
nfs: Convert to free_folio
orangefs: Convert to free_folio
fs: Add free_folio address space operation
fs: Convert drop_buffers() to use a folio
fs: Change try_to_free_buffers() to take a folio
jbd2: Convert release_buffer_page() to use a folio
jbd2: Convert jbd2_journal_try_to_free_buffers to take a folio
reiserfs: Convert release_buffer_page() to use a folio
fs: Remove last vestiges of releasepage
ubifs: Convert to release_folio
reiserfs: Convert to release_folio
orangefs: Convert to release_folio
ocfs2: Convert to release_folio
nilfs2: Remove comment about releasepage
nfs: Convert to release_folio
jfs: Convert to release_folio
...
|
|
This is a "weak" conversion which converts straight back to using pages.
A full conversion should be performed at some point, hopefully by
someone familiar with the filesystem.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
|
|
Remove the magic autofree semantics and require the callers to explicitly
call bio_init to initialize the bio.
This allows bio_free to catch accidental bio_put calls on bio_init()ed
bios as well.
Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Coly Li <[email protected]>
Acked-by: Mike Snitzer <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
If a plain kmalloc that is not backed by a mempool is safe here for a
large read (and the actual page allocations), it must also be for a
small one, so simplify the code a bit.
Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Phillip Lougher <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Merge updates from Andrew Morton:
- A few misc subsystems: kthread, scripts, ntfs, ocfs2, block, and vfs
- Most the MM patches which precede the patches in Willy's tree: kasan,
pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap,
sparsemem, vmalloc, pagealloc, memory-failure, mlock, hugetlb,
userfaultfd, vmscan, compaction, mempolicy, oom-kill, migration, thp,
cma, autonuma, psi, ksm, page-poison, madvise, memory-hotplug, rmap,
zswap, uaccess, ioremap, highmem, cleanups, kfence, hmm, and damon.
* emailed patches from Andrew Morton <[email protected]>: (227 commits)
mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release()
Docs/ABI/testing: add DAMON sysfs interface ABI document
Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface
selftests/damon: add a test for DAMON sysfs interface
mm/damon/sysfs: support DAMOS stats
mm/damon/sysfs: support DAMOS watermarks
mm/damon/sysfs: support schemes prioritization
mm/damon/sysfs: support DAMOS quotas
mm/damon/sysfs: support DAMON-based Operation Schemes
mm/damon/sysfs: support the physical address space monitoring
mm/damon/sysfs: link DAMON for virtual address spaces monitoring
mm/damon: implement a minimal stub for sysfs-based DAMON interface
mm/damon/core: add number of each enum type values
mm/damon/core: allow non-exclusive DAMON start/stop
Docs/damon: update outdated term 'regions update interval'
Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling
Docs/vm/damon: call low level monitoring primitives the operations
mm/damon: remove unnecessary CONFIG_DAMON option
mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}()
mm/damon/dbgfs-test: fix is_target_id() change
...
|
|
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() of all filesystems to alloc_inode_sb().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muchun Song <[email protected]>
Acked-by: Theodore Ts'o <[email protected]> [ext4]
Acked-by: Roman Gushchin <[email protected]>
Cc: Alex Shi <[email protected]>
Cc: Anna Schumaker <[email protected]>
Cc: Chao Yu <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Fam Zheng <[email protected]>
Cc: Jaegeuk Kim <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kari Argillander <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Qi Zheng <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Trond Myklebust <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Xiongchun Duan <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Pass the block_device and operation that we plan to use this bio for to
bio_alloc to optimize the assignment. NULL/0 can be passed, both for the
passthrough case on a raw request_queue and to temporarily avoid
refactoring some nasty code.
Also move the gfp_mask argument after the nr_vecs argument for a much
more logical calling convention matching what most of the kernel does.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Commit c1f6925e1091 ("mm: put readahead pages in cache earlier") causes
the read performance of squashfs to deteriorate.Through testing, we find
that the performance will be back by closing the readahead of squashfs.
So we want to learn the way of ubifs, provides backing_dev_info and
disable read-ahead
We tested the following data by fio.
squashfs image blocksize=128K
test command:
fio --name basic --bs=? --filename="/mnt/test_file" --rw=? --iodepth=1 --ioengine=psync --runtime=200 --time_based
turn on squashfs readahead in 5.10 kernel
bs(k) read/randread MB/s
4 randread 271
128 randread 231
1024 randread 246
4 read 310
128 read 245
1024 read 247
turn off squashfs readahead in 5.10 kernel
bs(k) read/randread MB/s
4 randread 293
128 randread 330
1024 randread 363
4 read 338
128 read 360
1024 read 365
turn on squashfs readahead and revert the
commit c1f6925e1091("mm: put readahead
pages in cache earlier") in 5.10 kernel
bs(k) read/randread MB/s
4 randread 289
128 randread 306
1024 randread 335
4 read 337
128 read 336
1024 read 338
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Zheng Liang <[email protected]>
Reviewed-by: Phillip Lougher <[email protected]>
Cc: Zhang Yi <[email protected]>
Cc: Hou Tao <[email protected]>
Cc: Miao Xie <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This patch:
- Moves `include/linux/zstd.h` -> `include/linux/zstd_lib.h`
- Updates modified zstd headers to yearless copyright
- Adds a new API in `include/linux/zstd.h` that is functionally
equivalent to the in-use subset of the current API. Functions are
renamed to avoid symbol collisions with zstd, to make it clear it is
not the upstream zstd API, and to follow the kernel style guide.
- Updates all callers to use the new API.
There are no functional changes in this patch. Since there are no
functional change, I felt it was okay to update all the callers in a
single patch. Once the API is approved, the callers are mechanically
changed.
This patch is preparing for the 3rd patch in this series, which updates
zstd to version 1.4.10. Since the upstream zstd API is no longer exposed
to callers, the update can happen transparently.
Signed-off-by: Nick Terrell <[email protected]>
Tested By: Paul Jones <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Sedat Dilek <[email protected]> # LLVM/Clang v13.0.0 on x86-64
Tested-by: Jean-Denis Girard <[email protected]>
|
|
Use the proper helper to read the block device size.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Acked-by: Phillip Lougher <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Use bvec_virt instead of open coding it.
Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Add an errors=panic mount option to make squashfs trigger a panic when
errors are encountered, similar to several other filesystems. This allows
a kernel dump to be saved using which the corruption can be analysed and
debugged.
Inspired by a pre-fs_context patch by Anton Eliasson.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vincent Whitchurch <[email protected]>
Signed-off-by: Phillip Lougher <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Sysbot has reported a "divide error" which has been identified as being
caused by a corrupted file_size value within the file inode. This value
has been corrupted to a much larger value than expected.
Calculate_skip() is passed i_size_read(inode) >> msblk->block_log. Due to
the file_size value corruption this overflows the int argument/variable in
that function, leading to the divide error.
This patch changes the function to use u64. This will accommodate any
unexpectedly large values due to corruption.
The value returned from calculate_skip() is clamped to be never more than
SQUASHFS_CACHED_BLKS - 1, or 7. So file_size corruption does not lead to
an unexpectedly large return result here.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Phillip Lougher <[email protected]>
Reported-by: <[email protected]>
Reported-by: <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The checks for maximum metadata block size is missing
SQUASHFS_BLOCK_OFFSET (the two byte length count).
Link: https://lkml.kernel.org/r/[email protected]
Fixes: f37aa4c7366e23f ("squashfs: add more sanity checks in id lookup")
Signed-off-by: Phillip Lougher <[email protected]>
Cc: Sean Nyekjaer <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When mouting a squashfs image created without inode compression it fails
with: "unable to read inode lookup table"
It turns out that the BLOCK_OFFSET is missing when checking the
SQUASHFS_METADATA_SIZE agaist the actual size.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: eabac19e40c0 ("squashfs: add more sanity checks in inode lookup")
Signed-off-by: Sean Nyekjaer <[email protected]>
Acked-by: Phillip Lougher <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Ever since the addition of multipage bio_vecs BIO_MAX_PAGES has been
horribly confusingly misnamed. Rename it to BIO_MAX_VECS to stop
confusing users of the bio API.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Sysbot has reported a warning where a kmalloc() attempt exceeds the
maximum limit. This has been identified as corruption of the xattr_ids
count when reading the xattr id lookup table.
This patch adds a number of additional sanity checks to detect this
corruption and others.
1. It checks for a corrupted xattr index read from the inode. This could
be because the metadata block is uncompressed, or because the
"compression" bit has been corrupted (turning a compressed block
into an uncompressed block). This would cause an out of bounds read.
2. It checks against corruption of the xattr_ids count. This can either
lead to the above kmalloc failure, or a smaller than expected
table to be read.
3. It checks the contents of the index table for corruption.
[[email protected]: fix checkpatch issue]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Phillip Lougher <[email protected]>
Reported-by: [email protected]
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Sysbot has reported an "slab-out-of-bounds read" error which has been
identified as being caused by a corrupted "ino_num" value read from the
inode. This could be because the metadata block is uncompressed, or
because the "compression" bit has been corrupted (turning a compressed
block into an uncompressed block).
This patch adds additional sanity checks to detect this, and the
following corruption.
1. It checks against corruption of the inodes count. This can either
lead to a larger table to be read, or a smaller than expected
table to be read.
In the case of a too large inodes count, this would often have been
trapped by the existing sanity checks, but this patch introduces
a more exact check, which can identify too small values.
2. It checks the contents of the index table for corruption.
[[email protected]: fix checkpatch issue]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Phillip Lougher <[email protected]>
Reported-by: [email protected]
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Sysbot has reported a number of "slab-out-of-bounds reads" and
"use-after-free read" errors which has been identified as being caused
by a corrupted index value read from the inode. This could be because
the metadata block is uncompressed, or because the "compression" bit has
been corrupted (turning a compressed block into an uncompressed block).
This patch adds additional sanity checks to detect this, and the
following corruption.
1. It checks against corruption of the ids count. This can either
lead to a larger table to be read, or a smaller than expected
table to be read.
In the case of a too large ids count, this would often have been
trapped by the existing sanity checks, but this patch introduces
a more exact check, which can identify too small values.
2. It checks the contents of the index table for corruption.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Phillip Lougher <[email protected]>
Reported-by: [email protected]
Reported-by: [email protected]
Reported-by: [email protected]
Reported-by: [email protected]
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "Squashfs: fix BIO migration regression and add sanity checks".
Patch [1/4] fixes a regression introduced by the "migrate from
ll_rw_block usage to BIO" patch, which has produced a number of
Sysbot/Syzkaller reports.
Patches [2/4], [3/4], and [4/4] fix a number of filesystem corruption
issues which have produced Sysbot reports in the id, inode and xattr
lookup code.
Each patch has been tested against the Sysbot reproducers using the
given kernel configuration. They have the appropriate "Reported-by:"
lines added.
Additionally, all of the reproducer filesystems are indirectly fixed by
patch [4/4] due to the fact they all have xattr corruption which is now
detected there.
Additional testing with other configurations and architectures (32bit,
big endian), and normal filesystems has also been done to trap any
inadvertent regressions caused by the additional sanity checks.
This patch (of 4):
This is a regression introduced by the patch "migrate from ll_rw_block
usage to BIO".
Sysbot/Syskaller has reported a number of "out of bounds writes" and
"unable to handle kernel paging request in squashfs_decompress" errors
which have been identified as a regression introduced by the above
patch.
Specifically, the patch removed the following sanity check
if (length < 0 || length > output->length ||
(index + length) > msblk->bytes_used)
This check did two things:
1. It ensured any reads were not beyond the end of the filesystem
2. It ensured that the "length" field read from the filesystem
was within the expected maximum length. Without this any
corrupted values can over-run allocated buffers.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 93e72b3c612adc ("squashfs: migrate from ll_rw_block usage to BIO")
Reported-by: [email protected]
Signed-off-by: Phillip Lougher <[email protected]>
Cc: Philippe Liard <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
"Assorted stuff all over the place (the largest group here is
Christoph's stat cleanups)"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: remove KSTAT_QUERY_FLAGS
fs: remove vfs_stat_set_lookup_flags
fs: move vfs_fstatat out of line
fs: implement vfs_stat and vfs_lstat in terms of vfs_fstatat
fs: remove vfs_statx_fd
fs: omfs: use kmemdup() rather than kmalloc+memcpy
[PATCH] reduce boilerplate in fsid handling
fs: Remove duplicated flag O_NDELAY occurring twice in VALID_OPEN_FLAGS
selftests: mount: add nosymfollow tests
Add a "nosymfollow" mount option.
|
|
Get rid of boilerplate in most of ->statfs()
instances...
Signed-off-by: Al Viro <[email protected]>
|
|
This is a regression introduced by the patch "migrate from ll_rw_block
usage to BIO".
Bio_alloc() is limited to 256 pages (1 Mbyte). This can cause a failure
when reading 1 Mbyte block filesystems. The problem is a datablock can be
fully (or almost uncompressed), requiring 256 pages, but, because blocks
are not aligned to page boundaries, it may require 257 pages to read.
Bio_kmalloc() can handle 1024 pages, and so use this for the edge
condition.
Fixes: 93e72b3c612a ("squashfs: migrate from ll_rw_block usage to BIO")
Reported-by: Nicolas Prochazka <[email protected]>
Reported-by: Tomoatsu Shimada <[email protected]>
Signed-off-by: Phillip Lougher <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Guenter Roeck <[email protected]>
Cc: Philippe Liard <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Adrien Schildknecht <[email protected]>
Cc: Daniel Rosenberg <[email protected]>
Cc: <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This is a regression introduced by the "migrate from ll_rw_block usage
to BIO" patch.
Squashfs packs structures on byte boundaries, and due to that the length
field (of the metadata block) may not be fully in the current block.
The new code rewrote and introduced a faulty check for that edge case.
Fixes: 93e72b3c612adcaca1 ("squashfs: migrate from ll_rw_block usage to BIO")
Reported-by: Bernd Amend <[email protected]>
Signed-off-by: Phillip Lougher <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Adrien Schildknecht <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: Daniel Rosenberg <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer be used[2].
[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://github.com/KSPP/linux/issues/21
Signed-off-by: Gustavo A. R. Silva <[email protected]>
|
|
Merge updates from Andrew Morton:
"A few little subsystems and a start of a lot of MM patches.
Subsystems affected by this patch series: squashfs, ocfs2, parisc,
vfs. With mm subsystems: slab-generic, slub, debug, pagecache, gup,
swap, memcg, pagemap, memory-failure, vmalloc, kasan"
* emailed patches from Andrew Morton <[email protected]>: (128 commits)
kasan: move kasan_report() into report.c
mm/mm_init.c: report kasan-tag information stored in page->flags
ubsan: entirely disable alignment checks under UBSAN_TRAP
kasan: fix clang compilation warning due to stack protector
x86/mm: remove vmalloc faulting
mm: remove vmalloc_sync_(un)mappings()
x86/mm/32: implement arch_sync_kernel_mappings()
x86/mm/64: implement arch_sync_kernel_mappings()
mm/ioremap: track which page-table levels were modified
mm/vmalloc: track which page-table levels were modified
mm: add functions to track page directory modifications
s390: use __vmalloc_node in stack_alloc
powerpc: use __vmalloc_node in alloc_vm_stack
arm64: use __vmalloc_node in arch_alloc_vmap_stack
mm: remove vmalloc_user_node_flags
mm: switch the test_vmalloc module to use __vmalloc_node
mm: remove __vmalloc_node_flags_caller
mm: remove both instances of __vmalloc_node_flags
mm: remove the prot argument to __vmalloc_node
mm: remove the pgprot argument to __vmalloc
...
|
|
ll_rw_block() function has been deprecated in favor of BIO which appears
to come with large performance improvements.
This patch decreases boot time by close to 40% when using squashfs for
the root file-system. This is observed at least in the context of
starting an Android VM on Chrome OS using crosvm. The patch was tested
on 4.19 as well as master.
This patch is largely based on Adrien Schildknecht's patch that was
originally sent as https://lkml.org/lkml/2017/9/22/814 though with some
significant changes and simplifications while also taking Phillip
Lougher's feedback into account, around preserving support for
FILE_CACHE in particular.
[[email protected]: fix build error reported by Randy]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Philippe Liard <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Adrien Schildknecht <[email protected]>
Cc: Phillip Lougher <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: Daniel Rosenberg <[email protected]>
Link: https://chromium.googlesource.com/chromiumos/platform/crosvm
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The squashfs multi CPU decompressor makes use of get_cpu_ptr() to
acquire a pointer to per-CPU data. get_cpu_ptr() implicitly disables
preemption which serializes the access to the per-CPU data.
But decompression can take quite some time depending on the size. The
observed preempt disabled times in real world scenarios went up to 8ms,
causing massive wakeup latencies. This happens on all CPUs as the
decompression is fully parallelized.
Replace the implicit preemption control with an explicit local lock.
This allows RT kernels to substitute it with a real per CPU lock, which
serializes the access but keeps the code section preemptible. On non RT
kernels this maps to preempt_disable() as before, i.e. no functional
change.
[ bigeasy: Use local_lock(), patch description]
Reported-by: Alexander Stein <[email protected]>
Signed-off-by: Julia Cartwright <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Tested-by: Alexander Stein <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc mount API conversions from Al Viro:
"Conversions to new API for shmem and friends and for mount_mtd()-using
filesystems.
As for the rest of the mount API conversions in -next, some of them
belong in the individual trees (e.g. binderfs one should definitely go
through android folks, after getting redone on top of their changes).
I'm going to drop those and send the rest (trivial ones + stuff ACKed
by maintainers) in a separate series - by that point they are
independent from each other.
Some stuff has already migrated into individual trees (NFS conversion,
for example, or FUSE stuff, etc.); those presumably will go through
the regular merges from corresponding trees."
* 'work.mount2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: Make fs_parse() handle fs_param_is_fd-type params better
vfs: Convert ramfs, shmem, tmpfs, devtmpfs, rootfs to use the new mount API
shmem_parse_one(): switch to use of fs_parse()
shmem_parse_options(): take handling a single option into a helper
shmem_parse_options(): don't bother with mpol in separate variable
shmem_parse_options(): use a separate structure to keep the results
make shmem_fill_super() static
make ramfs_fill_super() static
devtmpfs: don't mix {ramfs,shmem}_fill_super() with mount_single()
vfs: Convert squashfs to use the new mount API
mtd: Kill mount_mtd()
vfs: Convert jffs2 to use the new mount API
vfs: Convert cramfs to use the new mount API
vfs: Convert romfs to use the new mount API
vfs: Add a single-or-reconfig keying to vfs_get_super()
|
|
Convert the squashfs filesystem to the new internal mount API as the old
one will be obsoleted and removed. This allows greater flexibility in
communication of mount parameters between userspace, the VFS and the
filesystem.
See Documentation/filesystems/mount_api.txt for more information.
Signed-off-by: David Howells <[email protected]>
cc: Phillip Lougher <[email protected]>
cc: [email protected]
Signed-off-by: Al Viro <[email protected]>
|
|
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.
Even though some filesystems are read-only, fill in the
timestamps to reflect the on-disk representation.
Signed-off-by: Deepa Dinamani <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Acked-By: Tigran Aivazian <[email protected]>
Acked-by: Jeff Layton <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
|
|
Based on 1 normalized pattern(s):
this work is licensed under the terms of the gnu gpl version 2 see
the copying file in the top level directory
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 35 file(s).
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kate Stewart <[email protected]>
Reviewed-by: Enrico Weigelt <[email protected]>
Reviewed-by: Allison Randal <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 or at your option any
later version this program is distributed in the hope that it will
be useful but without any warranty without even the implied warranty
of merchantability or fitness for a particular purpose see the gnu
general public license for more details
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 44 file(s).
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Richard Fontana <[email protected]>
Reviewed-by: Allison Randal <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|