Age | Commit message (Collapse) | Author | Files | Lines |
|
Make bb_check_counter per group, so each group has the same chance
to be checked, which can expose errors more easily.
Signed-off-by: Chunguang Xu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
The comment near mb_buddy_adjust_border seems meaningless, just
clear it.
Signed-off-by: Chunguang Xu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Consider a situation when a filesystem was uncleanly shutdown and the
orphan list is not empty and a read-only mount is attempted. The orphan
list cleanup during mount will fail with:
ext4_check_bdev_write_error:193: comm mount: Error while async write back metadata
This happens because sbi->s_bdev_wb_err is not initialized when mounting
the filesystem in read only mode and so ext4_check_bdev_write_error()
falsely triggers.
Initialize sbi->s_bdev_wb_err unconditionally to avoid this problem.
Fixes: bc71726c7257 ("ext4: abort the filesystem if failed to async write metadata buffer")
Cc: [email protected]
Signed-off-by: Zhang Xiaoxu <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Rename system_blks to s_system_blks inside ext4_sb_info, keep
the naming rules consistent with other variables, which is
convenient for code reading and writing.
Signed-off-by: Chunguang Xu <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
Reviewed-by: Ritesh Harjani <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Rename journal_dev to s_journal_dev inside ext4_sb_info, keep
the naming rules consistent with other variables, which is
convenient for code reading and writing.
Signed-off-by: Chunguang Xu <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
Reviewed-by: Ritesh Harjani <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
the struct name was modified long ago, but the comment still
use struct handle_s.
Signed-off-by: Hui Su <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/20200922171231.GA53120@rlk
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Missing trace exit in exception path of ext4_sync_file and
ext4_ind_map_blocks.
Signed-off-by: Zhang Qilong <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
In case if the file already has underlying blocks/extents allocated
then we don't need to start a journal txn and can directly return
the underlying mapping. Currently ext4_iomap_begin() is used by
both DAX & DIO path. We can check if the write request is an
overwrite & then directly return the mapping information.
This could give a significant perf boost for multi-threaded writes
specially random overwrites.
On PPC64 VM with simulated pmem(DAX) device, ~10x perf improvement
could be seen in random writes (overwrite). Also bcoz this optimizes
away the spinlock contention during jbd2 slab cache allocation
(jbd2_journal_handle). On x86 VM, ~2x perf improvement was observed.
Reported-by: Dan Williams <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Ritesh Harjani <[email protected]>
Link: https://lore.kernel.org/r/88e795d8a4d5cd22165c7ebe857ba91d68d8813e.1600401668.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Remove including <linux/version.h> that don't need it.
Signed-off-by: Tian Tao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
The race condition could cause the persisted superblock checksum
to not match the contents of the superblock, causing the
superblock to be considered corrupt.
An example of the race follows. A first thread is interrupted in the
middle of a checksum calculation. Then, another thread changes the
superblock, calculates a new checksum, and sets it. Then, the first
thread resumes and sets the checksum based on the older superblock.
To fix, serialize the superblock checksum calculation using the buffer
header lock. While a spinlock is sufficient, the buffer header is
already there and there is precedent for locking it (e.g. in
ext4_commit_super).
Tested the patch by booting up a kernel with the patch, creating
a filesystem and some files (including some orphans), and then
unmounting and remounting the file system.
Cc: [email protected]
Signed-off-by: Constantine Sapuntzakis <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Suggested-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
When ext4_journal_get_write_access() fails, we should
terminate the execution flow and release n_group_desc,
iloc.bh, dind and gdb_bh.
Cc: [email protected]
Signed-off-by: Dinghao Liu <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
inline_data is mutually exclusive to DAX so enabling both of them triggers
the following issue:
------------------------------------------
# mkfs.ext4 -F -O inline_data /dev/pmem1
...
# mount /dev/pmem1 /mnt
# echo 'test' >/mnt/file
# lsattr -l /mnt/file
/mnt/file Inline_Data
# xfs_io -c "chattr +x" /mnt/file
# xfs_io -c "lsattr -v" /mnt/file
[dax] /mnt/file
# umount /mnt
# mount /dev/pmem1 /mnt
# cat /mnt/file
cat: /mnt/file: Numerical result out of range
------------------------------------------
Fixes: b383a73f2b83 ("fs/ext4: Introduce DAX inode flag")
Signed-off-by: Xiao Yang <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
The 'handle' argument is not used for anything so simply remove it.
Signed-off-by: Nikolay Borisov <[email protected]>
Reviewed-by: Ritesh Harjani <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Fields s_free_blocks_count_hi, s_r_blocks_count_hi and s_blocks_count_hi
are not valid if EXT4_FEATURE_INCOMPAT_64BIT is not enabled and should be
treated as zeroes.
Signed-off-by: Petr Malat <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Delete repeated words in fs/ext4/.
{the, this, of, we, after}
Also change spelling of "xttr" in inline.c to "xattr" in 2 places.
Signed-off-by: Randy Dunlap <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
ext4 uses generic_file_read_iter(), which already supports this.
Cc: Theodore Ts'o <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
ext4_unregister_sysfs() only deletes the kobject. The reference to it
needs to be put separately, like ext4_put_super() does.
This addresses the syzbot report
"memory leak in kobject_set_name_vargs (3)"
(https://syzkaller.appspot.com/bug?extid=9f864abad79fae7c17e1).
Reported-by: [email protected]
Fixes: 72ba74508b28 ("ext4: release sysfs kobject when failing to enable quotas on mount")
Cc: [email protected]
Signed-off-by: Eric Biggers <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
ext4_mb_discard_group_preallocations() can be releasing group lock with
preallocations accumulated on its local list. Thus although
discard_pa_seq was incremented and concurrent allocating processes will
be retrying allocations, it can happen that premature ENOSPC error is
returned because blocks used for preallocations are not available for
reuse yet. Make sure we always free locally accumulated preallocations
before releasing group lock.
Fixes: 07b5b8e1ac40 ("ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling")
Signed-off-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
As we test disk offline/online with running fsstress, we find fsstress
process is keeping running state.
kworker/u32:3-262 [004] ...1 140.787471: ext4_mb_discard_preallocations: dev 8,32 needed 114
....
kworker/u32:3-262 [004] ...1 140.787471: ext4_mb_discard_preallocations: dev 8,32 needed 114
ext4_mb_new_blocks
repeat:
ext4_mb_discard_preallocations_should_retry(sb, ac, &seq)
freed = ext4_mb_discard_preallocations
ext4_mb_discard_group_preallocations
this_cpu_inc(discard_pa_seq);
---> freed == 0
seq_retry = ext4_get_discard_pa_seq_sum
for_each_possible_cpu(__cpu)
__seq += per_cpu(discard_pa_seq, __cpu);
if (seq_retry != *seq) {
*seq = seq_retry;
ret = true;
}
As we see seq_retry is sum of discard_pa_seq every cpu, if
ext4_mb_discard_group_preallocations return zero discard_pa_seq in this
cpu maybe increase one, so condition "seq_retry != *seq" have always
been met.
Ritesh Harjani suggest to in ext4_mb_discard_group_preallocations function we
only increase discard_pa_seq when there is some PA to free.
Fixes: 07b5b8e1ac40 ("ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling")
Signed-off-by: Ye Bin <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Ritesh Harjani <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
After moving ext4's bmap to iomap interface, swapon functionality
on files created using fallocate (which creates unwritten extents) are
failing. This is since iomap_bmap interface returns 0 for unwritten
extents and thus generic_swapfile_activate considers this as holes
and hence bail out with below kernel msg :-
[340.915835] swapon: swapfile has holes
To fix this we need to implement ->swap_activate aops in ext4
which will use ext4_iomap_report_ops. Since we only need to return
the list of extents so ext4_iomap_report_ops should be enough.
Cc: [email protected]
Reported-by: Yuxuan Shui <[email protected]>
Fixes: ac58e4fb03f ("ext4: move ext4 bmap to use iomap infrastructure")
Signed-off-by: Ritesh Harjani <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Check that alloc and free types of functions match each other.
Signed-off-by: Denis Efremov <[email protected]>
Signed-off-by: Julia Lawall <[email protected]>
|
|
A previous commit changed the notification mode from true/false to an
int, allowing notify-no, notify-yes, or signal-notify. This was
backwards compatible in the sense that any existing true/false user
would translate to either 0 (on notification sent) or 1, the latter
which mapped to TWA_RESUME. TWA_SIGNAL was assigned a value of 2.
Clean this up properly, and define a proper enum for the notification
mode. Now we have:
- TWA_NONE. This is 0, same as before the original change, meaning no
notification requested.
- TWA_RESUME. This is 1, same as before the original change, meaning
that we use TIF_NOTIFY_RESUME.
- TWA_SIGNAL. This uses TIF_SIGPENDING/JOBCTL_TASK_WORK for the
notification.
Clean up all the callers, switching their 0/1/false/true to using the
appropriate TWA_* mode for notifications.
Fixes: e91b48162332 ("task_work: teach task_work_add() to do signal_wake_up()")
Reviewed-by: Thomas Gleixner <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
All the callers currently do this, clean it up and move the clearing
into tracehook_notify_resume() instead.
Reviewed-by: Oleg Nesterov <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
For the case where read-ahead is disabled on the file, or if the cgroup
is congested, ensure that we can at least do 1 page of read-ahead to
make progress on the read in an async fashion. This could potentially be
larger, but it's not needed in terms of functionality, so let's error on
the side of caution as larger counts of pages may run into reclaim
issues (particularly if we're congested).
This makes sure we're not hitting the potentially sync ->readpage() path
for IO that is marked IOCB_WAITQ, which could cause us to block. It also
means we'll use the same path for IO, regardless of whether or not
read-ahead happens to be disabled on the lower level device.
Acked-by: Johannes Weiner <[email protected]>
Reported-by: Matthew Wilcox (Oracle) <[email protected]>
Reported-by: Hao_Xu <[email protected]>
[axboe: updated for new ractl API]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Once we've copied some data for an iocb that is marked with IOCB_WAITQ,
we should no longer attempt to async lock a new page. Instead make sure
we return the copied amount, and let the caller retry, instead of
returning -EIOCBQUEUED for a new page.
This should only be possible with read-ahead disabled on the below
device, and multiple threads racing on the same file. Haven't been able
to reproduce on anything else.
Cc: [email protected] # v5.9
Fixes: 1a0a7853b901 ("mm: support async buffered reads in generic_file_buffered_read()")
Reported-by: Kent Overstreet <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools updates from Arnaldo Carvalho de Melo:
- cgroup improvements for 'perf stat', allowing for compact
specification of events and cgroups in the command line.
- Support per thread topdown metrics in 'perf stat'.
- Support sample-read topdown metric group in 'perf record'
- Show start of latency in addition to its start in 'perf sched
latency'.
- Add min, max to 'perf script' futex-contention output, in addition to
avg.
- Allow usage of 'perf_event_attr->exclusive' attribute via the new
':e' event modifier.
- Add 'snapshot' command to 'perf record --control', using it with
Intel PT.
- Support FIFO file names as alternative options to 'perf record
--control'.
- Introduce branch history "streams", to compare 'perf record' runs
with 'perf diff' based on branch records and report hot streams.
- Support PE executable symbol tables using libbfd, to profile, for
instance, wine binaries.
- Add filter support for option 'perf ftrace -F/--funcs'.
- Allow configuring the 'disassembler_style' 'perf annotate' knob via
'perf config'
- Update CascadelakeX and SkylakeX JSON vendor events files.
- Add support for parsing perchip/percore JSON vendor events.
- Add power9 hv_24x7 core level metric events.
- Add L2 prefetch, ITLB instruction fetch hits JSON events for AMD
zen1.
- Enable Family 19h users by matching Zen2 AMD vendor events.
- Use debuginfod in 'perf probe' when required debug files not found
locally.
- Display negative tid in non-sample events in 'perf script'.
- Make GTK2 support opt-in
- Add build test with GTK+
- Add missing -lzstd to the fast path feature detection
- Add scripts to auto generate 'mmap', 'mremap' string<->id tables for
use in 'perf trace'.
- Show python test script in verbose mode.
- Fix uncore metric expressions
- Msan uninitialized use fixes.
- Use condition variables in 'perf bench numa'
- Autodetect python3 binary in systems without python2.
- Support md5 build ids in addition to sha1.
- Add build id 'perf test' regression test.
- Fix printable strings in python3 scripts.
- Fix off by ones in 'perf trace' in arches using libaudit.
- Fix JSON event code for events referencing std arch events.
- Introduce 'perf test' shell script for Arm CoreSight testing.
- Add rdtsc() for Arm64 for used in the PERF_RECORD_TIME_CONV metadata
event and in 'perf test tsc'.
- 'perf c2c' improvements: Add "RMT Load Hit" metric, "Total Stores",
fixes and documentation update.
- Fix usage of reloc_sym in 'perf probe' when using both kallsyms and
debuginfo files.
- Do not print 'Metric Groups:' unnecessarily in 'perf list'
- Refcounting fixes in the event parsing code.
- Add expand cgroup event 'perf test' entry.
- Fix out of bounds CPU map access when handling armv8_pmu events in
'perf stat'.
- Add build-id injection 'perf bench' benchmark.
- Enter namespace when reading build-id in 'perf inject'.
- Do not load map/dso when injecting build-id speeding up the 'perf
inject' process.
- Add --buildid-all option to avoid processing all samples, just the
mmap metadata events.
- Add feature test to check if libbfd has buildid support
- Add 'perf test' entry for PE binary format support.
- Fix typos in power8 PMU vendor events JSON files.
- Hide libtraceevent non API functions.
* tag 'perf-tools-for-v5.10-2020-10-15' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (113 commits)
perf c2c: Update documentation for metrics reorganization
perf c2c: Add metrics "RMT Load Hit"
perf c2c: Correct LLC load hit metrics
perf c2c: Change header for LLC local hit
perf c2c: Use more explicit headers for HITM
perf c2c: Change header from "LLC Load Hitm" to "Load Hitm"
perf c2c: Organize metrics based on memory hierarchy
perf c2c: Display "Total Stores" as a standalone metrics
perf c2c: Display the total numbers continuously
perf bench: Use condition variables in numa.
perf jevents: Fix event code for events referencing std arch events
perf diff: Support hot streams comparison
perf streams: Report hot streams
perf streams: Calculate the sum of total streams hits
perf streams: Link stream pair
perf streams: Compare two streams
perf streams: Get the evsel_streams by evsel_idx
perf streams: Introduce branch history "streams"
perf intel-pt: Improve PT documentation slightly
perf tools: Add support for exclusive groups/events
...
|
|
Pull rdma updates from Jason Gunthorpe:
"A usual cycle for RDMA with a typical mix of driver and core subsystem
updates:
- Driver minor changes and bug fixes for mlx5, efa, rxe, vmw_pvrdma,
hns, usnic, qib, qedr, cxgb4, hns, bnxt_re
- Various rtrs fixes and updates
- Bug fix for mlx4 CM emulation for virtualization scenarios where
MRA wasn't working right
- Use tracepoints instead of pr_debug in the CM code
- Scrub the locking in ucma and cma to close more syzkaller bugs
- Use tasklet_setup in the subsystem
- Revert the idea that 'destroy' operations are not allowed to fail
at the driver level. This proved unworkable from a HW perspective.
- Revise how the umem API works so drivers make fewer mistakes using
it
- XRC support for qedr
- Convert uverbs objects RWQ and MW to new the allocation scheme
- Large queue entry sizes for hns
- Use hmm_range_fault() for mlx5 On Demand Paging
- uverbs APIs to inspect the GID table instead of sysfs
- Move some of the RDMA code for building large page SGLs into
lib/scatterlist"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (191 commits)
RDMA/ucma: Fix use after free in destroy id flow
RDMA/rxe: Handle skb_clone() failure in rxe_recv.c
RDMA/rxe: Move the definitions for rxe_av.network_type to uAPI
RDMA: Explicitly pass in the dma_device to ib_register_device
lib/scatterlist: Do not limit max_segment to PAGE_ALIGNED values
IB/mlx4: Convert rej_tmout radix-tree to XArray
RDMA/rxe: Fix bug rejecting all multicast packets
RDMA/rxe: Fix skb lifetime in rxe_rcv_mcast_pkt()
RDMA/rxe: Remove duplicate entries in struct rxe_mr
IB/hfi,rdmavt,qib,opa_vnic: Update MAINTAINERS
IB/rdmavt: Fix sizeof mismatch
MAINTAINERS: CISCO VIC LOW LATENCY NIC DRIVER
RDMA/bnxt_re: Fix sizeof mismatch for allocation of pbl_tbl.
RDMA/bnxt_re: Use rdma_umem_for_each_dma_block()
RDMA/umem: Move to allocate SG table from pages
lib/scatterlist: Add support in dynamic allocation of SG table from pages
tools/testing/scatterlist: Show errors in human readable form
tools/testing/scatterlist: Rejuvenate bit-rotten test
RDMA/ipoib: Set rtnl_link_ops for ipoib interfaces
RDMA/uverbs: Expose the new GID query API to user space
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux
Pull i3c updates from Boris Brezillon:
- Fix DAA for the pre-reserved address case
- Fix an error path in the cadence driver
* tag 'i3c/for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
i3c: master: Fix error return in cdns_i3c_master_probe()
i3c: master: fix for SETDASA and DAA process
i3c: master add i3c_master_attach_boardinfo to preserve boardinfo
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull MTD updates from Richard Weinberger:
"NAND core changes:
- Drop useless 'depends on' in Kconfig
- Add an extra level in the Kconfig hierarchy
- Trivial spellings
- Dynamic allocation of the interface configurations
- Dropping the default ONFI timing mode
- Various cleanup (types, structures, naming, comments)
- Hide the chip->data_interface indirection
- Add the generic rb-gpios property
- Add the ->choose_interface_config() hook
- Introduce nand_choose_best_sdr_timings()
- Use default values for tPROG_max and tBERS_max
- Avoid redefining tR_max and tCCS_min
- Add a helper to find the closest ONFI mode
- bcm63xx MTD parsers: simplify CFE detection
Raw NAND controller drivers changes:
- fsl-upm: Deprecation of specific DT properties
- fsl_upm: Driver rework and cleanup in favor of ->exec_op()
- Ingenic: Cleanup ARRAY_SIZE() vs sizeof() use
- brcmnand: ECC error handling on EDU transfers
- brcmnand: Don't default to EDU transfers
- qcom: Set BAM mode only if not set already
- qcom: Avoid write to unavailable register
- gpio: Driver rework in favor of ->exec_op()
- tango: ->exec_op() conversion
- mtk: ->exec_op() conversion
Raw NAND chip drivers changes:
- toshiba: Implement ->choose_interface_config() for TH58NVG2S3HBAI4
- toshiba: Implement ->choose_interface_config() for TC58NVG0S3E
- toshiba: Implement ->choose_interface_config() for TC58TEG5DCLTA00
- hynix: Implement ->choose_interface_config() for H27UCG8T2ATR-BC
HyperBus changes:
- DMA support for TI's AM654 HyperBus controller driver.
- HyperBus frontend driver for Renesas RPC-IF driver.
SPI NOR core changes:
- Support for Winbond w25q64jwm flash
- Enable 4K sector support for mx25l12805d
SPI NOR controller drivers changes:
- intel-spi Add Alder Lake-S PCI ID
MTD Core changes:
- mtdoops: Don't run panic write twice
- mtdconcat: Correctly handle panic write
- Use DEFINE_SHOW_ATTRIBUTE"
* tag 'mtd/for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (76 commits)
mtd: hyperbus: Fix build failure when only RPCIF_HYPERBUS is enabled
mtd: hyperbus: add Renesas RPC-IF driver
Revert "mtd: spi-nor: Prefer asynchronous probe"
mtd: parsers: bcm63xx: Do not make it modular
mtd: spear_smi: Enable compile testing
mtd: maps: vmu-flash: fix typos for struct memcard
mtd: physmap: Add Baikal-T1 physically mapped ROM support
mtd: maps: vmu-flash: simplify the return expression of probe_maple_vmu
mtd: onenand: simplify the return expression of onenand_transfer_auto_oob
mtd: rawnand: cadence: remove a redundant dev_err call
mtd: rawnand: ams-delta: Fix non-OF build warning
mtd: rawnand: Don't overwrite the error code from nand_set_ecc_soft_ops()
mtd: rawnand: Introduce nand_set_ecc_on_host_ops()
mtd: rawnand: atmel: Check return values for nand_read_data_op
mtd: rawnand: vf610: Remove unused function vf610_nfc_transfer_size()
mtd: rawnand: qcom: Simplify with dev_err_probe()
mtd: rawnand: marvell: Fix and update kerneldoc
mtd: rawnand: marvell: Simplify with dev_err_probe()
mtd: rawnand: gpmi: Simplify with dev_err_probe()
mtd: rawnand: atmel: Simplify with dev_err_probe()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux
Pull thermal updates from Daniel Lezcano:
- Fix Kconfig typo "acces" -> "access" (Colin Ian King)
- Use dev_error_probe() to simplify the error handling on imx and imx8
platforms (Anson Huang)
- Use dedicated kobj_to_dev() instead of container_of() in the sysfs
core code (Tian Tao)
- Fix coding style by adding braces to a one line conditional statement
on rcar (Geert Uytterhoeven)
- Add DT binding documentation for the r8a774e1 platform and update the
Kconfig description supporting RZ/G2 SoCs (Lad Prabhakar)
- Simplify the return expression of stm_thermal_prepare on the stm32
platform (Qinglang Miao)
- Fix the unit in the function documentation for the idle injection
cooling device (Zhuguang Qing)
- Remove an unecessary mutex_init() in the core code (Qinglang Miao)
- Add support for keep alive events in the core code and the specific
int340x (Srinivas Pandruvada)
- Remove unused thermal zone variable in devfreq and cpufreq cooling
devices (Zhuguang Qing)
- Add the A100's THS controller support (Yangtao Li)
- Add power management on the omap3's bandgap sensor (Adam Ford)
- Fix a missing nlmsg_free in the netlink core error path (Jing
Xiangfeng)
* tag 'thermal-v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
thermal: core: Adding missing nlmsg_free() in thermal_genl_sampling_temp()
thermal: ti-soc-thermal: Enable addition power management
thermal: sun8i: Add A100's THS controller support
thermal: sun8i: add TEMP_CALIB_MASK for calibration data in sun50i_h6_ths_calibrate
dt-bindings: thermal: sun8i: Add binding for A100's THS controller
thermal: cooling: Remove unused variable *tz
thermal: int340x: Add keep alive response method
thermal: core: Add new event for sending keep alive notifications
thermal: int340x: Provide notification for OEM variable change
thermal: core: remove unnecessary mutex_init()
thermal/idle_inject: Fix comment of idle_duration_us and name of latency_ns
thermal: Kconfig: Update description for RCAR_GEN3_THERMAL config
thermal: stm32: simplify the return expression of stm_thermal_prepare()
dt-bindings: thermal: rcar-gen3-thermal: Add r8a774e1 support
thermal: rcar_thermal: Add missing braces to conditional statement
thermal: Use kobj_to_dev() instead of container_of()
thermal: imx8mm: Use dev_err_probe() to simplify error handling
thermal: imx: Use dev_err_probe() to simplify error handling
drivers: thermal: Kconfig: fix spelling mistake "acces" -> "access"
|
|
__io_queue_proc() is used by both, poll reqs and apoll. Don't use
req->poll.events to copy poll mask because for apoll it aliases with
private data of the request.
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Make sure the async io-wq workers inherit the loginuid and sessionid from
the original task, and restore them to unset once we're done with the
async work item.
While at it, disable the ability for kernel threads to write to their own
loginuid.
Signed-off-by: Jens Axboe <[email protected]>
|
|
Even though we place the req_issued and req_complete in separate
cachelines, there's considerable overhead in doing the atomics
particularly on the completion side.
Get rid of having the two counters, and just use a percpu_counter for
this. That's what it was made for, after all. This considerably
reduces the overhead in __io_free_req().
Signed-off-by: Jens Axboe <[email protected]>
|
|
This avoids doing a copy for each new async IO, if some parts of the
io_identity has changed. We avoid reference counting for the normal
fast path of nothing ever changing.
Signed-off-by: Jens Axboe <[email protected]>
|
|
This is, by definition, a per-task structure. So store it in the
task context, instead of doing carrying it in each io_kiocb. We're being
a bit inefficient if members have changed, as that requires an alloc and
copy of a new io_identity struct. The next patch will fix that up.
Signed-off-by: Jens Axboe <[email protected]>
|
|
If the io_identity doesn't completely match the task, then create a
copy of it and use that. The existing copy remains valid until the last
user of it has gone away.
This also changes the personality lookup to be indexed by io_identity,
instead of creds directly.
Signed-off-by: Jens Axboe <[email protected]>
|
|
io-wq contains a pointer to the identity, which we just hold in io_kiocb
for now. This is in preparation for putting this outside io_kiocb. The
only exception is struct files_struct, which we'll need different rules
for to avoid a circular dependency.
No functional changes in this patch.
Signed-off-by: Jens Axboe <[email protected]>
|
|
We solely rely on work->work_flags now, so use that for proper checking
and clearing/dropping of various identity items.
Signed-off-by: Jens Axboe <[email protected]>
|
|
We have a number of bits that decide what context to inherit. Set up
io-wq flags for these instead. This is in preparation for always having
the various members set, but not always needing them for all requests.
No intended functional changes in this patch.
Signed-off-by: Jens Axboe <[email protected]>
|
|
There was an assumption that kthread_create_on_node() would properly set
NUMA affinities in terms of CPUs allowed, but it doesn't. Make sure we
do this when creating an io-wq context on NUMA.
Cc: [email protected]
Stefan Metzmacher <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
syzbot reports the following crash:
general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 1 PID: 8927 Comm: syz-executor.3 Not tainted 5.9.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:io_file_from_index fs/io_uring.c:5963 [inline]
RIP: 0010:io_sqe_files_register fs/io_uring.c:7369 [inline]
RIP: 0010:__io_uring_register fs/io_uring.c:9463 [inline]
RIP: 0010:__do_sys_io_uring_register+0x2fd2/0x3ee0 fs/io_uring.c:9553
Code: ec 03 49 c1 ee 03 49 01 ec 49 01 ee e8 57 61 9c ff 41 80 3c 24 00 0f 85 9b 09 00 00 4d 8b af b8 01 00 00 4c 89 e8 48 c1 e8 03 <80> 3c 28 00 0f 85 76 09 00 00 49 8b 55 00 89 d8 c1 f8 09 48 98 4c
RSP: 0018:ffffc90009137d68 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc9000ef2a000
RDX: 0000000000040000 RSI: ffffffff81d81dd9 RDI: 0000000000000005
RBP: dffffc0000000000 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffed1012882a37
R13: 0000000000000000 R14: ffffed1012882a38 R15: ffff888094415000
FS: 00007f4266f3c700(0000) GS:ffff8880ae500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000118c000 CR3: 000000008e57d000 CR4: 00000000001506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45de59
Code: 0d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f4266f3bc78 EFLAGS: 00000246 ORIG_RAX: 00000000000001ab
RAX: ffffffffffffffda RBX: 00000000000083c0 RCX: 000000000045de59
RDX: 0000000020000280 RSI: 0000000000000002 RDI: 0000000000000005
RBP: 000000000118bf68 R08: 0000000000000000 R09: 0000000000000000
R10: 40000000000000a1 R11: 0000000000000246 R12: 000000000118bf2c
R13: 00007fff2fa4f12f R14: 00007f4266f3c9c0 R15: 000000000118bf2c
Modules linked in:
---[ end trace 2a40a195e2d5e6e6 ]---
RIP: 0010:io_file_from_index fs/io_uring.c:5963 [inline]
RIP: 0010:io_sqe_files_register fs/io_uring.c:7369 [inline]
RIP: 0010:__io_uring_register fs/io_uring.c:9463 [inline]
RIP: 0010:__do_sys_io_uring_register+0x2fd2/0x3ee0 fs/io_uring.c:9553
Code: ec 03 49 c1 ee 03 49 01 ec 49 01 ee e8 57 61 9c ff 41 80 3c 24 00 0f 85 9b 09 00 00 4d 8b af b8 01 00 00 4c 89 e8 48 c1 e8 03 <80> 3c 28 00 0f 85 76 09 00 00 49 8b 55 00 89 d8 c1 f8 09 48 98 4c
RSP: 0018:ffffc90009137d68 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc9000ef2a000
RDX: 0000000000040000 RSI: ffffffff81d81dd9 RDI: 0000000000000005
RBP: dffffc0000000000 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffed1012882a37
R13: 0000000000000000 R14: ffffed1012882a38 R15: ffff888094415000
FS: 00007f4266f3c700(0000) GS:ffff8880ae400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000074a918 CR3: 000000008e57d000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
which is a copy of fget failure condition jumping to cleanup, but the
cleanup requires ctx->file_data to be assigned. Assign it when setup,
and ensure that we clear it again for the error path exit.
Fixes: 5398ae698525 ("io_uring: clean file_data access in files_register")
Reported-by: [email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
This reverts commit 738277adc81929b3e7c9b63fec6693868cc5f931.
This change didn't make a lot of sense, and as Linus reports, it actually
fails on clang:
/tmp/io_uring-dd40c4.s:26476: Warning: ignoring changed section
attributes for .data..read_mostly
The arrays are already marked const so, by definition, they are not
just read-mostly, they are read-only.
Reported-by: Linus Torvalds <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
REQ_F_COMP_LOCKED is used and implemented in a buggy way. The problem is
that the flag is set before io_put_req() but not cleared after, and if
that wasn't the final reference, the request will be freed with the flag
set from some other context, which may not hold a spinlock. That means
possible races with removing linked timeouts and unsynchronised
completion (e.g. access to CQ).
Instead of fixing REQ_F_COMP_LOCKED, kill the flag and use
task_work_add() to move such requests to a fresh context to free from
it, as was done with __io_free_req_finish().
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
io_req_clean_work() checks REQ_F_COMP_LOCK to pass this two layers up.
Move the check up into __io_free_req(), so at least it doesn't looks so
ugly and would facilitate further changes.
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Move io_put_req() in io_poll_task_handler() from under spinlock. This
eliminates the need to use REQ_F_COMP_LOCKED, at the expense of
potentially having to grab the lock again. That's still a better trade
off than relying on the locked flag.
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
If a request had REQ_F_LINK_TIMEOUT it would've been cleared in
__io_kill_linked_timeout() by the time of __io_fail_links(), so no need
to care about it.
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
__io_kill_linked_timeout() sets REQ_F_COMP_LOCKED for a linked timeout
even if it can't cancel it, e.g. it's already running. It not only races
with io_link_timeout_fn() for ->flags field, but also leaves the flag
set and so io_link_timeout_fn() may find it and decide that it holds the
lock. Hopefully, the second problem is potential.
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
An incorrect sizeof() is being used, sizeof(file_data->table) is not
correct, it should be sizeof(*file_data->table).
Fixes: 5398ae698525 ("io_uring: clean file_data access in files_register")
Signed-off-by: Colin Ian King <[email protected]>
Addresses-Coverity: ("Sizeof not portable (SIZEOF_MISMATCH)")
Signed-off-by: Jens Axboe <[email protected]>
|
|
Remove including <linux/version.h> that don't need it.
Signed-off-by: Tian Tao <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Don't populate const arrays on the stack but instead make them
static. Makes the object code smaller by 57 bytes.
Before:
text data bss dec hex filename
173256 38016 192 211464 33a08 sound/pci/hda/patch_ca0132.o
After:
text data bss dec hex filename
172879 38336 192 211407 339cf sound/pci/hda/patch_ca0132.o
(gcc version 10.2.0)
Signed-off-by: Colin Ian King <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
|