|
On non-OF systems spi->controller_data may be NULL. This causes a NULL
pointer dereference on dm365-evm.
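The shape of such a guard can be sketched in C. Everything here is hypothetical: the model structs and `get_cfg()` stand in for the real davinci driver types, and falling back to a static default configuration is an assumption about the fix, not a copy of it.

```c
#include <stddef.h>

/* Hypothetical stand-in for the driver's per-device configuration. */
struct davinci_cfg_model {
	unsigned int wdelay;
	int io_type;
};

/* Hypothetical stand-in for struct spi_device. */
struct spi_device_model {
	void *controller_data; /* may be NULL on non-OF (board file) systems */
};

/* Defaults used when no per-device configuration was supplied. */
static const struct davinci_cfg_model davinci_default_cfg_model = {
	.wdelay = 0,
	.io_type = 0,
};

/* Return the device's config, falling back to the defaults instead of
 * dereferencing a NULL controller_data pointer. */
static const struct davinci_cfg_model *
get_cfg(const struct spi_device_model *spi)
{
	const struct davinci_cfg_model *cfg = spi->controller_data;

	if (!cfg)
		cfg = &davinci_default_cfg_model;
	return cfg;
}
```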
Signed-off-by: Bartosz Golaszewski <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
Cc: [email protected]
|
|
The kernel unnecessarily prevents late microcode loading when SMT is
disabled. It should be safe to allow it if all the primary threads are
online.
Signed-off-by: Josh Poimboeuf <[email protected]>
Acked-by: Borislav Petkov <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2018-08-10
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix cpumap and devmap teardown, which runs under RCU context and
cannot make the same assumptions as code running under NAPI protection,
from Jesper.
2) Fix various sockmap bugs in bpf_tcp_sendmsg() code, e.g. we had
a bug where socket error was not propagated correctly, from Daniel.
3) Fix incompatible libbpf header license for BTF code and match it
before it gets officially released with the rest of libbpf which
is LGPL-2.1, from Martin.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
When enumerating snapshots, the last few bytes of the final
snapshot could be left off, since we were miscalculating the
length returned (leaving off the sizeof struct SRV_SNAPSHOT_ARRAY).
See MS-SMB2 section 2.2.32.2. In addition, fix up the length used
so that a smaller buffer can be passed in, which makes it easier
to return the size of the whole snapshot array.
Sample userspace output with a kernel patched with this
(mounted to a Windows volume with two snapshots).
Before this patch, the second snapshot would be missing a
few bytes at the end.
~/cifs-2.6# ~/enum-snapshots /mnt/file
press enter to issue the ioctl to retrieve snapshot information ...
size of snapshot array = 102
Num snapshots: 2 Num returned: 2 Array Size: 102
Snapshot 0:@GMT-2018.06.30-19.34.17
Snapshot 1:@GMT-2018.06.30-19.33.37
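As a worked check of the numbers above, assuming the MS-SMB2 2.2.32.2 layout of SRV_SNAPSHOT_ARRAY (three 4-byte fields, NumberOfSnapShots, NumberOfSnapShotsReturned and SnapShotArraySize, followed by NUL-terminated UTF-16 @GMT tokens and one final extra NUL): each token such as `@GMT-2018.06.30-19.34.17` is 24 characters, i.e. 48 bytes plus a 2-byte terminator, so two snapshots give 2 * 50 + 2 = 102 bytes, matching "Array Size: 102". The length to return must also cover the 12-byte header, i.e. 114 bytes; if the header is left out of the returned length, the tail of the buffer holding the end of the last token is cut off. The helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed part of SRV_SNAPSHOT_ARRAY (MS-SMB2 2.2.32.2): three 32-bit fields. */
#define SNAPSHOT_ARRAY_HDR_LEN (3 * sizeof(uint32_t))

/* Bytes used by the snapshot list: each token is UTF-16 (2 bytes per char)
 * plus a 2-byte NUL, and the whole list ends with one more 2-byte NUL. */
static size_t snapshot_array_size(size_t nr_snapshots, size_t token_chars)
{
	return nr_snapshots * (token_chars * 2 + 2) + 2;
}

/* Total bytes to return: header plus array (hypothetical helper). */
static size_t snapshot_ioctl_len(size_t nr_snapshots, size_t token_chars)
{
	return SNAPSHOT_ARRAY_HDR_LEN +
	       snapshot_array_size(nr_snapshots, token_chars);
}
```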
CC: Stable <[email protected]>
Signed-off-by: Steve French <[email protected]>
Reviewed-by: Pavel Shilovsky <[email protected]>
|
|
Change smb2_queryfs() to use a Create/QueryInfo/Close compound request.
Signed-off-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>
Reviewed-by: Paulo Alcantara <[email protected]>
Reviewed-by: Pavel Shilovsky <[email protected]>
|
|
Signed-off-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>
Reviewed-by: Paulo Alcantara <[email protected]>
Reviewed-by: Pavel Shilovsky <[email protected]>
|
|
RCU pathwalk relies upon the assumption that anything that changes
->d_inode of a dentry will invalidate its ->d_seq. That's almost
true - the one exception is that the final dput() of already unhashed
dentry does *not* touch ->d_seq at all. Unhashing does, though,
so for anything we'd found by RCU dcache lookup we are fine.
Unfortunately, we can *start* with an unhashed dentry or jump into
it.
We could try and be careful in the (few) places where that could
happen. Or we could just make the final dput() invalidate the damn
thing, unhashed or not. The latter is much simpler and easier to
backport, so let's do it that way.
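The invariant can be illustrated with a single-threaded toy model. The names are hypothetical, and the model omits the memory barriers and odd/even handling of the kernel's real seqcount primitives; it only shows why a change to ->d_inode that does not touch ->d_seq goes unnoticed by a reader that validates its snapshot with a sequence check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy dentry guarded by a sequence count (not the kernel's seqcount_t). */
struct dentry_model {
	unsigned int seq;
	void *inode;
};

/* Reader: remember the sequence value before looking at ->inode. */
static unsigned int read_seqbegin_model(const struct dentry_model *d)
{
	return d->seq;
}

/* Reader: true if the snapshot taken at 'start' can no longer be trusted. */
static bool read_seqretry_model(const struct dentry_model *d, unsigned int start)
{
	return d->seq != start;
}

/* Old behaviour for an already-unhashed dentry: ->inode changes, seq does not. */
static void final_dput_old(struct dentry_model *d)
{
	d->inode = NULL;
}

/* New behaviour: the final dput() invalidates seq, hashed or not. */
static void final_dput_new(struct dentry_model *d)
{
	d->seq++;
	d->inode = NULL;
	d->seq++;
}
```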
Reported-by: "Dae R. Jeong" <[email protected]>
Cc: [email protected]
Signed-off-by: Al Viro <[email protected]>
|
|
__legitimize_mnt() has two problems - one is that in case of success
the check of mount_lock is not ordered wrt preceding increment of
refcount, making it possible to have successful __legitimize_mnt()
on one CPU just before the otherwise final mntput() on another,
with __legitimize_mnt() not seeing mntput() taking the lock and
mntput() not seeing the increment done by __legitimize_mnt().
Solved by a pair of barriers.
Another is that failure of __legitimize_mnt() on the second
read_seqretry() leaves us with reference that'll need to be
dropped by caller; however, if that races with final mntput()
we can end up with caller dropping rcu_read_lock() and doing
mntput() to release that reference - with the first mntput()
having freed the damn thing just as rcu_read_lock() had been
dropped. Solution: in "do mntput() yourself" failure case
grab mount_lock, check if MNT_DOOMED has been set by racing
final mntput() that has missed our increment and if it has -
undo the increment and treat that as "failure, caller doesn't
need to drop anything" case.
It's not easy to hit - the final mntput() has to come right
after the first read_seqretry() in __legitimize_mnt() *and*
manage to miss the increment done by __legitimize_mnt() before
the second read_seqretry() in there. The things that are almost
impossible to hit on bare hardware are not impossible on SMP
KVM, though...
Reported-by: Oleg Nesterov <[email protected]>
Fixes: 48a066e72d97 ("RCU'd vfsmounts")
Cc: [email protected]
Signed-off-by: Al Viro <[email protected]>
|
|
Commit 33679a50370d ("MIPS: uasm: Remove needless ISA abstraction")
removed use of the MIPS_ISA preprocessor macro, but left a couple of
unused definitions of it behind.
Remove the dead code.
Signed-off-by: Paul Burton <[email protected]>
|
|
mntput_no_expire() does the calculation of total refcount under mount_lock;
unfortunately, the decrement (as well as all increments) is done outside
of it, leading to false positives in the "are we dropping the last reference"
test. Consider the following situation:
* mnt is a lazy-umounted mount, kept alive by two opened files. One
of those files gets closed. Total refcount of mnt is 2. On CPU 42
mntput(mnt) (called from __fput()) drops one reference, decrementing component
* After it has looked at component #0, the process on CPU 0 does
mntget(), incrementing component #0, gets preempted and gets to run again -
on CPU 69. There it does mntput(), which drops the reference (component #69)
and proceeds to spin on mount_lock.
* On CPU 42 our first mntput() finishes counting. It observes the
decrement of component #69, but not the increment of component #0. As a
result, the total it gets is not 1 as it should've been - it's 0. At which
point we decide that vfsmount needs to be killed and proceed to free it and
shut the filesystem down. However, there's still another opened file
on that filesystem, with reference to (now freed) vfsmount, etc. and we are
screwed.
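The interleaving above can be replayed deterministically with a toy per-CPU counter. The initial split of the two references (one in component #0, one in component #42) is an assumption for illustration, as are all names; the point is that a sum taken while increments and decrements happen outside mount_lock can observe 0 while one reference is still held.

```c
/* Toy per-CPU components for CPUs 0, 42 and 69 (indices 0..2). */
enum { CPU0, CPU42, CPU69, NR_MODEL_CPUS };

/* Replays the scenario and returns the total that the summing loop on
 * CPU 42 observes; *true_total receives the number of references
 * actually held once the loop has finished. */
static int replay_race(int *true_total)
{
	int count[NR_MODEL_CPUS] = { 1, 1, 0 }; /* two references in total */
	int sum = 0;

	count[CPU42]--;         /* CPU 42: mntput() from __fput() */

	sum += count[CPU0];     /* CPU 42 starts summing: reads component #0 */

	count[CPU0]++;          /* process on CPU 0: mntget() */
	count[CPU69]--;         /* same process, now on CPU 69: mntput() */

	sum += count[CPU42];    /* CPU 42 finishes the sum */
	sum += count[CPU69];

	*true_total = count[CPU0] + count[CPU42] + count[CPU69];
	return sum;
}
```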
It's not a wide race, but it can be reproduced with artificial slowdown of
the mnt_get_count() loop, and it should be easier to hit on SMP KVM setups.
Fix consists of moving the refcount decrement under mount_lock; the tricky
part is that we want to (and can) keep the fast case (i.e. mount that still
has non-NULL ->mnt_ns) entirely out of mount_lock. All places that zero
mnt->mnt_ns are dropping some reference to mnt and they call synchronize_rcu()
before that mntput(). IOW, if mntput() observes (under rcu_read_lock())
a non-NULL ->mnt_ns, it is guaranteed that there is another reference yet to
be dropped.
Reported-by: Jann Horn <[email protected]>
Tested-by: Jann Horn <[email protected]>
Fixes: 48a066e72d97 ("RCU'd vfsmounts")
Cc: [email protected]
Signed-off-by: Al Viro <[email protected]>
|
|
sparse complains:
drivers/block/null_blk_main.c:816:24: sparse: context imbalance in 'null_insert_page' - unexpected unlock
Fix it by adding the necessary annotations to the function.
Signed-off-by: Jens Axboe <[email protected]>
|
|
All announced Threadripper 29xx models have a temperature offset of
27 degrees C. Simplify temperature offset table to match all 29xx
Threadripper models with a single entry. Also simplify the table to match
all 19xx Threadripper models with a single entry. This effectively drops
entries for Threadripper 1910/1920/1950 which never saw the light of day.
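A prefix-matching table of this kind can be sketched as follows. The struct and function names are hypothetical, offsets are in millidegrees Celsius, and the 27000 value for the 19xx entry is an assumption (the text above only states the 29xx offset).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table: one prefix entry per Threadripper generation. */
struct tctl_offset_model {
	const char *id_prefix;
	int offset_mdeg;
};

static const struct tctl_offset_model tctl_offsets_model[] = {
	{ "AMD Ryzen Threadripper 19", 27000 }, /* assumed value */
	{ "AMD Ryzen Threadripper 29", 27000 },
};

/* Return the offset for a CPU model string, or 0 if no prefix matches. */
static int lookup_offset(const char *model_id)
{
	size_t n = sizeof(tctl_offsets_model) / sizeof(tctl_offsets_model[0]);

	for (size_t i = 0; i < n; i++) {
		const char *p = tctl_offsets_model[i].id_prefix;

		if (strncmp(model_id, p, strlen(p)) == 0)
			return tctl_offsets_model[i].offset_mdeg;
	}
	return 0;
}
```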
Cc: Michael Larabel <[email protected]>
Cc: Clemens Ladisch <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
|
|
Jesper Dangaard Brouer says:
====================
Removing entries from cpumap and devmap goes through a number of
synchronization steps to make sure no new xdp_frames can be enqueued.
But there is a small chance that some xdp_frames remain that have not
yet been flushed/processed. Flushing these during teardown happens
from RCU context and not, as usual, under RX NAPI context.
The optimization introduced in commit 389ab7f01af9 ("xdp: introduce
xdp_return_frame_rx_napi") missed that the flush operation can also
be called from RCU context. Thus, we cannot always use the
xdp_return_frame_rx_napi call, which takes advantage of the protection
provided by XDP RX running under NAPI protection.
The samples/bpf program xdp_redirect_cpu has a --stress-mode option, which
is adjusted to make the issue easier to reproduce (verified by Red Hat QA).
====================
Signed-off-by: Daniel Borkmann <[email protected]>
|
|
Like cpumap teardown, the devmap teardown code also flushes remaining
xdp_frames, via bq_xmit_all(), in case a map entry is removed. The code
can call xdp_return_frame_rx_napi() from the wrong context in case
ndo_xdp_xmit() fails.
Fixes: 389ab7f01af9 ("xdp: introduce xdp_return_frame_rx_napi")
Fixes: 735fc4054b3a ("xdp: change ndo_xdp_xmit API to support bulking")
Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
|
|
The teardown race in cpumap is really hard to reproduce. These changes
make it easier to reproduce, for QA.
The --stress-mode now has a case with a very small queue size of 8, which
helps the teardown flush encounter a full queue, which results in calling
the xdp_return_frame API in a non-NAPI-protected context.
Also increase MAX_CPUS, as my QA department has larger machines than I do.
Tested-by: Jean-Tsung Hsiao <[email protected]>
Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
|
|
When removing a cpumap entry, a number of synchronization steps happen.
Eventually the teardown code __cpu_map_entry_free() is invoked from/via
call_rcu.
The teardown code __cpu_map_entry_free() flushes remaining xdp_frames
by invoking bq_flush_to_queue(), which calls xdp_return_frame_rx_napi().
The issue is that the teardown code is not running in the RX NAPI
code path. Thus, it is not allowed to invoke the NAPI variant of
xdp_return_frame.
This bug was found and triggered by using the --stress-mode option to
the samples/bpf program xdp_redirect_cpu. It is hard to trigger,
because the ptr_ring has to be full, the cpumap bulk queue holds at
most 8 packets, and a remote CPU is racing to empty the ptr_ring
queue.
Fixes: 389ab7f01af9 ("xdp: introduce xdp_return_frame_rx_napi")
Tested-by: Jean-Tsung Hsiao <[email protected]>
Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
|
|
When an application's iops has exceeded its cgroup's iops limit, it is
throttled and the kernel sets a timer for dispatching, so the IO latency
includes that delay.
However, the dispatch delay, which is calculated from the limit and the
elapsed jiffies, is suboptimal. Since the dispatch delay is only calculated
once the application's iops reaches (iops limit + 1), it doesn't need to
wait any longer than the remaining time of the current slice.
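The effect can be sketched with a simplified model. The helper names are hypothetical, HZ = 1000 and a 100-jiffy slice are assumptions, and `wait_old()` is only one plausible reading of "calculated by the limit and the elapsed jiffies", not the kernel's code: with a 100 iops limit, 10 I/Os already dispatched and 50 jiffies into the slice, it yields 61 jiffies, while waiting only until the end of the current slice yields 50.

```c
#define MODEL_HZ 1000UL /* assumed tick rate */

/* Wait derived from the limit: time for (io_disp + 1) I/Os at
 * iops_limit, minus the time already elapsed in the slice. */
static unsigned long wait_old(unsigned long io_disp, unsigned long iops_limit,
			      unsigned long elapsed)
{
	unsigned long wait = ((io_disp + 1) * MODEL_HZ) / iops_limit + 1;

	return wait > elapsed ? wait - elapsed : 1;
}

/* Capped wait: never longer than the rest of the current slice. */
static unsigned long wait_to_slice_end(unsigned long elapsed, unsigned long slice)
{
	unsigned long slice_end = ((elapsed + slice - 1) / slice) * slice;

	/* At a slice boundary (including 0), count one full slice. */
	if (slice_end == elapsed)
		slice_end += slice;
	return slice_end - elapsed;
}
```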
The difference can be demonstrated with the following fio job and cgroup
iops settings:
-----
$ echo 4 > /mnt/config/nullb/disk1/mbps # limit nullb's bandwidth to 4MB/s for testing.
$ echo "253:1 riops=100 rbps=max" > /sys/fs/cgroup/unified/cg1/io.max
$ cat r2.job
[global]
name=fio-rand-read
filename=/dev/nullb1
rw=randread
bs=4k
direct=1
numjobs=1
time_based=1
runtime=60
group_reporting=1
[file1]
size=4G
ioengine=libaio
iodepth=1
rate_iops=50000
norandommap=1
thinktime=4ms
-----
w/o patch:
file1: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.7-66-gedfc
Starting 1 process
read: IOPS=99, BW=400KiB/s (410kB/s)(23.4MiB/60001msec)
slat (usec): min=10, max=336, avg=27.71, stdev=17.82
clat (usec): min=2, max=28887, avg=5929.81, stdev=7374.29
lat (usec): min=24, max=28901, avg=5958.73, stdev=7366.22
clat percentiles (usec):
| 1.00th=[ 4], 5.00th=[ 4], 10.00th=[ 4], 20.00th=[ 4],
| 30.00th=[ 4], 40.00th=[ 4], 50.00th=[ 6], 60.00th=[11731],
| 70.00th=[11863], 80.00th=[11994], 90.00th=[12911], 95.00th=[22676],
| 99.00th=[23725], 99.50th=[23987], 99.90th=[23987], 99.95th=[25035],
| 99.99th=[28967]
w/ patch:
file1: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.7-66-gedfc
Starting 1 process
read: IOPS=100, BW=400KiB/s (410kB/s)(23.4MiB/60005msec)
slat (usec): min=10, max=155, avg=23.24, stdev=16.79
clat (usec): min=2, max=12393, avg=5961.58, stdev=5959.25
lat (usec): min=23, max=12412, avg=5985.91, stdev=5951.92
clat percentiles (usec):
| 1.00th=[ 3], 5.00th=[ 3], 10.00th=[ 4], 20.00th=[ 4],
| 30.00th=[ 4], 40.00th=[ 5], 50.00th=[ 47], 60.00th=[11863],
| 70.00th=[11994], 80.00th=[11994], 90.00th=[11994], 95.00th=[11994],
| 99.00th=[11994], 99.50th=[11994], 99.90th=[12125], 99.95th=[12125],
| 99.99th=[12387]
Signed-off-by: Liu Bo <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This new symbol needs to be in the workaround-list for buggy
binutils, otherwise the build with gcc-4.6 fails.
Fixes: 39d668e04eda ('x86/mm/pti: Make pti_clone_kernel_text() compile on 32 bit')
Reported-by: Stephen Rothwell <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Linux-Next Mailing List <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"This fixes a performance regression in arm64 NEON crypto as well as a
crash in x86 aegis/morus on unsupported CPUs"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: x86/aegis,morus - Fix and simplify CPUID checks
crypto: arm64 - revert NEON yield for fast AEAD implementations
|
|
Pull networking fixes from David Miller:
1) The real fix for the ipv6 route metric leak Sabrina was seeing, from
Cong Wang.
2) Fix syzbot-triggered AF_PACKET v3 ring buffer insufficient room
conditions, from Willem de Bruijn.
3) vsock can reinitialize active work struct, fix from Cong Wang.
4) RXRPC keepalive generator can wedge a cpu, fix from David Howells.
5) Fix locking in AF_SMC ioctl, from Ursula Braun.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
dsa: slave: eee: Allow ports to use phylink
net/smc: move sock lock in smc_ioctl()
net/smc: allow sysctl rmem and wmem defaults for servers
net/smc: no shutdown in state SMC_LISTEN
net: aquantia: Fix IFF_ALLMULTI flag functionality
rxrpc: Fix the keepalive generator [ver #2]
net/mlx5e: Cleanup of dcbnl related fields
net/mlx5e: Properly check if hairpin is possible between two functions
vhost: reset metadata cache when initializing new IOTLB
llc: use refcount_inc_not_zero() for llc_sap_find()
dccp: fix undefined behavior with 'cwnd' shift in ccid2_cwnd_restart()
tipc: fix an interrupt unsafe locking scenario
vsock: split dwork to avoid reinitializations
net: thunderx: check for failed allocation lmac->dmacs
cxgb4: mk_act_open_req() buggers ->{local, peer}_ip on big-endian hosts
packet: refine ring v3 block size test to hold one frame
ip6_tunnel: use the right value for ipv4 min mtu check in ip6_tnl_xmit
ipv6: fix double refcount of fib6_metrics
|
|
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
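For illustration, a switch annotated in this style looks like the hypothetical function below; with GCC's -Wimplicit-fallthrough, a `/* fall through */` comment directly before the next case label marks the fall-through as intentional and suppresses the warning.

```c
/* Hypothetical example: each case also performs the stages after it. */
static int count_stages(int start)
{
	int stages = 0;

	switch (start) {
	case 0:
		stages++;
		/* fall through */
	case 1:
		stages++;
		/* fall through */
	case 2:
		stages++;
		break;
	default:
		break;
	}
	return stages;
}
```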
Addresses-Coverity-ID: 1056543 ("Missing break in switch")
Addresses-Coverity-ID: 1056544 ("Missing break in switch")
Signed-off-by: Gustavo A. R. Silva <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
During ipmi stress tests we see occasional transaction failures
at boot time. This happens in the case of I2C_M_RECV_LEN
transactions, when the read transfer completes (with the initial
read length of 34) before the driver gets a chance to handle interrupts.
The current driver code expects at least 2 interrupts for I2C_M_RECV_LEN
transactions. The length is updated during the first interrupt, and the
buffer contents are only copied during subsequent interrupts. In case of
just one interrupt, we will complete the transaction without copying
out the bytes from RX fifo.
Update the code to drain the RX fifo after the length update,
so that the transaction completes correctly in all cases.
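A toy model of the fixed flow, with hypothetical names and a simplified FIFO: the first received byte supplies the length, and the same interrupt then drains any bytes already sitting in the RX FIFO, so a transfer that completes within a single interrupt still gets its data copied out.

```c
#include <stddef.h>

/* Hypothetical I2C_M_RECV_LEN transfer state. */
struct recv_len_xfer {
	unsigned char buf[34];  /* sized for the initial read length above */
	size_t len;             /* expected bytes; 0 until the length is known */
	size_t copied;          /* bytes copied out of the RX FIFO so far */
};

/* Handle one interrupt with 'avail' bytes waiting in 'fifo'. */
static void handle_irq(struct recv_len_xfer *x, const unsigned char *fifo,
		       size_t avail)
{
	size_t i = 0;

	/* First interrupt: the first byte is the payload length. */
	if (x->len == 0 && avail > 0)
		x->len = 1 + (size_t)fifo[0];

	/* Drain what is already in the FIFO now, instead of waiting for a
	 * later interrupt that may never come. */
	while (i < avail && x->copied < x->len && x->copied < sizeof(x->buf))
		x->buf[x->copied++] = fifo[i++];
}

static int xfer_done(const struct recv_len_xfer *x)
{
	return x->len != 0 && x->copied == x->len;
}
```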
Signed-off-by: George Cherian <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
Cc: [email protected]
|
|
Several block drivers call alloc_disk() followed by put_disk() if
something fails before device_add_disk() is called without calling
blk_cleanup_queue(). Make sure that in this scenario, too, the request
queue is dissociated from the cgroup controller. This patch prevents
loading the parport_pc, paride and pf drivers from triggering the
following kernel crash:
BUG: KASAN: null-ptr-deref in pi_init+0x42e/0x580 [paride]
Read of size 4 at addr 0000000000000008 by task modprobe/744
Call Trace:
dump_stack+0x9a/0xeb
kasan_report+0x139/0x350
pi_init+0x42e/0x580 [paride]
pf_init+0x2bb/0x1000 [pf]
do_one_initcall+0x8e/0x405
do_init_module+0xd9/0x2f2
load_module+0x3ab4/0x4700
SYSC_finit_module+0x176/0x1a0
do_syscall_64+0xee/0x2b0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
Reported-by: Alexandru Moise <[email protected]>
Fixes: a063057d7c73 ("block: Fix a race between request queue removal and the block cgroup controller") # v4.17
Signed-off-by: Bart Van Assche <[email protected]>
Tested-by: Alexandru Moise <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Ming Lei <[email protected]>
Cc: Alexandru Moise <[email protected]>
Cc: Joseph Qi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Ming Lei <[email protected]>
Cc: Omar Sandoval <[email protected]>
Cc: Alexandru Moise <[email protected]>
Cc: Joseph Qi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This new function will be used in a later patch to verify whether a
queue has been dissociated from the cgroup controller before being
released.
Signed-off-by: Bart Van Assche <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Ming Lei <[email protected]>
Cc: Omar Sandoval <[email protected]>
Cc: Johannes Thumshirn <[email protected]>
Cc: Alexandru Moise <[email protected]>
Cc: Joseph Qi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Commit 12f5b9314545 ("blk-mq: Remove generation seqeunce") removed the
only seqcount_t and u64_stats_sync instances from <linux/blkdev.h> but
did not remove the corresponding #include directives. Since these
include directives are no longer needed, remove them.
Signed-off-by: Bart Van Assche <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Ming Lei <[email protected]>
Cc: Jianchao Wang <[email protected]>
Cc: Hannes Reinecke <[email protected]>,
Cc: Johannes Thumshirn <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Currently, we count the hctx as active only after it has successfully
allocated a driver tag. If a previously inactive hctx tries to get a tag
for the first time, it may fail and need to wait. However, due to the
stale ->active_queues count, the other shared-tags users are still able
to occupy all driver tags while someone is waiting for a tag.
Consequently, even if the previously inactive hctx is woken up, it
still may not be able to get a tag and could be starved.
To fix it, we count the hctx as active before trying to allocate a
driver tag; then, while it is waiting for a tag, the other shared-tag
users will reserve budget for it.
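Why the waiting hctx matters can be seen from a fair-share calculation of the kind blk-mq applies to shared tags. The formula below (tags split evenly across active users, rounded up, with a floor of 4) is assumed for illustration and the function name is hypothetical: with 64 tags and the waiter not yet counted, the other user may take all 64; once the waiter counts as active, each share drops to 32.

```c
/* Per-user share of a shared tag set (assumed fair-share formula). */
static unsigned int fair_share(unsigned int total_tags, unsigned int active_users)
{
	unsigned int depth;

	if (active_users == 0)
		return total_tags;
	/* Split evenly, rounding up, but never below 4 tags. */
	depth = (total_tags + active_users - 1) / active_users;
	return depth < 4 ? 4 : depth;
}
```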
Reviewed-by: Ming Lei <[email protected]>
Signed-off-by: Jianchao Wang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In commit ed996a52c868 ("block: simplify and cleanup bvec pool
handling"), the value of the slab index is incremented by one in
bvec_alloc() after the allocation is done, to indicate that an index
value of 0 does not need to be freed later.
bvec_nr_vecs() was not updated accordingly, and thus returns the wrong
value. Decrement idx before performing the lookup.
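The off-by-one can be illustrated with a hypothetical lookup table. The pool sizes (1, 4, 16, 64, 128, 256) are an assumption about the bvec slab layout; the point is only that an index stored as "table index + 1" has to be decremented before it is used for the lookup.

```c
/* Hypothetical biovec pool sizes, indexed from 0. */
static const unsigned int bvec_pool_sizes[] = { 1, 4, 16, 64, 128, 256 };

/* stored_idx is the table index plus one (0 means "nothing to free"),
 * so undo the +1 before the lookup. Callers must pass stored_idx >= 1. */
static unsigned int nr_vecs_for_idx(unsigned int stored_idx)
{
	return bvec_pool_sizes[stored_idx - 1];
}
```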
Fixes: ed996a52c868 ("block: simplify and cleanup bvec pool handling")
Signed-off-by: Greg Edwards <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Pull NVMe updates from Christoph:
"This should be the last round of NVMe updates before the 4.19 merge
window opens. It contains support for write protected (aka read-only)
namespaces from Chaitanya, two ANA fixes from Hannes and a fabrics
fix from Tal Shorer."
* 'nvme-4.19' of git://git.infradead.org/nvme:
nvme-fabrics: fix ctrl_loss_tmo < 0 to reconnect forever
nvmet: add ns write protect support
nvme: set gendisk read only based on nsattr
nvme.h: add support for ns write protect definitions
nvme.h: fixup ANA group descriptor format
nvme: fixup crash on failed discovery
|
|
Remove the trailing backslash in macro BTREE_FLAG in btree.h
Signed-off-by: Shenghui Wang <[email protected]>
Signed-off-by: Coly Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The pr_err statement in the code for the sysfs_attach section would run
for various error codes, which may be confusing.
E.g.,
Run the command twice:
echo 796b5c05-b03c-4bc7-9cbd-a8df5e8be891 > \
/sys/block/bcache0/bcache/attach
[the backing dev got attached on the first run]
echo 796b5c05-b03c-4bc7-9cbd-a8df5e8be891 > \
/sys/block/bcache0/bcache/attach
In dmesg, after running the command twice, we get:
bcache: bch_cached_dev_attach() Can't attach sda6: already attached
bcache: __cached_dev_store() Can't attach 796b5c05-b03c-4bc7-9cbd-\
a8df5e8be891
: cache set not found
The first statement in the message was right, but the second was
confusing.
bch_cached_dev_attach() has its own pr_ statements for the various error
codes, except ENOENT.
After the change, rerun the above command twice:
echo 796b5c05-b03c-4bc7-9cbd-a8df5e8be891 > \
/sys/block/bcache0/bcache/attach
echo 796b5c05-b03c-4bc7-9cbd-a8df5e8be891 > \
/sys/block/bcache0/bcache/attach
In dmesg we only got:
bcache: bch_cached_dev_attach() Can't attach sda6: already attached
No confusing "cache set not found" message anymore.
And for a nonexistent SET-UUID:
echo 796b5c05-b03c-4bc7-9cbd-a8df5e8be898 > \
/sys/block/bcache0/bcache/attach
In dmesg we can get:
bcache: __cached_dev_store() Can't attach 796b5c05-b03c-4bc7-9cbd-\
a8df5e8be898
: cache set not found
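The message-ownership rule described above can be sketched with two hypothetical helpers (their names and the -EINVAL return for "already attached" are assumptions): the attach helper reports its own errors, so the sysfs store path only adds "cache set not found" for -ENOENT.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical attach helper: prints its own message for every error
 * except -ENOENT, which it leaves to the caller. */
static int attach_model(int already_attached, int set_exists)
{
	if (already_attached) {
		fprintf(stderr, "Can't attach: already attached\n");
		return -EINVAL;
	}
	if (!set_exists)
		return -ENOENT;
	return 0;
}

/* Hypothetical store path: only -ENOENT gets an extra message here. */
static const char *store_extra_message(int ret)
{
	return ret == -ENOENT ? "cache set not found" : NULL;
}
```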
Signed-off-by: Shenghui Wang <[email protected]>
Signed-off-by: Coly Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Commit b1092c9af9ed ("bcache: allow quick writeback when backing idle")
allows the writeback rate to be faster if there is no I/O request on a
bcache device. It works well if there is only one bcache device attached
to the cache set. If there are many bcache devices attached to a cache
set, it may introduce a performance regression, because the multiple faster
writeback threads of the idle bcache devices will compete for the btree
level locks with the bcache devices that have I/O requests coming in.
This patch fixes the above issue by only permitting fast writeback when
all bcache devices attached to the cache set are idle. And if one of the
bcache devices has a new I/O request coming in, it minimizes all writeback
throughput immediately and lets the PI controller __update_writeback_rate()
decide the upcoming writeback rate for each bcache device.
Also, when all bcache devices are idle, limiting the writeback rate to a
small number is a waste of throughput, especially when backing devices are
slower non-rotational devices (e.g. SATA SSD). This patch sets a max
writeback rate for each backing device if the whole cache set is idle. A
faster writeback rate in idle time means new I/Os may have more available
space for dirty data, and people may observe better write performance then.
Please note bcache may change its cache mode at run time, and this patch
still works if the cache mode is switched from writeback mode and there
is still dirty data on cache.
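The policy can be sketched with a hypothetical model of the devices in one cache set. Rates are in arbitrary units, and setting every device to a fixed minimum when any device is busy is a simplification of "minimize all writeback throughput and let the PI controller decide".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-backing-device state. */
struct bdev_model {
	bool has_recent_io;
	unsigned long rate; /* writeback rate, arbitrary units */
};

static void update_rates(struct bdev_model *devs, size_t n,
			 unsigned long max_rate, unsigned long min_rate)
{
	bool all_idle = true;

	for (size_t i = 0; i < n; i++)
		if (devs[i].has_recent_io)
			all_idle = false;

	/* All idle: every device may write back at the max rate.
	 * Otherwise: drop to the minimum and leave tuning to the controller. */
	for (size_t i = 0; i < n; i++)
		devs[i].rate = all_idle ? max_rate : min_rate;
}
```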
Fixes: b1092c9af9ed ("bcache: allow quick writeback when backing idle")
Cc: [email protected] #4.16+
Signed-off-by: Coly Li <[email protected]>
Tested-by: Kai Krakow <[email protected]>
Tested-by: Stefan Priebe <[email protected]>
Cc: Michael Lyle <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This patch tries to add code comments in bset.c, to make some tricky
code and design more comprehensible. Most information in this patch
comes from the discussion between Kent and me; he offered very
informative details. If there is any mistake in the ideas behind the
code, it is no doubt due to my misrepresentation.
Signed-off-by: Coly Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This patch updates the code comment in bch_keylist_realloc() by fixing
incorrect function names, to make the code more comprehensible.
Signed-off-by: Coly Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This patch updates the code comment in struct cache with correct array
names, to make the code more comprehensible.
Signed-off-by: Coly Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This patch adds a line of code comment in super.c:register_bdev(), to
make the code more comprehensible.
Signed-off-by: Coly Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In bch_btree_node_get() the read-in btree node will be partially
prefetched into L1 cache for the following bset iteration (if any).
But if the btree node read has failed, the prefetch operations will
waste L1 cache space. This patch checks whether the read operation
succeeded and only does the cache prefetch when the read I/O succeeded.
Signed-off-by: Coly Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
When writeback is not running, the writeback rate should be 0; any other
value is misleading. And the following dynamic writeback rate debug
parameters should be 0 too,
rate, proportional, integral, change
otherwise they are misleading when writeback is not running.
Signed-off-by: Coly Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Greg KH suggests that normal code should not care about debugfs. Therefore,
whether debugfs_create_dir() succeeds or fails, it is unnecessary to check
its return value.
Two functions call debugfs_create_dir() and check its return value:
bch_debug_init() and closure_debug_init(). This patch changes these two
functions from int to void type, and ignores the return values of
debugfs_create_dir().
This patch does not fix a specific bug; it just makes things work as they
should.
Signed-off-by: Coly Li <[email protected]>
Suggested-by: Greg Kroah-Hartman <[email protected]>
Cc: [email protected]
Cc: Kai Krakow <[email protected]>
Cc: Kent Overstreet <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Addresses-Coverity-ID: 1056531 ("Missing break in switch")
Signed-off-by: Gustavo A. R. Silva <[email protected]>
Acked-by: Lars-Peter Clausen <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
|
|
The host-progs has been kept as an alias of hostprogs-y for a long time
(at least since the beginning of Git era), with the clear prompt:
Usage of host-progs is deprecated. Please replace with hostprogs-y!
Enough time for the migration has passed.
Signed-off-by: Masahiro Yamada <[email protected]>
Acked-by: Max Filippov <[email protected]>
|
|
Avoids warning messages with the latest release of toybox, which never
bothered to implement the --longopts nothing was using.
Signed-off-by: Rob Landley <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
|
|
Kernel headers must be installed into $(objtree)/usr/include to avoid
the build failure of samples.
Commit ddea05fa148b ("kbuild: make samples depend on headers_install")
addressed this, but "samples/" is only used for the single target build.
"make samples/" properly installs kernel headers, but it does not work
for general building because a phony target "sample" (no trailing slash)
is used.
Reported-by: David Howells <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
Tested-by: David Howells <[email protected]>
|
|
On systems with ACPI instantiated i2c-clients, normally there is 1 fwnode
per i2c-device and that fwnode contains 1 I2cSerialBus resource for that 1
i2c-device.
But in some rare cases the manufacturer has decided to describe multiple
i2c-devices in a single ACPI fwnode with multiple I2cSerialBus resources.
An earlier attempt to fix this in the i2c-core resulted in a lot of extra
code to support this corner-case.
This commit introduces a new i2c-multi-instantiate driver which fixes this
in a different way. This new driver can be built as a module which will
only be loaded on affected systems.
This driver will instantiate a new i2c-client per I2cSerialBus resource,
using the driver_data from the acpi_device_id it is binding to to tell it
which chip-type (and optional irq-resource) to use when instantiating.
Note this driver depends on a platform device being instantiated for the
ACPI fwnode, see the i2c_multi_instantiate_ids list of ACPI device-ids in
drivers/acpi/scan.c: acpi_device_enumeration_by_parent().
Acked-by: Andy Shevchenko <[email protected]>
Acked-by: Wolfram Sang <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
During offline processing two worker threads are canceled without
freeing the device reference which leads to a hanging offline process.
Reviewed-by: Jan Hoeppner <[email protected]>
Signed-off-by: Stefan Haberland <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Fix a panic that occurs for a device that got an error in
dasd_eckd_check_characteristics() during online processing.
For example the read configuration data command may have failed.
If this error occurs, the device is not set online and the earlier
invoked steps during online processing are rolled back. Therefore
dasd_eckd_uncheck_device() is called which needs a valid private
structure. But this pointer is not valid if
dasd_eckd_check_characteristics() has failed.
Check for a valid device->private pointer to prevent a panic.
Reviewed-by: Jan Hoeppner <[email protected]>
Signed-off-by: Stefan Haberland <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
regmap: Support non-incrementing registers
Some devices have individual registers that don't autoincrement the
register address during bulk reads but instead repeatedly read the same
value, for example for monitoring GPIOs or ADCs. Add support for these.
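The difference between the two bulk-read styles can be sketched with a toy register file. The names, the data register address 0x10 and the FIFO depth are hypothetical, and this is not the regmap API: an incrementing bulk read walks reg, reg + 1, ..., while a non-incrementing read hits the same data register each time.

```c
#include <stddef.h>

/* Toy device: register 0x10 returns successive samples on each read. */
struct dev_model {
	unsigned int regs[32];
	unsigned int fifo[4];
	size_t fifo_pos;
};

static unsigned int dev_read(struct dev_model *d, unsigned int reg)
{
	if (reg == 0x10)
		return d->fifo[d->fifo_pos++ % 4];
	return d->regs[reg];
}

/* Incrementing bulk read: reg, reg + 1, ... */
static void bulk_read_inc(struct dev_model *d, unsigned int reg,
			  unsigned int *val, size_t n)
{
	for (size_t i = 0; i < n; i++)
		val[i] = dev_read(d, reg + (unsigned int)i);
}

/* Non-incrementing bulk read: the same register n times. */
static void bulk_read_noinc(struct dev_model *d, unsigned int reg,
			    unsigned int *val, size_t n)
{
	for (size_t i = 0; i < n; i++)
		val[i] = dev_read(d, reg);
}
```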
|