2018-05-30  btrfs: Factor out read portion of btrfs_get_blocks_direct  (Nikolay Borisov; 1 file changed, -13/+43)

Currently this function handles both the READ and WRITE dio cases. This is facilitated by a bunch of 'if' statements, a goto short-circuit statement and a very perverse aliasing of the "!created" (READ) case by setting lockstart = lockend and checking for lockstart < lockend to detect the write. Let's simplify this mess by extracting the READ-only code into a separate __btrfs_get_block_direct_read function. This is only the first step; the next one will be to factor out the write side as well. The end goal is to have the common locking/unlocking code in btrfs_get_blocks_direct, which will then call either the read or the write subvariant. No functional changes.

Signed-off-by: Nikolay Borisov <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-30  blk-throttle: fix potential NULL pointer dereference in throtl_select_dispatch  (Liu Bo; 1 file changed, -1/+2)

In throtl_select_dispatch, tg is used first and only checked afterwards. Since tg may be NULL, this is a potential NULL pointer dereference risk. So fix it.

Signed-off-by: Joseph Qi <[email protected]>
Signed-off-by: Liu Bo <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
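A minimal sketch of the fix pattern described above (illustrative only; the surrounding loop in throtl_select_dispatch is simplified here):

    /* sketch: check the group returned by the rb-tree lookup before using it */
    while (1) {
            struct throtl_grp *tg = throtl_rb_first(parent_sq);

            if (!tg)                /* the tree may be empty: bail out early */
                    break;

            /* ... only dereference tg after the NULL check ... */
    }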
2018-05-30  block: kyber: make kyber more friendly with merging  (Jianchao Wang; 1 file changed, -32/+158)

Currently, kyber is very unfriendly with merging. kyber depends on the ctx rq_list to do merging; however, most of the time it will not leave any requests in the ctx rq_list. This is because even if the tokens of one domain are used up, kyber will try to dispatch requests from other domains and flush the rq_list there.

To improve this, we set up kyber_ctx_queue (kcq), which is similar to ctx but has rq_lists for the different domains, and build the same mapping between kcq and khd as between ctx and hctx. Then we can merge, insert and dispatch for different domains separately. At the same time, the rq_list of a kcq is only flushed when a domain token is obtained successfully. Then if one domain's tokens are used up, the requests can be left in the rq_list of that domain and may be merged with following IO.

Following is my test result on a machine with 8 cores and an NVMe card INTEL SSDPEKKR128G7:

    fio size=256m ioengine=libaio iodepth=64 direct=1 numjobs=8, seq/random

    +------+----------+-----------+------------+-----------------+---------+
    |patch?| bw(MB/s) |   iops    | slat(usec) |   clat(usec)    |  merge  |
    +------+----------+-----------+------------+-----------------+---------+
    | w/o  | 606/612  | 151k/153k | 6.89/7.03  | 3349.21/3305.40 |   0/0   |
    +------+----------+-----------+------------+-----------------+---------+
    | w/   | 1083/616 | 277k/154k | 4.93/6.95  | 1830.62/3279.95 | 223k/3k |
    +------+----------+-----------+------------+-----------------+---------+

When numjobs is set to 16, the bw and iops reach 1662MB/s and 425k on my platform.

Signed-off-by: Jianchao Wang <[email protected]>
Tested-by: Holger Hoffstätte <[email protected]>
Reviewed-by: Omar Sandoval <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
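A minimal sketch of the per-ctx structure the commit describes (illustrative; the field names are assumptions, not necessarily the ones used in the patch):

    /* one queue per software ctx, with a request list per scheduling domain */
    struct kyber_ctx_queue {
            spinlock_t lock;                                /* protects the lists */
            struct list_head rq_list[KYBER_NUM_DOMAINS];    /* per-domain pending rqs */
    };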
2018-05-30  blk-mq: abstract out blk-mq-sched rq list iteration bio merge helper  (Jens Axboe; 2 files changed, -11/+26)

No functional changes in this patch, just a prep patch for utilizing this in an IO scheduler.

Signed-off-by: Jens Axboe <[email protected]>
Reviewed-by: Omar Sandoval <[email protected]>
2018-05-30  btrfs: return ENOMEM if path allocation fails in btrfs_cross_ref_exist  (Su Yue; 1 file changed, -1/+1)

The error code does not match the reason of failure and may confuse the callers.

Signed-off-by: Su Yue <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-30  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux  (Linus Torvalds; 2 files changed, -3/+6)

Pull s390 fixes from Martin Schwidefsky:

 - a missing -msoft-float for the compile of the kexec purgatory

 - a fix for the dasd driver to avoid the double use of a field in the 'struct request'

[ That latter one is being discussed, and Christoph asked for something cleaner, but for now it's a fix ]

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/dasd: use blk_mq_rq_from_pdu for per request data
  s390/purgatory: Fix endless interrupt loop
2018-05-30  btrfs: raid56: Remove VLA usage  (Kees Cook; 1 file changed, -10/+28)

In the quest to remove all stack VLA usage from the kernel[1], this allocates the working buffers during regular init, instead of using stack space. This refactors the allocation code a bit to make it easier to review.

[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com

Signed-off-by: Kees Cook <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-30  btrfs: return error value if create_io_em failed in cow_file_range  (Su Yue; 1 file changed, -1/+3)

In cow_file_range(), create_io_em() may fail, but its return value is not recorded. The function may then return 0 even though it failed, which is wrong behavior. Let cow_file_range() return PTR_ERR(em) if create_io_em() failed.

Fixes: 6f9994dbabe5 ("Btrfs: create a helper to create em for IO")
CC: [email protected] # 4.11+
Signed-off-by: Su Yue <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: David Sterba <[email protected]>
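A minimal sketch of the error propagation described above (illustrative; create_io_em() takes more arguments in the real code, and the cleanup label is a placeholder):

    /* sketch: after em = create_io_em(...) (arguments omitted in this sketch),
     * propagate the failure instead of ignoring it */
    if (IS_ERR(em)) {
            ret = PTR_ERR(em);      /* previously the error was silently dropped */
            goto out_unlock;        /* hypothetical cleanup label */
    }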
2018-05-30  btrfs: drop useless member qgroup_reserved of btrfs_pending_snapshot  (Gu JinXiang; 1 file changed, -1/+0)

Since there is no more use of the qgroup_reserved member in struct btrfs_pending_snapshot, remove it.

Signed-off-by: Gu JinXiang <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-30  btrfs: drop unused parameter qgroup_reserved  (Gu JinXiang; 4 files changed, -14/+5)

Since commit 7775c8184ec0 ("btrfs: remove unused parameter from btrfs_subvolume_release_metadata"), the qgroup_reserved parameter is not used by the callers of btrfs_subvolume_reserve_metadata. So remove it.

Signed-off-by: Gu JinXiang <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-30  btrfs: balance dirty metadata pages in btrfs_finish_ordered_io  (Ethan Lien; 1 file changed, -0/+3)

[Problem description and how we fix it]

We should balance dirty metadata pages at the end of btrfs_finish_ordered_io, since a small, unmergeable random write can potentially produce dirty metadata which is multiple times larger than the data itself. For example, a small, unmergeable 4KiB write may produce:

    16KiB dirty leaf (and possibly 16KiB dirty node) in subvolume tree
    16KiB dirty leaf (and possibly 16KiB dirty node) in checksum tree
    16KiB dirty leaf (and possibly 16KiB dirty node) in extent tree

Although we do balance dirty pages on the write side, in the buffered write path most metadata is dirtied only after we reach the dirty background limit (which by far only counts dirty data pages) and wake up the flusher thread. If there are many small, unmergeable random writes spread over a large btree, we'll see a burst of dirty pages exceeding the dirty_bytes limit after the flusher thread is woken up, which is not what we expect. On our machine it caused an out-of-memory problem, since a page cannot be dropped while it is marked dirty.

One may worry that we could sleep in btrfs_btree_balance_dirty_nodelay, but since we do btrfs_finish_ordered_io in a separate worker, it will not stop the flusher from consuming dirty pages. Also, we use a different worker for metadata writeback endio, so sleeping in btrfs_finish_ordered_io helps us throttle the size of dirty metadata pages.

[Reproduce steps]

To reproduce the problem, we need to do 4KiB writes randomly spread over a large btree. On our 2GiB RAM machine:

1) Create 4 subvolumes.
2) Run fio on each subvolume:

    [global]
    direct=0
    rw=randwrite
    ioengine=libaio
    bs=4k
    iodepth=16
    numjobs=1
    group_reporting
    size=128G
    runtime=1800
    norandommap
    time_based
    randrepeat=0

3) Take a snapshot on each subvolume and repeat fio on the existing files.
4) Repeat step (3) until we get large btrees. In our case, by observing btrfs_root_item->bytes_used, we have 2GiB of metadata in each subvolume tree and 12GiB of metadata in the extent tree.
5) Stop all fio, take a snapshot again, and wait until all delayed work is completed.
6) Start all fio. A few seconds later we hit OOM when the flusher starts to work.

It can be reproduced even when using nocow writes.

Signed-off-by: Ethan Lien <[email protected]>
Reviewed-by: David Sterba <[email protected]>
[ add comment ]
Signed-off-by: David Sterba <[email protected]>
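A minimal sketch of where the throttling call lands (illustrative; fs_info and ret stand for the local variables of btrfs_finish_ordered_io, and the real patch also adds an explanatory comment):

    /* sketch: throttle dirty metadata at the end of btrfs_finish_ordered_io().
     * Running in a worker thread, this may sleep without blocking the flusher. */
    btrfs_btree_balance_dirty_nodelay(fs_info);
    return ret;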
2018-05-30  btrfs: lift some btrfs_cross_ref_exist checks in nocow path  (Ethan Lien; 1 file changed, -0/+15)

In the nocow path, we check if the extent is snapshotted in btrfs_cross_ref_exist(). We can do a similar check earlier and avoid an unnecessary search into the extent tree.

A fio test on an Intel D-1531, 16GB RAM, SSD RAID-5 machine as follows:

    [global]
    group_reporting
    time_based
    thread=1
    ioengine=libaio
    bs=4k
    iodepth=32
    size=64G
    runtime=180
    numjobs=8
    rw=randwrite

    [file1]
    filename=/mnt/nocow/testfile

IOPS result:

                    unpatched    patched
    1 fio round:        46670      46958
    snapshot
    2 fio round:        51826      54498
    3 fio round:        59767      61289

After the snapshot, the first fio round gets about a 5% performance gain. As we continually write to the same file, all writes will resume to nocow mode and eventually we have no performance gain.

Signed-off-by: Ethan Lien <[email protected]>
Reviewed-by: David Sterba <[email protected]>
[ update comments ]
Signed-off-by: David Sterba <[email protected]>
2018-05-30  btrfs: Remove fs_info argument from btrfs_uuid_tree_rem  (Lu Fengqi; 4 files changed, -9/+7)

This function always takes a transaction handle which contains a reference to the fs_info. Use that and remove the extra argument.

Signed-off-by: Lu Fengqi <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
[ rename the function ]
Signed-off-by: David Sterba <[email protected]>
2018-05-30  btrfs: Remove fs_info argument from btrfs_uuid_tree_add  (Lu Fengqi; 5 files changed, -13/+10)

This function always takes a transaction handle which contains a reference to the fs_info. Use that and remove the extra argument.

Signed-off-by: Lu Fengqi <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-30  Btrfs: remove unused check of skip_locking  (Liu Bo; 1 file changed, -2/+5)

The check is superfluous since all callers who set search_for_commit also have skip_locking set. ASSERT() is put in place to ensure skip_locking is set by new callers.

Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Liu Bo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
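A minimal sketch of the assertion described above (illustrative; btrfs uses its own ASSERT() macro, and the exact condition in btrfs_search_slot is simplified here):

    /* sketch: callers searching commit roots are expected to skip locking,
     * so enforce that instead of handling the locked case. */
    if (p->search_commit_root) {
            ASSERT(p->skip_locking);
            /* ... proceed without taking tree locks ... */
    }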
2018-05-30  Btrfs: remove always true check in unlock_up  (Liu Bo; 1 file changed, -1/+1)

As unlock_up() is written as

    for () {
            if (!path->locks[i])
                    break;
            ...
            if (... && path->locks[i]) {
            }
    }

apparently, @path->locks[i] is always true at this 'if'.

Signed-off-by: Liu Bo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-30  Btrfs: grab write lock directly if write_lock_level is the max level  (Liu Bo; 1 file changed, -11/+16)

Typically, when acquiring the root node's lock, btrfs tries its best to get a read lock and trade it for a write lock if @write_lock_level implies to do so. In the case of (cow && (p->keep_locks || p->lowest_level)), write_lock_level is set to BTRFS_MAX_LEVEL, which means we need to acquire the root node's write lock directly. In this particular case, the dance of acquiring a read lock and then trading it for a write lock can be saved.

Signed-off-by: Liu Bo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
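A minimal sketch of the shortcut (illustrative; the condition and the surrounding code in btrfs_search_slot are simplified, and b stands for the root extent buffer):

    /* sketch: if every level needs a write lock anyway, take the root's
     * write lock directly instead of read-locking and trading up */
    if (write_lock_level >= BTRFS_MAX_LEVEL)
            b = btrfs_lock_root_node(root);         /* blocking write lock */
    else
            b = btrfs_read_lock_root_node(root);    /* usual read-lock path */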
2018-05-30  Btrfs: move get root out of btrfs_search_slot to a helper  (Liu Bo; 1 file changed, -45/+65)

It's good to have a helper instead of having all get-root details open-coded. The new helper locks (if necessary) and sets the root node of the path. Also invert the checks to make the code flow easier to read. There is no functional change in this commit.

Signed-off-by: Liu Bo <[email protected]>
Reviewed-by: Qu Wenruo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-30  Btrfs: use more straightforward extent_buffer_uptodate check  (Liu Bo; 1 file changed, -1/+1)

If parent_transid "0" is passed to btrfs_buffer_uptodate(), btrfs_buffer_uptodate() is equivalent to extent_buffer_uptodate(), but extent_buffer_uptodate() is preferred since we don't have to look into verify_parent_transid().

Signed-off-by: Liu Bo <[email protected]>
Reviewed-by: Qu Wenruo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
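A minimal sketch of the substitution (illustrative; eb stands for the extent buffer being checked and do_something() is a hypothetical use site):

    /* Before: parent_transid == 0 skips the transid verification anyway */
    if (btrfs_buffer_uptodate(eb, 0, 0) > 0)
            do_something();

    /* After: say directly what is meant */
    if (extent_buffer_uptodate(eb))
            do_something();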
2018-05-30  Btrfs: remove superfluous free_extent_buffer in read_block_for_search  (Liu Bo; 1 file changed, -1/+0)

read_block_for_search() can be simplified as:

    tmp = find_extent_buffer();
    if (tmp)
            return;
    ...
    free_extent_buffer();
    read_tree_block();

Apparently, @tmp must be NULL at this point, so free_extent_buffer() is not needed.

Signed-off-by: Liu Bo <[email protected]>
Reviewed-by: Qu Wenruo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-30  btrfs: drop unused space_info parameter from create_space_info  (Lu Fengqi; 1 file changed, -8/+5)

Since commit dc2d3005d27d ("btrfs: remove dead create_space_info calls"), there is only one caller, btrfs_init_space_info. However, it doesn't need create_space_info to return space_info at all.

Signed-off-by: Lu Fengqi <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-30  Btrfs: add parent_transid parameter to verify_level_key  (Liu Bo; 1 file changed, -6/+7)

As verify_level_key() is checked after verify_parent_transid(), i.e.

    if (verify_parent_transid())
            ret = -EIO;
    else if (verify_level_key())
            ret = -EUCLEAN;

if parent_transid is 0, verify_parent_transid() skips verifying parent_transid and considers the eb as valid, and if verify_level_key() reports something wrong, we're not going to know whether it's caused by corrupted metadata or a non-checked eb (e.g. a stale eb). The stale eb can come from an outdated raid1 mirror after a degraded mount; see "btrfs: fix reading stale metadata blocks after degraded raid1 mounts" (02a3307aa9c20b4f66262) for more details.

@parent_transid is able to tell whether the eb's generation has been verified by the caller.

Signed-off-by: Liu Bo <[email protected]>
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-30  btrfs: qgroup: show more meaningful qgroup_rescan_init error message  (Qu Wenruo; 1 file changed, -15/+18)

The error message from qgroup_rescan_init() mostly looks like:

    BTRFS info (device nvme0n1p1): qgroup_rescan_init failed with -115

which is far from meaningful and sometimes confusing: for the above -EINPROGRESS it's mostly (despite the init race) harmless, but sometimes it can also indicate a problem if the return value is -EINVAL.

Change it to some more meaningful messages like:

    BTRFS info (device nvme0n1p1): qgroup rescan is already in progress

and

    BTRFS err(device nvme0n1p1): qgroup rescan init failed, qgroup is not enabled

Signed-off-by: Qu Wenruo <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
[ update the messages and level ]
Signed-off-by: David Sterba <[email protected]>
2018-05-30  Btrfs: fix memory and mount leak in btrfs_ioctl_rm_dev_v2()  (Omar Sandoval; 1 file changed, -2/+4)

If we have invalid flags set, when we error out we must drop our writer counter and free the buffer we allocated for the arguments. This bug is trivially reproduced with the following program on 4.7+:

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <linux/btrfs.h>
    #include <linux/btrfs_tree.h>

    int main(int argc, char **argv)
    {
            struct btrfs_ioctl_vol_args_v2 vol_args = {
                    .flags = UINT64_MAX,
            };
            int ret;
            int fd;

            if (argc != 2) {
                    fprintf(stderr, "usage: %s PATH\n", argv[0]);
                    return EXIT_FAILURE;
            }
            fd = open(argv[1], O_WRONLY);
            if (fd == -1) {
                    perror("open");
                    return EXIT_FAILURE;
            }
            ret = ioctl(fd, BTRFS_IOC_RM_DEV_V2, &vol_args);
            if (ret == -1)
                    perror("ioctl");
            close(fd);
            return EXIT_SUCCESS;
    }

When unmounting the filesystem, we'll hit the WARN_ON(mnt_get_writers(mnt)) in cleanup_mnt(), and this may also prevent the filesystem from being remounted read-only, as the writer count will stay lifted.

Fixes: 6b526ed70cf1 ("btrfs: introduce device delete by devid")
CC: [email protected] # 4.9+
Signed-off-by: Omar Sandoval <[email protected]>
Reviewed-by: Su Yue <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
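A minimal sketch of the corrected error path (illustrative only; SUPPORTED_FLAGS and the label name are placeholders, and the real ioctl handler has more setup around this):

    /* sketch: on unsupported flags, go through the cleanup labels instead of
     * returning directly, so both the writer count and the copied ioctl
     * arguments are released */
    if (vol_args->flags & ~SUPPORTED_FLAGS) {       /* SUPPORTED_FLAGS: placeholder mask */
            ret = -EOPNOTSUPP;
            goto err;
    }
    /* ... actual device removal would go here ... */
    err:
            kfree(vol_args);                /* buffer copied from userspace */
            mnt_drop_write_file(file);      /* pairs with mnt_want_write_file() */
            return ret;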
2018-05-30  btrfs: lzo: Harden inline lzo compressed extent decompression  (Qu Wenruo; 1 file changed, -1/+10)

For an inlined extent, we only have one segment, thus fewer things to check. Furthermore, an inlined extent always has its csum in the leaf header, so it's less probable to have corrupted data. Anyway, still check the header and the segment header.

Signed-off-by: Qu Wenruo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-30  btrfs: lzo: Add header length check to avoid potential out-of-bounds access  (Qu Wenruo; 1 file changed, -2/+26)

James Harvey reported that some corrupted compressed extent data can lead to various kernel memory corruption. Such corrupted extent data belongs to an inode with the NODATASUM flag, thus the data csum won't help us detect such a bug.

If lucky enough, KASAN could catch it like:

    BUG: KASAN: slab-out-of-bounds in lzo_decompress_bio+0x384/0x7a0 [btrfs]
    Write of size 4096 at addr ffff8800606cb0f8 by task kworker/u16:0/2338
    CPU: 3 PID: 2338 Comm: kworker/u16:0 Tainted: G O 4.17.0-rc5-custom+ #50
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
    Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
    Call Trace:
     dump_stack+0xc2/0x16b
     print_address_description+0x6a/0x270
     kasan_report+0x260/0x380
     memcpy+0x34/0x50
     lzo_decompress_bio+0x384/0x7a0 [btrfs]
     end_compressed_bio_read+0x99f/0x10b0 [btrfs]
     bio_endio+0x32e/0x640
     normal_work_helper+0x15a/0xea0 [btrfs]
     process_one_work+0x7e3/0x1470
     worker_thread+0x1b0/0x1170
     kthread+0x2db/0x390
     ret_from_fork+0x22/0x40
    ...

The offending compressed data has the following info:

    Header:           length 32768       (looks completely valid)
    Segment 0 Header: length 3472882419  (obviously out of bounds)

Then when handling segment 0, since it's over the current page, we need to copy the compressed data to a temporary buffer in the workspace, and such a large size triggers the out-of-bounds memory access, screwing up the whole kernel.

Fix it by adding extra checks on the header and segment headers to ensure we won't access out-of-bounds, and also check that the decompressed data won't be out-of-bounds.

Reported-by: James Harvey <[email protected]>
Signed-off-by: Qu Wenruo <[email protected]>
Reviewed-by: Misono Tomohiro <[email protected]>
Reviewed-by: David Sterba <[email protected]>
[ updated comments ]
Signed-off-by: David Sterba <[email protected]>
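A minimal sketch of the kind of bounds check being added (illustrative; the variable names are assumptions, not the actual ones in lzo.c, and the error code is a guess at btrfs convention for corruption):

    /* sketch: never trust on-disk lengths; clamp them against what is
     * actually available before any copy into the workspace buffer */
    if (seg_len > src_len - in_offset)      /* segment claims more than remains */
            return -EUCLEAN;
    if (out_len > destlen)                  /* decompressed size beyond caller's buffer */
            return -EUCLEAN;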
2018-05-30  perf test: "Session topology" dumps core on s390  (Thomas Richter; 1 file changed, -6/+24)

The "perf test Session topology" entry fails with a core dump on s390. The root cause is a NULL pointer dereference in function check_cpu_topology() line 76 (or line 82 without -v). The session->header.env.cpu variable is NULL because on s390 function process_cpu_topology() returns with error:

    socket_id number is too big. You may need to upgrade the perf tool.

and releases the env.cpu variable via zfree() and sets it to NULL.

Here is the gdb output:

    (gdb) n
    76              pr_debug("CPU %d, core %d, socket %d\n", i,
    (gdb) n
    Program received signal SIGSEGV, Segmentation fault.
    0x00000000010f4d9e in check_cpu_topology (path=0x3ffffffd6c8 "/tmp/perf-test-J6CHMa",
        map=0x14a1740) at tests/topology.c:76
    76              pr_debug("CPU %d, core %d, socket %d\n", i,
    (gdb)

Make sure the env.cpu variable is not used when it is NULL. Test for a NULL pointer and return TEST_SKIP if so.

Output before:

    [root@p23lp27 perf]# ./perf test -F 39
    39: Session topology          :Segmentation fault (core dumped)
    [root@p23lp27 perf]#

Output after:

    [root@p23lp27 perf]# ./perf test -vF 39
    39: Session topology          :
    --- start ---
    templ file: /tmp/perf-test-Ajx59D
    socket_id number is too big.You may need to upgrade the perf tool.
    ---- end ----
    Session topology: Skip
    [root@p23lp27 perf]#

Signed-off-by: Thomas Richter <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Hendrik Brueckner <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2018-05-30  perf parse-events: Handle uncore event aliases in small groups properly  (Kan Liang; 4 files changed, -9/+137)

Perf stat doesn't count the uncore event aliases from the same uncore block in a group, for example:

    perf stat -e '{unc_m_cas_count.all,unc_m_clockticks}' -a -I 1000

    #       time         counts unit events
     1.000447342  <not counted>      unc_m_cas_count.all
     1.000447342  <not counted>      unc_m_clockticks
     2.000740654  <not counted>      unc_m_cas_count.all
     2.000740654  <not counted>      unc_m_clockticks

The output is very misleading. It gives the wrong impression that the uncore events don't work.

An uncore block can be composed of several PMUs. An uncore event alias is a joint name which means the same event runs on all PMUs of a block. Perf doesn't support mixing events from different PMUs in the same group. It is wrong to put uncore event aliases in one big group.

The right way is to split the big group into multiple small groups which only include events from the same PMU. Only uncore event aliases from the same uncore block should be specially handled here. It doesn't make sense to mix uncore events with other uncore events from different blocks, or even with core events, in a group.

With the patch:

    #       time         counts unit events
     1.000447342  <not counted>      unc_m_cas_count.all
     1.001557653        140,833      unc_m_cas_count.all
     1.001557653  1,330,231,332      unc_m_clockticks
     2.002709483         85,007      unc_m_cas_count.all
     2.002709483  1,429,494,563      unc_m_clockticks

Reported-by: Andi Kleen <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Agustin Vega-Frias <[email protected]>
Cc: Ganapatrao Kulkarni <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Shaokun Zhang <[email protected]>
Cc: Will Deacon <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2018-05-30  nvme-loop: add support for multiple ports  (Christoph Hellwig; 2 files changed, -19/+31)

This is useful at least for multipath testing.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
2018-05-30  nvme-pci: simplify __nvme_submit_cmd  (Christoph Hellwig; 1 file changed, -23/+14)

With recent CQ handling improvements we can now move the locking into __nvme_submit_cmd. Also remove the local tail variable to make the code more obvious, remove the __ prefix in the name, and fix the comments describing the function.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Reviewed-by: Max Gurtovoy <[email protected]>
2018-05-30  nvme-pci: Rate limit the nvme timeout warnings  (Keith Busch; 1 file changed, -1/+1)

The block layer's timeout handling currently prevents drivers from completing commands outside the timeout callback once blk-mq decides they've expired. If a device breaks, this could potentially create many thousands of timed out commands. There's nothing of value to be gleaned from observing each of those messages, so this patch adds a rate limit on them.

Signed-off-by: Keith Busch <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
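A minimal sketch of the change described above (illustrative; the exact message text in nvme_timeout() may differ):

    /* sketch: a broken device can expire thousands of commands at once,
     * so emit the per-command warning through the rate-limited helper */
    dev_warn_ratelimited(dev->ctrl.device,
                         "I/O %d QID %d timeout, aborting\n",
                         req->tag, nvmeq->qid);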
2018-05-30  nvme: allow duplicate controller if prior controller being deleted  (James Smart; 1 file changed, -1/+3)

The current checks for whether a new controller request "matches" an existing controller ignore controller state and only check identity strings. There are cases where an existing controller may be in its last steps of deletion when it is "matched" by a new connection. Change the behavior so that the new connection ignores controllers that are being deleted.

Signed-off-by: James Smart <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
2018-05-30  Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu  (Ingo Molnar; 5 files changed, -1/+15)

Pull RCU fix from Paul E. McKenney:

"This additional v4.18 pull request contains a single commit that fell through the cracks:

Provide early rcu_cpu_starting() callback for the benefit of the x86/mtrr code, which needs RCU to be available on incoming CPUs earlier than has been the case in the past."

Signed-off-by: Ingo Molnar <[email protected]>
2018-05-29  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input  (Linus Torvalds; 2 files changed, -11/+17)

Pull input fixes from Dmitry Torokhov:

"We are switching a bunch of Lenovo devices with Synaptics touchpads from PS/2 emulation over to native RMI/SMbus. Given that all commits are marked for stable there is no point delaying them till next release"

[ Also fix a too-small stack array for i2c communication in elan driver ]

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: elan_i2c_smbus - fix corrupted stack
  Input: synaptics - add Lenovo 80 series ids to SMBus
  Input: synaptics - add Intertouch support on X1 Carbon 6th and X280
  Input: synaptics - Lenovo Thinkpad X1 Carbon G5 (2017) with Elantech trackpoints should use RMI
  Input: synaptics - Lenovo Carbon X1 Gen5 (2017) devices should use RMI
2018-05-29  aio: sanitize the limit checking in io_submit(2)  (Al Viro; 1 file changed, -8/+6)

As it is, the logic in native io_submit(2) is "if asked for more than LONG_MAX/sizeof(pointer) iocbs to submit, don't bother with more than LONG_MAX/sizeof(pointer)" (i.e. 512M requests on 32bit and 1E requests on 64bit), while compat io_submit(2) goes with "stop after the first PAGE_SIZE/sizeof(pointer) iocbs", i.e. 1K or so. Which is

 * inconsistent
 * *way* too much in the native case
 * possibly too little in the compat one

and

 * wrong anyway, since the natural point where we ought to stop bothering is ctx->nr_events

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Al Viro <[email protected]>
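A minimal sketch of the clamp the commit argues for (illustrative; the real io_submit code also handles the compat path and other checks):

    /* sketch: there is no point queueing more iocbs than the ring can hold */
    if (nr > ctx->nr_events)
            nr = ctx->nr_events;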
2018-05-29  aio: fold do_io_submit() into callers  (Al Viro; 1 file changed, -54/+45)

get rid of insane "copy array of 32bit pointers into an array of native ones" glue.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Al Viro <[email protected]>
2018-05-29  aio: shift copyin of iocb into io_submit_one()  (Al Viro; 1 file changed, -24/+22)

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Al Viro <[email protected]>
2018-05-29  aio_read_events_ring(): make a bit more readable  (Al Viro; 1 file changed, -4/+3)

The logic for 'avail' is

 * not past the tail of the cyclic buffer
 * no more than asked
 * not past the end of the buffer
 * not past the end of a page

Unobfuscate the last part.

Signed-off-by: Al Viro <[email protected]>
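A minimal sketch of that computation (illustrative; variable names follow fs/aio.c conventions but the surrounding loop is simplified):

    /* sketch: take the smallest of all the bounds on how many events to copy */
    avail = (head <= tail ? tail : ctx->nr_events) - head;     /* up to ring tail / end of buffer */
    avail = min(avail, nr - ret);                               /* no more than asked */
    avail = min_t(long, avail, AIO_EVENTS_PER_PAGE - pos);      /* stay within one page */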
2018-05-29  aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way  (Al Viro; 1 file changed, -14/+12)

... so just make them return 0 when caller does not need to destroy iocb

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Al Viro <[email protected]>
2018-05-29  aio: take list removal to (some) callers of aio_complete()  (Al Viro; 1 file changed, -17/+21)

We really want iocb out of io_cancel(2) reach before we start tearing it down.

Signed-off-by: Al Viro <[email protected]>
2018-05-30  Merge branch 'drm-fixes-4.17' of git://people.freedesktop.org/~agd5f/linux into drm-fixes  (Dave Airlie; 1 file changed, -6/+7)

One last fix for 4.17. Fix a suspend regression in DC.

* 'drm-fixes-4.17' of git://people.freedesktop.org/~agd5f/linux:
  drm/amd/display: Fix BUG_ON during CRTC atomic check update
2018-05-30  Merge tag 'drm-misc-fixes-2018-05-29' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes  (Dave Airlie; 2 files changed, -1/+5)

core: Add 220us psr setup time (Dhinakaran)
omap: Fix NULL deref (Tomi)

Cc: Dhinakaran Pandiyan <[email protected]>
Cc: Tomi Valkeinen <[email protected]>

* tag 'drm-misc-fixes-2018-05-29' of git://anongit.freedesktop.org/drm/drm-misc:
  drm/omap: fix NULL deref crash with SDI displays
  drm/psr: Fix missed entry in PSR setup time table.
2018-05-29  selinux: KASAN: slab-out-of-bounds in xattr_getsecurity  (Sachin Grover; 1 file changed, -1/+1)

Call trace:

    [<ffffff9203a8d7a8>] dump_backtrace+0x0/0x428
    [<ffffff9203a8dbf8>] show_stack+0x28/0x38
    [<ffffff920409bfb8>] dump_stack+0xd4/0x124
    [<ffffff9203d187e8>] print_address_description+0x68/0x258
    [<ffffff9203d18c00>] kasan_report.part.2+0x228/0x2f0
    [<ffffff9203d1927c>] kasan_report+0x5c/0x70
    [<ffffff9203d1776c>] check_memory_region+0x12c/0x1c0
    [<ffffff9203d17cdc>] memcpy+0x34/0x68
    [<ffffff9203d75348>] xattr_getsecurity+0xe0/0x160
    [<ffffff9203d75490>] vfs_getxattr+0xc8/0x120
    [<ffffff9203d75d68>] getxattr+0x100/0x2c8
    [<ffffff9203d76fb4>] SyS_fgetxattr+0x64/0xa0
    [<ffffff9203a83f70>] el0_svc_naked+0x24/0x28

If a user gets root access and calls security.selinux setxattr() with an embedded NUL on a file, and then some process performs a getxattr() on that file with a length greater than the actual length of the string, it results in a panic.

To fix this, add the actual length of the string to the security context instead of the length passed by the userspace process.

Signed-off-by: Sachin Grover <[email protected]>
Cc: [email protected]
Signed-off-by: Paul Moore <[email protected]>
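A minimal sketch of the idea (illustrative only; store_security_context() is a placeholder, not the actual SELinux hook, and value stands for the xattr payload from userspace):

    /* sketch: size the stored security context from the real string length,
     * not from the (possibly larger) length passed in by userspace */
    size_t len = strlen(value) + 1;                     /* actual length, incl. NUL */
    rc = store_security_context(inode, value, len);     /* placeholder for the hook's helper */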
2018-05-30  Merge tag 'drm-intel-fixes-2018-05-29' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes  (Dave Airlie; 2 files changed, -15/+51)

- Fix for potential Spectre vector in the new query uAPI
- Fix NULL pointer deref (FDO #106559)
- DMI fix to hide LVDS for Radiant P845 (FDO #105468)

* tag 'drm-intel-fixes-2018-05-29' of git://anongit.freedesktop.org/drm/drm-intel:
  drm/i915/query: nospec expects no more than an unsigned long
  drm/i915/query: Protect tainted function pointer lookup
  drm/i915/lvds: Move acpi lid notification registration to registration phase
  drm/i915: Disable LVDS on Radiant P845
2018-05-29  platform/chrome: chromeos_laptop: fix touchpad button mapping on Celes  (Dmitry Torokhov; 1 file changed, -1/+9)

Celes has a newer touch controller (compared to the controllers used in older BayTrail-based devices) and so uses the same button mapping as Samus. This fixes the issue with the mouse button being stuck in the pressed state after the first click.

Reported-by: Sultan Alsawaf <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Benson Leung <[email protected]>
2018-05-29  hwmon: (gpio-fan) Fix "#cooling-cells" property name in bindings  (Viresh Kumar; 1 file changed, -1/+1)

It should be "#cooling-cells" instead of "cooling-cells". Fix it.

Signed-off-by: Viresh Kumar <[email protected]>
[groeck: Updated subject]
Signed-off-by: Guenter Roeck <[email protected]>
2018-05-29  Merge tag 'afs-fixes-20180529' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs  (Linus Torvalds; 2 files changed, -16/+13)

Pull AFS fixes from David Howells:

 - fix a BUG triggerable from faccessat()

 - fix the mounting of backup volumes

* tag 'afs-fixes-20180529' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Fix mounting of backup volumes
  afs: Fix directory permissions check
2018-05-29  drm/amd/display: Fix BUG_ON during CRTC atomic check update  (Leo (Sunpeng) Li; 1 file changed, -6/+7)

For cases where the CRTC is inactive (DPMS off), where a modeset is not required, yet the CRTC is still in the atomic state, we should not attempt to update anything on it.

Previously, we were relying on the modereset_required() helper to check the above condition. However, the function returns false immediately if a modeset is not required, ignoring the CRTC's enable/active state flags. The correct way to filter is by looking at these flags instead.

Fixes: e277adc5a06c "drm/amd/display: Hookup color management functions"
Bugzilla: https://bugs.freedesktop.org/106194
Signed-off-by: Leo (Sunpeng) Li <[email protected]>
Reviewed-by: Harry Wentland <[email protected]>
Tested-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2018-05-29  block: remove parent device reference from struct bsg_class_device  (Christoph Hellwig; 7 files changed, -63/+19)

Bsg holding a reference to the parent device may result in a crash if a bsg file handle is closed after the parent device driver has unloaded.

Holding a reference is not really needed: the parent device must exist between bsg_register_queue and bsg_unregister_queue. Before the device goes away the caller does blk_cleanup_queue so that all in-flight requests to the device are gone and all new requests cannot pass beyond the queue. The queue itself is a refcounted object and it will stay alive with a bsg file.

Based on analysis, previous patch and changelog from Anatoliy Glagolev.

Reported-by: Anatoliy Glagolev <[email protected]>
Reviewed-by: James E.J. Bottomley <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-05-29  Merge branch 'nvme-4.18-2' of git://git.infradead.org/nvme into for-4.18/block  (Jens Axboe; 16 files changed, -119/+457)

Pull NVMe changes from Christoph:

"Here is the current batch of nvme updates for 4.18, we have a few more patches in the queue, but I'd like to get this pile into your tree and linux-next ASAP.

The biggest item is support for file-backed namespaces in the NVMe target from Chaitanya; in addition to that we mostly have small fixes from all the usual suspects."

* 'nvme-4.18-2' of git://git.infradead.org/nvme:
  nvme: fixup memory leak in nvme_init_identify()
  nvme: fix KASAN warning when parsing host nqn
  nvmet-loop: use nr_phys_segments when map rq to sgl
  nvmet-fc: increase LS buffer count per fc port
  nvmet: add simple file backed ns support
  nvmet: remove duplicate NULL initialization for req->ns
  nvmet: make a few error messages more generic
  nvme-fabrics: allow duplicate connections to the discovery controller
  nvme-fabrics: centralize discovery controller defaults
  nvme-fabrics: remove unnecessary controller subnqn validation
  nvme-fc: remove setting DNR on exception conditions
  nvme-rdma: stop admin queue before freeing it
  nvme-pci: Fix AER reset handling
  nvme-pci: set nvmeq->cq_vector after alloc cq/sq
  nvme: host: core: fix precedence of ternary operator
  nvme: fix lockdep warning in nvme_mpath_clear_current_path