aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2018-02-07bcache: fix for data collapse after re-attaching an attached deviceTang Junhui3-7/+11
back-end device sdm has already attached a cache_set with ID f67ebe1f-f8bc-4d73-bfe5-9dc88607f119, then try to attach with another cache set, and it returns with an error: [root]# cd /sys/block/sdm/bcache [root]# echo 5ccd0a63-148e-48b8-afa2-aca9cbd6279f > attach -bash: echo: write error: Invalid argument After that, execute a command to modify the label of bcache device: [root]# echo data_disk1 > label Then we reboot the system, when the system power on, the back-end device can not attach to cache_set, a messages show in the log: Feb 5 12:05:52 ceph152 kernel: [922385.508498] bcache: bch_cached_dev_attach() couldn't find uuid for sdm in set In sysfs_attach(), dc->sb.set_uuid was assigned to the value which input through sysfs, no matter whether it is success or not in bch_cached_dev_attach(). For example, If the back-end device has already attached to an cache set, bch_cached_dev_attach() would fail, but dc->sb.set_uuid was changed. Then modify the label of bcache device, it will call bch_write_bdev_super(), which would write the dc->sb.set_uuid to the super block, so we record a wrong cache set ID in the super block, after the system reboot, the cache set couldn't find the uuid of the back-end device, so the bcache device couldn't exist and use any more. In this patch, we don't assigned cache set ID to dc->sb.set_uuid in sysfs_attach() directly, but input it into bch_cached_dev_attach(), and assigned dc->sb.set_uuid to the cache set ID after the back-end device attached to the cache set successful. Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Reviewed-by: Michael Lyle <mlyle@lyle.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-07bcache: return attach error when no cache set existTang Junhui1-2/+3
I attach a back-end device to a cache set, and the cache set is not registered yet, this back-end device did not attach successfully, and no error returned: [root]# echo 87859280-fec6-4bcc-20df7ca8f86b > /sys/block/sde/bcache/attach [root]# In sysfs_attach(), the return value "v" is initialized to "size" in the beginning, and if no cache set exist in bch_cache_sets, the "v" value would not change any more, and return to sysfs, sysfs regard it as success since the "size" is a positive number. This patch fixes this issue by assigning "v" with "-ENOENT" in the initialization. Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Reviewed-by: Michael Lyle <mlyle@lyle.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-07bcache: set writeback_rate_update_seconds in range [1, 60] secondsColy Li3-2/+7
dc->writeback_rate_update_seconds can be set via sysfs and its value can be set to [1, ULONG_MAX]. It does not make sense to set such a large value, 60 seconds is long enough value considering the default 5 seconds works well for long time. Because dc->writeback_rate_update is a special delayed work, it re-arms itself inside the delayed work routine update_writeback_rate(). When stopping it by cancel_delayed_work_sync(), there should be a timeout to wait and make sure the re-armed delayed work is stopped too. A small max value of dc->writeback_rate_update_seconds is also helpful to decide a reasonable small timeout. This patch limits sysfs interface to set dc->writeback_rate_update_seconds in range of [1, 60] seconds, and replaces the hand-coded number by macros. Changelog: v2: fix a rebase typo in v4, which is pointed out by Michael Lyle. v1: initial version. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Michael Lyle <mlyle@lyle.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-07bcache: fix for allocator and register thread raceTang Junhui2-4/+18
After long time running of random small IO writing, I reboot the machine, and after the machine power on, I found bcache got stuck, the stack is: [root@ceph153 ~]# cat /proc/2510/task/*/stack [<ffffffffa06b2455>] closure_sync+0x25/0x90 [bcache] [<ffffffffa06b6be8>] bch_journal+0x118/0x2b0 [bcache] [<ffffffffa06b6dc7>] bch_journal_meta+0x47/0x70 [bcache] [<ffffffffa06be8f7>] bch_prio_write+0x237/0x340 [bcache] [<ffffffffa06a8018>] bch_allocator_thread+0x3c8/0x3d0 [bcache] [<ffffffff810a631f>] kthread+0xcf/0xe0 [<ffffffff8164c318>] ret_from_fork+0x58/0x90 [<ffffffffffffffff>] 0xffffffffffffffff [root@ceph153 ~]# cat /proc/2038/task/*/stack [<ffffffffa06b1abd>] __bch_btree_map_nodes+0x12d/0x150 [bcache] [<ffffffffa06b1bd1>] bch_btree_insert+0xf1/0x170 [bcache] [<ffffffffa06b637f>] bch_journal_replay+0x13f/0x230 [bcache] [<ffffffffa06c75fe>] run_cache_set+0x79a/0x7c2 [bcache] [<ffffffffa06c0cf8>] register_bcache+0xd48/0x1310 [bcache] [<ffffffff812f702f>] kobj_attr_store+0xf/0x20 [<ffffffff8125b216>] sysfs_write_file+0xc6/0x140 [<ffffffff811dfbfd>] vfs_write+0xbd/0x1e0 [<ffffffff811e069f>] SyS_write+0x7f/0xe0 [<ffffffff8164c3c9>] system_call_fastpath+0x16/0x1 The stack shows the register thread and allocator thread were getting stuck when registering cache device. I reboot the machine several times, the issue always exsit in this machine. I debug the code, and found the call trace as bellow: register_bcache() ==>run_cache_set() ==>bch_journal_replay() ==>bch_btree_insert() ==>__bch_btree_map_nodes() ==>btree_insert_fn() ==>btree_split() //node need split ==>btree_check_reserve() In btree_check_reserve(), It will check if there is enough buckets of RESERVE_BTREE type, since allocator thread did not work yet, so no buckets of RESERVE_BTREE type allocated, so the register thread waits on c->btree_cache_wait, and goes to sleep. Then the allocator thread initialized, the call trace is bellow: bch_allocator_thread() ==>bch_prio_write() ==>bch_journal_meta() ==>bch_journal() ==>journal_wait_for_write() In journal_wait_for_write(), It will check if journal is full by journal_full(), but the long time random small IO writing causes the exhaustion of journal buckets(journal.blocks_free=0), In order to release the journal buckets, the allocator calls btree_flush_write() to flush keys to btree nodes, and waits on c->journal.wait until btree nodes writing over or there has already some journal buckets space, then the allocator thread goes to sleep. but in btree_flush_write(), since bch_journal_replay() is not finished, so no btree nodes have journal (condition "if (btree_current_write(b)->journal)" never satisfied), so we got no btree node to flush, no journal bucket released, and allocator sleep all the times. Through the above analysis, we can see that: 1) Register thread wait for allocator thread to allocate buckets of RESERVE_BTREE type; 2) Alloctor thread wait for register thread to replay journal, so it can flush btree nodes and get journal bucket. then they are all got stuck by waiting for each other. Hua Rui provided a patch for me, by allocating some buckets of RESERVE_BTREE type in advance, so the register thread can get bucket when btree node splitting and no need to waiting for the allocator thread. I tested it, it has effect, and register thread run a step forward, but finally are still got stuck, the reason is only 8 bucket of RESERVE_BTREE type were allocated, and in bch_journal_replay(), after 2 btree nodes splitting, only 4 bucket of RESERVE_BTREE type left, then btree_check_reserve() is not satisfied anymore, so it goes to sleep again, and in the same time, alloctor thread did not flush enough btree nodes to release a journal bucket, so they all got stuck again. So we need to allocate more buckets of RESERVE_BTREE type in advance, but how much is enough? By experience and test, I think it should be as much as journal buckets. Then I modify the code as this patch, and test in the machine, and it works. This patch modified base on Hua Rui’s patch, and allocate more buckets of RESERVE_BTREE type in advance to avoid register thread and allocate thread going to wait for each other. [patch v2] ca->sb.njournal_buckets would be 0 in the first time after cache creation, and no journal exists, so just 8 btree buckets is OK. Signed-off-by: Hua Rui <huarui.dev@gmail.com> Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Reviewed-by: Michael Lyle <mlyle@lyle.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-07bcache: set error_limit correctlyColy Li3-3/+4
Struct cache uses io_errors for two purposes, - Error decay: when cache set error_decay is set, io_errors is used to generate a small piece of delay when I/O error happens. - I/O errors counter: in order to generate big enough value for error decay, I/O errors counter value is stored by left shifting 20 bits (a.k.a IO_ERROR_SHIFT). In function bch_count_io_errors(), if I/O errors counter reaches cache set error limit, bch_cache_set_error() will be called to retire the whold cache set. But current code is problematic when checking the error limit, see the following code piece from bch_count_io_errors(), 90 if (error) { 91 char buf[BDEVNAME_SIZE]; 92 unsigned errors = atomic_add_return(1 << IO_ERROR_SHIFT, 93 &ca->io_errors); 94 errors >>= IO_ERROR_SHIFT; 95 96 if (errors < ca->set->error_limit) 97 pr_err("%s: IO error on %s, recovering", 98 bdevname(ca->bdev, buf), m); 99 else 100 bch_cache_set_error(ca->set, 101 "%s: too many IO errors %s", 102 bdevname(ca->bdev, buf), m); 103 } At line 94, errors is right shifting IO_ERROR_SHIFT bits, now it is real errors counter to compare at line 96. But ca->set->error_limit is initia- lized with an amplified value in bch_cache_set_alloc(), 1545 c->error_limit = 8 << IO_ERROR_SHIFT; It means by default, in bch_count_io_errors(), before 8<<20 errors happened bch_cache_set_error() won't be called to retire the problematic cache device. If the average request size is 64KB, it means bcache won't handle failed device until 512GB data is requested. This is too large to be an I/O threashold. So I believe the correct error limit should be much less. This patch sets default cache set error limit to 8, then in bch_count_io_errors() when errors counter reaches 8 (if it is default value), function bch_cache_set_error() will be called to retire the whole cache set. This patch also removes bits shifting when store or show io_error_limit value via sysfs interface. Nowadays most of SSDs handle internal flash failure automatically by LBA address re-indirect mapping. If an I/O error can be observed by upper layer code, it will be a notable error because that SSD can not re-indirect map the problematic LBA address to an available flash block. This situation indicates the whole SSD will be failed very soon. Therefore setting 8 as the default io error limit value makes sense, it is enough for most of cache devices. Changelog: v2: add reviewed-by from Hannes. v1: initial version for review. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Tang Junhui <tang.junhui@zte.com.cn> Reviewed-by: Michael Lyle <mlyle@lyle.org> Cc: Junhui Tang <tang.junhui@zte.com.cn> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-07bcache: properly set task state in bch_writeback_thread()Coly Li2-3/+8
Kernel thread routine bch_writeback_thread() has the following code block, 447 down_write(&dc->writeback_lock); 448~450 if (check conditions) { 451 up_write(&dc->writeback_lock); 452 set_current_state(TASK_INTERRUPTIBLE); 453 454 if (kthread_should_stop()) 455 return 0; 456 457 schedule(); 458 continue; 459 } If condition check is true, its task state is set to TASK_INTERRUPTIBLE and call schedule() to wait for others to wake up it. There are 2 issues in current code, 1, Task state is set to TASK_INTERRUPTIBLE after the condition checks, if another process changes the condition and call wake_up_process(dc-> writeback_thread), then at line 452 task state is set back to TASK_INTERRUPTIBLE, the writeback kernel thread will lose a chance to be waken up. 2, At line 454 if kthread_should_stop() is true, writeback kernel thread will return to kernel/kthread.c:kthread() with TASK_INTERRUPTIBLE and call do_exit(). It is not good to enter do_exit() with task state TASK_INTERRUPTIBLE, in following code path might_sleep() is called and a warning message is reported by __might_sleep(): "WARNING: do not call blocking ops when !TASK_RUNNING; state=1 set at [xxxx]". For the first issue, task state should be set before condition checks. Ineed because dc->writeback_lock is required when modifying all the conditions, calling set_current_state() inside code block where dc-> writeback_lock is hold is safe. But this is quite implicit, so I still move set_current_state() before all the condition checks. For the second issue, frankley speaking it does not hurt when kernel thread exits with TASK_INTERRUPTIBLE state, but this warning message scares users, makes them feel there might be something risky with bcache and hurt their data. Setting task state to TASK_RUNNING before returning fixes this problem. In alloc.c:allocator_wait(), there is also a similar issue, and is also fixed in this patch. Changelog: v3: merge two similar fixes into one patch v2: fix the race issue in v1 patch. v1: initial buggy fix. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Michael Lyle <mlyle@lyle.org> Cc: Michael Lyle <mlyle@lyle.org> Cc: Junhui Tang <tang.junhui@zte.com.cn> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-07bcache: fix high CPU occupancy during journalTang Junhui3-15/+36
After long time small writing I/O running, we found the occupancy of CPU is very high and I/O performance has been reduced by about half: [root@ceph151 internal]# top top - 15:51:05 up 1 day,2:43, 4 users, load average: 16.89, 15.15, 16.53 Tasks: 2063 total, 4 running, 2059 sleeping, 0 stopped, 0 zombie %Cpu(s):4.3 us, 17.1 sy 0.0 ni, 66.1 id, 12.0 wa, 0.0 hi, 0.5 si, 0.0 st KiB Mem : 65450044 total, 24586420 free, 38909008 used, 1954616 buff/cache KiB Swap: 65667068 total, 65667068 free, 0 used. 25136812 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2023 root 20 0 0 0 0 S 55.1 0.0 0:04.42 kworker/11:191 14126 root 20 0 0 0 0 S 42.9 0.0 0:08.72 kworker/10:3 9292 root 20 0 0 0 0 S 30.4 0.0 1:10.99 kworker/6:1 8553 ceph 20 0 4242492 1.805g 18804 S 30.0 2.9 410:07.04 ceph-osd 12287 root 20 0 0 0 0 S 26.7 0.0 0:28.13 kworker/7:85 31019 root 20 0 0 0 0 S 26.1 0.0 1:30.79 kworker/22:1 1787 root 20 0 0 0 0 R 25.7 0.0 5:18.45 kworker/8:7 32169 root 20 0 0 0 0 S 14.5 0.0 1:01.92 kworker/23:1 21476 root 20 0 0 0 0 S 13.9 0.0 0:05.09 kworker/1:54 2204 root 20 0 0 0 0 S 12.5 0.0 1:25.17 kworker/9:10 16994 root 20 0 0 0 0 S 12.2 0.0 0:06.27 kworker/5:106 15714 root 20 0 0 0 0 R 10.9 0.0 0:01.85 kworker/19:2 9661 ceph 20 0 4246876 1.731g 18800 S 10.6 2.8 403:00.80 ceph-osd 11460 ceph 20 0 4164692 2.206g 18876 S 10.6 3.5 360:27.19 ceph-osd 9960 root 20 0 0 0 0 S 10.2 0.0 0:02.75 kworker/2:139 11699 ceph 20 0 4169244 1.920g 18920 S 10.2 3.1 355:23.67 ceph-osd 6843 ceph 20 0 4197632 1.810g 18900 S 9.6 2.9 380:08.30 ceph-osd The kernel work consumed a lot of CPU, and I found they are running journal work, The journal is reclaiming source and flush btree node with surprising frequency. Through further analysis, we found that in btree_flush_write(), we try to get a btree node with the smallest fifo idex to flush by traverse all the btree nodein c->bucket_hash, after we getting it, since no locker protects it, this btree node may have been written to cache device by other works, and if this occurred, we retry to traverse in c->bucket_hash and get another btree node. When the problem occurrd, the retry times is very high, and we consume a lot of CPU in looking for a appropriate btree node. In this patch, we try to record 128 btree nodes with the smallest fifo idex in heap, and pop one by one when we need to flush btree node. It greatly reduces the time for the loop to find the appropriate BTREE node, and also reduce the occupancy of CPU. [note by mpl: this triggers a checkpatch error because of adjacent, pre-existing style violations] Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Reviewed-by: Michael Lyle <mlyle@lyle.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-07bcache: add journal statisticTang Junhui3-0/+24
Sometimes, Journal takes up a lot of CPU, we need statistics to know what's the journal is doing. So this patch provide some journal statistics: 1) reclaim: how many times the journal try to reclaim resource, usually the journal bucket or/and the pin are exhausted. 2) flush_write: how many times the journal try to flush btree node to cache device, usually the journal bucket are exhausted. 3) retry_flush_write: how many times the journal retry to flush the next btree node, usually the previous tree node have been flushed by other thread. we show these statistic by sysfs interface. Through these statistics We can totally see the status of journal module when the CPU is too high. Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Reviewed-by: Michael Lyle <mlyle@lyle.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-07Merge tag 'riscv-for-linus-4.16-merge_window' of ↵Linus Torvalds17-76/+250
git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux Pull RISC-V updates from Palmer Dabbelt: "This contains the fixes we'd like to target for the 4.16 merge window. It's not as much as I was originally hoping to do but between glibc, the chip, and FOSDEM there just wasn't enough time to get everything put together. As such, this merge window is essentially just going to be small changes. This includes mostly cleanups: - A build fix failure to the audit test cases. RISC-V doesn't have renameat because the generic syscall ABI moved to renameat2 by the time of our port. The syscall audit test cases don't understand this, so I added a trivial fix. This went through mailing list review during the 4.15 merge window, but nobody has picked it up so I think it's best to just do this here. - The removal of our command-line argument processing code. The "mem_end" stuff was broken and the rest duplicated generic device tree code. The generic code was already being called. - Some unused/redundant code has been removed, including __ARCH_HAVE_MMU, current_pgdir, and the initialization of init_mm.pgd. - SUM is disabled upon taking a trap, which means that user memory is protected during traps taking inside copy_{to,from}_user(). - The sptbr CSR has been renamed to satp in C code. We haven't changed the assembly code in order to maintain compatibility with binutils 2.29, which doesn't understand the new name. Additionally, we're adding some new features: - Basic ftrace support, thanks to Alan Kao! - Support for ZONE_DMA32. This is necessary for all the normal reasons, but also to deal with a deficiency in the Xilinx PCIe controller we're using on our FPGA-based systems. While the ZONE_DMA32 addition should be sufficient for most uses, it doesn't complete the fix for the Xilinx controller. - TLB shootdowns now only target the harts where they're necessary, instead of applying to all harts in the system. These patches have all been sitting on our linux-next branch for a while now. Due to time constraints this is all I feel comfortable submitting during the 4.16 merge window, hopefully we'll do better next time!" [ Note to self: "harts" is RISC-V speak for "hardware threads". I had to look that up. - Linus ] * tag 'riscv-for-linus-4.16-merge_window' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux: riscv: inline set_pgdir into its only caller riscv: rename sptbr to satp riscv: don't read back satp in paging_init riscv: remove the unused current_pgdir function riscv: add ZONE_DMA32 RISC-V: Limit the scope of TLB shootdowns riscv: disable SUM in the exception handler riscv: remove redundant unlikely() riscv: remove unused __ARCH_HAVE_MMU define riscv/ftrace: Add basic support RISC-V: Remove mem_end command line processing RISC-V: Remove duplicate command-line parsing logic audit: Avoid build failures on systems without renameat
2018-02-07Merge tag 'mips_fixes_4.16_1' of ↵Linus Torvalds2-7/+14
git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips Pull MIPS fixes from James Hogan: "A couple of MIPS fixes for 4.16-rc1, including an important regression in 4.15 and a rather more longstanding corner case build fix. These are separate from the main pull request as one of the bugs fixed was only recently introduced in v4.15-rc8. - Fix CPS regression on older binutils due to MIPS_ISA_LEVEL_RAW fix (4.15) - Fix allmodconfig + CONFIG_MACH_TX49XX=y builds due to incorrect use of IS_ENABLED() (2.6.28)" * tag 'mips_fixes_4.16_1' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips: MIPS: TXx9: use IS_BUILTIN() for CONFIG_LEDS_CLASS MIPS: CPS: Fix MIPS_ISA_LEVEL_RAW fallout
2018-02-07Merge tag 'mips_4.16' of ↵Linus Torvalds77-553/+1649
git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips Pull MIPS updates from James Hogan: "These are the main MIPS changes for 4.16. Rough overview: (1) Basic support for the Ingenic JZ4770 based GCW Zero open-source handheld video game console (2) Support for the Ranchu board (used by Android emulator) (3) Various cleanups and misc improvements More detailed summary: Fixes: - Fix generic platform's USB_*HCI_BIG_ENDIAN selects (4.9) - Fix vmlinuz default build when ZBOOT selected - Fix clean up of vmlinuz targets - Fix command line duplication (in preparation for Ingenic JZ4770) Miscellaneous: - Allow Processor ID reads to be to be optimised away by the compiler (improves performance when running in guest) - Push ARCH_MIGHT_HAVE_PC_SERIO/PARPORT down to platform level to disable on generic platform with Ranchu board support - Add helpers for assembler macro instructions for older assemblers - Use assembler macro instructions to support VZ, XPA & MSA operations on older assemblers, removing C wrapper duplication - Various improvements to VZ & XPA assembly wrappers - Add drivers/platform/mips/ to MIPS MAINTAINERS entry Minor cleanups: - Misc FPU emulation cleanups (removal of unnecessary include, moving macros to common header, checkpatch and sparse fixes) - Remove duplicate assignment of core in play_dead() - Remove duplication in watchpoint handling - Remove mips_dma_mapping_error() stub - Use NULL instead of 0 in prepare_ftrace_return() - Use proper kernel-doc Return keyword for __compute_return_epc_for_insn() - Remove duplicate semicolon in csum_fold() Platform support: Broadcom: - Enable ZBOOT on BCM47xx Generic platform: - Add Ranchu board support, used by Android emulator - Fix machine compatible string matching for Ranchu - Support GIC in EIC mode Ingenic platforms: - Add DT, defconfig and other support for JZ4770 SoC and GCW Zero - Support dynamnic machine types (i.e. JZ4740 / JZ4770 / JZ4780) - Add Ingenic JZ4770 CGU clocks - General Ingenic clk changes to prepare for JZ4770 SoC support - Use common command line handling code - Add DT vendor prefix to GCW (Game Consoles Worldwide) Loongson: - Add MAINTAINERS entry for Loongson2 and Loongson3 platforms - Drop 32-bit support for Loongson 2E/2F devices - Fix build failures due to multiple use of 'MEM_RESERVED'" * tag 'mips_4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips: (53 commits) MIPS: Malta: Sanitize mouse and keyboard configuration. MIPS: Update defconfigs after previous patch. MIPS: Push ARCH_MIGHT_HAVE_PC_SERIO down to platform level MIPS: Push ARCH_MIGHT_HAVE_PC_PARPORT down to platform level MIPS: SMP-CPS: Remove duplicate assignment of core in play_dead MIPS: Generic: Support GIC in EIC mode MIPS: generic: Fix Makefile alignment MIPS: generic: Fix ranchu_of_match[] termination MIPS: generic: Fix machine compatible matching MIPS: Loongson fix name confict - MEM_RESERVED MIPS: bcm47xx: enable ZBOOT support MIPS: Fix trailing semicolon MIPS: Watch: Avoid duplication of bits in mips_read_watch_registers MIPS: Watch: Avoid duplication of bits in mips_install_watch_registers. MIPS: MSA: Update helpers to use new asm macros MIPS: XPA: Standardise readx/writex accessors MIPS: XPA: Allow use of $0 (zero) to MTHC0 MIPS: XPA: Use XPA instructions in assembly MIPS: VZ: Pass GC0 register names in $n format MIPS: VZ: Update helpers to use new asm macros ...
2018-02-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller15-79/+97
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for you net tree, they are: 1) Restore __GFP_NORETRY in xt_table allocations to mitigate effects of large memory allocation requests, from Michal Hocko. 2) Release IPv6 fragment queue in case of error in fragmentation header, this is a follow up to amend patch 83f1999caeb1, from Subash Abhinov Kasiviswanathan. 3) Flowtable infrastructure depends on NETFILTER_INGRESS as it registers a hook for each flowtable, reported by John Crispin. 4) Missing initialization of info->priv in xt_cgroup version 1, from Cong Wang. 5) Give a chance to garbage collector to run after scheduling flowtable cleanup. 6) Releasing flowtable content on nft_flow_offload module removal is not required at all, there is not dependencies between this module and flowtables, remove it. 7) Fix missing xt_rateest_mutex grabbing for hash insertions, also from Cong Wang. 8) Move nf_flow_table_cleanup() routine to flowtable core, this patch is a dependency for the next patch in this list. 9) Flowtable resources are not properly released on removal from the control plane. Fix this resource leak by scheduling removal of all entries and explicit call to the garbage collector. 10) nf_ct_nat_offset() declaration is dead code, this function prototype is not used anywhere, remove it. From Taehee Yoo. 11) Fix another flowtable resource leak on entry insertion failures, this patch also fixes a possible use-after-free. Patch from Felix Fietkau. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-07Merge tag 'docs-4.16-2' of git://git.lwn.net/linuxLinus Torvalds5-41/+964
Pull more documentation updates from Jonathan Corbet: "A few late-arriving fixes, along with Konstantin's PGP document that had no reason to wait another cycle" * tag 'docs-4.16-2' of git://git.lwn.net/linux: Documentation/process: tweak pgp maintainer guide Documentation/admin-guide: fixes for thunderbolt.rst Documentation: mips: Update AU1xxx_IDE Kconfig dependencies Fix broken link in Documentation/process/kernel-docs.rst Documentation/process: kernel maintainer PGP guide
2018-02-07Add missing structs and defines from recent SMB3.1.1 documentationSteve French1-2/+112
The last two updates to MS-SMB2 protocol documentation added various flags and structs (especially relating to SMB3.1.1 tree connect). Add missing defines and structs to smb2pdu.h Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-02-07address lock imbalance warnings in smbdirect.cSteve French1-7/+9
Although at least one of these was an overly strict sparse warning in the new smbdirect code, it is cleaner to fix - so no warnings. Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-02-07cifs: silence compiler warnings showing up with gcc-8.0.0Arnd Bergmann1-3/+1
This bug was fixed before, but came up again with the latest compiler in another function: fs/cifs/cifssmb.c: In function 'CIFSSMBSetEA': fs/cifs/cifssmb.c:6362:3: error: 'strncpy' offset 8 is out of the bounds [0, 4] [-Werror=array-bounds] strncpy(parm_data->list[0].name, ea_name, name_len); Let's apply the same fix that was used for the other instances. Fixes: b2a3ad9ca502 ("cifs: silence compiler warnings showing up with gcc-4.7.0") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Steve French <smfrench@gmail.com>
2018-02-07Add some missing debug fields in server and tcon structsSteve French1-1/+8
Allow dumping out debug information on dialect, signing, unix extensions and encryption Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-02-08coccinelle: deref_null: avoid useless computationJulia Lawall1-3/+3
The effect of the rules ifm1, pr11, and pr12 is only used in the final rule, which depends on context && !org && !report. Thus these rules should only be performed in those circumstances. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-02-07s390: introduce execute-trampolines for branchesMartin Schwidefsky12-35/+327
Add CONFIG_EXPOLINE to enable the use of the new -mindirect-branch= and -mfunction_return= compiler options to create a kernel fortified against the specte v2 attack. With CONFIG_EXPOLINE=y all indirect branches will be issued with an execute type instruction. For z10 or newer the EXRL instruction will be used, for older machines the EX instruction. The typical indirect call basr %r14,%r1 is replaced with a PC relative call to a new thunk brasl %r14,__s390x_indirect_jump_r1 The thunk contains the EXRL/EX instruction to the indirect branch __s390x_indirect_jump_r1: exrl 0,0f j . 0: br %r1 The detour via the execute type instruction has a performance impact. To get rid of the detour the new kernel parameter "nospectre_v2" and "spectre_v2=[on,off,auto]" can be used. If the parameter is specified the kernel and module code will be patched at runtime. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-02-07coccinelle: devm_free: reduce false positivesJulia Lawall1-1/+54
Some files use both a non-devm allocation and a devm_allocation. Don't complain about a free when the same function contains a non-devm allocation. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-02-07SUNRPC: Queue latency-sensitive socket tasks to xprtiodTrond Myklebust3-1/+17
The response to a write_space notification is very latency sensitive, so we should queue it to the lower latency xprtiod_workqueue. This is something we already do for the other cases where an rpc task holds the transport XPRT_LOCKED bitlock. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-02-07ath9k_htc: add Altai WA1011N-GUOleksij Rempel1-0/+1
as reported in: https://github.com/qca/open-ath9k-htc-firmware/pull/71#issuecomment-361100070 Signed-off-by: Oleksij Rempel <linux@rempel-privat.de> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-02-07ath10k: fix kernel panic issue during pci probeYu Wang1-2/+10
If device gone during chip reset, ar->normal_mode_fw.board is not initialized, but ath10k_debug_print_hwfw_info() will try to access its member, which will cause 'kernel NULL pointer' issue. This was found using a faulty device (pci link went down sometimes) in a random insmod/rmmod/other-op test. To fix it, check ar->normal_mode_fw.board before accessing the member. pci 0000:02:00.0: BAR 0: assigned [mem 0xf7400000-0xf75fffff 64bit] ath10k_pci 0000:02:00.0: enabling device (0000 -> 0002) ath10k_pci 0000:02:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0 ath10k_pci 0000:02:00.0: failed to read device register, device is gone ath10k_pci 0000:02:00.0: failed to wait for target init: -5 ath10k_pci 0000:02:00.0: failed to warm reset: -5 ath10k_pci 0000:02:00.0: firmware crashed during chip reset ath10k_pci 0000:02:00.0: firmware crashed! (uuid 5d018951-b8e1-404a-8fde-923078b4423a) ath10k_pci 0000:02:00.0: (null) target 0x00000000 chip_id 0x00340aff sub 0000:0000 ath10k_pci 0000:02:00.0: kconfig debug 1 debugfs 1 tracing 1 dfs 1 testmode 1 ath10k_pci 0000:02:00.0: firmware ver api 0 features crc32 00000000 ... BUG: unable to handle kernel NULL pointer dereference at 00000004 ... Call Trace: [<fb4e7882>] ath10k_print_driver_info+0x12/0x20 [ath10k_core] [<fb62b7dd>] ath10k_pci_fw_crashed_dump+0x6d/0x4d0 [ath10k_pci] [<fb629f07>] ? ath10k_pci_sleep.part.19+0x57/0xc0 [ath10k_pci] [<fb62c8ee>] ath10k_pci_hif_power_up+0x14e/0x1b0 [ath10k_pci] [<c10477fb>] ? do_page_fault+0xb/0x10 [<fb4eb934>] ath10k_core_register_work+0x24/0x840 [ath10k_core] [<c18a00d8>] ? netlbl_unlhsh_remove+0x178/0x410 [<c10477f0>] ? __do_page_fault+0x480/0x480 [<c1068e44>] process_one_work+0x114/0x3e0 [<c1069d07>] worker_thread+0x37/0x4a0 [<c106e294>] kthread+0xa4/0xc0 [<c1069cd0>] ? create_worker+0x180/0x180 [<c106e1f0>] ? kthread_park+0x50/0x50 [<c18ab4f7>] ret_from_fork+0x1b/0x28 Code: 78 80 b8 50 09 00 00 00 75 5d 8d 75 94 c7 44 24 08 aa d7 52 fb c7 44 24 04 64 00 00 00 89 34 24 e8 82 52 e2 c5 8b 83 dc 08 00 00 <8b> 50 04 8b 08 31 c0 e8 20 57 e3 c5 89 44 24 10 8b 83 58 09 00 EIP: [<fb4e7754>]- ath10k_debug_print_board_info+0x34/0xb0 [ath10k_core] SS:ESP 0068:f4921d90 CR2: 0000000000000004 Signed-off-by: Yu Wang <yyuwang@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-02-07ath9k: Fix get channel default noise floorWojciech Dubowik1-1/+1
Commit 8da58553cc63 ("ath9k: Use calibrated noise floor value when available") introduced regression in ath9k_hw_getchan_noise where per chain nominal noise floor has been taken instead default for channel. Revert to original default channel noise floor. Fixes: 8da58553cc63 ("ath9k: Use calibrated noise floor value when available") Reported-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Wojciech Dubowik <Wojciech.Dubowik@neratec.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-02-07ath10k: add support for Ubiquiti rebranded QCA988X v2Tobias Schramm3-0/+36
Some modern Ubiquiti devices contain a rebranded QCA988X rev2 with a custom Ubiquiti vendor and device id. This patch adds support for those devices, treating them as a QCA988X v2. Signed-off-by: Tobias Schramm <tobleminer@gmail.com> [kvalo@codeaurora.org: rebase, add missing fields in hw_params, fix a long line in pci.c:61] Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-02-07PCI: Add Ubiquiti Networks vendor IDTobias Schramm1-0/+2
Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Tobias Schramm <tobleminer@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-02-07ath10k: correct the length of DRAM dump for QCA6174 hw3.x/QCA9377 hw1.1Yu Wang1-1/+2
The length of DRAM dump for QCA6174 hw3.0/hw3.2 and QCA9377 hw1.1 are less than the actual value, some coredump contents are missed. To fix it, change the length from 0x90000 to 0xa8000. Fixes: 703f261dd77f ("ath10k: add memory dump support for QCA6174/QCA9377") Signed-off-by: Yu Wang <yyuwang@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-02-07rtlwifi: rtl8821ae: Fix connection lost problem correctlyLarry Finger2-2/+4
There has been a coding error in rtl8821ae since it was first introduced, namely that an 8-bit register was read using a 16-bit read in _rtl8821ae_dbi_read(). This error was fixed with commit 40b368af4b75 ("rtlwifi: Fix alignment issues"); however, this change led to instability in the connection. To restore stability, this change was reverted in commit b8b8b16352cd ("rtlwifi: rtl8821ae: Fix connection lost problem"). Unfortunately, the unaligned access causes machine checks in ARM architecture, and we were finally forced to find the actual cause of the problem on x86 platforms. Following a suggestion from Pkshih <pkshih@realtek.com>, it was found that increasing the ASPM L1 latency from 0 to 7 fixed the instability. This parameter was varied to see if a smaller value would work; however, it appears that 7 is the safest value. A new symbol is defined for this quantity, thus it can be easily changed if necessary. Fixes: b8b8b16352cd ("rtlwifi: rtl8821ae: Fix connection lost problem") Cc: Stable <stable@vger.kernel.org> # 4.14+ Fix-suggested-by: Pkshih <pkshih@realtek.com> Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Tested-by: James Cameron <quozl@laptop.org> # x86_64 OLPC NL3 Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-02-07Merge remote-tracking branches 'asoc/topic/sam9x5_wm8731', ↵Mark Brown3-11/+90
'asoc/topic/sgtl5000' and 'asoc/topic/sun8i-codec' into asoc-next
2018-02-07Merge remote-tracking branches 'asoc/topic/max98373', 'asoc/topic/mtk', ↵Mark Brown7-32/+49
'asoc/topic/pcm', 'asoc/topic/rockchip' and 'asoc/topic/sam9g20_wm8731' into asoc-next
2018-02-07Merge remote-tracking branches 'asoc/topic/ak4613', 'asoc/topic/core', ↵Mark Brown28-78/+829
'asoc/topic/dmic' and 'asoc/topic/intel' into asoc-next
2018-02-07Merge remote-tracking branch 'asoc/topic/compress' into asoc-nextMark Brown1-30/+41
2018-02-07Merge remote-tracking branch 'asoc/fix/twl-breakage' into asoc-nextMark Brown2-0/+4
2018-02-07Merge remote-tracking branches 'asoc/fix/compress', 'asoc/fix/core', ↵Mark Brown2472-19482/+39315
'asoc/fix/dapm', 'asoc/fix/mtk' and 'asoc/fix/stm' into asoc-next
2018-02-07ASoC: stm32: add of dependency for stm32 driversOlivier Moysan1-3/+3
Add of dependency for STM32 ASoC drivers. DFSDM of dependency is already inherited from STM32_DFSDM_ADC dependency. Signed-off-by: olivier moysan <olivier.moysan@st.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-02-07x86: hibernate: fix swsusp_arch_resume() prototypeArnd Bergmann4-5/+4
The declaration for swsusp_arch_resume() marks it as 'asmlinkage', but the definition in x86-32 does not, and it fails to include the header with the declaration. This leads to a warning when building with link-time-optimizations: kernel/power/power.h:108:23: error: type of 'swsusp_arch_resume' does not match original declaration [-Werror=lto-type-mismatch] extern asmlinkage int swsusp_arch_resume(void); ^ arch/x86/power/hibernate_32.c:148:0: note: 'swsusp_arch_resume' was previously declared here int swsusp_arch_resume(void) This moves the declaration into a globally visible header file and fixes up both x86 definitions to match it. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-02-07ASoC: mt8173-rt5650: fix child-node lookupJohan Hovold1-8/+3
This driver used the wrong OF-helper when looking up the optional capture-codec child node during probe. Instead of searching just children of the sound node, a tree-wide depth-first search starting at the unrelated platform node was done. Not only could this end up matching an unrelated node or no node at all; the platform node could also be prematurely freed since of_find_node_by_name() drops a reference to its first argument. This particular pattern has been observed leading to crashes after probe deferrals in other drivers. Fix this by dropping the broken call to of_find_node_by_name() and keeping only the second, correct lookup using of_get_child_by_name() while taking care not to bail out if the optional node is missing. Note that this also addresses two capture-codec node-reference leaks (one for each of the original helper calls). Compile tested only. Fixes: d349caeb0510 ("ASoC: mediatek: Add second I2S on mt8173-rt5650 machine driver") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-02-07ASoC: dapm: fix debugfs read using path->connectedKaiChieh Chuang1-1/+1
This fix a bug in dapm_widget_power_read_file(), where it may sent opposite order of source/sink widget into the p->connected(). for example, static int connected_check(source, sink); {"w_sink", NULL, "w_source", connected_check} the dapm_widget_power_read_file() will query p->connected() in following case p->conneted("w_source", "w_sink") p->conneted("w_sink", "w_source") we should avoid the last case, since it's the wrong order (source/sink) as declared in snd_soc_dapm_route. Signed-off-by: KaiChieh Chuang <kaichieh.chuang@mediatek.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-02-07PM / domains: Fix up domain-idle-states OF parsingUlf Hansson1-31/+45
Commit b539cc82d493 (PM / Domains: Ignore domain-idle-states that are not compatible), made it possible to ignore non-compatible domain-idle-states OF nodes. However, in case that happens while doing the OF parsing, the number of elements in the allocated array would exceed the numbers actually needed, thus wasting memory. Fix this by pre-iterating the genpd OF node and counting the number of compatible domain-idle-states nodes, before doing the allocation. While doing this, it makes sense to rework the code a bit to avoid open coding, of parts responsible for the OF node iteration. Let's also take the opportunity to clarify the function header for of_genpd_parse_idle_states(), about what is being returned in case of errors. Fixes: b539cc82d493 (PM / Domains: Ignore domain-idle-states that are not compatible) Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Lina Iyer <ilina@codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-02-07netfilter: nf_flow_offload: fix use-after-free and a resource leakFelix Fietkau2-21/+7
flow_offload_del frees the flow, so all associated resource must be freed before. Since the ct entry in struct flow_offload_entry was allocated by flow_offload_alloc, it should be freed by flow_offload_free to take care of the error handling path when flow_offload_add fails. While at it, make flow_offload_del static, since it should never be called directly, only from the gc step Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-02-07netfilter: remove useless prototypeTaehee Yoo1-5/+0
prototype nf_ct_nat_offset is not used anymore. Signed-off-by: Taehee Yoo <ap420073@gmail.com>
2018-02-07cpufreq: scpi: fix static checker warning cdev isn't an ERR_PTRSudeep Holla1-4/+1
Commit 343a8d17fa8d (cpufreq: scpi: remove arm_big_little dependency) leads to the following static checker warning: drivers/cpufreq/scpi-cpufreq.c:203 scpi_cpufreq_ready() warn: 'cdev' isn't an ERR_PTR of_cpufreq_cooling_register() returns NULL on error. This patch removes the incorrect IS_ERR check on the returned pointer. Fixes: 343a8d17fa8d (cpufreq: scpi: remove arm_big_little dependency) Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-02-07platform/x86: samsung-laptop: Re-use DEFINE_SHOW_ATTRIBUTE() macroAndy Shevchenko1-15/+3
...instead of open coding file operations followed by custom ->open() callbacks per each attribute. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-02-07platform/x86: ideapad-laptop: Re-use DEFINE_SHOW_ATTRIBUTE() macroAndy Shevchenko1-26/+2
...instead of open coding file operations followed by custom ->open() callbacks per each attribute. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-02-07platform/x86: dell-laptop: Re-use DEFINE_SHOW_ATTRIBUTE() macroAndy Shevchenko1-13/+1
...instead of open coding file operations followed by custom ->open() callbacks per each attribute. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-02-07seq_file: Introduce DEFINE_SHOW_ATTRIBUTE() helper macroAndy Shevchenko4-41/+14
The DEFINE_SHOW_ATTRIBUTE() helper macro would be useful for current users, which are many of them, and for new comers to decrease code duplication. Acked-by: Lee Jones <lee.jones@linaro.org> Acked-by: Darren Hart (VMware) <dvhart@infradead.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-02-07cpufreq: remove at32ap-cpufreqCorentin LABBE3-138/+0
Since AVR32 arch was removed, at32ap-cpufreq is useless. Remove this driver. Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-02-07ACPI: SPCR: Make SPCR available to x86Prarit Bhargava8-34/+44
SPCR is currently only enabled or ARM64 and x86 can use SPCR to setup an early console. General fixes include updating Documentation & Kconfig (for x86), updating comments, and changing parse_spcr() to acpi_parse_spcr(), and earlycon_init_is_deferred to earlycon_acpi_spcr_enable to be more descriptive. On x86, many systems have a valid SPCR table but the table version is not 2 so the table version check must be a warning. On ARM64 when the kernel parameter earlycon is used both the early console and console are enabled. On x86, only the earlycon should be enabled by by default. Modify acpi_parse_spcr() to allow options for initializing the early console and console separately. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mark Salter <msalter@redhat.com> Tested-by: Mark Salter <msalter@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-02-07cpufreq: AMD: Ignore the check for ProcFeedback in ST/CZAkshu Agrawal1-2/+9
In ST/CZ CPUID 8000_0007_EDX[11, ProcFeedbackInterface] is 0, but the mechanism is still available and can be used. Signed-off-by: Akshu Agrawal <akshu.agrawal@amd.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-02-07ACPI / CPPC: Use 64-bit arithmetic instead of 32-bitGustavo A. R. Silva1-1/+1
Add suffix ULL to constant 500 in order to give the compiler complete information about the proper arithmetic to use. Notice that this constant is used in a context that expects an expression of type u64 (64 bits, unsigned). The expression NUM_RETRIES * cppc_ss->latency at line 578, which at preprocessing time translates to 500 * cppc_ss->latency is currently being evaluated using 32-bit arithmetic. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>