aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2017-09-12pinctrl: armada-37xx: Fix gpio interrupt setupGregory CLEMENT1-19/+22
Since commit dc749a09ea5e ("gpiolib: allow gpio irqchip to map irqs dynamically"), the irqs for gpio are not statically allocated during in gpiochip_irqchip_add. This driver was based on this assumption for initializing the mask associated to each interrupt this led to a NULL pointer crash in the kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000 Mem abort info: Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000068 CM = 0, WnR = 1 [0000000000000000] user address but active_mm is swapper Internal error: Oops: 96000044 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-06657-g3b9f8ed25dbe #576 Hardware name: Marvell Armada 3720 Development Board DB-88F3720-DDR3 (DT) task: ffff80001d908000 task.stack: ffff000008068000 PC is at armada_37xx_pinctrl_probe+0x5f8/0x670 LR is at armada_37xx_pinctrl_probe+0x5e8/0x670 pc : [<ffff000008e25cdc>] lr : [<ffff000008e25ccc>] pstate: 60000045 sp : ffff00000806bb80 x29: ffff00000806bb80 x28: 0000000000000024 x27: 000000000000000c x26: 0000000000000001 x25: ffff80001efee760 x24: 0000000000000000 x23: ffff80001db6f570 x22: ffff80001db6f438 x21: 0000000000000000 x20: ffff80001d9f4810 x19: ffff80001db6f418 x18: 0000000000000000 x17: 0000000000000001 x16: 0000000000000019 x15: ffffffffffffffff x14: 0140000000000000 x13: 0000000000000000 x12: 0000000000000030 x11: 0101010101010101 x10: 0000000000000040 x9 : ffff000009923580 x8 : ffff80001d400248 x7 : ffff80001d400270 x6 : 0000000000000000 x5 : ffff80001d400248 x4 : ffff80001d400270 x3 : 0000000000000000 x2 : 0000000000000001 x1 : 0000000000000001 x0 : 0000000000000000 Process swapper/0 (pid: 1, stack limit = 0xffff000008068000) Call trace: Exception stack(0xffff00000806ba40 to 0xffff00000806bb80) ba40: 0000000000000000 0000000000000001 0000000000000001 0000000000000000 ba60: ffff80001d400270 ffff80001d400248 0000000000000000 ffff80001d400270 ba80: ffff80001d400248 ffff000009923580 0000000000000040 0101010101010101 baa0: 0000000000000030 0000000000000000 0140000000000000 ffffffffffffffff bac0: 0000000000000019 0000000000000001 0000000000000000 ffff80001db6f418 bae0: ffff80001d9f4810 0000000000000000 ffff80001db6f438 ffff80001db6f570 bb00: 0000000000000000 ffff80001efee760 0000000000000001 000000000000000c bb20: 0000000000000024 ffff00000806bb80 ffff000008e25ccc ffff00000806bb80 bb40: ffff000008e25cdc 0000000060000045 ffff00000806bb60 ffff0000081189b8 bb60: ffffffffffffffff ffff00000811cf1c ffff00000806bb80 ffff000008e25cdc [<ffff000008e25cdc>] armada_37xx_pinctrl_probe+0x5f8/0x670 [<ffff00000859d8c8>] platform_drv_probe+0x58/0xb8 [<ffff00000859bb44>] driver_probe_device+0x22c/0x2d8 [<ffff00000859bcac>] __driver_attach+0xbc/0xc0 [<ffff000008599c84>] bus_for_each_dev+0x4c/0x98 [<ffff00000859b440>] driver_attach+0x20/0x28 [<ffff00000859af90>] bus_add_driver+0x1b8/0x228 [<ffff00000859c648>] driver_register+0x60/0xf8 [<ffff00000859df64>] __platform_driver_probe+0x74/0x130 [<ffff000008e256dc>] armada_37xx_pinctrl_driver_init+0x20/0x28 [<ffff000008083980>] do_one_initcall+0x38/0x128 [<ffff000008e00cf4>] kernel_init_freeable+0x188/0x22c [<ffff0000089b56e8>] kernel_init+0x10/0x100 [<ffff000008084bb0>] ret_from_fork+0x10/0x18 Code: f9403fa2 12001341 1100075a 9ac12041 (b9000001) ---[ end trace 8b0f4e05e1603208 ]--- This patch moves the initialization of the mask field in the irq_startup function. However some callbacks such as irq_set_type and irq_set_wake could be called before irq_startup. For those functions the mask is computed at each call which is not a issue as these functions are not located in a hot path but are used sporadically for configuration. Fixes: dc749a09ea5e ("gpiolib: allow gpio irqchip to map irqs dynamically") Cc: <stable@vger.kernel.org> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2017-09-12pinctrl: sprd: fix off by one bugsDan Carpenter1-4/+4
info->groups[] has info->ngroups elements so these comparisons should be >= instead of >. Fixes: 41d32cfce1ae ("pinctrl: sprd: Add Spreadtrum pin control driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Baolin Wang <baolin.wang@spreadtrum.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2017-09-12pinctrl: sprd: check for allocation failureDan Carpenter1-3/+7
devm_pinctrl_get() could fail with ERR_PTR(-ENOMEM) so I have added a check for that. I also reversed the other IS_ERR() test because it was a little confusing to test one way and then the opposite a couple lines later. Fixes: 41d32cfce1ae ("pinctrl: sprd: Add Spreadtrum pin control driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2017-09-12pinctrl: sprd: Restrict PINCTRL_SPRD to ARCH_SPRD or COMPILE_TESTGeert Uytterhoeven1-1/+2
The Spreadtrum pinctrl drivers are only useful when building for a Spreadtrum platform. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2017-09-12pinctrl: sprd: fix build errors and dependenciesRandy Dunlap1-0/+2
Fix build errors when CONFIG_OF is not enabled. Also, the pinctrl-sprd-sc9860 driver uses functions from the pinctrl-sprd driver, so the former should depend on the latter driver. ../drivers/pinctrl/sprd/pinctrl-sprd.c: In function 'sprd_dt_node_to_map': ../drivers/pinctrl/sprd/pinctrl-sprd.c:290:2: error: implicit declaration of function 'pinconf_generic_parse_dt_config' [-Werror=implicit-function-declaration] ret = pinconf_generic_parse_dt_config(np, pctldev, &configs, ^ ../drivers/pinctrl/sprd/pinctrl-sprd.c: At top level: ../drivers/pinctrl/sprd/pinctrl-sprd.c:844:44: error: array type has incomplete element type static const struct pinconf_generic_params sprd_dt_params[] = { ^ Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Baolin Wang <baolin.wang@spreadtrum.com> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: linux-gpio@vger.kernel.org Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2017-09-12pinctrl: sprd: make three local functions staticColin Ian King1-7/+7
The functions sprd_pmx_get_function_count, sprd_pmx_get_function_name and sprd_pmx_get_function_groups are local to the source and do not need to be in global scope, so make them static. Cleans up sparse warnings: "symbol 'sprd_pmx_get_function_count' was not declared. Should it be static?" "symbol 'sprd_pmx_get_function_name' was not declared. Should it be static?" "symbol 'sprd_pmx_get_function_groups' was not declared. Should it be static?" Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2017-09-12pinctrl: uniphier: include <linux/build_bug.h> instead of <linux/bug.h>Masahiro Yamada1-1/+1
The #includes <linux/bug.h> is here to use BUILD_BUG_ON_ZERO(). Thanks to commit bc6245e5efd7 ("bug: split BUILD_BUG stuff out into <linux/build_bug.h>"), it is now possible to reduce the number of headers pulled in. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2017-09-12ALSA: firewire: Use common error handling code in snd_motu_stream_start_duplex()Markus Elfring1-8/+8
Add a jump target so that a bit of exception handling can be better reused at the end of this function. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2017-09-12KVM: PPC: Book3S HV: Fix bug causing host SLB to be restored incorrectlyPaul Mackerras1-1/+16
Aneesh Kumar reported seeing host crashes when running recent kernels on POWER8. The symptom was an oops like this: Unable to handle kernel paging request for data at address 0xf00000000786c620 Faulting instruction address: 0xc00000000030e1e4 Oops: Kernel access of bad area, sig: 11 [#1] LE SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: powernv_op_panel CPU: 24 PID: 6663 Comm: qemu-system-ppc Tainted: G W 4.13.0-rc7-43932-gfc36c59 #2 task: c000000fdeadfe80 task.stack: c000000fdeb68000 NIP: c00000000030e1e4 LR: c00000000030de6c CTR: c000000000103620 REGS: c000000fdeb6b450 TRAP: 0300 Tainted: G W (4.13.0-rc7-43932-gfc36c59) MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24044428 XER: 20000000 CFAR: c00000000030e134 DAR: f00000000786c620 DSISR: 40000000 SOFTE: 0 GPR00: 0000000000000000 c000000fdeb6b6d0 c0000000010bd000 000000000000e1b0 GPR04: c00000000115e168 c000001fffa6e4b0 c00000000115d000 c000001e1b180386 GPR08: f000000000000000 c000000f9a8913e0 f00000000786c600 00007fff587d0000 GPR12: c000000fdeb68000 c00000000fb0f000 0000000000000001 00007fff587cffff GPR16: 0000000000000000 c000000000000000 00000000003fffff c000000fdebfe1f8 GPR20: 0000000000000004 c000000fdeb6b8a8 0000000000000001 0008000000000040 GPR24: 07000000000000c0 00007fff587cffff c000000fdec20bf8 00007fff587d0000 GPR28: c000000fdeca9ac0 00007fff587d0000 00007fff587c0000 00007fff587d0000 NIP [c00000000030e1e4] __get_user_pages_fast+0x434/0x1070 LR [c00000000030de6c] __get_user_pages_fast+0xbc/0x1070 Call Trace: [c000000fdeb6b6d0] [c00000000139dab8] lock_classes+0x0/0x35fe50 (unreliable) [c000000fdeb6b7e0] [c00000000030ef38] get_user_pages_fast+0xf8/0x120 [c000000fdeb6b830] [c000000000112318] kvmppc_book3s_hv_page_fault+0x308/0xf30 [c000000fdeb6b960] [c00000000010e10c] kvmppc_vcpu_run_hv+0xfdc/0x1f00 [c000000fdeb6bb20] [c0000000000e915c] kvmppc_vcpu_run+0x2c/0x40 [c000000fdeb6bb40] [c0000000000e5650] kvm_arch_vcpu_ioctl_run+0x110/0x300 [c000000fdeb6bbe0] [c0000000000d6468] kvm_vcpu_ioctl+0x528/0x900 [c000000fdeb6bd40] [c0000000003bc04c] do_vfs_ioctl+0xcc/0x950 [c000000fdeb6bde0] [c0000000003bc930] SyS_ioctl+0x60/0x100 [c000000fdeb6be30] [c00000000000b96c] system_call+0x58/0x6c Instruction dump: 7ca81a14 2fa50000 41de0010 7cc8182a 68c60002 78c6ffe2 0b060000 3cc2000a 794a3664 390610d8 e9080000 7d485214 <e90a0020> 7d435378 790507e1 408202f0 ---[ end trace fad4a342d0414aa2 ]--- It turns out that what has happened is that the SLB entry for the vmmemap region hasn't been reloaded on exit from a guest, and it has the wrong page size. Then, when the host next accesses the vmemmap region, it gets a page fault. Commit a25bd72badfa ("powerpc/mm/radix: Workaround prefetch issue with KVM", 2017-07-24) modified the guest exit code so that it now only clears out the SLB for hash guest. The code tests the radix flag and puts the result in a non-volatile CR field, CR2, and later branches based on CR2. Unfortunately, the kvmppc_save_tm function, which gets called between those two points, modifies all the user-visible registers in the case where the guest was in transactional or suspended state, except for a few which it restores (namely r1, r2, r9 and r13). Thus the hash/radix indication in CR2 gets corrupted. This fixes the problem by re-doing the comparison just before the result is needed. For good measure, this also adds comments next to the call sites of kvmppc_save_tm and kvmppc_restore_tm pointing out that non-volatile register state will be lost. Cc: stable@vger.kernel.org # v4.13 Fixes: a25bd72badfa ("powerpc/mm/radix: Workaround prefetch issue with KVM") Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-09-12KVM: PPC: Book3S HV: Hold kvm->lock around call to kvmppc_update_lpcrPaul Mackerras1-0/+2
Commit 468808bd35c4 ("KVM: PPC: Book3S HV: Set process table for HPT guests on POWER9", 2017-01-30) added a call to kvmppc_update_lpcr() which doesn't hold the kvm->lock mutex around the call, as required. This adds the lock/unlock pair, and for good measure, includes the kvmppc_setup_partition_table() call in the locked region, since it is altering global state of the VM. This error appears not to have any fatal consequences for the host; the consequences would be that the VCPUs could end up running with different LPCR values, or an update to the LPCR value by userspace using the one_reg interface could get overwritten, or the update done by kvmhv_configure_mmu() could get overwritten. Cc: stable@vger.kernel.org # v4.10+ Fixes: 468808bd35c4 ("KVM: PPC: Book3S HV: Set process table for HPT guests on POWER9") Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-09-12KVM: PPC: Book3S HV: Don't access XIVE PIPR register using byte accessesBenjamin Herrenschmidt3-5/+4
The XIVE interrupt controller on POWER9 machines doesn't support byte accesses to any register in the thread management area other than the CPPR (current processor priority register). In particular, when reading the PIPR (pending interrupt priority register), we need to do a 32-bit or 64-bit load. Cc: stable@vger.kernel.org # v4.13 Fixes: 2c4fb78f78b6 ("KVM: PPC: Book3S HV: Workaround POWER9 DD1.0 bug causing IPB bit loss") Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-09-11Merge branch 'next' of ↵Linus Torvalds24-106/+747
git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux Pull thermal updates from Zhang Rui: - fix resources release in error paths when registering thermal zone. (Christophe Jaillet) - introduce a new thermal driver for on-chip PVT (Process, Voltage and Temperature) monitoring unit implemented on UniPhier SoCs. This driver supports temperature monitoring and alert function. (Kunihiko Hayashi) - Add support for mt2712 chip in the mtk_thermal driver. (Louis Yu) - Add support for RK3328 SOC in rockchip_thermal driver. (Rocky Hao) - cleanup a couple of platform thermal drivers to constify thermal_zone_of_device_ops structures. (Julia Lawall) - a couple of fixes in int340x and intel_pch_thermal thermal driver. (Arvind Yadav, Sumeet Pawnikar, Brian Bian, Ed Swierk, Zhang Rui) * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (27 commits) Thermal: int3406_thermal: fix thermal sysfs I/F thermal: mediatek: minor mtk_thermal.c cleanups thermal: mediatek: extend calibration data for mt2712 chip thermal: mediatek: add Mediatek thermal driver for mt2712 dt-bindings: thermal: Add binding document for Mediatek thermal controller thermal: intel_pch_thermal: Fix enable check on Broadwell-DE thermal: rockchip: Support the RK3328 SOC in thermal driver dt-bindings: rockchip-thermal: Support the RK3328 SoC compatible thermal: bcm2835: constify thermal_zone_of_device_ops structures thermal: exynos: constify thermal_zone_of_device_ops structures thermal: zx2967: constify thermal_zone_of_device_ops structures thermal: rcar_gen3_thermal: constify thermal_zone_of_device_ops structures thermal: qoriq: constify thermal_zone_of_device_ops structures thermal: hisilicon: constify thermal_zone_of_device_ops structures thermal: core: Fix resources release in error paths in thermal_zone_device_register() thermal: core: Use the new 'thermal_zone_destroy_device_groups()' helper function thermal: core: Add some new helper functions to free resources thermal: int3400_thermal: process "thermal table changed" event thermal: uniphier: add UniPhier thermal driver dt-bindings: thermal: add binding documentation for UniPhier thermal monitor ...
2017-09-11Merge tag 'nfs-for-4.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds36-1133/+1228
Pull NFS client updates from Trond Myklebust: "Hightlights include: Stable bugfixes: - Fix mirror allocation in the writeback code to avoid a use after free - Fix the O_DSYNC writes to use the correct byte range - Fix 2 use after free issues in the I/O code Features: - Writeback fixes to split up the inode->i_lock in order to reduce contention - RPC client receive fixes to reduce the amount of time the xprt->transport_lock is held when receiving data from a socket into am XDR buffer. - Ditto fixes to reduce contention between call side users of the rdma rb_lock, and its use in rpcrdma_reply_handler. - Re-arrange rdma stats to reduce false cacheline sharing. - Various rdma cleanups and optimisations. - Refactor the NFSv4.1 exchange id code and clean up the code. - Const-ify all instances of struct rpc_xprt_ops Bugfixes: - Fix the NFSv2 'sec=' mount option. - NFSv4.1: don't use machine credentials for CLOSE when using 'sec=sys' - Fix the NFSv3 GRANT callback when the port changes on the server. - Fix livelock issues with COMMIT - NFSv4: Use correct inode in _nfs4_opendata_to_nfs4_state() when doing and NFSv4.1 open by filehandle" * tag 'nfs-for-4.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (69 commits) NFS: Count the bytes of skipped subrequests in nfs_lock_and_join_requests() NFS: Don't hold the group lock when calling nfs_release_request() NFS: Remove pnfs_generic_transfer_commit_list() NFS: nfs_lock_and_join_requests and nfs_scan_commit_list can deadlock NFS: Fix 2 use after free issues in the I/O code NFS: Sync the correct byte range during synchronous writes lockd: Delete an error message for a failed memory allocation in reclaimer() NFS: remove jiffies field from access cache NFS: flush data when locking a file to ensure cache coherence for mmap. SUNRPC: remove some dead code. NFS: don't expect errors from mempool_alloc(). xprtrdma: Use xprt_pin_rqst in rpcrdma_reply_handler xprtrdma: Re-arrange struct rx_stats NFS: Fix NFSv2 security settings NFSv4.1: don't use machine credentials for CLOSE when using 'sec=sys' SUNRPC: ECONNREFUSED should cause a rebind. NFS: Remove unused parameter gfp_flags from nfs_pageio_init() NFSv4: Fix up mirror allocation SUNRPC: Add a separate spinlock to protect the RPC request receive list SUNRPC: Cleanup xs_tcp_read_common() ...
2017-09-11f2fs: clear radix tree dirty tag of pages whose dirty flag is clearedDaeho Jeong2-0/+14
On a senario like writing out the first dirty page of the inode as the inline data, we only cleared dirty flags of the pages, but didn't clear the dirty tags of those pages in the radix tree. If we don't clear the dirty tags of the pages in the radix tree, the inodes which contain the pages will be marked with I_DIRTY_PAGES again and again, and writepages() for the inodes will be invoked in every writeback period. As a result, nothing will be done in every writepages() for the inodes and it will just consume CPU time meaninglessly. Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-11NFS: various changes relating to reporting IO errors.NeilBrown4-15/+19
1/ remove 'start' and 'end' args from nfs_file_fsync_commit(). They aren't used. 2/ Make nfs_context_set_write_error() a "static inline" in internal.h so we can... 3/ Use nfs_context_set_write_error() instead of mapping_set_error() if nfs_pageio_add_request() fails before sending any request. NFS generally keeps errors in the open_context, not the mapping, so this is more consistent. 4/ If filemap_write_and_write_range() reports any error, still check ctx->error. The value in ctx->error is likely to be more useful. As part of this, NFS_CONTEXT_ERROR_WRITE is cleared slightly earlier, before nfs_file_fsync_commit() is called, rather than at the start of that function. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-09-11NFS: Add static NFS I/O tracepointsChuck Lever3-0/+259
Tools like tcpdump and rpcdebug can be very useful. But there are plenty of environments where they are difficult or impossible to use. For example, we've had customers report I/O failures during workloads so heavy that collecting network traffic or enabling RPC debugging are themselves onerous. The kernel's static tracepoints are lightweight (less likely to introduce timing changes) and efficient (the trace data is compact). They also work in scenarios where capturing network traffic is not possible due to lack of hardware support (some InfiniBand HCAs) or where data or network privacy is a concern. Introduce tracepoints that show when an NFS READ, WRITE, or COMMIT is initiated, and when it completes. Record the arguments and results of each operation, which are not shown by existing sunrpc module's tracepoints. For instance, the recorded offset and count can be used to match an "initiate" event to a "done" event. If an NFS READ result returns fewer bytes than requested or zero, seeing the EOF flag can be probative. Seeing an NFS4ERR_BAD_STATEID result is also indication of a particular class of problems. The timing information attached to each event record can often be useful as well. Usage example: [root@manet tmp]# trace-cmd record -e nfs:*initiate* -e nfs:*done /sys/kernel/debug/tracing/events/nfs/*initiate*/filter /sys/kernel/debug/tracing/events/nfs/*done/filter Hit Ctrl^C to stop recording ^CKernel buffer statistics: Note: "entries" are the entries left in the kernel ring buffer and are not recorded in the trace data. They should all be zero. CPU: 0 entries: 0 overrun: 0 commit overrun: 0 bytes: 3680 oldest event ts: 78.367422 now ts: 100.124419 dropped events: 0 read events: 74 ... and so on. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-09-11pNFS: Use the standard I/O stateid when calling LAYOUTGETTrond Myklebust1-5/+9
Instead of having a private method for copying the open/delegation stateid, use the same call that is used for standard I/O through the MDS. Note that this means we transmit the stateid with a zero seqid, avoiding issues with NFS4ERR_OLD_STATEID. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-09-11Merge branch 'for-linus' of ↵Linus Torvalds40-297/+622
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace updates from Eric Biederman: "Life has been busy and I have not gotten half as much done this round as I would have liked. I delayed it so that a minor conflict resolution with the mips tree could spend a little time in linux-next before I sent this pull request. This includes two long delayed user namespace changes from Kirill Tkhai. It also includes a very useful change from Serge Hallyn that allows the security capability attribute to be used inside of user namespaces. The practical effect of this is people can now untar tarballs and install rpms in user namespaces. It had been suggested to generalize this and encode some of the namespace information information in the xattr name. Upon close inspection that makes the things that should be hard easy and the things that should be easy more expensive. Then there is my bugfix/cleanup for signal injection that removes the magic encoding of the siginfo union member from the kernel internal si_code. The mips folks reported the case where I had used FPE_FIXME me is impossible so I have remove FPE_FIXME from mips, while at the same time including a return statement in that case to keep gcc from complaining about unitialized variables. I almost finished the work to get make copy_siginfo_to_user a trivial copy to user. The code is available at: git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git neuter-copy_siginfo_to_user-v3 But I did not have time/energy to get the code posted and reviewed before the merge window opened. I was able to see that the security excuse for just copying fields that we know are initialized doesn't work in practice there are buggy initializations that don't initialize the proper fields in siginfo. So we still sometimes copy unitialized data to userspace" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: Introduce v3 namespaced file capabilities mips/signal: In force_fcr31_sig return in the impossible case signal: Remove kernel interal si_code magic fcntl: Don't use ambiguous SIG_POLL si_codes prctl: Allow local CAP_SYS_ADMIN changing exe_file security: Use user_namespace::level to avoid redundant iterations in cap_capable() userns,pidns: Verify the userns for new pid namespaces signal/testing: Don't look for __SI_FAULT in userspace signal/mips: Document a conflict with SI_USER with SIGFPE signal/sparc: Document a conflict with SI_USER with SIGFPE signal/ia64: Document a conflict with SI_USER with SIGFPE signal/alpha: Document a conflict with SI_USER for SIGTRAP
2017-09-11f2fs: speed up gc_urgent mode with SSRJaegeuk Kim3-13/+16
This patch activates SSR in gc_urgent mode. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-11f2fs: better to wait for fstrim completionJaegeuk Kim1-1/+6
In android, we'd better wait for fstrim completion instead of issuing the discard commands asynchronous. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-11block: directly insert blk-mq request from blk_insert_cloned_request()Jens Axboe3-1/+23
A NULL pointer crash was reported for the case of having the BFQ IO scheduler attached to the underlying blk-mq paths of a DM multipath device. The crash occured in blk_mq_sched_insert_request()'s call to e->type->ops.mq.insert_requests(). Paolo Valente correctly summarized why the crash occured with: "the call chain (dm_mq_queue_rq -> map_request -> setup_clone -> blk_rq_prep_clone) creates a cloned request without invoking e->type->ops.mq.prepare_request for the target elevator e. The cloned request is therefore not initialized for the scheduler, but it is however inserted into the scheduler by blk_mq_sched_insert_request." All said, a request-based DM multipath device's IO scheduler should be the only one used -- when the original requests are issued to the underlying paths as cloned requests they are inserted directly in the underlying dispatch queue(s) rather than through an additional elevator. But commit bd166ef18 ("blk-mq-sched: add framework for MQ capable IO schedulers") switched blk_insert_cloned_request() from using blk_mq_insert_request() to blk_mq_sched_insert_request(). Which incorrectly added elevator machinery into a call chain that isn't supposed to have any. To fix this introduce a blk-mq private blk_mq_request_bypass_insert() that blk_insert_cloned_request() calls to insert the request without involving any elevator that may be attached to the cloned request's request_queue. Fixes: bd166ef183c2 ("blk-mq-sched: add framework for MQ capable IO schedulers") Cc: stable@vger.kernel.org Reported-by: Bart Van Assche <Bart.VanAssche@wdc.com> Tested-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-11Merge branch 'nvme-4.14' of git://git.infradead.org/nvme into for-linusJens Axboe6-73/+70
Pull NVMe fixes from Christoph: "Below are a few small fixes for the current merge window: - fix string.h compilation failures with the new memcpy_and_pad helper (Martin Wilck) - fix incorrect dereference of a PCI data structure in the lightnvm support code (me) - HMB fixes (Akinobu Mita and me)"
2017-09-11net/sched: fix pointer check in gen_handleJosh Hunt1-1/+1
Fixes sparse warning about pointer in gen_handle: net/sched/cls_rsvp.h:392:40: warning: Using plain integer as NULL pointer Fixes: 8113c095672f6 ("net_sched: use void pointer for filter handle") Signed-off-by: Josh Hunt <johunt@akamai.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-11ipv6: sr: remove duplicate routing header type checkDavid Lebrun1-4/+0
As seg6_validate_srh() already checks that the Routing Header type is correct, it is not necessary to do it again in get_srh(). Fixes: 5829d70b ("ipv6: sr: fix get_srh() to comply with IPv6 standard "RFC 8200") Signed-off-by: David Lebrun <dlebrun@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-11xdp: implement xdp_redirect_map for generic XDPJesper Dangaard Brouer2-14/+28
Using bpf_redirect_map is allowed for generic XDP programs, but the appropriate map lookup was never performed in xdp_do_generic_redirect(). Instead the map-index is directly used as the ifindex. For the xdp_redirect_map sample in SKB-mode '-S', this resulted in trying sending on ifindex 0 which isn't valid, resulting in getting SKB packets dropped. Thus, the reported performance numbers are wrong in commit 24251c264798 ("samples/bpf: add option for native and skb mode for redirect apps") for the 'xdp_redirect_map -S' case. Before commit 109980b894e9 ("bpf: don't select potentially stale ri->map from buggy xdp progs") it could crash the kernel. Like this commit also check that the map_owner owner is correct before dereferencing the map pointer. But make sure that this API misusage can be caught by a tracepoint. Thus, allowing userspace via tracepoints to detect misbehaving bpf_progs. Fixes: 6103aa96ec07 ("net: implement XDP_REDIRECT for xdp generic") Fixes: 24251c264798 ("samples/bpf: add option for native and skb mode for redirect apps") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-11perf/bpf: fix a clang compilation issueYonghong Song2-1/+3
clang does not support variable length array for structure member. It has the following error during compilation: kernel/trace/trace_syscalls.c:568:17: error: fields must have a constant size: 'variable length array in structure' extension will never be supported unsigned long args[sys_data->nb_args]; ^ The fix is to use a fixed array length instead. Reported-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-11net: bonding: Fix transmit load balancing in balance-alb mode if specified ↵Kosuke Tatsukawa1-0/+3
by sysfs Commit cbf5ecb30560 ("net: bonding: Fix transmit load balancing in balance-alb mode") tried to fix transmit dynamic load balancing in balance-alb mode, which wasn't working after commit 8b426dc54cf4 ("bonding: remove hardcoded value"). It turned out that my previous patch only fixed the case when balance-alb was specified as bonding module parameter, and not when balance-alb mode was set using /sys/class/net/*/bonding/mode (the most common usage). In the latter case, tlb_dynamic_lb was set up according to the default mode of the bonding interface, which happens to be balance-rr. This additional patch addresses this issue by setting up tlb_dynamic_lb to 1 if "mode" is set to balance-alb through the sysfs interface. I didn't add code to change tlb_balance_lb back to the default value for other modes, because "mode" is usually set up only once during initialization, and it's not worthwhile to change the static variable bonding_defaults in bond_main.c to a global variable just for this purpose. Commit 8b426dc54cf4 also changes the value of tlb_dynamic_lb for balance-tlb mode if it is set up using the sysfs interface. I didn't change that behavior, because the value of tlb_balance_lb can be changed using the sysfs interface for balance-tlb, and I didn't like changing the default value back and forth for balance-tlb. As for balance-alb, /sys/class/net/*/bonding/tlb_balance_lb cannot be written to. However, I think balance-alb with tlb_dynamic_lb set to 0 is not an intended usage, so there is little use making it writable at this moment. Fixes: 8b426dc54cf4 ("bonding: remove hardcoded value") Reported-by: Reinis Rozitis <r@roze.lv> Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com> Cc: stable@vger.kernel.org # v4.12+ Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-11Input: ucb1400_ts - fix suspend and resume handlingDmitry Torokhov1-2/+2
Instead of stopping the touchscreen we were starting it in suspend, and disabling it in resume. Fixes: c899afedf168 ("Input: ucb1400_ts - convert to threaded IRQ") Reported-by: Anton Volkov <avolkov@ispras.ru> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2017-09-11Input: edt-ft5x06 - fix access to non-existing registerLuca Ceresoli1-1/+2
reg_addr->reg_report_rate is supposed to exist in M06, not M09. The driver is written to skip avoids access to non-existing registers when the register address is NO_REGISTER (0xff). But reg_addr->reg_report_rate is initialized to 0x00 by devm_kzalloc() (in edt_ft5x06_ts_probe()) and not changed thereafter. So the checks do not work and an access to register 0x00 is done. Fix by setting reg_addr->reg_report_rate to NO_REGISTER. Also fix the only place where reg_report_rate is checked against zero instead of NO_REGISTER. Signed-off-by: Luca Ceresoli <luca@lucaceresoli.net> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2017-09-11Input: elantech - make arrays debounce_packet static, reduces object code sizeColin Ian King1-2/+6
Don't populate the arrays debounce_packet on the stack, instead make them static. Makes the object code smaller by over 870 bytes: Before: text data bss dec hex filename 30553 9152 0 39705 9b19 drivers/input/mouse/elantech.o After: text data bss dec hex filename 29521 9312 0 38833 97b1 drivers/input/mouse/elantech.o Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2017-09-11Input: surface3_spi - make const array header static, reduces object code sizeColin Ian King1-1/+1
Don't populate the const array header on the stack, instead make it static. Makes the object code smaller by over 180 bytes: Before: text data bss dec hex filename 6003 1536 0 7539 1d73 surface3_spi.o After: text data bss dec hex filename 5726 1632 0 7358 1cbe surface3_spi.o Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2017-09-11Input: goodix - add support for capacitive home buttonSergei A. Trusov1-0/+9
On some x86 tablets with a Goodix touchscreen, the Windows logo on the front is a capacitive home button. Touching this button results in a touch with bit 4 of the first byte set, while only the lower 4 bits (0-3) are used to indicate the number of touches. Report a KEY_LEFTMETA press when this happens. Note that the hardware might support more than one button, in which case the "id" byte of coor_data would identify the button in question. This is not implemented as we don't have access to hardware with multiple buttons. Signed-off-by: Sergei A. Trusov <sergei.a.trusov@ya.ru> Acked-by: Bastien Nocera <hadess@hadess.net> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2017-09-11hv_netvsc: avoid unnecessary wakeups on subchannel creationStephen Hemminger1-2/+2
Only need to wakeup the initiator after all sub-channels are opened. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-11hv_netvsc: fix deadlock on hotplugStephen Hemminger4-45/+94
When a virtual device is added dynamically (via host console), then the vmbus sends an offer message for the primary channel. The processing of this message for networking causes the network device to then initialize the sub channels. The problem is that setting up the sub channels needs to wait until the subsequent subchannel offers have been processed. These offers come in on the same ring buffer and work queue as where the primary offer is being processed; leading to a deadlock. This did not happen in older kernels, because the sub channel waiting logic was broken (it wasn't really waiting). The solution is to do the sub channel setup in its own work queue context that is scheduled by the primary channel setup; and then happens later. Fixes: 732e49850c5e ("netvsc: fix race on sub channel creation") Reported-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-11mm/backing-dev.c: fix an error handling path in 'cgwb_create()'Christophe JAILLET1-2/+4
If the 'kmalloc' fails, we must go through the existing error handling path. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Fixes: 52ebea749aae ("writeback: make backing_dev_info host cgroup-specific bdi_writebacks") Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-11Merge tag 'libnvdimm-for-4.14' of ↵Linus Torvalds33-170/+397
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm from Dan Williams: "A rework of media error handling in the BTT driver and other updates. It has appeared in a few -next releases and collected some late- breaking build-error and warning fixups as a result. Summary: - Media error handling support in the Block Translation Table (BTT) driver is reworked to address sleeping-while-atomic locking and memory-allocation-context conflicts. - The dax_device lookup overhead for xfs and ext4 is moved out of the iomap hot-path to a mount-time lookup. - A new 'ecc_unit_size' sysfs attribute is added to advertise the read-modify-write boundary property of a persistent memory range. - Preparatory fix-ups for arm and powerpc pmem support are included along with other miscellaneous fixes" * tag 'libnvdimm-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (26 commits) libnvdimm, btt: fix format string warnings libnvdimm, btt: clean up warning and error messages ext4: fix null pointer dereference on sbi libnvdimm, nfit: move the check on nd_reserved2 to the endpoint dax: fix FS_DAX=n BLOCK=y compilation libnvdimm: fix integer overflow static analysis warning libnvdimm, nd_blk: remove mmio_flush_range() libnvdimm, btt: rework error clearing libnvdimm: fix potential deadlock while clearing errors libnvdimm, btt: cache sector_size in arena_info libnvdimm, btt: ensure that flags were also unchanged during a map_read libnvdimm, btt: refactor map entry operations with macros libnvdimm, btt: fix a missed NVDIMM_IO_ATOMIC case in the write path libnvdimm, nfit: export an 'ecc_unit_size' sysfs attribute ext4: perform dax_device lookup at mount ext2: perform dax_device lookup at mount xfs: perform dax_device lookup at mount dax: introduce a fs_dax_get_by_bdev() helper libnvdimm, btt: check memory allocation failure libnvdimm, label: fix index block size calculation ...
2017-09-11Merge tag 'pwm/for-4.14-rc1' of ↵Linus Torvalds21-307/+699
git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm Pull pwm updates from Thierry Reding: "The changes for this release include a new driver for the PWM controller found on SoCs of the ZTX ZX family. Support for an old SH-Mobile SoC has been dropped and the Rockchip and MediaTek drivers gain support for more generations. Other than that there are a bunch of coding style fixes, minor bug fixes and cleanup as well as documentation patches" * tag 'pwm/for-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (32 commits) pwm: pwm-samsung: fix suspend/resume support pwm: samsung: Remove redundant checks from pwm_samsung_config() pwm: mediatek: Disable clock on PWM configuration failure dt-bindings: pwm: Add MT2712/MT7622 information pwm: mediatek: Fix clock control issue pwm: mediatek: Fix PWM source clock selection pwm: mediatek: Fix Kconfig description pwm: tegra: Explicitly request exclusive reset control pwm: hibvt: Explicitly request exclusive reset control pwm: tiehrpwm: Set driver data before runtime PM enable pwm: tiehrpwm: Miscellaneous coding style fixups pwm: tiecap: Set driver data before runtime PM enable pwm: tiecap: Miscellaneous coding style fixups dt-bindings: pwm: tiecap: Add TI 66AK2G SoC specific compatible pwm: tiehrpwm: fix clock imbalance in probe error path pwm: tiehrpwm: Fix runtime PM imbalance at unbind pwm: Kconfig: Enable pwm-tiecap to be built for Keystone pwm: Add ZTE ZX PWM device driver dt-bindings: pwm: Add bindings doc for ZTE ZX PWM controller pwm: bcm2835: Support for polarity setting via DT ...
2017-09-11Merge branch 'bt-fix' (bluetooth fixes from Marcel)Linus Torvalds1-37/+43
Pull bluetooth fix from Marcel Holtmann: "All of our mgmt-tester, l2cap-test and rfcomm-tester unit tests are passing with this patch" * emailed patch from Marcel Holtmann <marcel@holtmann.org>: Bluetooth: Properly check L2CAP config option output buffer length
2017-09-11mlxsw: spectrum: Fix EEPROM access in case of SFP/SFP+Arkadi Sharshevsky1-2/+17
The current code does not handle correctly the access to the upper page in case of SFP/SFP+ EEPROM. In that case the offset should be local and the I2C address should be changed. Fixes: 2ea109039cd3 ("mlxsw: spectrum: Add support for access cable info via ethtool") Reported-by: Florian Klink <flokli@flokli.de> Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-11string.h: un-fortify memcpy_and_padMartin Wilck1-13/+2
The way I'd implemented the new helper memcpy_and_pad with __FORTIFY_INLINE caused compiler warnings for certain kernel configurations. This helper is only used in a single place at this time, and thus doesn't benefit much from fortification. So simplify the code by dropping fortification support for now. Fixes: 01f33c336e2d "string.h: add memcpy_and_pad()" Signed-off-by: Martin Wilck <mwilck@suse.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-09-11nvme-pci: implement the HMB entry number and size limitationsChristoph Hellwig4-2/+13
Adds support for the new Host Memory Buffer Minimum Descriptor Entry Size and Host Memory Maximum Descriptors Entries field that were added in TP 4002 HMB Enhancements. These allow the controller to advertise limits for the usual number of segments in the host memory buffer, as well as a minimum usable per-segment size. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com>
2017-09-11nvme-pci: propagate (some) errors from host memory buffer setupChristoph Hellwig1-6/+12
We want to catch command execution errors when resetting the device, so propagate errors from the Set Features when setting up the host memory buffer. We keep ignoring memory allocation failures, as the spec clearly says that the controller must work without a host memory buffer. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Cc: stable@vger.kernel.org
2017-09-11nvme-pci: use appropriate initial chunk size for HMB allocationAkinobu Mita1-1/+1
The initial chunk size for host memory buffer allocation is currently PAGE_SIZE << MAX_ORDER. MAX_ORDER order allocation is usually failed without CONFIG_DMA_CMA. So the HMB allocation is retried with chunk size PAGE_SIZE << (MAX_ORDER - 1) in general, but there is no problem if the retry allocation works correctly. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> [hch: rebased] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Cc: stable@vger.kernel.org
2017-09-11nvme-pci: fix host memory buffer allocation fallbackChristoph Hellwig1-18/+30
nvme_alloc_host_mem currently contains two loops that are interwinded, and the outer retry loop turns out to be broken. Fix this by untangling the two. Based on a report an initial patch from Akinobu Mita. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Akinobu Mita <akinobu.mita@gmail.com> Tested-by: Akinobu Mita <akinobu.mita@gmail.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Cc: stable@vger.kernel.org
2017-09-11nvme: fix lightnvm checkChristoph Hellwig4-35/+14
nvme_nvm_ns_supported assumes every device is a pci_dev, which leads to reading an incorrect field, or possible even a dereference of unallocated memory for fabrics controllers. Fix this by introducing a quirk for lighnvm capable devices instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matias Bjørling <mb@lightnvm.io> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2017-09-11block: fix integer overflow in __blkdev_sectors_to_bio_pages()Mikulas Patocka1-2/+2
Fix possible integer overflow in __blkdev_sectors_to_bio_pages if sector_t is 32-bit. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: 615d22a51c04 ("block: Fix __blkdev_issue_zeroout loop") Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-11block: sed-opal: Set MBRDone on S3 resume path if TPER is MBREnabledScott Bauer2-0/+33
Users who are booting off their Opal enabled drives are having issues when they have a shadow MBR set up after s3/resume cycle. When the Drive has a shadow MBR setup the MBRDone flag is set to false upon power loss (S3/S4/S5). When the MBRDone flag is false I/O to LBA 0 -> LBA_END_MBR are remapped to the shadow mbr of the drive. If the drive contains useful data in the 0 -> end_mbr range upon s3 resume the user can never get to that data as the drive will keep remapping it to the MBR. To fix this when we unlock on S3 resume, we need to tell the drive that we're done with the shadow mbr (even though we didnt use it) by setting true to MBRDone. This way the drive will stop the remapping and the user can access their data. Acked-by Jon Derrick: <jonathan.derrick@intel.com> Signed-off-by: Scott Bauer <scott.bauer@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-11block: tolerate tracing of NULL bioGreg Thelen1-3/+2
__get_request() can call trace_block_getrq() with bio=NULL which causes block_get_rq::TP_fast_assign() to deref a NULL pointer and panic. Syzkaller fuzzer panics with linux-next (1d53d908b79d7870d89063062584eead4cf83448): kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Modules linked in: CPU: 0 PID: 2983 Comm: syzkaller401111 Not tainted 4.13.0-rc7-next-20170901+ #13 task: ffff8801cf1da000 task.stack: ffff8801ce440000 RIP: 0010:perf_trace_block_get_rq+0x697/0x970 include/trace/events/block.h:384 RSP: 0018:ffff8801ce4473f0 EFLAGS: 00010246 RAX: ffff8801cf1da000 RBX: 1ffff10039c88e84 RCX: 1ffffd1ffff84d27 RDX: dffffc0000000001 RSI: 1ffff1003b643e7a RDI: ffffe8ffffc26938 RBP: ffff8801ce447530 R08: 1ffff1003b643e6c R09: ffffe8ffffc26964 R10: 0000000000000002 R11: fffff91ffff84d2d R12: ffffe8ffffc1f890 R13: ffffe8ffffc26930 R14: ffffffff85cad9e0 R15: 0000000000000000 FS: 0000000002641880(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000043e670 CR3: 00000001d1d7a000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: trace_block_getrq include/trace/events/block.h:423 [inline] __get_request block/blk-core.c:1283 [inline] get_request+0x1518/0x23b0 block/blk-core.c:1355 blk_old_get_request block/blk-core.c:1402 [inline] blk_get_request+0x1d8/0x3c0 block/blk-core.c:1427 sg_scsi_ioctl+0x117/0x750 block/scsi_ioctl.c:451 sg_ioctl+0x192d/0x2ed0 drivers/scsi/sg.c:1070 vfs_ioctl fs/ioctl.c:45 [inline] do_vfs_ioctl+0x1b1/0x1530 fs/ioctl.c:685 SYSC_ioctl fs/ioctl.c:700 [inline] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691 entry_SYSCALL_64_fastpath+0x1f/0xbe block_get_rq::TP_fast_assign() has multiple redundant ->dev assignments. Only one of them is NULL tolerant. Favor the NULL tolerant one. Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and partitions index") Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Thelen <gthelen@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-11dax: remove the pmem_dax_ops->flush abstractionMikulas Patocka8-77/+17
Commit abebfbe2f731 ("dm: add ->flush() dax operation support") is buggy. A DM device may be composed of multiple underlying devices and all of them need to be flushed. That commit just routes the flush request to the first device and ignores the other devices. It could be fixed by adding more complex logic to the device mapper. But there is only one implementation of the method pmem_dax_ops->flush - that is pmem_dax_flush() - and it calls arch_wb_cache_pmem(). Consequently, we don't need the pmem_dax_ops->flush abstraction at all, we can call arch_wb_cache_pmem() directly from dax_flush() because dax_dev->ops->flush can't ever reach anything different from arch_wb_cache_pmem(). It should be also pointed out that for some uses of persistent memory it is needed to flush only a very small amount of data (such as 1 cacheline), and it would be overkill if we go through that device mapper machinery for a single flushed cache line. Fix this by removing the pmem_dax_ops->flush abstraction and call arch_wb_cache_pmem() directly from dax_flush(). Also, remove the device mapper code that forwards the flushes. Fixes: abebfbe2f731 ("dm: add ->flush() dax operation support") Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-09-11dm integrity: use init_completion instead of COMPLETION_INITIALIZER_ONSTACKArnd Bergmann1-10/+10
The new lockdep support for completions causeed the stack usage in dm-integrity to explode, in case of write_journal from 504 bytes to 1120 (using arm gcc-7.1.1): drivers/md/dm-integrity.c: In function 'write_journal': drivers/md/dm-integrity.c:827:1: error: the frame size of 1120 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] The problem is that not only the size of 'struct completion' grows significantly, but we end up having multiple copies of it on the stack when we assign it from a local variable after the initial declaration. COMPLETION_INITIALIZER_ONSTACK() is the right thing to use when we want to declare and initialize a completion on the stack. However, this driver doesn't do that and instead initializes the completion just before it is used. In this case, init_completion() does the same thing more efficiently, and drops the stack usage for the function above down to 496 bytes. While the other functions in this file are not bad enough to cause a warning, they benefit equally from the change, so I do the change across the entire file. In the one place where we reuse a completion, I picked the cheaper reinit_completion() over init_completion(). Fixes: cd8084f91c02 ("locking/lockdep: Apply crossrelease to completions") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>