aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2016-08-05sh: system call wire upYoshinori Sato4-2/+58
Signed-off-by: Yoshinori Sato <[email protected]> Signed-off-by: Rich Felker <[email protected]>
2016-08-05sh: Delete unnecessary checks before the function call "mempool_destroy"Markus Elfring1-4/+2
The mempool_destroy() function tests whether its argument is NULL and then returns immediately. Thus the test around the calls is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <[email protected]> Signed-off-by: Rich Felker <[email protected]>
2016-08-05sh: do not perform IPI-based cache flush except on boards that need itRich Felker1-0/+3
Signed-off-by: Rich Felker <[email protected]>
2016-08-05sh: add SMP support for J2Rich Felker3-0/+145
Support is hooked up via a cpu start method specified in the device tree, and also depends on DT nodes that describe the interfaces for performing IPI and identifying which cpu execution is taking place on. The currently used method is a form of spin table, where secondary cpus are unblocked by writing to a special address. Signed-off-by: Rich Felker <[email protected]>
2016-08-05sh: SMP support for SH2 entry.SRich Felker1-0/+50
The SH2 version of entry.S uses global variables, which need to be cpu-local in order to work with SMP. For ease of access from asm, simply use arrays indexed by cpu number, and require the availability of an address (mmio register or properly setup per-cpu memory) from which the current cpu's index can be read. Signed-off-by: Rich Felker <[email protected]>
2016-08-05sh: add working futex atomic ops on userspace addresses for smpRich Felker4-124/+134
The version of futex.h in asm-generic should really be adapted to do the same thing so that this hideous code does not have to be duplicated per-arch. Signed-off-by: Rich Felker <[email protected]>
2016-08-05sh: add J2 atomics using the cas.l instructionRich Felker9-216/+481
Signed-off-by: Rich Felker <[email protected]>
2016-08-05sh: add AT_HWCAP flag for J-Core cas.l instructionRich Felker2-0/+3
The J-Core cpu has, as an ISA extension, an atomic compare-and-swap instruction cas.l which applications need to use (instead the imask or gusa atomic models, which are fundamentally limited to UP) for synchronization in order to be compatible with SMP systems. Provide a hwcap flag so that it's possible to do runtime selection and support both. Signed-off-by: Rich Felker <[email protected]>
2016-08-05sh: add support for J-Core J2 processorRich Felker10-5/+127
At the CPU/ISA level, the J2 is compatible with SH-2, and thus the changes to add J2 support build on existing SH-2 support. However, J2 does not duplicate the memory-mapped SH-2 features like the cache interface. Instead, the cache interfaces is described in the device tree, and new code is added to be able to access the flat device tree at early boot before it is unflattened. Support is also added for receiving interrupts on trap numbers in the range 16 to 31, since the J-Core aic1 interrupt controller generates these traps. This range was unused but nominally for hardware exceptions on SH-2, and a few values in this range were used for exceptions on SH-2A, but SH-2A has its own version of the relevant code. No individual cpu subtypes are added for J2 since the intent moving forward is to represent SoCs with device tree rather than as hard-coded subtypes in the kernel. The CPU_SUBTYPE_J2 Kconfig item exists only to fit into the existing cpu selection mechanism until it is overhauled. Signed-off-by: Rich Felker <[email protected]>
2016-08-04ACPI / hotplug / PCI: Runtime resume bridges before bus rescansRafael J. Wysocki1-0/+6
If a PCI bridge (or PCIe port) that is runtime-suspended gets an ACPI hotplug notification, such as a bus check, it has to be resumed before re-scanning the devices below it, or those devices will not be accessible and will be treated as hot-removed. Make that happen and let the bridge suspend again after the bus below it has been re-scanned. This is a replacement for commit 16468c783cb4 ("ACPI / hotplug / PCI: Runtime resume bridge before rescan") that has been reverted, because it introduced a system resume regression (due to missing bridge->pci_dev checks that are necessary in case the notification is targeted at the host bridge) and it is necessary for the code added by commit 006d44e49a25 ("PCI: Add runtime PM support for PCIe ports") to work as expected. Tested-by: Linus Torvalds <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-08-04.mailmap: Correct entries for Mauro Carvalho Chehab and Shuah KhanJoe Perches1-2/+11
mailmap entries may not use more than 2 email addresses per line. Signed-off-by: Joe Perches <[email protected]> Acked-by: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]>
2016-08-04Merge branch 'akpm' (patches from Andrew)Linus Torvalds6-11/+16
Merge misc fixes from Andrew Morton: "A few late-breaking fixes" * emailed patches from Andrew Morton <[email protected]>: mm/memblock.c: fix NULL dereference error MAINTAINERS: update cgroup's document path slub: drop bogus inline for fixup_red_left() powerpc/fsl_rio: fix a missing error code mm: initialise per_cpu_nodestats for all online pgdats at boot mm/memblock: fix a typo in a comment mm: disable CONFIG_MEMORY_HOTPLUG when KASAN is enabled
2016-08-04Merge tag 'for-linus-2' of ↵Linus Torvalds66-3012/+4323
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull second round of rdma updates from Doug Ledford: "This can be split out into just two categories: - fixes to the RDMA R/W API in regards to SG list length limits (about 5 patches) - fixes/features for the Intel hfi1 driver (everything else) The hfi1 driver is still being brought to full feature support by Intel, and they have a lot of people working on it, so that amounts to almost the entirety of this pull request" * tag 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (84 commits) IB/hfi1: Add cache evict LRU list IB/hfi1: Fix memory leak during unexpected shutdown IB/hfi1: Remove unneeded mm argument in remove function IB/hfi1: Consistently call ops->remove outside spinlock IB/hfi1: Use evict mmu rb operation IB/hfi1: Add evict operation to the mmu rb handler IB/hfi1: Fix TID caching actions IB/hfi1: Make the cache handler own its rb tree root IB/hfi1: Make use of mm consistent IB/hfi1: Fix user SDMA racy user request claim IB/hfi1: Fix error condition that needs to clean up IB/hfi1: Release node on insert failure IB/hfi1: Validate SDMA user iovector count IB/hfi1: Validate SDMA user request index IB/hfi1: Use the same capability state for all shared contexts IB/hfi1: Prevent null pointer dereference IB/hfi1: Rename TID mmu_rb_* functions IB/hfi1: Remove unneeded empty check in hfi1_mmu_rb_unregister() IB/hfi1: Restructure hfi1_file_open IB/hfi1: Make iovec loop index easy to understand ...
2016-08-04Merge tag 'for-linus' of ↵Linus Torvalds107-663/+16539
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull base rdma updates from Doug Ledford: "Round one of 4.8 code: while this is mostly normal, there is a new driver in here (the driver was hosted outside the kernel for several years and is actually a fairly mature and well coded driver). It amounts to 13,000 of the 16,000 lines of added code in here. Summary: - Updates/fixes for iw_cxgb4 driver - Updates/fixes for mlx5 driver - Add flow steering and RSS API - Add hardware stats to mlx4 and mlx5 drivers - Add firmware version API for RDMA driver use - Add the rxe driver (this is a software RoCE driver that makes any Ethernet device a RoCE device) - Fixes for i40iw driver - Support for send only multicast joins in the cma layer - Other minor fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (72 commits) Soft RoCE driver IB/core: Support for CMA multicast join flags IB/sa: Add cached attribute containing SM information to SA port IB/uverbs: Fix race between uverbs_close and remove_one IB/mthca: Clean up error unwind flow in mthca_reset() IB/mthca: NULL arg to pci_dev_put is OK IB/hfi1: NULL arg to sc_return_credits is OK IB/mlx4: Add diagnostic hardware counters net/mlx4: Query performance and diagnostics counters net/mlx4: Add diagnostic counters capability bit Use smaller 512 byte messages for portmapper messages IB/ipoib: Report SG feature regardless of HW UD CSUM capability IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct IB/hfi1: Disable by default IB/rdmavt: Disable by default IB/mlx5: Fix port counter ID association to QP offset IB/mlx5: Fix iteration overrun in GSI qps i40iw: Add NULL check for puda buffer i40iw: Change dup_ack_thresh to u8 i40iw: Remove unnecessary check for moving CQ head ...
2016-08-04Merge branch 'for-next' of ↵Linus Torvalds23-67/+5110
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target updates from Nicholas Bellinger: "The most notable item is IBM virtual SCSI target driver, that was originally ported to target-core back in 2010 by Tomo-san, and has been brought forward to v4.x code by Bryant Ly, Michael Cyr and co over the last months. Also included are two ORDERED task related bug-fixes Bryant + Michael found along the way using ibmvscsis with AIX guests, plus a few miscellaneous target-core + iscsi-target bug-fixes with associated stable tags" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: target: fix spelling mistake: "limitiation" -> "limitation" target: Fix residual overflow handling in target_complete_cmd_with_length tcm_fc: set and unset FCP_SPPF_TARG_FCN iscsi-target: Fix panic when adding second TCP connection to iSCSI session ibmvscsis: Initial commit of IBM VSCSI Tgt Driver target: Fix ordered task CHECK_CONDITION early exception handling target: Fix ordered task target_setup_cmd_from_cdb exception hang target: Fix max_unmap_lba_count calc overflow target: Fix race between iscsi-target connection shutdown + ABORT_TASK target: Fix missing complete during ABORT_TASK + CMD_T_FABRIC_STOP
2016-08-04mm/memblock.c: fix NULL dereference errorzijun_hu1-1/+4
It causes NULL dereference error and failure to get type_a->regions[0] info if parameter type_b of __next_mem_range_rev() == NULL Fix this by checking before dereferring and initializing idx_b to 0 The approach is tested by dumping all types of region via __memblock_dump_all() and __next_mem_range_rev() fixed to UART separately the result is okay after checking the logs. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: zijun_hu <[email protected]> Tested-by: zijun_hu <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-08-04MAINTAINERS: update cgroup's document pathseokhoon.yoon1-2/+2
cgroup's document path is changed to "cgroup-v1". update it. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: seokhoon.yoon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-08-04slub: drop bogus inline for fixup_red_left()Geert Uytterhoeven1-1/+1
With m68k-linux-gnu-gcc-4.1: include/linux/slub_def.h:126: warning: `fixup_red_left' declared inline after being called include/linux/slub_def.h:126: warning: previous declaration of `fixup_red_left' was here Commit c146a2b98eb5 ("mm, kasan: account for object redzone in SLUB's nearest_obj()") made fixup_red_left() global, but forgot to remove the inline keyword. Fixes: c146a2b98eb5898e ("mm, kasan: account for object redzone in SLUB's nearest_obj()") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Geert Uytterhoeven <[email protected]> Cc: Alexander Potapenko <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-08-04powerpc/fsl_rio: fix a missing error codeDan Carpenter1-0/+1
We should set the error code here rather than incorrectly returning 0. Otherwise static checkers complain. Link: http://lkml.kernel.org/r/20160804053525.GM775@mwanda Signed-off-by: Dan Carpenter <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Alexandre Bounine <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-08-04mm: initialise per_cpu_nodestats for all online pgdats at bootMel Gorman1-5/+5
Paul Mackerras and Reza Arbab reported that machines with memoryless nodes fail when vmstats are refreshed. Paul reported an oops as follows Unable to handle kernel paging request for data at address 0xff7a10000 Faulting instruction address: 0xc000000000270cd0 Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.7.0-kvm+ #118 task: c000000ff0680010 task.stack: c000000ff0704000 NIP: c000000000270cd0 LR: c000000000270ce8 CTR: 0000000000000000 REGS: c000000ff0707900 TRAP: 0300 Not tainted (4.7.0-kvm+) MSR: 9000000102009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE,TM[E]> CR: 846b6824 XER: 20000000 CFAR: c000000000008768 DAR: 0000000ff7a10000 DSISR: 42000000 SOFTE: 1 NIP refresh_zone_stat_thresholds+0x80/0x240 LR refresh_zone_stat_thresholds+0x98/0x240 Call Trace: refresh_zone_stat_thresholds+0xb8/0x240 (unreliable) Both supplied potential fixes but one potentially misses checks and another had redundant initialisations. This version initialises per_cpu_nodestats on a per-pgdat basis instead of on a per-zone basis. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mel Gorman <[email protected]> Reported-by: Paul Mackerras <[email protected]> Reported-by: Reza Arbab <[email protected]> Tested-by: Reza Arbab <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-08-04mm/memblock: fix a typo in a commentAlexander Kuleshov1-2/+2
s/accomodate/accommodate/ Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Alexander Kuleshov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-08-04mm: disable CONFIG_MEMORY_HOTPLUG when KASAN is enabledzhong jiang1-0/+1
At present it is obvious that memory online and offline will fail when KASAN is enabled. So add the condition to limit the memory_hotplug when KASAN is enabled. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: zhong jiang <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Dmitry Vyukov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-08-04Merge tag 'nfsd-4.8' of git://linux-nfs.org/~bfields/linuxLinus Torvalds37-335/+784
Pull nfsd updates from Bruce Fields: "Highlights: - Trond made a change to the server's tcp logic that allows a fast client to better take advantage of high bandwidth networks, but may increase the risk that a single client could starve other clients; a new sunrpc.svc_rpc_per_connection_limit parameter should help mitigate this in the (hopefully unlikely) event this becomes a problem in practice. - Tom Haynes added a minimal flex-layout pnfs server, which is of no use in production for now--don't build it unless you're doing client testing or further server development" * tag 'nfsd-4.8' of git://linux-nfs.org/~bfields/linux: (32 commits) nfsd: remove some dead code in nfsd_create_locked() nfsd: drop unnecessary MAY_EXEC check from create nfsd: clean up bad-type check in nfsd_create_locked nfsd: remove unnecessary positive-dentry check nfsd: reorganize nfsd_create nfsd: check d_can_lookup in fh_verify of directories nfsd: remove redundant zero-length check from create nfsd: Make creates return EEXIST instead of EACCES SUNRPC: Detect immediate closure of accepted sockets SUNRPC: accept() may return sockets that are still in SYN_RECV nfsd: allow nfsd to advertise multiple layout types nfsd: Close race between nfsd4_release_lockowner and nfsd4_lock nfsd/blocklayout: Make sure calculate signature/designator length aligned xfs: abstract block export operations from nfsd layouts SUNRPC: Remove unused callback xpo_adjust_wspace() SUNRPC: Change TCP socket space reservation SUNRPC: Add a server side per-connection limit SUNRPC: Micro optimisation for svc_data_ready SUNRPC: Call the default socket callbacks instead of open coding SUNRPC: lock the socket while detaching it ...
2016-08-04Merge branch 'for-linus-4.8' of ↵Linus Torvalds45-823/+1072
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull more btrfs updates from Chris Mason: "This is part two of my btrfs pull, which is some cleanups and a batch of fixes. Most of the code here is from Jeff Mahoney, making the pointers we pass around internally more consistent and less confusing overall. I noticed a small problem right before I sent this out yesterday, so I fixed it up and re-tested overnight" * 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (40 commits) Btrfs: fix __MAX_CSUM_ITEMS btrfs: btrfs_abort_transaction, drop root parameter btrfs: add btrfs_trans_handle->fs_info pointer btrfs: btrfs_relocate_chunk pass extent_root to btrfs_end_transaction btrfs: convert nodesize macros to static inlines btrfs: introduce BTRFS_MAX_ITEM_SIZE btrfs: cleanup, remove prototype for btrfs_find_root_ref btrfs: copy_to_sk drop unused root parameter btrfs: simpilify btrfs_subvol_inherit_props btrfs: tests, use BTRFS_FS_STATE_DUMMY_FS_INFO instead of dummy root btrfs: tests, require fs_info for root btrfs: tests, move initialization into tests/ btrfs: btrfs_test_opt and friends should take a btrfs_fs_info btrfs: prefix fsid to all trace events btrfs: plumb fs_info into btrfs_work btrfs: remove obsolete part of comment in statfs btrfs: hide test-only member under ifdef btrfs: Ratelimit "no csum found" info message btrfs: Add ratelimit to btrfs printing Btrfs: fix unexpected balance crash due to BUG_ON ...
2016-08-04Merge tag 'upstream-4.8-rc1' of git://git.infradead.org/linux-ubifsLinus Torvalds12-93/+278
Pull UBI/UBIFS updates from Richard Weinberger: "This contains mostly cleanups and minor improvements of UBI and UBIFS" * tag 'upstream-4.8-rc1' of git://git.infradead.org/linux-ubifs: ubi: Use bitmaps in Fastmap self-check code ubi: Be more paranoid while seaching for the most recent Fastmap ubi: Check whether the Fastmap anchor matches the super block ubi: Rework Fastmap attach base code ubi: Fix whitespace issue in count_fastmap_pebs() ubi: Introduce vol_ignored() ubi: Fix scan_fast() comment ubifs: switch_gc_head: Remove redondant sync of wbuf ubi: Make volume resize power cut aware ubi: Fix early logging ubi: gluebi: Fix double refcounting ubifs: Silence early error messages if MS_SILENT is set ubi: Fix race condition between ubi device creation and udev ubifs: Update comment for ubifs_errc ubi: Only read necessary size when reading the VID header ubifs: Make xattr structures static ubifs: Silence error output if MS_SILENT is set
2016-08-04Merge branch 'for-linus-4.8-rc1' of ↵Linus Torvalds9-24/+32
git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml Pull UML updates from Richard Weinberger: "Beside of various fixes this also contains patches to enable features such was Kcov, kmemleak and TRACE_IRQFLAGS_SUPPORT on UML" * 'for-linus-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: hostfs: Freeing an ERR_PTR in hostfs_fill_sb_common() um: Support kcov um: Enable TRACE_IRQFLAGS_SUPPORT um: Use asm-generic/irqflags.h um: Fix possible deadlock in sig_handler_common() um: Select HAVE_DEBUG_KMEMLEAK um: Setup physical memory in setup_arch() um: Eliminate null test after alloc_bootmem
2016-08-04DocBook: use DOCBOOKS="" to ignore DocBooks instead of IGNORE_DOCBOOKS=1Jani Nikula1-16/+10
Instead of a separate ignore flag, use the obvious DOCBOOKS="" to ignore all DocBook files. This is also in line with the Sphinx build being ignored if a non-empty DOCBOOKS make variable is specified on the make command line. This replaces the IGNORE_DOCBOOKS introduced in commit 547218864afb2745d9d137f005f3380ef96b26ab Author: Mauro Carvalho Chehab <[email protected]> Date: Sat Jul 9 13:12:45 2016 -0300 doc-rst: add an option to ignore DocBooks when generating docs and aligns with commit 6387872c86ea6698ed8faa3ccad1d1bd60f762f7 Author: Jani Nikula <[email protected]> Date: Fri Jul 1 15:24:44 2016 +0300 Documentation/sphinx: skip build if user requested specific DOCBOOKS Cc: Daniel Vetter <[email protected]> Signed-off-by: Jani Nikula <[email protected]> Tested-by: Mauro Carvalho Chehab <[email protected]> Signed-off-by: Jonathan Corbet <[email protected]>
2016-08-04Merge branch 'parisc-4.8-1' of ↵Linus Torvalds4-35/+182
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc updates from Helge Deller: - added an optimized hash implementation for parisc (George Spelvin) - C99 style cleanups in iomap.c (Amitoj Kaur Chawla) - added breaks to switch statement in PDC function (noticed by Dan Carpenter) * 'parisc-4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Change structure intialisation to C99 style in iomap.c parisc: Add break statements to pdc_pat_io_pci_cfg_read() parisc: Add <asm/hash.h>
2016-08-04Merge branch 'for-linus' of ↵Linus Torvalds7-285/+344
git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu Pull m68knommu updates from Greg Ungerer: "This series is all about Nicolas flat format support for MMU systems. Traditional m68k no-MMU flat format binaries can now be run on m68k MMU enabled systems too. The series includes some nice cleanups of the binfmt_flat code and converts it to using proper user space accessor functions. With all this in place you can boot and run a complete no-MMU flat format based user space on an MMU enabled system" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu: m68k: enable binfmt_flat on systems with an MMU binfmt_flat: allow compressed flat binary format to work on MMU systems binfmt_flat: add MMU-specific support binfmt_flat: update libraries' data segment pointer with userspace accessors binfmt_flat: use clear_user() rather than memset() to clear .bss binfmt_flat: use proper user space accessors with old relocs code binfmt_flat: use proper user space accessors with relocs processing code binfmt_flat: clean up create_flat_tables() and stack accesses binfmt_flat: use generic transfer_args_to_stack() elf_fdpic_transfer_args_to_stack(): make it generic binfmt_flat: prevent kernel dammage from corrupted executable headers binfmt_flat: convert printk invocations to their modern form binfmt_flat: assorted cleanups m68k: use same start_thread() on MMU and no-MMU m68k: fix file path comment m68k: fix bFLT executable running on MMU enabled systems
2016-08-04nfsd: remove some dead code in nfsd_create_locked()Dan Carpenter1-3/+2
We changed this around in f135af1041f ('nfsd: reorganize nfsd_create') so "dchild" can't be an error pointer any more. Also, dchild can't be NULL here (and dput would already handle this even if it was). Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2016-08-04nfsd: drop unnecessary MAY_EXEC check from createJ. Bruce Fields2-11/+2
We need an fh_verify to make sure we at least have a dentry, but actual permission checks happen later. Signed-off-by: J. Bruce Fields <[email protected]>
2016-08-04nfsd: clean up bad-type check in nfsd_create_lockedJ. Bruce Fields1-7/+4
Minor cleanup, no change in behavior. Signed-off-by: J. Bruce Fields <[email protected]>
2016-08-04nfsd: remove unnecessary positive-dentry checkJ. Bruce Fields1-10/+0
vfs_{create,mkdir,mknod} each begin with a call to may_create(), which returns EEXIST if the object already exists. This check is therefore unnecessary. (In the NFSv2 case, nfsd_proc_create also has such a check. Contrary to RFC 1094, our code seems to believe that a CREATE of an existing file should succeed. I'm leaving that behavior alone.) Signed-off-by: J. Bruce Fields <[email protected]>
2016-08-04nfsd: reorganize nfsd_createJ. Bruce Fields3-55/+61
There's some odd logic in nfsd_create() that allows it to be called with the parent directory either locked or unlocked. The only already-locked caller is NFSv2's nfsd_proc_create(). It's less confusing to split out the unlocked case into a separate function which the NFSv2 code can call directly. Also fix some comments while we're here. Signed-off-by: J. Bruce Fields <[email protected]>
2016-08-04nfsd: check d_can_lookup in fh_verify of directoriesJ. Bruce Fields2-13/+10
Create and other nfsd ops generally assume we can call lookup_one_len on inodes with S_IFDIR set. Al says that this assumption isn't true in general, though it should be for the filesystem objects nfsd sees. Add a check just to make sure our assumption isn't violated. Remove a couple checks for i_op->lookup in create code. Cc: Al Viro <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2016-08-04nfsd: remove redundant zero-length check from createJ. Bruce Fields2-6/+0
lookup_one_len already has this check. The only effect of this patch is to return access instead of perm in the 0-length-filename case. I actually prefer nfserr_perm (or _inval?), but I doubt anyone cares. The isdotent check seems redundant too, but I worry that some client might actually care about that strange nfserr_exist error. Signed-off-by: J. Bruce Fields <[email protected]>
2016-08-04nfsd: Make creates return EEXIST instead of EACCESOleg Drokin2-2/+15
When doing a create (mkdir/mknod) on a name, it's worth checking the name exists first before returning EACCES in case the directory is not writeable by the user. This makes return values on the client more consistent regardless of whenever the entry there is cached in the local cache or not. Another positive side effect is certain programs only expect EEXIST in that case even despite POSIX allowing any valid error to be returned. Signed-off-by: Oleg Drokin <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2016-08-04mm/block: convert rw_page users to bio op useMike Christie11-59/+60
The rw_page users were not converted to use bio/req ops. As a result bdev_write_page is not passing down REQ_OP_WRITE and the IOs will be sent down as reads. Signed-off-by: Mike Christie <[email protected]> Fixes: 4e1b2d52a80d ("block, fs, drivers: remove REQ_OP compat defs and related code") Modified by me to: 1) Drop op_flags passing into ->rw_page(), as we don't use it. 2) Make op_is_write() and friends safe to use for !CONFIG_BLOCK Signed-off-by: Jens Axboe <[email protected]>
2016-08-04loop: make do_req_filebacked more robustChristoph Hellwig1-33/+22
Use a switch statement to iterate over the possible operations and error out if it's an incorrect one. Signed-off-by: Jens Axboe <[email protected]>
2016-08-04loop: don't try to use AIO for discardsChristoph Hellwig1-4/+8
Fix a fat-fingered conversion to the req_op accessors, and also use a switch statement to make it more obvious what is being checked. Signed-off-by: Christoph Hellwig <[email protected]> Reported-by: Dave Chinner <[email protected]> Fixes: c2df40 ("drivers: use req op accessor"); Reviewed-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2016-08-04blk-mq: fix deadlock in blk_mq_register_disk() error pathJens Axboe1-4/+8
If we fail registering any of the hardware queues, we call into blk_mq_unregister_disk() with the hotplug mutex already held. Since blk_mq_unregister_disk() attempts to acquire the same mutex, we end up in a less than happy place. Reported-by: Jinpu Wang <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2016-08-04Include: blkdev: Removed duplicate 'struct request;' declaration.John Pittman1-1/+0
In include/linux/blkdev.h duplicate declarations of the request struct exist. Cleaned up by removing the second, unneeded declaration. Signed-off-by: John Pittman <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2016-08-04Fixup direct bi_rw modifiersShaun Tancheff1-1/+1
bi_rw should be using bio_set_op_attrs to set bi_rw. Signed-off-by: Shaun Tancheff <[email protected]> Cc: Chris Mason <[email protected]> Cc: Josef Bacik <[email protected]> Cc: David Sterba <[email protected]> Cc: Mike Christie <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2016-08-04block: fix bdi vs gendisk lifetime mismatchDan Williams4-1/+22
The name for a bdi of a gendisk is derived from the gendisk's devt. However, since the gendisk is destroyed before the bdi it leaves a window where a new gendisk could dynamically reuse the same devt while a bdi with the same name is still live. Arrange for the bdi to hold a reference against its "owner" disk device while it is registered. Otherwise we can hit sysfs duplicate name collisions like the following: WARNING: CPU: 10 PID: 2078 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x64/0x80 sysfs: cannot create duplicate filename '/devices/virtual/bdi/259:1' Hardware name: HP ProLiant DL580 Gen8, BIOS P79 05/06/2015 0000000000000286 0000000002c04ad5 ffff88006f24f970 ffffffff8134caec ffff88006f24f9c0 0000000000000000 ffff88006f24f9b0 ffffffff8108c351 0000001f0000000c ffff88105d236000 ffff88105d1031e0 ffff8800357427f8 Call Trace: [<ffffffff8134caec>] dump_stack+0x63/0x87 [<ffffffff8108c351>] __warn+0xd1/0xf0 [<ffffffff8108c3cf>] warn_slowpath_fmt+0x5f/0x80 [<ffffffff812a0d34>] sysfs_warn_dup+0x64/0x80 [<ffffffff812a0e1e>] sysfs_create_dir_ns+0x7e/0x90 [<ffffffff8134faaa>] kobject_add_internal+0xaa/0x320 [<ffffffff81358d4e>] ? vsnprintf+0x34e/0x4d0 [<ffffffff8134ff55>] kobject_add+0x75/0xd0 [<ffffffff816e66b2>] ? mutex_lock+0x12/0x2f [<ffffffff8148b0a5>] device_add+0x125/0x610 [<ffffffff8148b788>] device_create_groups_vargs+0xd8/0x100 [<ffffffff8148b7cc>] device_create_vargs+0x1c/0x20 [<ffffffff811b775c>] bdi_register+0x8c/0x180 [<ffffffff811b7877>] bdi_register_dev+0x27/0x30 [<ffffffff813317f5>] add_disk+0x175/0x4a0 Cc: <[email protected]> Reported-by: Yi Zhang <[email protected]> Tested-by: Yi Zhang <[email protected]> Signed-off-by: Dan Williams <[email protected]> Fixed up missing 0 return in bdi_register_owner(). Signed-off-by: Jens Axboe <[email protected]>
2016-08-04blk-mq: Allow timeouts to run while queue is freezingGabriel Krisman Bertazi1-1/+14
In case a submitted request gets stuck for some reason, the block layer can prevent the request starvation by starting the scheduled timeout work. If this stuck request occurs at the same time another thread has started a queue freeze, the blk_mq_timeout_work will not be able to acquire the queue reference and will return silently, thus not issuing the timeout. But since the request is already holding a q_usage_counter reference and is unable to complete, it will never release its reference, preventing the queue from completing the freeze started by first thread. This puts the request_queue in a hung state, forever waiting for the freeze completion. This was observed while running IO to a NVMe device at the same time we toggled the CPU hotplug code. Eventually, once a request got stuck requiring a timeout during a queue freeze, we saw the CPU Hotplug notification code get stuck inside blk_mq_freeze_queue_wait, as shown in the trace below. [c000000deaf13690] [c000000deaf13738] 0xc000000deaf13738 (unreliable) [c000000deaf13860] [c000000000015ce8] __switch_to+0x1f8/0x350 [c000000deaf138b0] [c000000000ade0e4] __schedule+0x314/0x990 [c000000deaf13940] [c000000000ade7a8] schedule+0x48/0xc0 [c000000deaf13970] [c0000000005492a4] blk_mq_freeze_queue_wait+0x74/0x110 [c000000deaf139e0] [c00000000054b6a8] blk_mq_queue_reinit_notify+0x1a8/0x2e0 [c000000deaf13a40] [c0000000000e7878] notifier_call_chain+0x98/0x100 [c000000deaf13a90] [c0000000000b8e08] cpu_notify_nofail+0x48/0xa0 [c000000deaf13ac0] [c0000000000b92f0] _cpu_down+0x2a0/0x400 [c000000deaf13b90] [c0000000000b94a8] cpu_down+0x58/0xa0 [c000000deaf13bc0] [c0000000006d5dcc] cpu_subsys_offline+0x2c/0x50 [c000000deaf13bf0] [c0000000006cd244] device_offline+0x104/0x140 [c000000deaf13c30] [c0000000006cd40c] online_store+0x6c/0xc0 [c000000deaf13c80] [c0000000006c8c78] dev_attr_store+0x68/0xa0 [c000000deaf13cc0] [c0000000003974d0] sysfs_kf_write+0x80/0xb0 [c000000deaf13d00] [c0000000003963e8] kernfs_fop_write+0x188/0x200 [c000000deaf13d50] [c0000000002e0f6c] __vfs_write+0x6c/0xe0 [c000000deaf13d90] [c0000000002e1ca0] vfs_write+0xc0/0x230 [c000000deaf13de0] [c0000000002e2cdc] SyS_write+0x6c/0x110 [c000000deaf13e30] [c000000000009204] system_call+0x38/0xb4 The fix is to allow the timeout work to execute in the window between dropping the initial refcount reference and the release of the last reference, which actually marks the freeze completion. This can be achieved with percpu_refcount_tryget, which does not require the counter to be alive. This way the timeout work can do it's job and terminate a stuck request even during a freeze, returning its reference and avoiding the deadlock. Allowing the timeout to run is just a part of the fix, since for some devices, we might get stuck again inside the device driver's timeout handler, should it attempt to allocate a new request in that path - which is a quite common action for Abort commands, which need to be sent after a timeout. In NVMe, for instance, we call blk_mq_alloc_request from inside the timeout handler, which will fail during a freeze, since it also tries to acquire a queue reference. I considered a similar change to blk_mq_alloc_request as a generic solution for further device driver hangs, but we can't do that, since it would allow new requests to disturb the freeze process. I thought about creating a new function in the block layer to support unfreezable requests for these occasions, but after working on it for a while, I feel like this should be handled in a per-driver basis. I'm now experimenting with changes to the NVMe timeout path, but I'm open to suggestions of ways to make this generic. Signed-off-by: Gabriel Krisman Bertazi <[email protected]> Cc: Brian King <[email protected]> Cc: Keith Busch <[email protected]> Cc: [email protected] Cc: [email protected] Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2016-08-04nbd: fix race in ioctlVegard Nossum1-8/+4
Quentin ran into this bug: WARNING: CPU: 64 PID: 10085 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x65/0x80 sysfs: cannot create duplicate filename '/devices/virtual/block/nbd3/pid' Modules linked in: nbd CPU: 64 PID: 10085 Comm: qemu-nbd Tainted: G D 4.6.0+ #7 0000000000000000 ffff8820330bba68 ffffffff814b8791 ffff8820330bbac8 0000000000000000 ffff8820330bbab8 ffffffff810d04ab ffff8820330bbaa8 0000001f00000296 0000000000017681 ffff8810380bf000 ffffffffa0001790 Call Trace: [<ffffffff814b8791>] dump_stack+0x4d/0x6c [<ffffffff810d04ab>] __warn+0xdb/0x100 [<ffffffff810d0574>] warn_slowpath_fmt+0x44/0x50 [<ffffffff81218c65>] sysfs_warn_dup+0x65/0x80 [<ffffffff81218a02>] sysfs_add_file_mode_ns+0x172/0x180 [<ffffffff81218a35>] sysfs_create_file_ns+0x25/0x30 [<ffffffff81594a76>] device_create_file+0x36/0x90 [<ffffffffa0000e8d>] __nbd_ioctl+0x32d/0x9b0 [nbd] [<ffffffff814cc8e8>] ? find_next_bit+0x18/0x20 [<ffffffff810f7c29>] ? select_idle_sibling+0xe9/0x120 [<ffffffff810f6cd7>] ? __enqueue_entity+0x67/0x70 [<ffffffff810f9bf0>] ? enqueue_task_fair+0x630/0xe20 [<ffffffff810efa76>] ? resched_curr+0x36/0x70 [<ffffffff810f0078>] ? check_preempt_curr+0x78/0x90 [<ffffffff810f00a2>] ? ttwu_do_wakeup+0x12/0x80 [<ffffffff810f01b1>] ? ttwu_do_activate.constprop.86+0x61/0x70 [<ffffffff810f0c15>] ? try_to_wake_up+0x185/0x2d0 [<ffffffff810f0d6d>] ? default_wake_function+0xd/0x10 [<ffffffff81105471>] ? autoremove_wake_function+0x11/0x40 [<ffffffffa0001577>] nbd_ioctl+0x67/0x94 [nbd] [<ffffffff814ac0fd>] blkdev_ioctl+0x14d/0x940 [<ffffffff811b0da2>] ? put_pipe_info+0x22/0x60 [<ffffffff811d96cc>] block_ioctl+0x3c/0x40 [<ffffffff811ba08d>] do_vfs_ioctl+0x8d/0x5e0 [<ffffffff811aa329>] ? ____fput+0x9/0x10 [<ffffffff810e9092>] ? task_work_run+0x72/0x90 [<ffffffff811ba627>] SyS_ioctl+0x47/0x80 [<ffffffff8185f5df>] entry_SYSCALL_64_fastpath+0x17/0x93 ---[ end trace 7899b295e4f850c8 ]--- It seems fairly obvious that device_create_file() is not being protected from being run concurrently on the same nbd. Quentin found the following relevant commits: 1a2ad21 nbd: add locking to nbd_ioctl 90b8f28 [PATCH] end of methods switch: remove the old ones d4430d6 [PATCH] beginning of methods conversion 08f8585 [PATCH] move block_device_operations to blkdev.h It would seem that the race was introduced in the process of moving nbd from BKL to unlocked ioctls. By setting nbd->task_recv while the mutex is held, we can prevent other processes from running concurrently (since nbd->task_recv is also checked while the mutex is held). Reported-and-tested-by: Quentin Casasnovas <[email protected]> Cc: Markus Pargmann <[email protected]> Cc: Paul Clements <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: Vegard Nossum <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2016-08-04block: fix use-after-free in seq fileVegard Nossum1-0/+1
I got a KASAN report of use-after-free: ================================================================== BUG: KASAN: use-after-free in klist_iter_exit+0x61/0x70 at addr ffff8800b6581508 Read of size 8 by task trinity-c1/315 ============================================================================= BUG kmalloc-32 (Not tainted): kasan: bad access detected ----------------------------------------------------------------------------- Disabling lock debugging due to kernel taint INFO: Allocated in disk_seqf_start+0x66/0x110 age=144 cpu=1 pid=315 ___slab_alloc+0x4f1/0x520 __slab_alloc.isra.58+0x56/0x80 kmem_cache_alloc_trace+0x260/0x2a0 disk_seqf_start+0x66/0x110 traverse+0x176/0x860 seq_read+0x7e3/0x11a0 proc_reg_read+0xbc/0x180 do_loop_readv_writev+0x134/0x210 do_readv_writev+0x565/0x660 vfs_readv+0x67/0xa0 do_preadv+0x126/0x170 SyS_preadv+0xc/0x10 do_syscall_64+0x1a1/0x460 return_from_SYSCALL_64+0x0/0x6a INFO: Freed in disk_seqf_stop+0x42/0x50 age=160 cpu=1 pid=315 __slab_free+0x17a/0x2c0 kfree+0x20a/0x220 disk_seqf_stop+0x42/0x50 traverse+0x3b5/0x860 seq_read+0x7e3/0x11a0 proc_reg_read+0xbc/0x180 do_loop_readv_writev+0x134/0x210 do_readv_writev+0x565/0x660 vfs_readv+0x67/0xa0 do_preadv+0x126/0x170 SyS_preadv+0xc/0x10 do_syscall_64+0x1a1/0x460 return_from_SYSCALL_64+0x0/0x6a CPU: 1 PID: 315 Comm: trinity-c1 Tainted: G B 4.7.0+ #62 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 ffffea0002d96000 ffff880119b9f918 ffffffff81d6ce81 ffff88011a804480 ffff8800b6581500 ffff880119b9f948 ffffffff8146c7bd ffff88011a804480 ffffea0002d96000 ffff8800b6581500 fffffffffffffff4 ffff880119b9f970 Call Trace: [<ffffffff81d6ce81>] dump_stack+0x65/0x84 [<ffffffff8146c7bd>] print_trailer+0x10d/0x1a0 [<ffffffff814704ff>] object_err+0x2f/0x40 [<ffffffff814754d1>] kasan_report_error+0x221/0x520 [<ffffffff8147590e>] __asan_report_load8_noabort+0x3e/0x40 [<ffffffff83888161>] klist_iter_exit+0x61/0x70 [<ffffffff82404389>] class_dev_iter_exit+0x9/0x10 [<ffffffff81d2e8ea>] disk_seqf_stop+0x3a/0x50 [<ffffffff8151f812>] seq_read+0x4b2/0x11a0 [<ffffffff815f8fdc>] proc_reg_read+0xbc/0x180 [<ffffffff814b24e4>] do_loop_readv_writev+0x134/0x210 [<ffffffff814b4c45>] do_readv_writev+0x565/0x660 [<ffffffff814b8a17>] vfs_readv+0x67/0xa0 [<ffffffff814b8de6>] do_preadv+0x126/0x170 [<ffffffff814b92ec>] SyS_preadv+0xc/0x10 This problem can occur in the following situation: open() - pread() - .seq_start() - iter = kmalloc() // succeeds - seqf->private = iter - .seq_stop() - kfree(seqf->private) - pread() - .seq_start() - iter = kmalloc() // fails - .seq_stop() - class_dev_iter_exit(seqf->private) // boom! old pointer As the comment in disk_seqf_stop() says, stop is called even if start failed, so we need to reinitialise the private pointer to NULL when seq iteration stops. An alternative would be to set the private pointer to NULL when the kmalloc() in disk_seqf_start() fails. Cc: [email protected] Signed-off-by: Vegard Nossum <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2016-08-04f2fs: drop bio->bi_rw manual assignmentJens Axboe1-1/+0
Merge 4fc29c1aa375 included this extra line, but it's not needed (or useful) since we'll bio_set_op_attrs() right after to properly set the op and flags for the bio. Signed-off-by: Jens Axboe <[email protected]>
2016-08-04block: add missing group association in bio-cloning functionsPaolo Valente3-6/+18
When a bio is cloned, the newly created bio must be associated with the same blkcg as the original bio (if BLK_CGROUP is enabled). If this operation is not performed, then the new bio is not associated with any group, and the group of the current task is returned when the group of the bio is requested. Depending on the cloning frequency, this may cause a large percentage of the bios belonging to a given group to be treated as if belonging to other groups (in most cases as if belonging to the root group). The expected group isolation may thereby be broken. This commit adds the missing association in bio-cloning functions. Fixes: da2f0f74cf7d ("Btrfs: add support for blkio controllers") Cc: [email protected] # v4.3+ Signed-off-by: Paolo Valente <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Reviewed-by: Jeff Moyer <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2016-08-04blkcg: kill unused field nr_undestroyed_grpsHou Tao1-5/+0
'nr_undestroyed_grps' in struct throtl_data was used to count the number of throtl_grp related with throtl_data, but now throtl_grp is tracked by blkcg_gq, so it is useless anymore. Signed-off-by: Hou Tao <[email protected]> Signed-off-by: Jens Axboe <[email protected]>