aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2018-07-26scsi: virtio_scsi: fix pi_bytes{out,in} on 4 KiB block size devicesGreg Edwards1-4/+4
When the underlying device is a 4 KiB logical block size device with a protection interval exponent of 0, i.e. 4096 bytes data + 8 bytes PI, the driver miscalculates the pi_bytes{out,in} by a factor of 8x (64 bytes). This leads to errors on all reads and writes on 4 KiB logical block size devices when CONFIG_BLK_DEV_INTEGRITY is enabled and the VIRTIO_SCSI_F_T10_PI feature bit has been negotiated. Fixes: e6dc783a38ec0 ("virtio-scsi: Enable DIF/DIX modes in SCSI host LLD") Acked-by: Martin K. Petersen <[email protected]> Signed-off-by: Greg Edwards <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-07-26block: move bio_integrity_{intervals,bytes} into blkdev.hGreg Edwards2-22/+34
This allows bio_integrity_bytes() to be called from drivers instead of open coding it. Acked-by: Martin K. Petersen <[email protected]> Signed-off-by: Greg Edwards <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-07-25xen/blkfront: remove unused macrosJuergen Gross1-5/+0
Remove some macros not used anywhere. Acked-by: Roger Pau Monné <[email protected]> Signed-off-by: Juergen Gross <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-07-25Merge branch 'nvme-4.19' of git://git.infradead.org/nvme into for-4.19/blockJens Axboe16-266/+536
Pull NVMe updates from Christoph: "Highlights: - massively improved tracepoints (Keith Busch) - support for larger inline data in the RDMA host and target (Steve Wise) - RDMA setup/teardown path fixes and refactor (Sagi Grimberg) - Command Supported and Effects log support for the NVMe target (Chaitanya Kulkarni) - buffered I/O support for the NVMe target (Chaitanya Kulkarni) plus the usual set of cleanups and small enhancements." * 'nvme-4.19' of git://git.infradead.org/nvme: nvmet: don't use uuid_le type nvmet: check fileio lba range access boundaries nvmet: fix file discard return status nvme-rdma: centralize admin/io queue teardown sequence nvme-rdma: centralize controller setup sequence nvme-rdma: unquiesce queues when deleting the controller nvme-rdma: mark expected switch fall-through nvme: add disk name to trace events nvme: add controller name to trace events nvme: use hw qid in trace events nvme: cache struct nvme_ctrl reference to struct nvme_request nvmet-rdma: add an error flow for post_recv failures nvmet-rdma: add unlikely check in the fast path nvmet-rdma: support max(16KB, PAGE_SIZE) inline data nvme-rdma: support up to 4 segments of inline data nvmet: add buffered I/O support for file backed ns nvmet: add commands supported and effects log page nvme: move init of keep_alive work item to controller initialization nvme.h: resync with nvme-cli
2018-07-24block: allow max_discard_segments to be stackedMike Snitzer1-1/+1
Set max_discard_segments to USHRT_MAX in blk_set_stacking_limits() so that blk_stack_limits() can stack up this limit for stacked devices. before: $ cat /sys/block/nvme0n1/queue/max_discard_segments 256 $ cat /sys/block/dm-0/queue/max_discard_segments 1 after: $ cat /sys/block/nvme0n1/queue/max_discard_segments 256 $ cat /sys/block/dm-0/queue/max_discard_segments 256 Fixes: 1e739730c5b9e ("block: optionally merge discontiguous discard bios into a single request") Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Mike Snitzer <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-07-24block: unexport bio_clone_biosetChristoph Hellwig3-79/+68
Now only used by the bounce code, so move it there and mark the function static. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-07-24md: remove a bogus commentChristoph Hellwig1-4/+0
The function name mentioned doesn't exist, and the code next to it doesn't match the description either. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-07-24block: remove bio_clone_kmallocChristoph Hellwig1-6/+0
Unused now. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-07-24exofs: use bio_clone_fast in _write_mirrorChristoph Hellwig1-2/+2
The mirroring code never changes the bio data or biovecs. This means we can reuse the biovec allocation easily instead of duplicating it. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by Boaz Harrosh <[email protected]> Reviewed-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-07-24bcache: don't clone bio in bch_data_verifyChristoph Hellwig1-1/+5
We immediately overwrite the biovec array, so instead just allocate a new bio and copy over the disk, setor and size. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Ming Lei <[email protected]> Acked-by: Coly Li <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-07-24block: bio_set_pages_dirty can't see NULL bv_page in a valid bio_vecChristoph Hellwig1-4/+2
So don't bother handling it. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-07-24block: simplify bio_check_pages_dirtyChristoph Hellwig1-35/+21
bio_check_pages_dirty currently inviolates the invariant that bv_page of a bio_vec inside bi_vcnt shouldn't be zero, and that is going to become really annoying with multpath biovecs. Fortunately there isn't any all that good reason for it - once we decide to defer freeing the bio to a workqueue holding onto a few additional pages isn't really an issue anymore. So just check if there is a clean page that needs dirtying in the first path, and do a second pass to free them if there was none, while the cache is still hot. Also use the chance to micro-optimize bio_dirty_fn a bit by not saving irq state - we know we are called from a workqueue. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-07-24block: Rename the null_blk_mod kernel module back into null_blkBart Van Assche2-3/+3
Commit ca4b2a011948 ("null_blk: add zone support") breaks several blktests scripts because it renamed the null_blk kernel module into null_blk_mod. Hence rename null_blk_mod back into null_blk. Fixes: ca4b2a011948 ("null_blk: add zone support") Signed-off-by: Bart Van Assche <[email protected]> Cc: Matias Bjorling <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Ming Lei <[email protected]> Cc: Damien Le Moal <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-07-24nvmet: don't use uuid_le typeAndy Shevchenko1-1/+1
Don't use sizeof(uuid_le) where none of the parameters is type of uuid_le. Since both arguments are u8 [16], use size of destination there. Moreover, uuid_le is a deprecated type, and nvmet is using uuid_t already. Signed-off-by: Andy Shevchenko <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2018-07-24nvmet: check fileio lba range access boundariesSagi Grimberg1-2/+17
Fail out-of-bounds with a proper status code. Fixes: d5eff33ee6f8 ("nvmet: add simple file backed ns support") Signed-off-by: Sagi Grimberg <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2018-07-24nvmet: fix file discard return statusSagi Grimberg1-8/+10
If nvmet_copy_from_sgl failed, we falsly return successful completion status. Fixes: d5eff33ee6f8 ("nvmet: add simple file backed ns support") Signed-off-by: Sagi Grimberg <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2018-07-24nvme-rdma: centralize admin/io queue teardown sequenceSagi Grimberg1-37/+29
We follow the same queue teardown sequence in delete, reset and error recovery. Centralize the logic. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2018-07-24nvme-rdma: centralize controller setup sequenceSagi Grimberg1-77/+53
Centralize controller sequence to a single routine that correctly cleans up after failures instead of having multiple apperances in several flows (create, reset, reconnect). One thing that we also gain here are the sanity/boundary checks also when connecting back to a dynamic controller. Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2018-07-24nvme-rdma: unquiesce queues when deleting the controllerSagi Grimberg1-0/+2
If the controller is going away, we need to unquiesce the IO queues so that all pending request can fail gracefully before moving forward with controller deletion. Do that before we destroy the IO queues so blk_cleanup_queue won't block in freeze. Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2018-07-24nvme-rdma: mark expected switch fall-throughGustavo A. R. Silva1-0/+1
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Signed-off-by: Gustavo A. R. Silva <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2018-07-24nvme: add disk name to trace eventsKeith Busch2-7/+37
This will print the disk name to the nvme event trace for io requests so a user can better distinguish traffic to different disks. This can be used to create disk based filters. For example, to see only nvme0n2 traffic: echo "disk == \"nvme0n2\"" > /sys/kernel/debug/tracing/events/nvme/filter Signed-off-by: Keith Busch <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> [hch: turned __assign_disk_name into an inline function] Signed-off-by: Christoph Hellwig <[email protected]>
2018-07-24nvme: add controller name to trace eventsKeith Busch1-6/+11
This appends the controller instance to the nvme trace buffer to distinguish which controller is dispatching and completing a command. Signed-off-by: Keith Busch <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2018-07-23nvme: use hw qid in trace eventsKeith Busch3-71/+53
We can not match a command to its completion based on the command id alone. We need the submitting queue identifier to pair with the completion, so this patch adds that to the trace buffer. This patch is also collapsing the admin and IO submission traces into a single one so we don't need to duplicate this and creating unnecessary code branches: we know if the command is an admin vs IO based on the qid. And since we're here, the patch fixes code formatting in the area. Signed-off-by: Keith Busch <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> [hch: move the qid helper to nvme.h and made it an inline function] Signed-off-by: Christoph Hellwig <[email protected]>
2018-07-23nvme: cache struct nvme_ctrl reference to struct nvme_requestSagi Grimberg5-0/+6
We will need to reference the controller in the setup and completion time for tracing and future traffic based keep alive support. Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2018-07-23nvmet-rdma: add an error flow for post_recv failuresMax Gurtovoy1-5/+21
Posting receive buffer operation can fail, thus we should make sure to have an error flow during initialization phase. While we're here, add a debug print in case of a failure. Signed-off-by: Max Gurtovoy <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2018-07-23nvmet-rdma: add unlikely check in the fast pathMax Gurtovoy1-1/+1
ib_post_send operation should succeed unless something unusual happened to the ib device. Signed-off-by: Max Gurtovoy <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2018-07-23nvmet-rdma: support max(16KB, PAGE_SIZE) inline dataSteve Wise6-43/+169
The patch enables inline data sizes using up to 4 recv sges, and capping the size at 16KB or at least 1 page size. So on a 4K page system, up to 16KB is supported, and for a 64K page system 1 page of 64KB is supported. We avoid > 0 order page allocations for the inline buffers by using multiple recv sges, one for each page. If the device cannot support the configured inline data size due to lack of enough recv sges, then log a warning and reduce the inline size. Add a new configfs port attribute, called param_inline_data_size, to allow configuring the size of inline data for a given nvmf port. The maximum size allowed is still enforced by nvmet-rdma with NVMET_RDMA_MAX_INLINE_DATA_SIZE, which is now max(16KB, PAGE_SIZE). And the default size, if not specified via configfs, is still PAGE_SIZE. This preserves the existing behavior, but allows larger inline sizes for small page systems. If the configured inline data size exceeds NVMET_RDMA_MAX_INLINE_DATA_SIZE, a warning is logged and the size is reduced. If param_inline_data_size is set to 0, then inline data is disabled for that nvmf port. Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Max Gurtovoy <[email protected]> Signed-off-by: Steve Wise <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2018-07-23nvme-rdma: support up to 4 segments of inline dataSteve Wise1-11/+27
Allow up to 4 segments of inline data for NVMF WRITE operations. This reduces latency for small WRITEs by removing the need for the target to issue a READ WR for IB, or a REG_MR + READ WR chain for iWarp. Also cap the inline segments used based on the limitations of the device. Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Max Gurtovoy <[email protected]> Signed-off-by: Steve Wise <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2018-07-23nvmet: add buffered I/O support for file backed nsChaitanya Kulkarni4-5/+67
Add a new "buffered_io" attribute, which disabled direct I/O and thus enables page cache based caching when enabled. The attribute can only be changed when the namespace is disabled as the file has to be reopend for the change to take effect. The possibly blocking read/write are deferred to a newly introduced global workqueue. Signed-off-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2018-07-23nvmet: add commands supported and effects log pageChaitanya Kulkarni1-1/+34
This patch adds support for Commands Supported and Effects log page (Log Identifier 05h) for NVMeOF. This also makes it easier to find which commands are supported, e.g. :- subnqn : testnqn1 Admin Command Set ACS2 [Get Log Page ] 00000001 ACS6 [Identify ] 00000001 ACS8 [Abort ] 00000001 ACS9 [Set Features ] 00000001 ACS10 [Get Features ] 00000001 ACS12 [Asynchronous Event Request ] 00000001 ACS24 [Keep Alive ] 00000001 NVM Command Set IOCS0 [Flush ] 00000001 IOCS1 [Write ] 00000001 IOCS2 [Read ] 00000001 IOCS8 [Write Zeroes ] 00000001 IOCS9 [Dataset Management ] 00000001 This partticular functionality can be used from the host side to examine the NVMeOF ctrl commands supported. Signed-off-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2018-07-23nvme: move init of keep_alive work item to controller initializationJames Smart1-3/+4
Currently, the code initializes the keep alive work item whenever nvme_start_keep_alive() is called. However, this routine is called several times while reconnecting, etc. Although it's hoped that keep alive is always disabled and not scheduled when start is called, re-initing if it were scheduled or completing can have very bad side effects. There's no need for re-initialization. Move the keep_alive work item and cmd struct initialization to controller init. Signed-off-by: James Smart <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2018-07-23nvme.h: resync with nvme-cliRevanth Rajashekar1-0/+5
Added some feature ids present in nvme-cli but not kernel. Signed-off-by: Revanth Rajashekar <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2018-07-22blk-mq: fail the request in case issue failureMing Lei1-2/+6
Inside blk_mq_try_issue_list_directly(), if the request is issued as failed, we shouldn't try to do it again, otherwise the warning in blk_mq_start_request() will be triggered. This change is aligned to behaviour of other ways of request issue & dispatch. Fixes: 6ce3dd6eec1 ("blk-mq: issue directly if hw queue isn't busy in case of 'none'") Cc: Kashyap Desai <[email protected]> Cc: Laurence Oberman <[email protected]> Cc: Omar Sandoval <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Bart Van Assche <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Kashyap Desai <[email protected]> Cc: kernel test robot <[email protected]> Cc: LKP <[email protected]> Reported-by: kernel test robot <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-07-22Linux 4.18-rc6Linus Torvalds1-1/+1
2018-07-22Merge tag 'nvme-for-4.18' of git://git.infradead.org/nvmeLinus Torvalds2-34/+41
Pull NVMe fixes from Christoph Hellwig: - fix a regression in 4.18 that causes a memory leak on probe failure (Keith Bush) - fix a deadlock in the passthrough ioctl code (Scott Bauer) - don't enable AENs if not supported (Weiping Zhang) - fix an old regression in metadata handling in the passthrough ioctl code (Roland Dreier) * tag 'nvme-for-4.18' of git://git.infradead.org/nvme: nvme: fix handling of metadata_len for NVME_IOCTL_IO_CMD nvme: don't enable AEN if not supported nvme: ensure forward progress during Admin passthru nvme-pci: fix memory leak on probe failure
2018-07-22Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds8-29/+14
Pull vfs fixes from Al Viro: "Fix several places that screw up cleanups after failures halfway through opening a file (one open-coding filp_clone_open() and getting it wrong, two misusing alloc_file()). That part is -stable fodder from the 'work.open' branch. And Christoph's regression fix for uapi breakage in aio series; include/uapi/linux/aio_abi.h shouldn't be pulling in the kernel definition of sigset_t, the reason for doing so in the first place had been bogus - there's no need to expose struct __aio_sigset in aio_abi.h at all" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: aio: don't expose __aio_sigset in uapi ocxlflash_getfile(): fix double-iput() on alloc_file() failures cxl_getfile(): fix double-iput() on alloc_file() failures drm_mode_create_lease_ioctl(): fix open-coded filp_clone_open()
2018-07-22alpha: fix osf_wait4() breakageAl Viro2-5/+2
kernel_wait4() expects a userland address for status - it's only rusage that goes as a kernel one (and needs a copyout afterwards) [ Also, fix the prototype of kernel_wait4() to have that __user annotation - Linus ] Fixes: 92ebce5ac55d ("osf_wait4: switch to kernel_wait4()") Cc: [email protected] # v4.13+ Signed-off-by: Al Viro <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-07-22blk-rq-qos: make depth comparisons unsignedJosef Bacik2-5/+5
With the change to use UINT_MAX I broke the depth check as any value of inflight (ie 0) would be less than (int)UINT_MAX. Fix this by changing everything to unsigned int to match the depth. Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-07-21Merge tag 'armsoc-fixes' of ↵Linus Torvalds3-7/+25
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Olof Johansson: - Fix interrupt type on ethernet switch for i.MX-based RDU2 - GPC on i.MX exposed too large a register window which resulted in userspace being able to crash the machine. - Fixup of bad merge resolution moving GPIO DT nodes under pinctrl on droid4. * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: dts: imx6: RDU2: fix irq type for mv88e6xxx switch soc: imx: gpc: restrict register range for regmap access ARM: dts: omap4-droid4: fix dts w.r.t. pwm
2018-07-21Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds1-3/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Ingo Molnar: "A single fix for a MCE-polling regression, which prevented the disabling of polling" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/MCE: Remove min interval polling limitation
2018-07-21Merge branch 'x86-pti-urgent-for-linus' of ↵Linus Torvalds3-9/+10
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 pti fixes from Ingo Molnar: "An APM fix, and a BTS hardware-tracing fix related to PTI changes" * 'x86-pti-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apm: Don't access __preempt_count with zeroed fs x86/events/intel/ds: Fix bts_interrupt_threshold alignment
2018-07-21Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds2-2/+15
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Two fixes: a stop-machine preemption fix and a SCHED_DEADLINE fix" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/deadline: Fix switched_from_dl() warning stop_machine: Disable preemption when waking two stopper threads
2018-07-21Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds4-8/+84
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core kernel fixes from Ingo Molnar: "This is mostly the copy_to_user_mcsafe() related fixes from Dan Williams, and an ORC fix for Clang" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm/memcpy_mcsafe: Fix copy_to_user_mcsafe() exception handling lib/iov_iter: Fix pipe handling in _copy_to_iter_mcsafe() lib/iov_iter: Document _copy_to_iter_flushcache() lib/iov_iter: Document _copy_to_iter_mcsafe() objtool: Use '.strtab' if '.shstrtab' doesn't exist, to support ORC tables on Clang
2018-07-21Merge tag 'powerpc-4.18-4' of ↵Linus Torvalds8-14/+52
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Two regression fixes, one for xmon disassembly formatting and the other to fix the E500 build. Two commits to fix a potential security issue in the VFIO code under obscure circumstances. And finally a fix to the Power9 idle code to restore SPRG3, which is user visible and used for sched_getcpu(). Thanks to: Alexey Kardashevskiy, David Gibson. Gautham R. Shenoy, James Clarke" * tag 'powerpc-4.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/powernv: Fix save/restore of SPRG3 on entry/exit from stop (idle) powerpc/Makefile: Assemble with -me500 when building for E500 KVM: PPC: Check if IOMMU page is contained in the pinned physical page vfio/spapr: Use IOMMU pageshift rather than pagesize powerpc/xmon: Fix disassembly since printf changes
2018-07-21Merge tag 'for-4.18-rc5-tag' of ↵Linus Torvalds1-2/+5
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "A fix of a corruption regarding fsync and clone, under some very specific conditions explained in the patch. The fix is marked for stable 3.16+ so I'd like to get it merged now given the impact" * tag 'for-4.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: fix file data corruption after cloning a range and fsync
2018-07-21mm: make vm_area_alloc() initialize core fieldsLinus Torvalds7-26/+17
Like vm_area_dup(), it initializes the anon_vma_chain head, and the basic mm pointer. The rest of the fields end up being different for different users, although the plan is to also initialize the 'vm_ops' field to a dummy entry. Signed-off-by: Linus Torvalds <[email protected]>
2018-07-21mm: make vm_area_dup() actually copy the old vma dataLinus Torvalds3-11/+7
.. and re-initialize th eanon_vma_chain head. This removes some boiler-plate from the users, and also makes it clear why it didn't need use the 'zalloc()' version. Signed-off-by: Linus Torvalds <[email protected]>
2018-07-21mm: use helper functions for allocating and freeing vm_area structsLinus Torvalds7-27/+44
The vm_area_struct is one of the most fundamental memory management objects, but the management of it is entirely open-coded evertwhere, ranging from allocation and freeing (using kmem_cache_[z]alloc and kmem_cache_free) to initializing all the fields. We want to unify this in order to end up having some unified initialization of the vmas, and the first step to this is to at least have basic allocation functions. Right now those functions are literally just wrappers around the kmem_cache_*() calls. This is a purely mechanical conversion: # new vma: kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL) -> vm_area_alloc() # copy old vma kmem_cache_alloc(vm_area_cachep, GFP_KERNEL) -> vm_area_dup(old) # free vma kmem_cache_free(vm_area_cachep, vma) -> vm_area_free(vma) to the point where the old vma passed in to the vm_area_dup() function isn't even used yet (because I've left all the old manual initialization alone). Signed-off-by: Linus Torvalds <[email protected]>
2018-07-21Merge branch 'akpm' (patches from Andrew)Linus Torvalds5-9/+20
Merge fixes from Andrew Morton: "5 fixes" * emailed patches from Andrew Morton <[email protected]>: mm: memcg: fix use after free in mem_cgroup_iter() mm/huge_memory.c: fix data loss when splitting a file pmd fat: fix memory allocation failure handling of match_strdup() MAINTAINERS: Peter has moved mm/memblock: add missing include <linux/bootmem.h>
2018-07-21mm: memcg: fix use after free in mem_cgroup_iter()Jing Xia1-1/+1
It was reported that a kernel crash happened in mem_cgroup_iter(), which can be triggered if the legacy cgroup-v1 non-hierarchical mode is used. Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b8f ...... Call trace: mem_cgroup_iter+0x2e0/0x6d4 shrink_zone+0x8c/0x324 balance_pgdat+0x450/0x640 kswapd+0x130/0x4b8 kthread+0xe8/0xfc ret_from_fork+0x10/0x20 mem_cgroup_iter(): ...... if (css_tryget(css)) <-- crash here break; ...... The crashing reason is that mem_cgroup_iter() uses the memcg object whose pointer is stored in iter->position, which has been freed before and filled with POISON_FREE(0x6b). And the root cause of the use-after-free issue is that invalidate_reclaim_iterators() fails to reset the value of iter->position to NULL when the css of the memcg is released in non- hierarchical mode. Link: http://lkml.kernel.org/r/[email protected] Fixes: 6df38689e0e9 ("mm: memcontrol: fix possible memcg leak due to interrupted reclaim") Signed-off-by: Jing Xia <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>