aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2017-03-08axonram: Fix gendisk handlingJan Kara1-1/+4
It is invalid to call del_gendisk() when disk->queue is NULL. Fix error handling in axon_ram_probe() to avoid doing that. Also del_gendisk() does not drop a reference to gendisk allocated by alloc_disk(). That has to be done by put_disk(). Add that call where needed. Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-03-08blk: improve order of bio handling in generic_make_request()NeilBrown1-4/+21
To avoid recursion on the kernel stack when stacked block devices are in use, generic_make_request() will, when called recursively, queue new requests for later handling. They will be handled when the make_request_fn for the current bio completes. If any bios are submitted by a make_request_fn, these will ultimately be handled seqeuntially. If the handling of one of those generates further requests, they will be added to the end of the queue. This strict first-in-first-out behaviour can lead to deadlocks in various ways, normally because a request might need to wait for a previous request to the same device to complete. This can happen when they share a mempool, and can happen due to interdependencies particular to the device. Both md and dm have examples where this happens. These deadlocks can be erradicated by more selective ordering of bios. Specifically by handling them in depth-first order. That is: when the handling of one bio generates one or more further bios, they are handled immediately after the parent, before any siblings of the parent. That way, when generic_make_request() calls make_request_fn for some particular device, we can be certain that all previously submited requests for that device have been completely handled and are not waiting for anything in the queue of requests maintained in generic_make_request(). An easy way to achieve this would be to use a last-in-first-out stack instead of a queue. However this will change the order of consecutive bios submitted by a make_request_fn, which could have unexpected consequences. Instead we take a slightly more complex approach. A fresh queue is created for each call to a make_request_fn. After it completes, any bios for a different device are placed on the front of the main queue, followed by any bios for the same device, followed by all bios that were already on the queue before the make_request_fn was called. This provides the depth-first approach without reordering bios on the same level. This, by itself, it not enough to remove all deadlocks. It just makes it possible for drivers to take the extra step required themselves. To avoid deadlocks, drivers must never risk waiting for a request after submitting one to generic_make_request. This includes never allocing from a mempool twice in the one call to a make_request_fn. A common pattern in drivers is to call bio_split() in a loop, handling the first part and then looping around to possibly split the next part. Instead, a driver that finds it needs to split a bio should queue (with generic_make_request) the second part, handle the first part, and then return. The new code in generic_make_request will ensure the requests to underlying bios are processed first, then the second bio that was split off. If it splits again, the same process happens. In each case one bio will be completely handled before the next one is attempted. With this is place, it should be possible to disable the punt_bios_to_recover() recovery thread for many block devices, and eventually it may be possible to remove it completely. Ref: http://www.spinics.net/lists/raid/msg54680.html Tested-by: Jinpu Wang <[email protected]> Inspired-by: Lars Ellenberg <[email protected]> Signed-off-by: NeilBrown <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-03-08Revert "scsi, block: fix duplicate bdi name registration crashes"Jan Kara5-65/+8
This reverts commit 0dba1314d4f81115dce711292ec7981d17231064. It causes leaking of device numbers for SCSI when SCSI registers multiple gendisks for one request_queue in succession. It can be easily reproduced using Omar's script [1] on kernel with CONFIG_DEBUG_TEST_DRIVER_REMOVE. Furthermore the protection provided by this commit is not needed anymore as the problem it was fixing got also fixed by commit 165a5e22fafb "block: Move bdi_unregister() to del_gendisk()". [1]: http://marc.info/?l=linux-block&m=148554717109098&w=2 Signed-off-by: Jan Kara <[email protected]> Acked-by: Dan Williams <[email protected]> Tested-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-03-08block: Make del_gendisk() safer for disks without queuesJan Kara1-6/+10
Commit 165a5e22fafb "block: Move bdi_unregister() to del_gendisk()" added disk->queue dereference to del_gendisk(). Although del_gendisk() is not supposed to be called without disk->queue valid and blk_unregister_queue() warns in that case, this change will make it oops instead. Return to the old more robust behavior of just warning when del_gendisk() gets called for gendisk with disk->queue being NULL. Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Jan Kara <[email protected]> Tested-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-03-08bdi: Fix use-after-free in wb_congested_put()Jan Kara1-15/+21
bdi_writeback_congested structures get created for each blkcg and bdi regardless whether bdi is registered or not. When they are created in unregistered bdi and the request queue (and thus bdi) is then destroyed while blkg still holds reference to bdi_writeback_congested structure, this structure will be referencing freed bdi and last wb_congested_put() will try to remove the structure from already freed bdi. With commit 165a5e22fafb "block: Move bdi_unregister() to del_gendisk()", SCSI started to destroy bdis without calling bdi_unregister() first (previously it was calling bdi_unregister() even for unregistered bdis) and thus the code detaching bdi_writeback_congested in cgwb_bdi_destroy() was not triggered and we started hitting this use-after-free bug. It is enough to boot a KVM instance with virtio-scsi device to trigger this behavior. Fix the problem by detaching bdi_writeback_congested structures in bdi_exit() instead of bdi_unregister(). This is also more logical as they can get attached to bdi regardless whether it ever got registered or not. Fixes: 165a5e22fafb127ecb5914e12e8c32a1f0d3f820 Signed-off-by: Jan Kara <[email protected]> Tested-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-03-08block: Allow bdi re-registrationJan Kara1-0/+7
SCSI can call device_add_disk() several times for one request queue when a device in unbound and bound, creating new gendisk each time. This will lead to bdi being repeatedly registered and unregistered. This was not a big problem until commit 165a5e22fafb "block: Move bdi_unregister() to del_gendisk()" since bdi was only registered repeatedly (bdi_register() handles repeated calls fine, only we ended up leaking reference to gendisk due to overwriting bdi->owner) but unregistered only in blk_cleanup_queue() which didn't get called repeatedly. After 165a5e22fafb we were doing correct bdi_register() - bdi_unregister() cycles however bdi_unregister() is not prepared for it. So make sure bdi_unregister() cleans up bdi in such a way that it is prepared for a possible following bdi_register() call. An easy way to provoke this behavior is to enable CONFIG_DEBUG_TEST_DRIVER_REMOVE and use scsi_debug driver to create a scsi disk which immediately hangs without this fix. Fixes: 165a5e22fafb127ecb5914e12e8c32a1f0d3f820 Signed-off-by: Jan Kara <[email protected]> Tested-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-03-08block/sed: Fix opal user range check and unused variablesJon Derrick1-8/+2
Fixes check that the opal user is within the range, and cleans up unused method variables. Signed-off-by: Jon Derrick <[email protected]> Reviewed-by: Scott Bauer <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-03-08zram: set physical queue limits to avoid array out of bounds accessesJohannes Thumshirn1-0/+2
zram can handle at most SECTORS_PER_PAGE sectors in a bio's bvec. When using the NVMe over Fabrics loopback target which potentially sends a huge bulk of pages attached to the bio's bvec this results in a kernel panic because of array out of bounds accesses in zram_decompress_page(). Signed-off-by: Johannes Thumshirn <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Sergey Senozhatsky <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-03-08blk-mq: free hctx->cpumask in release handler of hctx's kobjectMing Lei2-12/+1
It is obviously that hctx->cpumask is per hctx, and both share same lifetime, so this patch moves freeing of hctx->cpumask into release handler of hctx's kobject. Signed-off-by: Ming Lei <[email protected]> Tested-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-03-08blk-mq: make lifetime consistent between hctx and its kobjectMing Lei2-9/+11
This patch removes kobject_put() over hctx in __blk_mq_unregister_dev(), and trys to keep lifetime consistent between hctx and hctx's kobject. Now blk_mq_sysfs_register() and blk_mq_sysfs_unregister() become totally symmetrical, and kobject's refcounter drops to zero just when the hctx is freed. Signed-off-by: Ming Lei <[email protected]> Tested-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-03-08blk-mq: make lifetime consitent between q/ctx and its kobjectMing Lei3-8/+20
Currently from kobject view, both q->mq_kobj and ctx->kobj can be released during one cycle of blk_mq_register_dev() and blk_mq_unregister_dev(). Actually, sw queue's lifetime is same with its request queue's, which is covered by request_queue->kobj. So we don't need to call kobject_put() for the two kinds of kobject in __blk_mq_unregister_dev(), instead we do that in release handler of request queue. Signed-off-by: Ming Lei <[email protected]> Tested-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-03-08blk-mq: initialize mq kobjects in blk_mq_init_allocated_queue()Ming Lei3-4/+5
Both q->mq_kobj and sw queues' kobjects should have been initialized once, instead of doing that each add_disk context. Also this patch removes clearing of ctx in blk_mq_init_cpu_queues() because percpu allocator fills zero to allocated variable. This patch fixes one issue[1] reported from Omar. [1] kernel wearning when doing unbind/bind on one scsi-mq device [ 19.347924] kobject (ffff8800791ea0b8): tried to init an initialized object, something is seriously wrong. [ 19.349781] CPU: 1 PID: 84 Comm: kworker/u8:1 Not tainted 4.10.0-rc7-00210-g53f39eeaa263 #34 [ 19.350686] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-20161122_114906-anatol 04/01/2014 [ 19.350920] Workqueue: events_unbound async_run_entry_fn [ 19.350920] Call Trace: [ 19.350920] dump_stack+0x63/0x83 [ 19.350920] kobject_init+0x77/0x90 [ 19.350920] blk_mq_register_dev+0x40/0x130 [ 19.350920] blk_register_queue+0xb6/0x190 [ 19.350920] device_add_disk+0x1ec/0x4b0 [ 19.350920] sd_probe_async+0x10d/0x1c0 [sd_mod] [ 19.350920] async_run_entry_fn+0x48/0x150 [ 19.350920] process_one_work+0x1d0/0x480 [ 19.350920] worker_thread+0x48/0x4e0 [ 19.350920] kthread+0x101/0x140 [ 19.350920] ? process_one_work+0x480/0x480 [ 19.350920] ? kthread_create_on_node+0x60/0x60 [ 19.350920] ret_from_fork+0x2c/0x40 Cc: Omar Sandoval <[email protected]> Signed-off-by: Ming Lei <[email protected]> Tested-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-03-07Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds27-57/+106
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes and minor updates all over the place: - an SGI/UV fix - a defconfig update - a build warning fix - move the boot_params file to the arch location in debugfs - a pkeys fix - selftests fix - boot message fixes - sparse fixes - a resume warning fix - ioapic hotplug fixes - reboot quirks ... plus various minor cleanups" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/build/x86_64_defconfig: Enable CONFIG_R8169 x86/reboot/quirks: Add ASUS EeeBook X205TA/W reboot quirk x86/hpet: Prevent might sleep splat on resume x86/boot: Correct setup_header.start_sys name x86/purgatory: Fix sparse warning, symbol not declared x86/purgatory: Make functions and variables static x86/events: Remove last remnants of old filenames x86/pkeys: Check against max pkey to avoid overflows x86/ioapic: Split IOAPIC hot-removal into two steps x86/PCI: Implement pcibios_release_device to release IRQ from IOAPIC x86/intel_rdt: Remove duplicate inclusion of linux/cpu.h x86/vmware: Remove duplicate inclusion of asm/timer.h x86/hyperv: Hide unused label x86/reboot/quirks: Add ASUS EeeBook X205TA reboot quirk x86/platform/uv/BAU: Fix HUB errors by remove initial write to sw-ack register x86/selftests: Add clobbers for int80 on x86_64 x86/apic: Simplify enable_IR_x2apic(), remove try_to_enable_IR() x86/apic: Fix a warning message in logical CPU IDs allocation x86/kdebugfs: Move boot params hierarchy under (debugfs)/x86/
2017-03-07Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds2-5/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Ingo Molnar: "This includes a fix for lockups caused by incorrect nsecs related cleanup, and a capabilities check fix for timerfd" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: jiffies: Revert bogus conversion of NSEC_PER_SEC to TICK_NSEC timerfd: Only check CAP_WAKE_ALARM when it is needed
2017-03-07Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds10-32/+37
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "A fix for KVM's scheduler clock which (erroneously) was always marked unstable, a fix for RT/DL load balancing, plus latency fixes" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/clock, x86/tsc: Rework the x86 'unstable' sched_clock() interface sched/core: Fix pick_next_task() for RT,DL sched/fair: Make select_idle_cpu() more aggressive
2017-03-07Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds13-23/+23
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "This includes a fix for a crash if certain special addresses are kprobed, plus does a rename of two Kconfig variables that were a minor misnomer" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Rename CONFIG_[UK]PROBE_EVENT to CONFIG_[UK]PROBE_EVENTS kprobes/x86: Fix kernel panic when certain exception-handling addresses are probed
2017-03-07Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds3-12/+19
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: - Change the new refcount_t warnings from WARN() to WARN_ONCE() - two ww_mutex fixes - plus a new lockdep self-consistency check for a bug that triggered in practice * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/ww_mutex: Adjust the lock number for stress test locking/lockdep: Add nest_lock integrity test locking/ww_mutex: Replace cpu_relax() with cond_resched() for tests locking/refcounts: Change WARN() to WARN_ONCE()
2017-03-07Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull IRQ fix from Ingo Molnar: "Fix an ARM TI DRA7XX SoC irqchip driver local variables type bug/warning" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/crossbar: Fix incorrect type of local variables
2017-03-07Merge branch 'efi-urgent-for-linus' of ↵Linus Torvalds2-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fixes from Ingo Molnar: "A boot crash fix, and a secure boot related boot messages fix" * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/arm: Fix boot crash with CONFIG_CPUMASK_OFFSTACK=y efi/libstub: Treat missing SecureBoot variable as Secure Boot disabled
2017-03-07Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds6-6/+28
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core fixes from Ingo Molnar: "A couple of sched.h splitup related build fixes, plus an objtool fix" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Fix another GCC jump table detection issue drivers/char/nwbutton: Fix build breakage caused by include file reshuffling h8300: Fix build breakage caused by header file changes avr32: Fix build error caused by include file reshuffling
2017-03-07Merge branch 'idr-4.11' of git://git.infradead.org/users/willy/linux-daxLinus Torvalds7-18/+283
Pull idr fix (and new tests) from Matthew Wilcox: "One urgent patch in here; freeing the correct IDA bitmap. Everything else is changes to the test suite" * 'idr-4.11' of git://git.infradead.org/users/willy/linux-dax: radix tree test suite: Specify -m32 in LDFLAGS too ida: Free correct IDA bitmap radix tree test suite: Depend on Makefile and quieten grep radix tree test suite: Fix build with --as-needed radix tree test suite: Build 32 bit binaries radix tree test suite: Add performance test for radix_tree_join() radix tree test suite: Add performance test for radix_tree_split() radix tree test suite: Add performance benchmarks radix tree test suite: Add test for radix_tree_clear_tags() radix tree test suite: Add tests for ida_simple_get() and ida_simple_remove() radix tree test suite: Add test for idr_get_next()
2017-03-07Merge tag 'powerpc-4.11-3' of ↵Linus Torvalds20-124/+729
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Five fairly small fixes for things that went in this cycle. A fairly large patch to rework the CAS logic on Power9, necessitated by a late change to the firmware API, and we can't boot without it. Three fixes going to stable, allowing more instructions to be emulated on LE, fixing a boot crash on 32-bit Freescale BookE machines, and the OPAL XICS workaround. And a patch from me to sort the selects under CONFIG PPC. Annoying churn, but worth it in the long run, and best for it to go in now to avoid conflicts. Thanks to: Alexey Kardashevskiy, Anton Blanchard, Balbir Singh, Gautham R. Shenoy, Laurentiu Tudor, Nicholas Piggin, Paul Mackerras, Ravi Bangoria, Sachin Sant, Shile Zhang, Suraj Jitindar Singh" * tag 'powerpc-4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc: Sort the selects under CONFIG_PPC powerpc/64: Fix L1D cache shape vector reporting L1I values powerpc/64: Avoid panic during boot due to divide by zero in init_cache_info() powerpc: Update to new option-vector-5 format for CAS powerpc: Parse the command line before calling CAS powerpc/xics: Work around limitations of OPAL XICS priority handling powerpc/64: Fix checksum folding in csum_add() powerpc/powernv: Fix opal tracepoints with JUMP_LABEL=n powerpc/booke: Fix boot crash due to null hugepd powerpc: Fix compiling a BE kernel with a powerpc64le toolchain selftest/powerpc: Fix false failures for skipped tests powerpc/powernv: Fix bug due to labeling ambiguity in power_enter_stop powerpc/64: Invalidate process table caching after setting process table powerpc: emulate_step() tests for load/store instructions powerpc: Emulation support for load/store instructions on LE
2017-03-07Merge branch 'stable/for-linus-4.11' of ↵Linus Torvalds3-0/+60
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb Pull swiotlb updates from Konrad Rzeszutek Wilk: "Two tiny implementations of the DMA API for callback in ARM (for Xen)" * 'stable/for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb: swiotlb-xen: implement xen_swiotlb_get_sgtable callback swiotlb-xen: implement xen_swiotlb_dma_mmap callback
2017-03-07radix tree test suite: Specify -m32 in LDFLAGS tooMatthew Wilcox1-0/+1
Michael's patch to use the default make rule for linking and the patch from Rehas to use -m32 if building a 32-bit test-suite on a 64-bit platform don't work well together. Reported-by: Rehas Sachdeva <[email protected]> Signed-off-by: Matthew Wilcox <[email protected]>
2017-03-07ida: Free correct IDA bitmapMatthew Wilcox4-5/+35
There's a relatively rare race where we look at the per-cpu preallocated IDA bitmap, see it's NULL, allocate a new one, and atomically update it. If the kmalloc() happened to sleep and we were rescheduled to a different CPU, or an interrupt came in at the exact right time, another task might have successfully allocated a bitmap and already deposited it. I forgot what the semantics of cmpxchg() were and ended up freeing the wrong bitmap leading to KASAN reporting a use-after-free. Dmitry found the bug with syzkaller & wrote the patch. I wrote the test case that will reproduce the bug without his patch being applied. Reported-by: Dmitry Vyukov <[email protected]> Signed-off-by: Matthew Wilcox <[email protected]>
2017-03-07radix tree test suite: Depend on Makefile and quieten grepMatthew Wilcox1-2/+2
Changing the CFLAGS in the Makefile didn't always lead to a recompilation because the OFILES didn't depend on the Makefile. Also, after doing make clean, grep would still complain about a missing map-shift.h; we need -s as well as -q. Signed-off-by: Matthew Wilcox <[email protected]>
2017-03-07radix tree test suite: Fix build with --as-neededMichael Ellerman1-4/+2
Currently the radix tree test suite doesn't build with toolchains that use --as-needed by default, for example Ubuntu's: cc -I. -I../../include -g -O2 -Wall -D_LGPL_SOURCE -fsanitize=address -lpthread -lurcu main.o ... -o main /usr/bin/ld: regression1.o: undefined reference to symbol 'pthread_join@@GLIBC_2.17' /lib/powerpc64le-linux-gnu/libpthread.so.0: error adding symbols: DSO missing from command line collect2: error: ld returned 1 exit status This is caused by the custom makefile rules placing LDFLAGS before the .o files that need the libraries. We could fix it by using --no-as-needed, or rewriting the custom rules. But we can also just drop the custom rules and move the libraries to LDLIBS, and then the default rules work correctly - with the one caveat that we need to add -fsanitize=address to LDFLAGS because that must be passed to the linker as well as the compiler. Signed-off-by: Michael Ellerman <[email protected]> Signed-off-by: Matthew Wilcox <[email protected]>
2017-03-07radix tree test suite: Build 32 bit binariesRehas Sachdeva1-0/+4
Add option 'make BUILD=32' for building 32-bit binaries. Signed-off-by: Rehas Sachdeva <[email protected]> Signed-off-by: Matthew Wilcox <[email protected]>
2017-03-07radix tree test suite: Add performance test for radix_tree_join()Rehas Sachdeva1-0/+47
Signed-off-by: Rehas Sachdeva <[email protected]> Signed-off-by: Matthew Wilcox <[email protected]>
2017-03-07radix tree test suite: Add performance test for radix_tree_split()Rehas Sachdeva1-0/+44
Signed-off-by: Rehas Sachdeva <[email protected]> Signed-off-by: Matthew Wilcox <[email protected]>
2017-03-07radix tree test suite: Add performance benchmarksRehas Sachdeva1-7/+75
Add performance benchmarks for radix tree insertion, tagging and deletion. Signed-off-by: Rehas Sachdeva <[email protected]> Signed-off-by: Matthew Wilcox <[email protected]>
2017-03-07radix tree test suite: Add test for radix_tree_clear_tags()Rehas Sachdeva1-0/+29
Assert that radix_tree_clear_tags() clears the tags on the passed node and slot. Assert that the case where the radix tree has only one entry at index zero and the node is NULL, is also handled. Signed-off-by: Rehas Sachdeva <[email protected]> Signed-off-by: Matthew Wilcox <[email protected]>
2017-03-07radix tree test suite: Add tests for ida_simple_get() and ida_simple_remove()Rehas Sachdeva1-0/+19
Assert that ida_simple_get() allocates an id in the passed range or returns error on failure, and ida_simple_remove() releases an allocated id. Signed-off-by: Rehas Sachdeva <[email protected]> Signed-off-by: Matthew Wilcox <[email protected]>
2017-03-07radix tree test suite: Add test for idr_get_next()Rehas Sachdeva1-0/+25
Assert that idr_get_next() returns the next populated entry in the tree with an ID greater than or equal to the value pointed to by @nextid argument. Signed-off-by: Rehas Sachdeva <[email protected]> Signed-off-by: Matthew Wilcox <[email protected]>
2017-03-07Merge branch 'for-linus' of ↵Linus Torvalds2-8/+12
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace fix from Eric Biederman: "This fixes a race between put_ucounts and get_ucounts that can cause a use after free. The fix works by simplifying the code and so there is not even a temptation to be clever and play spinlock vs atomic reference games" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: ucount: Remove the atomicity from ucount->count
2017-03-07Merge tag 'trace-v4.11-rc1' of ↵Linus Torvalds5-7/+36
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "There was some breakage with the changes for jump labels in the 4.11 merge window: - powerpc broke as jump labels uses the two LSB bits as flags in initialization. A check was added to make sure that all jump label entries were 4 bytes aligned, but powerpc didn't work that way for modules. Adding an alignment in the module linker script appeared to be the best solution. - Jump labels also added an anonymous union to access those LSB bits as a normal long. But because this structure had static initialization, it broke older compilers that could not statically initialize anonymous unions without brackets. - The command line parameter for setting function graph filter broke the "EMPTY_HASH" descriptor by modifying it instead of creating a new hash to hold the entries. - The command line parameter ftrace_graph_max_depth was added to allow its setting at boot time. It uses existing code and only the command line hook was added. This is not really a fix, but as it uses existing code without affecting anything else, I added it to this release. It was ready before the merge window closed, but I wanted to let it sit in linux-next for a couple of days first" * tag 'trace-v4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace/graph: Add ftrace_graph_max_depth kernel parameter tracing: Add #undef to fix compile error jump_label: Add comment about initialization order for anonymous unions jump_label: Fix anonymous union initialization module: set __jump_table alignment to 8 ftrace/graph: Do not modify the EMPTY_HASH for the function_graph filter tracing: Fix code comment for ftrace_ops_get_func()
2017-03-07jiffies: Revert bogus conversion of NSEC_PER_SEC to TICK_NSECFrederic Weisbecker1-1/+1
commit 93825f2ec736 converted NSEC_PER_SEC to TICK_NSEC because the author confused NSEC_PER_JIFFY with NSEC_PER_SEC. As a result, the calculation of refined jiffies got broken, triggering lockups. Fixes: 93825f2ec736 ("jiffies: Reuse TICK_NSEC instead of NSEC_PER_JIFFY") Reported-and-tested-by: Meelis Roos <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2017-03-07objtool: Fix another GCC jump table detection issueJosh Poimboeuf3-3/+25
Arnd Bergmann reported a (false positive) objtool warning: drivers/infiniband/sw/rxe/rxe_resp.o: warning: objtool: rxe_responder()+0xfe: sibling call from callable instruction with changed frame pointer The issue is in find_switch_table(). It tries to find a switch statement's jump table by walking backwards from an indirect jump instruction, looking for a relocation to the .rodata section. In this case it stopped walking prematurely: the first .rodata relocation it encountered was for a variable (resp_state_name) instead of a jump table, so it just assumed there wasn't a jump table. The fix is to ignore any .rodata relocation which refers to an ELF object symbol. This works because the jump tables are anonymous and have no symbols associated with them. Reported-by: Arnd Bergmann <[email protected]> Tested-by: Arnd Bergmann <[email protected]> Signed-off-by: Josh Poimboeuf <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Fixes: 3732710ff6f2 ("objtool: Improve rare switch jump table pattern detection") Link: http://lkml.kernel.org/r/20170302225723.3ndbsnl4hkqbne7a@treble Signed-off-by: Ingo Molnar <[email protected]>
2017-03-07drivers/char/nwbutton: Fix build breakage caused by include file reshufflingGuenter Roeck1-1/+1
Fix: drivers/char/nwbutton.c: In function 'button_sequence_finished': drivers/char/nwbutton.c:134:3: error: implicit declaration of function 'kill_cad_pid' The declaration has been moved from one include file to another. Signed-off-by: Guenter Roeck <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Fixes: c3edc4010e9d102 ("sched/headers: Move task_struct::signal and ...") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-03-07h8300: Fix build breakage caused by header file changesGuenter Roeck1-1/+1
Fix the following h8300 build failures: arch/h8300/kernel/ptrace_h.c: In function ‘trace_trap’: arch/h8300/kernel/ptrace_h.c:253:3: error: implicit declaration of function ‘force_sig’ Signed-off-by: Guenter Roeck <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: [email protected] Fixes: c3edc4010e9d ("sched/headers: Move task_struct::signal and ...") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-03-07avr32: Fix build error caused by include file reshufflingGuenter Roeck1-1/+1
Various avr32 builds fail: arch/avr32/oprofile/backtrace.c:58: error: dereferencing pointer to incomplete type arch/avr32/oprofile/backtrace.c:60: error: implicit declaration of function 'user_mode' Signed-off-by: Guenter Roeck <[email protected]> Acked-by: Hans-Christian Noren Egtvedt <[email protected]> Cc: Haavard Skinnemoen <[email protected]> Cc: Hans-Christian Egtvedt <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Robert Richter <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Fixes: f780d89a0e82 ("sched/headers: Remove <asm/ptrace.h> from ...") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-03-06ucount: Remove the atomicity from ucount->countEric W. Biederman2-8/+12
Always increment/decrement ucount->count under the ucounts_lock. The increments are there already and moving the decrements there means the locking logic of the code is simpler. This simplification in the locking logic fixes a race between put_ucounts and get_ucounts that could result in a use-after-free because the count could go zero then be found by get_ucounts and then be freed by put_ucounts. A bug presumably this one was found by a combination of syzkaller and KASAN. JongWhan Kim reported the syzkaller failure and Dmitry Vyukov spotted the race in the code. Cc: [email protected] Fixes: f6b2db1a3e8d ("userns: Make the count of user namespaces per user") Reported-by: JongHwan Kim <[email protected]> Reported-by: Dmitry Vyukov <[email protected]> Reviewed-by: Andrei Vagin <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2017-03-06powerpc: Sort the selects under CONFIG_PPCMichael Ellerman1-66/+72
We have a big list of selects under CONFIG_PPC, and currently they're completely unsorted. This means people tend to add new selects at the bottom of the list, and so two commits which both add a new select will often conflict. Instead sort it alphabetically. This is nicer in and of itself, but also means two commits that add a new select will have a greater chance of not conflicting. Add a note at the top and bottom asking people to keep it sorted. And while we're here pad out the 'if' expressions to make them stand out. Suggested-by: Stephen Rothwell <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-03-06powerpc/64: Fix L1D cache shape vector reporting L1I valuesMichael Ellerman1-2/+2
It seems we didn't pay quite enough attention when testing the new cache shape vectors, which means we didn't notice the bug where the vector for the L1D was using the L1I values. Fix it, resulting in eg: L1I cache size: 0x8000 32768B 32K L1I line size: 0x80 8-way associative L1D cache size: 0x10000 65536B 64K L1D line size: 0x80 8-way associative Fixes: 98a5f361b862 ("powerpc: Add new cache geometry aux vectors") Cut-and-paste-bug-by: Benjamin Herrenschmidt <[email protected]> Badly-reviewed-by: Michael Ellerman <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-03-06x86/build/x86_64_defconfig: Enable CONFIG_R8169Andy Shevchenko1-0/+1
Very common PCIe ethernet card. Already enabled in i386_defconfig. Signed-off-by: Andy Shevchenko <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2017-03-06x86/reboot/quirks: Add ASUS EeeBook X205TA/W reboot quirkMatjaz Hegedic1-0/+8
Without the parameter reboot=a, ASUS EeeBook X205TA/W will hang when it should reboot. This adds the appropriate quirk, thus fixing the problem. Signed-off-by: Matjaz Hegedic <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2017-03-06powerpc/64: Avoid panic during boot due to divide by zero in init_cache_info()Anton Blanchard1-1/+4
I see a panic in early boot when building with a recent gcc toolchain. The issue is a divide by zero, which is undefined. Older toolchains let us get away with it: int foo(int a) { return a / 0; } foo: li 9,0 divw 3,3,9 extsw 3,3 blr But newer ones catch it: foo: trap Add a check to avoid the divide by zero. Fixes: e2827fe5c156 ("powerpc/64: Clean up ppc64_caches using a struct per cache") Signed-off-by: Anton Blanchard <[email protected]> Acked-by: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-03-06powerpc: Update to new option-vector-5 format for CASSuraj Jitindar Singh3-14/+150
On POWER9 the ibm,client-architecture-support (CAS) negotiation process has been updated to change how the host to guest negotiation is done for the new hash/radix mmu as well as the nest mmu, process tables and guest translation shootdown (GTSE). This is documented in the unreleased PAPR ACR "CAS option vector additions for P9". The host tells the guest which options it supports in ibm,arch-vec-5-platform-support. The guest then chooses a subset of these to request in the CAS call and these are agreed to in the ibm,architecture-vec-5 property of the chosen node. Thus we read ibm,arch-vec-5-platform-support and make our selection before calling CAS. We then parse the ibm,architecture-vec-5 property of the chosen node to check whether we should run as hash or radix. ibm,arch-vec-5-platform-support format: index value pairs: <index, val> ... <index, val> index: Option vector 5 byte number val: Some representation of supported values Signed-off-by: Suraj Jitindar Singh <[email protected]> Acked-by: Paul Mackerras <[email protected]> [mpe: Don't print about unknown options, be consistent with OV5_FEAT] Signed-off-by: Michael Ellerman <[email protected]>
2017-03-06powerpc: Parse the command line before calling CASSuraj Jitindar Singh1-5/+5
On POWER9 the hypervisor requires the guest to decide whether it would like to use a hash or radix mmu model at the time it calls ibm,client-architecture-support (CAS) based on what the hypervisor has said it's allowed to do. It is possible to disable radix by passing "disable_radix" on the command line. The next patch will add support for the new CAS format, thus we need to parse the command line before calling CAS so we can correctly select which mmu we would like to use. Signed-off-by: Suraj Jitindar Singh <[email protected]> Reviewed-by: Paul Mackerras <[email protected]> Acked-by: Balbir Singh <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-03-06powerpc/xics: Work around limitations of OPAL XICS priority handlingBalbir Singh2-3/+24
The CPPR (Current Processor Priority Register) of a XICS interrupt presentation controller contains a value N, such that only interrupts with a priority "more favoured" than N will be received by the CPU, where "more favoured" means "less than". So if the CPPR has the value 5 then only interrupts with a priority of 0-4 inclusive will be received. In theory the CPPR can support a value of 0 to 255 inclusive. In practice Linux only uses values of 0, 4, 5 and 0xff. Setting the CPPR to 0 rejects all interrupts, setting it to 0xff allows all interrupts. The values 4 and 5 are used to differentiate IPIs from external interrupts. Setting the CPPR to 5 allows IPIs to be received but not external interrupts. The CPPR emulation in the OPAL XICS implementation only directly supports priorities 0 and 0xff. All other priorities are considered equivalent, and mapped to a single priority value internally. This means when using icp-opal we can not allow IPIs but not externals. This breaks Linux's use of priority values when a CPU is hot unplugged. After migrating IRQs away from the CPU that is being offlined, we set the priority to 5, meaning we still want the offline CPU to receive IPIs. But the effect of the OPAL XICS emulation's use of a single priority value is that all interrupts are rejected by the CPU. With the CPU offline, and not receiving IPIs, we may not be able to wake it up to bring it back online. The first part of the fix is in icp_opal_set_cpu_priority(). CPPR values of 0 to 4 inclusive will correctly cause all interrupts to be rejected, so we pass those CPPR values through to OPAL. However if we are called with a CPPR of 5 or greater, the caller is expecting to be able to allow IPIs but not external interrupts. We know this doesn't work, so instead of rejecting all interrupts we choose the opposite which is to allow all interrupts. This is still not correct behaviour, but we know for the only existing caller (xics_migrate_irqs_away()), that it is the better option. The other part of the fix is in xics_migrate_irqs_away(). Instead of setting priority (CPPR) to 0, and then back to 5 before migrating IRQs, we migrate the IRQs before setting the priority back to 5. This should have no effect on an ICP backend with a working set_priority(), and on icp-opal it means we will keep all interrupts blocked until after we've finished doing the IRQ migration. Additionally we wait for 5ms after doing the migration to make sure there are no IRQs in flight. Fixes: d74361881f0d ("powerpc/xics: Add ICP OPAL backend") Cc: [email protected] # v4.8+ Suggested-by: Michael Ellerman <[email protected]> Reported-by: Vaidyanathan Srinivasan <[email protected]> Tested-by: Vaidyanathan Srinivasan <[email protected]> Signed-off-by: Balbir Singh <[email protected]> [mpe: Rewrote comments and change log, change delay to 5ms] Signed-off-by: Michael Ellerman <[email protected]>