blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message	Author	Files	Lines
2013-06-26	mutex: Add w/w tests to lib/locking-selftest.c	Maarten Lankhorst	1	-19/+381
	This stresses the lockdep code in some ways specifically useful to ww_mutexes. It adds checks for most of the common locking errors. Signed-off-by: Maarten Lankhorst <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Linus Torvalds <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/20130620113124.4001.23186.stgit@patser Signed-off-by: Ingo Molnar <[email protected]>
2013-06-26	mutex: Add w/w mutex slowpath debugging	Daniel Vetter	1	-0/+13
	Injects EDEADLK conditions at pseudo-random intervals, with exponential backoff up to UINT_MAX (to ensure that every lock operation still completes in a reasonable time). This way we can test the wound slowpath even for ww mutex users where contention is never expected, and the ww deadlock avoidance algorithm is only needed for correctness against malicious userspace. An example would be protecting kernel modesetting properties, which thanks to single-threaded X isn't really expected to contend, ever.
	I've looked into using the CONFIG_FAULT_INJECTION infrastructure, but decided against it for two reasons:
	- EDEADLK handling is mandatory for ww mutex users and should never affect the outcome of a syscall. This is in contrast to -ENOMEM injection. So fine configurability isn't required.
	- The fault injection framework only allows setting a simple probability for failure. Now the probability that a ww mutex acquire stage with N locks will never complete (due to too many injected EDEADLK backoffs) is zero. But the expected number of ww_mutex_lock operations for the completely uncontended case would be O(exp(N)). The per-acquire ctx exponential backoff solution chosen here only results in O(log N) overhead due to injection and so O(log N * N) lock operations. This way we can fail with high probability (and so have good test coverage even for fancy backoff and lock acquisition paths) without running into pathological cases.
	Note that EDEADLK will only ever be injected when we managed to acquire the lock. This prevents any behaviour changes for users which rely on the EALREADY semantics.
Signed-off-by: Daniel Vetter <[email protected]> Signed-off-by: Maarten Lankhorst <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Linus Torvalds <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/20130620113117.4001.21681.stgit@patser Signed-off-by: Ingo Molnar <[email protected]>
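The countdown idea in the commit above can be sketched in userspace C. This is a minimal illustration, not the kernel code: the struct, its fields, and the plain doubling step are assumptions made for the example. It reports a fake -EDEADLK only after a lock was "acquired", and each injection doubles the distance to the next one, saturating at UINT_MAX.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Hypothetical per-acquire-context state; names are illustrative only. */
struct inject_ctx {
    unsigned int interval;   /* successful locks between two injections */
    unsigned int countdown;  /* successful locks left before the next injection */
};

static void inject_ctx_init(struct inject_ctx *ctx)
{
    ctx->interval = 1;
    ctx->countdown = 1;
}

/*
 * Call after a lock was successfully acquired. Returns -EDEADLK when a
 * fake deadlock should be reported (the caller would then drop the lock
 * it just took and back off), 0 otherwise.
 */
static int maybe_inject_deadlock(struct inject_ctx *ctx)
{
    assert(ctx->interval > 0 && ctx->countdown > 0);

    if (--ctx->countdown != 0)
        return 0;

    /* Exponential backoff: double the interval, saturating at UINT_MAX. */
    if (ctx->interval > UINT_MAX / 2)
        ctx->interval = UINT_MAX;
    else
        ctx->interval *= 2;
    ctx->countdown = ctx->interval;
    return -EDEADLK;
}
```

Because the interval grows geometrically, a context that takes N locks sees on the order of log N injections, which matches the overhead argument in the commit message.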
2013-06-26	mutex: Add support for wound/wait style locks	Maarten Lankhorst	1	-0/+2
	Wound/wait mutexes are used when other multiple lock acquisitions of a similar type can be done in an arbitrary order. The deadlock handling used here is called wait/wound in the RDBMS literature: The older task waits until it can acquire the contended lock. The younger task needs to back off and drop all the locks it is currently holding, i.e. the younger task is wounded. For full documentation please read Documentation/ww-mutex-design.txt. References: https://lwn.net/Articles/548909/ Signed-off-by: Maarten Lankhorst <[email protected]> Acked-by: Daniel Vetter <[email protected]> Acked-by: Rob Clark <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Linus Torvalds <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
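The rule described above (the older task waits, the younger task backs off) reduces to a comparison of acquisition stamps. The sketch below is a simplified userspace illustration; the types and the function name are hypothetical, and it ignores details such as stamp wraparound that a real implementation has to handle.

```c
#include <assert.h>

/* Hypothetical acquire context: a smaller stamp means an older task. */
struct acquire_ctx {
    unsigned long stamp;
};

enum ww_action {
    WW_WAIT,      /* older task: sleep until the lock is free */
    WW_BACK_OFF   /* younger task: drop all held locks and retry */
};

/* Decide what the contending task does when the lock is already held. */
static enum ww_action ww_on_contention(const struct acquire_ctx *self,
                                       const struct acquire_ctx *holder)
{
    assert(self->stamp != holder->stamp); /* stamps are assumed unique */
    return self->stamp < holder->stamp ? WW_WAIT : WW_BACK_OFF;
}
```

Since the decision depends only on a total order of stamps, two tasks can never both be told to wait on each other, which is what rules out deadlock.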
2013-06-21	lib/Kconfig.debug: Restrict FRAME_POINTER for MIPS	Markos Chandras	1	-1/+1
	FAULT_INJECTION_STACKTRACE_FILTER selects FRAME_POINTER but that symbol is not available for MIPS. Fixes the following problem on a randconfig: warning: (LOCKDEP && FAULT_INJECTION_STACKTRACE_FILTER && LATENCYTOP && KMEMCHECK) selects FRAME_POINTER which has unmet direct dependencies (DEBUG_KERNEL && (CRIS || M68K || FRV || UML || AVR32 || SUPERH || BLACKFIN || MN10300 || METAG) || ARCH_WANT_FRAME_POINTERS) Signed-off-by: Markos Chandras <[email protected]> Acked-by: Steven J. Hill <[email protected]> Cc: [email protected] Patchwork: https://patchwork.linux-mips.org/patch/5441/ Signed-off-by: Ralf Baechle <[email protected]>
2013-06-19	X.509: do not emit any informational output	Arnd Bergmann	1	-2/+0
	When building a kernel using 'make -s', I expect to see an empty output, except for build warnings and errors. The build_OID_registry code always prints one line when run, which is not helpful to most people building the kernels, and which makes it harder to automatically check for build warnings. Let's just remove the one line output. Signed-off-by: Arnd Bergmann <[email protected]> Cc: David Howells <[email protected]> Cc: Rusty Russell <[email protected]>
2013-06-18	treewide: Fix typo in printk	Masanari Iida	1	-2/+2
	Correct spelling typo in printk within various drivers. Signed-off-by: Masanari Iida <[email protected]> Acked-by: Randy Dunlap <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2013-06-17	Merge 3.10-rc6 into driver-core-next	Greg Kroah-Hartman	1	-1/+1
	We want these fixes here too. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2013-06-17	Merge 3.10-rc6 into char-misc-next	Greg Kroah-Hartman	1	-1/+1
	We want the fixes in here.
2013-06-16	percpu-refcount: use RCU-sched insted of normal RCU	Tejun Heo	1	-1/+1
	percpu-refcount was incorrectly using preempt_disable/enable() for RCU critical sections against call_rcu(). 6a24474da8 ("percpu-refcount: consistently use plain (non-sched) RCU") fixed it by replacing the preemption operations with rcu_read_[un]lock(), citing that there isn't any advantage in using sched-RCU over using the usual one; however, rcu_read_[un]lock() for the preemptible RCU implementation - CONFIG_TREE_PREEMPT_RCU, chosen when CONFIG_PREEMPT - are slightly more expensive than preempt_disable/enable(). In a contrived microbench which repeats the following,
	- percpu_ref_get()
	- copy 32 bytes of data into percpu buffer
	- percpu_ref_put()
	- copy 32 bytes of data into percpu buffer
	rcu_read_[un]lock() used in percpu_ref_get/put() makes it go slower by about 15% when compared to using sched-RCU. As the RCU critical sections are extremely short, using sched-RCU shouldn't have any latency implications. Convert to RCU-sched. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Kent Overstreet <[email protected]> Acked-by: "Paul E. McKenney" <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Rusty Russell <[email protected]>
2013-06-13	percpu-refcount: implement percpu_tryget() along with percpu_ref_kill_and_confirm()	Tejun Heo	1	-6/+17
	Implement percpu_tryget() which stops giving out references once the percpu_ref is visible as killed. Because the refcnt is per-cpu, different CPUs will start to see a refcnt as killed at different points in time and tryget() may continue to succeed on a subset of CPUs for a while after percpu_ref_kill() returns. For use cases where it's necessary to know when all CPUs start to see the refcnt as dead, percpu_ref_kill_and_confirm() is added. The new function takes an extra argument @confirm_kill which is invoked when the refcnt is guaranteed to be viewed as killed on all CPUs. While this isn't the prettiest interface, it doesn't force synchronous wait and is much safer than requiring the caller to do its own call_rcu(). v2: Patch description rephrased to emphasize that tryget() may continue to succeed on some CPUs after kill() returns as suggested by Kent. v3: Function comment in percpu_ref_kill_and_confirm() updated warning people to not depend on the implied RCU grace period from the confirm callback as it's an implementation detail. Signed-off-by: Tejun Heo <[email protected]> Slightly-Grumpily-Acked-by: Kent Overstreet <[email protected]>
2013-06-13	percpu-refcount: implement percpu_ref_cancel_init()	Tejun Heo	1	-0/+31
	Normally, percpu_ref_init() initializes and percpu_ref_kill() initiates destruction which completes asynchronously. The asynchronous destruction can be problematic in init failure path where the caller wants to destroy half-constructed object - distinguishing half-constructed objects from the usual release method can be painful for complex objects. This patch implements percpu_ref_cancel_init() which synchronously destroys the percpu_ref without invoking release. To avoid unintentional misuses, the function requires the ref to have finished percpu_ref_init() but never used and triggers WARN otherwise. v2: Explain the weird name and usage restriction in the function comment. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Kent Overstreet <[email protected]>
2013-06-13	percpu-refcount: add __must_check to percpu_ref_init() and don't use ACCESS_ONCE() in percpu_ref_kill_rcu()	Tejun Heo	1	-3/+1
	Two small changes. * Unlike most init functions, percpu_ref_init() allocates memory and may fail. Let's mark it with __must_check in case the caller forgets. * percpu_ref_kill_rcu() is unnecessarily using ACCESS_ONCE() to dereference @ref->pcpu_count, which can be misleading. The pointer is guaranteed to be valid and visible and can't change underneath the function. Drop ACCESS_ONCE(). Signed-off-by: Tejun Heo <[email protected]>
2013-06-12	percpu-refcount: cosmetic updates	Tejun Heo	1	-3/+4
	* s/percpu_ref_release/percpu_ref_func_t/ as it's customary to have _t postfix for types and the type is gonna be used for a different type of callback too. * Add @ARG to function comments. * Drop unnecessary and unaligned indentation from percpu_ref_init() function comment. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Kent Overstreet <[email protected]>
2013-06-12	lib/mpi/mpicoder.c: looping issue, need stop when equal to zero, found by 'EXTRA_FLAGS=-W'.	Chen Gang	1	-1/+1
	The 'while' loop needs to stop when 'nbytes == 0'; otherwise it causes problems ('nbytes' is a size_t, which is always greater than or equal to zero). The related warning: (with EXTRA_CFLAGS=-W) lib/mpi/mpicoder.c:40:2: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits] Signed-off-by: Chen Gang <[email protected]> Cc: Rusty Russell <[email protected]> Cc: David Howells <[email protected]> Cc: James Morris <[email protected]> Cc: Andy Shevchenko <[email protected]> Acked-by: David Howells <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
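The class of bug fixed here is easy to reproduce outside the kernel. The sketch below is a hypothetical helper, not the mpicoder.c code: it skips leading zero bytes in a buffer. With a size_t length, the condition `nbytes >= 0` can never be false, so the loop would only stop on a non-zero byte and could read past the end of an all-zero buffer; `nbytes > 0` is the correct bound.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Advance *buf past leading zero bytes and return how many bytes remain.
 * The function name and signature are illustrative assumptions.
 */
static size_t skip_leading_zeros(const unsigned char **buf, size_t nbytes)
{
    assert(buf && *buf);

    /* Writing "nbytes >= 0" here would always be true for size_t. */
    while (nbytes > 0 && (*buf)[0] == 0) {
        (*buf)++;
        nbytes--;
    }
    return nbytes;
}
```

Compiling the broken variant with -W (or -Wtype-limits) produces the "comparison of unsigned expression >= 0 is always true" warning quoted in the commit.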
2013-06-07	kobject: sanitize argument for format string	Kees Cook	1	-1/+1
	Unlike kobject_set_name(), the kset_create_and_add() interface does not provide a way to use format strings, so make sure that the interface cannot be abused accidentally. It looks like all current callers use static strings, so there's no existing flaw. Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
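The same hardening pattern applies to any printf-style interface: a caller-supplied string is passed as an argument to a fixed "%s" format instead of being used as the format itself. The userspace sketch below uses snprintf() and a hypothetical set_name() helper to show the idea.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Copy a caller-supplied name into dst. The name is treated as data only:
 * any '%' characters in it are copied literally instead of being
 * interpreted as conversion specifiers.
 */
static int set_name(char *dst, size_t size, const char *name)
{
    assert(dst && name && size > 0);
    return snprintf(dst, size, "%s", name);   /* not snprintf(dst, size, name) */
}
```

With the unsafe form, a name such as "a%sb" would make snprintf() read a non-existent argument, which is undefined behaviour.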
2013-06-05	net: core: move mac_pton() to lib/net_utils.c	Andy Shevchenko	3	-0/+31
	Since we have at least one user of this function outside of CONFIG_NET scope, we have to provide this function independently. The proposed solution is to move it under lib/net_utils.c with corresponding configuration variable and select wherever it is needed. Signed-off-by: Andy Shevchenko <[email protected]> Reported-by: Arnd Bergmann <[email protected]> Acked-by: David S. Miller <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
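For readers unfamiliar with what such a helper does, here is a userspace sketch of parsing a MAC address in XX:XX:XX:XX:XX:XX form. It is an illustration written for this log, not the code that was moved; parse_mac() and hex_val() are hypothetical names. The output buffer is only written once the whole string has been validated.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Convert one hex digit (already validated with isxdigit) to its value. */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower((unsigned char)c) - 'a' + 10;
}

/* Parse "XX:XX:XX:XX:XX:XX"; leaves mac untouched on failure. */
static bool parse_mac(const char *s, uint8_t mac[MAC_LEN])
{
    assert(s && mac);

    if (strlen(s) < 3 * MAC_LEN - 1)
        return false;

    /* First pass: validate digits and separators. */
    for (int i = 0; i < MAC_LEN; i++) {
        if (!isxdigit((unsigned char)s[i * 3]) ||
            !isxdigit((unsigned char)s[i * 3 + 1]))
            return false;
        if (i != MAC_LEN - 1 && s[i * 3 + 2] != ':')
            return false;
    }

    /* Second pass: convert. */
    for (int i = 0; i < MAC_LEN; i++)
        mac[i] = (uint8_t)((hex_val(s[i * 3]) << 4) | hex_val(s[i * 3 + 1]));
    return true;
}
```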
2013-06-05	crypto: crct10dif - Use PTR_RET	Herbert Xu	1	-3/+1
	lib/crc-t10dif.c:42:1-3: WARNING: PTR_RET can be used Use PTR_RET rather than if(IS_ERR(...)) + PTR_ERR Generated by: coccinelle/api/ptr_ret.cocci Reported-by: Fengguang Wu <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
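The pattern that PTR_RET folds together can be shown with a small userspace re-creation of error-pointer helpers. The lower-case functions below are stand-ins written for this example and modelled on the kernel convention of encoding small negative errno values in the top of the pointer range; they are not the kernel's own definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* True if ptr lies in the last MAX_ERRNO values of the address space. */
static int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Recover the errno value from an error pointer. */
static long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* Equivalent of "if (is_err(p)) return ptr_err(p); return 0;". */
static long ptr_ret(const void *ptr)
{
    return is_err(ptr) ? ptr_err(ptr) : 0;
}
```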
2013-06-03	percpu-refcount: Don't use silly cmpxchg()	Kent Overstreet	1	-15/+4
	The cmpxchg() was just to ensure the debug check didn't race, which was a bit excessive. The caller is supposed to do the appropriate synchronization, which means percpu_ref_kill() can just do a simple store. Signed-off-by: Kent Overstreet <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2013-06-03	percpu: implement generic percpu refcounting	Kent Overstreet	2	-1/+129
	This implements a refcount with similar semantics to atomic_get()/atomic_dec_and_test() - but percpu. It also implements two stage shutdown, as we need it to tear down the percpu counts. Before dropping the initial refcount, you must call percpu_ref_kill(); this puts the refcount in "shutting down mode" and switches back to a single atomic refcount with the appropriate barriers (synchronize_rcu()). It's also legal to call percpu_ref_kill() multiple times - it only returns true once, so callers don't have to reimplement shutdown synchronization. [[email protected]: fix build] [[email protected]: coding-style tweak] Signed-off-by: Kent Overstreet <[email protected]> Cc: Zach Brown <[email protected]> Cc: Felipe Balbi <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Asai Thambi S P <[email protected]> Cc: Selvan Mani <[email protected]> Cc: Sam Bradshaw <[email protected]> Cc: Jeff Moyer <[email protected]> Cc: Al Viro <[email protected]> Cc: Benjamin LaHaise <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Ingo Molnar <[email protected]> Reviewed-by: "Theodore Ts'o" <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
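The two-stage life cycle described above can be modelled in a few lines. The sketch below is a deliberately single-threaded model with hypothetical names: it has no real per-CPU data, no atomics, and no RCU, all of which the actual implementation needs. It only shows the bookkeeping: gets and puts touch per-slot counters until the ref is killed, after which the slots are folded into one shared counter and the final put can be detected.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_SLOTS 4  /* stand-in for the number of CPUs */

struct model_ref {
    long slot[NR_SLOTS]; /* per-slot deltas; a single slot may go negative */
    long shared;         /* single counter used after kill; starts with the initial ref */
    bool dead;
};

static void model_ref_init(struct model_ref *ref)
{
    for (int i = 0; i < NR_SLOTS; i++)
        ref->slot[i] = 0;
    ref->shared = 1;
    ref->dead = false;
}

static void model_ref_get(struct model_ref *ref, int cpu)
{
    assert(cpu >= 0 && cpu < NR_SLOTS);
    if (ref->dead)
        ref->shared++;
    else
        ref->slot[cpu]++;
}

/* Returns true when the last reference was dropped (only possible after kill). */
static bool model_ref_put(struct model_ref *ref, int cpu)
{
    assert(cpu >= 0 && cpu < NR_SLOTS);
    if (!ref->dead) {
        ref->slot[cpu]--;
        return false;
    }
    return --ref->shared == 0;
}

/* Switch to the shared counter; returns true only on the first call. */
static bool model_ref_kill(struct model_ref *ref)
{
    if (ref->dead)
        return false;
    ref->dead = true;
    for (int i = 0; i < NR_SLOTS; i++)
        ref->shared += ref->slot[i];
    return true;
}
```

A get on one slot may be balanced by a put on another, which is why only the sum over all slots is meaningful and why zero cannot be detected before the kill step.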
2013-06-03	debugfs: add get/set for atomic types	Seth Jennings	1	-21/+0
	debugfs currently lacks the ability to create attributes that set/get atomic_t values. This patch adds support for this through a new debugfs_create_atomic_t() function. Signed-off-by: Seth Jennings <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Rik van Riel <[email protected]> Acked-by: Konrad Rzeszutek Wilk <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2013-05-29	sprintf: hex_string(): fix comment	Steven Rostedt	1	-1/+1
	hex_string() had a typo in a comment. Signed-off-by: Steven Rostedt <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2013-05-24	MPILIB: disable usage of floating point registers on parisc	Helge Deller	1	-2/+3
	The umul_ppmm() macro for parisc uses the xmpyu assembler statement which does calculation via a floating point register. But usage of floating point registers inside the Linux kernel is not allowed and gcc will stop compilation due to the -mdisable-fpregs compiler option. Fix this by disabling the umul_ppmm() and udiv_qrnnd() macros. The mpilib will then use the generic built-in implementations instead. Signed-off-by: Helge Deller <[email protected]>
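To illustrate what a portable fallback for a widening multiply looks like, here is one well-known integer-only way to compute a 32x32 to 64-bit product as a high and a low word, using 16-bit halves. It is a sketch written for this log with a hypothetical function name; it is not claimed to be the exact generic code that mpilib falls back to.

```c
#include <assert.h>
#include <stdint.h>

/* Compute u * v as a 64-bit result split into *hi and *lo, integer-only. */
static void umul_ppmm_generic(uint32_t *hi, uint32_t *lo, uint32_t u, uint32_t v)
{
    assert(hi && lo);

    uint32_t ul = u & 0xffff, uh = u >> 16;
    uint32_t vl = v & 0xffff, vh = v >> 16;

    uint32_t x0 = ul * vl;   /* weight 2^0  */
    uint32_t x1 = ul * vh;   /* weight 2^16 */
    uint32_t x2 = uh * vl;   /* weight 2^16 */
    uint32_t x3 = uh * vh;   /* weight 2^32 */

    x1 += x0 >> 16;          /* cannot overflow */
    x1 += x2;                /* may overflow ... */
    if (x1 < x2)
        x3 += 0x10000;       /* ... so carry into the high word */

    *hi = x3 + (x1 >> 16);
    *lo = (x1 << 16) | (x0 & 0xffff);
}
```

No floating point register is involved at any step, which is the property the parisc fix relies on.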
2013-05-23	Merge tag 'driver-core-3.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core	Linus Torvalds	1	-1/+1
	Pull driver core fixes from Greg Kroah-Hartman: "Here are 3 tiny driver core fixes for 3.10-rc2. A needed symbol export, a change to make it easier to track down offending sysfs files with incorrect attributes, and a klist bugfix. All have been in linux-next for a while"
	* tag 'driver-core-3.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
	klist: del waiter from klist_remove_waiters before wakeup waitting process
	driver core: print sysfs attribute name when warning about bogus permissions
	driver core: export subsys_virtual_register
2013-05-23	lib: make iovec obj instead of lib	Randy Dunlap	1	-2/+2
	Fix build error in vmw_vmci.ko when CONFIG_VMWARE_VMCI=m by changing iovec.o from lib-y to obj-y. ERROR: "memcpy_toiovec" [drivers/misc/vmw_vmci/vmw_vmci.ko] undefined! ERROR: "memcpy_fromiovec" [drivers/misc/vmw_vmci/vmw_vmci.ko] undefined! Signed-off-by: Randy Dunlap <[email protected]> Acked-by: Rusty Russell <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-21	klist: del waiter from klist_remove_waiters before wakeup waitting process	wang, biao	1	-1/+1
	There is a race between klist_remove and klist_release. klist_remove uses a local variable, waiter, stored on its stack. When klist_release calls wake_up_process(waiter->process) to wake up the waiter, the waiter might run immediately and reuse that stack. klist_release then calls list_del(&waiter->list), which modifies the stale waiter data and corrupts the stack of the thread that had been waiting. The patch fixes it against kernel 3.9. Signed-off-by: wang, biao <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2013-05-20	crypto: crct10dif - Wrap crc_t10dif function all to use crypto transform framework	Tim Chen	2	-43/+34
	When CRC T10 DIF is calculated using the crypto transform framework, we wrap the crc_t10dif function call to utilize it. This allows us to take advantage of any accelerated CRC T10 DIF transform that is plugged into the crypto framework. Signed-off-by: Tim Chen <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2013-05-20	Hoist memcpy_fromiovec/memcpy_toiovec into lib/	Rusty Russell	2	-1/+54
	ERROR: "memcpy_fromiovec" [drivers/vhost/vhost_scsi.ko] undefined! That function is only present with CONFIG_NET. Turns out that crypto/algif_skcipher.c also uses that outside net, but it actually needs sockets anyway. In addition, commit 6d4f0139d642c45411a47879325891ce2a7c164a added CONFIG_NET dependency to CONFIG_VMCI for memcpy_toiovec, so hoist that function and revert that commit too. socket.h already includes uio.h, so no callers need updating; trying only broke things for x86_64 randconfig (thanks Fengguang!). Reported-by: Randy Dunlap <[email protected]> Acked-by: David S. Miller <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2013-05-08	Merge branch 'for-3.10/drivers' of git://git.kernel.dk/linux-block	Linus Torvalds	1	-10/+46
	Pull block driver updates from Jens Axboe: "It might look big in volume, but when categorized, not a lot of drivers are touched. The pull request contains:
	- mtip32xx fixes from Micron.
	- A slew of drbd updates, this time in a nicer series.
	- bcache, a flash/ssd caching framework from Kent.
	- Fixes for cciss"
	* 'for-3.10/drivers' of git://git.kernel.dk/linux-block: (66 commits)
	bcache: Use bd_link_disk_holder()
	bcache: Allocator cleanup/fixes
	cciss: bug fix to prevent cciss from loading in kdump crash kernel
	cciss: add cciss_allow_hpsa module parameter
	drivers/block/mg_disk.c: add CONFIG_PM_SLEEP to suspend/resume functions
	mtip32xx: Workaround for unaligned writes
	bcache: Make sure blocksize isn't smaller than device blocksize
	bcache: Fix merge_bvec_fn usage for when it modifies the bvm
	bcache: Correctly check against BIO_MAX_PAGES
	bcache: Hack around stuff that clones up to bi_max_vecs
	bcache: Set ra_pages based on backing device's ra_pages
	bcache: Take data offset from the bdev superblock.
	mtip32xx: mtip32xx: Disable TRIM support
	mtip32xx: fix a smatch warning
	bcache: Disable broken btree fuzz tester
	bcache: Fix a format string overflow
	bcache: Fix a minor memory leak on device teardown
	bcache: Documentation updates
	bcache: Use WARN_ONCE() instead of __WARN()
	bcache: Add missing #include <linux/prefetch.h>
	...
2013-05-07	rwsem: check counter to avoid cmpxchg calls	Davidlohr Bueso	1	-1/+3
	This patch tries to reduce the number of cmpxchg calls in the writer failed path by checking the counter value first before issuing the instruction. If ->count is not set to RWSEM_WAITING_BIAS then there is no point wasting a cmpxchg call. Furthermore, Michel states "I suppose it helps due to the case where someone else steals the lock while we're trying to acquire sem->wait_lock." Two very different workloads and machines were used to see how this patch improves throughput: pgbench on a quad-core laptop and aim7 on a large 8 socket box with 80 cores.
	Some results comparing Michel's fast-path write lock stealing (tps-rwsem) with this patch applied on top (tps-patch), on a quad-core laptop running pgbench:
	| db_size | clients | tps-rwsem | tps-patch |
	+---------+---------+-----------+-----------+
	| 160 MB  |       1 |      6906 |      9153 | + 32.5%
	| 160 MB  |       2 |     15931 |     22487 | + 41.1%
	| 160 MB  |       4 |     33021 |     32503 |
	| 160 MB  |       8 |     34626 |     34695 |
	| 160 MB  |      16 |     33098 |     34003 |
	| 160 MB  |      20 |     31343 |     31440 |
	| 160 MB  |      30 |     28961 |     28987 |
	| 160 MB  |      40 |     26902 |     26970 |
	| 160 MB  |      50 |     25760 |     25810 |
	-------------------------------------------------
	| 1.6 GB  |       1 |      7729 |      7537 |
	| 1.6 GB  |       2 |     19009 |     23508 | + 23.7%
	| 1.6 GB  |       4 |     33185 |     32666 |
	| 1.6 GB  |       8 |     34550 |     34318 |
	| 1.6 GB  |      16 |     33079 |     32689 |
	| 1.6 GB  |      20 |     31494 |     31702 |
	| 1.6 GB  |      30 |     28535 |     28755 |
	| 1.6 GB  |      40 |     27054 |     27017 |
	| 1.6 GB  |      50 |     25591 |     25560 |
	-------------------------------------------------
	| 7.6 GB  |       1 |      6224 |      7469 | + 20.0%
	| 7.6 GB  |       2 |     13611 |     12778 |
	| 7.6 GB  |       4 |     33108 |     32927 |
	| 7.6 GB  |       8 |     34712 |     34878 |
	| 7.6 GB  |      16 |     32895 |     33003 |
	| 7.6 GB  |      20 |     31689 |     31974 |
	| 7.6 GB  |      30 |     29003 |     28806 |
	| 7.6 GB  |      40 |     26683 |     26976 |
	| 7.6 GB  |      50 |     25925 |     25652 |
	-------------------------------------------------
	The aim7 workloads improved overall on top of Michel's patchset. For full graphs on how the rwsem series plus this patch behaves on a large 8 socket machine against a vanilla kernel: http://stgolabs.net/rwsem-aim7-results.tar.gz Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
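The optimisation in this commit is the familiar "test before compare-and-swap" pattern. The sketch below shows it with C11 atomics in userspace; the constants are placeholder values and try_take_write() is a hypothetical name, so this illustrates the pattern only and is not the rwsem code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Placeholder values; the real rwsem bias constants depend on the architecture. */
#define WAITING_BIAS  (-0x10000L)
#define WRITE_LOCKED  (WAITING_BIAS + 1)

/*
 * Try to take the write lock. The plain load is cheap and does not need
 * exclusive ownership of the cache line, so the expensive compare-and-swap
 * is issued only when it has a chance of succeeding.
 */
static bool try_take_write(atomic_long *count)
{
    long expected = WAITING_BIAS;

    if (atomic_load_explicit(count, memory_order_relaxed) != WAITING_BIAS)
        return false;

    return atomic_compare_exchange_strong(count, &expected, WRITE_LOCKED);
}
```

The early load is only a hint: the value can change before the compare-and-swap runs, so correctness still rests entirely on the compare-and-swap result.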
2013-05-07	kref: minor cleanup	Anatol Pomozov	1	-1/+1
	- make warning smp-safe - result of atomic _unless_zero functions should be checked by caller to avoid use-after-free error - trivial whitespace fix. Link: https://lkml.org/lkml/2013/4/12/391 Tested: compile x86, boot machine and run xfstests Signed-off-by: Anatol Pomozov <[email protected]> [ Removed line-break, changed to use WARN_ON_ONCE() - Linus ] Signed-off-by: Linus Torvalds <[email protected]>
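The point about checking the result of "unless zero" operations can be illustrated with C11 atomics. The helper below is a hypothetical userspace analogue: it takes a reference only if the count is still non-zero, and the caller must not touch the object when it returns false. The warn_unused_result attribute is a GCC/Clang extension used here to play the role that __must_check plays in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Increment *refs unless it is zero. Returns true if a reference was taken.
 * Ignoring a false return and using the object anyway is a use-after-free.
 */
__attribute__((warn_unused_result))
static bool get_unless_zero(atomic_int *refs)
{
    int old = atomic_load(refs);

    assert(old >= 0);
    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(refs, &old, old + 1))
            return true;
    }
    return false;
}
```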
2013-05-07	Merge branch 'rwsem-optimizations'	Linus Torvalds	2	-146/+132
	Merge rwsem optimizations from Michel Lespinasse: "These patches extend Alex Shi's work (which added write lock stealing on the rwsem slow path) in order to provide rwsem write lock stealing on the fast path (that is, without taking the rwsem's wait_lock). I have unfortunately been unable to push this through -next before due to Ingo Molnar / David Howells / Peter Zijlstra being busy with other things. However, this has gotten some attention from Rik van Riel and Davidlohr Bueso who both commented that they felt this was ready for v3.10, and Ingo Molnar has said that he was OK with me pushing directly to you. So, here goes :)
	Davidlohr got the following test results from pgbench running on a quad-core laptop:
	| db_size | clients | tps-vanilla | tps-rwsem |
	+---------+---------+-------------+-----------+
	| 160 MB  |       1 |        5803 |      6906 | + 19.0%
	| 160 MB  |       2 |       13092 |     15931 |
	| 160 MB  |       4 |       29412 |     33021 |
	| 160 MB  |       8 |       32448 |     34626 |
	| 160 MB  |      16 |       32758 |     33098 |
	| 160 MB  |      20 |       26940 |     31343 | + 16.3%
	| 160 MB  |      30 |       25147 |     28961 |
	| 160 MB  |      40 |       25484 |     26902 |
	| 160 MB  |      50 |       24528 |     25760 |
	---------------------------------------------------
	| 1.6 GB  |       1 |        5733 |      7729 | + 34.8%
	| 1.6 GB  |       2 |        9411 |     19009 | + 101.9%
	| 1.6 GB  |       4 |       31818 |     33185 |
	| 1.6 GB  |       8 |       33700 |     34550 |
	| 1.6 GB  |      16 |       32751 |     33079 |
	| 1.6 GB  |      20 |       30919 |     31494 |
	| 1.6 GB  |      30 |       28540 |     28535 |
	| 1.6 GB  |      40 |       26380 |     27054 |
	| 1.6 GB  |      50 |       25241 |     25591 |
	---------------------------------------------------
	| 7.6 GB  |       1 |        5779 |      6224 |
	| 7.6 GB  |       2 |       10897 |     13611 | + 24.9%
	| 7.6 GB  |       4 |       32683 |     33108 |
	| 7.6 GB  |       8 |       33968 |     34712 |
	| 7.6 GB  |      16 |       32287 |     32895 |
	| 7.6 GB  |      20 |       27770 |     31689 | + 14.1%
	| 7.6 GB  |      30 |       26739 |     29003 |
	| 7.6 GB  |      40 |       24901 |     26683 |
	| 7.6 GB  |      50 |       17115 |     25925 | + 51.5%
	---------------------------------------------------
	(Davidlohr also has one additional patch which further improves throughput, though I will ask him to send it directly to you as I have suggested some minor changes)."
	* emailed patches from Michel Lespinasse <[email protected]>:
	rwsem: no need for explicit signed longs
	x86 rwsem: avoid taking slow path when stealing write lock
	rwsem: do not block readers at head of queue if other readers are active
	rwsem: implement support for write lock stealing on the fastpath
	rwsem: simplify __rwsem_do_wake
	rwsem: skip initial trylock in rwsem_down_write_failed
	rwsem: avoid taking wait_lock in rwsem_down_write_failed
	rwsem: use cmpxchg for trying to steal write lock
	rwsem: more agressive lock stealing in rwsem_down_write_failed
	rwsem: simplify rwsem_down_write_failed
	rwsem: simplify rwsem_down_read_failed
	rwsem: move rwsem_down_failed_common code into rwsem_down_{read,write}_failed
	rwsem: shorter spinlocked section in rwsem_down_failed_common()
	rwsem: make the waiter type an enumeration rather than a bitmask
2013-05-07	rwsem: no need for explicit signed longs	Davidlohr Bueso	1	-5/+3
	Change explicit "signed long" declarations into plain "long" as suggested by Peter Hurley. Signed-off-by: Davidlohr Bueso <[email protected]> Reviewed-by: Michel Lespinasse <[email protected]> Signed-off-by: Michel Lespinasse <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07	rwsem: do not block readers at head of queue if other readers are active	Michel Lespinasse	1	-2/+8
	This change fixes a race condition where a reader might determine it needs to block, but by the time it acquires the wait_lock the rwsem has active readers and no queued waiters. In this situation the reader can run in parallel with the existing active readers; it does not need to block until the active readers complete. Thanks to Peter Hurley for noticing this possible race. Signed-off-by: Michel Lespinasse <[email protected]> Reviewed-by: Peter Hurley <[email protected]> Acked-by: Davidlohr Bueso <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07	rwsem: implement support for write lock stealing on the fastpath	Michel Lespinasse	1	-32/+32
	When we decide to wake up readers, we must first grant them as many read locks as necessary, and then actually wake up all these readers. But in order to know how many read shares to grant, we must first count the readers at the head of the queue. This might take a while if there are many readers, and we want to be protected against a writer stealing the lock while we're counting. To that end, we grant the first reader lock before counting how many more readers are queued. We also require some adjustments to the wake_type semantics. RWSEM_WAKE_NO_ACTIVE used to mean that we had found the count to be RWSEM_WAITING_BIAS, in which case the rwsem was known to be free as nobody could steal it while we hold the wait_lock. This doesn't make sense once we implement fastpath write lock stealing, so we now use RWSEM_WAKE_ANY in that case. Similarly, when rwsem_down_write_failed found that a read lock was active, it would use RWSEM_WAKE_READ_OWNED which signalled that new readers could be woken without checking first that the rwsem was available. We can't do that anymore since the existing readers might release their read locks, and a writer could steal the lock before we wake up additional readers. So, we have to use a new RWSEM_WAKE_READERS value to indicate we only want to wake readers, but we don't currently hold any read lock. Signed-off-by: Michel Lespinasse <[email protected]> Reviewed-by: Peter Hurley <[email protected]> Acked-by: Davidlohr Bueso <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07	rwsem: simplify __rwsem_do_wake	Michel Lespinasse	2	-30/+19
	This is mostly for cleanup value: - We don't need several gotos to handle the case where the first waiter is a writer. Two simple tests will do (and generate very similar code). - In the remainder of the function, we know the first waiter is a reader, so we don't have to double check that. We can use do..while loops to iterate over the readers to wake (generates slightly better code). Signed-off-by: Michel Lespinasse <[email protected]> Reviewed-by: Peter Hurley <[email protected]> Acked-by: Davidlohr Bueso <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
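The do..while point can be made concrete with a small sketch. Once the first waiter is known to be a reader, a do..while loop counts the run of readers at the head of the queue without re-testing the first element. The types and names below are hypothetical and use a plain singly linked list rather than the kernel's list primitives.

```c
#include <assert.h>
#include <stddef.h>

enum waiter_type { WAITER_READ, WAITER_WRITE };

struct waiter {
    enum waiter_type type;
    struct waiter *next;
};

/*
 * Count consecutive readers at the head of the queue. The caller has
 * already established that the first waiter is a reader, so the loop body
 * runs at least once and the type check happens only for later entries.
 */
static unsigned int count_head_readers(const struct waiter *first)
{
    unsigned int readers = 0;
    const struct waiter *w = first;

    assert(first && first->type == WAITER_READ);
    do {
        readers++;
        w = w->next;
    } while (w && w->type == WAITER_READ);

    return readers;
}
```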
2013-05-07	rwsem: skip initial trylock in rwsem_down_write_failed	Michel Lespinasse	1	-8/+9
	We can skip the initial trylock in rwsem_down_write_failed() if there are known active lockers already, thus saving one likely-to-fail cmpxchg. Signed-off-by: Michel Lespinasse <[email protected]> Reviewed-by: Peter Hurley <[email protected]> Acked-by: Davidlohr Bueso <[email protected]> Acked-by: Rik van Riel <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07	rwsem: avoid taking wait_lock in rwsem_down_write_failed	Michel Lespinasse	1	-2/+8
	In rwsem_down_write_failed(), if there are active locks after we wake up (i.e. the lock got stolen from us), skip taking the wait_lock and go back to sleep immediately. Signed-off-by: Michel Lespinasse <[email protected]> Reviewed-by: Peter Hurley <[email protected]> Acked-by: Davidlohr Bueso <[email protected]> Acked-by: Rik van Riel <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07	rwsem: use cmpxchg for trying to steal write lock	Michel Lespinasse	1	-20/+6
	Using rwsem_atomic_update to try stealing the write lock forced us to undo the adjustment in the failure path. We can have simpler and faster code by using cmpxchg instead. Signed-off-by: Michel Lespinasse <[email protected]> Reviewed-by: Peter Hurley <[email protected]> Acked-by: Davidlohr Bueso <[email protected]> Acked-by: Rik van Riel <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07	rwsem: more agressive lock stealing in rwsem_down_write_failed	Michel Lespinasse	1	-21/+8
	Some small code simplifications can be achieved by doing more aggressive lock stealing: - When rwsem_down_write_failed() notices that there are no active locks (and thus no thread to wake us if we decided to sleep), it used to wake the first queued process. However, stealing the lock is also sufficient to deal with this case, so we don't need this check anymore. - In try_get_writer_sem(), we can steal the lock even when the first waiter is a reader. This is correct because the code path that wakes readers is protected by the wait_lock. As to the performance effects of this change, they are expected to be minimal: readers are still granted the lock (rather than having to acquire it themselves) when they reach the front of the wait queue, so we have essentially the same behavior as in rwsem-spinlock. Signed-off-by: Michel Lespinasse <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Reviewed-by: Peter Hurley <[email protected]> Acked-by: Davidlohr Bueso <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07	rwsem: simplify rwsem_down_write_failed	Michel Lespinasse	1	-24/+9
	When waking writers, we never grant them the lock - instead, they have to acquire it themselves when they run, and remove themselves from the wait_list when they succeed. As a result, we can do a few simplifications in rwsem_down_write_failed(): - We don't need to check for !waiter.task since __rwsem_do_wake() doesn't remove writers from the wait_list - There is no point releasing the wait_lock before entering the wait loop, as we will need to reacquire it immediately. We can change the loop so that the lock is always held at the start of each loop iteration. - We don't need to get a reference on the task structure, since the task is responsible for removing itself from the wait_list. There is no risk, like in the rwsem_down_read_failed() case, that a task would wake up and exit (thus destroying its task structure) while __rwsem_do_wake() is still running - wait_lock protects against that. Signed-off-by: Michel Lespinasse <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Reviewed-by: Peter Hurley <[email protected]> Acked-by: Davidlohr Bueso <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07	rwsem: simplify rwsem_down_read_failed	Michel Lespinasse	1	-20/+2
	When trying to acquire a read lock, the RWSEM_ACTIVE_READ_BIAS adjustment doesn't cause other readers to block, so we never have to worry about waking them back after canceling this adjustment in rwsem_down_read_failed(). We also never want to steal the lock in rwsem_down_read_failed(), so we don't have to grab the wait_lock either. Signed-off-by: Michel Lespinasse <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Reviewed-by: Peter Hurley <[email protected]> Acked-by: Davidlohr Bueso <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07	rwsem: move rwsem_down_failed_common code into rwsem_down_{read,write}_failed	Michel Lespinasse	1	-15/+57
	Remove the rwsem_down_failed_common function and replace it with two identical copies of its code in rwsem_down_{read,write}_failed. This is because we want to make different optimizations in rwsem_down_{read,write}_failed; we are adding this pure-duplication step as a separate commit in order to make it easier to check the following steps. Signed-off-by: Michel Lespinasse <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Reviewed-by: Peter Hurley <[email protected]> Acked-by: Davidlohr Bueso <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07	rwsem: shorter spinlocked section in rwsem_down_failed_common()	Michel Lespinasse	1	-5/+3
	This change reduces the size of the spinlocked and TASK_UNINTERRUPTIBLE sections in rwsem_down_failed_common(): - We only need the sem->wait_lock to insert ourselves on the wait_list; the waiter node can be prepared outside of the wait_lock. - The task state only needs to be set to TASK_UNINTERRUPTIBLE immediately before checking if we actually need to sleep; it doesn't need to protect the entire function. Signed-off-by: Michel Lespinasse <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Reviewed-by: Peter Hurley <[email protected]> Acked-by: Davidlohr Bueso <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
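	A minimal userspace sketch of the first point above (prepare the waiter node outside the lock, hold the lock only for the list insertion). It assumes a C11 atomic_flag as a stand-in for sem->wait_lock; demo_sem, demo_waiter and demo_enqueue are hypothetical names for illustration, not the kernel's code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the rwsem waiter node and semaphore. */
struct demo_waiter {
	int id;
	struct demo_waiter *next;
};

struct demo_sem {
	atomic_flag wait_lock;          /* plays the role of sem->wait_lock */
	struct demo_waiter *head, *tail;
};

static void demo_sem_init(struct demo_sem *s)
{
	atomic_flag_clear(&s->wait_lock);
	s->head = s->tail = NULL;
}

static void demo_lock(struct demo_sem *s)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&s->wait_lock,
						 memory_order_acquire))
		;
}

static void demo_unlock(struct demo_sem *s)
{
	atomic_flag_clear_explicit(&s->wait_lock, memory_order_release);
}

static void demo_enqueue(struct demo_sem *s, struct demo_waiter *w, int id)
{
	assert(s != NULL && w != NULL);

	/* Node setup touches only private data, so no lock is needed. */
	w->id = id;
	w->next = NULL;

	/* Only the shared list manipulation sits inside the locked section. */
	demo_lock(s);
	if (s->tail)
		s->tail->next = w;
	else
		s->head = w;
	s->tail = w;
	demo_unlock(s);
}
```

	The locked region shrinks to the few pointer updates that actually touch shared state, which is the point the commit message makes about sem->wait_lock.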
2013-05-07	rwsem: make the waiter type an enumeration rather than a bitmask	Michel Lespinasse	2	-18/+24
	We are not planning to add any new waiter flags, so we can convert the waiter type into an enumeration. Background: David Howells suggested I do this back when I tried adding a new waiter type for unfair readers. However, I believe the cleanup applies regardless of that use case. Signed-off-by: Michel Lespinasse <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Reviewed-by: Peter Hurley <[email protected]> Acked-by: Davidlohr Bueso <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
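	A minimal sketch of the idea, assuming a waiter is always exactly one of two kinds. The type and function names below are hypothetical and chosen for illustration; they are not taken from the patch.

```c
#include <stddef.h>

/*
 * With a bitmask, a waiter could in principle carry several flags at once
 * and callers test them with '&'. With an enumeration, each waiter has
 * exactly one type and callers compare with '=='.
 */
enum demo_waiter_type {
	DEMO_WAITING_FOR_WRITE,
	DEMO_WAITING_FOR_READ
};

struct demo_typed_waiter {
	enum demo_waiter_type type;
};

static int demo_is_writer(const struct demo_typed_waiter *w)
{
	return w->type == DEMO_WAITING_FOR_WRITE;
}

/* Count the consecutive readers at the front of a wait queue. */
static size_t demo_leading_readers(const struct demo_typed_waiter *q, size_t n)
{
	size_t i = 0;

	while (i < n && q[i].type == DEMO_WAITING_FOR_READ)
		i++;
	return i;
}
```

	The equality tests make it explicit that the types are mutually exclusive, which a bitmask does not enforce.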
2013-05-05	Give the OID registry file module info to avoid kernel tainting	David Howells	1	-0/+5
	Give the OID registry file module information so that it doesn't taint the kernel when compiled as a module and loaded. Reported-by: Dros Adamson <[email protected]> Signed-off-by: David Howells <[email protected]> cc: Trond Myklebust <[email protected]> cc: [email protected] cc: [email protected] Signed-off-by: Linus Torvalds <[email protected]>
2013-05-02	Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux	Linus Torvalds	1	-3/+1
	Pull drm updates from Dave Airlie: "This is the main drm pull request for 3.10. Weird bits: - OMAP drm changes required OMAP dss changes, in drivers/video, so I took them in here. - one more fbcon fix for font handover - VT switch avoidance in pm code - scatterlist helpers for gpu drivers - have acks from akpm Highlights: - qxl kms driver - driver for the spice qxl virtual GPU Nouveau: - fermi/kepler VRAM compression - GK110/nvf0 modesetting support. Tegra: - host1x core merged with 2D engine support i915: - vt switchless resume - more valleyview support - vblank fixes - modesetting pipe config rework radeon: - UVD engine support - SI chip tiling support - GPU registers initialisation from golden values. exynos: - device tree changes - fimc block support Otherwise: - bunches of fixes all over the place." * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (513 commits) qxl: update to new idr interfaces. drm/nouveau: fix build with nv50->nvc0 drm/radeon: fix handling of v6 power tables drm/radeon: clarify family checks in pm table parsing drm/radeon: consolidate UVD clock programming drm/radeon: fix UPLL_REF_DIV_MASK definition radeon: add bo tracking debugfs drm/radeon: add new richland pci ids drm/radeon: add some new SI PCI ids drm/radeon: fix scratch reg handling for UVD fence drm/radeon: allocate SA bo in the requested domain drm/radeon: fix possible segfault when parsing pm tables drm/radeon: fix endian bugs in atom_allocate_fb_scratch() OMAPDSS: TFP410: return EPROBE_DEFER if the i2c adapter not found OMAPDSS: VENC: Add error handling for venc_probe_pdata OMAPDSS: HDMI: Add error handling for hdmi_probe_pdata OMAPDSS: RFBI: Add error handling for rfbi_probe_pdata OMAPDSS: DSI: Add error handling for dsi_probe_pdata OMAPDSS: SDI: Add error handling for sdi_probe_pdata OMAPDSS: DPI: Add error handling for dpi_probe_pdata ...
2013-05-02	Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	Linus Torvalds	1	-13/+6
	Pull scheduler fixes from Ingo Molnar: "This fixes the cputime scaling overflow problems for good without having bad 32-bit overhead, and gets rid of the div64_u64_rem() helper as well." * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "math64: New div64_u64_rem helper" sched: Avoid prev->stime underflow sched: Do not account bogus utime sched: Avoid cputime scaling overflow
2013-05-01	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	Linus Torvalds	1	-2/+2
	Pull VFS updates from Al Viro: Misc cleanups all over the place, mainly wrt /proc interfaces (switch create_proc_entry to proc_create(), get rid of the deprecated create_proc_read_entry() in favor of using proc_create_data() and seq_file etc). 7kloc removed. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits) don't bother with deferred freeing of fdtables proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h proc: Make the PROC_I() and PDE() macros internal to procfs proc: Supply a function to remove a proc entry by PDE take cgroup_open() and cpuset_open() to fs/proc/base.c ppc: Clean up scanlog ppc: Clean up rtas_flash driver somewhat hostap: proc: Use remove_proc_subtree() drm: proc: Use remove_proc_subtree() drm: proc: Use minor->index to label things, not PDE->name drm: Constify drm_proc_list[] zoran: Don't print proc_dir_entry data in debug reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show() proc: Supply an accessor for getting the data from a PDE's parent airo: Use remove_proc_subtree() rtl8192u: Don't need to save device proc dir PDE rtl8187se: Use a dir under /proc/net/r8180/ proc: Add proc_mkdir_data() proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h} proc: Move PDE_NET() to fs/proc/proc_net.c ...
2013-04-30	Merge branch 'akpm' (incoming from Andrew)	Linus Torvalds	9	-54/+291
	Merge third batch of fixes from Andrew Morton: "Most of the rest. I still have two large patchsets against AIO and IPC, but they're a bit stuck behind other trees and I'm about to vanish for six days. - random fixlets - inotify - more of the MM queue - show_stack() cleanups - DMI update - kthread/workqueue things - compat cleanups - epoll updates - binfmt updates - nilfs2 - hfs - hfsplus - ptrace - kmod - coredump - kexec - rbtree - pids - pidns - pps - semaphore tweaks - some w1 patches - relay updates - core Kconfig changes - sysrq tweaks" * emailed patches from Andrew Morton <[email protected]>: (109 commits) Documentation/sysrq: fix inconstistent help message of sysrq key ethernet/emac/sysrq: fix inconstistent help message of sysrq key sparc/sysrq: fix inconstistent help message of sysrq key powerpc/xmon/sysrq: fix inconstistent help message of sysrq key ARM/etm/sysrq: fix inconstistent help message of sysrq key power/sysrq: fix inconstistent help message of sysrq key kgdb/sysrq: fix inconstistent help message of sysrq key lib/decompress.c: fix initconst notifier-error-inject: fix module names in Kconfig kernel/sys.c: make prctl(PR_SET_MM) generally available UAPI: remove empty Kbuild files menuconfig: print more info for symbol without prompts init/Kconfig: re-order CONFIG_EXPERT options to fix menuconfig display kconfig menu: move Virtualization drivers near other virtualization options Kconfig: consolidate CONFIG_DEBUG_STRICT_USER_COPY_CHECKS relay: use macro PAGE_ALIGN instead of FIX_SIZE kernel/relay.c: move FIX_SIZE macro into relay.c kernel/relay.c: remove unused function argument actor drivers/w1/slaves/w1_ds2760.c: fix the error handling in w1_ds2760_add_slave() drivers/w1/slaves/w1_ds2781.c: fix the error handling in w1_ds2781_add_slave() ...
2013-04-30	lib/decompress.c: fix initconst	Andi Kleen	1	-1/+1
	Signed-off-by: Andi Kleen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>