blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2010-05-27	exit: avoid sig->count in __exit_signal() to detect the group-dead case	Oleg Nesterov	1	-2/+3
	Change __exit_signal() to check thread_group_leader() instead of atomic_dec_and_test(&sig->count). This must be equivalent, the group leader must be released only after all other threads have exited and passed __exit_signal(). Henceforth sig->count is not actually used, except in fs/proc for get_nr_threads/etc. Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Roland McGrath <[email protected]> Cc: Veaceslav Falico <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	exit: avoid sig->count in de_thread/__exit_signal synchronization	Oleg Nesterov	2	-7/+6
	de_thread() and __exit_signal() use signal_struct->count/notify_count for synchronization. We can simplify the code and use ->notify_count only. Instead of comparing these two counters, we can change de_thread() to set ->notify_count = nr_of_sub_threads, then change __exit_signal() to dec-and-test this counter and notify group_exit_task. Note that __exit_signal() checks "notify_count > 0" just for symmetry with exit_notify(), we could just check it is != 0. Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Roland McGrath <[email protected]> Cc: Veaceslav Falico <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	exit: change zap_other_threads() to count sub-threads	Oleg Nesterov	2	-9/+10
	Change zap_other_threads() to return the number of other sub-threads found on ->thread_group list. Other changes are cosmetic: - change the code to use while_each_thread() helper - remove the obsolete comment about SIGKILL/SIGSTOP Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Roland McGrath <[email protected]> Cc: Veaceslav Falico <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	exit: exit_notify() can trust signal->notify_count < 0	Oleg Nesterov	1	-5/+2
	signal_struct->count in its current form must die. - it has no reasons to be atomic_t - it looks like a reference counter, but it is not - otoh, we really need to make task->signal refcountable, just look at the extremely ugly task_rq_unlock_wait() called from __exit_signals(). - we should change the lifetime rules for task->signal, it should be pinned to task_struct. We have a lot of code which can be simplified after that. - it is not needed! while the code is correct, any usage of this counter is artificial, except fs/proc uses it correctly to show the number of threads. This series removes the usage of sig->count from exit pathes. This patch: Now that Veaceslav changed copy_signal() to use zalloc(), exit_notify() can just check notify_count < 0 to ensure the execing sub-threads needs the notification from us. No need to do other checks, notify_count != 0 must always mean ->group_exit_task != NULL is waiting for us. Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Roland McGrath <[email protected]> Cc: Veaceslav Falico <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	coredump: shift down_write(mmap_sem) into coredump_wait()	Oleg Nesterov	1	-12/+7
	- move the cprm.mm_flags checks up, before we take mmap_sem - move down_write(mmap_sem) and ->core_state check from do_coredump() to coredump_wait() This simplifies the code and makes the locking symmetrical. Signed-off-by: Oleg Nesterov <[email protected]> Cc: David Howells <[email protected]> Cc: Neil Horman <[email protected]> Cc: Roland McGrath <[email protected]> Cc: Andi Kleen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	coredump: factor out put_cred() calls	Oleg Nesterov	1	-10/+6
	Given that do_coredump() calls put_cred() on exit path, it is a bit ugly to do put_cred() + "goto fail" twice, just add the new "fail_creds" label. Signed-off-by: Oleg Nesterov <[email protected]> Cc: David Howells <[email protected]> Cc: Neil Horman <[email protected]> Cc: Roland McGrath <[email protected]> Cc: Andi Kleen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	coredump: cleanup "ispipe" code	Oleg Nesterov	1	-22/+17
	- kill "int dump_count", argv_split(argcp) accepts argcp == NULL. - move "int dump_count" under " if (ispipe)" branch, fail_dropcount can check ispipe. - move "char **helper_argv" as well, change the code to do argv_free() right after call_usermodehelper_fns(). - If call_usermodehelper_fns() fails goto close_fail label instead of closing the file by hand. Signed-off-by: Oleg Nesterov <[email protected]> Cc: David Howells <[email protected]> Cc: Neil Horman <[email protected]> Cc: Roland McGrath <[email protected]> Cc: Andi Kleen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	coredump: factor out the not-ispipe file checks	Oleg Nesterov	1	-32/+31
	do_coredump() does a lot of file checks after it opens the file or calls usermode helper. But all of these checks are only needed in !ispipe case. Move this code into the "else" branch and kill the ugly repetitive ispipe checks. Signed-off-by: Oleg Nesterov <[email protected]> Cc: David Howells <[email protected]> Cc: Neil Horman <[email protected]> Cc: Roland McGrath <[email protected]> Cc: Andi Kleen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	call_usermodehelper: UMH_WAIT_EXEC ignores kernel_thread() failure	Oleg Nesterov	1	-2/+2
	UMH_WAIT_EXEC should report the error if kernel_thread() fails, like UMH_WAIT_PROC does. Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	call_usermodehelper: simplify/fix UMH_NO_WAIT case	Oleg Nesterov	1	-6/+4
	__call_usermodehelper(UMH_NO_WAIT) has 2 problems: - if kernel_thread() fails, call_usermodehelper_freeinfo() is not called. - for unknown reason UMH_NO_WAIT has UMH_WAIT_PROC logic, we spawn yet another thread which waits until the user mode application exits. Change the UMH_NO_WAIT code to use ____call_usermodehelper() instead of wait_for_helper(), and do call_usermodehelper_freeinfo() unconditionally. We can rely on CLONE_VFORK, do_fork(CLONE_VFORK) until the child exits or execs. With or without this patch UMH_NO_WAIT does not report the error if kernel_thread() fails, this is correct since the caller doesn't wait for result. Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	wait_for_helper: SIGCHLD from user-space can lead to use-after-free	Oleg Nesterov	1	-5/+5
	1. wait_for_helper() calls allow_signal(SIGCHLD) to ensure the child can't autoreap itself. However, this means that a spurious SIGCHILD from user-space can set TIF_SIGPENDING and: - kernel_thread() or sys_wait4() can fail due to signal_pending() - worse, wait4() can fail before ____call_usermodehelper() execs or exits. In this case the caller may kfree(subprocess_info) while the child still uses this memory. Change the code to use SIG_DFL instead of magic "(void __user *)2" set by allow_signal(). This means that SIGCHLD won't be delivered, yet the child won't autoreap itsefl. The problem is minor, only root can send a signal to this kthread. 2. If sys_wait4(&ret) fails it doesn't populate "ret", in this case wait_for_helper() reports a random value from uninitialized var. With this patch sys_wait4() should never fail, but still it makes sense to initialize ret = -ECHILD so that the caller can notice the problem. Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Neil Horman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	call_usermodehelper: no need to unblock signals	Oleg Nesterov	1	-3/+0
	____call_usermodehelper() correctly calls flush_signal_handlers() to set SIG_DFL, but sigemptyset(->blocked) and recalc_sigpending() are not needed. This kthread was forked by workqueue thread, all signals must be unblocked and ignored, no pending signal is possible. Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	umh: creds: kill subprocess_info->cred logic	Oleg Nesterov	4	-81/+0
	Now that nobody ever changes subprocess_info->cred we can kill this member and related code. ____call_usermodehelper() always runs in the context of freshly forked kernel thread, it has the proper ->cred copied from its parent kthread, keventd. Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Neil Horman <[email protected]> Acked-by: David Howells <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	umh: creds: convert call_usermodehelper_keys() to use subprocess_info->init()	Oleg Nesterov	5	-37/+34
	call_usermodehelper_keys() uses call_usermodehelper_setkeys() to change subprocess_info->cred in advance. Now that we have info->init() we can change this code to set tgcred->session_keyring in context of execing kernel thread. Note: since currently call_usermodehelper_keys() is never called with UMH_NO_WAIT, call_usermodehelper_keys()->key_get() and umh_keys_cleanup() are not really needed, we could rely on install_session_keyring_to_cred() which does key_get() on success. Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Neil Horman <[email protected]> Acked-by: David Howells <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	exec: replace call_usermodehelper_pipe with use of umh init function and ↵	Neil Horman	3	-96/+56
	resolve limit The first patch in this series introduced an init function to the call_usermodehelper api so that processes could be customized by caller. This patch takes advantage of that fact, by customizing the helper in do_coredump to create the pipe and set its core limit to one (for our recusrsion check). This lets us clean up the previous uglyness in the usermodehelper internals and factor call_usermodehelper out entirely. While I'm at it, we can also modify the helper setup to look for a core limit value of 1 rather than zero for our recursion check Signed-off-by: Neil Horman <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Cc: Andi Kleen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	kmod: add init function to usermodehelper	Neil Horman	3	-35/+73
	About 6 months ago, I made a set of changes to how the core-dump-to-a-pipe feature in the kernel works. We had reports of several races, including some reports of apps bypassing our recursion check so that a process that was forked as part of a core_pattern setup could infinitely crash and refork until the system crashed. We fixed those by improving our recursion checks. The new check basically refuses to fork a process if its core limit is zero, which works well. Unfortunately, I've been getting grief from maintainer of user space programs that are inserted as the forked process of core_pattern. They contend that in order for their programs (such as abrt and apport) to work, all the running processes in a system must have their core limits set to a non-zero value, to which I say 'yes'. I did this by design, and think thats the right way to do things. But I've been asked to ease this burden on user space enough times that I thought I would take a look at it. The first suggestion was to make the recursion check fail on a non-zero 'special' number, like one. That way the core collector process could set its core size ulimit to 1, and enable the kernel's recursion detection. This isn't a bad idea on the surface, but I don't like it since its opt-in, in that if a program like abrt or apport has a bug and fails to set such a core limit, we're left with a recursively crashing system again. So I've come up with this. What I've done is modify the call_usermodehelper api such that an extra parameter is added, a function pointer which will be called by the user helper task, after it forks, but before it exec's the required process. This will give the caller the opportunity to get a call back in the processes context, allowing it to do whatever it needs to to the process in the kernel prior to exec-ing the user space code. In the case of do_coredump, this callback is ues to set the core ulimit of the helper process to 1. This elimnates the opt-in problem that I had above, as it allows the ulimit for core sizes to be set to the value of 1, which is what the recursion check looks for in do_coredump. This patch: Create new function call_usermodehelper_fns() and allow it to assign both an init and cleanup function, as we'll as arbitrary data. The init function is called from the context of the forked process and allows for customization of the helper process prior to calling exec. Its return code gates the continuation of the process, or causes its exit. Also add an arbitrary data pointer to the subprocess_info struct allowing for data to be passed from the caller to the new process, and the subsequent cleanup process Also, use this patch to cleanup the cleanup function. It currently takes an argp and envp pointer for freeing, which is ugly. Lets instead just make the subprocess_info structure public, and pass that to the cleanup and init routines Signed-off-by: Neil Horman <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Cc: Andi Kleen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	signals: check_kill_permission(): don't check creds if same_thread_group()	Oleg Nesterov	1	-2/+4
	Andrew Tridgell reports that aio_read(SIGEV_SIGNAL) can fail if the notification from the helper thread races with setresuid(), see http://samba.org/~tridge/junkcode/aio_uid.c This happens because check_kill_permission() doesn't permit sending a signal to the task with the different cred->xids. But there is not any security reason to check ->cred's when the task sends a signal (private or group-wide) to its sub-thread. Whatever we do, any thread can bypass all security checks and send SIGKILL to all threads, or it can block a signal SIG and do kill(gettid(), SIG) to deliver this signal to another sub-thread. Not to mention that CLONE_THREAD implies CLONE_VM. Change check_kill_permission() to avoid the credentials check when the sender and the target are from the same thread group. Also, move "cred = current_cred()" down to avoid calling get_current() twice. Note: David Howells pointed out we could relax this even more, the CLONE_SIGHAND (without CLONE_THREAD) case probably does not need these checks too. Roland said: : The glibc (libpthread) that does set*id across threads has : been in use for a while (2.3.4?), probably in distro's using kernels as old : or older than any active -stable streams. In the race in question, this : kernel bug is breaking valid POSIX application expectations. Reported-by: Andrew Tridgell <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Roland McGrath <[email protected]> Acked-by: David Howells <[email protected]> Cc: Eric Paris <[email protected]> Cc: Jakub Jelinek <[email protected]> Cc: James Morris <[email protected]> Cc: Roland McGrath <[email protected]> Cc: Stephen Smalley <[email protected]> Cc: <[email protected]> [all kernel versions] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	ptrace: PTRACE_GETFDPIC: fix the unsafe usage of child->mm	Oleg Nesterov	1	-2/+8
	Now that Mike Frysinger unified the FDPIC ptrace code, we can fix the unsafe usage of child->mm in ptrace_request(PTRACE_GETFDPIC). We have the reference to task_struct, and ptrace_check_attach() verified the tracee is stopped. But nothing can protect from SIGKILL after that, we must not assume child->mm != NULL. Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Mike Frysinger <[email protected]> Acked-by: David Howells <[email protected]> Cc: Paul Mundt <[email protected]> Cc: Greg Ungerer <[email protected]> Acked-by: Roland McGrath <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	ptrace: unify FDPIC implementations	Mike Frysinger	4	-67/+29
	The Blackfin/FRV/SuperH guys all have the same exact FDPIC ptrace code in their arch handlers (since they were probably copied & pasted). Since these ptrace interfaces are an arch independent aspect of the FDPIC code, unify them in the common ptrace code so new FDPIC ports don't need to copy and paste this fundamental stuff yet again. Signed-off-by: Mike Frysinger <[email protected]> Acked-by: Roland McGrath <[email protected]> Acked-by: David Howells <[email protected]> Acked-by: Paul Mundt <[email protected]> Cc: Oleg Nesterov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	cpusets: randomize node rotor used in cpuset_mem_spread_node()	Jack Steiner	5	-1/+31
	Some workloads that create a large number of small files tend to assign too many pages to node 0 (multi-node systems). Part of the reason is that the rotor (in cpuset_mem_spread_node()) used to assign nodes starts at node 0 for newly created tasks. This patch changes the rotor to be initialized to a random node number of the cpuset. [[email protected]: fix layout] [[email protected]: Define stub numa_random() for !NUMA configuration] Signed-off-by: Jack Steiner <[email protected]> Signed-off-by: Lee Schermerhorn <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Paul Menage <[email protected]> Cc: Jack Steiner <[email protected]> Cc: Robin Holt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	cpusets: new round-robin rotor for SLAB allocations	Jack Steiner	4	-5/+24
	We have observed several workloads running on multi-node systems where memory is assigned unevenly across the nodes in the system. There are numerous reasons for this but one is the round-robin rotor in cpuset_mem_spread_node(). For example, a simple test that writes a multi-page file will allocate pages on nodes 0 2 4 6 ... Odd nodes are skipped. (Sometimes it allocates on odd nodes & skips even nodes). An example is shown below. The program "lfile" writes a file consisting of 10 pages. The program then mmaps the file & uses get_mempolicy(..., MPOL_F_NODE) to determine the nodes where the file pages were allocated. The output is shown below: # ./lfile allocated on nodes: 2 4 6 0 1 2 6 0 2 There is a single rotor that is used for allocating both file pages & slab pages. Writing the file allocates both a data page & a slab page (buffer_head). This advances the RR rotor 2 nodes for each page allocated. A quick confirmation seems to confirm this is the cause of the uneven allocation: # echo 0 >/dev/cpuset/memory_spread_slab # ./lfile allocated on nodes: 6 7 8 9 0 1 2 3 4 5 This patch introduces a second rotor that is used for slab allocations. Signed-off-by: Jack Steiner <[email protected]> Acked-by: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Paul Menage <[email protected]> Cc: Jack Steiner <[email protected]> Cc: Robin Holt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	memcg: clean up memory thresholds	Kirill A. Shutemov	1	-85/+66
	Introduce struct mem_cgroup_thresholds. It helps to reduce number of checks of thresholds type (memory or mem+swap). [[email protected]: repair comment] Signed-off-by: Kirill A. Shutemov <[email protected]> Cc: Phil Carmody <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Daisuke Nishimura <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Acked-by: Paul Menage <[email protected]> Cc: Li Zefan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	cgroups: make cftype.unregister_event() void-returning	Kirill A. Shutemov	3	-26/+42
	Since we are unable to handle an error returned by cftype.unregister_event() properly, let's make the callback void-returning. mem_cgroup_unregister_event() has been rewritten to be a "never fail" function. On mem_cgroup_usage_register_event() we save old buffer for thresholds array and reuse it in mem_cgroup_usage_unregister_event() to avoid allocation. Signed-off-by: Kirill A. Shutemov <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Phil Carmody <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Daisuke Nishimura <[email protected]> Cc: Paul Menage <[email protected]> Cc: Li Zefan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	memcg: fix mis-accounting of file mapped racy with migration	[email protected]	4	-41/+107
	FILE_MAPPED per memcg of migrated file cache is not properly updated, because our hook in page_add_file_rmap() can't know to which memcg FILE_MAPPED should be counted. Basically, this patch is for fixing the bug but includes some big changes to fix up other messes. Now, at migrating mapped file, events happen in following sequence. 1. allocate a new page. 2. get memcg of an old page. 3. charge ageinst a new page before migration. But at this point, no changes to new page's page_cgroup, no commit for the charge. (IOW, PCG_USED bit is not set.) 4. page migration replaces radix-tree, old-page and new-page. 5. page migration remaps the new page if the old page was mapped. 6. Here, the new page is unlocked. 7. memcg commits the charge for newpage, Mark the new page's page_cgroup as PCG_USED. Because "commit" happens after page-remap, we can count FILE_MAPPED at "5", because we should avoid to trust page_cgroup->mem_cgroup. if PCG_USED bit is unset. (Note: memcg's LRU removal code does that but LRU-isolation logic is used for helping it. When we overwrite page_cgroup->mem_cgroup, page_cgroup is not on LRU or page_cgroup->mem_cgroup is NULL.) We can lose file_mapped accounting information at 5 because FILE_MAPPED is updated only when mapcount changes 0->1. So we should catch it. BTW, historically, above implemntation comes from migration-failure of anonymous page. Because we charge both of old page and new page with mapcount=0, we can't catch - the page is really freed before remap. - migration fails but it's freed before remap or .....corner cases. New migration sequence with memcg is: 1. allocate a new page. 2. mark PageCgroupMigration to the old page. 3. charge against a new page onto the old page's memcg. (here, new page's pc is marked as PageCgroupUsed.) 4. page migration replaces radix-tree, page table, etc... 5. At remapping, new page's page_cgroup is now makrked as "USED" We can catch 0->1 event and FILE_MAPPED will be properly updated. And we can catch SWAPOUT event after unlock this and freeing this page by unmap() can be caught. 7. Clear PageCgroupMigration of the old page. So, FILE_MAPPED will be correctly updated. Then, for what MIGRATION flag is ? Without it, at migration failure, we may have to charge old page again because it may be fully unmapped. "charge" means that we have to dive into memory reclaim or something complated. So, it's better to avoid charge it again. Before this patch, __commit_charge() was working for both of the old/new page and fixed up all. But this technique has some racy condtion around FILE_MAPPED and SWAPOUT etc... Now, the kernel use MIGRATION flag and don't uncharge old page until the end of migration. I hope this change will make memcg's page migration much simpler. This page migration has caused several troubles. Worth to add a flag for simplification. Reviewed-by: Daisuke Nishimura <[email protected]> Tested-by: Daisuke Nishimura <[email protected]> Reported-by: Daisuke Nishimura <[email protected]> Signed-off-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	mm: memcontrol - uninitialised return value	Phil Carmody	1	-1/+1
	Only an out of memory error will cause ret to be set. Signed-off-by: Phil Carmody <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Daisuke Nishimura <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	mm: remove unnecessary use of atomic	Phil Carmody	1	-7/+7
	The bottom 4 hunks are atomically changing memory to which there are no aliases as it's freshly allocated, so there's no need to use atomic operations. The other hunks are just atomic_read and atomic_set, and do not involve any read-modify-write. The use of atomic_{read,set} doesn't prevent a read/write or write/write race, so if a race were possible (I'm not saying one is), then it would still be there even with atomic_set. See: http://digitalvampire.org/blog/index.php/2007/05/13/atomic-cargo-cults/ Signed-off-by: Phil Carmody <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Daisuke Nishimura <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	memcg: make oom killer a no-op when no killable task can be found	David Rientjes	1	-4/+1
	It's pointless to try to kill current if select_bad_process() did not find an eligible task to kill in mem_cgroup_out_of_memory() since it's guaranteed that current is a member of the memcg that is oom and it is, by definition, unkillable. Signed-off-by: David Rientjes <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Li Zefan <[email protected]> Cc: Daisuke Nishimura <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	memcg: update documentation	KAMEZAWA Hiroyuki	1	-93/+198
	Some information are old, and I think current document doesn't work as "a guide for users". We need summary of all of our controls, at least. Signed-off-by: KAMEZAWA Hiroyuki <[email protected]> Reviewed-by: Randy Dunlap <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Daisuke Nishimura <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	memcg: move charge of file pages	Daisuke Nishimura	4	-18/+125
	This patch adds support for moving charge of file pages, which include normal file, tmpfs file and swaps of tmpfs file. It's enabled by setting bit 1 of <target cgroup>/memory.move_charge_at_immigrate. Unlike the case of anonymous pages, file pages(and swaps) in the range mmapped by the task will be moved even if the task hasn't done page fault, i.e. they might not be the task's "RSS", but other task's "RSS" that maps the same file. And mapcount of the page is ignored(the page can be moved even if page_mapcount(page) > 1). So, conditions that the page/swap should be met to be moved is that it must be in the range mmapped by the target task and it must be charged to the old cgroup. [[email protected]: coding-style fixes] [[email protected]: fix warning] Signed-off-by: Daisuke Nishimura <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Balbir Singh <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	memcg: clean up move charge	Daisuke Nishimura	1	-37/+59
	This patch cleans up move charge code by: - define functions to handle pte for each types, and make is_target_pte_for_mc() cleaner. - instead of checking the MOVE_CHARGE_TYPE_ANON bit, define a function that checks the bit. Signed-off-by: Daisuke Nishimura <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Balbir Singh <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	memcg: oom kill disable and oom status	KAMEZAWA Hiroyuki	2	-19/+117
	This adds a feature to disable oom-killer for memcg, if disabled, of course, tasks under memcg will stop. But now, we have oom-notifier for memcg. And the world around memcg is not under out-of-memory. memcg's out-of-memory just shows memcg hits limit. Then, administrator or management daemon can recover the situation by - kill some process - enlarge limit, add more swap. - migrate some tasks - remove file cache on tmps (difficult ?) Unlike oom-killer, you can take enough information before killing tasks. (by gcore, or, ps etc.) [[email protected]: coding-style fixes] Signed-off-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Daisuke Nishimura <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Daisuke Nishimura <[email protected]> Cc: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	memcg: oom notifier	KAMEZAWA Hiroyuki	2	-9/+111
	Considering containers or other resource management softwares in userland, event notification of OOM in memcg should be implemented. Now, memcg has "threshold" notifier which uses eventfd, we can make use of it for oom notification. This patch adds oom notification eventfd callback for memcg. The usage is very similar to threshold notifier, but control file is memory.oom_control and no arguments other than eventfd is required. % cgroup_event_notifier /cgroup/A/memory.oom_control dummy (About cgroup_event_notifier, see Documentation/cgroup/) Signed-off-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Daisuke Nishimura <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Daisuke Nishimura <[email protected]> Cc: David Rientjes <[email protected]> Cc: Davide Libenzi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	memcg: oom wakeup filter	KAMEZAWA Hiroyuki	1	-17/+46
	memcg's oom waitqueue is a system-wide wait_queue (for handling hierarchy.) So, it's better to add custom wake function and do filtering in wake up path. This patch adds a filtering feature for waking up oom-waiters. Hierarchy is properly handled. Signed-off-by: KAMEZAWA Hiroyuki <[email protected]> Reviewed-by: Daisuke Nishimura <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Daisuke Nishimura <[email protected]> Cc: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	Documentation/cgroups/cgroups.txt: fix reference to "numtasks"	Trevor Woerner	1	-1/+1
	Signed-off-by: Trevor Woerner <[email protected]> Cc: Paul Menage <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	Documentation: SubmittingDrivers: Resources	Arce, Abraham	1	-0/+5
	- Add additional location (Git) for the kernel master tree - Add reference to Git Project Signed-off-by: Abraham Arce <[email protected]> Acked-by: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	ufs: permit mounting of BorderWare filesystems	Thomas Stewart	2	-0/+3
	I recently had to recover some files from an old broken machine that was running BorderWare Document Gateway. It's basically a drop in web server for sharing files. From the look of the init process and using strings on of a few files it seems to be based on FreeBSD 3.3. The process turned out to be more difficult than I imagined, but to cut a long story short BorderWare in their wisdom use a nonstandard magic number in their UFS (ufstype=44bsd) file systems. Thus Linux refuses to mount the file systems in order to recover the data. After a bit of hunting I was able to make a quick fix to fs/ufs/super.c in order to detect the new magic number. I assume that this number is the same for all installations. It's quite easy to find out from ufs_fs.h. The superblock sits 8k into the block device and the magic number its 1372 bytes into the superblock struct. # dd if=/dev/sda5 skip=$(( 8192 + 1372 )) bs=1 count=4 2> /dev/null \| hd 00000000 97 26 24 0f \|.&$.\| # Signed-off-by: Thomas Stewart <[email protected]> Cc: Evgeniy Dushistov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	drivers/telephony/ixj.c: use memdup_user	Julia Lawall	1	-11/+4
	Use memdup_user when user data is immediately copied into the allocated region. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression from,to,size,flag; position p; identifier l1,l2; @@ - to = $kmalloc@p\\|kzalloc@p$(size,flag); + to = memdup_user(from,size); if ( - to==NULL + IS_ERR(to) \|\| ...) { <+... when != goto l1; - -ENOMEM + PTR_ERR(to) ...+> } - if (copy_from_user(to, from, size) != 0) { - <+... when != goto l2; - -EFAULT - ...+> - } // </smpl> Signed-off-by: Julia Lawall <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	fbdev: bf54x-lq043fb: fix unused warnings with backlight code	Mike Frysinger	1	-3/+4
	The current backlight code is stubbed out, so the new props changes added some warnings: drivers/video/bf54x-lq043fb.c: In function 'bfin_bf54x_probe': drivers/video/bf54x-lq043fb.c:666: warning: label 'out9' defined but not used drivers/video/bf54x-lq043fb.c:504: warning: unused variable 'props' Fix em ! Signed-off-by: Mike Frysinger <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	fbdev: bfin-t350mcqb-fb: avoid unused warnings in backlight code	Mike Frysinger	1	-3/+4
	The current backlight code is stubbed out, so the new props changes added some warnings about unused label/prop. Signed-off-by: Mike Frysinger <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	drivers/video/via: use memdup_user	Julia Lawall	1	-8/+3
	Use memdup_user when user data is immediately copied into the allocated region. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression from,to,size,flag; position p; identifier l1,l2; @@ - to = $kmalloc@p\\|kzalloc@p$(size,flag); + to = memdup_user(from,size); if ( - to==NULL + IS_ERR(to) \|\| ...) { <+... when != goto l1; - -ENOMEM + PTR_ERR(to) ...+> } - if (copy_from_user(to, from, size) != 0) { - <+... when != goto l2; - -EFAULT - ...+> - } // </smpl> Signed-off-by: Julia Lawall <[email protected]> Cc: Joseph Chan <[email protected]> Cc: Scott Fang <[email protected]> Cc: Florian Tobias Schandinat <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	add support for S3 Trio3D/1X/2X	Ondrej Zary	1	-20/+81
	Add support for S3 Trio3D/1X (86C360) and S3 Trio3D/2X (86C362 and 86C368) cards to s3fb driver. Tested with 86C362 AGP and 86C368 PCI&AGP. [[email protected]: coding-style fixes] Signed-off-by: Ondrej Zary <[email protected]> Acked-by: Ondrej Zajicek <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	drivers/gpio/it8761e_gpio: check return value of gpiochip_remove()	Daniel Mack	1	-1/+4
	This eliminates the following build warning: drivers/gpio/it8761e_gpio.c: In function `it8761e_gpio_exit': drivers/gpio/it8761e_gpio.c:220: warning: ignoring return value of `gpiochip_remove', declared with attribute warn_unused_result Signed-off-by: Daniel Mack <[email protected]> Cc: Denis Turischev <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	gpio: add Penwell gpio support	Alek Du	2	-34/+53
	Intel Penwell chip has two 96 pins GPIO blocks, which are very similiar as Intel Langwell chip GPIO block, except for pin number difference. This patch expends the original Langwell GPIO driver to support Penwell's. Signed-off-by: Alek Du <[email protected]> Cc: David Brownell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	arm: omap: remove the unused omap_gpio_set_debounce methods	Felipe Balbi	1	-74/+0
	Nobody uses that anymore, so remove and expect drivers to use the gpiolib implementation. Signed-off-by: Felipe Balbi <[email protected]> Cc: Tony Lindgren <[email protected]> Cc: David Brownell <[email protected]> Cc: Mark Brown <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	arm: omap: switch over to gpio_set_debounce	Felipe Balbi	5	-12/+6
	Stop using the omap-specific implementations for gpio debouncing now that gpiolib provides its own support. Signed-off-by: Felipe Balbi <[email protected]> Cc: Tony Lindgren <[email protected]> Cc: David Brownell <[email protected]> Cc: Mark Brown <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	arm: omap: gpio: implement set_debounce method	Felipe Balbi	1	-0/+68
	OMAP supports debouncing of gpio lines, implement the method using gpiolib. Signed-off-by: Felipe Balbi <[email protected]> Cc: Tony Lindgren <[email protected]> Cc: David Brownell <[email protected]> Cc: Mark Brown <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	gpiolib: introduce set_debounce method	Felipe Balbi	3	-0/+53
	A few architectures, like OMAP, allow you to set a debouncing time for the gpio before generating the IRQ. Teach gpiolib about that. Mark said: : This would be generally useful for embedded systems, especially where : the interrupt concerned is a wake source. It allows drivers to avoid : spurious interrupts from noisy sources so if the hardware supports it : the driver can avoid having to explicitly wait for the signal to become : stable and software has to cope with fewer events. We've lived without : it for quite some time, though. David said: : I looked at adding debounce support to the generic GPIO calls (and thus : gpiolib) some time back, but decided against it. I forget why at this : time (check list archives) but it wasn't because of lack of utility in : certain contexts. : : One thing to watch out for is just how variable the hardware capabilities : are. Atmel GPIOs have something like a fixed number of 32K clock cycles : for debounce, twl4030 had something odd, OMAPs were more like the Atmel : chips but with a different clock. In some cases debouncing had to be : ganged, not per-GPIO. And so forth. Signed-off-by: Felipe Balbi <[email protected]> Cc: Tony Lindgren <[email protected]> Cc: David Brownell <[email protected]> Reviewed-by: Mark Brown <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	gpiolib: make gpiochip_add() show a better error message	Ben Dooks	1	-1/+1
	The current message, 'not registered' is confusing as it implies it was not registered with something, whereas printing 'failed to register' implies it was the gpiochip_add() call that did not work correctly. Signed-off-by: Ben Dooks <[email protected]> Cc: David Brownell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	gpio: max732x: fix input configuration for open-drain pins	Marc Zyngier	1	-0/+7
	Fix a bug I noticed while hacking on the max732x driver for interrupt support. According to the datasheets, open-drain pins have to be configured as output-high (which in that case is actually high impedance) to be used as input. Signed-off-by: Marc Zyngier <[email protected]> Acked-by: Eric Miao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27	max732x: correct nr_port checking off by one error	Axel Lin	1	-3/+3
	Setup both client_group_a and client_group_b if nr_port > 8 (not including nr_port==8). Signed-off-by: Axel Lin <[email protected]> Cc: Eric Miao <[email protected]> Cc: Ben Dooks <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>