aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2010-05-27dma-mapping: remove deprecated dma_sync_single and dma_sync_sg APIFUJITA Tomonori1-15/+0
Since 2.6.5, it had been commented, 'for backwards compatibility, removed in 2.7.x'. Since 2.6.31, it have been marked as __deprecated. I think that we can remove the API safely now. Signed-off-by: FUJITA Tomonori <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27dma-mapping: remove unnecessary sync_single_range_* in dma_map_opsFUJITA Tomonori2-28/+2
sync_single_range_for_cpu and sync_single_range_for_device hooks are unnecessary because sync_single_for_cpu and sync_single_for_device can be used instead. Signed-off-by: FUJITA Tomonori <[email protected]> Reviewed-by: Konrad Rzeszutek Wilk <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27swiotlb: remove unnecessary swiotlb_sync_single_range_*FUJITA Tomonori1-10/+0
swiotlb_sync_single_range_for_cpu and swiotlb_sync_single_range_for_device are unnecessary because swiotlb_sync_single_for_cpu and swiotlb_sync_single_for_device can be used instead. Signed-off-by: FUJITA Tomonori <[email protected]> Reviewed-by: Konrad Rzeszutek Wilk <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27lib/random32: export pseudo-random number generator for modulesJoe Eykholt1-0/+28
This patch moves the definition of struct rnd_state and the inline __seed() function to linux/random.h. It renames the static __random32() function to prandom32() and exports it for use in modules. prandom32() is useful as a privately-seeded pseudo random number generator that can give the same result every time it is initialized. For FCoE FC-BB-6 VN2VN mode self-selected unique FC address generation, we need an pseudo-random number generator seeded with the 64-bit world-wide port name. A truly random generator or one seeded with randomness won't do because the same sequence of numbers should be generated each time we boot or the link comes up. A prandom32_seed() inline function is added to the header file. It is inlined not for speed, but so the function won't be expanded in the base kernel, but only in the module that uses it. Signed-off-by: Joe Eykholt <[email protected]> Acked-by: Matt Mackall <[email protected]> Cc: Theodore Ts'o <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27INIT_SIGHAND: use SIG_DFL instead of NULLOleg Nesterov1-1/+1
Cosmetic, no changes in the compiled code. Just s/NULL/SIG_DFL/ to make it more readable and grep-friendly. Note: probably SIG_IGN makes more sense, we could kill ignore_signals(). But then kernel_init() should do flush_signal_handlers() before exec(). Signed-off-by: Oleg Nesterov <[email protected]> Cc: Cedric Le Goater <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Herbert Poetzl <[email protected]> Cc: Mathias Krause <[email protected]> Acked-by: Roland McGrath <[email protected]> Acked-by: Serge Hallyn <[email protected]> Cc: Sukadev Bhattiprolu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27pids: init_struct_pid.tasks should never see the swapper processOleg Nesterov1-4/+4
"statically initialize struct pid for swapper" commit 820e45db says: Statically initialize a struct pid for the swapper process (pid_t == 0) and attach it to init_task. This is needed so task_pid(), task_pgrp() and task_session() interfaces work on the swapper process also. OK, but: - it doesn't make sense to add init_task.pids[].node into init_struct_pid.tasks[], and in fact this just wrong. idle threads are special, they shouldn't be visible on any global list. In particular do_each_pid_task(init_struct_pid) shouldn't see swapper. This is the actual reason why kill(0, SIGKILL) from /sbin/init (which starts with 0,0 special pids) crashes the kernel. The signal sent to pgid/sid == 0 must never see idle threads, even if the previous patch fixed the crash itself. - we have other idle threads running on the non-boot CPUs, see the next patch. Change INIT_STRUCT_PID/INIT_PID_LINK to create the empty/unhashed hlist_head/hlist_node. Like any other idle thread swapper can never exit, so detach_pid()->__hlist_del() is not possible, but we could change INIT_PID_LINK() to set pprev = &next if needed. All we need is the valid swapper->pids[].pid == &init_struct_pid. Reported-by: Mathias Krause <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]> Cc: Cedric Le Goater <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Herbert Poetzl <[email protected]> Cc: Mathias Krause <[email protected]> Acked-by: Roland McGrath <[email protected]> Acked-by: Serge Hallyn <[email protected]> Cc: Sukadev Bhattiprolu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27INIT_TASK() should initialize ->thread_group listOleg Nesterov1-0/+1
The trivial /sbin/init doing int main(void) { kill(0, SIGKILL) } crashes the kernel. This happens because __kill_pgrp_info(init_struct_pid) also sends SIGKILL to the swapper process which runs with the uninitialized ->thread_group. Change INIT_TASK() to initialize ->thread_group properly. Note: the real problem is that the swapper process must not be visible to signals, see the next patch. But this change is right anyway and fixes the crash. Reported-and-tested-by: Mathias Krause <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]> Cc: Cedric Le Goater <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Herbert Poetzl <[email protected]> Cc: Mathias Krause <[email protected]> Acked-by: Roland McGrath <[email protected]> Acked-by: Serge Hallyn <[email protected]> Acked-by: Sukadev Bhattiprolu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27pids: increase pid_max based on num_possible_cpusHedi Berriche1-0/+9
On a system with a substantial number of processors, the early default pid_max of 32k will not be enough. A system with 1664 CPU's, there are 25163 processes started before the login prompt. It's estimated that with 2048 CPU's we will pass the 32k limit. With 4096, we'll reach that limit very early during the boot cycle, and processes would stall waiting for an available pid. This patch increases the early maximum number of pids available, and increases the minimum number of pids that can be set during runtime. [[email protected]: fix warnings] Signed-off-by: Hedi Berriche <[email protected]> Signed-off-by: Mike Travis <[email protected]> Signed-off-by: Robin Holt <[email protected]> Acked-by: Linus Torvalds <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Alan Cox <[email protected]> Cc: Greg KH <[email protected]> Cc: Rik van Riel <[email protected]> Cc: John Stoffel <[email protected]> Cc: Jack Steiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27rapidio: add switch domain routinesAlexandre Bounine1-0/+4
Add switch specific domain routines required for 16-bit routing support in switches with hierarchical implementation of routing tables. Signed-off-by: Alexandre Bounine <[email protected]> Cc: Matt Porter <[email protected]> Cc: Li Yang <[email protected]> Cc: Kumar Gala <[email protected]> Cc: Thomas Moll <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27rapidio: modify initialization of switch operationsAlexandre Bounine2-35/+9
Modify the way how RapidIO switch operations are declared. Multiple assignments through the linker script replaced by single initialization call. Signed-off-by: Alexandre Bounine <[email protected]> Cc: Matt Porter <[email protected]> Cc: Li Yang <[email protected]> Cc: Kumar Gala <[email protected]> Cc: Thomas Moll <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27rapidio: add enabling SRIO port RX and TXThomas Moll1-0/+5
Add the functionality to enable Input receiver and Output transmitter of every port, to allow non-maintenance traffic. Signed-off-by: Thomas Moll <[email protected]> Signed-off-by: Alexandre Bounine <[email protected]> Cc: Matt Porter <[email protected]> Cc: Li Yang <[email protected]> Cc: Kumar Gala <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27rapidio: add Port-Write handling for EMAlexandre Bounine4-4/+110
Add RapidIO Port-Write message handling in the context of Error Management Extensions Specification Rev.1.3. Signed-off-by: Alexandre Bounine <[email protected]> Tested-by: Thomas Moll <[email protected]> Cc: Matt Porter <[email protected]> Cc: Li Yang <[email protected]> Cc: Kumar Gala <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27rapidio: add IDT CPS/TSI switchesAlexandre Bounine3-2/+32
Extentions to RapidIO switch support: 1. modify switch route operation declarations to allow using single switch-specific file for family of switches that share the same route table operations. 2. add standard route table operations for switches that that support route table manipulation registers as defined in the Rev.1.3 of RapidIO specification. 3. add clear-route-table operation for switches 4. add CPSxx and TSIxxx families of RapidIO switches Signed-off-by: Alexandre Bounine <[email protected]> Tested-by: Thomas Moll <[email protected]> Cc: Matt Porter <[email protected]> Cc: Li Yang <[email protected]> Cc: Kumar Gala <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27ipc/sem.c: cacheline align the ipc spinlock for semaphoresManfred Spraul1-1/+3
Cacheline align the spinlock for sysv semaphores. Without the patch, the spinlock and sem_otime [written by every semop that modified the array] and sem_base [read in the hot path of try_atomic_semop()] can be in the same cacheline. Signed-off-by: Manfred Spraul <[email protected]> Cc: Chris Mason <[email protected]> Cc: Zach Brown <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Nick Piggin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27notifier: change notifier_from_errno(0) to return NOTIFY_OKAkinobu Mita1-1/+4
This changes notifier_from_errno(0) to be NOTIFY_OK instead of NOTIFY_STOP_MASK | NOTIFY_OK. Currently, the notifiers which return encapsulated errno value have to do something like this: err = do_something(); // returns -errno if (err) return notifier_from_errno(err); else return NOTIFY_OK; This change makes the above code simple: err = do_something(); // returns -errno return return notifier_from_errno(err); Signed-off-by: Akinobu Mita <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27proc: turn signal_struct->count into "int nr_threads"Oleg Nesterov2-3/+3
No functional changes, just s/atomic_t count/int nr_threads/. With the recent changes this counter has a single user, get_nr_threads() And, none of its callers need the really accurate number of threads, not to mention each caller obviously races with fork/exit. It is only used to report this value to the user-space, except first_tid() uses it to avoid the unnecessary while_each_thread() loop in the unlikely case. It is a bit sad we need a word in struct signal_struct for this, perhaps we can change get_nr_threads() to approximate the number of threads using signal->live and kill ->nr_threads later. [[email protected]: coding-style fixes] Signed-off-by: Oleg Nesterov <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Acked-by: Roland McGrath <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27proc: get_nr_threads() doesn't need ->siglock any longerOleg Nesterov1-0/+5
Now that task->signal can't go away get_nr_threads() doesn't need ->siglock to read signal->count. Also, make it inline, move into sched.h, and convert 2 other proc users of signal->count to use this (now trivial) helper. Henceforth get_nr_threads() is the only valid user of signal->count, we are ready to turn it into "int nr_threads" or, perhaps, kill it. Signed-off-by: Oleg Nesterov <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: David Howells <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Acked-by: Roland McGrath <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27kill the obsolete thread_group_cputime_free() helperOleg Nesterov1-4/+0
Kill the empty thread_group_cputime_free() helper. It was needed to free the per-cpu data which we no longer have. Signed-off-by: Oleg Nesterov <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Roland McGrath <[email protected]> Cc: Veaceslav Falico <[email protected]> Cc: Stanislaw Gruszka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27signals: kill the awful task_rq_unlock_wait() hackOleg Nesterov1-1/+0
Now that task->signal can't go away we can revert the horrible hack added by ad474caca3e2a0550b7ce0706527ad5ab389a4d4 ("fix for account_group_exec_runtime(), make sure ->signal can't be freed under rq->lock"). And we can do more cleanups sched_stats.h/posix-cpu-timers.c later. Signed-off-by: Oleg Nesterov <[email protected]> Cc: Alan Cox <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Acked-by: Roland McGrath <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27signals: make task_struct->signal immutable/refcountableOleg Nesterov1-1/+1
We have a lot of problems with accessing task_struct->signal, it can "disappear" at any moment. Even current can't use its ->signal safely after exit_notify(). ->siglock helps, but it is not convenient, not always possible, and sometimes it makes sense to use task->signal even after this task has already dead. This patch adds the reference counter, sigcnt, into signal_struct. This reference is owned by task_struct and it is dropped in __put_task_struct(). Perhaps it makes sense to export get/put_signal_struct() later, but currently I don't see the immediate reason. Rename __cleanup_signal() to free_signal_struct() and unexport it. With the previous changes it does nothing except kmem_cache_free(). Change __exit_signal() to not clear/free ->signal, it will be freed when the last reference to any thread in the thread group goes away. Note: - when the last thead exits signal->tty can point to nowhere, see the next patch. - with or without this patch signal_struct->count should go away, or at least it should be "int nr_threads" for fs/proc. This will be addressed later. Signed-off-by: Oleg Nesterov <[email protected]> Cc: Alan Cox <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Acked-by: Roland McGrath <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27exit: change zap_other_threads() to count sub-threadsOleg Nesterov1-1/+1
Change zap_other_threads() to return the number of other sub-threads found on ->thread_group list. Other changes are cosmetic: - change the code to use while_each_thread() helper - remove the obsolete comment about SIGKILL/SIGSTOP Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Roland McGrath <[email protected]> Cc: Veaceslav Falico <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27umh: creds: kill subprocess_info->cred logicOleg Nesterov2-2/+0
Now that nobody ever changes subprocess_info->cred we can kill this member and related code. ____call_usermodehelper() always runs in the context of freshly forked kernel thread, it has the proper ->cred copied from its parent kthread, keventd. Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Neil Horman <[email protected]> Acked-by: David Howells <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27umh: creds: convert call_usermodehelper_keys() to use subprocess_info->init()Oleg Nesterov1-17/+0
call_usermodehelper_keys() uses call_usermodehelper_setkeys() to change subprocess_info->cred in advance. Now that we have info->init() we can change this code to set tgcred->session_keyring in context of execing kernel thread. Note: since currently call_usermodehelper_keys() is never called with UMH_NO_WAIT, call_usermodehelper_keys()->key_get() and umh_keys_cleanup() are not really needed, we could rely on install_session_keyring_to_cred() which does key_get() on success. Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Neil Horman <[email protected]> Acked-by: David Howells <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27exec: replace call_usermodehelper_pipe with use of umh init function and ↵Neil Horman1-7/+0
resolve limit The first patch in this series introduced an init function to the call_usermodehelper api so that processes could be customized by caller. This patch takes advantage of that fact, by customizing the helper in do_coredump to create the pipe and set its core limit to one (for our recusrsion check). This lets us clean up the previous uglyness in the usermodehelper internals and factor call_usermodehelper out entirely. While I'm at it, we can also modify the helper setup to look for a core limit value of 1 rather than zero for our recursion check Signed-off-by: Neil Horman <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Cc: Andi Kleen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27kmod: add init function to usermodehelperNeil Horman1-10/+41
About 6 months ago, I made a set of changes to how the core-dump-to-a-pipe feature in the kernel works. We had reports of several races, including some reports of apps bypassing our recursion check so that a process that was forked as part of a core_pattern setup could infinitely crash and refork until the system crashed. We fixed those by improving our recursion checks. The new check basically refuses to fork a process if its core limit is zero, which works well. Unfortunately, I've been getting grief from maintainer of user space programs that are inserted as the forked process of core_pattern. They contend that in order for their programs (such as abrt and apport) to work, all the running processes in a system must have their core limits set to a non-zero value, to which I say 'yes'. I did this by design, and think thats the right way to do things. But I've been asked to ease this burden on user space enough times that I thought I would take a look at it. The first suggestion was to make the recursion check fail on a non-zero 'special' number, like one. That way the core collector process could set its core size ulimit to 1, and enable the kernel's recursion detection. This isn't a bad idea on the surface, but I don't like it since its opt-in, in that if a program like abrt or apport has a bug and fails to set such a core limit, we're left with a recursively crashing system again. So I've come up with this. What I've done is modify the call_usermodehelper api such that an extra parameter is added, a function pointer which will be called by the user helper task, after it forks, but before it exec's the required process. This will give the caller the opportunity to get a call back in the processes context, allowing it to do whatever it needs to to the process in the kernel prior to exec-ing the user space code. In the case of do_coredump, this callback is ues to set the core ulimit of the helper process to 1. This elimnates the opt-in problem that I had above, as it allows the ulimit for core sizes to be set to the value of 1, which is what the recursion check looks for in do_coredump. This patch: Create new function call_usermodehelper_fns() and allow it to assign both an init and cleanup function, as we'll as arbitrary data. The init function is called from the context of the forked process and allows for customization of the helper process prior to calling exec. Its return code gates the continuation of the process, or causes its exit. Also add an arbitrary data pointer to the subprocess_info struct allowing for data to be passed from the caller to the new process, and the subsequent cleanup process Also, use this patch to cleanup the cleanup function. It currently takes an argp and envp pointer for freeing, which is ugly. Lets instead just make the subprocess_info structure public, and pass that to the cleanup and init routines Signed-off-by: Neil Horman <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Cc: Andi Kleen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27cpusets: randomize node rotor used in cpuset_mem_spread_node()Jack Steiner2-0/+9
Some workloads that create a large number of small files tend to assign too many pages to node 0 (multi-node systems). Part of the reason is that the rotor (in cpuset_mem_spread_node()) used to assign nodes starts at node 0 for newly created tasks. This patch changes the rotor to be initialized to a random node number of the cpuset. [[email protected]: fix layout] [[email protected]: Define stub numa_random() for !NUMA configuration] Signed-off-by: Jack Steiner <[email protected]> Signed-off-by: Lee Schermerhorn <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Paul Menage <[email protected]> Cc: Jack Steiner <[email protected]> Cc: Robin Holt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27cpusets: new round-robin rotor for SLAB allocationsJack Steiner2-0/+7
We have observed several workloads running on multi-node systems where memory is assigned unevenly across the nodes in the system. There are numerous reasons for this but one is the round-robin rotor in cpuset_mem_spread_node(). For example, a simple test that writes a multi-page file will allocate pages on nodes 0 2 4 6 ... Odd nodes are skipped. (Sometimes it allocates on odd nodes & skips even nodes). An example is shown below. The program "lfile" writes a file consisting of 10 pages. The program then mmaps the file & uses get_mempolicy(..., MPOL_F_NODE) to determine the nodes where the file pages were allocated. The output is shown below: # ./lfile allocated on nodes: 2 4 6 0 1 2 6 0 2 There is a single rotor that is used for allocating both file pages & slab pages. Writing the file allocates both a data page & a slab page (buffer_head). This advances the RR rotor 2 nodes for each page allocated. A quick confirmation seems to confirm this is the cause of the uneven allocation: # echo 0 >/dev/cpuset/memory_spread_slab # ./lfile allocated on nodes: 6 7 8 9 0 1 2 3 4 5 This patch introduces a second rotor that is used for slab allocations. Signed-off-by: Jack Steiner <[email protected]> Acked-by: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Paul Menage <[email protected]> Cc: Jack Steiner <[email protected]> Cc: Robin Holt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27cgroups: make cftype.unregister_event() void-returningKirill A. Shutemov1-1/+1
Since we are unable to handle an error returned by cftype.unregister_event() properly, let's make the callback void-returning. mem_cgroup_unregister_event() has been rewritten to be a "never fail" function. On mem_cgroup_usage_register_event() we save old buffer for thresholds array and reuse it in mem_cgroup_usage_unregister_event() to avoid allocation. Signed-off-by: Kirill A. Shutemov <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Phil Carmody <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Daisuke Nishimura <[email protected]> Cc: Paul Menage <[email protected]> Cc: Li Zefan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27memcg: fix mis-accounting of file mapped racy with migration[email protected]2-2/+9
FILE_MAPPED per memcg of migrated file cache is not properly updated, because our hook in page_add_file_rmap() can't know to which memcg FILE_MAPPED should be counted. Basically, this patch is for fixing the bug but includes some big changes to fix up other messes. Now, at migrating mapped file, events happen in following sequence. 1. allocate a new page. 2. get memcg of an old page. 3. charge ageinst a new page before migration. But at this point, no changes to new page's page_cgroup, no commit for the charge. (IOW, PCG_USED bit is not set.) 4. page migration replaces radix-tree, old-page and new-page. 5. page migration remaps the new page if the old page was mapped. 6. Here, the new page is unlocked. 7. memcg commits the charge for newpage, Mark the new page's page_cgroup as PCG_USED. Because "commit" happens after page-remap, we can count FILE_MAPPED at "5", because we should avoid to trust page_cgroup->mem_cgroup. if PCG_USED bit is unset. (Note: memcg's LRU removal code does that but LRU-isolation logic is used for helping it. When we overwrite page_cgroup->mem_cgroup, page_cgroup is not on LRU or page_cgroup->mem_cgroup is NULL.) We can lose file_mapped accounting information at 5 because FILE_MAPPED is updated only when mapcount changes 0->1. So we should catch it. BTW, historically, above implemntation comes from migration-failure of anonymous page. Because we charge both of old page and new page with mapcount=0, we can't catch - the page is really freed before remap. - migration fails but it's freed before remap or .....corner cases. New migration sequence with memcg is: 1. allocate a new page. 2. mark PageCgroupMigration to the old page. 3. charge against a new page onto the old page's memcg. (here, new page's pc is marked as PageCgroupUsed.) 4. page migration replaces radix-tree, page table, etc... 5. At remapping, new page's page_cgroup is now makrked as "USED" We can catch 0->1 event and FILE_MAPPED will be properly updated. And we can catch SWAPOUT event after unlock this and freeing this page by unmap() can be caught. 7. Clear PageCgroupMigration of the old page. So, FILE_MAPPED will be correctly updated. Then, for what MIGRATION flag is ? Without it, at migration failure, we may have to charge old page again because it may be fully unmapped. "charge" means that we have to dive into memory reclaim or something complated. So, it's better to avoid charge it again. Before this patch, __commit_charge() was working for both of the old/new page and fixed up all. But this technique has some racy condtion around FILE_MAPPED and SWAPOUT etc... Now, the kernel use MIGRATION flag and don't uncharge old page until the end of migration. I hope this change will make memcg's page migration much simpler. This page migration has caused several troubles. Worth to add a flag for simplification. Reviewed-by: Daisuke Nishimura <[email protected]> Tested-by: Daisuke Nishimura <[email protected]> Reported-by: Daisuke Nishimura <[email protected]> Signed-off-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27memcg: move charge of file pagesDaisuke Nishimura1-0/+5
This patch adds support for moving charge of file pages, which include normal file, tmpfs file and swaps of tmpfs file. It's enabled by setting bit 1 of <target cgroup>/memory.move_charge_at_immigrate. Unlike the case of anonymous pages, file pages(and swaps) in the range mmapped by the task will be moved even if the task hasn't done page fault, i.e. they might not be the task's "RSS", but other task's "RSS" that maps the same file. And mapcount of the page is ignored(the page can be moved even if page_mapcount(page) > 1). So, conditions that the page/swap should be met to be moved is that it must be in the range mmapped by the target task and it must be charged to the old cgroup. [[email protected]: coding-style fixes] [[email protected]: fix warning] Signed-off-by: Daisuke Nishimura <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Balbir Singh <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27gpiolib: introduce set_debounce methodFelipe Balbi2-0/+10
A few architectures, like OMAP, allow you to set a debouncing time for the gpio before generating the IRQ. Teach gpiolib about that. Mark said: : This would be generally useful for embedded systems, especially where : the interrupt concerned is a wake source. It allows drivers to avoid : spurious interrupts from noisy sources so if the hardware supports it : the driver can avoid having to explicitly wait for the signal to become : stable and software has to cope with fewer events. We've lived without : it for quite some time, though. David said: : I looked at adding debounce support to the generic GPIO calls (and thus : gpiolib) some time back, but decided against it. I forget why at this : time (check list archives) but it wasn't because of lack of utility in : certain contexts. : : One thing to watch out for is just how variable the hardware capabilities : are. Atmel GPIOs have something like a fixed number of 32K clock cycles : for debounce, twl4030 had something odd, OMAPs were more like the Atmel : chips but with a different clock. In some cases debouncing had to be : ganged, not per-GPIO. And so forth. Signed-off-by: Felipe Balbi <[email protected]> Cc: Tony Lindgren <[email protected]> Cc: David Brownell <[email protected]> Reviewed-by: Mark Brown <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27gpiolib: document that names can contain printk format specifiersUwe Kleine-König1-1/+3
Signed-off-by: Uwe Kleine-König <[email protected]> Cc: David Brownell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27gpiolib: make names array and its values constUwe Kleine-König2-2/+2
gpiolib doesn't need to modify the names and I assume most initializers use string constants that shouldn't be modified anyhow. [[email protected]: fix drivers/gpio/cs5535-gpio.c] Signed-off-by: Uwe Kleine-König <[email protected]> Cc: Kevin Wells <[email protected]> Cc: David Brownell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27gpio: add interrupt handling capability to max732xMarc Zyngier1-0/+3
Most of the GPIO expanders supported by the max732x driver have interrupt generation capability by reporting changes on input pins through an INT# pin. This patch implements the irq_chip functionnality (edge detection only). Signed-off-by: Marc Zyngier <[email protected]> Cc: Eric Miao <[email protected]> Cc: Jebediah Huang <[email protected]> Cc: David Brownell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27sdhci-spear: ST SPEAr based SDHCI controller glueViresh KUMAR1-0/+42
Add a glue layer to support the sdhci driver on the ST SPEAr platform. Signed-off-by: Viresh Kumar <[email protected]> Cc: <[email protected]> Cc: Linus Walleij <[email protected]> Cc: Russell King <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27sdio: add new function for RAW (Read after Write) operationGrazvydas Ignotas1-0/+3
SDIO specification allows RAW (Read after Write) operation using IO_RW_DIRECT command (CMD52) by setting the RAW bit. This operation is similar to ordinary read/write commands, except that both write and read are performed using single command/response pair. The Linux SDIO layer already supports this internaly, only external function is missing for drivers to make use, which is added by this patch. This type of command is required to implement proper power save mode support in wl1251 wifi driver. Android has similar patch for G1 in it's tree for the same reason: http://android.git.kernel.org/?p=kernel/common.git;a=commitdiff;h=74a47786f6ecbe6c1cf9fb15efe6a968451deb52 Signed-off-by: Grazvydas Ignotas <[email protected]> Acked-by: Kalle Valo <[email protected]> Cc: Dmitry Shmidt <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27mmc: remove the "state" argument to mmc_suspend_host()Matt Fleming1-1/+1
Even though many mmc host drivers pass a pm_message_t argument to mmc_suspend_host() that argument isn't used the by MMC core. As host drivers are converted to dev_pm_ops they'll have to construct pm_message_t's (as they won't be passed by the PM subsystem any more) just to appease the mmc suspend interface. We might as well just delete the unused paramter. Signed-off-by: Matt Fleming <[email protected]> Acked-by: Anton Vorontsov <[email protected]> Acked-by: Michal Miroslaw <[email protected]>ZZ Acked-by: Sascha Sommer <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27mmc: add support MMCIF for SuperHYusuke Goda1-0/+39
MMCIF is the MMC Host Interface in SuperH. Signed-off-by: Yusuke Goda <[email protected]> Cc: Ben Hutchings <[email protected]> Cc: Paul Mundt <[email protected]> Cc: Magnus Damm <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27sdhci-pltfm: implement platform data passingAnton Vorontsov1-0/+35
This includes platform ops, quirks and (de)initialization callbacks. Signed-off-by: Anton Vorontsov <[email protected]> Cc: Richard Röjfors <[email protected]> Cc: David Vrabel <[email protected]> Cc: Pierre Ossman <[email protected]> Cc: Ben Dooks <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-26Revert "endian: #define __BYTE_ORDER"Linus Torvalds2-6/+0
This reverts commit b3b77c8caef1750ebeea1054e39e358550ea9f55, which was also totally broken (see commit 0d2daf5cc858 that reverted the crc32 version of it). As reported by Stephen Rothwell, it causes problems on big-endian machines: > In file included from fs/jfs/jfs_types.h:33, > from fs/jfs/jfs_incore.h:26, > from fs/jfs/file.c:22: > fs/jfs/endian24.h:36:101: warning: "__LITTLE_ENDIAN" is not defined The kernel has never had that crazy "__BYTE_ORDER == __LITTLE_ENDIAN" model. It's not how we do things, and it isn't how we _should_ do things. So don't go there. Requested-by: Stephen Rothwell <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds8-10/+113
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (63 commits) drivers/net/usb/asix.c: Fix pointer cast. be2net: Bug fix to avoid disabling bottom half during firmware upgrade. proc_dointvec: write a single value hso: add support for new products Phonet: fix potential use-after-free in pep_sock_close() ath9k: remove VEOL support for ad-hoc ath9k: change beacon allocation to prefer the first beacon slot sock.h: fix kernel-doc warning cls_cgroup: Fix build error when built-in macvlan: do proper cleanup in macvlan_common_newlink() V2 be2net: Bug fix in init code in probe net/dccp: expansion of error code size ath9k: Fix rx of mcast/bcast frames in PS mode with auto sleep wireless: fix sta_info.h kernel-doc warnings wireless: fix mac80211.h kernel-doc warnings iwlwifi: testing the wrong variable in iwl_add_bssid_station() ath9k_htc: rare leak in ath9k_hif_usb_alloc_tx_urbs() ath9k_htc: dereferencing before check in hif_usb_tx_cb() rt2x00: Fix rt2800usb TX descriptor writing. rt2x00: Fix failed SLEEP->AWAKE and AWAKE->SLEEP transitions. ...
2010-05-25driver core: add devname module aliases to allow module on-demand auto-loadingKay Sievers1-0/+2
This adds: alias: devname:<name> to some common kernel modules, which will allow the on-demand loading of the kernel module when the device node is accessed. Ideally all these modules would be compiled-in, but distros seems too much in love with their modularization that we need to cover the common cases with this new facility. It will allow us to remove a bunch of pretty useless init scripts and modprobes from init scripts. The static device node aliases will be carried in the module itself. The program depmod will extract this information to a file in the module directory: $ cat /lib/modules/2.6.34-00650-g537b60d-dirty/modules.devname # Device nodes to trigger on-demand module loading. microcode cpu/microcode c10:184 fuse fuse c10:229 ppp_generic ppp c108:0 tun net/tun c10:200 dm_mod mapper/control c10:235 Udev will pick up the depmod created file on startup and create all the static device nodes which the kernel modules specify, so that these modules get automatically loaded when the device node is accessed: $ /sbin/udevd --debug ... static_dev_create_from_modules: mknod '/dev/cpu/microcode' c10:184 static_dev_create_from_modules: mknod '/dev/fuse' c10:229 static_dev_create_from_modules: mknod '/dev/ppp' c108:0 static_dev_create_from_modules: mknod '/dev/net/tun' c10:200 static_dev_create_from_modules: mknod '/dev/mapper/control' c10:235 udev_rules_apply_static_dev_perms: chmod '/dev/net/tun' 0666 udev_rules_apply_static_dev_perms: chmod '/dev/fuse' 0666 A few device nodes are switched to statically allocated numbers, to allow the static nodes to work. This might also useful for systems which still run a plain static /dev, which is completely unsafe to use with any dynamic minor numbers. Note: The devname aliases must be limited to the *common* and *single*instance* device nodes, like the misc devices, and never be used for conceptually limited systems like the loop devices, which should rather get fixed properly and get a control node for losetup to talk to, instead of creating a random number of device nodes in advance, regardless if they are ever used. This facility is to hide the mess distros are creating with too modualized kernels, and just to hide that these modules are not compiled-in, and not to paper-over broken concepts. Thanks! :) Cc: Greg Kroah-Hartman <[email protected]> Cc: David S. Miller <[email protected]> Cc: Miklos Szeredi <[email protected]> Cc: Chris Mason <[email protected]> Cc: Alasdair G Kergon <[email protected]> Cc: Tigran Aivazian <[email protected]> Cc: Ian Kent <[email protected]> Signed-Off-By: Kay Sievers <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2010-05-25Merge branch 'master' of ↵David S. Miller1-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
2010-05-25Merge branch 'devel' of master.kernel.org:/home/rmk/linux-2.6-armLinus Torvalds1-0/+3
* 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm: (103 commits) ARM: 6141/1: Add audio support part in arch/arm/mach-w90x900 ARM: 5939/1: ARM: Add option CMDLINE_FORCE to force usage of the in-kernel cmdline ARM: 6140/1: silence a bogus sparse warning in unwind.c ARM: mach-at91: duplicated include ARM: arch/arm/nwfpe/fpsr.h: Checkpatch cleanup ARM: arch/arm/mach-shark/pci.c: Checkpatch cleanup ARM: arch/arm/nwfpe/ChangeLog: Checkpatch cleanup ARM: arch/arm/mach-sa1100/leds.c: Checkpatch cleanup ARM: arch/arm/mach-h720x/common.h: Checkpatch cleanup ARM: arch/arm/mach-footbridge/ebsa285-pci.c: Checkpatch cleanup ARM: arch/arm/mach-clps711x/Makefile.boot: Checkpatch cleanup ARM: arch/arm/boot/bootp/bootp.lds: Checkpatch cleanup ARM: SPEAR6xx: remove duplicated #include ARM: s3c6400_defconfig: Add NAND driver ARM: s3c6400_defconfig: enable sound as modules ARM: s3c6400_defconfig: enable power management ARM: s5pv210_defconfig: Update s5pv210_defconfig to v2.6.34 ARM: s5pc110_defconfig: Update s5pc110_defconfig to v2.6.34 ARM: s5p6442_defconfig: Update s5p6442_defconfig to v2.6.34 ARM: s5p6440_defconfig: Update s5p6440_defconfig to v2.6.34 ...
2010-05-25Merge branch 'for-linus' of ↵Linus Torvalds1-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: RDMA/nes: Fix incorrect unlock in nes_process_mac_intr() RDMA/nes: Async event for closed QP causes crash RDMA/nes: Have ethtool read hardware registers for rx/tx stats RDMA/cxgb4: Only insert sq qid in lookup table RDMA/cxgb4: Support IB_WR_READ_WITH_INV opcode RDMA/cxgb4: Set fence flag for inv-local-stag work requests RDMA/cxgb4: Update some HW limits RDMA/cxgb4: Don't limit fastreg page list depth RDMA/cxgb4: Return proper errors in fastreg mr/pbl allocation RDMA/cxgb4: Fix overflow bug in CQ arm RDMA/cxgb4: Optimize CQ overflow detection RDMA/cxgb4: CQ size must be IQ size - 2 RDMA/cxgb4: Register RDMA provider based on LLD state_change events RDMA/cxgb4: Detach from the LLD after unregistering RDMA device IB/ipath: Remove support for QLogic PCIe QLE devices IB/qib: Add new qib driver for QLogic PCIe InfiniBand adapters IB/mad: Make needlessly global mad_sendq_size/mad_recvq_size static IB/core: Allow device-specific per-port sysfs files mlx4_core: Clean up mlx4_alloc_icm() a bit mlx4_core: Fix possible chunk sg list overflow in mlx4_alloc_icm()
2010-05-25Merge branch 'next-spi' of git://git.secretlab.ca/git/linux-2.6Linus Torvalds2-102/+31
* 'next-spi' of git://git.secretlab.ca/git/linux-2.6: spi/xilinx: Fix compile error spi/davinci: Fix clock prescale factor computation spi: move bitbang txrx utility functions to private header spi/mpc5121: Add SPI master driver for MPC5121 PSC powerpc/mpc5121: move PSC FIFO memory init to platform code spi/ep93xx: implemented driver for Cirrus EP93xx SPI controller Documentation/spi/* compile warning fix spi/omap2_mcspi: Check params before dereference or use spi/omap2_mcspi: add turbo mode support spi/omap2_mcspi: change default DMA_MIN_BYTES value to 160 spi/pl022: fix stop queue procedure spi/pl022: add support for the PL023 derivate spi/pl022: fix up differences between ARM and ST versions spi/spi_mpc8xxx: Do not use map_tx_dma to unmap rx_dma spi/spi_mpc8xxx: Fix QE mode Litte Endian spi/spi_mpc8xxx: fix potential memory corruption.
2010-05-25Merge branch 'for-linus' of ↵Linus Torvalds2-3/+9
git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6: regulator: return set_mode is same mode is requested Regulators: ab3100/bq24022: add a missing .owner field in regulator_desc twl6030: regulator: Remove vsel tables and use formula for calculation mc13783-regulator: fix vaild voltage range checking for mc13783_fixed_regulator_set_voltage regulator: use voltage number array in 88pm860x regulator: make 88pm860x sharing one driver structure regulator: simplify regulator_register() error handling regulator: fix unset_regulator_supplies() to remove all matches regulator: prevent registration of matching regulator consumer supplies regulator: Allow regulator-regulator supplies to be specified by name
2010-05-25fbdev: move FBIO_WAITFORVSYNC to linux/fb.hGrazvydas Ignotas4-6/+2
FBIO_WAITFORVSYNC is currently implemented by matroxfb, atyfb, intelfb and more. All of them keep redefining the same FBIO_WAITFORVSYNC macro over and over again, so move it to linux/fb.h and clean up those duplicate defines. Signed-off-by: Grazvydas Ignotas <[email protected]> Cc: Ville Syrjala <[email protected]> Cc: Grant Likely <[email protected]> Cc: Maik Broemme <[email protected]> Cc: Petr Vandrovec <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Krzysztof Helt <[email protected]> Cc: "Hiremath, Vaibhav" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-25fbdev: da8xx/omap-l1xx: implement double bufferingMartin Ambrose1-0/+1
This work includes the following: - Implement handler for FBIO_WAITFORVSYNC ioctl. - Allocate the data and palette buffers separately. A consequence of this is that the palette and data loading is now done in different phases. And that the LCD must be disabled temporarily after the palette is loaded but this will only happen once after init and each time the palette is changed. I think this is OK. - Allocate two (ping and pong) framebuffers from memory. - Add pan_display handler which toggles the LCDC DMA registers between the ping and pong buffers. Signed-off-by: Martin Ambrose <[email protected]> Cc: Chaithrika U S <[email protected]> Cc: Sudhakar Rajashekhara <[email protected]> Cc: Krzysztof Helt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-25lis3: interrupt handlers for 8bit wakeup and click eventsSamu Onkalo1-0/+2
Content for the 8bit device threaded interrupt handlers. Depending on the interrupt line and chip configuration, either click or wakeup / freefall handler is called. In case of click, BTN_ event is sent via input device. In case of wakeup or freefall, input device ABS_ events are updated immediatelly. It is still possible to configure interrupt line 1 for fast freefall detection and use the second line either for click or threshold based interrupts. Or both lines can be used for click / threshold interrupts. Polled input device can be set to stopped state and still get coordinate updates via input device using interrupt based method. Polled mode and interrupt mode can also be used parallel. BTN_ events are remapped based on existing axis remapping information. Signed-off-by: Samu Onkalo <[email protected]> Acked-by: Eric Piel <[email protected]> Cc: Daniel Mack <[email protected]> Cc: Pavel Machek <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>