Age | Commit message | Author | Files | Lines
2011-11-02 | sysctl: make CONFIG_SYSCTL_SYSCALL default to n | WANG Cong | 2 | -37/+2
When I tried to send a patch to remove it, Andi told me we still need to keep compatibility with old libc, so we can't remove this completely. So just make it default to n and remove the entry from feature-removal-schedule.txt. Signed-off-by: WANG Cong <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Alexey Dobriyan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | sysctl: add support for poll() | Lucas De Marchi | 5 | -0/+108
Adding support for poll() in sysctl fs allows userspace to receive notifications of changes in sysctl entries. This adds infrastructure that allows files in sysctl fs to be pollable and implements it for hostname and domainname. [[email protected]: s/declare/define/ for definitions] Signed-off-by: Lucas De Marchi <[email protected]> Cc: Greg KH <[email protected]> Cc: Kay Sievers <[email protected]> Cc: Al Viro <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Alexey Dobriyan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
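From user space, the notification described above could be consumed roughly as follows. This is a minimal sketch, not code from the patch: wait_for_sysctl_change() is a hypothetical helper, and it assumes that a change is reported as an exceptional condition (POLLPRI, with POLLERR always reported by poll()).

```c
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait until the sysctl file at `path` reports a change, or until
 * `timeout_ms` elapses (a negative timeout waits indefinitely).
 * Returns 1 on a change notification, 0 on timeout, -1 on error.
 * Hypothetical helper for illustration only.
 */
int wait_for_sysctl_change(const char *path, int timeout_ms)
{
    struct pollfd pfd;
    int ret;

    pfd.fd = open(path, O_RDONLY);
    if (pfd.fd < 0)
        return -1;

    /* Assumption: changes are signalled as POLLPRI (plus POLLERR). */
    pfd.events = POLLPRI;
    pfd.revents = 0;

    ret = poll(&pfd, 1, timeout_ms);
    close(pfd.fd);

    if (ret < 0)
        return -1;
    if (ret == 0)
        return 0;
    return (pfd.revents & (POLLPRI | POLLERR)) ? 1 : -1;
}
```

A caller interested in hostname changes might invoke wait_for_sysctl_change("/proc/sys/kernel/hostname", -1) in a loop and re-read the file after each return value of 1.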
2011-11-02 | RapidIO: documentation update | Alexandre Bounine | 1 | -1/+1
Update rapidio.txt to reflect changes from recent patch. See http://marc.info/?l=linux-kernel&m=131285620113589&w=2 for details. Signed-off-by: Alexandre Bounine <[email protected]> Cc: Liu Gang <[email protected]> Cc: Micha Nelissen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | drivers/net/rionet.c: fix ethernet address macros for LE platforms | Alexandre Bounine | 1 | -2/+2
Modify Ethernet address macros to be compatible with BE/LE platforms. Signed-off-by: Alexandre Bounine <[email protected]> Cc: Chul Kim <[email protected]> Cc: Kumar Gala <[email protected]> Cc: Matt Porter <[email protected]> Cc: Li Yang <[email protected]> Cc: <[email protected]> [2.6.39+] Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | RapidIO: fix potential null deref in rio_setup_device() | Alexandre Bounine | 1 | -1/+1
The "goto cleanup" path can dereference "rswitch" when it is NULL. Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Alexandre Bounine <[email protected]> Cc: Dan Carpenter <[email protected]> Cc: Kumar Gala <[email protected]> Cc: Matt Porter <[email protected]> Cc: Chul Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | RapidIO: add mport driver for Tsi721 bridge | Alexandre Bounine | 8 | -2/+3196
Add RapidIO mport driver for the IDT TSI721 PCI Express-to-SRIO bridge device. The driver provides the full set of callback functions defined for mport devices in the RapidIO subsystem. It is also compatible with the current version of the RIONET driver (Ethernet over RapidIO messaging services). This patch is applicable to kernel versions starting from 2.6.39. Signed-off-by: Alexandre Bounine <[email protected]> Signed-off-by: Chul Kim <[email protected]> Cc: Kumar Gala <[email protected]> Cc: Matt Porter <[email protected]> Cc: Li Yang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | arch/powerpc/sysdev/fsl_rio.c: release rapidio port I/O region resource if port failed to initialize | Liu Gang | 1 | -0/+1
The "struct rio_mport" contains the master port I/O memory resource as a member, "struct resource iores". This resource is read from the device tree and used as the memory space for RapidIO read/write transactions. RapidIO requests the port I/O memory resource under the root resource "iomem_resource":

    struct rio_mport *port;
    port = kzalloc(sizeof(struct rio_mport), GFP_KERNEL);
    request_resource(&iomem_resource, &port->iores);

When the port fails to initialize, the allocated "rio_mport" structure is freed and the pointer "&port->iores" becomes invalid. If another caller then requests a resource under "iomem_resource", the stale "&port->iores" node may still be visited in the child resource list, which will crash the system. The requested port I/O memory resource should therefore be released before the allocated "rio_mport" structure is freed. Signed-off-by: Liu Gang <[email protected]> Acked-by: Alexandre Bounine <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Grant Likely <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | drivers/rapidio/rio-scan.c: use discovered bit to test if enumeration is complete | Liu Gang | 1 | -2/+2
The discovered bit in PGCCSR register indicates if the device has been discovered by system host. In Rapidio systems, some agent devices can also be master devices. They can issue requests into the system. Signed-off-by: Liu Gang <[email protected]> Acked-by: Alexandre Bounine <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | init: add root=PARTUUID=UUID/PARTNROFF=%d support | Will Drewry | 1 | -5/+43
Expand root=PARTUUID=UUID syntax to support selecting a root partition by integer offset from a known, unique partition. This approach provides similar properties to specifying a device and partition number, but using the UUID as the unique path prior to evaluating the offset. For example, root=PARTUUID=99DE9194-FC15-4223-9192-FC243948F88B/PARTNROFF=1 selects the partition with UUID 99DE.. and then selects the next partition. This change is motivated by a particular use case in Chromium OS where the bootloader can easily determine what partition it is on (by UUID) but doesn't perform general partition table walking. That said, support for this model provides a direct mechanism for the user to modify the root partition to boot without specifically needing to extract each UUID or update the bootloader explicitly when the root partition UUID is changed (if it is recreated to be larger, for instance). Pinning to a /boot-style partition UUID allows arbitrary root partition reconfiguration/modifications with slightly less ambiguity than just [dev][partition] and less stringency than the specific root partition UUID. [[email protected]: fix init sections warning] Signed-off-by: Will Drewry <[email protected]> Cc: Kay Sievers <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Trond Myklebust <[email protected]> Cc: Jens Axboe <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
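The syntax described above can be illustrated with a small user-space parser. This is a sketch under assumptions, not the kernel implementation: parse_partuuid() is a hypothetical helper, and treating a missing /PARTNROFF= suffix as offset 0 is a choice made for the example.

```c
#include <stdio.h>
#include <string.h>

/*
 * Split "PARTUUID=<uuid>[/PARTNROFF=<n>]" into its UUID and offset.
 * Returns 0 on success, -1 on a malformed string. The offset is 0
 * when no /PARTNROFF= suffix is present (assumption of this sketch).
 */
int parse_partuuid(const char *arg, char *uuid, size_t uuid_len, int *offset)
{
    static const char prefix[] = "PARTUUID=";
    const char *start, *slash;
    size_t len;
    char trailing;

    if (strncmp(arg, prefix, sizeof(prefix) - 1) != 0)
        return -1;
    start = arg + sizeof(prefix) - 1;

    /* The UUID ends at the first '/' or at the end of the string. */
    slash = strchr(start, '/');
    len = slash ? (size_t)(slash - start) : strlen(start);
    if (len == 0 || len >= uuid_len)
        return -1;
    memcpy(uuid, start, len);
    uuid[len] = '\0';

    *offset = 0;
    if (slash) {
        /* Exactly one conversion must succeed; "%c" catches trailing junk. */
        if (sscanf(slash + 1, "PARTNROFF=%d%c", offset, &trailing) != 1)
            return -1;
    }
    return 0;
}
```

For the example string from the commit message, the helper yields the UUID 99DE9194-FC15-4223-9192-FC243948F88B and an offset of 1, meaning the partition directly after the one carrying that UUID.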
2011-11-02 | include/linux/sem.h: make sysv_sem empty if SYSVIPC is disabled | Manfred Spraul | 1 | -2/+7
For the sysvsem undo, each task struct contains a sysv_sem structure with a pointer to the undo information. This pointer is only necessary if sysvipc is enabled - thus the pointer can be made conditional on CONFIG_SYSVIPC. Signed-off-by: Manfred Spraul <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Mike Galbraith <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | ipc/sem.c: remove private structures from public header file | Manfred Spraul | 2 | -42/+46
include/linux/sem.h contains several structures that are only used within ipc/sem.c. The patch moves them into ipc/sem.c - there is no need to expose the structures to the whole kernel. No functional changes, only whitespace cleanups and 80-char per line fixes. Signed-off-by: Manfred Spraul <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Mike Galbraith <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | ipc/sem.c: handle spurious wakeups | Manfred Spraul | 1 | -0/+9
semtimedop() does not handle spurious wakeups, it returns -EINTR to user space. Most other schedule() users would just loop and not return to user space. The patch adds such a loop to semtimedop() Signed-off-by: Manfred Spraul <[email protected]> Reported-by: Peter Zijlstra <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Mike Galbraith <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | ipc/sem.c: fix return code race with semop vs. semop +semctl(IPC_RMID) | Manfred Spraul | 1 | -1/+0
sys_semtimedop() may return -EIDRM although the semaphore operation completed successfully: thread 1: thread 2: semtimedop(), sleeps semop(): * acquires sem_lock() semtimedop() woken up due to timeout sem_lock() loops * notices that thread 2 could be completed. * performs the operations that thread 2 is sleeping on. * marks the semaphore operation as IN_WAKEUP * drops sem_lock(), does wakeup, sets return code to 0 * thread delayed due to interrupt, whatever * returns to user space * thread still delayed semctl(IPC_RMID) * acquires sem_lock() * ipc_rmid(), ipcp->deleted=1 * drops sem_lock() * thread finally continues - but sem_lock() now fails due to ipcp->deleted == 1 * returns -EIDRM instead of 0 The fix is trivial: Always use the return code in queue.status. In the real world, the race probably doesn't matter: If the semaphore array is destroyed, the app is probably not interested in whether the last operation succeeded or was already cancelled. Signed-off-by: Manfred Spraul <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Mike Galbraith <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | ida: make ida_simple_get/put() IRQ safe | Tejun Heo | 1 | -4/+7
It's often convenient to be able to release resource from IRQ context. Make ida_simple_*() use irqsave/restore spin ops so that they are IRQ safe. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Rusty Russell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | proc: fix races against execve() of /proc/PID/fd** | Vasiliy Kulikov | 1 | -43/+103
fd* files are restricted to the task's owner, and other users may not get direct access to them. But one may open any of these files and run any setuid program, keeping the opened file descriptors. As there are permission checks on open(), but not on readdir() and read(), operations on the kept file descriptors will not be checked. This makes it possible to violate the procfs permission model. Reading fdinfo/* may disclose the current fds' position and flags, and reading the directory contents of fdinfo/ and fd/ may disclose the number of files opened by the target task. This information is not sensitive per se, but it can reveal some private information (like the length of a password stored in a file) under certain conditions. Use the existing (un)lock_trace functions to check for ptrace_may_access(), but instead of using the EPERM return code from it, use EACCES to be consistent with the existing proc_pid_follow_link()/proc_pid_readlink() return code. If they differ, an attacker can guess what fds exist by analyzing the stat() return code. Patched handlers: stat() for fd/*, stat() and read() for fdinfo/*, readdir() and lookup() for fd/ and fdinfo/. Signed-off-by: Vasiliy Kulikov <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | procfs: report EISDIR when reading sysctl dirs in proc | Pavel Emelyanov | 1 | -0/+1
On reading sysctl dirs we should return -EISDIR instead of -EINVAL. Signed-off-by: Pavel Emelyanov <[email protected]> Signed-off-by: Cyrill Gorcunov <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
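Seen from user space, the effect of this change is the errno value that read() reports for a directory. A minimal sketch, assuming a hypothetical helper read_errno() that is not part of the patch:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open `path` read-only and attempt a read().
 * Returns 0 if the read succeeds, the errno value if read() fails,
 * or -1 if the path cannot be opened at all.
 * Hypothetical helper for illustration only.
 */
int read_errno(const char *path)
{
    char buf[16];
    int err = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (read(fd, buf, sizeof(buf)) < 0)
        err = errno;
    close(fd);
    return err;
}
```

On Linux, an ordinary directory such as "/" typically yields EISDIR here; with the patch applied, a sysctl directory such as /proc/sys/kernel would be expected to behave the same way instead of reporting EINVAL.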
2011-11-02 | cpusets: avoid looping when storing to mems_allowed if one node remains set | David Rientjes | 1 | -3/+6
{get,put}_mems_allowed() exist so that general kernel code may locklessly access a task's set of allowable nodes without having the chance that a concurrent write will cause the nodemask to be empty on configurations where MAX_NUMNODES > BITS_PER_LONG. This could incur a significant delay, however, especially in low memory conditions because the page allocator is blocking and reclaim requires get_mems_allowed() itself. It is not atypical to see writes to cpuset.mems take over 2 seconds to complete, for example. In low memory conditions, this is problematic because it's one of the most important times to change cpuset.mems in the first place! The only way a task's set of allowable nodes may change is through cpusets by writing to cpuset.mems and when attaching a task to a different cpuset. This is done by setting all new nodes, ensuring generic code is not reading the nodemask with get_mems_allowed() at the same time, and then clearing all the old nodes. This prevents the possibility that a reader will see an empty nodemask at the same time the writer is storing a new nodemask. If at least one node remains unchanged, though, it's possible to simply set all new nodes and then clear all the old nodes. Changing a task's nodemask is protected by cgroup_mutex so it's guaranteed that two threads are not changing the same task's nodemask at the same time, so the nodemask is guaranteed to be stored before another thread changes it and determines whether a node remains set or not. Signed-off-by: David Rientjes <[email protected]> Cc: Miao Xie <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Nick Piggin <[email protected]> Cc: Paul Menage <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
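The two-step store can be modelled on a single machine word. This is a simplified sketch, not the kernel code: nodemask_model and update_mask() are hypothetical names, the real nodemask may span several words, and the wait for lockless readers is only indicated by a comment.

```c
#include <stdbool.h>

/* Hypothetical single-word stand-in for a nodemask. */
typedef unsigned long nodemask_model;

/*
 * Store `newmask` into `*mask` without the mask ever becoming empty.
 * Returns true when the old and new masks are disjoint, which is the
 * case where a real writer would still have to wait for readers
 * between the two steps.
 */
bool update_mask(nodemask_model *mask, nodemask_model newmask)
{
    bool need_wait = (*mask & newmask) == 0;

    *mask |= newmask;   /* step 1: set all new nodes */
    /* A real writer would wait for readers here when need_wait is true. */
    *mask &= newmask;   /* step 2: clear all old nodes */
    return need_wait;
}
```

When the masks overlap, the shared node stays set across both steps, so a concurrent reader can never observe an empty mask and no waiting is needed.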
2011-11-02 | mm/page_cgroup.c: quiet sparse noise | H Hartley Sweeten | 1 | -1/+1
warning: symbol 'swap_cgroup_ctrl' was not declared. Should it be static? Signed-off-by: H Hartley Sweeten <[email protected]> Cc: Paul Menage <[email protected]> Cc: Li Zefan <[email protected]> Acked-by: Balbir Singh <[email protected]> Cc: Daisuke Nishimura <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | memcg: Fix race condition in memcg_check_events() with this_cpu usage | Steven Rostedt | 1 | -4/+6
Various code in memcontrol.c calls this_cpu_read() on the calculations to be done from two different percpu variables, or does an open-coded read-modify-write on a single percpu variable. Disable preemption throughout these operations so that the writes go to the correct places. [[email protected]: added this_cpu to __this_cpu conversion] Signed-off-by: Johannes Weiner <[email protected]> Signed-off-by: Steven Rostedt <[email protected]> Cc: Greg Thelen <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Daisuke Nishimura <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | memcg: close race between charge and putback | Johannes Weiner | 1 | -1/+20
There is a potential race between a thread charging a page and another thread putting it back to the LRU list:

    charge:                         putback:
    SetPageCgroupUsed               SetPageLRU
    PageLRU && add to memcg LRU     PageCgroupUsed && add to memcg LRU

The order of setting one flag and checking the other is crucial, otherwise the charge may observe !PageLRU while the putback observes !PageCgroupUsed and the page is not linked to the memcg LRU at all. Global memory pressure may fix this by trying to isolate and putback the page for reclaim, where that putback would link it to the memcg LRU again. Without that, the memory cgroup is undeletable due to a charge whose physical page can not be found and moved out. Signed-off-by: Johannes Weiner <[email protected]> Cc: Ying Han <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Daisuke Nishimura <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
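The ordering requirement (each side sets its own flag first, then tests the other side's flag) can be sketched with C11 atomics. This is a user-space model, not the memcg code: struct page_model and both functions are hypothetical, and sequentially consistent atomics stand in for whatever ordering the kernel change actually uses.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct page_model {
    atomic_bool cgroup_used;  /* stands in for PageCgroupUsed */
    atomic_bool lru;          /* stands in for PageLRU */
    atomic_int  linked;       /* times the page was added to the memcg LRU */
};

/* Charge side: publish our flag first, then test the other side's flag. */
void model_charge(struct page_model *p)
{
    atomic_store(&p->cgroup_used, true);
    if (atomic_load(&p->lru))
        atomic_fetch_add(&p->linked, 1);
}

/* Putback side: the mirror image of the charge side. */
void model_putback(struct page_model *p)
{
    atomic_store(&p->lru, true);
    if (atomic_load(&p->cgroup_used))
        atomic_fetch_add(&p->linked, 1);
}
```

Because each side stores before it loads, at least one of the two must observe the other's flag, so the page is linked at least once. A real implementation would also need to avoid linking the page twice; the model leaves that out.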
2011-11-02 | memcg: skip scanning active lists based on individual size | Johannes Weiner | 4 | -41/+25
Reclaim decides to skip scanning an active list when the corresponding inactive list is above a certain size in comparison to leave the assumed working set alone while there are still enough reclaim candidates around. The memcg implementation of comparing those lists instead reports whether the whole memcg is low on the requested type of inactive pages, considering all nodes and zones. This can lead to an oversized active list not being scanned because of the state of the other lists in the memcg, as well as an active list being scanned while its corresponding inactive list has enough pages. Not only is this wrong, it's also a scalability hazard, because the global memory state over all nodes and zones has to be gathered for each memcg and zone scanned. Make these calculations purely based on the size of the two LRU lists that are actually affected by the outcome of the decision. Signed-off-by: Johannes Weiner <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Daisuke Nishimura <[email protected]> Cc: Balbir Singh <[email protected]> Reviewed-by: Minchan Kim <[email protected]> Reviewed-by: Ying Han <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | memcg: do not expose uninitialized mem_cgroup_per_node to world | Igor Mammedov | 1 | -1/+1
If somebody is touching data too early, it might be easier to diagnose a problem when dereferencing NULL at mem->info.nodeinfo[node] than trying to understand why mem_cgroup_per_zone is [un|partly]initialized. Signed-off-by: Igor Mammedov <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | memcg: fix oom schedule_timeout() | KAMEZAWA Hiroyuki | 1 | -1/+1
Before calling schedule_timeout(), task state should be changed. Signed-off-by: KAMEZAWA Hiroyuki <[email protected]> Acked-by: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | memcg: rename mem variable to memcg | Raghavendra K T | 2 | -479/+485
The memcg code sometimes uses "struct mem_cgroup *mem" and sometimes uses "struct mem_cgroup *memcg". Rename all mem variables to memcg in source file. Signed-off-by: Raghavendra K T <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Acked-by: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | cgroup/kmemleak: Annotate alloc_page() for cgroup allocations | Steven Rostedt | 1 | -2/+5
When the cgroup base was allocated with kmalloc, it was necessary to annotate the variable with kmemleak_not_leak(). But because it has recently been changed to be allocated with alloc_page() (which skips kmemleak checks), this causes a warning on boot up. I was triggering this output:

    allocated 8388608 bytes of page_cgroup
    please try 'cgroup_disable=memory' option if you don't want memory cgroups
    kmemleak: Trying to color unknown object at 0xf5840000 as Grey
    Pid: 0, comm: swapper Not tainted 3.0.0-test #12
    Call Trace:
     [<c17e34e6>] ? printk+0x1d/0x1f
     [<c10e2941>] paint_ptr+0x4f/0x78
     [<c178ab57>] kmemleak_not_leak+0x58/0x7d
     [<c108ae9f>] ? __rcu_read_unlock+0x9/0x7d
     [<c1cdb462>] kmemleak_init+0x19d/0x1e9
     [<c1cbf771>] start_kernel+0x346/0x3ec
     [<c1cbf1b4>] ? loglevel+0x18/0x18
     [<c1cbf0aa>] i386_start_kernel+0xaa/0xb0

After a bit of debugging I tracked the object 0xf5840000 (and others) down to the cgroup code. The change from allocating base with kmalloc to alloc_page() has the base not calling kmemleak_alloc() which adds the pointer to the object_tree_root, but kmemleak_not_leak() adds it to the crt_early_log[] table. On kmemleak_init(), the entry is found in the early_log[] but not the object_tree_root, and this error message is displayed. If alloc_page() fails then it defaults back to vmalloc() which still uses the kmemleak_alloc() which makes us still need the kmemleak_not_leak() call. The solution is to call the kmemleak_alloc() directly if the alloc_page() succeeds. Reviewed-by: Michal Hocko <[email protected]> Signed-off-by: Steven Rostedt <[email protected]> Acked-by: Catalin Marinas <[email protected]> Signed-off-by: Jonathan Nieder <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | cgroups: don't attach task to subsystem if migration failed | Ben Blum | 1 | -6/+9
If a task has exited to the point it has called cgroup_exit() already, then we can't migrate it to another cgroup anymore. This can happen when we are attaching a task to a new cgroup between the call to ->can_attach_task() on subsystems and the migration that is eventually tried in cgroup_task_migrate(). In this case cgroup_task_migrate() returns -ESRCH and we don't want to attach the task to the subsystems because the attachment to the new cgroup itself failed. Fix this by only calling ->attach_task() on the subsystems if the cgroup migration succeeded. Reported-by: Oleg Nesterov <[email protected]> Signed-off-by: Ben Blum <[email protected]> Acked-by: Paul Menage <[email protected]> Cc: Li Zefan <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | cgroups: more safe tasklist locking in cgroup_attach_proc | Ben Blum | 1 | -3/+3
Fix unstable tasklist locking in cgroup_attach_proc. According to this thread - https://lkml.org/lkml/2011/7/27/243 - RCU is not sufficient to guarantee the tasklist is stable w.r.t. de_thread and exit. Taking tasklist_lock for reading, instead of rcu_read_lock, ensures proper exclusion. Signed-off-by: Ben Blum <[email protected]> Acked-by: Paul Menage <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Neil Brown <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | hfs: fix hfs_find_init() sb->ext_tree NULL ptr oops | Phillip Lougher | 1 | -5/+15
Clement Lecigne reports a filesystem which causes a kernel oops in hfs_find_init() trying to dereference sb->ext_tree which is NULL. This proves to be because the filesystem has a corrupted MDB extent record, where the extents file does not fit into the first three extents in the file record (the first blocks). In hfs_get_block() when looking up the blocks for the extent file (HFS_EXT_CNID), it fails the first blocks special case, and falls through to the extent code (which ultimately calls hfs_find_init()) which is in the process of being initialised. Hfs avoids this scenario by always having the extents b-tree fitting into the first blocks (the extents B-tree can't have overflow extents). The fix is to check at mount time that the B-tree fits into first blocks, i.e. fail if HFS_I(inode)->alloc_blocks >= HFS_I(inode)->first_blocks. Note, the existing commit 47f365eb57573 ("hfs: fix oops on mount with corrupted btree extent records") becomes subsumed into this as a special case, but only for the extents B-tree (HFS_EXT_CNID). It is perfectly acceptable for the catalog B-Tree file to grow beyond three extents, with the remaining extent descriptors in the extents overflow. This fixes CVE-2011-2203. Reported-by: Clement LECIGNE <[email protected]> Signed-off-by: Phillip Lougher <[email protected]> Cc: Jeff Mahoney <[email protected]> Cc: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | isofs: add readpages support | Namjae Jeon | 1 | -1/+9
Use mpage_readpages() instead of multiple calls to isofs_readpage() to reduce CPU utilization and improve performance. Signed-off-by: Namjae Jeon <[email protected]> Cc: Al Viro <[email protected]> Cc: Jan Kara <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | minix: describe usage of different magic numbers | Sami Kerola | 1 | -5/+5
One can get this information from minix/inode.c, but adding the explanations at the definition sites is more appropriate. Signed-off-by: Sami Kerola <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | drivers/rtc/rtc-mc13xxx.c: move probe and remove callbacks to .init.text and .exit.text | Uwe Kleine-König | 1 | -3/+3
The driver is added using platform_driver_probe(), so the callbacks can be discarded more aggressively. Signed-off-by: Uwe Kleine-König <[email protected]> Cc: Alessandro Zummo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | rtc: add initial support for mcp7941x parts | David Anders | 1 | -0/+27
Add initial support for the Microchip mcp7941x series of real time clocks. The mcp7941x series is generally compatible with the ds1307 and ds1337 RTC devices from Dallas Semiconductor. Minor differences include a backup battery enable bit and the polarity of the oscillator enable bit. Signed-off-by: David Anders <[email protected]> Cc: Alessandro Zummo <[email protected]> Reviewed-by: Wolfram Sang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | drivers/rtc/class.c: convert idr to ida and use ida_simple_get() | Jonathan Cameron | 1 | -23/+9
This is the one use of an ida that doesn't retry on receiving -EAGAIN. I'm assuming doing so will cause no harm and may help on a rare occasion. Signed-off-by: Jonathan Cameron <[email protected]> Cc: Alessandro Zummo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | init/do_mounts_rd.c: fix ramdisk identification for padded cramfs | Neil Armstrong | 1 | -0/+14
When a cramfs ramdisk padded with 512 bytes is given to the kernel, the current identify_ramdisk_image function fails to identify it. Tested with a padded cramfs image on an ARM based board. Signed-off-by: Neil Armstrong <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
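The idea of looking for the cramfs superblock both at the start of the image and behind a 512-byte pad can be sketched as below. This is an illustration, not the code from init/do_mounts_rd.c: find_cramfs_offset() and the two macro names are hypothetical, and only the host-endian magic value is compared.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CRAMFS_MAGIC_MODEL 0x28cd3d45u  /* cramfs superblock magic */
#define CRAMFS_PAD_BYTES   512          /* padding size named in the text */

/*
 * Return the byte offset of a cramfs superblock in `buf` (0 or 512),
 * or -1 if the magic is found at neither position.
 */
long find_cramfs_offset(const unsigned char *buf, size_t len)
{
    static const size_t candidates[] = { 0, CRAMFS_PAD_BYTES };
    uint32_t magic;
    size_t i;

    for (i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
        /* Skip candidates that do not fit inside the buffer. */
        if (len < candidates[i] + sizeof(magic))
            continue;
        memcpy(&magic, buf + candidates[i], sizeof(magic));
        if (magic == CRAMFS_MAGIC_MODEL)
            return (long)candidates[i];
    }
    return -1;
}
```

Checking offset 0 first keeps unpadded images working as before; the padded position is only consulted when the first check fails.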
2011-11-02 | ramfs: remove module leftovers | Richard Weinberger | 1 | -10/+0
Since ramfs is hard-selected to "y", the module leftovers make no sense. Signed-off-by: Richard Weinberger <[email protected]> Reviewed-by: WANG Cong <[email protected]> Cc: Al Viro <[email protected]> Cc: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | binfmt_elf: fix PIE execution with randomization disabled | Jiri Kosina | 1 | -1/+10
The case of address space randomization being disabled in runtime through randomize_va_space sysctl is not treated properly in load_elf_binary(), resulting in SIGKILL coming at exec() time for certain PIE-linked binaries in case the randomization has been disabled at runtime prior to calling exec(). Handle the randomize_va_space == 0 case the same way as if we were not supporting .text randomization at all. Based on original patch by H.J. Lu and Josh Boyer. Signed-off-by: Jiri Kosina <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Russell King <[email protected]> Cc: H.J. Lu <[email protected]> Cc: <[email protected]> Tested-by: Josh Boyer <[email protected]> Acked-by: Nicolas Pitre <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | thp: share get_huge_page_tail() | Andrea Arcangeli | 5 | -44/+11
This avoids duplicating the function in every arch gup_fast. Signed-off-by: Andrea Arcangeli <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Gibson <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: David Miller <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | sparc: gup_pte_range() support THP based tail recounting | Andrea Arcangeli | 1 | -0/+13
Up to this point the code assumed old refcounting for hugepages (pre-thp). This updates the code directly to the thp mapcount tail page refcounting. Signed-off-by: Andrea Arcangeli <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Gibson <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Heiko Carstens <[email protected]> Acked-by: David Miller <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | s390: gup_huge_pmd() return 0 if pte changes | Andrea Arcangeli | 1 | -10/+11
s390 didn't return 0 in that case, if it's rolling back the *nr pointer it should also return zero to avoid adding pages to the array at the wrong offset. Signed-off-by: Andrea Arcangeli <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Gibson <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: David Miller <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
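The rule stated above (when the index is rolled back, the function must also report failure) can be shown with a small model. This is a sketch, not the s390 code: model_gup_huge() and its parameters are hypothetical, and plain integers stand in for page pointers and page-table entries.

```c
/*
 * Record `count` consecutive entries starting at `first` into pages[],
 * then re-check the page-table entry. If it changed in the meantime,
 * undo the additions and return 0 so the caller does not treat the
 * rolled-back slots as filled. Returns 1 on success.
 */
int model_gup_huge(int *pages, int *nr, int first, int count,
                   unsigned long pte_seen, unsigned long pte_now)
{
    int start = *nr;
    int i;

    for (i = 0; i < count; i++)
        pages[(*nr)++] = first + i;

    if (pte_seen != pte_now) {
        *nr = start;   /* roll back the index ... */
        return 0;      /* ... and report failure as well */
    }
    return 1;
}
```

If the function rolled back `*nr` but still returned success, a caller continuing from that index would overwrite or misplace entries, which is the wrong-offset problem the commit message describes.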
2011-11-02 | s390: gup_huge_pmd() support THP tail recounting | Andrea Arcangeli | 1 | -1/+23
Up to this point the code assumed old refcounting for hugepages (pre-thp). This updates the code directly to the thp mapcount tail page refcounting. Signed-off-by: Andrea Arcangeli <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Gibson <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: David Miller <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02 | powerpc: gup_huge_pmd() return 0 if pte changes | Andrea Arcangeli | 1 | -10/+11
powerpc didn't return 0 in that case, if it's rolling back the *nr pointer it should also return zero to avoid adding pages to the array at the wrong offset. Signed-off-by: Andrea Arcangeli <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Acked-by: David Gibson <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: David Miller <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02powerpc: gup_hugepte() support THP based tail recountingAndrea Arcangeli1-1/+23
Up to this point the code assumed old refcounting for hugepages (pre-thp). This updates the code directly to the thp mapcount tail page refcounting. Signed-off-by: Andrea Arcangeli <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Gibson <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02powerpc: gup_hugepte() avoid freeing the head page too many timesAndrea Arcangeli1-3/+2
We only took "refs" pins on the head page, not "*nr" pins. Signed-off-by: Andrea Arcangeli <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Acked-by: David Gibson <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
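The difference between "refs" and "*nr" matters because "*nr" may already count pages recorded by earlier calls. A small sketch of the accounting, assuming a hypothetical `struct head_model` with a plain integer in place of the kernel's atomic page count:

```c
#include <assert.h>

/* Hypothetical stand-in for the head page of a hugepage. */
struct head_model { int count; };

/*
 * Undo the pins taken by one failed call. refs is the number of pins this
 * call took on the head page; *nr is the running total of recorded pages,
 * which can include entries from earlier, successful calls.
 */
static void undo_pins(struct head_model *head, int *nr, int refs)
{
    assert(refs >= 0 && refs <= *nr);

    *nr -= refs;
    while (refs--)
        head->count--;   /* models one put_page(head) per pin */
}
```

With two pages recorded earlier and three pins taken by the failed call, dropping `*nr` (five) references would release the head page two times too many; dropping `refs` (three) restores the count exactly.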
2011-11-02powerpc: get_hugepte() don't put_page() the wrong pageAndrea Arcangeli1-1/+1
"page" may have changed to point to the next hugepage after the loop completed, The references have been taken on the head page, so the put_page must happen there too. This is a longstanding issue pre-thp inclusion. It's totally unclear how these page_cache_add_speculative and pte_val(pte) != pte_val(*ptep) checks are necessary across all the powerpc gup_fast code, when x86 doesn't need any of that: there's no way the page can be freed with irq disabled so we're guaranteed the atomic_inc will happen on a page with page_count > 0 (so not needing the speculative check). The pte check is also meaningless on x86: no need to rollback on x86 if the pte changed, because the pte can still change a CPU tick after the check succeeded and it won't be rolled back in that case. The important thing is we got a reference on a valid page that was mapped there a CPU tick ago. So not knowing the soft tlb refill code of ppc64 in great detail I'm not removing the "speculative" page_count increase and the pte checks across all the code, but unless there's a strong reason for it they should be later cleaned up too. If a pte can change from huge to non-huge (like it could happen with THP) passing a pte_t *ptep to gup_hugepte() would also require to repeat the is_hugepd in gup_hugepte(), but that shouldn't happen with hugetlbfs only so I'm not altering that. Signed-off-by: Andrea Arcangeli <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Acked-by: David Gibson <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02powerpc: remove superfluous PageTail checks on the pte gup_fastAndrea Arcangeli1-13/+0
This part of gup_fast doesn't seem capable of handling hugetlbfs ptes; those should be handled by gup_hugepd only, so these checks are superfluous. Plus, if this wasn't a noop it would have oopsed, because the insistence on using the speculative refcounting would trigger a VM_BUG_ON if a tail page was encountered in page_cache_get_speculative(). Signed-off-by: Andrea Arcangeli <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Acked-by: David Gibson <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-11-02mm: thp: tail page refcounting fixAndrea Arcangeli8-84/+171
Michel, while working on the working set estimation code, noticed that calling get_page_unless_zero() on a random pfn_to_page(random_pfn) wasn't safe if the pfn ended up being a tail page of a transparent hugepage being split by __split_huge_page_refcount(). He then found the problem could also theoretically materialize with page_cache_get_speculative() during the speculative radix tree lookups that use get_page_unless_zero() in SMP, if the radix tree page is freed and reallocated and get_user_pages is called on it before page_cache_get_speculative has a chance to call get_page_unless_zero(). So the best way to fix the problem is to keep page_tail->_count zero at all times. This will guarantee that get_page_unless_zero() can never succeed on any tail page. page_tail->_mapcount is guaranteed zero and is unused for all tail pages of a compound page, so we can simply account the tail page references there and transfer them to tail_page->_count in __split_huge_page_refcount() (in addition to the head_page->_mapcount). While debugging this s/_count/_mapcount/ change I also noticed get_page is called by direct-io.c on pages returned by get_user_pages. That wasn't entirely safe because the two atomic_inc calls in get_page weren't atomic as a pair. By contrast, other get_user_page users, such as the secondary-MMU page fault that establishes the shadow pagetables, never call any superfluous get_page after get_user_page returns. It's safer to make get_page universally safe for tail pages and to use get_page_foll() within follow_page (inside get_user_pages()). get_page_foll() is safe to do the refcounting for tail pages without taking any locks because it is run within PT lock protected critical sections (PT lock for pte and page_table_lock for pmd_trans_huge). The standard get_page() as invoked by direct-io will instead now take the compound_lock, but still only for tail pages.
The direct-io paths are usually I/O bound and the compound_lock is per THP, so very fine-grained; there's no risk of scalability issues with it. A simple direct-io benchmark with all lockdep prove-locking and spinlock debugging infrastructure enabled shows identical performance and no overhead. So it's worth it. Ideally direct-io should stop calling get_page() on pages returned by get_user_pages(). The spinlock in get_page() is already optimized away for no-THP builds, but doing get_page() on tail pages returned by GUP is generally a rare operation and usually only run in I/O paths. This new refcounting on page_tail->_mapcount, in addition to avoiding new RCU critical sections, will also allow the working set estimation code to work without any further complexity associated with the tail page refcounting with THP. Signed-off-by: Andrea Arcangeli <[email protected]> Reported-by: Michel Lespinasse <[email protected]> Reviewed-by: Michel Lespinasse <[email protected]> Reviewed-by: Minchan Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Gibson <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
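The invariant this commit introduces (a tail page's _count stays zero, and its pins are kept in a separate per-tail counter until the hugepage is split) can be modelled in user space with C11 atomics. This is a sketch under stated assumptions, not the kernel implementation: `struct page_model`, its `tail_refs` field (standing in for the _mapcount-based accounting) and all function names are hypothetical, and the split step transfers only the pins, leaving out the head mapcount contribution the message mentions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two counters discussed above. */
struct page_model {
    atomic_int count;         /* stands in for page->_count */
    atomic_int tail_refs;     /* stands in for the tail pin accounting */
    struct page_model *head;  /* NULL for a head or ordinary page */
};

static void init_page(struct page_model *p, int count, struct page_model *head)
{
    atomic_init(&p->count, count);
    atomic_init(&p->tail_refs, 0);
    p->head = head;
}

/* Take a reference only if the count is already non-zero. */
static bool get_page_unless_zero_model(struct page_model *p)
{
    int old = atomic_load(&p->count);

    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&p->count, &old, old + 1))
            return true;
    }
    return false;
}

/* Pin a tail page: the head's count grows, the tail's own count stays 0. */
static void get_tail_model(struct page_model *tail)
{
    assert(tail->head != NULL);
    atomic_fetch_add(&tail->head->count, 1);
    atomic_fetch_add(&tail->tail_refs, 1);
}

/* On split, move the recorded pins from the head to the tail's own count. */
static void split_model(struct page_model *tail)
{
    int pins = atomic_exchange(&tail->tail_refs, 0);

    atomic_fetch_sub(&tail->head->count, pins);
    atomic_fetch_add(&tail->count, pins);
    tail->head = NULL;
}
```

While the page is still a tail page, a speculative lookup that lands on it through `get_page_unless_zero_model()` always fails, whatever the number of pins; only after the split does the former tail page carry its own non-zero count.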
2011-11-02Merge git://github.com/rustyrussell/linuxLinus Torvalds12-64/+742
* git://github.com/rustyrussell/linux: virtio-blk: use ida to allocate disk index virtio: Add platform bus driver for memory mapped virtio device virtio: Dont add "config" to list for !per_vq_vector virtio: console: wait for first console port for early console output virtio: console: add port stats for bytes received, sent and discarded virtio: console: make discard_port_data() use get_inbuf() virtio: console: rename variable virtio: console: make get_inbuf() return port->inbuf if present virtio: console: Fix return type for get_inbuf() virtio: console: Use wait_event_freezable instead of _interruptible virtio: console: Ignore port name update request if name already set virtio: console: Fix indentation virtio: modify vring_init and vring_size to take account of the layout containing *_event_idx virtio.h: correct comment for struct virtio_driver virtio-net: Use virtio_config_val() for retrieving config virtio_config: Add virtio_config_val_len() virtio-console: Use virtio_config_val() for retrieving config
2011-11-02[IA64] Wire up cross memory attach syscallsTony Luck2-1/+5
Add sys_process_vm_readv and sys_process_vm_writev to ia64 syscall table. Passes tests at http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgz Signed-off-by: Tony Luck <[email protected]>
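For readers unfamiliar with the cross memory attach calls, a minimal user-space sketch of invoking the newly wired-up read syscall follows. The helper name `read_remote()` is hypothetical; it goes through `syscall()` with `SYS_process_vm_readv` to stay close to the syscall table entry, although the glibc wrapper `process_vm_readv()` from `<sys/uio.h>` is the more usual route where available. Whether the call is permitted depends on ptrace access rules and any sandboxing in effect.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Copy len bytes from address `remote` in process `pid` into `local`.
 * Returns the number of bytes copied, or -1 with errno set on failure.
 */
static long read_remote(pid_t pid, const void *remote, void *local, size_t len)
{
    struct iovec liov = { .iov_base = local, .iov_len = len };
    struct iovec riov = { .iov_base = (void *)remote, .iov_len = len };

    assert(local != NULL && remote != NULL);

    /* Arguments: pid, local iovec + count, remote iovec + count, flags. */
    return syscall(SYS_process_vm_readv, (long)pid,
                   &liov, 1UL, &riov, 1UL, 0UL);
}
```

Calling it with `getpid()` as the target reads from the caller's own address space, which is a convenient smoke test that needs no second process.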
2011-11-02ALSA: hda - Remove unused variablesTakashi Iwai3-12/+1
Just clean-up what GCC caught. Signed-off-by: Takashi Iwai <[email protected]>
2011-11-02ALSA: hda/realtek - Don't create alt-stream for capture when unnecessaryTakashi Iwai1-2/+6
When the driver finds multiple ADCs, it tries to create an alternative capture PCM stream. However, these secondary ADCs might be useless or in uncontrolled paths in some cases, e.g. when auto-mic or dynamic ADC-switching is enabled. Also, when only a single capture source is available, the multi-streams don't make sense either. With this patch, the driver checks for such conditions and skips the alt stream appropriately. Cc: <[email protected]> Signed-off-by: Takashi Iwai <[email protected]>