aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2010-05-27ptrace: PTRACE_GETFDPIC: fix the unsafe usage of child->mmOleg Nesterov1-2/+8
Now that Mike Frysinger unified the FDPIC ptrace code, we can fix the unsafe usage of child->mm in ptrace_request(PTRACE_GETFDPIC). We have the reference to task_struct, and ptrace_check_attach() verified the tracee is stopped. But nothing can protect from SIGKILL after that, we must not assume child->mm != NULL. Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Mike Frysinger <[email protected]> Acked-by: David Howells <[email protected]> Cc: Paul Mundt <[email protected]> Cc: Greg Ungerer <[email protected]> Acked-by: Roland McGrath <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27ptrace: unify FDPIC implementationsMike Frysinger1-0/+20
The Blackfin/FRV/SuperH guys all have the same exact FDPIC ptrace code in their arch handlers (since they were probably copied & pasted). Since these ptrace interfaces are an arch independent aspect of the FDPIC code, unify them in the common ptrace code so new FDPIC ports don't need to copy and paste this fundamental stuff yet again. Signed-off-by: Mike Frysinger <[email protected]> Acked-by: Roland McGrath <[email protected]> Acked-by: David Howells <[email protected]> Acked-by: Paul Mundt <[email protected]> Cc: Oleg Nesterov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27cpusets: randomize node rotor used in cpuset_mem_spread_node()Jack Steiner1-0/+4
Some workloads that create a large number of small files tend to assign too many pages to node 0 (multi-node systems). Part of the reason is that the rotor (in cpuset_mem_spread_node()) used to assign nodes starts at node 0 for newly created tasks. This patch changes the rotor to be initialized to a random node number of the cpuset. [[email protected]: fix layout] [[email protected]: Define stub numa_random() for !NUMA configuration] Signed-off-by: Jack Steiner <[email protected]> Signed-off-by: Lee Schermerhorn <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Paul Menage <[email protected]> Cc: Jack Steiner <[email protected]> Cc: Robin Holt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27cpusets: new round-robin rotor for SLAB allocationsJack Steiner1-4/+16
We have observed several workloads running on multi-node systems where memory is assigned unevenly across the nodes in the system. There are numerous reasons for this but one is the round-robin rotor in cpuset_mem_spread_node(). For example, a simple test that writes a multi-page file will allocate pages on nodes 0 2 4 6 ... Odd nodes are skipped. (Sometimes it allocates on odd nodes & skips even nodes). An example is shown below. The program "lfile" writes a file consisting of 10 pages. The program then mmaps the file & uses get_mempolicy(..., MPOL_F_NODE) to determine the nodes where the file pages were allocated. The output is shown below: # ./lfile allocated on nodes: 2 4 6 0 1 2 6 0 2 There is a single rotor that is used for allocating both file pages & slab pages. Writing the file allocates both a data page & a slab page (buffer_head). This advances the RR rotor 2 nodes for each page allocated. A quick confirmation seems to confirm this is the cause of the uneven allocation: # echo 0 >/dev/cpuset/memory_spread_slab # ./lfile allocated on nodes: 6 7 8 9 0 1 2 3 4 5 This patch introduces a second rotor that is used for slab allocations. Signed-off-by: Jack Steiner <[email protected]> Acked-by: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Paul Menage <[email protected]> Cc: Jack Steiner <[email protected]> Cc: Robin Holt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-27cgroups: make cftype.unregister_event() void-returningKirill A. Shutemov1-1/+0
Since we are unable to handle an error returned by cftype.unregister_event() properly, let's make the callback void-returning. mem_cgroup_unregister_event() has been rewritten to be a "never fail" function. On mem_cgroup_usage_register_event() we save old buffer for thresholds array and reuse it in mem_cgroup_usage_unregister_event() to avoid allocation. Signed-off-by: Kirill A. Shutemov <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Phil Carmody <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Daisuke Nishimura <[email protected]> Cc: Paul Menage <[email protected]> Cc: Li Zefan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-26hrtimer: Avoid double seqlockStanislaw Gruszka1-1/+1
hrtimer_get_softirq_time() has it's own xtime lock protection, so it's safe to use plain __current_kernel_time() and avoid the double seqlock loop. Signed-off-by: Stanislaw Gruszka <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
2010-05-26timers: Move local variable into else sectionThomas Gleixner1-2/+3
Fix nit-picking coding style detail. Signed-off-by: Thomas Gleixner <[email protected]>
2010-05-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds1-1/+3
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (63 commits) drivers/net/usb/asix.c: Fix pointer cast. be2net: Bug fix to avoid disabling bottom half during firmware upgrade. proc_dointvec: write a single value hso: add support for new products Phonet: fix potential use-after-free in pep_sock_close() ath9k: remove VEOL support for ad-hoc ath9k: change beacon allocation to prefer the first beacon slot sock.h: fix kernel-doc warning cls_cgroup: Fix build error when built-in macvlan: do proper cleanup in macvlan_common_newlink() V2 be2net: Bug fix in init code in probe net/dccp: expansion of error code size ath9k: Fix rx of mcast/bcast frames in PS mode with auto sleep wireless: fix sta_info.h kernel-doc warnings wireless: fix mac80211.h kernel-doc warnings iwlwifi: testing the wrong variable in iwl_add_bssid_station() ath9k_htc: rare leak in ath9k_hif_usb_alloc_tx_urbs() ath9k_htc: dereferencing before check in hif_usb_tx_cb() rt2x00: Fix rt2800usb TX descriptor writing. rt2x00: Fix failed SLEEP->AWAKE and AWAKE->SLEEP transitions. ...
2010-05-25Revert "module: drop the lock while waiting for module to complete ↵Linus Torvalds1-37/+22
initialization." This reverts commit 480b02df3aa9f07d1c7df0cd8be7a5ca73893455, since Rafael reports that it causes occasional kernel paging request faults in load_module(). Dropping the module lock and re-taking it deep in the call-chain is definitely not the right thing to do. That just turns the mutex from a lock into a "random non-locking data structure" that doesn't actually protect what it's supposed to protect. Requested-and-tested-by: Rafael J. Wysocki <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Brandon Philips <[email protected]> Cc: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-25proc_dointvec: write a single valueJ. R. Okajima1-1/+3
The commit 00b7c3395aec3df43de5bd02a3c5a099ca51169f "sysctl: refactor integer handling proc code" modified the behaviour of writing to /proc. Before the commit, write("1\n") to /proc/sys/kernel/printk succeeded. But now it returns EINVAL. This commit supports writing a single value to a multi-valued entry. Signed-off-by: J. R. Okajima <[email protected]> Reviewed-and-tested-by: WANG Cong <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-05-25timers: Fix slack calculation reallyThomas Gleixner1-5/+8
commit f00e047ef (timers: Fix slack calculation for expired timers) fixed the issue of slack on expired timers only partially. Linus noticed that jiffies is volatile so it is reloaded twice, which generates bad code. But its worse. This can defeat the time_after() check if jiffies are incremented between time_after() and the slack calculation. Fix it by reading jiffies into a local variable, which prevents the compiler from loading it twice. While at it make the > -1 check into >= 0 which is easier to read. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: Linus Torvalds <[email protected]>
2010-05-25ring-buffer: Move zeroing out excess in page to ring buffer codeSteven Rostedt2-8/+9
Currently the trace splice code zeros out the excess bytes in the page before sending it off to userspace. This is to make sure userspace is not getting anything it should not be when reading the pages, because the excess data was never initialized to zero before writing (for perfomance reasons). But the splice code has no business in doing this work, it should be done by the ring buffer. With the latest changes for recording lost events, the splice code gets it wrong anyway. Move the zeroing out of excess bytes into the ring buffer code. Signed-off-by: Steven Rostedt <[email protected]>
2010-05-25ring-buffer: Reset "real_end" when page is filledSteven Rostedt1-0/+8
The code to store the "lost events" requires knowing the real end of the page. Since the 'commit' includes the padding at the end of a page a "real_end" variable was used to keep track of the end not including the padding. If events were lost, the reader can place the count of events in the padded area if there is enough room. The bug this patch fixes is that when we fill the page we do not reset the real_end variable, and if the writer had wrapped a few times, the real_end would be incorrect. This patch simply resets the real_end if the page was filled. Signed-off-by: Steven Rostedt <[email protected]>
2010-05-25sysctl: don't use own implementation of hex_to_bin()Andy Shevchenko1-6/+3
Remove own implementation of hex_to_bin(). Signed-off-by: Andy Shevchenko <[email protected]> Cc: Eric W. Biederman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-25module: remove duplicate declaration of __ksymtab_gpl_futureWenji Huang1-2/+0
Minor cleanup on duplicate __{start/stop}__ksymtab_gpl_future. Signed-off-by: Wenji Huang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-25mem-hotplug: fix potential race while building zonelist for new populated zoneHaicheng Li1-1/+4
Add global mutex zonelists_mutex to fix the possible race: CPU0 CPU1 CPU2 (1) zone->present_pages += online_pages; (2) build_all_zonelists(); (3) alloc_page(); (4) free_page(); (5) build_all_zonelists(); (6) __build_all_zonelists(); (7) zone->pageset = alloc_percpu(); In step (3,4), zone->pageset still points to boot_pageset, so bad things may happen if 2+ nodes are in this state. Even if only 1 node is accessing the boot_pageset, (3) may still consume too much memory to fail the memory allocations in step (7). Besides, atomic operation ensures alloc_percpu() in step (7) will never fail since there is a new fresh memory block added in step(6). [[email protected]: hold zonelists_mutex when build_all_zonelists] Signed-off-by: Haicheng Li <[email protected]> Signed-off-by: Wu Fengguang <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-25mem-hotplug: avoid multiple zones sharing same boot strapping boot_pagesetHaicheng Li1-1/+1
For each new populated zone of hotadded node, need to update its pagesets with dynamically allocated per_cpu_pageset struct for all possible CPUs: 1) Detach zone->pageset from the shared boot_pageset at end of __build_all_zonelists(). 2) Use mutex to protect zone->pageset when it's still shared in onlined_pages() Otherwises, multiple zones of different nodes would share same boot strapping boot_pageset for same CPU, which will finally cause below kernel panic: ------------[ cut here ]------------ kernel BUG at mm/page_alloc.c:1239! invalid opcode: 0000 [#1] SMP ... Call Trace: [<ffffffff811300c1>] __alloc_pages_nodemask+0x131/0x7b0 [<ffffffff81162e67>] alloc_pages_current+0x87/0xd0 [<ffffffff81128407>] __page_cache_alloc+0x67/0x70 [<ffffffff811325f0>] __do_page_cache_readahead+0x120/0x260 [<ffffffff81132751>] ra_submit+0x21/0x30 [<ffffffff811329c6>] ondemand_readahead+0x166/0x2c0 [<ffffffff81132ba0>] page_cache_async_readahead+0x80/0xa0 [<ffffffff8112a0e4>] generic_file_aio_read+0x364/0x670 [<ffffffff81266cfa>] nfs_file_read+0xca/0x130 [<ffffffff8117b20a>] do_sync_read+0xfa/0x140 [<ffffffff8117bf75>] vfs_read+0xb5/0x1a0 [<ffffffff8117c151>] sys_read+0x51/0x80 [<ffffffff8103c032>] system_call_fastpath+0x16/0x1b RIP [<ffffffff8112ff13>] get_page_from_freelist+0x883/0x900 RSP <ffff88000d1e78a8> ---[ end trace 4bda28328b9990db ] [[email protected]: merge fix] Signed-off-by: Haicheng Li <[email protected]> Signed-off-by: Wu Fengguang <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Reviewed-by: Christoph Lameter <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-25cpu/mem hotplug: enable CPUs online before local memory onlineminskey guo1-0/+25
Enable users to online CPUs even if the CPUs belongs to a numa node which doesn't have onlined local memory. The zonlists(pg_data_t.node_zonelists[]) of a numa node are created either in system boot/init period, or at the time of local memory online. For a numa node without onlined local memory, its zonelists are not initialized at present. As a result, any memory allocation operations executed by CPUs within this node will fail. In fact, an out-of-memory error is triggered when attempt to online CPUs before memory comes to online. This patch tries to create zonelists for such numa nodes, so that the memory allocation for this node can be fallback'ed to other nodes. [[email protected]: remove unneeded export] [[email protected]: coding-style fixes] Signed-off-by: minskey guo<[email protected]> Cc: Minchan Kim <[email protected]> Cc: Yasunori Goto <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-25mm: compaction: add a tunable that decides when memory should be compacted ↵Mel Gorman1-0/+15
and when it should be reclaimed The kernel applies some heuristics when deciding if memory should be compacted or reclaimed to satisfy a high-order allocation. One of these is based on the fragmentation. If the index is below 500, memory will not be compacted. This choice is arbitrary and not based on data. To help optimise the system and set a sensible default for this value, this patch adds a sysctl extfrag_threshold. The kernel will only compact memory if the fragmentation index is above the extfrag_threshold. [[email protected]: Fix build errors when proc fs is not configured] Signed-off-by: Mel Gorman <[email protected]> Signed-off-by: Randy Dunlap <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Minchan Kim <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-25mm: compaction: add /proc trigger for memory compactionMel Gorman1-0/+10
Add a proc file /proc/sys/vm/compact_memory. When an arbitrary value is written to the file, all zones are compacted. The expected user of such a trigger is a job scheduler that prepares the system before the target application runs. Signed-off-by: Mel Gorman <[email protected]> Acked-by: Rik van Riel <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Reviewed-by: Minchan Kim <[email protected]> Reviewed-by: KOSAKI Motohiro <[email protected]> Reviewed-by: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-25cpuset,mm: fix no node to alloc memory when changing cpuset's memsMiao Xie2-8/+52
Before applying this patch, cpuset updates task->mems_allowed and mempolicy by setting all new bits in the nodemask first, and clearing all old unallowed bits later. But in the way, the allocator may find that there is no node to alloc memory. The reason is that cpuset rebinds the task's mempolicy, it cleans the nodes which the allocater can alloc pages on, for example: (mpol: mempolicy) task1 task1's mpol task2 alloc page 1 alloc on node0? NO 1 1 change mems from 1 to 0 1 rebind task1's mpol 0-1 set new bits 0 clear disallowed bits alloc on node1? NO 0 ... can't alloc page goto oom This patch fixes this problem by expanding the nodes range first(set newly allowed bits) and shrink it lazily(clear newly disallowed bits). So we use a variable to tell the write-side task that read-side task is reading nodemask, and the write-side task clears newly disallowed nodes after read-side task ends the current memory allocation. [[email protected]: fix spello] Signed-off-by: Miao Xie <[email protected]> Cc: David Rientjes <[email protected]> Cc: Nick Piggin <[email protected]> Cc: Paul Menage <[email protected]> Cc: Lee Schermerhorn <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ravikiran Thirumalai <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Andi Kleen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-25mempolicy: restructure rebinding-mempolicy functionsMiao Xie1-2/+2
Nick Piggin reported that the allocator may see an empty nodemask when changing cpuset's mems[1]. It happens only on the kernel that do not do atomic nodemask_t stores. (MAX_NUMNODES > BITS_PER_LONG) But I found that there is also a problem on the kernel that can do atomic nodemask_t stores. The problem is that the allocator can't find a node to alloc page when changing cpuset's mems though there is a lot of free memory. The reason is like this: (mpol: mempolicy) task1 task1's mpol task2 alloc page 1 alloc on node0? NO 1 1 change mems from 1 to 0 1 rebind task1's mpol 0-1 set new bits 0 clear disallowed bits alloc on node1? NO 0 ... can't alloc page goto oom I can use the attached program reproduce it by the following step: # mkdir /dev/cpuset # mount -t cpuset cpuset /dev/cpuset # mkdir /dev/cpuset/1 # echo `cat /dev/cpuset/cpus` > /dev/cpuset/1/cpus # echo `cat /dev/cpuset/mems` > /dev/cpuset/1/mems # echo $$ > /dev/cpuset/1/tasks # numactl --membind=`cat /dev/cpuset/mems` ./cpuset_mem_hog <nr_tasks> & <nr_tasks> = max(nr_cpus - 1, 1) # killall -s SIGUSR1 cpuset_mem_hog # ./change_mems.sh several hours later, oom will happen though there is a lot of free memory. This patchset fixes this problem by expanding the nodes range first(set newly allowed bits) and shrink it lazily(clear newly disallowed bits). So we use a variable to tell the write-side task that read-side task is reading nodemask, and the write-side task clears newly disallowed nodes after read-side task ends the current memory allocation. This patch: In order to fix no node to alloc memory, when we want to update mempolicy and mems_allowed, we expand the set of nodes first (set all the newly nodes) and shrink the set of nodes lazily(clean disallowed nodes), But the mempolicy's rebind functions may breaks the expanding. So we restructure the mempolicy's rebind functions and split the rebind work to two steps, just like the update of cpuset's mems: The 1st step: expand the set of the mempolicy's nodes. The 2nd step: shrink the set of the mempolicy's nodes. It is used when there is no real lock to protect the mempolicy in the read-side. Otherwise we can do rebind work at once. In order to implement it, we define enum mpol_rebind_step { MPOL_REBIND_ONCE, MPOL_REBIND_STEP1, MPOL_REBIND_STEP2, MPOL_REBIND_NSTEP, }; If the mempolicy needn't be updated by two steps, we can pass MPOL_REBIND_ONCE to the rebind functions. Or we can pass MPOL_REBIND_STEP1 to do the first step of the rebind work and pass MPOL_REBIND_STEP2 to do the second step work. Besides that, it maybe long time between these two step and we have to release the lock that protects mempolicy and mems_allowed. If we hold the lock once again, we must check whether the current mempolicy is under the rebinding (the first step has been done) or not, because the task may alloc a new mempolicy when we don't hold the lock. So we defined the following flag to identify it: #define MPOL_F_REBINDING (1 << 2) The new functions will be used in the next patch. Signed-off-by: Miao Xie <[email protected]> Cc: David Rientjes <[email protected]> Cc: Nick Piggin <[email protected]> Cc: Paul Menage <[email protected]> Cc: Lee Schermerhorn <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ravikiran Thirumalai <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Andi Kleen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-25perf, trace: Fix !x86 build bugPeter Zijlstra1-1/+3
Patch b7e2ecef92 (perf, trace: Optimize tracepoints by removing IRQ-disable from perf/tracepoint interaction) made the unfortunate mistake of assuming the world is x86 only, correct this. The problem was that perf_fetch_caller_regs() did local_save_flags() into regs->flags, and I re-used that to remove another local_save_flags(), forgetting !x86 doesn't have regs->flags. Do the reverse, remove the local_save_flags() from perf_fetch_caller_regs() and let the ftrace site do the local_save_flags() instead. Signed-off-by: Peter Zijlstra <[email protected]> Acked-by: Paul Mackerras <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] LKML-Reference: <1274778175.5882.623.camel@twins> Signed-off-by: Ingo Molnar <[email protected]>
2010-05-24timers: Fix slack calculation for expired timersJeff Chua1-3/+4
commit 3bbb9ec946 (timers: Introduce the concept of timer slack for legacy timers) does not take the case into account when the timer is already expired. This broke wireless drivers. The solution is not to apply slack to already expired timers. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Arjan van de Ven <[email protected]>
2010-05-24timekeeping: Fix timezone updateThomas Gleixner1-4/+4
commit 64ce4c2f (time: Clean up warp_clock()) breaks the timezone update in a very subtle way. To avoid the direct access to timekeeping internals it adds the timezone delta to the current time with timespec_add_safe(). This works nicely when the timezone delta is > 0. If timezone delta is < 0 then the wrap check in timespec_add_safe() triggers and timespec_add_safe() returns TIME_MAX and screws up timekeeping completely. The comment above timespec_add_safe() says: It's assumed that both values are valid (>= 0) Add the timezone seconds adjustment directly. Reported-by: Rafael J. Wysocki <[email protected]> Tested-by: Rafael J. Wysocki <[email protected]> Acked-by: John Stultz <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
2010-05-21Merge branch 'linux-next' of ↵Linus Torvalds1-1/+15
git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (36 commits) PCI: hotplug: pciehp: Removed check for hotplug of display devices PCI: read memory ranges out of Broadcom CNB20LE host bridge PCI: Allow manual resource allocation for PCI hotplug bridges x86/PCI: make ACPI MCFG reserved error messages ACPI specific PCI hotplug: Use kmemdup PM/PCI: Update PCI power management documentation PCI: output FW warning in pci_read/write_vpd PCI: fix typos pci_device_dis/enable to pci_dis/enable_device in comments PCI quirks: disable msi on AMD rs4xx internal gfx bridges PCI: Disable MSI for MCP55 on P5N32-E SLI x86/PCI: irq and pci_ids patch for additional Intel Cougar Point DeviceIDs PCI: aerdrv: trivial cleanup for aerdrv_core.c PCI: aerdrv: trivial cleanup for aerdrv.c PCI: aerdrv: introduce default_downstream_reset_link PCI: aerdrv: rework find_aer_service PCI: aerdrv: remove is_downstream PCI: aerdrv: remove magical ROOT_ERR_STATUS_MASKS PCI: aerdrv: redefine PCI_ERR_ROOT_*_SRC PCI: aerdrv: rework do_recovery PCI: aerdrv: rework get_e_source() ...
2010-05-21Merge git://git.infradead.org/iommu-2.6Linus Torvalds1-4/+22
* git://git.infradead.org/iommu-2.6: intel-iommu: Set a more specific taint flag for invalid BIOS DMAR tables intel-iommu: Combine the BIOS DMAR table warning messages panic: Add taint flag TAINT_FIRMWARE_WORKAROUND ('I') panic: Allow warnings to set different taint flags intel-iommu: intel_iommu_map_range failed at very end of address space intel-iommu: errors with smaller iommu widths intel-iommu: Fix boot inside 64bit virtualbox with io-apic disabled intel-iommu: use physfn to search drhd for VF intel-iommu: Print out iommu seq_id intel-iommu: Don't complain that ACPI_DMAR_SCOPE_TYPE_IOAPIC is not supported intel-iommu: Avoid global flushes with caching mode. intel-iommu: Use correct domain ID when caching mode is enabled intel-iommu mistakenly uses offset_pfn when caching mode is enabled intel-iommu: use for_each_set_bit() intel-iommu: Fix section mismatch dmar_ir_support() uses dmar_tbl.
2010-05-21Merge branch 'modules' of ↵Linus Torvalds1-22/+37
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus * 'modules' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: module: drop the lock while waiting for module to complete initialization. MODULE_DEVICE_TABLE(isapnp, ...) does nothing hisax_fcpcipnp: fix broken isapnp device table. isapnp: move definitions to mod_devicetable.h so file2alias can reach them.
2010-05-21Merge branch 'for-2.6.35' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds4-29/+56
* 'for-2.6.35' of git://git.kernel.dk/linux-2.6-block: (86 commits) pipe: set lower and upper limit on max pages in the pipe page array pipe: add support for shrinking and growing pipes drbd: This is now equivalent to drbd release 8.3.8rc1 drbd: Do not free p_uuid early, this is done in the exit code of the receiver drbd: Null pointer deref fix to the large "multi bio rewrite" drbd: Fix: Do not detach, if a bio with a barrier fails drbd: Ensure to not trigger late-new-UUID creation multiple times drbd: Do not Oops when C_STANDALONE when uuid gets generated writeback: fix mixed up arguments to bdi_start_writeback() writeback: fix problem with !CONFIG_BLOCK compilation block: improve automatic native capacity unlocking block: use struct parsed_partitions *state universally in partition check code block,ide: simplify bdops->set_capacity() to ->unlock_native_capacity() block: restart partition scan after resizing a device buffer: make invalidate_bdev() drain all percpu LRU add caches block: remove all rcu head initializations writeback: fixups for !dirty_writeback_centisecs writeback: bdi_writeback_task() must set task state before calling schedule() writeback: ensure that WB_SYNC_NONE writeback with sb pinned is sync drivers/block/drbd: Use kzalloc ...
2010-05-21sysctl: fix kernel-doc notation and typosRandy Dunlap1-19/+19
Fix kernel-doc warnings, kernel-doc special characters, and typos in recent kernel/sysctl.c additions. Signed-off-by: Randy Dunlap <[email protected]> Cc: Amerigo Wang <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds1-54/+131
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (46 commits) random: simplify fips mode crypto: authenc - Fix cryptlen calculation crypto: talitos - add support for sha224 crypto: talitos - add hash algorithms crypto: talitos - second prepare step for adding ahash algorithms crypto: talitos - prepare for adding ahash algorithms crypto: n2 - Add Niagara2 crypto driver crypto: skcipher - Add ablkcipher_walk interfaces crypto: testmgr - Add testing for async hashing and update/final crypto: tcrypt - Add speed tests for async hashing crypto: scatterwalk - Fix scatterwalk_done() test crypto: hifn_795x - Rename ablkcipher_walk to hifn_cipher_walk padata: Use get_online_cpus/put_online_cpus in padata_free padata: Add some code comments padata: Flush the padata queues actively padata: Use a timer to handle remaining objects in the reorder queues crypto: shash - Remove usage of CRYPTO_MINALIGN crypto: mv_cesa - Use resource_size crypto: omap - OMAP macros corrected padata: Use get_online_cpus/put_online_cpus ... Fix up conflicts in arch/arm/mach-omap2/devices.c
2010-05-21Merge branch 'master' into for-2.6.35Jens Axboe102-5064/+12638
Conflicts: fs/ext3/fsync.c Signed-off-by: Jens Axboe <[email protected]>
2010-05-21pipe: set lower and upper limit on max pages in the pipe page arrayJens Axboe1-0/+9
We need at least two to guarantee proper POSIX behaviour, so never allow a smaller limit than that. Also expose a /proc/sys/fs/pipe-max-pages sysctl file that allows root to define a sane upper limit. Make it default to 16 times the default size, which is 16 pages. Signed-off-by: Jens Axboe <[email protected]>
2010-05-21pipe: add support for shrinking and growing pipesJens Axboe2-29/+46
This patch adds F_GETPIPE_SZ and F_SETPIPE_SZ fcntl() actions for growing and shrinking the size of a pipe and adjusts pipe.c and splice.c (and relay and network splice) usage to work with these larger (or smaller) pipes. Signed-off-by: Jens Axboe <[email protected]>
2010-05-21Merge branch 'dbg-early-merge' of ↵Linus Torvalds1-0/+16
git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb * 'dbg-early-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb: echi-dbgp: Add kernel debugger support for the usb debug port earlyprintk,vga,kdb: Fix \b and \r for earlyprintk=vga with kdb kgdboc: Add ekgdboc for early use of the kernel debugger x86,early dr regs,kgdb: Allow kernel debugger early dr register access x86,kgdb: Implement early hardware breakpoint debugging x86, kgdb, init: Add early and late debug states x86, kgdb: early trap init for early debug
2010-05-21Merge branch 'kdb-merge' of ↵Linus Torvalds22-1767/+8285
git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb * 'kdb-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb: (25 commits) kdb,debug_core: Allow the debug core to receive a panic notification MAINTAINERS: update kgdb, kdb, and debug_core info debug_core,kdb: Allow the debug core to process a recursive debug entry printk,kdb: capture printk() when in kdb shell kgdboc,kdb: Allow kdb to work on a non open console port kgdb: Add the ability to schedule a breakpoint via a tasklet mips,kgdb: kdb low level trap catch and stack trace powerpc,kgdb: Introduce low level trap catching x86,kgdb: Add low level debug hook kgdb: remove post_primary_code references kgdb,docs: Update the kgdb docs to include kdb kgdboc,keyboard: Keyboard driver for kdb with kgdb kgdb: gdb "monitor" -> kdb passthrough sparc,sunzilog: Add console polling support for sunzilog serial driver sh,sh-sci: Use NO_POLL_CHAR in the SCIF polled console code kgdb,8250,pl011: Return immediately from console poll kgdb: core changes to support kdb kdb: core for kgdb back end (2 of 2) kdb: core for kgdb back end (1 of 2) kgdb,blackfin: Add in kgdb_arch_set_pc for blackfin ...
2010-05-21sysfs: add struct file* to bin_attr callbacksChris Wright2-2/+3
This allows bin_attr->read,write,mmap callbacks to check file specific data (such as inode owner) as part of any privilege validation. Signed-off-by: Chris Wright <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2010-05-21lockdep: Add novalidate class for dev->mutex conversionPeter Zijlstra1-0/+5
The conversion of device->sem to device->mutex resulted in lockdep warnings. Create a novalidate class for now until the driver folks come up with separate classes. That way we have at least the basic mutex debugging coverage. Add a checkpatch error so the usage is reserved for device->mutex. [ tglx: checkpatch and compile fix for LOCKDEP=n ] Signed-off-by: Peter Zijlstra <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2010-05-21kref: remove kref_setNeilBrown1-2/+2
Of the three uses of kref_set in the kernel: One really should be kref_put as the code is letting go of a reference, Two really should be kref_init because the kref is being initialised. This suggests that making kref_set available encourages bad code. So fix the three uses and remove kref_set completely. Signed-off-by: NeilBrown <[email protected]> Acked-by: Mimi Zohar <[email protected]> Acked-by: Serge Hallyn <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2010-05-21Merge branch 'perf/core' of ↵Steven Rostedt25-489/+762
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip into trace/tip/tracing/core-7 Conflicts: include/linux/ftrace_event.h include/trace/ftrace.h kernel/trace/trace_event_perf.c kernel/trace/trace_kprobe.c kernel/trace/trace_syscalls.c Signed-off-by: Steven Rostedt <[email protected]>
2010-05-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6Linus Torvalds1-0/+16
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (182 commits) [SCSI] aacraid: add an ifdef'd device delete case instead of taking the device offline [SCSI] aacraid: prohibit access to array container space [SCSI] aacraid: add support for handling ATA pass-through commands. [SCSI] aacraid: expose physical devices for models with newer firmware [SCSI] aacraid: respond automatically to volumes added by config tool [SCSI] fcoe: fix fcoe module ref counting [SCSI] libfcoe: FIP Keep-Alive messages for VPorts are sent with incorrect port_id and wwn [SCSI] libfcoe: Fix incorrect MAC address clearing [SCSI] fcoe: fix a circular locking issue with rtnl and sysfs mutex [SCSI] libfc: Move the port_id into lport [SCSI] fcoe: move link speed checking into its own routine [SCSI] libfc: Remove extra pointer check [SCSI] libfc: Remove unused fc_get_host_port_type [SCSI] fcoe: fixes wrong error exit in fcoe_create [SCSI] libfc: set seq_id for incoming sequence [SCSI] qla2xxx: Updates to ISP82xx support. [SCSI] qla2xxx: Optionally disable target reset. [SCSI] qla2xxx: ensure flash operation and host reset via sg_reset are mutually exclusive [SCSI] qla2xxx: Silence bogus warning by gcc for wrap and did. [SCSI] qla2xxx: T10 DIF support added. ...
2010-05-21perf: Optimize perf_tp_event_match()Peter Zijlstra1-1/+4
Since we know tracepoints come from kernel context, avoid conditionals that try and establish that very fact. Signed-off-by: Peter Zijlstra <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Steven Rostedt <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-05-21perf: Remove more code from the fastpathPeter Zijlstra1-16/+4
Sanity checks cost instructions. Signed-off-by: Peter Zijlstra <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Steven Rostedt <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-05-21perf: Optimize the !vmalloc backed bufferPeter Zijlstra1-15/+26
Reduce code and data by using the knowledge that for !PERF_USE_VMALLOC data_order is always 0. Signed-off-by: Peter Zijlstra <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Steven Rostedt <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-05-21perf: Optimize perf_output_copy()Peter Zijlstra1-28/+26
Reduce the clutter in perf_output_copy() by keeping an interator in perf_output_handle. Signed-off-by: Peter Zijlstra <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Steven Rostedt <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-05-21perf: Fix wakeup storm for RO mmap()sPeter Zijlstra1-2/+2
RO mmap()s don't update the tail pointer, so comparing against it for determining the written data size doesn't really do any good. Keep track of when we last did a wakeup, and compare against that. Signed-off-by: Peter Zijlstra <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Steven Rostedt <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-05-21perf: Ensure that IOC_OUTPUT isn't used to create multi-writer buffersPeter Zijlstra1-0/+19
Since we want to ensure buffers only have a single writer, we must avoid creating one with multiple. Signed-off-by: Peter Zijlstra <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Steven Rostedt <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-05-21perf, trace: Optimize tracepoints by using per-tracepoint-per-cpu hlist to ↵Peter Zijlstra4-113/+128
track events Avoid the swevent hash-table by using per-tracepoint hlists. Also, avoid conditionals on the fast path by ordering with probe unregister so that we should never get on the callback path without the data being there. Signed-off-by: Peter Zijlstra <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Steven Rostedt <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-05-21perf, trace: Optimize tracepoints by removing IRQ-disable from ↵Peter Zijlstra3-56/+37
perf/tracepoint interaction Improves performance. Acked-by: Frederic Weisbecker <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Steven Rostedt <[email protected]> LKML-Reference: <1274259525.5605.10352.camel@twins> Signed-off-by: Ingo Molnar <[email protected]>
2010-05-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6Linus Torvalds1-0/+1
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: (229 commits) USB: remove unused usb_buffer_alloc and usb_buffer_free macros usb: musb: update gfp/slab.h includes USB: ftdi_sio: fix legacy SIO-device header USB: kl5usb105: reimplement using generic framework USB: kl5usb105: minor clean ups USB: kl5usb105: fix memory leak USB: io_ti: use kfifo to implement write buffering USB: io_ti: remove unsused private counter USB: ti_usb: use kfifo to implement write buffering USB: ir-usb: fix incorrect write-buffer length USB: aircable: fix incorrect write-buffer length USB: safe_serial: straighten out read processing USB: safe_serial: reimplement read using generic framework USB: safe_serial: reimplement write using generic framework usb-storage: always print quirks USB: usb-storage: trivial debug improvements USB: oti6858: use port write fifo USB: oti6858: use kfifo to implement write buffering USB: cypress_m8: use kfifo to implement write buffering USB: cypress_m8: remove unused drain define ... Fix up conflicts (due to usb_buffer_alloc/free renaming) in drivers/input/tablet/acecad.c drivers/input/tablet/kbtab.c drivers/input/tablet/wacom_sys.c drivers/media/video/gspca/gspca.c sound/usb/usbaudio.c