aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2010-06-23rcu: apply RCU protection to wake_affine()Daniel J Blueman1-0/+2
The task_group() function returns a pointer that must be protected by either RCU, the ->alloc_lock, or the cgroup lock (see the rcu_dereference_check() in task_subsys_state(), which is invoked by task_group()). The wake_affine() function currently does none of these, which means that a concurrent update would be within its rights to free the structure returned by task_group(). Because wake_affine() uses this structure only to compute load-balancing heuristics, there is no reason to acquire either of the two locks. Therefore, this commit introduces an RCU read-side critical section that starts before the first call to task_group() and ends after the last use of the "tg" pointer returned from task_group(). Thanks to Li Zefan for pointing out the need to extend the RCU read-side critical section from that proposed by the original patch. Signed-off-by: Daniel J Blueman <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2010-06-23virtio-pci: disable msi at startupMichael S. Tsirkin2-0/+4
virtio-pci resets the device at startup by writing to the status register, but this does not clear the pci config space, specifically msi enable status which affects register layout. This breaks things like kdump when they try to use e.g. virtio-blk. Fix by forcing msi off at startup. Since pci.c already has a routine to do this, we export and use it instead of duplicating code. Signed-off-by: Michael S. Tsirkin <[email protected]> Tested-by: Vivek Goyal <[email protected]> Acked-by: Jesse Barnes <[email protected]> Cc: [email protected] Signed-off-by: Rusty Russell <[email protected]> Cc: [email protected]
2010-06-23virtio: return ENOMEM on out of memoryMichael S. Tsirkin1-1/+1
add_buf returns ring size on out of memory, this is not what devices expect. Signed-off-by: Michael S. Tsirkin <[email protected]> Acked-by: Amit Shah <[email protected]> Signed-off-by: Rusty Russell <[email protected]> Cc: [email protected] # .34.x
2010-06-23xfs: always use iget in bulkstatChristoph Hellwig6-282/+59
The non-coherent bulkstat versionsthat look directly at the inode buffers causes various problems with performance optimizations that make increased use of just logging inodes. This patch makes bulkstat always use iget, which should be fast enough for normal use with the radix-tree based inode cache introduced a while ago. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2010-06-24xfs: prevent swapext from operating on write-only filesDan Rosenberg1-1/+4
This patch prevents user "foo" from using the SWAPEXT ioctl to swap a write-only file owned by user "bar" into a file owned by "foo" and subsequently reading it. It does so by checking that the file descriptors passed to the ioctl are also opened for reading. Signed-off-by: Dan Rosenberg <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2010-06-22Input: wacom - fix serial number handling on Cintiq 21UX2Ping Cheng1-4/+5
Cintiq 21UX2 added 8 more bits for the tool serial number and more buttons for the expresskey. We did not enable them properly in the last patch. Signed-off-by: Ping Cheng <[email protected]> Signed-off-by: Dmitry Torokhov <[email protected]>
2010-06-22Input: fixup X86_MRST selectsRandy Dunlap2-2/+2
Some of the recent X86_MRST additions make some "select"s conditional on X86_MRST but missed some related kconfig symbols, causing: drivers/built-in.o: In function `ps2_end_command': (.text+0x257ab2): undefined reference to `i8042_check_port_owner' drivers/built-in.o: In function `ps2_end_command': (.text+0x257ae1): undefined reference to `i8042_unlock_chip' drivers/built-in.o: In function `ps2_begin_command': (.text+0x257b40): undefined reference to `i8042_check_port_owner' drivers/built-in.o: In function `ps2_begin_command': (.text+0x257b6f): undefined reference to `i8042_lock_chip' when SERIO_I8042=m, SERIO_LIBPS2=y, KEYBOARD_ATKBD=y. We need to make i8042 dependant upon !X86_MRST and allow deselecting atkbd on Moorestown even when !CONFIG_EMBEDDED. Signed-off-by: Randy Dunlap <[email protected]> Cc: Jacob Pan <[email protected]> Signed-off-by: Dmitry Torokhov <[email protected]>
2010-06-22Merge commit 'v2.6.35-rc3' into for-linusDmitry Torokhov1029-7874/+83540
2010-06-22Merge branch 'master' of ↵David S. Miller1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
2010-06-22NFSv4: Fix an embarassing typo in encode_attrs()Trond Myklebust1-2/+2
Apparently, we have never been able to set the atime correctly from the NFSv4 client. Reported-by: 小倉一夫 <[email protected]> Signed-off-by: Trond Myklebust <[email protected]> Cc: [email protected]
2010-06-22NFSv4: Ensure that /proc/self/mountinfo displays the minor version numberTrond Myklebust1-4/+18
Currently, we do not display the minor version mount parameter in the /proc mount info. Signed-off-by: Trond Myklebust <[email protected]> Cc: [email protected]
2010-06-22NFSv4.1: Ensure that we initialise the session when following a referralTrond Myklebust1-71/+51
Put the code that is common to both the referral and ordinary mount cases into a common helper routine. Signed-off-by: Trond Myklebust <[email protected]>
2010-06-22SUNRPC: Fix a re-entrancy bug in xs_tcp_read_calldir()Trond Myklebust1-16/+22
If the attempt to read the calldir fails, then instead of storing the read bytes, we currently discard them. This leads to a garbage final result when upon re-entry to the same routine, we read the remaining bytes. Fixes the regression in bugzilla number 16213. Please see https://bugzilla.kernel.org/show_bug.cgi?id=16213 Signed-off-by: Trond Myklebust <[email protected]> Cc: [email protected]
2010-06-22nfs4 use mandatory attribute file type in nfs4_get_rootAndy Adamson1-1/+1
S_ISDIR(fsinfo.fattr->mode) checks the file type rather than the mode bits, so we should be checking for the NFS_ATTR_FATTR_TYPE fattr property. Signed-off-by: Andy Adamson <[email protected]> Signed-off-by: Trond Myklebust <[email protected]> Cc: [email protected]
2010-06-22hso: remove setting of low_latency flagFilip Aben1-1/+0
This patch removes the setting of the low_latency flag. tty_flip_buffer_push() is occasionally being called in irq context, which causes a hang if the low_latency flag is set. Removing the low_latency flag only seems to impact the flush to ldisc, which will now be put on a workqueue. Signed-off-by: Filip Aben <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-06-22ALSA: hda - Add Macbook 5,2 quirkLuke Yelavich1-0/+1
BugLink: https://bugs.launchpad.net/bugs/463178 Set Macbook 5,2 (106b:4a00) hardware to use ALC885_MB5 Cc: <[email protected]> Signed-off-by: Luke Yelavich <[email protected]> Signed-off-by: Takashi Iwai <[email protected]>
2010-06-22ALSA: hda - Fix uninitialized variableTakashi Iwai1-1/+1
Fix the following compile warning. kctl should be NULL-initialized. sound/pci/hda/patch_realtek.c: In function ‘alc_build_controls’: sound/pci/hda/patch_realtek.c:2550:23: warning: ‘kctl’ may be used uninitialized in this function Signed-off-by: Takashi Iwai <[email protected]>
2010-06-22clocksource: sh_cmt: Fix up bogus shift value.Paul Mundt1-1/+1
The previous CMT fixup accidentally copied in the TMU shift value, reset this back to its original value while preserving the TMU fix. Signed-off-by: Paul Mundt <[email protected]>
2010-06-21ceph: delay umount until all mds requests drop inode+dentry refsSage Weil1-0/+6
This fixes a race between handle_reply finishing an mds request, signalling completion, and then dropping the request structing and its dentry+inode refs, and pre_umount function waiting for requests to finish before letting the vfs tear down the dcache. If umount was delayed waiting for mds requests, we could race and BUG in shrink_dcache_for_umount_subtree because of a slow dput. This delays umount until the msgr queue flushes, which means handle_reply will exit and will have dropped the ceph_mds_request struct. I'm assuming the VFS has already ensured that its calls have all completed and those request refs have thus been dropped as well (I haven't seen that race, at least). Signed-off-by: Sage Weil <[email protected]>
2010-06-21ceph: handle splice_dentry/d_materialize_unique error in readdir_prepopulateSage Weil1-7/+12
Handle a splice_dentry failure (due to a d_materialize_unique error) without crashing. (Also, report the error code.) Signed-off-by: Sage Weil <[email protected]>
2010-06-21udp: Fix bogus UFO packet generationHerbert Xu1-3/+6
It has been reported that the new UFO software fallback path fails under certain conditions with NFS. I tracked the problem down to the generation of UFO packets that are smaller than the MTU. The software fallback path simply discards these packets. This patch fixes the problem by not generating such packets on the UFO path. Signed-off-by: Herbert Xu <[email protected]> Reviewed-by: Michael S. Tsirkin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-06-21lasi82596: fix netdev_mc_count conversionHelge Deller1-1/+1
Fix commit 4cd24eaf0 (net: use netdev_mc_count and netdev_mc_empty when appropriate) Signed-off-by: Helge Deller <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-06-21NET: MIPSsim: Fix modpost warning.Ralf Baechle1-1/+1
$ make CONFIG_DEBUG_SECTION_MISMATCH=y [...] WARNING: drivers/net/built-in.o(.data+0x0): Section mismatch in reference from the variable mipsnet_driver to the function .init.text:mipsnet_probe() The variable mipsnet_driver references the function __init mipsnet_probe() If the reference is valid then annotate the variable with __init* or __refdata (see linux/init.h) or name the variable: *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console, [...] Fixed by making mipsnet_probe __devinit. Signed-off-by: Ralf Baechle <[email protected]> drivers/net/mipsnet.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) Signed-off-by: David S. Miller <[email protected]>
2010-06-21tracing: Fix undeclared ENOSYS in include/linux/tracepoint.hWu Zhangjin1-0/+1
The header file include/linux/tracepoint.h may be included without include/linux/errno.h and then the compiler will fail on building for undelcared ENOSYS. This patch fixes this problem via including <linux/errno.h> to include/linux/tracepoint.h. Signed-off-by: Wu Zhangjin <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2010-06-21Merge branch 'fix/misc' into for-linusTakashi Iwai1-0/+1
2010-06-21ALSA: usb/endpoint, fix dangling pointer useJiri Slaby1-0/+1
Stanse found that in snd_usb_parse_audio_endpoints, there is a dangling pointer dereference. When snd_usb_parse_audio_format fails, fp is freed, and continue invoked. On the next loop, there is "fp && fp->altsetting == 1 && fp->channels == 1" test, but fp is set from the last iteration (but is bogus) and thus ilegally dereferenced. Set fp to NULL before "continue". Signed-off-by: Jiri Slaby <[email protected]> Acked-by: Daniel Mack <[email protected]> Signed-off-by: Takashi Iwai <[email protected]>
2010-06-21cfq: fix recursive call in cfq_blkiocg_update_completion_stats()Jens Axboe1-1/+1
e98ef89b has a typo, causing cfq_blkiocg_update_completion_stats() to call itself instead of blkiocg_update_completion_stats(). Reported-by: Ingo Molnar <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2010-06-21arch/sh/mm: Eliminate a double lockJulia Lawall1-1/+1
The function begins and ends with a read_lock. The latter is changed to a read_unlock. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @locked@ expression E1; position p; @@ read_lock(E1@p,...); @r exists@ expression x <= locked.E1; expression locked.E1; expression E2; identifier lock; position locked.p,p1,p2; @@ *lock@p1 (E1@p,...); ... when != E1 when != \(x = E2\|&x\) *lock@p2 (E1,...); // </smpl> Signed-off-by: Julia Lawall <[email protected]> Acked-by: Matt Fleming <[email protected]> Signed-off-by: Paul Mundt <[email protected]>
2010-06-20Merge branch 'fix/misc' into for-linusTakashi Iwai1-11/+11
2010-06-20Merge branch 'fix/asoc' into for-linusTakashi Iwai1-2/+0
2010-06-20x86: Fix rebooting on Dell Precision WorkStation T7400Thomas Backlund1-0/+8
Dell Precision WorkStation T7400 freezes on reboot unless reboot=b is used. Reference: https://qa.mandriva.com/show_bug.cgi?id=58017 Signed-off-by: Thomas Backlund <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-06-20hwmon: (k8temp) Bypass core swapping on single-core processorsJean Delvare1-1/+1
Commit a2e066bba2aad6583e3ff648bf28339d6c9f0898 introduced core swapping for CPU models 64 and later. I recently had a report about a Sempron 3200+, model 95, for which this patch broke temperature reading. It happens that this is a single-core processor, so the effect of the swapping was to read a temperature value for a core that didn't exist, leading to an incorrect value (-49 degrees C.) Disabling core swapping on singe-core processors should fix this. Additional comment from Andreas: The BKDG says Thermal Sensor Core Select (ThermSenseCoreSel)-Bit 2. This bit selects the CPU whose temperature is reported in the CurTemp field. This bit only applies to dual core processors. For single core processors CPU0 Thermal Sensor is always selected. k8temp_probe() correctly detected that SEL_CORE can't be used on single core CPU. Thus k8temp did never update the temperature values stored in temp[1][x] and -49 degrees was reported. For single core CPUs we must use the values read into temp[0][x]. Signed-off-by: Jean Delvare <[email protected]> Tested-by: Rick Moritz <[email protected]> Acked-by: Andreas Herrmann <[email protected]> Cc: [email protected]
2010-06-20hwmon: (i5k_amb) Fix sysfs attribute for lockdepKAMEZAWA Hiroyuki1-0/+6
i5k_amb.ko uses dynamically allocated memory (by kmalloc) for attributes passed to sysfs. So, sysfs_attr_init() should be called for working happy with lockdep. Signed-off-by: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Jean Delvare <[email protected]> Cc: [email protected] [2.6.34 only]
2010-06-20hwmon: (k10temp) Do not blacklist known working CPU modelsJean Delvare1-2/+12
When detecting AM2+ or AM3 socket with DDR2, only blacklist cores which are known to exist in AM2+ format. Signed-off-by: Jean Delvare <[email protected]> Acked-by: Clemens Ladisch <[email protected]> Cc: Andreas Herrmann <[email protected]> Cc: [email protected]
2010-06-18drm/i915: gen3 page flipping fixesJesse Barnes6-9/+46
Gen3 chips have slightly different flip commands, and also contain a bit that indicates whether a "flip pending" interrupt means the flip has been queued or has been completed. So implement support for the gen3 flip command, and make sure we use the flip pending interrupt correctly depending on the value of ECOSKPD bit 0. Signed-off-by: Jesse Barnes <[email protected]> Signed-off-by: Eric Anholt <[email protected]>
2010-06-18drm/i915: don't queue flips during a flip pending eventJesse Barnes1-0/+11
Hardware will set the flip pending ISR bit as soon as it receives the flip instruction, and (supposedly) clear it once the flip completes (e.g. at the next vblank). If we try to send down a flip instruction while the ISR bit is set, the hardware can become very confused, and we may never receive the corresponding flip pending interrupt, effectively hanging the chip. Signed-off-by: Jesse Barnes <[email protected]> Signed-off-by: Eric Anholt <[email protected]>
2010-06-18x86: Fix vsyscall on gcc 4.5 with -OsAndi Kleen1-1/+1
This fixes the -Os breaks with gcc 4.5 bug. rdtsc_barrier needs to be force inlined, otherwise user space will jump into kernel space and kill init. This also addresses http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44129 I believe. Signed-off-by: Andi Kleen <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]> Cc: <[email protected]>
2010-06-18ath5k: initialize ah->ah_current_channelBob Copeland1-0/+1
ath5k assumes ah_current_channel is always a valid pointer in several places, but a newly created interface may not have a channel. To avoid null pointer dereferences, set it up to point to the first available channel until later reconfigured. This fixes the following oops: $ rmmod ath5k $ insmod ath5k $ iw phy0 set distance 11000 BUG: unable to handle kernel NULL pointer dereference at 00000006 IP: [<d0a1ff24>] ath5k_hw_set_coverage_class+0x74/0x1b0 [ath5k] *pde = 00000000 Oops: 0000 [#1] last sysfs file: /sys/devices/pci0000:00/0000:00:0e.0/ieee80211/phy0/index Modules linked in: usbhid option usb_storage usbserial usblp evdev lm90 scx200_acb i2c_algo_bit i2c_dev i2c_core via_rhine ohci_hcd ne2k_pci 8390 leds_alix2 xt_IMQ imq nf_nat_tftp nf_conntrack_tftp nf_nat_irc nf_cc Pid: 1597, comm: iw Not tainted (2.6.32.14 #8) EIP: 0060:[<d0a1ff24>] EFLAGS: 00010296 CPU: 0 EIP is at ath5k_hw_set_coverage_class+0x74/0x1b0 [ath5k] EAX: 000000c2 EBX: 00000000 ECX: ffffffff EDX: c12d2080 ESI: 00000019 EDI: cf8c0000 EBP: d0a30edc ESP: cfa09bf4 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 Process iw (pid: 1597, ti=cfa09000 task=cf88a000 task.ti=cfa09000) Stack: d0a34f35 d0a353f8 d0a30edc 000000fe cf8c0000 00000000 1900063d cfa8c9e0 <0> cfa8c9e8 cfa8c0c0 cfa8c000 d0a27f0c 199d84b4 cfa8c200 00000010 d09bfdc7 <0> 00000000 00000000 ffffffff d08e0d28 cf9263c0 00000001 cfa09cc4 00000000 Call Trace: [<d0a27f0c>] ? ath5k_hw_attach+0xc8c/0x3c10 [ath5k] [<d09bfdc7>] ? __ieee80211_request_smps+0x1347/0x1580 [mac80211] [<d08e0d28>] ? nl80211_send_scan_start+0x7b8/0x4520 [cfg80211] [<c10f5db9>] ? nla_parse+0x59/0xc0 [<c11ca8d9>] ? genl_rcv_msg+0x169/0x1a0 [<c11ca770>] ? genl_rcv_msg+0x0/0x1a0 [<c11c7e68>] ? netlink_rcv_skb+0x38/0x90 [<c11c9649>] ? genl_rcv+0x19/0x30 [<c11c7c03>] ? netlink_unicast+0x1b3/0x220 [<c11c893e>] ? netlink_sendmsg+0x26e/0x290 [<c11a409e>] ? sock_sendmsg+0xbe/0xf0 [<c1032780>] ? autoremove_wake_function+0x0/0x50 [<c104d846>] ? __alloc_pages_nodemask+0x106/0x530 [<c1074933>] ? do_lookup+0x53/0x1b0 [<c10766f9>] ? __link_path_walk+0x9b9/0x9e0 [<c11acab0>] ? verify_iovec+0x50/0x90 [<c11a42b1>] ? sys_sendmsg+0x1e1/0x270 [<c1048e50>] ? find_get_page+0x10/0x50 [<c104a96f>] ? filemap_fault+0x5f/0x370 [<c1059159>] ? __do_fault+0x319/0x370 [<c11a55b4>] ? sys_socketcall+0x244/0x290 [<c101962c>] ? do_page_fault+0x1ec/0x270 [<c1019440>] ? do_page_fault+0x0/0x270 [<c1002ae5>] ? syscall_call+0x7/0xb Code: 00 b8 fe 00 00 00 b9 f8 53 a3 d0 89 5c 24 14 89 7c 24 10 89 44 24 0c 89 6c 24 08 89 4c 24 04 c7 04 24 35 4f a3 d0 e8 7c 30 60 f0 <0f> b7 43 06 ba 06 00 00 00 a8 10 75 0e 83 e0 20 83 f8 01 19 d2 EIP: [<d0a1ff24>] ath5k_hw_set_coverage_class+0x74/0x1b0 [ath5k] SS:ESP 0068:cfa09bf4 CR2: 0000000000000006 ---[ end trace 54f73d6b10ceb87b ]--- Cc: [email protected] Reported-by: Steve Brown <[email protected]> Signed-off-by: Bob Copeland <[email protected]> Signed-off-by: John W. Linville <[email protected]>
2010-06-18cfq-iosched: Fixed boot warning with BLK_CGROUP=y and CFQ_GROUP_IOSCHED=nVivek Goyal2-27/+142
Hi Jens, Few days back Ingo noticed a CFQ boot time warning. This patch fixes it. The issue here is that with CFQ_GROUP_IOSCHED=n, CFQ should not really be making blkio stat related calls. > Hm, it's still not entirely fixed, as of 2.6.35-rc2-00131-g7908a9e. With > some > configs i get bad spinlock warnings during bootup: > > [ 28.968013] initcall net_olddevs_init+0x0/0x82 returned 0 after 93750 > usecs > [ 28.972003] calling b44_init+0x0/0x55 @ 1 > [ 28.976009] bus: 'pci': add driver b44 > [ 28.976374] sda: > [ 28.978157] BUG: spinlock bad magic on CPU#1, async/0/117 > [ 28.980000] lock: 7e1c5bbc, .magic: 00000000, .owner: <none>/-1, +.owner_cpu: 0 > [ 28.980000] Pid: 117, comm: async/0 Not tainted +2.6.35-rc2-tip-01092-g010e7ef-dirty #8183 > [ 28.980000] Call Trace: > [ 28.980000] [<41ba6d55>] ? printk+0x20/0x24 > [ 28.980000] [<4134b7b7>] spin_bug+0x7c/0x87 > [ 28.980000] [<4134b853>] do_raw_spin_lock+0x1e/0x123 > [ 28.980000] [<41ba92ca>] ? _raw_spin_lock_irqsave+0x12/0x20 > [ 28.980000] [<41ba92d2>] _raw_spin_lock_irqsave+0x1a/0x20 > [ 28.980000] [<4133476f>] blkiocg_update_io_add_stats+0x25/0xfb > [ 28.980000] [<41335dae>] ? cfq_prio_tree_add+0xb1/0xc1 > [ 28.980000] [<41337bc7>] cfq_insert_request+0x8c/0x425 Signed-off-by: Vivek Goyal <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2010-06-18PCI/PM: Do not use native PCIe PME by defaultRafael J. Wysocki2-7/+16
Commit c7f486567c1d0acd2e4166c47069835b9f75e77b (PCI PM: PCIe PME root port service driver) causes the native PCIe PME signaling to be used by default, if the BIOS allows the kernel to control the standard configuration registers of PCIe root ports. However, the native PCIe PME is coupled to the native PCIe hotplug and calling pcie_pme_acpi_setup() makes some BIOSes expect that the native PCIe hotplug will be used as well. That, in turn, causes problems to appear on systems where the PCIe hotplug driver is not loaded. The usual symptom, as reported by Jaroslav Kameník and others, is that the ACPI GPE associated with PCIe hotplug keeps firing continuously causing kacpid to take substantial percentage of CPU time. To work around this issue, change the default so that the native PCIe PME signaling is only used if directly requested with the help of the pcie_pme= command line switch. Fixes https://bugzilla.kernel.org/show_bug.cgi?id=15924 , which is a listed regression from 2.6.33. Signed-off-by: Rafael J. Wysocki <[email protected]> Reported-by: Jaroslav Kameník <[email protected]> Tested-by: Antoni Grzymala <[email protected]> Signed-off-by: Jesse Barnes <[email protected]>
2010-06-18percpu: fix first chunk match in per_cpu_ptr_to_phys()Tejun Heo1-3/+28
per_cpu_ptr_to_phys() determines whether the passed in @addr belongs to the first_chunk or not by just matching the address against the address range of the base unit (unit0, used by cpu0). When an adress from another cpu was passed in, it will always determine that the address doesn't belong to the first chunk even when it does. This makes the function return a bogus physical address which may lead to crash. This problem was discovered by Cliff Wickman while investigating a crash during kdump on a SGI UV system. Signed-off-by: Tejun Heo <[email protected]> Reported-by: Cliff Wickman <[email protected]> Tested-by: Cliff Wickman <[email protected]> Cc: [email protected]
2010-06-18kbuild: Clean up and speed up the localversion logicMichal Marek3-119/+136
Now that we run scripts/setlocalversion during every build, it makes sense to move all the localversion logic there. This cleans up the toplevel Makefile and also makes sure that the script is called only once in 'make prepare' (previously, it would be called every time due to a variable expansion in an ifneq statement). No user-visible change is intended, unless one runs the setlocalversion script directly. Reported-by: Dmitry Torokhov <[email protected]> Cc: David Rientjes <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Nico Schottelius <[email protected]> Signed-off-by: Michal Marek <[email protected]>
2010-06-18sched: Fix over-scheduling bugAlex,Shi1-3/+0
Commit e70971591 ("sched: Optimize unused cgroup configuration") introduced an imbalanced scheduling bug. If we do not use CGROUP, function update_h_load won't update h_load. When the system has a large number of tasks far more than logical CPU number, the incorrect cfs_rq[cpu]->h_load value will cause load_balance() to pull too many tasks to the local CPU from the busiest CPU. So the busiest CPU keeps going in a round robin. That will hurt performance. The issue was found originally by a scientific calculation workload that developed by Yanmin. With that commit, the workload performance drops about 40%. CPU before after 00 : 2 : 7 01 : 1 : 7 02 : 11 : 6 03 : 12 : 7 04 : 6 : 6 05 : 11 : 7 06 : 10 : 6 07 : 12 : 7 08 : 11 : 6 09 : 12 : 6 10 : 1 : 6 11 : 1 : 6 12 : 6 : 6 13 : 2 : 6 14 : 2 : 6 15 : 1 : 6 Reviewed-by: Yanmin zhang <[email protected]> Signed-off-by: Alex Shi <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <1276754893.9452.5442.camel@debian> Signed-off-by: Ingo Molnar <[email protected]>
2010-06-17bridge: fdb cleanup runs too oftenstephen hemminger1-4/+2
It is common in end-node, non STP bridges to set forwarding delay to zero; which causes the forwarding database cleanup to run every clock tick. Change to run only as soon as needed or at next ageing timer interval which ever is sooner. Use round_jiffies_up macro rather than attempting round up by changing value. Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-06-17cfq: Don't allow queue merges for queues that have no process referencesJeff Moyer1-2/+11
Hi, A user reported a kernel bug when running a particular program that did the following: created 32 threads - each thread took a mutex, grabbed a global offset, added a buffer size to that offset, released the lock - read from the given offset in the file - created a new thread to do the same - exited The result is that cfq's close cooperator logic would trigger, as the threads were issuing I/O within the mean seek distance of one another. This workload managed to routinely trigger a use after free bug when walking the list of merge candidates for a particular cfqq (cfqq->new_cfqq). The logic used for merging queues looks like this: static void cfq_setup_merge(struct cfq_queue *cfqq, struct cfq_queue *new_cfqq) { int process_refs, new_process_refs; struct cfq_queue *__cfqq; /* Avoid a circular list and skip interim queue merges */ while ((__cfqq = new_cfqq->new_cfqq)) { if (__cfqq == cfqq) return; new_cfqq = __cfqq; } process_refs = cfqq_process_refs(cfqq); /* * If the process for the cfqq has gone away, there is no * sense in merging the queues. */ if (process_refs == 0) return; /* * Merge in the direction of the lesser amount of work. */ new_process_refs = cfqq_process_refs(new_cfqq); if (new_process_refs >= process_refs) { cfqq->new_cfqq = new_cfqq; atomic_add(process_refs, &new_cfqq->ref); } else { new_cfqq->new_cfqq = cfqq; atomic_add(new_process_refs, &cfqq->ref); } } When a merge candidate is found, we add the process references for the queue with less references to the queue with more. The actual merging of queues happens when a new request is issued for a given cfqq. In the case of the test program, it only does a single pread call to read in 1MB, so the actual merge never happens. Normally, this is fine, as when the queue exits, we simply drop the references we took on the other cfqqs in the merge chain: /* * If this queue was scheduled to merge with another queue, be * sure to drop the reference taken on that queue (and others in * the merge chain). See cfq_setup_merge and cfq_merge_cfqqs. */ __cfqq = cfqq->new_cfqq; while (__cfqq) { if (__cfqq == cfqq) { WARN(1, "cfqq->new_cfqq loop detected\n"); break; } next = __cfqq->new_cfqq; cfq_put_queue(__cfqq); __cfqq = next; } However, there is a hole in this logic. Consider the following (and keep in mind that each I/O keeps a reference to the cfqq): q1->new_cfqq = q2 // q2 now has 2 process references q3->new_cfqq = q2 // q2 now has 3 process references // the process associated with q2 exits // q2 now has 2 process references // queue 1 exits, drops its reference on q2 // q2 now has 1 process reference // q3 exits, so has 0 process references, and hence drops its references // to q2, which leaves q2 also with 0 process references q4 comes along and wants to merge with q3 q3->new_cfqq still points at q2! We follow that link and end up at an already freed cfqq. So, the fix is to not follow a merge chain if the top-most queue does not have a process reference, otherwise any queue in the chain could be already freed. I also changed the logic to disallow merging with a queue that does not have any process references. Previously, we did this check for one of the merge candidates, but not the other. That doesn't really make sense. Without the attached patch, my system would BUG within a couple of seconds of running the reproducer program. With the patch applied, my system ran the program for over an hour without issues. This addresses the following bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=16217 Thanks a ton to Phil Carns for providing the bug report and an excellent reproducer. [ Note for stable: this applies to 2.6.32/33/34 ]. Signed-off-by: Jeff Moyer <[email protected]> Reported-by: Phil Carns <[email protected]> Cc: [email protected] Signed-off-by: Jens Axboe <[email protected]>
2010-06-17nohz: Fix nohz ratelimitPeter Zijlstra1-4/+1
Chris Wedgwood reports that 39c0cbe (sched: Rate-limit nohz) causes a serial console regression, unresponsiveness, and indeed it does. The reason is that the nohz code is skipped even when the tick was already stopped before the nohz_ratelimit(cpu) condition changed. Move the nohz_ratelimit() check to the other conditions which prevent long idle sleeps. Reported-by: Chris Wedgwood <[email protected]> Tested-by: Brian Bloniarz <[email protected]> Signed-off-by: Mike Galbraith <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Greg KH <[email protected]> Cc: Alan Cox <[email protected]> Cc: OGAWA Hirofumi <[email protected]> Cc: Jef Driesen <[email protected]> LKML-Reference: <1276790557.27822.516.camel@twins> Signed-off-by: Thomas Gleixner <[email protected]>
2010-06-17perf record: prevent kill(0, SIGTERM);Ian Munsie1-1/+1
At exit, perf record will kill the process it was profiling by sending a SIGTERM to child_pid (if it had been initialised), but in certain situations child_pid may be 0 and perf would mistakenly kill more processes than intended. child_pid is set to the return of fork() to either 0 or the pid of the child. Ordinarily this would not present an issue as the child calls execvp to spawn the process to be profiled and would therefore never run it's sig_atexit and never attempt to kill pid 0. However, if a nonexistant binary had been passed in to perf record the call to execvp would fail and child_pid would be left set to 0. The child would then exit and it's atexit handler, finding that child_pid was initialised to 0, would call kill(0, SIGTERM), resulting in every process within it's process group being killed. In the case that perf was being run directly from the shell this typically would not be an issue as the shell isolates the process. However, if perf was being called from another program it could kill unexpected processes, which may even include X. This patch changes the logic of the test for whether child_pid was initialised to only consider positive pids as valid, thereby never attempting to kill pid 0. Cc: David S. Miller <[email protected]> Cc: Frédéric Weisbecker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Tom Zanussi <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ian Munsie <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-06-17ceph: fix crush map update decodingSage Weil1-0/+1
If the incremental osdmap has a new crush map, advance the position after decoding so that we can parse the rest of the osdmap properly. Signed-off-by: Sage Weil <[email protected]>
2010-06-17Merge branch 'bugzilla-15951' into releaseLen Brown2-15/+9
2010-06-17ACPI / PM: Do not enable GPEs for system wakeup in advanceRafael J. Wysocki2-15/+9
After commit 9630bdd9b15d2f489c646d8bc04b60e53eb5ec78 (ACPI: Use GPE reference counting to support shared GPEs) the wakeup enable mask bits of GPEs are set as soon as the GPEs are enabled to wake up the system. Unfortunately, this leads to a regression reported by Michal Hocko, where a system is woken up from ACPI S5 by a device that is not supposed to do that, because the wakeup enable mask bit of this device's GPE is always set when acpi_enter_sleep_state() calls acpi_hw_enable_all_wakeup_gpes(), although it should only be set if the device is supposed to wake up the system from the target state. To work around this issue, rework the ACPI power management code so that GPEs are not enabled to wake up the system upfront, but only during a system state transition when the target state of the system is known. [Of course, this means that the reference counting of "wakeup" GPEs doesn't really make sense and it is sufficient to set/unset the wakeup mask bits for them during system sleep transitions. This will allow us to simplify the GPE handling code quite a bit, but that change is too intrusive for 2.6.35.] Fixes https://bugzilla.kernel.org/show_bug.cgi?id=15951 Signed-off-by: Rafael J. Wysocki <[email protected]> Reported-and-tested-by: Michal Hocko <[email protected]> Signed-off-by: Len Brown <[email protected]>