Age Commit message Author Files Lines
2011-03-07bonding 802.3ad: Fix the state machine locking v2Nils Carlson1-5/+11
Changes since v1: * Clarify an unclear comment * Move a (possible) name change to a separate patch The ad_rx_machine, ad_periodic_machine and ad_port_selection_logic functions all inspect and alter common fields within the port structure. Prior to this patch, only the ad_rx_machines were mutexed, and the periodic and port_selection could run unmutexed against an ad_rx_machine triggered by an arriving LACPDU. This patch remedies the situation by protecting all the state machines from concurrency. This is accomplished by locking around all the state machines for a given port, which are executed at regular intervals; and the ad_rx_machine when handling an incoming LACPDU. Signed-off-by: Nils Carlson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-03-07drivers/net/macvtap: fix error checkNicolas Kaiser1-1/+2
'len' is unsigned of type size_t and can't be negative. Signed-off-by: Nicolas Kaiser <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
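To illustrate why a negative-value check on a size_t can never fire, here is a minimal user-space sketch. It is not the macvtap code: fake_copy(), wrapped_len() and checked_copy() are hypothetical names invented for this example, and -14 merely stands in for an errno-style failure.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for a routine that returns a byte count on
 * success or a negative errno-style value on failure. */
static ssize_t fake_copy(int fail)
{
    return fail ? -14 : 64;
}

/* Storing the result in an unsigned size_t: a negative return wraps
 * to a huge positive value, so a "less than zero" test cannot see it. */
size_t wrapped_len(int fail)
{
    return (size_t)fake_copy(fail);
}

/* Keeping the result in a signed type lets the error check work.
 * Returns -1 on failure, 0 on success. */
int checked_copy(int fail)
{
    ssize_t ret = fake_copy(fail);

    if (ret < 0)
        return -1;
    return 0;
}
```

The design point is only that the signedness of the variable holding the return value decides whether the error path is reachable.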
2011-03-07net: fix multithreaded signal handling in unix recv routinesRainer Weikusat1-4/+13
The unix_dgram_recvmsg and unix_stream_recvmsg routines in net/af_unix.c utilize mutex_lock(&u->readlock) calls in order to serialize read operations of multiple threads on a single socket. This implies that, if all n threads of a process block in an AF_UNIX recv call trying to read data from the same socket, one of these threads will be sleeping in state TASK_INTERRUPTIBLE and all others in state TASK_UNINTERRUPTIBLE. Provided that a particular signal is supposed to be handled by a signal handler defined by the process and that none of these threads is blocking the signal, the complete_signal routine in kernel/signal.c will select the 'first' such thread it happens to encounter when deciding which thread to notify that a signal is supposed to be handled, and if this is one of the TASK_UNINTERRUPTIBLE threads, the signal won't be handled until the one thread not blocking on the u->readlock mutex is woken up because some data to process has arrived (if this ever happens). The included patch fixes this by changing mutex_lock to mutex_lock_interruptible and handling possible error returns in the same way interruptions are handled by the actual receive-code. Signed-off-by: Rainer Weikusat <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-03-08drm: index i shadowed in 2nd looproel1-2/+2
Index i was already used in the first loop. Signed-off-by: Roel Kluin <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
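The general hazard of reusing an outer loop index in an inner loop can be sketched as follows. This is not the drm code touched by the patch; count_pairs_buggy() and count_pairs_fixed() are hypothetical functions written only to show the effect.

```c
/* Buggy pattern: the inner loop reuses `i`, so it overwrites the outer
 * counter. Depending on the bounds this ends the outer loop early or,
 * when cols < rows - 1, never terminates. */
int count_pairs_buggy(int rows, int cols)
{
    int i, visited = 0;

    for (i = 0; i < rows; i++) {
        for (i = 0; i < cols; i++)
            visited++;
        /* here i == cols, not the row number */
    }
    return visited;
}

/* Fixed pattern: a separate index for the inner loop. */
int count_pairs_fixed(int rows, int cols)
{
    int i, j, visited = 0;

    for (i = 0; i < rows; i++)
        for (j = 0; j < cols; j++)
            visited++;
    return visited;
}
```

With rows = 3 and cols = 2 the buggy version visits only 2 pairs while the fixed one visits all 6.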
2011-03-07mmc: sdio: Allow sdio operations in other threads during sdio_add_func()Dmitry Shmidt1-2/+1
This fixes a bug introduced by 807e8e40673d ("mmc: Fix sd/sdio/mmc initialization frequency retries") that prevented SDIO drivers from performing SDIO commands in their probe routines -- the above patch called mmc_claim_host() before sdio_add_func(), which causes a deadlock if an external SDIO driver calls sdio_claim_host(). Fix tested on an OLPC XO-1.75 with libertas on SDIO. Signed-off-by: Dmitry Shmidt <[email protected]> Reviewed-and-Tested-by: Chris Ball <[email protected]> Signed-off-by: Chris Ball <[email protected]>
2011-03-08Merge remote branch 'ickle/drm-intel-fixes' into drm-fixesDave Airlie9-41/+70
* ickle/drm-intel-fixes: drm/i915: Rebind the buffer if its alignment constraints changes with tiling drm/i915: Disable GPU semaphores by default drm/i915: Do not overflow the MMADDR write FIFO Revert "drm/i915: fix corruptions on i8xx due to relaxed fencing"
2011-03-07Merge branch 'omap-fixes-for-linus' of ↵Linus Torvalds2-22/+21
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6 * 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6: omap: mailbox: resolve hang issue OMAP2+: PM: SmartReflex: fix memory leaks in Smartreflex driver arm: mach-omap2: smartreflex: fix another memory leak
2011-03-07Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6Linus Torvalds5-37/+120
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6: [S390] tape: deadlock on system work queue [S390] keyboard: integer underflow bug [S390] xpram: remove __initdata attribute from module parameters
2011-03-08drm/nv50-nvc0: prevent multiple vm/bar flushes occuring simultanenouslyBen Skeggs2-0/+12
The per-vm mutex doesn't prevent this completely, a flush coming from the BAR VM could potentially happen at the same time as one for the channel VM. Not to mention that if/when we get per-client/channel VM, this will happen far more frequently. Signed-off-by: Ben Skeggs <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
2011-03-08drm/nouveau: fix regression causing ttm to not be able to evict vramBen Skeggs2-3/+5
TTM assumes an error condition from man->func->get_node() means that something went horribly wrong, and causes it to bail. The driver is supposed to return 0, and leave mm_node == NULL to signal that it couldn't allocate any memory. Signed-off-by: Ben Skeggs <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
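The convention described above (return 0 and leave mm_node NULL when no space is available) can be sketched in user-space C. Everything here is an assumption made for illustration: struct mem_reg, get_node() and try_place() are simplified stand-ins, not the TTM or nouveau definitions.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a memory region descriptor. */
struct mem_reg {
    void *mm_node;
};

static int backing_store; /* dummy object for mm_node to point at */

/* "No space" is not an error: return 0 and leave mm_node NULL so the
 * caller knows it may evict something and retry. */
int get_node(struct mem_reg *mem, int space_available)
{
    mem->mm_node = space_available ? (void *)&backing_store : NULL;
    return 0;
}

/* Caller side: a nonzero return means something went badly wrong;
 * otherwise 1 means placed and 0 means eviction is needed. */
int try_place(struct mem_reg *mem, int space_available)
{
    int ret = get_node(mem, space_available);

    if (ret)
        return ret;
    return mem->mm_node != NULL;
}
```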
2011-03-07net: Enter net/ipv6/ even if CONFIG_IPV6=nThomas Graf1-3/+1
exthdrs_core.c and addrconf_core.c in net/ipv6/ contain bits which must be made available even if IPv6 is disabled. net/ipv6/Makefile already correctly includes them if CONFIG_IPV6=n but net/Makefile prevents entering the subdirectory. Signed-off-by: Thomas Graf <[email protected]> Acked-by: Randy Dunlap <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-03-07net/smsc911x.c: Set the VLAN1 register to fix VLAN MTU problemGöran Weinholt1-0/+5
The smsc911x driver would drop frames longer than 1518 bytes, which is a problem for networks with VLAN tagging. The VLAN1 tag register is used to increase the legal frame size to 1522 when a VLAN tag is identified. Signed-off-by: Göran Weinholt <[email protected]> Signed-off-by: David S. Miller <[email protected]>
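The arithmetic behind the two limits is that an 802.1Q VLAN tag adds 4 bytes to the 1518-byte maximum frame. A small sketch, with frame_len_ok() and the macro names being hypothetical and unrelated to the driver's register programming:

```c
#define MAX_FRAME_UNTAGGED 1518 /* limit quoted in the commit message */
#define VLAN_TAG_LEN          4 /* size of one 802.1Q tag */

/* Returns 1 if a frame of `len` bytes is within the legal size,
 * allowing 4 extra bytes when a VLAN tag is present. */
int frame_len_ok(unsigned int len, int vlan_tagged)
{
    unsigned int limit = MAX_FRAME_UNTAGGED;

    if (vlan_tagged)
        limit += VLAN_TAG_LEN; /* 1518 + 4 = 1522 */
    return len <= limit;
}
```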
2011-03-07nfsd4: fix bad pointer on failure to find delegationJ. Bruce Fields1-6/+7
In case of a nonempty list, the return on error here is obviously bogus; it ends up being a pointer to the list head instead of to any valid delegation on the list. In particular, if nfsd4_delegreturn() hits this case, and you're quite unlucky, then renew_client may oops, and it may take an embarrassingly long time to figure out why. Facepalm. BUG: unable to handle kernel NULL pointer dereference at 0000000000000090 IP: [<ffffffff81292965>] nfsd4_delegreturn+0x125/0x200 ... Cc: [email protected] Signed-off-by: J. Bruce Fields <[email protected]>
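The class of bug described here, returning the loop cursor after an unsuccessful walk of an intrusive list, can be sketched in user space. The types and names below (struct list_head with only a next pointer, struct deleg, entry_of(), find_deleg()) are simplified assumptions for illustration and not the nfsd4 code.

```c
#include <stddef.h>

/* Minimal circular singly linked intrusive list, loosely modelled on
 * the kernel's list_head idea. */
struct list_head {
    struct list_head *next;
};

struct deleg {
    int id;
    struct list_head node;
};

/* Recover the containing struct deleg from its embedded node. */
#define entry_of(ptr) \
    ((struct deleg *)((char *)(ptr) - offsetof(struct deleg, node)))

/* Return the matching entry, or NULL if none matches. Returning the
 * cursor after the loop instead would yield a pointer computed from
 * the list head, which is not a real struct deleg. */
struct deleg *find_deleg(struct list_head *head, int id)
{
    struct list_head *pos;

    for (pos = head->next; pos != head; pos = pos->next) {
        struct deleg *d = entry_of(pos);

        if (d->id == id)
            return d;
    }
    return NULL;
}
```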
2011-03-07Btrfs: deal with short returns from copy_from_userChris Mason1-0/+13
When copy_from_user is only able to copy some of the bytes we requested, we may end up creating a partially up to date page. To avoid garbage in the page, we need to treat a partial copy as a zero length copy. This makes the rest of the file_write code drop the page and retry the whole copy instead of marking the partially up to date page as dirty. Signed-off-by: Chris Mason <[email protected]> cc: [email protected]
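The rule stated above, treating a partial copy as a zero-length copy, reduces to a one-line decision. The sketch below assumes a hypothetical helper usable_copied(); it is not the Btrfs function, which also has to consider page state.

```c
#include <stddef.h>

/* Given how many bytes were requested and how many were actually
 * copied, return how many the caller should treat as copied. A short
 * copy is reported as zero so the caller drops the page and retries
 * the whole copy instead of marking a half-filled page dirty. */
size_t usable_copied(size_t requested, size_t copied)
{
    return copied < requested ? 0 : copied;
}
```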
2011-03-07Btrfs: fix regressions in copy_from_user handlingChris Mason1-42/+59
Commit 914ee295af418e936ec20a08c1663eaabe4cd07a fixed deadlocks in btrfs_file_write where we would catch page faults on pages we had locked. But, there were a few problems:

1) The x86-32 iov_iter_copy_from_user_atomic code always fails to copy data when the amount to copy is more than 4K and the offset to start copying from is not page aligned. The result was btrfs_file_write looping forever retrying the iov_iter_copy_from_user_atomic. We deal with this by changing btrfs_file_write to drop down to single page copies when iov_iter_copy_from_user_atomic starts returning failure.

2) The btrfs_file_write code was leaking delalloc reservations when iov_iter_copy_from_user_atomic returned zero. The looping above would result in the entire filesystem running out of delalloc reservations and constantly trying to flush things to disk.

3) btrfs_file_write will lock down page cache pages, make sure any writeback is finished, do the copy_from_user and then release them. Before the loop runs we check the first and last pages in the write to see if they are only being partially modified. If the start or end of the write isn't aligned, we make sure the corresponding pages are up to date so that we don't introduce garbage into the file. With the copy_from_user changes, we're allowing the VM to reclaim the pages after a partial update from copy_from_user, but we're not making sure the page cache page is up to date when we loop around to resume the write. We deal with this by pushing the up to date checks down into the page prep code. This fits better with how the rest of file_write works.

Signed-off-by: Chris Mason <[email protected]> Reported-by: Mitch Harder <[email protected]> cc: [email protected]
2011-03-07drm/i915: Rebind the buffer if its alignment constraints changes with tilingChris Wilson3-5/+21
Early gen3 and gen2 chipsets do not have the relaxed per-surface tiling constraints of the later chipsets, so we need to check that the GTT alignment is correct for the new tiling. If it is not, we need to rebind. Reported-by: Daniel Vetter <[email protected]> Reviewed-by: Daniel Vetter <[email protected]> Signed-off-by: Chris Wilson <[email protected]>
2011-03-07drm/i915: Disable GPU semaphores by defaultChris Wilson3-2/+6
Andi Kleen narrowed his GPU hangs on his Sugar Bay (SNB desktop) rev 09 down to the use of GPU semaphores, and we already know that they appear broken up to Huron River (mobile) rev 08. (I'm optimistic that disabling GPU semaphores is simply hiding another bug by the latency and side-effects of the additional device interaction it introduces...) However, use of semaphores is a massive performance improvement... Only as long as the system remains stable. Enable at your peril. Reported-by: Andi Kleen <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33921 Signed-off-by: Chris Wilson <[email protected]>
2011-03-07i2c-omap: Program I2C_WE on OMAP4 to enable i2c wakeupRajendra Nayak1-3/+1
For the I2C module to be wakeup capable, programming I2C_WE register (which was skipped for OMAP4430) is needed even on OMAP4. This fixes i2c controller timeouts which were seen recently with the static dependency being cleared between MPU and L4PER clockdomains. Signed-off-by: Rajendra Nayak <[email protected]> [[email protected]: re-flowed description] Signed-off-by: Ben Dooks <[email protected]>
2011-03-06bnx2x: fix MaxBW configurationDmitry Kravkov1-2/+2
Increase resolution of MaxBW algorithm to suit Min Bandwidth configuration. Signed-off-by: Dmitry Kravkov <[email protected]> Signed-off-by: Eilon Greenstein <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-03-06bnx2x: (NPAR) prevent HW access in D3 stateDmitry Kravkov4-10/+40
Changing the speed setting in NPAR requires HW access; this patch delays the access until D0 state when it is requested in D3. Signed-off-by: Dmitry Kravkov <[email protected]> Signed-off-by: Eilon Greenstein <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-03-06bnx2x: fix link notificationDmitry Kravkov1-7/+7
Report link to OS and other PFs after HW is fully reconfigured according to new link parameters. (Affected only Multi Function modes). Signed-off-by: Dmitry Kravkov <[email protected]> Signed-off-by: Eilon Greenstein <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-03-06bnx2x: fix non-pmf device load flowDmitry Kravkov1-2/+3
Remove port MAX BW configuration from non-pmf functions, which caused reconfigure of HW according to 10G (fake) link. Signed-off-by: Dmitry Kravkov <[email protected]> Signed-off-by: Eilon Greenstein <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-03-06Merge branch 'for-linus' of ↵Linus Torvalds5-15/+57
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: ALSA: hda - Don't set to D3 in Cirrus errata init verbs ALSA: hda - add new Fermi 5xx codec IDs to snd-hda ASoC: WM8994: Ensure late enable events are processed for the ADCs ASoC: WM8994: Don't disable the AIF[1|2]CLK_ENA unconditionaly ASoC: Fix WM9081 platform data initialisation ALSA: hda - Fix unable to record issue on ASUS N82JV ALSA: HDA: Realtek: Fixup jack detection to input subsystem
2011-03-06virtio: console: Don't access vqs if device was unpluggedAmit Shah1-0/+8
If a virtio-console device gets unplugged while a port is open, a subsequent close() call on the port accesses vqs to free up buffers. This can lead to a crash. The buffers are already freed up as a result of the call to unplug_ports() from virtcons_remove(). The fix is to simply not access vq information if port->portdev is NULL. Reported-by: juzhang <[email protected]> CC: [email protected] Signed-off-by: Amit Shah <[email protected]> Signed-off-by: Rusty Russell <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-06Merge branch 'fix/asoc' into for-linusTakashi Iwai2-9/+47
2011-03-06drm/i915: Do not overflow the MMADDR write FIFOChris Wilson6-19/+42
Whilst the GT is powered down (rc6), writes to MMADDR are placed in a FIFO by the System Agent. This is a limited resource, only 64 entries, of which 20 are reserved for Display and PCH writes, and so we must take care not to queue up too many writes. To avoid this, there is a counter which we can poll to ensure there are sufficient free entries in the fifo. "Issuing a write to a full FIFO is not supported; at worst it could result in corruption or a system hang." Reported-and-Tested-by: Matt Turner <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34056 Signed-off-by: Chris Wilson <[email protected]>
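The polling idea can be sketched as a bounded wait on a free-entry counter. This is a user-space illustration under stated assumptions: wait_for_fifo(), sim_read_free() and sim_free_entries are hypothetical, the "more than 20 free" threshold is an assumption derived from the reserved count quoted above, and the simulated counter simply gains one entry per poll.

```c
#define FIFO_ENTRIES  64 /* total size quoted in the commit message */
#define FIFO_RESERVED 20 /* entries reserved for Display and PCH writes */

/* Simulated hardware counter: one more entry becomes free per poll. */
unsigned int sim_free_entries;

unsigned int sim_read_free(void)
{
    unsigned int now = sim_free_entries;

    if (sim_free_entries < FIFO_ENTRIES)
        sim_free_entries++;
    return now;
}

/* Poll the free-entry counter until more than the reserved number of
 * entries are free, or give up after max_polls attempts.
 * Returns 0 when it is safe to write, -1 on timeout. */
int wait_for_fifo(unsigned int (*read_free)(void), int max_polls)
{
    while (max_polls-- > 0) {
        if (read_free() > FIFO_RESERVED)
            return 0;
    }
    return -1;
}
```

Bounding the number of polls keeps a stuck counter from turning into an endless loop; what to do on timeout is left to the caller.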
2011-03-06Revert "drm/i915: fix corruptions on i8xx due to relaxed fencing"Chris Wilson1-15/+1
This reverts commit c2e0eb167070a6e9dcb49c84c13c79a30d672431. As it turns out, userspace already depends upon being able to enable tiling on existing bo which it promises to be large enough for its purposes i.e. it will not access beyond the end of the last full-tile row. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=35016 Reported-and-tested-by: Kamal Mostafa <[email protected]> Signed-off-by: Chris Wilson <[email protected]>
2011-03-05Merge branch 'for-linus' of ↵Linus Torvalds6-50/+72
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: no .snap inside of snapped namespace libceph: fix msgr standby handling libceph: fix msgr keepalive flag libceph: fix msgr backoff libceph: retry after authorization failure libceph: fix handling of short returns from get_user_pages ceph: do not clear I_COMPLETE from d_release ceph: do not set I_COMPLETE Revert "ceph: keep reference to parent inode on ceph_dentry"
2011-03-04mm: use correct numa policy node for transparent hugepagesAndi Kleen2-8/+19
Pass down the correct node for a transparent hugepage allocation. Most callers continue to use the current node; however, the khugepaged daemon now uses the previous node of the first page to be collapsed instead. This ensures that khugepaged does not mess up local memory for an existing process which uses local policy. The choice of node is somewhat primitive currently: it just uses the node of the first page in the pmd range. An alternative would be to look at multiple pages and use the most popular node. I used the simplest variant for now which should work well enough for the case of all pages being on the same node. [[email protected]: coding-style fixes] Acked-by: Andrea Arcangeli <[email protected]> Signed-off-by: Andi Kleen <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
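The two node-selection strategies mentioned above (node of the first page versus the most popular node) can be compared with a small sketch. The functions and the MAX_NODES bound are hypothetical; the input is assumed to be an array holding the NUMA node id of each page in the range.

```c
#define MAX_NODES 8 /* assumed upper bound for this sketch */

/* Simplest variant: use the node of the first page in the range.
 * Returns -1 for an empty range. */
int node_of_first_page(const int *page_nodes, int n)
{
    return n > 0 ? page_nodes[0] : -1;
}

/* Alternative: count pages per node and pick the most common one.
 * Ties go to the node that reached the winning count first. */
int most_popular_node(const int *page_nodes, int n)
{
    int counts[MAX_NODES] = { 0 };
    int best = -1;
    int i;

    for (i = 0; i < n; i++) {
        int node = page_nodes[i];

        if (node < 0 || node >= MAX_NODES)
            continue; /* ignore out-of-range ids */
        counts[node]++;
        if (best < 0 || counts[node] > counts[best])
            best = node;
    }
    return best;
}
```

When all pages sit on the same node, both functions agree, which is the case the commit message says the simple variant handles well.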
2011-03-04mm: preserve original node for transparent huge page copiesAndi Kleen1-2/+2
This makes a difference for LOCAL policy, where the node cannot be determined from the policy itself, but has to be gotten from the original page. Acked-by: Andrea Arcangeli <[email protected]> Signed-off-by: Andi Kleen <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-04mm: add alloc_page_vma_node()Andi Kleen1-0/+2
Add an alloc_page_vma_node() that allows passing the "local" node in. Used in a follow-on patch. Acked-by: Andrea Arcangeli <[email protected]> Signed-off-by: Andi Kleen <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-04mm: change alloc_pages_vma to pass down the policy node for local policyAndi Kleen3-11/+11
Currently alloc_pages_vma() always uses the local node as policy node for the LOCAL policy. Pass this node down as an argument instead. No behaviour change from this patch, but it will be needed for follow-on patches. Acked-by: Andrea Arcangeli <[email protected]> Signed-off-by: Andi Kleen <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-04RapidIO: Update MAINTAINERSAlexandre Bounine1-0/+1
Signed-off-by: Alexandre Bounine <[email protected]> Cc: Matt Porter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-04drivers/video/backlight/ltv350qv.c: fix a memory leakAxel Lin1-1/+8
Signed-off-by: Axel Lin <[email protected]> Cc: Haavard Skinnemoen <[email protected]> Cc: Richard Purdie <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-04MAINTAINERS: add maintainer of Samsung Mobile Machine supportKyungmin Park1-0/+9
Add maintainer of Samsung Mobile machine support. Currently, Aquila, Goni, Universal (C210), and Nuri board are supported. Signed-off-by: Kyungmin Park <[email protected]> Cc: Joe Perches <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Russell King <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-04pps: make pps_gen_parport depend on BROKENThomas Gleixner1-1/+1
This driver causes hard lockups when the active clock source is jiffies. The reason is that it loops with interrupts disabled waiting for a timestamp to be reached by polling getnstimeofday(). Though with a jiffies clocksource, when that code runs on the same CPU which is responsible for updating jiffies, then we loop in circles forever simply because the timer interrupt cannot update jiffies. So both UP and SMP can be affected. There is no easy fix for that problem so make it depend on BROKEN for now. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Rodolfo Giometti <[email protected]> Cc: john stultz <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-04drivers/misc/bmp085.c: add MODULE_DEVICE_TABLEAxel Lin1-0/+1
The device table is required to load modules based on modaliases. Signed-off-by: Axel Lin <[email protected]> Cc: Shubhrajyoti D <[email protected]> Cc: Christoph Mair <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-04cpuset: add a missing unlock in cpuset_write_resmask()Li Zefan1-2/+5
Don't forget to release cgroup_mutex if alloc_trial_cpuset() fails. [[email protected]: avoid multiple return points] Signed-off-by: Li Zefan <[email protected]> Cc: Paul Menage <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Miao Xie <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
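The "single exit point" shape mentioned in the bracketed note can be sketched as follows. This is a user-space illustration only: fake_lock(), fake_unlock(), lock_is_held() and update_mask() are hypothetical, a counter stands in for cgroup_mutex, and -12 stands in for an out-of-memory error.

```c
static int lock_depth; /* stand-in for a mutex: > 0 means held */

static void fake_lock(void)
{
    lock_depth++;
}

static void fake_unlock(void)
{
    lock_depth--;
}

int lock_is_held(void)
{
    return lock_depth > 0;
}

/* Every path after fake_lock() leaves through the `out` label, so the
 * failure case cannot forget to unlock. */
int update_mask(int alloc_fails)
{
    int ret = 0;

    fake_lock();
    if (alloc_fails) {
        ret = -12; /* allocation failed */
        goto out;
    }
    /* ... the actual update would happen here ... */
out:
    fake_unlock();
    return ret;
}
```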
2011-03-04drivers/rtc/rtc-s3c.c: fix prototype for s3c_rtc_setaie()Axel Lin1-5/+7
Fix s3c_rtc_setaie() prototype to eliminate the following compile warning: drivers/rtc/rtc-s3c.c:383: warning: initialization from incompatible pointer type (akpm: the rtc_class_ops.alarm_irq_enable() handler is being passed two arguments where it expects just one, presumably with undesired effects) Signed-off-by: Axel Lin <[email protected]> Cc: Alessandro Zummo <[email protected]> Cc: Ben Dooks <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-04Merge branch 'for-linus' of ↵Linus Torvalds2-4/+14
git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin: Blackfin: iflush: update anomaly 05000491 workaround Blackfin: outs[lwb]: make sure count is greater than 0
2011-03-04Merge branch 'rmobile-fixes-for-linus' of ↵Linus Torvalds6-15/+27
git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6 * 'rmobile-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: ARM: mach-shmobile: mackerel: modify LCDC clock divider value ARM: mach-shmobile: ap4evb: modify LCDC clock divider value ARM: mach-shmobile: mackerel: fixup memory initialize for zboot ARM: mach-shmobile: ap4evb: fixup memory initialize for zboot ARM: mach-shmobile: Add sh73a0 MIPI-CSI and CEU clocks ARM: mach-shmobile: AG5EVM MIPI-DSI LCD reset delay fix
2011-03-04Merge branch 'sh-fixes-for-linus' of ↵Linus Torvalds4-6/+22
git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6 * 'sh-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: sh: Change __nosave_XXX symbols to long sh: Flush executable pages in copy_user_highpage sh: Ensure ST40-300 BogoMIPS value is consistent sh: sh7750: Fix incompatible pointer type sh: sh7750: move machtypes.h to include/generated
2011-03-04Merge branch 'drm-fixes' of ↵Linus Torvalds3-6/+11
git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6 * 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: drm/nouveau: allocate kernel's notifier object at end of block
2011-03-04nfs4: Ensure that ACL pages sent over NFS were not allocated from the slab (v3)Neil Horman1-2/+42
The "bad_page()" page allocator sanity check was reported recently (call chain as follows): bad_page+0x69/0x91 free_hot_cold_page+0x81/0x144 skb_release_data+0x5f/0x98 __kfree_skb+0x11/0x1a tcp_ack+0x6a3/0x1868 tcp_rcv_established+0x7a6/0x8b9 tcp_v4_do_rcv+0x2a/0x2fa tcp_v4_rcv+0x9a2/0x9f6 do_timer+0x2df/0x52c ip_local_deliver+0x19d/0x263 ip_rcv+0x539/0x57c netif_receive_skb+0x470/0x49f :virtio_net:virtnet_poll+0x46b/0x5c5 net_rx_action+0xac/0x1b3 __do_softirq+0x89/0x133 call_softirq+0x1c/0x28 do_softirq+0x2c/0x7d do_IRQ+0xec/0xf5 default_idle+0x0/0x50 ret_from_intr+0x0/0xa default_idle+0x29/0x50 cpu_idle+0x95/0xb8 start_kernel+0x220/0x225 _sinittext+0x22f/0x236 It occurs because an skb with a fraglist was freed from the tcp retransmit queue when it was acked, but a page on that fraglist had PG_slab set (indicating it was allocated from the slab allocator), which means the free path above can't safely free it via put_page. We tracked this back to an nfsv4 setacl operation, in which the nfs code attempted to convert the passed-in buffer to an array of pages in __nfs4_proc_set_acl, which gets used by the skb->frags list in xs_sendpages. __nfs4_proc_set_acl just converts each page in the buffer to a page struct via virt_to_page, but the vfs allocates the buffer via kmalloc, meaning the PG_slab bit is set. We can't create a buffer with kmalloc and free it later in the tcp ack path with put_page, so we need to either: 1) ensure that when we create the list of pages, no page struct has PG_slab set or 2) not use a page list to send this data. Given that these buffers can be multiple pages and arbitrarily sized, I think (1) is the right way to go. I've written the below patch to allocate a page from the buddy allocator directly and copy the data over to it. This ensures that we have a put_page free-able page for every entry that winds up on an skb frag list, so it can be safely freed when the frame is acked.
We do a put_page on each entry after the rpc_call_sync call so as to drop our own reference count to the page, leaving only the ref count taken by tcp_sendpages. This way the data will be properly freed when the ack comes in. Successfully tested by myself to solve the above oops. Note, as this is the result of a setacl operation that exceeded a page of data, I think this amounts to a local DoS triggerable by an unprivileged user, so I'm CCing security on this as well. Signed-off-by: Neil Horman <[email protected]> CC: Trond Myklebust <[email protected]> CC: [email protected] CC: Jeff Layton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
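The copy-into-fresh-pages approach can be illustrated with a user-space analogue. This is only a sketch under loud assumptions: malloc() stands in for allocating a page from the buddy allocator, free() stands in for put_page, and buf_to_pages(), release_pages_array() and PAGE_SZ are hypothetical names, not the NFS code.

```c
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ 4096 /* assumed page size for this sketch */

/* Copy an arbitrary buffer into separately allocated page-sized
 * chunks, so each chunk can later be released on its own.
 * Returns the number of pages filled, or -1 on failure (in which case
 * nothing is left allocated). */
int buf_to_pages(const void *buf, size_t len, void **pages, int max_pages)
{
    const char *src = buf;
    int n = 0;

    while (len > 0) {
        size_t chunk = len < PAGE_SZ ? len : PAGE_SZ;

        if (n == max_pages)
            goto fail;
        pages[n] = malloc(PAGE_SZ);
        if (!pages[n])
            goto fail;
        memcpy(pages[n], src, chunk);
        src += chunk;
        len -= chunk;
        n++;
    }
    return n;
fail:
    while (n > 0)
        free(pages[--n]);
    return -1;
}

/* Drop our own reference to every page once the send has completed. */
void release_pages_array(void **pages, int n)
{
    while (n > 0)
        free(pages[--n]);
}
```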
2011-03-04ceph: no .snap inside of snapped namespaceSage Weil1-0/+1
Otherwise you can do things like # mkdir .snap/foo # cd .snap/foo/.snap # ls <badness> Signed-off-by: Sage Weil <[email protected]>
2011-03-04libceph: fix msgr standby handlingSage Weil1-8/+22
The standby logic used to be pretty dependent on the work requeueing behavior that changed when we switched to WQ_NON_REENTRANT. It was also very fragile. Restructure things so that: - We clear WRITE_PENDING when we set STANDBY. This ensures we will requeue work when we wake up later. - con_work backs off if STANDBY is set. There is nothing to do if we are in standby. - clear_standby() helper is called by both con_send() and con_keepalive(), the two actions that can wake us up again. Move the connect_seq++ logic here. Signed-off-by: Sage Weil <[email protected]>
2011-03-04libceph: fix msgr keepalive flagSage Weil2-6/+4
There was some broken keepalive code using a dead variable. Shift to using the proper bit flag. Signed-off-by: Sage Weil <[email protected]>
2011-03-04libceph: fix msgr backoffSage Weil2-2/+29
With commit f363e45f we replaced a bunch of hacky workqueue mutual exclusion logic with the WQ_NON_REENTRANT flag. One piece of fallout is that the exponential backoff breaks in certain cases: * con_work attempts to connect. * we get an immediate failure, and the socket state change handler queues immediate work. * con_work calls con_fault, we decide to back off, but can't queue delayed work. In this case, we add a BACKOFF bit to make con_work reschedule delayed work next time it runs (which should be immediately). Signed-off-by: Sage Weil <[email protected]>
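The BACKOFF-bit idea can be sketched as a tiny state machine. All names and values here are assumptions made for illustration (struct conn, FLAG_BACKOFF, BASE_DELAY, MAX_DELAY, conn_fault(), conn_work()); the real messenger code is considerably more involved.

```c
#define FLAG_BACKOFF 0x1u
#define BASE_DELAY   1u  /* hypothetical initial delay, arbitrary units */
#define MAX_DELAY    64u /* hypothetical cap */

struct conn {
    unsigned int flags;
    unsigned int delay;
};

/* On a fault, grow the delay exponentially. If delayed work cannot be
 * queued right now, remember the intent in the BACKOFF bit. */
void conn_fault(struct conn *c, int can_queue_delayed)
{
    if (c->delay == 0)
        c->delay = BASE_DELAY;
    else if (c->delay < MAX_DELAY)
        c->delay *= 2;
    if (!can_queue_delayed)
        c->flags |= FLAG_BACKOFF;
}

/* Worker entry: if BACKOFF is set, clear it and return the delay to
 * reschedule with; return 0 when the work should proceed now. */
unsigned int conn_work(struct conn *c)
{
    if (c->flags & FLAG_BACKOFF) {
        c->flags &= ~FLAG_BACKOFF;
        return c->delay;
    }
    return 0;
}
```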
2011-03-04MAINTAINERS: Update shaggy's email addressDave Kleikamp1-1/+1
Signed-off-by: Dave Kleikamp <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-04minimal fix for do_filp_open() raceAl Viro1-3/+10
Failure exits on the no-O_CREAT side of do_filp_open() merge with those of the O_CREAT one; unfortunately, if do_path_lookup() returns -ESTALE, we'll get to out_filp:, notice that we are about to return -ESTALE without having tried to create the sucker with LOOKUP_REVAL and jump right into the O_CREAT side of the code. And proceed to try and create a file. Usually that'll fail with -ESTALE again, but we can race and get that attempt of pathname resolution to succeed. open() without O_CREAT really shouldn't end up creating files, races or not. The real fix is to rearchitect the whole do_filp_open(), but for now splitting the failure exits will do. Signed-off-by: Al Viro <[email protected]>