aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2013-11-21ipc,shm: fix shm_file deletion racesGreg Thelen1-5/+23
When IPC_RMID races with other shm operations there's potential for use-after-free of the shm object's associated file (shm_file). Here's the race before this patch: TASK 1 TASK 2 ------ ------ shm_rmid() ipc_lock_object() shmctl() shp = shm_obtain_object_check() shm_destroy() shum_unlock() fput(shp->shm_file) ipc_lock_object() shmem_lock(shp->shm_file) <OOPS> The oops is caused because shm_destroy() calls fput() after dropping the ipc_lock. fput() clears the file's f_inode, f_path.dentry, and f_path.mnt, which causes various NULL pointer references in task 2. I reliably see the oops in task 2 if with shmlock, shmu This patch fixes the races by: 1) set shm_file=NULL in shm_destroy() while holding ipc_object_lock(). 2) modify at risk operations to check shm_file while holding ipc_object_lock(). Example workloads, which each trigger oops... Workload 1: while true; do id=$(shmget 1 4096) shm_rmid $id & shmlock $id & wait done The oops stack shows accessing NULL f_inode due to racing fput: _raw_spin_lock shmem_lock SyS_shmctl Workload 2: while true; do id=$(shmget 1 4096) shmat $id 4096 & shm_rmid $id & wait done The oops stack is similar to workload 1 due to NULL f_inode: touch_atime shmem_mmap shm_mmap mmap_region do_mmap_pgoff do_shmat SyS_shmat Workload 3: while true; do id=$(shmget 1 4096) shmlock $id shm_rmid $id & shmunlock $id & wait done The oops stack shows second fput tripping on an NULL f_inode. The first fput() completed via from shm_destroy(), but a racing thread did a get_file() and queued this fput(): locks_remove_flock __fput ____fput task_work_run do_notify_resume int_signal Fixes: c2c737a0461e ("ipc,shm: shorten critical region for shmat") Fixes: 2caacaa82a51 ("ipc,shm: shorten critical region for shmctl") Signed-off-by: Greg Thelen <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Manfred Spraul <[email protected]> Cc: <[email protected]> # 3.10.17+ 3.11.6+ Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-11-21mm: thp: give transparent hugepage code a separate copy_pageDave Hansen3-38/+48
Right now, the migration code in migrate_page_copy() uses copy_huge_page() for hugetlbfs and thp pages: if (PageHuge(page) || PageTransHuge(page)) copy_huge_page(newpage, page); So, yay for code reuse. But: void copy_huge_page(struct page *dst, struct page *src) { struct hstate *h = page_hstate(src); and a non-hugetlbfs page has no page_hstate(). This works 99% of the time because page_hstate() determines the hstate from the page order alone. Since the page order of a THP page matches the default hugetlbfs page order, it works. But, if you change the default huge page size on the boot command-line (say default_hugepagesz=1G), then we might not even *have* a 2MB hstate so page_hstate() returns null and copy_huge_page() oopses pretty fast since copy_huge_page() dereferences the hstate: void copy_huge_page(struct page *dst, struct page *src) { struct hstate *h = page_hstate(src); if (unlikely(pages_per_huge_page(h) > MAX_ORDER_NR_PAGES)) { ... Mel noticed that the migration code is really the only user of these functions. This moves all the copy code over to migrate.c and makes copy_huge_page() work for THP by checking for it explicitly. I believe the bug was introduced in commit b32967ff101a ("mm: numa: Add THP migration for the NUMA working set scanning fault case") [[email protected]: fix coding-style and comment text, per Naoya Horiguchi] Signed-off-by: Dave Hansen <[email protected]> Acked-by: Mel Gorman <[email protected]> Reviewed-by: Naoya Horiguchi <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Andrea Arcangeli <[email protected]> Tested-by: Dave Jiang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-11-21checkpatch: fix "Use of uninitialized value" warningsJoe Perches1-0/+1
checkpatch is currently confused about some complex macros and references undefined variables $stat and $cond. Make sure these are defined before using them. Signed-off-by: Joe Perches <[email protected]> Reported-by: Gerhard Sittig <[email protected]> Acked-by: Andy Whitcroft <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-11-21configfs: fix race between dentry put and lookupJunxiao Bi1-2/+14
A race window in configfs, it starts from one dentry is UNHASHED and end before configfs_d_iput is called. In this window, if a lookup happen, since the original dentry was UNHASHED, so a new dentry will be allocated, and then in configfs_attach_attr(), sd->s_dentry will be updated to the new dentry. Then in configfs_d_iput(), BUG_ON(sd->s_dentry != dentry) will be triggered and system panic. sys_open: sys_close: ... fput dput dentry_kill __d_drop <--- dentry unhashed here, but sd->dentry still point to this dentry. lookup_real configfs_lookup configfs_attach_attr---> update sd->s_dentry to new allocated dentry here. d_kill configfs_d_iput <--- BUG_ON(sd->s_dentry != dentry) triggered here. To fix it, change configfs_d_iput to not update sd->s_dentry if sd->s_count > 2, that means there are another dentry is using the sd beside the one that is going to be put. Use configfs_dirent_lock in configfs_attach_attr to sync with configfs_d_iput. With the following steps, you can reproduce the bug. 1. enable ocfs2, this will mount configfs at /sys/kernel/config and fill configure in it. 2. run the following script. while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done & while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done & Signed-off-by: Junxiao Bi <[email protected]> Cc: Joel Becker <[email protected]> Cc: Al Viro <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-11-21gso: handle new frag_list of frags GRO packetsHerbert Xu1-25/+50
Recently GRO started generating packets with frag_lists of frags. This was not handled by GSO, thus leading to a crash. Thankfully these packets are of a regular form and are easy to handle. This patch handles them in two ways. For completely non-linear frag_list entries, we simply continue to iterate over the frag_list frags once we exhaust the normal frags. For frag_list entries with linear parts, we call pskb_trim on the first part of the frag_list skb, and then process the rest of the frags in the usual way. This patch also kills a chunk of dead frag_list code that has obviously never ever been run since it ends up generating a bogus GSO-segmented packet with a frag_list entry. Future work is planned to split super big packets into TSO ones. Fixes: 8a29111c7ca6 ("net: gro: allow to build full sized skb") Reported-by: Christoph Paasch <[email protected]> Reported-by: Jerry Chu <[email protected]> Reported-by: Sander Eikelenboom <[email protected]> Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Tested-by: Sander Eikelenboom <[email protected]> Tested-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-21GFS2: Fix ref count bug relating to atomic_openSteven Whitehouse1-1/+4
In the case that atomic_open calls finish_no_open() with the dentry that was supplied to gfs2_atomic_open() an extra reference count is required. This patch fixes that issue preventing a bug trap triggering at umount time. Signed-off-by: Steven Whitehouse <[email protected]>
2013-11-21genetlink: fix genl_set_err() group IDJohannes Berg1-0/+3
Fix another really stupid bug - I introduced genl_set_err() precisely to be able to adjust the group and reject invalid ones, but then forgot to do so. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-21genetlink: fix genlmsg_multicast() bugJohannes Berg2-6/+3
Unfortunately, I introduced a tremendously stupid bug into genlmsg_multicast() when doing all those multicast group changes: it adjusts the group number, but then passes it to genlmsg_multicast_netns() which does that again. Somehow, my tests failed to catch this, so add a warning into genlmsg_multicast_netns() and remove the offending group ID adjustment. Also add a warning to the similar code in other functions so people who misuse them are more loudly warned. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-21packet: fix use after free race in send path when dev is releasedDaniel Borkmann2-23/+37
Salam reported a use after free bug in PF_PACKET that occurs when we're sending out frames on a socket bound device and suddenly the net device is being unregistered. It appears that commit 827d9780 introduced a possible race condition between {t,}packet_snd() and packet_notifier(). In the case of a bound socket, packet_notifier() can drop the last reference to the net_device and {t,}packet_snd() might end up suddenly sending a packet over a freed net_device. To avoid reverting 827d9780 and thus introducing a performance regression compared to the current state of things, we decided to hold a cached RCU protected pointer to the net device and maintain it on write side via bind spin_lock protected register_prot_hook() and __unregister_prot_hook() calls. In {t,}packet_snd() path, we access this pointer under rcu_read_lock through packet_cached_dev_get() that holds reference to the device to prevent it from being freed through packet_notifier() while we're in send path. This is okay to do as dev_put()/dev_hold() are per-cpu counters, so this should not be a performance issue. Also, the code simplifies a bit as we don't need need_rls_dev anymore. Fixes: 827d978037d7 ("af-packet: Use existing netdev reference for bound sockets.") Reported-by: Salam Noureddine <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: Salam Noureddine <[email protected]> Cc: Ben Greear <[email protected]> Cc: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-21xen-netback: stop the VIF thread before unbinding IRQsDavid Vrabel1-3/+3
If the VIF thread is still running after unbinding the Tx and Rx IRQs in xenvif_disconnect(), the thread may attempt to raise an event which will BUG (as the irq is unbound). Signed-off-by: David Vrabel <[email protected]> Acked-by: Wei Liu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-21wimax: remove dead codeMichael Opdenacker1-1/+0
This removes a code line that is between a "return 0;" and an error label. This code line can never be reached. Found by Coverity (CID: 1130529) Signed-off-by: Michael Opdenacker <[email protected]> Acked-by: Johannes Berg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-21Merge branch 'for-davem' of ↵David S. Miller17-53/+135
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== pull request: wireless 2013-11-21 Please pull this batch of fixes intended for the 3.13 stream! For the Bluetooth bits, Gustavo says: "A few fixes for 3.13. There is 3 fixes to the RFCOMM protocol. One crash fix to L2CAP. A simple fix to a bad behaviour in the SMP protocol." On top of that... Amitkumar Karwar sends a quintet of mwifiex fixes -- two fixes related to failure handling, two memory leak fixes, and a NULL pointer fix. Felix Fietkau corrects and earlier rt2x00 HT descriptor handling fix to address a crash. Geyslan G. Bem fixes a memory leak in brcmfmac. Larry Finger address more pointer arithmetic errors in rtlwifi. Luis R. Rodriguez provides a regulatory fix in the shared ath code. Sujith Manoharan brings a couple ath9k initialization fixes. Ujjal Roy offers one more mwifiex fix to avoid invalid memory accesses when unloading the USB driver. ==================== Signed-off-by: David S. Miller <[email protected]>
2013-11-21Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller8-16/+29
Pablo Neira Ayuso says: ==================== netfilter fixes for net The following patchset contains fixes for your net tree, they are: * Remove extra quote from connlimit configuration in Kconfig, from Randy Dunlap. * Fix missing mss option in syn packets sent to the backend in our new synproxy target, from Martin Topholm. * Use window scale announced by client when sending the forged syn to the backend, from Martin Topholm. * Fix IPv6 address comparison in ebtables, from Luís Fernando Cornachioni Estrozi. * Fix wrong endianess in sequence adjustment which breaks helpers in NAT configurations, from Phil Oester. * Fix the error path handling of nft_compat, from me. * Make sure the global conntrack counter is decremented after the object has been released, also from me. ==================== Signed-off-by: David S. Miller <[email protected]>
2013-11-21Documentation: filesystems: update btrfs tools sectionDavid Sterba1-16/+6
The tools mentioned have been obsoleted long ago, replace with the current ones. CC: [email protected] Signed-off-by: David Sterba <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-21Documentation: filesystems: add new btrfs mount optionsDavid Sterba1-1/+11
Two new options were added in 3.12: commit and rescan_uuid_tree CC: [email protected] Signed-off-by: David Sterba <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-21Merge branch 'master' of ↵John W. Linville17-53/+135
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
2013-11-21Merge tag 'asoc-v3.13-5' of ↵Takashi Iwai5-42/+86
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v3.13 A bunch of device specific fixes, nothing with a general impact here.
2013-11-21ALSA: hda - Add headset quirk for Dell Inspiron 3135David Henningsson1-0/+1
Cc: [email protected] (3.10+) BugLink: https://bugs.launchpad.net/bugs/1253636 Signed-off-by: David Henningsson <[email protected]> Signed-off-by: Takashi Iwai <[email protected]>
2013-11-21drm/sysfs: fix hotplug regression since lifetime changesDavid Herrmann1-7/+33
airlied: The lifetime changes introduced in 5bdebb183c9702a8c57a01dff09337be3de337a6 tried to use device_create, however that led to the regression where dev->type wasn't getting set correctly. First attempt at fixing it would have led to a race, so this undoes the device_createa work and does it all manually making sure the dev->type is setup before we register the device. Signed-off-by: David Herrmann <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
2013-11-21GFS2: fix potential NULL pointer dereferenceMichal Nazarewicz1-1/+2
Commit [e66cf1610: GFS2: Use lockref for glocks] replaced call: atomic_read(&gi->gl->gl_ref) == 0 with: __lockref_is_dead(&gl->gl_lockref) therefore changing how gl is accessed, from gi->gl to plan gl. However, gl can be a NULL pointer, and so gi->gl needs to be used instead (which is guaranteed not to be NULL because fo the while loop checking that condition). Signed-off-by: Michal Nazarewicz <[email protected]> Signed-off-by: Steven Whitehouse <[email protected]>
2013-11-21KVM: kvm_clear_guest_page(): fix empty_zero_page usageHeiko Carstens1-2/+3
Using the address of 'empty_zero_page' as source address in order to clear a page is wrong. On some architectures empty_zero_page is only the pointer to the struct page of the empty_zero_page. Therefore the clear page operation would copy the contents of a couple of struct pages instead of clearing a page. For kvm only arm/arm64 are affected by this bug. To fix this use the ZERO_PAGE macro instead which will return the struct page address of the empty_zero_page on all architectures. Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Gleb Natapov <[email protected]>
2013-11-21drm/exynos: g2d: fix memory leak to userptrInki Dae1-0/+2
This patch releases a vma object when cleaning up userptr resources. A new vma object was allocated and copied when getting userptr pages so the new vma object should be freed properly if the userptr pages aren't used anymore. Signed-off-by: Inki Dae <[email protected]> Signed-off-by: Kyungmin Park <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
2013-11-21Merge branch 'ttm-fixes-3.13' of git://people.freedesktop.org/~thomash/linux ↵Dave Airlie4-10/+62
into drm-fixes The set_need_resched() removal fix and yet another fix in ttm_bo_move_memcpy(). * 'ttm-fixes-3.13' of git://people.freedesktop.org/~thomash/linux: drm/ttm: Remove set_need_resched from the ttm fault handler drm/ttm: Don't move non-existing data
2013-11-21Merge branch 'vmwgfx-fixes-3.13' of ↵Dave Airlie10-70/+533
git://people.freedesktop.org/~thomash/linux into drm-fixes Below is a fix for a false lockep warning, and the vmwgfx prime implementation. * 'vmwgfx-fixes-3.13' of git://people.freedesktop.org/~thomash/linux: drm/vmwgfx: Make vmwgfx dma buffers prime aware drm/vmwgfx: Make surfaces prime-aware drm/vmwgfx: Hook up the prime ioctls drm/ttm: Add a minimal prime implementation for ttm base objects drm/vmwgfx: Fix false lockdep warning drm/ttm: Allow execbuf util reserves without ticket
2013-11-21Merge tag 'drm-intel-fixes-2013-11-20' of ↵Dave Airlie9-24/+81
git://people.freedesktop.org/~danvet/drm-intel into drm-fixes Just a small pile of fixes for bugs and a few regressions. I'm still trying to track down a driver load hang on my g33 (which infuriatingly doesn't happen when loading the module manually after boot), somehow bisecting loves to go astray on this one :( And there's a (harmless) locking WARN in the suspend code due to one of Jesse's vlv backlight rework patches. Otherwise nothing outstanding afaik. * tag 'drm-intel-fixes-2013-11-20' of git://people.freedesktop.org/~danvet/drm-intel: drm/i915: Fix gen3 self-refresh watermarks drm/i915: Replicate BIOS eDP bpp clamping hack for hsw drm/i915: Do not enable package C8 on unsupported hardware drm/i915: Hold pc8 lock around toggling pc8.gpu_idle drm/i915: encoder->get_config is no longer optional drm/i915/tv: add ->get_config callback drm/i915: restore the early forcewake cleanup Partially revert "drm/i915: tune the RC6 threshold for stability" drm/i915: flush cursors harder i915: Use 120MHz LVDS SSC clock for gen5/gen6/gen7 x86/early quirk: use gen6 stolen detection for VLV drm/i915/dp: set sink to power down mode on dp disable
2013-11-21Merge branch 'drm-next-3.13' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie29-213/+346
into drm-fixes More fixes for radeon. This adds new queries for tiling on CIK, and fixes a crash in handling acpi atif backlight events on CIK. Some fixes for radeon for 3.13. Mostly CI stability fixes. I think I've tracked down the stability problems with dpm on Trinity/Richland, so I'm going to enable that by default now. * 'drm-next-3.13' of git://people.freedesktop.org/~agd5f/linux: drm/radeon: hook up backlight functions for CI and KV family. drm/radeon/cik: Add macrotile mode array query drm/radeon/cik: Return backend map information to userspace drm/radeon: enable DPM by default in TN asics drm/radeon: adjust TN dpm parameters for stability (v2) drm/radeon: use a single doorbell for cik kms compute drm/radeon/vm: don't attempt to update ptes if ib allocation fails drm/radeon: disable CIK CP semaphores for now drm/radeon: allow semaphore emission to fail drm/radeon: add semaphore trace point radeon: workaround pinning failure on low ram gpu radeon/i2c: do not count reg index in number of i2c byte we are writing. drm/radeon: cypress_dpm: Fix unused variable warning when CONFIG_ACPI=n drm: radeon: ni_dpm: Fix unused variable warning when CONFIG_ACPI=n
2013-11-21ALSA: hda - Fix the headphone jack detection on Sony VAIO TXTakashi Iwai1-0/+1
BIOS sets MISC_NO_PRESENCE bit wrongly to the pin config on NID 0x0f. Cc: <[email protected]> Signed-off-by: Takashi Iwai <[email protected]>
2013-11-21ALSA: hda - Fix missing bass speaker on ASUS N550Takashi Iwai1-0/+16
The laptop has a built-in speaker on NID 0x1a. It's an LFE only on the right channel, so we need to provide an explicit chmap, too. There might be other surround speakers, but they can fixed in addition at later point, so let's fix the easier bass speaker at first. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=65091 Signed-off-by: Takashi Iwai <[email protected]>
2013-11-20iscsi-target: chap auth shouldn't match username with trailing garbageEric Seppanen1-1/+4
In iSCSI negotiations with initiator CHAP enabled, usernames with trailing garbage are permitted, because the string comparison only checks the strlen of the configured username. e.g. "usernameXXXXX" will be permitted to match "username". Just check one more byte so the trailing null char is also matched. Signed-off-by: Eric Seppanen <[email protected]> Cc: <[email protected]> #3.1+ Signed-off-by: Nicholas Bellinger <[email protected]>
2013-11-20iscsi-target: fix extract_param to handle buffer length corner caseEric Seppanen1-1/+1
extract_param() is called with max_length set to the total size of the output buffer. It's not safe to allow a parameter length equal to the buffer size as the terminating null would be written one byte past the end of the output buffer. Signed-off-by: Eric Seppanen <[email protected]> Cc: <[email protected]> #3.1+ Signed-off-by: Nicholas Bellinger <[email protected]>
2013-11-20net/phy: Add the autocross feature for forced links on VSC82x4Madalin Bucur3-7/+73
Add auto-MDI/MDI-X capability for forced (autonegotiation disabled) 10/100 Mbps speeds on Vitesse VSC82x4 PHYs. Exported previously static function genphy_setup_forced() required by the new config_aneg handler in the Vitesse PHY module. Signed-off-by: Madalin Bucur <[email protected]> Signed-off-by: Shruti Kanetkar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-20net/phy: Add VSC8662 supportSandeep Singh1-0/+14
Vitesse VSC8662 is Dual Port 10/100/1000Base-T Phy Its register set and features are similar to other Vitesse Phys. Signed-off-by: Sandeep Singh <[email protected]> Signed-off-by: Andy Fleming <[email protected]> Signed-off-by: Shruti Kanetkar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-20net/phy: Add VSC8574 supportshaohui xie1-1/+16
The VSC8574 is a quad-port Gigabit Ethernet transceiver with four SerDes interfaces for quad-port dual media capability. Signed-off-by: Shaohui Xie <[email protected]> Signed-off-by: Andy Fleming <[email protected]> Signed-off-by: Shruti Kanetkar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-20net/phy: Add VSC8234 supportAndy Fleming1-2/+17
Vitesse VSC8234 is quad port 10/100/1000BASE-T PHY with SGMII and SERDES MAC interfaces. Signed-off-by: Andy Fleming <[email protected]> Signed-off-by: Kumar Gala <[email protected]> Signed-off-by: Shruti Kanetkar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-20net: add BUG_ON if kernel advertises msg_namelen > sizeof(struct ↵Hannes Frederic Sowa1-1/+2
sockaddr_storage) In that case it is probable that kernel code overwrote part of the stack. So we should bail out loudly here. The BUG_ON may be removed in future if we are sure all protocols are conformant. Suggested-by: Eric Dumazet <[email protected]> Signed-off-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-20net: rework recvmsg handler msg_name and msg_namelen logicHannes Frederic Sowa35-115/+67
This patch now always passes msg->msg_namelen as 0. recvmsg handlers must set msg_namelen to the proper size <= sizeof(struct sockaddr_storage) to return msg_name to the user. This prevents numerous uninitialized memory leaks we had in the recvmsg handlers and makes it harder for new code to accidentally leak uninitialized memory. Optimize for the case recvfrom is called with NULL as address. We don't need to copy the address at all, so set it to NULL before invoking the recvmsg handler. We can do so, because all the recvmsg handlers must cope with the case a plain read() is called on them. read() also sets msg_name to NULL. Also document these changes in include/linux/net.h as suggested by David Miller. Changes since RFC: Set msg->msg_name = NULL if user specified a NULL in msg_name but had a non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't affect sendto as it would bail out earlier while trying to copy-in the address. It also more naturally reflects the logic by the callers of verify_iovec. With this change in place I could remove " if (!uaddr || msg_sys->msg_namelen == 0) msg->msg_name = NULL ". This change does not alter the user visible error logic as we ignore msg_namelen as long as msg_name is NULL. Also remove two unnecessary curly brackets in ___sys_recvmsg and change comments to netdev style. Cc: David Miller <[email protected]> Suggested-by: Eric Dumazet <[email protected]> Signed-off-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-20btrfs: update kconfig help textDavid Sterba1-5/+10
Reflect the current status. Portions of the text taken from the wiki pages. Signed-off-by: David Sterba <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-20btrfs: fix bio_size_ok() for max_sectors > 0xffffAkinobu Mita1-1/+1
The data type of max_sectors in queue settings is unsigned int. But this value is stored to the local variable whose type is unsigned short in bio_size_ok(). This can cause unexpected result when max_sectors > 0xffff. Cc: Chris Mason <[email protected]> Cc: [email protected] Signed-off-by: Akinobu Mita <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-20btrfs: Use trace condition for get_extent tracepointSteven Rostedt2-3/+4
Doing an if statement to test some condition to know if we should trigger a tracepoint is pointless when tracing is disabled. This just adds overhead and wastes a branch prediction. This is why the TRACE_EVENT_CONDITION() was created. It places the check inside the jump label so that the branch does not happen unless tracing is enabled. That is, instead of doing: if (em) trace_btrfs_get_extent(root, em); Which is basically this: if (em) if (static_key(trace_btrfs_get_extent)) { Using a TRACE_EVENT_CONDITION() we can just do: trace_btrfs_get_extent(root, em); And the condition trace event will do: if (static_key(trace_btrfs_get_extent)) { if (em) { ... The static key is a non conditional jump (or nop) that is faster than having to check if em is NULL or not. Signed-off-by: Steven Rostedt <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-20btrfs: fix typo in the log messageAnand Jain1-1/+1
Signed-off-by: Anand Jain <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-20Btrfs: fix list delete warning when removing ordered root from the listMiao Xie1-0/+1
Commit b02441999efcc6152b87cd58e7970bb7843f76cf "Btrfs: don't wait for the completion of all the ordered extents" introduced a bug that broke the ordered root list: WARNING: CPU: 1 PID: 7119 at lib/list_debug.c:59 __list_del_entry+0x5a/0x98() It is because we forgot to return the roots in the splice list to the ordered list of the fs. Fix it. Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-20Btrfs: print bytenr instead of page pointer in check-intStefan Behrens1-8/+17
The page pointer information was useless. The bytenr is what you want when you search for submitted write bios. Additionally, a new bit in the print mask is added that allows to selectively enable the check-int submit_bio verbose mode. Before, the global verbose mode had to be enabled leading to many million useless lines in the kernel log. And a comment is added that explains that LOG_BUF_SHIFT needs to be set to a really high value. Signed-off-by: Stefan Behrens <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-20Btrfs: remove dead codes from ctree.hWang Shilong1-6/+0
These two functions are only stated but undefined. Signed-off-by: Wang Shilong <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-20Btrfs: don't wait for ordered data outside desired rangeFilipe David Borba Manana1-1/+1
In btrfs_wait_ordered_range(), if we found an extent to the left of the start of our desired wait range and the last byte of that extent is 1 less than the desired range's start, we would would wait for the IO completion of that extent unnecessarily. Signed-off-by: Filipe David Borba Manana <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-20Btrfs: fix lockdep error in async commitLiu Bo1-2/+2
Lockdep complains about btrfs's async commit: [ 2372.462171] [ BUG: bad unlock balance detected! ] [ 2372.462191] 3.12.0+ #32 Tainted: G W [ 2372.462209] ------------------------------------- [ 2372.462228] ceph-osd/14048 is trying to release lock (sb_internal) at: [ 2372.462275] [<ffffffffa022cb10>] btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs] [ 2372.462305] but there are no more locks to release! [ 2372.462324] [ 2372.462324] other info that might help us debug this: [ 2372.462349] no locks held by ceph-osd/14048. [ 2372.462367] [ 2372.462367] stack backtrace: [ 2372.462386] CPU: 2 PID: 14048 Comm: ceph-osd Tainted: G W 3.12.0+ #32 [ 2372.462414] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080015 11/09/2011 [ 2372.462455] ffffffffa022cb10 ffff88007490fd28 ffffffff816f094a ffff8800378aa320 [ 2372.462491] ffff88007490fd50 ffffffff810adf4c ffff8800378aa320 ffff88009af97650 [ 2372.462526] ffffffffa022cb10 ffff88007490fd88 ffffffff810b01ee ffff8800898c0000 [ 2372.462562] Call Trace: [ 2372.462584] [<ffffffffa022cb10>] ? btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs] [ 2372.462619] [<ffffffff816f094a>] dump_stack+0x45/0x56 [ 2372.462642] [<ffffffff810adf4c>] print_unlock_imbalance_bug+0xec/0x100 [ 2372.462677] [<ffffffffa022cb10>] ? btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs] [ 2372.462710] [<ffffffff810b01ee>] lock_release+0x18e/0x210 [ 2372.462742] [<ffffffffa022cb36>] btrfs_commit_transaction_async+0x1d6/0x2a0 [btrfs] [ 2372.462783] [<ffffffffa025a7ce>] btrfs_ioctl_start_sync+0x3e/0xc0 [btrfs] [ 2372.462822] [<ffffffffa025f1d3>] btrfs_ioctl+0x4c3/0x1f70 [btrfs] [ 2372.462849] [<ffffffff812c0321>] ? avc_has_perm+0x121/0x1b0 [ 2372.462873] [<ffffffff812c0224>] ? avc_has_perm+0x24/0x1b0 [ 2372.462897] [<ffffffff8107ecc8>] ? sched_clock_cpu+0xa8/0x100 [ 2372.462922] [<ffffffff8117b145>] do_vfs_ioctl+0x2e5/0x4e0 [ 2372.462946] [<ffffffff812c19e6>] ? file_has_perm+0x86/0xa0 [ 2372.462969] [<ffffffff8117b3c1>] SyS_ioctl+0x81/0xa0 [ 2372.462991] [<ffffffff817045a4>] tracesys+0xdd/0xe2 ==================================================== It's because that we don't do the right thing when checking if it's ok to tell lockdep that we're trying to release the rwsem. If the trans handle's type is TRANS_ATTACH, we won't acquire the freeze rwsem, but as TRANS_ATTACH fits the check (trans < TRANS_JOIN_NOLOCK), we'll release the freeze rwsem, which makes lockdep complains a lot. Reported-by: Ma Jianpeng <[email protected]> Signed-off-by: Liu Bo <[email protected]> Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-20Btrfs: avoid heavy operations in btrfs_commit_superLiu Bo1-20/+1
The 'git blame' history shows that, the old transaction commit code has to do twice to ensure roots are updated and we have to flush metadata and super block manually, however, right now all of these can be handled well inside the transaction commit code without extra efforts. And the error handling part remains same with the current code, -- 'return to caller once we get error'. This saves us a transaction commit and a flush of super block, which are both heavy operations according to ftrace output analysis. Signed-off-by: Liu Bo <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-20Btrfs: fix __btrfs_start_workers retvalIlya Dryomov1-0/+1
__btrfs_start_workers returns 0 in case it raced with btrfs_stop_workers and lost the race. This is wrong because worker in this case is not allowed to start and is in fact destroyed. Return -EINVAL instead. Signed-off-by: Ilya Dryomov <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-20Btrfs: disable online raid-repair on ro mountsIlya Dryomov1-3/+8
This disables the "if needed, write the good copy back before the read is completed" part of the read sequence for read-only mounts. Cc: Jan Schmidt <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-20Btrfs: do not inc uncorrectable_errors counter on ro scrubsIlya Dryomov1-2/+4
Currently if we discover an error when scrubbing in ro mode we a) blindly increment the uncorrectable_errors counter, and b) spam the dmesg with the 'unable to fixup (regular) error at ...' message, even though a) we haven't tried to determine if the error is correctable or not, and b) we haven't tried to fixup anything. Fix this. Cc: Stefan Behrens <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-20Btrfs: only drop modified extents if we logged the whole inodeJosef Bacik1-1/+1
If we fsync, seek and write, rename and then fsync again we will lose the modified hole extent because the rename will drop all of the modified extents since we didn't do the fast search. We need to only drop the modified extents if we didn't do the fast search and we were logging the entire inode as we don't need them anymore, otherwise this is being premature. Thanks, Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>