aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2014-10-30ixgbe: need not repeat init skb with NULLJunwei Zhang1-1/+1
Signed-off-by: Martin Zhang <[email protected]> Tested-by: Phil Schmitt <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2014-10-30igb: don't reuse pages with pfmemalloc flagRoman Gushchin1-1/+5
Incoming packet is dropped silently by sk_filter(), if the skb was allocated from pfmemalloc reserves and the corresponding socket is not marked with the SOCK_MEMALLOC flag. Igb driver allocates pages for DMA with __skb_alloc_page(), which calls alloc_pages_node() with the __GFP_MEMALLOC flag. So, in case of OOM condition, igb can get pages with pfmemalloc flag set. If an incoming packet hits the pfmemalloc page and is large enough (small packets are copying into the memory, allocated with netdev_alloc_skb_ip_align(), so they are not affected), it will be dropped. This behavior is ok under high memory pressure, but the problem is that the igb driver reuses these mapped pages. So, packets are still dropping even if all memory issues are gone and there is a plenty of free memory. In my case, some TCP sessions hang on a small percentage (< 0.1%) of machines days after OOMs. Fix this by avoiding reuse of such pages. Signed-off-by: Roman Gushchin <[email protected]> Tested-by: Aaron Brown "[email protected]" Signed-off-by: Jeff Kirsher <[email protected]>
2014-10-30e1000: unset IFF_UNICAST_FLT on WMware 82545EMFrancesco Ruggeri1-1/+4
VMWare's e1000 implementation does not seem to support unicast filtering. This can be observed by configuring a macvlan interface on eth0 in a VM in VMWare Fusion 5.0.5, and trying to use that interface instead of eth0. Tested on 3.16. Signed-off-by: Francesco Ruggeri <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2014-10-30[media] rc5-decoder: BZ#85721: Fix RC5-SZ decodingMauro Carvalho Chehab1-1/+1
Changeset e87b540be2dd broke RC5-SZ decoding, as it forgot to add the extra bit check for the enabled protocols at the beginning of the logic. Signed-off-by: Mauro Carvalho Chehab <[email protected]> Acked-by: David Härdeman <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]>
2014-10-30[media] rc-core: fix protocol_change regression in ir_raw_event_registerTomas Melin2-1/+2
IR receiver using nuvoton-cir and lirc required additional configuration steps after upgrade from kernel 3.16 to 3.17-rcX. Bisected regression to commit da6e162d6a4607362f8478c715c797d84d449f8b ("[media] rc-core: simplify sysfs code"). The regression comes from adding function change_protocol in ir-raw.c. It changes behaviour so that only the protocol enabled by driver's map_name will be active after registration. This breaks user space behaviour, lirc does not get key press signals anymore. Enable lirc protocol by default for ir raw decoders to restore original behaviour. Cc: [email protected] # for v3.17 Signed-off-by: Tomas Melin <[email protected]> Acked-by: David Härdeman <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]>
2014-10-30rbd: Fix error recovery in rbd_obj_read_sync()Jan Kara1-1/+1
When we fail to allocate page vector in rbd_obj_read_sync() we just basically ignore the problem and continue which will result in an oops later. Fix the problem by returning proper error. CC: Yehuda Sadeh <[email protected]> CC: Sage Weil <[email protected]> CC: [email protected] CC: [email protected] Coverity-id: 1226882 Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2014-10-30libceph: use memalloc flags for net IOMike Christie1-1/+9
This patch has ceph's lib code use the memalloc flags. If the VM layer needs to write data out to free up memory to handle new allocation requests, the block layer must be able to make forward progress. To handle that requirement we use structs like mempools to reserve memory for objects like bios and requests. The problem is when we send/receive block layer requests over the network layer, net skb allocations can fail and the system can lock up. To solve this, the memalloc related flags were added. NBD, iSCSI and NFS uses these flags to tell the network/vm layer that it should use memory reserves to fullfill allcation requests for structs like skbs. I am running ceph in a bunch of VMs in my laptop, so this patch was not tested very harshly. Signed-off-by: Mike Christie <[email protected]> Reviewed-by: Ilya Dryomov <[email protected]>
2014-10-30rbd: use a single workqueue for all devicesIlya Dryomov1-15/+18
Using one queue per device doesn't make much sense given that our workfn processes "devices" and not "requests". Switch to a single workqueue for all devices. Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2014-10-30ALSA: hda/realtek - Update Initial AMP for EAPD controlKailang Yang1-24/+10
The default EAPD control uses verb command to control EAPD. Some codec does not have verb command for EAPD. It needs to control by hidden register. This update will avoid wrong behavior for some codec. This patch will fix double setup for EAPD. It just needs to turn on by one site for verb command or hidden register controlled. Detailed changes: - alc889_coef_init() is replaced with alc_update_coef_idx() with a correct COEF value. - for ALC262, ALC887 and ALC900, the EAPD setup via the hidden register is removed because this rather conflicts with the EAPD verb setup. - For ALC888-VC, also the hidden register access is removed in alc888_coef_init(). - Remove the dead #if 0 code for ALC267/ALC268. Signed-off-by: Kailang Yang <[email protected]> Cc: <[email protected]> Signed-off-by: Takashi Iwai <[email protected]>
2014-10-30ALSA: hda - change three SSID quirks to one pin quirkDavid Henningsson1-3/+13
These three HP machines all have the same pin config, so we can change it to a pin quirk. Signed-off-by: David Henningsson <[email protected]> Signed-off-by: Takashi Iwai <[email protected]>
2014-10-30ALSA: hda - Set GPIO 4 low for a few HP machinesDavid Henningsson1-4/+29
These HP machines needs GPIO 4 low to enable the headphone amplifier. In addition, we still need to control LEDs via vref and GPIO. Cc: [email protected] BugLink: https://bugs.launchpad.net/bugs/1387128 Tested-by: TienFu Chen <[email protected]> Signed-off-by: David Henningsson <[email protected]> Signed-off-by: Takashi Iwai <[email protected]>
2014-10-30ALSA: hda - Add ultra dock support for Thinkpad X240.Lukas Bossard1-1/+1
Adding ultra doch support for Lenovo Thinkpad X240 (17aa:2214). [Actually replaced the entry with ALC292_FIXUP_TPT440_DOCK -- tiwai] Signed-off-by: Lukas Bossard <[email protected]> Cc: <[email protected]> Signed-off-by: Takashi Iwai <[email protected]>
2014-10-30Merge branch 'urgent-for-mingo' of ↵Ingo Molnar5-15/+66
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent Pull two RCU fixes from Paul E. McKenney: " - Complete the work of commit dd56af42bd82 (rcu: Eliminate deadlock between CPU hotplug and expedited grace periods), which was intended to allow synchronize_sched_expedited() to be safely used when holding locks acquired by CPU-hotplug notifiers. This commit makes the put_online_cpus() avoid the deadlock instead of just handling the get_online_cpus(). - Complete the work of commit 35ce7f29a44a (rcu: Create rcuo kthreads only for onlined CPUs), which was intended to allow RCU to avoid allocating unneeded kthreads on systems where the firmware says that there are more CPUs than are really present. This commit makes rcu_barrier() aware of the mismatch, so that it doesn't hang waiting for non-existent CPUs. " Signed-off-by: Ingo Molnar <[email protected]>
2014-10-30Merge tag 'perf-urgent-for-mingo' of ↵Ingo Molnar7-48/+131
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes from Arnaldo Carvalho de Melo: - Fix report -F (abort, in_tx, mispredict, etc) segfaults for sample.data files without branch info (Jiri Olsa) - Add patch that should have went in a previous patchkit to use global cache provided by libunwind (Namhyung Kim) - Make CPUINFO_PROC an array to support different kernels, problem detected when the information reported via /proc/cpuinfo changed on ARM (Wang Nan) - 'perf probe' --demangle typo fix and a new --quiet option (Masami Hiramatsu) Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2014-10-30powerpc/fadump: Fix endianess issues in firmware assisted dump handlingHari Bathini3-85/+95
Firmware-assisted dump (fadump) kernel code is not endian safe. The below patch fixes this issue. Tested this patch with upstream kernel. Below output shows crash tool successfully opening LE fadump vmcore. # crash vmlinux vmcore GNU gdb (GDB) 7.6 This GDB was configured as "powerpc64le-unknown-linux-gnu"... KERNEL: vmlinux DUMPFILE: vmcore CPUS: 16 DATE: Wed Dec 31 19:00:00 1969 UPTIME: 00:03:28 LOAD AVERAGE: 0.46, 0.86, 0.41 TASKS: 268 NODENAME: linux-dhr2 RELEASE: 3.17.0-rc5-7-default VERSION: #6 SMP Tue Sep 30 01:06:34 EDT 2014 MACHINE: ppc64le (4116 Mhz) MEMORY: 40 GB PANIC: "Oops: Kernel access of bad area, sig: 11 [#1]" (check log for details) PID: 6223 COMMAND: "bash" TASK: c0000009661b2500 [THREAD_INFO: c000000967ac0000] CPU: 2 STATE: TASK_RUNNING (PANIC) Signed-off-by: Hari Bathini <[email protected]> [mpe: Make the comment in pSeries_lpar_hptab_clear() clearer] Signed-off-by: Michael Ellerman <[email protected]>
2014-10-29mtd: cfi_cmdset_0001.c: fix resume for LH28F640BF chipsDmitry Eremin-Solenikov1-0/+2
After '#echo mem > /sys/power/state' some devices can not be properly resumed because apparently the MTD Partition Configuration Register has been reset to default thus the rootfs cannot be mounted cleanly on resume. An example of this can be found in the SA-1100 Developer's Manual at 9.5.3.3 where the second step of the Sleep Shutdown Sequence is described: "An internal reset is applied to the SA-1100. All units are reset...". As workaround we refresh the PCR value as done initially on chip setup. This behavior and the fix are confirmed by our tests done on 2 different Zaurus collie units with kernel 3.17. Fixes: 812c5fa82bae: ("mtd: cfi_cmdset_0001.c: add support for Sharp LH28F640BF NOR") Signed-off-by: Dmitry Eremin-Solenikov <[email protected]> Signed-off-by: Andrea Adami <[email protected]> Cc: <[email protected]> # 3.16+ Signed-off-by: Brian Norris <[email protected]>
2014-10-29mtd: omap: fix mtd devices not showing upFrans Klaver1-1/+1
Since commit 6d178ef2fd5e ("mtd: nand: Move ELM driver and rename as omap_elm"), I don't have any mtd devices present on my am335x. This changes the link order of the omap_elm and omap2 objects, causing them to probe in the wrong order. To fix this, make elm_config defer probing until the omap_elm driver is actually loaded. Signed-off-by: Frans Klaver <[email protected]> Acked-by: Roger Quadros <[email protected]> Signed-off-by: Brian Norris <[email protected]>
2014-10-30ACPI / blacklist: blacklist Win8 OSI for Dell Vostro 3546Adam Lee1-0/+8
The wireless hotkey of Dell Vostro 3546 does not work with Win8 OSI. Due to insufficient documentation for the driver implementation, blacklist it as a workaround. Signed-off-by: Adam Lee <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2014-10-30powerpc: Fix section mismatch warningFabian Frederick1-1/+1
Add __init to MMU_setup() which uses __initdata boot_command_line. Also MMU_setup() is only called from MMU_init(), which is also __init. Warning appeared since commit 3e47d1474c2b. Fixes: 3e47d1474c2b ("powerpc: Remove powerpc specific cmd_line") Signed-off-by: Fabian Frederick <[email protected]> [mpe: Update changelog] Signed-off-by: Michael Ellerman <[email protected]>
2014-10-29Merge branch 'akpm' (incoming from Andrew Morton)Linus Torvalds27-349/+394
Merge misc fixes from Andrew Morton: "21 fixes" * emailed patches from Andrew Morton <[email protected]>: (21 commits) mm/balloon_compaction: fix deflation when compaction is disabled sh: fix sh770x SCIF memory regions zram: avoid NULL pointer access in concurrent situation mm/slab_common: don't check for duplicate cache names ocfs2: fix d_splice_alias() return code checking mm: rmap: split out page_remove_file_rmap() mm: memcontrol: fix missed end-writeback page accounting mm: page-writeback: inline account_page_dirtied() into single caller lib/bitmap.c: fix undefined shift in __bitmap_shift_{left|right}() drivers/rtc/rtc-bq32k.c: fix register value memory-hotplug: clear pgdat which is allocated by bootmem in try_offline_node() drivers/rtc/rtc-s3c.c: fix initialization failure without rtc source clock kernel/kmod: fix use-after-free of the sub_info structure drivers/rtc/rtc-pm8xxx.c: rework to support pm8941 rtc mm, thp: fix collapsing of hugepages on madvise drivers: of: add return value to of_reserved_mem_device_init() mm: free compound page with correct order gcov: add ARM64 to GCOV_PROFILE_ALL fsnotify: next_i is freed during fsnotify_unmount_inodes. mm/compaction.c: avoid premature range skip in isolate_migratepages_range ...
2014-10-30xfs: rework zero range to prevent invalid i_size updatesBrian Foster1-52/+20
The zero range operation is analogous to fallocate with the exception of converting the range to zeroes. E.g., it attempts to allocate zeroed blocks over the range specified by the caller. The XFS implementation kills all delalloc blocks currently over the aligned range, converts the range to allocated zero blocks (unwritten extents) and handles the partial pages at the ends of the range by sending writes through the pagecache. The current implementation suffers from several problems associated with inode size. If the aligned range covers an extending I/O, said I/O is discarded and an inode size update from a previous write never makes it to disk. Further, if an unaligned zero range extends beyond eof, the page write induced for the partial end page can itself increase the inode size, even if the zero range request is not supposed to update i_size (via KEEP_SIZE, similar to an fallocate beyond EOF). The latter behavior not only incorrectly increases the inode size, but can lead to stray delalloc blocks on the inode. Typically, post-eof preallocation blocks are either truncated on release or inode eviction or explicitly written to by xfs_zero_eof() on natural file size extension. If the inode size increases due to zero range, however, associated blocks leak into the address space having never been converted or mapped to pagecache pages. A direct I/O to such an uncovered range cannot convert the extent via writeback and will BUG(). For example: $ xfs_io -fc "pwrite 0 128k" -c "fzero -k 1m 54321" <file> ... $ xfs_io -d -c "pread 128k 128k" <file> <BUG> If the entire delalloc extent happens to not have page coverage whatsoever (e.g., delalloc conversion couldn't find a large enough free space extent), even a full file writeback won't convert what's left of the extent and we'll assert on inode eviction. Rework xfs_zero_file_space() to avoid buffered I/O for partial pages. Use the existing hole punch and prealloc mechanisms as primitives for zero range. This implementation is not efficient nor ideal as we writeback dirty data over the range and remove existing extents rather than convert to unwrittern. The former writeback, however, is currently the only mechanism available to ensure consistency between pagecache and extent state. Even a pagecache truncate/delalloc punch prior to hole punch has lead to inconsistencies due to racing with writeback. This provides a consistent, correct implementation of zero range that survives fsstress/fsx testing without assert failures. The implementation can be optimized from this point forward once the fundamental issue of pagecache and delalloc extent state consistency is addressed. Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-10-30mm: Remove false WARN_ON from pagecache_isize_extended()Jan Kara1-1/+0
The WARN_ON checking whether i_mutex is held in pagecache_isize_extended() was wrong because some filesystems (e.g. XFS) use different locks for serialization of truncates / writes. So just remove the check. Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-10-30xfs: Check error during inode btree iteration in xfs_bulkstat()Jan Kara1-0/+4
xfs_bulkstat() doesn't check error return from xfs_btree_increment(). In case of specific fs corruption that could result in xfs_bulkstat() entering an infinite loop because we would be looping over the same chunk over and over again. Fix the problem by checking the return value and terminating the loop properly. Coverity-id: 1231338 cc: <[email protected]> Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Jie Liu <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-10-29mm/balloon_compaction: fix deflation when compaction is disabledKonstantin Khlebnikov1-0/+2
If CONFIG_BALLOON_COMPACTION=n balloon_page_insert() does not link pages with balloon and doesn't set PagePrivate flag, as a result balloon_page_dequeue() cannot get any pages because it thinks that all of them are isolated. Without balloon compaction nobody can isolate ballooned pages. It's safe to remove this check. Fixes: d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management"). Signed-off-by: Konstantin Khlebnikov <[email protected]> Reported-by: Matt Mullins <[email protected]> Cc: <[email protected]> [3.17] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-29sh: fix sh770x SCIF memory regionsAndriy Skulysh1-3/+3
Resources scif1_resources & scif2_resources overlap. Actual SCIF region size is 0x10. This is regression from commit d850acf975be ("sh: Declare SCIF register base and IRQ as resources") Signed-off-by: Andriy Skulysh <[email protected]> Acked-by: Laurent Pinchart <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-29zram: avoid NULL pointer access in concurrent situationWeijie Yang1-4/+6
There is a rare NULL pointer bug in mem_used_total_show() and mem_used_max_store() in concurrent situation, like this: zram is not initialized, process A is a mem_used_total reader which runs periodically, while process B try to init zram. process A process B access meta, get a NULL value init zram, done init_done() is true access meta->mem_pool, get a NULL pointer BUG This patch fixes this issue. Signed-off-by: Weijie Yang <[email protected]> Acked-by: Minchan Kim <[email protected]> Acked-by: Sergey Senozhatsky <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-29mm/slab_common: don't check for duplicate cache namesMikulas Patocka1-10/+0
The SLUB cache merges caches with the same size and alignment and there was long standing bug with this behavior: - create the cache named "foo" - create the cache named "bar" (which is merged with "foo") - delete the cache named "foo" (but it stays allocated because "bar" uses it) - create the cache named "foo" again - it fails because the name "foo" is already used That bug was fixed in commit 694617474e33 ("slab_common: fix the check for duplicate slab names") by not warning on duplicate cache names when the SLUB subsystem is used. Recently, cache merging was implemented the with SLAB subsystem too, in 12220dea07f1 ("mm/slab: support slab merge")). Therefore we need stop checking for duplicate names even for the SLAB subsystem. This patch fixes the bug by removing the check. Signed-off-by: Mikulas Patocka <[email protected]> Acked-by: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-29ocfs2: fix d_splice_alias() return code checkingRichard Weinberger1-1/+1
d_splice_alias() can return a valid dentry, NULL or an ERR_PTR. Currently the code checks not for ERR_PTR and will cuase an oops in ocfs2_dentry_attach_lock(). Fix this by using IS_ERR_OR_NULL(). Signed-off-by: Richard Weinberger <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-29mm: rmap: split out page_remove_file_rmap()Johannes Weiner1-32/+46
page_remove_rmap() has too many branches on PageAnon() and is hard to follow. Move the file part into a separate function. Signed-off-by: Johannes Weiner <[email protected]> Reviewed-by: Michal Hocko <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-29mm: memcontrol: fix missed end-writeback page accountingJohannes Weiner4-109/+96
Commit 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API") changed page migration to uncharge the old page right away. The page is locked, unmapped, truncated, and off the LRU, but it could race with writeback ending, which then doesn't unaccount the page properly: test_clear_page_writeback() migration wait_on_page_writeback() TestClearPageWriteback() mem_cgroup_migrate() clear PCG_USED mem_cgroup_update_page_stat() if (PageCgroupUsed(pc)) decrease memcg pages under writeback release pc->mem_cgroup->move_lock The per-page statistics interface is heavily optimized to avoid a function call and a lookup_page_cgroup() in the file unmap fast path, which means it doesn't verify whether a page is still charged before clearing PageWriteback() and it has to do it in the stat update later. Rework it so that it looks up the page's memcg once at the beginning of the transaction and then uses it throughout. The charge will be verified before clearing PageWriteback() and migration can't uncharge the page as long as that is still set. The RCU lock will protect the memcg past uncharge. As far as losing the optimization goes, the following test results are from a microbenchmark that maps, faults, and unmaps a 4GB sparse file three times in a nested fashion, so that there are two negative passes that don't account but still go through the new transaction overhead. There is no actual difference: old: 33.195102545 seconds time elapsed ( +- 0.01% ) new: 33.199231369 seconds time elapsed ( +- 0.03% ) The time spent in page_remove_rmap()'s callees still adds up to the same, but the time spent in the function itself seems reduced: # Children Self Command Shared Object Symbol old: 0.12% 0.11% filemapstress [kernel.kallsyms] [k] page_remove_rmap new: 0.12% 0.08% filemapstress [kernel.kallsyms] [k] page_remove_rmap Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: <[email protected]> [3.17.x] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-29mm: page-writeback: inline account_page_dirtied() into single callerJohannes Weiner2-20/+4
A follow-up patch would have changed the call signature. To save the trouble, just fold it instead. Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: <[email protected]> [3.17.x] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-29lib/bitmap.c: fix undefined shift in __bitmap_shift_{left|right}()Jan Kara1-2/+6
If __bitmap_shift_left() or __bitmap_shift_right() are asked to shift by a multiple of BITS_PER_LONG, they will try to shift a long value by BITS_PER_LONG bits which is undefined. Change the functions to avoid the undefined shift. Coverity id: 1192175 Coverity id: 1192174 Signed-off-by: Jan Kara <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-29drivers/rtc/rtc-bq32k.c: fix register valuePavel Machek1-1/+1
Fix register value in bq32000 trickle charging. Mike reported that I'm using wrong value in one trickle-charging case, and after checking docs, I must admit he's right. Signed-off-by: Pavel Machek <[email protected]> Reported-by: Mike Bremford <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-29memory-hotplug: clear pgdat which is allocated by bootmem in try_offline_node()Yasuaki Ishimatsu1-5/+0
When hot adding the same memory after hot removal, the following messages are shown: WARNING: CPU: 20 PID: 6 at mm/page_alloc.c:4968 free_area_init_node+0x3fe/0x426() ... Call Trace: dump_stack+0x46/0x58 warn_slowpath_common+0x81/0xa0 warn_slowpath_null+0x1a/0x20 free_area_init_node+0x3fe/0x426 hotadd_new_pgdat+0x90/0x110 add_memory+0xd4/0x200 acpi_memory_device_add+0x1aa/0x289 acpi_bus_attach+0xfd/0x204 acpi_bus_attach+0x178/0x204 acpi_bus_scan+0x6a/0x90 acpi_device_hotplug+0xe8/0x418 acpi_hotplug_work_fn+0x1f/0x2b process_one_work+0x14e/0x3f0 worker_thread+0x11b/0x510 kthread+0xe1/0x100 ret_from_fork+0x7c/0xb0 The detaled explanation is as follows: When hot removing memory, pgdat is set to 0 in try_offline_node(). But if the pgdat is allocated by bootmem allocator, the clearing step is skipped. And when hot adding the same memory, the uninitialized pgdat is reused. But free_area_init_node() checks wether pgdat is set to zero. As a result, free_area_init_node() hits WARN_ON(). This patch clears pgdat which is allocated by bootmem allocator in try_offline_node(). Signed-off-by: Yasuaki Ishimatsu <[email protected]> Cc: Zhang Zhen <[email protected]> Cc: Wang Nan <[email protected]> Cc: Tang Chen <[email protected]> Reviewed-by: Toshi Kani <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-29drivers/rtc/rtc-s3c.c: fix initialization failure without rtc source clockMarek Szyprowski1-6/+8
Fix unconditional initialization failure on non-exynos3250 SoCs. Commit df9e26d093d3 ("rtc: s3c: add support for RTC of Exynos3250 SoC") introduced rtc source clock support, but also added initialization failure on SoCs, which doesn't need such clock. Signed-off-by: Marek Szyprowski <[email protected]> Reviewed-by: Chanwoo Choi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-29kernel/kmod: fix use-after-free of the sub_info structureMartin Schwidefsky1-39/+37
Found this in the message log on a s390 system: BUG kmalloc-192 (Not tainted): Poison overwritten Disabling lock debugging due to kernel taint INFO: 0x00000000684761f4-0x00000000684761f7. First byte 0xff instead of 0x6b INFO: Allocated in call_usermodehelper_setup+0x70/0x128 age=71 cpu=2 pid=648 __slab_alloc.isra.47.constprop.56+0x5f6/0x658 kmem_cache_alloc_trace+0x106/0x408 call_usermodehelper_setup+0x70/0x128 call_usermodehelper+0x62/0x90 cgroup_release_agent+0x178/0x1c0 process_one_work+0x36e/0x680 worker_thread+0x2f0/0x4f8 kthread+0x10a/0x120 kernel_thread_starter+0x6/0xc kernel_thread_starter+0x0/0xc INFO: Freed in call_usermodehelper_exec+0x110/0x1b8 age=71 cpu=2 pid=648 __slab_free+0x94/0x560 kfree+0x364/0x3e0 call_usermodehelper_exec+0x110/0x1b8 cgroup_release_agent+0x178/0x1c0 process_one_work+0x36e/0x680 worker_thread+0x2f0/0x4f8 kthread+0x10a/0x120 kernel_thread_starter+0x6/0xc kernel_thread_starter+0x0/0xc There is a use-after-free bug on the subprocess_info structure allocated by the user mode helper. In case do_execve() returns with an error ____call_usermodehelper() stores the error code to sub_info->retval, but sub_info can already have been freed. Regarding UMH_NO_WAIT, the sub_info structure can be freed by __call_usermodehelper() before the worker thread returns from do_execve(), allowing memory corruption when do_execve() failed after exec_mmap() is called. Regarding UMH_WAIT_EXEC, the call to umh_complete() allows call_usermodehelper_exec() to continue which then frees sub_info. To fix this race the code needs to make sure that the call to call_usermodehelper_freeinfo() is always done after the last store to sub_info->retval. Signed-off-by: Martin Schwidefsky <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Cc: Tetsuo Handa <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-29drivers/rtc/rtc-pm8xxx.c: rework to support pm8941 rtcStanimir Varbanov2-89/+135
Adds support for RTC device inside PM8941 PMIC. The RTC in this PMIC have two register spaces. Thus the rtc-pm8xxx is slightly reworked to reflect these differences. The register set for different PMIC chips are selected on DT compatible string base. [[email protected]: coding-style fixes] [[email protected]: simplify and fix locking in pm8xxx_rtc_set_time()] Signed-off-by: Stanimir Varbanov <[email protected]> Cc: Alessandro Zummo <[email protected]> Cc: Stephen Boyd <[email protected]> Cc: Josh Cartwright <[email protected]> Cc: Stanimir Varbanov <[email protected]> Cc: Dan Carpenter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-29mm, thp: fix collapsing of hugepages on madviseDavid Rientjes3-16/+20
If an anonymous mapping is not allowed to fault thp memory and then madvise(MADV_HUGEPAGE) is used after fault, khugepaged will never collapse this memory into thp memory. This occurs because the madvise(2) handler for thp, hugepage_madvise(), clears VM_NOHUGEPAGE on the stack and it isn't stored in vma->vm_flags until the final action of madvise_behavior(). This causes the khugepaged_enter_vma_merge() to be a no-op in hugepage_madvise() when the vma had previously had VM_NOHUGEPAGE set. Fix this by passing the correct vma flags to the khugepaged mm slot handler. There's no chance khugepaged can run on this vma until after madvise_behavior() returns since we hold mm->mmap_sem. It would be possible to clear VM_NOHUGEPAGE directly from vma->vm_flags in hugepage_advise(), but I didn't want to introduce special case behavior into madvise_behavior(). I think it's best to just let it always set vma->vm_flags itself. Signed-off-by: David Rientjes <[email protected]> Reported-by: Suleiman Souhlal <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-29drivers: of: add return value to of_reserved_mem_device_init()Marek Szyprowski3-9/+17
Driver calling of_reserved_mem_device_init() might be interested if the initialization has been successful or not, so add support for returning error code. This fixes a build warining caused by commit 7bfa5ab6fa1b ("drivers: dma-coherent: add initialization from device tree"), which has been merged without this change and without fixing function return value. Fixes: 7bfa5ab6fa1b1 ("drivers: dma-coherent: add initialization from device tree") Signed-off-by: Marek Szyprowski <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Cc: Michal Nazarewicz <[email protected]> Cc: Grant Likely <[email protected]> Cc: Laura Abbott <[email protected]> Cc: Josh Cartwright <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Kyungmin Park <[email protected]> Cc: Russell King <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-29mm: free compound page with correct orderYu Zhao1-2/+2
Compound page should be freed by put_page() or free_pages() with correct order. Not doing so will cause tail pages leaked. The compound order can be obtained by compound_order() or use HPAGE_PMD_ORDER in our case. Some people would argue the latter is faster but I prefer the former which is more general. This bug was observed not just on our servers (the worst case we saw is 11G leaked on a 48G machine) but also on our workstations running Ubuntu based distro. $ cat /proc/vmstat | grep thp_zero_page_alloc thp_zero_page_alloc 55 thp_zero_page_alloc_failed 0 This means there is (thp_zero_page_alloc - 1) * (2M - 4K) memory leaked. Fixes: 97ae17497e99 ("thp: implement refcounting for huge zero page") Signed-off-by: Yu Zhao <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Mel Gorman <[email protected]> Cc: David Rientjes <[email protected]> Cc: Bob Liu <[email protected]> Cc: <[email protected]> [3.8+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-29gcov: add ARM64 to GCOV_PROFILE_ALLRiku Voipio1-1/+1
Following up the arm testing of gcov, turns out gcov on ARM64 works fine as well. Only change needed is adding ARM64 to Kconfig depends. Tested with qemu and mach-virt Signed-off-by: Riku Voipio <[email protected]> Acked-by: Peter Oberparleiter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-29fsnotify: next_i is freed during fsnotify_unmount_inodes.Jerry Hoemann1-6/+11
During file system stress testing on 3.10 and 3.12 based kernels, the umount command occasionally hung in fsnotify_unmount_inodes in the section of code: spin_lock(&inode->i_lock); if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) { spin_unlock(&inode->i_lock); continue; } As this section of code holds the global inode_sb_list_lock, eventually the system hangs trying to acquire the lock. Multiple crash dumps showed: The inode->i_state == 0x60 and i_count == 0 and i_sb_list would point back at itself. As this is not the value of list upon entry to the function, the kernel never exits the loop. To help narrow down problem, the call to list_del_init in inode_sb_list_del was changed to list_del. This poisons the pointers in the i_sb_list and causes a kernel to panic if it transverse a freed inode. Subsequent stress testing paniced in fsnotify_unmount_inodes at the bottom of the list_for_each_entry_safe loop showing next_i had become free. We believe the root cause of the problem is that next_i is being freed during the window of time that the list_for_each_entry_safe loop temporarily releases inode_sb_list_lock to call fsnotify and fsnotify_inode_delete. The code in fsnotify_unmount_inodes attempts to prevent the freeing of inode and next_i by calling __iget. However, the code doesn't do the __iget call on next_i if i_count == 0 or if i_state & (I_FREEING | I_WILL_FREE) The patch addresses this issue by advancing next_i in the above two cases until we either find a next_i which we can __iget or we reach the end of the list. This makes the handling of next_i more closely match the handling of the variable "inode." The time to reproduce the hang is highly variable (from hours to days.) We ran the stress test on a 3.10 kernel with the proposed patch for a week without failure. During list_for_each_entry_safe, next_i is becoming free causing the loop to never terminate. Advance next_i in those cases where __iget is not done. Signed-off-by: Jerry Hoemann <[email protected]> Cc: Jeff Kirsher <[email protected]> Cc: Ken Helias <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-29mm/compaction.c: avoid premature range skip in isolate_migratepages_rangeJoonsoo Kim1-0/+3
Commit edc2ca612496 ("mm, compaction: move pageblock checks up from isolate_migratepages_range()") commonizes isolate_migratepages variants and make them use isolate_migratepages_block(). isolate_migratepages_block() could stop the execution when enough pages are isolated, but, there is no code in isolate_migratepages_range() to handle this case. In the result, even if isolate_migratepages_block() returns prematurely without checking all pages in the range, isolate_migratepages_block() is called repeately on the following pageblock and some pages in the previous range are skipped to check. Then, CMA is failed frequently due to this fact. To fix this problem, this patch let isolate_migratepages_range() know the situation that enough pages are isolated and stop the isolation in that case. Note that isolate_migratepages() has no such problem, because, it always stops the isolation after just one call of isolate_migratepages_block(). Signed-off-by: Joonsoo Kim <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: David Rientjes <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Michal Nazarewicz <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Zhang Yanfei <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-29cgroup/kmemleak: add kmemleak_free() for cgroup deallocations.Wang Nan1-0/+1
Commit ff7ee93f4715 ("cgroup/kmemleak: Annotate alloc_page() for cgroup allocations") introduces kmemleak_alloc() for alloc_page_cgroup(), but corresponding kmemleak_free() is missing, which makes kmemleak be wrongly disabled after memory offlining. Log is pasted at the end of this commit message. This patch add kmemleak_free() into free_page_cgroup(). During page offlining, this patch removes corresponding entries in kmemleak rbtree. After that, the freed memory can be allocated again by other subsystems without killing kmemleak. bash # for x in 1 2 3 4; do echo offline > /sys/devices/system/memory/memory$x/state ; sleep 1; done ; dmesg | grep leak Offlined Pages 32768 kmemleak: Cannot insert 0xffff880016969000 into the object search tree (overlaps existing) CPU: 0 PID: 412 Comm: sleep Not tainted 3.17.0-rc5+ #86 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Call Trace: dump_stack+0x46/0x58 create_object+0x266/0x2c0 kmemleak_alloc+0x26/0x50 kmem_cache_alloc+0xd3/0x160 __sigqueue_alloc+0x49/0xd0 __send_signal+0xcb/0x410 send_signal+0x45/0x90 __group_send_sig_info+0x13/0x20 do_notify_parent+0x1bb/0x260 do_exit+0x767/0xa40 do_group_exit+0x44/0xa0 SyS_exit_group+0x17/0x20 system_call_fastpath+0x16/0x1b kmemleak: Kernel memory leak detector disabled kmemleak: Object 0xffff880016900000 (size 524288): kmemleak: comm "swapper/0", pid 0, jiffies 4294667296 kmemleak: min_count = 0 kmemleak: count = 0 kmemleak: flags = 0x1 kmemleak: checksum = 0 kmemleak: backtrace: log_early+0x63/0x77 kmemleak_alloc+0x4b/0x50 init_section_page_cgroup+0x7f/0xf5 page_cgroup_init+0xc5/0xd0 start_kernel+0x333/0x408 x86_64_start_reservations+0x2a/0x2c x86_64_start_kernel+0xf5/0xfc Fixes: ff7ee93f4715 (cgroup/kmemleak: Annotate alloc_page() for cgroup allocations) Signed-off-by: Wang Nan <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: <[email protected]> [3.2+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-29inet: frags: remove the WARN_ON from inet_evict_bucketNikolay Aleksandrov1-1/+0
The WARN_ON in inet_evict_bucket can be triggered by a valid case: inet_frag_kill and inet_evict_bucket can be running in parallel on the same queue which means that there has been at least one more ref added by a previous inet_frag_find call, but inet_frag_kill can delete the timer before inet_evict_bucket which will cause the WARN_ON() there to trigger since we'll have refcnt!=1. Now, this case is valid because the queue is being "killed" for some reason (removed from the chain list and its timer deleted) so it will get destroyed in the end by one of the inet_frag_put() calls which reaches 0 i.e. refcnt is still valid. CC: Florian Westphal <[email protected]> CC: Eric Dumazet <[email protected]> CC: Patrick McLean <[email protected]> Fixes: b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue") Reported-by: Patrick McLean <[email protected]> Signed-off-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-29inet: frags: fix a race between inet_evict_bucket and inet_frag_killNikolay Aleksandrov1-1/+2
When the evictor is running it adds some chosen frags to a local list to be evicted once the chain lock has been released but at the same time the *frag_queue can be running for some of the same queues and it may call inet_frag_kill which will wait on the chain lock and will then delete the queue from the wrong list since it was added in the eviction one. The fix is simple - check if the queue has the evict flag set under the chain lock before deleting it, this is safe because the evict flag is set only under that lock and having the flag set also means that the queue has been detached from the chain list, so no need to delete it again. An important note to make is that we're safe w.r.t refcnt because inet_frag_kill and inet_evict_bucket will sync on the del_timer operation where only one of the two can succeed (or if the timer is executing - none of them), the cases are: 1. inet_frag_kill succeeds in del_timer - then the timer ref is removed, but inet_evict_bucket will not add this queue to its expire list but will restart eviction in that chain 2. inet_evict_bucket succeeds in del_timer - then the timer ref is kept until the evictor "expires" the queue, but inet_frag_kill will remove the initial ref and will set INET_FRAG_COMPLETE which will make the frag_expire fn just to remove its ref. In the end all of the queue users will do an inet_frag_put and the one that reaches 0 will free it. The refcount balance should be okay. CC: Florian Westphal <[email protected]> CC: Eric Dumazet <[email protected]> CC: Patrick McLean <[email protected]> Fixes: b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue") Suggested-by: Eric Dumazet <[email protected]> Reported-by: Patrick McLean <[email protected]> Tested-by: Patrick McLean <[email protected]> Signed-off-by: Nikolay Aleksandrov <[email protected]> Reviewed-by: Florian Westphal <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-29ARM: OMAP2+: Warn about deprecated legacy booting modeTony Lindgren1-0/+4
We're moving omaps to use device tree based booting and already have omap2, omap4, omap5, am335x and am437x booting in device tree only mode. Only omap3 still has legacy booting still around and we really want to make that device tree only. So let's add a warning about deprecated legacy booting so we get people to upgrade their boards to use device tree based booting and find out about any remaining issues. Note that for most boards we already have the .dts file and those can be booted with without changing the bootloader using the appended DTB mode. Acked-By: Sebastian Reichel <[email protected]> Reviewed-by: Aaro Koskinen <[email protected]> Reviewed-by: Javier Martinez Canillas <[email protected]> Signed-off-by: Tony Lindgren <[email protected]>
2014-10-29ARM: omap2plus_defconfig: Fix errors with NAND BCHTony Lindgren1-0/+1
Looks like we need to have BCH enabled to get NAND working and to avoid getting: nand: error: CONFIG_MTD_NAND_ECC_BCH not enabled Signed-off-by: Tony Lindgren <[email protected]>
2014-10-29cnic: Update the rcu_access_pointer() usagesTej Parkash1-4/+1
1. Remove the rcu_read_lock/unlock around rcu_access_pointer 2. Replace the rcu_dereference with rcu_access_pointer Signed-off-by: Tej Parkash <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-29Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds7-46/+28
Pull block layer fixes from Jens Axboe: "A small collection of fixes for the current kernel. This contains: - Two error handling fixes from Jan Kara. One for null_blk on failure to add a device, and the other for the block/scsi_ioctl SCSI_IOCTL_SEND_COMMAND fixing up the error jump point. - A commit added in the merge window for the bio integrity bits unfortunately disabled merging for all requests if CONFIG_BLK_DEV_INTEGRITY wasn't set. Reverse the logic, so that integrity checking wont disallow merges when not enabled. - A fix from Ming Lei for merging and generating too many segments. This caused a BUG in virtio_blk. - Two error handling printk() fixups from Robert Elliott, improving the information given when we rate limit. - Error handling fixup on elevator_init() failure from Sudip Mukherjee. - A fix from Tony Battersby, fixing up a memory leak in the scatterlist handling with scsi-mq" * 'for-linus' of git://git.kernel.dk/linux-block: block: Fix merge logic when CONFIG_BLK_DEV_INTEGRITY is not defined lib/scatterlist: fix memory leak with scsi-mq block: fix wrong error return in elevator_init() scsi: Fix error handling in SCSI_IOCTL_SEND_COMMAND null_blk: Cleanup error recovery in null_add_dev() blk-merge: recaculate segment if it isn't less than max segments fs: clarify rate limit suppressed buffer I/O errors fs: merge I/O error prints into one line