Age | Commit message | Author | Files | Lines
2013-12-19 | ipv6: always set the new created dst's from in ip6_rt_copy | Li RongQing | 1 | -3/+1
ip6_rt_copy only sets dst.from if ort has both the RTF_ADDRCONF and RTF_DEFAULT flags. However, prefix routes installed by hand locally can also have an expiration, and no flag combination can guarantee that a potential from never expires, so we should always set the newly created dst's from. This also fixes the case where the newly created dst is always expired because the ort, created by an RA, may have RTF_EXPIRES and RTF_ADDRCONF but not RTF_DEFAULT. Suggested-by: Hannes Frederic Sowa <[email protected]> CC: Gao feng <[email protected]> Signed-off-by: Li RongQing <[email protected]> Acked-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-12-19 | Merge tag 'keystone/maintainer-file' of git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux-keystone into fixes | Kevin Hilman | 1 | -0/+2
From Santosh Shilimkar: Couple of updates to MAINTAINERS file for Keystone - Add git tree information - Add clock drivers entry * tag 'keystone/maintainer-file' of git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux-keystone: MAINTAINERS: Add keystone clock drivers MAINTAINERS: Add keystone git tree information Signed-off-by: Kevin Hilman <[email protected]>
2013-12-19 | net: fec: fix potential use after free | Eric Dumazet | 1 | -2/+2
skb_tx_timestamp(skb) should be called _before_ TX completion has a chance to trigger, otherwise it is too late and we access freed memory. Signed-off-by: Eric Dumazet <[email protected]> Fixes: de5fb0a05348 ("net: fec: put tx to napi poll function to fix dead lock") Cc: Frank Li <[email protected]> Cc: Richard Cochran <[email protected]> Acked-by: Richard Cochran <[email protected]> Acked-by: Frank Li <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-12-19 | qla2xxx: Fix scsi_host leak on qlt_lport_register callback failure | Nicholas Bellinger | 1 | -0/+1
This patch fixes a possible scsi_host reference leak in qlt_lport_register(), where a non-zero return from the passed (*callback) does not drop the local reference via scsi_host_put() before returning. This currently does not affect existing tcm_qla2xxx code, as the passed callback will never fail, but fix this up regardless for future code. Cc: Chad Dupuis <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]>
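The leak pattern described in this commit can be sketched in plain userspace C. This is a minimal model under stated assumptions, not the qla2xxx code: struct host, host_get(), host_put() and lport_register() are hypothetical stand-ins for the scsi_host reference counting and for qlt_lport_register().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted host object. */
struct host {
    int refcount;
};

static void host_get(struct host *h)
{
    h->refcount++;
}

static void host_put(struct host *h)
{
    assert(h->refcount > 0);
    h->refcount--;
}

/*
 * Hypothetical registration helper: takes a local reference, runs the
 * callback, and drops the reference again if the callback fails.
 */
static int lport_register(struct host *h, int (*callback)(struct host *))
{
    int rc;

    assert(h != NULL && callback != NULL);

    host_get(h);              /* local reference taken before the callback */
    rc = callback(h);
    if (rc != 0)
        host_put(h);          /* without this put, the failure path leaks */

    return rc;                /* on success the reference stays held */
}

/* Example callbacks used to exercise both paths. */
static int failing_callback(struct host *h) { (void)h; return -22; }
static int ok_callback(struct host *h) { (void)h; return 0; }
```

With a starting count of 1, a failed registration leaves the count at 1, while a successful one leaves it at 2 because the reference is intentionally kept.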
2013-12-19 | target: Remove extra percpu_ref_init | Andy Grover | 1 | -7/+1
lun->lun_ref is also initialized in core_tpg_post_addlun, so it doesn't need to be done in core_tpg_setup_virtual_lun0. (nab: Drop left-over percpu_ref_cancel_init in failure path) Signed-off-by: Andy Grover <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]>
2013-12-19 | x86/efi: Don't select EFI from certain special ACPI drivers | Jan Beulich | 5 | -6/+5
Commit 7ea6c6c1 ("Move cper.c from drivers/acpi/apei to drivers/firmware/efi") results in CONFIG_EFI being enabled even when the user doesn't want this. Since ACPI APEI used to build fine without UEFI (and as far as I know also has no functional dependency on it), at least in that case using a reverse dependency is wrong (and a straight one isn't needed). Whether the same is true for ACPI_EXTLOG I don't know - if there is a functional dependency, it should depend on EFI rather than selecting it. It certainly has (currently) no build dependency. Adjust Kconfig and build logic so that the bad dependency gets avoided. Signed-off-by: Jan Beulich <[email protected]> Acked-by: Tony Luck <[email protected]> Cc: Matt Fleming <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2013-12-19 | bnx2x: downgrade "valid ME register value" message level | Michal Schmidt | 1 | -1/+1
"valid ME register value" is not an error. It should be logged for debugging only. Signed-off-by: Michal Schmidt <[email protected]> Acked-by: Yuval Mintz <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-12-19 | hamradio/yam: fix info leak in ioctl | Salva Peiró | 1 | -0/+1
The yam_ioctl() code fails to initialise the cmd field of the struct yamdrv_ioctl_cfg. Add an explicit memset(0) before filling the structure to avoid the 4-byte info leak. Signed-off-by: Salva Peiró <[email protected]> Signed-off-by: David S. Miller <[email protected]>
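The fix described here, clearing a structure before filling it so that unset fields cannot carry stale memory, can be illustrated with a small userspace sketch. The struct cfg_reply layout and fill_reply() below are hypothetical; the real struct yamdrv_ioctl_cfg has a different set of fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reply structure; not the real struct yamdrv_ioctl_cfg. */
struct cfg_reply {
    int cmd;        /* the field the filling code never assigns */
    int bitrate;
    int baudrate;
};

/* Fill a reply that will later be copied out to an untrusted consumer. */
static void fill_reply(struct cfg_reply *out, int bitrate, int baudrate)
{
    assert(out != NULL);

    /* Clear every byte first, including padding and unassigned fields. */
    memset(out, 0, sizeof(*out));

    out->bitrate = bitrate;
    out->baudrate = baudrate;
    /* out->cmd is deliberately left alone: it is 0, not stale data. */
}
```

Clearing the whole structure rather than assigning only the forgotten field also covers any padding bytes and any fields added later.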
2013-12-19 | drivers/net/hamradio: Integer overflow in hdlcdrv_ioctl() | Wenliang Fan | 1 | -0/+2
The local variable 'bi' comes from userspace. If userspace passed a large number to 'bi.data.calibrate', there would be an integer overflow in the following line: s->hdlctx.calibrate = bi.data.calibrate * s->par.bitrate / 16; Signed-off-by: Wenliang Fan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
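One common way to guard a multiplication like the one quoted above is to compare the untrusted operand against INT_MAX divided by the other operand before multiplying. The sketch below is a userspace illustration under that assumption; calibrate_bits() is a hypothetical helper, and the commit message does not state which check the actual patch uses.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: computes calibrate * bitrate / 16 into *out.
 * Returns 0 on success, or -1 if the inputs are out of range or the
 * product would not fit in an int.
 */
static int calibrate_bits(int calibrate, int bitrate, int *out)
{
    assert(out != NULL);

    if (calibrate < 0 || bitrate <= 0)
        return -1;

    /* Reject values whose product would exceed INT_MAX. */
    if (calibrate > INT_MAX / bitrate)
        return -1;

    *out = calibrate * bitrate / 16;
    return 0;
}
```

The division-based comparison is used because it cannot itself overflow, unlike checking the product after the fact.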
2013-12-19 | xen-netback: fix some error return code | Wei Yongjun | 1 | -4/+12
'err' is overwritten with 0 after the maybe_pull_tail() call, so the error code was not set if the skb_partial_csum_set() call failed. Fix this by returning error -EPROTO from those error handling cases instead of 0. Fixes: d52eb0d46f36 ('xen-netback: make sure skb linear area covers checksum field') Signed-off-by: Wei Yongjun <[email protected]> Acked-by: Wei Liu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
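The bug class in this commit, an error variable that a successful earlier call has already reset to 0, can be shown with a short userspace sketch. pull_tail() and csum_set() are hypothetical stand-ins for maybe_pull_tail() and skb_partial_csum_set(); their signatures here are assumptions made for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the real kernel functions take an skb. */
static int pull_tail(bool ok) { return ok ? 0 : -ENOMEM; }
static bool csum_set(bool ok) { return ok; }

static int checksum_setup(bool pull_ok, bool csum_ok)
{
    int err;

    err = pull_tail(pull_ok);   /* on success this leaves err == 0 */
    if (err < 0)
        goto out;

    if (!csum_set(csum_ok)) {
        /*
         * err is 0 at this point, so jumping to out without
         * assigning it would report success for a failed step.
         */
        err = -EPROTO;
        goto out;
    }

    err = 0;
out:
    return err;
}
```

Setting the error code immediately before each goto keeps every failure path self-describing, regardless of what earlier calls left in err.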
2013-12-19 | net: inet_diag: zero out uninitialized idiag_{src,dst} fields | Daniel Borkmann | 1 | -0/+16
Jakub reported while working with nlmon netlink sniffer that parts of the inet_diag_sockid are not initialized when r->idiag_family != AF_INET6. That is, fields of r->id.idiag_src[1 ... 3], r->id.idiag_dst[1 ... 3]. In fact, it seems that we can leak 6 * sizeof(u32) byte of kernel [slab] memory through this. At least, in udp_dump_one(), we allocate a skb in ... rep = nlmsg_new(sizeof(struct inet_diag_msg) + ..., GFP_KERNEL); ... and then pass that to inet_sk_diag_fill() that puts the whole struct inet_diag_msg into the skb, where we only fill out r->id.idiag_src[0], r->id.idiag_dst[0] and leave the rest untouched: r->id.idiag_src[0] = inet->inet_rcv_saddr; r->id.idiag_dst[0] = inet->inet_daddr; struct inet_diag_msg embeds struct inet_diag_sockid that is correctly / fully filled out in IPv6 case, but for IPv4 not. So just zero them out by using plain memset (for this little amount of bytes it's probably not worth the extra check for idiag_family == AF_INET). Similarly, fix also other places where we fill that out. Reported-by: Jakub Zawadzki <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-12-19 | x86 idle: Repair large-server 50-watt idle-power regression | Len Brown | 2 | -1/+5
Linux 3.10 changed the timing of how thread_info->flags is touched: x86: Use generic idle loop (7d1a941731fabf27e5fb6edbebb79fe856edb4e5) This caused Intel NHM-EX and WSM-EX servers to experience a large number of immediate MONITOR/MWAIT break wakeups, which caused cpuidle to demote from deep C-states to shallow C-states, which caused these platforms to experience a significant increase in idle power. Note that this issue was already present before the commit above, however, it wasn't seen often enough to be noticed in power measurements. Here we extend an errata workaround from the Core2 EX "Dunnington" to extend to NHM-EX and WSM-EX, to prevent these immediate returns from MWAIT, reducing idle power on these platforms. While only acpi_idle ran on Dunnington, intel_idle may also run on these two newer systems. As of today, there are no other models that are known to need this tweak. Link: http://lkml.kernel.org/r/CAJvTdK=%[email protected] Signed-off-by: Len Brown <[email protected]> Link: http://lkml.kernel.org/r/baff264285f6e585df757d58b17788feabc68918.1387403066.git.len.brown@intel.com Cc: <[email protected]> # 3.12.x, 3.11.x, 3.10.x Signed-off-by: H. Peter Anvin <[email protected]>
2013-12-19 | libata, freezer: avoid block device removal while system is frozen | Tejun Heo | 2 | -0/+27
Freezable kthreads and workqueues are fundamentally problematic in that they effectively introduce a big kernel lock widely used in the kernel and have already been the culprit of several deadlock scenarios. This is the latest occurrence. During resume, libata rescans all the ports and revalidates all pre-existing devices. If it determines that a device has gone missing, the device is removed from the system which involves invalidating block device and flushing bdi while holding driver core layer locks. Unfortunately, this can race with the rest of device resume. Because freezable kthreads and workqueues are thawed after device resume is complete and block device removal depends on freezable workqueues and kthreads (e.g. bdi_wq, jbd2) to make progress, this can lead to deadlock - block device removal can't proceed because kthreads are frozen and kthreads can't be thawed because device resume is blocked behind block device removal. 839a8e8660b6 ("writeback: replace custom worker pool implementation with unbound workqueue") made this particular deadlock scenario more visible but the underlying problem has always been there - the original forker task and jbd2 are freezable too. In fact, this is highly likely just one of many possible deadlock scenarios given that freezer behaves as a big kernel lock and we don't have any debug mechanism around it. I believe the right thing to do is getting rid of freezable kthreads and workqueues. This is something fundamentally broken. For now, implement a funny workaround in libata - just avoid doing block device hot[un]plug while the system is frozen. Kernel engineering at its finest. :( v2: Add EXPORT_SYMBOL_GPL(pm_freezing) for cases where libata is built as a module. v3: Comment updated and polling interval changed to 10ms as suggested by Rafael. v4: Add #ifdef CONFIG_FREEZER around the hack as pm_freezing is not defined when FREEZER is not configured thus breaking build. Reported by kbuild test robot. 
Signed-off-by: Tejun Heo <[email protected]> Reported-by: Tomaž Šolc <[email protected]> Reviewed-by: "Rafael J. Wysocki" <[email protected]> Link: https://bugzilla.kernel.org/show_bug.cgi?id=62801 Link: http://lkml.kernel.org/r/[email protected] Cc: Greg Kroah-Hartman <[email protected]> Cc: Len Brown <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: [email protected] Cc: kbuild test robot <[email protected]>
2013-12-19 | arm64: ptrace: avoid using HW_BREAKPOINT_EMPTY for disabled events | Will Deacon | 1 | -20/+18
Commit 8f34a1da35ae ("arm64: ptrace: use HW_BREAKPOINT_EMPTY type for disabled breakpoints") fixed an issue with GDB trying to zero breakpoint control registers. The problem there is that the arch hw_breakpoint code will attempt to create a (disabled), execute breakpoint of length 0. This will fail validation and report unexpected failure to GDB. To avoid this, we treated disabled breakpoints as HW_BREAKPOINT_EMPTY, but that seems to have broken with recent kernels, causing watchpoints to be treated as TYPE_INST in the core code and returning ENOSPC for any further breakpoints. This patch fixes the problem by prioritising the `enable' field of the breakpoint: if it is cleared, we simply update the perf_event_attr to indicate that the thing is disabled and don't bother changing either the type or the length. This reinforces the behaviour that the breakpoint control register is essentially read-only apart from the enable bit when disabling a breakpoint. Cc: <[email protected]> Reported-by: Aaron Liu <[email protected]> Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Catalin Marinas <[email protected]>
2013-12-19 | Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 2 | -2/+17
Pull scheduler fixes from Ingo Molnar: "An RT group-scheduling fix and the sched-domains topology setup fix from Mel" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/rt: Fix rq's cpupri leak while enqueue/dequeue child RT entities sched: Assign correct scheduling domain to 'sd_llc'
2013-12-19 | Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 2 | -3/+19
Pull perf fixes from Ingo Molnar: "An ABI documentation fix, and a mixed-PMU perf-info-corruption fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Document the new transaction sample type perf: Disable all pmus on unthrottling and rescheduling
2013-12-19 | Merge tag 'sound-3.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound | Linus Torvalds | 16 | -45/+115
Pull sound fixes from Takashi Iwai: "We have a bit more changes than usual in ASoC here, as it was slipped from the previous update. There are one minor ASoC PCM code fix and ASoC dmaengine fix, in addition to a collection of small ASoC driver fixes. The rest are a couple of HD-audio stable fixups, and a long-standing fix for the paused stream handling. So, all commits look not scary (and hopefully won't give you disastrous holiday season)" * tag 'sound-3.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda - Add Dell headset detection quirk for one more laptop model ASoC: wm8904: fix DSP mode B configuration ASoC: wm_adsp: Add small delay while polling DSP RAM start ALSA: Add SNDRV_PCM_STATE_PAUSED case in wait_for_avail function ASoC: kirkwood: Fix the CPU DAI rates ASoC: wm5110: Correct HPOUT3 DAPM route typo ALSA: hda - Add Dell headset detection quirk for three laptop models ALSA: hda - Add enable_msi=0 workaround for four HP machines ASoC: don't leak on error in snd_dmaengine_pcm_register ASoC: fsl: imx-wm8962: Don't update bias_level in machine driver ASoC: tegra: fix uninitialized variables in set_fmt ASoC: wm8962: Enable SYSCLK provisonally before fetching generated DSPCLK_DIV ASoC: sam9x5_wm8731: change to work in DSP A mode ASoC: atmel_ssc_dai: add dai trigger ops ASoC: soc-pcm: Use valid condition for snd_soc_dai_digital_mute() in hw_free()
2013-12-19 | null_blk: warning on ignored submit_queues param | Matias Bjorling | 1 | -2/+5
Let the user know when the number of submission queues is being ignored. Signed-off-by: Matias Bjorling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2013-12-19 | null_blk: refactor init and init errors code paths | Matias Bjorling | 1 | -25/+38
Simplify the initialization logic of the three block-layers. - The queue initialization is split into two parts. This allows reuse of code when initializing the sq-, bio- and mq-based layers. - Set submit_queues default value to 0 and always set it at init time. - Simplify the init error code paths. Signed-off-by: Matias Bjorling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2013-12-19 | null_blk: documentation | Matias Bjorling | 1 | -0/+71
Add description of module and its parameters. Signed-off-by: Matias Bjorling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2013-12-19 | null_blk: mem garbage on NUMA systems during init | Matias Bjorling | 1 | -4/+4
On NUMA systems, when initializing the blk-mq layer with per-node hctx, we initialize submit queues to 1, while blk-mq nr_hw_queues is initialized to the number of NUMA nodes. This makes the null_init_hctx function overwrite memory outside of what it allocated. In my case it led to writing garbage into struct request_queue's mq_map. Signed-off-by: Matias Bjorling <[email protected]> Cc: Jens Axboe <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-12-19 | drivers: block: Mark the functions as static in skd_main.c | Rashika Kheria | 1 | -2/+2
Mark functions skd_skmsg_state_to_str() and skd_skreq_state_to_str() as static in skd_main.c because they are not used outside this file. This eliminates the following warnings in skd_main.c: drivers/block/skd_main.c:5272:13: warning: no previous prototype for ‘skd_skmsg_state_to_str’ [-Wmissing-prototypes] drivers/block/skd_main.c:5284:13: warning: no previous prototype for ‘skd_skreq_state_to_str’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <[email protected]> Reviewed-by: Josh Triplett <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2013-12-19 | ARC: Allow conditional multiple inclusion of uapi/asm/unistd.h | Vineet Gupta | 1 | -1/+7
Commit 97bc386fc12d "ARC: Add guard macro to uapi/asm/unistd.h" inhibited multiple inclusion of ARCH unistd.h. This however hosed the system, since the generic syscall table generator relies on it being included twice; without the second inclusion, an empty table was emitted by the C preprocessor. Fix that by allowing one exception to the rule for the special case (just like Xtensa). Suggested-by: Chen Gang <[email protected]> Signed-off-by: Vineet Gupta <[email protected]>
2013-12-19 | Merge tag 'asoc-v3.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus | Takashi Iwai | 541 | -2724/+5150
ASoC: Fixes for v3.13 The fixes here are all driver specific ones, none of which particularly stand out but all of which are useful to users of those drivers.
2013-12-19 | Merge remote-tracking branches 'asoc/fix/adsp', 'asoc/fix/arizona', 'asoc/fix/atmel', 'asoc/fix/fsl', 'asoc/fix/kirkwood', 'asoc/fix/tegra', 'asoc/fix/wm8904' and 'asoc/fix/wm8962' into asoc-linus | Mark Brown | 11 | -32/+75
2013-12-19 | Merge remote-tracking branch 'asoc/fix/dma' into asoc-linus | Mark Brown | 1 | -11/+27
2013-12-19 | Merge remote-tracking branch 'asoc/fix/core' into asoc-linus | Mark Brown | 1 | -2/+3
2013-12-19 | ARM: shmobile: r8a7790: fix shdi resource sizes | Ben Dooks | 1 | -2/+2
The r8a7790.dtsi file has four sdhi nodes, of which the first two have the wrong resource size for their register block. This causes the sh_mobile_sdhi driver to fail to communicate with the card at all. Change the sdhi{0,1} node size from 0x100 to 0x200 to correct these nodes, as per Kuninori Morimoto's response to the original patch where all four nodes were changed. sdhi{2,3} are the correct size. This bug has been present since sdhi resources were added to the r8a7790 by 8c9b1aa41853272a ("ARM: shmobile: r8a7790: add MMCIF and SDHI DT templates") in v3.11-rc2. Signed-off-by: Ben Dooks <[email protected]> Tested-by: William Towle <[email protected]> Acked-by: Kuninori Morimoto <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2013-12-19 | ARM: shmobile: bockw: fixup DMA mask | Kuninori Morimoto | 1 | -1/+1
Commit 4dcfa60071b3d23f0181f27d8519f12e37cefbb9 (ARM: DMA-API: better handing of DMA masks for coherent allocations) changed the DMA mask check method. Without this patch, the following warning appears: asoc-simple-card asoc-simple-card.0: \ Coherent DMA mask 0xffffffffffffffff is larger than dma_addr_t allows asoc-simple-card asoc-simple-card.0: \ Driver did not use or check the return value from dma_set_coherent_mask()? Signed-off-by: Kuninori Morimoto <[email protected]> Acked-by: Laurent Pinchart <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2013-12-19 | target/file: Update hw_max_sectors based on current block_size | Nicholas Bellinger | 4 | -5/+14
This patch allows FILEIO to update hw_max_sectors based on the current max_bytes_per_io. This is required because vfs_[writev,readv]() can accept a maximum of 2048 iovecs per call, so the enforced hw_max_sectors really needs to be calculated based on block_size. This addresses a >= v3.5 bug where block_size=512 was rejecting > 1M sized I/O requests, because FD_MAX_SECTORS was hardcoded to 2048 for the block_size=4096 case. (v2: Use max_bytes_per_io instead of ->update_hw_max_sectors) Reported-by: Henrik Goldman <[email protected]> Cc: <[email protected]> #3.5+ Signed-off-by: Nicholas Bellinger <[email protected]>
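The arithmetic behind this fix can be worked through in a few lines of C. The sketch assumes each of the 2048 iovecs covers 4096 bytes, which is what makes the hardcoded 2048-sector limit line up with block_size=4096; the helper names are hypothetical and the constants are taken from the commit message rather than from the patch itself.

```c
#include <assert.h>

#define MAX_IOVECS   2048u  /* iovec limit quoted in the commit message */
#define IOVEC_BYTES  4096u  /* assumption: 4096 bytes carried per iovec */

/* Largest request, in bytes, that a single vectored call can carry. */
static unsigned int max_bytes_per_io(void)
{
    return MAX_IOVECS * IOVEC_BYTES;    /* 8 MiB under the assumption */
}

/* Sector limit derived from the byte limit and the current block size. */
static unsigned int hw_max_sectors(unsigned int block_size)
{
    assert(block_size != 0);
    return max_bytes_per_io() / block_size;
}
```

Under these assumptions a 4096-byte block size yields 2048 sectors, matching the old hardcoded value, while a 512-byte block size yields 16384 sectors. Applying the fixed 2048-sector limit to 512-byte blocks caps requests at 2048 * 512 bytes, which is the 1M limit the commit message describes.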
2013-12-19 | iser-target: Move INIT_WORK setup into isert_create_device_ib_res | Nicholas Bellinger | 1 | -2/+4
This patch moves INIT_WORK setup for cq_desc->cq_[rx,tx]_work into isert_create_device_ib_res(), instead of being done on each callback invocation in isert_cq_[rx,tx]_callback(). This also fixes an 'INFO: trying to register non-static key' warning when cancel_work_sync() is called before INIT_WORK has set up the struct work_struct. Reported-by: Or Gerlitz <[email protected]> Cc: <[email protected]> #3.12+ Signed-off-by: Nicholas Bellinger <[email protected]>
2013-12-19 | iscsi-target: Fix incorrect np->np_thread NULL assignment | Nicholas Bellinger | 2 | -6/+1
When shutting down a target there is a race condition between iscsit_del_np() and __iscsi_target_login_thread(). The latter sets the thread pointer to NULL, and the former tries to issue kthread_stop() on that pointer without any synchronization. This patch moves the np->np_thread NULL assignment into iscsit_del_np(), after kthread_stop() has completed. It also removes the signal_pending() + np_state check, and only exits when kthread_should_stop() is true. Reported-by: Hannes Reinecke <[email protected]> Cc: <[email protected]> #3.12+ Signed-off-by: Nicholas Bellinger <[email protected]>
2013-12-19 | ARM: shmobile: armadillo: Add PWM backlight power supply | Laurent Pinchart | 1 | -0/+7
Commit 22ceeee16eb8f0d04de3ef43a5174fb30ec18af9 ("pwm-backlight: Add power supply support") added a mandatory power supply for the PWM backlight. Add a fixed 5V regulator to board code with a consumer supply entry for the backlight device. Signed-off-by: Laurent Pinchart <[email protected]> Signed-off-by: Simon Horman <[email protected]> (cherry picked from commit ad11cb9a5cf96346f1240995c672cdbb5501785c) Signed-off-by: Simon Horman <[email protected]>
2013-12-18 | Merge branch 'akpm' (incoming from Andrew) | Linus Torvalds | 24 | -57/+249
Merge patches from Andrew Morton: "23 fixes and a MAINTAINERS update" * emailed patches from Andrew Morton <[email protected]>: (24 commits) mm/hugetlb: check for pte NULL pointer in __page_check_address() fix build with make 3.80 mm/mempolicy: fix !vma in new_vma_page() MAINTAINERS: add Davidlohr as GPT maintainer mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully mm/compaction: respect ignore_skip_hint in update_pageblock_skip mm/mempolicy: correct putback method for isolate pages if failed mm: add missing dependency in Kconfig sh: always link in helper functions extracted from libgcc mm: page_alloc: exclude unreclaimable allocations from zone fairness policy mm: numa: defer TLB flush for THP migration as long as possible mm: numa: guarantee that tlb_flush_pending updates are visible before page table updates mm: fix TLB flush race between migration, and change_protection_range mm: numa: avoid unnecessary disruption of NUMA hinting during migration mm: numa: clear numa hinting information on mprotect sched: numa: skip inaccessible VMAs mm: numa: avoid unnecessary work on the failure path mm: numa: ensure anon_vma is locked to prevent parallel THP splits mm: numa: do not clear PTE for pte_numa update mm: numa: do not clear PMD during PTE update scan ...
2013-12-18 | mm/hugetlb: check for pte NULL pointer in __page_check_address() | Jianguo Wu | 1 | -0/+4
In __page_check_address(), if the address's pud is not present, huge_pte_offset() will return NULL, so we should check the return value. Signed-off-by: Jianguo Wu <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Mel Gorman <[email protected]> Cc: qiuxishi <[email protected]> Cc: Hanjun Guo <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-12-18 | fix build with make 3.80 | Jan Beulich | 1 | -13/+7
According to Documentation/Changes, make 3.80 is still being supported for building the kernel, hence make files must not make (unconditional) use of features introduced only in newer versions. Commit 1bf49dd4be0b ("./Makefile: export initial ramdisk compression config option") however introduced "else ifeq" constructs which make 3.80 doesn't understand. Replace the logic there with more conventional (in the kernel build infrastructure) list constructs (except that the list here is intentionally limited to exactly one element). Signed-off-by: Jan Beulich <[email protected]> Cc: P J P <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-12-18 | mm/mempolicy: fix !vma in new_vma_page() | Wanpeng Li | 1 | -6/+8
The BUG_ON(!vma) assumption was introduced by commit 0bf598d863e3 ("mbind: add BUG_ON(!vma) in new_vma_page()"). However, even if address = __vma_address(page, vma); and vma->start < address < vma->end, page_address_in_vma() may still return -EFAULT because of many other conditions in it. As a result the while loop in new_vma_page() may end with vma=NULL. This patch reverts the commit and also fixes the potential NULL pointer dereference reported by Dan. http://marc.info/?l=linux-mm&m=137689530323257&w=2 kernel BUG at mm/mempolicy.c:1204! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC CPU: 3 PID: 7056 Comm: trinity-child3 Not tainted 3.13.0-rc3+ #2 task: ffff8801ca5295d0 ti: ffff88005ab20000 task.ti: ffff88005ab20000 RIP: new_vma_page+0x70/0x90 RSP: 0000:ffff88005ab21db0 EFLAGS: 00010246 RAX: fffffffffffffff2 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000008040075 RSI: ffff8801c3d74600 RDI: ffffea00079a8b80 RBP: ffff88005ab21dc8 R08: 0000000000000004 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: fffffffffffffff2 R13: ffffea00079a8b80 R14: 0000000000400000 R15: 0000000000400000 FS: 00007ff49c6f4740(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ff49c68f994 CR3: 000000005a205000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Stack: ffffea00079a8b80 ffffea00079a8bc0 ffffea00079a8ba0 ffff88005ab21e50 ffffffff811adc7a 0000000000000000 ffff8801ca5295d0 0000000464e224f8 0000000000000000 0000000000000002 0000000000000000 ffff88020ce75c00 Call Trace: migrate_pages+0x12a/0x850 SYSC_mbind+0x513/0x6a0 SyS_mbind+0xe/0x10 ia32_do_call+0x13/0x13 Code: 85 c0 75 2f 4c 89 e1 48 89 da 31 f6 bf da 00 02 00 65 44 8b 04 25 08 f7 1c 00 e8 ec fd ff ff 5b 41 5c 41 5d 5d c3 0f 1f 44 00 00 <0f> 0b 66 0f 1f 44 00 00 4c 89 e6 48 89 df ba 01 00 00 00 e8 48 RIP [<ffffffff8119f200>] new_vma_page+0x70/0x90 RSP <ffff88005ab21db0> Signed-off-by: Wanpeng Li <[email protected]> Reported-by: Dave Jones <[email protected]> Reported-by: Sasha Levin <[email protected]> Reviewed-by: Naoya Horiguchi <[email protected]> Reviewed-by: Bob Liu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-12-18 | MAINTAINERS: add Davidlohr as GPT maintainer | Davidlohr Bueso | 1 | -0/+6
Add a new entry for the GPT standard. Any future changes will now be routed through linux-efi. Signed-off-by: Davidlohr Bueso <[email protected]> Acked-by: Matt Fleming <[email protected]> Cc: Jens Axboe <[email protected]> Acked-by: Matt Domsch <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-12-18 | mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully | Jianguo Wu | 1 | -4/+10
After a successful hugetlb page migration by soft offline, the source page will either be freed into hugepage_freelists or buddy (over-commit page). If the page is in buddy, page_hstate(page) will be NULL, and dequeue_hwpoisoned_huge_page() will hit a NULL pointer dereference. BUG: unable to handle kernel NULL pointer dereference at 0000000000000058 IP: [<ffffffff81163761>] dequeue_hwpoisoned_huge_page+0x131/0x1d0 PGD c23762067 PUD c24be2067 PMD 0 Oops: 0000 [#1] SMP So check PageHuge(page) after migrate_pages() returns successfully. Signed-off-by: Jianguo Wu <[email protected]> Tested-by: Naoya Horiguchi <[email protected]> Reviewed-by: Naoya Horiguchi <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-12-18 | mm/compaction: respect ignore_skip_hint in update_pageblock_skip | Joonsoo Kim | 1 | -0/+4
update_pageblock_skip() only suits compaction that isolates in pageblock units. If isolate_migratepages_range() is called by CMA, it tries to isolate regardless of pageblock unit and, because of ignore_skip_hint, does not consult get_pageblock_skip(). We should also respect ignore_skip_hint in update_pageblock_skip() to avoid recording wrong information. Signed-off-by: Joonsoo Kim <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Reviewed-by: Naoya Horiguchi <[email protected]> Reviewed-by: Wanpeng Li <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Rafael Aquini <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Wanpeng Li <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Zhang Yanfei <[email protected]> Cc: <[email protected]> [3.7+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-12-18 | mm/mempolicy: correct putback method for isolate pages if failed | Joonsoo Kim | 1 | -1/+1
queue_pages_range() isolates hugetlbfs pages and putback_lru_pages() can't handle these. We should change it to putback_movable_pages(). Naoya said that it is worth going into stable, because it can break in-use hugepage list. Signed-off-by: Joonsoo Kim <[email protected]> Acked-by: Rafael Aquini <[email protected]> Reviewed-by: Naoya Horiguchi <[email protected]> Reviewed-by: Wanpeng Li <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Wanpeng Li <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Zhang Yanfei <[email protected]> Cc: <[email protected]> [3.12.x] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-12-18 | mm: add missing dependency in Kconfig | Sima Baymani | 1 | -1/+1
Eliminate the following (rand)config warning by adding missing PROC_FS dependency: warning: (HWPOISON_INJECT && MEM_SOFT_DIRTY) selects PROC_PAGE_MONITOR which has unmet direct dependencies (PROC_FS && MMU) Signed-off-by: Sima Baymani <[email protected]> Suggested-by: David Rientjes <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-12-18 | sh: always link in helper functions extracted from libgcc | Geert Uytterhoeven | 1 | -1/+1
E.g. landisk_defconfig, which has CONFIG_NTFS_FS=m: ERROR: "__ashrdi3" [fs/ntfs/ntfs.ko] undefined! For "lib-y", if no symbols in a compilation unit are referenced by other units, the compilation unit will not be included in vmlinux. This breaks modules that do reference those symbols. Use "obj-y" instead to fix this. http://kisskb.ellerman.id.au/kisskb/buildresult/8838077/ This doesn't fix all cases. There are others, e.g. udivsi3. This is also not limited to sh, many architectures handle this in the same way. A simple solution is to unconditionally include all helper functions. A more complex solution is to make the choice of "lib-y" or "obj-y" depend on CONFIG_MODULES: obj-$(CONFIG_MODULES) += ... lib-y($CONFIG_MODULES) += ... Signed-off-by: Geert Uytterhoeven <[email protected]> Cc: Paul Mundt <[email protected]> Tested-by: Nobuhiro Iwamatsu <[email protected]> Reviewed-by: Nobuhiro Iwamatsu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-12-18 | mm: page_alloc: exclude unreclaimable allocations from zone fairness policy | Johannes Weiner | 1 | -1/+2
Dave Hansen noted a regression in a microbenchmark that loops around open() and close() on an 8-node NUMA machine and bisected it down to commit 81c0a2bb515f ("mm: page_alloc: fair zone allocator policy"). That change forces the slab allocations of the file descriptor to spread out to all 8 nodes, causing remote references in the page allocator and slab. The round-robin policy is only there to provide fairness among memory allocations that are reclaimed involuntarily based on pressure in each zone. It does not make sense to apply it to unreclaimable kernel allocations that are freed manually, in this case instantly after the allocation, and incur the remote reference costs twice for no reason. Only round-robin allocations that are usually freed through page reclaim or slab shrinking. Bisected by Dave Hansen. Signed-off-by: Johannes Weiner <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Mel Gorman <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-12-18mm: numa: defer TLB flush for THP migration as long as possibleMel Gorman2-7/+3
THP migration can fail for a variety of reasons. Avoid flushing the TLB to deal with THP migration races until the copy is ready to start. Signed-off-by: Mel Gorman <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Cc: Alex Thorlton <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-12-18mm: numa: guarantee that tlb_flush_pending updates are visible before page ↵Mel Gorman1-1/+6
table updates According to the documentation on barriers, stores issued before a LOCK can complete after the lock, which implies that tlb_flush_pending can become visible after a page table update. As per the revised documentation, this patch adds smp_mb__before_spinlock() to guarantee the correct ordering. Signed-off-by: Mel Gorman <[email protected]> Acked-by: Paul E. McKenney <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
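The ordering requirement can be illustrated with a userspace analogue built on C11 atomics. This is only a sketch under stated assumptions: `struct mm_model` and every function below are hypothetical, a sequentially consistent fence stands in for smp_mb__before_spinlock(), and an `atomic_flag` spinlock stands in for the page table lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant mm state. */
struct mm_model {
	atomic_bool tlb_flush_pending;
	atomic_flag ptl;		/* models the page table lock */
	atomic_ulong pte;		/* models a single page table entry */
};

static void mm_model_init(struct mm_model *mm)
{
	atomic_init(&mm->tlb_flush_pending, false);
	atomic_flag_clear(&mm->ptl);
	atomic_init(&mm->pte, 0);
}

static void begin_protection_change(struct mm_model *mm, unsigned long new_pte)
{
	atomic_store_explicit(&mm->tlb_flush_pending, true,
			      memory_order_relaxed);
	/*
	 * Full fence before taking the lock. Lock acquisition alone only
	 * keeps later accesses from moving up; it does not stop the flag
	 * store from completing after the page table store below.
	 */
	atomic_thread_fence(memory_order_seq_cst);

	while (atomic_flag_test_and_set_explicit(&mm->ptl,
						 memory_order_acquire))
		;	/* spin until the lock is free */
	atomic_store_explicit(&mm->pte, new_pte, memory_order_relaxed);
	atomic_flag_clear_explicit(&mm->ptl, memory_order_release);
}

static void end_protection_change(struct mm_model *mm)
{
	/* The TLB flush would happen here, before the flag is cleared. */
	atomic_store_explicit(&mm->tlb_flush_pending, false,
			      memory_order_release);
}

/* An observer that sees the new entry must also see the pending flag. */
static bool observer_sees_pending(struct mm_model *mm, unsigned long new_pte)
{
	if (atomic_load_explicit(&mm->pte, memory_order_relaxed) != new_pte)
		return false;
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&mm->tlb_flush_pending,
				    memory_order_relaxed);
}
```

Without the fence in `begin_protection_change()`, the model would allow an observer to read the updated entry while still reading the flag as clear, which is the window the commit message describes.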
2013-12-18mm: fix TLB flush race between migration, and change_protection_rangeRik van Riel8-7/+69
There are a few subtle races between change_protection_range (used by mprotect and change_prot_numa) on one side, and NUMA page migration and compaction on the other side.

The basic race is that there is a time window between when the PTE gets made non-present (PROT_NONE or NUMA) and when the TLB is flushed. During that time, a CPU may continue writing to the page. This is fine most of the time. However, compaction or the NUMA migration code may come in and migrate the page away. When that happens, the CPU may continue writing, through the cached translation, to what is no longer the current memory location of the process.

This only affects x86, which has a somewhat optimistic pte_accessible. All other architectures appear to be safe, and will either always flush, or flush whenever there is a valid mapping, even with no permissions (SPARC).

The basic race looks like this:

CPU A                    CPU B                       CPU C

                                                     load TLB entry
make entry PTE/PMD_NUMA
                         fault on entry
                                                     read/write old page
                         start migrating page
                         change PTE/PMD to new page
                                                     read/write old page [*]
flush TLB
                                                     reload TLB from new entry
                                                     read/write new page
                                                     lose data

[*] the old page may belong to a new user at this point!

The obvious fix is to flush remote TLB entries, by making sure that pte_accessible is aware of the fact that PROT_NONE and PROT_NUMA memory may still be accessible if there is a TLB flush pending for the mm. This should fix both NUMA migration and compaction.

[[email protected]: fix build] Signed-off-by: Rik van Riel <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Cc: Alex Thorlton <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
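The fix described in this commit message amounts to a small change in the accessibility predicate. A minimal userspace sketch follows; the flag values, `struct mm_sketch` and `pte_accessible_sketch()` are hypothetical stand-ins rather than the x86 definitions.

```c
#include <stdbool.h>

/* Hypothetical flag bits; the real x86 values differ. */
#define PTE_PRESENT	(1u << 0)
#define PTE_PROTNONE	(1u << 1)
#define PTE_NUMA	(1u << 2)

/* Hypothetical stand-in for the per-mm state used by the check. */
struct mm_sketch {
	bool tlb_flush_pending;
};

/*
 * Return true if a CPU may still reach the page through this entry.
 * A present entry is always accessible. A PROT_NONE or NUMA entry is
 * treated as accessible while a TLB flush is pending for the mm,
 * because another CPU may still hold the old translation in its TLB.
 */
static bool pte_accessible_sketch(const struct mm_sketch *mm,
				  unsigned int pte_flags)
{
	if (pte_flags & PTE_PRESENT)
		return true;

	if ((pte_flags & (PTE_PROTNONE | PTE_NUMA)) && mm->tlb_flush_pending)
		return true;

	return false;
}
```

Callers that flush remote TLBs whenever the old entry was accessible will now also flush in the window between the protection change and the deferred flush, which closes the race for both migration and compaction.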
2013-12-18mm: numa: avoid unnecessary disruption of NUMA hinting during migrationMel Gorman3-6/+37
do_huge_pmd_numa_page() handles the case where there is parallel THP migration. However, by the time it is checked the NUMA hinting information has already been disrupted. This patch adds an earlier check with some helpers. Signed-off-by: Mel Gorman <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Cc: Alex Thorlton <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-12-18mm: numa: clear numa hinting information on mprotectMel Gorman2-0/+4
After a protection change, it is no longer clear whether the page should still be accessible. This patch clears the NUMA hinting fault bits on a protection change. Signed-off-by: Mel Gorman <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Cc: Alex Thorlton <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-12-18sched: numa: skip inaccessible VMAsMel Gorman1-0/+7
Inaccessible VMAs should not trap NUMA hinting faults. Skip them. Signed-off-by: Mel Gorman <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Cc: Alex Thorlton <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
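The skip can be pictured as a filter in the scanning loop. The sketch below is a userspace model only: `struct vma_sketch`, the `VMA_*` flags and `count_scannable()` are hypothetical, and it assumes that "inaccessible" means a mapping with no read, write or execute permission.

```c
#include <stddef.h>

/* Hypothetical permission bits; not the kernel's VM_* values. */
#define VMA_READ	(1u << 0)
#define VMA_WRITE	(1u << 1)
#define VMA_EXEC	(1u << 2)

/* Hypothetical stand-in for a virtual memory area. */
struct vma_sketch {
	unsigned long start;
	unsigned long end;
	unsigned int flags;
};

/*
 * Count the areas a NUMA hinting scan would visit. Areas with no
 * access permission are skipped, since marking them would only blur
 * the distinction between PROT_NONE and hinting entries.
 */
static size_t count_scannable(const struct vma_sketch *vmas, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		if (!(vmas[i].flags & (VMA_READ | VMA_WRITE | VMA_EXEC)))
			continue;	/* inaccessible: skip */
		count++;
	}
	return count;
}
```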