Age | Commit message (Collapse) | Author | Files | Lines |
|
ip6_rt_copy only sets dst.from if ort has flag RTF_ADDRCONF and RTF_DEFAULT.
but the prefix routes which did get installed by hand locally can have an
expiration, and no any flag combination which can ensure a potential from
does never expire, so we should always set the new created dst's from.
This also fixes the new created dst is always expired since the ort, which
is created by RA, maybe has RTF_EXPIRES and RTF_ADDRCONF, but no RTF_DEFAULT.
Suggested-by: Hannes Frederic Sowa <[email protected]>
CC: Gao feng <[email protected]>
Signed-off-by: Li RongQing <[email protected]>
Acked-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux-keystone into fixes
From Santosh Shilimkar:
Couple of updates to MAINTAINERS file for Keystone
- Add git tree information
- Add clock drivers entry
* tag 'keystone/maintainer-file' of git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux-keystone:
MAINTAINERS: Add keystone clock drivers
MAINTAINERS: Add keystone git tree information
Signed-off-by: Kevin Hilman <[email protected]>
|
|
skb_tx_timestamp(skb) should be called _before_ TX completion
has a chance to trigger, otherwise it is too late and we access
freed memory.
Signed-off-by: Eric Dumazet <[email protected]>
Fixes: de5fb0a05348 ("net: fec: put tx to napi poll function to fix dead lock")
Cc: Frank Li <[email protected]>
Cc: Richard Cochran <[email protected]>
Acked-by: Richard Cochran <[email protected]>
Acked-by: Frank Li <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch fixes a possible scsi_host reference leak in qlt_lport_register(),
when a non zero return from the passed (*callback) does not call drop the
local reference via scsi_host_put() before returning.
This currently does not effect existing tcm_qla2xxx code as the passed callback
will never fail, but fix this up regardless for future code.
Cc: Chad Dupuis <[email protected]>
Signed-off-by: Nicholas Bellinger <[email protected]>
|
|
lun->lun_ref is also initialized in core_tpg_post_addlun, so it doesn't
need to be done in core_tpg_setup_virtual_lun0.
(nab: Drop left-over percpu_ref_cancel_init in failure path)
Signed-off-by: Andy Grover <[email protected]>
Signed-off-by: Nicholas Bellinger <[email protected]>
|
|
Commit 7ea6c6c1 ("Move cper.c from drivers/acpi/apei to
drivers/firmware/efi") results in CONFIG_EFI being enabled even
when the user doesn't want this. Since ACPI APEI used to build
fine without UEFI (and as far as I know also has no functional
depency on it), at least in that case using a reverse dependency
is wrong (and a straight one isn't needed).
Whether the same is true for ACPI_EXTLOG I don't know - if there
is a functional dependency, it should depend on EFI rather than
selecting it. It certainly has (currently) no build dependency.
Adjust Kconfig and build logic so that the bad dependency gets
avoided.
Signed-off-by: Jan Beulich <[email protected]>
Acked-by: Tony Luck <[email protected]>
Cc: Matt Fleming <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
"valid ME register value" is not an error. It should be logged for
debugging only.
Signed-off-by: Michal Schmidt <[email protected]>
Acked-by: Yuval Mintz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The yam_ioctl() code fails to initialise the cmd field
of the struct yamdrv_ioctl_cfg. Add an explicit memset(0)
before filling the structure to avoid the 4-byte info leak.
Signed-off-by: Salva Peiró <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The local variable 'bi' comes from userspace. If userspace passed a
large number to 'bi.data.calibrate', there would be an integer overflow
in the following line:
s->hdlctx.calibrate = bi.data.calibrate * s->par.bitrate / 16;
Signed-off-by: Wenliang Fan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
'err' is overwrited to 0 after maybe_pull_tail() call, so the error
code was not set if skb_partial_csum_set() call failed. Fix to return
error -EPROTO from those error handling case instead of 0.
Fixes: d52eb0d46f36 ('xen-netback: make sure skb linear area covers checksum field')
Signed-off-by: Wei Yongjun <[email protected]>
Acked-by: Wei Liu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Jakub reported while working with nlmon netlink sniffer that parts of
the inet_diag_sockid are not initialized when r->idiag_family != AF_INET6.
That is, fields of r->id.idiag_src[1 ... 3], r->id.idiag_dst[1 ... 3].
In fact, it seems that we can leak 6 * sizeof(u32) byte of kernel [slab]
memory through this. At least, in udp_dump_one(), we allocate a skb in ...
rep = nlmsg_new(sizeof(struct inet_diag_msg) + ..., GFP_KERNEL);
... and then pass that to inet_sk_diag_fill() that puts the whole struct
inet_diag_msg into the skb, where we only fill out r->id.idiag_src[0],
r->id.idiag_dst[0] and leave the rest untouched:
r->id.idiag_src[0] = inet->inet_rcv_saddr;
r->id.idiag_dst[0] = inet->inet_daddr;
struct inet_diag_msg embeds struct inet_diag_sockid that is correctly /
fully filled out in IPv6 case, but for IPv4 not.
So just zero them out by using plain memset (for this little amount of
bytes it's probably not worth the extra check for idiag_family == AF_INET).
Similarly, fix also other places where we fill that out.
Reported-by: Jakub Zawadzki <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Linux 3.10 changed the timing of how thread_info->flags is touched:
x86: Use generic idle loop
(7d1a941731fabf27e5fb6edbebb79fe856edb4e5)
This caused Intel NHM-EX and WSM-EX servers to experience a large number
of immediate MONITOR/MWAIT break wakeups, which caused cpuidle to demote
from deep C-states to shallow C-states, which caused these platforms
to experience a significant increase in idle power.
Note that this issue was already present before the commit above,
however, it wasn't seen often enough to be noticed in power measurements.
Here we extend an errata workaround from the Core2 EX "Dunnington"
to extend to NHM-EX and WSM-EX, to prevent these immediate
returns from MWAIT, reducing idle power on these platforms.
While only acpi_idle ran on Dunnington, intel_idle
may also run on these two newer systems.
As of today, there are no other models that are known
to need this tweak.
Link: http://lkml.kernel.org/r/CAJvTdK=%[email protected]
Signed-off-by: Len Brown <[email protected]>
Link: http://lkml.kernel.org/r/baff264285f6e585df757d58b17788feabc68918.1387403066.git.len.brown@intel.com
Cc: <[email protected]> # 3.12.x, 3.11.x, 3.10.x
Signed-off-by: H. Peter Anvin <[email protected]>
|
|
Freezable kthreads and workqueues are fundamentally problematic in
that they effectively introduce a big kernel lock widely used in the
kernel and have already been the culprit of several deadlock
scenarios. This is the latest occurrence.
During resume, libata rescans all the ports and revalidates all
pre-existing devices. If it determines that a device has gone
missing, the device is removed from the system which involves
invalidating block device and flushing bdi while holding driver core
layer locks. Unfortunately, this can race with the rest of device
resume. Because freezable kthreads and workqueues are thawed after
device resume is complete and block device removal depends on
freezable workqueues and kthreads (e.g. bdi_wq, jbd2) to make
progress, this can lead to deadlock - block device removal can't
proceed because kthreads are frozen and kthreads can't be thawed
because device resume is blocked behind block device removal.
839a8e8660b6 ("writeback: replace custom worker pool implementation
with unbound workqueue") made this particular deadlock scenario more
visible but the underlying problem has always been there - the
original forker task and jbd2 are freezable too. In fact, this is
highly likely just one of many possible deadlock scenarios given that
freezer behaves as a big kernel lock and we don't have any debug
mechanism around it.
I believe the right thing to do is getting rid of freezable kthreads
and workqueues. This is something fundamentally broken. For now,
implement a funny workaround in libata - just avoid doing block device
hot[un]plug while the system is frozen. Kernel engineering at its
finest. :(
v2: Add EXPORT_SYMBOL_GPL(pm_freezing) for cases where libata is built
as a module.
v3: Comment updated and polling interval changed to 10ms as suggested
by Rafael.
v4: Add #ifdef CONFIG_FREEZER around the hack as pm_freezing is not
defined when FREEZER is not configured thus breaking build.
Reported by kbuild test robot.
Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Tomaž Šolc <[email protected]>
Reviewed-by: "Rafael J. Wysocki" <[email protected]>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=62801
Link: http://lkml.kernel.org/r/[email protected]
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: [email protected]
Cc: kbuild test robot <[email protected]>
|
|
Commit 8f34a1da35ae ("arm64: ptrace: use HW_BREAKPOINT_EMPTY type for
disabled breakpoints") fixed an issue with GDB trying to zero breakpoint
control registers. The problem there is that the arch hw_breakpoint code
will attempt to create a (disabled), execute breakpoint of length 0.
This will fail validation and report unexpected failure to GDB. To avoid
this, we treated disabled breakpoints as HW_BREAKPOINT_EMPTY, but that
seems to have broken with recent kernels, causing watchpoints to be
treated as TYPE_INST in the core code and returning ENOSPC for any
further breakpoints.
This patch fixes the problem by prioritising the `enable' field of the
breakpoint: if it is cleared, we simply update the perf_event_attr to
indicate that the thing is disabled and don't bother changing either the
type or the length. This reinforces the behaviour that the breakpoint
control register is essentially read-only apart from the enable bit
when disabling a breakpoint.
Cc: <[email protected]>
Reported-by: Aaron Liu <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"An RT group-scheduling fix and the sched-domains topology setup fix
from Mel"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/rt: Fix rq's cpupri leak while enqueue/dequeue child RT entities
sched: Assign correct scheduling domain to 'sd_llc'
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"An ABI documentation fix, and a mixed-PMU perf-info-corruption fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Document the new transaction sample type
perf: Disable all pmus on unthrottling and rescheduling
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"We have a bit more changes than usual in ASoC here, as it was slipped
from the previous update. There are one minr ASoC PCM code fix and
ASoC dmaengine fix, in addition of a collection of small ASoC driver
fixes. The rest are a couple of HD-audio stable fixups, and a
long-standing fix for the paused stream handling.
So, all commits look not scary (and hopefully won't give you
disastrous holiday season)"
* tag 'sound-3.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Add Dell headset detection quirk for one more laptop model
ASoC: wm8904: fix DSP mode B configuration
ASoC: wm_adsp: Add small delay while polling DSP RAM start
ALSA: Add SNDRV_PCM_STATE_PAUSED case in wait_for_avail function
ASoC: kirkwood: Fix the CPU DAI rates
ASoC: wm5110: Correct HPOUT3 DAPM route typo
ALSA: hda - Add Dell headset detection quirk for three laptop models
ALSA: hda - Add enable_msi=0 workaround for four HP machines
ASoC: don't leak on error in snd_dmaengine_pcm_register
ASoC: fsl: imx-wm8962: Don't update bias_level in machine driver
ASoC: tegra: fix uninitialized variables in set_fmt
ASoC: wm8962: Enable SYSCLK provisonally before fetching generated DSPCLK_DIV
ASoC: sam9x5_wm8731: change to work in DSP A mode
ASoC: atmel_ssc_dai: add dai trigger ops
ASoC: soc-pcm: Use valid condition for snd_soc_dai_digital_mute() in hw_free()
|
|
Let the user know when the number of submission queues are being
ignored.
Signed-off-by: Matias Bjorling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Simplify the initialization logic of the three block-layers.
- The queue initialization is split into two parts. This allows reuse of
code when initializing the sq-, bio- and mq-based layers.
- Set submit_queues default value to 0 and always set it at init time.
- Simplify the init error code paths.
Signed-off-by: Matias Bjorling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Add description of module and its parameters.
Signed-off-by: Matias Bjorling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
For NUMA systems, initializing the blk-mq layer and using per node hctx.
We initialize submit queues to 1, while blk-mq nr_hw_queues is
initialized to the number of NUMA nodes.
This makes the null_init_hctx function overwrite memory outside of what
it allocated. In my case it lead to writing garbage into struct
request_queue's mq_map.
Signed-off-by: Matias Bjorling <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Mark functions skd_skmsg_state_to_str() and skd_skreq_state_to_str() as
static in skd_main.c because they are not used outside this file.
This eliminates the following warnings in skd_main.c:
drivers/block/skd_main.c:5272:13: warning: no previous prototype for ‘skd_skmsg_state_to_str’ [-Wmissing-prototypes]
drivers/block/skd_main.c:5284:13: warning: no previous prototype for ‘skd_skreq_state_to_str’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <[email protected]>
Reviewed-by: Josh Triplett <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Commit 97bc386fc12d "ARC: Add guard macro to uapi/asm/unistd.h"
inhibited multiple inclusion of ARCH unistd.h. This however hosed the system
since Generic syscall table generator relies on it being included twice,
and in lack-of an empty table was emitted by C preprocessor.
Fix that by allowing one exception to rule for the special case (just
like Xtensa)
Suggested-by: Chen Gang <[email protected]>
Signed-off-by: Vineet Gupta <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v3.13
The fixes here are all driver specific ones, none of which particularly
stand out but all of which are useful to users of those drivers.
|
|
'asoc/fix/atmel', 'asoc/fix/fsl', 'asoc/fix/kirkwood', 'asoc/fix/tegra', 'asoc/fix/wm8904' and 'asoc/fix/wm8962' into asoc-linus
|
|
|
|
|
|
The r8a7790.dtsi file has four sdhi nodes which the first two have the wrong
resource size for their register block. This causes the sh_modbile_sdhi driver
to fail to communicate with card at-all.
Change sdhi{0,1} node size from 0x100 to 0x200 to correct these nodes
as per Kuninori Morimoto's response to the original patch where all four
nodes where changed. sdhi{2,3} are the correct size.
This bug has been present since sdhi resources were added to the r8a7790 by
8c9b1aa41853272a ("ARM: shmobile: r8a7790: add MMCIF and SDHI DT
templates") in v3.11-rc2.
Signed-off-by: Ben Dooks <[email protected]>
Tested-by: William Towle <[email protected]>
Acked-by: Kuninori Morimoto <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
|
|
4dcfa60071b3d23f0181f27d8519f12e37cefbb9
(ARM: DMA-API: better handing of DMA masks for coherent allocations)
exchanged DMA mask check method.
Below warning will appear without this patch
asoc-simple-card asoc-simple-card.0: \
Coherent DMA mask 0xffffffffffffffff is larger than dma_addr_t allows
asoc-simple-card asoc-simple-card.0: \
Driver did not use or check the return value from dma_set_coherent_mask()?
Signed-off-by: Kuninori Morimoto <[email protected]>
Acked-by: Laurent Pinchart <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
|
|
This patch allows FILEIO to update hw_max_sectors based on the current
max_bytes_per_io. This is required because vfs_[writev,readv]() can accept
a maximum of 2048 iovecs per call, so the enforced hw_max_sectors really
needs to be calculated based on block_size.
This addresses a >= v3.5 bug where block_size=512 was rejecting > 1M
sized I/O requests, because FD_MAX_SECTORS was hardcoded to 2048 for
the block_size=4096 case.
(v2: Use max_bytes_per_io instead of ->update_hw_max_sectors)
Reported-by: Henrik Goldman <[email protected]>
Cc: <[email protected]> #3.5+
Signed-off-by: Nicholas Bellinger <[email protected]>
|
|
This patch moves INIT_WORK setup for cq_desc->cq_[rx,tx]_work into
isert_create_device_ib_res(), instead of being done each callback
invocation in isert_cq_[rx,tx]_callback().
This also fixes a 'INFO: trying to register non-static key' warning
when cancel_work_sync() is called before INIT_WORK has setup the
struct work_struct.
Reported-by: Or Gerlitz <[email protected]>
Cc: <[email protected]> #3.12+
Signed-off-by: Nicholas Bellinger <[email protected]>
|
|
When shutting down a target there is a race condition between
iscsit_del_np() and __iscsi_target_login_thread().
The latter sets the thread pointer to NULL, and the former
tries to issue kthread_stop() on that pointer without any
synchronization.
This patch moves the np->np_thread NULL assignment into
iscsit_del_np(), after kthread_stop() has completed. It also
removes the signal_pending() + np_state check, and only
exits when kthread_should_stop() is true.
Reported-by: Hannes Reinecke <[email protected]>
Cc: <[email protected]> #3.12+
Signed-off-by: Nicholas Bellinger <[email protected]>
|
|
Commit 22ceeee16eb8f0d04de3ef43a5174fb30ec18af9 ("pwm-backlight: Add
power supply support") added a mandatory power supply for the PWM
backlight. Add a fixed 5V regulator to board code with a consumer supply
entry for the backlight device.
Signed-off-by: Laurent Pinchart <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
(cherry picked from commit ad11cb9a5cf96346f1240995c672cdbb5501785c)
Signed-off-by: Simon Horman <[email protected]>
|
|
Merge patches from Andrew Morton:
"23 fixes and a MAINTAINERS update"
* emailed patches from Andrew Morton <[email protected]>: (24 commits)
mm/hugetlb: check for pte NULL pointer in __page_check_address()
fix build with make 3.80
mm/mempolicy: fix !vma in new_vma_page()
MAINTAINERS: add Davidlohr as GPT maintainer
mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully
mm/compaction: respect ignore_skip_hint in update_pageblock_skip
mm/mempolicy: correct putback method for isolate pages if failed
mm: add missing dependency in Kconfig
sh: always link in helper functions extracted from libgcc
mm: page_alloc: exclude unreclaimable allocations from zone fairness policy
mm: numa: defer TLB flush for THP migration as long as possible
mm: numa: guarantee that tlb_flush_pending updates are visible before page table updates
mm: fix TLB flush race between migration, and change_protection_range
mm: numa: avoid unnecessary disruption of NUMA hinting during migration
mm: numa: clear numa hinting information on mprotect
sched: numa: skip inaccessible VMAs
mm: numa: avoid unnecessary work on the failure path
mm: numa: ensure anon_vma is locked to prevent parallel THP splits
mm: numa: do not clear PTE for pte_numa update
mm: numa: do not clear PMD during PTE update scan
...
|
|
In __page_check_address(), if address's pud is not present,
huge_pte_offset() will return NULL, we should check the return value.
Signed-off-by: Jianguo Wu <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: qiuxishi <[email protected]>
Cc: Hanjun Guo <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
According to Documentation/Changes, make 3.80 is still being supported
for building the kernel, hence make files must not make (unconditional)
use of features introduced only in newer versions.
Commit 1bf49dd4be0b ("./Makefile: export initial ramdisk compression
config option") however introduced "else ifeq" constructs which make
3.80 doesn't understand. Replace the logic there with more conventional
(in the kernel build infrastructure) list constructs (except that the
list here is intentionally limited to exactly one element).
Signed-off-by: Jan Beulich <[email protected]>
Cc: P J P <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
BUG_ON(!vma) assumption is introduced by commit 0bf598d863e3 ("mbind:
add BUG_ON(!vma) in new_vma_page()"), however, even if
address = __vma_address(page, vma);
and
vma->start < address < vma->end
page_address_in_vma() may still return -EFAULT because of many other
conditions in it. As a result the while loop in new_vma_page() may end
with vma=NULL.
This patch revert the commit and also fix the potential dereference NULL
pointer reported by Dan.
http://marc.info/?l=linux-mm&m=137689530323257&w=2
kernel BUG at mm/mempolicy.c:1204!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU: 3 PID: 7056 Comm: trinity-child3 Not tainted 3.13.0-rc3+ #2
task: ffff8801ca5295d0 ti: ffff88005ab20000 task.ti: ffff88005ab20000
RIP: new_vma_page+0x70/0x90
RSP: 0000:ffff88005ab21db0 EFLAGS: 00010246
RAX: fffffffffffffff2 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000008040075 RSI: ffff8801c3d74600 RDI: ffffea00079a8b80
RBP: ffff88005ab21dc8 R08: 0000000000000004 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: fffffffffffffff2
R13: ffffea00079a8b80 R14: 0000000000400000 R15: 0000000000400000
FS: 00007ff49c6f4740(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff49c68f994 CR3: 000000005a205000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
ffffea00079a8b80 ffffea00079a8bc0 ffffea00079a8ba0 ffff88005ab21e50
ffffffff811adc7a 0000000000000000 ffff8801ca5295d0 0000000464e224f8
0000000000000000 0000000000000002 0000000000000000 ffff88020ce75c00
Call Trace:
migrate_pages+0x12a/0x850
SYSC_mbind+0x513/0x6a0
SyS_mbind+0xe/0x10
ia32_do_call+0x13/0x13
Code: 85 c0 75 2f 4c 89 e1 48 89 da 31 f6 bf da 00 02 00 65 44 8b 04 25 08 f7 1c 00 e8 ec fd ff ff 5b 41 5c 41 5d 5d c3 0f 1f 44 00 00 <0f> 0b 66 0f 1f 44 00 00 4c 89 e6 48 89 df ba 01 00 00 00 e8 48
RIP [<ffffffff8119f200>] new_vma_page+0x70/0x90
RSP <ffff88005ab21db0>
Signed-off-by: Wanpeng Li <[email protected]>
Reported-by: Dave Jones <[email protected]>
Reported-by: Sasha Levin <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Bob Liu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add a new entry for the GPT standard. Any future changes will now be
routed through linux-efi.
Signed-off-by: Davidlohr Bueso <[email protected]>
Acked-by: Matt Fleming <[email protected]>
Cc: Jens Axboe <[email protected]>
Acked-by: Matt Domsch <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
After a successful hugetlb page migration by soft offline, the source
page will either be freed into hugepage_freelists or buddy(over-commit
page). If page is in buddy, page_hstate(page) will be NULL. It will
hit a NULL pointer dereference in dequeue_hwpoisoned_huge_page().
BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
IP: [<ffffffff81163761>] dequeue_hwpoisoned_huge_page+0x131/0x1d0
PGD c23762067 PUD c24be2067 PMD 0
Oops: 0000 [#1] SMP
So check PageHuge(page) after call migrate_pages() successfully.
Signed-off-by: Jianguo Wu <[email protected]>
Tested-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
update_pageblock_skip() only fits to compaction which tries to isolate
by pageblock unit. If isolate_migratepages_range() is called by CMA, it
try to isolate regardless of pageblock unit and it don't reference
get_pageblock_skip() by ignore_skip_hint. We should also respect it on
update_pageblock_skip() to prevent from setting the wrong information.
Signed-off-by: Joonsoo Kim <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Wanpeng Li <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Rafael Aquini <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Zhang Yanfei <[email protected]>
Cc: <[email protected]> [3.7+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
queue_pages_range() isolates hugetlbfs pages and putback_lru_pages()
can't handle these. We should change it to putback_movable_pages().
Naoya said that it is worth going into stable, because it can break
in-use hugepage list.
Signed-off-by: Joonsoo Kim <[email protected]>
Acked-by: Rafael Aquini <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Wanpeng Li <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Zhang Yanfei <[email protected]>
Cc: <[email protected]> [3.12.x]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Eliminate the following (rand)config warning by adding missing PROC_FS
dependency:
warning: (HWPOISON_INJECT && MEM_SOFT_DIRTY) selects PROC_PAGE_MONITOR which has unmet direct dependencies (PROC_FS && MMU)
Signed-off-by: Sima Baymani <[email protected]>
Suggested-by: David Rientjes <[email protected]>
Acked-by: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
E.g. landisk_defconfig, which has CONFIG_NTFS_FS=m:
ERROR: "__ashrdi3" [fs/ntfs/ntfs.ko] undefined!
For "lib-y", if no symbols in a compilation unit are referenced by other
units, the compilation unit will not be included in vmlinux. This
breaks modules that do reference those symbols.
Use "obj-y" instead to fix this.
http://kisskb.ellerman.id.au/kisskb/buildresult/8838077/
This doesn't fix all cases. There are others, e.g. udivsi3.
This is also not limited to sh, many architectures handle this in the
same way.
A simple solution is to unconditionally include all helper functions.
A more complex solution is to make the choice of "lib-y" or "obj-y" depend
on CONFIG_MODULES:
obj-$(CONFIG_MODULES) += ...
lib-y($CONFIG_MODULES) += ...
Signed-off-by: Geert Uytterhoeven <[email protected]>
Cc: Paul Mundt <[email protected]>
Tested-by: Nobuhiro Iwamatsu <[email protected]>
Reviewed-by: Nobuhiro Iwamatsu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Dave Hansen noted a regression in a microbenchmark that loops around
open() and close() on an 8-node NUMA machine and bisected it down to
commit 81c0a2bb515f ("mm: page_alloc: fair zone allocator policy").
That change forces the slab allocations of the file descriptor to spread
out to all 8 nodes, causing remote references in the page allocator and
slab.
The round-robin policy is only there to provide fairness among memory
allocations that are reclaimed involuntarily based on pressure in each
zone. It does not make sense to apply it to unreclaimable kernel
allocations that are freed manually, in this case instantly after the
allocation, and incur the remote reference costs twice for no reason.
Only round-robin allocations that are usually freed through page reclaim
or slab shrinking.
Bisected by Dave Hansen.
Signed-off-by: Johannes Weiner <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Mel Gorman <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
THP migration can fail for a variety of reasons. Avoid flushing the TLB
to deal with THP migration races until the copy is ready to start.
Signed-off-by: Mel Gorman <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Cc: Alex Thorlton <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
table updates
According to documentation on barriers, stores issued before a LOCK can
complete after the lock implying that it's possible tlb_flush_pending
can be visible after a page table update. As per revised documentation,
this patch adds a smp_mb__before_spinlock to guarantee the correct
ordering.
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Paul E. McKenney <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There are a few subtle races, between change_protection_range (used by
mprotect and change_prot_numa) on one side, and NUMA page migration and
compaction on the other side.
The basic race is that there is a time window between when the PTE gets
made non-present (PROT_NONE or NUMA), and the TLB is flushed.
During that time, a CPU may continue writing to the page.
This is fine most of the time, however compaction or the NUMA migration
code may come in, and migrate the page away.
When that happens, the CPU may continue writing, through the cached
translation, to what is no longer the current memory location of the
process.
This only affects x86, which has a somewhat optimistic pte_accessible.
All other architectures appear to be safe, and will either always flush,
or flush whenever there is a valid mapping, even with no permissions
(SPARC).
The basic race looks like this:
CPU A CPU B CPU C
load TLB entry
make entry PTE/PMD_NUMA
fault on entry
read/write old page
start migrating page
change PTE/PMD to new page
read/write old page [*]
flush TLB
reload TLB from new entry
read/write new page
lose data
[*] the old page may belong to a new user at this point!
The obvious fix is to flush remote TLB entries, by making sure that
pte_accessible aware of the fact that PROT_NONE and PROT_NUMA memory may
still be accessible if there is a TLB flush pending for the mm.
This should fix both NUMA migration and compaction.
[[email protected]: fix build]
Signed-off-by: Rik van Riel <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Cc: Alex Thorlton <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
do_huge_pmd_numa_page() handles the case where there is parallel THP
migration. However, by the time it is checked the NUMA hinting
information has already been disrupted. This patch adds an earlier
check with some helpers.
Signed-off-by: Mel Gorman <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Cc: Alex Thorlton <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
On a protection change it is no longer clear if the page should be still
accessible. This patch clears the NUMA hinting fault bits on a
protection change.
Signed-off-by: Mel Gorman <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Cc: Alex Thorlton <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Inaccessible VMA should not be trapping NUMA hint faults. Skip them.
Signed-off-by: Mel Gorman <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Cc: Alex Thorlton <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|