blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2015-04-08	KVM: x86: cache maxphyaddr CPUID leaf in struct kvm_vcpu	Eugene Korenevsky	4	-16/+29
	cpuid_maxphyaddr(), which performs lot of memory accesses is called extensively across KVM, especially in nVMX code. This patch adds a cached value of maxphyaddr to vcpu.arch to reduce the pressure onto CPU cache and simplify the code of cpuid_maxphyaddr() callers. The cached value is initialized in kvm_arch_vcpu_init() and reloaded every time CPUID is updated by usermode. It is obvious that these reloads occur infrequently. Signed-off-by: Eugene Korenevsky <[email protected]> Message-Id: <20150329205612.GA1223@gnote> Signed-off-by: Paolo Bonzini <[email protected]>
2015-04-08	KVM: vmx: pass error code with internal error #2	Radim Krčmář	1	-1/+2
	Exposing the on-stack error code with internal error is cheap and potentially useful. Signed-off-by: Radim Krčmář <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2015-04-08	x86: vdso: fix pvclock races with task migration	Radim Krčmář	1	-8/+12
	If we were migrated right after __getcpu, but before reading the migration_count, we wouldn't notice that we read TSC of a different VCPU, nor that KVM's bug made pvti invalid, as only migration_count on source VCPU is increased. Change vdso instead of updating migration_count on destination. Cc: [email protected] Signed-off-by: Radim Krčmář <[email protected]> Fixes: 0a4e6be9ca17 ("x86: kvm: Revert "remove sched notifier for cross-cpu migrations"") Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2015-04-08	KVM: remove kvm_read_hva and kvm_read_hva_atomic	Paolo Bonzini	1	-12/+2
	The corresponding write functions just use __copy_to_user. Do the same on the read side. This reverts what's left of commit 86ab8cffb498 (KVM: introduce gfn_to_hva_read/kvm_read_hva/kvm_read_hva_atomic, 2012-08-21) Cc: Xiao Guangrong <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Message-Id: <[email protected]>
2015-04-08	KVM: x86: optimize delivery of TSC deadline timer interrupt	Paolo Bonzini	1	-6/+7
	The newly-added tracepoint shows the following results on the tscdeadline_latency test: qemu-kvm-8387 [002] 6425.558974: kvm_vcpu_wakeup: poll time 10407 ns qemu-kvm-8387 [002] 6425.558984: kvm_vcpu_wakeup: poll time 0 ns qemu-kvm-8387 [002] 6425.561242: kvm_vcpu_wakeup: poll time 10477 ns qemu-kvm-8387 [002] 6425.561251: kvm_vcpu_wakeup: poll time 0 ns and so on. This is because we need to go through kvm_vcpu_block again after the timer IRQ is injected. Avoid it by polling once before entering kvm_vcpu_block. On my machine (Xeon E5 Sandy Bridge) this removes about 500 cycles (7%) from the latency of the TSC deadline timer. Signed-off-by: Paolo Bonzini <[email protected]>
2015-04-08	KVM: x86: extract blocking logic from __vcpu_run	Paolo Bonzini	1	-28/+34
	Rename the old __vcpu_run to vcpu_run, and extract part of it to a new function vcpu_block. The next patch will add a new condition in vcpu_block, avoid extra indentation. Reviewed-by: David Matlack <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2015-04-08	kvm: x86: fix x86 eflags fixed bit	Wanpeng Li	1	-3/+3
	Guest can't be booted w/ ept=0, there is a message dumped as below: If you're running a guest on an Intel machine without unrestricted mode support, the failure can be most likely due to the guest entering an invalid state for Intel VT. For example, the guest maybe running in big real mode which is not supported on less recent Intel processors. EAX=00000011 EBX=f000d2f6 ECX=00006cac EDX=000f8956 ESI=bffbdf62 EDI=00000000 EBP=00006c68 ESP=00006c68 EIP=0000d187 EFL=00000004 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =e000 000e0000 ffffffff 00809300 DPL=0 DS16 [-WA] CS =f000 000f0000 ffffffff 00809b00 DPL=0 CS16 [-RA] SS =0000 00000000 ffffffff 00809300 DPL=0 DS16 [-WA] DS =0000 00000000 ffffffff 00809300 DPL=0 DS16 [-WA] FS =0000 00000000 ffffffff 00809300 DPL=0 DS16 [-WA] GS =0000 00000000 ffffffff 00809300 DPL=0 DS16 [-WA] LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy GDT= 000f6a80 00000037 IDT= 000f6abe 00000000 CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000 DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 DR6=00000000ffff0ff0 DR7=0000000000000400 EFER=0000000000000000 Code=01 1e b8 6a 2e 0f 01 16 74 6a 0f 20 c0 66 83 c8 01 0f 22 c0 <66> ea 8f d1 0f 00 08 00 b8 10 00 00 00 8e d8 8e c0 8e d0 8e e0 8e e8 89 c8 ff e2 89 c1 b8X X86 eflags bit 1 is fixed set, which means that 1 << 1 is set instead of 1, this patch fix it. Signed-off-by: Wanpeng Li <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2015-04-08	x86/asm/entry/64: Add forgotten CFI annotation	Denys Vlasenko	1	-0/+1
	Signed-off-by: Denys Vlasenko <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Kees Cook <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Drewry <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-04-08	x86/asm/entry/irq: Simplify interrupt dispatch table (IDT) layout	Denys Vlasenko	5	-67/+26
	Interrupt entry points are handled with the following code, each 32-byte code block contains seven entry points: ... [push][jump 22] // 4 bytes [push][jump 18] // 4 bytes [push][jump 14] // 4 bytes [push][jump 10] // 4 bytes [push][jump 6] // 4 bytes [push][jump 2] // 4 bytes [push][jump common_interrupt][padding] // 8 bytes [push][jump] [push][jump] [push][jump] [push][jump] [push][jump] [push][jump] [push][jump common_interrupt][padding] [padding_2] common_interrupt: And there is a table which holds pointers to every entry point, IOW: to every push. In cold cache, two jumps are still costlier than one, even though we get the benefit of them residing in the same cacheline. This change replaces short jumps with near ones to 'common_interrupt', and pads every push+jump pair to 8 bytes. This way, each interrupt takes only one jump. This change replaces ".p2align CONFIG_X86_L1_CACHE_SHIFT" before dispatch table with ".align 8" - we do not need anything stronger than that. The table of entry addresses (the interrupt[] array) is no longer necessary, the address of entries can be easily calculated as (irq_entries_start + i*8). text data bss dec hex filename 12546 0 0 12546 3102 entry_64.o.before 11626 0 0 11626 2d6a entry_64.o The size decrease is because 1656 bytes of .init.rodata are gone. That's initdata, though. The resident size does go up a bit. Run-tested (32 and 64 bits). Acked-and-Tested-by: Borislav Petkov <[email protected]> Signed-off-by: Denys Vlasenko <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Kees Cook <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Will Drewry <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-04-08	x86/asm/entry/64: Move opportunistic sysret code to syscall code path	Denys Vlasenko	1	-72/+86
	This change does two things: Copy-pastes "retint_swapgs:" code into syscall handling code, the copy is under "syscall_return:" label. The code is unchanged apart from some label renames. Removes "opportunistic sysret" code from "retint_swapgs:" code block, since now it won't be reached by syscall return. This in fact removes most of the code in question. text data bss dec hex filename 12530 0 0 12530 30f2 entry_64.o.before 12562 0 0 12562 3112 entry_64.o Run-tested. Acked-and-Tested-by: Borislav Petkov <[email protected]> Signed-off-by: Denys Vlasenko <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Kees Cook <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Will Drewry <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-04-08	Merge tag 'v4.0-rc7' into x86/asm, to resolve conflicts	Ingo Molnar	305	-1377/+2818
	Conflicts: arch/x86/kernel/entry_64.S Signed-off-by: Ingo Molnar <[email protected]>
2015-04-08	x86, selftests: Add sigreturn selftest	Andy Lutomirski	6	-0/+760
	This is my sigreturn test, added mostly unchanged from its old home. It exercises the sigreturn(2) syscall, specifically focusing on its interactions with various IRET corner cases. It tests for correct behavior in several areas that were historically dangerously buggy. For example, it exercises espfix on kernels of both bitnesses under various conditions, and it contains testcases for several now-fixed bugs in IRET error handling. If you run it on older kernels without the fixes, your system will crash. It probably won't eat your data in the process. There is no released kernel on which the sigreturn_64 test will pass, but it passes on tip:x86/asm. I plan to switch to lib.mk for Linux 4.2. I'm not using the ksft_ helpers at all yet. I can do that later. Signed-off-by: Andy Lutomirski <[email protected]> Acked-by: Shuah Khan <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/89d10b76b92c7202d8123654dc8d36701c017b3d.1428386971.git.luto@kernel.org [ Fixed empty format string GCC build warning in trivial_32bit_program.c ] Signed-off-by: Ingo Molnar <[email protected]>
2015-04-08	nios2: signal: Move restart_block to struct task_struct	Ley Foon Tan	2	-5/+1
	See https://lkml.org/lkml/2014/10/29/643 and commit f56141e3e2d9 ("all arches, signal: move restart_block to struct task_struct") Signed-off-by: Ley Foon Tan <[email protected]>
2015-04-08	drm: fix drm_mode_getconnector() locking imbalance regression	Tommi Rantala	1	-1/+3
	Regression in commit 2caa80e72b57c6216aec6f6a11fcfb4fec46daa0 Author: Daniel Vetter <[email protected]> Date: Sun Feb 22 11:38:36 2015 +0100 drm: Fix deadlock due to getconnector locking changes If the drm_connector_find() call returns NULL, we should no longer call drm_modeset_unlock() to avoid locking imbalance. Signed-off-by: Tommi Rantala <[email protected]> Cc: Daniel Vetter <[email protected]> Reviewed-by: Daniel Vetter <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
2015-04-08	md: fix md io stats accounting broken	Gu Zheng	1	-1/+5
	Simon reported the md io stats accounting issue: " I'm seeing "iostat -x -k 1" print this after a RAID1 rebuild on 4.0-rc5. It's not abnormal other than it's 3-disk, with one being SSD (sdc) and the other two being write-mostly: Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 345.00 0.00 0.00 0.00 0.00 100.00 md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 58779.00 0.00 0.00 0.00 0.00 100.00 md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 12.00 0.00 0.00 0.00 0.00 100.00 " The cause is commit "18c0b223cf9901727ef3b02da6711ac930b4e5d4" uses the generic_start_io_acct to account the disk stats rather than the open code, but it also introduced the increase to .in_flight[rw] which is needless to md. So we re-use the open code here to fix it. Reported-by: Simon Kirby <[email protected]> Cc: <[email protected]> 3.19 Signed-off-by: Gu Zheng <[email protected]> Signed-off-by: NeilBrown <[email protected]>
2015-04-07	iscsi-target: TargetAddress in SendTargets should bracket ipv6 addresses	Andy Grover	1	-2/+7
	"The domainname can be specified as either a DNS host name, a dotted-decimal IPv4 address, or a bracketed IPv6 address as specified in [RFC2732]." See https://bugzilla.redhat.com/show_bug.cgi?id=1206868 Reported-by: Kyle Brantley <[email protected]> Signed-off-by: Andy Grover <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]>
2015-04-07	Merge tag 'media/v3.20-2' of ↵	Linus Torvalds	18	-30/+34
	git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media updates from Mauro Carvalho Chehab: "A series of fixup patches for version 4.0: - one VB2 core fixup, when stopping the stream; - one VB2 core fixup for dma-contig memory type; - driver fixes at rtl28xx, s5p (tv, jpeg, mfc, soc-camera, sh_veu, cx23885, gspca" * tag 'media/v3.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: [media] rtl28xxu: return success for unimplemented FE callback [media] rtl2832: disable regmap register cache [media] vb2: Fix dma_dir setting for dma-contig mem type [media] media: s5p-mfc: fix broken pointer cast on 64bit arch [media] media: s5p-mfc: fix mmap support for 64bit arch [media] cx23885: fix querycap [media] sh_veu: v4l2_dev wasn't set [media] s5p-mfc: Fix NULL pointer dereference caused by not set q->lock [media] s5p-jpeg: exynos3250: fix erroneous reset procedure [media] s5p-tv: hdmi needs I2C support [media] s5p-jpeg: Initialize cb and cr to zero [media] media: fix gspca drivers build dependencies [media] soc-camera: Fix devm_kfree() in soc_of_bind() [media] media: atmel-isi: increase the burst length to improve the performance [media] vb2: fix 'UNBALANCED' warnings when calling vb2_thread_stop()
2015-04-07	mm: numa: disable change protection for vma(VM_HUGETLB)	Naoya Horiguchi	1	-1/+3
	Currently when a process accesses a hugetlb range protected with PROTNONE, unexpected COWs are triggered, which finally puts the hugetlb subsystem into a broken/uncontrollable state, where for example h->resv_huge_pages is subtracted too much and wraps around to a very large number, and the free hugepage pool is no longer maintainable. This patch simply stops changing protection for vma(VM_HUGETLB) to fix the problem. And this also allows us to avoid useless overhead of minor faults. Signed-off-by: Naoya Horiguchi <[email protected]> Suggested-by: Mel Gorman <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: David Rientjes <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-04-07	include/linux/dmapool.h: declare struct device	Mark Brown	1	-0/+2
	dmapool uses struct device in function arguments but relies on an implicit inclusion to declare struct device causing warnings in some configurations: include/linux/dmapool.h:31:7: warning: 'struct device' declared inside parameter list Fix this by adding a struct device declaration to the file. Signed-off-by: Mark Brown <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-04-07	mm: move zone lock to a different cache line than order-0 free page lists	Mel Gorman	1	-4/+3
	Huang Ying reported the following problem due to commit 3484b2de9499 ("mm: rearrange zone fields into read-only, page alloc, statistics and page reclaim lines") from the Intel performance tests 24b7e5819ad5cbef 3484b2de9499df23c4604a513b ---------------- -------------------------- %stddev %change %stddev \ \| \ 152288 \261 0% -46.2% 81911 \261 0% aim7.jobs-per-min 237 \261 0% +85.6% 440 \261 0% aim7.time.elapsed_time 237 \261 0% +85.6% 440 \261 0% aim7.time.elapsed_time.max 25026 \261 0% +70.7% 42712 \261 0% aim7.time.system_time 2186645 \261 5% +32.0% 2885949 \261 4% aim7.time.voluntary_context_switches 4576561 \261 1% +24.9% 5715773 \261 0% aim7.time.involuntary_context_switches The problem is specific to very large machines under stress. It was not reproducible with the machines I had used to justify the original patch because large numbers of CPUs are required. When pressure is high enough, the cache line is bouncing between CPUs trying to acquire the lock and the holder of the lock adjusting free lists. The intention was that the acquirer of the lock would automatically have the cache line holding the free lists but according to Huang, this is not a universal win. One possibility is to move the zone lock to its own cache line but it increases the size of the zone. This patch moves the lock to the other end of the free lists where they do not contend under high pressure. It does mean the page allocator paths now require more cache lines but Huang reports that it restores performance to previous levels on large machines %stddev %change %stddev \ \| \ 84568 \261 1% +94.3% 164280 \261 1% aim7.jobs-per-min 2881944 \261 2% -35.1% 1870386 \261 8% aim7.time.voluntary_context_switches 681 \261 1% -3.4% 658 \261 0% aim7.time.user_time 5538139 \261 0% -12.1% 4867884 \261 0% aim7.time.involuntary_context_switches 44174 \261 1% -46.0% 23848 \261 1% aim7.time.system_time 426 \261 1% -48.4% 219 \261 1% aim7.time.elapsed_time 426 \261 1% -48.4% 219 \261 1% aim7.time.elapsed_time.max 468 \261 1% -43.1% 266 \261 2% uptime.boot Signed-off-by: Mel Gorman <[email protected]> Reported-by: Huang Ying <[email protected]> Tested-by: Huang Ying <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-04-07	drivers: thermal: st: remove several sparse warnings	Eduardo Valentin	3	-11/+11
	Simple patch to make symbols static. Symbols that are not shared with other parts of the kernel can be made static. This change also removes several sparse complains. Cc: Zhang Rui <[email protected]> Cc: Lee Jones <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Ajit Pal Singh <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Eduardo Valentin <[email protected]>
2015-04-07	thermal: constify of_device_id array	Fabian Frederick	2	-2/+2
	of_device_id is always used as const. (See driver.of_match_table and open firmware functions) Signed-off-by: Fabian Frederick <[email protected]> Signed-off-by: Eduardo Valentin <[email protected]>
2015-04-07	thermal: Do not log an error if thermal_zone_get_temp returns -EAGAIN	Hans de Goede	1	-2/+4
	Some temperature sensors only get updated every few seconds and while waiting for the first irq reporting a (new) temperature to happen there get_temp operand will return -EAGAIN as it does not have any data to report yet. Not logging an error in this case avoids messages like these from showing up in dmesg on affected systems: [ 1.219353] thermal thermal_zone0: failed to read out thermal zone 0 [ 2.015433] thermal thermal_zone0: failed to read out thermal zone 0 [ 2.416737] thermal thermal_zone0: failed to read out thermal zone 0 Reviewed-by: Dmitry Torokhov <[email protected]> Signed-off-by: Hans de Goede <[email protected]> Signed-off-by: Eduardo Valentin <[email protected]>
2015-04-07	thermal: rcar: Fix typo in r8a73a4 SoC name	Geert Uytterhoeven	1	-1/+1
	r8a73a4 is R-Mobile APE6, not AP6. Signed-off-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Eduardo Valentin <[email protected]>
2015-04-07	Merge tag 'kvm-s390-next-20150331' of ↵	Paolo Bonzini	9	-402/+906
	git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD Features and fixes for 4.1 (kvm/next) 1. Assorted changes 1.1 allow more feature bits for the guest 1.2 Store breaking event address on program interrupts 2. Interrupt handling rework 2.1 Fix copy_to_user while holding a spinlock (cc stable) 2.2 Rework floating interrupts to follow the priorities 2.3 Allow to inject all local interrupts via new ioctl 2.4 allow to get/set the full local irq state, e.g. for migration and introspection
2015-04-07	Merge tag 'kvm-arm-for-4.1' of ↵	Paolo Bonzini	47	-703/+1012
	git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into 'kvm-next' KVM/ARM changes for v4.1: - fixes for live migration - irqfd support - kvm-io-bus & vgic rework to enable ioeventfd - page ageing for stage-2 translation - various cleanups
2015-04-07	Revert "libceph: use memalloc flags for net IO"	Ilya Dryomov	1	-8/+1
	This reverts commit 89baaa570ab0b476db09408d209578cfed700e9f. Dirty page throttling should be sufficient for us in the general case so there is no need to use __GFP_MEMALLOC - it would be needed only in the swap-over-rbd case, which we currently don't support. (It would probably take approximately the commit that is being reverted to add that support, but we would also need the "swap" option to distinguish from the general case and make sure swap ceph_client-s aren't shared with anything else.) See ceph-devel threads [1] and [2] for the details of why enabling pfmemalloc reserves for all cases is a bad thing. On top of potential system lockups related to drained emergency reserves, this turned out to cause ceph lockups in case peers are on the same host and communicating via loopback due to sk_filter() dropping pfmemalloc skbs on the receiving side because the receiving loopback socket is not tagged with SOCK_MEMALLOC. [1] "SOCK_MEMALLOC vs loopback" http://www.spinics.net/lists/ceph-devel/msg22998.html [2] "[PATCH] libceph: don't set memalloc flags in loopback case" http://www.spinics.net/lists/ceph-devel/msg23392.html Conflicts: net/ceph/messenger.c [ context: tcp_nodelay option ] Cc: Mike Christie <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Sage Weil <[email protected]> Cc: [email protected] # 3.18+, needs backporting Signed-off-by: Ilya Dryomov <[email protected]> Acked-by: Mike Christie <[email protected]> Acked-by: Mel Gorman <[email protected]>
2015-04-07	Merge tag 'kvm-arm-fixes-4.0-rc5' of ↵	Paolo Bonzini	8	-75/+105
	git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into 'kvm-next' Fixes for KVM/ARM for 4.0-rc5. Fixes page refcounting issues in our Stage-2 page table management code, fixes a missing unlock in a gicv3 error path, and fixes a race that can cause lost interrupts if signals are pending just prior to entering the guest.
2015-04-07	x86/mm/numa: Fix kernel stack corruption in ↵	Dave Young	1	-2/+9
	numa_init()->numa_clear_kernel_node_hotplug() I got below kernel panic during kdump test on Thinkpad T420 laptop: [ 0.000000] No NUMA configuration found [ 0.000000] Faking a node at [mem 0x0000000000000000-0x0000000037ba4fff] [ 0.000000] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81d21910 ... [ 0.000000] Call Trace: [ 0.000000] [<ffffffff817c2a26>] dump_stack+0x45/0x57 [ 0.000000] [<ffffffff817bc8d2>] panic+0xd0/0x204 [ 0.000000] [<ffffffff81d21910>] ? numa_clear_kernel_node_hotplug+0xe6/0xf2 [ 0.000000] [<ffffffff8107741b>] __stack_chk_fail+0x1b/0x20 [ 0.000000] [<ffffffff81d21910>] numa_clear_kernel_node_hotplug+0xe6/0xf2 [ 0.000000] [<ffffffff81d21e5d>] numa_init+0x1a5/0x520 [ 0.000000] [<ffffffff81d222b1>] x86_numa_init+0x19/0x3d [ 0.000000] [<ffffffff81d22460>] initmem_init+0x9/0xb [ 0.000000] [<ffffffff81d0d00c>] setup_arch+0x94f/0xc82 [ 0.000000] [<ffffffff81d05120>] ? early_idt_handlers+0x120/0x120 [ 0.000000] [<ffffffff817bd0bb>] ? printk+0x55/0x6b [ 0.000000] [<ffffffff81d05120>] ? early_idt_handlers+0x120/0x120 [ 0.000000] [<ffffffff81d05d9b>] start_kernel+0xe8/0x4d6 [ 0.000000] [<ffffffff81d05120>] ? early_idt_handlers+0x120/0x120 [ 0.000000] [<ffffffff81d05120>] ? early_idt_handlers+0x120/0x120 [ 0.000000] [<ffffffff81d055ee>] x86_64_start_reservations+0x2a/0x2c [ 0.000000] [<ffffffff81d05751>] x86_64_start_kernel+0x161/0x184 [ 0.000000] ---[ end Kernel panic - not syncing: stack-protector: Kernel sta This is caused by writing over the end of numa mask bitmap in numa_clear_kernel_node(). numa_clear_kernel_node() tries to set the node id in a mask bitmap, by iterating all reserved regions and assuming that every region has a valid nid. This assumption is not true because there's an exception for some graphic memory quirks. See trim_snb_memory() in arch/x86/kernel/setup.c It is easily to reproduce the bug in the kdump kernel because kdump kernel use pre-reserved memory instead of the whole memory, but kexec pass other reserved memory ranges to 2nd kernel as well. like below in my test: kdump kernel ram 0x2d000000 - 0x37bfffff One of the reserved regions: 0x40000000 - 0x40100000 which includes 0x40004000, a page excluded in trim_snb_memory(). For this memblock reserved region the nid is not set, it is still default value MAX_NUMNODES. later node_set will set bit MAX_NUMNODES thus stack corruption happen. This also happens when booting with mem= kernel commandline during my test. Fixing it by adding a check, do not call node_set in case nid is MAX_NUMNODES. Signed-off-by: Dave Young <[email protected]> Reviewed-by: Yasuaki Ishimatsu <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-04-07	drm/i915/vlv: remove wait for previous GFX clk disable request	Jesse Barnes	1	-14/+0
	Looks like it was introduced in: commit 650ad970a39f8b6164fe8613edc150f585315289 Author: Imre Deak <[email protected]> Date: Fri Apr 18 16:35:02 2014 +0300 drm/i915: vlv: factor out vlv_force_gfx_clock and check for pending force-of but I'm not sure why. It has caused problems for us in the past (see 85250ddff7a6 "drm/i915/chv: Remove Wait for a previous gfx force-off" and 8d4eee9cd7a1 "drm/i915: vlv: increase timeout when forcing on the GFX clock") and doesn't seem to be required, so let's just drop it. References: https://bugs.freedesktop.org/show_bug.cgi?id=89611 Signed-off-by: Jesse Barnes <[email protected]> Tested-by: Darren Hart <[email protected]> Reviewed-by: Deepak S <[email protected]> Cc: [email protected] # c9c52e24194a: drm/i915/chv: Remove Wait ... Cc: [email protected] Signed-off-by: Jani Nikula <[email protected]>
2015-04-07	drm/i915/chv: Remove Wait for a previous gfx force-off	Deepak S	1	-2/+4
	On CHV, PUNIT team confirmed that 'VLV_GFX_CLK_STATUS_BIT' is not a sticky bit and it will always be set. So ignore Check for previous Gfx force off during suspend and allow the force clk as part S0ix Sequence Signed-off-by: Deepak S <[email protected]> Cc: [email protected] Reviewed-by: Ville Syrjälä <[email protected]> Signed-off-by: Daniel Vetter <[email protected]> Signed-off-by: Jani Nikula <[email protected]>
2015-04-07	drm/i915/vlv: save/restore the power context base reg	Jesse Barnes	2	-0/+3
	Some BIOSes (e.g. the one on the Minnowboard) don't save/restore this reg. If it's unlocked, we can just restore the previous value, and if it's locked (in case the BIOS re-programmed it for us) the write will be ignored and we'll still have "did it move" sanity check in the PM code to warn us if something is still amiss. References: https://bugs.freedesktop.org/show_bug.cgi?id=89611 Signed-off-by: Jesse Barnes <[email protected]> Tested-by: Darren Hart <[email protected]> Cc: [email protected] Reviewed-by: Imre Deak <[email protected]> Reviewed-by: Deepak S <[email protected]> Signed-off-by: Jani Nikula <[email protected]>
2015-04-07	Revert "PM / hibernate: avoid unsafe pages in e820 reserved regions"	Rafael J. Wysocki	1	-20/+1
	Commit 84c91b7ae07c (PM / hibernate: avoid unsafe pages in e820 reserved regions) is reported to make resume from hibernation on Lenovo x230 unreliable, so revert it. We will revisit the issue the commit in question was supposed to fix in the future. Link: https://bugzilla.kernel.org/show_bug.cgi?id=96111 Reported-by: rhn <[email protected]> Cc: 3.17+ <[email protected]> # 3.17+ Signed-off-by: Rafael J. Wysocki <[email protected]>
2015-04-06	Linux 4.0-rc7	Linus Torvalds	1	-1/+1

2015-04-06	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	Linus Torvalds	21	-71/+78
	Pull networking fixes from David Miller: 1) In TCP, don't register an FRTO for cumulatively ACK'd data that was previously SACK'd, from Neal Cardwell. 2) Need to hold RNL mutex in ipv4 multicast code namespace cleanup, from Cong WANG. 3) Similarly we have to hold RNL mutex for fib_rules_unregister(), also from Cong WANG. 4) Revert and rework netns nsid allocation fix, from Nicolas Dichtel. 5) When we encapsulate for a tunnel device, skb->sk still points to the user socket. So this leads to cases where we retraverse the ipv4/ipv6 output path with skb->sk being of some other address family (f.e. AF_PACKET). This can cause things to crash since the ipv4 output path is dereferencing an AF_PACKET socket as if it were an ipv4 one. The short term fix for 'net' and -stable is to elide these socket checks once we've entered an encapsulation sequence by testing xmit_recursion. Longer term we have a better solution wherein we pass the tunnel's socket down through the output paths, but that is way too invasive for 'net' and -stable. From Hannes Frederic Sowa. 6) l2tp_init() failure path forgets to unregister per-net ops, from Cong WANG. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: net/mlx4_core: Fix error message deprecation for ConnectX-2 cards net: dsa: fix filling routing table from OF description l2tp: unregister l2tp_net_ops on failure path mvneta: dont call mvneta_adjust_link() manually ipv6: protect skb->sk accesses from recursive dereference inside the stack netns: don't allocate an id for dead netns Revert "netns: don't clear nsid too early on removal" ip6mr: call del_timer_sync() in ip6mr_free_table() net: move fib_rules_unregister() under rtnl lock ipv4: take rtnl_lock and mark mrt table as freed on namespace cleanup tcp: fix FRTO undo on cumulative ACK of SACKed range xen-netfront: transmit fully GSO-sized packets
2015-04-06	ioctx_alloc(): fix vma (and file) leak on failure	Al Viro	1	-0/+3
	If we fail past the aio_setup_ring(), we need to destroy the mapping. We don't need to care about anybody having found ctx, or added requests to it, since the last failure exit is exactly the failure to make ctx visible to lookups. Reproducer (based on one by Joe Mario <[email protected]>): void count(char p) { char s[80]; printf("%s: ", p); fflush(stdout); sprintf(s, "/bin/cat /proc/%d/maps\|/bin/fgrep -c '/[aio] (deleted)'", getpid()); system(s); } int main() { io_context_t ctx; int created, limit, i, destroyed; FILE *f; count("before"); if ((f = fopen("/proc/sys/fs/aio-max-nr", "r")) == NULL) perror("opening aio-max-nr"); else if (fscanf(f, "%d", &limit) != 1) fprintf(stderr, "can't parse aio-max-nr\n"); else if ((ctx = calloc(limit, sizeof(io_context_t))) == NULL) perror("allocating aio_context_t array"); else { for (i = 0, created = 0; i < limit; i++) { if (io_setup(1000, ctx + created) == 0) created++; } for (i = 0, destroyed = 0; i < created; i++) if (io_destroy(ctx[i]) == 0) destroyed++; printf("created %d, failed %d, destroyed %d\n", created, limit - created, destroyed); count("after"); } } Found-by: Joe Mario <[email protected]> Cc: [email protected] Signed-off-by: Al Viro <[email protected]>
2015-04-06	fix mremap() vs. ioctx_kill() race	Al Viro	3	-9/+20
	teach ->mremap() method to return an error and have it fail for aio mappings in process of being killed Note that in case of ->mremap() failure we need to undo move_page_tables() we'd already done; we could call ->mremap() first, but then the failure of move_page_tables() would require undoing whatever _successful_ ->mremap() has done, which would be a lot more headache in general. Signed-off-by: Al Viro <[email protected]>
2015-04-06	net/mlx4_core: Fix error message deprecation for ConnectX-2 cards	Jack Morgenstein	1	-1/+2
	Commit 1daa4303b4ca ("net/mlx4_core: Deprecate error message at ConnectX-2 cards startup to debug") did the deprecation only for port 1 of the card. Need to deprecate for port 2 as well. Fixes: 1daa4303b4ca ("net/mlx4_core: Deprecate error message at ConnectX-2 cards startup to debug") Signed-off-by: Jack Morgenstein <[email protected]> Signed-off-by: Amir Vadai <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-06	net: dsa: fix filling routing table from OF description	Pavel Nakonechny	2	-17/+10
	According to description in 'include/net/dsa.h', in cascade switches configurations where there are more than one interconnected devices, 'rtable' array in 'dsa_chip_data' structure is used to indicate which port on this switch should be used to send packets to that are destined for corresponding switch. However, dsa_of_setup_routing_table() fills 'rtable' with port numbers of the _target_ switch, but not current one. This commit removes redundant devicetree parsing and adds needed port number as a function argument. So dsa_of_setup_routing_table() now just looks for target switch number by parsing parent of 'link' device node. To remove possible misunderstandings with the way of determining target switch number, a corresponding comment was added to the source code and to the DSA device tree bindings documentation file. This was tested on a custom board with two Marvell 88E6095 switches with following corresponding routing tables: { -1, 10 } and { 8, -1 }. Signed-off-by: Pavel Nakonechny <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-06	Merge branch 'for-linus' of ↵	Linus Torvalds	2	-14/+31
	git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: "Updates for the input subsystem - two more tweaks for ALPS driver to work out kinks after splitting the touchpad, trackstick, and potential external PS/2 mouse into separate input devices. Changes to support ALPS SS4 devices (protocol V8) will be coming in 4.1..." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: alps - document stick behavior for protocol V2 Input: alps - report V2 Dualpoint Stick events via the right evdev node Input: alps - report interleaved bare PS/2 packets via dev3
2015-04-06	l2tp: unregister l2tp_net_ops on failure path	WANG Cong	1	-0/+1
	Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-06	mvneta: dont call mvneta_adjust_link() manually	Stas Sergeev	1	-6/+1
	mvneta_adjust_link() is a callback for of_phy_connect() and should not be called directly. The result of calling it directly is as below: Signed-off-by: David S. Miller <[email protected]>
2015-04-06	ipv6: protect skb->sk accesses from recursive dereference inside the stack	[email protected]	7	-19/+34
	We should not consult skb->sk for output decisions in xmit recursion levels > 0 in the stack. Otherwise local socket settings could influence the result of e.g. tunnel encapsulation process. ipv6 does not conform with this in three places: 1) ip6_fragment: we do consult ipv6_npinfo for frag_size 2) sk_mc_loop in ipv6 uses skb->sk and checks if we should loop the packet back to the local socket 3) ip6_skb_dst_mtu could query the settings from the user socket and force a wrong MTU Furthermore: In sk_mc_loop we could potentially land in WARN_ON(1) if we use a PF_PACKET socket ontop of an IPv6-backed vxlan device. Reuse xmit_recursion as we are currently only interested in protecting tunnel devices. Cc: Jiri Pirko <[email protected]> Signed-off-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-06	x86/alternatives: Guard NOPs optimization	Borislav Petkov	1	-0/+3
	Take a look at the first instruction byte before optimizing the NOP - there might be something else there already, like the ALTERNATIVE_2() in rdtsc_barrier() which NOPs out on AMD even though we just patched in an MFENCE. This happens because the alternatives sees X86_FEATURE_MFENCE_RDTSC, AMD CPUs set it, we patch in the MFENCE and right afterwards it sees X86_FEATURE_LFENCE_RDTSC which AMD CPUs don't set and we blindly optimize the NOP. Checking whether at least the first byte is 0x90 prevents that. Signed-off-by: Borislav Petkov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-04-06	x86/asm/entry: Clear EXTRA_REGS for all executable formats	Denys Vlasenko	3	-31/+35
	On failure, sys_execve() does not clobber EXTRA_REGS, so we can just return to userpsace without saving/restoring them. On success, ELF_PLAT_INIT() in sys_execve() clears all these registers. On other executable formats: - binfmt_flat.c has similar FLAT_PLAT_INIT, but x86 (and everyone else except sh) doesn't define it. - binfmt_elf_fdpic.c has ELF_FDPIC_PLAT_INIT, but x86 (and most others) doesn't define it. - There are no such hooks in binfmt_aout.c et al. We inherit EXTRA_REGS from the prior executable. This inconsistency was not intended. This change removes SAVE/RESTORE_EXTRA_REGS in stub_execve, removes register clearing in ELF_PLAT_INIT(), and instead simply clears them on success in stub_execve. Run-tested. Signed-off-by: Denys Vlasenko <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Kees Cook <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Will Drewry <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-04-06	x86/signal: Remove pax argument from restore_sigcontext	Brian Gerst	3	-28/+15
	The 'pax' argument is unnecesary. Instead, store the RAX value directly in regs. This pattern goes all the way back to 2.1.106pre1, when restore_sigcontext() was changed to return an error code instead of EAX directly: https://git.kernel.org/cgit/linux/kernel/git/history/history.git/diff/arch/i386/kernel/signal.c?id=9a8f8b7ca3f319bd668298d447bdf32730e51174 In 2007 sigaltstack syscall support was added, where the return value of restore_sigcontext() was changed to carry the memory-copying failure code. But instead of putting 'ax' into regs->ax directly, it was carried in via a pointer and then returned, where the generic syscall return code copied it to regs->ax. So there was never any deeper reason for this suboptimal pattern, it was simply never noticed after being introduced. Signed-off-by: Brian Gerst <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-04-05	Input: alps - document stick behavior for protocol V2	Hans de Goede	1	-0/+8
	Document that protocol V2 uses standard (bare) PS/2 mouse packets for the DualPoint stick. Signed-off-by: Hans de Goede <[email protected]> Acked-By: Pali Rohár <[email protected]> Signed-off-by: Dmitry Torokhov <[email protected]>
2015-04-05	Input: alps - report V2 Dualpoint Stick events via the right evdev node	Hans de Goede	1	-1/+6
	On V2 devices the DualPoint Stick reports bare packets, these should be reported via the "AlpsPS/2 ALPS DualPoint Stick" dev2 evdev node, which also has the INPUT_PROP_POINTING_STICK propbit set. Note that since there is no way to distinguish these packets from an external PS/2 mouse (insofar as these laptops have an external PS/2 port) this means that we will be reporting PS/2 mouse events via this evdev node too, as we've been doing in kernel 3.19 and older. This has been tested on a Dell Latitude D620 and a Dell Latitude E6400, which both have a V2 touchpad + a DualPoint Stick which reports bare packets. Signed-off-by: Hans de Goede <[email protected]> Reviewed-by: Pali Rohár <[email protected]> Signed-off-by: Dmitry Torokhov <[email protected]>
2015-04-05	Input: alps - report interleaved bare PS/2 packets via dev3	Hans de Goede	1	-14/+18
	Bare packets should be reported via the same evdev device independent on whether they are detected on the beginning of a packet or in the middle of a packet. This has been tested on a Dell Latitude E6400, where the DualPoint Stick reports bare packets, which get reported via dev3 when the touchpad is idle, and via dev2 when the touchpad and stick are used simultaneously. This commit fixes this inconsistency by always reporting bare packets via dev3. Note that since the come from a DualPoint Stick they really should be reported via dev2, this gets fixed in a later commit. Signed-off-by: Hans de Goede <[email protected]> Reviewed-by: Pali Rohár <[email protected]> Signed-off-by: Dmitry Torokhov <[email protected]>
2015-04-04	Merge tag 'usb-4.0-rc6' of ↵	Linus Torvalds	6	-5/+26
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are some small USB fixes and new device ids for 4.0-rc6. Nothing major, some xhci fixes for reported problems, and some usb-serial device ids. All have been in linux-next for a while" * tag 'usb-4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: USB: ftdi_sio: Use jtag quirk for SNAP Connect E10 usb: isp1760: fix spin unlock in the error path of isp1760_udc_start usb: xhci: apply XHCI_AVOID_BEI quirk to all Intel xHCI controllers usb: xhci: handle Config Error Change (CEC) in xhci driver USB: keyspan_pda: add new device id USB: ftdi_sio: Added custom PID for Synapse Wireless product