aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2017-08-25dax: fix deadlock due to misaligned PMD faultsRoss Zwisler1-0/+10
In DAX there are two separate places where the 2MiB range of a PMD is defined. The first is in the page tables, where a PMD mapping inserted for a given address spans from (vmf->address & PMD_MASK) to ((vmf->address & PMD_MASK) + PMD_SIZE - 1). That is, from the 2MiB boundary below the address to the 2MiB boundary above the address. So, for example, a fault at address 3MiB (0x30 0000) falls within the PMD that ranges from 2MiB (0x20 0000) to 4MiB (0x40 0000). The second PMD range is in the mapping->page_tree, where a given file offset is covered by a radix tree entry that spans from one 2MiB aligned file offset to another 2MiB aligned file offset. So, for example, the file offset for 3MiB (pgoff 768) falls within the PMD range for the order 9 radix tree entry that ranges from 2MiB (pgoff 512) to 4MiB (pgoff 1024). This system works so long as the addresses and file offsets for a given mapping both have the same offsets relative to the start of each PMD. Consider the case where the starting address for a given file isn't 2MiB aligned - say our faulting address is 3 MiB (0x30 0000), but that corresponds to the beginning of our file (pgoff 0). Now all the PMDs in the mapping are misaligned so that the 2MiB range defined in the page tables never matches up with the 2MiB range defined in the radix tree. The current code notices this case for DAX faults to storage with the following test in dax_pmd_insert_mapping(): if (pfn_t_to_pfn(pfn) & PG_PMD_COLOUR) goto unlock_fallback; This test makes sure that the pfn we get from the driver is 2MiB aligned, and relies on the assumption that the 2MiB alignment of the pfn we get back from the driver matches the 2MiB alignment of the faulting address. However, faults to holes were not checked and we could hit the problem described above. This was reported in response to the NVML nvml/src/test/pmempool_sync TEST5: $ cd nvml/src/test/pmempool_sync $ make TEST5 You can grab NVML here: https://github.com/pmem/nvml/ The dmesg warning you see when you hit this error is: WARNING: CPU: 13 PID: 2900 at fs/dax.c:641 dax_insert_mapping_entry+0x2df/0x310 Where we notice in dax_insert_mapping_entry() that the radix tree entry we are about to replace doesn't match the locked entry that we had previously inserted into the tree. This happens because the initial insertion was done in grab_mapping_entry() using a pgoff calculated from the faulting address (vmf->address), and the replacement in dax_pmd_load_hole() => dax_insert_mapping_entry() is done using vmf->pgoff. In our failure case those two page offsets (one calculated from vmf->address, one using vmf->pgoff) point to different order 9 radix tree entries. This failure case can result in a deadlock because the radix tree unlock also happens on the pgoff calculated from vmf->address. This means that the locked radix tree entry that we swapped in to the tree in dax_insert_mapping_entry() using vmf->pgoff is never unlocked, so all future faults to that 2MiB range will block forever. Fix this by validating that the faulting address's PMD offset matches the PMD offset from the start of the file. This check is done at the very beginning of the fault and covers faults that would have mapped to storage as well as faults to holes. I left the COLOUR check in dax_pmd_insert_mapping() in place in case we ever hit the insanity condition where the alignment of the pfn we get from the driver doesn't match the alignment of the userspace address. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ross Zwisler <[email protected]> Reported-by: "Slusarz, Marcin" <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-08-25mm, shmem: fix handling /sys/kernel/mm/transparent_hugepage/shmem_enabledKirill A. Shutemov1-2/+2
/sys/kernel/mm/transparent_hugepage/shmem_enabled controls if we want to allocate huge pages when allocate pages for private in-kernel shmem mount. Unfortunately, as Dan noticed, I've screwed it up and the only way to make kernel allocate huge page for the mount is to use "force" there. All other values will be effectively ignored. Link: http://lkml.kernel.org/r/[email protected] Fixes: 5a6e75f8110c ("shmem: prepare huge= mount option and sysfs knob") Signed-off-by: Kirill A. Shutemov <[email protected]> Reported-by: Dan Carpenter <[email protected]> Cc: stable <[email protected]> [4.8+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-08-25PM/hibernate: touch NMI watchdog when creating snapshotChen Yu1-2/+18
There is a problem that when counting the pages for creating the hibernation snapshot will take significant amount of time, especially on system with large memory. Since the counting job is performed with irq disabled, this might lead to NMI lockup. The following warning were found on a system with 1.5TB DRAM: Freezing user space processes ... (elapsed 0.002 seconds) done. OOM killer disabled. PM: Preallocating image memory... NMI watchdog: Watchdog detected hard LOCKUP on cpu 27 CPU: 27 PID: 3128 Comm: systemd-sleep Not tainted 4.13.0-0.rc2.git0.1.fc27.x86_64 #1 task: ffff9f01971ac000 task.stack: ffffb1a3f325c000 RIP: 0010:memory_bm_find_bit+0xf4/0x100 Call Trace: swsusp_set_page_free+0x2b/0x30 mark_free_pages+0x147/0x1c0 count_data_pages+0x41/0xa0 hibernate_preallocate_memory+0x80/0x450 hibernation_snapshot+0x58/0x410 hibernate+0x17c/0x310 state_store+0xdf/0xf0 kobj_attr_store+0xf/0x20 sysfs_kf_write+0x37/0x40 kernfs_fop_write+0x11c/0x1a0 __vfs_write+0x37/0x170 vfs_write+0xb1/0x1a0 SyS_write+0x55/0xc0 entry_SYSCALL_64_fastpath+0x1a/0xa5 ... done (allocated 6590003 pages) PM: Allocated 26360012 kbytes in 19.89 seconds (1325.28 MB/s) It has taken nearly 20 seconds(2.10GHz CPU) thus the NMI lockup was triggered. In case the timeout of the NMI watch dog has been set to 1 second, a safe interval should be 6590003/20 = 320k pages in theory. However there might also be some platforms running at a lower frequency, so feed the watchdog every 100k pages. [[email protected]: simplification] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: use interval of 128k instead of 100k to avoid modulus] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Chen Yu <[email protected]> Reported-by: Jan Filipcewicz <[email protected]> Suggested-by: Michal Hocko <[email protected]> Reviewed-by: Michal Hocko <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Len Brown <[email protected]> Cc: Dan Williams <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-08-25virtio_pci: fix cpu affinity supportChristoph Hellwig1-3/+7
Commit 0b0f9dc5 ("Revert "virtio_pci: use shared interrupts for virtqueues"") removed the adjustment of the pre_vectors for the virtio MSI-X vector allocation which was added in commit fb5e31d9 ("virtio: allow drivers to request IRQ affinity when creating VQs"). This will lead to an incorrect assignment of MSI-X vectors, and potential deadlocks when offlining cpus. Signed-off-by: Christoph Hellwig <[email protected]> Fixes: 0b0f9dc5 ("Revert "virtio_pci: use shared interrupts for virtqueues") Reported-by: YASUAKI ISHIMATSU <[email protected]> Cc: [email protected] Signed-off-by: Michael S. Tsirkin <[email protected]>
2017-08-25virtio_blk: fix incorrect message when disk is resizedStefan Hajnoczi1-6/+10
The message printed on disk resize is incorrect. The following is printed when resizing to 2 GiB: $ truncate -s 1G test.img $ qemu -device virtio-blk-pci,logical_block_size=4096,... (qemu) block_resize drive1 2G virtio_blk virtio0: new size: 4194304 4096-byte logical blocks (17.2 GB/16.0 GiB) The virtio_blk capacity config field is in 512-byte sector units regardless of logical_block_size as per the VIRTIO specification. Therefore the message should read: virtio_blk virtio0: new size: 524288 4096-byte logical blocks (2.15 GB/2.0 GiB) Note that this only affects the printed message. Thankfully the actual block device has the correct size because the block layer expects capacity in sectors. Signed-off-by: Stefan Hajnoczi <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2017-08-25blk-mq-debugfs: Add names for recently added flagsBart Van Assche1-0/+3
The symbolic constants QUEUE_FLAG_SCSI_PASSTHROUGH, QUEUE_FLAG_QUIESCED and REQ_NOWAIT are missing from blk-mq-debugfs.c. Add these to blk-mq-debugfs.c such that these appear as names in debugfs instead of as numbers. Reviewed-by: Omar Sandoval <[email protected]> Signed-off-by: Bart Van Assche <[email protected]> Cc: Hannes Reinecke <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-08-25ASoC: remove duplicate definition of dapm_routes/num_dapm_routesKuninori Morimoto2-8/+4
snd_soc_component and snd_soc_component_driver both have dapm_routes/num_dapm_routes, but these are duplicated. Let's remove duplicated definition. Signed-off-by: Kuninori Morimoto <[email protected]> Signed-off-by: Mark Brown <[email protected]>
2017-08-25ASoC: remove duplicate definition of dapm_widgets/num_dapm_widgetsKuninori Morimoto2-7/+4
snd_soc_component and snd_soc_component_driver both have dapm_widgets/num_dapm_widgets, but these are duplicated. Let's remove duplicated definition. Signed-off-by: Kuninori Morimoto <[email protected]> Signed-off-by: Mark Brown <[email protected]>
2017-08-25ASoC: remove duplicate definition of controls/num_controlsKuninori Morimoto2-7/+4
snd_soc_component and snd_soc_component_driver both have controls/num_controls, but these are duplicated. Let's remove duplicated definition. Signed-off-by: Kuninori Morimoto <[email protected]> Signed-off-by: Mark Brown <[email protected]>
2017-08-25ASoC: nau8825: correct typo of semaphore commentJohn Hsu1-13/+13
There are a lot of typo about semaphore in the comment. Correct it from semaphone to semaphore. Signed-off-by: John Hsu <[email protected]> Signed-off-by: Mark Brown <[email protected]>
2017-08-25ASoC: use snd_soc_component_get_dapm()Kuninori Morimoto1-1/+1
Now we have snd_soc_component_get_dapm(), and as soc.h say below, let's use it. /* Don't use these, use snd_soc_component_get_dapm() */ struct snd_soc_dapm_context dapm; Signed-off-by: Kuninori Morimoto <[email protected]> Signed-off-by: Mark Brown <[email protected]>
2017-08-25ASoC: Intel: Skylake: Update module id in pin connectionsJeeja KP2-3/+39
Each module's id comes from the topology and gets updated in the driver. This patch updates the input and output pin connections of each module by matching the uuid for each module. Signed-off-by: Jeeja KP <[email protected]> Signed-off-by: Guneshwor Singh <[email protected]> Acked-By: Vinod Koul <[email protected]> Signed-off-by: Mark Brown <[email protected]>
2017-08-25ASoC: Intel: Skylake: Parse and update module config structureRamesh Babu4-95/+334
A dsp path and the modules in the path can support various pcm configurations. The list of supported pcm configurations from topology manifest would be stored and later selected runtime based on the hw pcm params. For legacy, module data is filled in the 0th index of resource and interface table. To accommodate both models, change the relevant structures and populate them by parsing newly defined tokens. This change is backward compatible with the existing model where driver computes the resources required by each dsp module. Signed-off-by: Ramesh Babu <[email protected]> Signed-off-by: Shreyas NC <[email protected]> Signed-off-by: Guneshwor Singh <[email protected]> Acked-By: Vinod Koul <[email protected]> Signed-off-by: Mark Brown <[email protected]>
2017-08-25ASoC: Intel: Skylake: Populate module data from topology manifestShreyas NC1-4/+316
All the module common data will now be populated in the topology manifest. This includes the resource and interface list supported by the module. With this, driver need not compute the resources required by each dsp module for a particular pcm parameter since it comes as a part of the topology manifest. So, add functions to parse the manifest tokens to populate the module config data structure. Signed-off-by: Shreyas NC <[email protected]> Signed-off-by: Guneshwor Singh <[email protected]> Acked-By: Vinod Koul <[email protected]> Signed-off-by: Mark Brown <[email protected]>
2017-08-25ASoC: Intel: Skylake: Add driver structures to be filled from topology manifestShreyas NC2-0/+54
The topology manifest would include module common data including resource and interface table. The resource table consists of resources required by the dsp module such as buffer size, cycles per second, number of input/output pins. And, the interface table consists of pcm parameters per module which can be referenced later. So define the structures accordingly to represent topology manifest data in the driver. Signed-off-by: Shreyas NC <[email protected]> Signed-off-by: Guneshwor Singh <[email protected]> Acked-By: Vinod Koul <[email protected]> Signed-off-by: Mark Brown <[email protected]>
2017-08-25ASoC: Intel: Skylake: Commonize parsing of format tokensShreyas NC1-21/+26
Format resource tokens can be a part of either the widget or manifest private data. In the current model, format resources come as a part of widget private data and they come as a part of topology manifest in the newly introduced model. So add a common function that can fill up either of the structures. Signed-off-by: Shreyas NC <[email protected]> Signed-off-by: Guneshwor Singh <[email protected]> Acked-By: Vinod Koul <[email protected]> Signed-off-by: Mark Brown <[email protected]>
2017-08-25ASoC: Intel: uapi: Add new tokens for module common dataShreyas NC1-1/+91
The module private data can be modelled independent of its instances so that it can be reused by the module instances. So move module data to common manifest which can be referenced by the module instances. This requires new tokens to be defined to accommodate these changes. The new tokens will specify buffer sizes, DSP cycles and respective indexes corresponding to the pcm params in the topology manifest so that driver need not compute them. Signed-off-by: Shreyas NC <[email protected]> Signed-off-by: Guneshwor Singh <[email protected]> Acked-By: Vinod Koul <[email protected]> Signed-off-by: Mark Brown <[email protected]>
2017-08-25ASoC: Intel: Skylake: Parse multiple manifest data blocksShreyas NC1-7/+5
Currently we can parse a single manifest data block. But manifest private data can have multiple data blocks. So, fix the parsing logic to parse multiple data blocks by returning offset of each parsed data block. Signed-off-by: Shreyas NC <[email protected]> Signed-off-by: Guneshwor Singh <[email protected]> Acked-By: Vinod Koul <[email protected]> Signed-off-by: Mark Brown <[email protected]>
2017-08-25ASoC: Add a sanity check before using dai driver nameJeffy Chen1-1/+2
The dai driver's name is allowed to be NULL. So add a sanity check for that. Signed-off-by: Jeffy Chen <[email protected]> Reported-by: Donglin Peng <[email protected]> Signed-off-by: Mark Brown <[email protected]>
2017-08-25Merge tag 'sound-4.13-rc7' of ↵Mark Brown693-6008/+13777
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound into asoc-rt5677 sound fixes for 4.13-rc7 We're keeping in a good shape, this batch contains just a few small fixes (a regression fix for ASoC rt5677 codec, NULL dereference and error-path fixes in firewire, and a corner-case ioctl error fix for user TLV), as well as usual quirks for USB-audio and HD-audio.
2017-08-25KVM: PPC: Book3S: Fix race and leak in kvm_vm_ioctl_create_spapr_tce()Paul Mackerras1-22/+34
Nixiaoming pointed out that there is a memory leak in kvm_vm_ioctl_create_spapr_tce() if the call to anon_inode_getfd() fails; the memory allocated for the kvmppc_spapr_tce_table struct is not freed, and nor are the pages allocated for the iommu tables. In addition, we have already incremented the process's count of locked memory pages, and this doesn't get restored on error. David Hildenbrand pointed out that there is a race in that the function checks early on that there is not already an entry in the stt->iommu_tables list with the same LIOBN, but an entry with the same LIOBN could get added between then and when the new entry is added to the list. This fixes all three problems. To simplify things, we now call anon_inode_getfd() before placing the new entry in the list. The check for an existing entry is done while holding the kvm->lock mutex, immediately before adding the new entry to the list. Finally, on failure we now call kvmppc_account_memlimit to decrement the process's count of locked memory pages. Reported-by: Nixiaoming <[email protected]> Reported-by: David Hildenbrand <[email protected]> Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2017-08-25perf/core: Fix group {cpu,task} validationMark Rutland1-20/+19
Regardless of which events form a group, it does not make sense for the events to target different tasks and/or CPUs, as this leaves the group inconsistent and impossible to schedule. The core perf code assumes that these are consistent across (successfully intialised) groups. Core perf code only verifies this when moving SW events into a HW context. Thus, we can violate this requirement for pure SW groups and pure HW groups, unless the relevant PMU driver happens to perform this verification itself. These mismatched groups subsequently wreak havoc elsewhere. For example, we handle watchpoints as SW events, and reserve watchpoint HW on a per-CPU basis at pmu::event_init() time to ensure that any event that is initialised is guaranteed to have a slot at pmu::add() time. However, the core code only checks the group leader's cpu filter (via event_filter_match()), and can thus install follower events onto CPUs violating thier (mismatched) CPU filters, potentially installing them into a CPU without sufficient reserved slots. This can be triggered with the below test case, resulting in warnings from arch backends. #define _GNU_SOURCE #include <linux/hw_breakpoint.h> #include <linux/perf_event.h> #include <sched.h> #include <stdio.h> #include <sys/prctl.h> #include <sys/syscall.h> #include <unistd.h> static int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu, int group_fd, unsigned long flags) { return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags); } char watched_char; struct perf_event_attr wp_attr = { .type = PERF_TYPE_BREAKPOINT, .bp_type = HW_BREAKPOINT_RW, .bp_addr = (unsigned long)&watched_char, .bp_len = 1, .size = sizeof(wp_attr), }; int main(int argc, char *argv[]) { int leader, ret; cpu_set_t cpus; /* * Force use of CPU0 to ensure our CPU0-bound events get scheduled. */ CPU_ZERO(&cpus); CPU_SET(0, &cpus); ret = sched_setaffinity(0, sizeof(cpus), &cpus); if (ret) { printf("Unable to set cpu affinity\n"); return 1; } /* open leader event, bound to this task, CPU0 only */ leader = perf_event_open(&wp_attr, 0, 0, -1, 0); if (leader < 0) { printf("Couldn't open leader: %d\n", leader); return 1; } /* * Open a follower event that is bound to the same task, but a * different CPU. This means that the group should never be possible to * schedule. */ ret = perf_event_open(&wp_attr, 0, 1, leader, 0); if (ret < 0) { printf("Couldn't open mismatched follower: %d\n", ret); return 1; } else { printf("Opened leader/follower with mismastched CPUs\n"); } /* * Open as many independent events as we can, all bound to the same * task, CPU0 only. */ do { ret = perf_event_open(&wp_attr, 0, 0, -1, 0); } while (ret >= 0); /* * Force enable/disble all events to trigger the erronoeous * installation of the follower event. */ printf("Opened all events. Toggling..\n"); for (;;) { prctl(PR_TASK_PERF_EVENTS_DISABLE, 0, 0, 0, 0); prctl(PR_TASK_PERF_EVENTS_ENABLE, 0, 0, 0, 0); } return 0; } Fix this by validating this requirement regardless of whether we're moving events. Signed-off-by: Mark Rutland <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Zhou Chengming <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-08-25x86/mm: Fix use-after-free of ldt_structEric Biggers1-3/+1
The following commit: 39a0526fb3f7 ("x86/mm: Factor out LDT init from context init") renamed init_new_context() to init_new_context_ldt() and added a new init_new_context() which calls init_new_context_ldt(). However, the error code of init_new_context_ldt() was ignored. Consequently, if a memory allocation in alloc_ldt_struct() failed during a fork(), the ->context.ldt of the new task remained the same as that of the old task (due to the memcpy() in dup_mm()). ldt_struct's are not intended to be shared, so a use-after-free occurred after one task exited. Fix the bug by making init_new_context() pass through the error code of init_new_context_ldt(). This bug was found by syzkaller, which encountered the following splat: BUG: KASAN: use-after-free in free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116 Read of size 4 at addr ffff88006d2cb7c8 by task kworker/u9:0/3710 CPU: 1 PID: 3710 Comm: kworker/u9:0 Not tainted 4.13.0-rc4-next-20170811 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 print_address_description+0x73/0x250 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x24e/0x340 mm/kasan/report.c:409 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:429 free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116 free_ldt_struct arch/x86/kernel/ldt.c:173 [inline] destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171 destroy_context arch/x86/include/asm/mmu_context.h:157 [inline] __mmdrop+0xe9/0x530 kernel/fork.c:889 mmdrop include/linux/sched/mm.h:42 [inline] exec_mmap fs/exec.c:1061 [inline] flush_old_exec+0x173c/0x1ff0 fs/exec.c:1291 load_elf_binary+0x81f/0x4ba0 fs/binfmt_elf.c:855 search_binary_handler+0x142/0x6b0 fs/exec.c:1652 exec_binprm fs/exec.c:1694 [inline] do_execveat_common.isra.33+0x1746/0x22e0 fs/exec.c:1816 do_execve+0x31/0x40 fs/exec.c:1860 call_usermodehelper_exec_async+0x457/0x8f0 kernel/umh.c:100 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431 Allocated by task 3700: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551 kmem_cache_alloc_trace+0x136/0x750 mm/slab.c:3627 kmalloc include/linux/slab.h:493 [inline] alloc_ldt_struct+0x52/0x140 arch/x86/kernel/ldt.c:67 write_ldt+0x7b7/0xab0 arch/x86/kernel/ldt.c:277 sys_modify_ldt+0x1ef/0x240 arch/x86/kernel/ldt.c:307 entry_SYSCALL_64_fastpath+0x1f/0xbe Freed by task 3700: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524 __cache_free mm/slab.c:3503 [inline] kfree+0xca/0x250 mm/slab.c:3820 free_ldt_struct.part.2+0xdd/0x150 arch/x86/kernel/ldt.c:121 free_ldt_struct arch/x86/kernel/ldt.c:173 [inline] destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171 destroy_context arch/x86/include/asm/mmu_context.h:157 [inline] __mmdrop+0xe9/0x530 kernel/fork.c:889 mmdrop include/linux/sched/mm.h:42 [inline] __mmput kernel/fork.c:916 [inline] mmput+0x541/0x6e0 kernel/fork.c:927 copy_process.part.36+0x22e1/0x4af0 kernel/fork.c:1931 copy_process kernel/fork.c:1546 [inline] _do_fork+0x1ef/0xfb0 kernel/fork.c:2025 SYSC_clone kernel/fork.c:2135 [inline] SyS_clone+0x37/0x50 kernel/fork.c:2129 do_syscall_64+0x26c/0x8c0 arch/x86/entry/common.c:287 return_from_SYSCALL_64+0x0/0x7a Here is a C reproducer: #include <asm/ldt.h> #include <pthread.h> #include <signal.h> #include <stdlib.h> #include <sys/syscall.h> #include <sys/wait.h> #include <unistd.h> static void *fork_thread(void *_arg) { fork(); } int main(void) { struct user_desc desc = { .entry_number = 8191 }; syscall(__NR_modify_ldt, 1, &desc, sizeof(desc)); for (;;) { if (fork() == 0) { pthread_t t; srand(getpid()); pthread_create(&t, NULL, fork_thread, NULL); usleep(rand() % 10000); syscall(__NR_exit_group, 0); } wait(NULL); } } Note: the reproducer takes advantage of the fact that alloc_ldt_struct() may use vmalloc() to allocate a large ->entries array, and after commit: 5d17a73a2ebe ("vmalloc: back off when the current task is killed") it is possible for userspace to fail a task's vmalloc() by sending a fatal signal, e.g. via exit_group(). It would be more difficult to reproduce this bug on kernels without that commit. This bug only affected kernels with CONFIG_MODIFY_LDT_SYSCALL=y. Signed-off-by: Eric Biggers <[email protected]> Acked-by: Dave Hansen <[email protected]> Cc: <[email protected]> [v4.6+] Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Fixes: 39a0526fb3f7 ("x86/mm: Factor out LDT init from context init") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-08-25KVM, pkeys: do not use PKRU value in vcpu->arch.guest_fpu.statePaolo Bonzini2-6/+17
The host pkru is restored right after vcpu exit (commit 1be0e61), so KVM_GET_XSAVE will return the host PKRU value instead. Fix this by using the guest PKRU explicitly in fill_xsave and load_xsave. This part is based on a patch by Junkang Fu. The host PKRU data may also not match the value in vcpu->arch.guest_fpu.state, because it could have been changed by userspace since the last time it was saved, so skip loading it in kvm_load_guest_fpu. Reported-by: Junkang Fu <[email protected]> Cc: Yang Zhang <[email protected]> Fixes: 1be0e61c1f255faaeab04a390e00c8b9b9042870 Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
2017-08-25KVM: x86: simplify handling of PKRUPaolo Bonzini5-30/+10
Move it to struct kvm_arch_vcpu, replacing guest_pkru_valid with a simple comparison against the host value of the register. The write of PKRU in addition can be skipped if the guest has not enabled the feature. Once we do this, we need not test OSPKE in the host anymore, because guest_CR4.PKE=1 implies host_CR4.PKE=1. The static PKU test is kept to elide the code on older CPUs. Suggested-by: Yang Zhang <[email protected]> Fixes: 1be0e61c1f255faaeab04a390e00c8b9b9042870 Cc: [email protected] Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2017-08-25KVM: x86: block guest protection keys unless the host has them enabledPaolo Bonzini1-1/+1
If the host has protection keys disabled, we cannot read and write the guest PKRU---RDPKRU and WRPKRU fail with #GP(0) if CR4.PKE=0. Block the PKU cpuid bit in that case. This ensures that guest_CR4.PKE=1 implies host_CR4.PKE=1. Fixes: 1be0e61c1f255faaeab04a390e00c8b9b9042870 Cc: [email protected] Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2017-08-24mtd: nand: atmel: Relax tADL_min constraintBoris Brezillon1-1/+12
Version 4 of the ONFI spec mandates that tADL be at least 400 nanoseconds, but, depending on the master clock rate, 400 ns may not fit in the tADL field of the SMC reg. We need to relax the check and accept the -ERANGE return code. Note that previous versions of the ONFI spec had a lower tADL_min (100 or 200 ns). It's not clear why this timing constraint got increased but it seems most NANDs are fine with values lower than 400ns, so we should be safe. Fixes: f9ce2eddf176 ("mtd: nand: atmel: Add ->setup_data_interface() hooks") Signed-off-by: Boris Brezillon <[email protected]> Tested-by: Quentin Schulz <[email protected]> Signed-off-by: Brian Norris <[email protected]>
2017-08-24mtd: nandsim: remove debugfs entries in error pathUwe Kleine-König1-0/+1
The debugfs entries must be removed before an error is returned in the probe function. Otherwise another try to load the module fails and when the debugfs files are accessed without the module loaded, the kernel still tries to call a function in that module. Fixes: 5346c27c5fed ("mtd: nandsim: Introduce debugfs infrastructure") Signed-off-by: Uwe Kleine-König <[email protected]> Reviewed-by: Richard Weinberger <[email protected]> Acked-by: Boris Brezillon <[email protected]> Signed-off-by: Brian Norris <[email protected]>
2017-08-25Merge tag 'drm-misc-fixes-2017-08-24' of ↵Dave Airlie1-3/+3
git://anongit.freedesktop.org/git/drm-misc into drm-fixes Core Changes: - Release driver tracking before making the object available again (Chris) Cc: Chris Wilson <[email protected]> * tag 'drm-misc-fixes-2017-08-24' of git://anongit.freedesktop.org/git/drm-misc: drm: Release driver tracking before making the object available again
2017-08-25Merge tag 'drm-intel-fixes-2017-08-24' of ↵Dave Airlie6-12/+36
git://anongit.freedesktop.org/git/drm-intel into drm-fixes drm/i915 fixes for v4.13-rc7 * tag 'drm-intel-fixes-2017-08-24' of git://anongit.freedesktop.org/git/drm-intel: drm/i915/gvt: Fix the kernel null pointer error drm/i915: Clear lost context-switch interrupts across reset drm/i915/bxt: use NULL for GPIO connection ID drm/i915/cnl: Fix LSPCON support. drm/i915/vbt: ignore extraneous child devices for a port drm/i915: Initialize 'data' in intel_dsi_dcs_backlight.c
2017-08-24Input: ALPS - fix two-finger scroll breakage in right side on ALPS touchpadMasaki Ota2-10/+39
Fixed the issue that two finger scroll does not work correctly on V8 protocol. The cause is that V8 protocol X-coordinate decode is wrong at SS4 PLUS device. I added SS4 PLUS X decode definition. Mote notes: the problem manifests itself by the commit e7348396c6d5 ("Input: ALPS - fix V8+ protocol handling (73 03 28)"), where a fix for the V8+ protocol was applied. Although the culprit must have been present beforehand, the two-finger scroll worked casually even with the wrongly reported values by some reason. It got broken by the commit above just because it changed x_max value, and this made libinput correctly figuring the MT events. Since the X coord is reported as falsely doubled, the events on the right-half side go outside the boundary, thus they are no longer handled. This resulted as a broken two-finger scroll. One finger event is decoded differently, and it didn't suffer from this problem. The problem was only about MT events. --tiwai Fixes: e7348396c6d5 ("Input: ALPS - fix V8+ protocol handling (73 03 28)") Signed-off-by: Masaki Ota <[email protected]> Tested-by: Takashi Iwai <[email protected]> Tested-by: Paul Donohue <[email protected]> Cc: <[email protected]> Signed-off-by: Takashi Iwai <[email protected]> Signed-off-by: Dmitry Torokhov <[email protected]>
2017-08-24Merge tag 'for-linus' of ↵Linus Torvalds5-6/+22
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull more rdma fixes from Doug Ledford: "Well, I thought we were going to be done for this -rc cycle. I should have known better than to say so though. We have four additional items that trickled in. One was a simple mistake on my part. I took a patch into my for-next thinking that the issue was less severe than it was. I was then notified that it needed to be in my -rc area instead. The other three were just found late in testing. Summary: - One core fix accidentally applied first to for-next and then cherry picked back because it needed to be in the -rc cycles instead - Another core fix - Two mlx5 fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: IB/mlx5: Always return success for RoCE modify port IB/mlx5: Fix Raw Packet QP event handler assignment IB/core: Avoid accessing non-allocated memory when inferring port type RDMA/uverbs: Initialize cq_context appropriately
2017-08-24net: sunrpc: svcsock: fix NULL-pointer exceptionVadim Lomovtsev1-2/+20
While running nfs/connectathon tests kernel NULL-pointer exception has been observed due to races in svcsock.c. Race is appear when kernel accepts connection by kernel_accept (which creates new socket) and start queuing ingress packets to new socket. This happens in ksoftirq context which could run concurrently on a different core while new socket setup is not done yet. The fix is to re-order socket user data init sequence and add write/read barrier calls to be sure that we got proper values for callback pointers before actually calling them. Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations. Crash log: ---<-snip->--- [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [ 6708.647093] pgd = ffff0000094e0000 [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000 [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops [ 6708.736446] ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021] [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G W OE 4.11.0-4.el7.aarch64 #1 [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017 [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000 [ 6708.773822] PC is at 0x0 [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc] [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145 [ 6708.789248] sp : ffff810ffbad3900 [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58 [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00 [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28 [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000 [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00 [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2 [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000 [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001 [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000 [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000 [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000 [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000 [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000 [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00 [ 6708.872084] [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000) [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000) [ 6708.886075] Call trace: [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840) [ 6708.894942] 3700: ffff810012412400 0001000000000000 [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000 [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000 [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000 [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140 [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000 [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028 [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000 [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000 [ 6708.973117] [< (null)>] (null) [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58 [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254 [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4 [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8 [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84 [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8 [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf] [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf] [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464 [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308 [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174 [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4 [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190 [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20) [ 6709.092929] fde0: 0000810ff2ec0000 ffff000008c10000 [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18 [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80 [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4 [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50 [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000 [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000 [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20 [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145 [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238 [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140 [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30 [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4 [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30 [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160 [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4 [ 6709.207967] Code: bad PC value [ 6709.211061] SMP: stopping secondary CPUs [ 6709.218830] Starting crashdump kernel... [ 6709.222749] Bye! ---<-snip>--- Signed-off-by: Vadim Lomovtsev <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Cc: [email protected] Signed-off-by: J. Bruce Fields <[email protected]>
2017-08-24nfsd: Limit end of page list when decoding NFSv4 WRITEChuck Lever1-4/+2
When processing an NFSv4 WRITE operation, argp->end should never point past the end of the data in the final page of the page list. Otherwise, nfsd4_decode_compound can walk into uninitialized memory. More critical, nfsd4_decode_write is failing to increment argp->pagelen when it increments argp->pagelist. This can cause later xdr decoders to assume more data is available than really is, which can cause server crashes on malformed requests. Signed-off-by: Chuck Lever <[email protected]> Cc: [email protected] Signed-off-by: J. Bruce Fields <[email protected]>
2017-08-24Merge tag 'acpi-4.13-rc7' of ↵Linus Torvalds5-16/+15
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These fix two recent regressions (in ACPICA and in the ACPI EC driver) and one bug in code introduced during the 4.12 cycle (ACPI device properties library routine). Specifics: - Fix a regression in the ACPI EC driver causing a kernel to crash during initialization on some systems due to a code ordering issue exposed by a recent change (Lv Zheng). - Fix a recent regression in ACPICA due to a change of the behavior of a library function in a way that is not backwards compatible with some existing callers of it (Rafael Wysocki). - Fix a coding mistake in a library function related to the handling of ACPI device properties introduced during the 4.12 cycle (Sakari Ailus)" * tag 'acpi-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: device property: Fix node lookup in acpi_graph_get_child_prop_value() ACPICA: Fix acpi_evaluate_object_typed() ACPI: EC: Fix regression related to wrong ECDT initialization order
2017-08-24Merge tag 'kbuild-fixes-v4.13' of ↵Linus Torvalds9-37/+48
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - fix linker script regression caused by dead code elimination support - fix typos and outdated comments - specify kselftest-clean as a PHONY target - fix "make dtbs_install" when $(srctree) includes shell special characters like '~' - Move -fshort-wchar to the global option list because defining it partially emits warnings * tag 'kbuild-fixes-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: update comments of Makefile.asm-generic kbuild: Do not use hyphen in exported variable name Makefile: add kselftest-clean to PHONY target list Kbuild: use -fshort-wchar globally fixdep: trivial: typo fix and correction kbuild: trivial cleanups on the comments kbuild: linker script do not match C names unless LD_DEAD_CODE_DATA_ELIMINATION is configured
2017-08-24Merge branch 'for-4.13-rc7' of ↵Linus Torvalds5-60/+64
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "We have one more fixup that stems from the blk_status_t conversion that did not quite cover everything. The normal cases were not affected because the code is 0, but any error and retries could mix up new and old values" * 'for-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: fix blk_status_t/errno confusion
2017-08-24Merge tag 'trace-v4.13-rc3' of ↵Linus Torvalds6-16/+38
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Various bug fixes: - Two small memory leaks in error paths. - A missed return error code on an error path. - A fix to check the tracing ring buffer CPU when it doesn't exist (caused by setting maxcpus on the command line that is less than the actual number of CPUs, and then onlining them manually). - A fix to have the reset of boot tracers called by lateinit_sync() instead of just lateinit(). As some of the tracers register via lateinit(), and if the clear happens before the tracer is registered, it will never start even though it was told to via the kernel command line" * tag 'trace-v4.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix freeing of filter in create_filter() when set_str is false tracing: Fix kmemleak in tracing_map_array_free() ftrace: Check for null ret_stack on profile function graph entry function ring-buffer: Have ring_buffer_alloc_read_page() return error on offline CPU tracing: Missing error code in tracer_alloc_buffers() tracing: Call clear_boot_tracer() at lateinit_sync
2017-08-24Merge tag 'armsoc-fixes' of ↵Linus Torvalds6-6/+24
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Arnd Bergmann: "A small number of bugfixes, again nothing serious. - Alexander Dahl found multiple bugs in the Atmel memory interface driver - A randconfig build fix for at91 was incomplete, the second attempt fixes the remaining corner case - One fix for the TI Keystone queue handler - The Odroid XU4 HDMI port (added in 4.13) needs a small DT fix" * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: dts: exynos: add needs-hpd for Odroid-XU3/4 ARM: at91: don't select CONFIG_ARM_CPU_SUSPEND for old platforms soc: ti: knav: Add a NULL pointer check for kdev in knav_pool_create memory: atmel-ebi: Fix smc cycle xlate converter memory: atmel-ebi: Allow t_DF timings of zero ns memory: atmel-ebi: Fix smc timing return value evaluation
2017-08-24pty: Repair TIOCGPTPEEREric W. Biederman4-53/+89
The implementation of TIOCGPTPEER has two issues. When /dev/ptmx (as opposed to /dev/pts/ptmx) is opened the wrong vfsmount is passed to dentry_open. Which results in the kernel displaying the wrong pathname for the peer. The second is simply by caching the vfsmount and dentry of the peer it leaves them open, in a way they were not previously Which because of the inreased reference counts can cause unnecessary behaviour differences resulting in regressions. To fix these move the ioctl into tty_io.c at a generic level allowing the ioctl to have access to the struct file on which the ioctl is being called. This allows the path of the slave to be derived when opening the slave through TIOCGPTPEER instead of requiring the path to the slave be cached. Thus removing the need for caching the path. A new function devpts_ptmx_path is factored out of devpts_acquire and used to implement a function devpts_mntget. The new function devpts_mntget takes a filp to perform the lookup on and fsi so that it can confirm that the superblock that is found by devpts_ptmx_path is the proper superblock. v2: Lots of fixes to make the code actually work v3: Suggestions by Linus - Removed the unnecessary initialization of filp in ptm_open_peer - Simplified devpts_ptmx_path as gotos are no longer required [ This is the fix for the issue that was reverted in commit 143c97cc6529, but this time without breaking 'pbuilder' due to increased reference counts - Linus ] Fixes: 54ebbfb16034 ("tty: add TIOCGPTPEER ioctl") Reported-by: Christian Brauner <[email protected]> Reported-and-tested-by: Stefan Lippers-Hollmann <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-08-24Merge branches 'acpica-fix', 'acpi-ec-fix' and 'acpi-properties-fix'Rafael J. Wysocki4-13/+8
* acpica-fix: ACPICA: Fix acpi_evaluate_object_typed() * acpi-ec-fix: ACPI: EC: Fix regression related to wrong ECDT initialization order * acpi-properties-fix: ACPI: device property: Fix node lookup in acpi_graph_get_child_prop_value()
2017-08-24IB/mlx5: Always return success for RoCE modify portMajd Dibbiny1-0/+6
CM layer calls ib_modify_port() regardless of the link layer. For the Ethernet ports, qkey violation and Port capabilities are meaningless. Therefore, always return success for ib_modify_port calls on the Ethernet ports. Cc: Selvin Xavier <[email protected]> Signed-off-by: Majd Dibbiny <[email protected]> Reviewed-by: Moni Shoua <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2017-08-24IB/mlx5: Fix Raw Packet QP event handler assignmentMajd Dibbiny1-0/+1
In case we have SQ and RQ for Raw Packet QP, the SQ's event handler wasn't assigned. Fixing this by assigning event handler for each WQ after creation. [ 1877.145243] Call Trace: [ 1877.148644] <IRQ> [ 1877.150580] [<ffffffffa07987c5>] ? mlx5_rsc_event+0x105/0x210 [mlx5_core] [ 1877.159581] [<ffffffffa0795bd7>] ? mlx5_cq_event+0x57/0xd0 [mlx5_core] [ 1877.167137] [<ffffffffa079208e>] mlx5_eq_int+0x53e/0x6c0 [mlx5_core] [ 1877.174526] [<ffffffff8101a679>] ? sched_clock+0x9/0x10 [ 1877.180753] [<ffffffff810f717e>] handle_irq_event_percpu+0x3e/0x1e0 [ 1877.188014] [<ffffffff810f735d>] handle_irq_event+0x3d/0x60 [ 1877.194567] [<ffffffff810f9fe7>] handle_edge_irq+0x77/0x130 [ 1877.201129] [<ffffffff81014c3f>] handle_irq+0xbf/0x150 [ 1877.207244] [<ffffffff815ed78a>] ? atomic_notifier_call_chain+0x1a/0x20 [ 1877.214829] [<ffffffff815f434f>] do_IRQ+0x4f/0xf0 [ 1877.220498] [<ffffffff815e94ad>] common_interrupt+0x6d/0x6d [ 1877.227025] <EOI> [ 1877.228967] [<ffffffff814834e2>] ? cpuidle_enter_state+0x52/0xc0 [ 1877.236990] [<ffffffff81483615>] cpuidle_idle_call+0xc5/0x200 [ 1877.243676] [<ffffffff8101bc7e>] arch_cpu_idle+0xe/0x30 [ 1877.249831] [<ffffffff810b4725>] cpu_startup_entry+0xf5/0x290 [ 1877.256513] [<ffffffff815cfee1>] start_secondary+0x265/0x27b [ 1877.263111] Code: Bad RIP value. [ 1877.267296] RIP [< (null)>] (null) [ 1877.273264] RSP <ffff88046fd63df8> [ 1877.277531] CR2: 0000000000000000 Fixes: 19098df2da78 ("IB/mlx5: Refactor mlx5_ib_qp to accommodate other QP types") Signed-off-by: Majd Dibbiny <[email protected]> Reviewed-by: Yishai Hadas <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2017-08-24IB/core: Avoid accessing non-allocated memory when inferring port typeNoa Osherovich3-5/+14
Commit 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types") introduced the concept of type in ah_attr: * During ib_register_device, each port is checked for its type which is stored in ib_device's port_immutable array. * During uverbs' modify_qp, the type is inferred using the port number in ib_uverbs_qp_dest struct (address vector) by accessing the relevant port_immutable array and the type is passed on to providers. IB spec (version 1.3) enforces a valid port value only in Reset to Init. During Init to RTR, the address vector must be valid but port number is not mentioned as a field in the address vector, so its value is not validated, which leads to accesses to a non-allocated memory when inferring the port type. Save the real port number in ib_qp during modify to Init (when the comp_mask indicates that the port number is valid) and use this value to infer the port type. Avoid copying the address vector fields if the matching bit is not set in the attr_mask. Address vector can't be modified before the port, so no valid flow is affected. Fixes: 44c58487d51a ('IB/core: Define 'ib' and 'roce' rdma_ah_attr types') Signed-off-by: Noa Osherovich <[email protected]> Reviewed-by: Yishai Hadas <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2017-08-24ASoC: rt5677: Reintroduce I2C device IDsTom Rini1-0/+1
Not all devices with ACPI and this combination of sound devices will have the required information provided via ACPI. Reintroduce the I2C device ID to restore sound functionality on on the Chromebook 'Samus' model. [ More background note: the commit a36afb0ab648 ("ASoC: rt5677: Introduce proper table...") moved the i2c ID probed via ACPI ("RT5677CE:00") to a proper acpi_device_id table. Although the action itself is correct per se, the overseen issue is the reference id->driver_data at rt5677_i2c_probe() for retrieving the corresponding chip model for the given id. Since id=NULL is passed for ACPI matching case, we get an Oops now. We already have queued more fixes for 4.14 and they already address the issue, but they are bigger changes that aren't preferable for the late 4.13-rc stage. So, this patch just papers over the bug as a once-off quick fix for a particular ACPI matching. -- tiwai ] Fixes: a36afb0ab648 ("ASoC: rt5677: Introduce proper table for ACPI enumeration") Signed-off-by: Tom Rini <[email protected]> Acked-by: Mark Brown <[email protected]> Signed-off-by: Takashi Iwai <[email protected]>
2017-08-24Btrfs: fix blk_status_t/errno confusionOmar Sandoval5-60/+64
This fixes several instances of blk_status_t and bare errno ints being mixed up, some of which are real bugs. In the normal case, 0 matches BLK_STS_OK, so we don't observe any effects of the missing conversion, but in case of errors or passes through the repair/retry paths, the errors get mixed up. The changes were identified using 'sparse', we don't have reports of the buggy behaviour. Fixes: 4e4cbee93d56 ("block: switch bios to blk_status_t") Signed-off-by: Omar Sandoval <[email protected]> Reviewed-by: Liu Bo <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-08-25kbuild: update comments of Makefile.asm-genericCao jin1-2/+2
Signed-off-by: Cao jin <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]>
2017-08-24bsg-lib: fix kernel panic resulting from missing allocation of reply-bufferBenjamin Block3-31/+46
Since we split the scsi_request out of struct request bsg fails to provide a reply-buffer for the drivers. This was done via the pointer for sense-data, that is not preallocated anymore. Failing to allocate/assign it results in illegal dereferences because LLDs use this pointer unquestioned. An example panic on s390x, using the zFCP driver, looks like this (I had debugging on, otherwise NULL-pointer dereferences wouldn't even panic on s390x): Unable to handle kernel pointer dereference in virtual kernel address space Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6403 Fault in home space mode while using kernel ASCE. AS:0000000001590007 R3:0000000000000024 Oops: 0038 ilc:2 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: <Long List> CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.12.0-bsg-regression+ #3 Hardware name: IBM 2964 N96 702 (z/VM 6.4.0) task: 0000000065cb0100 task.stack: 0000000065cb4000 Krnl PSW : 0704e00180000000 000003ff801e4156 (zfcp_fc_ct_els_job_handler+0x16/0x58 [zfcp]) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3 Krnl GPRS: 0000000000000001 000000005fa9d0d0 000000005fa9d078 0000000000e16866 000003ff00000290 6b6b6b6b6b6b6b6b 0000000059f78f00 000000000000000f 00000000593a0958 00000000593a0958 0000000060d88800 000000005ddd4c38 0000000058b50100 07000000659cba08 000003ff801e8556 00000000659cb9a8 Krnl Code: 000003ff801e4146: e31020500004 lg %r1,80(%r2) 000003ff801e414c: 58402040 l %r4,64(%r2) #000003ff801e4150: e35020200004 lg %r5,32(%r2) >000003ff801e4156: 50405004 st %r4,4(%r5) 000003ff801e415a: e54c50080000 mvhi 8(%r5),0 000003ff801e4160: e33010280012 lt %r3,40(%r1) 000003ff801e4166: a718fffb lhi %r1,-5 000003ff801e416a: 1803 lr %r0,%r3 Call Trace: ([<000003ff801e8556>] zfcp_fsf_req_complete+0x726/0x768 [zfcp]) [<000003ff801ea82a>] zfcp_fsf_reqid_check+0x102/0x180 [zfcp] [<000003ff801eb980>] zfcp_qdio_int_resp+0x230/0x278 [zfcp] [<00000000009b91b6>] qdio_kick_handler+0x2ae/0x2c8 [<00000000009b9e3e>] __tiqdio_inbound_processing+0x406/0xc10 [<00000000001684c2>] tasklet_action+0x15a/0x1d8 [<0000000000bd28ec>] __do_softirq+0x3ec/0x848 [<00000000001675a4>] irq_exit+0x74/0xf8 [<000000000010dd6a>] do_IRQ+0xba/0xf0 [<0000000000bd19e8>] io_int_handler+0x104/0x2d4 [<00000000001033b6>] enabled_wait+0xb6/0x188 ([<000000000010339e>] enabled_wait+0x9e/0x188) [<000000000010396a>] arch_cpu_idle+0x32/0x50 [<0000000000bd0112>] default_idle_call+0x52/0x68 [<00000000001cd0fa>] do_idle+0x102/0x188 [<00000000001cd41e>] cpu_startup_entry+0x3e/0x48 [<0000000000118c64>] smp_start_secondary+0x11c/0x130 [<0000000000bd2016>] restart_int_handler+0x62/0x78 [<0000000000000000>] (null) INFO: lockdep is turned off. Last Breaking-Event-Address: [<000003ff801e41d6>] zfcp_fc_ct_job_handler+0x3e/0x48 [zfcp] Kernel panic - not syncing: Fatal exception in interrupt This patch moves bsg-lib to allocate and setup struct bsg_job ahead of time, including the allocation of a buffer for the reply-data. This means, struct bsg_job is not allocated separately anymore, but as part of struct request allocation - similar to struct scsi_cmd. Reflect this in the function names that used to handle creation/destruction of struct bsg_job. Reported-by: Steffen Maier <[email protected]> Suggested-by: Christoph Hellwig <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Benjamin Block <[email protected]> Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request") Cc: <[email protected]> #4.11+ Signed-off-by: Jens Axboe <[email protected]>
2017-08-24tracing: Fix freeing of filter in create_filter() when set_str is falseSteven Rostedt (VMware)1-0/+4
Performing the following task with kmemleak enabled: # cd /sys/kernel/tracing/events/irq/irq_handler_entry/ # echo 'enable_event:kmem:kmalloc:3 if irq >' > trigger # echo 'enable_event:kmem:kmalloc:3 if irq > 31' > trigger # echo scan > /sys/kernel/debug/kmemleak # cat /sys/kernel/debug/kmemleak unreferenced object 0xffff8800b9290308 (size 32): comm "bash", pid 1114, jiffies 4294848451 (age 141.139s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff81cef5aa>] kmemleak_alloc+0x4a/0xa0 [<ffffffff81357938>] kmem_cache_alloc_trace+0x158/0x290 [<ffffffff81261c09>] create_filter_start.constprop.28+0x99/0x940 [<ffffffff812639c9>] create_filter+0xa9/0x160 [<ffffffff81263bdc>] create_event_filter+0xc/0x10 [<ffffffff812655e5>] set_trigger_filter+0xe5/0x210 [<ffffffff812660c4>] event_enable_trigger_func+0x324/0x490 [<ffffffff812652e2>] event_trigger_write+0x1a2/0x260 [<ffffffff8138cf87>] __vfs_write+0xd7/0x380 [<ffffffff8138f421>] vfs_write+0x101/0x260 [<ffffffff8139187b>] SyS_write+0xab/0x130 [<ffffffff81cfd501>] entry_SYSCALL_64_fastpath+0x1f/0xbe [<ffffffffffffffff>] 0xffffffffffffffff The function create_filter() is passed a 'filterp' pointer that gets allocated, and if "set_str" is true, it is up to the caller to free it, even on error. The problem is that the pointer is not freed by create_filter() when set_str is false. This is a bug, and it is not up to the caller to free the filter on error if it doesn't care about the string. Link: http://lkml.kernel.org/r/[email protected] Cc: [email protected] Fixes: 38b78eb85 ("tracing: Factorize filter creation") Reported-by: Chunyu Hu <[email protected]> Tested-by: Chunyu Hu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2017-08-24tracing: Fix kmemleak in tracing_map_array_free()Chunyu Hu1-4/+7
kmemleak reported the below leak when I was doing clear of the hist trigger. With this patch, the kmeamleak is gone. unreferenced object 0xffff94322b63d760 (size 32): comm "bash", pid 1522, jiffies 4403687962 (age 2442.311s) hex dump (first 32 bytes): 00 01 00 00 04 00 00 00 08 00 00 00 ff 00 00 00 ................ 10 00 00 00 00 00 00 00 80 a8 7a f2 31 94 ff ff ..........z.1... backtrace: [<ffffffff9e96c27a>] kmemleak_alloc+0x4a/0xa0 [<ffffffff9e424cba>] kmem_cache_alloc_trace+0xca/0x1d0 [<ffffffff9e377736>] tracing_map_array_alloc+0x26/0x140 [<ffffffff9e261be0>] kretprobe_trampoline+0x0/0x50 [<ffffffff9e38b935>] create_hist_data+0x535/0x750 [<ffffffff9e38bd47>] event_hist_trigger_func+0x1f7/0x420 [<ffffffff9e38893d>] event_trigger_write+0xfd/0x1a0 [<ffffffff9e44dfc7>] __vfs_write+0x37/0x170 [<ffffffff9e44f552>] vfs_write+0xb2/0x1b0 [<ffffffff9e450b85>] SyS_write+0x55/0xc0 [<ffffffff9e203857>] do_syscall_64+0x67/0x150 [<ffffffff9e977ce7>] return_from_SYSCALL_64+0x0/0x6a [<ffffffffffffffff>] 0xffffffffffffffff unreferenced object 0xffff9431f27aa880 (size 128): comm "bash", pid 1522, jiffies 4403687962 (age 2442.311s) hex dump (first 32 bytes): 00 00 8c 2a 32 94 ff ff 00 f0 8b 2a 32 94 ff ff ...*2......*2... 00 e0 8b 2a 32 94 ff ff 00 d0 8b 2a 32 94 ff ff ...*2......*2... backtrace: [<ffffffff9e96c27a>] kmemleak_alloc+0x4a/0xa0 [<ffffffff9e425348>] __kmalloc+0xe8/0x220 [<ffffffff9e3777c1>] tracing_map_array_alloc+0xb1/0x140 [<ffffffff9e261be0>] kretprobe_trampoline+0x0/0x50 [<ffffffff9e38b935>] create_hist_data+0x535/0x750 [<ffffffff9e38bd47>] event_hist_trigger_func+0x1f7/0x420 [<ffffffff9e38893d>] event_trigger_write+0xfd/0x1a0 [<ffffffff9e44dfc7>] __vfs_write+0x37/0x170 [<ffffffff9e44f552>] vfs_write+0xb2/0x1b0 [<ffffffff9e450b85>] SyS_write+0x55/0xc0 [<ffffffff9e203857>] do_syscall_64+0x67/0x150 [<ffffffff9e977ce7>] return_from_SYSCALL_64+0x0/0x6a [<ffffffffffffffff>] 0xffffffffffffffff Link: http://lkml.kernel.org/r/[email protected] Cc: [email protected] Fixes: 08d43a5fa063 ("tracing: Add lock-free tracing_map") Signed-off-by: Chunyu Hu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>