Age  Commit message  Author  Files  Lines
2016-09-06drm/atmel-hlcdc: Make ->reset() implementation staticThierry Reding1-1/+1
The atmel_hlcdc_crtc_reset() function is never used outside the file and can be static. This avoids a warning from sparse. Signed-off-by: Thierry Reding <[email protected]>
2016-09-06drm: atmel-hlcdc: Fix vertical scalingJan Leupold1-5/+5
The code is applying the same scaling for the X and Y components, thus making the scaling feature only functional when both components have the same scaling factor. Do the s/_w/_h/ replacement where appropriate to fix vertical scaling. Signed-off-by: Jan Leupold <[email protected]> Fixes: 1a396789f65a2 ("drm: add Atmel HLCDC Display Controller support") Cc: <[email protected]> Signed-off-by: Boris Brezillon <[email protected]>
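The class of bug described above can be illustrated with a minimal sketch. This is not the driver's code: the struct, the field names and the x1024 fixed-point factor are hypothetical and chosen only to show what the s/_w/_h/ replacement changes.

```c
#include <stdint.h>

/* Hypothetical plane geometry; field names are illustrative only. */
struct plane_geom {
    uint32_t src_w, src_h;   /* source size in pixels */
    uint32_t crtc_w, crtc_h; /* on-screen size in pixels */
};

/* Fixed-point (x1024) scaling factor for one axis. */
static uint32_t scale_factor(uint32_t src, uint32_t dst)
{
    return (1024u * src) / dst;
}

/* Buggy: the vertical factor is computed from the widths. */
static uint32_t yfactor_buggy(const struct plane_geom *g)
{
    return scale_factor(g->src_w, g->crtc_w);
}

/* Fixed: the vertical factor is computed from the heights. */
static uint32_t yfactor_fixed(const struct plane_geom *g)
{
    return scale_factor(g->src_h, g->crtc_h);
}
```

With a plane that is stretched horizontally but not vertically, the buggy variant reports a vertical factor that differs from the correct one; the two only agree when both axes use the same ratio.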
2016-09-06thermal: rcar_thermal: Fix priv->zone error handlingDirk Behme1-0/+1
If thermal_zone_xxx_register() returns an error, priv->zone is no longer NULL but contains the error code. This value is then passed to thermal_zone_device_unregister(), which only checks priv->zone for NULL. Since the error code is != NULL, it is used as a pointer and the kernel crashes immediately. To fix this, reset priv->zone to NULL before entering rcar_gen3_thermal_remove(). Signed-off-by: Dirk Behme <[email protected]> Reviewed-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Zhang Rui <[email protected]>
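The failure mode described above can be sketched in userspace C. The ERR_PTR()/PTR_ERR()/IS_ERR() helpers below are simplified stand-ins for the kernel macros of the same name, and struct zone, struct priv, zone_register_failing() and probe() are hypothetical names used only for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct zone { int id; };            /* hypothetical */
struct priv { struct zone *zone; }; /* hypothetical */

/* Hypothetical registration call that always fails with -ENODEV. */
static struct zone *zone_register_failing(void)
{
    return ERR_PTR(-ENODEV);
}

/* Models a cleanup path that only tests for NULL before using the pointer. */
static int cleanup_would_touch_zone(const struct priv *p)
{
    return p->zone != NULL;
}

/* reset_on_error selects the fixed behaviour: clear the error pointer. */
static int probe(struct priv *p, int reset_on_error)
{
    p->zone = zone_register_failing();
    if (IS_ERR(p->zone)) {
        int ret = (int)PTR_ERR(p->zone);

        if (reset_on_error)
            p->zone = NULL; /* cleanup now sees NULL and skips the zone */
        return ret;
    }
    return 0;
}
```

Without the reset, the NULL-only check in the cleanup path passes and the error value would be treated as a valid pointer.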
2016-09-06Merge remote-tracking branches 'spi/fix/lock', 'spi/fix/maintainers', ↵Mark Brown7-7/+23
'spi/fix/put', 'spi/fix/pxa2xx', 'spi/fix/sh-msiof' and 'spi/fix/timeout' into spi-linus
2016-09-06Merge remote-tracking branches 'regulator/fix/email' and ↵Mark Brown3-18/+20
'regulator/fix/qcom-smd' into regulator-linus
2016-09-06arm: KVM: Fix idmap overlap detection when the kernel is idmap'edMarc Zyngier1-1/+2
We're trying hard to detect when the HYP idmap overlaps with the HYP va, as it makes the teardown of a cpu dangerous. But there is one case where an overlap is completely safe, which is when the whole of the kernel is idmap'ed, which is likely to happen on 32bit when RAM is at 0x8000000 and we're using a 2G/2G VA split. In that case, we can proceed safely. Reported-by: Christoffer Dall <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
2016-09-06perf/x86/intel/cqm: Check cqm/mbm enabled state in event initJiri Olsa1-0/+9
Yanqiu Zhang reported kernel panic when using mbm event on system where CQM is detected but without mbm event support, like with perf: # perf stat -e 'intel_cqm/event=3/' -a BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 IP: [<ffffffff8100d64c>] update_sample+0xbc/0xe0 ... <IRQ> [<ffffffff8100d688>] __intel_mbm_event_init+0x18/0x20 [<ffffffff81113d6b>] flush_smp_call_function_queue+0x7b/0x160 [<ffffffff81114853>] generic_smp_call_function_single_interrupt+0x13/0x60 [<ffffffff81052017>] smp_call_function_interrupt+0x27/0x40 [<ffffffff816fb06c>] call_function_interrupt+0x8c/0xa0 ... The reason is that we currently allow to init mbm event even if mbm support is not detected. Adding checks for both cqm and mbm events and support into cqm's event_init. Fixes: 33c3cc7acfd9 ("perf/x86/mbm: Add Intel Memory B/W Monitoring enumeration and init") Reported-by: Yanqiu Zhang <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Vikas Shivappa <[email protected]> Cc: Tony Luck <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2016-09-06usb: gadget: prevent potenial null pointer dereference on skb->lenColin Ian King1-1/+1
An earlier fix partially fixed the null pointer dereference on skb->len by moving the assignment of len after the check on skb being non-null, however it failed to remove the erroneous dereference when assigning len. Correctly fix this by removing the initialisation of len as was originally intended. Fixes: 70237dc8efd092 ("usb: gadget: function: f_eem: socket buffer may be NULL") Acked-by: Peter Chen <[email protected]> Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Felipe Balbi <[email protected]>
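As a minimal sketch of the pattern this fix restores (struct buf and payload_len() are hypothetical stand-ins, not the f_eem code): the length is declared without an initializer and only read once the pointer has been checked.

```c
#include <stddef.h>

struct buf { size_t len; }; /* hypothetical stand-in for struct sk_buff */

/* Returns the payload length, or -1 if there is no buffer. */
static long payload_len(const struct buf *b)
{
    size_t len; /* no initializer: "len = b->len" here would dereference NULL */

    if (!b)
        return -1;

    len = b->len; /* safe: b is known to be non-NULL at this point */
    return (long)len;
}
```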
2016-09-06powerpc/powernv: Fix crash on releasing compound PEGavin Shan1-6/+9
The compound PE is created to accommodate the devices attached to one specific PCI bus that consume multiple M64 segments. The compound PE is made up of one master PE and possibly multiple slave PEs. The slave PEs should be destroyed when releasing the master PE. A kernel crash happens when dereferencing @pe->pdev on releasing the slave PE in pnv_ioda_deconfigure_pe().

  # echo 0 > /sys/bus/pci/slots/C7/power
  iommu: Removing device 0000:01:00.1 from group 0
  iommu: Removing device 0000:01:00.0 from group 0
  Unable to handle kernel paging request for data at address 0x00000010
  Faulting instruction address: 0xc00000000005d898
  cpu 0x1: Vector: 300 (Data Access) at [c000000fe8217620]
      pc: c00000000005d898: pnv_ioda_release_pe+0x288/0x610
      lr: c00000000005dbdc: pnv_ioda_release_pe+0x5cc/0x610
      sp: c000000fe82178a0
     msr: 9000000000009033
     dar: 10
   dsisr: 40000000
    current = 0xc000000fe815ab80
    paca    = 0xc00000000ff00400 softe: 0 irq_happened: 0x01
      pid   = 2709, comm = sh
  Linux version 4.8.0-rc5-gavin-00006-g745efdb (gwshan@gwshan) \
  (gcc version 4.9.3 (Buildroot 2016.02-rc2-00093-g5ea3bce) ) #586 SMP \
  Tue Sep 6 13:37:29 AEST 2016
  enter ? for help
  [c000000fe8217940] c00000000005d684 pnv_ioda_release_pe+0x74/0x610
  [c000000fe82179e0] c000000000034460 pcibios_release_device+0x50/0x70
  [c000000fe8217a10] c0000000004aba80 pci_release_dev+0x50/0xa0
  [c000000fe8217a40] c000000000704898 device_release+0x58/0xf0
  [c000000fe8217ac0] c000000000470510 kobject_release+0x80/0xf0
  [c000000fe8217b00] c000000000704dd4 put_device+0x24/0x40
  [c000000fe8217b20] c0000000004af94c pci_remove_bus_device+0x12c/0x150
  [c000000fe8217b60] c000000000034244 pci_hp_remove_devices+0x94/0xd0
  [c000000fe8217ba0] c0000000004ca444 pnv_php_disable_slot+0x64/0xb0
  [c000000fe8217bd0] c0000000004c88c0 power_write_file+0xa0/0x190
  [c000000fe8217c50] c0000000004c248c pci_slot_attr_store+0x3c/0x60
  [c000000fe8217c70] c0000000002d6494 sysfs_kf_write+0x94/0xc0
  [c000000fe8217cb0] c0000000002d50f0 kernfs_fop_write+0x180/0x260
  [c000000fe8217d00] c0000000002334a0 __vfs_write+0x40/0x190
  [c000000fe8217d90] c000000000234738 vfs_write+0xc8/0x240
  [c000000fe8217de0] c000000000236250 SyS_write+0x60/0x110
  [c000000fe8217e30] c000000000009524 system_call+0x38/0x108

It fixes the kernel crash by bypassing releasing resources (DMA, IO and memory segments, PELTM) because there are no resources assigned to the slave PE. Fixes: c5f7700bbd2e ("powerpc/powernv: Dynamically release PE") Reported-by: Frederic Barrat <[email protected]> Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-09-06powerpc/xics/opal: Fix processor numbers in OPAL ICPBenjamin Herrenschmidt1-5/+7
When using the OPAL ICP backend we incorrectly pass Linux CPU numbers rather than HW CPU numbers to OPAL. Fixes: d74361881f0d ("powerpc/xics: Add ICP OPAL backend") Signed-off-by: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-09-06powerpc/pseries: Fix little endian build with CONFIG_KEXEC=nThiago Jung Bauermann1-1/+1
On ppc64le, builds with CONFIG_KEXEC=n fail with: arch/powerpc/platforms/pseries/setup.c: In function ‘pseries_big_endian_exceptions’: arch/powerpc/platforms/pseries/setup.c:403:13: error: implicit declaration of function ‘kdump_in_progress’ if (rc && !kdump_in_progress()) This is because pseries/setup.c includes <linux/kexec.h>, but kdump_in_progress() is defined in <asm/kexec.h>. This is a problem because the former only includes the latter if CONFIG_KEXEC_CORE=y. Fix it by including <asm/kexec.h> directly, as is done in powernv/setup.c. Fixes: d3cbff1b5a90 ("powerpc: Put exception configuration in a common place") Signed-off-by: Thiago Jung Bauermann <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-09-05iio:core: fix IIO_VAL_FRACTIONAL sign handlingGregor Boirie1-3/+2
7985e7c100 ("iio: Introduce a new fractional value type") introduced a new IIO_VAL_FRACTIONAL value type meant to represent rational type numbers expressed by a numerator and denominator combination. Formatting of IIO_VAL_FRACTIONAL values relies upon do_div() usage. This fails handling negative values properly since parameters are reevaluated as unsigned values. Fix this by using div_s64_rem() instead. Computed integer part will carry properly signed value. Formatted fractional part will always be positive. Fixes: 7985e7c100 ("iio: Introduce a new fractional value type") Signed-off-by: Gregor Boirie <[email protected]> Reviewed-by: Lars-Peter Clausen <[email protected]> Cc: <[email protected]> Signed-off-by: Jonathan Cameron <[email protected]>
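The sign problem can be reproduced in userspace with a rough sketch. div_s64_rem_sketch() below is a plain C stand-in for the kernel's div_s64_rem(), unsigned_quotient() models what happens when a negative dividend is reinterpreted as unsigned, and fractional_parts() is a hypothetical helper, not the IIO core function.

```c
#include <stdint.h>

/* Userspace stand-in for div_s64_rem(): signed division with remainder. */
static int64_t div_s64_rem_sketch(int64_t dividend, int32_t divisor,
                                  int32_t *rem)
{
    *rem = (int32_t)(dividend % divisor);
    return dividend / divisor;
}

/* Models an unsigned division applied to a value that is really signed. */
static uint64_t unsigned_quotient(int64_t dividend, uint32_t divisor)
{
    return (uint64_t)dividend / divisor;
}

/*
 * Split num/den into a signed integer part and a non-negative
 * fractional part expressed in units of 1e-9.
 */
static void fractional_parts(int32_t num, int32_t den,
                             int32_t *ipart, int32_t *nano)
{
    int64_t scaled = ((int64_t)num * 1000000000LL) / den;
    int32_t rem;

    *ipart = (int32_t)div_s64_rem_sketch(scaled, 1000000000, &rem);
    *nano = rem < 0 ? -rem : rem;
}
```

For -3/2 the signed path yields an integer part of -1 and a fractional part of 500000000, while the unsigned path turns the same scaled value into a huge positive quotient.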
2016-09-05iio: ensure ret is initialized to zero before entering do loopColin Ian King1-2/+2
A recent fix to iio_buffer_read_first_n_outer removed ret from being set by a return from wait_event_interruptible and also added a continue in a loop which causes the variable ret to not be set when it reaches the end of the loop. Fix this by initializing ret to zero. Also remove extraneous white space at the end of the loop. Fixes: fcf68f3c0bb2a5 ("fix sched WARNING "do not call blocking ops when !TASK_RUNNING") Signed-off-by: Colin Ian King <[email protected]> Cc: <[email protected]> Signed-off-by: Jonathan Cameron <[email protected]>
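A minimal sketch of the control flow in question, with hypothetical names (read_when_ready() is not the IIO function): in a do/while loop, continue jumps straight to the loop condition, so any variable tested or returned there must already hold a defined value.

```c
#include <stddef.h>

/*
 * Polls ready[0..attempts-1] and returns 1 as soon as an entry is set,
 * 0 otherwise. The caller must pass attempts >= 1.
 */
static int read_when_ready(const int *ready, size_t attempts)
{
    int ret = 0; /* the fix: without this, ret is read while uninitialized */
    size_t i = 0;

    do {
        if (!ready[i])
            continue; /* skips the assignment and goes to the condition */
        ret = 1;
    } while (ret == 0 && ++i < attempts);

    return ret;
}
```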
2016-09-05Merge branch 'linus' of ↵Linus Torvalds2-41/+39
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes a regression in the cryptd code that breaks certain accelerated AEAD algorithms as well as an older regression in the caam driver that breaks IPsec" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: caam - fix IV loading for authenc (giv)decryption crypto: cryptd - Use correct tfm object for AEAD tracking
2016-09-05Merge branch 'rc-fixes' of ↵Linus Torvalds1-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild Pull kbuild fix from Michal Marek: "Fix for 'make deb-pkg'. The bug got introduced in v4.8-rc1" * 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: builddeb: Skip gcc-plugins when not configured
2016-09-05btrfs: do not decrease bytes_may_use when replaying extentsWang Xiaoguang1-3/+9
When replaying extents, there is no need to update bytes_may_use in btrfs_alloc_logged_file_extent(), otherwise it'll trigger a WARN_ON about bytes_may_use. Fixes: ("btrfs: update btrfs_space_info's bytes_may_use timely") Signed-off-by: Wang Xiaoguang <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: David Sterba <[email protected]>
2016-09-05Merge tag 'efi-urgent' of ↵Thomas Gleixner574-2930/+5281
git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into efi/urgent
 * Make for_each_efi_memory_desc_in_map() safe on Xen and prevent an infinite loop - Jan Beulich
 * Fix boot error on arm64 Qualcomm platforms by refactoring and improving the ExitBootServices() hack we already have for x86 and moving it to the libstub - Jeffrey Hugo
 * Use correct return data type for of_get_flat_dt_subnode_by_name() so that we correctly handle errors - Andrzej Hajda
2016-09-05Merge tag 'kvm-s390-master-4.8-3' of ↵Paolo Bonzini1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master A bugfix for the vsie code (setting the wrong field).
2016-09-05KVM: lapic: adjust preemption timer correctly when goes TSC backwardWanpeng Li1-4/+4
TSC_OFFSET is adjusted if the TSC is found to have gone backward during vCPU load. The preemption timer, which relies on the guest TSC to reprogram its value, is also reprogrammed if the vCPU is scheduled in on a different pCPU. However, the current implementation reprograms the preemption timer before TSC_OFFSET is adjusted to the right value, resulting in the preemption timer firing prematurely. This patch fixes it by adjusting TSC_OFFSET before reprogramming the preemption timer when the TSC goes backward. Cc: Paolo Bonzini <[email protected]> Cc: Radim Krčmář <[email protected]> Cc: Yunhong Jiang <[email protected]> Signed-off-by: Wanpeng Li <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-09-05PM / QoS: avoid calling cancel_delayed_work_sync() during early bootTejun Heo1-1/+10
of_clk_init() ends up calling into pm_qos_update_request() very early during boot where irq is expected to stay disabled. pm_qos_update_request() uses cancel_delayed_work_sync() which correctly assumes that irq is enabled on invocation and unconditionally disables and re-enables it. Gate cancel_delayed_work_sync() invocation with keventd_up() to avoid enabling irq unexpectedly during early boot. Signed-off-by: Tejun Heo <[email protected]> Reported-and-tested-by: Qiao Zhou <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Rafael J. Wysocki <[email protected]>
2016-09-05ceph: do not modify fi->frag in need_reset_readdir()Nicolas Iooss1-1/+1
Commit f3c4ebe65ea1 ("ceph: using hash value to compose dentry offset") modified "if (fpos_frag(new_pos) != fi->frag)" to "if (fi->frag |= fpos_frag(new_pos))" in need_reset_readdir(), thus replacing a comparison operator with an assignment one. This looks like a typo which is reported by clang when building the kernel with some warning flags: fs/ceph/dir.c:600:22: error: using the result of an assignment as a condition without parentheses [-Werror,-Wparentheses] } else if (fi->frag |= fpos_frag(new_pos)) { ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~ fs/ceph/dir.c:600:22: note: place parentheses around the assignment to silence this warning } else if (fi->frag |= fpos_frag(new_pos)) { ^ ( ) fs/ceph/dir.c:600:22: note: use '!=' to turn this compound assignment into an inequality comparison } else if (fi->frag |= fpos_frag(new_pos)) { ^~ != Fixes: f3c4ebe65ea1 ("ceph: using hash value to compose dentry offset") Signed-off-by: Nicolas Iooss <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
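The difference between the two operators can be shown with a small sketch. struct file_info and both helper functions are hypothetical stand-ins and not the ceph code: the compound assignment both modifies the stored value and reports true whenever any bit is set, while the comparison leaves the state untouched.

```c
/* Hypothetical stand-in for the per-file readdir state. */
struct file_info { unsigned frag; };

/* Buggy: "|=" writes to fi->frag and tests the result of the assignment. */
static int need_reset_buggy(struct file_info *fi, unsigned new_frag)
{
    return (fi->frag |= new_frag) != 0;
}

/* Fixed: "!=" only compares, so fi can even be const. */
static int need_reset_fixed(const struct file_info *fi, unsigned new_frag)
{
    return fi->frag != new_frag;
}
```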
2016-09-05ovl: fix workdir creationMiklos Szeredi1-2/+2
Workdir creation fails in latest kernel. Fix by allowing EOPNOTSUPP as a valid return value from vfs_removexattr(XATTR_NAME_POSIX_ACL_*). Upper filesystem may not support ACL and still be perfectly able to support overlayfs. Reported-by: Martin Ziegler <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]> Fixes: c11b9fdd6a61 ("ovl: remove posix_acl_default from workdir") Cc: <[email protected]>
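The idea of accepting a specific error as a valid outcome can be sketched as follows. filter_removexattr_error() is a hypothetical helper, and treating -ENODATA as benign alongside -EOPNOTSUPP is an assumption about the surrounding code, not something stated in the commit message.

```c
#include <errno.h>

/*
 * Map "attribute absent" and "ACLs not supported by the upper filesystem"
 * to success; pass every other error through unchanged.
 */
static int filter_removexattr_error(int err)
{
    if (err == -ENODATA || err == -EOPNOTSUPP)
        return 0;
    return err;
}
```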
2016-09-05KVM: s390: vsie: fix riccbdDavid Hildenbrand1-1/+1
We store the address of riccbd at the wrong location, overwriting gvrd. This means that our nested guest will not be able to use runtime instrumentation. Also, a memory leak, if our KVM guest actually sets gvrd. Not noticed until now, as KVM guests never make use of gvrd and runtime instrumentation wasn't completely tested yet. Reported-by: Fan Zhang <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Cornelia Huck <[email protected]>
2016-09-05x86/efi: Use efi_exit_boot_services()Jeffrey Hugo1-69/+67
The eboot code directly calls ExitBootServices. This is inadvisable as the UEFI spec details a complex set of errors, race conditions, and API interactions that the caller of ExitBootServices must get correct. The eboot code attempts allocations after calling ExitBootServices, which is not permitted per the spec. Call the efi_exit_boot_services() helper instead, which handles the allocation scenario properly. Signed-off-by: Jeffrey Hugo <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Leif Lindholm <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: <[email protected]> Signed-off-by: Matt Fleming <[email protected]>
2016-09-05efi/libstub: Use efi_exit_boot_services() in FDTJeffrey Hugo1-10/+27
The FDT code directly calls ExitBootServices. This is inadvisable as the UEFI spec details a complex set of errors, race conditions, and API interactions that the caller of ExitBootServices must get correct. The FDT code does not handle EFI_INVALID_PARAMETER as required by the spec, which causes intermittent boot failures on the Qualcomm Technologies QDF2432. Call the efi_exit_boot_services() helper instead, which handles the EFI_INVALID_PARAMETER scenario properly. Signed-off-by: Jeffrey Hugo <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Leif Lindholm <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: <[email protected]> Signed-off-by: Matt Fleming <[email protected]>
2016-09-05efi/libstub: Introduce ExitBootServices helperJeffrey Hugo2-0/+83
The spec allows ExitBootServices to fail with EFI_INVALID_PARAMETER if a race condition has occurred where the EFI has updated the memory map after the stub grabbed a reference to the map. The spec defines a retry procedure with specific requirements to handle this scenario. This scenario was previously observed on x86 - commit d3768d885c6c ("x86, efi: retry ExitBootServices() on failure") but the current fix is not spec compliant and the scenario is now observed on the Qualcomm Technologies QDF2432 via the FDT stub which does not handle the error and thus causes boot failures. The user will notice the boot failure as the kernel is not executed and the system may drop back to a UEFI shell, but will be unresponsive to input and the system will require a power cycle to recover. Add a helper to the stub library that correctly adheres to the spec in the case of EFI_INVALID_PARAMETER from ExitBootServices and can be universally used across all stub implementations. Signed-off-by: Jeffrey Hugo <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Leif Lindholm <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: <[email protected]> Signed-off-by: Matt Fleming <[email protected]>
2016-09-05efi/libstub: Allocate headspace in efi_get_memory_map()Jeffrey Hugo5-49/+111
efi_get_memory_map() allocates a buffer to store the memory map that it retrieves. This buffer may need to be reused by the client after ExitBootServices() is called, at which point allocations are no longer permitted. To support this use case, provide the allocated buffer size back to the client, and allocate some additional headroom to account for any reasonable growth in the map that is likely to happen between the call to efi_get_memory_map() and the client reusing the buffer. Signed-off-by: Jeffrey Hugo <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Leif Lindholm <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: <[email protected]> Signed-off-by: Matt Fleming <[email protected]>
2016-09-05usb: renesas_usbhs: fix clearing the {BRDY,BEMP}STS conditionYoshihiro Shimoda1-2/+9
The previous driver could stop the transfer wrongly. For example:
 1) An interrupt happens, but not a BRDY interrupt.
 2) Read INTSTS0. At this point state->intsts0 does not have BRDY set.
 3) BRDY is set to 1 here.
 4) Read BRDYSTS.
 5) Clear the BRDYSTS. The BRDY is then cleared wrongly.
Remarks:
 - The INTSTS0.BRDY is read only.
 - If any bits of BRDYSTS are set to 1, the BRDY is set to 1.
 - If BRDYSTS is 0, the BRDY is set to 0.
So, this patch adds a condition to avoid such a situation. (And about NRDYSTS, this is not used for now. But, to avoid any side effects, this patch doesn't touch it.) Fixes: d5c6a1e024dd ("usb: renesas_usbhs: fixup interrupt status clear method") Cc: <[email protected]> # v3.8+ Signed-off-by: Yoshihiro Shimoda <[email protected]> Signed-off-by: Felipe Balbi <[email protected]>
2016-09-05usb: phy: phy-generic: Check clk_prepare_enable() errorFabio Estevam1-2/+6
clk_prepare_enable() may fail, so we should better check its return value and propagate it in the case of failure. Signed-off-by: Fabio Estevam <[email protected]> Signed-off-by: Felipe Balbi <[email protected]>
2016-09-05usb: gadget: udc: renesas-usb3: clear VBOUT bit in DRD_CONYoshihiro Shimoda1-0/+2
This driver should clear the bit. Otherwise, the VBUS will output wrongly if the usb port on a board has VBUS output capability. Fixes: 746bfe63bba3 ("usb: gadget: renesas_usb3: add support for Renesas USB3.0 peripheral controller") Cc: <[email protected]> # v4.5+ Signed-off-by: Yoshihiro Shimoda <[email protected]> Signed-off-by: Felipe Balbi <[email protected]>
2016-09-05Revert "usb: dwc3: gadget: always decrement by 1"John Youn1-1/+4
This reverts commit 6f8245b4e37c ("usb: dwc3: gadget: always decrement by 1"). We can't always decrement this value. We should decrement only if the calculation of free slots results in a LINK TRB being among one of the free slots (dequeue < enqueue). Otherwise, if the LINK TRB is not among the free slots then it should not be decremented. Signed-off-by: John Youn <[email protected]> Signed-off-by: Felipe Balbi <[email protected]>
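The condition described above can be sketched on a toy ring. This is an illustration under assumptions, not the dwc3 code: RING_SIZE is hypothetical, the last slot is assumed to hold the LINK TRB, and the case where both indices are equal (empty versus full) is deliberately not modelled.

```c
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical; must be a power of two */

/*
 * Count the slots from enqueue up to (but not including) dequeue,
 * wrapping around the ring. If dequeue < enqueue, that span wraps past
 * the end of the ring and therefore contains the LINK TRB in the last
 * slot, which is not usable, so one slot is subtracted.
 */
static uint32_t free_slots(uint32_t dequeue, uint32_t enqueue)
{
    uint32_t left = (dequeue - enqueue) & (RING_SIZE - 1);

    if (dequeue < enqueue)
        left--;

    return left;
}
```

With dequeue = 2 and enqueue = 5 the span covers slots 5, 6, 7, 0 and 1, one of which is the link slot; with dequeue = 5 and enqueue = 2 it covers slots 2, 3 and 4, so nothing is subtracted.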
2016-09-05efi: Fix handling error value in fdt_find_uefi_paramsAndrzej Hajda1-2/+5
of_get_flat_dt_subnode_by_name can return negative value in case of error. Assigning the result to unsigned variable and checking if the variable is lesser than zero is incorrect and always false. The patch fixes it by using signed variable to check the result. The problem has been detected using semantic patch scripts/coccinelle/tests/unsigned_lesser_than_zero.cocci Signed-off-by: Andrzej Hajda <[email protected]> Cc: Bartlomiej Zolnierkiewicz <[email protected]> Cc: Marek Szyprowski <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Shawn Lin <[email protected]> Cc: Mark Rutland <[email protected]> Cc: <[email protected]> Signed-off-by: Matt Fleming <[email protected]>
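A short sketch of why the check could never fire, and of the corrected pattern. find_subnode(), as_unsigned() and lookup_fixed() are hypothetical names used for illustration only; the offset value 16 is arbitrary.

```c
#include <errno.h>

/* Hypothetical lookup: a non-negative offset on success, -ENOENT on failure. */
static int find_subnode(int present)
{
    return present ? 16 : -ENOENT;
}

/*
 * Storing a negative return value in an unsigned variable turns it into
 * a large positive number, so a later "< 0" test on it is always false.
 */
static unsigned long as_unsigned(int ret)
{
    return (unsigned long)ret;
}

/* Fixed pattern: keep the result signed until the error check is done. */
static int lookup_fixed(int present, unsigned long *node)
{
    int ret = find_subnode(present);

    if (ret < 0)
        return ret;

    *node = (unsigned long)ret;
    return 0;
}
```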
2016-09-05efi: Make for_each_efi_memory_desc_in_map() cope with running on XenJan Beulich1-1/+1
While commit 55f1ea15216 ("efi: Fix for_each_efi_memory_desc_in_map() for empty memmaps") made an attempt to deal with empty memory maps, it didn't address the case where the map field never gets set, as is apparently the case when running under Xen. Reported-by: <[email protected]> Tested-by: <[email protected]> Cc: Vitaly Kuznetsov <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: Mark Rutland <[email protected]> Cc: <[email protected]> # v4.7+ Signed-off-by: Jan Beulich <[email protected]> [ Guard the loop with a NULL check instead of pointer underflow ] Signed-off-by: Matt Fleming <[email protected]>
2016-09-05sched/core: Fix a race between try_to_wake_up() and a woken up taskBalbir Singh1-0/+22
The origin of the issue I've seen is related to a missing memory barrier between check for task->state and the check for task->on_rq.

The task being woken up is already awake from a schedule() and is doing the following:

    do {
        schedule()
        set_current_state(TASK_(UN)INTERRUPTIBLE);
    } while (!cond);

The waker, actually gets stuck doing the following in try_to_wake_up():

    while (p->on_cpu)
        cpu_relax();

Analysis:

The instance I've seen involves the following race:

    CPU1                                    CPU2

    while () {
      if (cond)
        break;
      do {
        schedule();
        set_current_state(TASK_UN..)
      } while (!cond);
                                            wakeup_routine()
                                              spin_lock_irqsave(wait_lock)
      raw_spin_lock_irqsave(wait_lock)        wake_up_process()
    }                                         try_to_wake_up()
    set_current_state(TASK_RUNNING);          ..
    list_del(&waiter.list);

CPU2 wakes up CPU1, but before it can get the wait_lock and set current state to TASK_RUNNING the following occurs:

    CPU3
    wakeup_routine()
      raw_spin_lock_irqsave(wait_lock)
      if (!list_empty)
        wake_up_process()
        try_to_wake_up()
        raw_spin_lock_irqsave(p->pi_lock)
        ..
        if (p->on_rq && ttwu_wakeup())
        ..
        while (p->on_cpu)
          cpu_relax()
        ..

CPU3 tries to wake up the task on CPU1 again since it finds it on the wait_queue, CPU1 is spinning on wait_lock, but immediately after CPU2, CPU3 got it.

CPU3 checks the state of p on CPU1, it is TASK_UNINTERRUPTIBLE and the task is spinning on the wait_lock. Interestingly since p->on_rq is checked under pi_lock, I've noticed that try_to_wake_up() finds p->on_rq to be 0. This was the most confusing bit of the analysis, but p->on_rq is changed under runqueue lock, rq_lock, the p->on_rq check is not reliable without this fix IMHO. The race is visible (based on the analysis) only when ttwu_queue() does a remote wakeup via ttwu_queue_remote. In which case the p->on_rq change is not done under the pi_lock.

The result is that after a while the entire system locks up on the raw_spin_irqlock_save(wait_lock) and the holder spins infinitely.

Reproduction of the issue:

The issue can be reproduced after a long run on my system with 80 threads and having to tweak available memory to very low and running memory stress-ng mmapfork test. It usually takes a long time to reproduce. I am trying to work on a test case that can reproduce the issue faster, but that's work in progress. I am still testing the changes on my still in a loop and the tests seem OK thus far.

Big thanks to Benjamin and Nick for helping debug this as well. Ben helped catch the missing barrier, Nick caught every missing bit in my theory. Signed-off-by: Balbir Singh <[email protected]> [ Updated comment to clarify matching barriers. Many architectures do not have a full barrier in switch_to() so that cannot be relied upon. ] Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Benjamin Herrenschmidt <[email protected]> Cc: Alexey Kardashevskiy <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-09-05perf/core: Remove WARN from perf_event_read()Peter Zijlstra1-4/+12
This effectively reverts commit: 71e7bc2bab77 ("perf/core: Check return value of the perf_event_read() IPI") ... and puts in a comment explaining why we ignore the return value. Reported-by: Vegard Nossum <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: David Carrillo-Cisneros <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Fixes: 71e7bc2bab77 ("perf/core: Check return value of the perf_event_read() IPI") Signed-off-by: Ingo Molnar <[email protected]>
2016-09-05locking/barriers: Don't use sizeof(void) in lockless_dereference()Johannes Berg1-3/+4
My previous commit: 112dc0c8069e ("locking/barriers: Suppress sparse warnings in lockless_dereference()") caused sparse to complain that (in radix-tree.h) we use sizeof(void) since that rcu_dereference()s a void *. Really, all we need is to have the expression *p in here somewhere to make sure p is a pointer type, and sizeof(*p) was the thing that came to my mind first to make sure that's done without really doing anything at runtime. Another thing I had considered was using typeof(*p), but obviously we can't just declare a typeof(*p) variable either, since that may end up being void. Declaring a variable as typeof(*p)* gets around that, and still checks that typeof(*p) is valid, so do that. This type construction can't be done for _________p1 because that will actually be used and causes sparse address space warnings, so keep a separate unused variable for it. Reported-by: Fengguang Wu <[email protected]> Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E . McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Fixes: 112dc0c8069e ("locking/barriers: Suppress sparse warnings in lockless_dereference()") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-09-05x86/microcode/AMD: Fix load of builtin microcode with randomized memoryBorislav Petkov1-3/+10
We do not need to add the randomization offset when the microcode is built in. Reported-and-tested-by: Emanuel Czirai <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2016-09-05ARM: dts: imx6qdl: Fix SPDIF regressionFabio Estevam1-1/+1
Commit 833f2cbf7091 ("ARM: dts: imx6: change the core clock of spdif") changed many more clocks than only the SPDIF core clock as stated in the commit message. The MLB clock has been added and this causes SPDIF regression as reported by Xavi Drudis Ferran and also in this forum post: https://forum.digikey.com/thread/34240 The MX6Q Reference Manual does not mention that MLB is a clock related to SPDIF, so change it back to a dummy clock to restore SPDIF functionality. Thanks to Ambika for providing the fix at: https://community.nxp.com/thread/387131 Fixes: 833f2cbf7091 ("ARM: dts: imx6: change the core clock of spdif") Cc: <[email protected]> # 4.4.x Reported-by: Xavi Drudis Ferran <[email protected]> Signed-off-by: Fabio Estevam <[email protected]> Tested-by: Xavi Drudis Ferran <[email protected]> Signed-off-by: Shawn Guo <[email protected]>
2016-09-04Linux 4.8-rc5Linus Torvalds1-1/+1
2016-09-04af_unix: split 'u->readlock' into two: 'iolock' and 'bindlock'Linus Torvalds2-23/+24
Right now we use the 'readlock' both for protecting some of the af_unix IO path and for making the bind be single-threaded. The two are independent, but using the same lock makes for a nasty deadlock due to ordering with regards to filesystem locking. The bind locking would want to nest outside the VFS pathname locking, but the IO locking wants to nest inside some of those same locks. We tried to fix this earlier with commit c845acb324aa ("af_unix: Fix splice-bind deadlock") which moved the readlock inside the vfs locks, but that caused problems with overlayfs that will then call back into filesystem routines that take the lock in the wrong order anyway. Splitting the locks means that we can go back to having the bind lock be the outermost lock, and we don't have any deadlocks with lock ordering. Acked-by: Rainer Weikusat <[email protected]> Acked-by: Al Viro <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Acked-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-09-04Revert "af_unix: Fix splice-bind deadlock"Linus Torvalds1-40/+26
This reverts commit c845acb324aa85a39650a14e7696982ceea75dc1. It turns out that it just replaces one deadlock with another one: we can still get the wrong lock ordering with the readlock due to overlayfs calling back into the filesystem layer and still taking the vfs locks after the readlock. The proper solution ends up being to just split the readlock into two pieces: the bind lock (taken *outside* the vfs locks) and the IO lock (taken *inside* the filesystem locks). The two locks are independent anyway. Signed-off-by: Linus Torvalds <[email protected]> Reviewed-by: Shmulik Ladkani <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-09-04Merge branch 'vxlan-fixes'David S. Miller1-26/+12
Jiri Benc says: ==================== vxlan: fix error reporting This patchset improves checking for invalid configuration in VXLAN and fixes problems with duplicated and inappropriate error messages. ==================== Signed-off-by: David S. Miller <[email protected]>
2016-09-04vxlan: fix duplicated and wrong error messagesJiri Benc1-26/+9
vxlan_dev_configure outputs error messages before returning, so there is no need to print the same messages again in vxlan_newlink. Also, vxlan_dev_configure may return a particular error code for a different reason than vxlan_newlink thinks. Move the remaining error messages into vxlan_dev_configure and let vxlan_newlink just pass on the error code. Signed-off-by: Jiri Benc <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-09-04vxlan: reject multicast destination without an interfaceJiri Benc1-0/+3
Currently, the kernel accepts configurations such as:
  ip l a type vxlan dstport 4789 id 1 group 239.192.0.1
  ip l a type vxlan dstport 4789 id 1 group ff0e::110
However, neither of those really works. In the IPv4 case, the interface cannot be brought up ("RTNETLINK answers: No such device"). This is because the multicast join will be rejected without the interface being specified. In the IPv6 case, multicast will be joined on the first interface found. This is not what the user wants as it depends on random factors (order of interfaces).
Note that it's possible to add a local address but it doesn't solve anything. For IPv4, it's not considered in the multicast join (thus the same error as above is returned on ifup). This could be added but it wouldn't help for IPv6 anyway. For IPv6, we do need the interface.
Just reject a configuration that sets a multicast address and does not provide an interface. Nobody can depend on the previous behavior as it never worked.
Signed-off-by: Jiri Benc <[email protected]> Signed-off-by: David S. Miller <[email protected]>
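The rejection rule can be illustrated with a small userspace check. This is a sketch, not the vxlan driver code: `demo_check_remote()` is a hypothetical helper, an `ifindex` of 0 is assumed to mean "no interface given", and `-EINVAL` is an assumed error code chosen for the example.

```c
#include <errno.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/*
 * Hypothetical validation helper: returns 0 if the remote address is
 * acceptable, -EINVAL if it cannot be parsed or if it is a multicast
 * address given without an interface (ifindex == 0).
 */
int demo_check_remote(const char *addr, unsigned int ifindex)
{
    struct in_addr v4;
    struct in6_addr v6;
    int is_mcast;

    if (inet_pton(AF_INET, addr, &v4) == 1)
        is_mcast = IN_MULTICAST(ntohl(v4.s_addr));   /* 224.0.0.0/4 */
    else if (inet_pton(AF_INET6, addr, &v6) == 1)
        is_mcast = IN6_IS_ADDR_MULTICAST(&v6);       /* ff00::/8 */
    else
        return -EINVAL;                              /* not an IP address */

    /* A multicast destination is only usable with an explicit interface. */
    if (is_mcast && ifindex == 0)
        return -EINVAL;

    return 0;
}
```

With this check, both example groups from the message above (239.192.0.1 and ff0e::110) are refused when no interface is supplied, while unicast destinations remain valid without one.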
2016-09-04bonding: Fix bonding crashMahesh Bandewar3-3/+21
The following steps will crash the kernel:
(a) Create bonding master
    > modprobe bonding miimon=50
(b) Create macvlan bridge on eth2
    > ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \
          type macvlan
(c) Now try adding eth2 into the bond
    > echo +eth2 > /sys/class/net/bond0/bonding/slaves
    <crash>
Bonding does a lot of work before checking whether the device being enslaved is busy. In this case, when the notifier call-chain sends notifications, bond_netdev_event() assumes that rx_handler/rx_handler_data is registered, while bond_enslave() hasn't progressed far enough to register the rx_handler for the new slave. This patch adds an rx_handler check that can be performed right at the beginning of the enslave code to avoid getting into this situation.
Signed-off-by: Mahesh Bandewar <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
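The idea of checking for a busy rx_handler before doing any other enslave work can be sketched as follows. All names here (`demo_netdev`, `demo_enslave()`, `demo_bond_rx()`, `demo_macvlan_rx()`) are hypothetical, and returning `-EBUSY` is an assumption made for the example; this is not the bonding driver code.

```c
#include <errno.h>
#include <stddef.h>

typedef int (*demo_rx_handler_t)(void *frame);

/* Hypothetical, heavily reduced network device. */
struct demo_netdev {
    demo_rx_handler_t rx_handler; /* non-NULL once an upper device claims it */
    int enslave_steps;            /* counts setup work done during enslave */
};

/* Placeholder handlers standing in for the bond and macvlan receive hooks. */
int demo_bond_rx(void *frame)    { (void)frame; return 0; }
int demo_macvlan_rx(void *frame) { (void)frame; return 1; }

int demo_enslave(struct demo_netdev *dev)
{
    /* Check first: refuse before any setup work or notifications happen. */
    if (dev->rx_handler != NULL)
        return -EBUSY;

    dev->enslave_steps++;            /* stands in for the remaining setup */
    dev->rx_handler = demo_bond_rx;  /* registered only at the end */
    return 0;
}
```

A device that already has a macvlan-style handler is rejected with no side effects, so no later code can observe a half-enslaved device.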
2016-09-04pNFS: Don't forget the layout stateid if there are outstanding LAYOUTGETsTrond Myklebust1-1/+2
If there are outstanding LAYOUTGET rpc calls, then we want to ensure that we keep the layout stateid around so that we don't inadvertently pick up an old/misordered sequence id. The race is as follows:
  Client                        Server
  ======                        ======
  LAYOUTGET(seqid)
  LAYOUTGET(seqid)
                                return LAYOUTGET(seqid+1)
                                return LAYOUTGET(seqid+2)
  process LAYOUTGET(seqid+2)
  forget layout
  process LAYOUTGET(seqid+1)
If it forgets the layout stateid before processing seqid+1, then the client will not check the layout->plh_barrier, and so will set the stateid with seqid+1.
Signed-off-by: Trond Myklebust <[email protected]>
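The race can be replayed with a toy model. This is a simplified sketch under explicit assumptions: `demo_layout` and its helpers are hypothetical, the barrier check is collapsed into a plain "is this seqid newer" comparison, and the real pNFS client state is considerably richer. The point it illustrates is only that refusing to forget the stateid while replies are outstanding lets the late seqid+1 reply be recognised as stale.

```c
#include <stdint.h>

/* Hypothetical, heavily reduced layout state. */
struct demo_layout {
    int valid;        /* do we currently remember a stateid? */
    uint32_t seqid;   /* sequence id of the remembered stateid */
    int outstanding;  /* LAYOUTGET calls sent but not yet processed */
};

/* Wraparound-tolerant "a is newer than b" comparison. */
static int demo_seqid_newer(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

void demo_send_layoutget(struct demo_layout *lo)
{
    lo->outstanding++;
}

/* Forget the stateid only when no replies can still arrive. */
void demo_forget(struct demo_layout *lo)
{
    if (lo->outstanding > 0)
        return;
    lo->valid = 0;
}

/* Returns 1 if the reply updated the stateid, 0 if it was ignored as stale. */
int demo_process_reply(struct demo_layout *lo, uint32_t seqid)
{
    lo->outstanding--;
    if (lo->valid && !demo_seqid_newer(seqid, lo->seqid))
        return 0;
    lo->valid = 1;
    lo->seqid = seqid;
    return 1;
}
```

Replaying the sequence from the diagram (two requests sent, reply seqid+2 processed, forget attempted, reply seqid+1 processed) leaves the remembered seqid at seqid+2 in this model, because the forget is deferred until the last reply has been handled.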
2016-09-04Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds1-0/+12
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
"A single fix for an AMD erratum so machines without a BIOS fix work"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/AMD: Apply erratum 665 on machines without a BIOS fix
2016-09-04Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds2-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"Two fixlets from the timers department:
 - A fix for scheduler stalls in the tick idle code affecting NOHZ_FULL kernels
 - A trivial compile fix"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tick/nohz: Fix softlockup on scheduler stalls in kvm guest
  clocksource/drivers/atmel-pit: Fix compilation error
2016-09-04nvme-rdma: destroy nvme queue rdma resources on connect failureSteve Wise1-3/+11
After address resolution, the nvme_rdma_queue rdma resources are allocated. If rdma route resolution or the connect fails, or the controller reconnect times out and gives up, then the rdma resources need to be freed. Otherwise, rdma resources are leaked. Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Steve Wise <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]>
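The leak and its fix follow the usual "unwind on error" pattern, which can be sketched in plain C. Everything here is hypothetical (`demo_queue`, `demo_alloc_res()`, `demo_connect()`, the `route_ok`/`connect_ok` flags and the chosen error codes); it is not the nvme-rdma code, only an illustration of freeing resources on every failure path after allocation.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical queue holding resources allocated after address resolution. */
struct demo_queue {
    void *rdma_res;
};

/* Counts live allocations so a leak would be observable. */
int demo_live_resources;

int demo_alloc_res(struct demo_queue *q)
{
    q->rdma_res = malloc(64);
    if (q->rdma_res == NULL)
        return -ENOMEM;
    demo_live_resources++;
    return 0;
}

void demo_free_res(struct demo_queue *q)
{
    free(q->rdma_res);
    q->rdma_res = NULL;
    demo_live_resources--;
}

/* route_ok / connect_ok simulate the outcome of the later connect steps. */
int demo_connect(struct demo_queue *q, int route_ok, int connect_ok)
{
    int ret = demo_alloc_res(q);     /* allocated after address resolution */

    if (ret)
        return ret;
    if (!route_ok) {
        ret = -EHOSTUNREACH;         /* route resolution failed */
        goto out_destroy;
    }
    if (!connect_ok) {
        ret = -ECONNREFUSED;         /* connect failed */
        goto out_destroy;
    }
    return 0;

out_destroy:
    demo_free_res(q);                /* without this, the resources leak */
    return ret;
}
```

Funnelling both failure cases through a single `out_destroy` label keeps the cleanup in one place, so a newly added failure path is less likely to skip it.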
2016-09-04nvme_rdma: keep a ref on the ctrl during delete/flushSteve Wise1-8/+18
Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Steve Wise <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]>