aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2018-09-07drm/nouveau: Reset MST branching unit before enablingLyude Paul1-8/+12
When probing a new MST device, it's not safe to make any assumptions about it's current state. While most well mannered MST hubs will just disable the branching unit on hotplug disconnects, this isn't enough to save us from various other scenarios that might have resulted in something writing to the MST branching unit before we got control of it. This could happen if a previous probe we tried failed, if we're booting in kexec context and the hub is still in the state the last kernel put it in, etc. Luckily; there is no reason we can't just reset the branching unit every time we enable a new topology. So, fix this by resetting it on enabling new topologies to ensure that we always start off with a clean, unmodified topology state on MST sinks. This fixes occasional hard-lockups on my P50's laptop dock (e.g. AUX times out all DPCD trasactions) observed after multiple docks, undocks, and module reloads. Signed-off-by: Lyude Paul <[email protected]> Cc: [email protected] Cc: Karol Herbst <[email protected]> Signed-off-by: Ben Skeggs <[email protected]>
2018-09-07drm/nouveau: Only write DP_MSTM_CTRL when neededLyude Paul1-9/+36
Currently, nouveau will re-write the DP_MSTM_CTRL register for an MST hub every time it receives a long HPD pulse on DP. This isn't actually necessary and additionally, has some unintended side effects. With the P50 I've got here, rewriting DP_MSTM_CTRL constantly seems to make it rather likely (1 out of 5 times usually) that bringing up MST with it's ThinkPad dock will fail and result in sideband messages timing out in the middle. Afterwards, successive probes don't manage to get the dock to communicate properly over MST sideband properly. Many times sideband message timeouts from MST hubs are indicative of either the source or the sink dropping an ESI event, which can cause DRM's perspective of the topology's current state to go out of sync with reality. While it's tough to really know for sure what's happening to the dock, using userspace tools to write to DP_MSTM_CTRL in the middle of the MST link probing process does appear to make things flaky. It's possible that when we write to DP_MSTM_CTRL, the function that gets triggered to respond in the dock's firmware temporarily puts it in a state where it might end up not reporting an ESI to the source, or ends up dropping a sideband message we sent it. So, to fix this we make it so that when probing an MST topology, we respect it's current state. If the dock's already enabled, we simply read DP_MSTM_CTRL and disable the topology if it's value is not what we expected. Otherwise, we perform the normal MST probing dance. We avoid taking any action except if the state of the MST topology actually changes. This fixes MST sideband message timeouts and detection failures on my P50 with its ThinkPad dock. Signed-off-by: Lyude Paul <[email protected]> Cc: [email protected] Cc: Karol Herbst <[email protected]> Signed-off-by: Ben Skeggs <[email protected]>
2018-09-07drm/nouveau: Remove useless poll_enable() call in drm_load()Lyude Paul1-3/+1
Again, this doesn't do anything. drm_kms_helper_poll_enable() will have already been called in nouveau_display_init() Signed-off-by: Lyude Paul <[email protected]> Reviewed-by: Karol Herbst <[email protected]> Acked-by: Daniel Vetter <[email protected]> Cc: Lukas Wunner <[email protected]> Signed-off-by: Ben Skeggs <[email protected]>
2018-09-07drm/nouveau: Remove useless poll_disable() call in switcheroo_set_state()Lyude Paul1-1/+0
This won't do anything but potentially make us miss hotplugs. We already call drm_kms_helper_poll_disable() in nouveau_pmops_suspend()->nouveau_display_suspend()->nouveau_display_fini() Signed-off-by: Lyude Paul <[email protected]> Reviewed-by: Karol Herbst <[email protected]> Acked-by: Daniel Vetter <[email protected]> Cc: Lukas Wunner <[email protected]> Signed-off-by: Ben Skeggs <[email protected]>
2018-09-07drm/nouveau: Remove useless poll_enable() call in switcheroo_set_state()Lyude Paul1-1/+0
This doesn't do anything, drm_kms_helper_poll_enable() gets called in nouveau_pmops_resume()->nouveau_display_resume()->nouveau_display_init() already. Signed-off-by: Lyude Paul <[email protected]> Reviewed-by: Karol Herbst <[email protected]> Acked-by: Daniel Vetter <[email protected]> Cc: Lukas Wunner <[email protected]> Signed-off-by: Ben Skeggs <[email protected]>
2018-09-07drm/nouveau: Fix deadlocks in nouveau_connector_detect()Lyude Paul1-0/+22
When we disable hotplugging on the GPU, we need to be able to synchronize with each connector's hotplug interrupt handler before the interrupt is finally disabled. This can be a problem however, since nouveau_connector_detect() currently grabs a runtime power reference when handling connector probing. This will deadlock the runtime suspend handler like so: [ 861.480896] INFO: task kworker/0:2:61 blocked for more than 120 seconds. [ 861.483290] Tainted: G O 4.18.0-rc6Lyude-Test+ #1 [ 861.485158] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 861.486332] kworker/0:2 D 0 61 2 0x80000000 [ 861.487044] Workqueue: events nouveau_display_hpd_work [nouveau] [ 861.487737] Call Trace: [ 861.488394] __schedule+0x322/0xaf0 [ 861.489070] schedule+0x33/0x90 [ 861.489744] rpm_resume+0x19c/0x850 [ 861.490392] ? finish_wait+0x90/0x90 [ 861.491068] __pm_runtime_resume+0x4e/0x90 [ 861.491753] nouveau_display_hpd_work+0x22/0x60 [nouveau] [ 861.492416] process_one_work+0x231/0x620 [ 861.493068] worker_thread+0x44/0x3a0 [ 861.493722] kthread+0x12b/0x150 [ 861.494342] ? wq_pool_ids_show+0x140/0x140 [ 861.494991] ? kthread_create_worker_on_cpu+0x70/0x70 [ 861.495648] ret_from_fork+0x3a/0x50 [ 861.496304] INFO: task kworker/6:2:320 blocked for more than 120 seconds. [ 861.496968] Tainted: G O 4.18.0-rc6Lyude-Test+ #1 [ 861.497654] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 861.498341] kworker/6:2 D 0 320 2 0x80000080 [ 861.499045] Workqueue: pm pm_runtime_work [ 861.499739] Call Trace: [ 861.500428] __schedule+0x322/0xaf0 [ 861.501134] ? wait_for_completion+0x104/0x190 [ 861.501851] schedule+0x33/0x90 [ 861.502564] schedule_timeout+0x3a5/0x590 [ 861.503284] ? mark_held_locks+0x58/0x80 [ 861.503988] ? _raw_spin_unlock_irq+0x2c/0x40 [ 861.504710] ? wait_for_completion+0x104/0x190 [ 861.505417] ? trace_hardirqs_on_caller+0xf4/0x190 [ 861.506136] ? wait_for_completion+0x104/0x190 [ 861.506845] wait_for_completion+0x12c/0x190 [ 861.507555] ? wake_up_q+0x80/0x80 [ 861.508268] flush_work+0x1c9/0x280 [ 861.508990] ? flush_workqueue_prep_pwqs+0x1b0/0x1b0 [ 861.509735] nvif_notify_put+0xb1/0xc0 [nouveau] [ 861.510482] nouveau_display_fini+0xbd/0x170 [nouveau] [ 861.511241] nouveau_display_suspend+0x67/0x120 [nouveau] [ 861.511969] nouveau_do_suspend+0x5e/0x2d0 [nouveau] [ 861.512715] nouveau_pmops_runtime_suspend+0x47/0xb0 [nouveau] [ 861.513435] pci_pm_runtime_suspend+0x6b/0x180 [ 861.514165] ? pci_has_legacy_pm_support+0x70/0x70 [ 861.514897] __rpm_callback+0x7a/0x1d0 [ 861.515618] ? pci_has_legacy_pm_support+0x70/0x70 [ 861.516313] rpm_callback+0x24/0x80 [ 861.517027] ? pci_has_legacy_pm_support+0x70/0x70 [ 861.517741] rpm_suspend+0x142/0x6b0 [ 861.518449] pm_runtime_work+0x97/0xc0 [ 861.519144] process_one_work+0x231/0x620 [ 861.519831] worker_thread+0x44/0x3a0 [ 861.520522] kthread+0x12b/0x150 [ 861.521220] ? wq_pool_ids_show+0x140/0x140 [ 861.521925] ? kthread_create_worker_on_cpu+0x70/0x70 [ 861.522622] ret_from_fork+0x3a/0x50 [ 861.523299] INFO: task kworker/6:0:1329 blocked for more than 120 seconds. [ 861.523977] Tainted: G O 4.18.0-rc6Lyude-Test+ #1 [ 861.524644] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 861.525349] kworker/6:0 D 0 1329 2 0x80000000 [ 861.526073] Workqueue: events nvif_notify_work [nouveau] [ 861.526751] Call Trace: [ 861.527411] __schedule+0x322/0xaf0 [ 861.528089] schedule+0x33/0x90 [ 861.528758] rpm_resume+0x19c/0x850 [ 861.529399] ? finish_wait+0x90/0x90 [ 861.530073] __pm_runtime_resume+0x4e/0x90 [ 861.530798] nouveau_connector_detect+0x7e/0x510 [nouveau] [ 861.531459] ? ww_mutex_lock+0x47/0x80 [ 861.532097] ? ww_mutex_lock+0x47/0x80 [ 861.532819] ? drm_modeset_lock+0x88/0x130 [drm] [ 861.533481] drm_helper_probe_detect_ctx+0xa0/0x100 [drm_kms_helper] [ 861.534127] drm_helper_hpd_irq_event+0xa4/0x120 [drm_kms_helper] [ 861.534940] nouveau_connector_hotplug+0x98/0x120 [nouveau] [ 861.535556] nvif_notify_work+0x2d/0xb0 [nouveau] [ 861.536221] process_one_work+0x231/0x620 [ 861.536994] worker_thread+0x44/0x3a0 [ 861.537757] kthread+0x12b/0x150 [ 861.538463] ? wq_pool_ids_show+0x140/0x140 [ 861.539102] ? kthread_create_worker_on_cpu+0x70/0x70 [ 861.539815] ret_from_fork+0x3a/0x50 [ 861.540521] Showing all locks held in the system: [ 861.541696] 2 locks held by kworker/0:2/61: [ 861.542406] #0: 000000002dbf8af5 ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.543071] #1: 0000000076868126 ((work_completion)(&drm->hpd_work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.543814] 1 lock held by khungtaskd/64: [ 861.544535] #0: 0000000059db4b53 (rcu_read_lock){....}, at: debug_show_all_locks+0x23/0x185 [ 861.545160] 3 locks held by kworker/6:2/320: [ 861.545896] #0: 00000000d9e1bc59 ((wq_completion)"pm"){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.546702] #1: 00000000c9f92d84 ((work_completion)(&dev->power.work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.547443] #2: 000000004afc5de1 (drm_connector_list_iter){.+.+}, at: nouveau_display_fini+0x96/0x170 [nouveau] [ 861.548146] 1 lock held by dmesg/983: [ 861.548889] 2 locks held by zsh/1250: [ 861.549605] #0: 00000000348e3cf6 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40 [ 861.550393] #1: 000000007009a7a8 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xc1/0x870 [ 861.551122] 6 locks held by kworker/6:0/1329: [ 861.551957] #0: 000000002dbf8af5 ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.552765] #1: 00000000ddb499ad ((work_completion)(&notify->work)#2){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.553582] #2: 000000006e013cbe (&dev->mode_config.mutex){+.+.}, at: drm_helper_hpd_irq_event+0x6c/0x120 [drm_kms_helper] [ 861.554357] #3: 000000004afc5de1 (drm_connector_list_iter){.+.+}, at: drm_helper_hpd_irq_event+0x78/0x120 [drm_kms_helper] [ 861.555227] #4: 0000000044f294d9 (crtc_ww_class_acquire){+.+.}, at: drm_helper_probe_detect_ctx+0x3d/0x100 [drm_kms_helper] [ 861.556133] #5: 00000000db193642 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_lock+0x4b/0x130 [drm] [ 861.557864] ============================================= [ 861.559507] NMI backtrace for cpu 2 [ 861.560363] CPU: 2 PID: 64 Comm: khungtaskd Tainted: G O 4.18.0-rc6Lyude-Test+ #1 [ 861.561197] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET78W (1.51 ) 05/18/2018 [ 861.561948] Call Trace: [ 861.562757] dump_stack+0x8e/0xd3 [ 861.563516] nmi_cpu_backtrace.cold.3+0x14/0x5a [ 861.564269] ? lapic_can_unplug_cpu.cold.27+0x42/0x42 [ 861.565029] nmi_trigger_cpumask_backtrace+0xa1/0xae [ 861.565789] arch_trigger_cpumask_backtrace+0x19/0x20 [ 861.566558] watchdog+0x316/0x580 [ 861.567355] kthread+0x12b/0x150 [ 861.568114] ? reset_hung_task_detector+0x20/0x20 [ 861.568863] ? kthread_create_worker_on_cpu+0x70/0x70 [ 861.569598] ret_from_fork+0x3a/0x50 [ 861.570370] Sending NMI from CPU 2 to CPUs 0-1,3-7: [ 861.571426] NMI backtrace for cpu 6 skipped: idling at intel_idle+0x7f/0x120 [ 861.571429] NMI backtrace for cpu 7 skipped: idling at intel_idle+0x7f/0x120 [ 861.571432] NMI backtrace for cpu 3 skipped: idling at intel_idle+0x7f/0x120 [ 861.571464] NMI backtrace for cpu 5 skipped: idling at intel_idle+0x7f/0x120 [ 861.571467] NMI backtrace for cpu 0 skipped: idling at intel_idle+0x7f/0x120 [ 861.571469] NMI backtrace for cpu 4 skipped: idling at intel_idle+0x7f/0x120 [ 861.571472] NMI backtrace for cpu 1 skipped: idling at intel_idle+0x7f/0x120 [ 861.572428] Kernel panic - not syncing: hung_task: blocked tasks So: fix this by making it so that normal hotplug handling /only/ happens so long as the GPU is currently awake without any pending runtime PM requests. In the event that a hotplug occurs while the device is suspending or resuming, we can simply defer our response until the GPU is fully runtime resumed again. Changes since v4: - Use a new trick I came up with using pm_runtime_get() instead of the hackish junk we had before Signed-off-by: Lyude Paul <[email protected]> Reviewed-by: Karol Herbst <[email protected]> Acked-by: Daniel Vetter <[email protected]> Cc: [email protected] Cc: Lukas Wunner <[email protected]> Signed-off-by: Ben Skeggs <[email protected]>
2018-09-07drm/nouveau/drm/nouveau: Use pm_runtime_get_noresume() in connector_detect()Lyude Paul1-9/+11
It's true we can't resume the device from poll workers in nouveau_connector_detect(). We can however, prevent the autosuspend timer from elapsing immediately if it hasn't already without risking any sort of deadlock with the runtime suspend/resume operations. So do that instead of entirely avoiding grabbing a power reference. Signed-off-by: Lyude Paul <[email protected]> Reviewed-by: Karol Herbst <[email protected]> Acked-by: Daniel Vetter <[email protected]> Cc: [email protected] Cc: Lukas Wunner <[email protected]> Signed-off-by: Ben Skeggs <[email protected]>
2018-09-07drm/nouveau/drm/nouveau: Fix deadlock with fb_helper with async RPM requestsLyude Paul4-2/+64
Currently, nouveau uses the generic drm_fb_helper_output_poll_changed() function provided by DRM as it's output_poll_changed callback. Unfortunately however, this function doesn't grab runtime PM references early enough and even if it did-we can't block waiting for the device to resume in output_poll_changed() since it's very likely that we'll need to grab the fb_helper lock at some point during the runtime resume process. This currently results in deadlocking like so: [ 246.669625] INFO: task kworker/4:0:37 blocked for more than 120 seconds. [ 246.673398] Not tainted 4.18.0-rc5Lyude-Test+ #2 [ 246.675271] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 246.676527] kworker/4:0 D 0 37 2 0x80000000 [ 246.677580] Workqueue: events output_poll_execute [drm_kms_helper] [ 246.678704] Call Trace: [ 246.679753] __schedule+0x322/0xaf0 [ 246.680916] schedule+0x33/0x90 [ 246.681924] schedule_preempt_disabled+0x15/0x20 [ 246.683023] __mutex_lock+0x569/0x9a0 [ 246.684035] ? kobject_uevent_env+0x117/0x7b0 [ 246.685132] ? drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper] [ 246.686179] mutex_lock_nested+0x1b/0x20 [ 246.687278] ? mutex_lock_nested+0x1b/0x20 [ 246.688307] drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper] [ 246.689420] drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper] [ 246.690462] drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper] [ 246.691570] output_poll_execute+0x198/0x1c0 [drm_kms_helper] [ 246.692611] process_one_work+0x231/0x620 [ 246.693725] worker_thread+0x214/0x3a0 [ 246.694756] kthread+0x12b/0x150 [ 246.695856] ? wq_pool_ids_show+0x140/0x140 [ 246.696888] ? kthread_create_worker_on_cpu+0x70/0x70 [ 246.697998] ret_from_fork+0x3a/0x50 [ 246.699034] INFO: task kworker/0:1:60 blocked for more than 120 seconds. [ 246.700153] Not tainted 4.18.0-rc5Lyude-Test+ #2 [ 246.701182] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 246.702278] kworker/0:1 D 0 60 2 0x80000000 [ 246.703293] Workqueue: pm pm_runtime_work [ 246.704393] Call Trace: [ 246.705403] __schedule+0x322/0xaf0 [ 246.706439] ? wait_for_completion+0x104/0x190 [ 246.707393] schedule+0x33/0x90 [ 246.708375] schedule_timeout+0x3a5/0x590 [ 246.709289] ? mark_held_locks+0x58/0x80 [ 246.710208] ? _raw_spin_unlock_irq+0x2c/0x40 [ 246.711222] ? wait_for_completion+0x104/0x190 [ 246.712134] ? trace_hardirqs_on_caller+0xf4/0x190 [ 246.713094] ? wait_for_completion+0x104/0x190 [ 246.713964] wait_for_completion+0x12c/0x190 [ 246.714895] ? wake_up_q+0x80/0x80 [ 246.715727] ? get_work_pool+0x90/0x90 [ 246.716649] flush_work+0x1c9/0x280 [ 246.717483] ? flush_workqueue_prep_pwqs+0x1b0/0x1b0 [ 246.718442] __cancel_work_timer+0x146/0x1d0 [ 246.719247] cancel_delayed_work_sync+0x13/0x20 [ 246.720043] drm_kms_helper_poll_disable+0x1f/0x30 [drm_kms_helper] [ 246.721123] nouveau_pmops_runtime_suspend+0x3d/0xb0 [nouveau] [ 246.721897] pci_pm_runtime_suspend+0x6b/0x190 [ 246.722825] ? pci_has_legacy_pm_support+0x70/0x70 [ 246.723737] __rpm_callback+0x7a/0x1d0 [ 246.724721] ? pci_has_legacy_pm_support+0x70/0x70 [ 246.725607] rpm_callback+0x24/0x80 [ 246.726553] ? pci_has_legacy_pm_support+0x70/0x70 [ 246.727376] rpm_suspend+0x142/0x6b0 [ 246.728185] pm_runtime_work+0x97/0xc0 [ 246.728938] process_one_work+0x231/0x620 [ 246.729796] worker_thread+0x44/0x3a0 [ 246.730614] kthread+0x12b/0x150 [ 246.731395] ? wq_pool_ids_show+0x140/0x140 [ 246.732202] ? kthread_create_worker_on_cpu+0x70/0x70 [ 246.732878] ret_from_fork+0x3a/0x50 [ 246.733768] INFO: task kworker/4:2:422 blocked for more than 120 seconds. [ 246.734587] Not tainted 4.18.0-rc5Lyude-Test+ #2 [ 246.735393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 246.736113] kworker/4:2 D 0 422 2 0x80000080 [ 246.736789] Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper] [ 246.737665] Call Trace: [ 246.738490] __schedule+0x322/0xaf0 [ 246.739250] schedule+0x33/0x90 [ 246.739908] rpm_resume+0x19c/0x850 [ 246.740750] ? finish_wait+0x90/0x90 [ 246.741541] __pm_runtime_resume+0x4e/0x90 [ 246.742370] nv50_disp_atomic_commit+0x31/0x210 [nouveau] [ 246.743124] drm_atomic_commit+0x4a/0x50 [drm] [ 246.743775] restore_fbdev_mode_atomic+0x1c8/0x240 [drm_kms_helper] [ 246.744603] restore_fbdev_mode+0x31/0x140 [drm_kms_helper] [ 246.745373] drm_fb_helper_restore_fbdev_mode_unlocked+0x54/0xb0 [drm_kms_helper] [ 246.746220] drm_fb_helper_set_par+0x2d/0x50 [drm_kms_helper] [ 246.746884] drm_fb_helper_hotplug_event.part.28+0x96/0xb0 [drm_kms_helper] [ 246.747675] drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper] [ 246.748544] drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper] [ 246.749439] nv50_mstm_hotplug+0x15/0x20 [nouveau] [ 246.750111] drm_dp_send_link_address+0x177/0x1c0 [drm_kms_helper] [ 246.750764] drm_dp_check_and_send_link_address+0xa8/0xd0 [drm_kms_helper] [ 246.751602] drm_dp_mst_link_probe_work+0x51/0x90 [drm_kms_helper] [ 246.752314] process_one_work+0x231/0x620 [ 246.752979] worker_thread+0x44/0x3a0 [ 246.753838] kthread+0x12b/0x150 [ 246.754619] ? wq_pool_ids_show+0x140/0x140 [ 246.755386] ? kthread_create_worker_on_cpu+0x70/0x70 [ 246.756162] ret_from_fork+0x3a/0x50 [ 246.756847] Showing all locks held in the system: [ 246.758261] 3 locks held by kworker/4:0/37: [ 246.759016] #0: 00000000f8df4d2d ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.759856] #1: 00000000e6065461 ((work_completion)(&(&dev->mode_config.output_poll_work)->work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.760670] #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper] [ 246.761516] 2 locks held by kworker/0:1/60: [ 246.762274] #0: 00000000fff6be0f ((wq_completion)"pm"){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.762982] #1: 000000005ab44fb4 ((work_completion)(&dev->power.work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.763890] 1 lock held by khungtaskd/64: [ 246.764664] #0: 000000008cb8b5c3 (rcu_read_lock){....}, at: debug_show_all_locks+0x23/0x185 [ 246.765588] 5 locks held by kworker/4:2/422: [ 246.766440] #0: 00000000232f0959 ((wq_completion)"events_long"){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.767390] #1: 00000000bb59b134 ((work_completion)(&mgr->work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.768154] #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_restore_fbdev_mode_unlocked+0x4c/0xb0 [drm_kms_helper] [ 246.768966] #3: 000000004c8f0b6b (crtc_ww_class_acquire){+.+.}, at: restore_fbdev_mode_atomic+0x4b/0x240 [drm_kms_helper] [ 246.769921] #4: 000000004c34a296 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_backoff+0x8a/0x1b0 [drm] [ 246.770839] 1 lock held by dmesg/1038: [ 246.771739] 2 locks held by zsh/1172: [ 246.772650] #0: 00000000836d0438 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40 [ 246.773680] #1: 000000001f4f4d48 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xc1/0x870 [ 246.775522] ============================================= After trying dozens of different solutions, I found one very simple one that should also have the benefit of preventing us from having to fight locking for the rest of our lives. So, we work around these deadlocks by deferring all fbcon hotplug events that happen after the runtime suspend process starts until after the device is resumed again. Changes since v7: - Fixup commit message - Daniel Vetter Changes since v6: - Remove unused nouveau_fbcon_hotplugged_in_suspend() - Ilia Changes since v5: - Come up with the (hopefully final) solution for solving this dumb problem, one that is a lot less likely to cause issues with locking in the future. This should work around all deadlock conditions with fbcon brought up thus far. Changes since v4: - Add nouveau_fbcon_hotplugged_in_suspend() to workaround deadlock condition that Lukas described - Just move all of this out of drm_fb_helper. It seems that other DRM drivers have already figured out other workarounds for this. If other drivers do end up needing this in the future, we can just move this back into drm_fb_helper again. Changes since v3: - Actually check if fb_helper is NULL in both new helpers - Actually check drm_fbdev_emulation in both new helpers - Don't fire off a fb_helper hotplug unconditionally; only do it if the following conditions are true (as otherwise, calling this in the wrong spot will cause Bad Things to happen): - fb_helper hotplug handling was actually inhibited previously - fb_helper actually has a delayed hotplug pending - fb_helper is actually bound - fb_helper is actually initialized - Add __must_check to drm_fb_helper_suspend_hotplug(). There's no situation where a driver would actually want to use this without checking the return value, so enforce that - Rewrite and clarify the documentation for both helpers. - Make sure to return true in the drm_fb_helper_suspend_hotplug() stub that's provided in drm_fb_helper.h when CONFIG_DRM_FBDEV_EMULATION isn't enabled - Actually grab the toplevel fb_helper lock in drm_fb_helper_resume_hotplug(), since it's possible other activity (such as a hotplug) could be going on at the same time the driver calls drm_fb_helper_resume_hotplug(). We need this to check whether or not drm_fb_helper_hotplug_event() needs to be called anyway Signed-off-by: Lyude Paul <[email protected]> Reviewed-by: Karol Herbst <[email protected]> Acked-by: Daniel Vetter <[email protected]> Cc: [email protected] Cc: Lukas Wunner <[email protected]> Signed-off-by: Ben Skeggs <[email protected]>
2018-09-07drm/nouveau: Remove duplicate poll_enable() in pmops_runtime_suspend()Lyude Paul1-1/+0
Since actual hotplug notifications don't get disabled until nouveau_display_fini() is called, all this will do is cause any hotplugs that happen between this drm_kms_helper_poll_disable() call and the actual hotplug disablement to potentially be dropped if ACPI isn't around to help us. Signed-off-by: Lyude Paul <[email protected]> Acked-by: Karol Herbst <[email protected]> Acked-by: Daniel Vetter <[email protected]> Cc: [email protected] Cc: Lukas Wunner <[email protected]> Signed-off-by: Ben Skeggs <[email protected]>
2018-09-07drm/nouveau/drm/nouveau: Fix bogus drm_kms_helper_poll_enable() placementLyude Paul1-2/+5
Turns out this part is my fault for not noticing when reviewing 9a2eba337cace ("drm/nouveau: Fix drm poll_helper handling"). Currently we call drm_kms_helper_poll_enable() from nouveau_display_hpd_work(). This makes basically no sense however, because that means we're calling drm_kms_helper_poll_enable() every time we schedule the hotplug detection work. This is also against the advice mentioned in drm_kms_helper_poll_enable()'s documentation: Note that calls to enable and disable polling must be strictly ordered, which is automatically the case when they're only call from suspend/resume callbacks. Of course, hotplugs can't really be ordered. They could even happen immediately after we called drm_kms_helper_poll_disable() in nouveau_display_fini(), which can lead to all sorts of issues. Additionally; enabling polling /after/ we call drm_helper_hpd_irq_event() could also mean that we'd miss a hotplug event anyway, since drm_helper_hpd_irq_event() wouldn't bother trying to probe connectors so long as polling is disabled. So; simply move this back into nouveau_display_init() again. The race condition that both of these patches attempted to work around has already been fixed properly in d61a5c106351 ("drm/nouveau: Fix deadlock on runtime suspend") Fixes: 9a2eba337cace ("drm/nouveau: Fix drm poll_helper handling") Signed-off-by: Lyude Paul <[email protected]> Acked-by: Karol Herbst <[email protected]> Acked-by: Daniel Vetter <[email protected]> Cc: Lukas Wunner <[email protected]> Cc: Peter Ujfalusi <[email protected]> Cc: [email protected] Signed-off-by: Ben Skeggs <[email protected]>
2018-09-06RDMA/mlx4: Ensure that maximal send/receive SGE less than supported by HWLeon Romanovsky1-3/+5
In calculating the global maximum number of the Scatter/Gather elements supported, the following four maximum parameters must be taken into consideration: max_sg_rq, max_sg_sq, max_desc_sz_rq and max_desc_sz_sq. However instead of bringing this complexity to query_device, which still won't be sufficient anyway (the calculations are dependent on QP type), the safer approach will be to restore old code, which will give us 32 SGEs. Fixes: 33023fb85a42 ("IB/core: add max_send_sge and max_recv_sge attributes") Reported-by: Chuck Lever <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-09-06RDMA/cma: Protect cma dev list with lockParav Pandit1-5/+7
When AF_IB addresses are used during rdma_resolve_addr() a lock is not held. A cma device can get removed while list traversal is in progress which may lead to crash. ie CPU0 CPU1 ==== ==== rdma_resolve_addr() cma_resolve_ib_dev() list_for_each() cma_remove_one() cur_dev->device mutex_lock(&lock) list_del(); mutex_unlock(&lock); cma_process_remove(); Therefore, hold a lock while traversing the list which avoids such situation. Cc: <[email protected]> # 3.10 Fixes: f17df3b0dede ("RDMA/cma: Add support for AF_IB to rdma_resolve_addr()") Signed-off-by: Parav Pandit <[email protected]> Reviewed-by: Daniel Jurgens <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Dennis Dalessandro <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-09-06i2c: xiic: Make the start and the byte count write atomicShubhrajyoti Datta1-0/+4
Disable interrupts while configuring the transfer and enable them back. We have below as the programming sequence 1. start and slave address 2. byte count and stop In some customer platform there was a lot of interrupts between 1 and 2 and after slave address (around 7 clock cyles) if 2 is not executed then the transaction is nacked. To fix this case make the 2 writes atomic. Signed-off-by: Shubhrajyoti Datta <[email protected]> Signed-off-by: Michal Simek <[email protected]> [wsa: added a newline for better readability] Signed-off-by: Wolfram Sang <[email protected]> Cc: [email protected]
2018-09-06irqchip/gic-v3-its: Cap lpi_id_bits to reduce memory footprintJia He1-1/+3
Commit fe8e93504ce8 ("irqchip/gic-v3-its: Use full range of LPIs"), removes the cap for lpi_id_bits, which causes the following warning to trigger on a QDF2400 server: WARNING: CPU: 0 PID: 0 at mm/page_alloc.c:4066 __alloc_pages_nodemask ... Call trace: __alloc_pages_nodemask+0x2d8/0x1188 alloc_pages_current+0x8c/0xd8 its_allocate_prop_table+0x5c/0xb8 its_init+0x220/0x3c0 gic_init_bases+0x250/0x380 gic_acpi_init+0x16c/0x2a4 In its_alloc_lpi_tables(), lpi_id_bits is 24 in QDF2400. The allocation in allocate_prop_table() tries therefore to allocate 16M (order 12 if pagesize=4k), which triggers the warning. As said by MarcL Capping lpi_id_bits at 16 (which is what we had before) is plenty, will save a some memory, and gives some margin before we need to push it up again. Bring the upper limit of lpi_id_bits back to prevent Fixes: fe8e93504ce8 ("irqchip/gic-v3-its: Use full range of LPIs") Suggested-by: Marc Zyngier <[email protected]> Signed-off-by: Jia He <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Marc Zyngier <[email protected]> Tested-by: Olof Johansson <[email protected]> Cc: Jason Cooper <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-09-06xtensa: ISS: don't allocate memory in platform_setupMax Filippov1-10/+15
Memory allocator is not initialized at that point yet, use static array instead. Cc: [email protected] Signed-off-by: Max Filippov <[email protected]>
2018-09-06cgroup: kselftests: add test_core to .gitignoreLei Yang1-0/+1
Update .gitignore file. Signed-off-by: Lei Yang <[email protected]> Signed-off-by: Shuah Khan (Samsung OSG) <[email protected]>
2018-09-06dm raid: fix reshape race on small devicesHeinz Mauelshagen1-47/+1
Loading a new mapping table, the dm-raid target's constructor retrieves the volatile reshaping state from the raid superblocks. When the new table is activated in a following resume, the actual reshape position is retrieved. The reshape driven by the previous mapping can already have finished on small and/or fast devices thus updating raid superblocks about the new raid layout. This causes the actual array state (e.g. stripe size reshape finished) to be inconsistent with the one in the new mapping, causing hangs with left behind devices. This race does not occur with usual raid device sizes but with small ones (e.g. those created by the lvm2 test suite). Fix by no longer transferring stale/inconsistent raid_set state during preresume. Signed-off-by: Heinz Mauelshagen <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2018-09-06block: bfq: swap puts in bfqg_and_blkg_putKonstantin Khlebnikov1-2/+2
Fix trivial use-after-free. This could be last reference to bfqg. Fixes: 8f9bebc33dd7 ("block, bfq: access and cache blkg data only when safe") Acked-by: Paolo Valente <[email protected]> Signed-off-by: Konstantin Khlebnikov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-09-06dm: disable CRYPTO_TFM_REQ_MAY_SLEEP to fix a GFP_KERNEL recursion deadlockMikulas Patocka2-7/+7
There's a XFS on dm-crypt deadlock, recursing back to itself due to the crypto subsystems use of GFP_KERNEL, reported here: https://bugzilla.kernel.org/show_bug.cgi?id=200835 * dm-crypt calls crypt_convert in xts mode * init_crypt from xts.c calls kmalloc(GFP_KERNEL) * kmalloc(GFP_KERNEL) recurses into the XFS filesystem, the filesystem tries to submit some bios and wait for them, causing a deadlock Fix this by updating both the DM crypt and integrity targets to no longer use the CRYPTO_TFM_REQ_MAY_SLEEP flag, which will change the crypto allocations from GFP_KERNEL to GFP_ATOMIC, therefore they can't recurse into a filesystem. A GFP_ATOMIC allocation can fail, but init_crypt() in xts.c handles the allocation failure gracefully - it will fall back to preallocated buffer if the allocation fails. The crypto API maintainer says that the crypto API only needs to allocate memory when dealing with unaligned buffers and therefore turning CRYPTO_TFM_REQ_MAY_SLEEP off is safe (see this discussion: https://www.redhat.com/archives/dm-devel/2018-August/msg00195.html ) Cc: [email protected] Signed-off-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2018-09-06memory: ti-aemif: fix a potential NULL-pointer dereferenceBartosz Golaszewski1-1/+1
Platform data pointer may be NULL. We check it everywhere but in one place. Fix it. Fixes: 8af70cd2ca50 ("memory: aemif: add support for board files") Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Bartosz Golaszewski <[email protected]> Cc: [email protected] Signed-off-by: Olof Johansson <[email protected]>
2018-09-06arm64: fix erroneous warnings in page freeing functionsMark Rutland1-4/+6
In pmd_free_pte_page() and pud_free_pmd_page() we try to warn if they hit a present non-table entry. In both cases we'll warn for non-present entries, as the VM_WARN_ON() only checks the entry is not a table entry. This has been observed to result in warnings when booting a v4.19-rc2 kernel under qemu. Fix this by bailing out earlier for non-present entries. Fixes: ec28bb9c9b0826d7 ("arm64: Implement page table free interfaces") Signed-off-by: Mark Rutland <[email protected]> Cc: Will Deacon <[email protected]> Cc: Catalin Marinas <[email protected]> Signed-off-by: Will Deacon <[email protected]>
2018-09-06Merge tag 'arm-soc/for-4.19/drivers-fixes' of ↵Olof Johansson1-0/+1
https://github.com/Broadcom/stblinux into fixes This pull request contains Broadcom ARM/ARM64 SoCs drivers fixes for 4.19, please pull the following: - Peter adds an alias to the Raspberry Pi HWMON driver that was just merged as part of the 4.19 merge window * tag 'arm-soc/for-4.19/drivers-fixes' of https://github.com/Broadcom/stblinux: hwmon: rpi: add module alias to raspberrypi-hwmon Signed-off-by: Olof Johansson <[email protected]>
2018-09-06firmware: arm_scmi: fix divide by zero when sustained_perf_level is zeroSudeep Holla1-1/+7
Firmware can provide zero as values for sustained performance level and corresponding sustained frequency in kHz in order to hide the actual frequencies and provide only abstract values. It may endup with divide by zero scenario resulting in kernel panic. Let's set the multiplication factor to one if either one or both of them (sustained_perf_level and sustained_freq) are set to zero. Fixes: a9e3fbfaa0ff ("firmware: arm_scmi: add initial support for performance protocol") Reported-by: Ionela Voinescu <[email protected]> Signed-off-by: Sudeep Holla <[email protected]> Signed-off-by: Olof Johansson <[email protected]>
2018-09-06Merge tag 'apparmor-pr-2018-09-06' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor Pull apparmor fix from John Johansen: "A fix for an issue syzbot discovered last week: - Fix for bad debug check when converting secids to secctx" * tag 'apparmor-pr-2018-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor: apparmor: fix bad debug check in apparmor_secid_to_secctx()
2018-09-06Merge tag 'trace-v4.19-rc2' of ↵Linus Torvalds2-4/+8
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "This fixes two annoying bugs: - The first one is a side effect caused by using SRCU for rcuidle tracepoints. It seems that the perf was depending on the rcuidle tracepoints to make RCU watch when it wasn't. The real fix will be to have perf use SRCU instead of depending on RCU watching, but that can't be done until SRCU is safe to use in NMI context (Paul's working on that). - The second bug fix is for a bug that's been periodically making my tests fail randomly for some time. I haven't had time to track it down, but finally have. It has to do with stressing NMIs (via perf) while enabling or disabling ftrace function handling with lockdep enabled. If an interrupt happens and just as it returns, it sets lockdep back to "interrupts enabled" but before it returns an NMI is triggered, and if this happens while printk_nmi_enter has a breakpoint attached to it (because ftrace is converting it to or from nop to call fentry), the breakpoint trap also calls into lockdep, and since returning from the NMI to a interrupt handler, interrupts were disabled when the NMI went off, lockdep keeps its state as interrupts disabled when it returns back from the interrupt handler where interrupts are enabled. This causes lockdep_assert_irqs_enabled() to trigger a false positive" * tag 'trace-v4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: printk/tracing: Do not trace printk_nmi_enter() tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints
2018-09-06Merge tag 'for-4.19-rc2-tag' of ↵Linus Torvalds9-55/+197
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix for improper fsync after hardlink - fix for a corruption during file deduplication - use after free fixes - RCU warning fix - fix for buffered write to nodatacow file * tag 'for-4.19-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: Fix suspicious RCU usage warning in btrfs_debug_in_rcu btrfs: use after free in btrfs_quota_enable btrfs: btrfs_shrink_device should call commit transaction at the end btrfs: fix qgroup_free wrong num_bytes in btrfs_subvolume_reserve_metadata Btrfs: fix data corruption when deduplicating between different files Btrfs: sync log after logging new name Btrfs: fix unexpected failure of nocow buffered writes after snapshotting when low on space
2018-09-06printk/tracing: Do not trace printk_nmi_enter()Steven Rostedt (VMware)1-2/+2
I hit the following splat in my tests: ------------[ cut here ]------------ IRQs not enabled as expected WARNING: CPU: 3 PID: 0 at kernel/time/tick-sched.c:982 tick_nohz_idle_enter+0x44/0x8c Modules linked in: ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables ipv6 CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.0-rc2-test+ #2 Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014 EIP: tick_nohz_idle_enter+0x44/0x8c Code: ec 05 00 00 00 75 26 83 b8 c0 05 00 00 00 75 1d 80 3d d0 36 3e c1 00 75 14 68 94 63 12 c1 c6 05 d0 36 3e c1 01 e8 04 ee f8 ff <0f> 0b 58 fa bb a0 e5 66 c1 e8 25 0f 04 00 64 03 1d 28 31 52 c1 8b EAX: 0000001c EBX: f26e7f8c ECX: 00000006 EDX: 00000007 ESI: f26dd1c0 EDI: 00000000 EBP: f26e7f40 ESP: f26e7f38 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010296 CR0: 80050033 CR2: 0813c6b0 CR3: 2f342000 CR4: 001406f0 Call Trace: do_idle+0x33/0x202 cpu_startup_entry+0x61/0x63 start_secondary+0x18e/0x1ed startup_32_smp+0x164/0x168 irq event stamp: 18773830 hardirqs last enabled at (18773829): [<c040150c>] trace_hardirqs_on_thunk+0xc/0x10 hardirqs last disabled at (18773830): [<c040151c>] trace_hardirqs_off_thunk+0xc/0x10 softirqs last enabled at (18773824): [<c0ddaa6f>] __do_softirq+0x25f/0x2bf softirqs last disabled at (18773767): [<c0416bbe>] call_on_stack+0x45/0x4b ---[ end trace b7c64aa79e17954a ]--- After a bit of debugging, I found what was happening. This would trigger when performing "perf" with a high NMI interrupt rate, while enabling and disabling function tracer. Ftrace uses breakpoints to convert the nops at the start of functions to calls to the function trampolines. The breakpoint traps disable interrupts and this makes calls into lockdep via the trace_hardirqs_off_thunk in the entry.S code. What happens is the following: do_idle { [interrupts enabled] <interrupt> [interrupts disabled] TRACE_IRQS_OFF [lockdep says irqs off] [...] TRACE_IRQS_IRET test if pt_regs say return to interrupts enabled [yes] TRACE_IRQS_ON [lockdep says irqs are on] <nmi> nmi_enter() { printk_nmi_enter() [traced by ftrace] [ hit ftrace breakpoint ] <breakpoint exception> TRACE_IRQS_OFF [lockdep says irqs off] [...] TRACE_IRQS_IRET [return from breakpoint] test if pt_regs say interrupts enabled [no] [iret back to interrupt] [iret back to code] tick_nohz_idle_enter() { lockdep_assert_irqs_enabled() [lockdep say no!] Although interrupts are indeed enabled, lockdep thinks it is not, and since we now do asserts via lockdep, it gives a false warning. The issue here is that printk_nmi_enter() is called before lockdep_off(), which disables lockdep (for this reason) in NMIs. By simply not allowing ftrace to see printk_nmi_enter() (via notrace annotation) we keep lockdep from getting confused. Cc: [email protected] Fixes: 42a0bb3f71383 ("printk/nmi: generic solution for safe printk in NMI") Acked-by: Sergey Senozhatsky <[email protected]> Acked-by: Petr Mladek <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2018-09-06Merge tag 'mlx5e-fixes-2018-09-05' of ↵David S. Miller9-60/+79
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2018-09-05 This pull request contains some fixes for mlx5 etherent netdevice and core driver. Please pull and let me know if there's any problem. For -stable v4.9: ('net/mlx5: Fix debugfs cleanup in the device init/remove flow') For -stable v4.12: ("net/mlx5: E-Switch, Fix memory leak when creating switchdev mode FDB tables") For -stable v4.13: ("net/mlx5: Fix use-after-free in self-healing flow") For -stable v4.14: ("net/mlx5: Check for error in mlx5_attach_interface") For -stable v4.15: ("net/mlx5: Fix not releasing read lock when adding flow rules") For -stable v4.17: ("net/mlx5: Fix possible deadlock from lockdep when adding fte to fg") For -stable v4.18: ("net/mlx5: Use u16 for Work Queue buffer fragment size") ==================== Signed-off-by: David S. Miller <[email protected]>
2018-09-06HID: i2c-hid: Don't reset device upon system resumeKai-Heng Feng2-10/+7
Raydium touchscreen triggers interrupt storm after system-wide suspend: [ 179.085033] i2c_hid i2c-CUST0000:00: i2c_hid_get_input: incomplete report (58/65535) According to Raydium, Windows driver does not reset the device after system resume. The HID over I2C spec does specify a reset should be used at intialization, but it doesn't specify if reset is required for system suspend. Tested this patch on other i2c-hid touchpanels I have and those touchpanels do work after S3 without doing reset. If any regression happens to other touchpanel vendors, we can use quirk for Raydium devices. There's still one device uses I2C_HID_QUIRK_RESEND_REPORT_DESCR so keep it there. Cc: Aaron Ma <[email protected]> Cc: AceLan Kao <[email protected]> Signed-off-by: Kai-Heng Feng <[email protected]> Reviewed-by: Benjamin Tissoires <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2018-09-06rbd: support cloning across namespacesIlya Dryomov1-14/+97
If parent_get class method is not supported by the OSDs, fall back to the legacy class method and assume that the parent is in the default (i.e. "") namespace. The "use the child's image namespace" workaround is no longer needed because creating images within namespaces will require parent_get aware OSDs. Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Jason Dillaman <[email protected]>
2018-09-06rbd: factor out get_parent_info()Ilya Dryomov1-48/+86
In preparation for the new parent_get and parent_overlap_get class methods, factor out the fetching and decoding of parent data. As a side effect, we now decode all four fields in the "no parent" case. Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Jason Dillaman <[email protected]>
2018-09-06ceph: avoid a use-after-free in ceph_destroy_options()Ilya Dryomov1-5/+11
syzbot reported a use-after-free in ceph_destroy_options(), called from ceph_mount(). The problem was that create_fs_client() consumed the opt pointer on some errors, but not on all of them. Make sure it always consumes both libceph and ceph options. Reported-by: [email protected] Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: "Yan, Zheng" <[email protected]>
2018-09-06cpu/hotplug: Prevent state corruption on error rollbackThomas Gleixner1-2/+3
When a teardown callback fails, the CPU hotplug code brings the CPU back to the previous state. The previous state becomes the new target state. The rollback happens in undo_cpu_down() which increments the state unconditionally even if the state is already the same as the target. As a consequence the next CPU hotplug operation will start at the wrong state. This is easily to observe when __cpu_disable() fails. Prevent the unconditional undo by checking the state vs. target before incrementing state and fix up the consequently wrong conditional in the unplug code which handles the failure of the final CPU take down on the control CPU side. Fixes: 4dddfb5faa61 ("smp/hotplug: Rewrite AP state machine core") Reported-by: Neeraj Upadhyay <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Geert Uytterhoeven <[email protected]> Tested-by: Sudeep Holla <[email protected]> Tested-by: Neeraj Upadhyay <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] ----
2018-09-06cpu/hotplug: Adjust misplaced smb() in cpuhp_thread_fun()Neeraj Upadhyay1-3/+3
The smp_mb() in cpuhp_thread_fun() is misplaced. It needs to be after the load of st->should_run to prevent reordering of the later load/stores w.r.t. the load of st->should_run. Fixes: 4dddfb5faa61 ("smp/hotplug: Rewrite AP state machine core") Signed-off-by: Neeraj Upadhyay <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-09-06batman-adv: Prevent duplicated tvlv handlerSven Eckelmann1-2/+6
The function batadv_tvlv_handler_register is responsible for adding new tvlv_handler to the handler_list. It first checks whether the entry already is in the list or not. If it is, then the creation of a new entry is aborted. But the lock for the list is only held when the list is really modified. This could lead to duplicated entries because another context could create an entry with the same key between the check and the list manipulation. The check and the manipulation of the list must therefore be in the same locked code section. Fixes: ef26157747d4 ("batman-adv: tvlv - basic infrastructure") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Simon Wunderlich <[email protected]>
2018-09-06batman-adv: Prevent duplicated global TT entrySven Eckelmann1-2/+4
The function batadv_tt_global_orig_entry_add is responsible for adding new tt_orig_list_entry to the orig_list. It first checks whether the entry already is in the list or not. If it is, then the creation of a new entry is aborted. But the lock for the list is only held when the list is really modified. This could lead to duplicated entries because another context could create an entry with the same key between the check and the list manipulation. The check and the manipulation of the list must therefore be in the same locked code section. Fixes: d657e621a0f5 ("batman-adv: add reference counting for type batadv_tt_orig_list_entry") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Simon Wunderlich <[email protected]>
2018-09-06batman-adv: Prevent duplicated softif_vlan entrySven Eckelmann1-7/+18
The function batadv_softif_vlan_get is responsible for adding new softif_vlan to the softif_vlan_list. It first checks whether the entry already is in the list or not. If it is, then the creation of a new entry is aborted. But the lock for the list is only held when the list is really modified. This could lead to duplicated entries because another context could create an entry with the same key between the check and the list manipulation. The check and the manipulation of the list must therefore be in the same locked code section. Fixes: 5d2c05b21337 ("batman-adv: add per VLAN interface attribute framework") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Simon Wunderlich <[email protected]>
2018-09-06x86/process: Don't mix user/kernel regs in 64bit __show_regs()Jann Horn4-13/+26
When the kernel.print-fatal-signals sysctl has been enabled, a simple userspace crash will cause the kernel to write a crash dump that contains, among other things, the kernel gsbase into dmesg. As suggested by Andy, limit output to pt_regs, FS_BASE and KERNEL_GS_BASE in this case. This also moves the bitness-specific logic from show_regs() into process_{32,64}.c. Fixes: 45807a1df9f5 ("vdso: print fatal signals") Signed-off-by: Jann Horn <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-09-06x86/tsc: Prevent result truncation on 32bitChuanhua Lei1-1/+1
Loops per jiffy is calculated by multiplying tsc_khz with 1e3 and then dividing it by HZ. Both tsc_khz and the temporary variable holding the multiplication result are of type unsigned long, so on 32bit the result is truncated to the lower 32bit. Use u64 as type for the temporary variable and cast tsc_khz to it before multiplying. [ tglx: Massaged changelog and removed pointless braces ] Fixes: cf7a63ef4e02 ("x86/tsc: Calibrate tsc only once") Signed-off-by: Chuanhua Lei <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: "H. Peter Anvin" <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Len Brown <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Rajvi Jingar <[email protected]> Cc: Dou Liyang <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-09-06batman-adv: Prevent duplicated nc_node entrySven Eckelmann1-12/+15
The function batadv_nc_get_nc_node is responsible for adding new nc_nodes to the in_coding_list and out_coding_list. It first checks whether the entry already is in the list or not. If it is, then the creation of a new entry is aborted. But the lock for the list is only held when the list is really modified. This could lead to duplicated entries because another context could create an entry with the same key between the check and the list manipulation. The check and the manipulation of the list must therefore be in the same locked code section. Fixes: d56b1705e28c ("batman-adv: network coding - detect coding nodes and remove these after timeout") Signed-off-by: Sven Eckelmann <[email protected]> Acked-by: Marek Lindner <[email protected]> Signed-off-by: Simon Wunderlich <[email protected]>
2018-09-06batman-adv: Prevent duplicated gateway_node entrySven Eckelmann1-2/+9
The function batadv_gw_node_add is responsible for adding new gw_node to the gateway_list. It is expecting that the caller already checked that there is not already an entry with the same key or not. But the lock for the list is only held when the list is really modified. This could lead to duplicated entries because another context could create an entry with the same key between the check and the list manipulation. The check and the manipulation of the list must therefore be in the same locked code section. Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol") Signed-off-by: Sven Eckelmann <[email protected]> Acked-by: Marek Lindner <[email protected]> Signed-off-by: Simon Wunderlich <[email protected]>
2018-09-06batman-adv: Fix segfault when writing to sysfs elp_intervalSven Eckelmann1-8/+17
The per hardif sysfs file "batman_adv/elp_interval" is using the generic functions to store/show uint values. The helper __batadv_store_uint_attr requires the softif net_device as parameter to print the resulting change as info text when the users writes to this file. It uses the helper function batadv_info to add it at the same time to the kernel ring buffer and to the batman-adv debug log (when CONFIG_BATMAN_ADV_DEBUG is enabled). The function batadv_info requires as first parameter the batman-adv softif net_device. This parameter is then used to find the private buffer which contains the debug log for this batman-adv interface. But batadv_store_throughput_override used as first argument the slave net_device. This slave device doesn't have the batadv_priv private data which is access by batadv_info. Writing to this file with CONFIG_BATMAN_ADV_DEBUG enabled can either lead to a segfault or to memory corruption. Fixes: 0744ff8fa8fa ("batman-adv: Add hard_iface specific sysfs wrapper macros for UINT") Signed-off-by: Sven Eckelmann <[email protected]> Acked-by: Marek Lindner <[email protected]> Signed-off-by: Simon Wunderlich <[email protected]>
2018-09-06batman-adv: Fix segfault when writing to throughput_overrideSven Eckelmann1-2/+3
The per hardif sysfs file "batman_adv/throughput_override" prints the resulting change as info text when the users writes to this file. It uses the helper function batadv_info to add it at the same time to the kernel ring buffer and to the batman-adv debug log (when CONFIG_BATMAN_ADV_DEBUG is enabled). The function batadv_info requires as first parameter the batman-adv softif net_device. This parameter is then used to find the private buffer which contains the debug log for this batman-adv interface. But batadv_store_throughput_override used as first argument the slave net_device. This slave device doesn't have the batadv_priv private data which is access by batadv_info. Writing to this file with CONFIG_BATMAN_ADV_DEBUG enabled can either lead to a segfault or to memory corruption. Fixes: 0b5ecc6811bd ("batman-adv: add throughput override attribute to hard_ifaces") Signed-off-by: Sven Eckelmann <[email protected]> Acked-by: Marek Lindner <[email protected]> Signed-off-by: Simon Wunderlich <[email protected]>
2018-09-06batman-adv: Avoid probe ELP information leakSven Eckelmann1-1/+1
The probe ELPs for WiFi interfaces are expanded to contain at least BATADV_ELP_MIN_PROBE_SIZE bytes. This is usually a lot more than the number of bytes which the template ELP packet requires. These extra padding bytes were not initialized and thus could contain data which were previously stored at the same location. It is therefore required to set it to some predefined or random values to avoid leaking private information from the system transmitting these kind of packets. Fixes: e4623c913508 ("batman-adv: Avoid probe ELP information leak") Signed-off-by: Sven Eckelmann <[email protected]> Acked-by: Antonio Quartulli <[email protected]> Signed-off-by: Simon Wunderlich <[email protected]>
2018-09-06ACPI / LPSS: Force LPSS quirks on bootZhang Rui1-1/+1
Commit 12864ff8545f (ACPI / LPSS: Avoid PM quirks on suspend and resume from hibernation) bypasses lpss quirks for S3 and S4, by setting a flag for S3/S4 in acpi_lpss_suspend(), and check that flag in acpi_lpss_resume(). But this overlooks the boot case where acpi_lpss_resume() may get called without a corresponding acpi_lpss_suspend() having been called. Thus force setting the flag during boot. Fixes: 12864ff8545f (ACPI / LPSS: Avoid PM quirks on suspend and resume from hibernation) Link: https://bugzilla.kernel.org/show_bug.cgi?id=200989 Reported-and-tested-by: William Lieurance <[email protected]> Signed-off-by: Zhang Rui <[email protected]> Cc: 4.15+ <[email protected]> # 4.15+: 12864ff8545f (ACPI / LPSS: Avoid ...) Signed-off-by: Rafael J. Wysocki <[email protected]>
2018-09-06ACPI / bus: Only call dmi_check_system() on X86Jean Delvare1-6/+7
Calling dmi_check_system() early only works on X86. Other architectures initialize the DMI subsystem later so it's not ready yet when ACPI itself gets initialized. In the best case it results in a useless call to a function which will do nothing. But depending on the dmi implementation, it could also result in warnings. Best is to not call the function when it can't work and isn't needed. Additionally, if anyone ever needs to add non-x86 quirks, it would surprisingly not work, so document the limitation to avoid confusion. Signed-off-by: Jean Delvare <[email protected]> Fixes: cce4f632db20 (ACPI: fix early DSDT dmi check warnings on ia64) Signed-off-by: Rafael J. Wysocki <[email protected]>
2018-09-06Merge tag 'fixes-for-v4.19-rc2' of ↵Greg Kroah-Hartman6-18/+36
git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-linus Felipe writes: usb: fixes for v4.19-rc2 NET2280 got a fix to an old patch attempting to fix locking for gadget framework callbacks. DWC2 fixed a bug where driver was attempting to access registers before clocks were enabled. DWC3 got a fix for ULPI clock configuration on Baytrail devices. FOTG210 plugged a memory leak and Renesas USB3 fixed ep0 maxpacket size.
2018-09-05Merge branch 'iucv-fixes'David S. Miller2-14/+26
Julian Wiedmann says: ==================== net/iucv: fixes 2018-09-05 please apply three straight-forward fixes for iucv. One that prevents leaking the skb on malformed inbound packets, one to fix the error handling on transmit error, and one to get rid of a compile warning. ==================== Signed-off-by: David S. Miller <[email protected]>
2018-09-05net/iucv: declare iucv_path_table_empty() as staticJulian Wiedmann1-1/+1
Fixes a compile warning. Signed-off-by: Julian Wiedmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-09-05net/af_iucv: fix skb handling on HiperTransport xmit errorJulian Wiedmann1-11/+23
When sending an skb, afiucv_hs_send() bails out on various error conditions. But currently the caller has no way of telling whether the skb was freed or not - resulting in potentially either a) leaked skbs from iucv_send_ctrl(), or b) double-free's from iucv_sock_sendmsg(). As dev_queue_xmit() will always consume the skb (even on error), be consistent and also free the skb from all other error paths. This way callers no longer need to care about managing the skb. Signed-off-by: Julian Wiedmann <[email protected]> Reviewed-by: Ursula Braun <[email protected]> Signed-off-by: David S. Miller <[email protected]>