Age | Commit message | Author | Files | Lines
2019-04-10 | net/sched: taprio: fix picos_per_byte miscalculation | Leandro Dorileo | 1 | -16/+81
The Time Aware Priority Scheduler is heavily dependent on link speed: it relies on it to calculate the transmission bytes per cycle, and we can't properly calculate the so-called budget if the device has failed to report the link speed. In that case we can't dequeue packets assuming a wrong budget. This patch makes sure we fail to dequeue in case: 1) __ethtool_get_link_ksettings() reports an error, or 2) the ethernet driver failed to set the ksettings' speed value (setting link speed to SPEED_UNKNOWN). Additionally, we recalculate the budget whenever the link speed is changed. Fixes: 5a781ccbd19e4 ("tc: Add support for configuring the taprio scheduler") Signed-off-by: Leandro Dorileo <[email protected]> Reviewed-by: Vedang Patel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
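A minimal sketch of the resulting guard, following the shape of the patch (illustrative, not verbatim kernel code):

  static void taprio_set_picos_per_byte(struct net_device *dev,
                                        struct taprio_sched *q)
  {
          struct ethtool_link_ksettings ecmd;
          s64 picos_per_byte = -1;        /* poison: dequeue will bail out */

          if (!__ethtool_get_link_ksettings(dev, &ecmd) &&
              ecmd.base.speed != SPEED_UNKNOWN)
                  /* speed is in Mbit/s: one byte takes 8 * 10^6 / speed ps */
                  picos_per_byte = div64_s64(NSEC_PER_SEC * 1000LL * 8,
                                             ecmd.base.speed * 1000 * 1000);

          atomic64_set(&q->picos_per_byte, picos_per_byte);
  }

The dequeue path can then refuse to hand out packets while picos_per_byte is still -1, and a netdevice notifier (e.g. on link changes) re-runs the helper so the budget follows speed changes.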
2019-04-10 | team: set slave to promisc if team is already in promisc mode | Hangbin Liu | 1 | -0/+26
After adding a team interface to a bridge, the team interface enters promisc mode. If we then add a new slave to team0, the slave keeps promisc off. Fix it by setting the slave to promisc on if the team master is already in promisc mode; do the same for allmulti. v2: add promisc and allmulti checking when deleting ports. Fixes: 3d249d4ca7d0 ("net: introduce ethernet teaming device") Signed-off-by: Hangbin Liu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
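The idea, as a sketch of the port-add path (simplified from the patch; error unwinding shown for the allmulti case only):

  /* A new port must inherit the master's receive-mode flags */
  if (team->dev->flags & IFF_PROMISC) {
          err = dev_set_promiscuity(port->dev, 1);
          if (err)
                  return err;
  }
  if (team->dev->flags & IFF_ALLMULTI) {
          err = dev_set_allmulti(port->dev, 1);
          if (err) {
                  if (team->dev->flags & IFF_PROMISC)
                          dev_set_promiscuity(port->dev, -1);
                  return err;
          }
  }

The v2-mentioned counterpart drops the reference counts again (dev_set_promiscuity(port->dev, -1), dev_set_allmulti(port->dev, -1)) when a port is deleted.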
2019-04-10 | net/tls: fix build without CONFIG_TLS_DEVICE | Jakub Kicinski | 1 | -0/+2
buildbot noticed that TLS_HW is not defined if CONFIG_TLS_DEVICE=n. Wrap the cleanup branch in an ifdef; tls_device_free_resources_tx() wouldn't be compiled in this case either. Fixes: 35b71a34ada6 ("net/tls: don't leak partially sent record in device mode") Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
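The shape of the fix (sketch):

  #ifdef CONFIG_TLS_DEVICE
          /* TLS_HW and the free helper only exist with device offload */
          if (ctx->tx_conf == TLS_HW)
                  tls_device_free_resources_tx(sk);
  #endif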
2019-04-10 | clk: x86: Add system specific quirk to mark clocks as critical | David Müller | 3 | -3/+35
Since commit 648e921888ad ("clk: x86: Stop marking clocks as CLK_IS_CRITICAL"), the pmc_plt_clocks of the Bay Trail SoC are unconditionally gated off. Unfortunately this breaks systems where these clocks are used for external purposes beyond the kernel's knowledge. Fix it by implementing a system-specific quirk to mark the necessary pmc_plt_clks as critical. Fixes: 648e921888ad ("clk: x86: Stop marking clocks as CLK_IS_CRITICAL") Signed-off-by: David Müller <[email protected]> Signed-off-by: Hans de Goede <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Signed-off-by: Stephen Boyd <[email protected]>
2019-04-10 | block: do not leak memory in bio_copy_user_iov() | Jérôme Glisse | 1 | -1/+4
When bio_add_pc_page() fails in bio_copy_user_iov(), we should free the page we just allocated; otherwise we are leaking it. Cc: [email protected] Cc: Linus Torvalds <[email protected]> Cc: [email protected] Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Jérôme Glisse <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
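The failure branch just needs a matching __free_page() (sketch; pages supplied by the caller through map_data are caller-owned and must not be freed here):

  page = alloc_page(GFP_KERNEL);
  if (!page) {
          ret = -ENOMEM;
          goto cleanup;
  }

  if (bio_add_pc_page(q, bio, page, bytes, offset) < bytes) {
          if (!map_data)
                  __free_page(page);      /* was leaked before this fix */
          break;
  }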
2019-04-10 | net: strparser: fix comment | Jakub Kicinski | 1 | -1/+1
Fix comment. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-04-10 | PCI: pciehp: Ignore Link State Changes after powering off a slot | Sergey Miroshnichenko | 1 | -0/+4
During a safe hot remove, the OS powers off the slot, which may cause a Data Link Layer State Changed event. The slot has already been set to OFF_STATE, so that event results in re-enabling the device, making it impossible to safely remove it. Clear out the Presence Detect Changed and Data Link Layer State Changed events when the disabled slot has settled down. It is still possible to re-enable the device: if it remains in the slot after the Attention Button is pressed, pressing the button again re-enables it. Fixes the problem that Micah reported below: an NVMe drive power button may not actually turn off the drive. Link: https://bugzilla.kernel.org/show_bug.cgi?id=203237 Reported-by: Micah Parrish <[email protected]> Tested-by: Micah Parrish <[email protected]> Signed-off-by: Sergey Miroshnichenko <[email protected]> [bhelgaas: changelog, add bugzilla URL] Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Lukas Wunner <[email protected]> Cc: [email protected] # v4.19+
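The essence of the change, as a sketch of the power-off path (shape of the patch, not verbatim):

  if (POWER_CTRL(ctrl)) {
          pciehp_power_off_slot(ctrl);

          /* Let the slot power settle before trusting events again */
          msleep(1000);

          /* Ignore link or presence changes caused by power off */
          atomic_and(~(PCI_EXP_SLTSTA_DLLSC | PCI_EXP_SLTSTA_PDC),
                     &ctrl->pending_events);
  }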
2019-04-10 | Merge branch 'tls-leaks' | David S. Miller | 5 | -22/+41
Jakub Kicinski says:
====================
net: tls: fix memory leaks and freeing skbs

This series fixes two memory issues and a stack overflow. The first two patches are fairly simple leak fixes. The third patch partially reverts an optimization made to the strparser which causes creation of skb->frag_list->skb->frag_list... chains of 100s of skbs, leading to recursive kfree_skb() filling up the kernel stack.
====================
Signed-off-by: David S. Miller <[email protected]>
2019-04-10 | net: strparser: partially revert "strparser: Call skb_unclone conditionally" | Jakub Kicinski | 1 | -7/+5
This reverts the first part of commit 4e485d06bb8c ("strparser: Call skb_unclone conditionally"). To build a message with multiple fragments we need our own root of frag_list. We can't simply use the frag_list of orig_skb, because that would link all orig_skbs together, creating very long frag chains and causing a stack overflow on kfree_skb() (which is called recursively on the frag_lists):

BUG: stack guard page was hit at 00000000d40fad41 (stack is 0000000029dde9f4..000000008cce03d5)
kernel stack overflow (double-fault): 0000 [#1] PREEMPT SMP
RIP: 0010:free_one_page+0x2b/0x490
Call Trace:
 __free_pages_ok+0x143/0x2c0
 skb_release_data+0x8e/0x140
 ? skb_release_data+0xad/0x140
 kfree_skb+0x32/0xb0
 [...]
 skb_release_data+0xad/0x140
 ? skb_release_data+0xad/0x140
 kfree_skb+0x32/0xb0
 skb_release_data+0xad/0x140
 ? skb_release_data+0xad/0x140
 kfree_skb+0x32/0xb0
 skb_release_data+0xad/0x140
 ? skb_release_data+0xad/0x140
 kfree_skb+0x32/0xb0
 skb_release_data+0xad/0x140
 ? skb_release_data+0xad/0x140
 kfree_skb+0x32/0xb0
 skb_release_data+0xad/0x140
 __kfree_skb+0xe/0x20
 tcp_disconnect+0xd6/0x4d0
 tcp_close+0xf4/0x430
 ? tcp_check_oom+0xf0/0xf0
 tls_sk_proto_close+0xe4/0x1e0 [tls]
 inet_release+0x36/0x60
 __sock_release+0x37/0xa0
 sock_close+0x11/0x20
 __fput+0xa2/0x1d0
 task_work_run+0x89/0xb0
 exit_to_usermode_loop+0x9a/0xa0
 do_syscall_64+0xc0/0xf0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Let's leave the second unclone conditional, as I'm not entirely sure what its purpose is :)

Fixes: 4e485d06bb8c ("strparser: Call skb_unclone conditionally") Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Dirk van der Merwe <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-04-10 | net/tls: don't leak partially sent record in device mode | Jakub Kicinski | 4 | -14/+32
David reports that tls triggers warnings related to sk->sk_forward_alloc not being zero at destruction time when the sender fills up the write buffer and dies from SIGPIPE:

WARNING: CPU: 5 PID: 6831 at net/core/stream.c:206 sk_stream_kill_queues+0x103/0x110
WARNING: CPU: 5 PID: 6831 at net/ipv4/af_inet.c:160 inet_sock_destruct+0x15b/0x170

This is due to the device implementation not cleaning up the partially_sent_record, because commit a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") moved the partial record cleanup to the SW-only path. Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") Reported-by: David Beckett <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Dirk van der Merwe <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-04-10 | net/tls: fix the IV leaks | Jakub Kicinski | 1 | -1/+4
Commit f66de3ee2c16 ("net/tls: Split conf to rx + tx") made freeing of the IV and record sequence number conditional on the SW path only, but commit e8f69799810c ("net/tls: Add generic NIC offload infrastructure") also allocates that state for the device offload configuration. Remember to free it. Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Dirk van der Merwe <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
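Conceptually, the device-offload teardown needs to mirror the SW path (sketch; details simplified):

  static void tls_device_free_ctx(struct tls_context *ctx)
  {
          if (ctx->tx_conf == TLS_HW) {
                  kfree(tls_offload_ctx_tx(ctx));
                  /* allocated by the offload setup path as well */
                  kfree(ctx->tx.iv);
                  kfree(ctx->tx.rec_seq);
          }

          if (ctx->rx_conf == TLS_HW)
                  kfree(tls_offload_ctx_rx(ctx));

          kfree(ctx);     /* final context free (simplified) */
  }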
2019-04-10 | sparc64/pci_sun4v: fix ATU checks for large DMA masks | Christoph Hellwig | 1 | -9/+11
Now that we allow drivers to always set larger-than-required DMA masks, we need to be a little more careful in the sun4v PCI iommu driver about when to select ATU support: a large DMA mask can be set even when the platform does not support ATU, so we always have to check whether it is available before using it. Add a little helper for that and use it in all the places where we make ATU usage decisions based on the DMA mask. Fixes: 24132a419c68 ("sparc64/pci_sun4v: allow large DMA masks") Reported-by: Meelis Roos <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Tested-by: Meelis Roos <[email protected]> Acked-by: David S. Miller <[email protected]>
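The helper can be as small as this (sketch along the lines of the patch):

  static bool iommu_use_atu(struct iommu *iommu, u64 mask)
  {
          /* ATU must both exist and actually be needed by the mask */
          return iommu->atu && mask > DMA_BIT_MASK(32);
  }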
2019-04-10 | ipv4: Handle RTA_GATEWAY set to 0 | David Ahern | 2 | -2/+4
Govindarajulu reported a regression with Network Manager, which sends an RTA_GATEWAY attribute with the address set to 0. Fix up the handling of RTA_GATEWAY to only set fc_gw_family if the gateway address is actually set. Fixes: f35b794b3b405 ("ipv4: Prepare fib_config for IPv6 gateway") Reported-by: Govindarajulu Varadarajan <[email protected]> Signed-off-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
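In the netlink attribute parsing this becomes a guarded assignment (sketch; variable names illustrative):

  case RTA_GATEWAY:
          cfg->fc_gw4 = nla_get_in_addr(attr);
          if (cfg->fc_gw4)
                  cfg->fc_gw_family = AF_INET;    /* 0 means "no gateway" */
          break;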
2019-04-10 | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma | Linus Torvalds | 9 | -41/+42
Pull rdma fixes from Jason Gunthorpe:
"Several driver bug fixes posted in the last several weeks:
 - Several bug fixes for the hfi1 driver 'TID RDMA' functionality merged into 5.1. Since TID RDMA is on by default these all seem to be regressions.
 - Wrong software permission checks on memory in mlx5
 - Memory leak in vmw_pvrdma during driver remove
 - Several bug fixes for hns driver features merged into 5.1"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  IB/hfi1: Do not flush send queue in the TID RDMA second leg
  RDMA/hns: Bugfix for SCC hem free
  RDMA/hns: Fix bug that caused srq creation to fail
  RDMA/vmw_pvrdma: Fix memory leak on pvrdma_pci_remove
  IB/mlx5: Reset access mask when looping inside page fault handler
  IB/hfi1: Fix the allocation of RSM table
  IB/hfi1: Eliminate opcode tests on mr deref
  IB/hfi1: Clear the IOWAIT pending bits when QP is put into error state
  IB/hfi1: Failed to drain send queue when QP is put into error state
2019-04-10 | Merge branch 'ibmvnic-features' | David S. Miller | 1 | -7/+25
Thomas Falcon says:
====================
ibmvnic: Fix netdev features settings on reset

In its current state, a driver reset clobbers any feature settings a user may have toggled and will disable GRO as it is not explicitly enabled in the driver. This patch set enables GRO and tries to retain user settings after a reset. If the underlying carrier changes, however, the driver will disable features unsupported by the new carrier.
====================
Signed-off-by: David S. Miller <[email protected]>
2019-04-10 | ibmvnic: Fix netdev feature clobbering during a reset | Thomas Falcon | 1 | -6/+24
While determining offload capabilities of backing hardware during a device reset, the driver is clobbering current feature settings. Update hw_features on reset instead of features unless a feature is enabled that is no longer supported on the current backing device. Also enable features that were not supported prior to the reset but were previously enabled or requested by the user. This can occur if the reset is the result of a carrier change, such as a device failover or partition migration. Signed-off-by: Thomas Falcon <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-04-10 | ibmvnic: Enable GRO | Thomas Falcon | 1 | -1/+1
Enable Generic Receive Offload in the ibmvnic driver. Signed-off-by: Thomas Falcon <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-04-10 | MAINTAINERS: BMIPS: Add internal Broadcom mailing list | Florian Fainelli | 1 | -0/+1
There is a patchwork instance behind bcm-kernel-feedback-list that is helpful for tracking submissions; add this list to the MIPS BMIPS entry. Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: Paul Burton <[email protected]> Cc: [email protected]
2019-04-10 | Merge branch 'net-sched-move-back-qlen-to-per-CPU-accounting' | David S. Miller | 5 | -71/+105
Paolo Abeni says:
====================
net: sched: move back qlen to per CPU accounting

Commit 46b1c18f9deb ("net: sched: put back q.qlen into a single location") introduced a measurable regression in contended scenarios for locked qdiscs. As Eric suggested, we can replace q.qlen accesses with calls to qdisc_is_empty() in the datapath and revert the above commit.

The TC subsystem updates qdisc->is_empty in a somewhat loose way: notably, 'is_empty' is set only when the qdisc dequeue() call returns a NULL ptr, that is, on the invocation after the last packet is dequeued. The above is good enough for the BYPASS implementation - the only downside is that we end up missing the optimization for a very small time-frame - but will break hard when internal-structure consistency for classful qdiscs relies on the child's qdisc_is_empty().

A stricter 'is_empty' update would add relevant complexity to its life-cycle, so this series takes a different approach: we allow a lockless qdisc to switch from per-CPU accounting to global stats accounting when the NOLOCK bit is cleared. Since most pieces of infrastructure are already in place, this requires very few changes to the pfifo_fast qdisc, and any later NOLOCK qdisc can hook in there with little effort - no need to maintain two different implementations.

The first 2 patches remove direct qlen access from non-core TC code, the 3rd and 4th patches add and use the infrastructure that allows switching the stats accounting, and the 5th patch is the actual revert.

v1 -> v2: - fixed build issues - more descriptive commit message for patch 5/5
====================
Signed-off-by: David S. Miller <[email protected]>
2019-04-10 | Revert: "net: sched: put back q.qlen into a single location" | Paolo Abeni | 3 | -20/+28
This reverts commit 46b1c18f9deb ("net: sched: put back q.qlen into a single location"). After the previous patch, when a NOLOCK qdisc is enslaved to a locking qdisc it switches to global stats accounting. As a consequence, when a classful qdisc accesses a child qdisc's qlen directly, such a qdisc is not doing per-CPU accounting, so the qlen value is consistent. In the control path nobody uses qlen directly since commit e5f0e8f8e45 ("net: sched: introduce and use qdisc tree flush/purge helpers"), so we can remove the contended atomic ops from the datapath. v1 -> v2: - complete the qdisc_qstats_atomic_qlen_dec() -> qdisc_qstats_cpu_qlen_dec() replacement, fix build issue - more descriptive commit message Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-04-10 | net: sched: when clearing NOLOCK, clear TCQ_F_CPUSTATS, too | Paolo Abeni | 3 | -9/+42
Since stats updating is always consistent with the TCQ_F_CPUSTATS flag, we can disable it at qdisc creation time by flipping that bit. In my experiments, if the NOLOCK flag is cleared, per-CPU stats accounting does not give any measurable performance gain, but it wastes some memory. Let's clear TCQ_F_CPUSTATS together with NOLOCK when enslaving a NOLOCK qdisc to a 'lock' one. Use the stats update helper inside pfifo_fast to cope correctly with the TCQ_F_CPUSTATS flag change. As a side effect, the q.qlen value for any child qdisc is always consistent for all locked classful qdiscs. Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
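The dual-mode stats helper used in pfifo_fast looks roughly like this (sketch based on the series; exact names may differ):

  static inline void qdisc_update_stats_at_dequeue(struct Qdisc *sch,
                                                   struct sk_buff *skb)
  {
          if (qdisc_is_percpu_stats(sch)) {
                  qdisc_qstats_cpu_backlog_dec(sch, skb);
                  qdisc_bstats_cpu_update(sch, skb);
                  qdisc_qstats_cpu_qlen_dec(sch);
          } else {
                  qdisc_qstats_backlog_dec(sch, skb);
                  qdisc_bstats_update(sch, skb);
                  sch->q.qlen--;
          }
  }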
2019-04-10 | net: sched: always do stats accounting according to TCQ_F_CPUSTATS | Paolo Abeni | 2 | -42/+31
The core sched implementation checks independently for the NOLOCK flag to acquire/release the root spin lock and for qdisc_is_percpu_stats() to account per-CPU values in many places. This change updates the last few places checking TCQ_F_NOLOCK to do per-CPU stats accounting according to the qdisc_is_percpu_stats() value. The above allows us to clean up the dev_requeue_skb() implementation a bit and makes stats updates always consistent with a single flag. v1 -> v2: - do not move the qdisc_is_empty definition, fix build issue Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-04-10 | net: sched: prefer qdisc_is_empty() over direct qlen access | Paolo Abeni | 1 | -1/+1
When checking for root qdisc queue length, do not access q.qlen directly. In the following patches we will move qlen accounting back to per-CPU values for NOLOCK qdiscs. Instead, prefer the qdisc_is_empty() helper. Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
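For reference, the helper is essentially the following (sketch of the shape it takes once per-CPU accounting is back; the exact implementation evolved over later releases):

  static inline bool qdisc_is_empty(const struct Qdisc *qdisc)
  {
          if (qdisc_is_percpu_stats(qdisc))
                  return READ_ONCE(qdisc->empty);
          return !READ_ONCE(qdisc->q.qlen);
  }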
2019-04-10 | net: caif: avoid using qdisc_qlen() | Paolo Abeni | 1 | -4/+8
That helper does not cope correctly with NOLOCK qdiscs. In the following patches we will move qlen back to per-CPU values for such qdiscs, so qdisc_qlen_sum() is not an option either. Instead, use qlen only for locked qdiscs, and always set flow off for NOLOCK qdiscs with a non-empty tx queue. Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-04-10 | Merge branch 'mlxsw-Various-fixes' | David S. Miller | 5 | -13/+36
Ido Schimmel says:
====================
mlxsw: Various fixes

This patchset contains various small fixes for mlxsw.

Patch #1 fixes a warning generated by switchdev core when the driver fails to insert an MDB entry in the commit phase.

Patches #2-#4 fix a warning in check_flush_dependency() that can be triggered when a work item in a WQ_MEM_RECLAIM workqueue tries to flush a non-WQ_MEM_RECLAIM workqueue. It seems that the semantics of the WQ_MEM_RECLAIM flag are not very clear [1] and that various patches have been sent to remove it from various workqueues throughout the kernel [2][3][4] in order to silence the warning. These patches do the same for the workqueues created by mlxsw that probably should not have been created with this flag in the first place.

Patch #5 fixes a regression where an IP address cannot be assigned to a VRF upper due to an erroneous MAC validation check. Patch #6 adds a test case.

Patch #7 adjusts Spectrum-2 shared buffer configuration to be compatible with Spectrum-1. The problem and fix are described in detail in the commit message.

Please consider patches #1-#5 for 5.0.y. I verified they apply cleanly.

[1] https://patchwork.kernel.org/patch/10791315/
[2] Commit ce162bfbc0b6 ("mac80211_hwsim: don't use WQ_MEM_RECLAIM")
[3] Commit 39baf10310e6 ("IB/core: Fix use workqueue without WQ_MEM_RECLAIM")
[4] Commit 75215e5bb22c ("iwcm: Don't allocate iwcm workqueue with WQ_MEM_RECLAIM")
====================
Signed-off-by: David S. Miller <[email protected]>
2019-04-10 | mlxsw: spectrum_buffers: Add a multicast pool for Spectrum-2 | Ido Schimmel | 1 | -8/+11
In Spectrum-1, when a multicast packet is admitted to the shared buffer it increases the quotas of all the ports and {port, TC} to which it is forwarded. The above means that multicast packets are accounted multiple times in the shared buffer and can therefore cause the associated shared buffer pool to fill up very quickly. To work around this issue, commit e83c045e53d7 ("mlxsw: spectrum_buffers: Configure MC pool") added a dedicated multicast pool in which multicast packets are accounted. The issue is not present in Spectrum-2, but in order to be backward compatible with Spectrum-1, its default behavior is to allow a multicast packet to increase multiple egress quotas instead of one. Until the new (non-backward compatible) mode is supported, configure a dedicated multicast pool as in Spectrum-1. Fixes: fe099bf682ab ("mlxsw: spectrum_buffers: Add Spectrum-2 shared buffer configuration") Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Petr Machata <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-04-10 | selftests: mlxsw: Test VRF MAC vetoing | Ido Schimmel | 1 | -0/+20
Test that it is possible to set an IP address on a VRF and that it is not vetoed. Signed-off-by: Ido Schimmel <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-04-10 | mlxsw: spectrum_router: Do not check VRF MAC address | Ido Schimmel | 1 | -1/+1
Commit 74bc99397438 ("mlxsw: spectrum_router: Veto unsupported RIF MAC addresses") enabled the driver to veto router interface (RIF) MAC addresses that it cannot support. This check should only be performed for interfaces for which the driver actually configures a RIF. A VRF upper is not one of them, so ignore it. Without this patch it is not possible to set an IP address on the VRF device and use it as a loopback. Fixes: 74bc99397438 ("mlxsw: spectrum_router: Veto unsupported RIF MAC addresses") Signed-off-by: Ido Schimmel <[email protected]> Reported-by: Alexander Petrovskiy <[email protected]> Tested-by: Alexander Petrovskiy <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-04-10 | mlxsw: core: Do not use WQ_MEM_RECLAIM for mlxsw workqueue | Ido Schimmel | 1 | -1/+1
The workqueue is used to periodically update the networking stack about activity / statistics of various objects such as neighbours and TC actions. It should not be called as part of the memory reclaim path, so remove the WQ_MEM_RECLAIM flag. Fixes: 3d5479e92087 ("mlxsw: core: Remove deprecated create_workqueue") Signed-off-by: Ido Schimmel <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-04-10 | mlxsw: core: Do not use WQ_MEM_RECLAIM for mlxsw ordered workqueue | Ido Schimmel | 1 | -1/+1
The ordered workqueue is used to offload various objects such as routes and neighbours in the order they are notified. It should not be called as part of the memory reclaim path, so remove the WQ_MEM_RECLAIM flag. This can also result in a warning [1], if a worker tries to flush a non-WQ_MEM_RECLAIM workqueue.

[1]
[97703.542861] workqueue: WQ_MEM_RECLAIM mlxsw_core_ordered:mlxsw_sp_router_fib6_event_work [mlxsw_spectrum] is flushing !WQ_MEM_RECLAIM events:rht_deferred_worker
[97703.542884] WARNING: CPU: 1 PID: 32492 at kernel/workqueue.c:2605 check_flush_dependency+0xb5/0x130
...
[97703.542988] Hardware name: Mellanox Technologies Ltd. MSN3700C/VMOD0008, BIOS 5.11 10/10/2018
[97703.543049] Workqueue: mlxsw_core_ordered mlxsw_sp_router_fib6_event_work [mlxsw_spectrum]
[97703.543061] RIP: 0010:check_flush_dependency+0xb5/0x130
...
[97703.543071] RSP: 0018:ffffb3f08137bc00 EFLAGS: 00010086
[97703.543076] RAX: 0000000000000000 RBX: ffff96e07740ae00 RCX: 0000000000000000
[97703.543080] RDX: 0000000000000094 RSI: ffffffff82dc1934 RDI: 0000000000000046
[97703.543084] RBP: ffffb3f08137bc20 R08: ffffffff82dc18a0 R09: 00000000000225c0
[97703.543087] R10: 0000000000000000 R11: 0000000000007eec R12: ffffffff816e4ee0
[97703.543091] R13: ffff96e06f6a5c00 R14: ffff96e077ba7700 R15: ffffffff812ab0c0
[97703.543097] FS: 0000000000000000(0000) GS:ffff96e077a80000(0000) knlGS:0000000000000000
[97703.543101] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[97703.543104] CR2: 00007f8cd135b280 CR3: 00000001e860e003 CR4: 00000000003606e0
[97703.543109] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[97703.543112] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[97703.543115] Call Trace:
[97703.543129]  __flush_work+0xbd/0x1e0
[97703.543137]  ? __cancel_work_timer+0x136/0x1b0
[97703.543145]  ? pwq_dec_nr_in_flight+0x49/0xa0
[97703.543154]  __cancel_work_timer+0x136/0x1b0
[97703.543175]  ? mlxsw_reg_trans_bulk_wait+0x145/0x400 [mlxsw_core]
[97703.543184]  cancel_work_sync+0x10/0x20
[97703.543191]  rhashtable_free_and_destroy+0x23/0x140
[97703.543198]  rhashtable_destroy+0xd/0x10
[97703.543254]  mlxsw_sp_fib_destroy+0xb1/0xf0 [mlxsw_spectrum]
[97703.543310]  mlxsw_sp_vr_put+0xa8/0xc0 [mlxsw_spectrum]
[97703.543364]  mlxsw_sp_fib_node_put+0xbf/0x140 [mlxsw_spectrum]
[97703.543418]  ? mlxsw_sp_fib6_entry_destroy+0xe8/0x110 [mlxsw_spectrum]
[97703.543475]  mlxsw_sp_router_fib6_event_work+0x6cd/0x7f0 [mlxsw_spectrum]
[97703.543484]  process_one_work+0x1fd/0x400
[97703.543493]  worker_thread+0x34/0x410
[97703.543500]  kthread+0x121/0x140
[97703.543507]  ? process_one_work+0x400/0x400
[97703.543512]  ? kthread_park+0x90/0x90
[97703.543523]  ret_from_fork+0x35/0x40

Fixes: a3832b31898f ("mlxsw: core: Create an ordered workqueue for FIB offload") Signed-off-by: Ido Schimmel <[email protected]> Reported-by: Semion Lisyansky <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-04-10 | mlxsw: core: Do not use WQ_MEM_RECLAIM for EMAD workqueue | Ido Schimmel | 1 | -1/+1
The EMAD workqueue is used to handle retransmission of EMAD packets that contain configuration data for the device's firmware. Given that the workers need to allocate these packets and that the code is not called as part of the memory reclaim path, remove the WQ_MEM_RECLAIM flag. Fixes: d965465b60ba ("mlxsw: core: Fix possible deadlock") Signed-off-by: Ido Schimmel <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
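In all three commits the fix is a one-flag change at workqueue creation; for the EMAD queue, roughly (sketch):

  /* not part of the memory reclaim path: no WQ_MEM_RECLAIM */
  emad_wq = alloc_workqueue("mlxsw_core_emad", 0, 0);
  if (!emad_wq)
          return -ENOMEM;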
2019-04-10 | mlxsw: spectrum_switchdev: Add MDB entries in prepare phase | Ido Schimmel | 1 | -1/+1
The driver cannot guarantee in the prepare phase that it will be able to write an MDB entry to the device. In case the driver returned success during the prepare phase, but then failed to add the entry in the commit phase, a WARNING [1] will be generated by the switchdev core. Fix this by doing the work in the prepare phase instead.

[1]
[ 358.544486] swp12s0: Commit of object (id=2) failed.
[ 358.550061] WARNING: CPU: 0 PID: 30 at net/switchdev/switchdev.c:281 switchdev_port_obj_add_now+0x9b/0xe0
[ 358.560754] CPU: 0 PID: 30 Comm: kworker/0:1 Not tainted 5.0.0-custom-13382-gf2449babf221 #1350
[ 358.570472] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
[ 358.580582] Workqueue: events switchdev_deferred_process_work
[ 358.587001] RIP: 0010:switchdev_port_obj_add_now+0x9b/0xe0
...
[ 358.614109] RSP: 0018:ffffa6b900d6fe18 EFLAGS: 00010286
[ 358.619943] RAX: 0000000000000000 RBX: ffff8b00797ff000 RCX: 0000000000000000
[ 358.627912] RDX: ffff8b00b7a1d4c0 RSI: ffff8b00b7a152e8 RDI: ffff8b00b7a152e8
[ 358.635881] RBP: ffff8b005c3f5bc0 R08: 000000000000022b R09: 0000000000000000
[ 358.643850] R10: 0000000000000000 R11: ffffa6b900d6fcc8 R12: 0000000000000000
[ 358.651819] R13: dead000000000100 R14: ffff8b00b65a23c0 R15: 0ffff8b00b7a2200
[ 358.659790] FS: 0000000000000000(0000) GS:ffff8b00b7a00000(0000) knlGS:0000000000000000
[ 358.668820] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 358.675228] CR2: 00007f00aad90de0 CR3: 00000001ca80d000 CR4: 00000000001006f0
[ 358.683188] Call Trace:
[ 358.685918]  switchdev_port_obj_add_deferred+0x13/0x60
[ 358.691655]  switchdev_deferred_process+0x6b/0xf0
[ 358.696907]  switchdev_deferred_process_work+0xa/0x10
[ 358.702548]  process_one_work+0x1f5/0x3f0
[ 358.707022]  worker_thread+0x28/0x3c0
[ 358.711099]  ? process_one_work+0x3f0/0x3f0
[ 358.715768]  kthread+0x10d/0x130
[ 358.719369]  ? __kthread_create_on_node+0x180/0x180
[ 358.724815]  ret_from_fork+0x35/0x40

Fixes: 3a49b4fde2a1 ("mlxsw: Adding layer 2 multicast support") Signed-off-by: Ido Schimmel <[email protected]> Reported-by: Alex Kushnarov <[email protected]> Tested-by: Alex Kushnarov <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-04-10 | lightnvm: pblk: fix crash in pblk_end_partial_read due to multipage bvecs | Hans Holmberg | 1 | -22/+28
The introduction of multipage bio vectors broke pblk's partial read logic, which was not prepared for them. Use bio vector iterators instead of direct bio vector indexing. Fixes: 07173c3ec276 ("block: enable multipage bvecs") Reported-by: Klaus Jensen <[email protected]> Signed-off-by: Hans Holmberg <[email protected]> Updated description. Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
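A generic illustration of the iterator-based approach (not the pblk code itself): rather than indexing bio->bi_io_vec[] directly, walk the bio with bio_for_each_segment(), which hands out single-page segments even when the underlying bio_vecs span multiple pages:

  static void copy_bio_to_buf(struct bio *bio, void *dst)
  {
          struct bio_vec bv;
          struct bvec_iter iter;
          size_t off = 0;

          bio_for_each_segment(bv, bio, iter) {
                  char *p = kmap_atomic(bv.bv_page);

                  memcpy(dst + off, p + bv.bv_offset, bv.bv_len);
                  kunmap_atomic(p);
                  off += bv.bv_len;
          }
  }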
2019-04-10 | IB/hfi1: Do not flush send queue in the TID RDMA second leg | Kaike Wan | 1 | -23/+8
When a QP is put into error state, the send queue will be flushed. This mechanism is implemented in both the first and the second leg of the send engine. Since the second leg is only responsible for data transactions in the KDETH space for the TID RDMA WRITE request, it should not perform the flushing of the send queue. This patch removes the flushing function from the second leg, but still bails out of the QP if it has been put into error state. Fixes: 70dcb2e3dc6a ("IB/hfi1: Add the TID second leg send packet builder") Reviewed-by: Mike Marciniszyn <[email protected]> Signed-off-by: Kaike Wan <[email protected]> Signed-off-by: Dennis Dalessandro <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-04-10 | Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost | Linus Torvalds | 4 | -5/+22
Pull virtio fixes from Michael Tsirkin:
"Several fixes, add more reviewers to the list"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio: Honour 'may_reduce_num' in vring_create_virtqueue
  MAiNTAINERS: add Paolo, Stefan for virtio blk/scsi
  virtio_pci: fix a NULL pointer reference in vp_del_vqs
2019-04-10 | RISC-V: Fix Maximum Physical Memory 2GiB option for 64bit systems | Anup Patel | 1 | -0/+8
The Maximum Physical Memory 2GiB option for 64bit systems is currently broken because the kernel hangs at boot time when this option is enabled and the underlying system has more than 2GiB of memory. This issue can be easily reproduced on the SiFive Unleashed board, where we have 8GiB of memory. This patch fixes the above issue by removing the unusable memory region in setup_bootmem(). Signed-off-by: Anup Patel <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Palmer Dabbelt <[email protected]>
2019-04-10 | ASoC: wcd9335: Fix missing regmap requirement | Marc Gonzalez | 1 | -0/+1
wcd9335.c: undefined reference to 'devm_regmap_add_irq_chip' Signed-off-by: Marc Gonzalez <[email protected]> Signed-off-by: Mark Brown <[email protected]>
2019-04-10 | drm/i915/dp: revert back to max link rate and lane count on eDP | Jani Nikula | 1 | -59/+10
Commit 7769db588384 ("drm/i915/dp: optimize eDP 1.4+ link config fast and narrow") started to optimize the eDP 1.4+ link config, both per spec and as preparation for display stream compression support. Sadly, we again face panels that flat out fail with parameters they claim to support. Revert, and go back to the drawing board. v2: Actually revert to max params instead of just wide-and-slow. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109959 Fixes: 7769db588384 ("drm/i915/dp: optimize eDP 1.4+ link config fast and narrow") Cc: Ville Syrjälä <[email protected]> Cc: Manasi Navare <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Matt Atwood <[email protected]> Cc: "Lee, Shawn C" <[email protected]> Cc: Dave Airlie <[email protected]> Cc: [email protected] Cc: <[email protected]> # v5.0+ Reviewed-by: Rodrigo Vivi <[email protected]> Reviewed-by: Manasi Navare <[email protected]> Tested-by: Albert Astals Cid <[email protected]> # v5.0 backport Tested-by: Emanuele Panigati <[email protected]> # v5.0 backport Tested-by: Matteo Iervasi <[email protected]> # v5.0 backport Signed-off-by: Jani Nikula <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit f11cb1c19ad0563b3c1ea5eb16a6bac0e401f428) Signed-off-by: Rodrigo Vivi <[email protected]>
2019-04-10 | drm/i915/icl: Fix port disable sequence for mipi-dsi | Vandita Kulkarni | 2 | -4/+4
Re-enable clock gating of DDI clocks. v2: Fix the default ddi clk state for mipi-dsi (Imre) Fixes: 1026bea00381 ("drm/i915/icl: Ungate DSI clocks") Signed-off-by: Vandita Kulkarni <[email protected]> Reviewed-by: Uma Shankar <[email protected]> Signed-off-by: Jani Nikula <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 942d1cf48eae3fcd7e973cfb708d5c4860f0c713) Signed-off-by: Rodrigo Vivi <[email protected]>
2019-04-10 | drm/i915/icl: Ungate ddi clocks before IO enable | Vandita Kulkarni | 1 | -0/+6
IO enable sequencing needs ddi clocks enabled. These clocks will be gated at a later point in the enable sequence. v2: Fix the commit header (Uma) v3: Remove the redundant read (Ville) Fixes: 949fc52af19e ("drm/i915/icl: add pll mapping for DSI") Signed-off-by: Vandita Kulkarni <[email protected]> Reviewed-by: Uma Shankar <[email protected]> Signed-off-by: Jani Nikula <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit c5b81a325263a891d5811aabe938c87e03db4c37) Signed-off-by: Rodrigo Vivi <[email protected]>
2019-04-10 | nvme: cancel request synchronously | Ming Lei | 1 | -1/+1
nvme_cancel_request() is used in the error handler, and it is always reliable to cancel the request synchronously, which avoids a possible race in which the request may be completed after the real hw queue is destroyed. One such issue was reported by our customer on NVMe RDMA, in which a freed ib queue pair may be used in nvme_rdma_complete_rq(). Cc: Sagi Grimberg <[email protected]> Cc: Bart Van Assche <[email protected]> Cc: James Smart <[email protected]> Cc: [email protected] Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-04-10 | blk-mq: introduce blk_mq_complete_request_sync() | Ming Lei | 2 | -0/+8
In NVMe's error handler, the typical steps of tearing down hardware to recover the controller are:

1) stop blk_mq hw queues
2) stop the real hw queues
3) cancel in-flight requests via blk_mq_tagset_busy_iter(tags, cancel_request, ...), where cancel_request() marks the request as aborted and calls blk_mq_complete_request(req)
4) destroy real hw queues

However, there may be a race between #3 and #4, because blk_mq_complete_request() may run q->mq_ops->complete(rq) remotely and asynchronously, and ->complete(rq) may be run after #4. This patch introduces blk_mq_complete_request_sync() for fixing the above race. Cc: Sagi Grimberg <[email protected]> Cc: Bart Van Assche <[email protected]> Cc: James Smart <[email protected]> Cc: [email protected] Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
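Conceptually, the synchronous variant completes the request in the caller's context instead of punting to another CPU, so the completion cannot outlive the teardown in step #4. A sketch of the shape of the new helper:

  void blk_mq_complete_request_sync(struct request *rq)
  {
          /* complete locally: no remote softirq/IPI hop */
          WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);
          rq->q->mq_ops->complete(rq);
  }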
2019-04-10 | s390/zcrypt: fix possible deadlock situation on ap queue remove | Harald Freudenberger | 1 | -1/+1
With commit 01396a374c3d ("s390/zcrypt: revisit ap device remove procedure") the ap queue remove is now a two stage process. However, a del_timer_sync() call may trigger the timer function, which may try to lock the very same spinlock as is held by the function just initiating the del_timer_sync() call. This could end up in a deadlock situation. Very unlikely but possible, as you need to remove an ap queue at the exact same time a timeout of a request occurs. Signed-off-by: Harald Freudenberger <[email protected]> Reported-by: Pierre Morel <[email protected]> Fixes: commit 01396a374c3d ("s390/zcrypt: revisit ap device remove procedure") Signed-off-by: Martin Schwidefsky <[email protected]>
2019-04-10 | s390/3270: fix lockdep false positive on view->lock | Martin Schwidefsky | 5 | -5/+10
The spinlock in the raw3270_view structure is used by con3270, tty3270 and fs3270 in different ways. For con3270 the lock can be acquired in irq context, for tty3270 and fs3270 the highest context is bh. Lockdep sees the view->lock as a single class and if the 3270 driver is used for the console the following message is generated:

WARNING: inconsistent lock state
5.1.0-rc3-05157-g5c168033979d #12 Not tainted
--------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
swapper/0/1 [HC0[0]:SC1[1]:HE1:SE0] takes:
(____ptrval____) (&(&view->lock)->rlock){?.-.}, at: tty3270_update+0x7c/0x330

Introduce a lockdep subclass for the view lock to distinguish bh from irq locks. Signed-off-by: Martin Schwidefsky <[email protected]>
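The fix amounts to assigning the lock a subclass at init time (sketch; the per-view-type subclass value is the illustrative part):

  spin_lock_init(&view->lock);
  /* con3270 takes view->lock in hard-IRQ context, tty3270/fs3270
   * only in bh: tell lockdep the two usages are distinct classes */
  lockdep_set_subclass(&view->lock, subclass);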
2019-04-10 | scsi: virtio_scsi: limit number of hw queues by nr_cpu_ids | Dongli Zhang | 1 | -0/+1
When tag_set->nr_maps is 1, the block layer limits the number of hw queues by nr_cpu_ids. No matter how many hw queues are used by virtio-scsi, as it has (tag_set->nr_maps == 1), it can use at most nr_cpu_ids hw queues. In addition, specifically for pci scenario, when the 'num_queues' specified by qemu is more than maxcpus, virtio-scsi would not be able to allocate more than maxcpus vectors in order to have a vector for each queue. As a result, it falls back into MSI-X with one vector for config and one shared for queues. Considering above reasons, this patch limits the number of hw queues used by virtio-scsi by nr_cpu_ids. Reviewed-by: Stefan Hajnoczi <[email protected]> Signed-off-by: Dongli Zhang <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-04-10 | virtio-blk: limit number of hw queues by nr_cpu_ids | Dongli Zhang | 1 | -0/+2
When tag_set->nr_maps is 1, the block layer limits the number of hw queues by nr_cpu_ids. No matter how many hw queues are used by virtio-blk, as it has (tag_set->nr_maps == 1), it can use at most nr_cpu_ids hw queues. In addition, specifically for pci scenario, when the 'num-queues' specified by qemu is more than maxcpus, virtio-blk would not be able to allocate more than maxcpus vectors in order to have a vector for each queue. As a result, it falls back into MSI-X with one vector for config and one shared for queues. Considering above reasons, this patch limits the number of hw queues used by virtio-blk by nr_cpu_ids. Reviewed-by: Stefan Hajnoczi <[email protected]> Signed-off-by: Dongli Zhang <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
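For both virtio-blk and virtio-scsi the clamp is essentially a one-liner at queue setup time (sketch):

  /* with tag_set->nr_maps == 1, queues beyond nr_cpu_ids are unusable */
  num_vqs = min_t(unsigned int, nr_cpu_ids, num_vqs);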
2019-04-10 | block, bfq: fix use after free in bfq_bfqq_expire | Paolo Valente | 3 | -11/+23
The function bfq_bfqq_expire() invokes the function __bfq_bfqq_expire(), and the latter may free the in-service bfq-queue. If this happens, then no other instruction of bfq_bfqq_expire() must be executed, or a use-after-free will occur. Based on the assumption that __bfq_bfqq_expire() invokes bfq_put_queue() on the in-service bfq-queue exactly once, the queue is assumed to be freed if its refcounter is equal to one right before invoking __bfq_bfqq_expire(). But, since commit 9dee8b3b057e ("block, bfq: fix queue removal from weights tree"), this assumption is false: __bfq_bfqq_expire() may also invoke bfq_weights_tree_remove() and, since commit 9dee8b3b057e ("block, bfq: fix queue removal from weights tree"), the latter function may invoke bfq_put_queue() as well. So __bfq_bfqq_expire() may invoke bfq_put_queue() twice, and this is the actual case where the in-service queue may happen to be freed. To address this issue, this commit moves the check on the refcounter of the queue right around the last bfq_put_queue() that may be invoked on the queue. Fixes: 9dee8b3b057e ("block, bfq: fix queue removal from weights tree") Reported-by: Dmitrii Tcvetkov <[email protected]> Reported-by: Douglas Anderson <[email protected]> Tested-by: Dmitrii Tcvetkov <[email protected]> Tested-by: Douglas Anderson <[email protected]> Signed-off-by: Paolo Valente <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-04-10 | ALSA: hda: Fix racy display power access | Takashi Iwai | 3 | -2/+6
snd_hdac_display_power() doesn't handle concurrent calls carefully enough, and it may lead to double get_power or put_power calls when a runtime PM and an async work get called in a racy way. This patch addresses it by reusing the bus->lock mutex that has been used for protecting the link state change in the ext bus code, so that it can protect against racy display state changes. The initialization of bus->lock was moved from snd_hdac_ext_bus_init() to snd_hdac_bus_init() accordingly. Testcase: igt/i915_pm_rpm/module-reload #glk-dsi Reported-by: Chris Wilson <[email protected]> Reviewed-by: Chris Wilson <[email protected]> Cc: Imre Deak <[email protected]> Signed-off-by: Takashi Iwai <[email protected]>
2019-04-10 | alarmtimer: Return correct remaining time | Andrei Vagin | 1 | -1/+1
To calculate the remaining time, the current time must be subtracted from the expiration time. In alarm_timer_remaining() the arguments of ktime_sub() are swapped. Fixes: d653d8457c76 ("alarmtimer: Implement remaining callback") Signed-off-by: Andrei Vagin <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Mukesh Ojha <[email protected]> Cc: Stephen Boyd <[email protected]> Cc: John Stultz <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
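The one-line fix in context (sketch of the callback):

  static ktime_t alarm_timer_remaining(struct k_itimer *timr, ktime_t now)
  {
          struct alarm *alarm = &timr->it.alarm.alarmtimer;

          /* remaining = expires - now; the operands were swapped */
          return ktime_sub(alarm->node.expires, now);
  }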
2019-04-10 | locking/lockdep: Zap lock classes even with lock debugging disabled | Bart Van Assche | 1 | -17/+12
The following commit: a0b0fd53e1e6 ("locking/lockdep: Free lock classes that are no longer in use") changed the behavior of lockdep_free_key_range() from unconditionally zapping lock classes into only zapping lock classes if debug_lock == true. Not zapping lock classes if debug_lock == false leaves dangling pointers in several lockdep data structures, e.g. lock_class::name in the all_lock_classes list. The shell command "cat /proc/lockdep" causes the kernel to iterate the all_lock_classes list. Hence the "unable to handle kernel paging request" crash that Shenghui encountered by running cat /proc/lockdep. Since the new behavior can cause cat /proc/lockdep to crash, restore the pre-v5.1 behavior. This patch avoids that cat /proc/lockdep triggers the following crash with debug_lock == false:

BUG: unable to handle kernel paging request at fffffbfff40ca448
RIP: 0010:__asan_load1+0x28/0x50
Call Trace:
 string+0xac/0x180
 vsnprintf+0x23e/0x820
 seq_vprintf+0x82/0xc0
 seq_printf+0x92/0xb0
 print_name+0x34/0xb0
 l_show+0x184/0x200
 seq_read+0x59e/0x6c0
 proc_reg_read+0x11f/0x170
 __vfs_read+0x4d/0x90
 vfs_read+0xc5/0x1f0
 ksys_read+0xab/0x130
 __x64_sys_read+0x43/0x50
 do_syscall_64+0x71/0x210
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Reported-by: shenghui <[email protected]> Signed-off-by: Bart Van Assche <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Fixes: a0b0fd53e1e6 ("locking/lockdep: Free lock classes that are no longer in use") # v5.1-rc1. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>