aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2019-11-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds50-178/+356
Pull networking fixes from David Miller: 1) Validate tunnel options length in act_tunnel_key, from Xin Long. 2) Fix DMA sync bug in gve driver, from Adi Suresh. 3) TSO kills performance on some r8169 chips due to HW issues, disable by default in that case, from Corinna Vinschen. 4) Fix clock disable mismatch in fec driver, from Chubong Yuan. 5) Fix interrupt status bits define in hns3 driver, from Huazhong Tan. 6) Fix workqueue deadlocks in qeth driver, from Julian Wiedmann. 7) Don't napi_disable() twice in r8152 driver, from Hayes Wang. 8) Fix SKB extension memory leak, from Florian Westphal. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (54 commits) r8152: avoid to call napi_disable twice MAINTAINERS: Add myself as maintainer of virtio-vsock udp: drop skb extensions before marking skb stateless net: rtnetlink: prevent underflows in do_setvfinfo() can: m_can_platform: remove unnecessary m_can_class_resume() call can: m_can_platform: set net_device structure as driver data hv_netvsc: Fix send_table offset in case of a host bug hv_netvsc: Fix offset usage in netvsc_send_table() net-ipv6: IPV6_TRANSPARENT - check NET_RAW prior to NET_ADMIN sfc: Only cancel the PPS workqueue if it exists nfc: port100: handle command failure cleanly net-sysfs: fix netdev_queue_add_kobject() breakage r8152: Re-order napi_disable in rtl8152_close net: qca_spi: Move reset_count to struct qcaspi net: qca_spi: fix receive buffer size check net/ibmvnic: Ignore H_FUNCTION return from H_EOI to tolerate XIVE mode Revert "net/ibmvnic: Fix EOI when running in XIVE mode" net/mlxfw: Verify FSM error code translation doesn't exceed array size net/mlx5: Update the list of the PCI supported devices net/mlx5: Fix auto group size calculation ...
2019-11-22afs: Fix large file supportMarc Dionne1-0/+1
By default s_maxbytes is set to MAX_NON_LFS, which limits the usable file size to 2GB, enforced by the vfs. Commit b9b1f8d5930a ("AFS: write support fixes") added support for the 64-bit fetch and store server operations, but did not change this value. As a result, attempts to write past the 2G mark result in EFBIG errors: $ dd if=/dev/zero of=foo bs=1M count=1 seek=2048 dd: error writing 'foo': File too large Set s_maxbytes to MAX_LFS_FILESIZE. Fixes: b9b1f8d5930a ("AFS: write support fixes") Signed-off-by: Marc Dionne <[email protected]> Signed-off-by: David Howells <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-22afs: Fix possible assert with callbacks from yfs serversMarc Dionne1-1/+0
Servers sending callback breaks to the YFS_CM_SERVICE service may send up to YFSCBMAX (1024) fids in a single RPC. Anything over AFSCBMAX (50) will cause the assert in afs_break_callbacks to trigger. Remove the assert, as the count has already been checked against the appropriate max values in afs_deliver_cb_callback and afs_deliver_yfs_cb_callback. Fixes: 35dbfba3111a ("afs: Implement the YFS cache manager service") Signed-off-by: Marc Dionne <[email protected]> Signed-off-by: David Howells <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-22MIPS: Ingenic: Disable abandoned HPTLB function.Zhou Yanjie2-2/+25
JZ4760/JZ4770/JZ4775/X1000/X1500 has an abandoned huge page tlb, this mode is not compatible with the MIPS standard, it will cause tlbmiss and into an infinite loop (line 21 in the tlb-funcs.S) when starting the init process. write 0xa9000000 to cp0 register 5 sel 4 to disable this function to prevent getting stuck. Confirmed by Ingenic, this operation will not adversely affect processors without HPTLB function. Signed-off-by: Zhou Yanjie <[email protected]> Acked-by: Paul Cercueil <[email protected]> Signed-off-by: Paul Burton <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected]
2019-11-22MIPS: PCI: remember nasid changed by set interrupt affinityThomas Bogendoerfer1-3/+2
When changing interrupt affinity remember the possible changed nasid, otherwise an interrupt deactivate/activate sequence will incorrectly setup interrupt. Fixes: e6308b6d35ea ("MIPS: SGI-IP27: abstract chipset irq from bridge") Signed-off-by: Thomas Bogendoerfer <[email protected]> Signed-off-by: Paul Burton <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: James Hogan <[email protected]> Cc: [email protected] Cc: [email protected]
2019-11-22MIPS: SGI-IP27: Fix crash, when CPUs are disabled via nr_cpus parameterThomas Bogendoerfer1-0/+4
If number of CPUs are limited by the kernel commandline parameter nr_cpus assignment of interrupts accourding to numa rules might not be possibe. As a fallback use one of the online CPUs as interrupt destination. Fixes: 69a07a41d908 ("MIPS: SGI-IP27: rework HUB interrupts") Signed-off-by: Thomas Bogendoerfer <[email protected]> Signed-off-by: Paul Burton <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: James Hogan <[email protected]> Cc: [email protected] Cc: [email protected]
2019-11-22mips: add support for folded p4d page tablesMike Rapoport14-39/+75
Implement primitives necessary for the 4th level folding, add walks of p4d level where appropriate, replace 5leve-fixup.h with pgtable-nop4d.h and drop usage of __ARCH_USE_5LEVEL_HACK. Signed-off-by: Mike Rapoport <[email protected]> Signed-off-by: Paul Burton <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: James Hogan <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Mike Rapoport <[email protected]>
2019-11-22mips: drop __pXd_offset() macros that duplicate pXd_index() onesMike Rapoport6-25/+18
The __pXd_offset() macros are identical to the pXd_index() macros and there is no point to keep both of them. All architectures define and use pXd_index() so let's keep only those to make mips consistent with the rest of the kernel. Signed-off-by: Mike Rapoport <[email protected]> Signed-off-by: Paul Burton <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: James Hogan <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Mike Rapoport <[email protected]>
2019-11-22mips: fix build when "48 bits virtual memory" is enabledMike Rapoport1-2/+7
With CONFIG_MIPS_VA_BITS_48=y the build fails miserably: CC arch/mips/kernel/asm-offsets.s In file included from arch/mips/include/asm/pgtable.h:644, from include/linux/mm.h:99, from arch/mips/kernel/asm-offsets.c:15: include/asm-generic/pgtable.h:16:2: error: #error CONFIG_PGTABLE_LEVELS is not consistent with __PAGETABLE_{P4D,PUD,PMD}_FOLDED #error CONFIG_PGTABLE_LEVELS is not consistent with __PAGETABLE_{P4D,PUD,PMD}_FOLDED ^~~~~ include/asm-generic/pgtable.h:390:28: error: unknown type name 'p4d_t'; did you mean 'pmd_t'? static inline int p4d_same(p4d_t p4d_a, p4d_t p4d_b) ^~~~~ pmd_t [ ... more such errors ... ] scripts/Makefile.build:99: recipe for target 'arch/mips/kernel/asm-offsets.s' failed make[2]: *** [arch/mips/kernel/asm-offsets.s] Error 1 This happens because when CONFIG_MIPS_VA_BITS_48 enables 4th level of the page tables, but neither pgtable-nop4d.h nor 5level-fixup.h are included to cope with the 5th level. Replace #ifdef conditions around includes of the pgtable-nop{m,u}d.h with explicit CONFIG_PGTABLE_LEVELS and add include of 5level-fixup.h for the case when CONFIG_PGTABLE_LEVELS==4 Signed-off-by: Mike Rapoport <[email protected]> Signed-off-by: Paul Burton <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: James Hogan <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Mike Rapoport <[email protected]>
2019-11-22r8152: avoid to call napi_disable twiceHayes Wang1-8/+20
Call napi_disable() twice would cause dead lock. There are three situations may result in the issue. 1. rtl8152_pre_reset() and set_carrier() are run at the same time. 2. Call rtl8152_set_tunable() after rtl8152_close(). 3. Call rtl8152_set_ringparam() after rtl8152_close(). For #1, use the same solution as commit 84811412464d ("r8152: Re-order napi_disable in rtl8152_close"). For #2 and #3, add checking the flag of IFF_UP and using napi_disable/napi_enable during mutex. Signed-off-by: Hayes Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-22Merge branch 'akpm' (patches from Andrew)Linus Torvalds3-33/+53
Merge misc fixes from Andrew Morton: "Three fixes" * emailed patches from Andrew Morton <[email protected]>: mm/ksm.c: don't WARN if page is still mapped in remove_stable_node() mm/memory_hotplug: don't access uninitialized memmaps in shrink_zone_span() Revert "fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry()"
2019-11-22Merge tag 'linux-can-fixes-for-5.4-20191122' of ↵David S. Miller1-3/+1
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2019-11-22 this is a pull request of 2 patches for net/master, if possible for the current release cycle. Otherwise these patches should hit v5.4 via the stable tree. Both patches of this pull request target the m_can driver. Pankaj Sharma fixes the fallout in the m_can_platform part, which appeared with the introduction of the m_can platform framework. ==================== Signed-off-by: David S. Miller <[email protected]>
2019-11-22MAINTAINERS: Add myself as maintainer of virtio-vsockStefano Garzarella1-0/+1
Since I'm actively working on vsock and virtio/vhost transports, Stefan suggested to help him to maintain it. Signed-off-by: Stefano Garzarella <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-22udp: drop skb extensions before marking skb statelessFlorian Westphal2-5/+28
Once udp stack has set the UDP_SKB_IS_STATELESS flag, later skb free assumes all skb head state has been dropped already. This will leak the extension memory in case the skb has extensions other than the ipsec secpath, e.g. bridge nf data. To fix this, set the UDP_SKB_IS_STATELESS flag only if we don't have extensions or if the extension space can be free'd. Fixes: 895b5c9f206eb7d25dc1360a ("netfilter: drop bridge nf reset from nf_reset") Cc: Paolo Abeni <[email protected]> Reported-by: Byron Stanoszek <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Acked-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-22net: rtnetlink: prevent underflows in do_setvfinfo()Dan Carpenter1-1/+22
The "ivm->vf" variable is a u32, but the problem is that a number of drivers cast it to an int and then forget to check for negatives. An example of this is in the cxgb4 driver. drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c 2890 static int cxgb4_mgmt_get_vf_config(struct net_device *dev, 2891 int vf, struct ifla_vf_info *ivi) ^^^^^^ 2892 { 2893 struct port_info *pi = netdev_priv(dev); 2894 struct adapter *adap = pi->adapter; 2895 struct vf_info *vfinfo; 2896 2897 if (vf >= adap->num_vfs) ^^^^^^^^^^^^^^^^^^^ 2898 return -EINVAL; 2899 vfinfo = &adap->vfinfo[vf]; ^^^^^^^^^^^^^^^^^^^^^^^^^^ There are 48 functions affected. drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c:8435 hclge_set_vf_vlan_filter() warn: can 'vfid' underflow 's32min-2147483646' drivers/net/ethernet/freescale/enetc/enetc_pf.c:377 enetc_pf_set_vf_mac() warn: can 'vf' underflow 's32min-2147483646' drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:2899 cxgb4_mgmt_get_vf_config() warn: can 'vf' underflow 's32min-254' drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:2960 cxgb4_mgmt_set_vf_rate() warn: can 'vf' underflow 's32min-254' drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:3019 cxgb4_mgmt_set_vf_rate() warn: can 'vf' underflow 's32min-254' drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:3038 cxgb4_mgmt_set_vf_vlan() warn: can 'vf' underflow 's32min-254' drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:3086 cxgb4_mgmt_set_vf_link_state() warn: can 'vf' underflow 's32min-254' drivers/net/ethernet/chelsio/cxgb/cxgb2.c:791 get_eeprom() warn: can 'i' underflow 's32min-(-4),0,4-s32max' drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c:82 bnxt_set_vf_spoofchk() warn: can 'vf_id' underflow 's32min-65534' drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c:164 bnxt_set_vf_trust() warn: can 'vf_id' underflow 's32min-65534' drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c:186 bnxt_get_vf_config() warn: can 'vf_id' underflow 's32min-65534' drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c:228 bnxt_set_vf_mac() warn: can 'vf_id' underflow 's32min-65534' drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c:264 bnxt_set_vf_vlan() warn: can 'vf_id' underflow 's32min-65534' drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c:293 bnxt_set_vf_bw() warn: can 'vf_id' underflow 's32min-65534' drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c:333 bnxt_set_vf_link_state() warn: can 'vf_id' underflow 's32min-65534' drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c:2595 bnx2x_vf_op_prep() warn: can 'vfidx' underflow 's32min-63' drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c:2595 bnx2x_vf_op_prep() warn: can 'vfidx' underflow 's32min-63' drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c:2281 bnx2x_post_vf_bulletin() warn: can 'vf' underflow 's32min-63' drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c:2285 bnx2x_post_vf_bulletin() warn: can 'vf' underflow 's32min-63' drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c:2286 bnx2x_post_vf_bulletin() warn: can 'vf' underflow 's32min-63' drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c:2292 bnx2x_post_vf_bulletin() warn: can 'vf' underflow 's32min-63' drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c:2297 bnx2x_post_vf_bulletin() warn: can 'vf' underflow 's32min-63' drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c:1832 qlcnic_sriov_set_vf_mac() warn: can 'vf' underflow 's32min-254' drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c:1864 qlcnic_sriov_set_vf_tx_rate() warn: can 'vf' underflow 's32min-254' drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c:1937 qlcnic_sriov_set_vf_vlan() warn: can 'vf' underflow 's32min-254' drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c:2005 qlcnic_sriov_get_vf_config() warn: can 'vf' underflow 's32min-254' drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c:2036 qlcnic_sriov_set_vf_spoofchk() warn: can 'vf' underflow 's32min-254' drivers/net/ethernet/emulex/benet/be_main.c:1914 be_get_vf_config() warn: can 'vf' underflow 's32min-65534' drivers/net/ethernet/emulex/benet/be_main.c:1915 be_get_vf_config() warn: can 'vf' underflow 's32min-65534' drivers/net/ethernet/emulex/benet/be_main.c:1922 be_set_vf_tvt() warn: can 'vf' underflow 's32min-65534' drivers/net/ethernet/emulex/benet/be_main.c:1951 be_clear_vf_tvt() warn: can 'vf' underflow 's32min-65534' drivers/net/ethernet/emulex/benet/be_main.c:2063 be_set_vf_tx_rate() warn: can 'vf' underflow 's32min-65534' drivers/net/ethernet/emulex/benet/be_main.c:2091 be_set_vf_link_state() warn: can 'vf' underflow 's32min-65534' drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c:2609 ice_set_vf_port_vlan() warn: can 'vf_id' underflow 's32min-65534' drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c:3050 ice_get_vf_cfg() warn: can 'vf_id' underflow 's32min-65534' drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c:3103 ice_set_vf_spoofchk() warn: can 'vf_id' underflow 's32min-65534' drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c:3181 ice_set_vf_mac() warn: can 'vf_id' underflow 's32min-65534' drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c:3237 ice_set_vf_trust() warn: can 'vf_id' underflow 's32min-65534' drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c:3286 ice_set_vf_link_state() warn: can 'vf_id' underflow 's32min-65534' drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:3919 i40e_validate_vf() warn: can 'vf_id' underflow 's32min-2147483646' drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:3957 i40e_ndo_set_vf_mac() warn: can 'vf_id' underflow 's32min-2147483646' drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:4104 i40e_ndo_set_vf_port_vlan() warn: can 'vf_id' underflow 's32min-2147483646' drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:4263 i40e_ndo_set_vf_bw() warn: can 'vf_id' underflow 's32min-2147483646' drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:4309 i40e_ndo_get_vf_config() warn: can 'vf_id' underflow 's32min-2147483646' drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:4371 i40e_ndo_set_vf_link_state() warn: can 'vf_id' underflow 's32min-2147483646' drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:4441 i40e_ndo_set_vf_spoofchk() warn: can 'vf_id' underflow 's32min-2147483646' drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:4441 i40e_ndo_set_vf_spoofchk() warn: can 'vf_id' underflow 's32min-2147483646' drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:4504 i40e_ndo_set_vf_trust() warn: can 'vf_id' underflow 's32min-2147483646' Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-22Merge tag 'pm-5.4-final' of ↵Linus Torvalds1-1/+7
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management regression fix from Rafael Wysocki: "Fix problems with switching cpufreq drivers on some x86 systems with ACPI (and with changing the operation modes of the intel_pstate driver on those systems) introduced by recent changes related to the management of frequency limits in cpufreq" * tag 'pm-5.4-final' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM: QoS: Invalidate frequency QoS requests after removal
2019-11-22Merge tag 'drm-fixes-2019-11-22' of git://anongit.freedesktop.org/drm/drmLinus Torvalds16-32/+183
Pull drm fixes from Dave Airlie: "Two sets of fixes in here, one for amdgpu, and one for i915. The amdgpu ones are pretty small, i915's CI system seems to have a few problems in the last week or so, there is one major regression fix for fb_mmap, but there are a bunch of other issues fixed in there as well, oops, screen flashes and rcu related. amdgpu: - Remove experimental flag for navi14 - Fix confusing power message failures on older VI parts - Hang fix for gfxoff when using the read register interface - Two stability regression fixes for Raven i915: - Fix kernel oops on dumb_create ioctl on no crtc situation - Fix bad ugly colored flash on VLV/CHV related to gamma LUT update - Fix unity of the frequencies reported on PMU - Fix kernel oops on set_page_dirty using better locks around it - Protect the request pointer with RCU to prevent it being freed while we might need still - Make pool objects read-only - Restore physical addresses for fb_map to avoid corrupted page table" * tag 'drm-fixes-2019-11-22' of git://anongit.freedesktop.org/drm/drm: drm/i915/fbdev: Restore physical addresses for fb_mmap() Revert "drm/amd/display: enable S/G for RAVEN chip" drm/amdgpu: disable gfxoff on original raven drm/amdgpu: disable gfxoff when using register read interface drm/amd/powerplay: correct fine grained dpm force level setting drm/amd/powerplay: issue no PPSMC_MSG_GetCurrPkgPwr on unsupported ASICs drm/amdgpu: remove experimental flag for Navi14 drm/i915: make pool objects read-only drm/i915: Protect request peeking with RCU drm/i915/userptr: Try to acquire the page lock around set_page_dirty() drm/i915/pmu: "Frequency" is reported as accumulated cycles drm/i915: Preload LUTs if the hw isn't currently using them drm/i915: Don't oops in dumb_create ioctl if we have no crtcs
2019-11-22mm/ksm.c: don't WARN if page is still mapped in remove_stable_node()Andrey Ryabinin1-7/+7
It's possible to hit the WARN_ON_ONCE(page_mapped(page)) in remove_stable_node() when it races with __mmput() and squeezes in between ksm_exit() and exit_mmap(). WARNING: CPU: 0 PID: 3295 at mm/ksm.c:888 remove_stable_node+0x10c/0x150 Call Trace: remove_all_stable_nodes+0x12b/0x330 run_store+0x4ef/0x7b0 kernfs_fop_write+0x200/0x420 vfs_write+0x154/0x450 ksys_write+0xf9/0x1d0 do_syscall_64+0x99/0x510 entry_SYSCALL_64_after_hwframe+0x49/0xbe Remove the warning as there is nothing scary going on. Link: http://lkml.kernel.org/r/[email protected] Fixes: cbf86cfe04a6 ("ksm: remove old stable nodes more thoroughly") Signed-off-by: Andrey Ryabinin <[email protected]> Acked-by: Hugh Dickins <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-22mm/memory_hotplug: don't access uninitialized memmaps in shrink_zone_span()David Hildenbrand1-3/+13
Let's limit shrinking to !ZONE_DEVICE so we can fix the current code. We should never try to touch the memmap of offline sections where we could have uninitialized memmaps and could trigger BUGs when calling page_to_nid() on poisoned pages. There is no reliable way to distinguish an uninitialized memmap from an initialized memmap that belongs to ZONE_DEVICE, as we don't have anything like SECTION_IS_ONLINE we can use similar to pfn_to_online_section() for !ZONE_DEVICE memory. E.g., set_zone_contiguous() similarly relies on pfn_to_online_section() and will therefore never set a ZONE_DEVICE zone consecutive. Stopping to shrink the ZONE_DEVICE therefore results in no observable changes, besides /proc/zoneinfo indicating different boundaries - something we can totally live with. Before commit d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug"), the memmap was initialized with 0 and the node with the right value. So the zone might be wrong but not garbage. After that commit, both the zone and the node will be garbage when touching uninitialized memmaps. Toshiki reported a BUG (race between delayed initialization of ZONE_DEVICE memmaps without holding the memory hotplug lock and concurrent zone shrinking). https://lkml.org/lkml/2019/11/14/1040 "Iteration of create and destroy namespace causes the panic as below: kernel BUG at mm/page_alloc.c:535! CPU: 7 PID: 2766 Comm: ndctl Not tainted 5.4.0-rc4 #6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014 RIP: 0010:set_pfnblock_flags_mask+0x95/0xf0 Call Trace: memmap_init_zone_device+0x165/0x17c memremap_pages+0x4c1/0x540 devm_memremap_pages+0x1d/0x60 pmem_attach_disk+0x16b/0x600 [nd_pmem] nvdimm_bus_probe+0x69/0x1c0 really_probe+0x1c2/0x3e0 driver_probe_device+0xb4/0x100 device_driver_attach+0x4f/0x60 bind_store+0xc9/0x110 kernfs_fop_write+0x116/0x190 vfs_write+0xa5/0x1a0 ksys_write+0x59/0xd0 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 While creating a namespace and initializing memmap, if you destroy the namespace and shrink the zone, it will initialize the memmap outside the zone and trigger VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page) in set_pfnblock_flags_mask()." This BUG is also mitigated by this commit, where we for now stop to shrink the ZONE_DEVICE zone until we can do it in a safe and clean way. Link: http://lkml.kernel.org/r/[email protected] Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319] Signed-off-by: David Hildenbrand <[email protected]> Reported-by: Aneesh Kumar K.V <[email protected]> Reported-by: Toshiki Fukasawa <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Dan Williams <[email protected]> Cc: Alexander Duyck <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Damian Tometzki <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Gerald Schaefer <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Halil Pasic <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jun Yao <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Qian Cai <[email protected]> Cc: Rich Felker <[email protected]> Cc: Robin Murphy <[email protected]> Cc: Steve Capper <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tom Lendacky <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Wei Yang <[email protected]> Cc: Wei Yang <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: Yu Zhao <[email protected]> Cc: <[email protected]> [4.13+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-22Revert "fs: ocfs2: fix possible null-pointer dereferences in ↵Joseph Qi1-23/+33
ocfs2_xa_prepare_entry()" This reverts commit 56e94ea132bb5c2c1d0b60a6aeb34dcb7d71a53d. Commit 56e94ea132bb ("fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry()") introduces a regression that fail to create directory with mount option user_xattr and acl. Actually the reported NULL pointer dereference case can be correctly handled by loc->xl_ops->xlo_add_entry(), so revert it. Link: http://lkml.kernel.org/r/[email protected] Fixes: 56e94ea132bb ("fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry()") Signed-off-by: Joseph Qi <[email protected]> Reported-by: Thomas Voegtle <[email protected]> Acked-by: Changwei Ge <[email protected]> Cc: Jia-Ju Bai <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-22can: m_can_platform: remove unnecessary m_can_class_resume() callPankaj Sharma1-2/+0
The function m_can_runtime_resume() is getting recursively called from m_can_class_resume(). This results in a lock up. We need not call m_can_class_resume() during m_can_runtime_resume(). Fixes: f524f829b75a ("can: m_can: Create a m_can platform framework") Signed-off-by: Pankaj Sharma <[email protected]> Signed-off-by: Sriram Dash <[email protected]> Acked-by: Dan Murphy <[email protected]> Signed-off-by: Marc Kleine-Budde <[email protected]>
2019-11-22can: m_can_platform: set net_device structure as driver dataPankaj Sharma1-1/+1
The current code is failing during clock prepare enable because of not getting proper clock from platform device. [ 0.852089] Call trace: [ 0.854516] 0xffff0000fa22a668 [ 0.857638] clk_prepare+0x20/0x34 [ 0.861019] m_can_runtime_resume+0x2c/0xe4 [ 0.865180] pm_generic_runtime_resume+0x28/0x38 [ 0.869770] __rpm_callback+0x16c/0x1bc [ 0.873583] rpm_callback+0x24/0x78 [ 0.877050] rpm_resume+0x428/0x560 [ 0.880517] __pm_runtime_resume+0x7c/0xa8 [ 0.884593] m_can_clk_start.isra.9.part.10+0x1c/0xa8 [ 0.889618] m_can_class_register+0x138/0x370 [ 0.893950] m_can_plat_probe+0x120/0x170 [ 0.897939] platform_drv_probe+0x4c/0xa0 [ 0.901924] really_probe+0xd8/0x31c [ 0.905477] driver_probe_device+0x58/0xe8 [ 0.909551] device_driver_attach+0x68/0x70 [ 0.913711] __driver_attach+0x9c/0xf8 [ 0.917437] bus_for_each_dev+0x50/0xa0 [ 0.921251] driver_attach+0x20/0x28 [ 0.924804] bus_add_driver+0x148/0x1fc [ 0.928617] driver_register+0x6c/0x124 [ 0.932431] __platform_driver_register+0x48/0x50 [ 0.937113] m_can_plat_driver_init+0x18/0x20 [ 0.941446] do_one_initcall+0x4c/0x19c [ 0.945259] kernel_init_freeable+0x1d0/0x280 [ 0.949591] kernel_init+0x10/0x100 [ 0.953057] ret_from_fork+0x10/0x18 [ 0.956614] Code: 00000000 00000000 00000000 00000000 (fa22a668) [ 0.962681] ---[ end trace 881f71bd609de763 ]--- [ 0.967301] Kernel panic - not syncing: Attempted to kill init! A device driver for CAN controller hardware registers itself with the Linux network layer as a network device. So, the driver data for m_can should ideally be of type net_device. Fixes: f524f829b75a ("can: m_can: Create a m_can platform framework") Signed-off-by: Pankaj Sharma <[email protected]> Signed-off-by: Sriram Dash <[email protected]> Acked-by: Dan Murphy <[email protected]> Signed-off-by: Marc Kleine-Budde <[email protected]>
2019-11-22EDAC/altera: Use the Altera System Manager driverThor Thayer1-124/+8
Simplify by using the Altera System Manager driver that abstracts the differences between ARM32 and ARM64. Also allows the removal of the Arria10 test function since this is handled by the System Manager driver. Signed-off-by: Thor Thayer <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Cc: James Morse <[email protected]> Cc: linux-edac <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: [email protected] Cc: Robert Richter <[email protected]> Cc: Tony Luck <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-11-22EDAC/altera: Cleanup the ECC ManagerThor Thayer1-20/+1
Cleanup the ECC Manager peripheral test in probe function as suggested by James. Remove the check for Stratix10. Suggested-by: James Morse <[email protected]> Signed-off-by: Thor Thayer <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Cc: linux-edac <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Robert Richter <[email protected]> Cc: Tony Luck <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-11-22EDAC/altera: Use fast register IO for S10 IRQsMeng Li1-0/+1
When an IRQ occurs, regmap_{read,write,...}() is invoked in atomic context. Regmap must indicate register IO is fast so that a spinlock is used instead of a mutex to avoid sleeping in atomic context: lock_acquire __mutex_lock mutex_lock_nested regmap_lock_mutex regmap_write a10_eccmgr_irq_unmask unmask_irq.part.0 irq_enable __irq_startup irq_startup __setup_irq request_threaded_irq devm_request_threaded_irq altr_sdram_probe Mark it so. [ bp: Massage. ] Fixes: 3dab6bd52687 ("EDAC, altera: Add support for Stratix10 SDRAM EDAC") Reported-by: Meng Li <[email protected]> Signed-off-by: Meng Li <[email protected]> Signed-off-by: Thor Thayer <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Cc: James Morse <[email protected]> Cc: linux-edac <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Robert Richter <[email protected]> Cc: stable <[email protected]> Cc: Tony Luck <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-11-22EDAC/ghes: Do not warn when incrementing refcount on 0Robert Richter1-2/+2
The following warning from the refcount framework is seen during ghes initialization: EDAC MC0: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT) ------------[ cut here ]------------ refcount_t: increment on 0; use-after-free. WARNING: CPU: 36 PID: 1 at lib/refcount.c:156 refcount_inc_checked [...] Call trace: refcount_inc_checked ghes_edac_register ghes_probe ... It warns if the refcount is incremented from zero. This warning is reasonable as a kernel object is typically created with a refcount of one and freed once the refcount is zero. Afterwards the object would be "used-after-free". For GHES, the refcount is initialized with zero, and that is why this message is seen when initializing the first instance. However, whenever the refcount is zero, the device will be allocated and registered. Since the ghes_reg_mutex protects the refcount and serializes allocation and freeing of ghes devices, a use-after-free cannot happen here. Instead of using refcount_inc() for the first instance, use refcount_set(). This can be used here because the refcount is zero at this point and can not change due to its protection by the mutex. Fixes: 23f61b9fc5cc ("EDAC/ghes: Fix locking and memory barrier issues") Reported-by: John Garry <[email protected]> Signed-off-by: Robert Richter <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Tested-by: John Garry <[email protected]> Cc: <[email protected]> Cc: James Morse <[email protected]> Cc: <[email protected]> Cc: linux-edac <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: <[email protected]> Cc: Tony Luck <[email protected]> Cc: <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-11-21Merge branch 'hv_netvsc-Fix-send-indirection-table-offset'David S. Miller2-9/+32
Haiyang Zhang says: ==================== hv_netvsc: Fix send indirection table offset Fix send indirection table offset issues related to guest and host bugs. ==================== Signed-off-by: David S. Miller <[email protected]>
2019-11-21hv_netvsc: Fix send_table offset in case of a host bugHaiyang Zhang1-2/+14
If negotiated NVSP version <= NVSP_PROTOCOL_VERSION_6, the offset may be wrong (too small) due to a host bug. This can cause missing the end of the send indirection table, and add multiple zero entries from leading zeros before the data region. This bug adds extra burden on channel 0. So fix the offset by computing it from the data structure sizes. This will ensure netvsc driver runs normally on unfixed hosts, and future fixed hosts. Fixes: 5b54dac856cb ("hyperv: Add support for virtual Receive Side Scaling (vRSS)") Signed-off-by: Haiyang Zhang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-21hv_netvsc: Fix offset usage in netvsc_send_table()Haiyang Zhang2-9/+20
To reach the data region, the existing code adds offset in struct nvsp_5_send_indirect_table on the beginning of this struct. But the offset should be based on the beginning of its container, struct nvsp_message. This bug causes the first table entry missing, and adds an extra zero from the zero pad after the data region. This can put extra burden on the channel 0. So, correct the offset usage. Also add a boundary check to ensure not reading beyond data region. Fixes: 5b54dac856cb ("hyperv: Add support for virtual Receive Side Scaling (vRSS)") Signed-off-by: Haiyang Zhang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-21net-ipv6: IPV6_TRANSPARENT - check NET_RAW prior to NET_ADMINMaciej Żenczykowski1-2/+2
NET_RAW is less dangerous, so more likely to be available to a process, so check it first to prevent some spurious logging. This matches IP_TRANSPARENT which checks NET_RAW first. Signed-off-by: Maciej Żenczykowski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-22Merge tag 'drm-intel-fixes-2019-11-21' of ↵Dave Airlie9-18/+141
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes - Fix kernel oops on dumb_create ioctl on no crtc situation - Fix bad ugly colored flash on VLV/CHV related to gamma LUT update - Fix unity of the frequencies reported on PMU - Fix kernel oops on set_page_dirty using better locks around it - Protect the request pointer with RCU to prevent it being freed while we might need still - Make pool objects read-only - Restore physical addresses for fb_map to avoid corrupted page table Signed-off-by: Dave Airlie <[email protected]> From: Rodrigo Vivi <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2019-11-21Merge tag 'arm64-fixes' of ↵Linus Torvalds7-31/+27
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Will Deacon: "Ensure PAN is re-enabled following user fault in uaccess routines. After I thought we were done for 5.4, we had a report this week of a nasty issue that has been shown to leak data between different user address spaces thanks to corruption of entries in the TLB. In hindsight, we should have spotted this in review when the PAN code was merged back in v4.3, but hindsight is 20/20 and I'm trying not to beat myself up too much about it despite being fairly miserable. Anyway, the fix is "obvious" but the actual failure is more more subtle, and is described in the commit message. I've included a fairly mechanical follow-up patch here as well, which moves this checking out into the C wrappers which is what we do for {get,put}_user() already and allows us to remove these bloody assembly macros entirely. The patches have passed kernelci [1] [2] [3] and CKI [4] tests over night, as well as some targetted testing [5] for this particular issue. The first patch is tagged for stable and should be applied to 4.14, 4.19 and 5.3. I have separate backports for 4.4 and 4.9, which I'll send out once this has landed in your tree (although the original patch applies cleanly, it won't build for those two trees). Thanks to Pavel Tatashin for reporting this and Mark Rutland for helping to diagnose the issue and review/test the solution" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: uaccess: Remove uaccess_*_not_uao asm macros arm64: uaccess: Ensure PAN is re-enabled after unhandled uaccess fault
2019-11-21sfc: Only cancel the PPS workqueue if it existsMartin Habets1-1/+2
The workqueue only exists for the primary PF. For other functions we hit a WARN_ON in kernel/workqueue.c. Fixes: 7c236c43b838 ("sfc: Add support for IEEE-1588 PTP") Signed-off-by: Martin Habets <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-21Merge tag 'for-linus-20191121' of git://git.kernel.dk/linux-blockLinus Torvalds1-0/+1
Pull block fix from Jens Axboe: "Just a single fix for an issue in nbd introduced in this cycle" * tag 'for-linus-20191121' of git://git.kernel.dk/linux-block: nbd:fix memory leak in nbd_get_socket()
2019-11-21Merge tag 'gpio-v5.4-5' of ↵Linus Torvalds5-9/+31
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO fixes from Linus Walleij: "A last set of small fixes for GPIO, this cycle was quite busy. - Fix debounce delays on the MAX77620 GPIO expander - Use the correct unit for debounce times on the BD70528 GPIO expander - Get proper deps for parallel builds of the GPIO tools - Add a specific ACPI quirk for the Terra Pad 1061" * tag 'gpio-v5.4-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: gpiolib: acpi: Add Terra Pad 1061 to the run_edge_events_on_boot_blacklist tools: gpio: Correctly add make dependencies for gpio_utils gpio: bd70528: Use correct unit for debounce times gpio: max77620: Fixup debounce delays
2019-11-21Merge tag 'for-linus-2019-11-21' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull pidfd fixlet from Christian Brauner: "This contains a simple fix for the pidfd poll method. In the original patchset pidfd_poll() was made to return an unsigned int. However, the poll method is defined to return a __poll_t. While the unsigned int is not a huge deal it's just nicer to return a __poll_t. I've decided to send it right before the 5.4 release mainly so that stable doesn't need to backport it to both 5.4 and 5.3" * tag 'for-linus-2019-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: fork: fix pidfd_poll()'s return type
2019-11-21nfc: port100: handle command failure cleanlyOliver Neukum1-1/+1
If starting the transfer of a command suceeds but the transfer for the reply fails, it is not enough to initiate killing the transfer for the command may still be running. You need to wait for the killing to finish before you can reuse URB and buffer. Reported-and-tested-by: [email protected] Signed-off-by: Oliver Neukum <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-21nbd: prevent memory leakNavid Emamdoost1-2/+3
In nbd_add_socket when krealloc succeeds, if nsock's allocation fail the reallocted memory is leak. The correct behaviour should be assigning the reallocted memory to config->socks right after success. Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Navid Emamdoost <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-11-21Merge branch 'nvme-5.5' of git://git.infradead.org/nvme into ↵Jens Axboe8-4/+307
for-5.5/drivers-post Pull NVMe changes from Keith: "- The only new feature is the optional hwmon support for nvme (Guenter and Akinobu) - A universal work-around for controllers reading discard payloads beyond the range boundary (Eduard) - Chaitanya graciously agreed to share the target driver maintenance" * 'nvme-5.5' of git://git.infradead.org/nvme: nvme: hwmon: add quirk to avoid changing temperature threshold nvme: hwmon: provide temperature min and max values for each sensor nvmet: add another maintainer nvme: Discard workaround for non-conformant devices nvme: Add hardware monitoring support
2019-11-22nvme: hwmon: add quirk to avoid changing temperature thresholdAkinobu Mita3-2/+12
This adds a new quirk NVME_QUIRK_NO_TEMP_THRESH_CHANGE to avoid changing the value of the temperature threshold feature for specific devices that show undesirable behavior. Guenter reported: "On my Intel NVME drive (SSDPEKKW512G7), writing any minimum limit on the Composite temperature sensor results in a temperature warning, and that warning is sticky until I reset the controller. It doesn't seem to matter which temperature I write; writing -273000 has the same result." The Intel NVMe has the latest firmware version installed, so this isn't a problem that was ever fixed. Reported-by: Guenter Roeck <[email protected]> Cc: Keith Busch <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Sagi Grimberg <[email protected]> Cc: Jean Delvare <[email protected]> Reviewed-by: Guenter Roeck <[email protected]> Tested-by: Guenter Roeck <[email protected]> Signed-off-by: Akinobu Mita <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2019-11-22nvme: hwmon: provide temperature min and max values for each sensorAkinobu Mita2-16/+96
According to the NVMe specification, the over temperature threshold and under temperature threshold features shall be implemented for Composite Temperature if a non-zero WCTEMP field value is reported in the Identify Controller data structure. The features are also implemented for all implemented temperature sensors (i.e., all Temperature Sensor fields that report a non-zero value). This provides the over temperature threshold and under temperature threshold for each sensor as temperature min and max values of hwmon sysfs attributes. The WCTEMP is already provided as a temperature max value for Composite Temperature, but this change isn't incompatible. Because the default value of the over temperature threshold for Composite Temperature is the WCTEMP. Now the alarm attribute for Composite Temperature indicates one of the temperature is outside of a temperature threshold. Because there is only a single bit in Critical Warning field that indicates a temperature is outside of a threshold. Example output from the "sensors" command: nvme-pci-0100 Adapter: PCI adapter Composite: +33.9°C (low = -273.1°C, high = +69.8°C) (crit = +79.8°C) Sensor 1: +34.9°C (low = -273.1°C, high = +65261.8°C) Sensor 2: +31.9°C (low = -273.1°C, high = +65261.8°C) Sensor 5: +47.9°C (low = -273.1°C, high = +65261.8°C) This also adds helper macros for kelvin from/to milli Celsius conversion, and replaces the repeated code in hwmon.c. Cc: Keith Busch <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Sagi Grimberg <[email protected]> Cc: Jean Delvare <[email protected]> Reviewed-by: Guenter Roeck <[email protected]> Tested-by: Guenter Roeck <[email protected]> Signed-off-by: Akinobu Mita <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2019-11-22nvmet: add another maintainerChristoph Hellwig1-0/+1
Sagi and I have been pretty busy lately, and Chaitanya has been helping a lot with target work and agreed to share the load. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2019-11-21Revert "block: split bio if the only bvec's length is > SZ_4K"Jens Axboe1-1/+1
We really don't need this, as the slow path will do the right thing anyway. This reverts commit 6952a7f8446ee85ea9d10ab87b64797a031eaae3. Signed-off-by: Jens Axboe <[email protected]>
2019-11-21block: add iostat counters for flush requestsKonstantin Khlebnikov8-7/+58
Requests that triggers flushing volatile writeback cache to disk (barriers) have significant effect to overall performance. Block layer has sophisticated engine for combining several flush requests into one. But there is no statistics for actual flushes executed by disk. Requests which trigger flushes usually are barriers - zero-size writes. This patch adds two iostat counters into /sys/class/block/$dev/stat and /proc/diskstats - count of completed flush requests and their total time. Signed-off-by: Konstantin Khlebnikov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-11-21KVM: x86: create mmu/ subdirectoryPaolo Bonzini4-2/+2
Preparatory work for shattering mmu.c into multiple files. Besides making it easier to follow, this will also make it possible to write unit tests for various parts. Signed-off-by: Paolo Bonzini <[email protected]>
2019-11-21KVM: nVMX: Remove unnecessary TLB flushes on L1<->L2 switches when L1 use ↵Liran Alon1-7/+0
apic-access-page According to Intel SDM section 28.3.3.3/28.3.3.4 Guidelines for Use of the INVVPID/INVEPT Instruction, the hypervisor needs to execute INVVPID/INVEPT X in case CPU executes VMEntry with VPID/EPTP X and either: "Virtualize APIC accesses" VM-execution control was changed from 0 to 1, OR the value of apic_access_page was changed. In the nested case, the burden falls on L1, unless L0 enables EPT in vmcs02 but L1 enables neither EPT nor VPID in vmcs12. For this reason prepare_vmcs02() and load_vmcs12_host_state() have special code to request a TLB flush in case L1 does not use EPT but it uses "virtualize APIC accesses". This special case however is not necessary. On a nested vmentry the physical TLB will already be flushed except if all the following apply: * L0 uses VPID * L1 uses VPID * L0 can guarantee TLB entries populated while running L1 are tagged differently than TLB entries populated while running L2. If the first condition is false, the processor will flush the TLB on vmentry to L2. If the second or third condition are false, prepare_vmcs02() will request KVM_REQ_TLB_FLUSH. However, even if both are true, no extra TLB flush is needed to handle the APIC access page: * if L1 doesn't use VPID, the second condition doesn't hold and the TLB will be flushed anyway. * if L1 uses VPID, it has to flush the TLB itself with INVVPID and section 28.3.3.3 doesn't apply to L0. * even INVEPT is not needed because, if L0 uses EPT, it uses different EPTP when running L2 than L1 (because guest_mode is part of mmu-role). In this case SDM section 28.3.3.4 doesn't apply. Similarly, examining nested_vmx_vmexit()->load_vmcs12_host_state(), one could note that L0 won't flush TLB only in cases where SDM sections 28.3.3.3 and 28.3.3.4 don't apply. In particular, if L0 uses different VPIDs for L1 and L2 (i.e. vmx->vpid != vmx->nested.vpid02), section 28.3.3.3 doesn't apply. Thus, remove this flush from prepare_vmcs02() and nested_vmx_vmexit(). Side-note: This patch can be viewed as removing parts of commit fb6c81984313 ("kvm: vmx: Flush TLB when the APIC-access address changes”) that is not relevant anymore since commit 1313cc2bd8f6 ("kvm: mmu: Add guest_mode to kvm_mmu_page_role”). i.e. The first commit assumes that if L0 use EPT and L1 doesn’t use EPT, then L0 will use same EPTP for both L0 and L1. Which indeed required L0 to execute INVEPT before entering L2 guest. This assumption is not true anymore since when guest_mode was added to mmu-role. Reviewed-by: Joao Martins <[email protected]> Signed-off-by: Liran Alon <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-11-21KVM: x86: remove set but not used variable 'called'Mao Wenan1-3/+2
Fixes gcc '-Wunused-but-set-variable' warning: arch/x86/kvm/x86.c: In function kvm_make_scan_ioapic_request_mask: arch/x86/kvm/x86.c:7911:7: warning: variable called set but not used [-Wunused-but-set-variable] It is not used since commit 7ee30bc132c6 ("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs") Signed-off-by: Mao Wenan <[email protected]> Fixes: 7ee30bc132c6 ("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs") Signed-off-by: Paolo Bonzini <[email protected]>
2019-11-21KVM: nVMX: Do not mark vmcs02->apic_access_page as dirty when unpinningLiran Alon1-3/+3
vmcs->apic_access_page is simply a token that the hypervisor puts into the PFN of a 4KB EPTE (or PTE if using shadow-paging) that triggers APIC-access VMExit or APIC virtualization logic whenever a CPU running in VMX non-root mode read/write from/to this PFN. As every write either triggers an APIC-access VMExit or write is performed on vmcs->virtual_apic_page, the PFN pointed to by vmcs->apic_access_page should never actually be touched by CPU. Therefore, there is no need to mark vmcs02->apic_access_page as dirty after unpin it on L2->L1 emulated VMExit or when L1 exit VMX operation. Reviewed-by: Krish Sadhukhan <[email protected]> Reviewed-by: Joao Martins <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Signed-off-by: Liran Alon <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-11-21Merge branch 'kvm-tsx-ctrl' into HEADPaolo Bonzini928-4763/+13793
Conflicts: arch/x86/kvm/vmx/vmx.c
2019-11-21KVM: vmx: use MSR_IA32_TSX_CTRL to hard-disable TSX on guest that lack itPaolo Bonzini1-14/+30
If X86_FEATURE_RTM is disabled, the guest should not be able to access MSR_IA32_TSX_CTRL. We can therefore use it in KVM to force all transactions from the guest to abort. Tested-by: Jim Mattson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>