aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2019-11-12tipc: update mon's self addr when node addr generatedHoang Le3-0/+18
In commit 25b0b9c4e835 ("tipc: handle collisions of 32-bit node address hash values"), the 32-bit node address only generated after one second trial period expired. However the self's addr in struct tipc_monitor do not update according to node address generated. This lead to it is always zero as initial value. As result, sorting algorithm using this value does not work as expected, neither neighbor monitoring framework. In this commit, we add a fix to update self's addr when 32-bit node address generated. Fixes: 25b0b9c4e835 ("tipc: handle collisions of 32-bit node address hash values") Acked-by: Jon Maloy <[email protected]> Signed-off-by: Hoang Le <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-12Merge branch 'netfilter-flowtable-hardware-offload'David S. Miller11-74/+955
Pablo Neira Ayuso says: ==================== netfilter flowtable hardware offload The following patchset adds hardware offload support for the flowtable infrastructure [1]. This infrastructure provides a fast datapath for the classic Linux forwarding path that users can enable through policy, eg. table inet x { flowtable f { hook ingress priority 10 devices = { eth0, eth1 } flags offload } chain y { type filter hook forward priority 0; policy accept; ip protocol tcp flow offload @f } } This example above enables the fastpath for TCP traffic between devices eth0 and eth1. Users can turn on the hardware offload through the 'offload' flag from the flowtable definition. If this new flag is not specified, the software flowtable datapath is used. This patchset is composed of 4 preparation patches: room to extend this infrastructure, eg. accelerate bridge forwarding. And 2 patches to add the hardware offload control and data planes: hardware offload. This includes a new NFTA_FLOWTABLE_FLAGS netlink attribute to convey the optional NF_FLOWTABLE_HW_OFFLOAD flag. API available at net/core/flow_offload.h to represent the flow through two flow_rule objects to configure an exact 5-tuple matching on each direction plus the corresponding forwarding actions, that is, the MAC address, NAT and checksum updates; and port redirection in order to configure the hardware datapath. This patch only supports for IPv4 support and statistics collection for flow aging as an initial step. This patchset introduces a new flow_block callback type that needs to be set up to configure the flowtable hardware offload. The first client of this infrastructure follows up after this batch. I would like to thank Mellanox for developing the first upstream driver to use this infrastructure. [1] Documentation/networking/nf_flowtable.txt ==================== Signed-off-by: David S. Miller <[email protected]>
2019-11-12netfilter: nf_flow_table: hardware offload supportPablo Neira Ayuso8-9/+822
This patch adds the dataplane hardware offload to the flowtable infrastructure. Three new flags represent the hardware state of this flow: * FLOW_OFFLOAD_HW: This flow entry resides in the hardware. * FLOW_OFFLOAD_HW_DYING: This flow entry has been scheduled to be remove from hardware. This might be triggered by either packet path (via TCP RST/FIN packet) or via aging. * FLOW_OFFLOAD_HW_DEAD: This flow entry has been already removed from the hardware, the software garbage collector can remove it from the software flowtable. This patch supports for: * IPv4 only. * Aging via FLOW_CLS_STATS, no packet and byte counter synchronization at this stage. This patch also adds the action callback that specifies how to convert the flow entry into the flow_rule object that is passed to the driver. Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-12netfilter: nf_tables: add flowtable offload control planePablo Neira Ayuso6-2/+42
This patch adds the NFTA_FLOWTABLE_FLAGS attribute that allows users to specify the NF_FLOWTABLE_HW_OFFLOAD flag. This patch also adds a new setup interface for the flowtable type to perform the flowtable offload block callback configuration. Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-12netfilter: nf_flow_table: detach routing information from flow descriptionPablo Neira Ayuso3-27/+80
This patch adds the infrastructure to support for flow entry types. The initial type is NF_FLOW_OFFLOAD_ROUTE that stores the routing information into the flow entry to define a fastpath for the classic forwarding path. Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-12netfilter: nf_flowtable: remove flow_offload_entry structurePablo Neira Ayuso2-15/+5
Move rcu_head to struct flow_offload, then remove the flow_offload_entry structure definition. Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-12netfilter: nf_flow_table: remove union from flow_offload structurePablo Neira Ayuso1-4/+1
Drivers do not have access to the flow_offload structure, hence remove this union from this flow_offload object as well as the original comment on top of it. Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-12netfilter: nf_flow_table: move conntrack object to struct flow_offloadPablo Neira Ayuso2-24/+12
Simplify this code by storing the pointer to conntrack object in the flow_offload structure. Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-12tc-testing: Introduced tdc tests for basic filterRoman Mashak1-0/+325
Added tests for 'cmp' extended match rules. Signed-off-by: Roman Mashak <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-12Merge branch 'for-upstream' of ↵David S. Miller9-52/+242
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: linux-bluetooth 2019-11-11 Here's one more bluetooth-next pull request for the 5.5 kernel release. - Several fixes for LE advertising - Added PM support to hci_qca driver - Added support for WCN3991 SoC in hci_qca driver - Added DT bindings for BCM43540 module - A few other small cleanups/fixes ==================== Signed-off-by: David S. Miller <[email protected]>
2019-11-12Input: cyttsp4_core - fix use after free bugPan Bian1-7/+0
The device md->input is used after it is released. Setting the device data to NULL is unnecessary as the device is never used again. Instead, md->input should be assigned NULL to avoid accessing the freed memory accidently. Besides, checking md->si against NULL is superfluous as it points to a variable address, which cannot be NULL. Signed-off-by: Pan Bian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dmitry Torokhov <[email protected]>
2019-11-12Input: synaptics-rmi4 - clear IRQ enables for F54Lucas Stach1-1/+1
The driver for F54 just polls the status and doesn't even have a IRQ handler registered. Make sure to disable all F54 IRQs, so we don't crash the kernel on a nonexistent handler. Signed-off-by: Lucas Stach <[email protected]> Link: https://lore.kernel.org/r/[email protected] Cc: [email protected] Signed-off-by: Dmitry Torokhov <[email protected]>
2019-11-12Remove VirtualBox guest shared folders filesystemLinus Torvalds13-3280/+0
This went into staging in rc7. It turns out that was a mistake, and apparently it wasn't even supposed to go there at all, but be introduced as a regular filesystem. We don't try to sneak in whole new filesystems this late in the rc, just delete the whole thing, and it can be re-introduced as a proper patch with proper acks from actual filesystem people instead of some odd late-rc staging back-door. Cc: Greg Kroah-Hartman <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Hans de Goede <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-12Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds6-51/+96
Pull kvm fixes from Paolo Bonzini: "Fix unwinding of KVM_CREATE_VM failure, VT-d posted interrupts, DAX/ZONE_DEVICE, and module unload/reload" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: MMU: Do not treat ZONE_DEVICE pages as being reserved KVM: VMX: Introduce pi_is_pir_empty() helper KVM: VMX: Do not change PID.NDST when loading a blocked vCPU KVM: VMX: Consider PID.PIR to determine if vCPU has pending interrupts KVM: VMX: Fix comment to specify PID.ON instead of PIR.ON KVM: X86: Fix initialization of MSR lists KVM: fix placement of refcount initialization KVM: Fix NULL-ptr deref after kvm_create_vm fails
2019-11-12Merge tag 'linux-can-next-for-5.5-20191111' of ↵David S. Miller20-277/+360
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2019-10-07 this is a pull request for net-next/master consisting of 32 patches. The first patch is by Gustavo A. R. Silva and removes unused code in the generic CAN infrastructure. The next three patches target the mcp251x driver. The one by Andy Shevchenko removes the legacy platform data support from the driver. The other two are by Timo Schlüßler and reset the device only when needed, to prevent glitches on the output when GPIO support is added. I'm contributing two patches fixing checkpatch warnings in the c_can_platform and peak_canfd driver. Stephane Grosjean's patch for the peak_canfd driver adds hw timestamps support in rx skbs. The next three patches target the xilinx_can driver. One patch by me to fix checkpatch warnings, one patch by Anssi Hannula to avoid non requested bus error frames, and a patch by YueHaibing that switches the driver to devm_platform_ioremap_resource(). Pankaj Sharma contributes two patches for the m_can driver, the first one adds support for one shot mode, the other support for handling arbitration errors. Followed by four patches by YueHaibing, switching the grcan, ifi, rcar, and sun4i drivers to devm_platform_ioremap_resource() I'm contributing cleanup patches for the rx-offload helper, while Joakim Zhang's patch prepares the rx-offload helper for CAN-FD support. The rx offload users flexcan and ti_hecc are converted accordingly. The remaining twelve patches target the flexcan driver. First Joakim Zhang switches the driver to devm_platform_ioremap_resource(). The remaining eleven patch are by me and clean up the abstract the access of the iflag1 and iflag2 register both for RX and TX mailboxes. This is a preparation for the upcoming CAN-FD support. ==================== Signed-off-by: David S. Miller <[email protected]>
2019-11-12sfc: trace_xdp_exception on XDP failureArthur Fabre1-0/+3
The sfc driver can drop packets processed with XDP, notably when running out of buffer space on XDP_TX, or returning an unknown XDP action. This increments the rx_xdp_bad_drops ethtool counter. Call trace_xdp_exception everywhere rx_xdp_bad_drops is incremented, except for fragmented RX packets as the XDP program hasn't run yet. This allows it to easily be monitored from userspace. This mirrors the behavior of other drivers. Signed-off-by: Arthur Fabre <[email protected]> Acked-by: Edward Cree <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-12net/smc: fix refcount non-blocking connect() -part 2Ursula Braun1-0/+1
If an SMC socket is immediately terminated after a non-blocking connect() has been called, a memory leak is possible. Due to the sock_hold move in commit 301428ea3708 ("net/smc: fix refcounting for non-blocking connect()") an extra sock_put() is needed in smc_connect_work(), if the internal TCP socket is aborted and cancels the sk_stream_wait_connect() of the connect worker. Reported-by: [email protected] Fixes: 301428ea3708 ("net/smc: fix refcounting for non-blocking connect()") Signed-off-by: Ursula Braun <[email protected]> Signed-off-by: Karsten Graul <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-12Merge tag 'gvt-fixes-2019-11-12' of https://github.com/intel/gvt-linux into ↵Rodrigo Vivi1-2/+2
drm-intel-fixes gvt-fixes-2019-11-12 - Fix dmabuf reference drop (Pan) Signed-off-by: Rodrigo Vivi <[email protected]> From: Zhenyu Wang <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2019-11-12ptp: ptp_clockmatrix: Fix build errorYueHaibing1-1/+1
When do randbuilding, we got this warning: WARNING: unmet direct dependencies detected for PTP_1588_CLOCK Depends on [n]: NET [=y] && POSIX_TIMERS [=n] Selected by [y]: - PTP_1588_CLOCK_IDTCM [=y] Make PTP_1588_CLOCK_IDTCM depends on PTP_1588_CLOCK to fix this. Fixes: 3a6ba7dc7799 ("ptp: Add a ptp clock driver for IDT ClockMatrix.") Signed-off-by: YueHaibing <[email protected]> Reviewed-by: Vincent Cheng <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-12net/sched: actions: remove unused 'order'Davide Caratti2-2/+0
after commit 4097e9d250fb ("net: sched: don't use tc_action->order during action dump"), 'act->order' is initialized but then it's no more read, so we can just remove this member of struct tc_action. CC: Ivan Vecera <[email protected]> Signed-off-by: Davide Caratti <[email protected]> Acked-by: Jiri Pirko <[email protected]> Reviewed-by: Ivan Vecera <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-12drm/i915: update rawclk also on resumeJani Nikula2-3/+3
Since CNP it's possible for rawclk to have two different values, 19.2 and 24 MHz. If the value indicated by SFUSE_STRAP register is different from the power on default for PCH_RAWCLK_FREQ, we'll end up having a mismatch between the rawclk hardware and software states after suspend/resume. On previous platforms this used to work by accident, because the power on defaults worked just fine. Update the rawclk also on resume. The natural place to do this would be intel_modeset_init_hw(), however VLV/CHV need it done before intel_power_domains_init_hw(). Thus put it there even if it feels slightly out of place. v2: Call intel_update_rawclck() in intel_power_domains_init_hw() for all platforms (Ville). Reported-by: Shawn Lee <[email protected]> Cc: Shawn Lee <[email protected]> Cc: Ville Syrjala <[email protected]> Reviewed-by: Ville Syrjälä <[email protected]> Tested-by: Shawn Lee <[email protected]> Signed-off-by: Jani Nikula <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 59ed05ccdded5eb18ce012eff3d01798ac8535fa) Cc: <[email protected]> # v4.15+ Signed-off-by: Rodrigo Vivi <[email protected]>
2019-11-12net: dsa: mv88e6xxx: fix broken if statement because of a stray semicolonColin Ian King1-1/+1
There is a stray semicolon in an if statement that will cause a dev_err message to be printed unconditionally. Fix this by removing the stray semicolon. Addresses-Coverity: ("Stay semicolon") Fixes: f0942e00a1ab ("net: dsa: mv88e6xxx: Add support for port mirroring") Signed-off-by: Colin Ian King <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-12Merge branch 'Update-devlink-binary-output'David S. Miller5-36/+27
Aya Levin says: ==================== Update devlink binary output This series changes the devlink binary interface: -The first patch forces binary values to be enclosed in an array. In addition, devlink_fmsg_binary_pair_put breaks the binary value into chunks to comply with devlink's restriction for value length. -The second patch removes redundant code and uses the fixed devlink interface (devlink_fmsg_binary_pair_put). -The third patch make self test to use the updated devlink interface. -The fourth, adds a verification of dumping a very large binary content. This test verifies breaking the data into chunks in a valid JSON output. Series was generated against net-next commit: ca22d6977b9b Merge branch 'stmmac-next' ==================== Signed-off-by: David S. Miller <[email protected]>
2019-11-12selftests: Add a test of large binary to devlink health testAya Levin1-0/+9
Add a test of 2 PAGEs size (exceeds devlink previous length limitation) of binary data on a 'devlink health dump show' command. Set binary length to 8192, issue a dump show command and clear it. Signed-off-by: Aya Levin <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-12netdevsim: Update dummy reporter's devlink binary interfaceAya Levin1-7/+1
Update dummy reporter's output to use updated devlink interface of binary fmsg pair. Signed-off-by: Aya Levin <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-12net/mlx5: Dump of fw_fatal use updated devlink binary interfaceAya Levin1-17/+1
Remove redundant code from fw_fatal reporter's dump callback. Use updated devlink interface of binary fmsg pair which breaks the output into chunks internally. Signed-off-by: Aya Levin <[email protected]> Acked-by: Jiri Pirko <[email protected]> Acked-by: Saeed Mahameed <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-12devlink: Allow large formatted message of binary outputAya Levin2-12/+16
Devlink supports pair output of name and value. When the value is binary, it must be presented in an array. If the length of the binary value exceeds fmsg limitation, break the value into chunks internally. Signed-off-by: Aya Levin <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-12net: sfp: fix sfp_bus_add_upstream() warningRussell King1-2/+2
When building with SFP disabled, the stub for sfp_bus_add_upstream() missed "inline". Add it. Fixes: 727b3668b730 ("net: sfp: rework upstream interface") Signed-off-by: Russell King <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-12cxgb4: make function 'cxgb4_mqprio_free_hw_resources' staticzhengbin1-1/+1
Fix sparse warnings: drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_mqprio.c:242:6: warning: symbol 'cxgb4_mqprio_free_hw_resources' was not declared. Should it be static? Reported-by: Hulk Robot <[email protected]> Fixes: 2d0cb84dd973 ("cxgb4: add ETHOFLD hardware queue support") Signed-off-by: zhengbin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-12Merge branch 'atlantic-static'David S. Miller2-3/+3
zhengbin says: ==================== net: atlantic: make some symbol & function static v1->v2: add Fixes tag ==================== Signed-off-by: David S. Miller <[email protected]>
2019-11-12net: atlantic: make function 'aq_ethtool_get_priv_flags', ↵zhengbin1-2/+2
'aq_ethtool_set_priv_flags' static Fix sparse warnings: drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c:706:5: warning: symbol 'aq_ethtool_get_priv_flags' was not declared. Should it be static? drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c:713:5: warning: symbol 'aq_ethtool_set_priv_flags' was not declared. Should it be static? Reported-by: Hulk Robot <[email protected]> Fixes: ea4b4d7fc106 ("net: atlantic: loopback tests via private flags") Signed-off-by: zhengbin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-12net: atlantic: make symbol 'aq_pm_ops' staticzhengbin1-1/+1
Fix sparse warnings: drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c:426:25: warning: symbol 'aq_pm_ops' was not declared. Should it be static? Reported-by: Hulk Robot <[email protected]> Fixes: 8aaa112a57c1 ("net: atlantic: refactoring pm logic") Signed-off-by: zhengbin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-12Merge branch 'mlxsw-Add-extended-ACK-for-EMADs'David S. Miller4-20/+162
Ido Schimmel says: ==================== mlxsw: Add extended ACK for EMADs Shalom says: Ethernet Management Datagrams (EMADs) are Ethernet packets sent between the driver and device's firmware. They are used to pass various configurations to the device, but also to get events (e.g., port up) from it. After the Ethernet header, these packets are built in a TLV format. Up until now, whenever the driver issued an erroneous register access it only got an error code indicating a bad parameter was used. This patch set adds a new TLV (string TLV) that can be used by the firmware to encode a 128 character string describing the error. The new TLV is allocated by the driver and set to zeros. In case of error, the driver will check the length of the string in the response and report it using devlink hwerr tracepoint. Example: $ perf record -a -q -e devlink:devlink_hwerr & $ pkill -2 perf $ perf script -F trace:event,trace | grep hwerr devlink:devlink_hwerr: bus_name=pci dev_name=0000:03:00.0 driver_name=mlxsw_spectrum err=7 (tid=9913892d00001593,reg_id=8018(rauhtd)) bad parameter (inside er_rauhtd_write_query(), num_rec=32 is over the maximum number of records supported) Patch #1 parses the offsets of the different TLVs in incoming EMADs and stores them in the skb's control block. This makes it easier to later add new TLVs. Patches #2-#3 remove deprecated TLVs and add string TLV definition. Patches #4-#7 gradually add support for the new string TLV. v2: * Use existing devlink hwerr tracepoint to report the error string, instead of printing it to kernel log ==================== Signed-off-by: David S. Miller <[email protected]>
2019-11-12mlxsw: spectrum: Enable EMAD string TLVShalom Toledo1-0/+2
Make sure to enable EMAD string TLV only after using the required firmware version. Signed-off-by: Shalom Toledo <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-12mlxsw: core: Add support for using EMAD string TLVShalom Toledo2-6/+72
In case the firmware had an error while processing EMADs, it can send back an ASCII string with the reason using EMAD string TLV. This patch adds the support for using EMAD string TLV. In case of an error, reports the reason using devlink hwerr tracepoint. Signed-off-by: Shalom Toledo <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-12mlxsw: core: Extend EMAD information reported to devlink hwerrShalom Toledo1-2/+10
Extend EMAD information reported to devlink hwerr tracepoint with transaction id and reg id (both, hex and string). Signed-off-by: Shalom Toledo <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-12mlxsw: core: Add support for EMAD string TLV parsingShalom Toledo1-0/+15
During parsing of incoming EMADs, fill the string TLV's offset when it is used. Signed-off-by: Shalom Toledo <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-12mlxsw: core: Add EMAD string TLVShalom Toledo2-1/+24
Add EMAD string TLV, an ASCII string the driver can receive from the firmware in case of an error. Signed-off-by: Shalom Toledo <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-12mlxsw: emad: Remove deprecated EMAD TLVsShalom Toledo1-4/+1
Remove deprecated EMAD TLVs. Signed-off-by: Shalom Toledo <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-12mlxsw: core: Parse TLVs' offsets of incoming EMADsShalom Toledo1-11/+42
Until now the code assumes a fixed structure which makes it difficult to support EMADs with and without new TLVs. Make it more generic by parsing the TLVs when the EMADs are received and store the offset to the different TLVs in the control block. Using these offsets to extract information from the EMADs without relying on a specific structure. Signed-off-by: Shalom Toledo <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-12Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds28-77/+1613
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 TSX Async Abort and iTLB Multihit mitigations from Thomas Gleixner: "The performance deterioration departement is not proud at all of presenting the seventh installment of speculation mitigations and hardware misfeature workarounds: 1) TSX Async Abort (TAA) - 'The Annoying Affair' TAA is a hardware vulnerability that allows unprivileged speculative access to data which is available in various CPU internal buffers by using asynchronous aborts within an Intel TSX transactional region. The mitigation depends on a microcode update providing a new MSR which allows to disable TSX in the CPU. CPUs which have no microcode update can be mitigated by disabling TSX in the BIOS if the BIOS provides a tunable. Newer CPUs will have a bit set which indicates that the CPU is not vulnerable, but the MSR to disable TSX will be available nevertheless as it is an architected MSR. That means the kernel provides the ability to disable TSX on the kernel command line, which is useful as TSX is a truly useful mechanism to accelerate side channel attacks of all sorts. 2) iITLB Multihit (NX) - 'No eXcuses' iTLB Multihit is an erratum where some Intel processors may incur a machine check error, possibly resulting in an unrecoverable CPU lockup, when an instruction fetch hits multiple entries in the instruction TLB. This can occur when the page size is changed along with either the physical address or cache type. A malicious guest running on a virtualized system can exploit this erratum to perform a denial of service attack. The workaround is that KVM marks huge pages in the extended page tables as not executable (NX). If the guest attempts to execute in such a page, the page is broken down into 4k pages which are marked executable. The workaround comes with a mechanism to recover these shattered huge pages over time. Both issues come with full documentation in the hardware vulnerabilities section of the Linux kernel user's and administrator's guide. Thanks to all patch authors and reviewers who had the extraordinary priviledge to be exposed to this nuisance. Special thanks to Borislav Petkov for polishing the final TAA patch set and to Paolo Bonzini for shepherding the KVM iTLB workarounds and providing also the backports to stable kernels for those!" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/speculation/taa: Fix printing of TAA_MSG_SMT on IBRS_ALL CPUs Documentation: Add ITLB_MULTIHIT documentation kvm: x86: mmu: Recovery of shattered NX large pages kvm: Add helper function for creating VM worker threads kvm: mmu: ITLB_MULTIHIT mitigation cpu/speculation: Uninline and export CPU mitigations helpers x86/cpu: Add Tremont to the cpu vulnerability whitelist x86/bugs: Add ITLB_MULTIHIT bug infrastructure x86/tsx: Add config options to set tsx=on|off|auto x86/speculation/taa: Add documentation for TSX Async Abort x86/tsx: Add "auto" option to the tsx= cmdline parameter kvm/x86: Export MDS_NO=0 to guests when TSX is enabled x86/speculation/taa: Add sysfs reporting for TSX Async Abort x86/speculation/taa: Add mitigation for TSX Async Abort x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default x86/cpu: Add a helper function x86_read_arch_cap_msr() x86/msr: Add the IA32_TSX_CTRL MSR
2019-11-12net: ethernet: ti: Add dependency for TI_DAVINCI_EMACMao Wenan1-0/+1
If TI_DAVINCI_EMAC=y and GENERIC_ALLOCATOR is not set, below erros can be seen: drivers/net/ethernet/ti/davinci_cpdma.o: In function `cpdma_desc_pool_destroy.isra.14': davinci_cpdma.c:(.text+0x359): undefined reference to `gen_pool_size' davinci_cpdma.c:(.text+0x365): undefined reference to `gen_pool_avail' davinci_cpdma.c:(.text+0x373): undefined reference to `gen_pool_avail' davinci_cpdma.c:(.text+0x37f): undefined reference to `gen_pool_size' drivers/net/ethernet/ti/davinci_cpdma.o: In function `__cpdma_chan_free': davinci_cpdma.c:(.text+0x4a2): undefined reference to `gen_pool_free_owner' drivers/net/ethernet/ti/davinci_cpdma.o: In function `cpdma_chan_submit_si': davinci_cpdma.c:(.text+0x66c): undefined reference to `gen_pool_alloc_algo_owner' davinci_cpdma.c:(.text+0x805): undefined reference to `gen_pool_free_owner' drivers/net/ethernet/ti/davinci_cpdma.o: In function `cpdma_ctlr_create': davinci_cpdma.c:(.text+0xabd): undefined reference to `devm_gen_pool_create' davinci_cpdma.c:(.text+0xb79): undefined reference to `gen_pool_add_owner' drivers/net/ethernet/ti/davinci_cpdma.o: In function `cpdma_check_free_tx_desc': davinci_cpdma.c:(.text+0x16c6): undefined reference to `gen_pool_avail' This patch mades TI_DAVINCI_EMAC select GENERIC_ALLOCATOR. Fixes: 99f629718272 ("net: ethernet: ti: cpsw: drop TI_DAVINCI_CPDMA config option") Signed-off-by: Mao Wenan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-12x86/quirks: Disable HPET on Intel Coffe Lake platformsKai-Heng Feng1-0/+2
Some Coffee Lake platforms have a skewed HPET timer once the SoCs entered PC10, which in consequence marks TSC as unstable because HPET is used as watchdog clocksource for TSC. Harry Pan tried to work around it in the clocksource watchdog code [1] thereby creating a circular dependency between HPET and TSC. This also ignores the fact, that HPET is not only unsuitable as watchdog clocksource on these systems, it becomes unusable in general. Disable HPET on affected platforms. Suggested-by: Feng Tang <[email protected]> Signed-off-by: Kai-Heng Feng <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203183 Link: https://lore.kernel.org/lkml/[email protected]/ [1] Link: https://lkml.kernel.org/r/[email protected]
2019-11-12block: check bi_size overflow before mergeJunichi Nomura1-1/+1
__bio_try_merge_page() may merge a page to bio without bio_full() check and cause bi_size overflow. The overflow typically ends up with sd_init_command() warning on zero segment request with call trace like this: ------------[ cut here ]------------ WARNING: CPU: 2 PID: 1986 at drivers/scsi/scsi_lib.c:1025 scsi_init_io+0x156/0x180 CPU: 2 PID: 1986 Comm: kworker/2:1H Kdump: loaded Not tainted 5.4.0-rc7 #1 Workqueue: kblockd blk_mq_run_work_fn RIP: 0010:scsi_init_io+0x156/0x180 RSP: 0018:ffffa11487663bf0 EFLAGS: 00010246 RAX: 00000000002be0a0 RBX: ffff8e6e9ff30118 RCX: 0000000000000000 RDX: 00000000ffffffe1 RSI: 0000000000000000 RDI: ffff8e6e9ff30118 RBP: ffffa11487663c18 R08: ffffa11487663d28 R09: ffff8e6e9ff30150 R10: 0000000000000001 R11: 0000000000000000 R12: ffff8e6e9ff30000 R13: 0000000000000001 R14: ffff8e74a1cf1800 R15: ffff8e6e9ff30000 FS: 0000000000000000(0000) GS:ffff8e6ea7680000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fff18cf0fe8 CR3: 0000000659f0a001 CR4: 00000000001606e0 Call Trace: sd_init_command+0x326/0xb40 [sd_mod] scsi_queue_rq+0x502/0xaa0 ? blk_mq_get_driver_tag+0xe7/0x120 blk_mq_dispatch_rq_list+0x256/0x5a0 ? elv_rb_del+0x24/0x30 ? deadline_remove_request+0x7b/0xc0 blk_mq_do_dispatch_sched+0xa3/0x140 blk_mq_sched_dispatch_requests+0xfb/0x170 __blk_mq_run_hw_queue+0x81/0x130 blk_mq_run_work_fn+0x1b/0x20 process_one_work+0x179/0x390 worker_thread+0x4f/0x3e0 kthread+0x105/0x140 ? max_active_store+0x80/0x80 ? kthread_bind+0x20/0x20 ret_from_fork+0x35/0x40 ---[ end trace f9036abf5af4a4d3 ]--- blk_update_request: I/O error, dev sdd, sector 2875552 op 0x1:(WRITE) flags 0x0 phys_seg 0 prio class 0 XFS (sdd1): writeback error on sector 2875552 __bio_try_merge_page() should check the overflow before actually doing merge. Fixes: 07173c3ec276c ("block: enable multipage bvecs") Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Ming Lei <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Jun'ichi Nomura <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-11-12KVM: MMU: Do not treat ZONE_DEVICE pages as being reservedSean Christopherson3-7/+28
Explicitly exempt ZONE_DEVICE pages from kvm_is_reserved_pfn() and instead manually handle ZONE_DEVICE on a case-by-case basis. For things like page refcounts, KVM needs to treat ZONE_DEVICE pages like normal pages, e.g. put pages grabbed via gup(). But for flows such as setting A/D bits or shifting refcounts for transparent huge pages, KVM needs to to avoid processing ZONE_DEVICE pages as the flows in question lack the underlying machinery for proper handling of ZONE_DEVICE pages. This fixes a hang reported by Adam Borowski[*] in dev_pagemap_cleanup() when running a KVM guest backed with /dev/dax memory, as KVM straight up doesn't put any references to ZONE_DEVICE pages acquired by gup(). Note, Dan Williams proposed an alternative solution of doing put_page() on ZONE_DEVICE pages immediately after gup() in order to simplify the auditing needed to ensure is_zone_device_page() is called if and only if the backing device is pinned (via gup()). But that approach would break kvm_vcpu_{un}map() as KVM requires the page to be pinned from map() 'til unmap() when accessing guest memory, unlike KVM's secondary MMU, which coordinates with mmu_notifier invalidations to avoid creating stale page references, i.e. doesn't rely on pages being pinned. [*] http://lkml.kernel.org/r/[email protected] Reported-by: Adam Borowski <[email protected]> Analyzed-by: David Hildenbrand <[email protected]> Acked-by: Dan Williams <[email protected]> Cc: [email protected] Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings") Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-11-12KVM: VMX: Introduce pi_is_pir_empty() helperJoao Martins2-3/+7
Streamline the PID.PIR check and change its call sites to use the newly added helper. Suggested-by: Liran Alon <[email protected]> Signed-off-by: Joao Martins <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-11-12KVM: VMX: Do not change PID.NDST when loading a blocked vCPUJoao Martins2-0/+20
When vCPU enters block phase, pi_pre_block() inserts vCPU to a per pCPU linked list of all vCPUs that are blocked on this pCPU. Afterwards, it changes PID.NV to POSTED_INTR_WAKEUP_VECTOR which its handler (wakeup_handler()) is responsible to kick (unblock) any vCPU on that linked list that now has pending posted interrupts. While vCPU is blocked (in kvm_vcpu_block()), it may be preempted which will cause vmx_vcpu_pi_put() to set PID.SN. If later the vCPU will be scheduled to run on a different pCPU, vmx_vcpu_pi_load() will clear PID.SN but will also *overwrite PID.NDST to this different pCPU*. Instead of keeping it with original pCPU which vCPU had entered block phase on. This results in an issue because when a posted interrupt is delivered, as the wakeup_handler() will be executed and fail to find blocked vCPU on its per pCPU linked list of all vCPUs that are blocked on this pCPU. Which is due to the vCPU being placed on a *different* per pCPU linked list i.e. the original pCPU in which it entered block phase. The regression is introduced by commit c112b5f50232 ("KVM: x86: Recompute PID.ON when clearing PID.SN"). Therefore, partially revert it and reintroduce the condition in vmx_vcpu_pi_load() responsible for avoiding changing PID.NDST when loading a blocked vCPU. Fixes: c112b5f50232 ("KVM: x86: Recompute PID.ON when clearing PID.SN") Tested-by: Nathan Ni <[email protected]> Co-developed-by: Liran Alon <[email protected]> Signed-off-by: Liran Alon <[email protected]> Signed-off-by: Joao Martins <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-11-12KVM: VMX: Consider PID.PIR to determine if vCPU has pending interruptsJoao Martins1-1/+5
Commit 17e433b54393 ("KVM: Fix leak vCPU's VMCS value into other pCPU") introduced vmx_dy_apicv_has_pending_interrupt() in order to determine if a vCPU have a pending posted interrupt. This routine is used by kvm_vcpu_on_spin() when searching for a a new runnable vCPU to schedule on pCPU instead of a vCPU doing busy loop. vmx_dy_apicv_has_pending_interrupt() determines if a vCPU has a pending posted interrupt solely based on PID.ON. However, when a vCPU is preempted, vmx_vcpu_pi_put() sets PID.SN which cause raised posted interrupts to only set bit in PID.PIR without setting PID.ON (and without sending notification vector), as depicted in VT-d manual section 5.2.3 "Interrupt-Posting Hardware Operation". Therefore, checking PID.ON is insufficient to determine if a vCPU has pending posted interrupts and instead we should also check if there is some bit set on PID.PIR if PID.SN=1. Fixes: 17e433b54393 ("KVM: Fix leak vCPU's VMCS value into other pCPU") Reviewed-by: Jagannathan Raman <[email protected]> Co-developed-by: Liran Alon <[email protected]> Signed-off-by: Liran Alon <[email protected]> Signed-off-by: Joao Martins <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-11-12KVM: VMX: Fix comment to specify PID.ON instead of PIR.ONLiran Alon1-1/+1
The Outstanding Notification (ON) bit is part of the Posted Interrupt Descriptor (PID) as opposed to the Posted Interrupts Register (PIR). The latter is a bitmap for pending vectors. Reviewed-by: Joao Martins <[email protected]> Signed-off-by: Liran Alon <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-11-12KVM: X86: Fix initialization of MSR listsChenyi Qiang1-30/+26
The three MSR lists(msrs_to_save[], emulated_msrs[] and msr_based_features[]) are global arrays of kvm.ko, which are adjusted (copy supported MSRs forward to override the unsupported MSRs) when insmod kvm-{intel,amd}.ko, but it doesn't reset these three arrays to their initial value when rmmod kvm-{intel,amd}.ko. Thus, at the next installation, kvm-{intel,amd}.ko will do operations on the modified arrays with some MSRs lost and some MSRs duplicated. So define three constant arrays to hold the initial MSR lists and initialize msrs_to_save[], emulated_msrs[] and msr_based_features[] based on the constant arrays. Cc: [email protected] Reviewed-by: Xiaoyao Li <[email protected]> Signed-off-by: Chenyi Qiang <[email protected]> [Remove now useless conditionals. - Paolo] Signed-off-by: Paolo Bonzini <[email protected]>