aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2019-09-16net/wan: dscc4: remove broken dscc4 driverDan Carpenter4-2078/+0
Using static analysis, I discovered that the "dpriv->pci_priv->pdev" pointer is always NULL. This pointer was supposed to be initialized during probe and is essential for the driver to work. It would be easy to add a "ppriv->pdev = pdev;" to dscc4_found1() but this driver has been broken since before we started using git and no one has complained so probably we should just remove it. Signed-off-by: Dan Carpenter <[email protected]> Acked-by: Francois Romieu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-09-16MAINTAINERS: xen-netback: update my email addressPaul Durrant1-1/+1
My Citrix email address will expire shortly. Signed-off-by: Paul Durrant <[email protected]> Acked-by: Wei Liu <[email protected]> Acked-by: Wei Liu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-09-16net: stmmac: Hold rtnl lock in suspend/resume callbacksJose Abreu1-4/+8
We need to hold rnl lock in suspend and resume callbacks because phylink requires it. Otherwise we will get a WARN() in suspend and resume. Also, move phylink start and stop callbacks to inside device's internal lock so that we prevent concurrent HW accesses. Fixes: 74371272f97f ("net: stmmac: Convert to phylink and remove phylib logic") Reported-by: Christophe ROULLIER <[email protected]> Tested-by: Christophe ROULLIER <[email protected]> Signed-off-by: Jose Abreu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-09-16ip6_gre: fix a dst leak in ip6erspan_tunnel_xmitXin Long1-1/+1
In ip6erspan_tunnel_xmit(), if the skb will not be sent out, it has to be freed on the tx_err path. Otherwise when deleting a netns, it would cause dst/dev to leak, and dmesg shows: unregister_netdevice: waiting for lo to become free. Usage count = 1 Fixes: ef7baf5e083c ("ip6_gre: add ip6 erspan collect_md mode") Signed-off-by: Xin Long <[email protected]> Acked-by: William Tu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-09-16qed: fix spelling mistake "fullill" -> "fulfill"Colin Ian King1-1/+1
There is a spelling mistake in a DP_VERBOSE debug message. Fix it. (Using American English spelling as this is the most common way to spell this in the kernel). Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-09-16net: dsa: b53: Add support for port_egress_floods callbackFlorian Fainelli2-0/+35
Add support for configuring the per-port egress flooding control for both Unicast and Multicast traffic. Signed-off-by: Florian Fainelli <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-09-16udp: correct reuseport selection with connected socketsWillem de Bruijn6-7/+42
UDP reuseport groups can hold a mix unconnected and connected sockets. Ensure that connections only receive all traffic to their 4-tuple. Fast reuseport returns on the first reuseport match on the assumption that all matches are equal. Only if connections are present, return to the previous behavior of scoring all sockets. Record if connections are present and if so (1) treat such connected sockets as an independent match from the group, (2) only return 2-tuple matches from reuseport and (3) do not return on the first 2-tuple reuseport match to allow for a higher scoring match later. New field has_conns is set without locks. No other fields in the bitmap are modified at runtime and the field is only ever set unconditionally, so an RMW cannot miss a change. Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection") Link: http://lkml.kernel.org/r/CA+FuTSfRP09aJNYRt04SS6qj22ViiOEWaWmLAwX0psk8-PGNxw@mail.gmail.com Signed-off-by: Willem de Bruijn <[email protected]> Acked-by: Paolo Abeni <[email protected]> Acked-by: Craig Gallek <[email protected]> Signed-off-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-09-16Merge tag 'asoc-v5.4-2' of ↵Takashi Iwai5-22/+32
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Final merge window fixes for v5.4 A few small fixes and one feature that came in since I sent you the earlier pull request.
2019-09-15block: also check RQF_STATS in blk_mq_need_time_stamp()Hou Tao1-3/+3
In __blk_mq_end_request() if block stats needs update, we should ensure now is valid instead of 0 even when iostat is disabled. Signed-off-by: Hou Tao <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-09-15block: make rq sector size accessible for block statsHou Tao3-10/+19
Currently rq->data_len will be decreased by partial completion or zeroed by completion, so when blk_stat_add() is invoked, data_len will be zero and there will never be samples in poll_cb because blk_mq_poll_stats_bkt() will return -1 if data_len is zero. We could move blk_stat_add() back to __blk_mq_complete_request(), but that would make the effort of trying to call ktime_get_ns() once in vain. Instead we can reuse throtl_size field, and use it for both block stats and block throttle, and adjust the logic in blk_mq_poll_stats_bkt() accordingly. Fixes: 4bc6339a583c ("block: move blk_stat_add() to __blk_mq_end_request()") Tested-by: Pavel Begunkov <[email protected]> Signed-off-by: Hou Tao <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-09-15Linux 5.3Linus Torvalds1-1/+1
2019-09-15Revert "ext4: make __ext4_get_inode_loc plug"Linus Torvalds1-3/+0
This reverts commit b03755ad6f33b7b8cd7312a3596a2dbf496de6e7. This is sad, and done for all the wrong reasons. Because that commit is good, and does exactly what it says: avoids a lot of small disk requests for the inode table read-ahead. However, it turns out that it causes an entirely unrelated problem: the getrandom() system call was introduced back in 2014 by commit c6e9d6f38894 ("random: introduce getrandom(2) system call"), and people use it as a convenient source of good random numbers. But part of the current semantics for getrandom() is that it waits for the entropy pool to fill at least partially (unlike /dev/urandom). And at least ArchLinux apparently has a systemd that uses getrandom() at boot time, and the improvements in IO patterns means that existing installations suddenly start hanging, waiting for entropy that will never happen. It seems to be an unlucky combination of not _quite_ enough entropy, together with a particular systemd version and configuration. Lennart says that the systemd-random-seed process (which is what does this early access) is supposed to not block any other boot activity, but sadly that doesn't actually seem to be the case (possibly due bogus dependencies on cryptsetup for encrypted swapspace). The correct fix is to fix getrandom() to not block when it's not appropriate, but that fix is going to take a lot more discussion. Do we just make it act like /dev/urandom by default, and add a new flag for "wait for entropy"? Do we add a boot-time option? Or do we just limit the amount of time it will wait for entropy? So in the meantime, we do the revert to give us time to discuss the eventual fix for the fundamental problem, at which point we can re-apply the ext4 inode table access optimization. Reported-by: Ahmed S. Darwish <[email protected]> Cc: Ted Ts'o <[email protected]> Cc: Willy Tarreau <[email protected]> Cc: Alexander E. Patrakov <[email protected]> Cc: Lennart Poettering <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-15net/rds: Fix 'ib_evt_handler_call' element in 'rds_ib_stat_names'Gerd Rausch1-1/+1
All entries in 'rds_ib_stat_names' are stringified versions of the corresponding "struct rds_ib_statistics" element without the "s_"-prefix. Fix entry 'ib_evt_handler_call' to do the same. Fixes: f4f943c958a2 ("RDS: IB: ack more receive completions to improve performance") Signed-off-by: Gerd Rausch <[email protected]> Acked-by: Santosh Shilimkar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-09-15net_sched: let qdisc_put() accept NULL pointerCong Wang1-0/+3
When tcf_block_get() fails in sfb_init(), q->qdisc is still a NULL pointer which leads to a crash in sfb_destroy(). Similar for sch_dsmark. Instead of fixing each separately, Linus suggested to just accept NULL pointer in qdisc_put(), which would make callers easier. (For sch_dsmark, the bug probably exists long before commit 6529eaba33f0.) Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure") Reported-by: [email protected] Suggested-by: Linus Torvalds <[email protected]> Cc: Jamal Hadi Salim <[email protected]> Cc: Jiri Pirko <[email protected]> Signed-off-by: Cong Wang <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-09-15net: dsa: Fix load order between DSA drivers and taggersAndrew Lunn1-0/+2
The DSA core, DSA taggers and DSA drivers all make use of module_init(). Hence they get initialised at device_initcall() time. The ordering is non-deterministic. It can be a DSA driver is bound to a device before the needed tag driver has been initialised, resulting in the message: No tagger for this switch Rather than have this be fatal, return -EPROBE_DEFER so that it is tried again later once all the needed drivers have been loaded. Fixes: d3b8c04988ca ("dsa: Add boilerplate helper to register DSA tag driver modules") Signed-off-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-09-15net/sched: fix race between deactivation and dequeue for NOLOCK qdiscPaolo Abeni2-7/+16
The test implemented by some_qdisc_is_busy() is somewhat loosy for NOLOCK qdisc, as we may hit the following scenario: CPU1 CPU2 // in net_tx_action() clear_bit(__QDISC_STATE_SCHED...); // in some_qdisc_is_busy() val = (qdisc_is_running(q) || test_bit(__QDISC_STATE_SCHED, &q->state)); // here val is 0 but... qdisc_run(q) // ... CPU1 is going to run the qdisc next As a conseguence qdisc_run() in net_tx_action() can race with qdisc_reset() in dev_qdisc_reset(). Such race is not possible for !NOLOCK qdisc as both the above bit operations are under the root qdisc lock(). After commit 021a17ed796b ("pfifo_fast: drop unneeded additional lock on dequeue") the race can cause use after free and/or null ptr dereference, but the root cause is likely older. This patch addresses the issue explicitly checking for deactivation under the seqlock for NOLOCK qdisc, so that the qdisc_run() in the critical scenario becomes a no-op. Note that the enqueue() op can still execute concurrently with dev_qdisc_reset(), but that is safe due to the skb_array() locking, and we can't avoid that for NOLOCK qdiscs. Fixes: 021a17ed796b ("pfifo_fast: drop unneeded additional lock on dequeue") Reported-by: Li Shuang <[email protected]> Reported-and-tested-by: Davide Caratti <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-09-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller159-1377/+1672
Minor overlapping changes in the btusb and ixgbe drivers. Signed-off-by: David S. Miller <[email protected]>
2019-09-15Merge branch 'spi-5.4' into spi-nextMark Brown74-901/+1921
2019-09-15Merge branch 'spi-5.3' into spi-linusMark Brown7-7/+34
2019-09-15Merge branch 'asoc-5.4' into asoc-nextMark Brown273-6850/+12117
2019-09-15Merge branch 'asoc-5.3' into asoc-linusMark Brown33-161/+459
2019-09-15ASoC: sdm845: remove unneeded semicolonSaiyam Doshi1-1/+1
Remove excess semicolon after closing parenthesis. Signed-off-by: Saiyam Doshi <[email protected]> Link: https://lore.kernel.org/r/20190914031133.GA28447@SD Signed-off-by: Mark Brown <[email protected]>
2019-09-15Documentation: kbuild: Add document about reproducible buildsBen Hutchings2-0/+123
In the Distribution Kernels track at Linux Plumbers Conference there was some discussion around the difficulty of making kernel builds reproducible. This is a solved problem, but the solutions don't appear to be documented in one place. This document lists the issues I know about and the settings needed to ensure reproducibility. Signed-off-by: Ben Hutchings <[email protected]> Acked-by: Masahiro Yamada <[email protected]> Signed-off-by: Jonathan Corbet <[email protected]>
2019-09-14Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds6-4/+124
Pull kvm fixes from Paolo Bonzini: "The main change here is a revert of reverts. We recently simplified some code that was thought unnecessary; however, since then KVM has grown quite a few cond_resched()s and for that reason the simplified code is prone to livelocks---one CPUs tries to empty a list of guest page tables while the others keep adding to them. This adds back the generation-based zapping of guest page tables, which was not unnecessary after all. On top of this, there is a fix for a kernel memory leak and a couple of s390 fixlets as well" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot KVM: x86: work around leak of uninitialized stack contents KVM: nVMX: handle page fault in vmread KVM: s390: Do not leak kernel stack data in the KVM_S390_INTERRUPT ioctl KVM: s390: kvm_s390_vm_start_migration: check dirty_bitmap before using it as target for memset()
2019-09-14io_uring: increase IORING_MAX_ENTRIES to 32KDaniel Xu1-1/+1
Some workloads can require far more than 4K oustanding entries. For example memcached can have ~300K sockets over ~40 cores. Bumping the max to 32K seems to work pretty well. Reported-by: Dan Melnic <[email protected]> Signed-off-by: Daniel Xu <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-09-14Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-4/+2
Pull virtio fix from Michael Tsirkin: "A last minute revert The 32-bit build got broken by the latest defence in depth patch. Revert and we'll try again in the next cycle" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: Revert "vhost: block speculation of translated descriptors"
2019-09-14Merge tag 'riscv/for-v5.3' of ↵Linus Torvalds3-14/+15
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fix from Paul Walmsley: "Last week, Palmer and I learned that there was an error in the RISC-V kernel image header format that could make it less compatible with the ARM64 kernel image header format. I had missed this error during my original reviews of the patch. The kernel image header format is an interface that impacts bootloaders, QEMU, and other user tools. Those packages must be updated to align with whatever is merged in the kernel. We would like to avoid proliferating these image formats by keeping the RISC-V header as close as possible to the existing ARM64 header. Since the arch/riscv patch that adds support for the image header was merged with our v5.3-rc1 pull request as commit 0f327f2aaad6a ("RISC-V: Add an Image header that boot loader can parse."), we think it wise to try to fix this error before v5.3 is released. The fix itself should be backwards-compatible with any project that has already merged support for premature versions of this interface. It primarily involves ensuring that the RISC-V image header has something useful in the same field as the ARM64 image header" * tag 'riscv/for-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: modify the Image header to improve compatibility with the ARM64 header
2019-09-14Revert "vhost: block speculation of translated descriptors"Michael S. Tsirkin1-4/+2
This reverts commit a89db445fbd7f1f8457b03759aa7343fa530ef6b. I was hasty to include this patch, and it breaks the build on 32 bit. Defence in depth is good but let's do it properly. Cc: [email protected] Signed-off-by: Michael S. Tsirkin <[email protected]>
2019-09-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds67-289/+443
Pull networking fixes from David Miller: 1) Don't corrupt xfrm_interface parms before validation, from Nicolas Dichtel. 2) Revert use of usb-wakeup in btusb, from Mario Limonciello. 3) Block ipv6 packets in bridge netfilter if ipv6 is disabled, from Leonardo Bras. 4) IPS_OFFLOAD not honored in ctnetlink, from Pablo Neira Ayuso. 5) Missing ULP check in sock_map, from John Fastabend. 6) Fix receive statistic handling in forcedeth, from Zhu Yanjun. 7) Fix length of SKB allocated in 6pack driver, from Christophe JAILLET. 8) ip6_route_info_create() returns an error pointer, not NULL. From Maciej Żenczykowski. 9) Only add RDS sock to the hashes after rs_transport is set, from Ka-Cheong Poon. 10) Don't double clean TX descriptors in ixgbe, from Ilya Maximets. 11) Presence of transmit IPSEC offload in an SKB is not tested for correctly in ixgbe and ixgbevf. From Steffen Klassert and Jeff Kirsher. 12) Need rcu_barrier() when register_netdevice() takes one of the notifier based failure paths, from Subash Abhinov Kasiviswanathan. 13) Fix leak in sctp_do_bind(), from Mao Wenan. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits) cdc_ether: fix rndis support for Mediatek based smartphones sctp: destroy bucket if failed to bind addr sctp: remove redundant assignment when call sctp_get_port_local sctp: change return type of sctp_get_port_local ixgbevf: Fix secpath usage for IPsec Tx offload sctp: Fix the link time qualifier of 'sctp_ctrlsock_exit()' ixgbe: Fix secpath usage for IPsec TX offload. net: qrtr: fix memort leak in qrtr_tun_write_iter net: Fix null de-reference of device refcount ipv6: Fix the link time qualifier of 'ping_v6_proc_exit_net()' tun: fix use-after-free when register netdev failed tcp: fix tcp_ecn_withdraw_cwr() to clear TCP_ECN_QUEUE_CWR ixgbe: fix double clean of Tx descriptors with xdp ixgbe: Prevent u8 wrapping of ITR value to something less than 10us mlx4: fix spelling mistake "veify" -> "verify" net: hns3: fix spelling mistake "undeflow" -> "underflow" net: lmc: fix spelling mistake "runnin" -> "running" NFC: st95hf: fix spelling mistake "receieve" -> "receive" net/rds: An rds_sock is added too early to the hash table mac80211: Do not send Layer 2 Update frame before authorization ...
2019-09-14Merge tag 'mmc-v5.3-rc8' of ↵Linus Torvalds7-29/+17
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: - tmio: Fixup runtime PM management during probe and remove - sdhci-pci-o2micro: Fix eMMC initialization for an AMD SoC - bcm2835: Prevent lockups when terminating work * tag 'mmc-v5.3-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: tmio: Fixup runtime PM management during remove mmc: tmio: Fixup runtime PM management during probe Revert "mmc: tmio: move runtime PM enablement to the driver implementations" Revert "mmc: sdhci: Remove unneeded quirk2 flag of O2 SD host controller" Revert "mmc: bcm2835: Terminate timeout work synchronously"
2019-09-14Merge tag 'drm-fixes-2019-09-13' of git://anongit.freedesktop.org/drm/drmLinus Torvalds4-8/+11
Pull drm fixes from Dave Airlie: "From the maintainer summit, just some last minute fixes for final: lima: - fix gem_wait ioctl core: - constify modes list i915: - DP MST high color depth regression - GPU hangs on vulkan compute workloads" * tag 'drm-fixes-2019-09-13' of git://anongit.freedesktop.org/drm/drm: drm/lima: fix lima_gem_wait() return value drm/i915: Restore relaxed padding (OCL_OOB_SUPPRES_ENABLE) for skl+ drm/i915: Limit MST to <= 8bpc once again drm/modes: Make the whitelist more const
2019-09-14bfq: Fix bfq linkage errorPavel Begunkov1-0/+2
Since commit 795fe54c2a828099e ("bfq: Add per-device weight"), bfq uses blkg_conf_prep() and blkg_conf_finish(), which are not exported. So, it causes linkage error if bfq compiled as a module. Fixes: 795fe54c2a828099e ("bfq: Add per-device weight") Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-09-14Merge branch 'for-next' into for-linusTakashi Iwai13624-455309/+1031269
Signed-off-by: Takashi Iwai <[email protected]>
2019-09-14Merge tag 'wireless-drivers-next-for-davem-2019-09-14' of ↵David S. Miller76-3570/+8505
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== wireless-drivers-next patches for 5.4 Last set of patches for 5.4. wil6210 and rtw88 being most active this time, but ath9k also having a new module to load devices without EEPROM. Major changes: wil6210 * add support for Enhanced Directional Multi-Gigabit (EDMG) channels 9-11 * add debugfs file to show PCM ring content * report boottime_ns in scan results ath9k * add a separate loader for AR92XX (and older) pci(e) without eeprom brcmfmac * use the same wiphy after PCIe reset to not confuse the user space rtw88 * enable interrupt migration * enable AMSDU in AMPDU aggregation * report RX power for each antenna * enable to DPK and IQK calibration methods to improve performance ==================== Signed-off-by: David S. Miller <[email protected]>
2019-09-14docs: printk-formats: Stop encouraging use of unnecessary %h[xudi] and %hh[xudi]Joe Perches1-8/+8
Standard integer promotion is already done and %hx and %hhx is useless so do not encourage the use of %hh[xudi] or %h[xudi]. As Linus said in: https://lore.kernel.org/lkml/CAHk-=wgoxnmsj8GEVFJSvTwdnWm8wVJthefNk2n6+4TC=20e0Q@mail.gmail.com/ It's a pointless warning, making for more complex code, and making people remember esoteric printf format details that have no reason for existing. The "h" and "hh" things should never be used. The only reason for them being used if if you have an "int", but you want to print it out as a "char" (and honestly, that is a really bad reason, you'd be better off just using a proper cast to make the code more obvious). So if what you have a "char" (or unsigned char) you should always just print it out as an "int", knowing that the compiler already did the proper type conversion. Signed-off-by: Joe Perches <[email protected]> Reviewed-by: Louis Taylor <[email protected]> Signed-off-by: Jonathan Corbet <[email protected]>
2019-09-14Documentation: Add "earlycon=sbi" to the admin guidePalmer Dabbelt1-0/+4
This argument is supported on RISC-V systems and widely used, but was not documented here. Signed-off-by: Palmer Dabbelt <[email protected]> Signed-off-by: Jonathan Corbet <[email protected]>
2019-09-14doc:lock: remove reference to clever use of read-write lockFederico Vaga1-12/+0
Remove the clever example about read-write lock because this type of lock is not recommended anymore (according to the very same document). So there is no reason to teach clever things that people should not do. Signed-off-by: Federico Vaga <[email protected]> Acked-by: Will Deacon <[email protected]> Signed-off-by: Jonathan Corbet <[email protected]>
2019-09-14devices.txt: improve entry for comedi (char major 98)Ian Abbott1-1/+10
Describe how the comedi minor device numbers are split across comedi devices and comedi subdevices. Replace the current, long dead URL with an official URL for the Comedi project. Signed-off-by: Ian Abbott <[email protected]> Signed-off-by: Jonathan Corbet <[email protected]>
2019-09-14Merge tag 'kvm-s390-master-5.3-1' of ↵Paolo Bonzini2-1/+13
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master KVM: s390: Fixes for 5.3 - prevent a user triggerable oops in the migration code - do not leak kernel stack content
2019-09-14KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslotSean Christopherson2-2/+101
James Harvey reported a livelock that was introduced by commit d012a06ab1d23 ("Revert "KVM: x86/mmu: Zap only the relevant pages when removing a memslot""). The livelock occurs because kvm_mmu_zap_all() as it exists today will voluntarily reschedule and drop KVM's mmu_lock, which allows other vCPUs to add shadow pages. With enough vCPUs, kvm_mmu_zap_all() can get stuck in an infinite loop as it can never zap all pages before observing lock contention or the need to reschedule. The equivalent of kvm_mmu_zap_all() that was in use at the time of the reverted commit (4e103134b8623, "KVM: x86/mmu: Zap only the relevant pages when removing a memslot") employed a fast invalidate mechanism and was not susceptible to the above livelock. There are three ways to fix the livelock: - Reverting the revert (commit d012a06ab1d23) is not a viable option as the revert is needed to fix a regression that occurs when the guest has one or more assigned devices. It's unlikely we'll root cause the device assignment regression soon enough to fix the regression timely. - Remove the conditional reschedule from kvm_mmu_zap_all(). However, although removing the reschedule would be a smaller code change, it's less safe in the sense that the resulting kvm_mmu_zap_all() hasn't been used in the wild for flushing memslots since the fast invalidate mechanism was introduced by commit 6ca18b6950f8d ("KVM: x86: use the fast way to invalidate all pages"), back in 2013. - Reintroduce the fast invalidate mechanism and use it when zapping shadow pages in response to a memslot being deleted/moved, which is what this patch does. For all intents and purposes, this is a revert of commit ea145aacf4ae8 ("Revert "KVM: MMU: fast invalidate all pages"") and a partial revert of commit 7390de1e99a70 ("Revert "KVM: x86: use the fast way to invalidate all pages""), i.e. restores the behavior of commit 5304b8d37c2a5 ("KVM: MMU: fast invalidate all pages") and commit 6ca18b6950f8d ("KVM: x86: use the fast way to invalidate all pages") respectively. Fixes: d012a06ab1d23 ("Revert "KVM: x86/mmu: Zap only the relevant pages when removing a memslot"") Reported-by: James Harvey <[email protected]> Cc: Alex Willamson <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: [email protected] Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-09-14KVM: x86: work around leak of uninitialized stack contentsFuqian Huang1-0/+7
Emulation of VMPTRST can incorrectly inject a page fault when passed an operand that points to an MMIO address. The page fault will use uninitialized kernel stack memory as the CR2 and error code. The right behavior would be to abort the VM with a KVM_EXIT_INTERNAL_ERROR exit to userspace; however, it is not an easy fix, so for now just ensure that the error code and CR2 are zero. Signed-off-by: Fuqian Huang <[email protected]> Cc: [email protected] [add comment] Signed-off-by: Paolo Bonzini <[email protected]>
2019-09-14KVM: nVMX: handle page fault in vmreadPaolo Bonzini1-1/+3
The implementation of vmread to memory is still incomplete, as it lacks the ability to do vmread to I/O memory just like vmptrst. Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
2019-09-13riscv: modify the Image header to improve compatibility with the ARM64 headerPaul Walmsley3-14/+15
Part of the intention during the definition of the RISC-V kernel image header was to lay the groundwork for a future merge with the ARM64 image header. One error during my original review was not noticing that the RISC-V header's "magic" field was at a different size and position than the ARM64's "magic" field. If the existing ARM64 Image header parsing code were to attempt to parse an existing RISC-V kernel image header format, it would see a magic number 0. This is undesirable, since it's our intention to align as closely as possible with the ARM64 header format. Another problem was that the original "res3" field was not being initialized correctly to zero. Address these issues by creating a 32-bit "magic2" field in the RISC-V header which matches the ARM64 "magic" field. RISC-V binaries will store "RSC\x05" in this field. The intention is that the use of the existing 64-bit "magic" field in the RISC-V header will be deprecated over time. Increment the minor version number of the file format to indicate this change, and update the documentation accordingly. Fix the assembler directives in head.S to ensure that reserved fields are properly zero-initialized. Signed-off-by: Paul Walmsley <[email protected]> Reported-by: Palmer Dabbelt <[email protected]> Reviewed-by: Palmer Dabbelt <[email protected]> Cc: Atish Patra <[email protected]> Cc: Karsten Merker <[email protected]> Link: https://lore.kernel.org/linux-riscv/[email protected]/T/#u Link: https://lore.kernel.org/linux-riscv/mhng-755b14c4-8f35-4079-a7ff-e421fd1b02bc@palmer-si-x1e/T/#t
2019-09-14KVM: X86: Use IPI shorthands in kvm guest when supportWanpeng Li1-12/+0
IPI shorthand is supported now by linux apic/x2apic driver, switch to IPI shorthand for all excluding self and all including self destination shorthand in kvm guest, to avoid splitting the target mask into several PV IPI hypercalls. This patch removes the kvm_send_ipi_all() and kvm_send_ipi_allbutself() since the callers in APIC codes have already taken care of apic_use_ipi_shorthand and fallback to ->send_IPI_mask and ->send_IPI_mask_allbutself if it is false. Cc: Thomas Gleixner <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Radim Krčmář <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Nadav Amit <[email protected]> Signed-off-by: Wanpeng Li <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-09-13Merge branch 'md-next' of ↵Jens Axboe6-8/+68
git://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.4/block Pull MD fixes from Song. * 'md-next' of git://git.kernel.org/pub/scm/linux/kernel/git/song/md: raid5: use bio_end_sector in r5_next_bio raid5: remove STRIPE_OPS_REQ_PENDING md: add feature flag MD_FEATURE_RAID0_LAYOUT md/raid0: avoid RAID0 data corruption due to layout confusion. raid5: don't set STRIPE_HANDLE to stripe which is in batch list raid5: don't increment read_errors on EILSEQ return
2019-09-13raid5: use bio_end_sector in r5_next_bioGuoqing Jiang1-3/+1
Actually, we calculate bio's end sector here, so use the common way for the purpose. Signed-off-by: Guoqing Jiang <[email protected]> Signed-off-by: Song Liu <[email protected]>
2019-09-13raid5: remove STRIPE_OPS_REQ_PENDINGGuoqing Jiang2-2/+0
This stripe state is not used anymore after commit 51acbcec6c42b24 ("md: remove CONFIG_MULTICORE_RAID456"), so remove the obsoleted state. gjiang@nb01257:~/md$ grep STRIPE_OPS_REQ_PENDING drivers/md/ -r drivers/md/raid5.c: (1 << STRIPE_OPS_REQ_PENDING) | drivers/md/raid5.h: STRIPE_OPS_REQ_PENDING, Signed-off-by: Guoqing Jiang <[email protected]> Signed-off-by: Song Liu <[email protected]>
2019-09-13Merge branch ↵David S. Miller8-31/+106
'devlink-move-reload-fail-indication-to-devlink-core-and-expose-to-user' Jiri Pirko says: ==================== net: devlink: move reload fail indication to devlink core and expose to user First two patches are dependencies of the last one. That moves devlink reload failure indication to the devlink code, so the drivers do not have to track it themselves. Currently it is only mlxsw, but I will send a follow-up patchset that introduces this in netdevsim too. ==================== Signed-off-by: David S. Miller <[email protected]>
2019-09-13net: devlink: move reload fail indication to devlink core and expose to userJiri Pirko4-11/+30
Currently the fact that devlink reload failed is stored in drivers. Move this flag into devlink core. Also, expose it to the user. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-09-13net: devlink: split reload op into twoJiri Pirko5-16/+56
In order to properly implement failure indication during reload, split the reload op into two ops, one for down phase and one for up phase. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>