aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-05-06Merge branch 'wireguard-fixes'David S. Miller6-33/+72
Jason A. Donenfeld says: ==================== wireguard fixes for 5.7-rc5 With Ubuntu and Debian having backported this into their kernels, we're finally seeing testing from places we hadn't seen prior, which is nice. With that comes more fixes: 1) The CI for PPC64 was running with extremely small stacks for 64-bit, causing spurious crashes in surprising places. 2) There's was an old leftover routing loop restriction, which no longer makes sense given the queueing architecture, and was causing problems for people who really did want nested routing. 3) Not yielding our kthread on CONFIG_PREEMPT_VOLUNTARY systems caused RCU stalls and other issues, reported by Wang Jian, with the fix suggested by Sultan Alsawaf. 4) Clang spewed warnings in a selftest for CONFIG_IPV6=n, reported by Arnd Bergmann. 5) A complicated if statement was simplified to an assignment while also making the likely/unlikely hinting more correct and simple, and increasing readability, suggested by Sultan. Patches (2) and (3) have Fixes: lines and are probably good candidates for stable. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-05-06wireguard: send/receive: use explicit unlikely branch instead of implicit ↵Jason A. Donenfeld2-16/+12
coalescing It's very unlikely that send will become true. It's nearly always false between 0 and 120 seconds of a session, and in most cases becomes true only between 120 and 121 seconds before becoming false again. So, unlikely(send) is clearly the right option here. What happened before was that we had this complex boolean expression with multiple likely and unlikely clauses nested. Since this is evaluated left-to-right anyway, the whole thing got converted to unlikely. So, we can clean this up to better represent what's going on. The generated code is the same. Suggested-by: Sultan Alsawaf <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-06wireguard: selftests: initalize ipv6 members to NULL to squelch clang warningJason A. Donenfeld1-2/+2
Without setting these to NULL, clang complains in certain configurations that have CONFIG_IPV6=n: In file included from drivers/net/wireguard/ratelimiter.c:223: drivers/net/wireguard/selftest/ratelimiter.c:173:34: error: variable 'skb6' is uninitialized when used here [-Werror,-Wuninitialized] ret = timings_test(skb4, hdr4, skb6, hdr6, &test_count); ^~~~ drivers/net/wireguard/selftest/ratelimiter.c:123:29: note: initialize the variable 'skb6' to silence this warning struct sk_buff *skb4, *skb6; ^ = NULL drivers/net/wireguard/selftest/ratelimiter.c:173:40: error: variable 'hdr6' is uninitialized when used here [-Werror,-Wuninitialized] ret = timings_test(skb4, hdr4, skb6, hdr6, &test_count); ^~~~ drivers/net/wireguard/selftest/ratelimiter.c:125:22: note: initialize the variable 'hdr6' to silence this warning struct ipv6hdr *hdr6; ^ We silence this warning by setting the variables to NULL as the warning suggests. Reported-by: Arnd Bergmann <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-06wireguard: send/receive: cond_resched() when processing worker ringbuffersJason A. Donenfeld2-0/+6
Users with pathological hardware reported CPU stalls on CONFIG_ PREEMPT_VOLUNTARY=y, because the ringbuffers would stay full, meaning these workers would never terminate. That turned out not to be okay on systems without forced preemption, which Sultan observed. This commit adds a cond_resched() to the bottom of each loop iteration, so that these workers don't hog the core. Note that we don't need this on the napi poll worker, since that terminates after its budget is expended. Suggested-by: Sultan Alsawaf <[email protected]> Reported-by: Wang Jian <[email protected]> Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-06wireguard: socket: remove errant restriction on looping to selfJason A. Donenfeld2-15/+51
It's already possible to create two different interfaces and loop packets between them. This has always been possible with tunnels in the kernel, and isn't specific to wireguard. Therefore, the networking stack already needs to deal with that. At the very least, the packet winds up exceeding the MTU and is discarded at that point. So, since this is already something that happens, there's no need to forbid the not very exceptional case of routing a packet back to the same interface; this loop is no different than others, and we shouldn't special case it, but rather rely on generic handling of loops in general. This also makes it easier to do interesting things with wireguard such as onion routing. At the same time, we add a selftest for this, ensuring that both onion routing works and infinite routing loops do not crash the kernel. We also add a test case for wireguard interfaces nesting packets and sending traffic between each other, as well as the loop in this case too. We make sure to send some throughput-heavy traffic for this use case, to stress out any possible recursion issues with the locks around workqueues. Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-06wireguard: selftests: use normal kernel stack size on ppc64Jason A. Donenfeld1-0/+1
While at some point it might have made sense to be running these tests on ppc64 with 4k stacks, the kernel hasn't actually used 4k stacks on 64-bit powerpc in a long time, and more interesting things that we test don't really work when we deviate from the default (16k). So, we stop pushing our luck in this commit, and return to the default instead of the minimum. Signed-off-by: Jason A. Donenfeld <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-06net: ethernet: ti: am65-cpsw-nuss: fix irqs typeGrygorii Strashko1-2/+3
The K3 INTA driver, which is source TX/RX IRQs for CPSW NUSS, defines IRQs triggering type as EDGE by default, but triggering type for CPSW NUSS TX/RX IRQs has to be LEVEL as the EDGE triggering type may cause unnecessary IRQs triggering and NAPI scheduling for empty queues. It was discovered with RT-kernel. Fix it by explicitly specifying CPSW NUSS TX/RX IRQ type as IRQF_TRIGGER_HIGH. Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver") Signed-off-by: Grygorii Strashko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-06ionic: Use debugfs_create_bool() to export boolGeert Uytterhoeven1-2/+1
Currently bool ionic_cq.done_color is exported using debugfs_create_u8(), which requires a cast, preventing further compiler checks. Fix this by switching to debugfs_create_bool(), and dropping the cast. Signed-off-by: Geert Uytterhoeven <[email protected]> Acked-by: Shannon Nelson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-06net: dsa: Do not leave DSA master with NULL netdev_opsFlorian Fainelli1-1/+2
When ndo_get_phys_port_name() for the CPU port was added we introduced an early check for when the DSA master network device in dsa_master_ndo_setup() already implements ndo_get_phys_port_name(). When we perform the teardown operation in dsa_master_ndo_teardown() we would not be checking that cpu_dp->orig_ndo_ops was successfully allocated and non-NULL initialized. With network device drivers such as virtio_net, this leads to a NPD as soon as the DSA switch hanging off of it gets torn down because we are now assigning the virtio_net device's netdev_ops a NULL pointer. Fixes: da7b9e9b00d4 ("net: dsa: Add ndo_get_phys_port_name() for CPU port") Reported-by: Allen Pais <[email protected]> Signed-off-by: Florian Fainelli <[email protected]> Tested-by: Allen Pais <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-06net: dsa: remove duplicate assignment in dsa_slave_add_cls_matchall_mirredVladimir Oltean1-5/+3
This was caused by a poor merge conflict resolution on my side. The "act = &cls->rule->action.entries[0];" assignment was already present in the code prior to the patch mentioned below. Fixes: e13c2075280e ("net: dsa: refactor matchall mirred action to separate function") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-06net: stricter validation of untrusted gso packetsWillem de Bruijn1-2/+24
Syzkaller again found a path to a kernel crash through bad gso input: a packet with transport header extending beyond skb_headlen(skb). Tighten validation at kernel entry: - Verify that the transport header lies within the linear section. To avoid pulling linux/tcp.h, verify just sizeof tcphdr. tcp_gso_segment will call pskb_may_pull (th->doff * 4) before use. - Match the gso_type against the ip_proto found by the flow dissector. Fixes: bfd5f4a3d605 ("packet: Add GSO/csum offload support.") Reported-by: syzbot <[email protected]> Signed-off-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-06seg6: fix SRH processing to comply with RFC8754Ahmed Abdelsalam1-2/+8
The Segment Routing Header (SRH) which defines the SRv6 dataplane is defined in RFC8754. RFC8754 (section 4.1) defines the SR source node behavior which encapsulates packets into an outer IPv6 header and SRH. The SR source node encodes the full list of Segments that defines the packet path in the SRH. Then, the first segment from list of Segments is copied into the Destination address of the outer IPv6 header and the packet is sent to the first hop in its path towards the destination. If the Segment list has only one segment, the SR source node can omit the SRH as he only segment is added in the destination address. RFC8754 (section 4.1.1) defines the Reduced SRH, when a source does not require the entire SID list to be preserved in the SRH. A reduced SRH does not contain the first segment of the related SR Policy (the first segment is the one already in the DA of the IPv6 header), and the Last Entry field is set to n-2, where n is the number of elements in the SR Policy. RFC8754 (section 4.3.1.1) defines the SRH processing and the logic to validate the SRH (S09, S10, S11) which works for both reduced and non-reduced behaviors. This patch updates seg6_validate_srh() to validate the SRH as per RFC8754. Signed-off-by: Ahmed Abdelsalam <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-06Merge branch 'FDB-fixes-for-Felix-and-Ocelot-switches'David S. Miller6-6/+16
Vladimir Oltean says: ==================== FDB fixes for Felix and Ocelot switches This series fixes the following problems: - Dynamically learnt addresses never expiring (neither for Ocelot nor for Felix) - Half of the FDB not visible in 'bridge fdb show' (for Felix only) ==================== Signed-off-by: David S. Miller <[email protected]>
2020-05-06net: mscc: ocelot: ANA_AUTOAGE_AGE_PERIOD holds a value in seconds, not msVladimir Oltean1-2/+9
One may notice that automatically-learnt entries 'never' expire, even though the bridge configures the address age period at 300 seconds. Actually the value written to hardware corresponds to a time interval 1000 times higher than intended, i.e. 83 hours. Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Faineli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-06net: dsa: ocelot: the MAC table on Felix is twice as largeVladimir Oltean6-4/+7
When running 'bridge fdb dump' on Felix, sometimes learnt and static MAC addresses would appear, sometimes they wouldn't. Turns out, the MAC table has 4096 entries on VSC7514 (Ocelot) and 8192 entries on VSC9959 (Felix), so the existing code from the Ocelot common library only dumped half of Felix's MAC table. They are both organized as a 4-way set-associative TCAM, so we just need a single variable indicating the correct number of rows. Fixes: 56051948773e ("net: dsa: ocelot: add driver for Felix switch family") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-06Merge tag 'tag-chrome-platform-fixes-for-v5.7-rc5' of ↵Linus Torvalds3-61/+93
git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux Pull chrome platform fix from Benson Leung: "Fix a resource allocation issue in cros_ec_sensorhub.c" * tag 'tag-chrome-platform-fixes-for-v5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux: platform/chrome: cros_ec_sensorhub: Allocate sensorhub resource before claiming sensors
2020-05-07ARM: futex: Address build warningThomas Gleixner1-2/+7
Stephen reported the following build warning on a ARM multi_v7_defconfig build with GCC 9.2.1: kernel/futex.c: In function 'do_futex': kernel/futex.c:1676:17: warning: 'oldval' may be used uninitialized in this function [-Wmaybe-uninitialized] 1676 | return oldval == cmparg; | ~~~~~~~^~~~~~~~~ kernel/futex.c:1652:6: note: 'oldval' was declared here 1652 | int oldval, ret; | ^~~~~~ introduced by commit a08971e9488d ("futex: arch_futex_atomic_op_inuser() calling conventions change"). While that change should not make any difference it confuses GCC which fails to work out that oldval is not referenced when the return value is not zero. GCC fails to properly analyze arch_futex_atomic_op_inuser(). It's not the early return, the issue is with the assembly macros. GCC fails to detect that those either set 'ret' to 0 and set oldval or set 'ret' to -EFAULT which makes oldval uninteresting. The store to the callsite supplied oldval pointer is conditional on ret == 0. The straight forward way to solve this is to make the store unconditional. Aside of addressing the build warning this makes sense anyway because it removes the conditional from the fastpath. In the error case the stored value is uninteresting and the extra store does not matter at all. Reported-by: Stephen Rothwell <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-05-06net: dsa: sja1105: the PTP_CLK extts input reacts on both edgesVladimir Oltean1-8/+18
It looks like the sja1105 external timestamping input is not as generic as we thought. When fed a signal with 50% duty cycle, it will timestamp both the rising and the falling edge. When fed a short pulse signal, only the timestamp of the falling edge will be seen in the PTPSYNCTS register, because that of the rising edge had been overwritten. So the moral is: don't feed it short pulse inputs. Luckily this is not a complete deal breaker, as we can still work with 1 Hz square waves. But the problem is that the extts polling period was not dimensioned enough for this input signal. If we leave the period at half a second, we risk losing timestamps due to jitter in the measuring process. So we need to increase it to 4 times per second. Also, the very least we can do to inform the user is to deny any other flags combination than with PTP_RISING_EDGE and PTP_FALLING_EDGE both set. Fixes: 747e5eb31d59 ("net: dsa: sja1105: configure the PTP_CLK pin as EXT_TS or PER_OUT") Signed-off-by: Vladimir Oltean <[email protected]> Acked-by: Richard Cochran <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-06selftests: net: tcp_mmap: fix SO_RCVLOWAT settingEric Dumazet1-1/+3
Since chunk_size is no longer an integer, we can not use it directly as an argument of setsockopt(). This patch should fix tcp_mmap for Big Endian kernels. Fixes: 597b01edafac ("selftests: net: avoid ptl lock contention in tcp_mmap") Signed-off-by: Eric Dumazet <[email protected]> Cc: Soheil Hassas Yeganeh <[email protected]> Cc: Arjun Roy <[email protected]> Acked-by: Soheil Hassas Yeganeh <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-06net: hsr: fix incorrect type usage for protocol variableMurali Karicheri1-1/+1
Fix following sparse checker warning:- net/hsr/hsr_slave.c:38:18: warning: incorrect type in assignment (different base types) net/hsr/hsr_slave.c:38:18: expected unsigned short [unsigned] [usertype] protocol net/hsr/hsr_slave.c:38:18: got restricted __be16 [usertype] h_proto net/hsr/hsr_slave.c:39:25: warning: restricted __be16 degrades to integer net/hsr/hsr_slave.c:39:57: warning: restricted __be16 degrades to integer Signed-off-by: Murali Karicheri <[email protected]> Acked-by: Vinicius Costa Gomes <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-06net: macsec: fix rtnl locking issueAntoine Tenart1-1/+2
netdev_update_features() must be called with the rtnl lock taken. Not doing so triggers a warning, as ASSERT_RTNL() is used in __netdev_update_features(), the first function called by netdev_update_features(). Fix this. Fixes: c850240b6c41 ("net: macsec: report real_dev features when HW offloading is enabled") Signed-off-by: Antoine Tenart <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-06net: mvpp2: cls: Prevent buffer overflow in mvpp2_ethtool_cls_rule_del()Dan Carpenter1-0/+3
The "info->fs.location" is a u32 that comes from the user via the ethtool_set_rxnfc() function. We need to check for invalid values to prevent a buffer overflow. I copy and pasted this check from the mvpp2_ethtool_cls_rule_ins() function. Fixes: 90b509b39ac9 ("net: mvpp2: cls: Add Classification offload support") Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-06net: mvpp2: prevent buffer overflow in mvpp22_rss_ctx()Dan Carpenter1-0/+2
The "rss_context" variable comes from the user via ethtool_get_rxfh(). It can be any u32 value except zero. Eventually it gets passed to mvpp22_rss_ctx() and if it is over MVPP22_N_RSS_TABLES (8) then it results in an array overflow. Fixes: 895586d5dc32 ("net: mvpp2: cls: Use RSS contexts to handle RSS tables") Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-06selftests: net: tcp_mmap: clear whole tcp_zerocopy_receive structEric Dumazet1-1/+2
We added fields in tcp_zerocopy_receive structure, so make sure to clear all fields to not pass garbage to the kernel. We were lucky because recent additions added 'out' parameters, still we need to clean our reference implementation, before folks copy/paste it. Fixes: c8856c051454 ("tcp-zerocopy: Return inq along with tcp receive zerocopy.") Fixes: 33946518d493 ("tcp-zerocopy: Return sk_err (if set) along with tcp receive zerocopy.") Signed-off-by: Eric Dumazet <[email protected]> Cc: Arjun Roy <[email protected]> Cc: Soheil Hassas Yeganeh <[email protected]> Acked-by: Soheil Hassas Yeganeh <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-06gfs2: fix withdraw sequence deadlockBob Peterson1-1/+1
After a gfs2 file system withdraw, any attempt to read metadata is automatically rejected by function gfs2_meta_read() except for reads of the journal inode. This turns out to be a problem because function signal_our_withdraw() repeatedly calls check_journal_clean() which reads the metadata (both its dinode and indirect blocks) to see if the entire journal is mapped. The dinode read works, but reading the indirect blocks returns -EIO which gets sent back up and causes a consistency error. This results in withdraw-from-withdraw, which becomes a deadlock. This patch changes the test in gfs2_meta_read() to allow all metadata reads for the journal. Instead of checking the journal block, it now checks for the journal inode glock which is the same for all blocks in the journal. This allows check_journal_clean() to properly check the journal without trying to withdraw recursively. Signed-off-by: Bob Peterson <[email protected]> Signed-off-by: Andreas Gruenbacher <[email protected]>
2020-05-06Merge branch 'linus' of ↵Linus Torvalds11-34/+69
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes a potential scheduling latency problem for the algorithms used by WireGuard" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: arch/nhpoly1305 - process in explicit 4k chunks crypto: arch/lib - limit simd usage to 4k chunks
2020-05-06Merge tag 'usb-serial-5.7-rc5' of ↵Greg Kroah-Hartman2-2/+3
https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus Johan writes: USB-serial fixes for 5.7-rc5 Here's a fix adding a missing input sanity check and a new modem device id. Both have been in linux-next with no reported issues. * tag 'usb-serial-5.7-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial: USB: serial: qcserial: Add DW5816e support USB: serial: garmin_gps: add sanity checking for data length
2020-05-06tracing/kprobes: Reject new event if loc is NULLMasami Hiramatsu1-0/+6
Reject the new event which has NULL location for kprobes. For kprobes, user must specify at least the location. Link: http://lkml.kernel.org/r/158779376597.6082.1411212055469099461.stgit@devnote2 Cc: Tom Zanussi <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: [email protected] Fixes: 2a588dd1d5d6 ("tracing: Add kprobe event command generation functions") Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-05-06tracing/boottime: Fix kprobe event API usageMasami Hiramatsu1-12/+8
Fix boottime kprobe events to use API correctly for multiple events. For example, when we set a multiprobe kprobe events in bootconfig like below, ftrace.event.kprobes.myevent { probes = "vfs_read $arg1 $arg2", "vfs_write $arg1 $arg2" } This cause an error; trace_boot: Failed to add probe: p:kprobes/myevent (null) vfs_read $arg1 $arg2 vfs_write $arg1 $arg2 This shows the 1st argument becomes NULL and multiprobes are merged to 1 probe. Link: http://lkml.kernel.org/r/158779375766.6082.201939936008972838.stgit@devnote2 Cc: Ingo Molnar <[email protected]> Cc: [email protected] Fixes: 29a154810546 ("tracing: Change trace_boot to use kprobe_event interface") Reviewed-by: Tom Zanussi <[email protected]> Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-05-06tracing/kprobes: Fix a double initialization typoMasami Hiramatsu1-1/+1
Fix a typo that resulted in an unnecessary double initialization to addr. Link: http://lkml.kernel.org/r/158779374968.6082.2337484008464939919.stgit@devnote2 Cc: Tom Zanussi <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: [email protected] Fixes: c7411a1a126f ("tracing/kprobe: Check whether the non-suffixed symbol is notrace") Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-05-06bootconfig: Fix to remove bootconfig data from initrd while bootMasami Hiramatsu1-17/+52
If there is a bootconfig data in the tail of initrd/initramfs, initrd image sanity check caused an error while decompression stage as follows. [ 0.883882] Unpacking initramfs... [ 2.696429] Initramfs unpacking failed: invalid magic at start of compressed archive This error will be ignored if CONFIG_BLK_DEV_RAM=n, but CONFIG_BLK_DEV_RAM=y the kernel failed to mount rootfs and causes a panic. To fix this issue, shrink down the initrd_end for removing tailing bootconfig data while boot the kernel. Link: http://lkml.kernel.org/r/158788401014.24243.17424755854115077915.stgit@devnote2 Cc: Borislav Petkov <[email protected]> Cc: Kees Cook <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Andrew Morton <[email protected]> Cc: [email protected] Fixes: 7684b8582c24 ("bootconfig: Load boot config from the tail of initrd") Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-05-06Merge tag 'kvm-s390-master-5.7-3' of ↵Paolo Bonzini1-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fix for running nested uner z/VM There are circumstances when running nested under z/VM that would trigger a WARN_ON_ONCE. Remove the WARN_ON_ONCE. Long term we certainly want to make this code more robust and flexible, but just returning instead of WARNING makes guest bootable again.
2020-05-06KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properlyPeter Xu3-0/+3
KVM_CAP_SET_GUEST_DEBUG should be supported for x86 however it's not declared as supported. My wild guess is that userspaces like QEMU are using "#ifdef KVM_CAP_SET_GUEST_DEBUG" to check for the capability instead, but that could be wrong because the compilation host may not be the runtime host. The userspace might still want to keep the old "#ifdef" though to not break the guest debug on old kernels. Signed-off-by: Peter Xu <[email protected]> Message-Id: <[email protected]> [Do the same for PPC and s390. - Paolo] Signed-off-by: Paolo Bonzini <[email protected]>
2020-05-06KVM: selftests: Fix build for evmcs.hPeter Xu2-2/+5
I got this error when building kvm selftests: /usr/bin/ld: /home/xz/git/linux/tools/testing/selftests/kvm/libkvm.a(vmx.o):/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:222: multiple definition of `current_evmcs'; /tmp/cco1G48P.o:/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:222: first defined here /usr/bin/ld: /home/xz/git/linux/tools/testing/selftests/kvm/libkvm.a(vmx.o):/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:223: multiple definition of `current_vp_assist'; /tmp/cco1G48P.o:/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:223: first defined here I think it's because evmcs.h is included both in a test file and a lib file so the structs have multiple declarations when linking. After all it's not a good habit to declare structs in the header files. Cc: Vitaly Kuznetsov <[email protected]> Signed-off-by: Peter Xu <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-05-06kvm: x86: Use KVM CPU capabilities to determine CR4 reserved bitsPaolo Bonzini1-15/+5
Using CPUID data can be useful for the processor compatibility check, but that's it. Using it to compute guest-reserved bits can have both false positives (such as LA57 and UMIP which we are already handling) and false negatives: in particular, with this patch we don't allow anymore a KVM guest to set CR4.PKE when CR4.PKE is clear on the host. Fixes: b9dd21e104bc ("KVM: x86: simplify handling of PKRU") Reported-by: Jim Mattson <[email protected]> Tested-by: Jim Mattson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-05-06KVM: VMX: Explicitly clear RFLAGS.CF and RFLAGS.ZF in VM-Exit RSB pathSean Christopherson1-0/+3
Clear CF and ZF in the VM-Exit path after doing __FILL_RETURN_BUFFER so that KVM doesn't interpret clobbered RFLAGS as a VM-Fail. Filling the RSB has always clobbered RFLAGS, its current incarnation just happens clear CF and ZF in the processs. Relying on the macro to clear CF and ZF is extremely fragile, e.g. commit 089dd8e53126e ("x86/speculation: Change FILL_RETURN_BUFFER to work with objtool") tweaks the loop such that the ZF flag is always set. Reported-by: Qian Cai <[email protected]> Cc: Rick Edgecombe <[email protected]> Cc: Peter Zijlstra (Intel) <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: [email protected] Fixes: f2fde6a5bcfcf ("KVM: VMX: Move RSB stuffing to before the first RET after VM-Exit") Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-05-06docs/virt/kvm: Document configuring and running nested guestsKashyap Chamarthy2-0/+278
This is a rewrite of this[1] Wiki page with further enhancements. The doc also includes a section on debugging problems in nested environments, among other improvements. [1] https://www.linux-kvm.org/page/Nested_Guests Signed-off-by: Kashyap Chamarthy <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-05-05RISC-V: Remove unused code from STRICT_KERNEL_RWXAtish Patra2-24/+0
This patch removes the unused functions set_kernel_text_rw/ro. Currently, it is not being invoked from anywhere and no other architecture (except arm) uses this code. Even in ARM, these functions are not invoked from anywhere currently. Fixes: d27c3c90817e ("riscv: add STRICT_KERNEL_RWX support") Signed-off-by: Atish Patra <[email protected]> Reviewed-by: Zong Li <[email protected]> Signed-off-by: Palmer Dabbelt <[email protected]>
2020-05-05Merge tag 'platform-drivers-x86-v5.7-2' of ↵Linus Torvalds7-27/+35
git://git.infradead.org/linux-platform-drivers-x86 Pull x86 platform driver fixes from Andy Shevchenko: - Avoid loading asus-nb-wmi module on selected laptop models - Fix S0ix debug support for Jasper Lake PMC - Few fixes which have been reported by Hulk bot and others * tag 'platform-drivers-x86-v5.7-2' of git://git.infradead.org/linux-platform-drivers-x86: platform/x86: thinkpad_acpi: Remove always false 'value < 0' statement platform/x86: intel_pmc_core: avoid unused-function warnings platform/x86: asus-nb-wmi: Do not load on Asus T100TA and T200TA platform/x86: intel_pmc_core: Change Jasper Lake S0ix debug reg map back to ICL platform/x86/intel-uncore-freq: make uncore_root_kobj static platform/x86: wmi: Make two functions static platform/x86: surface3_power: Fix a NULL vs IS_ERR() check in probe
2020-05-05neigh: send protocol value in neighbor create notificationRoman Mashak1-3/+3
When a new neighbor entry has been added, event is generated but it does not include protocol, because its value is assigned after the event notification routine has run, so move protocol assignment code earlier. Fixes: df9b0e30d44c ("neighbor: Add protocol attribute") Cc: David Ahern <[email protected]> Signed-off-by: Roman Mashak <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-05drm/amd/display: Prevent dpcd reads with passive donglesAurabindo Pillai1-6/+11
[why] During hotplug, a DP port may be connected to the sink through passive adapter which does not support DPCD reads. Issuing reads without checking for this condition will result in errors [how] Ensure the link is in aux_mode before initiating operation that result in a DPCD read. Signed-off-by: Aurabindo Pillai <[email protected]> Reviewed-by: Harry Wentland <[email protected]> Acked-by: Aurabindo Pillai <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-05-05drm/amd/display: fix counter in wait_for_no_pipes_pendingRoman Li1-3/+2
[Why] Wait counter is not being reset for each pipe. [How] Move counter reset into pipe loop scope. Signed-off-by: Roman Li <[email protected]> Reviewed-by: Zhan Liu <[email protected]> Acked-by: Aurabindo Pillai <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-05-05drm/amd/display: Update DCN2.1 DV Code RevisionSung Lee1-4/+4
[WHY & HOW] There is a problem in hscale_pixel_rate, the bug causes DCN to be more optimistic (more likely to underflow) in upscale cases during prefetch. This commit ports the fix from DV code to address these issues. Signed-off-by: Sung Lee <[email protected]> Reviewed-by: Dmytro Laktyushkin <[email protected]> Acked-by: Aurabindo Pillai <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-05-05io_uring: handle -EFAULT properly in io_uring_setup()Xiaoguang Wang1-13/+11
If copy_to_user() in io_uring_setup() failed, we'll leak many kernel resources, which will be recycled until process terminates. This bug can be reproduced by using mprotect to set params to PROT_READ. To fix this issue, refactor io_uring_create() a bit to add a new 'struct io_uring_params __user *params' parameter and move the copy_to_user() in io_uring_setup() to io_uring_setup(), if copy_to_user() failed, we can free kernel resource properly. Suggested-by: Jens Axboe <[email protected]> Signed-off-by: Xiaoguang Wang <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-05-05net: broadcom: fix a mistake about ioremap resourceDejin Zheng1-9/+15
Commit d7a5502b0bb8b ("net: broadcom: convert to devm_platform_ioremap_resource_byname()") will broke this driver. idm_base and nicpm_base were optional, after this change, they are mandatory. it will probe fails with -22 when the dtb doesn't have them defined. so revert part of this commit and make idm_base and nicpm_base as optional. Fixes: d7a5502b0bb8bde ("net: broadcom: convert to devm_platform_ioremap_resource_byname()") Reported-by: Jonathan Richardson <[email protected]> Cc: Scott Branden <[email protected]> Cc: Ray Jui <[email protected]> Cc: Florian Fainelli <[email protected]> Cc: David S. Miller <[email protected]> Signed-off-by: Dejin Zheng <[email protected]> Acked-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-05drm: Fix HDCP failures when SRM fw is missingSean Paul1-1/+7
The SRM cleanup in 79643fddd6eb2 ("drm/hdcp: optimizing the srm handling") inadvertently altered the behavior of HDCP auth when the SRM firmware is missing. Before that patch, missing SRM was interpreted as the device having no revoked keys. With that patch, if the SRM fw file is missing we reject _all_ keys. This patch fixes that regression by returning success if the file cannot be found. It also checks the return value from request_srm such that we won't end up trying to parse the ksv list if there is an error fetching it. Fixes: 79643fddd6eb ("drm/hdcp: optimizing the srm handling") Cc: [email protected] Cc: Ramalingam C <[email protected]> Cc: Sean Paul <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Maxime Ripard <[email protected]> Cc: Thomas Zimmermann <[email protected]> Cc: David Airlie <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: [email protected] Reviewed-by: Ramalingam C <[email protected]> Signed-off-by: Sean Paul <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Changes in v2: -Noticed a couple other things to clean up Reviewed-by: Ramalingam C <[email protected]>
2020-05-05iocost: protect iocg->abs_vdebt with iocg->waitq.lockTejun Heo2-47/+77
abs_vdebt is an atomic_64 which tracks how much over budget a given cgroup is and controls the activation of use_delay mechanism. Once a cgroup goes over budget from forced IOs, it has to pay it back with its future budget. The progress guarantee on debt paying comes from the iocg being active - active iocgs are processed by the periodic timer, which ensures that as time passes the debts dissipate and the iocg returns to normal operation. However, both iocg activation and vdebt handling are asynchronous and a sequence like the following may happen. 1. The iocg is in the process of being deactivated by the periodic timer. 2. A bio enters ioc_rqos_throttle(), calls iocg_activate() which returns without anything because it still sees that the iocg is already active. 3. The iocg is deactivated. 4. The bio from #2 is over budget but needs to be forced. It increases abs_vdebt and goes over the threshold and enables use_delay. 5. IO control is enabled for the iocg's subtree and now IOs are attributed to the descendant cgroups and the iocg itself no longer issues IOs. This leaves the iocg with stuck abs_vdebt - it has debt but inactive and no further IOs which can activate it. This can end up unduly punishing all the descendants cgroups. The usual throttling path has the same issue - the iocg must be active while throttled to ensure that future event will wake it up - and solves the problem by synchronizing the throttling path with a spinlock. abs_vdebt handling is another form of overage handling and shares a lot of characteristics including the fact that it isn't in the hottest path. This patch fixes the above and other possible races by strictly synchronizing abs_vdebt and use_delay handling with iocg->waitq.lock. Signed-off-by: Tejun Heo <[email protected]> Reported-by: Vlad Dmitriev <[email protected]> Cc: [email protected] # v5.4+ Fixes: e1518f63f246 ("blk-iocost: Don't let merges push vtime into the future") Signed-off-by: Jens Axboe <[email protected]>
2020-05-05bus: mhi: core: Fix channel device name conflictJeffrey Hugo1-1/+2
When multiple instances of the same MHI product are present in a system, we can see a splat from mhi_create_devices() - "sysfs: cannot create duplicate filename". This is because the device names assigned to the MHI channel devices are non-unique. They consist of the channel's name, and the channel's pipe id. For identical products, each instance is going to have the same set of channel (both in name and pipe id). To fix this, we prepend the device name of the parent device that the MHI channels belong to. Since different instances of the same product should have unique device names, this makes the MHI channel devices for each product also unique. Additionally, remove the pipe id from the MHI channel device name. This is an internal detail to the MHI product that provides little value, and imposes too much device specific internal details to userspace. It is expected that channel with a specific name (ie "SAHARA") has a specific client, and it does not matter what pipe id that channel is enumerated on. The pipe id is an internal detail between the MHI bus, and the hardware. The client is not expected to make decisions based on the pipe id, and to do so would require the client to have intimate knowledge of the hardware, which is inappropiate as it may violate the layering provided by the MHI bus. The limitation of doing this is that each product may only have one instance of a channel by a unique name. This limitation is appropriate given the usecases of MHI channels. Signed-off-by: Jeffrey Hugo <[email protected]> Reviewed-by: Hemant Kumar <[email protected]> Reviewed-by: Manivannan Sadhasivam <[email protected]> Signed-off-by: Manivannan Sadhasivam <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-05-05bus: mhi: core: Fix typo in commentJeffrey Hugo1-1/+1
There is a typo - "runtimet" should be "runtime". Fix it. Signed-off-by: Jeffrey Hugo <[email protected]> Reviewed-by: Hemant Kumar <[email protected]> Reviewed-by: Manivannan Sadhasivam <[email protected]> Signed-off-by: Manivannan Sadhasivam <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-05-05bus: mhi: core: Offload register accesses to the controllerJeffrey Hugo4-14/+10
When reading or writing MHI registers, the core assumes that the physical link is a memory mapped PCI link. This assumption may not hold for all MHI devices. The controller knows what is the physical link (ie PCI, I2C, SPI, etc), and therefore knows the proper methods to access that link. The controller can also handle link specific error scenarios, such as reading -1 when the PCI link went down. Therefore, it is appropriate that the MHI core requests the controller to make register accesses on behalf of the core, which abstracts the core from link specifics, and end up removing an unnecessary assumption. Signed-off-by: Jeffrey Hugo <[email protected]> Reviewed-by: Hemant Kumar <[email protected]> Reviewed-by: Manivannan Sadhasivam <[email protected]> Signed-off-by: Manivannan Sadhasivam <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>