aboutsummaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2020-12-13um: irq/sigio: Support suspend/resume handling of workaround IRQsJohannes Berg3-16/+56
If the sigio workaround needed to be applied to a file descriptor, set_irq_wake() wouldn't work for it since it would get polled by the thread instead of causing SIGIO, and thus could never really cause a wakeup, since the thread notification FD wasn't marked as being able to wake up the system. Fix this by marking the thread's notification FD explicitly as a wake source FD, i.e. not suppressing SIGIO for it in suspend. In order to not cause spurious wakeups, we then need to remove all FDs that shouldn't wake up the system from the polling thread. In order to do this, add unlocked versions of ignore_sigio_fd() and add_sigio_fd() (nothing else is happening in suspend, so this is fine), and also modify ignore_sigio_fd() to return -ENOENT if the FD wasn't originally in there. This doesn't matter because nothing else currently checks the return value, but the irq code needs to know which ones to restore the workaround for. All told, this lets us use a timerfd for the RTC clock in the next patch, which doesn't send SIGIO. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13um: time-travel: Actually apply "free-until" optimisationJohannes Berg1-1/+12
Due a bug - we never checked the time_travel_ext_free_until value - we were always requesting time for every single scheduling. This adds up since we make reading time cost 256ns, and it's a fairly common call. Fix this. While at it, also make reading time only cost something when we're not currently waiting for our scheduling turn - otherwise things get mixed up in a very confusing way. We should never get here, since we're not actually running, but it's possible if you stick printk() or such into the virtio code that must handle the external interrupts. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13um: chan_xterm: Fix fd leakAnton Ivanov1-0/+5
xterm serial channel was leaking a fd used in setting up the port helper This bug is prehistoric - it predates switching to git. The "fixes" header here is really just to mark all the versions we would like this to apply to which is "Anything from the Cretaceous period onwards". No dinosaurs were harmed in fixing this bug. Fixes: b40997b872cd ("um: drivers/xterm.c: fix a file descriptor leak") Signed-off-by: Anton Ivanov <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13um: tty: Fix handling of close in tty linesAnton Ivanov1-2/+2
Fix a logical error in tty reading. We get 0 and errno == EAGAIN on the first attempt to read from a closed file descriptor. Compared to that a true EAGAIN is EAGAIN and -1. If we check errno for EAGAIN first, before checking the return value we miss the fact that the descriptor is closed. This bug is as old as the driver. It was not showing up with the original POLL based IRQ controller, because it was producing multiple events. Switching to EPOLL unmasked it. Fixes: ff6a17989c08 ("Epoll based IRQ controller") Signed-off-by: Anton Ivanov <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13um: Monitor error events in IRQ controllerAnton Ivanov1-1/+1
Ensure that file closes, connection closes, etc are propagated as interrupts in the interrupt controller. Fixes: ff6a17989c08 ("Epoll based IRQ controller") Signed-off-by: Anton Ivanov <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13um: allocate a guard page to helper threadsJohannes Berg4-8/+11
We've been running into stack overflows in helper threads corrupting memory (e.g. because somebody put printf() or os_info() there), so to avoid those causing hard-to-debug issues later on, allocate a guard page for helper thread stacks and mark it read-only. Unfortunately, the crash dump at that point is useless as the stack tracer will try to backtrace the *kernel* thread, not the helper thread, but at least we don't survive to a random issue caused by corruption. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13um: support some of ARCH_HAS_SET_MEMORYJohannes Berg4-0/+59
For now, only support set_memory_ro()/rw() which we need for the stack protection in the next patch. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13um: time-travel: avoid multiple identical propagationsJohannes Berg1-0/+6
If there is some kind of interrupt negotation or such then it may happen that we send an update message multiple times, avoid that in the interest of efficiency by storing the last transmitted value and only sending a new update if it's not the same as the last update. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13um: Fetch registers only for signals which need themAnton Ivanov1-1/+14
UML userspace fetches siginfo and passes it to signal handlers in UML. This is needed only for some of the signals, because key handlers like SIGIO make no use of this variable. Signed-off-by: Anton Ivanov <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13um: Support suspend to RAMJohannes Berg6-2/+139
With all the previous bits in place, we can now also support suspend to RAM, in the sense that everything is suspended, not just most, including userspace, processes like in s2idle. Since um_idle_sleep() now waits forever, we can simply call that to "suspend" the system. As before, you can wake it up using SIGUSR1 since we're just in a pause() call that only needs to return. In order to implement selective resume from certain devices, and not have any arbitrary device interrupt wake up, suspend interrupts by removing SIGIO notification (O_ASYNC) from all the FDs that are not supposed to wake up the system. However, swap out the handler so we don't actually handle the SIGIO as an interrupt. Since we're in pause(), the mere act of receiving SIGIO wakes us up, and then after things have been restored enough, re-set O_ASYNC for all previously suspended FDs, reinstall the proper SIGIO handler, and send SIGIO to self to process anything that might now be pending. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13um: Allow PM with suspend-to-idleJohannes Berg5-1/+46
In order to be able to experiment with suspend in UML, add the minimal work to be able to suspend (s2idle) an instance of UML, and be able to wake it back up from that state with the USR1 signal sent to the main UML process. Signed-off-by: Johannes Berg <[email protected]> Acked-By: Anton Ivanov <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13um: time: Fix read_persistent_clock64() in time-travelJohannes Berg1-3/+20
In time-travel mode, we've relied on read_persistent_clock64() being called only once at system startup, but this is both the right thing to call from the pseudo-RTC, and also gets called by the timekeeping core during suspend/resume. Thus, fix this to always fall make use of the time_travel_time in any time-travel mode, initializing time_travel_start at boot to the right value depending on the time-travel mode. Signed-off-by: Johannes Berg <[email protected]> Acked-By: Anton Ivanov <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13um: Simplify os_idle_sleep() and sleep longerJohannes Berg5-25/+21
There really is no reason to pass the amount of time we should sleep, especially since it's just hard-coded to one second. Additionally, one second isn't really all that long, and as we are expecting to be woken up by a signal, we can sleep longer and avoid doing some work every second, so replace the current clock_nanosleep() with just an empty select() that can _only_ be woken up by a signal. We can also remove the deliver_alarm() since we don't need to do that when we got e.g. SIGIO that woke us up, and if we got SIGALRM the signal handler will actually (have) run, so it's just unnecessary extra work. Similarly, in time-travel mode, just program the wakeup event from idle to be S64_MAX, which is basically the most you could ever simulate to. Of course, you should already have an event in the list that's earlier and will cause a wakeup, normally that's the regular timer interrupt, though in suspend it may (later) also be an RTC event. Since actually getting to this point would be a bug and you can't ever get out again, panic() on it in the time control code. Signed-off-by: Johannes Berg <[email protected]> Acked-By: Anton Ivanov <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13um: Simplify IRQ handling codeJohannes Berg1-242/+128
Reduce dynamic allocations (and thereby cache misses) by simply embedding the registration data for IRQs in the irq_entry, we never supported these being really dynamic anyway as only one was ever allowed ("Trying to reregister ..."). Lockless behaviour is preserved by removing the FD from the poll set appropriately, but we use reg->events to indicate whether or not this entry is used, rather than dynamically allocating them. Also port the list of IRQ entries to list_head instead of the current open-coded singly-linked list implementation, just for sanity. Signed-off-by: Johannes Berg <[email protected]> Acked-By: Anton Ivanov <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13um: Remove IRQ_NONE typeJohannes Berg8-45/+26
We don't actually use this in um_request_irq(), so it can never be assigned. It's also not clear what that would be useful for, so just remove it. This results in quite a number of cleanups, all the way to removing the "SIGIO on close" startup check, since the data it assigns (pty_close_sigio) is not used anymore. While at it, also make this an enum so we get a minimum of type checking, and remove the IRQ_NONE hack in virtio since we now no longer have the name twice. Acked-By: Anton Ivanov <[email protected]> Signed-off-by: Johannes Berg <[email protected]> Acked-By: Anton Ivanov <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13um: irq: Reduce irq_reg allocationJohannes Berg2-7/+7
We don't need an array of 4 entries to capture three and the name 'MAX_IRQ_TYPE' really gets confusing as well. Remove it and add a correct NUM_IRQ_TYPES, and use that correctly. Signed-off-by: Johannes Berg <[email protected]> Acked-By: Anton Ivanov <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13um: irq: Clean up and rename struct irq_fdJohannes Berg2-26/+22
This really shouldn't be called "irq_fd" since it doesn't carry an fd. Well, it used to, apparently, but that struct member is unused. Rename it to "irq_reg" since it more accurately reflects a registered interrupt, and remove the unused 'next' and 'fd' members from the struct as well. While at it, also move it to the implementation, it's not used anywhere else, and the header file is shared with the userspace components. Acked-By: Anton Ivanov <[email protected]> Signed-off-by: Johannes Berg <[email protected]> Acked-By: Anton Ivanov <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13um: Clean up alarm IRQ chip nameJohannes Berg1-5/+4
We don't use "SIGVTALRM", it's just SIGALRM. Clean up the naming. While at it, fix the comment's grammar. Acked-By: Anton Ivanov <[email protected]> Signed-off-by: Johannes Berg <[email protected]> Acked-By: Anton Ivanov <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13um: virtio: Use dynamic IRQ allocationJohannes Berg2-11/+16
This separates the devices, which is better for debug and for later suspend/resume and wakeup support, since there we'll have to separate which IRQs can wake up the system and which cannot. Acked-By: Anton Ivanov <[email protected]> Signed-off-by: Johannes Berg <[email protected]> Acked-By: Anton Ivanov <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13um: Support dynamic IRQ allocationJohannes Berg13-32/+65
It's cumbersome and error-prone to keep adding fixed IRQ numbers, and for proper device wakeup support for the virtio/vhost-user support we need to have different IRQs for each device. Even if in theory two IRQs (with and without wake) might be sufficient, it's much easier to reason about it when we have dynamic number assignment. It also makes it easier to add new devices that may dynamically exist or depending on the configuration, etc. Add support for this, up to 64 IRQs (the same limit as epoll FDs we have right now). Since it's not easy to port all the existing places to dynamic allocation (some data is statically initialized) keep the low numbers are reserved for the existing hard-coded IRQ numbers. Acked-By: Anton Ivanov <[email protected]> Signed-off-by: Johannes Berg <[email protected]> Acked-By: Anton Ivanov <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13um: sigio: Return error from add_sigio_fd()Johannes Berg1-2/+4
If we run out of space, return an error instead of 0. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13um: ubd: Set device serial attribute from cmdlineChristopher Obbard1-16/+62
Adds the ability to set the UBD device serial number from the commandline, disabling the serial number functionality by default. In some cases it may be useful to set a serial to the UBD device, such that downstream users (i.e. udev) can use this information to better describe the hardware to the user from the UML cmdline. In our case we use this parameter to create some entries under /dev/disk/by-ubd-id/ for each of the UBD devices passed through the UML cmdline. Signed-off-by: Christopher Obbard <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13um: Increase stack frame size threshold for signal.cAndy Shevchenko1-0/+2
The signal.c can't use heap for bit data located on stack. However, by default a compiler warns us about overstepping stack frame size threshold: arch/um/os-Linux/signal.c: In function ‘sig_handler_common’: arch/um/os-Linux/signal.c:51:1: warning: the frame size of 2960 bytes is larger than 2048 bytes [-Wframe-larger-than=] 51 | } | ^ arch/um/os-Linux/signal.c: In function ‘timer_real_alarm_handler’: arch/um/os-Linux/signal.c:95:1: warning: the frame size of 2960 bytes is larger than 2048 bytes [-Wframe-larger-than=] 95 | } | ^ Due to above increase stack frame size threshold explicitly for signal.c to avoid unnecessary warning. Signed-off-by: Andy Shevchenko <[email protected]> Tested-by: David Gow <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13um: line: Don't free winch (with IRQ) under spinlockJohannes Berg1-4/+8
Lockdep correctly complains that one shouldn't call um_free_irq() with free_irq() inside under a spinlock since that will attempt to acquire a mutex. Rearrange the code to keep the list manipulations under the lock while moving the actual freeing outside of it, to avoid this. In particular, this removes the lockdep complaint at shutdown that I was seeing with lockdep enabled. Signed-off-by: Johannes Berg <[email protected]> Acked-By: [email protected] Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13um: ubd: Submit all data segments atomicallyGabriel Krisman Bertazi1-76/+115
Internally, UBD treats each physical IO segment as a separate command to be submitted in the execution pipe. If the pipe returns a transient error after a few segments have already been written, UBD will tell the block layer to requeue the request, but there is no way to reclaim the segments already submitted. When a new attempt to dispatch the request is done, those segments already submitted will get duplicated, causing the WARN_ON below in the best case, and potentially data corruption. In my system, running a UML instance with 2GB of RAM and a 50M UBD disk, I can reproduce the WARN_ON by simply running mkfs.fvat against the disk on a freshly booted system. There are a few ways to around this, like reducing the pressure on the pipe by reducing the queue depth, which almost eliminates the occurrence of the problem, increasing the pipe buffer size on the host system, or by limiting the request to one physical segment, which causes the block layer to submit way more requests to resolve a single operation. Instead, this patch modifies the format of a UBD command, such that all segments are sent through a single element in the communication pipe, turning the command submission atomic from the point of view of the block layer. The new format has a variable size, depending on the number of elements, and looks like this: +------------+-----------+-----------+------------ | cmd_header | segment 0 | segment 1 | segment ... +------------+-----------+-----------+------------ With this format, we push a pointer to cmd_header in the submission pipe. This has the advantage of reducing the memory footprint of executing a single request, since it allow us to merge some fields in the header. It is possible to reduce even further each segment memory footprint, by merging bitmap_words and cow_offset, for instance, but this is not the focus of this patch and is left as future work. One issue with the patch is that for a big number of segments, we now perform one big memory allocation instead of multiple small ones, but I wasn't able to trigger any real issues or -ENOMEM because of this change, that wouldn't be reproduced otherwise. This was tested using fio with the verify-crc32 option, and by running an ext4 filesystem over this UBD device. The original WARN_ON was: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at lib/refcount.c:28 refcount_warn_saturate+0x13f/0x141 refcount_t: underflow; use-after-free. Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 5.5.0-rc6-00002-g2a5bb2cf75c8 #346 Stack: 6084eed0 6063dc77 00000009 6084ef60 00000000 604b8d9f 6084eee0 6063dcbc 6084ef40 6006ab8d e013d780 1c00000000 Call Trace: [<600a0c1c>] ? printk+0x0/0x94 [<6004a888>] show_stack+0x13b/0x155 [<6063dc77>] ? dump_stack_print_info+0xdf/0xe8 [<604b8d9f>] ? refcount_warn_saturate+0x13f/0x141 [<6063dcbc>] dump_stack+0x2a/0x2c [<6006ab8d>] __warn+0x107/0x134 [<6008da6c>] ? wake_up_process+0x17/0x19 [<60487628>] ? blk_queue_max_discard_sectors+0x0/0xd [<6006b05f>] warn_slowpath_fmt+0xd1/0xdf [<6006af8e>] ? warn_slowpath_fmt+0x0/0xdf [<600acc14>] ? raw_read_seqcount_begin.constprop.0+0x0/0x15 [<600619ae>] ? os_nsecs+0x1d/0x2b [<604b8d9f>] refcount_warn_saturate+0x13f/0x141 [<6048bc8f>] refcount_sub_and_test.constprop.0+0x2f/0x37 [<6048c8de>] blk_mq_free_request+0xf1/0x10d [<6048ca06>] __blk_mq_end_request+0x10c/0x114 [<6005ac0f>] ubd_intr+0xb5/0x169 [<600a1a37>] __handle_irq_event_percpu+0x6b/0x17e [<600a1b70>] handle_irq_event_percpu+0x26/0x69 [<600a1bd9>] handle_irq_event+0x26/0x34 [<600a1bb3>] ? handle_irq_event+0x0/0x34 [<600a5186>] ? unmask_irq+0x0/0x37 [<600a57e6>] handle_edge_irq+0xbc/0xd6 [<600a131a>] generic_handle_irq+0x21/0x29 [<60048f6e>] do_IRQ+0x39/0x54 [...] ---[ end trace c6e7444e55386c0f ]--- Cc: Christopher Obbard <[email protected]> Reported-by: Martyn Welch <[email protected]> Signed-off-by: Gabriel Krisman Bertazi <[email protected]> Tested-by: Christopher Obbard <[email protected]> Acked-by: Anton Ivanov <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13um: Fix time-travel modeJohannes Berg1-5/+0
Since the time-travel rework, basic time-travel mode hasn't worked properly, but there's no longer a need for this WARN_ON() so just remove it and thereby fix things. Cc: [email protected] Fixes: 4b786e24ca80 ("um: time-travel: Rewrite as an event scheduler") Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13um: Remove use of asprinf in umid.cAnton Ivanov1-12/+5
asprintf is not compatible with the existing uml memory allocation mechanism. Its use on the "user" side of UML results in a corrupt slab state. Fixes: 0d4e5ac7e780 ("um: remove uses of variable length arrays") Cc: [email protected] Signed-off-by: Anton Ivanov <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13um: Add support for TIF_NOTIFY_SIGNALJens Axboe2-1/+4
Wire up TIF_NOTIFY_SIGNAL handling for um. Cc: [email protected] Signed-off-by: Jens Axboe <[email protected]> Acked-By: Anton Ivanov <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13um: random: Register random as hwrng-core deviceChristopher Obbard1-76/+25
The UML random driver creates a dummy device under the guest, /dev/hw_random. When this file is read from the guest, the driver reads from the host machine's /dev/random, in-turn reading from the host kernel's entropy pool. This entropy pool could have been filled by a hardware random number generator or just the host kernel's internal software entropy generator. Currently the driver does not fill the guests kernel entropy pool, this requires a userspace tool running inside the guest (like rng-tools) to read from the dummy device provided by this driver, which then would fill the guest's internal entropy pool. This all seems quite pointless when we are already reading from an entropy pool, so this patch aims to register the device as a hwrng device using the hwrng-core framework. This not only improves and cleans up the driver, but also fills the guest's entropy pool without having to resort to using extra userspace tools in the guest. This is typically a nuisance when booting a guest: the random pool takes a long time (~200s) to build up enough entropy since the dummy hwrng is not used to fill the guest's pool. This port was originally attempted by Alexander Neville "dark" (in CC, discussion in Link), but the conversation there stalled since the handling of -EAGAIN errors were no removed and longer handled by the driver. This patch attempts to use the existing method of error handling but utilises the new hwrng core. The issue can be noticed when booting a UML guest: [ 2.560000] random: fast init done [ 214.000000] random: crng init done With the patch applied, filling the pool becomes a lot quicker: [ 2.560000] random: fast init done [ 12.000000] random: crng init done Cc: Alexander Neville <[email protected]> Link: https://lore.kernel.org/lkml/20190828204609.02a7ff70@TheDarkness/ Link: https://lore.kernel.org/lkml/[email protected]/ Cc: Sjoerd Simons <[email protected]> Signed-off-by: Christopher Obbard <[email protected]> Acked-by: Anton Ivanov <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13um: Convert tasklets to use new tasklet_setup() APIAllen Pais1-3/+3
In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Signed-off-by: Romain Perier <[email protected]> Signed-off-by: Allen Pais <[email protected]> Acked-By: Anton Ivanov <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13Merge tag 'x86-urgent-2020-12-13' of ↵Linus Torvalds7-24/+52
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of x86 and membarrier fixes: - Correct a few problems in the x86 and the generic membarrier implementation. Small corrections for assumptions about visibility which have turned out not to be true. - Make the PAT bits for memory encryption correct vs 4K and 2M/1G page table entries as they are at a different location. - Fix a concurrency issue in the the local bandwidth readout of resource control leading to incorrect values - Fix the ordering of allocating a vector for an interrupt. The order missed to respect the provided cpumask when the first attempt of allocating node local in the mask fails. It then tries the node instead of trying the full provided mask first. This leads to erroneous error messages and breaking the (user) supplied affinity request. Reorder it. - Make the INT3 padding detection in optprobe work correctly" * tag 'x86-urgent-2020-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kprobes: Fix optprobe to detect INT3 padding correctly x86/apic/vector: Fix ordering in vector assignment x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled x86/mm/mem_encrypt: Fix definition of PMD_FLAGS_DEC_WP membarrier: Execute SYNC_CORE on the calling thread membarrier: Explicitly sync remote cores when SYNC_CORE is requested membarrier: Add an actual barrier before rseq_preempt() x86/membarrier: Get rid of a dubious optimization
2020-12-12Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds8-16/+55
Pull kvm fixes from Paolo Bonzini: "Bugfixes for ARM, x86 and tools" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: tools/kvm_stat: Exempt time-based counters KVM: mmu: Fix SPTE encoding of MMIO generation upper half kvm: x86/mmu: Use cpuid to determine max gfn kvm: svm: de-allocate svm_cpu_data for all cpus in svm_cpu_uninit() selftests: kvm/set_memory_region_test: Fix race in move region test KVM: arm64: Add usage of stage 2 fault lookup level in user_mem_abort() KVM: arm64: Fix handling of merging tables into a block entry KVM: arm64: Fix memory leak on stage2 update of a valid PTE
2020-12-12Merge tag 'riscv-for-linus-5.10-rc8' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fix from Palmer Dabbelt: "Just one fix. It's nothing critical, just a randconfig that wasn't building. That said, it does seem pretty safe and is technically a regression so I'm sending it along for 5.10: - define get_cycles64() all the time, as it's used by most configurations" * tag 'riscv-for-linus-5.10-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: RISC-V: Define get_cycles64() regardless of M-mode
2020-12-12sparc: add support for TIF_NOTIFY_SIGNALJens Axboe4-5/+9
Wire up TIF_NOTIFY_SIGNAL handling for sparc. Cc: [email protected] Signed-off-by: Jens Axboe <[email protected]>
2020-12-12riscv: add support for TIF_NOTIFY_SIGNALJens Axboe2-2/+5
Wire up TIF_NOTIFY_SIGNAL handling for riscv. Cc: [email protected] Signed-off-by: Jens Axboe <[email protected]>
2020-12-12nds32: add support for TIF_NOTIFY_SIGNALJens Axboe3-2/+4
Wire up TIF_NOTIFY_SIGNAL handling for nds32. Cc: Nick Hu <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-12-12ia64: add support for TIF_NOTIFY_SIGNALJens Axboe2-2/+5
Wire up TIF_NOTIFY_SIGNAL handling for ia64. Cc: [email protected] [axboe: added fixes from Mike Rapoport <[email protected]>] Signed-off-by: Jens Axboe <[email protected]>
2020-12-12h8300: add support for TIF_NOTIFY_SIGNALJens Axboe2-2/+4
Wire up TIF_NOTIFY_SIGNAL handling for h8300. Cc: [email protected] Signed-off-by: Jens Axboe <[email protected]>
2020-12-12c6x: add support for TIF_NOTIFY_SIGNALJens Axboe3-1/+4
Wire up TIF_NOTIFY_SIGNAL handling for c6x. Cc: [email protected] Signed-off-by: Jens Axboe <[email protected]>
2020-12-12alpha: add support for TIF_NOTIFY_SIGNALJens Axboe3-2/+4
Wire up TIF_NOTIFY_SIGNAL handling for alpha. Cc: [email protected] Signed-off-by: Jens Axboe <[email protected]>
2020-12-12x86/kprobes: Fix optprobe to detect INT3 padding correctlyMasami Hiramatsu1-2/+20
Commit 7705dc855797 ("x86/vmlinux: Use INT3 instead of NOP for linker fill bytes") changed the padding bytes between functions from NOP to INT3. However, when optprobe decodes a target function it finds INT3 and gives up the jump optimization. Instead of giving up any INT3 detection, check whether the rest of the bytes to the end of the function are INT3. If all of them are INT3, those come from the linker. In that case, continue the optprobe jump optimization. [ bp: Massage commit message. ] Fixes: 7705dc855797 ("x86/vmlinux: Use INT3 instead of NOP for linker fill bytes") Reported-by: Adam Zabrocki <[email protected]> Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Steven Rostedt (VMware) <[email protected]> Reviewed-by: Kees Cook <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/160767025681.3880685.16021570341428835411.stgit@devnote2
2020-12-12Merge tag 'kvm-s390-next-5.11-1' of ↵Paolo Bonzini9-42/+45
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Features and Test for 5.11 - memcg accouting for s390 specific parts of kvm and gmap - selftest for diag318 - new kvm_stat for when async_pf falls back to sync The selftest even triggers a non-critical bug that is unrelated to diag318, fix will follow later.
2020-12-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski49-75/+161
xdp_return_frame_bulk() needs to pass a xdp_buff to __xdp_return(). strlcpy got converted to strscpy but here it makes no functional difference, so just keep the right code. Conflicts: net/netfilter/nf_tables_api.c Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-11KVM: mmu: Fix SPTE encoding of MMIO generation upper halfMaciej S. Szmigiero2-9/+20
Commit cae7ed3c2cb0 ("KVM: x86: Refactor the MMIO SPTE generation handling") cleaned up the computation of MMIO generation SPTE masks, however it introduced a bug how the upper part was encoded: SPTE bits 52-61 were supposed to contain bits 10-19 of the current generation number, however a missing shift encoded bits 1-10 there instead (mostly duplicating the lower part of the encoded generation number that then consisted of bits 1-9). In the meantime, the upper part was shrunk by one bit and moved by subsequent commits to become an upper half of the encoded generation number (bits 9-17 of bits 0-17 encoded in a SPTE). In addition to the above, commit 56871d444bc4 ("KVM: x86: fix overlap between SPTE_MMIO_MASK and generation") has changed the SPTE bit range assigned to encode the generation number and the total number of bits encoded but did not update them in the comment attached to their defines, nor in the KVM MMU doc. Let's do it here, too, since it is too trivial thing to warrant a separate commit. Fixes: cae7ed3c2cb0 ("KVM: x86: Refactor the MMIO SPTE generation handling") Signed-off-by: Maciej S. Szmigiero <[email protected]> Message-Id: <156700708db2a5296c5ed7a8b9ac71f1e9765c85.1607129096.git.maciej.szmigiero@oracle.com> Cc: [email protected] [Reorganize macros so that everything is computed from the bit ranges. - Paolo] Signed-off-by: Paolo Bonzini <[email protected]>
2020-12-11KVM: x86: reinstate vendor-agnostic check on SPEC_CTRL cpuid bitsPaolo Bonzini3-14/+22
Until commit e7c587da1252 ("x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP"), KVM was testing both Intel and AMD CPUID bits before allowing the guest to write MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD. Testing only Intel bits on VMX processors, or only AMD bits on SVM processors, fails if the guests are created with the "opposite" vendor as the host. While at it, also tweak the host CPU check to use the vendor-agnostic feature bit X86_FEATURE_IBPB, since we only care about the availability of the MSR on the host here and not about specific CPUID bits. Fixes: e7c587da1252 ("x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP") Cc: [email protected] Reported-by: Denis V. Lunev <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-12-11KVM: x86: Expose AVX512_FP16 for supported CPUIDCathy Zhang1-1/+1
AVX512_FP16 is supported by Intel processors, like Sapphire Rapids. It could gain better performance for it's faster compared to FP32 if the precision or magnitude requirements are met. It's availability is indicated by CPUID.(EAX=7,ECX=0):EDX[bit 23]. Expose it in KVM supported CPUID, then guest could make use of it; no new registers are used, only new instructions. Signed-off-by: Cathy Zhang <[email protected]> Signed-off-by: Kyung Min Park <[email protected]> Acked-by: Dave Hansen <[email protected]> Reviewed-by: Tony Luck <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-12-11x86: Enumerate AVX512 FP16 CPUID feature flagKyung Min Park2-0/+2
Enumerate AVX512 Half-precision floating point (FP16) CPUID feature flag. Compared with using FP32, using FP16 cut the number of bits required for storage in half, reducing the exponent from 8 bits to 5, and the mantissa from 23 bits to 10. Using FP16 also enables developers to train and run inference on deep learning models fast when all precision or magnitude (FP32) is not needed. A processor supports AVX512 FP16 if CPUID.(EAX=7,ECX=0):EDX[bit 23] is present. The AVX512 FP16 requires AVX512BW feature be implemented since the instructions for manipulating 32bit masks are associated with AVX512BW. The only in-kernel usage of this is kvm passthrough. The CPU feature flag is shown as "avx512_fp16" in /proc/cpuinfo. Signed-off-by: Kyung Min Park <[email protected]> Acked-by: Dave Hansen <[email protected]> Reviewed-by: Tony Luck <[email protected]> Message-Id: <[email protected]> Acked-by: Borislav Petkov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-12-11KVM/VMX: Use TEST %REG,%REG instead of CMP $0,%REG in vmenter.SUros Bizjak1-1/+1
Saves one byte in __vmx_vcpu_run for the same functionality. Cc: Paolo Bonzini <[email protected]> Cc: Sean Christopherson <[email protected]> Signed-off-by: Uros Bizjak <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-12-11x86,swiotlb: Adjust SWIOTLB bounce buffer size for SEV guestsAshish Kalra3-0/+39
For SEV, all DMA to and from guest has to use shared (un-encrypted) pages. SEV uses SWIOTLB to make this happen without requiring changes to device drivers. However, depending on the workload being run, the default 64MB of it might not be enough and it may run out of buffers to use for DMA, resulting in I/O errors and/or performance degradation for high I/O workloads. Adjust the default size of SWIOTLB for SEV guests using a percentage of the total memory available to guest for the SWIOTLB buffers. Adds a new sev_setup_arch() function which is invoked from setup_arch() and it calls into a new swiotlb generic code function swiotlb_adjust_size() to do the SWIOTLB buffer adjustment. v5 fixed build errors and warnings as Reported-by: kbuild test robot <[email protected]> Signed-off-by: Ashish Kalra <[email protected]> Co-developed-by: Borislav Petkov <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2020-12-11Add and use a generic version of devmem_is_allowed()Palmer Dabbelt7-48/+3
As part of adding STRICT_DEVMEM support to the RISC-V port, Zong provided an implementation of devmem_is_allowed() that's exactly the same as the version in a handful of other ports. Rather than duplicate code, I've put a generic version of this in lib/ and used it for the RISC-V port. * palmer/generic-devmem: arm64: Use the generic devmem_is_allowed() arm: Use the generic devmem_is_allowed() RISC-V: Use the new generic devmem_is_allowed() lib: Add a generic version of devmem_is_allowed()