blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2021-08-18	Merge tag 'sound-5.14-rc7' of ↵	Linus Torvalds	7	-9/+35
	git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Only a few regression fixes and trivial device quirks" * tag 'sound-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda/via: Apply runtime PM workaround for ASUS B23E ALSA: hda: Fix hang during shutdown due to link reset ALSA: hda/realtek: Enable 4-speaker output for Dell XPS 15 9510 laptop ALSA: oxfw: fix functioal regression for silence in Apogee Duet FireWire ALSA: hda - fix the 'Capture Switch' value change notifications
2021-08-18	Merge tag 'cfi-v5.14-rc7' of ↵	Linus Torvalds	1	-4/+4
	git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull clang cfi fix from Kees Cook: - Use rcu_read_{un}lock_sched_notrace to avoid recursion (Elliot Berman) * tag 'cfi-v5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: cfi: Use rcu_read_{un}lock_sched_notrace
2021-08-18	pipe: avoid unnecessary EPOLLET wakeups under normal loads	Linus Torvalds	2	-6/+11
	I had forgotten just how sensitive hackbench is to extra pipe wakeups, and commit 3a34b13a88ca ("pipe: make pipe writes always wake up readers") ended up causing a quite noticeable regression on larger machines. Now, hackbench isn't necessarily a hugely meaningful benchmark, and it's not clear that this matters in real life all that much, but as Mel points out, it's used often enough when comparing kernels and so the performance regression shows up like a sore thumb. It's easy enough to fix at least for the common cases where pipes are used purely for data transfer, and you never have any exciting poll usage at all. So set a special 'poll_usage' flag when there is polling activity, and make the ugly "EPOLLET has crazy legacy expectations" semantics explicit to only that case. I would love to limit it to just the broken EPOLLET case, but the pipe code can't see the difference between epoll and regular select/poll, so any non-read/write waiting will trigger the extra wakeup behavior. That is sufficient for at least the hackbench case. Apart from making the odd extra wakeup cases more explicitly about EPOLLET, this also makes the extra wakeup be at the _end_ of the pipe write, not at the first write chunk. That is actually much saner semantics (as much as you can call any of the legacy edge-triggered expectations for EPOLLET "sane") since it means that you know the wakeup will happen once the write is done, rather than possibly in the middle of one. [ For stable people: I'm putting a "Fixes" tag on this, but I leave it up to you to decide whether you actually want to backport it or not. It likely has no impact outside of synthetic benchmarks - Linus ] Link: https://lore.kernel.org/lkml/20210802024945.GA8372@xsang-OptiPlex-9020/ Fixes: 3a34b13a88ca ("pipe: make pipe writes always wake up readers") Reported-by: kernel test robot <[email protected]> Tested-by: Sandeep Patil <[email protected]> Tested-by: Mel Gorman <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-08-17	Merge tag 'trace-v5.14-rc6' of ↵	Linus Torvalds	2	-35/+15
	git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Limit the shooting in the foot of tp_printk The "tp_printk" option redirects the trace event output to printk at boot up. This is useful when a machine crashes before boot where the trace events can not be retrieved by the in kernel ring buffer. But it can be "dangerous" because trace events can be located in high frequency locations such as interrupts and the scheduler, where a printk can slow it down that it live locks the machine (because by the time the printk finishes, the next event is triggered). Thus tp_printk must be used with care. It was discovered that the filter logic to trace events does not apply to the tp_printk events. This can cause a surprise and live lock when the user expects it to be filtered to limit the amount of events printed to the console when in fact it still prints everything" * tag 'trace-v5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Apply trace filters on all output channels
2021-08-17	ALSA: hda/via: Apply runtime PM workaround for ASUS B23E	Takashi Iwai	1	-0/+1
	ASUS B23E requires the same workaround like other machines with VT1802, otherwise it looses the codec power on a few nodes and the sound kept silence. Fixes: a0645daf1610 ("ALSA: HDA: Early Forbid of runtime PM") Link: https://lore.kernel.org/r/[email protected] Cc: <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2021-08-17	ALSA: hda: Fix hang during shutdown due to link reset	Imre Deak	1	-3/+9
	During system shutdown codecs may be still active, and resetting the controller->codec HW link in this state - based on the bug reporter's tests - leads to the shutdown sequence to get stuck. This happens at least on the reporter's KBL system with an ALC662 codec. For now fix the issue by skipping the link reset step. Fixes: 472e18f63c42 ("ALSA: hda: Release controller display power during shutdown/reboot") References: https://bugzilla.kernel.org/show_bug.cgi?id=214045 References: https://gitlab.freedesktop.org/drm/intel/-/issues/3618#note_1024665 Reported-and-tested-by: [email protected] Cc: [email protected] Signed-off-by: Imre Deak <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2021-08-16	Merge branch 'linus' of ↵	Linus Torvalds	1	-1/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "This contains a fix for a potential boot failure due to a missing Kconfig dependency for people upgrading with the DRBG enabled" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: drbg - select SHA512
2021-08-16	Merge tag 'mtd/fixes-for-5.14-rc7' of ↵	Linus Torvalds	5	-13/+19
	git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux Pull MTD fixes from Miquel Raynal: "MTD core fixes: - Fix lock hierarchy in deregister_mtd_blktrans - Handle flashes without OTP gracefully - Break circular locks in register_mtd_blktrans MTD device fixes: - mchp48l640: - Fix memory leak on cmd - Silence some uninitialized variable warnings - blkdevs: - Initialize rq.limits.discard_granularity CFI fixes: - Fix crash when erasing/writing AMD cards Raw NAND fixes: - Fix of_get_nand_secure_regions(): - Add a missing check - Avoid an unwanted probe failure when a DT property is missing" * tag 'mtd/fixes-for-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: mtd: rawnand: Fix probe failure due to of_get_nand_secure_regions() mtd: fix lock hierarchy in deregister_mtd_blktrans mtd: devices: mchp48l640: Fix memory leak on cmd mtd: cfi_cmdset_0002: fix crash when erasing/writing AMD cards mtd: core: handle flashes without OTP gracefully mtd: mchp48l640: silence some uninitialized variable warnings mtd: break circular locks in register_mtd_blktrans mtd: rawnand: Add a check in of_get_nand_secure_regions() mtd: mtd_blkdevs: Initialize rq.limits.discard_granularity
2021-08-16	Merge tag 'trace-v5.14-rc5-2' of ↵	Linus Torvalds	4	-3/+69
	git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Fixes and clean ups to tracing: - Fix header alignment when PREEMPT_RT is enabled for osnoise tracer - Inject "stop" event to see where osnoise stopped the trace - Define DYNAMIC_FTRACE_WITH_ARGS as some code had an #ifdef for it - Fix erroneous message for bootconfig cmdline parameter - Fix crash caused by not found variable in histograms" * tag 'trace-v5.14-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing / histogram: Fix NULL pointer dereference on strcmp() on NULL event name init: Suppress wrong warning for bootconfig cmdline parameter tracing: define needed config DYNAMIC_FTRACE_WITH_ARGS trace/osnoise: Print a stop tracing message trace/timerlat: Add a header with PREEMPT_RT additional fields trace/osnoise: Add a header with PREEMPT_RT additional fields
2021-08-16	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds	3	-7/+17
	Pull KVM fixes from Paolo Bonzini: "Two nested virtualization fixes for AMD processors" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: nSVM: always intercept VMLOAD/VMSAVE when nested (CVE-2021-3656) KVM: nSVM: avoid picking up unsupported bits from L2 in int_ctl (CVE-2021-3653)
2021-08-16	Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost	Linus Torvalds	20	-36/+166
	Pull virtio fixes from Michael Tsirkin: "Fixes in virtio, vhost, and vdpa drivers" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vdpa/mlx5: Fix queue type selection logic vdpa/mlx5: Avoid destroying MR on empty iotlb tools/virtio: fix build virtio_ring: pull in spinlock header vringh: pull in spinlock header virtio-blk: Add validation for block size in config space vringh: Use wiov->used to check for read/write desc order virtio_vdpa: reject invalid vq indices vdpa: Add documentation for vdpa_alloc_device() macro vDPA/ifcvf: Fix return value check for vdpa_alloc_device() vp_vdpa: Fix return value check for vdpa_alloc_device() vdpa_sim: Fix return value check for vdpa_alloc_device() vhost: Fix the calculation in vhost_overflow() vhost-vdpa: Fix integer overflow in vhost_vdpa_process_iotlb_update() virtio_pci: Support surprise removal of virtio pci device virtio: Protect vqs list access virtio: Keep vring_del_virtqueue() mirror of VQ create virtio: Improve vq->broken access to avoid any compiler optimization
2021-08-16	tracing: Apply trace filters on all output channels	Pingfan Liu	2	-35/+15
	The event filters are not applied on all of the output, which results in the flood of printk when using tp_printk. Unfolding event_trigger_unlock_commit_regs() into trace_event_buffer_commit(), so the filters can be applied on every output. Link: https://lkml.kernel.org/r/[email protected] Cc: [email protected] Fixes: 0daa2302968c1 ("tracing: Add tp_printk cmdline to have tracepoints go to printk()") Signed-off-by: Pingfan Liu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2021-08-16	KVM: nSVM: always intercept VMLOAD/VMSAVE when nested (CVE-2021-3656)	Maxim Levitsky	1	-0/+3
	If L1 disables VMLOAD/VMSAVE intercepts, and doesn't enable Virtual VMLOAD/VMSAVE (currently not supported for the nested hypervisor), then VMLOAD/VMSAVE must operate on the L1 physical memory, which is only possible by making L0 intercept these instructions. Failure to do so allowed the nested guest to run VMLOAD/VMSAVE unintercepted, and thus read/write portions of the host physical memory. Fixes: 89c8a4984fc9 ("KVM: SVM: Enable Virtual VMLOAD VMSAVE feature") Suggested-by: Paolo Bonzini <[email protected]> Signed-off-by: Maxim Levitsky <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-08-16	KVM: nSVM: avoid picking up unsupported bits from L2 in int_ctl (CVE-2021-3653)	Maxim Levitsky	3	-7/+14
	* Invert the mask of bits that we pick from L2 in nested_vmcb02_prepare_control * Invert and explicitly use VIRQ related bits bitmask in svm_clear_vintr This fixes a security issue that allowed a malicious L1 to run L2 with AVIC enabled, which allowed the L2 to exploit the uninitialized and enabled AVIC to read/write the host physical memory at some offsets. Fixes: 3d6368ef580a ("KVM: SVM: Add VMRUN handler") Signed-off-by: Maxim Levitsky <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-08-15	Linux 5.14-rc6	Linus Torvalds	1	-1/+1

2021-08-15	Merge tag 'powerpc-5.14-5' of ↵	Linus Torvalds	13	-62/+82
	git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix crashes coming out of nap on 32-bit Book3s (eg. powerbooks). - Fix critical and debug interrupts on BookE, seen as crashes when using ptrace. - Fix an oops when running an SMP kernel on a UP system. - Update pseries LPAR security flavor after partition migration. - Fix an oops when using kprobes on BookE. - Fix oops on 32-bit pmac by not calling do_IRQ() from timer_interrupt(). - Fix softlockups on CPU hotplug into a CPU-less node with xive (P9). Thanks to Cédric Le Goater, Christophe Leroy, Finn Thain, Geetika Moolchandani, Laurent Dufour, Laurent Vivier, Nicholas Piggin, Pu Lehui, Radu Rendec, Srikar Dronamraju, and Stan Johnson. * tag 'powerpc-5.14-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/xive: Do not skip CPU-less nodes when creating the IPIs powerpc/interrupt: Do not call single_step_exception() from other exceptions powerpc/interrupt: Fix OOPS by not calling do_IRQ() from timer_interrupt() powerpc/kprobes: Fix kprobe Oops happens in booke powerpc/pseries: Fix update of LPAR security flavor after LPM powerpc/smp: Fix OOPS in topology_init() powerpc/32: Fix critical and debug interrupts on BOOKE powerpc/32s: Fix napping restore in data storage interrupt (DSI)
2021-08-15	Merge tag 'irq-urgent-2021-08-15' of ↵	Linus Torvalds	11	-61/+113
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "A set of fixes for PCI/MSI and x86 interrupt startup: - Mask all MSI-X entries when enabling MSI-X otherwise stale unmasked entries stay around e.g. when a crashkernel is booted. - Enforce masking of a MSI-X table entry when updating it, which mandatory according to speification - Ensure that writes to MSI[-X} tables are flushed. - Prevent invalid bits being set in the MSI mask register - Properly serialize modifications to the mask cache and the mask register for multi-MSI. - Cure the violation of the affinity setting rules on X86 during interrupt startup which can cause lost and stale interrupts. Move the initial affinity setting ahead of actualy enabling the interrupt. - Ensure that MSI interrupts are completely torn down before freeing them in the error handling case. - Prevent an array out of bounds access in the irq timings code" * tag 'irq-urgent-2021-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: driver core: Add missing kernel doc for device::msi_lock genirq/msi: Ensure deactivation on teardown genirq/timings: Prevent potential array overflow in __irq_timings_store() x86/msi: Force affinity setup before startup x86/ioapic: Force affinity setup before startup genirq: Provide IRQCHIP_AFFINITY_PRE_STARTUP PCI/MSI: Protect msi_desc::masked for multi-MSI PCI/MSI: Use msi_mask_irq() in pci_msi_shutdown() PCI/MSI: Correct misleading comments PCI/MSI: Do not set invalid bits in MSI mask PCI/MSI: Enforce MSI[X] entry updates to be visible PCI/MSI: Enforce that MSI-X table entry is masked for update PCI/MSI: Mask all unused MSI-X entries PCI/MSI: Enable and mask MSI-X early
2021-08-15	Merge tag 'locking_urgent_for_v5.14_rc6' of ↵	Linus Torvalds	1	-1/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Borislav Petkov: - Fix a CONFIG symbol's spelling * tag 'locking_urgent_for_v5.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rtmutex: Use the correct rtmutex debugging config option
2021-08-15	Merge tag 'efi_urgent_for_v5.14_rc6' of ↵	Linus Torvalds	2	-10/+63
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fixes from Borislav Petkov: "A batch of fixes for the arm64 stub image loader: - fix a logic bug that can make the random page allocator fail spuriously - force reallocation of the Image when it overlaps with firmware reserved memory regions - fix an oversight that defeated on optimization introduced earlier where images loaded at a suitable offset are never moved if booting without randomization - complain about images that were not loaded at the right offset by the firmware image loader" * tag 'efi_urgent_for_v5.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/libstub: arm64: Double check image alignment at entry efi/libstub: arm64: Warn when efi_random_alloc() fails efi/libstub: arm64: Relax 2M alignment again for relocatable kernels efi/libstub: arm64: Force Image reallocation if BSS was not reserved arm64: efi: kaslr: Fix occasional random alloc (and boot) failure
2021-08-15	Merge tag 'x86_urgent_for_v5.14_rc6' of ↵	Linus Torvalds	2	-14/+14
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: "Two fixes: - An objdump checker fix to ignore parenthesized strings in the objdump version - Fix resctrl default monitoring groups reporting when new subgroups get created" * tag 'x86_urgent_for_v5.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/resctrl: Fix default monitoring groups reporting x86/tools: Fix objdump version check again
2021-08-15	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds	10	-62/+118
	Pull KVM fixes from Paolo Bonzini: "ARM: - Plug race between enabling MTE and creating vcpus - Fix off-by-one bug when checking whether an address range is RAM x86: - Fixes for the new MMU, especially a memory leak on hosts with <39 physical address bits - Remove bogus EFER.NX checks on 32-bit non-PAE hosts - WAITPKG fix" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86/mmu: Protect marking SPs unsync when using TDP MMU with spinlock KVM: x86/mmu: Don't step down in the TDP iterator when zapping all SPTEs KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs KVM: nVMX: Use vmx_need_pf_intercept() when deciding if L0 wants a #PF kvm: vmx: Sync all matching EPTPs when injecting nested EPT fault KVM: x86: remove dead initialization KVM: x86: Allow guest to set EFER.NX=1 on non-PAE 32-bit kernels KVM: VMX: Use current VMCS to query WAITPKG support for MSR emulation KVM: arm64: Fix race when enabling KVM_ARM_CAP_MTE KVM: arm64: Fix off-by-one in range_is_memory
2021-08-15	ALSA: hda/realtek: Enable 4-speaker output for Dell XPS 15 9510 laptop	Kristin Paget	1	-0/+1
	The 2021-model XPS 15 appears to use the same 4-speakers-on-ALC289 audio setup as the Precision models, so requires the same quirk to enable woofer output. Tested on my own 9510. Signed-off-by: Kristin Paget <[email protected]> Cc: <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2021-08-14	Merge tag 'scsi-fixes' of ↵	Linus Torvalds	3	-4/+15
	git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Three minor fixes, all in drivers" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: mpt3sas: Fix incorrectly assigned error return and check scsi: storvsc: Log TEST_UNIT_READY errors as warnings scsi: lpfc: Move initialization of phba->poll_list earlier to avoid crash
2021-08-14	Merge tag 'libnvdimm-fixes-5.14-rc6' of ↵	Linus Torvalds	6	-13/+19
	git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: "A couple of fixes for long standing bugs, a warning fixup, and some miscellaneous dax cleanups. The bugs were recently found due to new platforms looking to use the ACPI NFIT "virtual" device definition, and new error injection capabilities to trigger error responses to label area requests. Ira's cleanups have been long pending, I neglected to send them earlier, and see no harm in including them now. This has all appeared in -next with no reported issues. Summary: - Fix support for NFIT "virtual" ranges (BIOS-defined memory disks) - Fix recovery from failed label storage areas on NVDIMM devices - Miscellaneous cleanups from Ira's investigation of dax_direct_access paths preparing for stray-write protection" * tag 'libnvdimm-fixes-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: tools/testing/nvdimm: Fix missing 'fallthrough' warning libnvdimm/region: Fix label activation vs errors ACPI: NFIT: Fix support for virtual SPA ranges dax: Ensure errno is returned from dax_direct_access fs/dax: Clarify nr_pages to dax_direct_access() fs/fuse: Remove unneeded kaddr parameter
2021-08-14	Merge tag 'usb-5.14-rc6' of ↵	Linus Torvalds	1	-16/+2
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fix from Greg KH: "A single revert of a commit that caused problems in 5.14-rc5 for 5.14-rc6. It has been in linux-next almost all week, and has resolved the issues that were reported on lots of different systems that were not the platform that the change was originally tested on (gotta love SoC cores used in multiple devices from multiple vendors...)" * tag 'usb-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: Revert "usb: dwc3: gadget: Use list_replace_init() before traversing lists"
2021-08-14	Merge tag 'staging-5.14-rc6' of ↵	Linus Torvalds	7	-49/+10
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull IIO driver fixes from Greg KH: "Here are some small IIO driver fixes for reported problems for 5.14-rc6 (no staging driver fixes at the moment). All of them resolve reported issues and have been in linux-next all week with no reported problems. Full details are in the shortlog" * tag 'staging-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: iio: adc: Fix incorrect exit of for-loop iio: humidity: hdc100x: Add margin to the conversion time dt-bindings: iio: st: Remove wrong items length check iio: accel: fxls8962af: fix i2c dependency iio: adis: set GPIO reset pin direction iio: adc: ti-ads7950: Ensure CS is deasserted after reading channels iio: accel: fxls8962af: fix potential use of uninitialized symbol
2021-08-14	Merge branch 'i2c/for-current' of ↵	Linus Torvalds	3	-4/+6
	git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "One driver bugfix, a documentation bugfix, and an "uninitialized data" leak fix for the core" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: Documentation: i2c: add i2c-sysfs into index i2c: dev: zero out array used for i2c reads from userspace i2c: iproc: fix race between client unreg and tasklet
2021-08-14	Merge tag 'for-linus-5.14-rc6-tag' of ↵	Linus Torvalds	1	-7/+15
	git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "A small cleanup patch and a fix of a rare race in the Xen evtchn driver" * tag 'for-linus-5.14-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/events: Fix race in set_evtchn_to_irq xen/events: remove redundant initialization of variable irq
2021-08-14	Merge tag 'riscv-for-linus-5.14-rc6' of ↵	Linus Torvalds	2	-2/+2
	git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - avoid passing -mno-relax to compilers that don't support it - a comment fix * tag 'riscv-for-linus-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Fix comment regarding kernel mapping overlapping with IS_ERR_VALUE riscv: kexec: do not add '-mno-relax' flag if compiler doesn't support it
2021-08-14	Merge tag 'configfs-5.14' of git://git.infradead.org/users/hch/configfs	Linus Torvalds	1	-12/+6
	Pull configfs fix from Christoph Hellwig: - fix to revert to the historic write behavior (Bart Van Assche) * tag 'configfs-5.14' of git://git.infradead.org/users/hch/configfs: configfs: restore the kernel v5.13 text attribute write behavior
2021-08-13	Merge branch 'akpm' (patches from Andrew)	Linus Torvalds	6	-20/+30
	Merge misc fixes from Andrew Morton: "7 patches. Subsystems affected by this patch series: mm (kasan, mm/slub, mm/madvise, and memcg), and lib" * emailed patches from Andrew Morton <[email protected]>: lib: use PFN_PHYS() in devmem_is_allowed() mm/memcg: fix incorrect flushing of lruvec data in obj_stock mm/madvise: report SIGBUS as -EFAULT for MADV_POPULATE_(READ\|WRITE) mm: slub: fix slub_debug disabling for list of slabs slub: fix kmalloc_pagealloc_invalid_free unit test kasan, slub: reset tag when printing address kasan, kmemleak: reset tags when scanning block
2021-08-13	Merge tag '5.14-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6	Linus Torvalds	6	-33/+80
	Pull cifs fixes from Steve French: "Four CIFS/SMB3 Fixes, all for stable, two relating to deferred close, and one for the 'modefromsid' mount option (when 'idsfromsid' not specified)" * tag '5.14-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: Call close synchronously during unlink/rename/lease break. cifs: Handle race conditions during rename cifs: use the correct max-length for dentry_path_raw() cifs: create sd context must be a multiple of 8
2021-08-13	Merge tag 'linux-kselftest-fixes-5.14-rc6' of ↵	Linus Torvalds	1	-20/+21
	git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest fix from Shuah Khan: "A single patch to sgx test to fix Q1 and Q2 calculation" * tag 'linux-kselftest-fixes-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/sgx: Fix Q1 and Q2 calculation in sigstruct.c
2021-08-13	lib: use PFN_PHYS() in devmem_is_allowed()	Liang Wang	1	-1/+1
	The physical address may exceed 32 bits on 32-bit systems with more than 32 bits of physcial address. Use PFN_PHYS() in devmem_is_allowed(), or the physical address may overflow and be truncated. We found this bug when mapping a high addresses through devmem tool, when CONFIG_STRICT_DEVMEM is enabled on the ARM with ARM_LPAE and devmem is used to map a high address that is not in the iomem address range, an unexpected error indicating no permission is returned. This bug was initially introduced from v2.6.37, and the function was moved to lib in v5.11. Link: https://lkml.kernel.org/r/[email protected] Fixes: 087aaffcdf9c ("ARM: implement CONFIG_STRICT_DEVMEM by disabling access to RAM via /dev/mem") Fixes: 527701eda5f1 ("lib: Add a generic version of devmem_is_allowed()") Signed-off-by: Liang Wang <[email protected]> Reviewed-by: Luis Chamberlain <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Russell King <[email protected]> Cc: Liang Wang <[email protected]> Cc: Xiaoming Ni <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: <[email protected]> [2.6.37+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-08-13	mm/memcg: fix incorrect flushing of lruvec data in obj_stock	Waiman Long	1	-2/+4
	When mod_objcg_state() is called with a pgdat that is different from that in the obj_stock, the old lruvec data cached in obj_stock are flushed out. Unfortunately, they were flushed to the new pgdat and so the data go to the wrong node. This will screw up the slab data reported in /sys/devices/system/node/node*/meminfo. Fix that by flushing the data to the cached pgdat instead. Link: https://lkml.kernel.org/r/[email protected] Fixes: 68ac5b3c8db2 ("mm/memcg: cache vmstat data in percpu memcg_stock_pcp") Signed-off-by: Waiman Long <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: Roman Gushchin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Muchun Song <[email protected]> Cc: Alex Shi <[email protected]> Cc: Chris Down <[email protected]> Cc: Yafang Shao <[email protected]> Cc: Wei Yang <[email protected]> Cc: Masayoshi Mizuma <[email protected]> Cc: Xing Zhengjun <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Waiman Long <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-08-13	mm/madvise: report SIGBUS as -EFAULT for MADV_POPULATE_(READ\|WRITE)	David Hildenbrand	2	-3/+8
	Doing some extended tests and polishing the man page update for MADV_POPULATE_(READ\|WRITE), I realized that we end up converting also SIGBUS (via -EFAULT) to -EINVAL, making it look like yet another madvise() user error. We want to report only problematic mappings and permission problems that the user could have know as -EINVAL. Let's not convert -EFAULT arising due to SIGBUS (or SIGSEGV) to -EINVAL, but instead indicate -EFAULT to user space. While we could also convert it to -ENOMEM, using -EFAULT looks more helpful when user space might want to troubleshoot what's going wrong: MADV_POPULATE_(READ\|WRITE) is not part of an final Linux release and we can still adjust the behavior. Link: https://lkml.kernel.org/r/[email protected] Fixes: 4ca9b3859dac ("mm/madvise: introduce MADV_POPULATE_(READ\|WRITE) to prefault page tables") Signed-off-by: David Hildenbrand <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Jann Horn <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Helge Deller <[email protected]> Cc: Chris Zankel <[email protected]> Cc: Max Filippov <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Peter Xu <[email protected]> Cc: Rolf Eike Beer <[email protected]> Cc: Ram Pai <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-08-13	mm: slub: fix slub_debug disabling for list of slabs	Vlastimil Babka	1	-5/+8
	Vijayanand Jitta reports: Consider the scenario where CONFIG_SLUB_DEBUG_ON is set and we would want to disable slub_debug for few slabs. Using boot parameter with slub_debug=-,slab_name syntax doesn't work as expected i.e; only disabling debugging for the specified list of slabs. Instead it disables debugging for all slabs, which is wrong. This patch fixes it by delaying the moment when the global slub_debug flags variable is updated. In case a "slub_debug=-,slab_name" has been passed, the global flags remain as initialized (depending on CONFIG_SLUB_DEBUG_ON enabled or disabled) and are not simply reset to 0. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Vlastimil Babka <[email protected]> Reported-by: Vijayanand Jitta <[email protected]> Reviewed-by: Vijayanand Jitta <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Vinayak Menon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-08-13	slub: fix kmalloc_pagealloc_invalid_free unit test	Shakeel Butt	1	-4/+4
	The unit test kmalloc_pagealloc_invalid_free makes sure that for the higher order slub allocation which goes to page allocator, the free is called with the correct address i.e. the virtual address of the head page. Commit f227f0faf63b ("slub: fix unreclaimable slab stat for bulk free") unified the free code paths for page allocator based slub allocations but instead of using the address passed by the caller, it extracted the address from the page. Thus making the unit test kmalloc_pagealloc_invalid_free moot. So, fix this by using the address passed by the caller. Should we fix this? I think yes because dev expect kasan to catch these type of programming bugs. Link: https://lkml.kernel.org/r/[email protected] Fixes: f227f0faf63b ("slub: fix unreclaimable slab stat for bulk free") Signed-off-by: Shakeel Butt <[email protected]> Reported-by: Nathan Chancellor <[email protected]> Tested-by: Nathan Chancellor <[email protected]> Acked-by: Roman Gushchin <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-08-13	kasan, slub: reset tag when printing address	Kuan-Ying Lee	1	-2/+2
	The address still includes the tags when it is printed. With hardware tag-based kasan enabled, we will get a false positive KASAN issue when we access metadata. Reset the tag before we access the metadata. Link: https://lkml.kernel.org/r/[email protected] Fixes: aa1ef4d7b3f6 ("kasan, mm: reset tags when accessing metadata") Signed-off-by: Kuan-Ying Lee <[email protected]> Reviewed-by: Marco Elver <[email protected]> Reviewed-by: Andrey Konovalov <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chinwen Chang <[email protected]> Cc: Nicholas Tang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-08-13	kasan, kmemleak: reset tags when scanning block	Kuan-Ying Lee	1	-3/+3
	Patch series "kasan, slub: reset tag when printing address", v3. With hardware tag-based kasan enabled, we reset the tag when we access metadata to avoid from false alarm. This patch (of 2): Kmemleak needs to scan kernel memory to check memory leak. With hardware tag-based kasan enabled, when it scans on the invalid slab and dereference, the issue will occur as below. Hardware tag-based KASAN doesn't use compiler instrumentation, we can not use kasan_disable_current() to ignore tag check. Based on the below report, there are 11 0xf7 granules, which amounts to 176 bytes, and the object is allocated from the kmalloc-256 cache. So when kmemleak accesses the last 256-176 bytes, it causes faults, as those are marked with KASAN_KMALLOC_REDZONE == KASAN_TAG_INVALID == 0xfe. Thus, we reset tags before accessing metadata to avoid from false positives. BUG: KASAN: out-of-bounds in scan_block+0x58/0x170 Read at addr f7ff0000c0074eb0 by task kmemleak/138 Pointer tag: [f7], memory tag: [fe] CPU: 7 PID: 138 Comm: kmemleak Not tainted 5.14.0-rc2-00001-g8cae8cd89f05-dirty #134 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x0/0x1b0 show_stack+0x1c/0x30 dump_stack_lvl+0x68/0x84 print_address_description+0x7c/0x2b4 kasan_report+0x138/0x38c __do_kernel_fault+0x190/0x1c4 do_tag_check_fault+0x78/0x90 do_mem_abort+0x44/0xb4 el1_abort+0x40/0x60 el1h_64_sync_handler+0xb4/0xd0 el1h_64_sync+0x78/0x7c scan_block+0x58/0x170 scan_gray_list+0xdc/0x1a0 kmemleak_scan+0x2ac/0x560 kmemleak_scan_thread+0xb0/0xe0 kthread+0x154/0x160 ret_from_fork+0x10/0x18 Allocated by task 0: kasan_save_stack+0x2c/0x60 __kasan_kmalloc+0xec/0x104 __kmalloc+0x224/0x3c4 __register_sysctl_paths+0x200/0x290 register_sysctl_table+0x2c/0x40 sysctl_init+0x20/0x34 proc_sys_init+0x3c/0x48 proc_root_init+0x80/0x9c start_kernel+0x648/0x6a4 __primary_switched+0xc0/0xc8 Freed by task 0: kasan_save_stack+0x2c/0x60 kasan_set_track+0x2c/0x40 kasan_set_free_info+0x44/0x54 ____kasan_slab_free.constprop.0+0x150/0x1b0 __kasan_slab_free+0x14/0x20 slab_free_freelist_hook+0xa4/0x1fc kfree+0x1e8/0x30c put_fs_context+0x124/0x220 vfs_kern_mount.part.0+0x60/0xd4 kern_mount+0x24/0x4c bdev_cache_init+0x70/0x9c vfs_caches_init+0xdc/0xf4 start_kernel+0x638/0x6a4 __primary_switched+0xc0/0xc8 The buggy address belongs to the object at ffff0000c0074e00 which belongs to the cache kmalloc-256 of size 256 The buggy address is located 176 bytes inside of 256-byte region [ffff0000c0074e00, ffff0000c0074f00) The buggy address belongs to the page: page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x100074 head:(____ptrval____) order:2 compound_mapcount:0 compound_pincount:0 flags: 0xbfffc0000010200(slab\|head\|node=0\|zone=2\|lastcpupid=0xffff\|kasantag=0x0) raw: 0bfffc0000010200 0000000000000000 dead000000000122 f5ff0000c0002300 raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff0000c0074c00: f0 f0 f0 f0 f0 f0 f0 f0 f0 fe fe fe fe fe fe fe ffff0000c0074d00: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe >ffff0000c0074e00: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 fe fe fe fe fe ^ ffff0000c0074f00: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe ffff0000c0075000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Disabling lock debugging due to kernel taint kmemleak: 181 new suspected memory leaks (see /sys/kernel/debug/kmemleak) Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kuan-Ying Lee <[email protected]> Acked-by: Catalin Marinas <[email protected]> Reviewed-by: Andrey Konovalov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Nicholas Tang <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Chinwen Chang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-08-13	Merge tag 'block-5.14-2021-08-13' of git://git.kernel.dk/linux-block	Linus Torvalds	8	-316/+33
	Pull block fixes from Jens Axboe: "A few fixes for block that should go into 5.14: - Revert the mq-deadline cgroup addition. More work is needed on this front, let's revert it for now and get it right before having it in a released kernel (Tejun) - blk-iocost lockdep fix (Ming) - nbd double completion fix (Xie) - Fix for non-idling when clearing the shared tag flag (Yu)" * tag 'block-5.14-2021-08-13' of git://git.kernel.dk/linux-block: nbd: Aovid double completion of a request blk-mq: clear active_queues before clearing BLK_MQ_F_TAG_QUEUE_SHARED Revert "block/mq-deadline: Add cgroup support" blk-iocost: fix lockdep warning on blkcg->lock
2021-08-13	Merge tag 'io_uring-5.14-2021-08-13' of git://git.kernel.dk/linux-block	Linus Torvalds	3	-40/+75
	Pull io_uring fixes from Jens Axboe: "A bit bigger than the previous weeks, but mostly just a few stable bound fixes. In detail: - Followup fixes to patches from last week for io-wq, turns out they weren't complete (Hao) - Two lockdep reported fixes out of the RT camp (me) - Sync the io_uring-cp example with liburing, as a few bug fixes never made it to the kernel carried version (me) - SQPOLL related TIF_NOTIFY_SIGNAL fix (Nadav) - Use WRITE_ONCE() when writing sq flags (Nadav) - io_rsrc_put_work() deadlock fix (Pavel)" * tag 'io_uring-5.14-2021-08-13' of git://git.kernel.dk/linux-block: tools/io_uring/io_uring-cp: sync with liburing example io_uring: fix ctx-exit io_rsrc_put_work() deadlock io_uring: drop ctx->uring_lock before flushing work item io-wq: fix IO_WORKER_F_FIXED issue in create_io_worker() io-wq: fix bug of creating io-wokers unconditionally io_uring: rsrc ref lock needs to be IRQ safe io_uring: Use WRITE_ONCE() when writing to sq_flags io_uring: clear TIF_NOTIFY_SIGNAL when running task work
2021-08-13	Merge tag 'pinctrl-v5.14-2' of ↵	Linus Torvalds	6	-61/+73
	git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: "An assortment of pin control fixes of varying importance, the most important ones affecting Intel and AMD laptops turned up the recent few days so it's time to push this to your tree. - Fix the Kconfig dependency for Qualcomm SM8350 pin controller - Fix pin biasing fallback behaviour on the Mediatek pin controller - Fix the GPIO numbering scheme for Intel Tiger Lake-H to correspond to the products that are now actually out on the market - Fix a pin control function itemization in the Sunxi driver out-of-bounds access bug - Fix disable clocking for the RISC-V K210 pin controller on the errorpath - Fix a system shutdown bug affecting AMD Ryzen-based laptops, the system would not suspend but just bounce back up" * tag 'pinctrl-v5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: amd: Fix an issue with shutdown when system set to s0ix pinctrl: k210: Fix k210_fpioa_probe() pinctrl: sunxi: Don't underestimate number of functions pinctrl: tigerlake: Fix GPIO mapping for newer version of software pinctrl: mediatek: Fix fallback behavior for bias_set_combo pinctrl: qcom: fix GPIOLIB dependencies
2021-08-13	nbd: Aovid double completion of a request	Xie Yongji	1	-3/+11
	There is a race between iterating over requests in nbd_clear_que() and completing requests in recv_work(), which can lead to double completion of a request. To fix it, flush the recv worker before iterating over the requests and don't abort the completed request while iterating. Fixes: 96d97e17828f ("nbd: clear_sock on netlink disconnect") Reported-by: Jiang Yadong <[email protected]> Signed-off-by: Xie Yongji <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-08-13	tools/io_uring/io_uring-cp: sync with liburing example	Jens Axboe	1	-4/+27
	This example is missing a few fixes that are in the liburing version, synchronize with the upstream version. Reported-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-08-13	blk-mq: clear active_queues before clearing BLK_MQ_F_TAG_QUEUE_SHARED	Yu Kuai	1	-2/+4
	We run a test that delete and recover devcies frequently(two devices on the same host), and we found that 'active_queues' is super big after a period of time. If device a and device b share a tag set, and a is deleted, then blk_mq_exit_queue() will clear BLK_MQ_F_TAG_QUEUE_SHARED because there is only one queue that are using the tag set. However, if b is still active, the active_queues of b might never be cleared even if b is deleted. Thus clear active_queues before BLK_MQ_F_TAG_QUEUE_SHARED is cleared. Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-08-13	driver core: Add missing kernel doc for device::msi_lock	Thomas Gleixner	1	-0/+1
	Fixes: 77e89afc25f3 ("PCI/MSI: Protect msi_desc::masked for multi-MSI") Reported-by: Stephen Rothwell <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
2021-08-13	Merge branch 'kvm-tdpmmu-fixes' into kvm-master	Paolo Bonzini	4	-15/+63
	Merge topic branch with fixes for both 5.14-rc6 and 5.15.
2021-08-13	KVM: x86/mmu: Protect marking SPs unsync when using TDP MMU with spinlock	Sean Christopherson	3	-4/+39
	Add yet another spinlock for the TDP MMU and take it when marking indirect shadow pages unsync. When using the TDP MMU and L1 is running L2(s) with nested TDP, KVM may encounter shadow pages for the TDP entries managed by L1 (controlling L2) when handling a TDP MMU page fault. The unsync logic is not thread safe, e.g. the kvm_mmu_page fields are not atomic, and misbehaves when a shadow page is marked unsync via a TDP MMU page fault, which runs with mmu_lock held for read, not write. Lack of a critical section manifests most visibly as an underflow of unsync_children in clear_unsync_child_bit() due to unsync_children being corrupted when multiple CPUs write it without a critical section and without atomic operations. But underflow is the best case scenario. The worst case scenario is that unsync_children prematurely hits '0' and leads to guest memory corruption due to KVM neglecting to properly sync shadow pages. Use an entirely new spinlock even though piggybacking tdp_mmu_pages_lock would functionally be ok. Usurping the lock could degrade performance when building upper level page tables on different vCPUs, especially since the unsync flow could hold the lock for a comparatively long time depending on the number of indirect shadow pages and the depth of the paging tree. For simplicity, take the lock for all MMUs, even though KVM could fairly easily know that mmu_lock is held for write. If mmu_lock is held for write, there cannot be contention for the inner spinlock, and marking shadow pages unsync across multiple vCPUs will be slow enough that bouncing the kvm_arch cacheline should be in the noise. Note, even though L2 could theoretically be given access to its own EPT entries, a nested MMU must hold mmu_lock for write and thus cannot race against a TDP MMU page fault. I.e. the additional spinlock only _needs_ to be taken by the TDP MMU, as opposed to being taken by any MMU for a VM that is running with the TDP MMU enabled. Holding mmu_lock for read also prevents the indirect shadow page from being freed. But as above, keep it simple and always take the lock. Alternative #1, the TDP MMU could simply pass "false" for can_unsync and effectively disable unsync behavior for nested TDP. Write protecting leaf shadow pages is unlikely to noticeably impact traditional L1 VMMs, as such VMMs typically don't modify TDP entries, but the same may not hold true for non-standard use cases and/or VMMs that are migrating physical pages (from L1's perspective). Alternative #2, the unsync logic could be made thread safe. In theory, simply converting all relevant kvm_mmu_page fields to atomics and using atomic bitops for the bitmap would suffice. However, (a) an in-depth audit would be required, (b) the code churn would be substantial, and (c) legacy shadow paging would incur additional atomic operations in performance sensitive paths for no benefit (to legacy shadow paging). Fixes: a2855afc7ee8 ("KVM: x86/mmu: Allow parallel page faults for the TDP MMU") Cc: [email protected] Cc: Ben Gardon <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-08-13	KVM: x86/mmu: Don't step down in the TDP iterator when zapping all SPTEs	Sean Christopherson	1	-1/+8
	Set the min_level for the TDP iterator at the root level when zapping all SPTEs to optimize the iterator's try_step_down(). Zapping a non-leaf SPTE will recursively zap all its children, thus there is no need for the iterator to attempt to step down. This avoids rereading the top-level SPTEs after they are zapped by causing try_step_down() to short-circuit. In most cases, optimizing try_step_down() will be in the noise as the cost of zapping SPTEs completely dominates the overall time. The optimization is however helpful if the zap occurs with relatively few SPTEs, e.g. if KVM is zapping in response to multiple memslot updates when userspace is adding and removing read-only memslots for option ROMs. In that case, the task doing the zapping likely isn't a vCPU thread, but it still holds mmu_lock for read and thus can be a noisy neighbor of sorts. Reviewed-by: Ben Gardon <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>