blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2022-01-27	net: stmmac: skip only stmmac_ptp_register when resume from suspend	Mohammad Athari Bin Ismail	1	-11/+9
	When resume from suspend, besides skipping PTP registration, it also skipping PTP HW initialization. This could cause PTP clock not able to operate properly when resume from suspend. To fix this, only stmmac_ptp_register() is skipped when resume from suspend. Fixes: fe1319291150 ("stmmac: Don't init ptp again when resume from suspend/hibernation") Cc: <[email protected]> # 5.15.x Signed-off-by: Mohammad Athari Bin Ismail <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-01-27	net: stmmac: configure PTP clock source prior to PTP initialization	Mohammad Athari Bin Ismail	2	-3/+3
	For Intel platform, it is required to configure PTP clock source prior PTP initialization in MAC. So, need to move ptp_clk_freq_config execution from stmmac_ptp_register() to stmmac_init_ptp(). Fixes: 76da35dc99af ("stmmac: intel: Add PSE and PCH PTP clock source selection") Cc: <[email protected]> # 5.15.x Signed-off-by: Mohammad Athari Bin Ismail <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-01-27	ALSA: usb-audio: initialize variables that could ignore errors	Tom Rix	1	-0/+4
	clang static analysis reports this representative issue mixer.c:1548:35: warning: Assigned value is garbage or undefined ucontrol->value.integer.value[0] = val; ^ ~~~ The filter_error() macro allows errors to be ignored. If errors can be ignored, initialize variables so garbage will not be used. Fixes: 48cc42973509 ("ALSA: usb-audio: Filter error from connector kctl ops, too") Signed-off-by: Tom Rix <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2022-01-27	Revert "ipv6: Honor all IPv6 PIO Valid Lifetime values"	Guillaume Nault	2	-7/+22
	This reverts commit b75326c201242de9495ff98e5d5cff41d7fc0d9d. This commit breaks Linux compatibility with USGv6 tests. The RFC this commit was based on is actually an expired draft: no published RFC currently allows the new behaviour it introduced. Without full IETF endorsement, the flash renumbering scenario this patch was supposed to enable is never going to work, as other IPv6 equipements on the same LAN will keep the 2 hours limit. Fixes: b75326c20124 ("ipv6: Honor all IPv6 PIO Valid Lifetime values") Signed-off-by: Guillaume Nault <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-01-27	Merge tag 'rpmsg-v5.17-fixes' of ↵	Linus Torvalds	1	-18/+4
	git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux Pull rpmsg fixes from Bjorn Andersson: "The cdev cleanup in the rpmsg_char driver was not performed properly, resulting in unpredicable behaviour when the parent remote processor is stopped with any of the cdevs open by a client. Two patches transitions the implementation to use cdev_device_add() and cdev_del_device(), to capture the relationship between the two objects, and relocates the incorrectly placed cdev_del()" * tag 'rpmsg-v5.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux: rpmsg: char: Fix race between the release of rpmsg_eptdev and cdev rpmsg: char: Fix race between the release of rpmsg_ctrldev and cdev
2022-01-27	Merge tag 'rproc-v5.17-fixes' of ↵	Linus Torvalds	2	-0/+5
	git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux Pull remoteproc fix from Bjorn Andersson: "The interaction between the various Qualcomm remoteproc drivers and the Qualcomm 'QMP' driver (used to communicate with the power-management hardware) was reworked in v5.17-rc1, but failed to account for the new Kconfig dependency" * tag 'rproc-v5.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux: remoteproc: qcom: q6v5: fix service routines build errors
2022-01-27	MIPS: Fix build error due to PTR used in more places	Thomas Bogendoerfer	18	-185/+185
	Use PTR_WD instead of PTR to avoid clashes with other parts. Signed-off-by: Thomas Bogendoerfer <[email protected]>
2022-01-27	kbuild: remove include/linux/cyclades.h from header file check	Greg Kroah-Hartman	1	-0/+1
	The file now rightfully throws up a big warning that it should never be included, so remove it from the header_check test. Fixes: f23653fe6447 ("tty: Partially revert the removal of the Cyclades public API") Cc: stable <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: "Maciej W. Rozycki" <[email protected]> Reported-by: Stephen Rothwell <[email protected]> Reported-by: kernel test robot <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2022-01-27	MAINTAINERS: Remove Harry Morris bouncing address	Miquel Raynal	1	-2/+1
	Harry's e-mail address from Cascoda bounces, I have not found any contributions from him since 2018 so let's drop the Maintainer entry from the CA8210 driver and mark it Orphan. Signed-off-by: Miquel Raynal <[email protected]> Acked-by: Alexander Aring <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Stefan Schmidt <[email protected]>
2022-01-27	net: ieee802154: Return meaningful error codes from the netlink helpers	Miquel Raynal	1	-4/+4
	Returning -1 does not indicate anything useful. Use a standard and meaningful error code instead. Fixes: a26c5fd7622d ("nl802154: add support for security layer") Signed-off-by: Miquel Raynal <[email protected]> Acked-by: Alexander Aring <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Stefan Schmidt <[email protected]>
2022-01-27	net: ieee802154: ca8210: Stop leaking skb's	Miquel Raynal	1	-0/+1
	Upon error the ieee802154_xmit_complete() helper is not called. Only ieee802154_wake_queue() is called manually. We then leak the skb structure. Free the skb structure upon error before returning. Fixes: ded845a781a5 ("ieee802154: Add CA8210 IEEE 802.15.4 device driver") Signed-off-by: Miquel Raynal <[email protected]> Acked-by: Alexander Aring <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Stefan Schmidt <[email protected]>
2022-01-27	net: ieee802154: at86rf230: Stop leaking skb's	Miquel Raynal	1	-2/+11
	Upon error the ieee802154_xmit_complete() helper is not called. Only ieee802154_wake_queue() is called manually. In the Tx case we then leak the skb structure. Free the skb structure upon error before returning when appropriate. As the 'is_tx = 0' cannot be moved in the complete handler because of a possible race between the delay in switching to STATE_RX_AACK_ON and a new interrupt, we introduce an intermediate 'was_tx' boolean just for this purpose. There is no Fixes tag applying here, many changes have been made on this area and the issue kind of always existed. Suggested-by: Alexander Aring <[email protected]> Signed-off-by: Miquel Raynal <[email protected]> Acked-by: Alexander Aring <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Stefan Schmidt <[email protected]>
2022-01-27	net: ieee802154: mcr20a: Fix lifs/sifs periods	Miquel Raynal	1	-2/+2
	These periods are expressed in time units (microseconds) while 40 and 12 are the number of symbol durations these periods will last. We need to multiply them both with phy->symbol_duration in order to get these values in microseconds. Fixes: 8c6ad9cc5157 ("ieee802154: Add NXP MCR20A IEEE 802.15.4 transceiver driver") Signed-off-by: Miquel Raynal <[email protected]> Acked-by: Alexander Aring <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Stefan Schmidt <[email protected]>
2022-01-27	net: ieee802154: hwsim: Ensure proper channel selection at probe time	Miquel Raynal	1	-0/+1
	Drivers are expected to set the PHY current_channel and current_page according to their default state. The hwsim driver is advertising being configured on channel 13 by default but that is not reflected in its own internal pib structure. In order to ensure that this driver consider the current channel as being 13 internally, we at least need to set the pib->channel field to 13. Fixes: f25da51fdc38 ("ieee802154: hwsim: add replacement for fakelb") Signed-off-by: Miquel Raynal <[email protected]> [[email protected]: fixed assigment from page to channel] Acked-by: Alexander Aring <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Stefan Schmidt <[email protected]>
2022-01-27	nvme-fabrics: remove the unneeded ret variable in nvmf_dev_show	Changcheng Deng	1	-2/+1
	Remove unneeded variable and directly return 0. Reported-by: Zeal Robot <[email protected]> Signed-off-by: Changcheng Deng <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2022-01-27	nvme-pci: add the IGNORE_DEV_SUBNQN quirk for Intel P4500/P4600 SSDs	Wu Zheng	1	-1/+2
	The Intel P4500/P4600 SSDs do not report a subsystem NQN despite claiming compliance to a standards version where reporting one is required. Add the IGNORE_DEV_SUBNQN quirk to not fail the initialization of a second such SSDs in a system. Signed-off-by: Zheng Wu <[email protected]> Signed-off-by: Ye Jinhe <[email protected]> Reviewed-by: Keith Busch <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2022-01-26	Merge branch 'pid-introduce-helper-task_is_in_root_ns'	Jakub Kicinski	2	-1/+6
	Leo Yan says: ==================== pid: Introduce helper task_is_in_root_ns() This patch series introduces a helper function task_is_in_init_pid_ns() to replace open code. The two patches are extracted from the original series [1] for network subsystem. As a plan, we can firstly land this patch set into kernel 5.18; there have 5 patches are left out from original series [1], as a next step, I will resend them for appropriate linux-next merging. [1] https://lore.kernel.org/lkml/[email protected]/ ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-01-26	connector/cn_proc: Use task_is_in_init_pid_ns()	Leo Yan	1	-1/+1
	This patch replaces open code with task_is_in_init_pid_ns() to check if a task is in root PID namespace. Signed-off-by: Leo Yan <[email protected]> Acked-by: Balbir Singh <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-01-26	pid: Introduce helper task_is_in_init_pid_ns()	Leo Yan	1	-0/+5
	Currently the kernel uses open code in multiple places to check if a task is in the root PID namespace with the kind of format: if (task_active_pid_ns(current) == &init_pid_ns) do_something(); This patch creates a new helper function, task_is_in_init_pid_ns(), it returns true if a passed task is in the root PID namespace, otherwise returns false. So it will be used to replace open codes. Suggested-by: Suzuki K Poulose <[email protected]> Signed-off-by: Leo Yan <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Acked-by: Suzuki K Poulose <[email protected]> Acked-by: Balbir Singh <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-01-26	gve: Fix GFP flags when allocing pages	Catherine Sullivan	4	-6/+7
	Use GFP_ATOMIC when allocating pages out of the hotpath, continue to use GFP_KERNEL when allocating pages during setup. GFP_KERNEL will allow blocking which allows it to succeed more often in a low memory enviornment but in the hotpath we do not want to allow the allocation to block. Fixes: f5cedc84a30d2 ("gve: Add transmit and receive support") Signed-off-by: Catherine Sullivan <[email protected]> Signed-off-by: David Awogbemila <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-01-27	ata: pata_platform: Fix a NULL pointer dereference in __pata_platform_probe()	Zhou Qingyang	1	-0/+2
	In __pata_platform_probe(), devm_kzalloc() is assigned to ap->ops and there is a dereference of it right after that, which could introduce a NULL pointer dereference bug. Fix this by adding a NULL check of ap->ops. This bug was found by a static analyzer. Builds with 'make allyesconfig' show no new warnings, and our static analyzer no longer warns about this code. Fixes: f3d5e4f18dba ("ata: pata_of_platform: Allow to use 16-bit wide data transfer") Signed-off-by: Zhou Qingyang <[email protected]> Signed-off-by: Damien Le Moal <[email protected]> Reviewed-by: Sergey Shtylyov <[email protected]>
2022-01-26	ucount: Make get_ucount a safe get_user replacement	Eric W. Biederman	1	-0/+2
	When the ucount code was refactored to create get_ucount it was missed that some of the contexts in which a rlimit is kept elevated can be the only reference to the user/ucount in the system. Ordinary ucount references exist in places that also have a reference to the user namspace, but in POSIX message queues, the SysV shm code, and the SIGPENDING code there is no independent user namespace reference. Inspection of the the user_namespace show no instance of circular references between struct ucounts and the user_namespace. So hold a reference from struct ucount to i's user_namespace to resolve this problem. Link: https://lore.kernel.org/lkml/[email protected]/ Reported-by: Qian Cai <[email protected]> Reported-by: Mathias Krause <[email protected]> Tested-by: Mathias Krause <[email protected]> Reviewed-by: Mathias Krause <[email protected]> Reviewed-by: Alexey Gladkov <[email protected]> Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts") Fixes: 6e52a9f0532f ("Reimplement RLIMIT_MSGQUEUE on top of ucounts") Fixes: d7c9e99aee48 ("Reimplement RLIMIT_MEMLOCK on top of ucounts") Cc: [email protected] Signed-off-by: "Eric W. Biederman" <[email protected]>
2022-01-27	selftests: nft_concat_range: add test for reload with no element add/del	Florian Westphal	1	-1/+71
	Add a specific test for the reload issue fixed with commit 23c54263efd7cb ("netfilter: nft_set_pipapo: allocate pcpu scratch maps on clone"). Add to set, then flush set content + restore without other add/remove in the transaction. On kernels before the fix, this test case fails: net,mac with reload [FAIL] Signed-off-by: Florian Westphal <[email protected]> Reviewed-by: Stefano Brivio <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-01-27	netfilter: nft_byteorder: track register operations	Pablo Neira Ayuso	1	-0/+12
	Cancel tracking for byteorder operation, otherwise selector + byteorder operation is incorrectly reduced if source and destination registers are the same. Reported-by: kernel test robot <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-01-27	netfilter: nft_reject_bridge: Fix for missing reply from prerouting	Phil Sutter	1	-4/+4
	Prior to commit fa538f7cf05aa ("netfilter: nf_reject: add reject skbuff creation helpers"), nft_reject_bridge did not assign to nskb->dev before passing nskb on to br_forward(). The shared skbuff creation helpers introduced in above commit do which seems to confuse br_forward() as reject statements in prerouting hook won't emit a packet anymore. Fix this by simply passing NULL instead of 'dev' to the helpers - they use the pointer for just that assignment, nothing else. Fixes: fa538f7cf05aa ("netfilter: nf_reject: add reject skbuff creation helpers") Signed-off-by: Phil Sutter <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-01-27	selftests: netfilter: check stateless nat udp checksum fixup	Florian Westphal	1	-0/+152
	Add a test that sends large udp packet (which is fragmented) via a stateless nft nat rule, i.e. 'ip saddr set 10.2.3.4' and check that the datagram is received by peer. On kernels without commit 4e1860a38637 ("netfilter: nft_payload: do not update layer 4 checksum when mangling fragments")', this will fail with: cmp: EOF on /tmp/tmp.V1q0iXJyQF which is empty -rw------- 1 root root 4096 Jan 24 22:03 /tmp/tmp.Aaqnq4rBKS -rw------- 1 root root 0 Jan 24 22:03 /tmp/tmp.V1q0iXJyQF ERROR: in and output file mismatch when checking udp with stateless nat FAIL: nftables v1.0.0 (Fearless Fosdick #2) On patched kernels, this will show: PASS: IP statless for ns2-PFp89amx Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-01-27	selftests: netfilter: reduce zone stress test running time	Florian Westphal	1	-6/+6
	This selftests needs almost 3 minutes to complete, reduce the insertes zones to 1000. Test now completes in about 20 seconds. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-01-27	netfilter: nft_ct: fix use after free when attaching zone template	Florian Westphal	1	-1/+4
	The conversion erroneously removed the refcount increment. In case we can use the percpu template, we need to increment the refcount, else it will be released when the skb gets freed. In case the slowpath is taken, the new template already has a refcount of 1. Fixes: 719774377622 ("netfilter: conntrack: convert to refcount_t api") Reported-by: kernel test robot <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-01-27	netfilter: Remove flowtable relics	Geert Uytterhoeven	4	-11/+0
	NF_FLOW_TABLE_IPV4 and NF_FLOW_TABLE_IPV6 are invisble, selected by nothing (so they can no longer be enabled), and their last real users have been removed (nf_flow_table_ipv6.c is empty). Clean up the leftovers. Fixes: c42ba4290b2147aa ("netfilter: flowtable: remove ipv4/ipv6 modules") Signed-off-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-01-26	rcu-tasks: Fix computation of CPU-to-list shift counts	Paul E. McKenney	1	-4/+8
	The ->percpu_enqueue_shift field is used to map from the running CPU number to the index of the corresponding callback list. This mapping can change at runtime in response to varying callback load, resulting in varying levels of contention on the callback-list locks. Unfortunately, the initial value of this field is correct only if the system happens to have a power-of-two number of CPUs, otherwise the callbacks from the high-numbered CPUs can be placed into the callback list indexed by 1 (rather than 0), and those index-1 callbacks will be ignored. This can result in soft lockups and hangs. This commit therefore corrects this mapping, adding one to this shift count as needed for systems having odd numbers of CPUs. Fixes: 7a30871b6a27 ("rcu-tasks: Introduce ->percpu_enqueue_shift for dynamic queue selection") Reported-by: Andrii Nakryiko <[email protected]> Cc: Reported-by: Martin Lau <[email protected]> Cc: Neeraj Upadhyay <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2022-01-26	ceph: set pool_ns in new inode layout for async creates	Jeff Layton	1	-0/+7
	Dan reported that he was unable to write to files that had been asynchronously created when the client's OSD caps are restricted to a particular namespace. The issue is that the layout for the new inode is only partially being filled. Ensure that we populate the pool_ns_data and pool_ns_len in the iinfo before calling ceph_fill_inode. Cc: [email protected] URL: https://tracker.ceph.com/issues/54013 Fixes: 9a8d03ca2e2c ("ceph: attempt to do async create when possible") Reported-by: Dan van der Ster <[email protected]> Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Ilya Dryomov <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2022-01-26	ceph: properly put ceph_string reference after async create attempt	Jeff Layton	1	-0/+2
	The reference acquired by try_prep_async_create is currently leaked. Ensure we put it. Cc: [email protected] Fixes: 9a8d03ca2e2c ("ceph: attempt to do async create when possible") Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Ilya Dryomov <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2022-01-26	ceph: put the requests/sessions when it fails to alloc memory	Xiubo Li	1	-18/+37
	When failing to allocate the sessions memory we should make sure the req1 and req2 and the sessions get put. And also in case the max_sessions decreased so when kreallocate the new memory some sessions maybe missed being put. And if the max_sessions is 0 krealloc will return ZERO_SIZE_PTR, which will lead to a distinct access fault. URL: https://tracker.ceph.com/issues/53819 Fixes: e1a4541ec0b9 ("ceph: flush the mdlog before waiting on unsafe reqs") Signed-off-by: Xiubo Li <[email protected]> Reviewed-by: Venky Shankar <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2022-01-26	arm64: extable: fix load_unaligned_zeropad() reg indices	Evgenii Stepanov	1	-2/+2
	In ex_handler_load_unaligned_zeropad() we erroneously extract the data and addr register indices from ex->type rather than ex->data. As ex->type will contain EX_TYPE_LOAD_UNALIGNED_ZEROPAD (i.e. 4): * We'll always treat X0 as the address register, since EX_DATA_REG_ADDR is extracted from bits [9:5]. Thus, we may attempt to dereference an arbitrary address as X0 may hold an arbitrary value. * We'll always treat X4 as the data register, since EX_DATA_REG_DATA is extracted from bits [4:0]. Thus we will corrupt X4 and cause arbitrary behaviour within load_unaligned_zeropad() and its caller. Fix this by extracting both values from ex->data as originally intended. On an MTE-enabled QEMU image we are hitting the following crash: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Call trace: fixup_exception+0xc4/0x108 __do_kernel_fault+0x3c/0x268 do_tag_check_fault+0x3c/0x104 do_mem_abort+0x44/0xf4 el1_abort+0x40/0x64 el1h_64_sync_handler+0x60/0xa0 el1h_64_sync+0x7c/0x80 link_path_walk+0x150/0x344 path_openat+0xa0/0x7dc do_filp_open+0xb8/0x168 do_sys_openat2+0x88/0x17c __arm64_sys_openat+0x74/0xa0 invoke_syscall+0x48/0x148 el0_svc_common+0xb8/0xf8 do_el0_svc+0x28/0x88 el0_svc+0x24/0x84 el0t_64_sync_handler+0x88/0xec el0t_64_sync+0x1b4/0x1b8 Code: f8695a69 71007d1f 540000e0 927df12a (f940014a) Fixes: 753b32368705 ("arm64: extable: add load_unaligned_zeropad() handler") Cc: <[email protected]> # 5.16.x Reviewed-by: Mark Rutland <[email protected]> Signed-off-by: Evgenii Stepanov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-01-26	counter: fix an IS_ERR() vs NULL bug	Dan Carpenter	1	-9/+6
	There are 8 callers for devm_counter_alloc() and they all check for NULL instead of error pointers. I think NULL is the better thing to return for allocation functions so update counter_alloc() and devm_counter_alloc() to return NULL instead of error pointers. Fixes: c18e2760308e ("counter: Provide alternative counter registration functions") Acked-by: Uwe Kleine-König <[email protected]> Acked-by: William Breathitt Gray <[email protected]> Signed-off-by: Dan Carpenter <[email protected]> Link: https://lore.kernel.org/r/20220111173243.GA2192@kili Signed-off-by: Greg Kroah-Hartman <[email protected]>
2022-01-26	selftests: kvm: move vm_xsave_req_perm call to amx_test	Paolo Bonzini	5	-14/+9
	There is no need for tests other than amx_test to enable dynamic xsave states. Remove the call to vm_xsave_req_perm from generic code, and move it inside the test. While at it, allow customizing the bit that is requested, so that future tests can use it differently. Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-26	KVM: x86: Sync the states size with the XCR0/IA32_XSS at, any time	Like Xu	1	-2/+2
	XCR0 is reset to 1 by RESET but not INIT and IA32_XSS is zeroed by both RESET and INIT. The kvm_set_msr_common()'s handling of MSR_IA32_XSS also needs to update kvm_update_cpuid_runtime(). In the above cases, the size in bytes of the XSAVE area containing all states enabled by XCR0 or (XCRO \| IA32_XSS) needs to be updated. For simplicity and consistency, existing helpers are used to write values and call kvm_update_cpuid_runtime(), and it's not exactly a fast path. Fixes: a554d207dc46 ("KVM: X86: Processor States following Reset or INIT") Cc: [email protected] Signed-off-by: Like Xu <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-26	KVM: x86: Update vCPU's runtime CPUID on write to MSR_IA32_XSS	Like Xu	1	-0/+1
	Do a runtime CPUID update for a vCPU if MSR_IA32_XSS is written, as the size in bytes of the XSAVE area is affected by the states enabled in XSS. Fixes: 203000993de5 ("kvm: vmx: add MSR logic for XSAVES") Cc: [email protected] Signed-off-by: Like Xu <[email protected]> [sean: split out as a separate patch, adjust Fixes tag] Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-26	KVM: x86: Keep MSR_IA32_XSS unchanged for INIT	Xiaoyao Li	1	-2/+1
	It has been corrected from SDM version 075 that MSR_IA32_XSS is reset to zero on Power up and Reset but keeps unchanged on INIT. Fixes: a554d207dc46 ("KVM: X86: Processor States following Reset or INIT") Cc: [email protected] Signed-off-by: Xiaoyao Li <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-26	s390/hypfs: include z/VM guests with access control group set	Vasily Gorbik	1	-2/+4
	Currently if z/VM guest is allowed to retrieve hypervisor performance data globally for all guests (privilege class B) the query is formed in a way to include all guests but the group name is left empty. This leads to that z/VM guests which have access control group set not being included in the results (even local vm). Change the query group identifier from empty to "any" to retrieve information about all guests from any groups (or without a group set). Cc: [email protected] Fixes: 31cb4bd31a48 ("[S390] Hypervisor filesystem (s390_hypfs) for z/VM") Reviewed-by: Gerald Schaefer <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2022-01-26	KVM: x86: Free kvm_cpuid_entry2 array on post-KVM_RUN KVM_SET_CPUID{,2}	Sean Christopherson	1	-2/+8
	Free the "struct kvm_cpuid_entry2" array on successful post-KVM_RUN KVM_SET_CPUID{,2} to fix a memory leak, the callers of kvm_set_cpuid() free the array only on failure. BUG: memory leak unreferenced object 0xffff88810963a800 (size 2048): comm "syz-executor025", pid 3610, jiffies 4294944928 (age 8.080s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 0d 00 00 00 ................ 47 65 6e 75 6e 74 65 6c 69 6e 65 49 00 00 00 00 GenuntelineI.... backtrace: [<ffffffff814948ee>] kmalloc_node include/linux/slab.h:604 [inline] [<ffffffff814948ee>] kvmalloc_node+0x3e/0x100 mm/util.c:580 [<ffffffff814950f2>] kvmalloc include/linux/slab.h:732 [inline] [<ffffffff814950f2>] vmemdup_user+0x22/0x100 mm/util.c:199 [<ffffffff8109f5ff>] kvm_vcpu_ioctl_set_cpuid2+0x8f/0xf0 arch/x86/kvm/cpuid.c:423 [<ffffffff810711b9>] kvm_arch_vcpu_ioctl+0xb99/0x1e60 arch/x86/kvm/x86.c:5251 [<ffffffff8103e92d>] kvm_vcpu_ioctl+0x4ad/0x950 arch/x86/kvm/../../../virt/kvm/kvm_main.c:4066 [<ffffffff815afacc>] vfs_ioctl fs/ioctl.c:51 [inline] [<ffffffff815afacc>] __do_sys_ioctl fs/ioctl.c:874 [inline] [<ffffffff815afacc>] __se_sys_ioctl fs/ioctl.c:860 [inline] [<ffffffff815afacc>] __x64_sys_ioctl+0xfc/0x140 fs/ioctl.c:860 [<ffffffff844a3335>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff844a3335>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 [<ffffffff84600068>] entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: c6617c61e8fe ("KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN") Cc: [email protected] Reported-by: [email protected] Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Reviewed-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-26	xfs, iomap: limit individual ioend chain lengths in writeback	Dave Chinner	3	-5/+65
	Trond Myklebust reported soft lockups in XFS IO completion such as this: watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [kworker/12:1:3106] CPU: 12 PID: 3106 Comm: kworker/12:1 Not tainted 4.18.0-305.10.2.el8_4.x86_64 #1 Workqueue: xfs-conv/md127 xfs_end_io [xfs] RIP: 0010:_raw_spin_unlock_irqrestore+0x11/0x20 Call Trace: wake_up_page_bit+0x8a/0x110 iomap_finish_ioend+0xd7/0x1c0 iomap_finish_ioends+0x7f/0xb0 xfs_end_ioend+0x6b/0x100 [xfs] xfs_end_io+0xb9/0xe0 [xfs] process_one_work+0x1a7/0x360 worker_thread+0x1fa/0x390 kthread+0x116/0x130 ret_from_fork+0x35/0x40 Ioends are processed as an atomic completion unit when all the chained bios in the ioend have completed their IO. Logically contiguous ioends can also be merged and completed as a single, larger unit. Both of these things can be problematic as both the bio chains per ioend and the size of the merged ioends processed as a single completion are both unbound. If we have a large sequential dirty region in the page cache, write_cache_pages() will keep feeding us sequential pages and we will keep mapping them into ioends and bios until we get a dirty page at a non-sequential file offset. These large sequential runs can will result in bio and ioend chaining to optimise the io patterns. The pages iunder writeback are pinned within these chains until the submission chaining is broken, allowing the entire chain to be completed. This can result in huge chains being processed in IO completion context. We get deep bio chaining if we have large contiguous physical extents. We will keep adding pages to the current bio until it is full, then we'll chain a new bio to keep adding pages for writeback. Hence we can build bio chains that map millions of pages and tens of gigabytes of RAM if the page cache contains big enough contiguous dirty file regions. This long bio chain pins those pages until the final bio in the chain completes and the ioend can iterate all the chained bios and complete them. OTOH, if we have a physically fragmented file, we end up submitting one ioend per physical fragment that each have a small bio or bio chain attached to them. We do not chain these at IO submission time, but instead we chain them at completion time based on file offset via iomap_ioend_try_merge(). Hence we can end up with unbound ioend chains being built via completion merging. XFS can then do COW remapping or unwritten extent conversion on that merged chain, which involves walking an extent fragment at a time and running a transaction to modify the physical extent information. IOWs, we merge all the discontiguous ioends together into a contiguous file range, only to then process them individually as discontiguous extents. This extent manipulation is computationally expensive and can run in a tight loop, so merging logically contiguous but physically discontigous ioends gains us nothing except for hiding the fact the fact we broke the ioends up into individual physical extents at submission and then need to loop over those individual physical extents at completion. Hence we need to have mechanisms to limit ioend sizes and to break up completion processing of large merged ioend chains: 1. bio chains per ioend need to be bound in length. Pure overwrites go straight to iomap_finish_ioend() in softirq context with the exact bio chain attached to the ioend by submission. Hence the only way to prevent long holdoffs here is to bound ioend submission sizes because we can't reschedule in softirq context. 2. iomap_finish_ioends() has to handle unbound merged ioend chains correctly. This relies on any one call to iomap_finish_ioend() being bound in runtime so that cond_resched() can be issued regularly as the long ioend chain is processed. i.e. this relies on mechanism #1 to limit individual ioend sizes to work correctly. 3. filesystems have to loop over the merged ioends to process physical extent manipulations. This means they can loop internally, and so we break merging at physical extent boundaries so the filesystem can easily insert reschedule points between individual extent manipulations. Signed-off-by: Dave Chinner <[email protected]> Reported-and-tested-by: Trond Myklebust <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2022-01-26	KVM: nVMX: WARN on any attempt to allocate shadow VMCS for vmcs02	Sean Christopherson	1	-10/+12
	WARN if KVM attempts to allocate a shadow VMCS for vmcs02. KVM emulates VMCS shadowing but doesn't virtualize it, i.e. KVM should never allocate a "real" shadow VMCS for L2. The previous code WARNed but continued anyway with the allocation, presumably in an attempt to avoid NULL pointer dereference. However, alloc_vmcs (and hence alloc_shadow_vmcs) can fail, and indeed the sole caller does: if (enable_shadow_vmcs && !alloc_shadow_vmcs(vcpu)) goto out_shadow_vmcs; which makes it not a useful attempt. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-26	KVM: selftests: Don't skip L2's VMCALL in SMM test for SVM guest	Sean Christopherson	1	-1/+0
	Don't skip the vmcall() in l2_guest_code() prior to re-entering L2, doing so will result in L2 running to completion, popping '0' off the stack for RET, jumping to address '0', and ultimately dying with a triple fault shutdown. It's not at all obvious why the test re-enters L2 and re-executes VMCALL, but presumably it serves a purpose. The VMX path doesn't skip vmcall(), and the test can't possibly have passed on SVM, so just do what VMX does. Fixes: d951b2210c1a ("KVM: selftests: smm_test: Test SMM enter from L2") Cc: Maxim Levitsky <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Reviewed-by: Vitaly Kuznetsov <[email protected]> Tested-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-26	KVM: x86: Check .flags in kvm_cpuid_check_equal() too	Vitaly Kuznetsov	1	-0/+1
	kvm_cpuid_check_equal() checks for the (full) equality of the supplied CPUID data so .flags need to be checked too. Reported-by: Sean Christopherson <[email protected]> Fixes: c6617c61e8fe ("KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN") Signed-off-by: Vitaly Kuznetsov <[email protected]> Message-Id: <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-26	KVM: x86: Forcibly leave nested virt when SMM state is toggled	Sean Christopherson	6	-7/+12
	Forcibly leave nested virtualization operation if userspace toggles SMM state via KVM_SET_VCPU_EVENTS or KVM_SYNC_X86_EVENTS. If userspace forces the vCPU out of SMM while it's post-VMXON and then injects an SMI, vmx_enter_smm() will overwrite vmx->nested.smm.vmxon and end up with both vmxon=false and smm.vmxon=false, but all other nVMX state allocated. Don't attempt to gracefully handle the transition as (a) most transitions are nonsencial, e.g. forcing SMM while L2 is running, (b) there isn't sufficient information to handle all transitions, e.g. SVM wants access to the SMRAM save state, and (c) KVM_SET_VCPU_EVENTS must precede KVM_SET_NESTED_STATE during state restore as the latter disallows putting the vCPU into L2 if SMM is active, and disallows tagging the vCPU as being post-VMXON in SMM if SMM is not active. Abuse of KVM_SET_VCPU_EVENTS manifests as a WARN and memory leak in nVMX due to failure to free vmcs01's shadow VMCS, but the bug goes far beyond just a memory leak, e.g. toggling SMM on while L2 is active puts the vCPU in an architecturally impossible state. WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline] WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656 Modules linked in: CPU: 1 PID: 3606 Comm: syz-executor725 Not tainted 5.17.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline] RIP: 0010:free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656 Code: <0f> 0b eb b3 e8 8f 4d 9f 00 e9 f7 fe ff ff 48 89 df e8 92 4d 9f 00 Call Trace: <TASK> kvm_arch_vcpu_destroy+0x72/0x2f0 arch/x86/kvm/x86.c:11123 kvm_vcpu_destroy arch/x86/kvm/../../../virt/kvm/kvm_main.c:441 [inline] kvm_destroy_vcpus+0x11f/0x290 arch/x86/kvm/../../../virt/kvm/kvm_main.c:460 kvm_free_vcpus arch/x86/kvm/x86.c:11564 [inline] kvm_arch_destroy_vm+0x2e8/0x470 arch/x86/kvm/x86.c:11676 kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1217 [inline] kvm_put_kvm+0x4fa/0xb00 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1250 kvm_vm_release+0x3f/0x50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1273 __fput+0x286/0x9f0 fs/file_table.c:311 task_work_run+0xdd/0x1a0 kernel/task_work.c:164 exit_task_work include/linux/task_work.h:32 [inline] do_exit+0xb29/0x2a30 kernel/exit.c:806 do_group_exit+0xd2/0x2f0 kernel/exit.c:935 get_signal+0x4b0/0x28c0 kernel/signal.c:2862 arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:868 handle_signal_work kernel/entry/common.c:148 [inline] exit_to_user_mode_loop kernel/entry/common.c:172 [inline] exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:207 __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline] syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300 do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> Cc: [email protected] Reported-by: [email protected] Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-26	KVM: SVM: drop unnecessary code in svm_hv_vmcb_dirty_nested_enlightenments()	Vitaly Kuznetsov	2	-13/+1
	Commit 3fa5e8fd0a0e4 ("KVM: SVM: delay svm_vcpu_init_msrpm after svm->vmcb is initialized") re-arranged svm_vcpu_init_msrpm() call in svm_create_vcpu(), thus making the comment about vmcb being NULL obsolete. Drop it. While on it, drop superfluous vmcb_is_clean() check: vmcb_mark_dirty() is a bit flip, an extra check is unlikely to bring any performance gain. Drop now-unneeded vmcb_is_clean() helper as well. Fixes: 3fa5e8fd0a0e4 ("KVM: SVM: delay svm_vcpu_init_msrpm after svm->vmcb is initialized") Signed-off-by: Vitaly Kuznetsov <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-26	KVM: SVM: hyper-v: Enable Enlightened MSR-Bitmap support for real	Vitaly Kuznetsov	1	-0/+3
	Commit c4327f15dfc7 ("KVM: SVM: hyper-v: Enlightened MSR-Bitmap support") introduced enlightened MSR-Bitmap support for KVM-on-Hyper-V but it didn't actually enable the support. Similar to enlightened NPT TLB flush and direct TLB flush features, the guest (KVM) has to tell L0 (Hyper-V) that it's using the feature by setting the appropriate feature fit in VMCB control area (sw reserved fields). Fixes: c4327f15dfc7 ("KVM: SVM: hyper-v: Enlightened MSR-Bitmap support") Signed-off-by: Vitaly Kuznetsov <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-26	KVM: SVM: Don't kill SEV guest if SMAP erratum triggers in usermode	Sean Christopherson	1	-1/+15
	Inject a #GP instead of synthesizing triple fault to try to avoid killing the guest if emulation of an SEV guest fails due to encountering the SMAP erratum. The injected #GP may still be fatal to the guest, e.g. if the userspace process is providing critical functionality, but KVM should make every attempt to keep the guest alive. Signed-off-by: Sean Christopherson <[email protected]> Reviewed-by: Liam Merwick <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-26	KVM: SVM: Don't apply SEV+SMAP workaround on code fetch or PT access	Sean Christopherson	1	-9/+34
	Resume the guest instead of synthesizing a triple fault shutdown if the instruction bytes buffer is empty due to the #NPF being on the code fetch itself or on a page table access. The SMAP errata applies if and only if the code fetch was successful and ucode's subsequent data read from the code page encountered a SMAP violation. In practice, the guest is likely hosed either way, but crashing the guest on a code fetch to emulated MMIO is technically wrong according to the behavior described in the APM. Signed-off-by: Sean Christopherson <[email protected]> Reviewed-by: Liam Merwick <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>