aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-10-21ext4: fast commit recovery pathHarshad Shirwadkar14-131/+1821
This patch adds fast commit recovery path support for Ext4 file system. We add several helper functions that are similar in spirit to e2fsprogs journal recovery path handlers. Example of such functions include - a simple block allocator, idempotent block bitmap update function etc. Using these routines and the fast commit log in the fast commit area, the recovery path (ext4_fc_replay()) performs fast commit log recovery. Reported-by: kernel test robot <[email protected]> Signed-off-by: Harshad Shirwadkar <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2020-10-21jbd2: fast commit recovery pathHarshad Shirwadkar3-4/+88
This patch adds fast commit recovery support in JBD2. Signed-off-by: Harshad Shirwadkar <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2020-10-21ext4: main fast-commit commit pathHarshad Shirwadkar13-29/+1707
This patch adds main fast commit commit path handlers. The overall patch can be divided into two inter-related parts: (A) Metadata updates tracking This part consists of helper functions to track changes that need to be committed during a commit operation. These updates are maintained by Ext4 in different in-memory queues. Following are the APIs and their short description that are implemented in this patch: - ext4_fc_track_link/unlink/creat() - Track unlink. link and creat operations - ext4_fc_track_range() - Track changed logical block offsets inodes - ext4_fc_track_inode() - Track inodes - ext4_fc_mark_ineligible() - Mark file system fast commit ineligible() - ext4_fc_start_update() / ext4_fc_stop_update() / ext4_fc_start_ineligible() / ext4_fc_stop_ineligible() These functions are useful for co-ordinating inode updates with commits. (B) Main commit Path This part consists of functions to convert updates tracked in in-memory data structures into on-disk commits. Function ext4_fc_commit() is the main entry point to commit path. Reported-by: kernel test robot <[email protected]> Signed-off-by: Harshad Shirwadkar <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2020-10-21jbd2: add fast commit machineryHarshad Shirwadkar4-1/+268
This functions adds necessary APIs needed in JBD2 layer for fast commits. Signed-off-by: Harshad Shirwadkar <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2020-10-21ext4 / jbd2: add fast commit initializationHarshad Shirwadkar7-6/+122
This patch adds fast commit area trackers in the journal_t structure. These are initialized via the jbd2_fc_init() routine that this patch adds. This patch also adds ext4/fast_commit.c and ext4/fast_commit.h files for fast commit code that will be added in subsequent patches in this series. Reported-by: kernel test robot <[email protected]> Signed-off-by: Harshad Shirwadkar <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2020-10-21ext4: add fast_commit feature and handling for extended mount optionsHarshad Shirwadkar3-6/+30
We are running out of mount option bits. Add handling for using s_mount_opt2. Add ext4 and jbd2 fast commit feature flag and also add ability to turn off the fast commit feature in Ext4. Signed-off-by: Harshad Shirwadkar <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2020-10-21doc: update ext4 and journalling docs to include fast commit featureHarshad Shirwadkar2-0/+99
This patch adds necessary documentation for fast commits. Signed-off-by: Harshad Shirwadkar <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2020-10-21drm/amdgpu: correct the cu and rb info for sienna cichlidLikun Gao1-0/+9
Skip disabled sa to correct the cu_info and active_rbs for sienna cichlid. Signed-off-by: Likun Gao <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected] # 5.9.x
2020-10-21drm/amd/pm: remove the average clock value in sysfsKenneth Feng1-4/+8
if it's fine-grained clock dpm, remove the average clock value and reflects the real clock. Signed-off-by: Kenneth Feng <[email protected]> Reviewed-by: Likun Gao <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-10-21drm/amd/pm: fix pp_dpm_fclkKenneth Feng1-0/+3
fclk value is missing in pp_dpm_fclk. add this to correctly show the current value. Signed-off-by: Kenneth Feng <[email protected]> Reviewed-by: Likun Gao <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected] # 5.9.x
2020-10-21Revert drm/amdgpu: disable sienna chichlid UMC RASJohn Clements1-2/+2
This reverts commit 265c280a4807419249644156654e5c40a235ea84. Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-10-21drm/amd/pm: fix pcie information for sienna cichlidLikun Gao1-2/+2
Fix the function used for sienna cichlid to get correct PCIE information by pp_dpm_pcie. Signed-off-by: Likun Gao <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Kenneth Feng <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected] # 5.9.x
2020-10-21drm/amdkfd: Use same SQ prefetch setting as amdgpuJay Cornwall1-2/+3
0 causes instruction fetch stall at cache line boundary under some conditions on Navi10. A non-zero prefetch is the preferred default in any case. Fixes soft hang in Luxmark. Signed-off-by: Jay Cornwall <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2020-10-21rtnetlink: fix data overflow in rtnl_calcit()Di Zhu2-8/+7
"ip addr show" command execute error when we have a physical network card with a large number of VFs The return value of if_nlmsg_size() in rtnl_calcit() will exceed range of u16 data type when any network cards has a larger number of VFs. rtnl_vfinfo_size() will significant increase needed dump size when the value of num_vfs is larger. Eventually we get a wrong value of min_ifinfo_dump_size because of overflow which decides the memory size needed by netlink dump and netlink_dump() will return -EMSGSIZE because of not enough memory was allocated. So fix it by promoting min_dump_alloc data type to u32 to avoid whole netlink message size overflow and it's also align with the data type of struct netlink_callback{}.min_dump_alloc which is assigned by return value of rtnl_calcit() Signed-off-by: Di Zhu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-21net: ethernet: mtk-star-emac: select REGMAP_MMIOBartosz Golaszewski1-0/+1
The driver depends on mmio regmap API but doesn't select the appropriate Kconfig option. This fixes it. Fixes: 8c7bd5a454ff ("net: ethernet: mtk-star-emac: new driver") Signed-off-by: Bartosz Golaszewski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-21net: hdlc_raw_eth: Clear the IFF_TX_SKB_SHARING flag after calling ether_setupXie He1-0/+1
This driver calls ether_setup to set up the network device. The ether_setup function would add the IFF_TX_SKB_SHARING flag to the device. This flag indicates that it is safe to transmit shared skbs to the device. However, this is not true. This driver may pad the frame (in eth_tx) before transmission, so the skb may be modified. Fixes: 550fd08c2ceb ("net: Audit drivers to identify those needing IFF_TX_SKB_SHARING cleared") Cc: Neil Horman <[email protected]> Cc: Krzysztof Halasa <[email protected]> Signed-off-by: Xie He <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-21net: hdlc: In hdlc_rcv, check to make sure dev is an HDLC deviceXie He1-1/+9
The hdlc_rcv function is used as hdlc_packet_type.func to process any skb received in the kernel with skb->protocol == htons(ETH_P_HDLC). The purpose of this function is to provide second-stage processing for skbs not assigned a "real" L3 skb->protocol value in the first stage. This function assumes the device from which the skb is received is an HDLC device (a device created by this module). It assumes that netdev_priv(dev) returns a pointer to "struct hdlc_device". However, it is possible that some driver in the kernel (not necessarily in our control) submits a received skb with skb->protocol == htons(ETH_P_HDLC), from a non-HDLC device. In this case, the skb would still be received by hdlc_rcv. This will cause problems. hdlc_rcv should be able to recognize and drop invalid skbs. It should first make sure "dev" is actually an HDLC device, before starting its processing. This patch adds this check to hdlc_rcv. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: Krzysztof Halasa <[email protected]> Signed-off-by: Xie He <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-22bpf, libbpf: Guard bpf inline asm from bpf_tail_call_staticDaniel Borkmann2-4/+6
Yaniv reported a compilation error after pulling latest libbpf: [...] ../libbpf/src/root/usr/include/bpf/bpf_helpers.h:99:10: error: unknown register name 'r0' in asm : "r0", "r1", "r2", "r3", "r4", "r5"); [...] The issue got triggered given Yaniv was compiling tracing programs with native target (e.g. x86) instead of BPF target, hence no BTF generated vmlinux.h nor CO-RE used, and later llc with -march=bpf was invoked to compile from LLVM IR to BPF object file. Given that clang was expecting x86 inline asm and not BPF one the error complained that these regs don't exist on the former. Guard bpf_tail_call_static() with defined(__bpf__) where BPF inline asm is valid to use. BPF tracing programs on more modern kernels use BPF target anyway and thus the bpf_tail_call_static() function will be available for them. BPF inline asm is supported since clang 7 (clang <= 6 otherwise throws same above error), and __bpf_unreachable() since clang 8, therefore include the latter condition in order to prevent compilation errors for older clang versions. Given even an old Ubuntu 18.04 LTS has official LLVM packages all the way up to llvm-10, I did not bother to special case the __bpf_unreachable() inside bpf_tail_call_static() further. Also, undo the sockex3_kern's use of bpf_tail_call_static() sample given they still have the old hacky way to even compile networking progs with native instead of BPF target so bpf_tail_call_static() won't be defined there anymore. Fixes: 0e9f6841f664 ("bpf, libbpf: Add bpf_tail_call_static helper for bpf programs") Reported-by: Yaniv Agman <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Acked-by: Yonghong Song <[email protected]> Tested-by: Yaniv Agman <[email protected]> Link: https://lore.kernel.org/bpf/CAMy7=ZUk08w5Gc2Z-EKi4JFtuUCaZYmE4yzhJjrExXpYKR4L8w@mail.gmail.com Link: https://lore.kernel.org/bpf/[email protected]
2020-10-22powerpc/eeh: Fix eeh_dev_check_failure() for PE#0Oliver O'Halloran1-5/+0
In commit 269e583357df ("powerpc/eeh: Delete eeh_pe->config_addr") the following simplification was made: - if (!pe->addr && !pe->config_addr) { + if (!pe->addr) { eeh_stats.no_cfg_addr++; return 0; } This introduced a bug which causes EEH checking to be skipped for devices in PE#0. Before the change above the check would always pass since at least one of the two PE addresses would be non-zero in all circumstances. On PowerNV pe->config_addr would be the BDFN of the first device added to the PE. The zero BDFN is reserved for the PHB's root port, but this is fine since for obscure platform reasons the root port is never assigned to PE#0. Similarly, on pseries pe->addr has always been non-zero for the reasons outlined in commit 42de19d5ef71 ("powerpc/pseries/eeh: Allow zero to be a valid PE configuration address"). We can fix the problem by deleting the block entirely The original purpose of this test was to avoid performing EEH checks on devices that were not on an EEH capable bus. In modern Linux the edev->pe pointer will be NULL for devices that are not on an EEH capable bus. The code block immediately above this one already checks for the edev->pe == NULL case so this test (new and old) is entirely redundant. Ideally we'd delete eeh_stats.no_cfg_addr too since nothing increments it any more. Unfortunately, that information is exposed via /proc/powerpc/eeh which means it's technically ABI. We could make it hard-coded, but that's a change for another patch. Fixes: 269e583357df ("powerpc/eeh: Delete eeh_pe->config_addr") Signed-off-by: Oliver O'Halloran <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-10-22bpf, selftests: Extend test_tc_redirect to use modified bpf_redirect_neigh()Toke Høiland-Jørgensen3-5/+173
This updates the test_tc_neigh prog in selftests to use the new syntax of bpf_redirect_neigh(). To exercise the helper both with and without the optional parameter, add an additional test_tc_neigh_fib test program, which does a bpf_fib_lookup() followed by a call to bpf_redirect_neigh() instead of looking up the ifindex in a map. Update the test_tc_redirect.sh script to run both versions of the test, and while we're add it, fix it to work on systems that have a consolidated dual-stack 'ping' binary instead of separate ping/ping6 versions. Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-10-22exfat: remove useless check in exfat_move_file()Tetsuhiro Kohada1-5/+0
In exfat_move_file(), the identity of source and target directory has been checked by the caller. Also, it gets stream.start_clu from file dir-entry, which is an invalid determination. Signed-off-by: Tetsuhiro Kohada <[email protected]> Acked-by: Sungjong Seo <[email protected]> Signed-off-by: Namjae Jeon <[email protected]>
2020-10-22exfat: remove 'rwoffset' in exfat_inode_infoTetsuhiro Kohada5-20/+9
Remove 'rwoffset' in exfat_inode_info and replace it with the parameter of exfat_readdir(). Since rwoffset is referenced only by exfat_readdir(), it is not necessary a exfat_inode_info's member. Also, change cpos to point to the next of entry-set, and return the index of dir-entry via dir_entry->entry. Signed-off-by: Tetsuhiro Kohada <[email protected]> Acked-by: Sungjong Seo <[email protected]> Signed-off-by: Namjae Jeon <[email protected]>
2020-10-22exfat: replace memcpy with structure assignmentTetsuhiro Kohada3-14/+10
Use structure assignment instead of memcpy. Signed-off-by: Tetsuhiro Kohada <[email protected]> Acked-by: Sungjong Seo <[email protected]> Signed-off-by: Namjae Jeon <[email protected]>
2020-10-22exfat: remove useless directory scan in exfat_add_entry()Tetsuhiro Kohada1-10/+1
There is nothing in directory just created, so there is no need to scan. Signed-off-by: Tetsuhiro Kohada <[email protected]> Acked-by: Sungjong Seo <[email protected]> Signed-off-by: Namjae Jeon <[email protected]>
2020-10-22exfat: eliminate dead code in exfat_find()Tetsuhiro Kohada2-74/+47
The exfat_find_dir_entry() called by exfat_find() doesn't return -EEXIST. Therefore, the root-dir information setting is never executed. Signed-off-by: Tetsuhiro Kohada <[email protected]> Acked-by: Sungjong Seo <[email protected]> Signed-off-by: Namjae Jeon <[email protected]>
2020-10-22exfat: use i_blocksize() to get blocksizeXianting Tian1-1/+1
We alreday has the interface i_blocksize() to get blocksize, so use it. Signed-off-by: Xianting Tian <[email protected]> Signed-off-by: Namjae Jeon <[email protected]>
2020-10-22exfat: fix misspellings using codespell toolNamjae Jeon3-3/+3
Sedat reported typos using codespell tool. $ codespell fs/exfat/*.c | grep -v iput fs/exfat/namei.c:293: upto ==> up to fs/exfat/nls.c:14: tabel ==> table $ codespell fs/exfat/*.h fs/exfat/exfat_fs.h:133: usally ==> usually Fix typos found by codespell. Reported-by: Sedat Dilek <[email protected]> Signed-off-by: Namjae Jeon <[email protected]>
2020-10-22bpf: Fix bpf_redirect_neigh helper api to support supplying nexthopToke Høiland-Jørgensen5-67/+145
Based on the discussion in [0], update the bpf_redirect_neigh() helper to accept an optional parameter specifying the nexthop information. This makes it possible to combine bpf_fib_lookup() and bpf_redirect_neigh() without incurring a duplicate FIB lookup - since the FIB lookup helper will return the nexthop information even if no neighbour is present, this can simply be passed on to bpf_redirect_neigh() if bpf_fib_lookup() returns BPF_FIB_LKUP_RET_NO_NEIGH. Thus fix & extend it before helper API is frozen. [0] https://lore.kernel.org/bpf/[email protected]/ Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-10-21xfs: cancel intents immediately if process_intents failsDarrick J. Wong1-0/+8
If processing recovered log intent items fails, we need to cancel all the unprocessed recovered items immediately so that a subsequent AIL push in the bail out path won't get wedged on the pinned intent items that didn't get processed. This can happen if the log contains (1) an intent that gets and releases an inode, (2) an intent that cannot be recovered successfully, and (3) some third intent item. When recovery of (2) fails, we leave (3) pinned in memory. Inode reclamation is called in the error-out path of xfs_mountfs before xfs_log_cancel_mount. Reclamation calls xfs_ail_push_all_sync, which gets stuck waiting for (3). Therefore, call xlog_recover_cancel_intents if _process_intents fails. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]>
2020-10-21smb3: do not try to cache root directory if dir leases not supportedSteve French1-1/+4
To servers which do not support directory leases (e.g. Samba) it is wasteful to try to open_shroot (ie attempt to cache the root directory handle). Skip attempt to open_shroot when server does not indicate support for directory leases. Cuts the number of requests on mount from 17 to 15, and cuts the number of requests on stat of the root directory from 4 to 3. Signed-off-by: Steve French <[email protected]> CC: Stable <[email protected]> # v5.1+
2020-10-21smb3: fix stat when special device file and mounted with modefromsidSteve French1-1/+6
When mounting with modefromsid mount option, it was possible to get the error on stat of a fifo or char or block device: "cannot stat <filename>: Operation not supported" Special devices can be stored as reparse points by some servers (e.g. Windows NFS server and when using the SMB3.1.1 POSIX Extensions) but when the modefromsid mount option is used the client attempts to get the ACL for the file which requires opening with OPEN_REPARSE_POINT create option. Signed-off-by: Steve French <[email protected]> CC: Stable <[email protected]> Reviewed-by: Ronnie Sahlberg <[email protected]> Reviewed-by: Shyam Prasad N <[email protected]>
2020-10-21cifs: Print the address and port we are connecting to in generic_ip_connect()Samuel Cabrero1-2/+10
Can be helpful in debugging mount and reconnect issues Signed-off-by: Samuel Cabrero <[email protected]> Reviewed-by: Shyam Prasad N <[email protected]> Signed-off-by: Steve French <[email protected]>
2020-10-21SMB3: Resolve data corruption of TCP server info fieldsRohith Surabattula1-5/+7
TCP server info field server->total_read is modified in parallel by demultiplex thread and decrypt offload worker thread. server->total_read is used in calculation to discard the remaining data of PDU which is not read into memory. Because of parallel modification, server->total_read can get corrupted and can result in discarding the valid data of next PDU. Signed-off-by: Rohith Surabattula <[email protected]> Reviewed-by: Aurelien Aptel <[email protected]> Reviewed-by: Pavel Shilovsky <[email protected]> CC: Stable <[email protected]> #5.4+ Signed-off-by: Steve French <[email protected]>
2020-10-21io_uring: don't reuse linked_timeoutPavel Begunkov1-1/+3
Clear linked_timeout for next requests in __io_queue_sqe() so we won't queue it up unnecessary when it's going to be punted. Signed-off-by: Pavel Begunkov <[email protected]> Cc: [email protected] # v5.9 Signed-off-by: Jens Axboe <[email protected]>
2020-10-21kvm: x86/mmu: Remove disallowed_hugepage_adjust shadow_walk_iterator argBen Gardon2-7/+9
In order to avoid creating executable hugepages in the TDP MMU PF handler, remove the dependency between disallowed_hugepage_adjust and the shadow_walk_iterator. This will open the function up to being used by the TDP MMU PF handler in a future patch. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21kvm: x86/mmu: Support zapping SPTEs in the TDP MMUBen Gardon5-0/+136
Add functions to zap SPTEs to the TDP MMU. These are needed to tear down TDP MMU roots properly and implement other MMU functions which require tearing down mappings. Future patches will add functions to populate the page tables, but as for this patch there will not be any work for these functions to do. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21KVM: Cache as_id in kvm_memory_slotPeter Xu2-0/+7
Cache the address space ID just like the slot ID. It will be used in order to fill in the dirty ring entries. Suggested-by: Paolo Bonzini <[email protected]> Suggested-by: Sean Christopherson <[email protected]> Reviewed-by: Sean Christopherson <[email protected]> Signed-off-by: Peter Xu <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21kvm: x86/mmu: Add functions to handle changed TDP SPTEsBen Gardon3-1/+115
The existing bookkeeping done by KVM when a PTE is changed is spread around several functions. This makes it difficult to remember all the stats, bitmaps, and other subsystems that need to be updated whenever a PTE is modified. When a non-leaf PTE is marked non-present or becomes a leaf PTE, page table memory must also be freed. To simplify the MMU and facilitate the use of atomic operations on SPTEs in future patches, create functions to handle some of the bookkeeping required as a result of a change. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21kvm: x86/mmu: Allocate and free TDP MMU rootsBen Gardon5-6/+146
The TDP MMU must be able to allocate paging structure root pages and track the usage of those pages. Implement a similar, but separate system for root page allocation to that of the x86 shadow paging implementation. When future patches add synchronization model changes to allow for parallel page faults, these pages will need to be handled differently from the x86 shadow paging based MMU's root pages. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21kvm: x86/mmu: Init / Uninit the TDP MMUBen Gardon5-1/+55
The TDP MMU offers an alternative mode of operation to the x86 shadow paging based MMU, optimized for running an L1 guest with TDP. The TDP MMU will require new fields that need to be initialized and torn down. Add hooks into the existing KVM MMU initialization process to do that initialization / cleanup. Currently the initialization and cleanup fucntions do not do very much, however more operations will be added in future patches. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21kvm: x86/mmu: Introduce tdp_iterBen Gardon3-1/+234
The TDP iterator implements a pre-order traversal of a TDP paging structure. This iterator will be used in future patches to create an efficient implementation of the KVM MMU for the TDP case. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21KVM: mmu: extract spte.h and spte.cPaolo Bonzini5-548/+607
The SPTE format will be common to both the shadow and the TDP MMU. Extract code that implements the format to a separate module, as a first step towards adding the TDP MMU and putting mmu.c on a diet. Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21KVM: mmu: Separate updating a PTE from kvm_set_pte_rmappPaolo Bonzini1-7/+17
The TDP MMU's own function for the changed-PTE notifier will need to be update a PTE in the exact same way as the shadow MMU. Rather than re-implementing this logic, factor the SPTE creation out of kvm_set_pte_rmapp. Extracted out of a patch by Ben Gardon. <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21kvm: x86/mmu: Separate making SPTEs from set_spteBen Gardon1-16/+33
Separate the functions for generating leaf page table entries from the function that inserts them into the paging structure. This refactoring will facilitate changes to the MMU sychronization model to use atomic compare / exchanges (which are not guaranteed to succeed) instead of a monolithic MMU lock. No functional change expected. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This commit introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon <[email protected]> Reviewed-by: Peter Shier <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21kvm: mmu: Separate making non-leaf sptes from link_shadow_pageBen Gardon1-6/+15
The TDP MMU page fault handler will need to be able to create non-leaf SPTEs to build up the paging structures. Rather than re-implementing the function, factor the SPTE creation out of link_shadow_page. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21Merge branch 'kvm-fixes' into 'next'Paolo Bonzini6-23/+47
Pick up bugfixes from 5.9, otherwise various tests fail.
2020-10-21KVM: PPC: Book3S HV: Make struct kernel_param_ops definition constJoe Perches1-1/+1
This should be const, so make it so. Signed-off-by: Joe Perches <[email protected]> Message-Id: <d130e88dd4c82a12d979da747cc0365c72c3ba15.1601770305.git.joe@perches.com> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21KVM: x86: Let the guest own CR4.FSGSBASELai Jiangshan1-1/+1
Add FSGSBASE to the set of possible guest-owned CR4 bits, i.e. let the guest own it on VMX. KVM never queries the guest's CR4.FSGSBASE value, thus there is no reason to force VM-Exit on FSGSBASE being toggled. Note, because FSGSBASE is conditionally available, this is dependent on recent changes to intercept reserved CR4 bits and to update the CR4 guest/host mask in response to guest CPUID changes. Cc: Paolo Bonzini <[email protected]> Signed-off-by: Lai Jiangshan <[email protected]> [sean: added justification in changelog] Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21KVM: VMX: Intercept guest reserved CR4 bits to inject #GP faultSean Christopherson1-5/+10
Intercept CR4 bits that are guest reserved so that KVM correctly injects a #GP fault if the guest attempts to set a reserved bit. If a feature is supported by the CPU but is not exposed to the guest, and its associated CR4 bit is not intercepted by KVM by default, then KVM will fail to inject a #GP if the guest sets the CR4 bit without triggering an exit, e.g. by toggling only the bit in question. Note, KVM doesn't give the guest direct access to any CR4 bits that are also dependent on guest CPUID. Yet. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21KVM: x86: Move call to update_exception_bitmap() into VMX codeSean Christopherson2-1/+3
Now that vcpu_after_set_cpuid() and update_exception_bitmap() are called back-to-back, subsume the exception bitmap update into the common CPUID update. Drop the SVM invocation entirely as SVM's exception bitmap doesn't vary with respect to guest CPUID. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>