Age    Commit message    Author    Files    Lines
2020-10-29KVM: arm64: Don't corrupt tpidr_el2 on failed HVC callMarc Zyngier1-7/+16
The hyp-init code starts by stashing a register in TPIDR_EL2 in order to free a register. This happens regardless of whether the HVC call is legal or not. Although nothing wrong seems to come out of it, it feels odd to alter the EL2 state for something that eventually returns an error. Instead, use the fact that we know exactly which bits of the __kvm_hyp_init call are non-zero to perform the check with a series of EOR/ROR instructions, combined with a build-time check that the value is the one we expect.

Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-10-29coresight: cti: Initialize dynamic sysfs attributesSuzuki K Poulose1-0/+7
With LOCKDEP enabled, the CTI driver triggers the following splat due to an uninitialized lock class for dynamically allocated attribute objects.

[ 5.372901] coresight etm0: CPU0: ETM v4.0 initialized
[ 5.376694] coresight etm1: CPU1: ETM v4.0 initialized
[ 5.380785] coresight etm2: CPU2: ETM v4.0 initialized
[ 5.385851] coresight etm3: CPU3: ETM v4.0 initialized
[ 5.389808] BUG: key ffff00000564a798 has not been registered!
[ 5.392456] ------------[ cut here ]------------
[ 5.398195] DEBUG_LOCKS_WARN_ON(1)
[ 5.398233] WARNING: CPU: 1 PID: 32 at kernel/locking/lockdep.c:4623 lockdep_init_map_waits+0x14c/0x260
[ 5.406149] Modules linked in:
[ 5.415411] CPU: 1 PID: 32 Comm: kworker/1:1 Not tainted 5.9.0-12034-gbbe85027ce80 #51
[ 5.418553] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 5.426453] Workqueue: events amba_deferred_retry_func
[ 5.433299] pstate: 40000005 (nZcv daif -PAN -UAO -TCO BTYPE=--)
[ 5.438252] pc : lockdep_init_map_waits+0x14c/0x260
[ 5.444410] lr : lockdep_init_map_waits+0x14c/0x260
[ 5.449007] sp : ffff800012bbb720
...
[ 5.531561] Call trace:
[ 5.536847]  lockdep_init_map_waits+0x14c/0x260
[ 5.539027]  __kernfs_create_file+0xa8/0x1c8
[ 5.543539]  sysfs_add_file_mode_ns+0xd0/0x208
[ 5.548054]  internal_create_group+0x118/0x3c8
[ 5.552307]  internal_create_groups+0x58/0xb8
[ 5.556733]  sysfs_create_groups+0x2c/0x38
[ 5.561160]  device_add+0x2d8/0x768
[ 5.565148]  device_register+0x28/0x38
[ 5.568537]  coresight_register+0xf8/0x320
[ 5.572358]  cti_probe+0x1b0/0x3f0
...

Fix this by initializing the attributes when they are allocated.

Fixes: 3c5597e39812 ("coresight: cti: Add connection information to sysfs")
Reported-by: Leo Yan <[email protected]>
Tested-by: Leo Yan <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Mathieu Poirier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
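For illustration, a minimal sketch of the kind of initialization such a fix adds; the attribute type and names here are generic placeholders, not the CTI driver's exact structures:

        struct device_attribute *attr;

        attr = devm_kzalloc(dev, sizeof(*attr), GFP_KERNEL);
        if (!attr)
                return -ENOMEM;

        sysfs_attr_init(&attr->attr);   /* set up the lockdep key for the dynamic attribute */
        attr->attr.name = name;
        attr->attr.mode = 0444;
        attr->show = show_fn;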
2020-10-29coresight: Fix uninitialised pointer bug in etm_setup_aux()Mike Leach1-1/+1
Commit [bb1860efc817] changed the sink handling code, introducing an uninitialised pointer bug. This results in the default sink selection failing.

Prior to commit:

static void etm_setup_aux(...)
<snip>
        struct coresight_device *sink;
<snip>
        /* First get the selected sink from user space. */
        if (event->attr.config2) {
                id = (u32)event->attr.config2;
                sink = coresight_get_sink_by_id(id);
        } else {
                sink = coresight_get_enabled_sink(true);
        }
<ctd>

*sink is always initialised - possibly to NULL, which triggers the automatic sink selection.

After commit:

static void etm_setup_aux(...)
<snip>
        struct coresight_device *sink;
<snip>
        /* First get the selected sink from user space. */
        if (event->attr.config2) {
                id = (u32)event->attr.config2;
                sink = coresight_get_sink_by_id(id);
        }
<ctd>

The *sink pointer is left uninitialised when a sink is not provided on the perf command line. This breaks later checks to enable automatic sink selection.

Fixes: bb1860efc817 ("coresight: etm: perf: Sink selection using sysfs is deprecated")
Signed-off-by: Mike Leach <[email protected]>
Signed-off-by: Mathieu Poirier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-10-29Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds13-44/+103
Pull rdma fixes from Jason Gunthorpe:
 "The good news is people are testing rc1 in the RDMA world - the bad news is testing of the for-next area is not as good as I had hoped, as we really should have caught at least the rdma_connect_locked() issue before now.

  Notable merge window regressions that didn't get caught/fixed in time for rc1:

   - Fix in kernel users of rxe, they were broken by the rapid fix to undo the uABI breakage in rxe from another patch

   - EFA userspace needs to read the GID table but was broken with the new GID table logic

   - Fix user triggerable deadlock in mlx5 using devlink reload

   - Fix deadlock in several ULPs using rdma_connect from the CM handler callbacks

   - Memory leak in qedr"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/qedr: Fix memory leak in iWARP CM
  RDMA: Add rdma_connect_locked()
  RDMA/uverbs: Fix false error in query gid IOCTL
  RDMA/mlx5: Fix devlink deadlock on net namespace deletion
  RDMA/rxe: Fix small problem in network_type patch
2020-10-29r8169: fix issue with forced threading in combination with shared interruptsHeiner Kallweit1-2/+2
As reported by Serge, the IRQF_NO_THREAD flag causes an error if the interrupt is actually shared and the other driver(s) don't have this flag set. This situation can occur if a PCI(e) legacy interrupt is used in combination with forced threading. There's no good way to deal with this properly, therefore we have to remove the IRQF_NO_THREAD flag. To fix the original forced-threading issue, switch to napi_schedule().

Fixes: 424a646e072a ("r8169: fix operation under forced interrupt threading")
Link: https://www.spinics.net/lists/netdev/msg694960.html
Reported-by: Serge Belyshev <[email protected]>
Signed-off-by: Heiner Kallweit <[email protected]>
Tested-by: Serge Belyshev <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
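A hedged sketch of the resulting pattern (illustrative, not the driver's exact diff): the interrupt is requested without IRQF_NO_THREAD, keeping IRQF_SHARED because the legacy line may be shared,

        ret = request_irq(pci_irq_vector(pdev, 0), rtl8169_interrupt,
                          IRQF_SHARED, dev->name, tp);

and the handler only defers the work to NAPI, which is also safe when forced threading turns the handler into a thread:

        napi_schedule(&tp->napi);       /* works from hard-IRQ and threaded context */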
2020-10-29netem: fix zero division in tabledistAleksandr Nogikh1-1/+8
Currently it is possible to craft a special netlink RTM_NEWQDISC command that can result in jitter being equal to 0x80000000. It is enough to set the 32 bit jitter to 0x02000000 (it will later be multiplied by 2^6) or just set the 64 bit jitter via TCA_NETEM_JITTER64. This causes an overflow during the generation of uniformly distributed numbers in tabledist(), which in turn leads to division by zero (sigma != 0, but sigma * 2 is 0). The related fragment of code needs 32-bit division - see commit 9b0ed89 ("netem: remove unnecessary 64 bit modulus"), so switching to 64 bit is not an option. Fix the issue by keeping the value of jitter within the range that can be adequately handled by tabledist() - [0;INT_MAX]. As negative std deviation makes no sense, take the absolute value of the passed value and cap it at INT_MAX. Inside tabledist(), switch to unsigned 32 bit arithmetic in order to prevent overflows. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Aleksandr Nogikh <[email protected]> Reported-by: [email protected] Acked-by: Stephen Hemminger <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
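A hedged sketch of the clamping described above (variable names assumed, not the exact upstream diff):

        /* Negative jitter makes no sense, and anything above INT_MAX lets
         * "sigma * 2" in tabledist() wrap to 0 in 32-bit arithmetic
         * (0x80000000 * 2 == 0) and divide by zero, so cap the value. */
        q->jitter = min_t(s64, abs(jitter), INT_MAX);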
2020-10-29ibmvnic: fix ibmvnic_set_macLijun Pan1-2/+6
Jakub Kicinski brought up a concern in ibmvnic_set_mac(). ibmvnic_set_mac() does this:

        ether_addr_copy(adapter->mac_addr, addr->sa_data);
        if (adapter->state != VNIC_PROBED)
                rc = __ibmvnic_set_mac(netdev, addr->sa_data);

So if state == VNIC_PROBED, the user can assign an invalid address to adapter->mac_addr, and ibmvnic_set_mac() will still return 0. The fix is to validate the Ethernet address at the beginning of ibmvnic_set_mac(), and to move the ether_addr_copy into the "adapter->state != VNIC_PROBED" case.

Fixes: c26eba03e407 ("ibmvnic: Update reset infrastructure to support tunable parameters")
Signed-off-by: Lijun Pan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
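A minimal sketch of the reordering described above, based only on the description (not the exact upstream diff):

        static int ibmvnic_set_mac(struct net_device *netdev, void *p)
        {
                struct ibmvnic_adapter *adapter = netdev_priv(netdev);
                struct sockaddr *addr = p;
                int rc = 0;

                if (!is_valid_ether_addr(addr->sa_data))
                        return -EADDRNOTAVAIL;  /* reject before touching adapter state */

                if (adapter->state != VNIC_PROBED) {
                        ether_addr_copy(adapter->mac_addr, addr->sa_data);
                        rc = __ibmvnic_set_mac(netdev, addr->sa_data);
                }

                return rc;
        }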
2020-10-29x86/sev-es: Do not support MMIO to/from encrypted memoryJoerg Roedel1-7/+13
MMIO memory is usually not mapped encrypted, so there is no reason to support emulated MMIO when it is mapped encrypted. Prevent a possible hypervisor attack where a RAM page is mapped as an MMIO page in the nested page-table, so that any guest access to it will trigger a #VC exception and leak the data on that page to the hypervisor via the GHCB (like with valid MMIO). On the read side this attack would allow the HV to inject data into the guest. Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Tom Lendacky <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-10-29mptcp: add missing memory scheduling in the rx pathPaolo Abeni1-0/+10
When moving the skbs from the subflow into the msk receive queue, we must also schedule the required amount of memory there. Try to borrow the required memory from the subflow, if needed, so that we leverage the existing TCP heuristic.

Fixes: 6771bfd9ee24 ("mptcp: update mptcp ack sequence from work queue")
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Mat Martineau <[email protected]>
Link: https://lore.kernel.org/r/f6143a6193a083574f11b00dbf7b5ad151bc4ff4.1603810630.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-29drm/i915: Reject 90/270 degree rotated initial fbsVille Syrjälä1-0/+4
We don't currently handle the initial fb readout correctly for 90/270 degree rotated scanout. Reject it. Cc: [email protected] Signed-off-by: Ville Syrjälä <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Chris Wilson <[email protected]> (cherry picked from commit a40a8305a732f4ecc2186ac7ca132ba062ed770d) Signed-off-by: Rodrigo Vivi <[email protected]>
2020-10-29drm/i915: Restore ILK-M RPS supportVille Syrjälä1-0/+1
Restore RPS for ILK-M. We lost it when an extra HAS_RPS() check appeared in intel_rps_enable(). Unfortunately this just makes the performance worse on my ILK because intel_ips insists on limiting the GPU freq to the minimum. If we don't do the RPS init then intel_ips will not limit the frequency for whatever reason. Either it can't get at some required information and thus makes wrong decisions, or we mess up some weights/etc. and cause it to make the wrong decisions when RPS init has been done, or the entire thing is just wrong. Would require a bunch of reverse engineering to figure out what's going on.

Cc: [email protected]
Cc: Chris Wilson <[email protected]>
Fixes: 9c878557b1eb ("drm/i915/gt: Use the RPM config register to determine clk frequencies")
Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Chris Wilson <[email protected]>
(cherry picked from commit 2bf06370bcfb0dea5655e9a5ad460c7f7dca7739)
Signed-off-by: Rodrigo Vivi <[email protected]>
2020-10-29drm/i915/region: fix max size calculationMatthew Auld3-2/+79
We are incorrectly limiting the max allocation size as per the mm max_order, which is effectively the largest power-of-two that we can fit in the region size. However, it's normal to set up the region or allocator with a non-power-of-two size (for example 3G), which we should already handle correctly, except it seems for the early too-big check.

v2: make sure we also exercise the I915_BO_ALLOC_CONTIGUOUS path, which is quite different, since for that we are actually limited by the largest power-of-two that we can fit within the region size. (Chris)

Fixes: b908be543e44 ("drm/i915: support creating LMEM objects")
Signed-off-by: Matthew Auld <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: CQ Tang <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
Signed-off-by: Chris Wilson <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 83ebef47f8ebe320d5c5673db82f9903a4f40a69)
Signed-off-by: Rodrigo Vivi <[email protected]>
2020-10-29include: jhash/signal: Fix fall-through warnings for ClangGustavo A. R. Silva2-0/+4
In preparation to enable -Wimplicit-fallthrough for Clang, explicitly add break statements instead of letting the code fall through to the next case.

This patch adds four break statements that, together, fix almost 40,000 warnings when building Linux 5.10-rc1 with Clang 12.0.0 and this[1] change reverted. Notice that in order to enable -Wimplicit-fallthrough for Clang, such change[1] is meant to be reverted at some point. So, this patch helps to move in that direction.

Something important to mention is that there is currently a discrepancy between GCC and Clang when dealing with switch fall-through to empty case statements or to cases that only contain a break/continue/return statement[2][3][4].

Now that the -Wimplicit-fallthrough option has been globally enabled[5], any compiler should really warn on missing either a fallthrough annotation or any of the other case-terminating statements (break/continue/return/goto) when falling through to the next case statement. Making exceptions to this introduces variation in case handling which may continue to lead to bugs, misunderstandings, and a general lack of robustness. The point of enabling options like -Wimplicit-fallthrough is to prevent human error and aid developers in spotting bugs before their code is even built/submitted/committed, therefore eliminating classes of bugs. So, in order to really accomplish this, we should, and can, move in the direction of addressing any error-prone scenarios and get rid of the unintentional fallthrough bug-class in the kernel, entirely, even if there is some minor redundancy. Better to have explicit case-ending statements than continue to have exceptions where one must guess as to the right result. The compiler will eliminate any actual redundancy.

[1] commit e2079e93f562c ("kbuild: Do not enable -Wimplicit-fallthrough for clang for now")
[2] https://github.com/ClangBuiltLinux/linux/issues/636
[3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91432
[4] https://godbolt.org/z/xgkvIh
[5] commit a035d552a93b ("Makefile: Globally enable fall-through warning")

Co-developed-by: Kees Cook <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Gustavo A. R. Silva <[email protected]>
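Illustrative only (a generic example, not one of the four hunks in this patch): with -Wimplicit-fallthrough, every non-empty case has to end in a case-terminating statement or an explicit annotation:

        switch (action) {
        case ACT_PREPARE:
                prepare();
                fallthrough;    /* deliberate: prepare also runs the commit step */
        case ACT_COMMIT:
                commit();
                break;          /* explicit terminator instead of silently falling through */
        default:
                break;
        }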
2020-10-29Merge tag 'afs-fixes-20201029' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fsLinus Torvalds8-95/+188
Pull AFS fixes from David Howells:

 - Fix copy_file_range() to an afs file now returning EINVAL if the splice_write file op isn't supplied.

 - Fix a deref-before-check in afs_unuse_cell().

 - Fix a use-after-free in afs_xattr_get_acl().

 - Fix afs to not try to clear PG_writeback when laundering a page.

 - Fix afs to take a ref on a page that it sets PG_private on and to drop that ref when clearing PG_private. This is done through recently added helpers.

 - Fix a page leak if write_begin() fails.

 - Fix afs_write_begin() to not alter the dirty region info stored in page->private, but rather do this in afs_write_end() instead when we know what we actually changed.

 - Fix afs_invalidatepage() to alter the dirty region info on a page when partial page invalidation occurs so that we don't inadvertently include a span of zeros that will get written back if a page gets laundered due to a remote 3rd-party induced invalidation. We mustn't, however, reduce the dirty region if the page has been seen to be mapped (ie. we got called through the page_mkwrite vector) as the page might still be mapped and we might lose data if the file is extended again.

 - Fix the dirty region info to have a lower resolution if the size of the page is too large for this to be encoded (e.g. powerpc32 with 64K pages). Note that this might not be the ideal way to handle this, since it may allow some leakage of undirtied zero bytes to the server's copy in the case of a 3rd-party conflict.

To aid the last two fixes, two additional changes:

 - Wrap the manipulations of the dirty region info stored in page->private into helper functions.

 - Alter the encoding of the dirty region so that the region bounds can be stored with one fewer bit, making a bit available for the indication of mappedness.

* tag 'afs-fixes-20201029' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Fix dirty-region encoding on ppc32 with 64K pages
  afs: Fix afs_invalidatepage to adjust the dirty region
  afs: Alter dirty range encoding in page->private
  afs: Wrap page->private manipulations in inline functions
  afs: Fix where page->private is set during write
  afs: Fix page leak on afs_write_begin() failure
  afs: Fix to take ref on page when PG_private is set
  afs: Fix afs_launder_page to not clear PG_writeback
  afs: Fix a use after free in afs_xattr_get_acl()
  afs: Fix tracing deref-before-check
  afs: Fix copy_file_range()
2020-10-29x86/head/64: Check SEV encryption before switching to kernel page-tableJoerg Roedel2-0/+17
When SEV is enabled, the kernel requests the C-bit position again from the hypervisor to build its own page-table. Since the hypervisor is an untrusted source, the C-bit position needs to be verified before the kernel page-table is used. Call sev_verify_cbit() before writing the CR3. [ bp: Massage. ] Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Tom Lendacky <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-10-29x86/boot/compressed/64: Check SEV encryption in 64-bit boot-pathJoerg Roedel4-0/+96
Check whether the hypervisor reported the correct C-bit when running as an SEV guest. Using a wrong C-bit position could be used to leak sensitive data from the guest to the hypervisor. The check function is in a separate file: arch/x86/kernel/sev_verify_cbit.S so that it can be re-used in the running kernel image. [ bp: Massage. ] Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Tom Lendacky <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-10-29tipc: fix memory leak caused by tipc_buf_append()Tung Nguyen1-3/+2
Commit ed42989eab57 ("tipc: fix the skb_unshare() in tipc_buf_append()") replaced skb_unshare() with skb_copy() to not reduce the data reference counter of the original skb intentionally. This is not the correct way to handle the cloned skb because it causes a memory leak in the 2 following cases:

1/ Sending multicast messages via broadcast link
The original skb list is cloned to the local skb list for local destination. After that, the data reference counter of each skb in the original list has the value of 2. This causes each skb not to be freed after receiving ACK:

        tipc_link_advance_transmq()
        {
                ...
                /* release skb */
                __skb_unlink(skb, &l->transmq);
                kfree_skb(skb); <-- memory exists after being freed
        }

2/ Sending multicast messages via replicast link
Similar to the above case, each skb cannot be freed after purging the skb list:

        tipc_mcast_xmit()
        {
                ...
                __skb_queue_purge(pkts); <-- memory exists after being freed
        }

This commit fixes this issue by using skb_unshare() instead. Besides, to avoid the use-after-free error reported by KASAN, the pointer to the fragment is set to NULL before calling skb_unshare() to make sure that the original skb is not freed after freeing the fragment 2 times in case skb_unshare() returns NULL.

Fixes: ed42989eab57 ("tipc: fix the skb_unshare() in tipc_buf_append()")
Acked-by: Jon Maloy <[email protected]>
Reported-by: Thang Hoang Ngo <[email protected]>
Signed-off-by: Tung Nguyen <[email protected]>
Reviewed-by: Xin Long <[email protected]>
Acked-by: Cong Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-29gtp: fix an use-before-init in gtp_newlink()Masahiro Fujiwara1-8/+8
*_pdp_find() from gtp_encap_recv() would trigger a crash when a peer sends GTP packets while a new GTP device is being created.

RIP: 0010:gtp1_pdp_find.isra.0+0x68/0x90 [gtp]
<SNIP>
Call Trace:
 <IRQ>
 gtp_encap_recv+0xc2/0x2e0 [gtp]
 ? gtp1_pdp_find.isra.0+0x90/0x90 [gtp]
 udp_queue_rcv_one_skb+0x1fe/0x530
 udp_queue_rcv_skb+0x40/0x1b0
 udp_unicast_rcv_skb.isra.0+0x78/0x90
 __udp4_lib_rcv+0x5af/0xc70
 udp_rcv+0x1a/0x20
 ip_protocol_deliver_rcu+0xc5/0x1b0
 ip_local_deliver_finish+0x48/0x50
 ip_local_deliver+0xe5/0xf0
 ? ip_protocol_deliver_rcu+0x1b0/0x1b0

gtp_encap_enable() should be called after gtp_hashtable_new(), otherwise *_pdp_find() will access the uninitialized hash table.

Fixes: 1e3a3abd8b28 ("gtp: make GTP sockets in gtp_newlink optional")
Signed-off-by: Masahiro Fujiwara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
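A minimal sketch of the ordering fix described above (error-handling labels assumed):

        err = gtp_hashtable_new(gtp, hashsize);
        if (err < 0)
                return err;

        /* only hook up the encap socket once the hash table it feeds exists */
        err = gtp_encap_enable(gtp, data);
        if (err < 0)
                goto out_hashtable;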
2020-10-29Merge tag 'ext4_for_linus_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4Linus Torvalds10-135/+78
Pull ext4 fixes from Ted Ts'o:
 "Bug fixes for the new ext4 fast commit feature, plus a fix for the 'data=journal' bug fix.

  Also use the generic casefolding support which has now landed in fs/libfs.c for 5.10"

* tag 'ext4_for_linus_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: indicate that fast_commit is available via /sys/fs/ext4/feature/...
  ext4: use generic casefolding support
  ext4: do not use extent after put_bh
  ext4: use IS_ERR() for error checking of path
  ext4: fix mmap write protection for data=journal mode
  jbd2: fix a kernel-doc markup
  ext4: use s_mount_flags instead of s_mount_state for fast commit state
  ext4: make num of fast commit blocks configurable
  ext4: properly check for dirty state in ext4_inode_datasync_dirty()
  ext4: fix double locking in ext4_fc_commit_dentry_updates()
2020-10-29dma-mapping: fix 32-bit overflow with CONFIG_ARM_LPAE=nGeert Uytterhoeven1-3/+3
On r8a7791/koelsch and shmobile_defconfig, PCIe probing fails with:

    rcar-pcie fe000000.pcie: Adjusted size 0x0 invalid
    rcar-pcie: probe of fe000000.pcie failed with error -22

of_dma_get_range() returns the following map:

    cpu_start 0x40000000  dma_start 0x40000000  size 0x080000000  offset 0
    cpu_start 0x00000000  dma_start 0x00000000  size 0x100000000  offset 0

If CONFIG_ARM_LPAE=n, dma_addr_t is 32-bit. Hence when assigning r->dma_start + r->size to dma_end, this value will be truncated to 32-bit, yielding zero when processing the second table entry. Consequently, both dma_start and dma_end will be zero, leading to a zero size. Fix this by changing the dma_start and dma_end variables from dma_addr_t to u64.

Fixes: e0d072782c734d27 ("dma-mapping: introduce DMA range map, supplanting dma_pfn_offset")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
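A hedged sketch of the type change (the surrounding loop is simplified and the variable names assumed):

        u64 dma_start = U64_MAX;        /* was dma_addr_t, i.e. 32-bit with CONFIG_ARM_LPAE=n */
        u64 dma_end = 0;

        for (r = map; r->size; r++) {
                dma_start = min_t(u64, dma_start, r->dma_start);
                /* the sum no longer wraps at 4 GiB */
                dma_end = max_t(u64, dma_end, (u64)r->dma_start + r->size);
        }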
2020-10-29drm/i915: Use _MMIO_PIPE3() for ilk+ WM0_PIPE registersVille Syrjälä3-16/+12
Remove the hand rolled array of WM0_PIPE register offsets and use the standard _MMIO_PIPE3() instead. v2: Take care of gvt too Signed-off-by: Ville Syrjälä <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Lucas De Marchi <[email protected]>
2020-10-29xfs: set xefi_discard when creating a deferred agfl free log intent itemDarrick J. Wong2-1/+2
Make sure that we actually initialize xefi_discard when we're scheduling a deferred free of an AGFL block. This was (eventually) found by the UBSAN while I was banging on realtime rmap problems, but it exists in the upstream codebase. While we're at it, rearrange the structure to reduce the struct size from 64 to 56 bytes. Fixes: fcb762f5de2e ("xfs: add bmapi nodiscard flag") Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]>
2020-10-29drm/ttm: nuke old page allocatorChristian König12-2633/+2
Not used any more. Signed-off-by: Christian König <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Madhav Chauhan <[email protected]> Tested-by: Huang Rui <[email protected]> Link: https://patchwork.freedesktop.org/patch/397087/?series=83051&rev=1
2020-10-29drm/vram_helpers: drop ttm_page_alloc.h includeChristian König1-1/+0
Not needed as far as I can see. Signed-off-by: Christian König <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Madhav Chauhan <[email protected]> Tested-by: Huang Rui <[email protected]> Link: https://patchwork.freedesktop.org/patch/397085/?series=83051&rev=1
2020-10-29drm/qxl: drop ttm_page_alloc.h includeChristian König1-1/+0
Not needed as far as I can see. Signed-off-by: Christian König <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Madhav Chauhan <[email protected]> Tested-by: Huang Rui <[email protected]> Link: https://patchwork.freedesktop.org/patch/397084/?series=83051&rev=1
2020-10-29drm/vmwgfx: switch to new allocatorChristian König2-37/+3
It should be able to handle all cases now. Signed-off-by: Christian König <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Madhav Chauhan <[email protected]> Tested-by: Huang Rui <[email protected]> Link: https://patchwork.freedesktop.org/patch/397083/?series=83051&rev=1
2020-10-29drm/nouveau: switch to new allocatorChristian König2-29/+2
It should be able to handle all cases now. Signed-off-by: Christian König <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Madhav Chauhan <[email protected]> Tested-by: Huang Rui <[email protected]> Link: https://patchwork.freedesktop.org/patch/397082/?series=83051&rev=1
2020-10-29drm/radeon: switch to new allocator v2Christian König1-38/+14
It should be able to handle all cases here. v2: fix debugfs as well Signed-off-by: Christian König <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Madhav Chauhan <[email protected]> Tested-by: Huang Rui <[email protected]> Link: https://patchwork.freedesktop.org/patch/397088/?series=83051&rev=1
2020-10-29drm/amdgpu: switch to new allocator v2Christian König1-31/+14
It should be able to handle all cases here. v2: fix debugfs as well Signed-off-by: Christian König <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Madhav Chauhan <[email protected]> Tested-by: Huang Rui <[email protected]> Link: https://patchwork.freedesktop.org/patch/397086/?series=83051&rev=1
2020-10-29drm/ttm: wire up the new pool as default one v2Christian König10-26/+36
Provide the necessary parameters by all drivers and use the new pool alloc when no driver specific function is provided. v2: fix the GEM VRAM helpers Signed-off-by: Christian König <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Madhav Chauhan <[email protected]> Tested-by: Huang Rui <[email protected]> Link: https://patchwork.freedesktop.org/patch/397081/?series=83051&rev=1
2020-10-29lib/scatterlist: use consistent sg_copy_buffer() return typeDavid Disseldorp1-1/+1
sg_copy_buffer() returns a size_t with the number of bytes copied. Return 0 instead of false if the copy is skipped. Signed-off-by: David Disseldorp <[email protected]> Reviewed-by: Douglas Gilbert <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-10-29Merge tag 'nvme-5.10-2020-10-29' of git://git.infradead.org/nvme into block-5.10Jens Axboe5-178/+127
Pull NVMe fixes from Christoph:
 "nvme updates for 5.10:

  - improve zone revalidation (Keith Busch)
  - gracefully handle zero length messages in nvme-rdma (zhenwei pi)
  - nvme-fc error handling fixes (James Smart)
  - nvmet tracing NULL pointer dereference fix (Chaitanya Kulkarni)"

* tag 'nvme-5.10-2020-10-29' of git://git.infradead.org/nvme:
  nvmet: fix a NULL pointer dereference when tracing the flush command
  nvme-fc: remove nvme_fc_terminate_io()
  nvme-fc: eliminate terminate_io use by nvme_fc_error_recovery
  nvme-fc: remove err_work work item
  nvme-fc: track error_recovery while connecting
  nvme-rdma: handle unexpected nvme completion data length
  nvme: ignore zone validate errors on subsequent scans
2020-10-29drm/ttm: new TT backend allocation pool v3Christian König5-1/+764
This replaces the spaghetti code in the two existing page pools. First of all, depending on the allocation size it is between 3 (1GiB) and 5 (1MiB) times faster than the old implementation.

It makes better use of buddy pages to allow for larger physical contiguous allocations, which should result in better TLB utilization at least for amdgpu.

Instead of a completely braindead approach of filling the pool with one CPU while another one is trying to shrink it, we only give back freed pages. This also results in much less locking contention and a trylock free MM shrinker callback, so we can guarantee that pages are given back to the system when needed.

Downside of this is that it takes longer for many small allocations until the pool is filled up. We could address this, but I couldn't find a use case where this actually matters. We also don't bother freeing large chunks of pages any more since the CPU overhead in that path isn't really that important.

The sysfs files are replaced with a single module parameter, allowing users to override how many pages should be globally pooled in TTM. This unfortunately breaks the UAPI slightly, but as far as we know nobody ever depended on this.

Zeroing memory coming from the pool was handled inconsistently. The alloc_pages() based pool was zeroing it, the dma_alloc_attr() based one wasn't. For now the new implementation isn't zeroing pages from the pool either and only sets the __GFP_ZERO flag when necessary.

The implementation has only 768 lines of code compared to the over 2600 of the old one, and also allows for saving quite a bunch of code in the drivers since we don't need specialized handling there any more based on kernel config.

In addition to all of that, there was a neat bug with IOMMU, coherent DMA mappings and huge pages which is now fixed in the new code as well.

v2: make ttm_pool_apply_caching static as reported by the kernel bot, add some more checks
v3: fix some more checkpatch.pl warnings

Signed-off-by: Christian König <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Madhav Chauhan <[email protected]>
Tested-by: Huang Rui <[email protected]>
Link: https://patchwork.freedesktop.org/patch/397080/?series=83051&rev=1
2020-10-29xsysace: use platform_get_resource() and platform_get_irq_optional()Andy Shevchenko1-23/+26
Use platform_get_resource() to fetch the memory resource and platform_get_irq_optional() to get the optional IRQ instead of open-coded variants. The IRQ is not supposed to be changed at runtime, so there is no functional change in ace_fsm_yieldirq(). On the other hand, we now take the first resources instead of the last ones to proceed. I can't imagine how broken firmware would have to be to have garbage in the first resource slots. But if that is the case, it needs to be documented.

Signed-off-by: Andy Shevchenko <[email protected]>
Acked-by: Michal Simek <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
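A brief sketch of the two calls in a probe path (surrounding variables assumed):

        struct resource *res;
        int irq;

        res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
        if (!res)
                return -EINVAL;

        /* a value <= 0 simply means "no interrupt": the state machine keeps polling */
        irq = platform_get_irq_optional(pdev, 0);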
2020-10-29afs: Fix dirty-region encoding on ppc32 with 64K pagesDavid Howells2-9/+20
The dirty region bounds stored in page->private on an afs page are 15 bits on a 32-bit box and can, at most, represent a range of up to 32K within a 32K page with a resolution of 1 byte. This is a problem for powerpc32 with 64K pages enabled. Further, transparent huge pages may get up to 2M, which will be a problem for the afs filesystem on all 32-bit arches in the future. Fix this by decreasing the resolution. For the moment, a 64K page will have a resolution determined from PAGE_SIZE. In the future, the page will need to be passed in to the helper functions so that the page size can be assessed and the resolution determined dynamically. Note that this might not be the ideal way to handle this, since it may allow some leakage of undirtied zero bytes to the server's copy in the case of a 3rd-party conflict. Fixing that would require a separately allocated record and is a more complicated fix. Fixes: 4343d00872e1 ("afs: Get rid of the afs_writeback record") Reported-by: kernel test robot <[email protected]> Signed-off-by: David Howells <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
2020-10-29afs: Fix afs_invalidatepage to adjust the dirty regionDavid Howells4-14/+79
Fix afs_invalidatepage() to adjust the dirty region recorded in page->private when truncating a page. If the dirty region is entirely removed, then the private data is cleared and the page dirty state is cleared. Without this, if the page is truncated and then expanded again by truncate, zeros from the expanded, but no-longer dirty region may get written back to the server if the page gets laundered due to a conflicting 3rd-party write. It mustn't, however, shorten the dirty region of the page if that page is still mmapped and has been marked dirty by afs_page_mkwrite(), so a flag is stored in page->private to record this. Fixes: 4343d00872e1 ("afs: Get rid of the afs_writeback record") Signed-off-by: David Howells <[email protected]>
2020-10-29afs: Alter dirty range encoding in page->privateDavid Howells2-4/+4
Currently, page->private on an afs page is used to store the range of dirtied data within the page, where the range includes the lower bound, but excludes the upper bound (e.g. 0-1 is a range covering a single byte). This, however, requires a superfluous bit for the last-byte bound so that on a 4KiB page, it can say 0-4096 to indicate the whole page, the idea being that having both numbers the same would indicate an empty range. This is unnecessary as the PG_private bit is clear if it's an empty range (as is PG_dirty).

Alter the way the dirty range is encoded in page->private such that the upper bound is reduced by 1 (e.g. 0-0 is then specified the same single byte range mentioned above). Applying this to both bounds frees up two bits, one of which can be used in a future commit. This allows the afs filesystem to be compiled on ppc32 with 64K pages; without this, the following warnings are seen:

../fs/afs/internal.h: In function 'afs_page_dirty_to':
../fs/afs/internal.h:881:15: warning: right shift count >= width of type [-Wshift-count-overflow]
  881 |  return (priv >> __AFS_PAGE_PRIV_SHIFT) & __AFS_PAGE_PRIV_MASK;
      |               ^~
../fs/afs/internal.h: In function 'afs_page_dirty':
../fs/afs/internal.h:886:28: warning: left shift count >= width of type [-Wshift-count-overflow]
  886 |  return ((unsigned long)to << __AFS_PAGE_PRIV_SHIFT) | from;
      |                            ^~

Fixes: 4343d00872e1 ("afs: Get rid of the afs_writeback record")
Signed-off-by: David Howells <[email protected]>
2020-10-29afs: Wrap page->private manipulations in inline functionsDavid Howells3-34/+44
The afs filesystem uses page->private to store the dirty range within a page such that in the event of a conflicting 3rd-party write to the server, we write back just the bits that got changed locally. However, there are a couple of problems with this:

 (1) I need a bit to note if the page might be mapped so that partial invalidation doesn't shrink the range.

 (2) There aren't necessarily sufficient bits to store the entire range of data altered (say it's a 32-bit system with 64KiB pages or transparent huge pages are in use).

So wrap the accesses in inline functions so that future commits can change how this works. Also move them out of the tracing header into the in-directory header. There's not really any need for them to be in the tracing header.

Signed-off-by: David Howells <[email protected]>
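Roughly the shape of the wrappers, reconstructed from the warnings quoted in the "Alter dirty range encoding" entry above (simplified; the real helpers also pick up the resolution and mapped-bit handling added by the later patches):

        static inline unsigned int afs_page_dirty_from(unsigned long priv)
        {
                return priv & __AFS_PAGE_PRIV_MASK;
        }

        static inline unsigned int afs_page_dirty_to(unsigned long priv)
        {
                return (priv >> __AFS_PAGE_PRIV_SHIFT) & __AFS_PAGE_PRIV_MASK;
        }

        static inline unsigned long afs_page_dirty(unsigned int from, unsigned int to)
        {
                return ((unsigned long)to << __AFS_PAGE_PRIV_SHIFT) | from;
        }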
2020-10-29afs: Fix where page->private is set during writeDavid Howells1-15/+26
In afs, page->private is set to indicate the dirty region of a page. This is done in afs_write_begin(), but that can't take account of whether the copy into the page actually worked. Fix this by moving the change of page->private into afs_write_end(). Fixes: 4343d00872e1 ("afs: Get rid of the afs_writeback record") Signed-off-by: David Howells <[email protected]>
2020-10-29afs: Fix page leak on afs_write_begin() failureDavid Howells1-12/+11
Fix the leak of the target page in afs_write_begin() when it fails. Fixes: 15b4650e55e0 ("afs: convert to new aops") Signed-off-by: David Howells <[email protected]> cc: Nick Piggin <[email protected]>
2020-10-29afs: Fix to take ref on page when PG_private is setDavid Howells4-26/+18
Fix afs to take a ref on a page when it sets PG_private on it and to drop the ref when removing the flag. Note that in afs_write_begin(), a lot of the time, PG_private is already set on a page to which we're going to add some data. In such a case, we leave the bit set and mustn't increment the page count. As suggested by Matthew Wilcox, use attach/detach_page_private() where possible. Fixes: 31143d5d515e ("AFS: implement basic file write support") Reported-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: David Howells <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
2020-10-29null_blk: Fix locking in zoned modeDamien Le Moal2-25/+82
When the zoned mode is enabled in null_blk, serializing read, write and zone management operations for each zone is necessary to protect device level information for managing zone resources (zone open and closed counters) as well as each zone condition and write pointer position. Commit 35bc10b2eafb ("null_blk: synchronization fix for zoned device") introduced a spinlock to implement this serialization. However, when memory backing is also enabled, GFP_NOIO memory allocations are executed under the spinlock, resulting in might_sleep() warnings. Furthermore, the zone_lock spinlock is locked/unlocked using spin_lock_irq/spin_unlock_irq, similarly to the memory backing code with the nullb->lock spinlock. This nested use of irq locks wrecks the irq enabled/disabled state.

Fix all this by introducing a bitmap of per-zone locks, with locking implemented using wait_on_bit_lock_io() and clear_and_wake_up_bit(). This locking mechanism allows keeping a zone locked while executing null_process_cmd(), serializing all operations to the zone while allowing sleeping during memory backing allocation with GFP_NOIO. Device level zone resource management information is protected using a spinlock which is not held while executing null_process_cmd().

Fixes: 35bc10b2eafb ("null_blk: synchronization fix for zoned device")
Signed-off-by: Damien Le Moal <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
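A hedged sketch of the per-zone bit lock described above (field and helper names assumed, not the driver's exact code):

        static void null_lock_zone(struct nullb_device *dev, unsigned int zno)
        {
                /* may sleep, so GFP_NOIO allocations can happen while it is held */
                wait_on_bit_lock_io(dev->zone_locks, zno, TASK_UNINTERRUPTIBLE);
        }

        static void null_unlock_zone(struct nullb_device *dev, unsigned int zno)
        {
                clear_and_wake_up_bit(zno, dev->zone_locks);
        }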
2020-10-29null_blk: Fix zone reset all tracingDamien Le Moal1-3/+8
In the case of the REQ_OP_ZONE_RESET_ALL operation, the command sector is ignored and the operation is applied to all sequential zones. For these commands, tracing the effect of the command using the command sector to determine the target zone is thus incorrect. Fix null_zone_mgmt() zone condition tracing in the case of REQ_OP_ZONE_RESET_ALL to apply tracing to all sequential zones that are not already empty.

Fixes: 766c3297d7e1 ("null_blk: add trace in null_blk_zoned.c")
Signed-off-by: Damien Le Moal <[email protected]>
Cc: [email protected]
Signed-off-by: Jens Axboe <[email protected]>
2020-10-29nbd: don't update block size after device is startedMing Lei1-4/+5
A mounted NBD device can be resized; one use case is rbd-nbd. Fix the issue by setting up a default block size and then not touching it in nbd_size_update() any more. This kind of usage is aligned with loop, which has the same use case.

Cc: [email protected]
Fixes: c8a83a6b54d0 ("nbd: Use set_blocksize() to set device blocksize")
Reported-by: lining <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: Jan Kara <[email protected]>
Tested-by: lining <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2020-10-29drm/i915: Remove unused variable retZou Wei1-2/+1
This patch fixes the warning below reported by coccicheck:

./drivers/gpu/drm/i915/i915_debugfs.c:789:5-8: Unneeded variable: "ret". Return "0" on line 1012

Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Zou Wei <[email protected]>
Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-10-29cpufreq: schedutil: Always call driver if CPUFREQ_NEED_UPDATE_LIMITS is setRafael J. Wysocki1-2/+4
Because sugov_update_next_freq() may skip a frequency update even if the need_freq_update flag has been set for the policy at hand, policy limits updates may not take effect as expected. For example, if the intel_pstate driver operates in the passive mode with HWP enabled, it needs to update the HWP min and max limits when the policy min and max limits change, respectively, but that may not happen if the target frequency does not change along with the limit at hand. In particular, if the policy min is changed first, causing the target frequency to be adjusted to it, and the policy max limit is changed later to the same value, the HWP max limit will not be updated to follow it as expected, because the target frequency is still equal to the policy min limit and it will not change until that limit is updated. To address this issue, modify get_next_freq() to let the driver callback run if the CPUFREQ_NEED_UPDATE_LIMITS cpufreq driver flag is set regardless of whether or not the new frequency to set is equal to the previous one. Fixes: f6ebbcf08f37 ("cpufreq: intel_pstate: Implement passive mode with HWP enabled") Reported-by: Zhang Rui <[email protected]> Tested-by: Zhang Rui <[email protected]> Cc: 5.9+ <[email protected]> # 5.9+: 1c534352f47f cpufreq: Introduce CPUFREQ_NEED_UPDATE_LIMITS ... Cc: 5.9+ <[email protected]> # 5.9+: a62f68f5ca53 cpufreq: Introduce cpufreq_driver_test_flags() Signed-off-by: Rafael J. Wysocki <[email protected]> Acked-by: Viresh Kumar <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2020-10-29cpufreq: Introduce cpufreq_driver_test_flags()Rafael J. Wysocki2-0/+13
Add a helper function to test the flags of the cpufreq driver in use against a given flags mask. In particular, this will be needed to test the CPUFREQ_NEED_UPDATE_LIMITS cpufreq driver flag in the schedutil governor.

Signed-off-by: Rafael J. Wysocki <[email protected]>
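The helper is probably close to the following, shown here with the schedutil use described in the entry above (sketch; exact types and call site assumed):

        bool cpufreq_driver_test_flags(u16 flags)
        {
                return !!(cpufreq_driver->flags & flags);
        }

        /* in the governor: don't skip the frequency update if the driver
         * insists on being called whenever the limits may have changed */
        if (sg_policy->next_freq == next_freq &&
            !cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS))
                return false;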
2020-10-29arm64: Add workaround for Arm Cortex-A77 erratum 1508412Rob Herring13-15/+66
On Cortex-A77 r0p0 and r1p0, a sequence of a non-cacheable or device load and a store exclusive or PAR_EL1 read can cause a deadlock. The workaround requires a DMB SY before and after a PAR_EL1 register read. In addition, it's possible an interrupt (doing a device read) or KVM guest exit could be taken between the DMB and PAR read, so we also need a DMB before returning from interrupt and before returning to a guest. A deadlock is still possible with the workaround as KVM guests must also have the workaround. IOW, a malicious guest can deadlock an affected system. This workaround also depends on a firmware counterpart to enable the h/w to insert DMB SY after load and store exclusive instructions. See the errata document SDEN-1152370 v10 [1] for more information.

[1] https://static.docs.arm.com/101992/0010/Arm_Cortex_A77_MP074_Software_Developer_Errata_Notice_v10.pdf

Signed-off-by: Rob Herring <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: James Morse <[email protected]>
Cc: Suzuki K Poulose <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Julien Thierry <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
2020-10-29arm64: Add part number for Arm Cortex-A77Rob Herring1-0/+2
Add the MIDR part number info for the Arm Cortex-A77. Signed-off-by: Rob Herring <[email protected]> Acked-by: Catalin Marinas <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2020-10-29x86/boot/compressed/64: Sanity-check CPUID results in the early #VC handlerJoerg Roedel1-0/+26
The early #VC handler which doesn't have a GHCB can only handle CPUID exit codes. It is needed by the early boot code to handle #VC exceptions raised in verify_cpu() and to get the position of the C-bit. But the CPUID information comes from the hypervisor which is untrusted and might return results which trick the guest into the no-SEV boot path with no C-bit set in the page-tables. All data written to memory would then be unencrypted and could leak sensitive data to the hypervisor. Add sanity checks to the early #VC handler to make sure the hypervisor can not pretend that SEV is disabled. [ bp: Massage a bit. ] Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Tom Lendacky <[email protected]> Link: https://lkml.kernel.org/r/[email protected]