aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2018-05-17x86/bugs: Unify x86_spec_ctrl_{set_guest,restore_host}Borislav Petkov2-49/+44
Function bodies are very similar and are going to grow more almost identical code. Add a bool arg to determine whether SPEC_CTRL is being set for the guest or restored to the host. No functional changes. Signed-off-by: Borislav Petkov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
2018-05-17x86/speculation: Rework speculative_store_bypass_update()Thomas Gleixner3-4/+9
The upcoming support for the virtual SPEC_CTRL MSR on AMD needs to reuse speculative_store_bypass_update() to avoid code duplication. Add an argument for supplying a thread info (TIF) value and create a wrapper speculative_store_bypass_update_current() which is used at the existing call site. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Borislav Petkov <[email protected]> Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
2018-05-17x86/speculation: Add virtualized speculative store bypass disable supportTom Lendacky4-2/+18
Some AMD processors only support a non-architectural means of enabling speculative store bypass disable (SSBD). To allow a simplified view of this to a guest, an architectural definition has been created through a new CPUID bit, 0x80000008_EBX[25], and a new MSR, 0xc001011f. With this, a hypervisor can virtualize the existence of this definition and provide an architectural method for using SSBD to a guest. Add the new CPUID feature, the new MSR and update the existing SSBD support to use this MSR when present. Signed-off-by: Tom Lendacky <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Borislav Petkov <[email protected]>
2018-05-17x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRLThomas Gleixner4-9/+35
AMD is proposing a VIRT_SPEC_CTRL MSR to handle the Speculative Store Bypass Disable via MSR_AMD64_LS_CFG so that guests do not have to care about the bit position of the SSBD bit and thus facilitate migration. Also, the sibling coordination on Family 17H CPUs can only be done on the host. Extend x86_spec_ctrl_set_guest() and x86_spec_ctrl_restore_host() with an extra argument for the VIRT_SPEC_CTRL MSR. Hand in 0 from VMX and in SVM add a new virt_spec_ctrl member to the CPU data structure which is going to be used in later patches for the actual implementation. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Borislav Petkov <[email protected]> Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
2018-05-17x86/speculation: Handle HT correctly on AMDThomas Gleixner3-6/+130
The AMD64_LS_CFG MSR is a per core MSR on Family 17H CPUs. That means when hyperthreading is enabled the SSBD bit toggle needs to take both cores into account. Otherwise the following situation can happen: CPU0 CPU1 disable SSB disable SSB enable SSB <- Enables it for the Core, i.e. for CPU0 as well So after the SSB enable on CPU1 the task on CPU0 runs with SSB enabled again. On Intel the SSBD control is per core as well, but the synchronization logic is implemented behind the per thread SPEC_CTRL MSR. It works like this: CORE_SPEC_CTRL = THREAD0_SPEC_CTRL | THREAD1_SPEC_CTRL i.e. if one of the threads enables a mitigation then this affects both and the mitigation is only disabled in the core when both threads disabled it. Add the necessary synchronization logic for AMD family 17H. Unfortunately that requires a spinlock to serialize the access to the MSR, but the locks are only shared between siblings. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Borislav Petkov <[email protected]> Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
2018-05-17x86/cpufeatures: Add FEATURE_ZENThomas Gleixner2-0/+2
Add a ZEN feature bit so family-dependent static_cpu_has() optimizations can be built for ZEN. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Borislav Petkov <[email protected]> Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
2018-05-17x86/cpufeatures: Disentangle SSBD enumerationThomas Gleixner6-16/+14
The SSBD enumeration is similarly to the other bits magically shared between Intel and AMD though the mechanisms are different. Make X86_FEATURE_SSBD synthetic and set it depending on the vendor specific features or family dependent setup. Change the Intel bit to X86_FEATURE_SPEC_CTRL_SSBD to denote that SSBD is controlled via MSR_SPEC_CTRL and fix up the usage sites. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Borislav Petkov <[email protected]> Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
2018-05-17x86/cpufeatures: Disentangle MSR_SPEC_CTRL enumeration from IBRSThomas Gleixner4-9/+20
The availability of the SPEC_CTRL MSR is enumerated by a CPUID bit on Intel and implied by IBRS or STIBP support on AMD. That's just confusing and in case an AMD CPU has IBRS not supported because the underlying problem has been fixed but has another bit valid in the SPEC_CTRL MSR, the thing falls apart. Add a synthetic feature bit X86_FEATURE_MSR_SPEC_CTRL to denote the availability on both Intel and AMD. While at it replace the boot_cpu_has() checks with static_cpu_has() where possible. This prevents late microcode loading from exposing SPEC_CTRL, but late loading is already very limited as it does not reevaluate the mitigation options and other bits and pieces. Having static_cpu_has() is the simplest and least fragile solution. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Borislav Petkov <[email protected]> Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
2018-05-17x86/speculation: Use synthetic bits for IBRS/IBPB/STIBPBorislav Petkov5-23/+26
Intel and AMD have different CPUID bits hence for those use synthetic bits which get set on the respective vendor's in init_speculation_control(). So that debacles like what the commit message of c65732e4f721 ("x86/cpu: Restore CPUID_8000_0008_EBX reload") talks about don't happen anymore. Signed-off-by: Borislav Petkov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Konrad Rzeszutek Wilk <[email protected]> Tested-by: Jörg Otte <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-05-17KVM: SVM: Move spec control call after restore of GSThomas Gleixner1-12/+12
svm_vcpu_run() invokes x86_spec_ctrl_restore_host() after VMEXIT, but before the host GS is restored. x86_spec_ctrl_restore_host() uses 'current' to determine the host SSBD state of the thread. 'current' is GS based, but host GS is not yet restored and the access causes a triple fault. Move the call after the host GS restore. Fixes: 885f82bfbc6f x86/process: Allow runtime control of Speculative Store Bypass Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Borislav Petkov <[email protected]> Reviewed-by: Konrad Rzeszutek Wilk <[email protected]> Acked-by: Paolo Bonzini <[email protected]>
2018-05-18powerpc/powernv: Fix NVRAM sleep in invalid context when crashingNicholas Piggin1-2/+12
Similarly to opal_event_shutdown, opal_nvram_write can be called in the crash path with irqs disabled. Special case the delay to avoid sleeping in invalid context. Fixes: 3b8070335f75 ("powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops") Cc: [email protected] # v3.2 Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-05-17MAINTAINERS: add entry for STM32 I2C driverPierre-Yves MORDRET1-0/+6
Add I2C/SMBUS Driver entry for STM32 family from ST Microelectronics. Signed-off-by: Pierre-Yves MORDRET <[email protected]> Signed-off-by: Wolfram Sang <[email protected]>
2018-05-17btrfs: fix crash when trying to resume balance without the resume flagAnand Jain1-0/+9
We set the BTRFS_BALANCE_RESUME flag in the btrfs_recover_balance() only, which isn't called during the remount. So when resuming from the paused balance we hit the bug: kernel: kernel BUG at fs/btrfs/volumes.c:3890! :: kernel: balance_kthread+0x51/0x60 [btrfs] kernel: kthread+0x111/0x130 :: kernel: RIP: btrfs_balance+0x12e1/0x1570 [btrfs] RSP: ffffba7d0090bde8 Reproducer: On a mounted filesystem: btrfs balance start --full-balance /btrfs btrfs balance pause /btrfs mount -o remount,ro /dev/sdb /btrfs mount -o remount,rw /dev/sdb /btrfs To fix this set the BTRFS_BALANCE_RESUME flag in btrfs_resume_balance_async(). CC: [email protected] # 4.4+ Signed-off-by: Anand Jain <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2018-05-17btrfs: Fix delalloc inodes invalidation during transaction abortNikolay Borisov1-11/+15
When a transaction is aborted btrfs_cleanup_transaction is called to cleanup all the various in-flight bits and pieces which migth be active. One of those is delalloc inodes - inodes which have dirty pages which haven't been persisted yet. Currently the process of freeing such delalloc inodes in exceptional circumstances such as transaction abort boiled down to calling btrfs_invalidate_inodes whose sole job is to invalidate the dentries for all inodes related to a root. This is in fact wrong and insufficient since such delalloc inodes will likely have pending pages or ordered-extents and will be linked to the sb->s_inode_list. This means that unmounting a btrfs instance with an aborted transaction could potentially lead inodes/their pages visible to the system long after their superblock has been freed. This in turn leads to a "use-after-free" situation once page shrink is triggered. This situation could be simulated by running generic/019 which would cause such inodes to be left hanging, followed by generic/176 which causes memory pressure and page eviction which lead to touching the freed super block instance. This situation is additionally detected by the unmount code of VFS with the following message: "VFS: Busy inodes after unmount of Self-destruct in 5 seconds. Have a nice day..." Additionally btrfs hits WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree)); in free_fs_root for the same reason. This patch aims to rectify the sitaution by doing the following: 1. Change btrfs_destroy_delalloc_inodes so that it calls invalidate_inode_pages2 for every inode on the delalloc list, this ensures that all the pages of the inode are released. This function boils down to calling btrfs_releasepage. During test I observed cases where inodes on the delalloc list were having an i_count of 0, so this necessitates using igrab to be sure we are working on a non-freed inode. 2. Since calling btrfs_releasepage might queue delayed iputs move the call out to btrfs_cleanup_transaction in btrfs_error_commit_super before calling run_delayed_iputs for the last time. This is necessary to ensure that delayed iputs are run. Note: this patch is tagged for 4.14 stable but the fix applies to older versions too but needs to be backported manually due to conflicts. CC: [email protected] # 4.14.x: 2b8773313494: btrfs: Split btrfs_del_delalloc_inode into 2 functions CC: [email protected] # 4.14.x Signed-off-by: Nikolay Borisov <[email protected]> Reviewed-by: David Sterba <[email protected]> [ add comment to igrab ] Signed-off-by: David Sterba <[email protected]>
2018-05-17btrfs: Split btrfs_del_delalloc_inode into 2 functionsNikolay Borisov2-3/+12
This is in preparation of fixing delalloc inodes leakage on transaction abort. Also export the new function. Signed-off-by: Nikolay Borisov <[email protected]> Reviewed-by: David Sterba <[email protected]> Reviewed-by: Anand Jain <[email protected]> Signed-off-by: David Sterba <[email protected]>
2018-05-17btrfs: fix reading stale metadata blocks after degraded raid1 mountsLiu Bo1-3/+3
If a btree block, aka. extent buffer, is not available in the extent buffer cache, it'll be read out from the disk instead, i.e. btrfs_search_slot() read_block_for_search() # hold parent and its lock, go to read child btrfs_release_path() read_tree_block() # read child Unfortunately, the parent lock got released before reading child, so commit 5bdd3536cbbe ("Btrfs: Fix block generation verification race") had used 0 as parent transid to read the child block. It forces read_tree_block() not to check if parent transid is different with the generation id of the child that it reads out from disk. A simple PoC is included in btrfs/124, 0. A two-disk raid1 btrfs, 1. Right after mkfs.btrfs, block A is allocated to be device tree's root. 2. Mount this filesystem and put it in use, after a while, device tree's root got COW but block A hasn't been allocated/overwritten yet. 3. Umount it and reload the btrfs module to remove both disks from the global @fs_devices list. 4. mount -odegraded dev1 and write some data, so now block A is allocated to be a leaf in checksum tree. Note that only dev1 has the latest metadata of this filesystem. 5. Umount it and mount it again normally (with both disks), since raid1 can pick up one disk by the writer task's pid, if btrfs_search_slot() needs to read block A, dev2 which does NOT have the latest metadata might be read for block A, then we got a stale block A. 6. As parent transid is not checked, block A is marked as uptodate and put into the extent buffer cache, so the future search won't bother to read disk again, which means it'll make changes on this stale one and make it dirty and flush it onto disk. To avoid the problem, parent transid needs to be passed to read_tree_block(). In order to get a valid parent transid, we need to hold the parent's lock until finishing reading child. This patch needs to be slightly adapted for stable kernels, the &first_key parameter added to read_tree_block() is from 4.16+ (581c1760415c4). The fix is to replace 0 by 'gen'. Fixes: 5bdd3536cbbe ("Btrfs: Fix block generation verification race") CC: [email protected] # 4.4+ Signed-off-by: Liu Bo <[email protected]> Reviewed-by: Filipe Manana <[email protected]> Reviewed-by: Qu Wenruo <[email protected]> [ update changelog ] Signed-off-by: David Sterba <[email protected]>
2018-05-17btrfs: property: Set incompat flag if lzo/zstd compression is setMisono Tomohiro1-4/+8
Incompat flag of LZO/ZSTD compression should be set at: 1. mount time (-o compress/compress-force) 2. when defrag is done 3. when property is set Currently 3. is missing and this commit adds this. This could lead to a filesystem that uses ZSTD but is not marked as such. If a kernel without a ZSTD support encounteres a ZSTD compressed extent, it will handle that but this could be confusing to the user. Typically the filesystem is mounted with the ZSTD option, but the discrepancy can arise when a filesystem is never mounted with ZSTD and then the property on some file is set (and some new extents are written). A simple mount with -o compress=zstd will fix that up on an unpatched kernel. Same goes for LZO, but this has been around for a very long time (2.6.37) so it's unlikely that a pre-LZO kernel would be used. Fixes: 5c1aab1dd544 ("btrfs: Add zstd support") CC: [email protected] # 4.14+ Signed-off-by: Tomohiro Misono <[email protected]> Reviewed-by: Anand Jain <[email protected]> Reviewed-by: David Sterba <[email protected]> [ add user visible impact ] Signed-off-by: David Sterba <[email protected]>
2018-05-17Btrfs: fix duplicate extents after fsync of file with prealloc extentsFilipe Manana1-25/+112
In commit 471d557afed1 ("Btrfs: fix loss of prealloc extents past i_size after fsync log replay"), on fsync, we started to always log all prealloc extents beyond an inode's i_size in order to avoid losing them after a power failure. However under some cases this can lead to the log replay code to create duplicate extent items, with different lengths, in the extent tree. That happens because, as of that commit, we can now log extent items based on extent maps that are not on the "modified" list of extent maps of the inode's extent map tree. Logging extent items based on extent maps is used during the fast fsync path to save time and for this to work reliably it requires that the extent maps are not merged with other adjacent extent maps - having the extent maps in the list of modified extents gives such guarantee. Consider the following example, captured during a long run of fsstress, which illustrates this problem. We have inode 271, in the filesystem tree (root 5), for which all of the following operations and discussion apply to. A buffered write starts at offset 312391 with a length of 933471 bytes (end offset at 1245862). At this point we have, for this inode, the following extent maps with the their field values: em A, start 0, orig_start 0, len 40960, block_start 18446744073709551613, block_len 0, orig_block_len 0 em B, start 40960, orig_start 40960, len 376832, block_start 1106399232, block_len 376832, orig_block_len 376832 em C, start 417792, orig_start 417792, len 782336, block_start 18446744073709551613, block_len 0, orig_block_len 0 em D, start 1200128, orig_start 1200128, len 835584, block_start 1106776064, block_len 835584, orig_block_len 835584 em E, start 2035712, orig_start 2035712, len 245760, block_start 1107611648, block_len 245760, orig_block_len 245760 Extent map A corresponds to a hole and extent maps D and E correspond to preallocated extents. Extent map D ends where extent map E begins (1106776064 + 835584 = 1107611648), but these extent maps were not merged because they are in the inode's list of modified extent maps. An fsync against this inode is made, which triggers the fast path (BTRFS_INODE_NEEDS_FULL_SYNC is not set). This fsync triggers writeback of the data previously written using buffered IO, and when the respective ordered extent finishes, btrfs_drop_extents() is called against the (aligned) range 311296..1249279. This causes a split of extent map D at btrfs_drop_extent_cache(), replacing extent map D with a new extent map D', also added to the list of modified extents, with the following values: em D', start 1249280, orig_start of 1200128, block_start 1106825216 (= 1106776064 + 1249280 - 1200128), orig_block_len 835584, block_len 786432 (835584 - (1249280 - 1200128)) Then, during the fast fsync, btrfs_log_changed_extents() is called and extent maps D' and E are removed from the list of modified extents. The flag EXTENT_FLAG_LOGGING is also set on them. After the extents are logged clear_em_logging() is called on each of them, and that makes extent map E to be merged with extent map D' (try_merge_map()), resulting in D' being deleted and E adjusted to: em E, start 1249280, orig_start 1200128, len 1032192, block_start 1106825216, block_len 1032192, orig_block_len 245760 A direct IO write at offset 1847296 and length of 360448 bytes (end offset at 2207744) starts, and at that moment the following extent maps exist for our inode: em A, start 0, orig_start 0, len 40960, block_start 18446744073709551613, block_len 0, orig_block_len 0 em B, start 40960, orig_start 40960, len 270336, block_start 1106399232, block_len 270336, orig_block_len 376832 em C, start 311296, orig_start 311296, len 937984, block_start 1112842240, block_len 937984, orig_block_len 937984 em E (prealloc), start 1249280, orig_start 1200128, len 1032192, block_start 1106825216, block_len 1032192, orig_block_len 245760 The dio write results in drop_extent_cache() being called twice. The first time for a range that starts at offset 1847296 and ends at offset 2035711 (length of 188416), which results in a double split of extent map E, replacing it with two new extent maps: em F, start 1249280, orig_start 1200128, block_start 1106825216, block_len 598016, orig_block_len 598016 em G, start 2035712, orig_start 1200128, block_start 1107611648, block_len 245760, orig_block_len 1032192 It also creates a new extent map that represents a part of the requested IO (through create_io_em()): em H, start 1847296, len 188416, block_start 1107423232, block_len 188416 The second call to drop_extent_cache() has a range with a start offset of 2035712 and end offset of 2207743 (length of 172032). This leads to replacing extent map G with a new extent map I with the following values: em I, start 2207744, orig_start 1200128, block_start 1107783680, block_len 73728, orig_block_len 1032192 It also creates a new extent map that represents the second part of the requested IO (through create_io_em()): em J, start 2035712, len 172032, block_start 1107611648, block_len 172032 The dio write set the inode's i_size to 2207744 bytes. After the dio write the inode has the following extent maps: em A, start 0, orig_start 0, len 40960, block_start 18446744073709551613, block_len 0, orig_block_len 0 em B, start 40960, orig_start 40960, len 270336, block_start 1106399232, block_len 270336, orig_block_len 376832 em C, start 311296, orig_start 311296, len 937984, block_start 1112842240, block_len 937984, orig_block_len 937984 em F, start 1249280, orig_start 1200128, len 598016, block_start 1106825216, block_len 598016, orig_block_len 598016 em H, start 1847296, orig_start 1200128, len 188416, block_start 1107423232, block_len 188416, orig_block_len 835584 em J, start 2035712, orig_start 2035712, len 172032, block_start 1107611648, block_len 172032, orig_block_len 245760 em I, start 2207744, orig_start 1200128, len 73728, block_start 1107783680, block_len 73728, orig_block_len 1032192 Now do some change to the file, like adding a xattr for example and then fsync it again. This triggers a fast fsync path, and as of commit 471d557afed1 ("Btrfs: fix loss of prealloc extents past i_size after fsync log replay"), we use the extent map I to log a file extent item because it's a prealloc extent and it starts at an offset matching the inode's i_size. However when we log it, we create a file extent item with a value for the disk byte location that is wrong, as can be seen from the following output of "btrfs inspect-internal dump-tree": item 1 key (271 EXTENT_DATA 2207744) itemoff 3782 itemsize 53 generation 22 type 2 (prealloc) prealloc data disk byte 1106776064 nr 1032192 prealloc data offset 1007616 nr 73728 Here the disk byte value corresponds to calculation based on some fields from the extent map I: 1106776064 = block_start (1107783680) - 1007616 (extent_offset) extent_offset = 2207744 (start) - 1200128 (orig_start) = 1007616 The disk byte value of 1106776064 clashes with disk byte values of the file extent items at offsets 1249280 and 1847296 in the fs tree: item 6 key (271 EXTENT_DATA 1249280) itemoff 3568 itemsize 53 generation 20 type 2 (prealloc) prealloc data disk byte 1106776064 nr 835584 prealloc data offset 49152 nr 598016 item 7 key (271 EXTENT_DATA 1847296) itemoff 3515 itemsize 53 generation 20 type 1 (regular) extent data disk byte 1106776064 nr 835584 extent data offset 647168 nr 188416 ram 835584 extent compression 0 (none) item 8 key (271 EXTENT_DATA 2035712) itemoff 3462 itemsize 53 generation 20 type 1 (regular) extent data disk byte 1107611648 nr 245760 extent data offset 0 nr 172032 ram 245760 extent compression 0 (none) item 9 key (271 EXTENT_DATA 2207744) itemoff 3409 itemsize 53 generation 20 type 2 (prealloc) prealloc data disk byte 1107611648 nr 245760 prealloc data offset 172032 nr 73728 Instead of the disk byte value of 1106776064, the value of 1107611648 should have been logged. Also the data offset value should have been 172032 and not 1007616. After a log replay we end up getting two extent items in the extent tree with different lengths, one of 835584, which is correct and existed before the log replay, and another one of 1032192 which is wrong and is based on the logged file extent item: item 12 key (1106776064 EXTENT_ITEM 835584) itemoff 3406 itemsize 53 refs 2 gen 15 flags DATA extent data backref root 5 objectid 271 offset 1200128 count 2 item 13 key (1106776064 EXTENT_ITEM 1032192) itemoff 3353 itemsize 53 refs 1 gen 22 flags DATA extent data backref root 5 objectid 271 offset 1200128 count 1 Obviously this leads to many problems and a filesystem check reports many errors: (...) checking extents Extent back ref already exists for 1106776064 parent 0 root 5 owner 271 offset 1200128 num_refs 1 extent item 1106776064 has multiple extent items ref mismatch on [1106776064 835584] extent item 2, found 3 Incorrect local backref count on 1106776064 root 5 owner 271 offset 1200128 found 2 wanted 1 back 0x55b1d0ad7680 Backref 1106776064 root 5 owner 271 offset 1200128 num_refs 0 not found in extent tree Incorrect local backref count on 1106776064 root 5 owner 271 offset 1200128 found 1 wanted 0 back 0x55b1d0ad4e70 Backref bytes do not match extent backref, bytenr=1106776064, ref bytes=835584, backref bytes=1032192 backpointer mismatch on [1106776064 835584] checking free space cache block group 1103101952 has wrong amount of free space failed to load free space cache for block group 1103101952 checking fs roots (...) So fix this by logging the prealloc extents beyond the inode's i_size based on searches in the subvolume tree instead of the extent maps. Fixes: 471d557afed1 ("Btrfs: fix loss of prealloc extents past i_size after fsync log replay") CC: [email protected] # 4.14+ Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>
2018-05-17dmaengine: qcom: bam_dma: check if the runtime pm enabledSrinivas Kandagatla1-5/+13
Disabling pm runtime at probe is not sufficient to get BAM working on remotely controller instances. pm_runtime_get_sync() would return -EACCES in such cases. So check if runtime pm is enabled before returning error from bam functions. Fixes: 5b4a68952a89 ("dmaengine: qcom: bam_dma: disable runtime pm on remote controlled") Signed-off-by: Srinivas Kandagatla <[email protected]> Signed-off-by: Vinod Koul <[email protected]>
2018-05-17Merge branch 'vmwgfx-fixes-4.17' of ↵Dave Airlie1-0/+2
git://people.freedesktop.org/~thomash/linux into drm-fixes A single fix for a recent regression. * 'vmwgfx-fixes-4.17' of git://people.freedesktop.org/~thomash/linux: drm/vmwgfx: Set dmabuf_size when vmw_dmabuf_init is successful
2018-05-17Merge tag 'drm-misc-fixes-2018-05-16' of ↵Dave Airlie3-4/+6
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes - core: Fix regression in dev node offsets (Haneen) - vc4: Fix memory leak on driver close (Eric) - dumb-buffers: Prevent overflow in DIV_ROUND_UP() (Dan) Cc: Haneen Mohammed <[email protected]> Cc: Eric Anholt <[email protected]> Cc: Dan Carpenter <[email protected]> * tag 'drm-misc-fixes-2018-05-16' of git://anongit.freedesktop.org/drm/drm-misc: drm/dumb-buffers: Integer overflow in drm_mode_create_ioctl() drm/vc4: Fix leak of the file_priv that stored the perfmon. drm: Match sysfs name in link removal to link creation
2018-05-16Merge tag 'trace-v4.17-rc4-2' of ↵Linus Torvalds3-22/+2
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Some of the ftrace internal events use a zero for a data size of a field event. This is increasingly important for the histogram trigger work that is being extended. While auditing trace events, I found that a couple of the xen events were used as just marking that a function was called, by creating a static array of size zero. This can play havoc with the tracing features if these events are used, because a zero size of a static array is denoted as a special nul terminated dynamic array (this is what the trace_marker code uses). But since the xen events have no size, they are not nul terminated, and unexpected results may occur. As trace events were never intended on being a marker to denote that a function was hit or not, especially since function tracing and kprobes can trivially do the same, the best course of action is to simply remove these events" * tag 'trace-v4.17-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/x86/xen: Remove zero data size trace events trace_xen_mmu_flush_tlb{_all}
2018-05-16tuntap: fix use after free during releaseJason Wang1-1/+1
After commit b196d88aba8a ("tun: fix use after free for ptr_ring") we need clean up tx ring during release(). But unfortunately, it tries to do the cleanup blindly after socket were destroyed which will lead another use-after-free. Fix this by doing the cleanup before dropping the last reference of the socket in __tun_detach(). Reported-by: Andrei Vagin <[email protected]> Acked-by: Andrei Vagin <[email protected]> Fixes: b196d88aba8a ("tun: fix use after free for ptr_ring") Signed-off-by: Jason Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-05-16Merge branch 'qed-LL2-fixes'David S. Miller1-11/+50
Michal Kalderon says: ==================== qed: LL2 fixes This series fixes some issues in ll2 related to synchronization and resource freeing ==================== Signed-off-by: Ariel Elior <[email protected]> Signed-off-by: Michal Kalderon <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-05-16qed: Fix LL2 race during connection terminateMichal Kalderon1-10/+14
Stress on qedi/qedr load unload lead to list_del corruption. This is due to ll2 connection terminate freeing resources without verifying that no more ll2 processing will occur. This patch unregisters the ll2 status block before terminating the connection to assure this race does not occur. Fixes: 1d6cff4fca4366 ("qed: Add iSCSI out of order packet handling") Signed-off-by: Ariel Elior <[email protected]> Signed-off-by: Michal Kalderon <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-05-16qed: Fix possibility of list corruption during rmmod flowsMichal Kalderon1-1/+10
The ll2 flows of flushing the txq/rxq need to be synchronized with the regular fp processing. Caused list corruption during load/unload stress tests. Fixes: 0a7fb11c23c0f ("qed: Add Light L2 support") Signed-off-by: Ariel Elior <[email protected]> Signed-off-by: Michal Kalderon <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-05-16qed: LL2 flush isles when connection is closedMichal Kalderon1-0/+26
Driver should free all pending isles once it gets a FLUSH cqe from FW. Part of iSCSI out of order flow. Fixes: 1d6cff4fca4366 ("qed: Add iSCSI out of order packet handling") Signed-off-by: Ariel Elior <[email protected]> Signed-off-by: Michal Kalderon <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-05-16net/sched: fix refcnt leak in the error path of tcf_vlan_init()Davide Caratti1-0/+2
Similarly to what was done with commit a52956dfc503 ("net sched actions: fix refcnt leak in skbmod"), fix the error path of tcf_vlan_init() to avoid refcnt leaks when wrong value of TCA_VLAN_PUSH_VLAN_PROTOCOL is given. Fixes: 5026c9b1bafc ("net sched: vlan action fix late binding") CC: Roman Mashak <[email protected]> Signed-off-by: Davide Caratti <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-05-16net: 8390: ne: Fix accidentally removed RBTX4927 supportGeert Uytterhoeven1-1/+3
The configuration settings for RBTX4927 were accidentally removed, leading to a silently broken network interface. Re-add the missing settings to fix this. Fixes: 8eb97ff5a4ec941d ("net: 8390: remove m32r specific bits") Signed-off-by: Geert Uytterhoeven <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-05-16Merge branch 'dsa-bcm_sf2-CFP-fixes'David S. Miller1-14/+22
Florian Fainelli says: ==================== net: dsa: bcm_sf2: CFP fixes This patch series fixes a number of usability issues with the SF2 Compact Field Processor code: - we would not be properly bound checking the location when we let the kernel automatically place rules with RX_CLS_LOC_ANY - when using IPv6 rules and user space specifies a location identifier we would be off by one in what the chain ID (within the Broadcom tag) indicates - it would be possible to delete one of the two slices of an IPv6 while leaving the other one programming leading to various problems ==================== Signed-off-by: David S. Miller <[email protected]>
2018-05-16net: dsa: bcm_sf2: Fix IPv6 rule half deletionFlorian Fainelli1-4/+7
It was possible to delete only one half of an IPv6, which would leave the second half still programmed and possibly in use. Instead of checking for the unused bitmap, we need to check the unique bitmap, and refuse any deletion that does not match that criteria. We also need to move that check from bcm_sf2_cfp_rule_del_one() into its caller: bcm_sf2_cfp_rule_del() otherwise we would not be able to delete second halves anymore that would not pass the first test. Fixes: ba0696c22e7c ("net: dsa: bcm_sf2: Add support for IPv6 CFP rules") Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-05-16net: dsa: bcm_sf2: Fix IPv6 rules and chain IDFlorian Fainelli1-9/+11
We had several issues that would make the programming of IPv6 rules both inconsistent and error prone: - the chain ID that we would be asking the hardware to put in the packet's Broadcom tag would be off by one, it would return one of the two indexes, but not the one user-space specified - when an user specified a particular location to insert a CFP rule at, we would not be returning the same index, which would be confusing if nothing else - finally, like IPv4, it would be possible to overflow the last entry by re-programming it Fix this by swapping the usage of rule_index[0] and rule_index[1] where relevant in order to return a consistent and correct user-space experience. Fixes: ba0696c22e7c ("net: dsa: bcm_sf2: Add support for IPv6 CFP rules") Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-05-16net: dsa: bcm_sf2: Fix RX_CLS_LOC_ANY overwrite for last ruleFlorian Fainelli1-1/+4
When we let the kernel pick up a rule location with RX_CLS_LOC_ANY, we would be able to overwrite the last rules because of a number of issues. The IPv4 code path would not be checking that rule_index is within bounds, and it would also only be allowed to pick up rules from range 0..126 instead of the full 0..127 range. This would lead us to allow overwriting the last rule when we let the kernel pick-up the location. Fixes: 3306145866b6 ("net: dsa: bcm_sf2: Move IPv4 CFP processing to specific functions") Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-05-16Merge tag 'trace-v4.17-rc5-vsprintf' of ↵Linus Torvalds1-11/+15
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull memory barrier for from Steven Rostedt: "The memory barrier usage in updating the random ptr hash for %p in vsprintf is incorrect. Instead of adding the read memory barrier into vsprintf() which will cause a slight degradation to a commonly used function in the kernel just to solve a very unlikely race condition that can only happen at boot up, change the code from using a variable branch to a static_branch. Not only does this solve the race condition, it actually will improve the performance of vsprintf() by removing the conditional branch that is only needed at boot" * tag 'trace-v4.17-rc5-vsprintf' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: vsprintf: Replace memory barrier with static_key for random_ptr_key update
2018-05-16usbip: usbip_host: fix bad unlock balance during stub_probe()Shuah Khan (Samsung OSG)1-1/+2
stub_probe() calls put_busid_priv() in an error path when device isn't found in the busid_table. Fix it by making put_busid_priv() safe to be called with null struct bus_id_priv pointer. This problem happens when "usbip bind" is run without loading usbip_host driver and then running modprobe. The first failed bind attempt unbinds the device from the original driver and when usbip_host is modprobed, stub_probe() runs and doesn't find the device in its busid table and calls put_busid_priv(0 with null bus_id_priv pointer. usbip-host 3-10.2: 3-10.2 is not in match_busid table... skip! [ 367.359679] ===================================== [ 367.359681] WARNING: bad unlock balance detected! [ 367.359683] 4.17.0-rc4+ #5 Not tainted [ 367.359685] ------------------------------------- [ 367.359688] modprobe/2768 is trying to release lock ( [ 367.359689] ================================================================== [ 367.359696] BUG: KASAN: null-ptr-deref in print_unlock_imbalance_bug+0x99/0x110 [ 367.359699] Read of size 8 at addr 0000000000000058 by task modprobe/2768 [ 367.359705] CPU: 4 PID: 2768 Comm: modprobe Not tainted 4.17.0-rc4+ #5 Fixes: 22076557b07c ("usbip: usbip_host: fix NULL-ptr deref and use-after-free errors") in usb-linus Signed-off-by: Shuah Khan (Samsung OSG) <[email protected]> Cc: stable <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2018-05-16net: phy: micrel: add 125MHz reference clock workaroundMarkus Niebel2-0/+38
The micrel KSZ9031 phy has a optional clock pin (CLK125_NDO) which can be used as reference clock for the MAC unit. The clock signal must meet the RGMII requirements to ensure the correct data transmission between the MAC and the PHY. The KSZ9031 phy does not fulfill the duty cycle requirement if the phy is configured as slave. For a complete describtion look at the errata sheets: DS80000691D or DS80000692D. The errata sheet recommends to force the phy into master mode whenever there is a 1000Base-T link-up as work around. Only set the "micrel,force-master" property if you use the phy reference clock provided by CLK125_NDO pin as MAC reference clock in your application. Attenation, this workaround is only usable if the link partner can be configured to slave mode for 1000Base-T. Signed-off-by: Markus Niebel <[email protected]> [[email protected]: fix dt-binding documentation] [[email protected]: use already existing result var for read/write] [[email protected]: add error handling] [[email protected]: add more comments] Signed-off-by: Marco Felsch <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-05-16tcp: purge write queue in tcp_connect_init()Eric Dumazet1-2/+5
syzkaller found a reliable way to crash the host, hitting a BUG() in __tcp_retransmit_skb() Malicous MSG_FASTOPEN is the root cause. We need to purge write queue in tcp_connect_init() at the point we init snd_una/write_seq. This patch also replaces the BUG() by a less intrusive WARN_ON_ONCE() kernel BUG at net/ipv4/tcp_output.c:2837! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 0 PID: 5276 Comm: syz-executor0 Not tainted 4.17.0-rc3+ #51 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__tcp_retransmit_skb+0x2992/0x2eb0 net/ipv4/tcp_output.c:2837 RSP: 0000:ffff8801dae06ff8 EFLAGS: 00010206 RAX: ffff8801b9fe61c0 RBX: 00000000ffc18a16 RCX: ffffffff864e1a49 RDX: 0000000000000100 RSI: ffffffff864e2e12 RDI: 0000000000000005 RBP: ffff8801dae073a0 R08: ffff8801b9fe61c0 R09: ffffed0039c40dd2 R10: ffffed0039c40dd2 R11: ffff8801ce206e93 R12: 00000000421eeaad R13: ffff8801ce206d4e R14: ffff8801ce206cc0 R15: ffff8801cd4f4a80 FS: 0000000000000000(0000) GS:ffff8801dae00000(0063) knlGS:00000000096bc900 CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 CR2: 0000000020000000 CR3: 00000001c47b6000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> tcp_retransmit_skb+0x2e/0x250 net/ipv4/tcp_output.c:2923 tcp_retransmit_timer+0xc50/0x3060 net/ipv4/tcp_timer.c:488 tcp_write_timer_handler+0x339/0x960 net/ipv4/tcp_timer.c:573 tcp_write_timer+0x111/0x1d0 net/ipv4/tcp_timer.c:593 call_timer_fn+0x230/0x940 kernel/time/timer.c:1326 expire_timers kernel/time/timer.c:1363 [inline] __run_timers+0x79e/0xc50 kernel/time/timer.c:1666 run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692 __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285 invoke_softirq kernel/softirq.c:365 [inline] irq_exit+0x1d1/0x200 kernel/softirq.c:405 exiting_irq arch/x86/include/asm/apic.h:525 [inline] smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863 Fixes: cf60af03ca4e ("net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN)") Signed-off-by: Eric Dumazet <[email protected]> Cc: Yuchung Cheng <[email protected]> Cc: Neal Cardwell <[email protected]> Reported-by: syzbot <[email protected]> Acked-by: Neal Cardwell <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-05-16net/mlx5: Fix build break when CONFIG_SMP=nSaeed Mahameed1-11/+1
Avoid using the kernel's irq_descriptor and return IRQ vector affinity directly from the driver. This fixes the following build break when CONFIG_SMP=n include/linux/mlx5/driver.h: In function ‘mlx5_get_vector_affinity_hint’: include/linux/mlx5/driver.h:1299:13: error: ‘struct irq_desc’ has no member named ‘affinity_hint’ Fixes: 6082d9c9c94a ("net/mlx5: Fix mlx5_get_vector_affinity function") Signed-off-by: Saeed Mahameed <[email protected]> CC: Randy Dunlap <[email protected]> CC: Guenter Roeck <[email protected]> CC: Thomas Gleixner <[email protected]> Tested-by: Israel Rukshin <[email protected]> Reported-by: kbuild test robot <[email protected]> Reported-by: Randy Dunlap <[email protected]> Tested-by: Randy Dunlap <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Tested-by: Guenter Roeck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-05-16ipvlan: call netdevice notifier when master mac address changedKeefe Liu1-1/+3
When master device's mac has been changed, the commit 32c10bbfe914 ("ipvlan: always use the current L2 addr of the master") makes the IPVlan devices's mac changed also, but it doesn't do related works such as flush the IPVlan devices's arp table. Signed-off-by: Keefe Liu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-05-16drm/dumb-buffers: Integer overflow in drm_mode_create_ioctl()Dan Carpenter1-3/+4
There is a comment here which says that DIV_ROUND_UP() and that's where the problem comes from. Say you pick: args->bpp = UINT_MAX - 7; args->width = 4; args->height = 1; The integer overflow in DIV_ROUND_UP() means "cpp" is UINT_MAX / 8 and because of how we picked args->width that means cpp < UINT_MAX / 4. I've fixed it by preventing the integer overflow in DIV_ROUND_UP(). I removed the check for !cpp because it's not possible after this change. I also changed all the 0xffffffffU references to U32_MAX. Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Daniel Vetter <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/20180516140026.GA19340@mwanda
2018-05-16vsprintf: Replace memory barrier with static_key for random_ptr_key updateSteven Rostedt (VMware)1-11/+15
Reviewing Tobin's patches for getting pointers out early before entropy has been established, I noticed that there's a lone smp_mb() in the code. As with most lone memory barriers, this one appears to be incorrectly used. We currently basically have this: get_random_bytes(&ptr_key, sizeof(ptr_key)); /* * have_filled_random_ptr_key==true is dependent on get_random_bytes(). * ptr_to_id() needs to see have_filled_random_ptr_key==true * after get_random_bytes() returns. */ smp_mb(); WRITE_ONCE(have_filled_random_ptr_key, true); And later we have: if (unlikely(!have_filled_random_ptr_key)) return string(buf, end, "(ptrval)", spec); /* Missing memory barrier here. */ hashval = (unsigned long)siphash_1u64((u64)ptr, &ptr_key); As the CPU can perform speculative loads, we could have a situation with the following: CPU0 CPU1 ---- ---- load ptr_key = 0 store ptr_key = random smp_mb() store have_filled_random_ptr_key load have_filled_random_ptr_key = true BAD BAD BAD! (you're so bad!) Because nothing prevents CPU1 from loading ptr_key before loading have_filled_random_ptr_key. But this race is very unlikely, but we can't keep an incorrect smp_mb() in place. Instead, replace the have_filled_random_ptr_key with a static_branch not_filled_random_ptr_key, that is initialized to true and changed to false when we get enough entropy. If the update happens in early boot, the static_key is updated immediately, otherwise it will have to wait till entropy is filled and this happens in an interrupt handler which can't enable a static_key, as that requires a preemptible context. In that case, a work_queue is used to enable it, as entropy already took too long to establish in the first place waiting a little more shouldn't hurt anything. The benefit of using the static key is that the unlikely branch in vsprintf() now becomes a nop. Link: http://lkml.kernel.org/r/[email protected] Cc: [email protected] Fixes: ad67b74d2469d ("printk: hash addresses printed with %p") Acked-by: Linus Torvalds <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2018-05-16x86/boot/compressed/64: Fix moving page table out of trampoline memoryKirill A. Shutemov2-11/+14
cleanup_trampoline() relocates the top-level page table out of trampoline memory. We use 'top_pgtable' as our new top-level page table. But if the 'top_pgtable' would be referenced from C in a usual way, the address of the table will be calculated relative to RIP. After kernel gets relocated, the address will be in the middle of decompression buffer and the page table may get overwritten. This leads to a crash. We calculate the address of other page tables relative to the relocation address. It makes them safe. We should do the same for 'top_pgtable'. Calculate the address of 'top_pgtable' in assembly and pass down to cleanup_trampoline(). Move the page table to .pgtable section where the rest of page tables are. The section is @nobits so we save 4k in kernel image. Signed-off-by: Kirill A. Shutemov <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Fixes: e9d0e6330eb8 ("x86/boot/compressed/64: Prepare new top-level page table for trampoline") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-05-16x86/boot/compressed/64: Set up GOT for paging_prepare() and cleanup_trampoline()Kirill A. Shutemov1-13/+55
Eric and Hugh have reported instant reboot due to my recent changes in decompression code. The root cause is that I didn't realize that we need to adjust GOT to be able to run C code that early. The problem is only visible with an older toolchain. Binutils >= 2.24 is able to eliminate GOT references by replacing them with RIP-relative address loads: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commitdiff;h=80d873266dec We need to adjust GOT two times: - before calling paging_prepare() using the initial load address - before calling C code from the relocated kernel Reported-by: Eric Dumazet <[email protected]> Reported-by: Hugh Dickins <[email protected]> Signed-off-by: Kirill A. Shutemov <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Fixes: 194a9749c73d ("x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-05-16locking/percpu-rwsem: Annotate rwsem ownership transfer by setting ↵Waiman Long3-1/+13
RWSEM_OWNER_UNKNOWN The filesystem freezing code needs to transfer ownership of a rwsem embedded in a percpu-rwsem from the task that does the freezing to another one that does the thawing by calling percpu_rwsem_release() after freezing and percpu_rwsem_acquire() before thawing. However, the new rwsem debug code runs afoul with this scheme by warning that the task that releases the rwsem isn't the one that acquires it, as reported by Amir Goldstein: DEBUG_LOCKS_WARN_ON(sem->owner != get_current()) WARNING: CPU: 1 PID: 1401 at /home/amir/build/src/linux/kernel/locking/rwsem.c:133 up_write+0x59/0x79 Call Trace: percpu_up_write+0x1f/0x28 thaw_super_locked+0xdf/0x120 do_vfs_ioctl+0x270/0x5f1 ksys_ioctl+0x52/0x71 __x64_sys_ioctl+0x16/0x19 do_syscall_64+0x5d/0x167 entry_SYSCALL_64_after_hwframe+0x49/0xbe To work properly with the rwsem debug code, we need to annotate that the rwsem ownership is unknown during the tranfer period until a brave soul comes forward to acquire the ownership. During that period, optimistic spinning will be disabled. Reported-by: Amir Goldstein <[email protected]> Tested-by: Amir Goldstein <[email protected]> Signed-off-by: Waiman Long <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Jan Kara <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Theodore Y. Ts'o <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-05-16locking/rwsem: Add a new RWSEM_ANONYMOUSLY_OWNED flagWaiman Long3-21/+28
There are use cases where a rwsem can be acquired by one task, but released by another task. In thess cases, optimistic spinning may need to be disabled. One example will be the filesystem freeze/thaw code where the task that freezes the filesystem will acquire a write lock on a rwsem and then un-owns it before returning to userspace. Later on, another task will come along, acquire the ownership, thaw the filesystem and release the rwsem. Bit 0 of the owner field was used to designate that it is a reader owned rwsem. It is now repurposed to mean that the owner of the rwsem is not known. If only bit 0 is set, the rwsem is reader owned. If bit 0 and other bits are set, it is writer owned with an unknown owner. One such value for the latter case is (-1L). So we can set owner to 1 for reader-owned, -1 for writer-owned. The owner is unknown in both cases. To handle transfer of rwsem ownership, the higher level code should set the owner field to -1 to indicate a write-locked rwsem with unknown owner. Optimistic spinning will be disabled in this case. Once the higher level code figures who the new owner is, it can then set the owner field accordingly. Tested-by: Amir Goldstein <[email protected]> Signed-off-by: Waiman Long <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Jan Kara <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Theodore Y. Ts'o <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-05-16drm/i915/gen9: Add WaClearHIZ_WM_CHICKEN3 for bxt and glkMichel Thierry2-0/+7
Factor in clear values wherever required while updating destination min/max. References: HSDES#1604444184 Signed-off-by: Michel Thierry <[email protected]> Cc: [email protected] Cc: Mika Kuoppala <[email protected]> Cc: Oscar Mateo <[email protected]> Reviewed-by: Mika Kuoppala <[email protected]> Signed-off-by: Chris Wilson <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Cc: [email protected] Cc: Joonas Lahtinen <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (backported from commit 0c79f9cb77eae28d48a4f9fc1b3341aacbbd260c) Signed-off-by: Joonas Lahtinen <[email protected]>
2018-05-16drm/vmwgfx: Set dmabuf_size when vmw_dmabuf_init is successfulDeepak Rawat1-0/+2
SOU primary plane prepare_fb hook depends upon dmabuf_size to pin up BO (and not call a new vmw_dmabuf_init) when a new fb size is same as current fb. This was changed in a recent commit which is causing page_flip to fail on VM with low display memory and multi-mon failure when cycle monitors from secondary display. Cc: <[email protected]> # 4.14, 4.16 Fixes: 20fb5a635a0c ("drm/vmwgfx: Unpin the screen object backup buffer when not used") Signed-off-by: Deepak Rawat <[email protected]> Reviewed-by: Sinclair Yeh <[email protected]> Signed-off-by: Thomas Hellstrom <[email protected]>
2018-05-15IB/umem: Use the correct mm during ib_umem_releaseLidong Chen2-7/+1
User-space may invoke ibv_reg_mr and ibv_dereg_mr in different threads. If ibv_dereg_mr is called after the thread which invoked ibv_reg_mr has exited, get_pid_task will return NULL and ib_umem_release will not decrease mm->pinned_vm. Instead of using threads to locate the mm, use the overall tgid from the ib_ucontext struct instead. This matches the behavior of ODP and disassociate in handling the mm of the process that called ibv_reg_mr. Cc: <[email protected]> Fixes: 87773dd56d54 ("IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_get") Signed-off-by: Lidong Chen <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-05-15clk: stm32: fix: stm32 clock drivers are not compiled by defaultGabriel Fernandez1-4/+2
Clock driver is mandatory if the machine is selected. Then don't use 'bool' and 'depends on' commands, but 'def_bool' with the machine(s). Fixes: da32d3539fca ("clk: stm32: add configuration flags for each of the stm32 drivers") Signed-off-by: Gabriel Fernandez <[email protected]> Acked-by: Alexandre TORGUE <[email protected]> Signed-off-by: Stephen Boyd <[email protected]>
2018-05-15clk: imx6ull: use OSC clock during AXI rate changeStefan Agner1-1/+1
On i.MX6 ULL using PLL3 seems to cause a freeze when setting the parent to IMX6UL_CLK_PLL3_USB_OTG. This only seems to appear since commit 6f9575e55632 ("clk: imx: Add CLK_IS_CRITICAL flag for busy divider and busy mux"), probably because the clock is now forced to be on. Fixes: 6f9575e55632("clk: imx: Add CLK_IS_CRITICAL flag for busy divider and busy mux") Signed-off-by: Stefan Agner <[email protected]> Signed-off-by: Stephen Boyd <[email protected]>