Age | Commit message | Author | Files | Lines
2021-06-22 | btrfs: ensure relocation never runs while we have send operations running | Filipe Manana | 6 | -19/+32
Relocation and send do not play well together because while send is running a block group can be relocated, a transaction committed and the respective disk extents get re-allocated and written to or discarded while send is about to do something with the extents. This was explained in commit 9e967495e0e0ae ("Btrfs: prevent send failures and crashes due to concurrent relocation"), which prevented balance and send from running in parallel but it did not address one remaining case where chunk relocation can happen: shrinking a device (and device deletion which shrinks a device's size to 0 before deleting the device). We also have now one more case where relocation is triggered: on zoned filesystems partially used block groups get relocated by a background thread, introduced in commit 18bb8bbf13c183 ("btrfs: zoned: automatically reclaim zones"). So make sure that instead of preventing balance from running when there are ongoing send operations, we prevent relocation from happening. This uses the infrastructure recently added by a patch that has the subject: "btrfs: add cancellable chunk relocation support". Also it adds a spinlock used exclusively for the exclusivity between send and relocation, as before fs_info->balance_mutex was used, which would make an attempt to run send to block waiting for balance to finish, which can take a lot of time on large filesystems. Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
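The exclusion described above can be pictured with a small userspace sketch. It is not the kernel code: `struct fs_state`, `send_begin()`, `reloc_begin()` and the C11 `atomic_flag` spinlock are hypothetical stand-ins chosen for illustration. The point it shows is that each side only takes a short lock to inspect the other side's state and fails fast, instead of sleeping on a mutex for the duration of a balance.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the filesystem-wide state (not the kernel struct). */
struct fs_state {
	atomic_flag lock;      /* tiny spinlock guarding the two fields below */
	int send_in_progress;  /* number of send operations currently running */
	bool reloc_running;    /* true while a relocation is in flight */
};

static void fs_init(struct fs_state *fs)
{
	atomic_flag_clear(&fs->lock);
	fs->send_in_progress = 0;
	fs->reloc_running = false;
}

static void fs_lock(struct fs_state *fs)
{
	/* Spin: the critical sections are only a few instructions long. */
	while (atomic_flag_test_and_set_explicit(&fs->lock, memory_order_acquire))
		;
}

static void fs_unlock(struct fs_state *fs)
{
	atomic_flag_clear_explicit(&fs->lock, memory_order_release);
}

/* Returns false, without waiting, if a relocation is running. */
static bool send_begin(struct fs_state *fs)
{
	bool ok;

	fs_lock(fs);
	ok = !fs->reloc_running;
	if (ok)
		fs->send_in_progress++;
	fs_unlock(fs);
	return ok;
}

static void send_end(struct fs_state *fs)
{
	fs_lock(fs);
	assert(fs->send_in_progress > 0);
	fs->send_in_progress--;
	fs_unlock(fs);
}

/* Returns false, without waiting, if any send operation is running. */
static bool reloc_begin(struct fs_state *fs)
{
	bool ok;

	fs_lock(fs);
	ok = fs->send_in_progress == 0 && !fs->reloc_running;
	if (ok)
		fs->reloc_running = true;
	fs_unlock(fs);
	return ok;
}

static void reloc_end(struct fs_state *fs)
{
	fs_lock(fs);
	fs->reloc_running = false;
	fs_unlock(fs);
}
```

Because the check sits in front of relocation itself in this model, every caller that triggers relocation (balance, device shrink, zone reclaim) would be covered by the same test.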
2021-06-22 | btrfs: shorten integrity checker extent data mount option | David Sterba | 3 | -6/+4
Subjectively, CHECK_INTEGRITY_INCLUDING_EXTENT_DATA is quite long and calling it CHECK_INTEGRITY_DATA still keeps the meaning and matches the mount option name. Reviewed-by: Anand Jain <[email protected]> Signed-off-by: David Sterba <[email protected]>
2021-06-22 | btrfs: switch mount option bits to enums and use wider type | David Sterba | 1 | -32/+33
Switch defines of BTRFS_MOUNT_* to an enum (the symbolic names are recorded in the debugging information for convenience).

There are two more things done but separating them would not make much sense as it's touching the same lines:

- Renumber shifts 18..31 to 17..30 to get rid of the hole in the sequence.

- Use 1UL as the value that gets shifted because we're approaching the 32bit limit and due to integer promotions the value of (1 << 31) becomes 0xffffffff80000000 when cast to unsigned long (eg. the option manipulating helpers).

  This is not causing any problems yet as the operations are in-memory and masking the 31st bit works, we don't have more than 31 bits so the ill effects of not masking higher bits don't happen. But once we have more, the problems will emerge.

Reviewed-by: Anand Jain <[email protected]>
Signed-off-by: David Sterba <[email protected]>
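The sign-extension effect mentioned above can be reproduced in a few lines of portable C. This is a sketch using fixed-width types: `INT32_MIN` stands in for the bit pattern that `(1 << 31)` yields with a 32-bit `int` in practice, and `uint64_t` stands in for a 64-bit `unsigned long`. The function names are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 31 built from a 32-bit signed value, then widened to 64 bits. */
static uint64_t bit31_from_int(void)
{
	int32_t narrow = INT32_MIN; /* bit pattern 0x80000000 */

	return (uint64_t)narrow;    /* sign extension fills the upper 32 bits */
}

/* Bit 31 built from a value that is already 64 bits wide. */
static uint64_t bit31_from_wide(void)
{
	return (uint64_t)1 << 31;
}

/* Clear the bits in mask, the way an option-clearing helper would. */
static uint64_t clear_opt(uint64_t opts, uint64_t mask)
{
	return opts & ~mask;
}

static void show_difference(void)
{
	uint64_t opts = ((uint64_t)1 << 32) | ((uint64_t)1 << 31);

	assert(bit31_from_int() == 0xffffffff80000000ULL);
	assert(bit31_from_wide() == 0x0000000080000000ULL);

	/* The sign-extended mask also wipes out the (future) bit 32. */
	assert(clear_opt(opts, bit31_from_int()) == 0);
	/* The wide mask only touches bit 31. */
	assert(clear_opt(opts, bit31_from_wide()) == (uint64_t)1 << 32);
}
```

As long as no option uses bit 32 or above, both masks behave the same, which matches the observation that nothing is broken yet.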
2021-06-22 | btrfs: props: change how empty value is interpreted | David Sterba | 1 | -0/+14
Based on user feedback and actual problems with compression property, there's no support to unset any compression options, or to force no compression flag.

Note: This has changed recently in e2fsprogs 1.46.2, 'chattr +m' (setting NOCOMPRESS).

In btrfs properties, the empty value should really mean reset to defaults, for all properties in general. Right now there's only the compression one, so this change should not cause too many problems.

Old behaviour:

    $ lsattr file
    ---------------------- file
    # the NOCOMPRESS bit is set
    $ btrfs prop set file compression ''
    $ lsattr file
    ---------------------m file

This is equivalent to 'btrfs prop set file compression no' in current btrfs-progs as the 'no' or 'none' values are translated to an empty string.

This is where the new behaviour is different: empty string drops the compression flag (-c) and nocompress (-m):

    $ lsattr file
    ---------------------- file
    # No change
    $ btrfs prop set file compression ''
    $ lsattr file
    ---------------------- file
    $ btrfs prop set file compression lzo
    $ lsattr file
    --------c------------- file
    $ btrfs prop get file compression
    compression=lzo
    $ btrfs prop set file compression ''
    # Reset to the initial state
    $ lsattr file
    ---------------------- file
    # Set NOCOMPRESS bit
    $ btrfs prop set file compression no
    $ lsattr file
    ---------------------m file

This obviously brings problems with backward compatibility, so this patch should not be backported without making sure the updated btrfs-progs are also used and that scripts have been updated to use the new semantics.

Summary:

- old kernel:
  no, none, "" - set NOCOMPRESS bit
- new kernel:
  no, none - set NOCOMPRESS bit
  "" - drop all compression flags, ie. COMPRESS and NOCOMPRESS

Signed-off-by: David Sterba <[email protected]>
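The summary table can be restated as a small decision function. This is only a sketch of the semantics listed above: `enum compress_action` and `interpret()` are hypothetical names, and validation of algorithm names such as lzo is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical outcome of setting the compression property. */
enum compress_action {
	SET_NOCOMPRESS, /* set the NOCOMPRESS bit */
	RESET_FLAGS,    /* drop both COMPRESS and NOCOMPRESS */
	SET_ALGORITHM,  /* set COMPRESS with the named algorithm */
};

/* Map a property value to an action under the old or the new kernel semantics. */
static enum compress_action interpret(const char *value, bool new_kernel)
{
	assert(value != NULL);

	if (strcmp(value, "no") == 0 || strcmp(value, "none") == 0)
		return SET_NOCOMPRESS;
	if (value[0] == '\0')
		return new_kernel ? RESET_FLAGS : SET_NOCOMPRESS;
	return SET_ALGORITHM; /* algorithm name is not validated in this sketch */
}
```

Only the empty string changes meaning between the two kernels, which is why scripts that relied on '' to set NOCOMPRESS are the ones that need updating.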
2021-06-22 | btrfs: compression: don't try to compress if we don't have enough pages | David Sterba | 1 | -1/+1
The early check if we should attempt compression does not take into account the number of input pages. It can happen that there's only one page, eg. a tail page after some ranges of the BTRFS_MAX_UNCOMPRESSED have been processed, or an isolated page that won't be converted to an inline extent. The single page would be compressed but a later check would drop it again because the result size must be at least one block shorter than the input. That can never work with just one page. CC: [email protected] # 4.4+ Signed-off-by: David Sterba <[email protected]>
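The arithmetic behind "that can never work with just one page" can be checked with a short sketch. It assumes a 4096-byte block equal to the page size and a non-empty compressed result; `saves_a_block()` is a hypothetical helper that expresses the rule stated above, not the kernel's actual check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Round n up to a multiple of block. */
static size_t round_up_block(size_t n, size_t block)
{
	assert(block > 0);
	return (n + block - 1) / block * block;
}

/*
 * True if storing `compressed` bytes takes at least one block less on
 * disk than storing the `input` bytes uncompressed.
 */
static bool saves_a_block(size_t input, size_t compressed, size_t block)
{
	return round_up_block(compressed, block) + block <=
	       round_up_block(input, block);
}
```

With a single 4096-byte page as input, any non-empty result still occupies one block, so the saving is zero blocks regardless of the compression ratio. With two pages, a result of up to 4096 bytes passes the test.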
2021-06-22 | btrfs: fix unbalanced unlock in qgroup_account_snapshot() | Naohiro Aota | 1 | -1/+1
qgroup_account_snapshot() tries to unlock tree_log_mutex, which was never taken, in an error path. Since ret != 0 in this case, we can just return from here.

Fixes: 2a4d84c11a87 ("btrfs: move delayed ref flushing for qgroup into qgroup helper")
CC: [email protected] # 5.12+
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Naohiro Aota <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2021-06-22 | btrfs: sysfs: export dev stats in devinfo directory | David Sterba | 1 | -0/+29
The device stats can be read by ioctl, wrapped by command 'btrfs device stats'. Provide another source where to read the information in /sys/fs/btrfs/FSID/devinfo/DEVID/error_stats . The format is a list of 'key value' pairs one per line, which is common in other stat files. The names are the same as used in other device stat outputs. The stats are all in one file as it's the snapshot of all available stats. The 'one value per file' format is not very suitable here. The stats should be valid right after the stats item is read from disk, shortly after initializing the device. In case the stats are not yet valid, print just 'invalid' as the file contents. Reviewed-by: Anand Jain <[email protected]> Signed-off-by: David Sterba <[email protected]>
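A consumer of the 'key value' format described above could look up a single counter as sketched here. `stat_lookup()` is a hypothetical helper. The key names used in the usage note are placeholders as well, since the commit message does not list them. A file containing just 'invalid' has no matching key, so the lookup reports failure.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Find `key` in a buffer of "key value\n" lines and store its value in *out.
 * Returns 0 on success, -1 if the key is absent or its value does not parse.
 */
static int stat_lookup(const char *buf, const char *key, unsigned long long *out)
{
	size_t klen;
	const char *line = buf;

	assert(buf != NULL && key != NULL && out != NULL);
	klen = strlen(key);

	while (line && *line) {
		/* The key must be followed by a space to avoid prefix matches. */
		if (strncmp(line, key, klen) == 0 && line[klen] == ' ')
			return sscanf(line + klen + 1, "%llu", out) == 1 ? 0 : -1;
		line = strchr(line, '\n');
		if (line)
			line++; /* step past the newline to the next line */
	}
	return -1;
}
```

For a buffer such as "read_errs 3\ncorruption_errs 7\n" (placeholder keys), looking up "corruption_errs" stores 7, while looking up "read" fails because the match requires the full key.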
2021-06-22 | btrfs: fix typos in comments | David Sterba | 21 | -32/+32
Fix typos that have snuck in since the last round. Found by codespell. Signed-off-by: David Sterba <[email protected]>
2021-06-22 | btrfs: remove a stale comment for btrfs_decompress_bio() | Qu Wenruo | 1 | -14/+0
Since commit 8140dc30a432 ("btrfs: btrfs_decompress_bio() could accept compressed_bio instead"), btrfs_decompress_bio() accepts "struct compressed_bio" rather than an open-coded parameter list.

Thus the comments for the parameter list are no longer needed.

Reviewed-by: Anand Jain <[email protected]>
Signed-off-by: Qu Wenruo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2021-06-22 | btrfs: send: use list_move_tail instead of list_del/list_add_tail | Baokun Li | 1 | -11/+7
Use list_move_tail() instead of list_del() + list_add_tail() as it's doing the same thing and allows further cleanups. Open code name_cache_used() as there is only one user. Reported-by: Hulk Robot <[email protected]> Signed-off-by: Baokun Li <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
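The equivalence claimed above is easy to see on a minimal circular doubly linked list. The code below is a userspace re-implementation that borrows the kernel's function names for readability. It is a sketch, not a copy of include/linux/list.h.

```c
#include <assert.h>

struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

/* Unlink entry from whatever list it is on. */
static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Insert entry just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* One call that does what the two calls above do back to back. */
static void list_move_tail(struct list_head *entry, struct list_head *head)
{
	list_del(entry);
	list_add_tail(entry, head);
}

static void demo(void)
{
	struct list_head head, a, b, c;

	list_init(&head);
	list_add_tail(&a, &head);
	list_add_tail(&b, &head);
	list_add_tail(&c, &head);

	list_move_tail(&a, &head); /* a, b, c becomes b, c, a */

	assert(head.next == &b && b.next == &c && c.next == &a);
	assert(a.next == &head && head.prev == &a);
}
```

In a name cache, moving an entry to the tail on every hit keeps the list in least-recently-used order, which is the usual reason for this pattern.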
2021-06-22 | btrfs: disable build on platforms having page size 256K | Christophe Leroy | 1 | -0/+2
With a config having PAGE_SIZE set to 256K, BTRFS build fails with the following message

    include/linux/compiler_types.h:326:38: error: call to '__compiletime_assert_791' declared with attribute error: BUILD_BUG_ON failed: (BTRFS_MAX_COMPRESSED % PAGE_SIZE) != 0

BTRFS_MAX_COMPRESSED being 128K, BTRFS cannot support platforms with 256K pages at the time being.

There are two platforms that can select 256K pages:
- hexagon
- powerpc

Disable BTRFS when 256K page size is selected. Supporting this would require changes to the subpage mode that's currently being developed. Given that 256K is many times larger than page sizes commonly used and for what the algorithms and structures have been tuned, it's out of scope and disabling build is a reasonable option.

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
[ update changelog ]
Signed-off-by: David Sterba <[email protected]>
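The failing condition is plain modular arithmetic, shown here as a runtime check. `MAX_COMPRESSED` mirrors the 128K value quoted above, and `page_size_divides_max_compressed()` is a hypothetical helper evaluating the same expression as the BUILD_BUG_ON for an arbitrary page size.

```c
#include <assert.h>
#include <stdbool.h>

#define SZ_1K          1024UL
#define MAX_COMPRESSED (128 * SZ_1K) /* 128K, as stated in the commit message */

/* True if the build-time condition (MAX_COMPRESSED % PAGE_SIZE) == 0 holds. */
static bool page_size_divides_max_compressed(unsigned long page_size)
{
	assert(page_size > 0);
	return MAX_COMPRESSED % page_size == 0;
}
```

For a 256K page the remainder is 128K, not zero, so the assertion fires. Page sizes of 4K, 16K and 64K divide 128K evenly and pass this particular check.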
2021-06-22 | btrfs: send: fix invalid path for unlink operations after parent orphanization | Filipe Manana | 1 | -0/+11
During an incremental send operation, when processing the new references for the current inode, we might send an unlink operation for another inode that has a conflicting path and has more than one hard link. However this path was computed and cached before we processed previous new references for the current inode. We may have orphanized a directory of that path while processing a previous new reference, in which case the path will be invalid and cause the receiver process to fail.

The following reproducer triggers the problem and explains how/why it happens in its comments:

    $ cat test-send-unlink.sh
    #!/bin/bash

    DEV=/dev/sdi
    MNT=/mnt/sdi

    mkfs.btrfs -f $DEV >/dev/null
    mount $DEV $MNT

    # Create our test files and directory. Inode 259 (file3) has two hard
    # links.
    touch $MNT/file1
    touch $MNT/file2
    touch $MNT/file3

    mkdir $MNT/A
    ln $MNT/file3 $MNT/A/hard_link

    # Filesystem looks like:
    #
    # .                                     (ino 256)
    # |----- file1                          (ino 257)
    # |----- file2                          (ino 258)
    # |----- file3                          (ino 259)
    # |----- A/                             (ino 260)
    #        |---- hard_link                (ino 259)
    #

    # Now create the base snapshot, which is going to be the parent snapshot
    # for a later incremental send.
    btrfs subvolume snapshot -r $MNT $MNT/snap1
    btrfs send -f /tmp/snap1.send $MNT/snap1

    # Move inode 257 into directory inode 260. This results in computing the
    # path for inode 260 as "/A" and caching it.
    mv $MNT/file1 $MNT/A/file1

    # Move inode 258 (file2) into directory inode 260, with a name of
    # "hard_link", moving first inode 259 away since it currently has that
    # location and name.
    mv $MNT/A/hard_link $MNT/tmp
    mv $MNT/file2 $MNT/A/hard_link

    # Now rename inode 260 to something else (B for example) and then create
    # a hard link for inode 258 that has the old name and location of inode
    # 260 ("/A").
    mv $MNT/A $MNT/B
    ln $MNT/B/hard_link $MNT/A

    # Filesystem now looks like:
    #
    # .                                     (ino 256)
    # |----- tmp                            (ino 259)
    # |----- file3                          (ino 259)
    # |----- B/                             (ino 260)
    # |      |---- file1                    (ino 257)
    # |      |---- hard_link                (ino 258)
    # |
    # |----- A                              (ino 258)

    # Create another snapshot of our subvolume and use it for an incremental
    # send.
    btrfs subvolume snapshot -r $MNT $MNT/snap2
    btrfs send -f /tmp/snap2.send -p $MNT/snap1 $MNT/snap2

    # Now unmount the filesystem, create a new one, mount it and try to
    # apply both send streams to recreate both snapshots.
    umount $DEV

    mkfs.btrfs -f $DEV >/dev/null

    mount $DEV $MNT

    # First add the first snapshot to the new filesystem by applying the
    # first send stream.
    btrfs receive -f /tmp/snap1.send $MNT

    # The incremental receive operation below used to fail with the
    # following error:
    #
    #    ERROR: unlink A/hard_link failed: No such file or directory
    #
    # This is because when send is processing inode 257, it generates the
    # path for inode 260 as "/A", since that inode is its parent in the send
    # snapshot, and caches that path.
    #
    # Later when processing inode 258, it first processes its new reference
    # that has the path of "/A", which results in orphanizing inode 260
    # because there is a path collision. This results in issuing a rename
    # operation from "/A" to "/o260-6-0".
    #
    # Finally when processing the new reference "B/hard_link" for inode 258,
    # it notices that it collides with inode 259 (not yet processed, because
    # it has a higher inode number), since that inode has the name
    # "hard_link" under the directory inode 260. It also checks that inode
    # 259 has two hardlinks, so it decides to issue an unlink operation for
    # the name "hard_link" for inode 259. However the path passed to the
    # unlink operation is "/A/hard_link", which is incorrect since currently
    # "/A" does not exist, due to the orphanization of inode 260 mentioned
    # before. The path is incorrect because it was computed and cached
    # before the orphanization. This causes the receiver to fail with
    # the above error.
    btrfs receive -f /tmp/snap2.send $MNT

    umount $MNT

When running the test, it fails like this:

    $ ./test-send-unlink.sh
    Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap1'
    At subvol /mnt/sdi/snap1
    Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap2'
    At subvol /mnt/sdi/snap2
    At subvol snap1
    At snapshot snap2
    ERROR: unlink A/hard_link failed: No such file or directory

Fix this by recomputing a path before issuing an unlink operation when processing the new references for the current inode if we previously have orphanized a directory.

A test case for fstests will follow soon.

CC: [email protected] # 4.4+
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2021-06-22 | sched/fair: Ensure that the CFS parent is added after unthrottling | Rik van Riel | 1 | -0/+28
Ensure that a CFS parent will be in the list whenever one of its children is also in the list.

A warning on rq->tmp_alone_branch != &rq->leaf_cfs_rq_list has been reported while running LTP test cfs_bandwidth01.

Odin Ugedal found the root cause:

    $ tree /sys/fs/cgroup/ltp/ -d --charset=ascii
    /sys/fs/cgroup/ltp/
    |-- drain
    `-- test-6851
        `-- level2
            |-- level3a
            |   |-- worker1
            |   `-- worker2
            `-- level3b
                `-- worker3

Timeline (ish):
- worker3 gets throttled
- level3b is decayed, since it has no more load
- level2 gets throttled
- worker3 gets unthrottled
- level2 gets unthrottled
  - worker3 is added to list
  - level3b is not added to list, since nr_running==0 and is decayed

[ Vincent Guittot: Rebased and updated to fix for the reported warning. ]

Fixes: a7b359fc6a37 ("sched/fair: Correctly insert cfs_rq's to list on unthrottle")
Reported-by: Sachin Sant <[email protected]>
Suggested-by: Vincent Guittot <[email protected]>
Signed-off-by: Rik van Riel <[email protected]>
Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Tested-by: Sachin Sant <[email protected]>
Acked-by: Odin Ugedal <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2021-06-22 | locking/lockdep: Improve noinstr vs errors | Peter Zijlstra | 3 | -2/+6
Better handle the failure paths. vmlinux.o: warning: objtool: debug_locks_off()+0x23: call to console_verbose() leaves .noinstr.text section vmlinux.o: warning: objtool: debug_locks_off()+0x19: call to __kasan_check_write() leaves .noinstr.text section debug_locks_off+0x19/0x40: instrument_atomic_write at include/linux/instrumented.h:86 (inlined by) __debug_locks_off at include/linux/debug_locks.h:17 (inlined by) debug_locks_off at lib/debug_locks.c:41 Fixes: 6eebad1ad303 ("lockdep: __always_inline more for noinstr") Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-06-22 | x86: Always inline task_size_max() | Peter Zijlstra | 1 | -1/+1
Fix: vmlinux.o: warning: objtool: handle_bug()+0x10: call to task_size_max() leaves .noinstr.text section When #UD isn't a BUG, we shouldn't violate noinstr (we'll still probably die, but that's another story). Fixes: 025768a966a3 ("x86/cpu: Use alternative to generate the TASK_SIZE_MAX constant") Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-06-22 | x86/xen: Fix noinstr fail in exc_xen_unknown_trap() | Peter Zijlstra | 1 | -0/+2
Fix: vmlinux.o: warning: objtool: exc_xen_unknown_trap()+0x7: call to printk() leaves .noinstr.text section Fixes: 2e92493637a0 ("x86/xen: avoid warning in Xen pv guest with CONFIG_AMD_MEM_ENCRYPT enabled") Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-06-22 | x86/xen: Fix noinstr fail in xen_pv_evtchn_do_upcall() | Peter Zijlstra | 1 | -1/+2
Fix: vmlinux.o: warning: objtool: xen_pv_evtchn_do_upcall()+0x23: call to irq_enter_rcu() leaves .noinstr.text section Fixes: 359f01d1816f ("x86/entry: Use run_sysvec_on_irqstack_cond() for XEN upcall") Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-06-22 | x86/entry: Fix noinstr fail in __do_fast_syscall_32() | Peter Zijlstra | 1 | -1/+1
Fix: vmlinux.o: warning: objtool: __do_fast_syscall_32()+0xf5: call to trace_hardirqs_off() leaves .noinstr.text section Fixes: 5d5675df792f ("x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls") Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-06-22 | spi: dt-bindings: support devices with multiple chipselects | Sebastian Reichel | 1 | -2/+5
Add binding support for devices that have more than one chip select. Typical examples are SPI-connected microcontrollers that can also be programmed over SPI, like NXP Kinetis, or chips with a configuration and a data chip select, such as Microchip's MRF89XA transceiver.

Reviewed-by: Rob Herring <[email protected]>
Signed-off-by: Sebastian Reichel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
2021-06-22 | spi: add ancillary device support | Sebastian Reichel | 2 | -31/+108
Introduce support for ancillary devices, similar to existing implementation for I2C. This is useful for devices having multiple chip-selects, for example some microcontrollers provide a normal SPI interface and a flashing SPI interface. Signed-off-by: Sebastian Reichel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2021-06-22 | ceph: fix error handling in ceph_atomic_open and ceph_lookup | Jeff Layton | 3 | -17/+21
Commit aa60cfc3f7ee broke the error handling in these functions such that they don't handle non-ENOENT errors from ceph_mdsc_do_request properly. Move the checking of -ENOENT out of ceph_handle_snapdir and into the callers, and if we get a different error, return it immediately. Fixes: aa60cfc3f7ee ("ceph: don't use d_add in ceph_handle_snapdir") Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Ilya Dryomov <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2021-06-22 | ceph: must hold snap_rwsem when filling inode for async create | Jeff Layton | 2 | -0/+5
...and add a lockdep assertion for it to ceph_fill_inode(). Cc: [email protected] # v5.7+ Fixes: 9a8d03ca2e2c3 ("ceph: attempt to do async create when possible") Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Ilya Dryomov <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2021-06-22 | regulator: hi6421v600: Fix setting wrong driver_data | Axel Lin | 1 | -11/+15
The current code sets "config.driver_data = sreg", but sreg only has its mutex initialized; the other fields are just zero. Fix it by passing *info to config.driver_data so each regulator can get the corresponding data via rdev_get_drvdata().

Separate enable_mutex from struct hi6421_spmi_reg_info since the driver only needs one mutex.

Fixes: d2dfd50a0b57 ("staging: hikey9xx: hi6421v600-regulator: move LDO config from DT")
Signed-off-by: Axel Lin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
2021-06-22 | arm64: Restrict undef hook for cpufeature registers | Raphael Gault | 1 | -2/+2
This commit modifies the mask of the mrs_hook declared in arch/arm64/kernel/cpufeatures.c which emulates only feature register access. This is necessary because this hook's mask was too large and thus masking any mrs instruction, even if not related to the emulated registers which made the pmu emulation inefficient. Signed-off-by: Raphael Gault <[email protected]> Signed-off-by: Rob Herring <[email protected]> Acked-by: Catalin Marinas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2021-06-22 | platform/x86: think-lmi: Return EINVAL when kbdlang gets set to a 0 length string | Hans de Goede | 1 | -8/+3
Commit 0ddcf3a6b442 ("platform/x86: think-lmi: Avoid potential read before start of the buffer") moved the length == 0 check up to before stripping the '\n' which typically gets added when users echo a value to a sysfs-attribute from the shell.

This avoids a potential buffer-underrun, but it also causes a behavioral change: prior to this change "echo > kbdlang", i.e. writing just a single '\n', would result in an EINVAL error, but after the change this gets accepted, setting kbdlang to an empty string.

Fix this by replacing the manual '\n' check with using strchrnul() to get the length till '\n' or terminating 0 in one go; and then do the length != 0 check after this.

Fixes: 0ddcf3a6b442 ("platform/x86: think-lmi: Avoid potential read before start of the buffer")
Reported-by: Juha Leppänen <[email protected]>
Suggested-by: Andy Shevchenko <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
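The ordering of the two steps is the whole fix: measure the value up to the newline first, then reject an empty result. A userspace sketch follows. It swaps in the standard `strcspn()` for `strchrnul()`, which is a GNU extension outside the kernel, and `value_length()` is a hypothetical helper, not the driver's function.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Length of the value in buf, stopping at the first '\n' or the
 * terminating NUL. Returns -EINVAL if that length is zero.
 */
static int value_length(const char *buf)
{
	size_t len;

	assert(buf != NULL);
	/* Same result as strchrnul(buf, '\n') - buf, using standard C only. */
	len = strcspn(buf, "\n");
	return len ? (int)len : -EINVAL;
}
```

With this ordering a lone newline, as produced by "echo > kbdlang", is rejected again, and there is no need to index buf[length - 1], which is where the original underrun risk came from.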
2021-06-22 | platform/x86: intel_cht_int33fe: Move to its own subfolder | Andy Shevchenko | 10 | -28/+31
Since we have started collecting Intel x86 specific drivers in their own folder, move intel_cht_int33fe to its own subfolder there. Signed-off-by: Andy Shevchenko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Hans de Goede <[email protected]>
2021-06-22 | platform/x86: intel_skl_int3472: Move to intel/ subfolder | Andy Shevchenko | 12 | -4/+33
Start collecting Intel x86 related drivers in its own subfolder. Move intel_skl_int3472 first. Signed-off-by: Andy Shevchenko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Hans de Goede <[email protected]>
2021-06-22 | selftests: futex: Add futex compare requeue test | André Almeida | 4 | -1/+142
Add testing for futex_cmp_requeue(). The first test just requeues from one waiter to another one, and wakes it. The second performs both wake and requeue, and checks the return values to see if the operation woke/requeued the expected number of waiters. Signed-off-by: André Almeida <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Davidlohr Bueso <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-06-22 | selftests: futex: Add futex wait test | André Almeida | 4 | -1/+177
There are three different strategies to uniquely identify a futex in the kernel:

- Private futexes: uses the pointer to mm_struct and the page address

- Shared futexes: checks if the page containing the address is a PageAnon:
  - If it is, uses the same data as private futexes
  - If it isn't, uses an inode sequence number from struct inode and the page's index

Create a selftest to check those three paths and the basic wait/wake mechanism.

Signed-off-by: André Almeida <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Davidlohr Bueso <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2021-06-22 | platform/x86: intel_skl_int3472: Provide skl_int3472_unregister_clock() | Andy Shevchenko | 3 | -3/+10
For the sake of APIs to be properly layered provide skl_int3472_unregister_clock(). Signed-off-by: Andy Shevchenko <[email protected]> Reviewed-by: Daniel Scally <[email protected]> Tested-by: Daniel Scally <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Hans de Goede <[email protected]>
2021-06-22 | platform/x86: intel_skl_int3472: Provide skl_int3472_unregister_regulator() | Andy Shevchenko | 3 | -2/+10
For the sake of APIs to be properly layered provide skl_int3472_unregister_regulator(). Signed-off-by: Andy Shevchenko <[email protected]> Reviewed-by: Daniel Scally <[email protected]> Tested-by: Daniel Scally <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Hans de Goede <[email protected]>
2021-06-22 | platform/x86: intel_skl_int3472: Use ACPI GPIO resource directly | Andy Shevchenko | 3 | -20/+17
When we call acpi_gpio_get_io_resource(), the output will be the pointer to the ACPI GPIO resource. Use it directly instead of dereferencing the generic resource. Signed-off-by: Andy Shevchenko <[email protected]> Reviewed-by: Daniel Scally <[email protected]> Tested-by: Daniel Scally <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Hans de Goede <[email protected]>
2021-06-22 | platform/x86: intel_skl_int3472: Fix dependencies (drop CLKDEV_LOOKUP) | Andy Shevchenko | 1 | -1/+1
Besides the fact that COMMON_CLK selects CLKDEV_LOOKUP, the latter is going to be removed from clock framework. Reviewed-by: Daniel Scally <[email protected]> Signed-off-by: Andy Shevchenko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Hans de Goede <[email protected]>
2021-06-22 | platform/x86: intel_skl_int3472: Free ACPI device resources after use | Andy Shevchenko | 1 | -7/+6
We may free ACPI device resources immediately after use. Refactor skl_int3472_parse_crs() accordingly. Signed-off-by: Andy Shevchenko <[email protected]> Reviewed-by: Daniel Scally <[email protected]> Tested-by: Daniel Scally <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Hans de Goede <[email protected]>
2021-06-22 | x86/fpu: Make init_fpstate correct with optimized XSAVE | Thomas Gleixner | 2 | -25/+46
The XSAVE init code initializes all enabled and supported components with XRSTOR(S) to init state. Then it XSAVEs the state of the components back into init_fpstate which is used in several places to fill in the init state of components.

This works correctly with XSAVE, but not with XSAVEOPT and XSAVES because those use the init optimization and skip writing state of components which are in init state. So init_fpstate.xsave still contains all zeroes after this operation.

There are two ways to solve that:

 1) Use XSAVE unconditionally, but that requires to reshuffle the buffer when XSAVES is enabled because XSAVES uses compacted format.

 2) Save the components which are known to have a non-zero init state by other means.

Looking deeper, #2 is the right thing to do because all components the kernel supports have all-zeroes init state except the legacy features (FP, SSE). Those cannot be hard coded because the states are not identical on all CPUs, but they can be saved with FXSAVE which avoids all conditionals.

Use FXSAVE to save the legacy FP/SSE components in init_fpstate along with a BUILD_BUG_ON() which reminds developers to validate that a newly added component has all zeroes init state. As a bonus remove the now unused copy_xregs_to_kernel_booting() crutch.

The XSAVE and reshuffle method can still be implemented in the unlikely case that components are added which have a non-zero init state and no other means to save them. For now, FXSAVE is just simple and good enough.

[ bp: Fix a typo or two in the text. ]

Fixes: 6bad06b76892 ("x86, xsave: Use xsaveopt in context-switch path when supported")
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
2021-06-22 | platform/x86: Remove "default n" entries | Andy Shevchenko | 2 | -2/+0
Linus already once did that for PDx86, don't repeat our mistakes. TL;DR: 'n' *is* the default 'default'. Signed-off-by: Andy Shevchenko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Hans de Goede <[email protected]>
2021-06-22 | x86/fpu: Preserve supervisor states in sanitize_restored_user_xstate() | Thomas Gleixner | 1 | -18/+8
sanitize_restored_user_xstate() preserves the supervisor states only when the fx_only argument is zero, which allows unprivileged user space to put supervisor states back into init state. Preserve them unconditionally. [ bp: Fix a typo or two in the text. ] Fixes: 5d6b6a6f9b5c ("x86/fpu/xstate: Update sanitize_restored_xstate() for supervisor xstates") Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2021-06-22 | Revert "drm: add a locked version of drm_is_current_master" | Daniel Vetter | 1 | -32/+19
This reverts commit 1815d9c86e3090477fbde066ff314a7e9721ee0f.

Unfortunately this inverts the locking hierarchy, so back to the drawing board. Full lockdep splat below:

======================================================
WARNING: possible circular locking dependency detected
5.13.0-rc7-CI-CI_DRM_10254+ #1 Not tainted
------------------------------------------------------
kms_frontbuffer/1087 is trying to acquire lock:
ffff88810dcd01a8 (&dev->master_mutex){+.+.}-{3:3}, at: drm_is_current_master+0x1b/0x40

but task is already holding lock:
ffff88810dcd0488 (&dev->mode_config.mutex){+.+.}-{3:3}, at: drm_mode_getconnector+0x1c6/0x4a0

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&dev->mode_config.mutex){+.+.}-{3:3}:
       __mutex_lock+0xab/0x970
       drm_client_modeset_probe+0x22e/0xca0
       __drm_fb_helper_initial_config_and_unlock+0x42/0x540
       intel_fbdev_initial_config+0xf/0x20 [i915]
       async_run_entry_fn+0x28/0x130
       process_one_work+0x26d/0x5c0
       worker_thread+0x37/0x380
       kthread+0x144/0x170
       ret_from_fork+0x1f/0x30

-> #1 (&client->modeset_mutex){+.+.}-{3:3}:
       __mutex_lock+0xab/0x970
       drm_client_modeset_commit_locked+0x1c/0x180
       drm_client_modeset_commit+0x1c/0x40
       __drm_fb_helper_restore_fbdev_mode_unlocked+0x88/0xb0
       drm_fb_helper_set_par+0x34/0x40
       intel_fbdev_set_par+0x11/0x40 [i915]
       fbcon_init+0x270/0x4f0
       visual_init+0xc6/0x130
       do_bind_con_driver+0x1e5/0x2d0
       do_take_over_console+0x10e/0x180
       do_fbcon_takeover+0x53/0xb0
       register_framebuffer+0x22d/0x310
       __drm_fb_helper_initial_config_and_unlock+0x36c/0x540
       intel_fbdev_initial_config+0xf/0x20 [i915]
       async_run_entry_fn+0x28/0x130
       process_one_work+0x26d/0x5c0
       worker_thread+0x37/0x380
       kthread+0x144/0x170
       ret_from_fork+0x1f/0x30

-> #0 (&dev->master_mutex){+.+.}-{3:3}:
       __lock_acquire+0x151e/0x2590
       lock_acquire+0xd1/0x3d0
       __mutex_lock+0xab/0x970
       drm_is_current_master+0x1b/0x40
       drm_mode_getconnector+0x37e/0x4a0
       drm_ioctl_kernel+0xa8/0xf0
       drm_ioctl+0x1e8/0x390
       __x64_sys_ioctl+0x6a/0xa0
       do_syscall_64+0x39/0xb0
       entry_SYSCALL_64_after_hwframe+0x44/0xae

other info that might help us debug this:

Chain exists of: &dev->master_mutex --> &client->modeset_mutex --> &dev->mode_config.mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&dev->mode_config.mutex);
                               lock(&client->modeset_mutex);
                               lock(&dev->mode_config.mutex);
  lock(&dev->master_mutex);

 *** DEADLOCK ***

1 lock held by kms_frontbuffer/1087:
 #0: ffff88810dcd0488 (&dev->mode_config.mutex){+.+.}-{3:3}, at: drm_mode_getconnector+0x1c6/0x4a0

stack backtrace:
CPU: 7 PID: 1087 Comm: kms_frontbuffer Not tainted 5.13.0-rc7-CI-CI_DRM_10254+ #1
Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3234.A01.1906141750 06/14/2019
Call Trace:
 dump_stack+0x7f/0xad
 check_noncircular+0x12e/0x150
 __lock_acquire+0x151e/0x2590
 lock_acquire+0xd1/0x3d0
 __mutex_lock+0xab/0x970
 drm_is_current_master+0x1b/0x40
 drm_mode_getconnector+0x37e/0x4a0
 drm_ioctl_kernel+0xa8/0xf0
 drm_ioctl+0x1e8/0x390
 __x64_sys_ioctl+0x6a/0xa0
 do_syscall_64+0x39/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Note that this broke the intel-gfx CI pretty much across the board because it has to reboot machines after it hits a lockdep splat.

Testcase: igt/debugfs_test/read_all_entries
Acked-by: Petri Latvala <[email protected]>
Fixes: 1815d9c86e30 ("drm: add a locked version of drm_is_current_master")
Cc: Desmond Cheong Zhi Xi <[email protected]>
Cc: Emil Velikov <[email protected]>
Cc: [email protected]
Signed-off-by: Daniel Vetter <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-06-22 | arm64: mte: Sync tags for pages where PTE is untagged | Steven Price | 3 files | -10/+34
A KVM guest could store tags in a page even if the VMM hasn't mapped the page with PROT_MTE. So when restoring pages from swap we need to check whether there are any saved tags even if !pte_tagged(). However, don't check pages for which pte_access_permitted() returns false, as these will not have been swapped out. Reviewed-by: Catalin Marinas <[email protected]> Signed-off-by: Steven Price <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-06-22 | printk: fix cpu lock ordering | John Ogness | 1 file | -3/+50
The cpu lock implementation uses a full memory barrier to take the lock, but no memory barriers when releasing the lock. This means that changes performed by a lock owner may not be seen by the next lock owner. This may have been "good enough" for use by dump_stack() as a serialization mechanism, but it is not enough to provide proper protection for a critical section. Correct this problem by using acquire/release memory barriers for lock/unlock, respectively. Signed-off-by: John Ogness <[email protected]> Reviewed-by: Sergey Senozhatsky <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Signed-off-by: Petr Mladek <[email protected]> Link: https://lore.kernel.org/r/[email protected]
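The ordering requirement described above can be sketched in userspace C11 atomics. This is only an illustration under stated assumptions, not the kernel code: demo_owner, demo_shared, demo_cpu_trylock() and demo_cpu_unlock() are hypothetical names, and the kernel uses its own atomic and barrier primitives rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the cpu lock: -1 means "no owner". */
static atomic_int demo_owner = -1;

/* Data that is only modified while the lock is held. */
static int demo_shared;

/*
 * Try to take the lock for `cpu`. On success the compare-exchange has
 * acquire ordering, so everything the previous owner wrote before its
 * release store is visible to the new owner.
 */
static bool demo_cpu_trylock(int cpu)
{
    int expected = -1;

    return atomic_compare_exchange_strong_explicit(
        &demo_owner, &expected, cpu,
        memory_order_acquire, memory_order_relaxed);
}

/*
 * Drop the lock with release ordering, so writes made inside the
 * critical section cannot be reordered after the store that frees it.
 */
static void demo_cpu_unlock(void)
{
    atomic_store_explicit(&demo_owner, -1, memory_order_release);
}
```

The acquire on lock pairs with the release on unlock; with a plain (relaxed) store on unlock, a later owner could take the lock and still observe stale values of demo_shared.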
2021-06-22 | lib/dump_stack: move cpu lock to printk.c | John Ogness | 3 files | -36/+112
dump_stack() implements its own cpu-reentrant spinning lock to best-effort serialize stack traces in the printk log. However, there are other functions (such as show_regs()) that can also benefit from this serialization. Move the cpu-reentrant spinning lock (cpu lock) into new helper functions printk_cpu_lock_irqsave()/printk_cpu_unlock_irqrestore() so that it is available for others as well. For !CONFIG_SMP the cpu lock is a NOP. Note that having multiple cpu locks in the system can easily lead to deadlock. Code needing a cpu lock should use the printk cpu lock, since the printk cpu lock could be acquired from any code and any context. Also note that it is not necessary for a cpu lock to disable interrupts. However, in upcoming work this cpu lock will be used for emergency tasks (for example, atomic consoles during kernel crashes) and any interruptions while holding the cpu lock should be avoided if possible. Signed-off-by: John Ogness <[email protected]> Reviewed-by: Sergey Senozhatsky <[email protected]> Reviewed-by: Petr Mladek <[email protected]> [[email protected]: Backported on top of 5.13-rc1.] Signed-off-by: Petr Mladek <[email protected]> Link: https://lore.kernel.org/r/[email protected]
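To illustrate what "cpu-reentrant" means here, the following is a minimal userspace sketch, assuming a caller-supplied cpu id and C11 atomics. The names demo_lock_owner, demo_lock_nested, demo_reentrant_lock() and demo_reentrant_unlock() are hypothetical; the sketch does not reproduce the signatures of printk_cpu_lock_irqsave()/printk_cpu_unlock_irqrestore() and leaves out interrupt disabling entirely.

```c
#include <stdatomic.h>

/* Hypothetical lock state: owning cpu id, or -1 when free. */
static atomic_int demo_lock_owner = -1;

/* Nesting depth beyond the first acquisition; only the owner touches it. */
static int demo_lock_nested;

/* Spin until `cpu` owns the lock; re-acquiring on the same cpu nests. */
static void demo_reentrant_lock(int cpu)
{
    for (;;) {
        int expected = -1;

        if (atomic_compare_exchange_strong_explicit(
                &demo_lock_owner, &expected, cpu,
                memory_order_acquire, memory_order_relaxed))
            return;                 /* lock was free, now ours */

        if (expected == cpu) {
            demo_lock_nested++;     /* already ours: just nest */
            return;
        }
        /* Another cpu owns the lock: keep spinning. */
    }
}

/* Undo one level of nesting, or release the lock at the outermost level. */
static void demo_reentrant_unlock(void)
{
    if (demo_lock_nested) {
        demo_lock_nested--;
        return;
    }
    atomic_store_explicit(&demo_lock_owner, -1, memory_order_release);
}
```

Because ownership is tracked per cpu rather than per task, code that is interrupted while holding the lock can take it again on the same cpu without deadlocking against itself, which is the property dump_stack() relies on.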
2021-06-22 | gpiolib: cdev: zero padding during conversion to gpioline_info_changed | Gabriel Knezek | 1 file | -0/+1
When userspace requests a GPIO v1 line info changed event, lineinfo_watch_read() populates and returns the gpioline_info_changed structure. It contains 5 words of padding at the end which are not initialized before being returned to userspace. Zero the structure in gpio_v2_line_info_change_to_v1() before populating its contents. Fixes: aad955842d1c ("gpiolib: cdev: support GPIO_V2_GET_LINEINFO_IOCTL and GPIO_V2_GET_LINEINFO_WATCH_IOCTL") Signed-off-by: Gabriel Knezek <[email protected]> Reviewed-by: Kent Gibson <[email protected]> Signed-off-by: Bartosz Golaszewski <[email protected]>
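The pattern behind this fix, zeroing a structure before filling it so that padding never carries stale memory to the reader, can be sketched as follows. The struct below is a simplified, hypothetical layout (demo_info_changed, demo_fill_event() are illustrative names); it is not the real gpioline_info_changed definition.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical event record with reserved words at the end. */
struct demo_info_changed {
    uint64_t timestamp;
    uint32_t event_type;
    uint32_t padding[5];    /* reserved, must read back as zero */
};

/*
 * Fill an event for the caller. Zero the whole structure first so the
 * reserved words (and any compiler-inserted padding) never expose
 * whatever happened to be in that memory before.
 */
static void demo_fill_event(struct demo_info_changed *out,
                            uint64_t timestamp, uint32_t event_type)
{
    memset(out, 0, sizeof(*out));
    out->timestamp = timestamp;
    out->event_type = event_type;
}
```

Assigning only the named members would leave padding[] untouched, which is how uninitialized bytes end up being copied out.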
2021-06-22 | Merge branch kvm-arm64/selftest/sysreg-list-fix into kvmarm-master/next | Marc Zyngier | 5 files | -123/+323
Selftest updates from Andrew Jones, fixing the sysreg list expectations by dealing with multiple configurations, such as with or without a PMU.

* kvm-arm64/selftest/sysreg-list-fix:
  KVM: arm64: Update MAINTAINERS to include selftests
  KVM: arm64: selftests: get-reg-list: Split base and pmu registers
  KVM: arm64: selftests: get-reg-list: Remove get-reg-list-sve
  KVM: arm64: selftests: get-reg-list: Provide config selection option
  KVM: arm64: selftests: get-reg-list: Prepare to run multiple configs at once
  KVM: arm64: selftests: get-reg-list: Introduce vcpu configs
2021-06-22 | KVM: arm64: Update MAINTAINERS to include selftests | Marc Zyngier | 1 file | -0/+2
As the KVM/arm64 selftests are routed via the kvmarm tree, add the relevant references to the MAINTAINERS file. Suggested-by: Andrew Jones <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/20210622070732.zod7gaqhqo344vg6@gator
2021-06-22 | KVM: arm64: selftests: get-reg-list: Split base and pmu registers | Andrew Jones | 1 file | -8/+31
Since KVM commit 11663111cd49 ("KVM: arm64: Hide PMU registers from userspace when not available") the get-reg-list* tests have been failing with

  ...
  ... There are 74 missing registers.
  The following lines are missing registers:
  ...

where the 74 missing registers are all PMU registers. This isn't a bug in KVM that the selftest found, even though it's true that a KVM userspace that wasn't setting the KVM_ARM_VCPU_PMU_V3 VCPU flag, but still expecting the PMU registers to be in the reg-list, would suddenly no longer have their expectations met. In that case, the expectations were wrong, though, so that KVM userspace needs to be fixed, and so does this selftest. The fix for this selftest is to pull the PMU registers out of the base register sublist into their own sublist and then create new, pmu-enabled vcpu configs which can be tested.

Signed-off-by: Andrew Jones <[email protected]>
Reviewed-by: Ricardo Koller <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
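The sublist idea can be sketched with plain C data structures. Everything here is hypothetical: the type and variable names (demo_sublist, demo_config, demo_vregs, demo_vregs_pmu) and the register ids are placeholders, and the selftest's real structures may look quite different.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sublist: a named group of register ids. */
struct demo_sublist {
    const char *name;
    const uint64_t *regs;
    size_t nregs;
};

/* Hypothetical vcpu config: a NULL-terminated set of sublists. */
struct demo_config {
    const char *name;
    const struct demo_sublist *sublists[3];
};

/* Placeholder register ids, not real KVM encodings. */
static const uint64_t demo_base_regs[] = { 0x1, 0x2, 0x3 };
static const uint64_t demo_pmu_regs[]  = { 0x10, 0x11 };

static const struct demo_sublist demo_base = { "base", demo_base_regs, 3 };
static const struct demo_sublist demo_pmu  = { "pmu",  demo_pmu_regs,  2 };

/* Without a PMU, only the base sublist is expected. */
static const struct demo_config demo_vregs = {
    "vregs", { &demo_base, NULL }
};

/* With the PMU enabled, the pmu sublist is expected as well. */
static const struct demo_config demo_vregs_pmu = {
    "vregs+pmu", { &demo_base, &demo_pmu, NULL }
};

/* Total number of registers a config expects to see in the reg-list. */
static size_t demo_expected_count(const struct demo_config *c)
{
    size_t total = 0;

    for (size_t i = 0; c->sublists[i] != NULL; i++)
        total += c->sublists[i]->nregs;
    return total;
}
```

Keeping the PMU registers in their own sublist means a config without a PMU simply omits it, instead of every config inheriting expectations that only hold when the PMU feature is enabled.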
2021-06-22 | KVM: arm64: selftests: get-reg-list: Remove get-reg-list-sve | Andrew Jones | 4 files | -15/+21
Now that we can easily run the test for multiple vcpu configs, let's merge get-reg-list and get-reg-list-sve into just get-reg-list. We also make one final change that makes running multiple tests more practical: fork each test rather than running it directly. That way a test can fail and subsequent tests can still run. Signed-off-by: Andrew Jones <[email protected]> Reviewed-by: Ricardo Koller <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
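A rough sketch of the fork-per-test approach, using only POSIX fork()/waitpid(). The helper and test names (demo_run_forked(), demo_test_pass(), demo_test_fail()) are hypothetical and not taken from the selftest.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run `test` in a child process and return its exit status.
 * Returns -1 if fork()/waitpid() fails or the child did not exit normally.
 */
static int demo_run_forked(int (*test)(void))
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(test());          /* child: run the test and report */

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical tests: one passing, one failing. */
static int demo_test_pass(void) { return 0; }
static int demo_test_fail(void) { return 1; }
```

Since a failing or aborting test only takes down its own child process, the parent can record the result and move on to the next config.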
2021-06-22 | KVM: arm64: selftests: get-reg-list: Provide config selection option | Andrew Jones | 1 file | -3/+53
Add a new command line option that allows the user to select a specific configuration, e.g. --config=sve will give the sve config. Also provide help text and the --help/-h options. Signed-off-by: Andrew Jones <[email protected]> Reviewed-by: Ricardo Koller <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
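One simple way to recognise an option of the form --config=<name> is a prefix comparison followed by a table lookup. The sketch below is an assumption about how such parsing could be done, not the selftest's actual code; demo_config_names and demo_parse_config() are hypothetical, and the table holds only the two config names mentioned in this series.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table of known config names. */
static const char *const demo_config_names[] = { "vregs", "sve" };

/*
 * Return the index of the config selected by "--config=<name>",
 * or -1 if `arg` is not a --config option or names an unknown config.
 */
static int demo_parse_config(const char *arg)
{
    static const char prefix[] = "--config=";
    const size_t plen = sizeof(prefix) - 1;
    const size_t n = sizeof(demo_config_names) / sizeof(demo_config_names[0]);

    if (strncmp(arg, prefix, plen) != 0)
        return -1;

    for (size_t i = 0; i < n; i++) {
        if (strcmp(arg + plen, demo_config_names[i]) == 0)
            return (int)i;
    }
    return -1;
}
```

With this table, "--config=sve" selects index 1, while an unknown name or a different option yields -1 so the caller can print the help text.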
2021-06-22 | KVM: arm64: selftests: get-reg-list: Prepare to run multiple configs at once | Andrew Jones | 1 file | -17/+51
We don't want to have to create a new binary for each vcpu config, so prepare to run the test for multiple vcpu configs in a single binary. We do this by factoring out the test from main() and then looping over configs. When given '--list' we still never print more than a single reg-list for a single vcpu config though, because it would be confusing otherwise. No functional change intended. Signed-off-by: Andrew Jones <[email protected]> Reviewed-by: Ricardo Koller <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-06-22 | KVM: arm64: selftests: get-reg-list: Introduce vcpu configs | Andrew Jones | 1 file | -90/+175
We already break register lists into sublists that get selected based on vcpu config. However, since we only had two configs (vregs and sve), we didn't structure the code very well to manage them. Restructure it now to more cleanly handle register sublists that are dependent on the vcpu config. This patch has no intended functional change (except for the vcpu config name now being prepended to all output). Signed-off-by: Andrew Jones <[email protected]> Reviewed-by: Ricardo Koller <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-06-21 | cifs: Avoid field over-reading memcpy() | Kees Cook | 1 file | -1/+4
In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memcpy(), memmove(), and memset(), avoid intentionally reading across neighboring fields. Instead of using memcpy to read across multiple struct members, just perform per-member assignments as already done for other members. Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Steve French <[email protected]>
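As an illustration of the per-member style, the sketch below copies three timestamps and an attribute word one field at a time. Both structs and the function are hypothetical (demo_wire_info, demo_file_info, demo_copy_info()); they are not the cifs structures touched by the patch.

```c
#include <stdint.h>

/* Hypothetical source record; field names are illustrative only. */
struct demo_wire_info {
    uint64_t creation_time;
    uint64_t last_access_time;
    uint64_t last_write_time;
    uint32_t attributes;
};

/* Hypothetical destination record with the same leading members. */
struct demo_file_info {
    uint64_t creation_time;
    uint64_t last_access_time;
    uint64_t last_write_time;
    uint32_t attributes;
    uint32_t flags;             /* has no counterpart in the source */
};

/*
 * Copy the shared members one by one. The pattern being avoided is a
 * single memcpy() that starts at &src->creation_time and deliberately
 * runs on through the neighbouring members, which field-level bounds
 * checking would flag as reading past the first field.
 */
static void demo_copy_info(struct demo_file_info *dst,
                           const struct demo_wire_info *src)
{
    dst->creation_time    = src->creation_time;
    dst->last_access_time = src->last_access_time;
    dst->last_write_time  = src->last_write_time;
    dst->attributes       = src->attributes;
}
```

Each assignment names exactly the member it reads and writes, so the copy no longer depends on the two structs having identical layouts.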