aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-12-09locking/rwsem: Prevent potential lock starvationWaiman Long1-2/+13
The lock handoff bit is added in commit 4f23dbc1e657 ("locking/rwsem: Implement lock handoff to prevent lock starvation") to avoid lock starvation. However, allowing readers to do optimistic spinning does introduce an unlikely scenario where lock starvation can happen. The lock handoff bit may only be set when a waiter is being woken up. In the case of reader unlock, wakeup happens only when the reader count reaches 0. If there is a continuous stream of incoming readers acquiring read lock via optimistic spinning, it is possible that the reader count may never reach 0 and so the handoff bit will never be asserted. One way to prevent this scenario from happening is to disallow optimistic spinning if the rwsem is currently owned by readers. If the previous or current owner is a writer, optimistic spinning will be allowed. If the previous owner is a reader but the reader count has reached 0 before, a wakeup should have been issued. So the handoff mechanism will be kicked in to prevent lock starvation. As a result, it should be OK to do optimistic spinning in this case. This patch may have some impact on reader performance as it reduces reader optimistic spinning especially if the lock critical sections are short the number of contending readers are small. Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Davidlohr Bueso <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-12-09locking/rwsem: Pass the current atomic count to rwsem_down_read_slowpath()Waiman Long1-8/+10
The atomic count value right after reader count increment can be useful to determine the rwsem state at trylock time. So the count value is passed down to rwsem_down_read_slowpath() to be used when appropriate. Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Davidlohr Bueso <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-12-09locking/rwsem: Fold __down_{read,write}*()Peter Zijlstra1-22/+23
There's a lot needless duplication in __down_{read,write}*(), cure that with a helper. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-12-09locking/rwsem: Introduce rwsem_write_trylock()Peter Zijlstra1-22/+16
One copy of this logic is better than three. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-12-09locking/rwsem: Better collate rwsem_read_trylock()Peter Zijlstra1-7/+8
All users of rwsem_read_trylock() do rwsem_set_reader_owned(sem) on success, move it into rwsem_read_trylock() proper. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-12-09Merge branch 'locking/rwsem'Peter Zijlstra1059-6932/+15928
2020-12-09rwsem: Implement down_read_interruptibleEric W. Biederman2-0/+27
In preparation for converting exec_update_mutex to a rwsem so that multiple readers can execute in parallel and not deadlock, add down_read_interruptible. This is needed for perf_event_open to be converted (with no semantic changes) from working on a mutex to wroking on a rwsem. Signed-off-by: Eric W. Biederman <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-12-09rwsem: Implement down_read_killable_nestedEric W. Biederman2-0/+16
In preparation for converting exec_update_mutex to a rwsem so that multiple readers can execute in parallel and not deadlock, add down_read_killable_nested. This is needed so that kcmp_lock can be converted from working on a mutexes to working on rw_semaphores. Signed-off-by: Eric W. Biederman <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-12-03refcount: Fix a kernel-doc markupMauro Carvalho Chehab1-1/+1
The kernel-doc markup is wrong: it is asking the tool to document struct refcount_struct, instead of documenting typedef refcount_t. Signed-off-by: Mauro Carvalho Chehab <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Kees Cook <[email protected]> Link: https://lkml.kernel.org/r/afb9bb1e675bf5f72a34a55d780779d7d5916b4c.1606823973.git.mchehab+huawei@kernel.org
2020-12-03completion: Drop init_completion defineMauro Carvalho Chehab1-3/+2
Changeset cd8084f91c02 ("locking/lockdep: Apply crossrelease to completions") added a CONFIG_LOCKDEP_COMPLETE (that was later renamed to CONFIG_LOCKDEP_COMPLETIONS). Such changeset renamed the init_completion, and add a macro that would either run a modified version or the original code. However, such code reported too many false positives. So, it ended being dropped later on by changeset e966eaeeb623 ("locking/lockdep: Remove the cross-release locking checks"). Yet, the define remained there as just: #define init_completion(x) __init_completion(x) Get rid of the define, and return __init_completion() function to its original name. Fixes: e966eaeeb623 ("locking/lockdep: Remove the cross-release locking checks") Signed-off-by: Mauro Carvalho Chehab <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/e657bfc533545c185b1c3c55926a449ead56a88b.1606823973.git.mchehab+huawei@kernel.org
2020-12-03atomic: Update MAINTAINERSPeter Zijlstra1-0/+2
Update the files list to include refcount.h and the Documentation/ Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
2020-12-03atomic: Delete obsolete documentationPeter Zijlstra1-664/+0
It's been superseded by Documentation/atomic_*.txt. Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
2020-12-03seqlock: Rename __seqprop() usersPeter Zijlstra1-23/+23
More consistent naming should make it easier to untangle the _Generic token pasting maze called __seqprop(). Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-12-03lockdep/selftest: Add spin_nest_lock testBoqun Feng1-0/+17
Add a self test case to test the behavior for the following case: lock(A); lock_nest_lock(C1, A); lock(B); lock_nest_lock(C2, A); This is a reproducer for a problem[1] reported by Chris Wilson, and is helpful to prevent this. [1]: https://lore.kernel.org/lkml/[email protected]/ Signed-off-by: Boqun Feng <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-12-03lockdep/selftests: Fix PROVE_RAW_LOCK_NESTINGPeter Zijlstra1-17/+17
The selftest nests rwlock_t inside raw_spinlock_t, this is invalid. Reported-by: Boqun Feng <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
2020-12-03seqlock: avoid -Wshadow warningsArnd Bergmann1-7/+7
When building with W=2, there is a flood of warnings about the seqlock macros shadowing local variables: 19806 linux/seqlock.h:331:11: warning: declaration of 'seq' shadows a previous local [-Wshadow] 48 linux/seqlock.h:348:11: warning: declaration of 'seq' shadows a previous local [-Wshadow] 8 linux/seqlock.h:379:11: warning: declaration of 'seq' shadows a previous local [-Wshadow] Prefix the local variables to make the warning useful elsewhere again. Fixes: 52ac39e5db51 ("seqlock: seqcount_t: Implement all read APIs as statement expressions") Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-12-02Merge tag 'gfs2-v5.10-rc5-fixes' of ↵Linus Torvalds5-12/+42
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fixes from Andreas Gruenbacher: "Various gfs2 fixes" * tag 'gfs2-v5.10-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix deadlock between gfs2_{create_inode,inode_lookup} and delete_work_func gfs2: Upgrade shared glocks for atime updates gfs2: Don't freeze the file system during unmount gfs2: check for empty rgrp tree in gfs2_ri_update gfs2: set lockdep subclass for iopen glocks gfs2: Fix deadlock dumping resource group glocks
2020-12-02Merge tag 'arm64-fixes' of ↵Linus Torvalds12-181/+243
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "I'm sad to say that we've got an unusually large arm64 fixes pull for rc7 which addresses numerous significant instrumentation issues with our entry code. Without these patches, lockdep is hopelessly unreliable in some configurations [1,2] and syzkaller is therefore not a lot of use because it's so noisy. Although much of this has always been broken, it appears to have been exposed more readily by other changes such as 044d0d6de9f5 ("lockdep: Only trace IRQ edges") and general lockdep improvements around IRQ tracing and NMIs. Fixing this properly required moving much of the instrumentation hooks from our entry assembly into C, which Mark has been working on for the last few weeks. We're not quite ready to move to the recently added generic functions yet, but the code here has been deliberately written to mimic that closely so we can look at cleaning things up once we have a bit more breathing room. Having said all that, the second version of these patches was posted last week and I pushed it into our CI (kernelci and cki) along with a commit which forced on PROVE_LOCKING, NOHZ_FULL and CONTEXT_TRACKING_FORCE. The result? We found a real bug in the md/raid10 code [3]. Oh, and there's also a really silly typo patch that's unrelated. Summary: - Fix numerous issues with instrumentation and exception entry - Fix hideous typo in unused register field definition" [1] https://lore.kernel.org/r/CACT4Y+aAzoJ48Mh1wNYD17pJqyEcDnrxGfApir=-j171TnQXhw@mail.gmail.com [2] https://lore.kernel.org/r/[email protected] [3] https://lore.kernel.org/r/[email protected] * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: mte: Fix typo in macro definition arm64: entry: fix EL1 debug transitions arm64: entry: fix NMI {user, kernel}->kernel transitions arm64: entry: fix non-NMI kernel<->kernel transitions arm64: ptrace: prepare for EL1 irq/rcu tracking arm64: entry: fix non-NMI user<->kernel transitions arm64: entry: move el1 irq/nmi logic to C arm64: entry: prepare ret_to_user for function call arm64: entry: move enter_from_user_mode to entry-common.c arm64: entry: mark entry code as noinstr arm64: mark idle code as noinstr arm64: syscall: exit userspace before unmasking exceptions
2020-12-02Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds3-1/+5
Pull vdpa fixes from Michael Tsirkin: "A couple of fixes that surfaced at the last minute" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost_vdpa: return -EFAULT if copy_to_user() fails vdpa: mlx5: fix vdpa/vhost dependencies
2020-12-02Merge tag 'sound-5.10-rc7' of ↵Linus Torvalds11-47/+149
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Here are the pending sound fixes for 5.10: all small device-specific fixes, and nothing particular stands out, so far" * tag 'sound-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda/realtek: Add mute LED quirk to yet another HP x360 model ALSA: hda/realtek: Fix bass speaker DAC assignment on Asus Zephyrus G14 ALSA: hda/generic: Add option to enforce preferred_dacs pairs ALSA: usb-audio: US16x08: fix value count for level meters ALSA: hda/realtek - Add new codec supported for ALC897 ASoC: rt5682: change SAR voltage threshold ASoC: wm_adsp: fix error return code in wm_adsp_load() ALSA: hda/realtek: Enable headset of ASUS UX482EG & B9400CEA with ALC294 ASoC: qcom: Fix enabling BCLK and LRCLK in LPAIF invalid state ALSA: hda/realtek - Fixed Dell AIO wrong sound tone ASoC: Intel: bytcr_rt5640: Fix HP Pavilion x2 Detachable quirks
2020-12-02Merge tag 'trace-v5.10-rc6-bootconfig' of ↵Linus Torvalds3-5/+10
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull bootconfig fixes from Steven Rostedt: "Have bootconfig size and checksum be little endian In case the bootconfig is created on one kind of endian machine, and then read on the other kind of endian kernel, the size and checksum will be incorrect. Instead, have both the size and checksum always be little endian and have the tool and the kernel convert it from little endian to or from the host endian" * tag 'trace-v5.10-rc6-bootconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: docs: bootconfig: Add the endianness of fields tools/bootconfig: Store size and checksum in footer as le32 bootconfig: Load size and checksum in the footer as le32
2020-12-02vhost_vdpa: return -EFAULT if copy_to_user() failsDan Carpenter1-1/+3
The copy_to_user() function returns the number of bytes remaining to be copied but this should return -EFAULT to the user. Fixes: 1b48dc03e575 ("vhost: vdpa: report iova range") Signed-off-by: Dan Carpenter <[email protected]> Link: https://lore.kernel.org/r/X8c32z5EtDsMyyIL@mwanda Signed-off-by: Michael S. Tsirkin <[email protected]> Acked-by: Jason Wang <[email protected]> Reviewed-by: Stefano Garzarella <[email protected]>
2020-12-02vdpa: mlx5: fix vdpa/vhost dependenciesRandy Dunlap2-0/+2
drivers/vdpa/mlx5/ uses vhost_iotlb*() interfaces, so select VHOST_IOTLB to make them be built. However, if VHOST_IOTLB is the only VHOST symbol that is set/enabled, the object file still won't be built because drivers/Makefile won't descend into drivers/vhost/ to build it, so make drivers/Makefile build the needed binary whenever VHOST_IOTLB is set, like it does for VHOST_RING. Fixes these build errors: ERROR: modpost: "vhost_iotlb_itree_next" [drivers/vdpa/mlx5/mlx5_vdpa.ko] undefined! ERROR: modpost: "vhost_iotlb_itree_first" [drivers/vdpa/mlx5/mlx5_vdpa.ko] undefined! Fixes: 29064bfdabd5 ("vdpa/mlx5: Add support library for mlx5 VDPA implementation") Fixes: aff90770e54c ("vdpa/mlx5: Fix dependency on MLX5_CORE") Reported-by: kernel test robot <[email protected]> Signed-off-by: Randy Dunlap <[email protected]> Cc: Eli Cohen <[email protected]> Cc: Parav Pandit <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Jason Wang <[email protected]> Cc: [email protected] Cc: Saeed Mahameed <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]> Acked-by: Jason Wang <[email protected]>
2020-12-01Merge tag '5.10-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds2-2/+4
Pull cifs fixes from Steve French: "Two smb3 fixes for stable" * tag '5.10-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix potential use-after-free in cifs_echo_request() cifs: allow syscalls to be restarted in __smb_send_rqst()
2020-12-01Merge tag 'trace-v5.10-rc6' of ↵Linus Torvalds12-62/+138
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: - Use correct timestamp variable for ring buffer write stamp update - Fix up before stamp and write stamp when crossing ring buffer sub buffers - Keep a zero delta in ring buffer in slow path if cmpxchg fails - Fix trace_printk static buffer for archs that care - Fix ftrace record accounting for ftrace ops with trampolines - Fix DYNAMIC_FTRACE_WITH_DIRECT_CALLS dependency - Remove WARN_ON in hwlat tracer that triggers on something that is OK - Make "my_tramp" trampoline in ftrace direct sample code global - Fixes in the bootconfig tool for better alignment management * tag 'trace-v5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ring-buffer: Always check to put back before stamp when crossing pages ftrace: Fix DYNAMIC_FTRACE_WITH_DIRECT_CALLS dependency ftrace: Fix updating FTRACE_FL_TRAMP tracing: Fix alignment of static buffer tracing: Remove WARN_ON in start_thread() samples/ftrace: Mark my_tramp[12]? global ring-buffer: Set the right timestamp in the slow path of __rb_reserve_next() ring-buffer: Update write stamp with the correct ts docs: bootconfig: Update file format on initrd image tools/bootconfig: Align the bootconfig applied initrd image size to 4 tools/bootconfig: Fix to check the write failure correctly tools/bootconfig: Fix errno reference after printf()
2020-12-01Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds3-22/+68
Pull vhost fixes from Michael Tsirkin: "A couple of minor fixes" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost-vdpa: fix page pinning leakage in error path (rework) vringh: fix vringh_iov_push_*() documentation vhost scsi: fix lun reset completion handling
2020-11-30docs: bootconfig: Add the endianness of fieldsMasami Hiramatsu1-1/+3
Add a description about the endianness of the size and the checksum fields. Those must be stored as le32 instead of u32. This will allow us to apply bootconfig to the cross build initrd without caring the endianness. Link: https://lkml.kernel.org/r/160583936246.547349.10964204130590955409.stgit@devnote2 Reported-by: Steven Rostedt <[email protected]> Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-30tools/bootconfig: Store size and checksum in footer as le32Masami Hiramatsu1-2/+5
Store the size and the checksum fields in the footer as le32 instead of u32. This will allow us to apply bootconfig to the cross build initrd without caring the endianness. Link: https://lkml.kernel.org/r/160583935332.547349.5897811300636587426.stgit@devnote2 Reported-by: Steven Rostedt <[email protected]> Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-30bootconfig: Load size and checksum in the footer as le32Masami Hiramatsu1-2/+2
Load the size and the checksum fields in the footer as le32 instead of u32. This will allow us to apply bootconfig to the cross build initrd without caring the endianness. Link: https://lkml.kernel.org/r/160583934457.547349.10504070298990791074.stgit@devnote2 Reported-by: Steven Rostedt <[email protected]> Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-30ring-buffer: Always check to put back before stamp when crossing pagesSteven Rostedt (VMware)1-8/+6
The current ring buffer logic checks to see if the updating of the event buffer was interrupted, and if it is, it will try to fix up the before stamp with the write stamp to make them equal again. This logic is flawed, because if it is not interrupted, the two are guaranteed to be different, as the current event just updated the before stamp before allocation. This guarantees that the next event (this one or another interrupting one) will think it interrupted the time updates of a previous event and inject an absolute time stamp to compensate. The correct logic is to always update the timestamps when traversing to a new sub buffer. Cc: [email protected] Fixes: a389d86f7fd09 ("ring-buffer: Have nested events still record running time stamp") Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-30ftrace: Fix DYNAMIC_FTRACE_WITH_DIRECT_CALLS dependencyNaveen N. Rao1-1/+1
DYNAMIC_FTRACE_WITH_DIRECT_CALLS should depend on DYNAMIC_FTRACE_WITH_REGS since we need ftrace_regs_caller(). Link: https://lkml.kernel.org/r/fc4b257ea8689a36f086d2389a9ed989496ca63a.1606412433.git.naveen.n.rao@linux.vnet.ibm.com Cc: [email protected] Fixes: 763e34e74bb7d5c ("ftrace: Add register_ftrace_direct()") Signed-off-by: Naveen N. Rao <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-30ftrace: Fix updating FTRACE_FL_TRAMPNaveen N. Rao1-1/+21
On powerpc, kprobe-direct.tc triggered FTRACE_WARN_ON() in ftrace_get_addr_new() followed by the below message: Bad trampoline accounting at: 000000004222522f (wake_up_process+0xc/0x20) (f0000001) The set of steps leading to this involved: - modprobe ftrace-direct-too - enable_probe - modprobe ftrace-direct - rmmod ftrace-direct <-- trigger The problem turned out to be that we were not updating flags in the ftrace record properly. From the above message about the trampoline accounting being bad, it can be seen that the ftrace record still has FTRACE_FL_TRAMP set though ftrace-direct module is going away. This happens because we are checking if any ftrace_ops has the FTRACE_FL_TRAMP flag set _before_ updating the filter hash. The fix for this is to look for any _other_ ftrace_ops that also needs FTRACE_FL_TRAMP. Link: https://lkml.kernel.org/r/56c113aa9c3e10c19144a36d9684c7882bf09af5.1606412433.git.naveen.n.rao@linux.vnet.ibm.com Cc: [email protected] Fixes: a124692b698b0 ("ftrace: Enable trampoline when rec count returns back to one") Signed-off-by: Naveen N. Rao <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-30tracing: Fix alignment of static bufferMinchan Kim1-1/+1
With 5.9 kernel on ARM64, I found ftrace_dump output was broken but it had no problem with normal output "cat /sys/kernel/debug/tracing/trace". With investigation, it seems coping the data into temporal buffer seems to break the align binary printf expects if the static buffer is not aligned with 4-byte. IIUC, get_arg in bstr_printf expects that args has already right align to be decoded and seq_buf_bprintf says ``the arguments are saved in a 32bit word array that is defined by the format string constraints``. So if we don't keep the align under copy to temporal buffer, the output will be broken by shifting some bytes. This patch fixes it. Link: https://lkml.kernel.org/r/[email protected] Cc: <[email protected]> Fixes: 8e99cf91b99bb ("tracing: Do not allocate buffer in trace_find_next_entry() in atomic") Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Minchan Kim <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-30tracing: Remove WARN_ON in start_thread()Vasily Averin1-1/+1
This patch reverts commit 978defee11a5 ("tracing: Do a WARN_ON() if start_thread() in hwlat is called when thread exists") .start hook can be legally called several times if according tracer is stopped screen window 1 [root@localhost ~]# echo 1 > /sys/kernel/tracing/events/kmem/kfree/enable [root@localhost ~]# echo 1 > /sys/kernel/tracing/options/pause-on-trace [root@localhost ~]# less -F /sys/kernel/tracing/trace screen window 2 [root@localhost ~]# cat /sys/kernel/debug/tracing/tracing_on 0 [root@localhost ~]# echo hwlat > /sys/kernel/debug/tracing/current_tracer [root@localhost ~]# echo 1 > /sys/kernel/debug/tracing/tracing_on [root@localhost ~]# cat /sys/kernel/debug/tracing/tracing_on 0 [root@localhost ~]# echo 2 > /sys/kernel/debug/tracing/tracing_on triggers warning in dmesg: WARNING: CPU: 3 PID: 1403 at kernel/trace/trace_hwlat.c:371 hwlat_tracer_start+0xc9/0xd0 Link: https://lkml.kernel.org/r/[email protected] Cc: Ingo Molnar <[email protected]> Cc: [email protected] Fixes: 978defee11a5 ("tracing: Do a WARN_ON() if start_thread() in hwlat is called when thread exists") Signed-off-by: Vasily Averin <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-30samples/ftrace: Mark my_tramp[12]? globalSami Tolvanen3-0/+4
my_tramp[12]? are declared as global functions in C, but they are not marked global in the inline assembly definition. This mismatch confuses Clang's Control-Flow Integrity checking. Fix the definitions by adding .globl. Link: https://lkml.kernel.org/r/[email protected] Fixes: 9d907f1ae80b8 ("ftrace/samples: Add a sample module that implements modify_ftrace_direct()") Reviewed-by: Kees Cook <[email protected]> Signed-off-by: Sami Tolvanen <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-12-01gfs2: Fix deadlock between gfs2_{create_inode,inode_lookup} and delete_work_funcAndreas Gruenbacher1-10/+11
In gfs2_create_inode and gfs2_inode_lookup, make sure to cancel any pending delete work before taking the inode glock. Otherwise, gfs2_cancel_delete_work may block waiting for delete_work_func to complete, and delete_work_func may block trying to acquire the inode glock in gfs2_inode_lookup. Reported-by: Alexander Aring <[email protected]> Fixes: a0e3cc65fa29 ("gfs2: Turn gl_delete into a delayed work") Cc: [email protected] # v5.8+ Signed-off-by: Andreas Gruenbacher <[email protected]>
2020-11-30cifs: fix potential use-after-free in cifs_echo_request()Paulo Alcantara1-0/+2
This patch fixes a potential use-after-free bug in cifs_echo_request(). For instance, thread 1 -------- cifs_demultiplex_thread() clean_demultiplex_info() kfree(server) thread 2 (workqueue) -------- apic_timer_interrupt() smp_apic_timer_interrupt() irq_exit() __do_softirq() run_timer_softirq() call_timer_fn() cifs_echo_request() <- use-after-free in server ptr Signed-off-by: Paulo Alcantara (SUSE) <[email protected]> CC: Stable <[email protected]> Reviewed-by: Ronnie Sahlberg <[email protected]> Signed-off-by: Steve French <[email protected]>
2020-11-30cifs: allow syscalls to be restarted in __smb_send_rqst()Paulo Alcantara1-2/+2
A customer has reported that several files in their multi-threaded app were left with size of 0 because most of the read(2) calls returned -EINTR and they assumed no bytes were read. Obviously, they could have fixed it by simply retrying on -EINTR. We noticed that most of the -EINTR on read(2) were due to real-time signals sent by glibc to process wide credential changes (SIGRT_1), and its signal handler had been established with SA_RESTART, in which case those calls could have been automatically restarted by the kernel. Let the kernel decide to whether or not restart the syscalls when there is a signal pending in __smb_send_rqst() by returning -ERESTARTSYS. If it can't, it will return -EINTR anyway. Signed-off-by: Paulo Alcantara (SUSE) <[email protected]> CC: Stable <[email protected]> Reviewed-by: Ronnie Sahlberg <[email protected]> Reviewed-by: Pavel Shilovsky <[email protected]> Signed-off-by: Steve French <[email protected]>
2020-11-30ring-buffer: Set the right timestamp in the slow path of __rb_reserve_next()Andrea Righi1-3/+3
In the slow path of __rb_reserve_next() a nested event(s) can happen between evaluating the timestamp delta of the current event and updating write_stamp via local_cmpxchg(); in this case the delta is not valid anymore and it should be set to 0 (same timestamp as the interrupting event), since the event that we are currently processing is not the last event in the buffer. Link: https://lkml.kernel.org/r/X8IVJcp1gRE+FJCJ@xps-13-7390 Cc: Ingo Molnar <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: [email protected] Link: https://lwn.net/Articles/831207 Fixes: a389d86f7fd0 ("ring-buffer: Have nested events still record running time stamp") Signed-off-by: Andrea Righi <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-30ring-buffer: Update write stamp with the correct tsSteven Rostedt (VMware)1-1/+1
The write stamp, used to calculate deltas between events, was updated with the stale "ts" value in the "info" structure, and not with the updated "ts" variable. This caused the deltas between events to be inaccurate, and when crossing into a new sub buffer, had time go backwards. Link: https://lkml.kernel.org/r/[email protected] Cc: [email protected] Fixes: a389d86f7fd09 ("ring-buffer: Have nested events still record running time stamp") Reported-by: "J. Avila" <[email protected]> Tested-by: Daniel Mentz <[email protected]> Tested-by: Will McVicker <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-30arm64: mte: Fix typo in macro definitionVincenzo Frascino1-1/+1
UL in the definition of SYS_TFSR_EL1_TF1 was misspelled causing compilation issues when trying to implement in kernel MTE async mode. Fix the macro correcting the typo. Note: MTE async mode will be introduced with a future series. Fixes: c058b1c4a5ea ("arm64: mte: system register definitions") Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Vincenzo Frascino <[email protected]> Acked-by: Catalin Marinas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2020-11-30arm64: entry: fix EL1 debug transitionsMark Rutland2-25/+26
In debug_exception_enter() and debug_exception_exit() we trace hardirqs on/off while RCU isn't guaranteed to be watching, and we don't save and restore the hardirq state, and so may return with this having changed. Handle this appropriately with new entry/exit helpers which do the bare minimum to ensure this is appropriately maintained, without marking debug exceptions as NMIs. These are placed in entry-common.c with the other entry/exit helpers. In future we'll want to reconsider whether some debug exceptions should be NMIs, but this will require a significant refactoring, and for now this should prevent issues with lockdep and RCU. Signed-off-by: Mark Rutland <[email protected]> Cc: Catalin Marins <[email protected]> Cc: James Morse <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2020-11-30arm64: entry: fix NMI {user, kernel}->kernel transitionsMark Rutland4-10/+48
Exceptions which can be taken at (almost) any time are consdiered to be NMIs. On arm64 that includes: * SDEI events * GICv3 Pseudo-NMIs * Kernel stack overflows * Unexpected/unhandled exceptions ... but currently debug exceptions (BRKs, breakpoints, watchpoints, single-step) are not considered NMIs. As these can be taken at any time, kernel features (lockdep, RCU, ftrace) may not be in a consistent kernel state. For example, we may take an NMI from the idle code or partway through an entry/exit path. While nmi_enter() and nmi_exit() handle most of this state, notably they don't save/restore the lockdep state across an NMI being taken and handled. When interrupts are enabled and an NMI is taken, lockdep may see interrupts become disabled within the NMI code, but not see interrupts become enabled when returning from the NMI, leaving lockdep believing interrupts are disabled when they are actually disabled. The x86 code handles this in idtentry_{enter,exit}_nmi(), which will shortly be moved to the generic entry code. As we can't use either yet, we copy the x86 approach in arm64-specific helpers. All the NMI entrypoints are marked as noinstr to prevent any instrumentation handling code being invoked before the state has been corrected. Signed-off-by: Mark Rutland <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: James Morse <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2020-11-30arm64: entry: fix non-NMI kernel<->kernel transitionsMark Rutland2-3/+67
There are periods in kernel mode when RCU is not watching and/or the scheduler tick is disabled, but we can still take exceptions such as interrupts. The arm64 exception handlers do not account for this, and it's possible that RCU is not watching while an exception handler runs. The x86/generic entry code handles this by ensuring that all (non-NMI) kernel exception handlers call irqentry_enter() and irqentry_exit(), which handle RCU, lockdep, and IRQ flag tracing. We can't yet move to the generic entry code, and already hadnle the user<->kernel transitions elsewhere, so we add new kernel<->kernel transition helpers alog the lines of the generic entry code. Since we now track interrupts becoming masked when an exception is taken, local_daif_inherit() is modified to track interrupts becoming re-enabled when the original context is inherited. To balance the entry/exit paths, each handler masks all DAIF exceptions before exit_to_kernel_mode(). Signed-off-by: Mark Rutland <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: James Morse <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2020-11-30arm64: ptrace: prepare for EL1 irq/rcu trackingMark Rutland1-0/+4
Exceptions from EL1 may be taken when RCU isn't watching (e.g. in idle sequences), or when the lockdep hardirqs transiently out-of-sync with the hardware state (e.g. in the middle of local_irq_enable()). To correctly handle these cases, we'll need to save/restore this state across some exceptions taken from EL1. A series of subsequent patches will update EL1 exception handlers to handle this. In preparation for this, and to avoid dependencies between those patches, this patch adds two new fields to struct pt_regs so that exception handlers can track this state. Note that this is placed in pt_regs as some entry/exit sequences such as el1_irq are invoked from assembly, which makes it very difficult to add a separate structure as with the irqentry_state used by x86. We can separate this once more of the exception logic is moved to C. While the fields only need to be bool, they are both made u64 to keep pt_regs 16-byte aligned. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: James Morse <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2020-11-30arm64: entry: fix non-NMI user<->kernel transitionsMark Rutland5-48/+51
When built with PROVE_LOCKING, NO_HZ_FULL, and CONTEXT_TRACKING_FORCE will WARN() at boot time that interrupts are enabled when we call context_tracking_user_enter(), despite the DAIF flags indicating that IRQs are masked. The problem is that we're not tracking IRQ flag changes accurately, and so lockdep believes interrupts are enabled when they are not (and vice-versa). We can shuffle things so to make this more accurate. For kernel->user transitions there are a number of constraints we need to consider: 1) When we call __context_tracking_user_enter() HW IRQs must be disabled and lockdep must be up-to-date with this. 2) Userspace should be treated as having IRQs enabled from the PoV of both lockdep and tracing. 3) As context_tracking_user_enter() stops RCU from watching, we cannot use RCU after calling it. 4) IRQ flag tracing and lockdep have state that must be manipulated before RCU is disabled. ... with similar constraints applying for user->kernel transitions, with the ordering reversed. The generic entry code has enter_from_user_mode() and exit_to_user_mode() helpers to handle this. We can't use those directly, so we add arm64 copies for now (without the instrumentation markers which aren't used on arm64). These replace the existing user_exit() and user_exit_irqoff() calls spread throughout handlers, and the exception unmasking is left as-is. Note that: * The accounting for debug exceptions from userspace now happens in el0_dbg() and ret_to_user(), so this is removed from debug_exception_enter() and debug_exception_exit(). As user_exit_irqoff() wakes RCU, the userspace-specific check is removed. * The accounting for syscalls now happens in el0_svc(), el0_svc_compat(), and ret_to_user(), so this is removed from el0_svc_common(). This does not adversely affect the workaround for erratum 1463225, as this does not depend on any of the state tracking. * In ret_to_user() we mask interrupts with local_daif_mask(), and so we need to inform lockdep and tracing. Here a trace_hardirqs_off() is sufficient and safe as we have not yet exited kernel context and RCU is usable. * As PROVE_LOCKING selects TRACE_IRQFLAGS, the ifdeferry in entry.S only needs to check for the latter. * EL0 SError handling will be dealt with in a subsequent patch, as this needs to be treated as an NMI. Prior to this patch, booting an appropriately-configured kernel would result in spats as below: | DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled()) | WARNING: CPU: 2 PID: 1 at kernel/locking/lockdep.c:5280 check_flags.part.54+0x1dc/0x1f0 | Modules linked in: | CPU: 2 PID: 1 Comm: init Not tainted 5.10.0-rc3 #3 | Hardware name: linux,dummy-virt (DT) | pstate: 804003c5 (Nzcv DAIF +PAN -UAO -TCO BTYPE=--) | pc : check_flags.part.54+0x1dc/0x1f0 | lr : check_flags.part.54+0x1dc/0x1f0 | sp : ffff80001003bd80 | x29: ffff80001003bd80 x28: ffff66ce801e0000 | x27: 00000000ffffffff x26: 00000000000003c0 | x25: 0000000000000000 x24: ffffc31842527258 | x23: ffffc31842491368 x22: ffffc3184282d000 | x21: 0000000000000000 x20: 0000000000000001 | x19: ffffc318432ce000 x18: 0080000000000000 | x17: 0000000000000000 x16: ffffc31840f18a78 | x15: 0000000000000001 x14: ffffc3184285c810 | x13: 0000000000000001 x12: 0000000000000000 | x11: ffffc318415857a0 x10: ffffc318406614c0 | x9 : ffffc318415857a0 x8 : ffffc31841f1d000 | x7 : 647261685f706564 x6 : ffffc3183ff7c66c | x5 : ffff66ce801e0000 x4 : 0000000000000000 | x3 : ffffc3183fe00000 x2 : ffffc31841500000 | x1 : e956dc24146b3500 x0 : 0000000000000000 | Call trace: | check_flags.part.54+0x1dc/0x1f0 | lock_is_held_type+0x10c/0x188 | rcu_read_lock_sched_held+0x70/0x98 | __context_tracking_enter+0x310/0x350 | context_tracking_enter.part.3+0x5c/0xc8 | context_tracking_user_enter+0x6c/0x80 | finish_ret_to_user+0x2c/0x13cr Signed-off-by: Mark Rutland <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: James Morse <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2020-11-30arm64: entry: move el1 irq/nmi logic to CMark Rutland4-45/+22
In preparation for reworking the EL1 irq/nmi entry code, move the existing logic to C. We no longer need the asm_nmi_enter() and asm_nmi_exit() wrappers, so these are removed. The new C functions are marked noinstr, which prevents compiler instrumentation and runtime probing. In subsequent patches we'll want the new C helpers to be called in all cases, so we don't bother wrapping the calls with ifdeferry. Even when the new C functions are stubs the trivial calls are unlikely to have a measurable impact on the IRQ or NMI paths anyway. Prototypes are added to <asm/exception.h> as otherwise (in some configurations) GCC will complain about the lack of a forward declaration. We already do this for existing function, e.g. enter_from_user_mode(). The new helpers are marked as noinstr (which prevents all instrumentation, tracing, and kprobes). Otherwise, there should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: James Morse <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2020-11-30arm64: entry: prepare ret_to_user for function callMark Rutland1-4/+5
In a subsequent patch ret_to_user will need to make a C function call (in some configurations) which may clobber x0-x18 at the start of the finish_ret_to_user block, before enable_step_tsk consumes the flags loaded into x1. In preparation for this, let's load the flags into x19, which is preserved across C function calls. This avoids a redundant reload of the flags and ensures we operate on a consistent shapshot regardless. There should be no functional change as a result of this patch. At this point of the entry/exit paths we only need to preserve x28 (tsk) and the sp, and x19 is free for this use. Signed-off-by: Mark Rutland <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: James Morse <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2020-11-30arm64: entry: move enter_from_user_mode to entry-common.cMark Rutland2-7/+6
In later patches we'll want to extend enter_from_user_mode() and add a corresponding exit_to_user_mode(). As these will be common for all entries/exits from userspace, it'd be better for these to live in entry-common.c with the rest of the entry logic. This patch moves enter_from_user_mode() into entry-common.c. As with other functions in entry-common.c it is marked as noinstr (which prevents all instrumentation, tracing, and kprobes) but there are no other functional changes. Signed-off-by: Mark Rutland <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: James Morse <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2020-11-30arm64: entry: mark entry code as noinstrMark Rutland1-50/+25
Functions in entry-common.c are marked as notrace and NOKPROBE_SYMBOL(), but they're still subject to other instrumentation which may rely on lockdep/rcu/context-tracking being up-to-date, and may cause nested exceptions (e.g. for WARN/BUG or KASAN's use of BRK) which will corrupt exceptions registers which have not yet been read. Prevent this by marking all functions in entry-common.c as noinstr to prevent compiler instrumentation. This also blacklists the functions for tracing and kprobes, so we don't need to handle that separately. Functions elsewhere will be dealt with in subsequent patches. Signed-off-by: Mark Rutland <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: James Morse <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>