Age | Commit message | Author | Files | Lines
2020-06-11 | nvme-pci: use simple suspend when a HMB is enabled | Christoph Hellwig | 1 | -0/+6
While the NVMe specification allows the device to access the host memory buffer in host DRAM from all power states, hosts will fail access to DRAM during S3 and similar power states. Fixes: d916b1be94b6 ("nvme-pci: use host managed power state for suspend") Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-06-11 | nvme-fc: don't call nvme_cleanup_cmd() for AENs | Daniel Wagner | 1 | -2/+3
Asynchronous event notifications do not have an associated request. When fcp_io() fails we unconditionally call nvme_cleanup_cmd() which leads to a crash. Fixes: 16686f3a6c3c ("nvme: move common call to nvme_cleanup_cmd to core layer") Signed-off-by: Daniel Wagner <[email protected]> Reviewed-by: Himanshu Madhani <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: James Smart <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-06-11 | nvmet-tcp: constify nvmet_tcp_ops | Max Gurtovoy | 1 | -2/+2
nvmet_tcp_ops is never modified and can be made const to allow the compiler to put it in read-only memory, as done in other transports.
Before:
       text    data     bss     dec     hex filename
      16164     160      12   16336    3fd0 drivers/nvme/target/tcp.o
After:
       text    data     bss     dec     hex filename
      16277      64      12   16353    3fe1 drivers/nvme/target/tcp.o
Signed-off-by: Max Gurtovoy <[email protected]> Reviewed-by: Himanshu Madhani <[email protected]> Reviewed-by: Israel Rukshin <[email protected]> Acked-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-06-11 | nvme-tcp: constify nvme_tcp_mq_ops and nvme_tcp_admin_mq_ops | Rikard Falkeborn | 1 | -4/+4
nvme_tcp_mq_ops and nvme_tcp_admin_mq_ops are never modified and can be made const to allow the compiler to put them in read-only memory.
Before:
       text    data     bss     dec     hex filename
      53102    6885     576   60563    ec93 drivers/nvme/host/tcp.o
After:
       text    data     bss     dec     hex filename
      53422    6565     576   60563    ec93 drivers/nvme/host/tcp.o
Signed-off-by: Rikard Falkeborn <[email protected]> Acked-by: Sagi Grimberg <[email protected]> Reviewed-by: Max Gurtovoy <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
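The constify change in the two commits above is a one-word edit. A minimal sketch, using a hypothetical `struct demo_ops` rather than the real transport ops structures: once the table is `static const`, the compiler may emit it into read-only data (which `size` counts under `text`, matching the data-to-text shift in the numbers above), and any stray write to a member becomes a compile error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops table standing in for the transport ops structs. */
struct demo_ops {
	const char *name;
	int (*queue_rq)(int tag);
	void (*complete)(int tag);
};

static int demo_queue_rq(int tag) { return tag >= 0 ? 0 : -1; }
static void demo_complete(int tag) { (void)tag; }

/*
 * Never modified after initialization, so declare it const: the
 * compiler may place it in read-only data, and assigning to a member
 * (e.g. demo_transport_ops.complete = NULL) no longer compiles.
 */
static const struct demo_ops demo_transport_ops = {
	.name     = "demo",
	.queue_rq = demo_queue_rq,
	.complete = demo_complete,
};

/* Consumers take a pointer-to-const, so no cast is needed. */
static int demo_submit(const struct demo_ops *ops, int tag)
{
	assert(ops != NULL);
	int ret = ops->queue_rq(tag);
	if (ret == 0)
		ops->complete(tag);
	return ret;
}
```

For example, `demo_submit(&demo_transport_ops, 3)` returns 0 and `demo_submit(&demo_transport_ops, -1)` returns -1.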
2020-06-11 | nvme: do not call del_gendisk() on a disk that was never added | Niklas Cassel | 1 | -3/+1
device_add_disk() is negated by del_gendisk(), and alloc_disk_node() is negated by put_disk(). In nvme_alloc_ns(), device_add_disk() is one of the last things called in the success case, and only void functions are called after it. Therefore this call should not be negated in the error path. The superfluous call to del_gendisk() leads to the following prints:
    [    7.839975] kobject: '(null)' (000000001ff73734): is not initialized, yet kobject_put() is being called.
    [    7.840865] WARNING: CPU: 2 PID: 361 at lib/kobject.c:736 kobject_put+0x70/0x120
Fixes: 33cfdc2aa696 ("nvme: enforce extended LBA format for fabrics metadata") Signed-off-by: Niklas Cassel <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Max Gurtovoy <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
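The pairing rule above (each setup call is undone only by its own teardown, and only if it actually ran) is the usual goto-unwind pattern. A minimal sketch with hypothetical counters standing in for alloc_disk_node()/put_disk() and device_add_disk()/del_gendisk(); it is not the real nvme_alloc_ns(), which has more steps:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins that only count live objects. */
static int disks_allocated; /* like alloc_disk_node() / put_disk() */
static int disks_added;     /* like device_add_disk() / del_gendisk() */

static bool fake_alloc_disk(void) { disks_allocated++; return true; }
static void fake_put_disk(void)   { disks_allocated--; }
static void fake_add_disk(void)   { disks_added++; }
static void fake_del_disk(void)   { assert(disks_added > 0); disks_added--; }

/*
 * Each error label undoes only the steps that completed before the
 * failure. Adding the disk is the final step and nothing after it can
 * fail, so no error label needs the matching "del" call.
 */
static int alloc_ns(bool fail_before_add)
{
	if (!fake_alloc_disk())
		return -1;
	if (fail_before_add)
		goto out_put_disk;    /* allocated but never added */
	fake_add_disk();
	return 0;                 /* only void work would follow here */

out_put_disk:
	fake_put_disk();          /* pairs with fake_alloc_disk() only */
	return -1;
}

/* Normal teardown undoes both steps in reverse order. */
static void remove_ns(void)
{
	fake_del_disk();
	fake_put_disk();
}
```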
2020-06-11 | ext4: mballoc: Use this_cpu_read instead of this_cpu_ptr | Ritesh Harjani | 1 | -1/+1
Simplify reading a seq variable by using the this_cpu_read() API directly instead of calling this_cpu_ptr() and then dereferencing the result. This also avoids the kernel BUG below, which happens when CONFIG_DEBUG_PREEMPT is enabled:
    BUG: using smp_processor_id() in preemptible [00000000] code: syz-fuzzer/6927
    caller is ext4_mb_new_blocks+0xa4d/0x3b70 fs/ext4/mballoc.c:4711
    CPU: 1 PID: 6927 Comm: syz-fuzzer Not tainted 5.7.0-next-20200602-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
     __dump_stack lib/dump_stack.c:77 [inline]
     dump_stack+0x18f/0x20d lib/dump_stack.c:118
     check_preemption_disabled+0x20d/0x220 lib/smp_processor_id.c:48
     ext4_mb_new_blocks+0xa4d/0x3b70 fs/ext4/mballoc.c:4711
     ext4_ext_map_blocks+0x201b/0x33e0 fs/ext4/extents.c:4244
     ext4_map_blocks+0x4cb/0x1640 fs/ext4/inode.c:626
     ext4_getblk+0xad/0x520 fs/ext4/inode.c:833
     ext4_bread+0x7c/0x380 fs/ext4/inode.c:883
     ext4_append+0x153/0x360 fs/ext4/namei.c:67
     ext4_init_new_dir fs/ext4/namei.c:2757 [inline]
     ext4_mkdir+0x5e0/0xdf0 fs/ext4/namei.c:2802
     vfs_mkdir+0x419/0x690 fs/namei.c:3632
     do_mkdirat+0x21e/0x280 fs/namei.c:3655
     do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359
     entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 42f56b7a4a7d ("ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling") Suggested-by: Borislav Petkov <[email protected]> Tested-by: Marek Szyprowski <[email protected]> Signed-off-by: Ritesh Harjani <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reported-by: [email protected] Link: https://lore.kernel.org/r/534f275016296996f54ecf65168bb3392b6f653d.1591699601.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <[email protected]>
2020-06-11 | ext4: avoid utf8_strncasecmp() with unstable name | Eric Biggers | 1 | -0/+16
If the dentry name passed to ->d_compare() fits in dentry::d_iname, then it may be concurrently modified by a rename. This can cause undefined behavior (possibly out-of-bounds memory accesses or crashes) in utf8_strncasecmp(), since fs/unicode/ isn't written to handle strings that may be concurrently modified. Fix this by first copying the filename to a stack buffer if needed. This way we get a stable snapshot of the filename. Fixes: b886ee3e778e ("ext4: Support case-insensitive file name lookups") Cc: <[email protected]> # v5.2+ Cc: Al Viro <[email protected]> Cc: Daniel Rosenberg <[email protected]> Cc: Gabriel Krisman Bertazi <[email protected]> Signed-off-by: Eric Biggers <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
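A userspace sketch of the snapshot idea, assuming a hypothetical 32-byte inline buffer and using an ASCII case-insensitive compare as a stand-in for utf8_strncasecmp(). It does not model the concurrent rename itself; it only shows that a short name is copied once and every later read goes to the private copy:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define DEMO_INLINE_LEN 32  /* hypothetical inline-name capacity, incl. NUL */

/* ASCII-only case-insensitive compare; a stand-in for utf8_strncasecmp(). */
static int ascii_casecmp(const char *a, const char *b)
{
	while (*a && tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
		a++;
		b++;
	}
	return tolower((unsigned char)*a) - tolower((unsigned char)*b);
}

/*
 * d_compare-style helper: returns 0 on a match. `name` is NUL-terminated
 * and `len` is its length. A name short enough to live in the mutable
 * inline buffer is copied to the stack first, so the comparison reads
 * one stable snapshot instead of memory a rename may rewrite.
 */
static int demo_d_compare(const char *name, size_t len, const char *target)
{
	char snapshot[DEMO_INLINE_LEN];

	if (len < sizeof(snapshot)) {
		memcpy(snapshot, name, len);
		snapshot[len] = '\0';
		name = snapshot;   /* all further reads use the private copy */
	}
	return ascii_casecmp(name, target) != 0;
}
```

Names of 32 bytes or more are compared in place here, mirroring the commit's point that only names fitting in the inline buffer are exposed to concurrent rewrites.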
2020-06-11 | ext4: stop overwrite the errcode in ext4_setup_super | yangerkun | 1 | -0/+1
Currently the error code returned by ext4_commit_super() overwrites the EROFS already set in ext4_setup_super(). There is actually no need to call ext4_commit_super(), since we will return EROFS anyway. Fix it by jumping directly to the done label. Fixes: c89128a00838 ("ext4: handle errors on ext4_commit_super") Signed-off-by: yangerkun <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2020-06-11 | ext4: fix partial cluster initialization when splitting extent | Jeffle Xu | 1 | -1/+1
Fix the bug when calculating the physical block number of the first block in the split extent. This bug occasionally causes xfstests shared/298 to fail on ext4 with bigalloc enabled. Ext4 error messages indicate that previously freed blocks are being freed again, and a subsequent fsck fails due to the inconsistency between the block bitmap and the bg descriptor.
The following is an example case:
1. First, initialize an ext4 filesystem with cluster size '16K' and block size '4K', so that one cluster contains four blocks.
2. Create one file (e.g., xxx.img) on this ext4 filesystem. The extent tree of this file now looks like:
    ...
    36864:[0]4:220160
    36868:[0]14332:145408
    51200:[0]2:231424
    ...
3. Then execute PUNCH_HOLE fallocate on this file. The hole ranges are:
    ...
    ext4_ext_remove_space: dev 254,16 ino 12 since 49506 end 49506 depth 1
    ext4_ext_remove_space: dev 254,16 ino 12 since 49544 end 49546 depth 1
    ext4_ext_remove_space: dev 254,16 ino 12 since 49605 end 49607 depth 1
    ...
4. After punching, the extent tree of this file looks like:
    ...
    49507:[0]37:158047
    49547:[0]58:158087
    ...
5. Detailed procedure of punching hole [49544, 49546]
5.1. The block address space:
```
lblk   ~49505   49506  49507~49543    49544~49546     49547~
     ---------+------+-------------+----------------+--------
       extent | hole |   extent    |      hole      | extent
     ---------+------+-------------+----------------+--------
pblk  ~158045  158046 158047~158083  158084~158086   158087~
```
5.2. The detailed layout of cluster 39521:
```
               cluster 39521
     <------------------------------->

               hole            extent
     <----------------------><--------

lblk   49544   49545   49546   49547
     +-------+-------+-------+-------+
     |       |       |       |       |
     +-------+-------+-------+-------+
pblk  158084  158085  158086  158087
```
5.3. The ftrace output when punching hole [49544, 49546]:
    - ext4_ext_remove_space (start 49544, end 49546)
      - ext4_ext_rm_leaf (start 49544, end 49546, last_extent [49507(158047), 40], partial [pclu 39522 lblk 0 state 2])
        - ext4_remove_blocks (extent [49507(158047), 40], from 49544 to 49546, partial [pclu 39522 lblk 0 state 2])
          - ext4_free_blocks: (block 158084 count 4)
            - ext4_mballoc_free (extent 1/6753/1)
5.4. Ext4 error message in dmesg:
    EXT4-fs error (device vdb): mb_free_blocks:1457: group 1, block 158084:freeing already freed block (bit 6753); block bitmap corrupt.
    EXT4-fs error (device vdb): ext4_mb_generate_buddy:747: group 1, block bitmap and bg descriptor inconsistent: 19550 vs 19551 free clusters
In this case, the whole cluster 39521 is freed mistakenly when freeing pblock 158084~158086 (i.e., the first three blocks of this cluster), although pblock 158087 (the last remaining block of this cluster) has not been freed yet.
The root cause of this issue is that the pclu of the partial cluster is miscalculated in ext4_ext_remove_space(). The correct partial_cluster.pclu (i.e., the cluster number of the first block in the next extent, that is, lblock 49547 (pblock 158087)) should be 39521 rather than 39522.
Fixes: f4226d9ea400 ("ext4: fix partial cluster initialization") Signed-off-by: Jeffle Xu <[email protected]> Reviewed-by: Eric Whitney <[email protected]> Cc: [email protected] # v3.19+ Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
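Using the numbers from the trace (extent [49507(158047), 40], hole end 49546), the cluster arithmetic can be checked with two small helpers. This is a hedged sketch: ext4's EXT4_B2C() macro does the block-to-cluster shift, but the helper names below are hypothetical and this is not the kernel's ext4_ext_remove_space() code. Overshooting by one block (158088) lands in cluster 39522, the wrong value reported above.

```c
#include <assert.h>
#include <stdint.h>

/* 16K clusters of 4K blocks, as in the example: 4 blocks per cluster. */
#define DEMO_CLUSTER_BITS 2

/* Physical block -> cluster number (same idea as ext4's EXT4_B2C()). */
static uint64_t block_to_cluster(uint64_t pblk)
{
	return pblk >> DEMO_CLUSTER_BITS;
}

/*
 * Physical block of the first block *after* logical block `end`, inside
 * an extent that starts at logical `ee_lblk` / physical `ee_pblk`.
 */
static uint64_t pblk_after(uint64_t ee_pblk, uint64_t ee_lblk, uint64_t end)
{
	assert(end >= ee_lblk);
	return ee_pblk + (end - ee_lblk) + 1;
}
```

With these helpers, `pblk_after(158047, 49507, 49546)` is 158087, and `block_to_cluster(158087)` is 39521, the same cluster as the freed blocks 158084~158086.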
2020-06-11 | ext4: avoid race conditions when remounting with options that change dax | Theodore Ts'o | 1 | -18/+24
Trying to change dax mount options when remounting could allow mount options to be enabled for a small amount of time, and then the mount option change would be reverted. In the case of "mount -o remount,dax", this can cause a race where files would temporarily be treated as DAX --- and then not. Cc: [email protected] Reported-by: [email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2020-06-11 | Enable ext4 support for per-file/directory dax operations | Theodore Ts'o | 18 | -60/+350
This adds the same per-file/per-directory DAX support for ext4 as was done for xfs, now that we finally have consensus over what the interface should be.
2020-06-11 | tools, bpftool: Fix memory leak in codegen error cases | Tobias Klauser | 1 | -0/+2
Free the memory allocated for the template on error paths in function codegen. Signed-off-by: Tobias Klauser <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-06-11 | selftests/bpf: Add cgroup_skb/egress test for load_bytes_relative | YiFei Zhu | 2 | -0/+119
When cgroup_skb/egress triggers, the MAC header is not set. Added a test that asserts reading the MAC header fails with -EFAULT but reading the NET header succeeds. The test result from within the eBPF program is stored in a 1-element array map that userspace then reads and asserts on. Another assertion checks that reading from a large offset, past the end of the packet, returns -EFAULT. Signed-off-by: YiFei Zhu <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/bpf/9028ccbea4385a620e69c0a104f469ffd655c01e.1591812755.git.zhuyifei@google.com
2020-06-11 | net/filter: Permit reading NET in load_bytes_relative when MAC not set | YiFei Zhu | 1 | -7/+9
Added a check in the switch case on start_header that checks for the existence of the header: if MAC is not set and the caller requests MAC, return -EFAULT. If the caller requests NET, then MAC's existence is completely ignored. There is no function to check the NET header's existence, and as far as cgroup_skb/egress is concerned it should always be set. Removed the check for ptr >= the start of the header, since offset is a bounded unsigned value and the check should always be true. Checking len <= end - mac is redundant with checking ptr + len <= end. Fixes: 3eee1f75f2b9 ("bpf: fix bpf_skb_load_bytes_relative pkt length check") Signed-off-by: YiFei Zhu <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/bpf/76bb820ddb6a95f59a772ecbd8c8a336f646b362.1591812755.git.zhuyifei@google.com
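The check order described above can be sketched in plain C. Everything here is hypothetical (`struct demo_pkt`, `demo_load_bytes_relative()`); it models only the header selection and the end-of-packet bound, not the real bpf_skb_load_bytes_relative() helper or its skb accessors:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

enum demo_hdr { DEMO_HDR_MAC, DEMO_HDR_NET };

/* Hypothetical packet view: header offsets into a linear buffer. */
struct demo_pkt {
	bool     mac_set;     /* false on cgroup_skb/egress */
	uint32_t mac_off;
	uint32_t net_off;
	uint32_t len;         /* end of packet data */
};

/*
 * Return 0 if [start + offset, start + offset + n) lies inside the
 * packet, -EFAULT otherwise. MAC requests fail when no MAC header is
 * set; NET requests ignore the MAC header entirely.
 */
static int demo_load_bytes_relative(const struct demo_pkt *p,
				    enum demo_hdr hdr,
				    uint32_t offset, uint32_t n)
{
	uint64_t start, ptr;

	switch (hdr) {
	case DEMO_HDR_MAC:
		if (!p->mac_set)
			return -EFAULT;
		start = p->mac_off;
		break;
	case DEMO_HDR_NET:
		start = p->net_off;
		break;
	default:
		return -EFAULT;
	}

	/* offset is unsigned, so ptr >= start always holds: only check the end. */
	ptr = start + offset;
	if (n == 0 || ptr + n > p->len)
		return -EFAULT;
	return 0;
}
```

On an egress-like packet (`mac_set = false`), a MAC read fails, a NET read inside the packet succeeds, and a NET read at a large offset fails, matching the three cases the selftest in the previous commit asserts.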
2020-06-11 | x86/mce/dev-mcelog: Fix -Wstringop-truncation warning about strncpy() | Tony Luck | 1 | -1/+1
The kbuild test robot reported this warning:
    arch/x86/kernel/cpu/mce/dev-mcelog.c: In function 'dev_mcelog_init_device':
    arch/x86/kernel/cpu/mce/dev-mcelog.c:346:2: warning: 'strncpy' output \
    truncated before terminating nul copying 12 bytes from a string of the \
    same length [-Wstringop-truncation]
This is accurate, but I don't care that the trailing NUL character isn't copied. The string being copied is just a magic number signature so that crash dump tools can be sure they are decoding the right blob of memory.
Use memcpy() instead of strncpy().
Fixes: d8ecca4043f2 ("x86/mce/dev-mcelog: Dynamically allocate space for machine check records") Reported-by: kbuild test robot <[email protected]> Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
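A sketch of a fixed-size magic field that is deliberately not NUL-terminated. The 12-byte signature string and the struct layout are assumptions for illustration, not the real mcelog definitions:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical 12-byte magic: it exactly fills the field, so no NUL fits. */
#define DEMO_SIGNATURE     "MACHINECHECK"
#define DEMO_SIGNATURE_LEN 12

struct demo_log_header {
	char signature[DEMO_SIGNATURE_LEN];  /* raw bytes, not a C string */
	unsigned int len;
};

static void demo_init_header(struct demo_log_header *h)
{
	/*
	 * memcpy() states the intent: copy exactly the signature bytes.
	 * strncpy() would copy the same bytes here but trips
	 * -Wstringop-truncation because the terminating NUL is dropped.
	 */
	memcpy(h->signature, DEMO_SIGNATURE, DEMO_SIGNATURE_LEN);
	h->len = 0;
}

/* A dump tool compares raw bytes and must never use strcmp() here. */
static int demo_header_valid(const struct demo_log_header *h)
{
	return memcmp(h->signature, DEMO_SIGNATURE, DEMO_SIGNATURE_LEN) == 0;
}
```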
2020-06-11 | x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned | Tony Luck | 4 | -12/+31
An interesting thing happened when a guest Linux instance took a machine check. The VMM unmapped the bad page from guest physical space and passed the machine check to the guest. Linux took all the normal actions to offline the page from the process that was using it. But then guest Linux crashed because it said there was a second machine check inside the kernel with this stack trace:
    do_memory_failure
    set_mce_nospec
    set_memory_uc
    _set_memory_uc
    change_page_attr_set_clr
    cpa_flush
    clflush_cache_range_opt
This was odd, because a CLFLUSH instruction shouldn't raise a machine check (it isn't consuming the data). Further investigation showed that the VMM had passed in another machine check because it appeared that the guest was accessing the bad page.
The fix is to check the scope of the poison by checking the MCi_MISC register. If the entire page is affected, then unmap the page. If only part of the page is affected, then mark the page as uncacheable.
This assumes that VMMs will do the logical thing and pass in the "whole page scope" via the MCi_MISC register (since they unmapped the entire page).
[ bp: Adjust to x86/entry changes. ]
Fixes: 284ce4011ba6 ("x86/memory_failure: Introduce {set, clear}_mce_nospec()") Reported-by: Jue Wang <[email protected]> Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Jue Wang <[email protected]> Cc: <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
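The scope test can be sketched as follows. It assumes, per the Intel SDM, that bits 5:0 of MCi_MISC give the least significant valid bit of the error address, so a value of at least PAGE_SHIFT (12 for 4 KiB pages) means the whole page is poisoned. The function and enum names are hypothetical, the MISCV/ADDRV validity bits are not modeled, and this is not the kernel's memory-failure code:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12  /* 4 KiB pages */

/* Bits 5:0 of MCi_MISC: least significant valid bit of the error address. */
static unsigned int misc_addr_lsb(uint64_t misc)
{
	return (unsigned int)(misc & 0x3f);
}

enum demo_action { DEMO_MARK_UC, DEMO_UNMAP_PAGE };

/*
 * If the reported granularity covers at least a whole page, every byte
 * of the page is suspect: unmap it. Otherwise only part of the page is
 * poisoned, so keep it mapped but uncacheable.
 */
static enum demo_action poison_action(uint64_t misc)
{
	return misc_addr_lsb(misc) >= DEMO_PAGE_SHIFT ? DEMO_UNMAP_PAGE
						      : DEMO_MARK_UC;
}
```

A value of 6 (a 64-byte cache line) leads to marking the page uncacheable, while 12 leads to unmapping it.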
2020-06-11 | Merge branch 'x86/entry' into ras/core | Thomas Gleixner | 14668 | -299542/+837028
to fixup conflicts in arch/x86/kernel/cpu/mce/core.c so MCE specific follow up patches can be applied without creating a horrible merge conflict afterwards.
2020-06-11 | x86/entry: Unbreak __irqentry_text_start/end magic | Thomas Gleixner | 9 | -25/+36
The entry rework moved interrupt entry code from the irqentry to the noinstr section, which made the irqentry section empty. This breaks boundary checks which rely on the __irqentry_text_start/end markers to find out whether a function in a stack trace is interrupt/exception entry code. This affects the function graph tracer and filter_irq_stacks().
As the IDT entry points are all emitted sequentially, this is rather simple to unbreak by injecting __irqentry_text_start/end as global labels.
To make this work correctly:
  - Remove the IRQENTRY_TEXT section from the x86 linker script
  - Define __irqentry so it breaks the build if it's used
  - Adjust the entry mirroring in PTI
  - Remove the redundant kprobes and unwinder bound checks
Reported-by: Qian Cai <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
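The boundary check that broke is a half-open range test against the two markers. A sketch with a static array standing in for the section (the function names are hypothetical); when the section is empty, start equals end and no address can match, which is why the checks stopped recognizing entry code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open test: the end marker points just past the last byte. */
static bool in_text_range(uintptr_t addr, uintptr_t start, uintptr_t end)
{
	return addr >= start && addr < end;
}

/* Hypothetical stand-in for the entry code placed between the markers. */
static const unsigned char demo_irqentry[64];

static bool demo_in_irqentry_text(const void *ip)
{
	uintptr_t start = (uintptr_t)&demo_irqentry[0];
	uintptr_t end = (uintptr_t)&demo_irqentry[sizeof(demo_irqentry)];

	return in_text_range((uintptr_t)ip, start, end);
}
```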
2020-06-11 | x86/entry: __always_inline CR2 for noinstr | Peter Zijlstra | 3 | -6/+6
    vmlinux.o: warning: objtool: exc_page_fault()+0x9: call to read_cr2() leaves .noinstr.text section
    vmlinux.o: warning: objtool: exc_page_fault()+0x24: call to prefetchw() leaves .noinstr.text section
    vmlinux.o: warning: objtool: exc_page_fault()+0x21: call to kvm_handle_async_pf.isra.0() leaves .noinstr.text section
    vmlinux.o: warning: objtool: exc_nmi()+0x1cc: call to write_cr2() leaves .noinstr.text section
Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
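The fix in this and the following noinstr commits is to force inlining, so a helper's body is emitted inside its caller instead of being reached through a call that objtool flags. A minimal sketch with a hypothetical helper, using the GCC/Clang `always_inline` attribute that the kernel's `__always_inline` expands to:

```c
#include <assert.h>

/*
 * Hypothetical helper. Plain `static inline` is only a hint: the
 * compiler may still emit an out-of-line copy (often with a suffix such
 * as .isra.0 or .constprop.0) and call it. always_inline forces the body
 * into every caller, so no call leaves the caller's section.
 */
static inline __attribute__((always_inline))
unsigned long demo_read_reg(const unsigned long *reg)
{
	return *reg;
}

static unsigned long demo_handler(const unsigned long *reg)
{
	/* After forced inlining this is a plain load, with no call. */
	return demo_read_reg(reg) + 1;
}
```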
2020-06-11 | lockdep: __always_inline more for noinstr | Peter Zijlstra | 2 | -3/+3
    vmlinux.o: warning: objtool: debug_locks_off()+0xd: call to __debug_locks_off() leaves .noinstr.text section
    vmlinux.o: warning: objtool: match_held_lock()+0x6a: call to look_up_lock_class.isra.0() leaves .noinstr.text section
    vmlinux.o: warning: objtool: lock_is_held_type()+0x90: call to lockdep_recursion_finish() leaves .noinstr.text section
Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-06-11 | x86/entry: Re-order #DB handler to avoid *SAN instrumentation | Peter Zijlstra | 1 | -28/+27
    vmlinux.o: warning: objtool: exc_debug()+0xbb: call to clear_ti_thread_flag.constprop.0() leaves .noinstr.text section
    vmlinux.o: warning: objtool: noist_exc_debug()+0x55: call to clear_ti_thread_flag.constprop.0() leaves .noinstr.text section
Rework things so that handle_debug() loses the noinstr and move the clear_thread_flag() into that. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-06-11 | x86/entry: __always_inline arch_atomic_* for noinstr | Peter Zijlstra | 1 | -7/+7
    vmlinux.o: warning: objtool: rcu_dynticks_eqs_exit()+0x33: call to arch_atomic_and.constprop.0() leaves .noinstr.text section
Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-06-11 | x86/entry: __always_inline irqflags for noinstr | Peter Zijlstra | 1 | -10/+10
    vmlinux.o: warning: objtool: lockdep_hardirqs_on()+0x65: call to arch_local_save_flags() leaves .noinstr.text section
    vmlinux.o: warning: objtool: lockdep_hardirqs_off()+0x5d: call to arch_local_save_flags() leaves .noinstr.text section
    vmlinux.o: warning: objtool: lock_is_held_type()+0x35: call to arch_local_irq_save() leaves .noinstr.text section
    vmlinux.o: warning: objtool: check_preemption_disabled()+0x31: call to arch_local_save_flags() leaves .noinstr.text section
    vmlinux.o: warning: objtool: check_preemption_disabled()+0x33: call to arch_irqs_disabled_flags() leaves .noinstr.text section
    vmlinux.o: warning: objtool: lock_is_held_type()+0x2f: call to native_irq_disable() leaves .noinstr.text section
Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-06-11 | x86/entry: __always_inline debugreg for noinstr | Peter Zijlstra | 1 | -3/+3
    vmlinux.o: warning: objtool: exc_debug()+0x21: call to native_get_debugreg() leaves .noinstr.text section
Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-06-11 | x86/idt: Consolidate idt functionality | Thomas Gleixner | 3 | -52/+44
  - Move load_current_idt() out of line and replace the hideous comment with a lockdep assert. This allows idt_table and idt_descr to be made static.
  - Mark idt_table read only after the IDT initialization is complete.
  - Shuffle code around to consolidate the #ifdef sections into one.
  - Adapt the F00F bug code.
Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-06-11 | x86/idt: Cleanup trap_init() | Thomas Gleixner | 2 | -9/+18
No point in having all the IDT cruft in trap_init(). Move it into the IDT code and fixup the comments. Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-06-11 | x86/idt: Use proper constants for table size | Thomas Gleixner | 1 | -1/+2
Use the actual struct size to calculate the IDT table size instead of hardcoded values. Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
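Deriving the size from the descriptor type rather than a literal can be sketched like this. The names are hypothetical; the layout follows the 16-byte 64-bit mode gate descriptor, and the LIDT limit field holds the table size in bytes minus one:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_IDT_ENTRIES 256  /* x86 has 256 interrupt vectors */

/* Layout of a 64-bit mode IDT gate descriptor: 16 bytes. */
struct demo_gate_desc {
	uint16_t offset_low;
	uint16_t segment;
	uint16_t bits;        /* IST index, type, DPL, present */
	uint16_t offset_middle;
	uint32_t offset_high;
	uint32_t reserved;
};

/* Derive the size from the type instead of hardcoding 4096. */
#define DEMO_IDT_TABLE_SIZE (DEMO_IDT_ENTRIES * sizeof(struct demo_gate_desc))

struct demo_desc_ptr {
	uint16_t size;        /* limit: table size in bytes minus one */
	uint64_t address;
};

static struct demo_desc_ptr demo_idt_descr(const struct demo_gate_desc *table)
{
	struct demo_desc_ptr d = {
		.size    = (uint16_t)(DEMO_IDT_TABLE_SIZE - 1),
		.address = (uint64_t)(uintptr_t)table,
	};
	return d;
}
```

If the descriptor layout ever changed, the table size and the limit would follow automatically.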
2020-06-11 | x86/idt: Add comments about early #PF handling | Thomas Gleixner | 1 | -2/+8
The difference between 32 and 64 bit vs. early #PF handling is not documented. Replace the FIXME at idt_setup_early_pf() with proper comments. Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-06-11 | x86/idt: Mark init only functions __init | Thomas Gleixner | 1 | -2/+2
Since 8175cfbbbfcb ("x86/idt: Remove update_intr_gate()") set_intr_gate() and idt_setup_from_table() are only called from __init functions. Mark them as well. Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-06-11 | x86/entry: Rename trace_hardirqs_off_prepare() | Peter Zijlstra | 6 | -14/+14
The typical pattern for trace_hardirqs_off_prepare() is:
    ENTRY
      lockdep_hardirqs_off(); // because hardware
      ... do entry magic
      instrumentation_begin();
      trace_hardirqs_off_prepare();
      ... do actual work
      trace_hardirqs_on_prepare();
      lockdep_hardirqs_on_prepare();
      instrumentation_end();
      ... do exit magic
      lockdep_hardirqs_on();
which shows that it's named wrong; rename it to trace_hardirqs_off_finish(), as it concludes the hardirq_off transition.
Also, given that the above is the only correct order, make the traditional all-in-one trace_hardirqs_off() follow suit.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-06-11 | x86/entry: Clarify irq_{enter,exit}_rcu() | Peter Zijlstra | 2 | -9/+14
Because:
    irq_enter_rcu() includes lockdep_hardirq_enter()
    irq_exit_rcu() does *NOT* include lockdep_hardirq_exit()
Which resulted in two 'stray' lockdep_hardirq_exit() calls in idtentry.h, and me spending a long time trying to find the matching enter calls.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-06-11 | x86/entry: Remove DBn stacks | Peter Zijlstra | 5 | -35/+5
#DB itself, as well as all other IST users (NMI, #MC), now clears DR7 on entry. Combined with not allowing breakpoints on entry/noinstr/NOKPROBE text and no single-stepping (EFLAGS.TF) inside the #DB handler, this should guarantee that #DB never nests. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-06-11 | x86/entry: Remove debug IDT frobbing | Peter Zijlstra | 5 | -108/+1
This is all unused now. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-06-11 | x86/entry: Optimize local_db_save() for virt | Peter Zijlstra | 3 | -6/+27
Because DRn access is 'difficult' with virt, but the DR7 read is cheaper than a cacheline miss on native, add a virt-specific fast path to local_db_save() that avoids touching DRn entirely when breakpoints are not in use. Suggested-by: Andy Lutomirski <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
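A simulated sketch of the save/restore pair with the virt fast path described above. DR7 is modeled as a plain variable with an access counter, and all names are hypothetical; the mask for bit 10 reflects that this reserved DR7 bit always reads as 1, so it alone does not mean a breakpoint is armed:

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated state; real code reads/writes DR7 with special instructions. */
static unsigned long demo_dr7;
static int demo_dr7_accesses;       /* counts the "expensive" accesses */
static bool demo_is_guest = true;   /* running under a hypervisor */
static bool demo_breakpoints_active;

static unsigned long demo_read_dr7(void)    { demo_dr7_accesses++; return demo_dr7; }
static void demo_write_dr7(unsigned long v) { demo_dr7_accesses++; demo_dr7 = v; }

#define DEMO_DR7_FIXED_1 0x400UL  /* DR7 bit 10 always reads as 1 */

/* Disable breakpoints on entry; return the old DR7 for the restore. */
static unsigned long demo_local_db_save(void)
{
	unsigned long dr7;

	/* Virt fast path: no breakpoints installed, so don't touch DR7. */
	if (demo_is_guest && !demo_breakpoints_active)
		return 0;

	dr7 = demo_read_dr7() & ~DEMO_DR7_FIXED_1;
	if (dr7)
		demo_write_dr7(0);
	return dr7;
}

static void demo_local_db_restore(unsigned long dr7)
{
	if (dr7)
		demo_write_dr7(dr7);
}
```

In a guest with no breakpoints, a save returns 0 without a single DR7 access; once a breakpoint is armed, the old value is saved, DR7 is cleared, and the restore writes it back.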
2020-06-11 | x86/entry, mce: Disallow #DB during #MC | Peter Zijlstra | 1 | -0/+12
#MC is fragile as heck, don't tempt fate. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-06-11 | x86/entry, nmi: Disable #DB | Peter Zijlstra | 1 | -52/+3
Instead of playing stupid games with IST stacks, fully disallow #DB during NMIs. There is absolutely no reason to allow them, and killing this saves a heap of trouble. #DB is already forbidden on noinstr and CEA, so there can't be a #DB before this. Disabling it right after nmi_enter() ensures that the full NMI code is protected. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-06-11 | x86/entry: Introduce local_db_{save,restore}() | Peter Zijlstra | 2 | -16/+32
In order to allow other exceptions than #DB to disable breakpoints, provide common helpers. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-06-11 | x86/hw_breakpoint: Prevent data breakpoints on user_pcid_flush_mask | Lai Jiangshan | 1 | -0/+11
The per-CPU user_pcid_flush_mask is used in the low level entry code. A data breakpoint can cause #DB recursion. Protect the full cpu_tlbstate structure for simplicity. Signed-off-by: Lai Jiangshan <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected]
2020-06-11 | x86/hw_breakpoint: Prevent data breakpoints on per_cpu cpu_tss_rw | Lai Jiangshan | 1 | -0/+9
cpu_tss_rw is not directly referenced by hardware, but cpu_tss_rw is accessed in CPU entry code, especially when #DB shifts its stacks. If a data breakpoint would be set on cpu_tss_rw.x86_tss.ist[IST_INDEX_DB], it would cause recursive #DB ending up in a double fault. Add it to the list of protected items. Signed-off-by: Lai Jiangshan <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected]
2020-06-11 | x86/hw_breakpoint: Prevent data breakpoints on direct GDT | Lai Jiangshan | 1 | -8/+22
A data breakpoint on the GDT can be fatal and must be avoided. The GDT in the CPU entry area is already protected, but not the direct GDT. Add the necessary protection. Signed-off-by: Lai Jiangshan <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected]
2020-06-11 | x86/hw_breakpoint: Add within_area() to check data breakpoints | Lai Jiangshan | 1 | -2/+11
Add a within_area() helper to check whether the data breakpoints overlap with cpu_entry_area. It will be used to completely prevent data breakpoints on the GDT, IDT, or TSS. Signed-off-by: Lai Jiangshan <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected]
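The overlap test behind such a helper can be sketched as a half-open interval check. The function names and signature here are assumptions for illustration, not the kernel's:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * True if [addr, end) overlaps [base, base + size). `end` is exclusive:
 * addr plus the length of the watched range.
 */
static bool demo_within_area(unsigned long addr, unsigned long end,
			     unsigned long base, unsigned long size)
{
	return end > base && addr < base + size;
}

/* Reject a breakpoint of bp_len bytes at bp_addr that touches the area. */
static bool demo_bp_allowed(unsigned long bp_addr, unsigned long bp_len,
			    unsigned long area, unsigned long area_size)
{
	return !demo_within_area(bp_addr, bp_addr + bp_len, area, area_size);
}
```

Any partial overlap counts: an 8-byte watch that straddles the start of the protected area is rejected, while one ending exactly at the area's base is allowed.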
2020-06-11 | xen: Move xen_setup_callback_vector() definition to include/xen/hvm.h | Vitaly Kuznetsov | 4 | -1/+5
Kbuild test robot reports the following problem on ARM:
    for 'xen_setup_callback_vector' [-Wmissing-prototypes]
     1664 | void xen_setup_callback_vector(void) {}
          |      ^~~~~~~~~~~~~~~~~~~~~~~~~
The problem is that xen_setup_callback_vector is an x86-only thing: its declaration is present in arch/x86/xen/xen-ops.h but not on ARM. In events_base.c there is a stub for !CONFIG_XEN_PVHVM, but it is not declared as 'static'.
On x86 the situation is hardly better: drivers/xen/events/events_base.c doesn't include 'xen-ops.h' from arch/x86/xen/; it includes its namesake from include/xen/, which also results in a 'no previous prototype' warning.
Currently, xen_setup_callback_vector() has two call sites: one in drivers/xen/events_base.c and another in arch/x86/xen/suspend_hvm.c. The former is placed under #ifdef CONFIG_X86 and the latter is only compiled in when CONFIG_XEN_PVHVM is set.
Resolve the issue by moving the xen_setup_callback_vector() declaration to the arch-neutral 'include/xen/hvm.h', as the implementation lives in the arch-neutral drivers/xen/events/events_base.c.
Reported-by: kbuild test robot <[email protected]> Signed-off-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Juergen Gross <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-06-11 | x86/entry: Remove the TRACE_IRQS cruft | Thomas Gleixner | 3 | -31/+1
No more users. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Andy Lutomirski <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-06-11 | x86/entry: Move paranoid irq tracing out of ASM code | Thomas Gleixner | 4 | -13/+17
The last step to remove the irq tracing cruft from ASM. Ignore #DF as the machine is going to die anyway. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Andy Lutomirski <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-06-11 | x86/entry/64: Remove TRACE_IRQS_*_DEBUG | Thomas Gleixner | 1 | -45/+3
Since INT3/#BP no longer runs on an IST, this workaround is no longer required. Tested by running lockdep+ftrace as described in the initial commit: 5963e317b1e9 ("ftrace/x86: Do not change stacks in DEBUG when calling lockdep") Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Steven Rostedt (VMware) <[email protected]> Acked-by: Andy Lutomirski <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-06-11 | x86/entry/32: Remove redundant irq disable code | Thomas Gleixner | 1 | -76/+0
All exceptions/interrupts return with interrupts disabled now. No point in doing this in ASM again. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Andy Lutomirski <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-06-11 | x86/entry: Make enter_from_user_mode() static | Thomas Gleixner | 1 | -1/+1
The ASM users are gone. All callers are local. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Andy Lutomirski <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-06-11 | x86/entry/64: Remove IRQ stack switching ASM | Thomas Gleixner | 1 | -96/+0
No more users. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Andy Lutomirski <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-06-11 | x86/entry: Remove the apic/BUILD interrupt leftovers | Thomas Gleixner | 5 | -196/+4
Remove all the code which was there to emit the system vector stubs. All users are gone. Move the now unused GET_CR2_INTO macro muck to head_64.S where the last user is. Fixup the eye hurting comment there while at it. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Andy Lutomirski <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-06-11 | x86/entry: Convert reschedule interrupt to IDTENTRY_SYSVEC_SIMPLE | Thomas Gleixner | 9 | -63/+7
The scheduler IPI does not need the full interrupt entry handling logic when the entry is from kernel mode. Use IDTENTRY_SYSVEC_SIMPLE and spare all the overhead. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Andy Lutomirski <[email protected]> Link: https://lore.kernel.org/r/[email protected]