aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-02-21KVM: SVM: Fix potential memory leak in svm_cpu_init()Miaohe Lin1-7/+6
When kmalloc memory for sd->sev_vmcbs failed, we forget to free the page held by sd->save_area. Also get rid of the var r as '-ENOMEM' is actually the only possible outcome here. Reviewed-by: Liran Alon <[email protected]> Reviewed-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Miaohe Lin <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-02-21KVM: apic: avoid calculating pending eoi from an uninitialized valMiaohe Lin1-1/+3
When pv_eoi_get_user() fails, 'val' may remain uninitialized and the return value of pv_eoi_get_pending() becomes random. Fix the issue by initializing the variable. Reviewed-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Miaohe Lin <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
2020-02-21KVM: nVMX: clear PIN_BASED_POSTED_INTR from nested pinbased_ctls only when ↵Vitaly Kuznetsov4-11/+8
apicv is globally disabled When apicv is disabled on a vCPU (e.g. by enabling KVM_CAP_HYPERV_SYNIC*), nothing happens to VMX MSRs on the already existing vCPUs, however, all new ones are created with PIN_BASED_POSTED_INTR filtered out. This is very confusing and results in the following picture inside the guest: $ rdmsr -ax 0x48d ff00000016 7f00000016 7f00000016 7f00000016 This is observed with QEMU and 4-vCPU guest: QEMU creates vCPU0, does KVM_CAP_HYPERV_SYNIC2 and then creates the remaining three. L1 hypervisor may only check CPU0's controls to find out what features are available and it will be very confused later. Switch to setting PIN_BASED_POSTED_INTR control based on global 'enable_apicv' setting. Signed-off-by: Vitaly Kuznetsov <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
2020-02-21KVM: nVMX: handle nested posted interrupts when apicv is disabled for L1Vitaly Kuznetsov4-10/+17
Even when APICv is disabled for L1 it can (and, actually, is) still available for L2, this means we need to always call vmx_deliver_nested_posted_interrupt() when attempting an interrupt delivery. Suggested-by: Paolo Bonzini <[email protected]> Signed-off-by: Vitaly Kuznetsov <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
2020-02-21kvm: x86: svm: Fix NULL pointer dereference when AVIC not enabledSuravee Suthikulpanit1-0/+3
Launching VM w/ AVIC disabled together with pass-through device results in NULL pointer dereference bug with the following call trace. RIP: 0010:svm_refresh_apicv_exec_ctrl+0x17e/0x1a0 [kvm_amd] Call Trace: kvm_vcpu_update_apicv+0x44/0x60 [kvm] kvm_arch_vcpu_ioctl_run+0x3f4/0x1c80 [kvm] kvm_vcpu_ioctl+0x3d8/0x650 [kvm] do_vfs_ioctl+0xaa/0x660 ? tomoyo_file_ioctl+0x19/0x20 ksys_ioctl+0x67/0x90 __x64_sys_ioctl+0x1a/0x20 do_syscall_64+0x57/0x190 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Investigation shows that this is due to the uninitialized usage of struct vapu_svm.ir_list in the svm_set_pi_irte_mode(), which is called from svm_refresh_apicv_exec_ctrl(). The ir_list is initialized only if AVIC is enabled. So, fixes by adding a check if AVIC is enabled in the svm_refresh_apicv_exec_ctrl(). Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206579 Fixes: 8937d762396d ("kvm: x86: svm: Add support to (de)activate posted interrupts.") Signed-off-by: Suravee Suthikulpanit <[email protected]> Tested-by: Alex Williamson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-02-21KVM: VMX: Add VMX_FEATURE_USR_WAIT_PAUSEXiaoyao Li2-1/+2
Commit 159348784ff0 ("x86/vmx: Introduce VMX_FEATURES_*") missed bit 26 (enable user wait and pause) of Secondary Processor-based VM-Execution Controls. Add VMX_FEATURE_USR_WAIT_PAUSE flag so that it shows up in /proc/cpuinfo, and use it to define SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE to make them uniform. Signed-off-by: Xiaoyao Li <[email protected]> Reviewed-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-02-21KVM: nVMX: Hold KVM's srcu lock when syncing vmcs12->shadowwanpeng li1-2/+9
For the duration of mapping eVMCS, it derefences ->memslots without holding ->srcu or ->slots_lock when accessing hv assist page. This patch fixes it by moving nested_sync_vmcs12_to_shadow to prepare_guest_switch, where the SRCU is already taken. It can be reproduced by running kvm's evmcs_test selftest. ============================= warning: suspicious rcu usage 5.6.0-rc1+ #53 tainted: g w ioe ----------------------------- ./include/linux/kvm_host.h:623 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by evmcs_test/8507: #0: ffff9ddd156d00d0 (&vcpu->mutex){+.+.}, at: kvm_vcpu_ioctl+0x85/0x680 [kvm] stack backtrace: cpu: 6 pid: 8507 comm: evmcs_test tainted: g w ioe 5.6.0-rc1+ #53 hardware name: dell inc. optiplex 7040/0jctf8, bios 1.4.9 09/12/2016 call trace: dump_stack+0x68/0x9b kvm_read_guest_cached+0x11d/0x150 [kvm] kvm_hv_get_assist_page+0x33/0x40 [kvm] nested_enlightened_vmentry+0x2c/0x60 [kvm_intel] nested_vmx_handle_enlightened_vmptrld.part.52+0x32/0x1c0 [kvm_intel] nested_sync_vmcs12_to_shadow+0x439/0x680 [kvm_intel] vmx_vcpu_run+0x67a/0xe60 [kvm_intel] vcpu_enter_guest+0x35e/0x1bc0 [kvm] kvm_arch_vcpu_ioctl_run+0x40b/0x670 [kvm] kvm_vcpu_ioctl+0x370/0x680 [kvm] ksys_ioctl+0x235/0x850 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x77/0x780 entry_syscall_64_after_hwframe+0x49/0xbe Signed-off-by: Wanpeng Li <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-02-21KVM: x86: don't notify userspace IOAPIC on edge-triggered interrupt EOIMiaohe Lin1-1/+1
Commit 13db77347db1 ("KVM: x86: don't notify userspace IOAPIC on edge EOI") said, edge-triggered interrupts don't set a bit in TMR, which means that IOAPIC isn't notified on EOI. And var level indicates level-triggered interrupt. But commit 3159d36ad799 ("KVM: x86: use generic function for MSI parsing") replace var level with irq.level by mistake. Fix it by changing irq.level to irq.trig_mode. Cc: [email protected] Fixes: 3159d36ad799 ("KVM: x86: use generic function for MSI parsing") Signed-off-by: Miaohe Lin <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-02-21Merge branch 'nvme-5.6-rc3' of git://git.infradead.org/nvme into block-5.6Jens Axboe3-2/+16
Pull NVMe fixes from Keith. * 'nvme-5.6-rc3' of git://git.infradead.org/nvme: nvme-multipath: Fix memory leak with ana_log_buf nvme: Fix uninitialized-variable warning nvme-pci: Use single IRQ vector for old Apple models nvme/pci: Add sleep quirk for Samsung and Toshiba drives
2020-02-21io_uring: prevent sq_thread from spinning when it should stopStefano Garzarella1-12/+12
This patch drops 'cur_mm' before calling cond_resched(), to prevent the sq_thread from spinning even when the user process is finished. Before this patch, if the user process ended without closing the io_uring fd, the sq_thread continues to spin until the 'sq_thread_idle' timeout ends. In the worst case where the 'sq_thread_idle' parameter is bigger than INT_MAX, the sq_thread will spin forever. Fixes: 6c271ce2f1d5 ("io_uring: add submission polling") Signed-off-by: Stefano Garzarella <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-02-21Btrfs: fix deadlock during fast fsync when logging prealloc extents beyond eofFilipe Manana1-5/+5
While logging the prealloc extents of an inode during a fast fsync we call btrfs_truncate_inode_items(), through btrfs_log_prealloc_extents(), while holding a read lock on a leaf of the inode's root (not the log root, the fs/subvol root), and then that function locks the file range in the inode's iotree. This can lead to a deadlock when: * the fsync is ranged * the file has prealloc extents beyond eof * writeback for a range different from the fsync range starts during the fsync * the size of the file is not sector size aligned Because when finishing an ordered extent we lock first a file range and then try to COW the fs/subvol tree to insert an extent item. The following diagram shows how the deadlock can happen. CPU 1 CPU 2 btrfs_sync_file() --> for range [0, 1MiB) --> inode has a size of 1MiB and has 1 prealloc extent beyond the i_size, starting at offset 4MiB flushes all delalloc for the range [0MiB, 1MiB) and waits for the respective ordered extents to complete --> before task at CPU 1 locks the inode, a write into file range [1MiB, 2MiB + 1KiB) is made --> i_size is updated to 2MiB + 1KiB --> writeback is started for that range, [1MiB, 2MiB + 4KiB) --> end offset rounded up to be sector size aligned btrfs_log_dentry_safe() btrfs_log_inode_parent() btrfs_log_inode() btrfs_log_changed_extents() btrfs_log_prealloc_extents() --> does a search on the inode's root --> holds a read lock on leaf X btrfs_finish_ordered_io() --> locks range [1MiB, 2MiB + 4KiB) --> end offset rounded up to be sector size aligned --> tries to cow leaf X, through insert_reserved_file_extent() --> already locked by the task at CPU 1 btrfs_truncate_inode_items() --> gets an i_size of 2MiB + 1KiB, which is not sector size aligned --> tries to lock file range [2MiB, (u64)-1) --> the start range is rounded down from 2MiB + 1K to 2MiB to be sector size aligned --> but the subrange [2MiB, 2MiB + 4KiB) is already locked by task at CPU 2 which is waiting to get a write lock on leaf X for which we are holding a read lock *** deadlock *** This results in a stack trace like the following, triggered by test case generic/561 from fstests: [ 2779.973608] INFO: task kworker/u8:6:247 blocked for more than 120 seconds. [ 2779.979536] Not tainted 5.6.0-rc2-btrfs-next-53 #1 [ 2779.984503] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 2779.990136] kworker/u8:6 D 0 247 2 0x80004000 [ 2779.990457] Workqueue: btrfs-endio-write btrfs_work_helper [btrfs] [ 2779.990466] Call Trace: [ 2779.990491] ? __schedule+0x384/0xa30 [ 2779.990521] schedule+0x33/0xe0 [ 2779.990616] btrfs_tree_read_lock+0x19e/0x2e0 [btrfs] [ 2779.990632] ? remove_wait_queue+0x60/0x60 [ 2779.990730] btrfs_read_lock_root_node+0x2f/0x40 [btrfs] [ 2779.990782] btrfs_search_slot+0x510/0x1000 [btrfs] [ 2779.990869] btrfs_lookup_file_extent+0x4a/0x70 [btrfs] [ 2779.990944] __btrfs_drop_extents+0x161/0x1060 [btrfs] [ 2779.990987] ? mark_held_locks+0x6d/0xc0 [ 2779.990994] ? __slab_alloc.isra.49+0x99/0x100 [ 2779.991060] ? insert_reserved_file_extent.constprop.19+0x64/0x300 [btrfs] [ 2779.991145] insert_reserved_file_extent.constprop.19+0x97/0x300 [btrfs] [ 2779.991222] ? start_transaction+0xdd/0x5c0 [btrfs] [ 2779.991291] btrfs_finish_ordered_io+0x4f4/0x840 [btrfs] [ 2779.991405] btrfs_work_helper+0xaa/0x720 [btrfs] [ 2779.991432] process_one_work+0x26d/0x6a0 [ 2779.991460] worker_thread+0x4f/0x3e0 [ 2779.991481] ? process_one_work+0x6a0/0x6a0 [ 2779.991489] kthread+0x103/0x140 [ 2779.991499] ? kthread_create_worker_on_cpu+0x70/0x70 [ 2779.991515] ret_from_fork+0x3a/0x50 (...) [ 2780.026211] INFO: task fsstress:17375 blocked for more than 120 seconds. [ 2780.027480] Not tainted 5.6.0-rc2-btrfs-next-53 #1 [ 2780.028482] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 2780.030035] fsstress D 0 17375 17373 0x00004000 [ 2780.030038] Call Trace: [ 2780.030044] ? __schedule+0x384/0xa30 [ 2780.030052] schedule+0x33/0xe0 [ 2780.030075] lock_extent_bits+0x20c/0x320 [btrfs] [ 2780.030094] ? btrfs_truncate_inode_items+0xf4/0x1150 [btrfs] [ 2780.030098] ? rcu_read_lock_sched_held+0x59/0xa0 [ 2780.030102] ? remove_wait_queue+0x60/0x60 [ 2780.030122] btrfs_truncate_inode_items+0x133/0x1150 [btrfs] [ 2780.030151] ? btrfs_set_path_blocking+0xb2/0x160 [btrfs] [ 2780.030165] ? btrfs_search_slot+0x379/0x1000 [btrfs] [ 2780.030195] btrfs_log_changed_extents.isra.8+0x841/0x93e [btrfs] [ 2780.030202] ? do_raw_spin_unlock+0x49/0xc0 [ 2780.030215] ? btrfs_get_num_csums+0x10/0x10 [btrfs] [ 2780.030239] btrfs_log_inode+0xf83/0x1124 [btrfs] [ 2780.030251] ? __mutex_unlock_slowpath+0x45/0x2a0 [ 2780.030275] btrfs_log_inode_parent+0x2a0/0xe40 [btrfs] [ 2780.030282] ? dget_parent+0xa1/0x370 [ 2780.030309] btrfs_log_dentry_safe+0x4a/0x70 [btrfs] [ 2780.030329] btrfs_sync_file+0x3f3/0x490 [btrfs] [ 2780.030339] do_fsync+0x38/0x60 [ 2780.030343] __x64_sys_fdatasync+0x13/0x20 [ 2780.030345] do_syscall_64+0x5c/0x280 [ 2780.030348] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 2780.030356] RIP: 0033:0x7f2d80f6d5f0 [ 2780.030361] Code: Bad RIP value. [ 2780.030362] RSP: 002b:00007ffdba3c8548 EFLAGS: 00000246 ORIG_RAX: 000000000000004b [ 2780.030364] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f2d80f6d5f0 [ 2780.030365] RDX: 00007ffdba3c84b0 RSI: 00007ffdba3c84b0 RDI: 0000000000000003 [ 2780.030367] RBP: 000000000000004a R08: 0000000000000001 R09: 00007ffdba3c855c [ 2780.030368] R10: 0000000000000078 R11: 0000000000000246 R12: 00000000000001f4 [ 2780.030369] R13: 0000000051eb851f R14: 00007ffdba3c85f0 R15: 0000557a49220d90 So fix this by making btrfs_truncate_inode_items() not lock the range in the inode's iotree when the target root is a log root, since it's not needed to lock the range for log roots as the protection from the inode's lock and log_mutex are all that's needed. Fixes: 28553fa992cb28 ("Btrfs: fix race between shrinking truncate and fiemap") CC: [email protected] # 4.4+ Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>
2020-02-21bootconfig: Add append value operator supportMasami Hiramatsu3-7/+34
Add append value operator "+=" support to bootconfig syntax. With this operator, user can add new value to the key as an entry of array instead of overwriting. For example, foo = bar ... foo += baz Then the key "foo" has "bar" and "baz" values as an array. Link: http://lkml.kernel.org/r/158227283195.12842.8310503105963275584.stgit@devnote2 Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-02-21bootconfig: Prohibit re-defining value on same keyMasami Hiramatsu3-6/+24
Currently, bootconfig adds a new value on the existing key to the tail of an array. But this looks a bit confusing because an admin can easily rewrite the original value in the same config file. This rejects the following value re-definition. key = value1 ... key = value2 You should rewrite value1 to value2 in this case. Link: http://lkml.kernel.org/r/158227282199.12842.10110929876059658601.stgit@devnote2 Suggested-by: Steven Rostedt (VMware) <[email protected]> Signed-off-by: Masami Hiramatsu <[email protected]> [ Fixed spelling of arraies to arrays ] Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-02-21nvme-multipath: Fix memory leak with ana_log_bufLogan Gunthorpe1-0/+1
kmemleak reports a memory leak with the ana_log_buf allocated by nvme_mpath_init(): unreferenced object 0xffff888120e94000 (size 8208): comm "nvme", pid 6884, jiffies 4295020435 (age 78786.312s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................ 01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000e2360188>] kmalloc_order+0x97/0xc0 [<0000000079b18dd4>] kmalloc_order_trace+0x24/0x100 [<00000000f50c0406>] __kmalloc+0x24c/0x2d0 [<00000000f31a10b9>] nvme_mpath_init+0x23c/0x2b0 [<000000005802589e>] nvme_init_identify+0x75f/0x1600 [<0000000058ef911b>] nvme_loop_configure_admin_queue+0x26d/0x280 [<00000000673774b9>] nvme_loop_create_ctrl+0x2a7/0x710 [<00000000f1c7a233>] nvmf_dev_write+0xc66/0x10b9 [<000000004199f8d0>] __vfs_write+0x50/0xa0 [<0000000065466fef>] vfs_write+0xf3/0x280 [<00000000b0db9a8b>] ksys_write+0xc6/0x160 [<0000000082156b91>] __x64_sys_write+0x43/0x50 [<00000000c34fbb6d>] do_syscall_64+0x77/0x2f0 [<00000000bbc574c9>] entry_SYSCALL_64_after_hwframe+0x49/0xbe nvme_mpath_init() is called by nvme_init_identify() which is called in multiple places (nvme_reset_work(), nvme_passthru_end(), etc). This means nvme_mpath_init() may be called multiple times before nvme_mpath_uninit() (which is only called on nvme_free_ctrl()). When nvme_mpath_init() is called multiple times, it overwrites the ana_log_buf pointer with a new allocation, thus leaking the previous allocation. To fix this, free ana_log_buf before allocating a new one. Fixes: 0d0b660f214dc490 ("nvme: add ANA support") Cc: <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Logan Gunthorpe <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2020-02-21spi: spidev: Fix CS polarity if GPIO descriptors are usedLukas Wunner1-0/+5
Commit f3186dd87669 ("spi: Optionally use GPIO descriptors for CS GPIOs") amended of_spi_parse_dt() to always set SPI_CS_HIGH for SPI slaves whose Chip Select is defined by a "cs-gpios" devicetree property. This change broke userspace applications which issue an SPI_IOC_WR_MODE ioctl() to an spidev: Chip Select polarity will be incorrect unless the application is changed to set SPI_CS_HIGH. And once changed, it will be incompatible with kernels not containing the commit. Fix by setting SPI_CS_HIGH in spidev_ioctl() (under the same conditions as in of_spi_parse_dt()). Fixes: f3186dd87669 ("spi: Optionally use GPIO descriptors for CS GPIOs") Reported-by: Simon Han <[email protected]> Signed-off-by: Lukas Wunner <[email protected]> Reviewed-by: Linus Walleij <[email protected]> Link: https://lore.kernel.org/r/fca3ba7cdc930cd36854666ceac4fbcf01b89028.1582027457.git.lukas@wunner.de Signed-off-by: Mark Brown <[email protected]> Cc: [email protected] # v5.1+
2020-02-21spi: qup: call spi_qup_pm_resume_runtime before suspendingYuji Sasaki1-4/+7
spi_qup_suspend() will cause synchronous external abort when runtime suspend is enabled and applied, as it tries to access SPI controller register while clock is already disabled in spi_qup_pm_suspend_runtime(). Signed-off-by: Yuji sasaki <[email protected]> Signed-off-by: Vinod Koul <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2020-02-21genirq/irqdomain: Make sure all irq domain flags are distinctZenghui Yu1-1/+1
This was noticed when printing debugfs for MSIs on my ARM64 server. The new dstate IRQD_MSI_NOMASK_QUIRK came out surprisingly while it should only be the x86 stuff for the time being... The new MSI quirk flag uses the same bit as IRQ_DOMAIN_NAME_ALLOCATED which is oddly defined as bit 6 for no good reason. Switch it to the non used bit 1. Fixes: 6f1a4891a592 ("x86/apic/msi: Plug non-maskable MSI affinity race") Signed-off-by: Zenghui Yu <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2020-02-21mac80211: Remove a redundant mutex unlockAndrei Otcheretianski1-5/+1
The below-mentioned commit changed the code to unlock *inside* the function, but previously the unlock was *outside*. It failed to remove the outer unlock, however, leading to double unlock. Fix this. Fixes: 33483a6b88e4 ("mac80211: fix missing unlock on error in ieee80211_mark_sta_auth()") Signed-off-by: Andrei Otcheretianski <[email protected]> Link: https://lore.kernel.org/r/20200221104719.cce4741cf6eb.I671567b185c8a4c2409377e483fd149ce590f56d@changeid [rewrite commit message to better explain what happened] Signed-off-by: Johannes Berg <[email protected]>
2020-02-21cfg80211: check reg_rule for NULL in handle_channel_custom()Johannes Berg1-1/+1
We may end up with a NULL reg_rule after the loop in handle_channel_custom() if the bandwidth didn't fit, check if this is the case and bail out if so. Signed-off-by: Johannes Berg <[email protected]> Link: https://lore.kernel.org/r/20200221104449.3b558a50201c.I4ad3725c4dacaefd2d18d3cc65ba6d18acd5dbfe@changeid Signed-off-by: Johannes Berg <[email protected]>
2020-02-21nl80211: fix potential leak in AP startJohannes Berg1-2/+2
If nl80211_parse_he_obss_pd() fails, we leak the previously allocated ACL memory. Free it in this case. Fixes: 796e90f42b7e ("cfg80211: add support for parsing OBBS_PD attributes") Signed-off-by: Johannes Berg <[email protected]> Link: https://lore.kernel.org/r/20200221104142.835aba4cdd14.I1923b55ba9989c57e13978f91f40bfdc45e60cbd@changeid Signed-off-by: Johannes Berg <[email protected]>
2020-02-21zonefs: fix documentation typos etc.Randy Dunlap1-10/+10
Fix typos, spellos, etc. in zonefs.txt. Signed-off-by: Randy Dunlap <[email protected]> Cc: Damien Le Moal <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Damien Le Moal <[email protected]>
2020-02-21csky: Implement copy_thread_tlsGuo Ren2-3/+5
This is required for clone3 which passes the TLS value through a struct rather than a register. Cc: Amanieu d'Antras <[email protected]> Signed-off-by: Guo Ren <[email protected]>
2020-02-21csky: Add PCI supportMaJun3-1/+39
Add the pci related code for csky arch to support basic pci virtual function, such as qemu virt-pci-9pfs. Signed-off-by: MaJun <[email protected]> Signed-off-by: Guo Ren <[email protected]>
2020-02-21csky: Minimize defconfig to support buildroot config.fragmentMa Jun1-7/+0
Some bsp (eg: buildroot) has defconfig.fragment design to add more configs into the defconfig in linux source code tree. For example, we could put different cpu configs into different defconfig.fragments, but they all use the same defconfig in Linux. Signed-off-by: Ma Jun <[email protected]> Signed-off-by: Guo Ren <[email protected]>
2020-02-21csky: Add setup_initrd check codeGuo Ren2-3/+44
We should give some necessary check for initrd just like other architectures and it seems that setup_initrd() could be a common code for all architectures. Signed-off-by: Guo Ren <[email protected]>
2020-02-21csky: Cleanup old Kconfig optionsKrzysztof Kozlowski2-3/+0
CONFIG_CLKSRC_OF is gone since commit bb0eb050a577 ("clocksource/drivers: Rename CLKSRC_OF to TIMER_OF"). The platform already selects TIMER_OF. CONFIG_HAVE_DMA_API_DEBUG is gone since commit 6e88628d03dd ("dma-debug: remove CONFIG_HAVE_DMA_API_DEBUG"). CONFIG_DEFAULT_DEADLINE is gone since commit f382fb0bcef4 ("block: remove legacy IO schedulers"). Signed-off-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: Guo Ren <[email protected]>
2020-02-21arch/csky: fix some Kconfig typosRandy Dunlap1-1/+1
Fix wording in help text for the CPU_HAS_LDSTEX symbol. Signed-off-by: Randy Dunlap <[email protected]> Signed-off-by: Guo Ren <[email protected]> Signed-off-by: Guo Ren <[email protected]>
2020-02-21csky: Fixup compile warning for three unimplemented syscallsGuo Ren1-0/+3
Implement fstat64, fstatat64, clone3 syscalls to fixup checksyscalls.sh compile warnings. Signed-off-by: Guo Ren <[email protected]>
2020-02-21csky: Remove unused cache implementationGuo Ren1-15/+1
Only for coding convention, these codes are unnecessary for abiv2. Signed-off-by: Guo Ren <[email protected]>
2020-02-21csky: Fixup ftrace modify panicGuo Ren1-0/+2
During ftrace init, linux will replace all function prologues (call_mcout) with nops, but it need flush_dcache and invalidate_icache to make it work. So flush_cache functions couldn't be nested called by ftrace framework. Signed-off-by: Guo Ren <[email protected]>
2020-02-21csky: Add flush_icache_mm to defer flush icache allGuo Ren7-11/+77
Some CPUs don't support icache.va instruction to maintain the whole smp cores' icache. Using icache.all + IPI casue a lot on performace and using defer mechanism could reduce the number of calling icache _flush_all functions. Signed-off-by: Guo Ren <[email protected]>
2020-02-21csky: Optimize abiv2 copy_to_user_page with VM_EXECGuo Ren1-1/+3
Only when vma is for VM_EXEC, we need sync dcache & icache. eg: - gdb ptrace modify user space instruction code area. Add VM_EXEC condition to reduce unnecessary cache flush. The abiv1 cpus' cache are all VIPT, so we still need to deal with dcache aliasing problem. But there is optimized way to use cache color, just like what's done in arch/csky/abiv1/inc/abi/page.h. Signed-off-by: Guo Ren <[email protected]>
2020-02-21csky: Enable defer flush_dcache_page for abiv2 cpus (807/810/860)Guo Ren2-8/+18
Instead of flushing cache per update_mmu_cache() called, we use flush_dcache_page to reduce the frequency of flashing the cache. As abiv2 cpus are all PIPT for icache & dcache, we needn't handle dcache aliasing problem. But their icache can't snoop dcache, so we still need sync_icache_dcache in update_mmu_cache(). Signed-off-by: Guo Ren <[email protected]>
2020-02-21csky: Remove unnecessary flush_icache_* implementationGuo Ren3-37/+2
The abiv2 CPUs are all PIPT cache, so there is no need to implement flush_icache_page function. The function flush_icache_user_range hasn't been used, so just remove it. The function flush_cache_range is not necessary for PIPT cache when tlb mapping changed. Signed-off-by: Guo Ren <[email protected]>
2020-02-21csky: Support icache flush without specific instructionsGuo Ren4-8/+31
Some CPUs don't support icache specific instructions to flush icache lines in broadcast way. We use cpu control registers to flush local icache and use IPI to notify other cores. Signed-off-by: Guo Ren <[email protected]>
2020-02-21csky/Kconfig: Add Kconfig.platforms to support some driversGuo Ren2-0/+11
Such as snps,dw-apb-ictl Signed-off-by: Guo Ren <[email protected]>
2020-02-21csky/smp: Fixup boot failed when CONFIG_SMPGuo Ren1-1/+1
If we use a non-ipi-support interrupt controller, it will cause panic here. We should let cpu up and work with CONFIG_SMP, when we use a non-ipi intc. Signed-off-by: Guo Ren <[email protected]>
2020-02-21csky: Set regs->usp to kernel sp, when the exception is from kernelGuo Ren3-7/+31
In the past, we didn't care about kernel sp when saving pt_reg. But in some cases, we still need pt_reg->usp to represent the kernel stack before enter exception. For cmpxhg in atomic.S, we need save and restore usp for above. Signed-off-by: Guo Ren <[email protected]>
2020-02-21csky/mm: Fixup export invalid_pte_table symbolGuo Ren1-0/+1
There is no present bit in csky pmd hardware, so we need to prepare invalid_pte_table for empty pmd entry and the functions (pmd_none & pmd_present) in pgtable.h need invalid_pte_talbe to get result. If a module use these functions, we need export the symbol for it. Signed-off-by: Guo Ren <[email protected]> Cc: Mo Qihui <[email protected]> Cc: Zhange Jian <[email protected]>
2020-02-21csky: Separate fixaddr_init from highmemGuo Ren7-66/+61
After fixaddr_init is separated from highmem, we could use tcm without highmem selected. (610 (abiv1) don't support highmem, but it could use tcm now.) Signed-off-by: Guo Ren <[email protected]>
2020-02-21csky: Tightly-Coupled Memory or Sram supportGuo Ren7-1/+304
The implementation are not only used by TCM but also used by sram on SOC bus. It follow existed linux tcm software interface, so that old tcm application codes could be re-used directly. Software interface list in asm/tcm.h: - Variables/Const: __tcmdata, __tcmconst - Functions: __tcmfunc, __tcmlocalfunc - Malloc/Free: tcm_alloc, tcm_free In linux menuconfig: - Choose a TCM contain instrctions + data or separated in ITCM/DTCM. - Determine TCM_BASE (DTCM_BASE) in phyiscal address. - Determine size of TCM or ITCM(DTCM) in page counts. Here is hello tcm example from Documentation/arm/tcm.rst which could be directly used: /* Uninitialized data */ static u32 __tcmdata tcmvar; /* Initialized data */ static u32 __tcmdata tcmassigned = 0x2BADBABEU; /* Constant */ static const u32 __tcmconst tcmconst = 0xCAFEBABEU; static void __tcmlocalfunc tcm_to_tcm(void) { int i; for (i = 0; i < 100; i++) tcmvar ++; } static void __tcmfunc hello_tcm(void) { /* Some abstract code that runs in ITCM */ int i; for (i = 0; i < 100; i++) { tcmvar ++; } tcm_to_tcm(); } static void __init test_tcm(void) { u32 *tcmem; int i; hello_tcm(); printk("Hello TCM executed from ITCM RAM\n"); printk("TCM variable from testrun: %u @ %p\n", tcmvar, &tcmvar); tcmvar = 0xDEADBEEFU; printk("TCM variable: 0x%x @ %p\n", tcmvar, &tcmvar); printk("TCM assigned variable: 0x%x @ %p\n", tcmassigned, &tcmassigned); printk("TCM constant: 0x%x @ %p\n", tcmconst, &tcmconst); /* Allocate some TCM memory from the pool */ tcmem = tcm_alloc(20); if (tcmem) { printk("TCM Allocated 20 bytes of TCM @ %p\n", tcmem); tcmem[0] = 0xDEADBEEFU; tcmem[1] = 0x2BADBABEU; tcmem[2] = 0xCAFEBABEU; tcmem[3] = 0xDEADBEEFU; tcmem[4] = 0x2BADBABEU; for (i = 0; i < 5; i++) printk("TCM tcmem[%d] = %08x\n", i, tcmem[i]); tcm_free(tcmem, 20); } } TODO: - Separate fixup mapping from highmem - Support abiv1 Signed-off-by: Guo Ren <[email protected]>
2020-02-21csky: Initial stack protector supportMao Han3-0/+36
This is a basic -fstack-protector support without per-task canary switching. The protector will report something like when stack corruption is detected: It's tested with strcpy local array overflow in sys_kill and get: stack-protector: Kernel stack is corrupted in: sys_kill+0x23c/0x23c TODO: - Support task switch for different cannary Signed-off-by: Mao Han <[email protected]> Signed-off-by: Guo Ren <[email protected]>
2020-02-21MAINTAINERS: csky: Add mailing list for cskyGuo Ren1-0/+1
Add mailing list and it's convenient for maintain C-SKY subsystem. Signed-off-by: Guo Ren <[email protected]>
2020-02-21ext4: fix potential race between s_group_info online resizing and accessSuraj Jitindar Singh2-21/+39
During an online resize an array of pointers to s_group_info gets replaced so it can get enlarged. If there is a concurrent access to the array in ext4_get_group_info() and this memory has been reused then this can lead to an invalid memory access. Link: https://bugzilla.kernel.org/show_bug.cgi?id=206443 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Suraj Jitindar Singh <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]> Reviewed-by: Balbir Singh <[email protected]> Cc: [email protected]
2020-02-21ext4: fix potential race between online resizing and write operationsTheodore Ts'o4-25/+97
During an online resize an array of pointers to buffer heads gets replaced so it can get enlarged. If there is a racing block allocation or deallocation which uses the old array, and the old array has gotten reused this can lead to a GPF or some other random kernel memory getting modified. Link: https://bugzilla.kernel.org/show_bug.cgi?id=206443 Link: https://lore.kernel.org/r/[email protected] Reported-by: Suraj Jitindar Singh <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]> Cc: [email protected]
2020-02-21Merge tag 'drm-intel-fixes-2020-02-20' of ↵Dave Airlie19-108/+168
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes drm/i915 fixes for v5.6-rc3: - Workaround missing Display Stream Compression (DSC) state readout by forcing modeset when its enabled at probe - Fix EHL port clock voltage level requirements - Fix queuing retire workers on the virtual engine - Fix use of partially initialized waiters - Stop using drm_pci_alloc/drm_pci/free - Fix rewind of RING_TAIL by forcing a context reload - Fix locking on resetting ring->head - Propagate our bug filing URL change to stable kernels Signed-off-by: Dave Airlie <[email protected]> From: Jani Nikula <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-02-21Merge tag 'drm-misc-fixes-2020-02-20' of ↵Dave Airlie10-22/+45
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes drm-misc-fixes for v5.6-rc3: - Fix dt binding for sunxi. - Allow only 1 rotation argument, and allow 0 rotation in video cmdline. - Small compiler warning fix for panfrost. - Fix when using performance counters in panfrost when using per fd address space. Signed-off-by: Dave Airlie <[email protected]> From: Maarten Lankhorst <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-02-20Merge branch 'bnxt_en-shutdown-and-kexec-kdump-related-fixes'David S. Miller1-2/+10
Michael Chan says: ==================== bnxt_en: shutdown and kexec/kdump related fixes. 2 small patches to fix kexec shutdown and kdump kernel driver init issues. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-02-20bnxt_en: Issue PCIe FLR in kdump kernel to cleanup pending DMAs.Vasundhara Volam1-0/+8
If crashed kernel does not shutdown the NIC properly, PCIe FLR is required in the kdump kernel in order to initialize all the functions properly. Fixes: d629522e1d66 ("bnxt_en: Reduce memory usage when running in kdump kernel.") Signed-off-by: Vasundhara Volam <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-20bnxt_en: Improve device shutdown method.Vasundhara Volam1-2/+2
Especially when bnxt_shutdown() is called during kexec, we need to disable MSIX and disable Bus Master to completely quiesce the device. Make these 2 calls unconditionally in the shutdown method. Fixes: c20dc142dd7b ("bnxt_en: Disable bus master during PCI shutdown and driver unload.") Signed-off-by: Vasundhara Volam <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]>