Age | Commit message (Collapse) | Author | Files | Lines |
|
Consider certain values (0x00) as unset and load proper default if
an application has not set them properly.
Fixes: 0fe8c8d07134 ("Bluetooth: Split bt_iso_qos into dedicated structures")
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
|
|
KVM/arm64 relies on TLBI RANGE feature to flush TLBs when the dirty
pages are collected by VMM and the page table entries become write
protected during live migration. Unfortunately, the operand passed
to the TLBI RANGE instruction isn't correctly sorted out due to the
commit 117940aa6e5f ("KVM: arm64: Define kvm_tlb_flush_vmid_range()").
It leads to crash on the destination VM after live migration because
TLBs aren't flushed completely and some of the dirty pages are missed.
For example, I have a VM where 8GB memory is assigned, starting from
0x40000000 (1GB). Note that the host has 4KB as the base page size.
In the middile of migration, kvm_tlb_flush_vmid_range() is executed
to flush TLBs. It passes MAX_TLBI_RANGE_PAGES as the argument to
__kvm_tlb_flush_vmid_range() and __flush_s2_tlb_range_op(). SCALE#3
and NUM#31, corresponding to MAX_TLBI_RANGE_PAGES, isn't supported
by __TLBI_RANGE_NUM(). In this specific case, -1 has been returned
from __TLBI_RANGE_NUM() for SCALE#3/2/1/0 and rejected by the loop
in the __flush_tlb_range_op() until the variable @scale underflows
and becomes -9, 0xffff708000040000 is set as the operand. The operand
is wrong since it's sorted out by __TLBI_VADDR_RANGE() according to
invalid @scale and @num.
Fix it by extending __TLBI_RANGE_NUM() to support the combination of
SCALE#3 and NUM#31. With the changes, [-1 31] instead of [-1 30] can
be returned from the macro, meaning the TLBs for 0x200000 pages in the
above example can be flushed in one shoot with SCALE#3 and NUM#31. The
macro TLBI_RANGE_MASK is dropped since no one uses it any more. The
comments are also adjusted accordingly.
Fixes: 117940aa6e5f ("KVM: arm64: Define kvm_tlb_flush_vmid_range()")
Cc: [email protected] # v6.6+
Reported-by: Yihuang Yu <[email protected]>
Suggested-by: Marc Zyngier <[email protected]>
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Reviewed-by: Ryan Roberts <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Reviewed-by: Shaoqin Huang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>
|
|
When unloading a module, its state is changing MODULE_STATE_LIVE ->
MODULE_STATE_GOING -> MODULE_STATE_UNFORMED. Each change will take
a time. `is_module_text_address()` and `__module_text_address()`
works with MODULE_STATE_LIVE and MODULE_STATE_GOING.
If we use `is_module_text_address()` and `__module_text_address()`
separately, there is a chance that the first one is succeeded but the
next one is failed because module->state becomes MODULE_STATE_UNFORMED
between those operations.
In `check_kprobe_address_safe()`, if the second `__module_text_address()`
is failed, that is ignored because it expected a kernel_text address.
But it may have failed simply because module->state has been changed
to MODULE_STATE_UNFORMED. In this case, arm_kprobe() will try to modify
non-exist module text address (use-after-free).
To fix this problem, we should not use separated `is_module_text_address()`
and `__module_text_address()`, but use only `__module_text_address()`
once and do `try_module_get(module)` which is only available with
MODULE_STATE_LIVE.
Link: https://lore.kernel.org/all/[email protected]/
Fixes: 28f6c37a2910 ("kprobes: Forbid probing on trampoline and BPF code areas")
Cc: [email protected]
Signed-off-by: Zheng Yejian <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
|
|
Initialize cpu_mitigations to CPU_MITIGATIONS_OFF if the kernel is built
with CONFIG_SPECULATION_MITIGATIONS=n, as the help text quite clearly
states that disabling SPECULATION_MITIGATIONS is supposed to turn off all
mitigations by default.
│ If you say N, all mitigations will be disabled. You really
│ should know what you are doing to say so.
As is, the kernel still defaults to CPU_MITIGATIONS_AUTO, which results in
some mitigations being enabled in spite of SPECULATION_MITIGATIONS=n.
Fixes: f43b9876e857 ("x86/retbleed: Add fine grained Kconfig knobs")
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Daniel Sneddon <[email protected]>
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
topo_set_cpuids() updates cpu_present_map and cpu_possible map. It is
invoked during enumeration and "physical hotplug" operations. In the
latter case this results in a kernel crash because cpu_possible_map is
marked read only after init completes.
There is no reason to update cpu_possible_map in that function. During
enumeration cpu_possible_map is not relevant and gets fully initialized
after enumeration completed. On "physical hotplug" the bit is already set
because the kernel allows only CPUs to be plugged which have been
enumerated and associated to a CPU number during early boot.
Remove the bogus update of cpu_possible_map.
Fixes: 0e53e7b656cf ("x86/cpu/topology: Sanitize the APIC admission logic")
Reported-by: Jonathan Cameron <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/87ttkc6kwx.ffs@tglx
|
|
LoongArch's include/asm/addrspace.h uses SZ_32M and SZ_16K, so add
<linux/sizes.h> to provide those macros to prevent build errors:
In file included from ../arch/loongarch/include/asm/io.h:11,
from ../include/linux/io.h:13,
from ../include/linux/io-64-nonatomic-lo-hi.h:5,
from ../drivers/cxl/pci.c:4:
../include/asm-generic/io.h: In function 'ioport_map':
../arch/loongarch/include/asm/addrspace.h:124:25: error: 'SZ_32M' undeclared (first use in this function); did you mean 'PS_32M'?
124 | #define PCI_IOSIZE SZ_32M
Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
|
|
Current dts file for Loongson-2K2000's GMAC/GNET is incomplete, both irq
and phy descriptions are missing. Add them to make GMAC/GNET work.
Signed-off-by: Huacai Chen <[email protected]>
|
|
Current dts file for Loongson-2K2000 misses the interrupt-controller &
interrupt-cells descriptions in the msi-controller node, and misses the
msi-parent link in the pci root node. Add them to support PCI-MSI.
Signed-off-by: Huacai Chen <[email protected]>
|
|
Some Loongson-2K2000 platforms have ISA/LPC devices such as Super-IO,
define an ISA node in the dts file to avoid access error. Also adjust
the PCI io resource range to avoid confliction.
Signed-off-by: Huacai Chen <[email protected]>
|
|
Some Loongson-2K1000 platforms have ISA/LPC devices such as Super-IO,
define an ISA node in the dts file to avoid access error.
Signed-off-by: Huacai Chen <[email protected]>
|
|
When enabling both CONFIG_KFENCE and CONFIG_DEBUG_SG, I get the
following backtraces when running LongArch kernels.
[ 2.496257] kernel BUG at include/linux/scatterlist.h:187!
...
[ 2.501925] Call Trace:
[ 2.501950] [<9000000004ad59c4>] sg_init_one+0xac/0xc0
[ 2.502204] [<9000000004a438f8>] do_test_kpp+0x278/0x6e4
[ 2.502353] [<9000000004a43dd4>] alg_test_kpp+0x70/0xf4
[ 2.502494] [<9000000004a41b48>] alg_test+0x128/0x690
[ 2.502631] [<9000000004a3d898>] cryptomgr_test+0x20/0x40
[ 2.502775] [<90000000041b4508>] kthread+0x138/0x158
[ 2.502912] [<9000000004161c48>] ret_from_kernel_thread+0xc/0xa4
The backtrace is always similar but not exactly the same. It is always
triggered from cryptomgr_test, but not always from the same test.
Analysis shows that with CONFIG_KFENCE active, the address returned from
kmalloc() and friends is not always below vm_map_base. It is allocated
by kfence_alloc() which at least sometimes seems to get its memory from
an address space above vm_map_base. This causes __virt_addr_valid() to
return false for the affected objects.
Let __virt_addr_valid() return 1 for kfence pool addresses, this make
virt_addr_valid()/__virt_addr_valid() work with KFENCE.
Reported-by: Guenter Roeck <[email protected]>
Suggested-by: Guenter Roeck <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
|
|
KFENCE changes virt_to_page() to be able to translate tlb mapped virtual
addresses, but forget to change virt_to_phys()/phys_to_virt() and other
translation functions as well. This patch fix it, otherwise some drivers
(such as nvme and virtio-blk) cannot work with KFENCE.
All {virt, phys, page, pfn} translation functions are updated:
1, virt_to_pfn()/pfn_to_virt();
2, virt_to_page()/page_to_virt();
3, virt_to_phys()/phys_to_virt().
DMW/TLB mapped addresses are distinguished by comparing the vaddress
with vm_map_base in virt_to_xyz(), and we define WANT_PAGE_VIRTUAL in
the KFENCE case for the reverse translations, xyz_to_virt().
Signed-off-by: Huacai Chen <[email protected]>
|
|
LoongArch will override page_to_virt() which use page_address() in the
KFENCE case (by defining WANT_PAGE_VIRTUAL/HASHED_PAGE_VIRTUAL). So move
lowmem_page_address() a little later to avoid such build errors:
error: implicit declaration of function 'page_address'.
Acked-by: Andrew Morton <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
|
|
Much of turbostat can now run with perf, rather than using the MSR driver
Some of turbostat can now run as a regular non-root user.
Add some new output columns for some new GFX hardware.
[This patch updates the version, but otherwise changes no function;
it touches up some checkpatch issues from previous patches]
Signed-off-by: Len Brown <[email protected]>
|
|
Xe graphics driver uses different graphics sysfs knobs including
/sys/class/drm/card0/device/tile0/gt0/gtidle/idle_residency_ms
/sys/class/drm/card0/device/tile0/gt0/freq0/cur_freq
/sys/class/drm/card0/device/tile0/gt0/freq0/act_freq
/sys/class/drm/card0/device/tile0/gt1/gtidle/idle_residency_ms
/sys/class/drm/card0/device/tile0/gt1/freq0/cur_freq
/sys/class/drm/card0/device/tile0/gt1/freq0/act_freq
Plus that,
/sys/class/drm/card0/device/tile0/gt<n>/gtidle/name
returns either gt<n>-rc or gt<n>-mc. rc is for GFX and mc is SA Media.
Enhance turbostat to prefer the Xe sysfs knobs when they are available.
Export gt<n>-rc via BIC_GFX_rc6/BIC_GFXMHz/BIC_GFXACTMHz.
Export gt<n>-mc via BIC_SMA_mc6/BIC_SMAMHz/BIC_SMAACTMHz.
Signed-off-by: Zhang Rui <[email protected]>
|
|
On Meteorlake platform, i915 driver supports the traditional graphics
sysfs knobs including
/sys/class/drm/card0/power/rc6_residency_ms
/sys/class/drm/card0/gt_cur_freq_mhz
/sys/class/drm/card0/gt_act_freq_mhz
At the same time, it also supports
/sys/class/drm/card0/gt/gt0/rc6_residency_ms
/sys/class/drm/card0/gt/gt0/rps_cur_freq_mhz
/sys/class/drm/card0/gt/gt0/rps_act_freq_mhz
/sys/class/drm/card0/gt/gt1/rc6_residency_ms
/sys/class/drm/card0/gt/gt1/rps_cur_freq_mhz
/sys/class/drm/card0/gt/gt1/rps_act_freq_mhz
gt0 is for GFX and gt1 is for SA Media.
Enhance turbostat to prefer the i915 new sysfs knobs.
Export gt0 via BIC_GFX_rc6/BIC_GFXMHz/BIC_GFXACTMHz.
Export gt1 via BIC_SMA_mc6/BIC_SMAMHz/BIC_SMAACTMHz.
Signed-off-by: Zhang Rui <[email protected]>
|
|
Graphics driver (i915/Xe) on mordern platforms splits GFX and SA Media
information via different sysfs knobs.
Existing BIC_GFX_rc6/BIC_GFXMHz/BIC_GFXACTMHz columns can be reused for
GFX.
Introduce BIC_SAM_mc6/BIC_SAMMHz/BIC_SAMACTMHz columns for SA Media.
Signed-off-by: Zhang Rui <[email protected]>
|
|
Binding devm_led_classdev_register() to the netdev is problematic
because on module removal we get a RTNL-related deadlock. Fix this
by avoiding the device-managed LED functions.
Note: We can safely call led_classdev_unregister() for a LED even
if registering it failed, because led_classdev_unregister() detects
this and is a no-op in this case.
Fixes: 18764b883e15 ("r8169: add support for LED's on RTL8168/RTL8101")
Cc: [email protected]
Reported-by: Lukas Wunner <[email protected]>
Signed-off-by: Heiner Kallweit <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
tick_do_timer_cpu is used lockless to check which CPU needs to take care
of the per tick timekeeping duty. This is done to avoid a thundering
herd problem on jiffies_lock.
The read and writes are not annotated so KCSAN complains about data races:
BUG: KCSAN: data-race in tick_nohz_idle_stop_tick / tick_nohz_next_event
write to 0xffffffff8a2bda30 of 4 bytes by task 0 on cpu 26:
tick_nohz_idle_stop_tick+0x3b1/0x4a0
do_idle+0x1e3/0x250
read to 0xffffffff8a2bda30 of 4 bytes by task 0 on cpu 16:
tick_nohz_next_event+0xe7/0x1e0
tick_nohz_get_sleep_length+0xa7/0xe0
menu_select+0x82/0xb90
cpuidle_select+0x44/0x60
do_idle+0x1c2/0x250
value changed: 0x0000001a -> 0xffffffff
Annotate them with READ/WRITE_ONCE() to document the intentional data race.
Reported-by: Mirsad Todorovac <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Tested-by: Sean Anderson <[email protected]>
Link: https://lore.kernel.org/r/87cyqy7rt3.ffs@tglx
|
|
When the driver notices fw_status == 0xff it tries to perform a PCI
reset on itself via pci_reset_function() in the context of the driver's
health thread. However, pdsc_reset_prepare calls
pdsc_stop_health_thread(), which attempts to stop/flush the health
thread. This results in a deadlock because the stop/flush will never
complete since the driver called pci_reset_function() from the health
thread context. Fix by changing the pdsc_check_pci_health_function()
to queue a newly introduced pdsc_pci_reset_thread() on the pdsc's
work queue.
Unloading the driver in the fw_down/dead state uncovered another issue,
which can be seen in the following trace:
WARNING: CPU: 51 PID: 6914 at kernel/workqueue.c:1450 __queue_work+0x358/0x440
[...]
RIP: 0010:__queue_work+0x358/0x440
[...]
Call Trace:
<TASK>
? __warn+0x85/0x140
? __queue_work+0x358/0x440
? report_bug+0xfc/0x1e0
? handle_bug+0x3f/0x70
? exc_invalid_op+0x17/0x70
? asm_exc_invalid_op+0x1a/0x20
? __queue_work+0x358/0x440
queue_work_on+0x28/0x30
pdsc_devcmd_locked+0x96/0xe0 [pds_core]
pdsc_devcmd_reset+0x71/0xb0 [pds_core]
pdsc_teardown+0x51/0xe0 [pds_core]
pdsc_remove+0x106/0x200 [pds_core]
pci_device_remove+0x37/0xc0
device_release_driver_internal+0xae/0x140
driver_detach+0x48/0x90
bus_remove_driver+0x6d/0xf0
pci_unregister_driver+0x2e/0xa0
pdsc_cleanup_module+0x10/0x780 [pds_core]
__x64_sys_delete_module+0x142/0x2b0
? syscall_trace_enter.isra.18+0x126/0x1a0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7fbd9d03a14b
[...]
Fix this by preventing the devcmd reset if the FW is not running.
Fixes: d9407ff11809 ("pds_core: Prevent health thread from running during reset/remove")
Reviewed-by: Shannon Nelson <[email protected]>
Signed-off-by: Brett Creeley <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The definition of spectre_bhi_state() incorrectly returns a const char
* const. This causes the a compiler warning when building with W=1:
warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
2812 | static const char * const spectre_bhi_state(void)
Remove the const qualifier from the pointer.
Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
Reported-by: Sean Christopherson <[email protected]>
Signed-off-by: Daniel Sneddon <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Prepare to fix aspects of the new BHI code.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
On x86 each struct cpu_hw_events maintains a table for counter assignment but
it missed to update one for the deleted event in x86_pmu_del(). This
can make perf_clear_dirty_counters() reset used counter if it's called
before event scheduling or enabling. Then it would return out of range
data which doesn't make sense.
The following code can reproduce the problem.
$ cat repro.c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
struct perf_event_attr attr = {
.type = PERF_TYPE_HARDWARE,
.config = PERF_COUNT_HW_CPU_CYCLES,
.disabled = 1,
};
void *worker(void *arg)
{
int cpu = (long)arg;
int fd1 = syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0);
int fd2 = syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0);
void *p;
do {
ioctl(fd1, PERF_EVENT_IOC_ENABLE, 0);
p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd1, 0);
ioctl(fd2, PERF_EVENT_IOC_ENABLE, 0);
ioctl(fd2, PERF_EVENT_IOC_DISABLE, 0);
munmap(p, 4096);
ioctl(fd1, PERF_EVENT_IOC_DISABLE, 0);
} while (1);
return NULL;
}
int main(void)
{
int i;
int n = sysconf(_SC_NPROCESSORS_ONLN);
pthread_t *th = calloc(n, sizeof(*th));
for (i = 0; i < n; i++)
pthread_create(&th[i], NULL, worker, (void *)(long)i);
for (i = 0; i < n; i++)
pthread_join(th[i], NULL);
free(th);
return 0;
}
And you can see the out of range data using perf stat like this.
Probably it'd be easier to see on a large machine.
$ gcc -o repro repro.c -pthread
$ ./repro &
$ sudo perf stat -A -I 1000 2>&1 | awk '{ if (length($3) > 15) print }'
1.001028462 CPU6 196,719,295,683,763 cycles # 194290.996 GHz (71.54%)
1.001028462 CPU3 396,077,485,787,730 branch-misses # 15804359784.80% of all branches (71.07%)
1.001028462 CPU17 197,608,350,727,877 branch-misses # 14594186554.56% of all branches (71.22%)
2.020064073 CPU4 198,372,472,612,140 cycles # 194681.113 GHz (70.95%)
2.020064073 CPU6 199,419,277,896,696 cycles # 195720.007 GHz (70.57%)
2.020064073 CPU20 198,147,174,025,639 cycles # 194474.654 GHz (71.03%)
2.020064073 CPU20 198,421,240,580,145 stalled-cycles-frontend # 100.14% frontend cycles idle (70.93%)
3.037443155 CPU4 197,382,689,923,416 cycles # 194043.065 GHz (71.30%)
3.037443155 CPU20 196,324,797,879,414 cycles # 193003.773 GHz (71.69%)
3.037443155 CPU5 197,679,956,608,205 stalled-cycles-backend # 1315606428.66% backend cycles idle (71.19%)
3.037443155 CPU5 198,571,860,474,851 instructions # 13215422.58 insn per cycle
It should move the contents in the cpuc->assign as well.
Fixes: 5471eea5d3bf ("perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task")
Signed-off-by: Namhyung Kim <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
|
|
This patch to differentiate external rev id for gfx 11.5.0.
Signed-off-by: Yifan Zhang <[email protected]>
Reviewed-by: Tim Huang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
|
|
[Why]
OLED panels show no display for large vtotal timings.
[How]
Check if ss is enabled and read from lut for spread spectrum percentage.
Adjust dprefclk as required. DP_DTO adjustment is for edp only.
Cc: [email protected]
Reviewed-by: Nicholas Kazlauskas <[email protected]>
Acked-by: Hamza Mahfooz <[email protected]>
Signed-off-by: Zhongwei <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The previous check for the is_vsc_sdp_colorimetry_supported flag
for MST sink signals did nothing. Simplify the code and use the
same check for MST and SST.
Cc: [email protected]
Reviewed-by: Agustin Gutierrez <[email protected]>
Acked-by: Hamza Mahfooz <[email protected]>
Signed-off-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
In order for display colorimetry to work correctly on DP displays
we need to send the VSC SDP packet. We should only do so for
panels with DPCD revision greater or equal to 1.4 as older
receivers might have problems with it.
Cc: [email protected]
Cc: Joshua Ashton <[email protected]>
Cc: Xaver Hugl <[email protected]>
Cc: Melissa Wen <[email protected]>
Cc: Agustin Gutierrez <[email protected]>
Reviewed-by: Agustin Gutierrez <[email protected]>
Acked-by: Hamza Mahfooz <[email protected]>
Signed-off-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[Why]
Wrong logic cause screen corruption.
[How]
Port logic from DCN35/314.
Cc: [email protected]
Reviewed-by: Nicholas Kazlauskas <[email protected]>
Acked-by: Hamza Mahfooz <[email protected]>
Signed-off-by: Fudongwang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[WHY&HOW]
We should not be recursively calling the manual trigger programming function when
FAMS is not in use.
Cc: [email protected]
Reviewed-by: Alvin Lee <[email protected]>
Acked-by: Hamza Mahfooz <[email protected]>
Signed-off-by: Dillon Varone <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[why]
In current implemenation ODM mode is only reset when the last plane is
removed from dc state. For any dc validate we will always remove all
current planes and add new planes. However when switching from no planes
to 1 plane, ODM mode is not reset because no planes get removed. This
has caused an issue where we kept ODM combine when it should have been
remove when a plane is added. The change is to reset ODM mode when
adding the first plane.
Cc: [email protected]
Reviewed-by: Alvin Lee <[email protected]>
Acked-by: Hamza Mahfooz <[email protected]>
Signed-off-by: Wenjing Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The RB bitmap should be global active RB bitmap &
active RB bitmap based on active SA.
Signed-off-by: Tim Huang <[email protected]>
Reviewed-by: Yifan Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
|
|
mode_config's max width x height is 4096x2160 and is higher than DWB's
max resolution 3840x2160 which is returned instead.
Cc: [email protected]
Reviewed-by: Harry Wentland <[email protected]>
Acked-by: Hamza Mahfooz <[email protected]>
Signed-off-by: Alex Hung <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
If there are more than one device doing reset in parallel, the first
device will call kfd_suspend_all_processes() to evict all processes
on all devices, this call takes time to finish. other device will
start reset and recover without waiting. if the process has not been
evicted before doing recover, it will be restored, then caused page
fault.
Signed-off-by: Zhigang Luo <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[Why]
set_q_mode_offs don't get cleared after GPU reset, nexting SET_Q_MODE
packet to init shadow memory will be skiped, hence there has a page fault.
[How]
VM flush is needed after GPU reset, clear set_q_mode_offs when
emitting VM flush.
Fixes: 8bc75586ea01 ("drm/amdgpu: workaround to avoid SET_Q_MODE packets v2")
Reviewed-by: Christian König <[email protected]>
Signed-off-by: ZhenGuo Yin <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
|
|
VCN need not be shared in CPX mode always for all GFX 9.4.3 SOC SKUs. In
certain configs, VCN instance can be exclusively allocated to a
partition even under CPX mode.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: James Zhu <[email protected]>
Reviewed-by: Asad Kamal <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
fix the high voltage issue after unload on smu 13.0.10
Signed-off-by: Kenneth Feng <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[WHY]
dynamic memory safety error detector (KASAN) catches and generates error
messages "BUG: KASAN: slab-out-of-bounds" as writeback connector does not
support certain features which are not initialized.
[HOW]
Skip them when connector type is DRM_MODE_CONNECTOR_WRITEBACK.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3199
Reviewed-by: Harry Wentland <[email protected]>
Reviewed-by: Rodrigo Siqueira <[email protected]>
Acked-by: Roman Li <[email protected]>
Signed-off-by: Alex Hung <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
SDMA_CNTL is not set in some cases, driver configures it by itself.
v2: simplify code
Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This patch to add smu 14.0.1 support
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Yifan Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
update ppsmc.h pmfw.h and driver_if.h for smu v14_0_1
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: lima1002 <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
From MES version 0x54, the log entry increased and require the log buffer
size to be increased. The 16k is maximum size agreed
Signed-off-by: shaoyunl <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The MES log might slow down the performance for extra step of log the data,
disable it by default and introduce a parameter can enable it when necessary
Signed-off-by: shaoyunl <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
While doing multiple S4 stress tests, GC/RLC/PMFW get into
an invalid state resulting into hard hangs.
Adding a GFX reset as workaround just before sending the
MP1_UNLOAD message avoids this failure.
Signed-off-by: Tim Huang <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
There is a new DCN veriosn 3.5.1 need to load
Signed-off-by: Li Ma <[email protected]>
Reviewed-by: Yifan Zhang <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
For SOC21 ASICs, there is an issue in re-enabling PM features if a
suspend got aborted. In such cases, reset the device during resume
phase. This is a workaround till a proper solution is finalized.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
|
|
Otherwise the old one will be used during GPU reset.
That's not expected.
Signed-off-by: Lang Yu <[email protected]>
Reviewed-by: Feifei Xu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
|
|
Downgrade to debug information when IBs are skipped. Also, use dev_* to
identify the device.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Christian König <[email protected]>
Reviewed-by: Asad Kamal <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
There are cases where soft reset seems to succeed, but
does not, so always use mode1/2 for now.
Reviewed-by: Harish Kasiviswanathan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
|
|
Currently, with F32 HWS GPU reset is only when unmap queue fails.
However, if compute queue doesn't repond to preemption request in time
unmap will return without any error. In this case, only preemption error
is logged and Reset is not triggered. Call GPU reset in this case also.
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Harish Kasiviswanathan <[email protected]>
Reviewed-by: Mukul Joshi <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
|
|
Although ipv6_get_ifaddr walks inet6_addr_lst under the RCU lock, it
still means hlist_for_each_entry_rcu can return an item that got removed
from the list. The memory itself of such item is not freed thanks to RCU
but nothing guarantees the actual content of the memory is sane.
In particular, the reference count can be zero. This can happen if
ipv6_del_addr is called in parallel. ipv6_del_addr removes the entry
from inet6_addr_lst (hlist_del_init_rcu(&ifp->addr_lst)) and drops all
references (__in6_ifa_put(ifp) + in6_ifa_put(ifp)). With bad enough
timing, this can happen:
1. In ipv6_get_ifaddr, hlist_for_each_entry_rcu returns an entry.
2. Then, the whole ipv6_del_addr is executed for the given entry. The
reference count drops to zero and kfree_rcu is scheduled.
3. ipv6_get_ifaddr continues and tries to increments the reference count
(in6_ifa_hold).
4. The rcu is unlocked and the entry is freed.
5. The freed entry is returned.
Prevent increasing of the reference count in such case. The name
in6_ifa_hold_safe is chosen to mimic the existing fib6_info_hold_safe.
[ 41.506330] refcount_t: addition on 0; use-after-free.
[ 41.506760] WARNING: CPU: 0 PID: 595 at lib/refcount.c:25 refcount_warn_saturate+0xa5/0x130
[ 41.507413] Modules linked in: veth bridge stp llc
[ 41.507821] CPU: 0 PID: 595 Comm: python3 Not tainted 6.9.0-rc2.main-00208-g49563be82afa #14
[ 41.508479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
[ 41.509163] RIP: 0010:refcount_warn_saturate+0xa5/0x130
[ 41.509586] Code: ad ff 90 0f 0b 90 90 c3 cc cc cc cc 80 3d c0 30 ad 01 00 75 a0 c6 05 b7 30 ad 01 01 90 48 c7 c7 38 cc 7a 8c e8 cc 18 ad ff 90 <0f> 0b 90 90 c3 cc cc cc cc 80 3d 98 30 ad 01 00 0f 85 75 ff ff ff
[ 41.510956] RSP: 0018:ffffbda3c026baf0 EFLAGS: 00010282
[ 41.511368] RAX: 0000000000000000 RBX: ffff9e9c46914800 RCX: 0000000000000000
[ 41.511910] RDX: ffff9e9c7ec29c00 RSI: ffff9e9c7ec1c900 RDI: ffff9e9c7ec1c900
[ 41.512445] RBP: ffff9e9c43660c9c R08: 0000000000009ffb R09: 00000000ffffdfff
[ 41.512998] R10: 00000000ffffdfff R11: ffffffff8ca58a40 R12: ffff9e9c4339a000
[ 41.513534] R13: 0000000000000001 R14: ffff9e9c438a0000 R15: ffffbda3c026bb48
[ 41.514086] FS: 00007fbc4cda1740(0000) GS:ffff9e9c7ec00000(0000) knlGS:0000000000000000
[ 41.514726] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 41.515176] CR2: 000056233b337d88 CR3: 000000000376e006 CR4: 0000000000370ef0
[ 41.515713] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 41.516252] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 41.516799] Call Trace:
[ 41.517037] <TASK>
[ 41.517249] ? __warn+0x7b/0x120
[ 41.517535] ? refcount_warn_saturate+0xa5/0x130
[ 41.517923] ? report_bug+0x164/0x190
[ 41.518240] ? handle_bug+0x3d/0x70
[ 41.518541] ? exc_invalid_op+0x17/0x70
[ 41.520972] ? asm_exc_invalid_op+0x1a/0x20
[ 41.521325] ? refcount_warn_saturate+0xa5/0x130
[ 41.521708] ipv6_get_ifaddr+0xda/0xe0
[ 41.522035] inet6_rtm_getaddr+0x342/0x3f0
[ 41.522376] ? __pfx_inet6_rtm_getaddr+0x10/0x10
[ 41.522758] rtnetlink_rcv_msg+0x334/0x3d0
[ 41.523102] ? netlink_unicast+0x30f/0x390
[ 41.523445] ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[ 41.523832] netlink_rcv_skb+0x53/0x100
[ 41.524157] netlink_unicast+0x23b/0x390
[ 41.524484] netlink_sendmsg+0x1f2/0x440
[ 41.524826] __sys_sendto+0x1d8/0x1f0
[ 41.525145] __x64_sys_sendto+0x1f/0x30
[ 41.525467] do_syscall_64+0xa5/0x1b0
[ 41.525794] entry_SYSCALL_64_after_hwframe+0x72/0x7a
[ 41.526213] RIP: 0033:0x7fbc4cfcea9a
[ 41.526528] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 41.527942] RSP: 002b:00007ffcf54012a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 41.528593] RAX: ffffffffffffffda RBX: 00007ffcf5401368 RCX: 00007fbc4cfcea9a
[ 41.529173] RDX: 000000000000002c RSI: 00007fbc4b9d9bd0 RDI: 0000000000000005
[ 41.529786] RBP: 00007fbc4bafb040 R08: 00007ffcf54013e0 R09: 000000000000000c
[ 41.530375] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 41.530977] R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007fbc4ca85d1b
[ 41.531573] </TASK>
Fixes: 5c578aedcb21d ("IPv6: convert addrconf hash list to RCU")
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: Jiri Benc <[email protected]>
Link: https://lore.kernel.org/r/8ab821e36073a4a406c50ec83c9e8dc586c539e4.1712585809.git.jbenc@redhat.com
Signed-off-by: Jakub Kicinski <[email protected]>
|