aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-04-10mm: Move lowmem_page_address() a little laterHuacai Chen1-5/+5
LoongArch will override page_to_virt() which use page_address() in the KFENCE case (by defining WANT_PAGE_VIRTUAL/HASHED_PAGE_VIRTUAL). So move lowmem_page_address() a little later to avoid such build errors: error: implicit declaration of function 'page_address'. Acked-by: Andrew Morton <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
2024-04-10tools/power turbostat: v2024.04.10Len Brown3-15/+28
Much of turbostat can now run with perf, rather than using the MSR driver Some of turbostat can now run as a regular non-root user. Add some new output columns for some new GFX hardware. [This patch updates the version, but otherwise changes no function; it touches up some checkpatch issues from previous patches] Signed-off-by: Len Brown <[email protected]>
2024-04-10tools/power/turbostat: Add support for Xe sysfs knobsZhang Rui1-0/+51
Xe graphics driver uses different graphics sysfs knobs including /sys/class/drm/card0/device/tile0/gt0/gtidle/idle_residency_ms /sys/class/drm/card0/device/tile0/gt0/freq0/cur_freq /sys/class/drm/card0/device/tile0/gt0/freq0/act_freq /sys/class/drm/card0/device/tile0/gt1/gtidle/idle_residency_ms /sys/class/drm/card0/device/tile0/gt1/freq0/cur_freq /sys/class/drm/card0/device/tile0/gt1/freq0/act_freq Plus that, /sys/class/drm/card0/device/tile0/gt<n>/gtidle/name returns either gt<n>-rc or gt<n>-mc. rc is for GFX and mc is SA Media. Enhance turbostat to prefer the Xe sysfs knobs when they are available. Export gt<n>-rc via BIC_GFX_rc6/BIC_GFXMHz/BIC_GFXACTMHz. Export gt<n>-mc via BIC_SMA_mc6/BIC_SMAMHz/BIC_SMAACTMHz. Signed-off-by: Zhang Rui <[email protected]>
2024-04-10tools/power/turbostat: Add support for new i915 sysfs knobsZhang Rui1-0/+24
On Meteorlake platform, i915 driver supports the traditional graphics sysfs knobs including /sys/class/drm/card0/power/rc6_residency_ms /sys/class/drm/card0/gt_cur_freq_mhz /sys/class/drm/card0/gt_act_freq_mhz At the same time, it also supports /sys/class/drm/card0/gt/gt0/rc6_residency_ms /sys/class/drm/card0/gt/gt0/rps_cur_freq_mhz /sys/class/drm/card0/gt/gt0/rps_act_freq_mhz /sys/class/drm/card0/gt/gt1/rc6_residency_ms /sys/class/drm/card0/gt/gt1/rps_cur_freq_mhz /sys/class/drm/card0/gt/gt1/rps_act_freq_mhz gt0 is for GFX and gt1 is for SA Media. Enhance turbostat to prefer the i915 new sysfs knobs. Export gt0 via BIC_GFX_rc6/BIC_GFXMHz/BIC_GFXACTMHz. Export gt1 via BIC_SMA_mc6/BIC_SMAMHz/BIC_SMAACTMHz. Signed-off-by: Zhang Rui <[email protected]>
2024-04-10tools/power/turbostat: Introduce BIC_SAM_mc6/BIC_SAMMHz/BIC_SAMACTMHzZhang Rui2-7/+96
Graphics driver (i915/Xe) on mordern platforms splits GFX and SA Media information via different sysfs knobs. Existing BIC_GFX_rc6/BIC_GFXMHz/BIC_GFXACTMHz columns can be reused for GFX. Introduce BIC_SAM_mc6/BIC_SAMMHz/BIC_SAMACTMHz columns for SA Media. Signed-off-by: Zhang Rui <[email protected]>
2024-04-10r8169: fix LED-related deadlock on module removalHeiner Kallweit3-16/+33
Binding devm_led_classdev_register() to the netdev is problematic because on module removal we get a RTNL-related deadlock. Fix this by avoiding the device-managed LED functions. Note: We can safely call led_classdev_unregister() for a LED even if registering it failed, because led_classdev_unregister() detects this and is a no-op in this case. Fixes: 18764b883e15 ("r8169: add support for LED's on RTL8168/RTL8101") Cc: [email protected] Reported-by: Lukas Wunner <[email protected]> Signed-off-by: Heiner Kallweit <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-04-10timekeeping: Use READ/WRITE_ONCE() for tick_do_timer_cpuThomas Gleixner2-22/+31
tick_do_timer_cpu is used lockless to check which CPU needs to take care of the per tick timekeeping duty. This is done to avoid a thundering herd problem on jiffies_lock. The read and writes are not annotated so KCSAN complains about data races: BUG: KCSAN: data-race in tick_nohz_idle_stop_tick / tick_nohz_next_event write to 0xffffffff8a2bda30 of 4 bytes by task 0 on cpu 26: tick_nohz_idle_stop_tick+0x3b1/0x4a0 do_idle+0x1e3/0x250 read to 0xffffffff8a2bda30 of 4 bytes by task 0 on cpu 16: tick_nohz_next_event+0xe7/0x1e0 tick_nohz_get_sleep_length+0xa7/0xe0 menu_select+0x82/0xb90 cpuidle_select+0x44/0x60 do_idle+0x1c2/0x250 value changed: 0x0000001a -> 0xffffffff Annotate them with READ/WRITE_ONCE() to document the intentional data race. Reported-by: Mirsad Todorovac <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Tested-by: Sean Anderson <[email protected]> Link: https://lore.kernel.org/r/87cyqy7rt3.ffs@tglx
2024-04-10pds_core: Fix pdsc_check_pci_health function to use work threadBrett Creeley4-1/+18
When the driver notices fw_status == 0xff it tries to perform a PCI reset on itself via pci_reset_function() in the context of the driver's health thread. However, pdsc_reset_prepare calls pdsc_stop_health_thread(), which attempts to stop/flush the health thread. This results in a deadlock because the stop/flush will never complete since the driver called pci_reset_function() from the health thread context. Fix by changing the pdsc_check_pci_health_function() to queue a newly introduced pdsc_pci_reset_thread() on the pdsc's work queue. Unloading the driver in the fw_down/dead state uncovered another issue, which can be seen in the following trace: WARNING: CPU: 51 PID: 6914 at kernel/workqueue.c:1450 __queue_work+0x358/0x440 [...] RIP: 0010:__queue_work+0x358/0x440 [...] Call Trace: <TASK> ? __warn+0x85/0x140 ? __queue_work+0x358/0x440 ? report_bug+0xfc/0x1e0 ? handle_bug+0x3f/0x70 ? exc_invalid_op+0x17/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? __queue_work+0x358/0x440 queue_work_on+0x28/0x30 pdsc_devcmd_locked+0x96/0xe0 [pds_core] pdsc_devcmd_reset+0x71/0xb0 [pds_core] pdsc_teardown+0x51/0xe0 [pds_core] pdsc_remove+0x106/0x200 [pds_core] pci_device_remove+0x37/0xc0 device_release_driver_internal+0xae/0x140 driver_detach+0x48/0x90 bus_remove_driver+0x6d/0xf0 pci_unregister_driver+0x2e/0xa0 pdsc_cleanup_module+0x10/0x780 [pds_core] __x64_sys_delete_module+0x142/0x2b0 ? syscall_trace_enter.isra.18+0x126/0x1a0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7fbd9d03a14b [...] Fix this by preventing the devcmd reset if the FW is not running. Fixes: d9407ff11809 ("pds_core: Prevent health thread from running during reset/remove") Reviewed-by: Shannon Nelson <[email protected]> Signed-off-by: Brett Creeley <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-04-10x86/bugs: Fix return type of spectre_bhi_state()Daniel Sneddon1-1/+1
The definition of spectre_bhi_state() incorrectly returns a const char * const. This causes the a compiler warning when building with W=1: warning: type qualifiers ignored on function return type [-Wignored-qualifiers] 2812 | static const char * const spectre_bhi_state(void) Remove the const qualifier from the pointer. Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob") Reported-by: Sean Christopherson <[email protected]> Signed-off-by: Daniel Sneddon <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-04-10Merge branch 'linus' into x86/urgent, to pick up dependent commitsIngo Molnar33-87/+463
Prepare to fix aspects of the new BHI code. Signed-off-by: Ingo Molnar <[email protected]>
2024-04-10perf/x86: Fix out of range dataNamhyung Kim1-0/+1
On x86 each struct cpu_hw_events maintains a table for counter assignment but it missed to update one for the deleted event in x86_pmu_del(). This can make perf_clear_dirty_counters() reset used counter if it's called before event scheduling or enabling. Then it would return out of range data which doesn't make sense. The following code can reproduce the problem. $ cat repro.c #include <pthread.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <linux/perf_event.h> #include <sys/ioctl.h> #include <sys/mman.h> #include <sys/syscall.h> struct perf_event_attr attr = { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CPU_CYCLES, .disabled = 1, }; void *worker(void *arg) { int cpu = (long)arg; int fd1 = syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0); int fd2 = syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0); void *p; do { ioctl(fd1, PERF_EVENT_IOC_ENABLE, 0); p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd1, 0); ioctl(fd2, PERF_EVENT_IOC_ENABLE, 0); ioctl(fd2, PERF_EVENT_IOC_DISABLE, 0); munmap(p, 4096); ioctl(fd1, PERF_EVENT_IOC_DISABLE, 0); } while (1); return NULL; } int main(void) { int i; int n = sysconf(_SC_NPROCESSORS_ONLN); pthread_t *th = calloc(n, sizeof(*th)); for (i = 0; i < n; i++) pthread_create(&th[i], NULL, worker, (void *)(long)i); for (i = 0; i < n; i++) pthread_join(th[i], NULL); free(th); return 0; } And you can see the out of range data using perf stat like this. Probably it'd be easier to see on a large machine. $ gcc -o repro repro.c -pthread $ ./repro & $ sudo perf stat -A -I 1000 2>&1 | awk '{ if (length($3) > 15) print }' 1.001028462 CPU6 196,719,295,683,763 cycles # 194290.996 GHz (71.54%) 1.001028462 CPU3 396,077,485,787,730 branch-misses # 15804359784.80% of all branches (71.07%) 1.001028462 CPU17 197,608,350,727,877 branch-misses # 14594186554.56% of all branches (71.22%) 2.020064073 CPU4 198,372,472,612,140 cycles # 194681.113 GHz (70.95%) 2.020064073 CPU6 199,419,277,896,696 cycles # 195720.007 GHz (70.57%) 2.020064073 CPU20 198,147,174,025,639 cycles # 194474.654 GHz (71.03%) 2.020064073 CPU20 198,421,240,580,145 stalled-cycles-frontend # 100.14% frontend cycles idle (70.93%) 3.037443155 CPU4 197,382,689,923,416 cycles # 194043.065 GHz (71.30%) 3.037443155 CPU20 196,324,797,879,414 cycles # 193003.773 GHz (71.69%) 3.037443155 CPU5 197,679,956,608,205 stalled-cycles-backend # 1315606428.66% backend cycles idle (71.19%) 3.037443155 CPU5 198,571,860,474,851 instructions # 13215422.58 insn per cycle It should move the contents in the cpuc->assign as well. Fixes: 5471eea5d3bf ("perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task") Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Kan Liang <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
2024-04-10drm/amdgpu: differentiate external rev id for gfx 11.5.0Yifan Zhang1-1/+4
This patch to differentiate external rev id for gfx 11.5.0. Signed-off-by: Yifan Zhang <[email protected]> Reviewed-by: Tim Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2024-04-09drm/amd/display: Adjust dprefclk by down spread percentage.Zhongwei2-4/+54
[Why] OLED panels show no display for large vtotal timings. [How] Check if ss is enabled and read from lut for spread spectrum percentage. Adjust dprefclk as required. DP_DTO adjustment is for edp only. Cc: [email protected] Reviewed-by: Nicholas Kazlauskas <[email protected]> Acked-by: Hamza Mahfooz <[email protected]> Signed-off-by: Zhongwei <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09drm/amd/display: Set VSC SDP Colorimetry same way for MST and SSTHarry Wentland1-9/+3
The previous check for the is_vsc_sdp_colorimetry_supported flag for MST sink signals did nothing. Simplify the code and use the same check for MST and SST. Cc: [email protected] Reviewed-by: Agustin Gutierrez <[email protected]> Acked-by: Hamza Mahfooz <[email protected]> Signed-off-by: Harry Wentland <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09drm/amd/display: Program VSC SDP colorimetry for all DP sinks >= 1.4Harry Wentland1-2/+5
In order for display colorimetry to work correctly on DP displays we need to send the VSC SDP packet. We should only do so for panels with DPCD revision greater or equal to 1.4 as older receivers might have problems with it. Cc: [email protected] Cc: Joshua Ashton <[email protected]> Cc: Xaver Hugl <[email protected]> Cc: Melissa Wen <[email protected]> Cc: Agustin Gutierrez <[email protected]> Reviewed-by: Agustin Gutierrez <[email protected]> Acked-by: Hamza Mahfooz <[email protected]> Signed-off-by: Harry Wentland <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09drm/amd/display: fix disable otg wa logic in DCN316Fudongwang1-7/+12
[Why] Wrong logic cause screen corruption. [How] Port logic from DCN35/314. Cc: [email protected] Reviewed-by: Nicholas Kazlauskas <[email protected]> Acked-by: Hamza Mahfooz <[email protected]> Signed-off-by: Fudongwang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09drm/amd/display: Do not recursively call manual trigger programmingDillon Varone1-3/+0
[WHY&HOW] We should not be recursively calling the manual trigger programming function when FAMS is not in use. Cc: [email protected] Reviewed-by: Alvin Lee <[email protected]> Acked-by: Hamza Mahfooz <[email protected]> Signed-off-by: Dillon Varone <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09drm/amd/display: always reset ODM mode in context when adding first planeWenjing Liu1-0/+9
[why] In current implemenation ODM mode is only reset when the last plane is removed from dc state. For any dc validate we will always remove all current planes and add new planes. However when switching from no planes to 1 plane, ODM mode is not reset because no planes get removed. This has caused an issue where we kept ODM combine when it should have been remove when a plane is added. The change is to reset ODM mode when adding the first plane. Cc: [email protected] Reviewed-by: Alvin Lee <[email protected]> Acked-by: Hamza Mahfooz <[email protected]> Signed-off-by: Wenjing Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09drm/amdgpu: fix incorrect number of active RBs for gfx11Tim Huang1-1/+1
The RB bitmap should be global active RB bitmap & active RB bitmap based on active SA. Signed-off-by: Tim Huang <[email protected]> Reviewed-by: Yifan Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2024-04-09drm/amd/display: Return max resolution supported by DWBAlex Hung1-4/+2
mode_config's max width x height is 4096x2160 and is higher than DWB's max resolution 3840x2160 which is returned instead. Cc: [email protected] Reviewed-by: Harry Wentland <[email protected]> Acked-by: Hamza Mahfooz <[email protected]> Signed-off-by: Alex Hung <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09amd/amdkfd: sync all devices to wait all processes being evictedZhigang Luo1-11/+6
If there are more than one device doing reset in parallel, the first device will call kfd_suspend_all_processes() to evict all processes on all devices, this call takes time to finish. other device will start reset and recover without waiting. if the process has not been evicted before doing recover, it will be restored, then caused page fault. Signed-off-by: Zhigang Luo <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09drm/amdgpu: clear set_q_mode_offs when VM changedZhenGuo Yin1-0/+1
[Why] set_q_mode_offs don't get cleared after GPU reset, nexting SET_Q_MODE packet to init shadow memory will be skiped, hence there has a page fault. [How] VM flush is needed after GPU reset, clear set_q_mode_offs when emitting VM flush. Fixes: 8bc75586ea01 ("drm/amdgpu: workaround to avoid SET_Q_MODE packets v2") Reviewed-by: Christian König <[email protected]> Signed-off-by: ZhenGuo Yin <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2024-04-09drm/amdgpu: Fix VCN allocation in CPX partitionLijo Lazar1-4/+11
VCN need not be shared in CPX mode always for all GFX 9.4.3 SOC SKUs. In certain configs, VCN instance can be exclusively allocated to a partition even under CPX mode. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: James Zhu <[email protected]> Reviewed-by: Asad Kamal <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09drm/amd/pm: fix the high voltage issue after unloadKenneth Feng4-14/+48
fix the high voltage issue after unload on smu 13.0.10 Signed-off-by: Kenneth Feng <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09drm/amd/display: Skip on writeback when it's not applicableAlex Hung1-1/+8
[WHY] dynamic memory safety error detector (KASAN) catches and generates error messages "BUG: KASAN: slab-out-of-bounds" as writeback connector does not support certain features which are not initialized. [HOW] Skip them when connector type is DRM_MODE_CONNECTOR_WRITEBACK. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3199 Reviewed-by: Harry Wentland <[email protected]> Reviewed-by: Rodrigo Siqueira <[email protected]> Acked-by: Roman Li <[email protected]> Signed-off-by: Alex Hung <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09drm/amdgpu: implement IRQ_STATE_ENABLE for SDMA v4.4.2Tao Zhou1-13/+3
SDMA_CNTL is not set in some cases, driver configures it by itself. v2: simplify code Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09drm/amdgpu: add smu 14.0.1 discovery supportYifan Zhang1-0/+1
This patch to add smu 14.0.1 support Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Yifan Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09drm/amd/swsmu: Update smu v14.0.0 headers to be 14.0.1 compatiblelima10026-43/+413
update ppsmc.h pmfw.h and driver_if.h for smu v14_0_1 Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: lima1002 <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09drm/amdgpu : Increase the mes log buffer size as per new MES FW versionshaoyunl2-3/+3
From MES version 0x54, the log entry increased and require the log buffer size to be increased. The 16k is maximum size agreed Signed-off-by: shaoyunl <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09drm/amdgpu : Add mes_log_enable to control mes log featureshaoyunl4-3/+20
The MES log might slow down the performance for extra step of log the data, disable it by default and introduce a parameter can enable it when necessary Signed-off-by: shaoyunl <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09drm/amd/pm: fixes a random hang in S4 for SMU v13.0.4/11Tim Huang1-1/+11
While doing multiple S4 stress tests, GC/RLC/PMFW get into an invalid state resulting into hard hangs. Adding a GFX reset as workaround just before sending the MP1_UNLOAD message avoids this failure. Signed-off-by: Tim Huang <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09drm/amd/display: add DCN 351 version for microcode loadLi Ma1-1/+6
There is a new DCN veriosn 3.5.1 need to load Signed-off-by: Li Ma <[email protected]> Reviewed-by: Yifan Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09drm/amdgpu: Reset dGPU if suspend got abortedLijo Lazar1-0/+25
For SOC21 ASICs, there is an issue in re-enabling PM features if a suspend got aborted. In such cases, reset the device during resume phase. This is a workaround till a proper solution is finalized. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2024-04-09drm/amdgpu/umsch: reinitialize write pointer in hw initLang Yu1-0/+2
Otherwise the old one will be used during GPU reset. That's not expected. Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2024-04-09drm/amdgpu: Refine IB schedule error loggingLijo Lazar1-2/+5
Downgrade to debug information when IBs are skipped. Also, use dev_* to identify the device. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Christian König <[email protected]> Reviewed-by: Asad Kamal <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-09drm/amdgpu: always force full reset for SOC21Alex Deucher1-2/+0
There are cases where soft reset seems to succeed, but does not, so always use mode1/2 for now. Reviewed-by: Harish Kasiviswanathan <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2024-04-09drm/amdkfd: Reset GPU on queue preemption failureHarish Kasiviswanathan1-0/+1
Currently, with F32 HWS GPU reset is only when unmap queue fails. However, if compute queue doesn't repond to preemption request in time unmap will return without any error. In this case, only preemption error is logged and Reset is not triggered. Call GPU reset in this case also. Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Harish Kasiviswanathan <[email protected]> Reviewed-by: Mukul Joshi <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2024-04-09ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addrJiri Benc2-3/+8
Although ipv6_get_ifaddr walks inet6_addr_lst under the RCU lock, it still means hlist_for_each_entry_rcu can return an item that got removed from the list. The memory itself of such item is not freed thanks to RCU but nothing guarantees the actual content of the memory is sane. In particular, the reference count can be zero. This can happen if ipv6_del_addr is called in parallel. ipv6_del_addr removes the entry from inet6_addr_lst (hlist_del_init_rcu(&ifp->addr_lst)) and drops all references (__in6_ifa_put(ifp) + in6_ifa_put(ifp)). With bad enough timing, this can happen: 1. In ipv6_get_ifaddr, hlist_for_each_entry_rcu returns an entry. 2. Then, the whole ipv6_del_addr is executed for the given entry. The reference count drops to zero and kfree_rcu is scheduled. 3. ipv6_get_ifaddr continues and tries to increments the reference count (in6_ifa_hold). 4. The rcu is unlocked and the entry is freed. 5. The freed entry is returned. Prevent increasing of the reference count in such case. The name in6_ifa_hold_safe is chosen to mimic the existing fib6_info_hold_safe. [ 41.506330] refcount_t: addition on 0; use-after-free. [ 41.506760] WARNING: CPU: 0 PID: 595 at lib/refcount.c:25 refcount_warn_saturate+0xa5/0x130 [ 41.507413] Modules linked in: veth bridge stp llc [ 41.507821] CPU: 0 PID: 595 Comm: python3 Not tainted 6.9.0-rc2.main-00208-g49563be82afa #14 [ 41.508479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) [ 41.509163] RIP: 0010:refcount_warn_saturate+0xa5/0x130 [ 41.509586] Code: ad ff 90 0f 0b 90 90 c3 cc cc cc cc 80 3d c0 30 ad 01 00 75 a0 c6 05 b7 30 ad 01 01 90 48 c7 c7 38 cc 7a 8c e8 cc 18 ad ff 90 <0f> 0b 90 90 c3 cc cc cc cc 80 3d 98 30 ad 01 00 0f 85 75 ff ff ff [ 41.510956] RSP: 0018:ffffbda3c026baf0 EFLAGS: 00010282 [ 41.511368] RAX: 0000000000000000 RBX: ffff9e9c46914800 RCX: 0000000000000000 [ 41.511910] RDX: ffff9e9c7ec29c00 RSI: ffff9e9c7ec1c900 RDI: ffff9e9c7ec1c900 [ 41.512445] RBP: ffff9e9c43660c9c R08: 0000000000009ffb R09: 00000000ffffdfff [ 41.512998] R10: 00000000ffffdfff R11: ffffffff8ca58a40 R12: ffff9e9c4339a000 [ 41.513534] R13: 0000000000000001 R14: ffff9e9c438a0000 R15: ffffbda3c026bb48 [ 41.514086] FS: 00007fbc4cda1740(0000) GS:ffff9e9c7ec00000(0000) knlGS:0000000000000000 [ 41.514726] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 41.515176] CR2: 000056233b337d88 CR3: 000000000376e006 CR4: 0000000000370ef0 [ 41.515713] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 41.516252] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 41.516799] Call Trace: [ 41.517037] <TASK> [ 41.517249] ? __warn+0x7b/0x120 [ 41.517535] ? refcount_warn_saturate+0xa5/0x130 [ 41.517923] ? report_bug+0x164/0x190 [ 41.518240] ? handle_bug+0x3d/0x70 [ 41.518541] ? exc_invalid_op+0x17/0x70 [ 41.520972] ? asm_exc_invalid_op+0x1a/0x20 [ 41.521325] ? refcount_warn_saturate+0xa5/0x130 [ 41.521708] ipv6_get_ifaddr+0xda/0xe0 [ 41.522035] inet6_rtm_getaddr+0x342/0x3f0 [ 41.522376] ? __pfx_inet6_rtm_getaddr+0x10/0x10 [ 41.522758] rtnetlink_rcv_msg+0x334/0x3d0 [ 41.523102] ? netlink_unicast+0x30f/0x390 [ 41.523445] ? __pfx_rtnetlink_rcv_msg+0x10/0x10 [ 41.523832] netlink_rcv_skb+0x53/0x100 [ 41.524157] netlink_unicast+0x23b/0x390 [ 41.524484] netlink_sendmsg+0x1f2/0x440 [ 41.524826] __sys_sendto+0x1d8/0x1f0 [ 41.525145] __x64_sys_sendto+0x1f/0x30 [ 41.525467] do_syscall_64+0xa5/0x1b0 [ 41.525794] entry_SYSCALL_64_after_hwframe+0x72/0x7a [ 41.526213] RIP: 0033:0x7fbc4cfcea9a [ 41.526528] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 [ 41.527942] RSP: 002b:00007ffcf54012a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 41.528593] RAX: ffffffffffffffda RBX: 00007ffcf5401368 RCX: 00007fbc4cfcea9a [ 41.529173] RDX: 000000000000002c RSI: 00007fbc4b9d9bd0 RDI: 0000000000000005 [ 41.529786] RBP: 00007fbc4bafb040 R08: 00007ffcf54013e0 R09: 000000000000000c [ 41.530375] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [ 41.530977] R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007fbc4ca85d1b [ 41.531573] </TASK> Fixes: 5c578aedcb21d ("IPv6: convert addrconf hash list to RCU") Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: Jiri Benc <[email protected]> Link: https://lore.kernel.org/r/8ab821e36073a4a406c50ec83c9e8dc586c539e4.1712585809.git.jbenc@redhat.com Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-09Merge branch 'net-start-to-replace-copy_from_sockptr'Jakub Kicinski3-11/+36
Eric Dumazet says: ==================== net: start to replace copy_from_sockptr() We got several syzbot reports about unsafe copy_from_sockptr() calls. After fixing some of them, it appears that we could use a new helper to factorize all the checks in one place. This series targets net tree, we can later start converting many call sites in net-next. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-09nfc: llcp: fix nfc_llcp_setsockopt() unsafe copiesEric Dumazet1-6/+6
syzbot reported unsafe calls to copy_from_sockptr() [1] Use copy_safe_from_sockptr() instead. [1] BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline] BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline] BUG: KASAN: slab-out-of-bounds in nfc_llcp_setsockopt+0x6c2/0x850 net/nfc/llcp_sock.c:255 Read of size 4 at addr ffff88801caa1ec3 by task syz-executor459/5078 CPU: 0 PID: 5078 Comm: syz-executor459 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd189e #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 copy_from_sockptr_offset include/linux/sockptr.h:49 [inline] copy_from_sockptr include/linux/sockptr.h:55 [inline] nfc_llcp_setsockopt+0x6c2/0x850 net/nfc/llcp_sock.c:255 do_sock_setsockopt+0x3b1/0x720 net/socket.c:2311 __sys_setsockopt+0x1ae/0x250 net/socket.c:2334 __do_sys_setsockopt net/socket.c:2343 [inline] __se_sys_setsockopt net/socket.c:2340 [inline] __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340 do_syscall_64+0xfd/0x240 entry_SYSCALL_64_after_hwframe+0x6d/0x75 RIP: 0033:0x7f7fac07fd89 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 91 18 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fff660eb788 EFLAGS: 00000246 ORIG_RAX: 0000000000000036 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f7fac07fd89 RDX: 0000000000000000 RSI: 0000000000000118 RDI: 0000000000000004 RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000000 R10: 0000000020000a80 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 Signed-off-by: Eric Dumazet <[email protected]> Reported-by: syzbot <[email protected]> Reviewed-by: Krzysztof Kozlowski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-09mISDN: fix MISDN_TIME_STAMP handlingEric Dumazet1-5/+5
syzbot reports one unsafe call to copy_from_sockptr() [1] Use copy_safe_from_sockptr() instead. [1] BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline] BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline] BUG: KASAN: slab-out-of-bounds in data_sock_setsockopt+0x46c/0x4cc drivers/isdn/mISDN/socket.c:417 Read of size 4 at addr ffff0000c6d54083 by task syz-executor406/6167 CPU: 1 PID: 6167 Comm: syz-executor406 Not tainted 6.8.0-rc7-syzkaller-g707081b61156 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Call trace: dump_backtrace+0x1b8/0x1e4 arch/arm64/kernel/stacktrace.c:291 show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:298 __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd0/0x124 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:377 [inline] print_report+0x178/0x518 mm/kasan/report.c:488 kasan_report+0xd8/0x138 mm/kasan/report.c:601 __asan_report_load_n_noabort+0x1c/0x28 mm/kasan/report_generic.c:391 copy_from_sockptr_offset include/linux/sockptr.h:49 [inline] copy_from_sockptr include/linux/sockptr.h:55 [inline] data_sock_setsockopt+0x46c/0x4cc drivers/isdn/mISDN/socket.c:417 do_sock_setsockopt+0x2a0/0x4e0 net/socket.c:2311 __sys_setsockopt+0x128/0x1a8 net/socket.c:2334 __do_sys_setsockopt net/socket.c:2343 [inline] __se_sys_setsockopt net/socket.c:2340 [inline] __arm64_sys_setsockopt+0xb8/0xd4 net/socket.c:2340 __invoke_syscall arch/arm64/kernel/syscall.c:34 [inline] invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:48 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:133 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:152 el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598 Fixes: 1b2b03f8e514 ("Add mISDN core files") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: syzbot <[email protected]> Cc: Karsten Keil <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-09net: add copy_safe_from_sockptr() helperEric Dumazet1-0/+25
copy_from_sockptr() helper is unsafe, unless callers did the prior check against user provided optlen. Too many callers get this wrong, lets add a helper to fix them and avoid future copy/paste bugs. Instead of : if (optlen < sizeof(opt)) { err = -EINVAL; break; } if (copy_from_sockptr(&opt, optval, sizeof(opt)) { err = -EFAULT; break; } Use : err = copy_safe_from_sockptr(&opt, sizeof(opt), optval, optlen); if (err) break; Signed-off-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-09bcachefs: btree_node_scan: Respect member.data_allowedKent Overstreet1-0/+3
If a device wasn't used for btree nodes, no need to scan for them. Signed-off-by: Kent Overstreet <[email protected]>
2024-04-10zonefs: Use str_plural() to fix Coccinelle warningThorsten Blum1-1/+1
Fixes the following Coccinelle/coccicheck warning reported by string_choices.cocci: opportunity for str_plural(zgroup->g_nr_zones) Signed-off-by: Thorsten Blum <[email protected]> Signed-off-by: Damien Le Moal <[email protected]>
2024-04-09io-uring: correct typo in comment for IOU_F_TWQ_LAZY_WAKEHaiyue Wang1-1/+1
The 'r' key is near to 't' key, that makes 'with' to be 'wirh' ? :) Signed-off-by: Haiyue Wang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-04-09tools/power/turbostat: Fix uncore frequency file stringJustin Ernst1-1/+1
Running turbostat on a 16 socket HPE Scale-up Compute 3200 (SapphireRapids) fails with: turbostat: /sys/devices/system/cpu/intel_uncore_frequency/package_010_die_00/current_freq_khz: open failed: No such file or directory We observe the sysfs uncore frequency directories named: ... package_09_die_00/ package_10_die_00/ package_11_die_00/ ... package_15_die_00/ The culprit is an incorrect sprintf format string "package_0%d_die_0%d" used with each instance of reading uncore frequency files. uncore-frequency-common.c creates the sysfs directory with the format "package_%02d_die_%02d". Once the package value reaches double digits, the formats diverge. Change each instance of "package_0%d_die_0%d" to "package_%02d_die_%02d". [lenb: deleted the probe part of this patch, as it was already fixed] Signed-off-by: Justin Ernst <[email protected]> Reviewed-by: Thomas Renninger <[email protected]> Signed-off-by: Len Brown <[email protected]>
2024-04-09tools/power/turbostat: Unify graphics sysfs snapshotsZhang Rui1-75/+34
Graphics sysfs snapshots share similar logic. Combine them into one function to avoid code duplication. No functional change. Signed-off-by: Zhang Rui <[email protected]> Signed-off-by: Len Brown <[email protected]>
2024-04-09tools/power/turbostat: Cache graphics sysfs pathZhang Rui1-13/+32
Graphics drivers (i915/Xe) have different sysfs knobs on different platforms, and it is possible that different sysfs knobs fit into the same turbostat columns. Instead of specifying different sysfs knobs every time, detect them once and cache the path for future use. No functional change. Signed-off-by: Zhang Rui <[email protected]> Signed-off-by: Len Brown <[email protected]>
2024-04-09tools/power/turbostat: Enable MSR_CORE_C1_RES support for ICXZhang Rui1-0/+1
Enable Core C1 hardware residency counter (MSR_CORE_C1_RES) on ICX. Signed-off-by: Zhang Rui <[email protected]> Signed-off-by: Len Brown <[email protected]>
2024-04-09tools/power turbostat: Add selftestsPatryk Wlazlyn1-0/+59
Signed-off-by: Patryk Wlazlyn <[email protected]> Signed-off-by: Len Brown <[email protected]>