aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-09-10watchdog: iTCO_wdt: Convert comma to semicolonChen Ni1-2/+2
Replace a comma between expression statements by a semicolon. Signed-off-by: Chen Ni <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Reviewed-by: Guenter Roeck <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Wim Van Sebroeck <[email protected]>
2024-09-10watchdog: Add Watchdog Timer driver for RZ/V2H(P)Lad Prabhakar3-0/+282
Add Watchdog Timer driver for RZ/V2H(P) SoC. Signed-off-by: Lad Prabhakar <[email protected]> Reviewed-by: Guenter Roeck <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Wim Van Sebroeck <[email protected]>
2024-09-10dt-bindings: watchdog: renesas,wdt: Document RZ/V2H(P) SoCLad Prabhakar1-1/+16
Add support for the Watchdog Timer (WDT) hardware found in the Renesas RZ/V2H(P) SoC to the `renesas,wdt` device tree bindings. Signed-off-by: Lad Prabhakar <[email protected]> Reviewed-by: Conor Dooley <[email protected]> Reviewed-by: Guenter Roeck <[email protected]> Reviewed-by: Geert Uytterhoeven <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Wim Van Sebroeck <[email protected]>
2024-09-10watchdog: imx_sc_wdt: detect if already runningAlexander Sverdlin1-0/+22
Firmware (SC) WDT can be already enabled in U-Boot. Detect this case and make CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED functional by setting WDOG_HW_RUNNING. Signed-off-by: Alexander Sverdlin <[email protected]> Reviewed-by: Guenter Roeck <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Wim Van Sebroeck <[email protected]>
2024-09-10watchdog: imx2_wdt: Remove __maybe_unused notationsFabio Estevam1-5/+5
Use the DEFINE_SIMPLE_DEV_PM_OPS() and pm_sleep_ptr() macros to handle the .suspend/.resume callbacks. These macros allow the suspend and resume functions to be automatically dropped by the compiler when CONFIG_SUSPEND is disabled, without having to use __maybe_unused notation. Signed-off-by: Fabio Estevam <[email protected]> Reviewed-by: Guenter Roeck <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Wim Van Sebroeck <[email protected]>
2024-09-10watchdog: imx_sc_wdt: Don't disable WDT in suspendJonas Blixt1-24/+0
Parts of the suspend and resume chain is left unprotected if we disable the WDT here. >From experiments we can see that the SCU disables and re-enables the WDT when we enter and leave suspend to ram. By not touching the WDT here we are protected by the WDT all the way to the SCU. Signed-off-by: Jonas Blixt <[email protected]> CC: Anson Huang <[email protected]> Fixes: 986857acbc9a ("watchdog: imx_sc: Add i.MX system controller watchdog support") Reviewed-by: Guenter Roeck <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Wim Van Sebroeck <[email protected]>
2024-09-10watchdog: imx7ulp_wdt: move post_rcs_wait into struct imx_wdt_hw_featureFrank Li1-12/+9
Move post_rcs_wait into struct imx_wdt_hw_feature to simple code logic for difference compatible string. i.MX93 watchdog needn't wait 2.5 clocks after RCS is done. So needn't set post_rcs_wait. Signed-off-by: Frank Li <[email protected]> Signed-off-by: Alice Guo <[email protected]> Reviewed-by: Ye Li <[email protected]> Reviewed-by: Guenter Roeck <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Wim Van Sebroeck <[email protected]>
2024-09-10Merge branch 'net-lan966x-use-the-newly-introduced-fdma-library'Paolo Abeni4-305/+164
Daniel Machon says: ==================== net: lan966x: use the newly introduced FDMA library This patch series is the second of a 2-part series [1], that adds a new common FDMA library for Microchip switch chips Sparx5 and lan966x. These chips share the same FDMA engine, and as such will benefit from a common library with a common implementation. This also has the benefit of removing a lot of open-coded bookkeeping and duplicate code for the two drivers. In this second series, the FDMA library will be taken into use by the lan966x switch driver. ################### # Example of use: # ################### - Initialize the rx and tx fdma structs with values for: number of DCB's, number of DB's, channel ID, DB size (data buffer size), and total size of the requested memory. Also provide two callbacks: nextptr_cb() and dataptr_cb() for getting the nextptr and dataptr. - Allocate memory using fdma_alloc_phys() or fdma_alloc_coherent(). - Initialize the DCB's with fdma_dcb_init(). - Add new DCB's with fdma_dcb_add(). - Free memory with fdma_free_phys() or fdma_free_coherent(). ##################### # Patch breakdown: # ##################### Patch #1: select FDMA library for lan966x. Patch #2: includes the fdma_api.h header and removes old symbols. Patch #3: replaces old rx and tx variables with equivalent ones from the fdma struct. Only the variables that can be changed without breaking traffic is changed in this patch. Patch #4: uses the library for allocation of rx buffers. This requires quite a bit of refactoring in this single patch. Patch #5: uses the library for adding DCB's in the rx path. Patch #6: uses the library for freeing rx buffers. Patch #7: uses the library for allocation of tx buffers. This requires quite a bit of refactoring in this single patch. Patch #8: uses the library for adding DCB's in the tx path. Patch #9: uses the library helpers in the tx path. Patch #10: ditch last_in_use variable and use library instead. Patch #11: uses library helpers throughout. Patch #12: refactor lan966x_fdma_reload() function. [1] https://lore.kernel.org/netdev/[email protected]/ Signed-off-by: Daniel Machon <[email protected]> ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-09-10net: lan966x: refactor buffer reload functionDaniel Machon1-10/+4
Now that we store everything in the fdma structs, refactor lan966x_fdma_reload() to store and restore the entire struct. Signed-off-by: Daniel Machon <[email protected]> Reviewed-by: Horatiu Vultur <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-09-10net: lan966x: use a few FDMA helpers throughoutDaniel Machon1-31/+11
The library provides helpers for a number of DCB and DB operations. Use these throughout the code and remove the old ones. Signed-off-by: Daniel Machon <[email protected]> Reviewed-by: Horatiu Vultur <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-09-10net: lan966x: ditch tx->last_in_use variableDaniel Machon2-18/+4
This variable is used in the tx path to determine the last used DCB. The library has the variable last_dcb for the exact same purpose. Ditch the last_in_use variable throughout. Signed-off-by: Daniel Machon <[email protected]> Reviewed-by: Horatiu Vultur <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-09-10net: lan966x: use library helper for freeing tx buffersDaniel Machon1-6/+1
The library has the helper fdma_free_phys() for freeing physical FDMA memory. Use it in the exit path. Signed-off-by: Daniel Machon <[email protected]> Reviewed-by: Horatiu Vultur <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-09-10net: lan966x: use FDMA library for adding DCB's in the tx pathDaniel Machon1-32/+30
Use the fdma_dcb_add() function to add DCB's in the tx path. This gets rid of the open-coding of nextptr and dataptr handling and leaves it to the library. Signed-off-by: Daniel Machon <[email protected]> Reviewed-by: Horatiu Vultur <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-09-10net: lan966x: use the FDMA library for allocation of tx buffersDaniel Machon2-57/+34
Use the two functions: fdma_alloc_phys() and fdma_dcb_init() for rx buffer allocation and use the new buffers throughout. In order to replace the old buffers with the new ones, we have to do the following refactoring: - use fdma_alloc_phys() and fdma_dcb_init() - replace the variables: tx->dma, tx->dcbs and tx->curr_entry with the equivalents from the FDMA struct. - add lan966x_fdma_tx_dataptr_cb callback for obtaining the dataptr. - Initialize FDMA struct values. Signed-off-by: Daniel Machon <[email protected]> Reviewed-by: Horatiu Vultur <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-09-10net: lan966x: use library helper for freeing rx buffersDaniel Machon1-14/+2
The library has the helper fdma_free_phys() for freeing physical FDMA memory. Use it in the exit path. Signed-off-by: Daniel Machon <[email protected]> Reviewed-by: Horatiu Vultur <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-09-10net: lan966x: use FDMA library for adding DCB's in the rx pathDaniel Machon1-49/+5
Use the fdma_dcb_add() function to add DCB's in the rx path. This gets rid of the open-coding of nextptr and dataptr handling and the functions for adding DCB's. Signed-off-by: Daniel Machon <[email protected]> Reviewed-by: Horatiu Vultur <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-09-10net: lan966x: use the FDMA library for allocation of rx buffersDaniel Machon2-76/+55
Use the two functions: fdma_alloc_phys() and fdma_dcb_init() for rx buffer allocation and use the new buffers throughout. In order to replace the old buffers with the new ones, we have to do the following refactoring: - use fdma_alloc_phys() and fdma_dcb_init() - replace the variables: rx->dma, rx->dcbs and rx->last_entry with the equivalents from the FDMA struct. - make use of fdma->db_size for rx buffer size. - add lan966x_fdma_rx_dataptr_cb callback for obtaining the dataptr. - Initialize FDMA struct values. Signed-off-by: Daniel Machon <[email protected]> Reviewed-by: Horatiu Vultur <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-09-10net: lan966x: replace a few variables with new equivalent onesDaniel Machon2-69/+81
Replace the old rx and tx variables: channel_id, FDMA_DCB_MAX, FDMA_RX_DCB_MAX_DBS, FDMA_TX_DCB_MAX_DBS, dcb_index and db_index with the equivalents from the FDMA rx and tx structs. These variables are not entangled in any buffer allocation and can therefore be replaced in advance. Signed-off-by: Daniel Machon <[email protected]> Reviewed-by: Horatiu Vultur <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-09-10net: lan966x: use FDMA library symbolsDaniel Machon2-9/+2
Include and use the new FDMA header, which now provides the required masks and bit offsets for operating on the DCB's and DB's. Signed-off-by: Daniel Machon <[email protected]> Reviewed-by: Horatiu Vultur <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-09-10net: lan966x: select FDMA libraryDaniel Machon1-0/+1
Select the newly introduced FDMA library. Signed-off-by: Daniel Machon <[email protected]> Reviewed-by: Horatiu Vultur <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-09-10i2c: keba: Add KEBA I2C controller supportGerhard Engleder3-0/+610
The KEBA I2C controller is found in the system FPGA of KEBA PLC devices. It is used to connect EEPROMs and hardware monitoring chips. The It is a simple I2C controller with a fixed bus speed of 100 kbit/s. The whole message transmission is executed by the driver. The driver triggers all steps over control, status and data register. There are no FIFOs or interrupts. Signed-off-by: Gerhard Engleder <[email protected]> Signed-off-by: Andi Shyti <[email protected]>
2024-09-10Merge tag 'thermal-v6.12-rc1' of ↵Rafael J. Wysocki13-67/+45
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux into Merge thermal drivers changes for v6.12-rc1 from Daniel Lezcano: "- Add power domain DT bindings for new Amlogic SoCs (Georges Stark) - Switch from CONFIG_PM_SLEEP guards to pm_sleep_ptr() in the ST driver and add a Kconfig dependency on THERMAL_OF subsystem for the STi driver (Raphael Gallais-Pou) - Simplify with dev_err_probe() the error code path in the probe functions for the brcmstb driver (Yan Zhen) - Remove trailing space after \n newline in the Renesas driver (Colin Ian King) - Add DT binding compatible string for the SA8255p with the tsens driver (Nikunj Kela) - Use the devm_clk_get_enabled() helpers to simplify the init routine in the sprd driver (Huan Yang) - Remove __maybe_unused notations for the functions by using the new RUNTIME_PM_OPS() and SYSTEM_SLEEP_PM_OPS() macros on the IMx and Qoriq drivers (Fabio Estevam) - Remove unused declarations in the header file as the functions were removed in a previous change on the ti-soc-thermal driver (Zhang Zekun) - Simplify with dev_err_probe() the error code path in the probe functions for the imx_sc_thermal driver (Alexander Stein)" * tag 'thermal-v6.12-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux: thermal/drivers/imx_sc_thermal: Use dev_err_probe thermal/drivers/ti-soc-thermal: Remove unused declarations thermal/drivers/imx: Remove __maybe_unused notations thermal/drivers/qoriq: Remove __maybe_unused notations thermal/drivers/sprd: Use devm_clk_get_enabled() helpers dt-bindings: thermal: tsens: document support on SA8255p thermal/drivers/renesas: Remove trailing space after \n newline thermal/drivers/brcmstb_thermal: Simplify with dev_err_probe() thermal/drivers/sti: Depend on THERMAL_OF subsystem thermal/drivers/st: Switch from CONFIG_PM_SLEEP guards to pm_sleep_ptr() dt-bindings: thermal: amlogic,thermal: add optional power-domains
2024-09-10ALSA: hda/realtek: Refactor and simplify Samsung Galaxy Book initJoshua Grisham2-317/+144
I have done a lot of analysis for these type of devices and collaborated quite a bit with Nick Weihs (author of the first patch submitted for this including adding samsung_helper.c). More information can be found in the issue on Github [1] including additional rationale and testing. The existing implementation includes a large number of equalizer coef values that are not necessary to actually init and enable the speaker amps, as well as create a somewhat worse sound profile. Users have reported "muffled" or "muddy" sound; more information about this including my analysis of the differences can be found in the linked Github issue. This patch refactors the "v2" version of ALC298_FIXUP_SAMSUNG_AMP to a much simpler implementation which removes the new samsung_helper.c, reuses more of the existing patch_realtek.c, and sends significantly fewer unnecessary coef values (including removing all of these EQ-specific coef values). A pcm_playback_hook is used to dynamically enable and disable the speaker amps only when there will be audio playback; this is to match the behavior of how the driver for these devices is working in Windows, and is suspected but not yet tested or confirmed to help with power consumption. Support for models with 2 speaker amps vs 4 speaker amps is controlled by a specific quirk name for both types. A new int num_speaker_amps has been added to alc_spec so that the hooks can know how many speaker amps to enable or disable. This design was chosen to limit the number of places that subsystem ids will need to be maintained: like this, they can be maintained only once in the quirk table and there will not be another separate list of subsystem ids to maintain elsewhere in the code. Also updated the quirk name from ALC298_FIXUP_SAMSUNG_AMP2 to ALC298_FIXUP_SAMSUNG_AMP_V2_.. as this is not a quirk for "Amp #2" on ALC298 but is instead a different version of how to handle it. More devices have been added (see Github issue for testing confirmation), as well as a small cleanup to existing names. [1]: https://github.com/thesofproject/linux/issues/4055#issuecomment-2323411911 Signed-off-by: Joshua Grisham <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2024-09-10ALSA: hda/realtek: Enable mic on Vaio VJFH52Edson Juliano Drosdeck1-0/+12
Vaio VJFH52 is equipped with ACL256, and needs a fix to make the internal mic and headphone mic to work. Also must to limits the internal microphone boost. Signed-off-by: Edson Juliano Drosdeck <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2024-09-10Merge branch 'for-linus' into for-nextTakashi Iwai24-91/+130
Signed-off-by: Takashi Iwai <[email protected]>
2024-09-10xen: move max_pfn in xen_memory_setup() out of function scopeJuergen Gross1-26/+26
Instead of having max_pfn as a local variable of xen_memory_setup(), make it a static variable in setup.c instead. This avoids having to pass it to subfunctions, which will be needed in more cases in future. Rename it to ini_nr_pages, as the value denotes the currently usable number of memory pages as passed from the hypervisor at boot time. Signed-off-by: Juergen Gross <[email protected]> Tested-by: Marek Marczykowski-Górecki <[email protected]> Reviewed-by: Jan Beulich <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
2024-09-10xen: move checks for e820 conflicts further upJuergen Gross1-22/+22
Move the checks for e820 memory map conflicts using the xen_chk_is_e820_usable() helper further up in order to prepare resolving some of the possible conflicts by doing some e820 map modifications, which must happen before evaluating the RAM layout. Signed-off-by: Juergen Gross <[email protected]> Tested-by: Marek Marczykowski-Górecki <[email protected]> Reviewed-by: Jan Beulich <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
2024-09-10xen: introduce generic helper checking for memory map conflictsJuergen Gross3-11/+31
When booting as a Xen PV dom0 the memory layout of the dom0 is modified to match that of the host, as this requires less changes in the kernel for supporting Xen. There are some cases, though, which are problematic, as it is the Xen hypervisor selecting the kernel's load address plus some other data, which might conflict with the host's memory map. These conflicts are detected at boot time and result in a boot error. In order to support handling at least some of these conflicts in future, introduce a generic helper function which will later gain the ability to adapt the memory layout when possible. Add the missing check for the xen_start_info area. Note that possible p2m map and initrd memory conflicts are handled already by copying the data to memory areas not conflicting with the memory map. The initial stack allocated by Xen doesn't need to be checked, as early boot code is switching to the statically allocated initial kernel stack. Initial page tables and the kernel itself will be handled later. Signed-off-by: Juergen Gross <[email protected]> Tested-by: Marek Marczykowski-Górecki <[email protected]> Reviewed-by: Jan Beulich <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
2024-09-10xen: use correct end address of kernel for conflict checkingJuergen Gross1-1/+1
When running as a Xen PV dom0 the kernel is loaded by the hypervisor using a different memory map than that of the host. In order to minimize the required changes in the kernel, the kernel adapts its memory map to that of the host. In order to do that it is checking for conflicts of its load address with the host memory map. Unfortunately the tested memory range does not include the .brk area, which might result in crashes or memory corruption when this area does conflict with the memory map of the host. Fix the test by using the _end label instead of __bss_stop. Fixes: 808fdb71936c ("xen: check for kernel memory conflicting with memory layout") Signed-off-by: Juergen Gross <[email protected]> Tested-by: Marek Marczykowski-Górecki <[email protected]> Reviewed-by: Jan Beulich <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
2024-09-10sched/debug: Fix the runnable tasks outputHuang Shijie1-8/+23
The current runnable tasks output looks like: runnable tasks: S task PID tree-key switches prio wait-time sum-exec sum-sleep ------------------------------------------------------------------------------------------------------------- Ikworker/R-rcu_g 4 0.129049 E 0.620179 0.750000 0.002920 2 100 0.000000 0.002920 0.000000 0.000000 0 0 / Ikworker/R-sync_ 5 0.125328 E 0.624147 0.750000 0.001840 2 100 0.000000 0.001840 0.000000 0.000000 0 0 / Ikworker/R-slub_ 6 0.120835 E 0.628680 0.750000 0.001800 2 100 0.000000 0.001800 0.000000 0.000000 0 0 / Ikworker/R-netns 7 0.114294 E 0.634701 0.750000 0.002400 2 100 0.000000 0.002400 0.000000 0.000000 0 0 / I kworker/0:1 9 508.781746 E 511.754666 3.000000 151.575240 224 120 0.000000 151.575240 0.000000 0.000000 0 0 / Which is messy. Remove the duplicate printing of sum_exec_runtime and tidy up the layout to make it look like: runnable tasks: S task PID vruntime eligible deadline slice sum-exec switches prio wait-time sum-sleep sum-block node group-id group-path ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- I kworker/0:3 1698 295.001459 E 297.977619 3.000000 38.862920 9 120 0.000000 0.000000 0.000000 0 0 / I kworker/0:4 1702 278.026303 E 281.026303 3.000000 9.918760 3 120 0.000000 0.000000 0.000000 0 0 / S NetworkManager 2646 0.377936 E 2.598104 3.000000 98.535880 314 120 0.000000 0.000000 0.000000 0 0 /system.slice/NetworkManager.service S virtqemud 2689 0.541016 E 2.440104 3.000000 50.967960 80 120 0.000000 0.000000 0.000000 0 0 /system.slice/virtqemud.service S gsd-smartcard 3058 73.604144 E 76.475904 3.000000 74.033320 88 120 0.000000 0.000000 0.000000 0 0 /user.slice/user-42.slice/session-c1.scope Reviewed-by: Christoph Lameter (Ampere) <[email protected]> Signed-off-by: Huang Shijie <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2024-09-10sched: Fix sched_delayed vs sched_corePeter Zijlstra1-0/+6
Completely analogous to commit dfa0a574cbc4 ("sched/uclamg: Handle delayed dequeue"), avoid double dequeue for the sched_core entries. Fixes: 152e11f6df29 ("sched/fair: Implement delayed dequeue") Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
2024-09-10kernel/sched: Fix util_est accounting for DELAY_DEQUEUEDietmar Eggemann1-7/+9
Remove delayed tasks from util_est even they are runnable. Exclude delayed task which are (a) migrating between rq's or (b) in a SAVE/RESTORE dequeue/enqueue. Signed-off-by: Dietmar Eggemann <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2024-09-10kthread: Fix task state in kthread worker if being frozenChen Yu1-1/+9
When analyzing a kernel waring message, Peter pointed out that there is a race condition when the kworker is being frozen and falls into try_to_freeze() with TASK_INTERRUPTIBLE, which could trigger a might_sleep() warning in try_to_freeze(). Although the root cause is not related to freeze()[1], it is still worthy to fix this issue ahead. One possible race scenario: CPU 0 CPU 1 ----- ----- // kthread_worker_fn set_current_state(TASK_INTERRUPTIBLE); suspend_freeze_processes() freeze_processes static_branch_inc(&freezer_active); freeze_kernel_threads pm_nosig_freezing = true; if (work) { //false __set_current_state(TASK_RUNNING); } else if (!freezing(current)) //false, been frozen freezing(): if (static_branch_unlikely(&freezer_active)) if (pm_nosig_freezing) return true; schedule() } // state is still TASK_INTERRUPTIBLE try_to_freeze() might_sleep() <--- warning Fix this by explicitly set the TASK_RUNNING before entering try_to_freeze(). Fixes: b56c0d8937e6 ("kthread: implement kthread_worker") Suggested-by: Peter Zijlstra <[email protected]> Suggested-by: Andrew Morton <[email protected]> Signed-off-by: Chen Yu <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/lkml/Zs2ZoAcUsZMX2B%2FI@chenyu5-mobl2/ [1]
2024-09-10sched/pelt: Use rq_clock_task() for hw_pressureChen Yu1-1/+2
commit 97450eb90965 ("sched/pelt: Remove shift of thermal clock") removed the decay_shift for hw_pressure. This commit uses the sched_clock_task() in sched_tick() while it replaces the sched_clock_task() with rq_clock_pelt() in __update_blocked_others(). This could bring inconsistence. One possible scenario I can think of is in ___update_load_sum(): u64 delta = now - sa->last_update_time 'now' could be calculated by rq_clock_pelt() from __update_blocked_others(), and last_update_time was calculated by rq_clock_task() previously from sched_tick(). Usually the former chases after the latter, it cause a very large 'delta' and brings unexpected behavior. Fixes: 97450eb90965 ("sched/pelt: Remove shift of thermal clock") Signed-off-by: Chen Yu <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Hongyan Xia <[email protected]> Reviewed-by: Vincent Guittot <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2024-09-10sched/fair: Move effective_cpu_util() and effective_cpu_util() in fair.cVincent Guittot2-101/+99
Move effective_cpu_util() and sched_cpu_util() functions in fair.c file with others utilization related functions. No functional change. Signed-off-by: Vincent Guittot <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2024-09-10sched/core: Introduce SM_IDLE and an idle re-entry fast-path in __schedule()Peter Zijlstra1-19/+26
Since commit b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()") an idle CPU in TIF_POLLING_NRFLAG mode can be pulled out of idle by setting TIF_NEED_RESCHED flag to service an IPI without actually sending an interrupt. Even in cases where the IPI handler does not queue a task on the idle CPU, do_idle() will call __schedule() since need_resched() returns true in these cases. Introduce and use SM_IDLE to identify call to __schedule() from schedule_idle() and shorten the idle re-entry time by skipping pick_next_task() when nr_running is 0 and the previous task is the idle task. With the SM_IDLE fast-path, the time taken to complete a fixed set of IPIs using ipistorm improves noticeably. Following are the numbers from a dual socket Intel Ice Lake Xeon server (2 x 32C/64T) and 3rd Generation AMD EPYC system (2 x 64C/128T) (boost on, C2 disabled) running ipistorm between CPU8 and CPU16: cmdline: insmod ipistorm.ko numipi=100000 single=1 offset=8 cpulist=8 wait=1 ================================================================== Test : ipistorm (modified) Units : Normalized runtime Interpretation: Lower is better Statistic : AMean ======================= Intel Ice Lake Xeon ====================== kernel: time [pct imp] tip:sched/core 1.00 [baseline] tip:sched/core + SM_IDLE 0.80 [20.51%] ==================== 3rd Generation AMD EPYC ===================== kernel: time [pct imp] tip:sched/core 1.00 [baseline] tip:sched/core + SM_IDLE 0.90 [10.17%] ================================================================== [ kprateek: Commit message, SM_RTLOCK_WAIT fix ] Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Not-yet-signed-off-by: Peter Zijlstra <[email protected]> Signed-off-by: K Prateek Nayak <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Vincent Guittot <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-09-10erofs: simplify erofs_map_blocks_flatmode()Hongzhen Luo1-19/+9
Get rid of redundant variables (nblocks, offset) and a dead branch (!tailendpacking). Signed-off-by: Hongzhen Luo <[email protected]> Reviewed-by: Gao Xiang <[email protected]> Reviewed-by: Chao Yu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Gao Xiang <[email protected]>
2024-09-10erofs: refactor read_inode calling conventionYiyang Wu1-59/+52
Refactor out the iop binding behavior out of the erofs_fill_symlink and move erofs_buf into the erofs_read_inode, so that erofs_fill_inode can only deal with inode operation bindings and can be decoupled from metabuf operations. This results in better calling conventions. Note that after this patch, we do not need erofs_buf and ofs as parameters any more when calling erofs_read_inode as all the data operations are now included in itself. Suggested-by: Al Viro <[email protected]> Link: https://lore.kernel.org/all/20240425222847.GN2118490@ZenIV/ Signed-off-by: Yiyang Wu <[email protected]> Reviewed-by: Gao Xiang <[email protected]> Reviewed-by: Chao Yu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Gao Xiang <[email protected]>
2024-09-10erofs: use kmemdup_nul in erofs_fill_symlinkYiyang Wu1-8/+2
Remove open coding in erofs_fill_symlink. Suggested-by: Al Viro <[email protected]> Link: https://lore.kernel.org/all/20240425222847.GN2118490@ZenIV Signed-off-by: Yiyang Wu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Gao Xiang <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Gao Xiang <[email protected]>
2024-09-10erofs: mark experimental fscache backend deprecatedGao Xiang2-2/+5
Although fscache is still described as "General Filesystem Caching" for network filesystems and other things such as ISO9660 filesystems, it has actually become a part of netfslib recently, which was unexpected at the time when "EROFS over fscache" proposed (2021) since EROFS is entirely a disk filesystem and the dependency is redundant. Mark it deprecated and it will be removed after "fanotify pre-content hooks" lands, which will provide the same functionality for EROFS. Reviewed-by: Sandeep Dhavale <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Gao Xiang <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-09-10erofs: support compressed inodes for fileioGao Xiang4-20/+43
Use pseudo bios just like the previous fscache approach since merged bio_vecs can be filled properly with unique interfaces. Reviewed-by: Sandeep Dhavale <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Gao Xiang <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-09-10erofs: support unencoded inodes for fileioGao Xiang6-51/+248
Since EROFS only needs to handle read requests in simple contexts, Just directly use vfs_iocb_iter_read() for data I/Os. Reviewed-by: Sandeep Dhavale <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Gao Xiang <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-09-10erofs: add file-backed mount supportGao Xiang5-44/+100
It actually has been around for years: For containers and other sandbox use cases, there will be thousands (and even more) of authenticated (sub)images running on the same host, unlike OS images. Of course, all scenarios can use the same EROFS on-disk format, but bdev-backed mounts just work well for OS images since golden data is dumped into real block devices. However, it's somewhat hard for container runtimes to manage and isolate so many unnecessary virtual block devices safely and efficiently [1]: they just look like a burden to orchestrators and file-backed mounts are preferred indeed. There were already enough attempts such as Incremental FS, the original ComposeFS and PuzzleFS acting in the same way for immutable fses. As for current EROFS users, ComposeFS, containerd and Android APEXs will be directly benefited from it. On the other hand, previous experimental feature "erofs over fscache" was once also intended to provide a similar solution (inspired by Incremental FS discussion [2]), but the following facts show file-backed mounts will be a better approach: - Fscache infrastructure has recently been moved into new Netfslib which is an unexpected dependency to EROFS really, although it originally claims "it could be used for caching other things such as ISO9660 filesystems too." [3] - It takes an unexpectedly long time to upstream Fscache/Cachefiles enhancements. For example, the failover feature took more than one year, and the deamonless feature is still far behind now; - Ongoing HSM "fanotify pre-content hooks" [4] together with this will perfectly supersede "erofs over fscache" in a simpler way since developers (mainly containerd folks) could leverage their existing caching mechanism entirely in userspace instead of strictly following the predefined in-kernel caching tree hierarchy. After "fanotify pre-content hooks" lands upstream to provide the same functionality, "erofs over fscache" will be removed then (as an EROFS internal improvement and EROFS will not have to bother with on-demand fetching and/or caching improvements anymore.) [1] https://github.com/containers/storage/pull/2039 [2] https://lore.kernel.org/r/CAOQ4uxjbVxnubaPjVaGYiSwoGDTdpWbB=w_AeM6YM=zVixsUfQ@mail.gmail.com [3] https://docs.kernel.org/filesystems/caching/fscache.html [4] https://lore.kernel.org/r/[email protected] Closes: https://github.com/containers/composefs/issues/144 Reviewed-by: Sandeep Dhavale <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Gao Xiang <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-09-10erofs: handle overlapped pclusters out of crafted images properlyGao Xiang1-33/+38
syzbot reported a task hang issue due to a deadlock case where it is waiting for the folio lock of a cached folio that will be used for cache I/Os. After looking into the crafted fuzzed image, I found it's formed with several overlapped big pclusters as below: Ext: logical offset | length : physical offset | length 0: 0.. 16384 | 16384 : 151552.. 167936 | 16384 1: 16384.. 32768 | 16384 : 155648.. 172032 | 16384 2: 32768.. 49152 | 16384 : 537223168.. 537239552 | 16384 ... Here, extent 0/1 are physically overlapped although it's entirely _impossible_ for normal filesystem images generated by mkfs. First, managed folios containing compressed data will be marked as up-to-date and then unlocked immediately (unlike in-place folios) when compressed I/Os are complete. If physical blocks are not submitted in the incremental order, there should be separate BIOs to avoid dependency issues. However, the current code mis-arranges z_erofs_fill_bio_vec() and BIO submission which causes unexpected BIO waits. Second, managed folios will be connected to their own pclusters for efficient inter-queries. However, this is somewhat hard to implement easily if overlapped big pclusters exist. Again, these only appear in fuzzed images so let's simply fall back to temporary short-lived pages for correctness. Additionally, it justifies that referenced managed folios cannot be truncated for now and reverts part of commit 2080ca1ed3e4 ("erofs: tidy up `struct z_erofs_bvec`") for simplicity although it shouldn't be any difference. Reported-by: [email protected] Reported-by: [email protected] Reported-by: [email protected] Tested-by: [email protected] Closes: https://lore.kernel.org/r/[email protected] Fixes: 8e6c8fa9f2e9 ("erofs: enable big pcluster feature") Signed-off-by: Gao Xiang <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-09-10drm/i915/guc: prevent a possible int overflow in wq offsetsNikita Zhandarovich1-2/+2
It may be possible for the sum of the values derived from i915_ggtt_offset() and __get_parent_scratch_offset()/ i915_ggtt_offset() to go over the u32 limit before being assigned to wq offsets of u64 type. Mitigate these issues by expanding one of the right operands to u64 to avoid any overflow issues just in case. Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Fixes: c2aa552ff09d ("drm/i915/guc: Add multi-lrc context registration") Cc: Matthew Brost <[email protected]> Cc: John Harrison <[email protected]> Signed-off-by: Nikita Zhandarovich <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Rodrigo Vivi <[email protected]> Signed-off-by: Rodrigo Vivi <[email protected]> (cherry picked from commit 1f1c1bd56620b80ae407c5790743e17caad69cec) Signed-off-by: Tvrtko Ursulin <[email protected]>
2024-09-10vdpa/mlx5: Add the support of set mac addressCindy Lu1-0/+28
Add the function to support setting the MAC address. For vdpa/mlx5, the function will use mlx5_mpfs_add_mac to set the mac address Tested in ConnectX-6 Dx device Signed-off-by: Cindy Lu <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> Acked-by: Jason Wang <[email protected]>
2024-09-10vdpa_sim_net: Add the support of set mac addressCindy Lu1-1/+20
Add the function to support setting the MAC address. For vdpa_sim_net, the driver will write the MAC address to the config space, and other devices can implement their own functions to support this. Signed-off-by: Cindy Lu <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> Acked-by: Jason Wang <[email protected]>
2024-09-10vdpa: support set mac address from vdpa toolCindy Lu3-0/+89
Add new UAPI to support the mac address from vdpa tool Function vdpa_nl_cmd_dev_attr_set_doit() will get the new MAC address from the vdpa tool and then set it to the device. The usage is: vdpa dev set name vdpa_name mac **:**:**:**:**:** Here is example: root@L1# vdpa -jp dev config show vdpa0 { "config": { "vdpa0": { "mac": "82:4d:e9:5d:d7:e6", "link ": "up", "link_announce ": false, "mtu": 1500 } } } root@L1# vdpa dev set name vdpa0 mac 00:11:22:33:44:55 root@L1# vdpa -jp dev config show vdpa0 { "config": { "vdpa0": { "mac": "00:11:22:33:44:55", "link ": "up", "link_announce ": false, "mtu": 1500 } } } Signed-off-by: Cindy Lu <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> Acked-by: Jason Wang <[email protected]>
2024-09-10tools/virtio:Fix the wrong format specifierZhu Jun1-1/+1
The unsigned int should use "%u" instead of "%d". Signed-off-by: Zhu Jun <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Eugenio Pérez <[email protected]> Reviewed-by: Xuan Zhuo <[email protected]>
2024-09-10virtio_balloon: introduce memory scan/reclaim infozhenwei pi2-2/+19
Expose memory scan/reclaim information to the host side via virtio balloon device. Now we have a metric to analyze the memory performance: y: counter increases n: counter does not changes h: the rate of counter change is high l: the rate of counter change is low OOM: VIRTIO_BALLOON_S_OOM_KILL STALL: VIRTIO_BALLOON_S_ALLOC_STALL ASCAN: VIRTIO_BALLOON_S_SCAN_ASYNC DSCAN: VIRTIO_BALLOON_S_SCAN_DIRECT ARCLM: VIRTIO_BALLOON_S_RECLAIM_ASYNC DRCLM: VIRTIO_BALLOON_S_RECLAIM_DIRECT - OOM[y], STALL[*], ASCAN[*], DSCAN[*], ARCLM[*], DRCLM[*]: the guest runs under really critial memory pressure - OOM[n], STALL[h], ASCAN[*], DSCAN[l], ARCLM[*], DRCLM[l]: the memory allocation stalls due to cgroup, not the global memory pressure. - OOM[n], STALL[h], ASCAN[*], DSCAN[h], ARCLM[*], DRCLM[h]: the memory allocation stalls due to global memory pressure. The performance gets hurt a lot. A high ratio between DRCLM/DSCAN shows quite effective memory reclaiming. - OOM[n], STALL[h], ASCAN[*], DSCAN[h], ARCLM[*], DRCLM[l]: the memory allocation stalls due to global memory pressure. the ratio between DRCLM/DSCAN gets low, the guest OS is thrashing heavily, the serious case leads poor performance and difficult trouble shooting. Ex, sshd may block on memory allocation when accepting new connections, a user can't login a VM by ssh command. - OOM[n], STALL[n], ASCAN[h], DSCAN[n], ARCLM[l], DRCLM[n]: the low ratio between ARCLM/ASCAN shows that the guest tries to reclaim more memory, but it can't. Once more memory is required in future, it will struggle to reclaim memory. Acked-by: David Hildenbrand <[email protected]> Signed-off-by: zhenwei pi <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>