aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-06-19KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like informationMarc Zyngier1-1/+84
In order to be able to make S2 TLB invalidations more performant on NV, let's use a scheme derived from the FEAT_TTL extension. If bits [56:55] in the leaf descriptor translating the address in the corresponding shadow S2 are non-zero, they indicate a level which can be used as an invalidation range. This allows further reduction of the systematic over-invalidation that takes place otherwise. Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Oliver Upton <[email protected]>
2024-06-19KVM: arm64: nv: Tag shadow S2 entries with guest's leaf S2 levelMarc Zyngier2-2/+25
Populate bits [56:55] of the leaf entry with the level provided by the guest's S2 translation. This will allow us to better scope the invalidation by remembering the mapping size. Of course, this assume that the guest will issue an invalidation with an address that falls into the same leaf. If the guest doesn't, we'll over-invalidate. Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Oliver Upton <[email protected]>
2024-06-19KVM: arm64: nv: Handle FEAT_TTL hinted TLB operationsMarc Zyngier3-23/+92
Support guest-provided information information to size the range of required invalidation. This helps with reducing over-invalidation, provided that the guest actually provides accurate information. Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Oliver Upton <[email protected]>
2024-06-19KVM: arm64: nv: Handle TLBI IPAS2E1{,IS} operationsMarc Zyngier1-0/+96
TLBI IPAS2E1* are the last class of TLBI instructions we need to handle. For each matching S2 MMU context, we invalidate a range corresponding to the largest possible mapping for that context. At this stage, we don't handle TTL, which means we are likely over-invalidating. Further patches will aim at making this a bit better. Co-developed-by: Jintack Lim <[email protected]> Co-developed-by: Christoffer Dall <[email protected]> Signed-off-by: Jintack Lim <[email protected]> Signed-off-by: Christoffer Dall <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Oliver Upton <[email protected]>
2024-06-19KVM: arm64: nv: Handle TLBI ALLE1{,IS} operationsMarc Zyngier1-0/+25
TLBI ALLE1* is a pretty big hammer that invalides all S1/S2 TLBs. This translates into the unmapping of all our shadow S2 PTs, itself resulting in the corresponding TLB invalidations. Co-developed-by: Jintack Lim <[email protected]> Co-developed-by: Christoffer Dall <[email protected]> Signed-off-by: Jintack Lim <[email protected]> Signed-off-by: Christoffer Dall <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Oliver Upton <[email protected]>
2024-06-19KVM: arm64: nv: Handle TLBI VMALLS12E1{,IS} operationsMarc Zyngier1-0/+51
Emulating TLBI VMALLS12E1* results in tearing down all the shadow S2 PTs that match the current VMID, since our shadow S2s are just some form of SW-managed TLBs. That teardown itself results in a full TLB invalidation for both S1 and S2. This can result in over-invalidation if two vcpus use the same VMID to tag private S2 PTs, but this is still correct from an architecture perspective. Co-developed-by: Jintack Lim <[email protected]> Co-developed-by: Christoffer Dall <[email protected]> Signed-off-by: Jintack Lim <[email protected]> Signed-off-by: Christoffer Dall <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Oliver Upton <[email protected]>
2024-06-19KVM: arm64: nv: Handle TLB invalidation targeting L2 stage-1Marc Zyngier3-0/+122
While dealing with TLB invalidation targeting the guest hypervisor's own stage-1 was easy, doing the same thing for its own guests is a bit more involved. Since such an invalidation is scoped by VMID, it needs to apply to all s2_mmu contexts that have been tagged by that VMID, irrespective of the value of VTTBR_EL2.BADDR. So for each s2_mmu context matching that VMID, we invalidate the corresponding TLBs, each context having its own "physical" VMID. Co-developed-by: Jintack Lim <[email protected]> Co-developed-by: Christoffer Dall <[email protected]> Signed-off-by: Jintack Lim <[email protected]> Signed-off-by: Christoffer Dall <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Oliver Upton <[email protected]>
2024-06-19KVM: arm64: nv: Handle EL2 Stage-1 TLB invalidationMarc Zyngier3-1/+122
Due to the way FEAT_NV2 suppresses traps when accessing EL2 system registers, we can't track when the guest changes its HCR_EL2.TGE setting. This means we always trap EL1 TLBIs, even if they don't affect any L2 guest. Given that invalidating the EL2 TLBs doesn't require any messing with the shadow stage-2 page-tables, we can simply emulate the instructions early and return directly to the guest. This is conditioned on the instruction being an EL1 one and the guest's HCR_EL2.{E2H,TGE} being {1,1} (indicating that the instruction targets the EL2 S1 TLBs), or the instruction being one of the EL2 ones (which are not ambiguous). EL1 TLBIs issued with HCR_EL2.{E2H,TGE}={1,0} are not handled here, and cause a full exit so that they can be handled in the context of a VMID. Co-developed-by: Jintack Lim <[email protected]> Co-developed-by: Christoffer Dall <[email protected]> Signed-off-by: Jintack Lim <[email protected]> Signed-off-by: Christoffer Dall <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Oliver Upton <[email protected]>
2024-06-19KVM: arm64: nv: Add Stage-1 EL2 invalidation primitivesMarc Zyngier2-0/+67
Provide the primitives required to handle TLB invalidation for Stage-1 EL2 TLBs, which by definition do not require messing with the Stage-2 page tables. Co-developed-by: Jintack Lim <[email protected]> Co-developed-by: Christoffer Dall <[email protected]> Signed-off-by: Jintack Lim <[email protected]> Signed-off-by: Christoffer Dall <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Oliver Upton <[email protected]>
2024-06-19KVM: arm64: nv: Unmap/flush shadow stage 2 page tablesChristoffer Dall4-5/+70
Unmap/flush shadow stage 2 page tables for the nested VMs as well as the stage 2 page table for the guest hypervisor. Note: A bunch of the code in mmu.c relating to MMU notifiers is currently dealt with in an extremely abrupt way, for example by clearing out an entire shadow stage-2 table. This will be handled in a more efficient way using the reverse mapping feature in a later version of the patch series. Signed-off-by: Christoffer Dall <[email protected]> Signed-off-by: Jintack Lim <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Oliver Upton <[email protected]>
2024-06-19KVM: arm64: nv: Handle shadow stage 2 page faultsMarc Zyngier3-9/+166
If we are faulting on a shadow stage 2 translation, we first walk the guest hypervisor's stage 2 page table to see if it has a mapping. If not, we inject a stage 2 page fault to the virtual EL2. Otherwise, we create a mapping in the shadow stage 2 page table. Note that we have to deal with two IPAs when we got a shadow stage 2 page fault. One is the address we faulted on, and is in the L2 guest phys space. The other is from the guest stage-2 page table walk, and is in the L1 guest phys space. To differentiate them, we rename variables so that fault_ipa is used for the former and ipa is used for the latter. When mapping a page in a shadow stage-2, special care must be taken not to be more permissive than the guest is. Co-developed-by: Christoffer Dall <[email protected]> Co-developed-by: Jintack Lim <[email protected]> Signed-off-by: Christoffer Dall <[email protected]> Signed-off-by: Jintack Lim <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Oliver Upton <[email protected]>
2024-06-19KVM: arm64: nv: Implement nested Stage-2 page table walk logicChristoffer Dall3-0/+278
Based on the pseudo-code in the ARM ARM, implement a stage 2 software page table walker. Co-developed-by: Jintack Lim <[email protected]> Signed-off-by: Jintack Lim <[email protected]> Signed-off-by: Christoffer Dall <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Oliver Upton <[email protected]>
2024-06-19KVM: arm64: nv: Support multiple nested Stage-2 mmu structuresMarc Zyngier7-21/+349
Add Stage-2 mmu data structures for virtual EL2 and for nested guests. We don't yet populate shadow Stage-2 page tables, but we now have a framework for getting to a shadow Stage-2 pgd. We allocate twice the number of vcpus as Stage-2 mmu structures because that's sufficient for each vcpu running two translation regimes without having to flush the Stage-2 page tables. Co-developed-by: Christoffer Dall <[email protected]> Signed-off-by: Christoffer Dall <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Oliver Upton <[email protected]>
2024-06-19mm/memblock: Add "reserve_mem" to reserved named memory at boot upSteven Rostedt (Google)3-0/+141
In order to allow for requesting a memory region that can be used for things like pstore on multiple machines where the memory layout is not the same, add a new option to the kernel command line called "reserve_mem". The format is: reserve_mem=nn:align:name Where it will find nn amount of memory at the given alignment of align. The name field is to allow another subsystem to retrieve where the memory was found. For example: reserve_mem=12M:4096:oops ramoops.mem_name=oops Where ramoops.mem_name will tell ramoops that memory was reserved for it via the reserve_mem option and it can find it by calling: if (reserve_mem_find_by_name("oops", &start, &size)) { // start holds the start address and size holds the size given This is typically used for systems that do not wipe the RAM, and this command line will try to reserve the same physical memory on soft reboots. Note, it is not guaranteed to be the same location. For example, if KASLR places the kernel at the location of where the RAM reservation was from a previous boot, the new reservation will be at a different location. Any subsystem using this feature must add a way to verify that the contents of the physical memory is from a previous boot, as there may be cases where the memory will not be located at the same location. Not all systems may work either. There could be bit flips if the reboot goes through the BIOS. Using kexec to reboot the machine is likely to have better results in such cases. Link: https://lore.kernel.org/all/[email protected]/ Suggested-by: Mike Rapoport <[email protected]> Tested-by: Guilherme G. Piccoli <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport (IBM) <[email protected]>
2024-06-19intel_alpm: Fix wrong offset for PORT_ALPM_* registersJouni Högander2-4/+7
PORT_ALPM_* registers are using MMIO_TRANS2 macro. This is not correct as they are port register. Use _PORT_MMIO instead. Fixes: 4ee30a448255 ("drm/i915/alpm: Add ALPM register definitions") Signed-off-by: Jouni Högander <[email protected]> Reviewed-by: Animesh Manna <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-06-19Revert "drm/i915/psr: Disable early transport by default"Jouni Högander1-3/+0
This reverts commit f3c2031db7dfdf470a2d9bf3bd1efa6edfa72d8d. We want to notice possible issues faced with PSR2 Region Early Transport as early as possible -> let's revert patch disabling Region Early Transport by default. Also eDP 1.5 Panel Replay requires Early Transport. Signed-off-by: Jouni Högander <[email protected]> Reviewed-by: Animesh Manna <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-06-19drm/i915/psr: Add new debug bit to disable Panel ReplayJouni Högander2-3/+9
Add new debug bit to be used with i915_edp_psr_debug debugfs interface. This can be used to disable Panel Replay. v2: ensure that fastset is performed when the bit changes Signed-off-by: Jouni Högander <[email protected]> Reviewed-by: Animesh Manna <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-06-19drm/i915/psr: Disable PSR/Panel Replay on sink side for PSR onlyJouni Högander1-6/+6
Enabling/disabling Panel Replay on sink side has to be done before link training. We can't disable it in sink side on PSR disable. Fixes: 88ae6c65ecdb ("drm/i915/psr: Unify panel replay enable/disable sink") Signed-off-by: Jouni Högander <[email protected]> Reviewed-by: Animesh Manna <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-06-19drm/i915/psr: Disable PSR2 SU Region ET if enable_psr module parameter is setJouni Högander1-1/+12
Currently PSR2 SU Region Early Transport is enabled by default on Lunarlake if panel supports it despite enable_psr module parameter value. This patch makes it possible for user to limit used PSR mode and prevent SU Region Early Transport by setting enable_psr as 2. With default (-1) PSR2 SU Region Early Transport is allowed. v2: fix/improve commit desciption Signed-off-by: Jouni Högander <[email protected]> Reviewed-by: Animesh Manna <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-06-19drm/i915/psr: Disable Panel Replay if PSR mode is set via module parameterJouni Högander2-4/+26
If user is specifically limiting PSR mode to PSR1 or PSR2: disable Panel Replay. With default value -1 all modes are allowed including Panel Replay. Disabling PSR using value 0 disables Panel Replay as well. Also own compute config helper is added for Panel Replay. This makes sense because number of Panel Replay specific checks are increasing. v2: Squash adding Panel Replay compute config helper Signed-off-by: Jouni Högander <[email protected]> Reviewed-by: Animesh Manna <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-06-19drm/i915/alpm: Fix port clock usage in AUX Less wake time calculationJouni Högander1-1/+2
Port clock is link rate in 10 kbit/s units. Take this into account when calculating AUX Less wake time. Fixes: da6a9836ac09 ("drm/i915/psr: Calculate aux less wake time") Signed-off-by: Jouni Högander <[email protected]> Reviewed-by: Animesh Manna <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-06-19drm/i915/display: Wa 16021440873 is writing wrong registerJouni Högander2-11/+5
Wa 16021440873 is writing wrong register. Instead of PIPE_SRCSZ_ERLY_TPT write CURPOS_ERLY_TPT. v2: use right offset as well Fixes: 29cdef8539c3 ("drm/i915/display: Implement Wa_16021440873") Signed-off-by: Jouni Högander <[email protected]> Reviewed-by: Mika Kahola <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-06-19drm/i915/psr: Set SU area width as pipe src widthJouni Högander1-1/+1
Currently SU area width is set as MAX_INT. This is causing problems. Instead set it as pipe src width. Fixes: 86b26b6aeac7 ("drm/i915/psr: Carry su area in crtc_state") Signed-off-by: Jouni Högander <[email protected]> Reviewed-by: Mika Kahola <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-06-19thunderbolt: debugfs: Use FIELD_GET()Aapo Vienamo2-27/+11
Use the FIELD_GET() macro instead of open coding the masks and shifts. This makes the code more compact and improves readability as it avoids the need to wrap excessively long lines. Signed-off-by: Aapo Vienamo <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Reviewed-by: Yehezkel Bernat <[email protected]> Signed-off-by: Mika Westerberg <[email protected]>
2024-06-18cifs: drop the incorrect assertion in cifs_swap_rw()Barry Song1-2/+0
Since commit 2282679fb20b ("mm: submit multipage write for SWP_FS_OPS swap-space"), we can plug multiple pages then unplug them all together. That means iov_iter_count(iter) could be way bigger than PAGE_SIZE, it actually equals the size of iov_iter_npages(iter, INT_MAX). Note this issue has nothing to do with large folios as we don't support THP_SWPOUT to non-block devices. Fixes: 2282679fb20b ("mm: submit multipage write for SWP_FS_OPS swap-space") Reported-by: Christoph Hellwig <[email protected]> Closes: https://lore.kernel.org/linux-mm/[email protected]/ Cc: NeilBrown <[email protected]> Cc: Anna Schumaker <[email protected]> Cc: Steve French <[email protected]> Cc: Trond Myklebust <[email protected]> Cc: Chuanhua Han <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: Chris Li <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Jeff Layton <[email protected]> Cc: Paulo Alcantara <[email protected]> Cc: Ronnie Sahlberg <[email protected]> Cc: Shyam Prasad N <[email protected]> Cc: Tom Talpey <[email protected]> Cc: Bharath SM <[email protected]> Cc: <[email protected]> Signed-off-by: Barry Song <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-06-19cpufreq: sun50i: add Allwinner H700 speed binRyan Walklin1-0/+3
Support for the Allwinner H618, H618 and H700 was added to the sun50i cpufreq-nvmem driver recently [1] however at the time some operating points supported by the H700 (1.008, 1.032 and 1.512 GHz) and in use by vendor BSPs were found to be unstable during testing, so the H700 speed bin and the 1.032 GHz OPP were not included in the mainline driver. Retesting with kernel 6.10rc2 (which carries additional fixes for the driver) now shows stable operation with these points. Add the H700 speed bin to the driver. Reviewed-by: Andre Przywara <[email protected]> Signed-off-by: Ryan Walklin <[email protected]> -- [1] https://lore.kernel.org/linux-sunxi/[email protected] Signed-off-by: Viresh Kumar <[email protected]>
2024-06-18hwmon: (spd5118) Add support for Renesas/ITD SPD5118 hub controllersGuenter Roeck1-0/+56
The SPD5118 specification says, in its documentation of the page bits in the MR11 register: " This register only applies to non-volatile memory (1024) Bytes) access of SPD5 Hub device. For volatile memory access, this register must be programmed to '000'. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ " Renesas/ITD SPD5118 hub controllers take this literally and disable access to volatile memory if the page selected in MR11 is != 0. Since the BIOS or ROMMON will access the non-volatile memory and likely select a page != 0, this means that the driver will not instantiate since it can not identify the chip. Even if the driver instantiates, access to volatile registers is blocked after a nvram read operation which selects a page other than 0. To solve the problem, add initialization code to select page 0 during probe. Before doing that, use basic validation to ensure that this is really a SPD5118 device and not some random EEPROM. Cc: Sasha Kozachuk <[email protected]> Cc: John Hamrick <[email protected]> Cc: Chris Sarra <[email protected]> Tested-by: Armin Wolf <[email protected]> Signed-off-by: Guenter Roeck <[email protected]>
2024-06-18hwmon: (spd5118) Use regmap to implement pagingGuenter Roeck1-29/+18
Using regmap for paging significantly improves caching since the regmap cache no longer needs to be cleared after changing the page, so let's use it. Suggested-by: Armin Wolf <[email protected]> Tested-by: Armin Wolf <[email protected]> Signed-off-by: Guenter Roeck <[email protected]>
2024-06-18net: mana: Add support for page sizes other than 4KB on ARM64Haiyang Zhang7-25/+35
As defined by the MANA Hardware spec, the queue size for DMA is 4KB minimal, and power of 2. And, the HWC queue size has to be exactly 4KB. To support page sizes other than 4KB on ARM64, define the minimal queue size as a macro separately from the PAGE_SIZE, which we always assumed it to be 4KB before supporting ARM64. Also, add MANA specific macros and update code related to size alignment, DMA region calculations, etc. Signed-off-by: Haiyang Zhang <[email protected]> Reviewed-by: Michael Kelley <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-18Merge branch 'net-mlx4_en-use-ethtool_puts-sprintf'Jakub Kicinski1-39/+20
Kamal Heib says: ==================== net/mlx4_en: Use ethtool_puts/sprintf This patchset updates the mlx4_en driver to use the ethtool_puts and ethtool_sprintf helper functions. Signed-off-by: Kamal Heib <[email protected]> ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-18net/mlx4_en: Use ethtool_puts/sprintf to fill stats stringsKamal Heib1-35/+17
Use the ethtool_puts/ethtool_sprintf helper to print the stats strings into the ethtool strings interface. Signed-off-by: Kamal Heib <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-18net/mlx4_en: Use ethtool_puts to fill selftest stringsKamal Heib1-2/+2
Use the ethtool_puts helper to print the selftest strings into the ethtool strings interface. Signed-off-by: Kamal Heib <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-18net/mlx4_en: Use ethtool_puts to fill priv flags stringsKamal Heib1-2/+1
Use the ethtool_puts helper to print the priv flags strings into the ethtool strings interface. Signed-off-by: Kamal Heib <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-18net: stmmac: No need to calculate speed divider when offload is disabledXiaolei Wang1-18/+22
commit be27b8965297 ("net: stmmac: replace priv->speed with the portTransmitRate from the tc-cbs parameters") introduced a problem. When deleting, it prompts "Invalid portTransmitRate 0 (idleSlope - sendSlope)" and exits. Add judgment on cbs.enable. Only when offload is enabled, speed divider needs to be calculated. Fixes: be27b8965297 ("net: stmmac: replace priv->speed with the portTransmitRate from the tc-cbs parameters") Signed-off-by: Xiaolei Wang <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-18net: phy: dp83tg720: get master/slave configuration in link down stateOleksij Rempel1-3/+21
Get master/slave configuration for initial system start with the link in down state. This ensures ethtool shows current configuration. Also fixes link reconfiguration with ethtool while in down state, preventing ethtool from displaying outdated configuration. Even though dp83tg720_config_init() is executed periodically as long as the link is in admin up state but no carrier is detected, this is not sufficient for the link in admin down state where dp83tg720_read_status() is not periodically executed. To cover this case, we need an extra read role configuration in dp83tg720_config_aneg(). Fixes: cb80ee2f9bee1 ("net: phy: Add support for the DP83TG720S Ethernet PHY") Cc: [email protected] Signed-off-by: Oleksij Rempel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-18net: phy: dp83tg720: wake up PHYs in managed modeOleksij Rempel1-3/+15
In case this PHY is bootstrapped for managed mode, we need to manually wake it. Otherwise no link will be detected. Cc: [email protected] Fixes: cb80ee2f9bee1 ("net: phy: Add support for the DP83TG720S Ethernet PHY") Signed-off-by: Oleksij Rempel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-19riscv: dts: sophgo: disable write-protection for milkv duoHaylen Chu1-0/+1
Milkv Duo does not have a write-protect pin, so disable write protect to prevent SDcards misdetected as read-only. Fixes: 89a7056ed4f7 ("riscv: dts: sophgo: add sdcard support for milkv duo") Signed-off-by: Haylen Chu <[email protected]> Link: https://lore.kernel.org/r/SEYPR01MB4221943C7B101DD2318DA0D3D7CE2@SEYPR01MB4221.apcprd01.prod.exchangelabs.com Signed-off-by: Inochi Amaoto <[email protected]> Signed-off-by: Chen Wang <[email protected]>
2024-06-18cxl/mem: Fix no cxl_nvd during pmem region auto-assemblingLi Ming4-16/+23
When CXL subsystem is auto-assembling a pmem region during cxl endpoint port probing, always hit below calltrace. BUG: kernel NULL pointer dereference, address: 0000000000000078 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page RIP: 0010:cxl_pmem_region_probe+0x22e/0x360 [cxl_pmem] Call Trace: <TASK> ? __die+0x24/0x70 ? page_fault_oops+0x82/0x160 ? do_user_addr_fault+0x65/0x6b0 ? exc_page_fault+0x7d/0x170 ? asm_exc_page_fault+0x26/0x30 ? cxl_pmem_region_probe+0x22e/0x360 [cxl_pmem] ? cxl_pmem_region_probe+0x1ac/0x360 [cxl_pmem] cxl_bus_probe+0x1b/0x60 [cxl_core] really_probe+0x173/0x410 ? __pfx___device_attach_driver+0x10/0x10 __driver_probe_device+0x80/0x170 driver_probe_device+0x1e/0x90 __device_attach_driver+0x90/0x120 bus_for_each_drv+0x84/0xe0 __device_attach+0xbc/0x1f0 bus_probe_device+0x90/0xa0 device_add+0x51c/0x710 devm_cxl_add_pmem_region+0x1b5/0x380 [cxl_core] cxl_bus_probe+0x1b/0x60 [cxl_core] The cxl_nvd of the memdev needs to be available during the pmem region probe. Currently the cxl_nvd is registered after the endpoint port probe. The endpoint probe, in the case of autoassembly of regions, can cause a pmem region probe requiring the not yet available cxl_nvd. Adjust the sequence so this dependency is met. This requires adding a port parameter to cxl_find_nvdimm_bridge() that can be used to query the ancestor root port. The endpoint port is not yet available, but will share a common ancestor with its parent, so start the query from there instead. Fixes: f17b558d6663 ("cxl/pmem: Refactor nvdimm device registration, delete the workqueue") Co-developed-by: Dan Williams <[email protected]> Signed-off-by: Dan Williams <[email protected]> Signed-off-by: Li Ming <[email protected]> Tested-by: Alison Schofield <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Reviewed-by: Alison Schofield <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Dave Jiang <[email protected]>
2024-06-18Merge tag 'linux_kselftest-fixes-6.10-rc5' of ↵Linus Torvalds4-8/+35
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fixes from Shuah Khan: - filesystems: warn_unused_result warnings - seccomp: format-zero-length warnings - fchmodat2: clang build warnings due to-static-libasan - openat2: clang build warnings due to static-libasan, LOCAL_HDRS * tag 'linux_kselftest-fixes-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/fchmodat2: fix clang build failure due to -static-libasan selftests/openat2: fix clang build failures: -static-libasan, LOCAL_HDRS selftests: seccomp: fix format-zero-length warnings selftests: filesystems: fix warn_unused_result build warnings
2024-06-18selftests: openvswitch: Use bash as interpreterSimon Horman1-1/+1
openvswitch.sh makes use of substitutions of the form ${ns:0:1}, to obtain the first character of $ns. Empirically, this is works with bash but not dash. When run with dash these evaluate to an empty string and printing an error to stdout. # dash -c 'ns=client; echo "${ns:0:1}"' 2>error # cat error dash: 1: Bad substitution # bash -c 'ns=client; echo "${ns:0:1}"' 2>error c # cat error This leads to tests that neither pass nor fail. F.e. TEST: arp_ping [START] adding sandbox 'test_arp_ping' Adding DP/Bridge IF: sbx:test_arp_ping dp:arpping {, , } create namespaces ./openvswitch.sh: 282: eval: Bad substitution TEST: ct_connect_v4 [START] adding sandbox 'test_ct_connect_v4' Adding DP/Bridge IF: sbx:test_ct_connect_v4 dp:ct4 {, , } ./openvswitch.sh: 322: eval: Bad substitution create namespaces Resolve this by making openvswitch.sh a bash script. Fixes: 918423fda910 ("selftests: openvswitch: add an initial flow programming case") Signed-off-by: Simon Horman <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-18ptp: fix integer overflow in max_vclocks_storeDan Carpenter1-2/+1
On 32bit systems, the "4 * max" multiply can overflow. Use kcalloc() to do the allocation to prevent this. Fixes: 44c494c8e30e ("ptp: track available ptp vclocks information") Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: Wojciech Drewek <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Reviewed-by: Heng Qi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-18spi: Fix SPI slave probe failureAmit Kumar Mahapatra1-4/+6
While adding a SPI device, the SPI core ensures that multiple logical CS doesn't map to the same physical CS. For example, spi->chip_select[0] != spi->chip_select[1] and so forth. However, unlike the SPI master, the SPI slave doesn't have the list of chip selects, this leads to probe failure when the SPI controller is configured as slave. Update the __spi_add_device() function to perform this check only if the SPI controller is configured as master. Fixes: 4d8ff6b0991d ("spi: Add multi-cs memories support in SPI core") Signed-off-by: Amit Kumar Mahapatra <[email protected]> Link: https://msgid.link/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2024-06-18sched_ext: Add selftestsDavid Vernet51-0/+3244
Add basic selftests. Signed-off-by: David Vernet <[email protected]> Acked-by: Tejun Heo <[email protected]>
2024-06-18sched_ext: Documentation: scheduler: Document extensible scheduler classTejun Heo7-0/+580
Add Documentation/scheduler/sched-ext.rst which gives a high-level overview and pointers to the examples. v6: - Add paragraph explaining debug dump. v5: - Updated to reflect /sys/kernel interface change. Kconfig options added. v4: - README improved, reformatted in markdown and renamed to README.md. v3: - Added tools/sched_ext/README. - Dropped _example prefix from scheduler names. v2: - Apply minor edits suggested by Bagas. Caveats section dropped as all of them are addressed. Signed-off-by: Tejun Heo <[email protected]> Reviewed-by: David Vernet <[email protected]> Acked-by: Josh Don <[email protected]> Acked-by: Hao Luo <[email protected]> Acked-by: Barret Rhoden <[email protected]> Cc: Bagas Sanjaya <[email protected]>
2024-06-18sched_ext: Add vtime-ordered priority queue to dispatch_q'sTejun Heo6-26/+267
Currently, a dsq is always a FIFO. A task which is dispatched earlier gets consumed or executed earlier. While this is sufficient when dsq's are used for simple staging areas for tasks which are ready to execute, it'd make dsq's a lot more useful if they can implement custom ordering. This patch adds a vtime-ordered priority queue to dsq's. When the BPF scheduler dispatches a task with the new scx_bpf_dispatch_vtime() helper, it can specify the vtime tha the task should be inserted at and the task is inserted into the priority queue in the dsq which is ordered according to time_before64() comparison of the vtime values. A DSQ can either be a FIFO or priority queue and automatically switches between the two depending on whether scx_bpf_dispatch() or scx_bpf_dispatch_vtime() is used. Using the wrong variant while the DSQ already has the other type queued is not allowed and triggers an ops error. Built-in DSQs must always be FIFOs. This makes it very easy for the BPF schedulers to implement proper vtime based scheduling within each dsq very easy and efficient at a negligible cost in terms of code complexity and overhead. scx_simple and scx_example_flatcg are updated to default to weighted vtime scheduling (the latter within each cgroup). FIFO scheduling can be selected with -f option. v4: - As allowing mixing priority queue and FIFO on the same DSQ sometimes led to unexpected starvations, DSQs now error out if both modes are used at the same time and the built-in DSQs are no longer allowed to be priority queues. - Explicit type struct scx_dsq_node added to contain fields needed to be linked on DSQs. This will be used to implement stateful iterator. - Tasks are now always linked on dsq->list whether the DSQ is in FIFO or PRIQ mode. This confines PRIQ related complexities to the enqueue and dequeue paths. Other paths only need to look at dsq->list. This will also ease implementing BPF iterator. - Print p->scx.dsq_flags in debug dump. v3: - SCX_TASK_DSQ_ON_PRIQ flag is moved from p->scx.flags into its own p->scx.dsq_flags. The flag is protected with the dsq lock unlike other flags in p->scx.flags. This led to flag corruption in some cases. - Add comments explaining the interaction between using consumption of p->scx.slice to determine vtime progress and yielding. v2: - p->scx.dsq_vtime was not initialized on load or across cgroup migrations leading to some tasks being stalled for extended period of time depending on how saturated the machine is. Fixed. Signed-off-by: Tejun Heo <[email protected]> Reviewed-by: David Vernet <[email protected]>
2024-06-18sched_ext: Implement core-sched supportTejun Heo7-20/+346
The core-sched support is composed of the following parts: - task_struct->scx.core_sched_at is added. This is a timestamp which can be used to order tasks. Depending on whether the BPF scheduler implements custom ordering, it tracks either global FIFO ordering of all tasks or local-DSQ ordering within the dispatched tasks on a CPU. - prio_less() is updated to call scx_prio_less() when comparing SCX tasks. scx_prio_less() calls ops.core_sched_before() if available or uses the core_sched_at timestamp. For global FIFO ordering, the BPF scheduler doesn't need to do anything. Otherwise, it should implement ops.core_sched_before() which reflects the ordering. - When core-sched is enabled, balance_scx() balances all SMT siblings so that they all have tasks dispatched if necessary before pick_task_scx() is called. pick_task_scx() picks between the current task and the first dispatched task on the local DSQ based on availability and the core_sched_at timestamps. Note that FIFO ordering is expected among the already dispatched tasks whether running or on the local DSQ, so this path always compares core_sched_at instead of calling into ops.core_sched_before(). qmap_core_sched_before() is added to scx_qmap. It scales the distances from the heads of the queues to compare the tasks across different priority queues and seems to behave as expected. v3: Fixed build error when !CONFIG_SCHED_SMT reported by Andrea Righi. v2: Sched core added the const qualifiers to prio_less task arguments. Explicitly drop them for ops.core_sched_before() task arguments. BPF enforces access control through the verifier, so the qualifier isn't actually operative and only gets in the way when interacting with various helpers. Signed-off-by: Tejun Heo <[email protected]> Reviewed-by: David Vernet <[email protected]> Reviewed-by: Josh Don <[email protected]> Cc: Andrea Righi <[email protected]>
2024-06-18sched_ext: Bypass BPF scheduler while PM events are in progressTejun Heo1-0/+34
PM operations freeze userspace. Some BPF schedulers have active userspace component and may misbehave as expected across PM events. While the system is frozen, nothing too interesting is happening in terms of scheduling and we can get by just fine with the fallback FIFO behavior. Let's make things easier by always bypassing the BPF scheduler while PM events are in progress. Signed-off-by: Tejun Heo <[email protected]> Reviewed-by: David Vernet <[email protected]>
2024-06-18sched_ext: Implement sched_ext_ops.cpu_online/offline()Tejun Heo10-12/+290
Add ops.cpu_online/offline() which are invoked when CPUs come online and offline respectively. As the enqueue path already automatically bypasses tasks to the local dsq on a deactivated CPU, BPF schedulers are guaranteed to see tasks only on CPUs which are between online() and offline(). If the BPF scheduler doesn't implement ops.cpu_online/offline(), the scheduler is automatically exited with SCX_ECODE_RESTART | SCX_ECODE_RSN_HOTPLUG. Userspace can implement CPU hotpplug support trivially by simply reinitializing and reloading the scheduler. scx_qmap is updated to print out online CPUs on hotplug events. Other schedulers are updated to restart based on ecode. v3: - The previous implementation added @reason to sched_class.rq_on/offline() to distinguish between CPU hotplug events and topology updates. This was buggy and fragile as the methods are skipped if the current state equals the target state. Instead, add scx_rq_[de]activate() which are directly called from sched_cpu_de/activate(). This also allows ops.cpu_on/offline() to sleep which can be useful. - ops.dispatch() could be called on a CPU that the BPF scheduler was told to be offline. The dispatch patch is updated to bypass in such cases. v2: - To accommodate lock ordering change between scx_cgroup_rwsem and cpus_read_lock(), CPU hotplug operations are put into its own SCX_OPI block and enabled eariler during scx_ope_enable() so that cpus_read_lock() can be dropped before acquiring scx_cgroup_rwsem. - Auto exit with ECODE added. Signed-off-by: Tejun Heo <[email protected]> Reviewed-by: David Vernet <[email protected]> Acked-by: Josh Don <[email protected]> Acked-by: Hao Luo <[email protected]> Acked-by: Barret Rhoden <[email protected]>
2024-06-18sched_ext: Implement sched_ext_ops.cpu_acquire/release()David Vernet7-7/+240
Scheduler classes are strictly ordered and when a higher priority class has tasks to run, the lower priority ones lose access to the CPU. Being able to monitor and act on these events are necessary for use cases includling strict core-scheduling and latency management. This patch adds two operations ops.cpu_acquire() and .cpu_release(). The former is invoked when a CPU becomes available to the BPF scheduler and the opposite for the latter. This patch also implements scx_bpf_reenqueue_local() which can be called from .cpu_release() to trigger requeueing of all tasks in the local dsq of the CPU so that the tasks can be reassigned to other available CPUs. scx_pair is updated to use .cpu_acquire/release() along with %SCX_KICK_WAIT to make the pair scheduling guarantee strict even when a CPU is preempted by a higher priority scheduler class. scx_qmap is updated to use .cpu_acquire/release() to empty the local dsq of a preempted CPU. A similar approach can be adopted by BPF schedulers that want to have a tight control over latency. v4: Use the new SCX_KICK_IDLE to wake up a CPU after re-enqueueing. v3: Drop the const qualifier from scx_cpu_release_args.task. BPF enforces access control through the verifier, so the qualifier isn't actually operative and only gets in the way when interacting with various helpers. v2: Add p->scx.kf_mask annotation to allow calling scx_bpf_reenqueue_local() from ops.cpu_release() nested inside ops.init() and other sleepable operations. Signed-off-by: David Vernet <[email protected]> Reviewed-by: Tejun Heo <[email protected]> Signed-off-by: Tejun Heo <[email protected]> Acked-by: Josh Don <[email protected]> Acked-by: Hao Luo <[email protected]> Acked-by: Barret Rhoden <[email protected]>
2024-06-18sched_ext: Implement SCX_KICK_WAITDavid Vernet4-7/+85
If set when calling scx_bpf_kick_cpu(), the invoking CPU will busy wait for the kicked cpu to enter the scheduler. See the following for example usage: https://github.com/sched-ext/scx/blob/main/scheds/c/scx_pair.bpf.c v2: - Updated to fit the updated kick_cpus_irq_workfn() implementation. - Include SCX_KICK_WAIT related information in debug dump. Signed-off-by: David Vernet <[email protected]> Reviewed-by: Tejun Heo <[email protected]> Signed-off-by: Tejun Heo <[email protected]> Acked-by: Josh Don <[email protected]> Acked-by: Hao Luo <[email protected]> Acked-by: Barret Rhoden <[email protected]>