Age | Commit message | Author | Files | Lines
2024-06-23 | Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue | David S. Miller | 9 | -41/+90
Tony Nguyen says: ==================== ice: prepare representor for SF support Michal Swiatkowski says: This is a series to prepare port representor for supporting also subfunctions. We need correct devlink locking and the possibility to update parent VSI after port representor is created. Refactor how devlink lock is taken to suit the subfunction use case. VSI configuration needs to be done after port representor is created. Port representor needs only allocated VSI. It doesn't need to be configured before. VSI needs to be reconfigured when update function is called. The code for this patchset was split from (too big) patchset [1]. [1] https://lore.kernel.org/netdev/[email protected]/ --- Originally from https://lore.kernel.org/netdev/20240605-next-2024-06-03-intel-next-batch-v2-0-39c23963fa78@intel.com/ Changes: - delete ice_repr_get_by_vsi() from header - rephrase commit message in moving devlink locking ==================== Signed-off-by: David S. Miller <[email protected]>
2024-06-23 | MAINTAINERS: ARM: alphascale: add Krzysztof Kozlowski as maintainer | Krzysztof Kozlowski | 1 | -0/+10
Apparently there was never a maintainers entry for the ARM Alphascale ASM9260 SoC, thus patches end up nowhere. Add such entry, because even if platform is orphaned and on its way out of the kernel, it is nice to take patches if someone sends something. I do not plan to actively support/maintain ARM Alphascale but I can take odd fixes now and then. Cc: Oleksij Rempel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Krzysztof Kozlowski <[email protected]>
2024-06-23 | ARM: dts: nspire: Add full compatible for watchdog node | Andrew Davis | 1 | -1/+4
The watchdog appears to be an ARM SP805, add the full compatible and the needed clocks properties. Leave this disabled for now as functionality is not fully tested. Signed-off-by: Andrew Davis <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Krzysztof Kozlowski <[email protected]>
2024-06-23 | ARM: dts: nspire: Add unit name addresses to memory nodes | Andrew Davis | 2 | -2/+2
Fixes the following DTB check warning: > node has a reg or ranges property, but no unit name Signed-off-by: Andrew Davis <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Krzysztof Kozlowski <[email protected]>
2024-06-23 | MAINTAINERS: ARM: nspire: add Krzysztof Kozlowski as maintainer | Krzysztof Kozlowski | 1 | -0/+9
Apparently there was never a maintainers entry for the ARM Texas Instruments Nspire SoC, thus patches end up nowhere. Add such entry, because even if platform is orphaned and on its way out of the kernel, it is nice to take patches if someone sends something. I do not plan to actively support/maintain Nspire platform but I can take odd fixes now and then. Cc: Daniel Tang <[email protected]> Cc: Andrew Davis <[email protected]> Acked-by: Andrew Davis <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Krzysztof Kozlowski <[email protected]>
2024-06-23 | MAINTAINERS: ARM: vt8500: add Alexey and Krzysztof as maintainers | Krzysztof Kozlowski | 1 | -1/+4
The ARM VIA/WonderMedia VT8500 platform became orphaned in commit 8f1b7ba55c61 ("MAINTAINERS: ARM/VT8500, remove defunct e-mail") and clearly it is on the way out of the kernel. However, a few folks still send patches for it, and it is nice to actually take them for as long as the platform is in the kernel. Alexey Charkov still has VT8500 hardware and plans to work on upstream, so add Alexey as the maintainer. Krzysztof will collect patches. Extend the maintainer entry to also cover the VT8500 DTS. Cc: Alexey Charkov <[email protected]> Acked-by: Alexey Charkov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Krzysztof Kozlowski <[email protected]>
2024-06-23 | MAINTAINERS: ARM: axm: add Krzysztof Kozlowski as maintainer | Krzysztof Kozlowski | 1 | -0/+8
There is no maintainers entry for the ARM LSI AXM SoC, thus patches end up nowhere. Add such entry, because even if platform is orphaned and on its way out of the kernel, it is nice to take patches if someone sends something. I do not plan to actively support/maintain AXM but I can take odd fixes now and then. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Krzysztof Kozlowski <[email protected]>
2024-06-23 | MAINTAINERS: ARM: moxa: add Krzysztof Kozlowski as maintainer | Krzysztof Kozlowski | 1 | -0/+9
There is no maintainers entry for the ARM MOXA ART SoC, thus patches end up nowhere. Add such entry, because even if platform is orphaned and on its way out of the kernel, it is nice to take patches if someone sends something. I do not plan to actively support/maintain MOXA but I can take odd fixes now and then. Cc: Jonas Jensen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Krzysztof Kozlowski <[email protected]>
2024-06-23 | Merge branch 'phy-microchip-ksz-9897-errata' | David S. Miller | 6 | -2/+79
Enguerrand de Ribaucourt says: ==================== Handle new Microchip KSZ 9897 Errata These patches implement some suggested workarounds from the Microchip KSZ 9897 Errata [1]. [1] https://ww1.microchip.com/downloads/aemDocuments/documents/UNG/ProductDocuments/Errata/KSZ9897R-Errata-DS80000758.pdf --- v7: - use dev_crit_once instead of dev_crit_ratelimited - add a comment to help users understand the consequences of half-duplex errors v6: https://lore.kernel.org/netdev/20240614094642.122464-1-enguerrand.de-ribaucourt@savoirfairelinux.com/ - remove KSZ9897 phy_id workaround (was a configuration issue) - use macros for checking link down in monitoring function - check if VLAN is enabled before monitoring resources v5: https://lore.kernel.org/all/20240604092304.314636-1-enguerrand.de-ribaucourt@savoirfairelinux.com/ - use macros for bitfields - rewrap comments - check ksz_pread* return values - fix spelling mistakes - remove KSZ9477 suspend/resume deletion patch v4: https://lore.kernel.org/all/20240531142430.678198-1-enguerrand.de-ribaucourt@savoirfairelinux.com/ - Rebase on net/main - Add Fixes: tags to the patches - reverse x-mas tree order - use pseudo phy_id instead of match_phy_device v3: https://lore.kernel.org/all/20240530102436.226189-1-enguerrand.de-ribaucourt@savoirfairelinux.com/ ==================== Signed-off-by: David S. Miller <[email protected]>
2024-06-23 | net: dsa: microchip: monitor potential faults in half-duplex mode | Enguerrand de Ribaucourt | 5 | -2/+73
The errata DS80000754 recommends monitoring potential faults in half-duplex mode for the KSZ9477 family. Half-duplex is not very common, so I just added a critical message that is logged when the fault conditions are detected. In these states the switch can be expected to be unable to communicate anymore, and a software reset of the switch would be required, which I did not implement. Fixes: b987e98e50ab ("dsa: add DSA switch driver for Microchip KSZ9477") Signed-off-by: Enguerrand de Ribaucourt <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-23 | net: dsa: microchip: use collision based back pressure mode | Enguerrand de Ribaucourt | 2 | -0/+5
Errata DS80000758 states that carrier sense back pressure mode can cause link down issues in 100BASE-TX half duplex mode. The datasheet also recommends to always use the collision based back pressure mode. Fixes: b987e98e50ab ("dsa: add DSA switch driver for Microchip KSZ9477") Signed-off-by: Enguerrand de Ribaucourt <[email protected]> Reviewed-by: Woojung Huh <[email protected]> Acked-by: Arun Ramadoss <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-23 | net: phy: micrel: add Microchip KSZ 9477 to the device table | Enguerrand de Ribaucourt | 1 | -0/+1
PHY_ID_KSZ9477 was supported but not added to the device table passed to MODULE_DEVICE_TABLE. Fixes: fc3973a1fa09 ("phy: micrel: add Microchip KSZ 9477 Switch PHY support") Signed-off-by: Enguerrand de Ribaucourt <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-23 | netlink: specs: Fix pse-set command attributes | Kory Maincent | 1 | -2/+5
Not all PSE attributes are used for the pse-set netlink command. Select only the ones used by ethtool. Fixes: f8586411e40e ("netlink: specs: Expand the pse netlink command with PoE interface") Signed-off-by: Kory Maincent <[email protected]> Reviewed-by: Donald Hunter <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-23 | EDAC/dmc520: Use devm_platform_ioremap_resource() | Jai Arora | 1 | -3/+1
platform_get_resource() and devm_ioremap_resource() are wrapped up in the devm_platform_ioremap_resource() helper. Use the helper and get rid of the local variable for struct resource *. Signed-off-by: Jai Arora <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-06-23 | accel/habanalabs: gradual sleep in polling memory macro | Didi Freiman | 1 | -2/+9
It’s better to avoid long sleeps right from the beginning of the polling since the data may be available much sooner than the sleep period. Because polling host memory is inexpensive, this change gradually increases the sleep time up to the user-requested period. Signed-off-by: Didi Freiman <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
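One way to picture the gradual sleep is a short initial interval that grows on every poll until it reaches the requested period. The sketch below is a userspace illustration only, not the driver's macro; the doubling step, the 1 us starting value and the name `next_sleep_us` are all assumptions.

```c
#include <stdint.h>

/* Hypothetical helper: given the interval just used (cur_us), return the
 * next one. The interval doubles on each call (an assumed growth rule) and
 * is capped at the caller-requested period max_us. */
static uint32_t next_sleep_us(uint32_t cur_us, uint32_t max_us)
{
	if (cur_us == 0)
		return max_us ? 1 : 0;	/* assumed starting point: 1 us */
	if (cur_us > max_us / 2)	/* doubling would overshoot the cap */
		return max_us;
	return cur_us * 2;		/* cannot overflow: cur_us <= max_us / 2 */
}
```

A polling loop would sleep for the returned value between reads, so data that arrives early is noticed after microseconds rather than after one full period.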
2024-06-23 | accel/habanalabs: move heartbeat work initialization to early init | Tomer Tayar | 1 | -2/+4
The device heartbeat work is currently initialized at device_heartbeat_schedule() which is called at the end of hl_device_init(). However hl_device_init() can fail at a previous step, and in such a case, a subsequent call to hl_device_fini() will lead to calling cleanup_resources() and accessing this work uninitialized. As there is no real need to re-initialize this work every time it is rescheduled, move this initialization to device_early_init() to be done once and early enough. Signed-off-by: Tomer Tayar <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalabs: print timestamp of last PQ heartbeat on EQ heartbeat failure | Tomer Tayar | 3 | -12/+46
The test packet which is sent to FW for the PQ heartbeat is used also as the trigger in FW to send the EQ heartbeat event. Add the time of the last sent packet to the debug info which is printed upon an EQ heartbeat failure. Signed-off-by: Tomer Tayar <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalabs: dump the EQ entries headers on EQ heartbeat failure | Tomer Tayar | 4 | -0/+31
Add a dump of the EQ entries headers upon an EQ heartbeat failure. Signed-off-by: Tomer Tayar <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalabs: revise print on EQ heartbeat failure | Tomer Tayar | 1 | -9/+10
Don't print the "previous EQ index" value in case of an EQ heartbeat failure, because it is incremented along with the EQ CI and therefore redundant. In addition, as the CPU-CP PI is zeroed when it reaches a value that is twice the queue size, add a value of the CI with a similar wrap around, to make it easier to compare the values. Signed-off-by: Tomer Tayar <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
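The wrap-around described in this commit can be sketched as a modulo by twice the queue length. This is an illustration under assumptions: it treats the CI as a free-running counter, and `wrap_at_double` is a hypothetical name, not a driver function.

```c
#include <stdint.h>

/* Fold a free-running consumer index into [0, 2 * queue_len) so that it can
 * be compared directly with a producer index that is zeroed at twice the
 * queue length. queue_len must be non-zero and at most UINT32_MAX / 2. */
static uint32_t wrap_at_double(uint32_t ci, uint32_t queue_len)
{
	return ci % (2u * queue_len);
}
```

With a queue of 64 entries, a CI of 130 folds to 2, which lines up with a PI that has already wrapped once.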
2024-06-23 | accel/habanalabs: add more info upon cpu pkt timeout | Farah Kassabri | 1 | -3/+11
In order to have better debuggability upon encountering FW issues, we are adding additional info once CPU packet timeout expires. Signed-off-by: Farah Kassabri <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalabs: additional print in device-in-use info | Ilia Levi | 4 | -9/+63
When device release triggers a hard reset, there is a printout of the cause. Currently listed causes (that increment context refcount) are active command submissions and exported DMA buffer objects. In any other case, the printout emits "unknown reason". We identify and print another reason - allocated command buffers. Signed-off-by: Ilia Levi <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | MAINTAINERS: Change habanalabs maintainer and git repo path | Oded Gabbay | 1 | -2/+2
Because I left Habana, Ofir Bitton is now the habanalabs driver maintainer. The git repo also changed location to the Habana GitHub website. Signed-off-by: Oded Gabbay <[email protected]> Acked-by: Daniel Vetter <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalbs/gaudi2: reduce interrupt count to 128 | Ofir Bitton | 2 | -6/+6
Some systems allow a maximum number of 128 MSI-X interrupts. Hence we reduce the interrupt count to 128 instead of 512. Reviewed-by: Tomer Tayar <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalabs: disable EQ interrupt after disabling pci | Tal Cohen | 1 | -3/+4
When sending the disable pci msg towards firmware, there is a possibility that an EQ packet is already pending. Disabling the EQ interrupt will prevent this from happening. The interrupt will be re-enabled after reset. Signed-off-by: Tal Cohen <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalabs: change the heartbeat scheduling point | Farah Kassabri | 1 | -21/+33
Currently we schedule the heartbeat thread at late init, only then we set the INTS_REGISTER packet which enables events to be received from firmware. Init may take some time and we want to give firmware 2 full cycles of heartbeat thread after it received INTS_REGISTER. The patch will move the heartbeat thread scheduling to be after driver is done with all initializations. Signed-off-by: Farah Kassabri <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalabs/gaudi2: unsecure edma max outstanding register | Rakesh Ughreja | 1 | -0/+1
Network EDMAs use more outstanding transfers so this needs to be programmed by EDMA firmware. Signed-off-by: Rakesh Ughreja <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalabs: remove timestamp registration debug prints | Ofir Bitton | 1 | -13/+0
There are several timestamp registration debug prints which spam the kernel log whenever dyn debug is enabled. Remove those prints. Reviewed-by: Tomer Tayar <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalabs: add cpld ts cpld_timestamp cpucp | Vitaly Margolin | 2 | -3/+5
Add the cpld_timestamp field to the cpucp_info structure and return the cpld timestamp as part of the cpld version. Signed-off-by: Vitaly Margolin <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalabs: add a common handler for clock change events | Tomer Tayar | 2 | -0/+47
As the new dynamic EQ includes clock change events which are common and not ASIC-specific, add a common handler for these events. Signed-off-by: Tomer Tayar <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalabs: use device-name directory in debugfs-driver-habanalabs | Tomer Tayar | 1 | -3/+3
The device debugfs directory was modified to be named as the parent device name. Update the description of 'mmu' and 'mmu_error' to use the new path. Signed-off-by: Tomer Tayar <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalabs/gaudi2: add GAUDI2D revision support | Farah Kassabri | 6 | -1/+16
Gaudi2 with PCI revision ID with the value of '4' represents Gaudi2D device and should be detected and initialized as Gaudi2. Signed-off-by: Farah Kassabri <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalabs: move hl_eq_heartbeat_event_handle() to common code | Tomer Tayar | 3 | -6/+7
hl_eq_heartbeat_event_handle() doesn't have ASIC specific code, and therefore can be moved from Gaudi2-only code to common code, and possibly used for other ASICs. Signed-off-by: Tomer Tayar <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalabs: add an EQ size ASIC property | Tomer Tayar | 2 | -3/+10
Future supported ASICs might use the dynamic EQ mechanism with the firmware, and in that case the EQ size won't be equal to the default HL_EQ_SIZE_IN_BYTES value. Add an ASIC property to enable overriding this value. Signed-off-by: Tomer Tayar <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalabs: separate nonce from max_size in cpucp_packet struct | Dani Liberman | 1 | -3/+3
In struct cpucp_packet both nonce and data_max_size members are in an union overlapping each other. This is a problem as they both are used in attestation and info_signed packets. The solution here is to move the nonce member to a different union under the same structure. Signed-off-by: Dani Liberman <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
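The overlap problem can be shown with a reduced layout. The structs below are a simplified illustration and not the real `cpucp_packet` definition; the struct names and the `other_a`/`other_b` placeholder members are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Before: both members share one union, so storing one overwrites the other. */
struct pkt_overlapping {
	union {
		uint32_t data_max_size;
		uint32_t nonce;
	};
};

/* After: each member sits in its own union, so a packet can carry both. */
struct pkt_separate {
	union {
		uint32_t data_max_size;
		uint32_t other_a;	/* placeholder for whatever shares this slot */
	};
	union {
		uint32_t nonce;
		uint32_t other_b;	/* placeholder */
	};
};
```

In the first layout `nonce` and `data_max_size` have the same offset; in the second they do not, which is what a packet type that needs both fields requires.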
2024-06-23 | accel/habanalabs/gaudi2: assume hard-reset by FW upon MC SEI severe error | Tomer Tayar | 1 | -2/+2
FW initiates a hard reset upon an MC SEI severe error. Align the driver to expect this reset and avoid accessing the device until the reset is done. Signed-off-by: Tomer Tayar <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalabs/gaudi2: revise return value handling in gaudi2_hbm_sei_handle_read_err() | Tomer Tayar | 1 | -4/+4
The return value in gaudi2_hbm_sei_handle_read_err() is boolean and not a bitmask, so there is no need for "|= true". In addition, rename the 'rc' variable, as no "return code" is returned here but an indication if a hard reset is required. Signed-off-by: Tomer Tayar <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalabs/gaudi2: align interrupt names to table | Ariel Suller | 1 | -75/+75
When reporting TPC events, the dcore and the TPC within the dcore should be reported and propagated, and not the general TPC number. Signed-off-by: Ariel Suller <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalabs: check for errors after preboot is ready | Farah Kassabri | 1 | -12/+12
Driver should check and report any fatal errors detected by preboot, before it attempts to load the boot fit. Some errors may cause the driver to stop the boot process and mark the device as unusable. This check will allow the driver to fail and print the error reported by preboot and skip the time wasting attempt of trying to load the boot fit, which will fail due to the error. Signed-off-by: Farah Kassabri <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalabs: use msg_header instead of desc_header | Igal Zeltser | 1 | -3/+3
Struct comms_desc_header is deprecated and replaced by struct comms_msg_header. As a preparation for removing comms_desc_header from FW, all its usage in code is replaced by comms_msg_header. Signed-off-by: Igal Zeltser <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalabs: add heartbeat debug info | Farah Kassabri | 3 | -1/+29
It is hard to debug the reason for heartbeat check failures. As an attempt to ease this task, this patch will provide more information when this failure happens. Heartbeat checks the communication with FW, so printing the CPU queue pi/ci and the counter of how many times that event was received would help in debugging the issue. Signed-off-by: Farah Kassabri <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalabs: add device name to invalidation failure msg | Ohad Sharabi | 1 | -3/+5
This addition helps log parsers better define the error without the need to go back and search the device name on former log lines. Signed-off-by: Ohad Sharabi <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalabs: expose server type in debugfs | Tal Risin | 2 | -0/+11
Exposing server type through debugfs to enable easier access via scripts. Signed-off-by: Tal Risin <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalabs: use parent device for trace events | Tomer Tayar | 4 | -20/+24
Trace events might still be recorded after the accel device is released, while the device name is no longer available. Modify the trace functions to use the parent device instead, which is available at that point and still informative as the device name. Signed-off-by: Tomer Tayar <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalabs: no CPUCP prints on heartbeat failure | Ohad Sharabi | 7 | -132/+128
If we detect a heartbeat event while some daemon in the background sends CPUCP messages (via the driver interface), dmesg will be flooded. Instead, a slight refactor in hl_fw_send_cpu_message() returns -EAGAIN when the CPU is disabled (i.e. heartbeat failure) and only then. Later, all calling functions that may be invoked by user space can issue prints only if the error code is not -EAGAIN. Signed-off-by: Ohad Sharabi <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
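The pattern described here, one dedicated error code for the "CPU disabled" case and silent callers for that code only, might look roughly like the userspace sketch below. `send_cpu_message` and `report_send_error` are hypothetical stand-ins and do not mirror the signature of hl_fw_send_cpu_message().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the send path: fail fast with -EAGAIN when the
 * device CPU is marked disabled, e.g. after a heartbeat failure. */
static int send_cpu_message(bool cpu_disabled)
{
	if (cpu_disabled)
		return -EAGAIN;
	return 0;	/* pretend the message was sent */
}

/* Caller reachable from user space: stay quiet on success and on -EAGAIN,
 * report any other error. Returns true if a message was printed. */
static bool report_send_error(int rc)
{
	if (rc == 0 || rc == -EAGAIN)
		return false;
	fprintf(stderr, "failed to send CPU message, rc=%d\n", rc);
	return true;
}
```

Because -EAGAIN is returned only in the disabled-CPU case, a background daemon that keeps retrying after a heartbeat failure produces no log lines, while genuine errors are still reported.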
2024-06-23 | accel/habanalabs/gaudi2: align embedded specs headers | Ofir Bitton | 4 | -29/+45
Align embedded headers to latest release. Reviewed-by: Tomer Tayar <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalabs: restructure function that checks heartbeat received | Ohad Sharabi | 1 | -8/+8
The function returned an error code which isn't propagated up the stack (nor is it printed). The return value is only checked for =0 or !=0 which implies bool return value. The function signature is updated accordingly, renamed, and slightly refactored. Signed-off-by: Ohad Sharabi <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalabs/gaudi2: update interrupts related headers | Farah Kassabri | 1 | -47/+47
Align the interrupts related headers to latest release. Signed-off-by: Farah Kassabri <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalabs: add device name to error print | Dani Liberman | 1 | -7/+10
The extra info will help in better traceability and debug. Signed-off-by: Dani Liberman <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
2024-06-23 | accel/habanalabs/gaudi2: use single function to compare FW versions | Ohad Sharabi | 3 | -68/+34
Currently, the code contains 2 types of FW version comparison functions: - hl_is_fw_sw_ver_[below/equal_or_greater]() - gaudi2 specific function of the type gaudi2_is_fw_ver_[below/above]x_y_z() Moreover, some functions use the inner FW version which should be only a stage during development but not a version dependency. Finally, some tests are done against deprecated FW versions with which LKD should hold no compatibility. This commit aligns all APIs to a single function that just compares the version and returns an integer indicator (similar in some way to strcmp()). In addition, this generic function now also considers the sub-minor FW version and removes dead code resulting from compatibility with deprecated FW versions. Signed-off-by: Ohad Sharabi <[email protected]> Reviewed-by: Ofir Bitton <[email protected]> Signed-off-by: Ofir Bitton <[email protected]>
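A strcmp()-style comparison over major, minor and sub-minor could be sketched as follows. The struct layout and the name `fw_version_cmp` are assumptions made for illustration; the commit message does not spell out the driver's actual function.

```c
/* Hypothetical version triple; field names are illustrative. */
struct fw_version {
	unsigned int major;
	unsigned int minor;
	unsigned int subminor;
};

/* Returns a negative value, zero or a positive value when a is older than,
 * equal to or newer than b, comparing the most significant field first. */
static int fw_version_cmp(const struct fw_version *a, const struct fw_version *b)
{
	if (a->major != b->major)
		return a->major < b->major ? -1 : 1;
	if (a->minor != b->minor)
		return a->minor < b->minor ? -1 : 1;
	if (a->subminor != b->subminor)
		return a->subminor < b->subminor ? -1 : 1;
	return 0;
}
```

A single three-way comparison lets callers express "below", "equal" and "at least" checks as `< 0`, `== 0` and `>= 0` instead of maintaining one helper per version number.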
2024-06-23 | bcachefs: Fix btree_trans list ordering | Kent Overstreet | 2 | -11/+34
The debug code relies on btree_trans_list being ordered so that it can resume on subsequent calls or lock restarts. However, it was using trans->locking_wait.task.pid, which is incorrect since btree_trans objects are cached and reused - typically by different tasks. Fix this by switching to pointer order, and also sort them lazily when required - speeding up the btree_trans_get() fastpath. Signed-off-by: Kent Overstreet <[email protected]>
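Ordering by pointer value and deferring the sort until a reader needs it can be illustrated in userspace with qsort(). This is a sketch of the general idea and not the bcachefs code; `cmp_by_address`, `sort_if_needed` and the `sorted` flag are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Comparator for an array of pointers: order elements by address. Casting to
 * uintptr_t avoids relational comparison of unrelated pointers. */
static int cmp_by_address(const void *l, const void *r)
{
	uintptr_t a = (uintptr_t)*(void *const *)l;
	uintptr_t b = (uintptr_t)*(void *const *)r;

	return (a > b) - (a < b);
}

/* Sort only when a reader needs the ordering, so the insert path stays cheap.
 * Inserting a new item would reset *sorted to 0 (not shown). */
static void sort_if_needed(void **items, size_t n, int *sorted)
{
	if (*sorted)
		return;
	qsort(items, n, sizeof(*items), cmp_by_address);
	*sorted = 1;
}
```

An object's address stays the same when a cached object is reused by another task, which is the property that makes it a stable sort key where a task pid is not.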