author     Dave Airlie <airlied@redhat.com>    2024-06-11 09:08:54 +1000
committer  Dave Airlie <airlied@redhat.com>    2024-06-11 09:09:07 +1000
commit     7957066ca614b63aa6687e825ccbc215fa4584ea (patch)
tree       df0dc7f4f762cab6b59f84463c1a4a0949827c9d /drivers
parent     83a7eefedc9b56fe7bfeff13b6c7356688ffa670 (diff)
parent     6800e63cf97bae62bca56d8e691544540d945f53 (diff)
Merge tag 'drm-xe-next-2024-06-06' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
UAPI Changes:
- Expose the L3 bank mask (Francois)
Cross-subsystem Changes:
- Update Xe driver maintainers (Oded)
Display (i915):
- Add missing include to intel_vga.c (Michal Wajdeczko)
Driver Changes:
- Fix Display (xe-only) detection for ADL-N (Lucas)
- Runtime PM fixes that enabled PC-10 and D3Cold (Francois, Rodrigo)
- Fix unexpected silent drm backmerge issues (Thomas)
- More (a lot more) preparation for SR-IOV support (Michal Wajdeczko)
- Devcoredump fixes and improvements (Jose, Tejas, Matt Brost)
- Introduce device 'wedged' state (Rodrigo)
- Improve debug and info messages (Michal Wajdeczko, Rodrigo, Nirmoy)
- Adding or fixing workarounds (Tejas, Shekhar, Lucas, Bommu)
- Check result of drmm_mutex_init (Michal Wajdeczko)
- Enlarge the critical dma fence area for preempt fences (Matt Auld)
- Prevent UAF in VM's rebind work (Matt Auld)
- GuC submit related clean-ups and fixes (Matt Brost, Himal, Jonathan, Niranjana)
- Prefer local helpers to perform dma reservation locking (Himal)
- Spelling and typo fixes (Colin, Francois)
- Prep patches for 1 job per VM bind IOCTL (no uapi change yet) (Matt Brost)
- Remove uninitialized end var from xe_gt_tlb_invalidation_range (Nirmoy)
- GSC related changes targeting LNL support (Daniele)
- Fix assert in L3 bank mask generation (Francois)
- Perform dma_map when moving system buffer objects to TT (Thomas)
- Add helpers for manipulating macro arguments (Michal Wajdeczko)
- Refactor default device atomic settings (Nirmoy)
- Add debugfs node to dump mocs (Janga)
- Use ordered WQ for G2H handler (Matt Brost)
- Clean up and fixes in header includes (Michal Wajdeczko)
- Prefer flexible-array over deprecated zero-length ones (Lucas)
- Add Indirect Ring State support (Niranjana)
- Fix UBSAN shift-out-of-bounds failure (Shuicheng)
- HWMon fixes and additions (Karthik)
- Clean-up refactor around probe init functions (Lucas, Michal Wajdeczko)
- Fix PCODE init function (Himal)
- Only use reserved BCS instances for usm migrate exec queue (Matt Brost)
- Only zap PTEs as needed (Matt Brost)
- Per client usage info (Lucas)
- Core hotunplug improvements, converting cleanup actions towards devm (Matt Auld)
- Don't emit false error if running in execlist mode (Michal Wajdeczko)
- Remove unused struct (Dr. David)
- Support/debug for slow GuC loads (John Harrison)
- Decouple job seqno and lrc seqno (Matt Brost)
- Allow migrate vm gpu submissions from reclaim context (Thomas)
- Rename drm-client running time to run_ticks and fix a UAF (Umesh)
- Check empty pinned BO list with lock held (Nirmoy)
- Drop undesired prefix from the platform name (Michal Wajdeczko)
- Remove unwanted mutex locking on xe file close (Niranjana)
- Replace format-less snprintf() with strscpy() (Arnd)
- Other general clean-ups on registers definitions and function names (Michal Wajdeczko)
- Add kernel-doc to some xe_lrc interfaces (Niranjana)
- Use missing lock in relay_needs_worker (Nirmoy)
- Drop redundant W=1 warnings from Makefile (Jani)
- Simplify if condition in preempt fences code (Thorsten)
- Flush engine buffers before signalling user fence on all engines (Andrzej)
- Don't overmap identity VRAM mapping (Matt Brost)
- Do not dereference NULL job->fence in trace points (Matt Brost)
- Add synchronous gt reset debugfs (Jonathan)
- Xe gt_idle fixes (Riana)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZmItmuf7vq_xvRjJ@intel.com
Diffstat (limited to 'drivers')
158 files changed, 7993 insertions, 1775 deletions
diff --git a/drivers/gpu/drm/i915/display/intel_vga.c b/drivers/gpu/drm/i915/display/intel_vga.c index 4b98833bfa8c..0b5916c15307 100644 --- a/drivers/gpu/drm/i915/display/intel_vga.c +++ b/drivers/gpu/drm/i915/display/intel_vga.c @@ -3,6 +3,7 @@ * Copyright © 2019 Intel Corporation */ +#include <linux/delay.h> #include <linux/vgaarb.h> #include <video/vga.h> diff --git a/drivers/gpu/drm/xe/Kconfig.debug b/drivers/gpu/drm/xe/Kconfig.debug index df02e5d17d26..bc177368af6c 100644 --- a/drivers/gpu/drm/xe/Kconfig.debug +++ b/drivers/gpu/drm/xe/Kconfig.debug @@ -61,16 +61,6 @@ config DRM_XE_DEBUG_MEM If in doubt, say "N". -config DRM_XE_SIMPLE_ERROR_CAPTURE - bool "Enable simple error capture to dmesg on job timeout" - default n - help - Choose this option when debugging an unexpected job timeout - - Recommended for driver developers only. - - If in doubt, say "N". - config DRM_XE_KUNIT_TEST tristate "KUnit tests for the drm xe driver" if !KUNIT_ALL_TESTS depends on DRM_XE && KUNIT && DEBUG_FS diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile index b165bbf52aef..70738d1f85e9 100644 --- a/drivers/gpu/drm/xe/Makefile +++ b/drivers/gpu/drm/xe/Makefile @@ -3,31 +3,8 @@ # Makefile for the drm device driver. This driver provides support for the # Direct Rendering Infrastructure (DRI) in XFree86 4.1.0 and higher. -# Unconditionally enable W=1 warnings locally -# --- begin copy-paste W=1 warnings from scripts/Makefile.extrawarn -subdir-ccflags-y += -Wextra -Wunused -Wno-unused-parameter -subdir-ccflags-y += -Wmissing-declarations -subdir-ccflags-y += $(call cc-option, -Wrestrict) -subdir-ccflags-y += -Wmissing-format-attribute -subdir-ccflags-y += -Wmissing-prototypes -subdir-ccflags-y += -Wold-style-definition -subdir-ccflags-y += -Wmissing-include-dirs -subdir-ccflags-y += $(call cc-option, -Wunused-but-set-variable) -subdir-ccflags-y += $(call cc-option, -Wunused-const-variable) -subdir-ccflags-y += $(call cc-option, -Wpacked-not-aligned) -subdir-ccflags-y += $(call cc-option, -Wformat-overflow) +# Enable W=1 warnings not enabled in drm subsystem Makefile subdir-ccflags-y += $(call cc-option, -Wformat-truncation) -subdir-ccflags-y += $(call cc-option, -Wstringop-truncation) -# The following turn off the warnings enabled by -Wextra -ifeq ($(findstring 2, $(KBUILD_EXTRA_WARN)),) -subdir-ccflags-y += -Wno-missing-field-initializers -subdir-ccflags-y += -Wno-type-limits -subdir-ccflags-y += -Wno-shift-negative-value -endif -ifeq ($(findstring 3, $(KBUILD_EXTRA_WARN)),) -subdir-ccflags-y += -Wno-sign-compare -endif -# --- end copy-paste # Enable -Werror in CI and development subdir-ccflags-$(CONFIG_DRM_XE_WERROR) += -Werror @@ -89,7 +66,7 @@ xe-y += xe_bb.o \ xe_gt_mcr.o \ xe_gt_pagefault.o \ xe_gt_sysfs.o \ - xe_gt_throttle_sysfs.o \ + xe_gt_throttle.o \ xe_gt_tlb_invalidation.o \ xe_gt_topology.o \ xe_guc.o \ @@ -143,6 +120,7 @@ xe-y += xe_bb.o \ xe_uc_debugfs.o \ xe_uc_fw.o \ xe_vm.o \ + xe_vram.o \ xe_vram_freq.o \ xe_wait_user_fence.o \ xe_wa.o \ @@ -155,6 +133,8 @@ xe-$(CONFIG_HWMON) += xe_hwmon.o # graphics virtualization (SR-IOV) support xe-y += \ + xe_gt_sriov_vf.o \ + xe_gt_sriov_vf_debugfs.o \ xe_guc_relay.o \ xe_memirq.o \ xe_sriov.o @@ -163,10 +143,14 @@ xe-$(CONFIG_PCI_IOV) += \ xe_gt_sriov_pf.o \ xe_gt_sriov_pf_config.o \ xe_gt_sriov_pf_control.o \ + xe_gt_sriov_pf_debugfs.o \ + xe_gt_sriov_pf_monitor.o \ xe_gt_sriov_pf_policy.o \ + xe_gt_sriov_pf_service.o \ xe_lmtt.o \ xe_lmtt_2l.o \ xe_lmtt_ml.o \ + xe_pci_sriov.o \ xe_sriov_pf.o # include helpers for tests even 
when XE is built-in diff --git a/drivers/gpu/drm/xe/abi/guc_actions_sriov_abi.h b/drivers/gpu/drm/xe/abi/guc_actions_sriov_abi.h index c1ad09b36453..181180f5945c 100644 --- a/drivers/gpu/drm/xe/abi/guc_actions_sriov_abi.h +++ b/drivers/gpu/drm/xe/abi/guc_actions_sriov_abi.h @@ -172,6 +172,36 @@ #define VF2GUC_RELAY_TO_PF_REQUEST_MSG_NUM_RELAY_DATA GUC_RELAY_MSG_MAX_LEN /** + * DOC: GUC2PF_ADVERSE_EVENT + * + * This message is used by the GuC to notify PF about adverse events. + * + * This G2H message must be sent as `CTB HXG Message`_. + * + * +---+-------+--------------------------------------------------------------+ + * | | Bits | Description | + * +===+=======+==============================================================+ + * | 0 | 31 | ORIGIN = GUC_HXG_ORIGIN_GUC_ | + * | +-------+--------------------------------------------------------------+ + * | | 30:28 | TYPE = GUC_HXG_TYPE_EVENT_ | + * | +-------+--------------------------------------------------------------+ + * | | 27:16 | DATA0 = MBZ | + * | +-------+--------------------------------------------------------------+ + * | | 15:0 | ACTION = _`GUC_ACTION_GUC2PF_ADVERSE_EVENT` = 0x5104 | + * +---+-------+--------------------------------------------------------------+ + * | 1 | 31:0 | DATA1 = **VFID** - VF identifier | + * +---+-------+--------------------------------------------------------------+ + * | 2 | 31:0 | DATA2 = **THRESHOLD** - key of the exceeded threshold | + * +---+-------+--------------------------------------------------------------+ + */ +#define GUC_ACTION_GUC2PF_ADVERSE_EVENT 0x5104 + +#define GUC2PF_ADVERSE_EVENT_EVENT_MSG_LEN (GUC_HXG_EVENT_MSG_MIN_LEN + 2u) +#define GUC2PF_ADVERSE_EVENT_EVENT_MSG_0_MBZ GUC_HXG_EVENT_MSG_0_DATA0 +#define GUC2PF_ADVERSE_EVENT_EVENT_MSG_1_VFID GUC_HXG_EVENT_MSG_n_DATAn +#define GUC2PF_ADVERSE_EVENT_EVENT_MSG_2_THRESHOLD GUC_HXG_EVENT_MSG_n_DATAn + +/** * DOC: GUC2PF_VF_STATE_NOTIFY * * The GUC2PF_VF_STATE_NOTIFY message is used by the GuC to notify PF about change @@ -214,6 +244,73 @@ #define GUC_PF_NOTIFY_VF_FIXUP_DONE 4u /** + * DOC: VF2GUC_MATCH_VERSION + * + * This action is used to match VF interface version used by VF and GuC. + * + * This message must be sent as `MMIO HXG Message`_. 
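The GUC2PF_ADVERSE_EVENT layout above fully specifies the event payload: dword 1 carries the VF identifier and dword 2 the key of the exceeded threshold. Below is a minimal, illustrative sketch of how a PF-side handler could decode such a message; it is not part of this patch, and the function name, buffer layout and return convention are assumptions made only for the example.

```c
#include <stdint.h>

/* Header + VFID + THRESHOLD, per the GUC2PF_ADVERSE_EVENT table above. */
#define ADVERSE_EVENT_MSG_LEN	3u

/* Hypothetical handler: decode the two documented payload dwords. */
static int pf_handle_adverse_event_sketch(const uint32_t *msg, uint32_t len)
{
	if (len < ADVERSE_EVENT_MSG_LEN)
		return -1;	/* malformed: VFID and THRESHOLD are mandatory */

	uint32_t vfid = msg[1];		/* DATA1 = VFID (bits 31:0) */
	uint32_t threshold = msg[2];	/* DATA2 = key of the exceeded threshold */

	/* ...account the reported threshold event against vfid here... */
	(void)vfid;
	(void)threshold;
	return 0;
}
```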
+ * + * +---+-------+--------------------------------------------------------------+ + * | | Bits | Description | + * +===+=======+==============================================================+ + * | 0 | 31 | ORIGIN = GUC_HXG_ORIGIN_HOST_ | + * | +-------+--------------------------------------------------------------+ + * | | 30:28 | TYPE = GUC_HXG_TYPE_REQUEST_ | + * | +-------+--------------------------------------------------------------+ + * | | 27:16 | DATA0 = MBZ | + * | +-------+--------------------------------------------------------------+ + * | | 15:0 | ACTION = _`GUC_ACTION_VF2GUC_MATCH_VERSION` = 0x5500 | + * +---+-------+--------------------------------------------------------------+ + * | 1 | 31:24 | **BRANCH** - branch ID of the VF interface | + * | | | (use BRANCH_ANY to request latest version supported by GuC) | + * | +-------+--------------------------------------------------------------+ + * | | 23:16 | **MAJOR** - major version of the VF interface | + * | | | (use MAJOR_ANY to request latest version supported by GuC) | + * | +-------+--------------------------------------------------------------+ + * | | 15:8 | **MINOR** - minor version of the VF interface | + * | | | (use MINOR_ANY to request latest version supported by GuC) | + * | +-------+--------------------------------------------------------------+ + * | | 7:0 | **MBZ** | + * +---+-------+--------------------------------------------------------------+ + * + * +---+-------+--------------------------------------------------------------+ + * | | Bits | Description | + * +===+=======+==============================================================+ + * | 0 | 31 | ORIGIN = GUC_HXG_ORIGIN_GUC_ | + * | +-------+--------------------------------------------------------------+ + * | | 30:28 | TYPE = GUC_HXG_TYPE_RESPONSE_SUCCESS_ | + * | +-------+--------------------------------------------------------------+ + * | | 27:0 | DATA0 = MBZ | + * +---+-------+--------------------------------------------------------------+ + * | 1 | 31:24 | **BRANCH** - branch ID of the VF interface | + * | +-------+--------------------------------------------------------------+ + * | | 23:16 | **MAJOR** - major version of the VF interface | + * | +-------+--------------------------------------------------------------+ + * | | 15:8 | **MINOR** - minor version of the VF interface | + * | +-------+--------------------------------------------------------------+ + * | | 7:0 | **PATCH** - patch version of the VF interface | + * +---+-------+--------------------------------------------------------------+ + */ +#define GUC_ACTION_VF2GUC_MATCH_VERSION 0x5500u + +#define VF2GUC_MATCH_VERSION_REQUEST_MSG_LEN (GUC_HXG_REQUEST_MSG_MIN_LEN + 1u) +#define VF2GUC_MATCH_VERSION_REQUEST_MSG_0_MBZ GUC_HXG_REQUEST_MSG_0_DATA0 +#define VF2GUC_MATCH_VERSION_REQUEST_MSG_1_BRANCH (0xffu << 24) +#define GUC_VERSION_BRANCH_ANY 0 +#define VF2GUC_MATCH_VERSION_REQUEST_MSG_1_MAJOR (0xffu << 16) +#define GUC_VERSION_MAJOR_ANY 0 +#define VF2GUC_MATCH_VERSION_REQUEST_MSG_1_MINOR (0xffu << 8) +#define GUC_VERSION_MINOR_ANY 0 +#define VF2GUC_MATCH_VERSION_REQUEST_MSG_1_MBZ (0xffu << 0) + +#define VF2GUC_MATCH_VERSION_RESPONSE_MSG_LEN (GUC_HXG_RESPONSE_MSG_MIN_LEN + 1u) +#define VF2GUC_MATCH_VERSION_RESPONSE_MSG_0_MBZ GUC_HXG_RESPONSE_MSG_0_DATA0 +#define VF2GUC_MATCH_VERSION_RESPONSE_MSG_1_BRANCH (0xffu << 24) +#define VF2GUC_MATCH_VERSION_RESPONSE_MSG_1_MAJOR (0xffu << 16) +#define VF2GUC_MATCH_VERSION_RESPONSE_MSG_1_MINOR (0xffu << 8) +#define 
VF2GUC_MATCH_VERSION_RESPONSE_MSG_1_PATCH (0xffu << 0) + +/** * DOC: PF2GUC_UPDATE_VGT_POLICY * * This message is used by the PF to set `GuC VGT Policy KLVs`_. @@ -367,4 +464,97 @@ #define GUC_PF_TRIGGER_VF_FLR_START 4u #define GUC_PF_TRIGGER_VF_FLR_FINISH 5u +/** + * DOC: VF2GUC_VF_RESET + * + * This action is used by VF to reset GuC's VF state. + * + * This message must be sent as `MMIO HXG Message`_. + * + * +---+-------+--------------------------------------------------------------+ + * | | Bits | Description | + * +===+=======+==============================================================+ + * | 0 | 31 | ORIGIN = GUC_HXG_ORIGIN_HOST_ | + * | +-------+--------------------------------------------------------------+ + * | | 30:28 | TYPE = GUC_HXG_TYPE_REQUEST_ | + * | +-------+--------------------------------------------------------------+ + * | | 27:16 | DATA0 = MBZ | + * | +-------+--------------------------------------------------------------+ + * | | 15:0 | ACTION = _`GUC_ACTION_VF2GUC_VF_RESET` = 0x5507 | + * +---+-------+--------------------------------------------------------------+ + * + * +---+-------+--------------------------------------------------------------+ + * | | Bits | Description | + * +===+=======+==============================================================+ + * | 0 | 31 | ORIGIN = GUC_HXG_ORIGIN_GUC_ | + * | +-------+--------------------------------------------------------------+ + * | | 30:28 | TYPE = GUC_HXG_TYPE_RESPONSE_SUCCESS_ | + * | +-------+--------------------------------------------------------------+ + * | | 27:0 | DATA0 = MBZ | + * +---+-------+--------------------------------------------------------------+ + */ +#define GUC_ACTION_VF2GUC_VF_RESET 0x5507u + +#define VF2GUC_VF_RESET_REQUEST_MSG_LEN GUC_HXG_REQUEST_MSG_MIN_LEN +#define VF2GUC_VF_RESET_REQUEST_MSG_0_MBZ GUC_HXG_REQUEST_MSG_0_DATA0 + +#define VF2GUC_VF_RESET_RESPONSE_MSG_LEN GUC_HXG_RESPONSE_MSG_MIN_LEN +#define VF2GUC_VF_RESET_RESPONSE_MSG_0_MBZ GUC_HXG_RESPONSE_MSG_0_DATA0 + +/** + * DOC: VF2GUC_QUERY_SINGLE_KLV + * + * This action is used by VF to query value of the single KLV data. + * + * This message must be sent as `MMIO HXG Message`_. 
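As an aside, the VF2GUC_MATCH_VERSION request documented above packs the requested branch/major/minor into dword 1, where 0 in each field (BRANCH_ANY/MAJOR_ANY/MINOR_ANY) asks for the latest version the GuC supports. The following is a sketch of that packing, not taken from this patch; the ORIGIN/TYPE header bits of dword 0 are assumed zero here, and in the driver they come from the common GUC_HXG_* helpers instead.

```c
#include <stdint.h>

#define MATCH_VERSION_ACTION		0x5500u	/* GUC_ACTION_VF2GUC_MATCH_VERSION */
#define MATCH_VERSION_BRANCH_SHIFT	24	/* bits 31:24 of dword 1 */
#define MATCH_VERSION_MAJOR_SHIFT	16	/* bits 23:16 */
#define MATCH_VERSION_MINOR_SHIFT	8	/* bits 15:8; bits 7:0 are MBZ */

/* Hypothetical helper: fill the two request dwords per the layout above. */
static void build_match_version_request_sketch(uint32_t msg[2],
						uint8_t branch, uint8_t major,
						uint8_t minor)
{
	/* ACTION lives in bits 15:0; ORIGIN/TYPE bits assumed zero here. */
	msg[0] = MATCH_VERSION_ACTION;
	msg[1] = ((uint32_t)branch << MATCH_VERSION_BRANCH_SHIFT) |
		 ((uint32_t)major << MATCH_VERSION_MAJOR_SHIFT) |
		 ((uint32_t)minor << MATCH_VERSION_MINOR_SHIFT);
}

/* build_match_version_request_sketch(msg, 0, 0, 0) requests the latest
 * interface version supported by the GuC. */
```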
+ * + * +---+-------+--------------------------------------------------------------+ + * | | Bits | Description | + * +===+=======+==============================================================+ + * | 0 | 31 | ORIGIN = GUC_HXG_ORIGIN_HOST_ | + * | +-------+--------------------------------------------------------------+ + * | | 30:28 | TYPE = GUC_HXG_TYPE_REQUEST_ | + * | +-------+--------------------------------------------------------------+ + * | | 27:16 | MBZ | + * | +-------+--------------------------------------------------------------+ + * | | 15:0 | ACTION = _`GUC_ACTION_VF2GUC_QUERY_SINGLE_KLV` = 0x5509 | + * +---+-------+--------------------------------------------------------------+ + * | 1 | 31:16 | MBZ | + * | +-------+--------------------------------------------------------------+ + * | | 15:0 | **KEY** - key for which value is requested | + * +---+-------+--------------------------------------------------------------+ + * + * +---+-------+--------------------------------------------------------------+ + * | | Bits | Description | + * +===+=======+==============================================================+ + * | 0 | 31 | ORIGIN = GUC_HXG_ORIGIN_GUC_ | + * | +-------+--------------------------------------------------------------+ + * | | 30:28 | TYPE = GUC_HXG_TYPE_RESPONSE_SUCCESS_ | + * | +-------+--------------------------------------------------------------+ + * | | 27:16 | MBZ | + * | +-------+--------------------------------------------------------------+ + * | | 15:0 | **LENGTH** - length of data in dwords | + * +---+-------+--------------------------------------------------------------+ + * | 1 | 31:0 | **VALUE32** - bits 31:0 of value if **LENGTH** >= 1 | + * +---+-------+--------------------------------------------------------------+ + * | 2 | 31:0 | **VALUE64** - bits 63:32 of value if **LENGTH** >= 2 | + * +---+-------+--------------------------------------------------------------+ + * | 3 | 31:0 | **VALUE96** - bits 95:64 of value if **LENGTH** >= 3 | + * +---+-------+--------------------------------------------------------------+ + */ +#define GUC_ACTION_VF2GUC_QUERY_SINGLE_KLV 0x5509u + +#define VF2GUC_QUERY_SINGLE_KLV_REQUEST_MSG_LEN (GUC_HXG_REQUEST_MSG_MIN_LEN + 1u) +#define VF2GUC_QUERY_SINGLE_KLV_REQUEST_MSG_0_MBZ GUC_HXG_REQUEST_MSG_0_DATA0 +#define VF2GUC_QUERY_SINGLE_KLV_REQUEST_MSG_1_MBZ (0xffffu << 16) +#define VF2GUC_QUERY_SINGLE_KLV_REQUEST_MSG_1_KEY (0xffffu << 0) + +#define VF2GUC_QUERY_SINGLE_KLV_RESPONSE_MSG_MIN_LEN GUC_HXG_RESPONSE_MSG_MIN_LEN +#define VF2GUC_QUERY_SINGLE_KLV_RESPONSE_MSG_MAX_LEN (GUC_HXG_RESPONSE_MSG_MIN_LEN + 3u) +#define VF2GUC_QUERY_SINGLE_KLV_RESPONSE_MSG_0_MBZ (0xfffu << 16) +#define VF2GUC_QUERY_SINGLE_KLV_RESPONSE_MSG_0_LENGTH (0xffffu << 0) +#define VF2GUC_QUERY_SINGLE_KLV_RESPONSE_MSG_1_VALUE32 GUC_HXG_REQUEST_MSG_n_DATAn +#define VF2GUC_QUERY_SINGLE_KLV_RESPONSE_MSG_2_VALUE64 GUC_HXG_REQUEST_MSG_n_DATAn +#define VF2GUC_QUERY_SINGLE_KLV_RESPONSE_MSG_3_VALUE96 GUC_HXG_REQUEST_MSG_n_DATAn + #endif diff --git a/drivers/gpu/drm/xe/abi/guc_errors_abi.h b/drivers/gpu/drm/xe/abi/guc_errors_abi.h index ec83551bf9c0..d0b5fed6876f 100644 --- a/drivers/gpu/drm/xe/abi/guc_errors_abi.h +++ b/drivers/gpu/drm/xe/abi/guc_errors_abi.h @@ -7,8 +7,12 @@ #define _ABI_GUC_ERRORS_ABI_H enum xe_guc_response_status { - XE_GUC_RESPONSE_STATUS_SUCCESS = 0x0, - XE_GUC_RESPONSE_STATUS_GENERIC_FAIL = 0xF000, + XE_GUC_RESPONSE_STATUS_SUCCESS = 0x0, + XE_GUC_RESPONSE_NOT_SUPPORTED = 0x20, + XE_GUC_RESPONSE_NO_ATTRIBUTE_TABLE = 0x201, + 
XE_GUC_RESPONSE_NO_DECRYPTION_KEY = 0x202, + XE_GUC_RESPONSE_DECRYPTION_FAILED = 0x204, + XE_GUC_RESPONSE_STATUS_GENERIC_FAIL = 0xF000, }; enum xe_guc_load_status { @@ -17,6 +21,9 @@ enum xe_guc_load_status { XE_GUC_LOAD_STATUS_ERROR_DEVID_BUILD_MISMATCH = 0x02, XE_GUC_LOAD_STATUS_GUC_PREPROD_BUILD_MISMATCH = 0x03, XE_GUC_LOAD_STATUS_ERROR_DEVID_INVALID_GUCTYPE = 0x04, + XE_GUC_LOAD_STATUS_HWCONFIG_START = 0x05, + XE_GUC_LOAD_STATUS_HWCONFIG_DONE = 0x06, + XE_GUC_LOAD_STATUS_HWCONFIG_ERROR = 0x07, XE_GUC_LOAD_STATUS_GDT_DONE = 0x10, XE_GUC_LOAD_STATUS_IDT_DONE = 0x20, XE_GUC_LOAD_STATUS_LAPIC_DONE = 0x30, @@ -34,4 +41,19 @@ enum xe_guc_load_status { XE_GUC_LOAD_STATUS_READY = 0xF0, }; +enum xe_bootrom_load_status { + XE_BOOTROM_STATUS_NO_KEY_FOUND = 0x13, + XE_BOOTROM_STATUS_AES_PROD_KEY_FOUND = 0x1A, + XE_BOOTROM_STATUS_PROD_KEY_CHECK_FAILURE = 0x2B, + XE_BOOTROM_STATUS_RSA_FAILED = 0x50, + XE_BOOTROM_STATUS_PAVPC_FAILED = 0x73, + XE_BOOTROM_STATUS_WOPCM_FAILED = 0x74, + XE_BOOTROM_STATUS_LOADLOC_FAILED = 0x75, + XE_BOOTROM_STATUS_JUMP_PASSED = 0x76, + XE_BOOTROM_STATUS_JUMP_FAILED = 0x77, + XE_BOOTROM_STATUS_RC6CTXCONFIG_FAILED = 0x79, + XE_BOOTROM_STATUS_MPUMAP_INCORRECT = 0x7A, + XE_BOOTROM_STATUS_EXCEPTION = 0x7E, +}; + #endif diff --git a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h index 511cf974d585..8f9f60b28306 100644 --- a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h +++ b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h @@ -36,6 +36,20 @@ #define GUC_KLV_n_VALUE (0xffffffffu << 0) /** + * DOC: GuC Global Config KLVs + * + * `GuC KLV`_ keys available for use with HOST2GUC_SELF_CFG_. + * + * _`GUC_KLV_GLOBAL_CFG_GMD_ID` : 0x3000 + * Refers to 32 bit architecture version as reported by the HW IP. + * This key is supported on MTL+ platforms only. + * Requires GuC ABI 1.2+. + */ + +#define GUC_KLV_GLOBAL_CFG_GMD_ID_KEY 0x3000u +#define GUC_KLV_GLOBAL_CFG_GMD_ID_LEN 1u + +/** * DOC: GuC Self Config KLVs * * `GuC KLV`_ keys available for use with HOST2GUC_SELF_CFG_. @@ -194,14 +208,18 @@ enum { * granularity) since the GPUs clock time runs off a different crystal * from the CPUs clock. Changing this KLV on a VF that is currently * running a context wont take effect until a new context is scheduled in. - * That said, when the PF is changing this value from 0xFFFFFFFF to - * something else, it might never take effect if the VF is running an - * inifinitely long compute or shader kernel. In such a scenario, the + * That said, when the PF is changing this value from 0x0 to + * a non-zero value, it might never take effect if the VF is running an + * infinitely long compute or shader kernel. In such a scenario, the * PF would need to trigger a VM PAUSE and then change the KLV to force * it to take effect. Such cases might typically happen on a 1PF+1VF * Virtualization config enabled for heavier workloads like AI/ML. * + * The max value for this KLV is 100 seconds, anything exceeding that + * will be clamped to the max. + * * :0: infinite exec quantum (default) + * :100000: maximum exec quantum (100000ms == 100s) * * _`GUC_KLV_VF_CFG_PREEMPT_TIMEOUT` : 0x8A02 * This config sets the VF-preemption-timeout in microseconds. @@ -211,15 +229,19 @@ enum { * different crystal from the CPUs clock. Changing this KLV on a VF * that is currently running a context wont take effect until a new * context is scheduled in. 
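The exec-quantum and preemption-timeout KLVs described here now cap their values at 100 seconds (in milliseconds and microseconds respectively), while 0 keeps its special "infinite quantum / no timeout" meaning. A small illustrative clamp under those documented limits follows; the helper names are hypothetical and not taken from the patch.

```c
#include <stdint.h>

#define EXEC_QUANTUM_MAX_MS	100000u		/* 100 s, documented KLV maximum */
#define PREEMPT_TIMEOUT_MAX_US	100000000u	/* 100 s, documented KLV maximum */

/* 0 keeps its special meaning (infinite quantum / no timeout); anything
 * larger than the maximum is clamped before building the KLV. */
static uint32_t clamp_exec_quantum_ms_sketch(uint32_t ms)
{
	return ms > EXEC_QUANTUM_MAX_MS ? EXEC_QUANTUM_MAX_MS : ms;
}

static uint32_t clamp_preempt_timeout_us_sketch(uint32_t us)
{
	return us > PREEMPT_TIMEOUT_MAX_US ? PREEMPT_TIMEOUT_MAX_US : us;
}
```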
- * That said, when the PF is changing this value from 0xFFFFFFFF to - * something else, it might never take effect if the VF is running an - * inifinitely long compute or shader kernel. + * That said, when the PF is changing this value from 0x0 to + * a non-zero value, it might never take effect if the VF is running an + * infinitely long compute or shader kernel. * In this case, the PF would need to trigger a VM PAUSE and then change * the KLV to force it to take effect. Such cases might typically happen * on a 1PF+1VF Virtualization config enabled for heavier workloads like * AI/ML. * + * The max value for this KLV is 100 seconds, anything exceeding that + * will be clamped to the max. + * * :0: no preemption timeout (default) + * :100000000: maximum preemption timeout (100000000us == 100s) * * _`GUC_KLV_VF_CFG_THRESHOLD_CAT_ERR` : 0x8A03 * This config sets threshold for CAT errors caused by the VF. @@ -291,9 +313,11 @@ enum { #define GUC_KLV_VF_CFG_EXEC_QUANTUM_KEY 0x8a01 #define GUC_KLV_VF_CFG_EXEC_QUANTUM_LEN 1u +#define GUC_KLV_VF_CFG_EXEC_QUANTUM_MAX_VALUE 100000u -#define GUC_KLV_VF_CFG_PREEMPT_TIMEOUT_KEY 0x8a02 -#define GUC_KLV_VF_CFG_PREEMPT_TIMEOUT_LEN 1u +#define GUC_KLV_VF_CFG_PREEMPT_TIMEOUT_KEY 0x8a02 +#define GUC_KLV_VF_CFG_PREEMPT_TIMEOUT_LEN 1u +#define GUC_KLV_VF_CFG_PREEMPT_TIMEOUT_MAX_VALUE 100000000u #define GUC_KLV_VF_CFG_THRESHOLD_CAT_ERR_KEY 0x8a03 #define GUC_KLV_VF_CFG_THRESHOLD_CAT_ERR_LEN 1u diff --git a/drivers/gpu/drm/xe/abi/guc_relay_actions_abi.h b/drivers/gpu/drm/xe/abi/guc_relay_actions_abi.h index 747e428de421..6c2834613081 100644 --- a/drivers/gpu/drm/xe/abi/guc_relay_actions_abi.h +++ b/drivers/gpu/drm/xe/abi/guc_relay_actions_abi.h @@ -1,11 +1,179 @@ /* SPDX-License-Identifier: MIT */ /* - * Copyright © 2023 Intel Corporation + * Copyright © 2023-2024 Intel Corporation */ #ifndef _ABI_GUC_RELAY_ACTIONS_ABI_H_ #define _ABI_GUC_RELAY_ACTIONS_ABI_H_ +#include "abi/guc_relay_communication_abi.h" + +/** + * DOC: GuC Relay VF/PF ABI Version + * + * The _`GUC_RELAY_VERSION_BASE` defines minimum VF/PF ABI version that + * drivers must support. Currently this is version 1.0. + * + * The _`GUC_RELAY_VERSION_LATEST` defines latest VF/PF ABI version that + * drivers may use. Currently this is version 1.0. + * + * Some platforms may require different base VF/PF ABI version. + * No supported VF/PF ABI version can be 0.0. + */ + +#define GUC_RELAY_VERSION_BASE_MAJOR 1 +#define GUC_RELAY_VERSION_BASE_MINOR 0 + +#define GUC_RELAY_VERSION_LATEST_MAJOR 1 +#define GUC_RELAY_VERSION_LATEST_MINOR 0 + +/** + * DOC: GuC Relay Actions + * + * The following actions are supported from VF/PF ABI version 1.0: + * + * * `VF2PF_HANDSHAKE`_ + * * `VF2PF_QUERY_RUNTIME`_ + */ + +/** + * DOC: VF2PF_HANDSHAKE + * + * This `Relay Message`_ is used by the VF to establish ABI version with the PF. + * + * Prior to exchanging any other messages, both VF driver and PF driver must + * negotiate the VF/PF ABI version that will be used in their communication. + * + * The VF driver shall use @MAJOR and @MINOR fields to pass requested ABI version. + * The VF driver may use special version 0.0 (both @MAJOR and @MINOR set to 0) + * to request latest (or any) ABI version that is supported by the PF driver. + * + * This message definition shall be supported by all future ABI versions. + * This message definition shall not be changed by future ABI versions. 
+ * + * +---+-------+--------------------------------------------------------------+ + * | | Bits | Description | + * +===+=======+==============================================================+ + * | 0 | 31 | ORIGIN = GUC_HXG_ORIGIN_HOST_ | + * | +-------+--------------------------------------------------------------+ + * | | 30:28 | TYPE = GUC_HXG_TYPE_REQUEST_ | + * | +-------+--------------------------------------------------------------+ + * | | 27:16 | DATA0 = MBZ | + * | +-------+--------------------------------------------------------------+ + * | | 15:0 | ACTION = _`GUC_RELAY_ACTION_VF2PF_HANDSHAKE` = 0x0001 | + * +---+-------+--------------------------------------------------------------+ + * | 1 | 31:16 | **MAJOR** - requested major version of the VFPF interface | + * | | | (use MAJOR_ANY to request latest version supported by PF) | + * | +-------+--------------------------------------------------------------+ + * | | 15:0 | **MINOR** - requested minor version of the VFPF interface | + * | | | (use MINOR_ANY to request latest version supported by PF) | + * +---+-------+--------------------------------------------------------------+ + * + * +---+-------+--------------------------------------------------------------+ + * | | Bits | Description | + * +===+=======+==============================================================+ + * | 0 | 31 | ORIGIN = GUC_HXG_ORIGIN_HOST_ | + * | +-------+--------------------------------------------------------------+ + * | | 30:28 | TYPE = GUC_HXG_TYPE_RESPONSE_SUCCESS_ | + * | +-------+--------------------------------------------------------------+ + * | | 27:0 | DATA0 = MBZ | + * +---+-------+--------------------------------------------------------------+ + * | 1 | 31:16 | **MAJOR** - agreed major version of the VFPF interface | + * | +-------+--------------------------------------------------------------+ + * | | 15:0 | **MINOR** - agreed minor version of the VFPF interface | + * +---+-------+--------------------------------------------------------------+ + */ +#define GUC_RELAY_ACTION_VF2PF_HANDSHAKE 0x0001u + +#define VF2PF_HANDSHAKE_REQUEST_MSG_LEN 2u +#define VF2PF_HANDSHAKE_REQUEST_MSG_0_MBZ GUC_HXG_REQUEST_MSG_0_DATA0 +#define VF2PF_HANDSHAKE_REQUEST_MSG_1_MAJOR (0xffffu << 16) +#define VF2PF_HANDSHAKE_MAJOR_ANY 0 +#define VF2PF_HANDSHAKE_REQUEST_MSG_1_MINOR (0xffffu << 0) +#define VF2PF_HANDSHAKE_MINOR_ANY 0 + +#define VF2PF_HANDSHAKE_RESPONSE_MSG_LEN 2u +#define VF2PF_HANDSHAKE_RESPONSE_MSG_0_MBZ GUC_HXG_RESPONSE_MSG_0_DATA0 +#define VF2PF_HANDSHAKE_RESPONSE_MSG_1_MAJOR (0xffffu << 16) +#define VF2PF_HANDSHAKE_RESPONSE_MSG_1_MINOR (0xffffu << 0) + +/** + * DOC: VF2PF_QUERY_RUNTIME + * + * This `Relay Message`_ is used by the VF to query values of runtime registers. + * + * On some platforms, VF drivers may not have access to the some fuse registers + * (referred here as 'runtime registers') and therefore VF drivers need to ask + * the PF driver to obtain their values. + * + * However, the list of such registers, and their values, is fully owned and + * maintained by the PF driver and the VF driver may only initiate the query + * sequence and indicate in the @START field the starting index of the next + * requested register from this predefined list. + * + * In the response, the PF driver will return tuple of 32-bit register offset and + * the 32-bit value of that register (respectively @REG_OFFSET and @REG_VALUE). + * + * The VF driver can use @LIMIT field to limit number of returned register tuples. 
+ * If @LIMIT is unset then PF decides about number of returned register tuples. + * + * This message definition is supported from ABI version 1.0. + * + * +---+-------+--------------------------------------------------------------+ + * | | Bits | Description | + * +===+=======+==============================================================+ + * | 0 | 31 | ORIGIN = GUC_HXG_ORIGIN_HOST_ | + * | +-------+--------------------------------------------------------------+ + * | | 30:28 | TYPE = GUC_HXG_TYPE_REQUEST_ | + * | +-------+--------------------------------------------------------------+ + * | | 27:16 | DATA0 = **LIMIT** - limit number of returned entries | + * | | | (use zero to not enforce any limits on the response) | + * | +-------+--------------------------------------------------------------+ + * | | 15:0 | ACTION = _`GUC_RELAY_ACTION_VF2PF_QUERY_RUNTIME` = 0x0101 | + * +---+-------+--------------------------------------------------------------+ + * | 1 | 31:0 | DATA1 = **START** - index of the first requested entry | + * +---+-------+--------------------------------------------------------------+ + * + * +---+-------+--------------------------------------------------------------+ + * | | Bits | Description | + * +===+=======+==============================================================+ + * | 0 | 31 | ORIGIN = GUC_HXG_ORIGIN_HOST_ | + * | +-------+--------------------------------------------------------------+ + * | | 30:28 | TYPE = GUC_HXG_TYPE_RESPONSE_SUCCESS_ | + * | +-------+--------------------------------------------------------------+ + * | | 27:0 | DATA0 = **COUNT** - number of entries included in response | + * +---+-------+--------------------------------------------------------------+ + * | 1 | 31:0 | DATA1 = **REMAINING** - number of remaining entries | + * +---+-------+--------------------------------------------------------------+ + * | 2 | 31:0 | DATA2 = **REG_OFFSET** - offset of register[START] | + * +---+-------+--------------------------------------------------------------+ + * | 3 | 31:0 | DATA3 = **REG_VALUE** - value of register[START] | + * +---+-------+--------------------------------------------------------------+ + * | | | | + * +---+-------+--------------------------------------------------------------+ + * |n-1| 31:0 | REG_OFFSET - offset of register[START + x] | + * +---+-------+--------------------------------------------------------------+ + * | n | 31:0 | REG_VALUE - value of register[START + x] | + * +---+-------+--------------------------------------------------------------+ + */ +#define GUC_RELAY_ACTION_VF2PF_QUERY_RUNTIME 0x0101u + +#define VF2PF_QUERY_RUNTIME_REQUEST_MSG_LEN 2u +#define VF2PF_QUERY_RUNTIME_REQUEST_MSG_0_LIMIT GUC_HXG_REQUEST_MSG_0_DATA0 +#define VF2PF_QUERY_RUNTIME_NO_LIMIT 0u +#define VF2PF_QUERY_RUNTIME_REQUEST_MSG_1_START GUC_HXG_REQUEST_MSG_n_DATAn + +#define VF2PF_QUERY_RUNTIME_RESPONSE_MSG_MIN_LEN (GUC_HXG_MSG_MIN_LEN + 1u) +#define VF2PF_QUERY_RUNTIME_RESPONSE_MSG_MAX_LEN \ + (VF2PF_QUERY_RUNTIME_RESPONSE_MSG_MIN_LEN + VF2PF_QUERY_RUNTIME_MAX_COUNT * 2) +#define VF2PF_QUERY_RUNTIME_RESPONSE_MSG_0_COUNT GUC_HXG_RESPONSE_MSG_0_DATA0 +#define VF2PF_QUERY_RUNTIME_MIN_COUNT 0 +#define VF2PF_QUERY_RUNTIME_MAX_COUNT \ + ((GUC_RELAY_MSG_MAX_LEN - VF2PF_QUERY_RUNTIME_RESPONSE_MSG_MIN_LEN) / 2) +#define VF2PF_QUERY_RUNTIME_RESPONSE_MSG_1_REMAINING GUC_HXG_RESPONSE_MSG_n_DATAn +#define VF2PF_QUERY_RUNTIME_RESPONSE_DATAn_REG_OFFSETx GUC_HXG_RESPONSE_MSG_n_DATAn +#define VF2PF_QUERY_RUNTIME_RESPONSE_DATAn_REG_VALUEx 
GUC_HXG_RESPONSE_MSG_n_DATAn + /** * DOC: GuC Relay Debug Actions * diff --git a/drivers/gpu/drm/xe/display/xe_display.c b/drivers/gpu/drm/xe/display/xe_display.c index 0de0566e5b39..ff8863979065 100644 --- a/drivers/gpu/drm/xe/display/xe_display.c +++ b/drivers/gpu/drm/xe/display/xe_display.c @@ -126,15 +126,14 @@ int xe_display_init_nommio(struct xe_device *xe) return drmm_add_action_or_reset(&xe->drm, xe_display_fini_nommio, xe); } -static void xe_display_fini_noirq(struct drm_device *dev, void *dummy) +static void xe_display_fini_noirq(void *arg) { - struct xe_device *xe = to_xe_device(dev); + struct xe_device *xe = arg; if (!xe->info.enable_display) return; intel_display_driver_remove_noirq(xe); - intel_power_domains_driver_remove(xe); } int xe_display_init_noirq(struct xe_device *xe) @@ -163,12 +162,12 @@ int xe_display_init_noirq(struct xe_device *xe) if (err) return err; - return drmm_add_action_or_reset(&xe->drm, xe_display_fini_noirq, NULL); + return devm_add_action_or_reset(xe->drm.dev, xe_display_fini_noirq, xe); } -static void xe_display_fini_noaccel(struct drm_device *dev, void *dummy) +static void xe_display_fini_noaccel(void *arg) { - struct xe_device *xe = to_xe_device(dev); + struct xe_device *xe = arg; if (!xe->info.enable_display) return; @@ -187,7 +186,7 @@ int xe_display_init_noaccel(struct xe_device *xe) if (err) return err; - return drmm_add_action_or_reset(&xe->drm, xe_display_fini_noaccel, NULL); + return devm_add_action_or_reset(xe->drm.dev, xe_display_fini_noaccel, xe); } int xe_display_init(struct xe_device *xe) @@ -235,8 +234,6 @@ void xe_display_driver_remove(struct xe_device *xe) return; intel_display_driver_remove(xe); - - intel_display_device_remove(xe); } /* IRQ-related functions */ @@ -300,7 +297,7 @@ static bool suspend_to_idle(void) return false; } -void xe_display_pm_suspend(struct xe_device *xe) +void xe_display_pm_suspend(struct xe_device *xe, bool runtime) { bool s2idle = suspend_to_idle(); if (!xe->info.enable_display) @@ -314,7 +311,8 @@ void xe_display_pm_suspend(struct xe_device *xe) if (has_display(xe)) drm_kms_helper_poll_disable(&xe->drm); - intel_display_driver_suspend(xe); + if (!runtime) + intel_display_driver_suspend(xe); intel_dp_mst_suspend(xe); @@ -350,7 +348,7 @@ void xe_display_pm_resume_early(struct xe_device *xe) intel_power_domains_resume(xe); } -void xe_display_pm_resume(struct xe_device *xe) +void xe_display_pm_resume(struct xe_device *xe, bool runtime) { if (!xe->info.enable_display) return; @@ -365,7 +363,8 @@ void xe_display_pm_resume(struct xe_device *xe) /* MST sideband requires HPD interrupts enabled */ intel_dp_mst_resume(xe); - intel_display_driver_resume(xe); + if (!runtime) + intel_display_driver_resume(xe); intel_hpd_poll_disable(xe); if (has_display(xe)) @@ -378,17 +377,31 @@ void xe_display_pm_resume(struct xe_device *xe) intel_power_domains_enable(xe); } -void xe_display_probe(struct xe_device *xe) +static void display_device_remove(struct drm_device *dev, void *arg) +{ + struct xe_device *xe = arg; + + intel_display_device_remove(xe); +} + +int xe_display_probe(struct xe_device *xe) { + int err; + if (!xe->info.enable_display) goto no_display; intel_display_device_probe(xe); + err = drmm_add_action_or_reset(&xe->drm, display_device_remove, xe); + if (err) + return err; + if (has_display(xe)) - return; + return 0; no_display: xe->info.enable_display = false; unset_display_features(xe); + return 0; } diff --git a/drivers/gpu/drm/xe/display/xe_display.h b/drivers/gpu/drm/xe/display/xe_display.h index 
710e56180b52..000fb5799df5 100644 --- a/drivers/gpu/drm/xe/display/xe_display.h +++ b/drivers/gpu/drm/xe/display/xe_display.h @@ -18,7 +18,7 @@ void xe_display_driver_remove(struct xe_device *xe); int xe_display_create(struct xe_device *xe); -void xe_display_probe(struct xe_device *xe); +int xe_display_probe(struct xe_device *xe); int xe_display_init_nommio(struct xe_device *xe); int xe_display_init_noirq(struct xe_device *xe); @@ -34,10 +34,10 @@ void xe_display_irq_enable(struct xe_device *xe, u32 gu_misc_iir); void xe_display_irq_reset(struct xe_device *xe); void xe_display_irq_postinstall(struct xe_device *xe, struct xe_gt *gt); -void xe_display_pm_suspend(struct xe_device *xe); +void xe_display_pm_suspend(struct xe_device *xe, bool runtime); void xe_display_pm_suspend_late(struct xe_device *xe); void xe_display_pm_resume_early(struct xe_device *xe); -void xe_display_pm_resume(struct xe_device *xe); +void xe_display_pm_resume(struct xe_device *xe, bool runtime); #else @@ -47,7 +47,7 @@ static inline void xe_display_driver_remove(struct xe_device *xe) {} static inline int xe_display_create(struct xe_device *xe) { return 0; } -static inline void xe_display_probe(struct xe_device *xe) { } +static inline int xe_display_probe(struct xe_device *xe) { return 0; } static inline int xe_display_init_nommio(struct xe_device *xe) { return 0; } static inline int xe_display_init_noirq(struct xe_device *xe) { return 0; } @@ -63,10 +63,10 @@ static inline void xe_display_irq_enable(struct xe_device *xe, u32 gu_misc_iir) static inline void xe_display_irq_reset(struct xe_device *xe) {} static inline void xe_display_irq_postinstall(struct xe_device *xe, struct xe_gt *gt) {} -static inline void xe_display_pm_suspend(struct xe_device *xe) {} +static inline void xe_display_pm_suspend(struct xe_device *xe, bool runtime) {} static inline void xe_display_pm_suspend_late(struct xe_device *xe) {} static inline void xe_display_pm_resume_early(struct xe_device *xe) {} -static inline void xe_display_pm_resume(struct xe_device *xe) {} +static inline void xe_display_pm_resume(struct xe_device *xe, bool runtime) {} #endif /* CONFIG_DRM_XE_DISPLAY */ #endif /* _XE_DISPLAY_H_ */ diff --git a/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c b/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c index d46f87a039f2..eb67ecf08db2 100644 --- a/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c +++ b/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c @@ -13,6 +13,7 @@ #include "xe_bo.h" #include "xe_device.h" #include "xe_device_types.h" +#include "xe_force_wake.h" #include "xe_gsc_proxy.h" #include "xe_gsc_submit.h" #include "xe_gt.h" diff --git a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h index a255946b6f77..a255946b6f77 100644 --- a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h +++ b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h diff --git a/drivers/gpu/drm/xe/regs/xe_bars.h b/drivers/gpu/drm/xe/regs/xe_bars.h new file mode 100644 index 000000000000..ce05b6ae832f --- /dev/null +++ b/drivers/gpu/drm/xe/regs/xe_bars.h @@ -0,0 +1,11 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2024 Intel Corporation + */ +#ifndef _XE_BARS_H_ +#define _XE_BARS_H_ + +#define GTTMMADR_BAR 0 /* MMIO + GTT */ +#define LMEM_BAR 2 /* VRAM */ + +#endif diff --git a/drivers/gpu/drm/xe/regs/xe_engine_regs.h b/drivers/gpu/drm/xe/regs/xe_engine_regs.h index af71b87d8030..263ffc7bc2ef 100644 --- a/drivers/gpu/drm/xe/regs/xe_engine_regs.h +++ b/drivers/gpu/drm/xe/regs/xe_engine_regs.h @@ -44,9 +44,10 @@ #define GSCCS_RING_BASE 
0x11a000 #define RING_TAIL(base) XE_REG((base) + 0x30) +#define TAIL_ADDR REG_GENMASK(20, 3) #define RING_HEAD(base) XE_REG((base) + 0x34) -#define HEAD_ADDR 0x001FFFFC +#define HEAD_ADDR REG_GENMASK(20, 2) #define RING_START(base) XE_REG((base) + 0x38) @@ -54,6 +55,8 @@ #define RING_CTL_SIZE(size) ((size) - PAGE_SIZE) /* in bytes -> pages */ #define RING_CTL_SIZE(size) ((size) - PAGE_SIZE) /* in bytes -> pages */ +#define RING_START_UDW(base) XE_REG((base) + 0x48) + #define RING_PSMI_CTL(base) XE_REG((base) + 0x50, XE_REG_OPTION_MASKED) #define RC_SEMA_IDLE_MSG_DISABLE REG_BIT(12) #define WAIT_FOR_EVENT_POWER_DOWN_DISABLE REG_BIT(7) @@ -65,6 +68,7 @@ #define RING_ACTHD_UDW(base) XE_REG((base) + 0x5c) #define RING_DMA_FADD_UDW(base) XE_REG((base) + 0x60) #define RING_IPEHR(base) XE_REG((base) + 0x68) +#define RING_INSTDONE(base) XE_REG((base) + 0x6c) #define RING_ACTHD(base) XE_REG((base) + 0x74) #define RING_DMA_FADD(base) XE_REG((base) + 0x78) #define RING_HWS_PGA(base) XE_REG((base) + 0x80) @@ -108,6 +112,8 @@ #define FF_DOP_CLOCK_GATE_DISABLE REG_BIT(1) #define REPLAY_MODE_GRANULARITY REG_BIT(0) +#define INDIRECT_RING_STATE(base) XE_REG((base) + 0x108) + #define RING_BBADDR(base) XE_REG((base) + 0x140) #define RING_BBADDR_UDW(base) XE_REG((base) + 0x168) @@ -123,6 +129,7 @@ #define RING_EXECLIST_STATUS_HI(base) XE_REG((base) + 0x234 + 4) #define RING_CONTEXT_CONTROL(base) XE_REG((base) + 0x244, XE_REG_OPTION_MASKED) +#define CTX_CTRL_INDIRECT_RING_STATE_ENABLE REG_BIT(4) #define CTX_CTRL_INHIBIT_SYN_CTX_SWITCH REG_BIT(3) #define CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT REG_BIT(0) @@ -135,7 +142,6 @@ #define RING_VALID_MASK 0x00000001 #define RING_VALID 0x00000001 #define STOP_RING REG_BIT(8) -#define TAIL_ADDR 0x001FFFF8 #define RING_CTX_TIMESTAMP(base) XE_REG((base) + 0x3a8) #define CSBE_DEBUG_STATUS(base) XE_REG((base) + 0x3fc) diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h index 94445810ccc9..d09b2473259f 100644 --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h @@ -59,6 +59,27 @@ #define XELP_GLOBAL_MOCS(i) XE_REG(0x4000 + (i) * 4) #define XEHP_GLOBAL_MOCS(i) XE_REG_MCR(0x4000 + (i) * 4) +#define LE_SSE_MASK REG_GENMASK(18, 17) +#define LE_SSE(value) REG_FIELD_PREP(LE_SSE_MASK, value) +#define LE_COS_MASK REG_GENMASK(16, 15) +#define LE_COS(value) REG_FIELD_PREP(LE_COS_MASK) +#define LE_SCF_MASK REG_BIT(14) +#define LE_SCF(value) REG_FIELD_PREP(LE_SCF_MASK, value) +#define LE_PFM_MASK REG_GENMASK(13, 11) +#define LE_PFM(value) REG_FIELD_PREP(LE_PFM_MASK, value) +#define LE_SCC_MASK REG_GENMASK(10, 8) +#define LE_SCC(value) REG_FIELD_PREP(LE_SCC_MASK, value) +#define LE_RSC_MASK REG_BIT(7) +#define LE_RSC(value) REG_FIELD_PREP(LE_RSC_MASK, value) +#define LE_AOM_MASK REG_BIT(6) +#define LE_AOM(value) REG_FIELD_PREP(LE_AOM_MASK, value) +#define LE_LRUM_MASK REG_GENMASK(5, 4) +#define LE_LRUM(value) REG_FIELD_PREP(LE_LRUM_MASK, value) +#define LE_TGT_CACHE_MASK REG_GENMASK(3, 2) +#define LE_TGT_CACHE(value) REG_FIELD_PREP(LE_TGT_CACHE_MASK, value) +#define LE_CACHEABILITY_MASK REG_GENMASK(1, 0) +#define LE_CACHEABILITY(value) REG_FIELD_PREP(LE_CACHEABILITY_MASK, value) + #define CCS_AUX_INV XE_REG(0x4208) #define VD0_AUX_INV XE_REG(0x4218) @@ -98,6 +119,8 @@ #define FF_MODE2_TDS_TIMER_MASK REG_GENMASK(23, 16) #define FF_MODE2_TDS_TIMER_128 REG_FIELD_PREP(FF_MODE2_TDS_TIMER_MASK, 4) +#define XEHPG_INSTDONE_GEOM_SVGUNIT XE_REG_MCR(0x666c) + #define CACHE_MODE_1 XE_REG(0x7004, XE_REG_OPTION_MASKED) #define 
MSAA_OPTIMIZATION_REDUC_DISABLE REG_BIT(11) @@ -115,6 +138,14 @@ #define FLSH_IGNORES_PSD REG_BIT(10) #define FD_END_COLLECT REG_BIT(5) +#define SC_INSTDONE XE_REG(0x7100) +#define SC_INSTDONE_EXTRA XE_REG(0x7104) +#define SC_INSTDONE_EXTRA2 XE_REG(0x7108) + +#define XEHPG_SC_INSTDONE XE_REG_MCR(0x7100) +#define XEHPG_SC_INSTDONE_EXTRA XE_REG_MCR(0x7104) +#define XEHPG_SC_INSTDONE_EXTRA2 XE_REG_MCR(0x7108) + #define COMMON_SLICE_CHICKEN4 XE_REG(0x7300, XE_REG_OPTION_MASKED) #define DISABLE_TDC_LOAD_BALANCING_CALC REG_BIT(6) @@ -173,8 +204,11 @@ #define MAX_MSLICES 4 #define MEML3_EN_MASK REG_GENMASK(3, 0) +#define MIRROR_FUSE1 XE_REG(0x911c) + #define XELP_EU_ENABLE XE_REG(0x9134) /* "_DISABLE" on Xe_LP */ #define XELP_EU_MASK REG_GENMASK(7, 0) +#define XELP_GT_SLICE_ENABLE XE_REG(0x9138) #define XELP_GT_GEOMETRY_DSS_ENABLE XE_REG(0x913c) #define GT_VEBOX_VDBOX_DISABLE XE_REG(0x9140) @@ -275,6 +309,8 @@ #define RC_CTL_RC6_ENABLE REG_BIT(18) #define RC_STATE XE_REG(0xa094) #define RC_IDLE_HYSTERSIS XE_REG(0xa0ac) +#define MEDIA_POWERGATE_IDLE_HYSTERESIS XE_REG(0xa0c4) +#define RENDER_POWERGATE_IDLE_HYSTERESIS XE_REG(0xa0c8) #define PMINTRMSK XE_REG(0xa168) #define PMINTR_DISABLE_REDIRECT_TO_GUC REG_BIT(31) @@ -282,11 +318,11 @@ #define FORCEWAKE_GT XE_REG(0xa188) -#define PG_ENABLE XE_REG(0xa210) -#define VD2_MFXVDENC_POWERGATE_ENABLE REG_BIT(8) -#define VD2_HCP_POWERGATE_ENABLE REG_BIT(7) -#define VD0_MFXVDENC_POWERGATE_ENABLE REG_BIT(4) -#define VD0_HCP_POWERGATE_ENABLE REG_BIT(3) +#define POWERGATE_ENABLE XE_REG(0xa210) +#define RENDER_POWERGATE_ENABLE REG_BIT(0) +#define MEDIA_POWERGATE_ENABLE REG_BIT(1) +#define VDN_HCP_POWERGATE_ENABLE(n) REG_BIT(3 + 2 * (n)) +#define VDN_MFXVDENC_POWERGATE_ENABLE(n) REG_BIT(4 + 2 * (n)) #define CTC_MODE XE_REG(0xa26c) #define CTC_SHIFT_PARAMETER_MASK REG_GENMASK(2, 1) @@ -301,9 +337,24 @@ #define XEHPC_OVRLSCCC REG_BIT(0) /* L3 Cache Control */ +#define LNCFCMOCS_REG_COUNT 32 #define XELP_LNCFCMOCS(i) XE_REG(0xb020 + (i) * 4) #define XEHP_LNCFCMOCS(i) XE_REG_MCR(0xb020 + (i) * 4) -#define LNCFCMOCS_REG_COUNT 32 +#define L3_UPPER_LKUP_MASK REG_BIT(23) +#define L3_UPPER_GLBGO_MASK REG_BIT(22) +#define L3_UPPER_IDX_CACHEABILITY_MASK REG_GENMASK(21, 20) +#define L3_UPPER_IDX_SCC_MASK REG_GENMASK(19, 17) +#define L3_UPPER_IDX_ESC_MASK REG_BIT(16) +#define L3_LKUP_MASK REG_BIT(7) +#define L3_LKUP(value) REG_FIELD_PREP(L3_LKUP_MASK, value) +#define L3_GLBGO_MASK REG_BIT(6) +#define L3_GLBGO(value) REG_FIELD_PREP(L3_GLBGO_MASK, value) +#define L3_CACHEABILITY_MASK REG_GENMASK(5, 4) +#define L3_CACHEABILITY(value) REG_FIELD_PREP(L3_CACHEABILITY_MASK, value) +#define L3_SCC_MASK REG_GENMASK(3, 1) +#define L3_SCC(value) REG_FIELD_PREP(L3_SCC_MASK, value) +#define L3_ESC_MASK REG_BIT(0) +#define L3_ESC(value) REG_FIELD_PREP(L3_ESC_MASK, value) #define XEHP_L3NODEARBCFG XE_REG_MCR(0xb0b4) #define XEHP_LNESPARE REG_BIT(19) @@ -342,6 +393,9 @@ #define HALF_SLICE_CHICKEN5 XE_REG_MCR(0xe188, XE_REG_OPTION_MASKED) #define DISABLE_SAMPLE_G_PERFORMANCE REG_BIT(0) +#define SAMPLER_INSTDONE XE_REG_MCR(0xe160) +#define ROW_INSTDONE XE_REG_MCR(0xe164) + #define SAMPLER_MODE XE_REG_MCR(0xe18c, XE_REG_OPTION_MASKED) #define ENABLE_SMALLPL REG_BIT(15) #define SC_DISABLE_POWER_OPTIMIZATION_EBB REG_BIT(9) @@ -350,6 +404,7 @@ #define HALF_SLICE_CHICKEN7 XE_REG_MCR(0xe194, XE_REG_OPTION_MASKED) #define DG2_DISABLE_ROUND_ENABLE_ALLOW_FOR_SSLA REG_BIT(15) +#define CLEAR_OPTIMIZATION_DISABLE REG_BIT(6) #define CACHE_MODE_SS XE_REG_MCR(0xe420, XE_REG_OPTION_MASKED) #define DISABLE_ECC 
REG_BIT(5) diff --git a/drivers/gpu/drm/xe/regs/xe_guc_regs.h b/drivers/gpu/drm/xe/regs/xe_guc_regs.h index 11682e675e0f..a5fd14307f94 100644 --- a/drivers/gpu/drm/xe/regs/xe_guc_regs.h +++ b/drivers/gpu/drm/xe/regs/xe_guc_regs.h @@ -40,6 +40,8 @@ #define GS_BOOTROM_JUMP_PASSED REG_FIELD_PREP(GS_BOOTROM_MASK, 0x76) #define GS_MIA_IN_RESET REG_BIT(0) +#define GUC_HEADER_INFO XE_REG(0xc014) + #define GUC_WOPCM_SIZE XE_REG(0xc050) #define GUC_WOPCM_SIZE_MASK REG_GENMASK(31, 12) #define GUC_WOPCM_SIZE_LOCKED REG_BIT(0) diff --git a/drivers/gpu/drm/xe/regs/xe_lrc_layout.h b/drivers/gpu/drm/xe/regs/xe_lrc_layout.h index 1825d8f79db6..045dfd09db99 100644 --- a/drivers/gpu/drm/xe/regs/xe_lrc_layout.h +++ b/drivers/gpu/drm/xe/regs/xe_lrc_layout.h @@ -11,6 +11,8 @@ #define CTX_RING_TAIL (0x06 + 1) #define CTX_RING_START (0x08 + 1) #define CTX_RING_CTL (0x0a + 1) +#define CTX_TIMESTAMP (0x22 + 1) +#define CTX_INDIRECT_RING_STATE (0x26 + 1) #define CTX_PDP0_UDW (0x30 + 1) #define CTX_PDP0_LDW (0x32 + 1) @@ -23,4 +25,10 @@ #define CTX_INT_SRC_REPORT_REG (CTX_LRI_INT_REPORT_PTR + 3) #define CTX_INT_SRC_REPORT_PTR (CTX_LRI_INT_REPORT_PTR + 4) +#define INDIRECT_CTX_RING_HEAD (0x02 + 1) +#define INDIRECT_CTX_RING_TAIL (0x04 + 1) +#define INDIRECT_CTX_RING_START (0x06 + 1) +#define INDIRECT_CTX_RING_START_UDW (0x08 + 1) +#define INDIRECT_CTX_RING_CTL (0x0a + 1) + #endif diff --git a/drivers/gpu/drm/xe/regs/xe_pcode_regs.h b/drivers/gpu/drm/xe/regs/xe_pcode_regs.h index 3dae858508c8..0b0b49d850ae 100644 --- a/drivers/gpu/drm/xe/regs/xe_pcode_regs.h +++ b/drivers/gpu/drm/xe/regs/xe_pcode_regs.h @@ -18,4 +18,11 @@ #define PVC_GT0_PLATFORM_ENERGY_STATUS XE_REG(0x28106c) #define PVC_GT0_PACKAGE_POWER_SKU XE_REG(0x281080) +#define BMG_PACKAGE_POWER_SKU XE_REG(0x138098) +#define BMG_PACKAGE_POWER_SKU_UNIT XE_REG(0x1380dc) +#define BMG_PACKAGE_ENERGY_STATUS XE_REG(0x138120) +#define BMG_PACKAGE_RAPL_LIMIT XE_REG(0x138440) +#define BMG_PLATFORM_ENERGY_STATUS XE_REG(0x138458) +#define BMG_PLATFORM_POWER_LIMIT XE_REG(0x138460) + #endif /* _XE_PCODE_REGS_H_ */ diff --git a/drivers/gpu/drm/xe/regs/xe_regs.h b/drivers/gpu/drm/xe/regs/xe_regs.h index 722fb6dbb72e..23e33ec84902 100644 --- a/drivers/gpu/drm/xe/regs/xe_regs.h +++ b/drivers/gpu/drm/xe/regs/xe_regs.h @@ -30,6 +30,9 @@ #define XEHP_CLOCK_GATE_DIS XE_REG(0x101014) #define SGSI_SIDECLK_DIS REG_BIT(17) +#define XEHP_MTCFG_ADDR XE_REG(0x101800) +#define TILE_COUNT REG_GENMASK(15, 8) + #define GGC XE_REG(0x108040) #define GMS_MASK REG_GENMASK(15, 8) #define GGMS_MASK REG_GENMASK(7, 6) diff --git a/drivers/gpu/drm/xe/regs/xe_sriov_regs.h b/drivers/gpu/drm/xe/regs/xe_sriov_regs.h index 617ddb84b7fa..017b4ddd1ecf 100644 --- a/drivers/gpu/drm/xe/regs/xe_sriov_regs.h +++ b/drivers/gpu/drm/xe/regs/xe_sriov_regs.h @@ -14,6 +14,9 @@ #define LMEM_EN REG_BIT(31) #define LMTT_DIR_PTR REG_GENMASK(30, 0) /* in multiples of 64KB */ +#define VIRTUAL_CTRL_REG XE_REG(0x10108c) +#define GUEST_GTT_UPDATE_EN REG_BIT(8) + #define VF_CAP_REG XE_REG(0x1901f8, XE_REG_OPTION_VF) #define VF_CAP REG_BIT(0) diff --git a/drivers/gpu/drm/xe/tests/Makefile b/drivers/gpu/drm/xe/tests/Makefile index 8cf2367449d8..6e58931fddd4 100644 --- a/drivers/gpu/drm/xe/tests/Makefile +++ b/drivers/gpu/drm/xe/tests/Makefile @@ -11,6 +11,7 @@ xe_live_test-y = xe_live_test_mod.o \ # Normal kunit tests obj-$(CONFIG_DRM_XE_KUNIT_TEST) += xe_test.o xe_test-y = xe_test_mod.o \ + xe_args_test.o \ xe_pci_test.o \ xe_rtp_test.o \ xe_wa_test.o diff --git a/drivers/gpu/drm/xe/tests/xe_args_test.c 
b/drivers/gpu/drm/xe/tests/xe_args_test.c new file mode 100644 index 000000000000..f3fb23aa5d2e --- /dev/null +++ b/drivers/gpu/drm/xe/tests/xe_args_test.c @@ -0,0 +1,221 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright © 2024 Intel Corporation + */ + +#include <kunit/test.h> + +#include "xe_args.h" + +static void call_args_example(struct kunit *test) +{ +#define foo X, Y, Z, Q +#define bar COUNT_ARGS(foo) +#define buz CALL_ARGS(COUNT_ARGS, foo) + + KUNIT_EXPECT_EQ(test, bar, 1); + KUNIT_EXPECT_EQ(test, buz, 4); + +#undef foo +#undef bar +#undef buz +} + +static void drop_first_arg_example(struct kunit *test) +{ +#define foo X, Y, Z, Q +#define bar CALL_ARGS(COUNT_ARGS, DROP_FIRST_ARG(foo)) + + KUNIT_EXPECT_EQ(test, bar, 3); + +#undef foo +#undef bar +} + +static void first_arg_example(struct kunit *test) +{ + int X = 1; + +#define foo X, Y, Z, Q +#define bar FIRST_ARG(foo) + + KUNIT_EXPECT_EQ(test, bar, X); + KUNIT_EXPECT_STREQ(test, __stringify(bar), "X"); + +#undef foo +#undef bar +} + +static void last_arg_example(struct kunit *test) +{ + int Q = 1; + +#define foo X, Y, Z, Q +#define bar LAST_ARG(foo) + + KUNIT_EXPECT_EQ(test, bar, Q); + KUNIT_EXPECT_STREQ(test, __stringify(bar), "Q"); + +#undef foo +#undef bar +} + +static void pick_arg_example(struct kunit *test) +{ + int Y = 1, Z = 2; + +#define foo X, Y, Z, Q +#define bar PICK_ARG(2, foo) +#define buz PICK_ARG3(foo) + + KUNIT_EXPECT_EQ(test, bar, Y); + KUNIT_EXPECT_STREQ(test, __stringify(bar), "Y"); + KUNIT_EXPECT_EQ(test, buz, Z); + KUNIT_EXPECT_STREQ(test, __stringify(buz), "Z"); + +#undef foo +#undef bar +#undef buz +} + +static void sep_comma_example(struct kunit *test) +{ +#define foo(f) f(X) f(Y) f(Z) f(Q) +#define bar DROP_FIRST_ARG(foo(ARGS_SEP_COMMA __stringify)) +#define buz CALL_ARGS(COUNT_ARGS, DROP_FIRST_ARG(foo(ARGS_SEP_COMMA))) + + static const char * const a[] = { bar }; + + KUNIT_EXPECT_STREQ(test, a[0], "X"); + KUNIT_EXPECT_STREQ(test, a[1], "Y"); + KUNIT_EXPECT_STREQ(test, a[2], "Z"); + KUNIT_EXPECT_STREQ(test, a[3], "Q"); + + KUNIT_EXPECT_EQ(test, buz, 4); + +#undef foo +#undef bar +#undef buz +} + +#define NO_ARGS +#define FOO_ARGS X, Y, Z, Q +#define MAX_ARGS -1, -2, -3, -4, -5, -6, -7, -8, -9, -10, -11, -12 + +static void count_args_test(struct kunit *test) +{ + int count; + + /* COUNT_ARGS() counts to 12 */ + + count = COUNT_ARGS(); + KUNIT_EXPECT_EQ(test, count, 0); + + count = COUNT_ARGS(1); + KUNIT_EXPECT_EQ(test, count, 1); + + count = COUNT_ARGS(a, b, c, d, e); + KUNIT_EXPECT_EQ(test, count, 5); + + count = COUNT_ARGS(a, b, c, d, e, f, g, h, i, j, k, l); + KUNIT_EXPECT_EQ(test, count, 12); + + /* COUNT_ARGS() does not expand params */ + + count = COUNT_ARGS(NO_ARGS); + KUNIT_EXPECT_EQ(test, count, 1); + + count = COUNT_ARGS(FOO_ARGS); + KUNIT_EXPECT_EQ(test, count, 1); +} + +static void call_args_test(struct kunit *test) +{ + int count; + + count = CALL_ARGS(COUNT_ARGS, NO_ARGS); + KUNIT_EXPECT_EQ(test, count, 0); + KUNIT_EXPECT_EQ(test, CALL_ARGS(COUNT_ARGS, NO_ARGS), 0); + KUNIT_EXPECT_EQ(test, CALL_ARGS(COUNT_ARGS, FOO_ARGS), 4); + KUNIT_EXPECT_EQ(test, CALL_ARGS(COUNT_ARGS, FOO_ARGS, FOO_ARGS), 8); + KUNIT_EXPECT_EQ(test, CALL_ARGS(COUNT_ARGS, MAX_ARGS), 12); +} + +static void drop_first_arg_test(struct kunit *test) +{ + int Y = -2, Z = -3, Q = -4; + int a[] = { DROP_FIRST_ARG(FOO_ARGS) }; + + KUNIT_EXPECT_EQ(test, DROP_FIRST_ARG(0, -1), -1); + KUNIT_EXPECT_EQ(test, DROP_FIRST_ARG(DROP_FIRST_ARG(0, -1, -2)), -2); + + KUNIT_EXPECT_EQ(test, CALL_ARGS(COUNT_ARGS, DROP_FIRST_ARG(FOO_ARGS)), 
3); + KUNIT_EXPECT_EQ(test, DROP_FIRST_ARG(DROP_FIRST_ARG(DROP_FIRST_ARG(FOO_ARGS))), -4); + KUNIT_EXPECT_EQ(test, a[0], -2); + KUNIT_EXPECT_EQ(test, a[1], -3); + KUNIT_EXPECT_EQ(test, a[2], -4); + +#define foo DROP_FIRST_ARG(FOO_ARGS) +#define bar DROP_FIRST_ARG(DROP_FIRST_ARG(FOO_ARGS)) +#define buz DROP_FIRST_ARG(DROP_FIRST_ARG(DROP_FIRST_ARG(FOO_ARGS))) + + KUNIT_EXPECT_EQ(test, CALL_ARGS(COUNT_ARGS, foo), 3); + KUNIT_EXPECT_EQ(test, CALL_ARGS(COUNT_ARGS, bar), 2); + KUNIT_EXPECT_EQ(test, CALL_ARGS(COUNT_ARGS, buz), 1); + KUNIT_EXPECT_STREQ(test, __stringify(buz), "Q"); + +#undef foo +#undef bar +#undef buz +} + +static void first_arg_test(struct kunit *test) +{ + int X = -1; + int a[] = { FIRST_ARG(FOO_ARGS) }; + + KUNIT_EXPECT_EQ(test, FIRST_ARG(-1, -2), -1); + + KUNIT_EXPECT_EQ(test, CALL_ARGS(COUNT_ARGS, FIRST_ARG(FOO_ARGS)), 1); + KUNIT_EXPECT_EQ(test, FIRST_ARG(FOO_ARGS), -1); + KUNIT_EXPECT_EQ(test, a[0], -1); + KUNIT_EXPECT_STREQ(test, __stringify(FIRST_ARG(FOO_ARGS)), "X"); +} + +static void last_arg_test(struct kunit *test) +{ + int Q = -4; + int a[] = { LAST_ARG(FOO_ARGS) }; + + KUNIT_EXPECT_EQ(test, LAST_ARG(-1, -2), -2); + + KUNIT_EXPECT_EQ(test, CALL_ARGS(COUNT_ARGS, LAST_ARG(FOO_ARGS)), 1); + KUNIT_EXPECT_EQ(test, LAST_ARG(FOO_ARGS), -4); + KUNIT_EXPECT_EQ(test, a[0], -4); + KUNIT_EXPECT_STREQ(test, __stringify(LAST_ARG(FOO_ARGS)), "Q"); + + KUNIT_EXPECT_EQ(test, LAST_ARG(MAX_ARGS), -12); + KUNIT_EXPECT_STREQ(test, __stringify(LAST_ARG(MAX_ARGS)), "-12"); +} + +static struct kunit_case args_tests[] = { + KUNIT_CASE(count_args_test), + KUNIT_CASE(call_args_example), + KUNIT_CASE(call_args_test), + KUNIT_CASE(drop_first_arg_example), + KUNIT_CASE(drop_first_arg_test), + KUNIT_CASE(first_arg_example), + KUNIT_CASE(first_arg_test), + KUNIT_CASE(last_arg_example), + KUNIT_CASE(last_arg_test), + KUNIT_CASE(pick_arg_example), + KUNIT_CASE(sep_comma_example), + {} +}; + +static struct kunit_suite args_test_suite = { + .name = "args", + .test_cases = args_tests, +}; + +kunit_test_suite(args_test_suite); diff --git a/drivers/gpu/drm/xe/tests/xe_gt_sriov_pf_service_test.c b/drivers/gpu/drm/xe/tests/xe_gt_sriov_pf_service_test.c new file mode 100644 index 000000000000..b683585db852 --- /dev/null +++ b/drivers/gpu/drm/xe/tests/xe_gt_sriov_pf_service_test.c @@ -0,0 +1,232 @@ +// SPDX-License-Identifier: GPL-2.0 AND MIT +/* + * Copyright © 2024 Intel Corporation + */ + +#include <kunit/test.h> + +#include "xe_device.h" +#include "xe_kunit_helpers.h" +#include "xe_pci_test.h" + +static int pf_service_test_init(struct kunit *test) +{ + struct xe_pci_fake_data fake = { + .sriov_mode = XE_SRIOV_MODE_PF, + .platform = XE_TIGERLAKE, /* some random platform */ + .subplatform = XE_SUBPLATFORM_NONE, + }; + struct xe_device *xe; + struct xe_gt *gt; + + test->priv = &fake; + xe_kunit_helper_xe_device_test_init(test); + + xe = test->priv; + KUNIT_ASSERT_EQ(test, xe_sriov_init(xe), 0); + + gt = xe_device_get_gt(xe, 0); + pf_init_versions(gt); + + /* + * sanity check: + * - all supported platforms VF/PF ABI versions must be defined + * - base version can't be newer than latest + */ + KUNIT_ASSERT_NE(test, 0, gt->sriov.pf.service.version.base.major); + KUNIT_ASSERT_NE(test, 0, gt->sriov.pf.service.version.latest.major); + KUNIT_ASSERT_LE(test, gt->sriov.pf.service.version.base.major, + gt->sriov.pf.service.version.latest.major); + if (gt->sriov.pf.service.version.base.major == gt->sriov.pf.service.version.latest.major) + KUNIT_ASSERT_LE(test, gt->sriov.pf.service.version.base.minor, + 
gt->sriov.pf.service.version.latest.minor); + + test->priv = gt; + return 0; +} + +static void pf_negotiate_any(struct kunit *test) +{ + struct xe_gt *gt = test->priv; + u32 major, minor; + + KUNIT_ASSERT_EQ(test, 0, + pf_negotiate_version(gt, VF2PF_HANDSHAKE_MAJOR_ANY, + VF2PF_HANDSHAKE_MINOR_ANY, + &major, &minor)); + KUNIT_ASSERT_EQ(test, major, gt->sriov.pf.service.version.latest.major); + KUNIT_ASSERT_EQ(test, minor, gt->sriov.pf.service.version.latest.minor); +} + +static void pf_negotiate_base_match(struct kunit *test) +{ + struct xe_gt *gt = test->priv; + u32 major, minor; + + KUNIT_ASSERT_EQ(test, 0, + pf_negotiate_version(gt, + gt->sriov.pf.service.version.base.major, + gt->sriov.pf.service.version.base.minor, + &major, &minor)); + KUNIT_ASSERT_EQ(test, major, gt->sriov.pf.service.version.base.major); + KUNIT_ASSERT_EQ(test, minor, gt->sriov.pf.service.version.base.minor); +} + +static void pf_negotiate_base_newer(struct kunit *test) +{ + struct xe_gt *gt = test->priv; + u32 major, minor; + + KUNIT_ASSERT_EQ(test, 0, + pf_negotiate_version(gt, + gt->sriov.pf.service.version.base.major, + gt->sriov.pf.service.version.base.minor + 1, + &major, &minor)); + KUNIT_ASSERT_EQ(test, major, gt->sriov.pf.service.version.base.major); + KUNIT_ASSERT_GE(test, minor, gt->sriov.pf.service.version.base.minor); + if (gt->sriov.pf.service.version.base.major == gt->sriov.pf.service.version.latest.major) + KUNIT_ASSERT_LE(test, minor, gt->sriov.pf.service.version.latest.minor); + else + KUNIT_FAIL(test, "FIXME: don't know how to test multi-version yet!\n"); +} + +static void pf_negotiate_base_next(struct kunit *test) +{ + struct xe_gt *gt = test->priv; + u32 major, minor; + + KUNIT_ASSERT_EQ(test, 0, + pf_negotiate_version(gt, + gt->sriov.pf.service.version.base.major + 1, 0, + &major, &minor)); + KUNIT_ASSERT_GE(test, major, gt->sriov.pf.service.version.base.major); + KUNIT_ASSERT_LE(test, major, gt->sriov.pf.service.version.latest.major); + if (major == gt->sriov.pf.service.version.latest.major) + KUNIT_ASSERT_LE(test, minor, gt->sriov.pf.service.version.latest.minor); + else + KUNIT_FAIL(test, "FIXME: don't know how to test multi-version yet!\n"); +} + +static void pf_negotiate_base_older(struct kunit *test) +{ + struct xe_gt *gt = test->priv; + u32 major, minor; + + if (!gt->sriov.pf.service.version.base.minor) + kunit_skip(test, "no older minor\n"); + + KUNIT_ASSERT_NE(test, 0, + pf_negotiate_version(gt, + gt->sriov.pf.service.version.base.major, + gt->sriov.pf.service.version.base.minor - 1, + &major, &minor)); +} + +static void pf_negotiate_base_prev(struct kunit *test) +{ + struct xe_gt *gt = test->priv; + u32 major, minor; + + KUNIT_ASSERT_NE(test, 0, + pf_negotiate_version(gt, + gt->sriov.pf.service.version.base.major - 1, 1, + &major, &minor)); +} + +static void pf_negotiate_latest_match(struct kunit *test) +{ + struct xe_gt *gt = test->priv; + u32 major, minor; + + KUNIT_ASSERT_EQ(test, 0, + pf_negotiate_version(gt, + gt->sriov.pf.service.version.latest.major, + gt->sriov.pf.service.version.latest.minor, + &major, &minor)); + KUNIT_ASSERT_EQ(test, major, gt->sriov.pf.service.version.latest.major); + KUNIT_ASSERT_EQ(test, minor, gt->sriov.pf.service.version.latest.minor); +} + +static void pf_negotiate_latest_newer(struct kunit *test) +{ + struct xe_gt *gt = test->priv; + u32 major, minor; + + KUNIT_ASSERT_EQ(test, 0, + pf_negotiate_version(gt, + gt->sriov.pf.service.version.latest.major, + gt->sriov.pf.service.version.latest.minor + 1, + &major, &minor)); + KUNIT_ASSERT_EQ(test, major, 
gt->sriov.pf.service.version.latest.major); + KUNIT_ASSERT_EQ(test, minor, gt->sriov.pf.service.version.latest.minor); +} + +static void pf_negotiate_latest_next(struct kunit *test) +{ + struct xe_gt *gt = test->priv; + u32 major, minor; + + KUNIT_ASSERT_EQ(test, 0, + pf_negotiate_version(gt, + gt->sriov.pf.service.version.latest.major + 1, 0, + &major, &minor)); + KUNIT_ASSERT_EQ(test, major, gt->sriov.pf.service.version.latest.major); + KUNIT_ASSERT_EQ(test, minor, gt->sriov.pf.service.version.latest.minor); +} + +static void pf_negotiate_latest_older(struct kunit *test) +{ + struct xe_gt *gt = test->priv; + u32 major, minor; + + if (!gt->sriov.pf.service.version.latest.minor) + kunit_skip(test, "no older minor\n"); + + KUNIT_ASSERT_EQ(test, 0, + pf_negotiate_version(gt, + gt->sriov.pf.service.version.latest.major, + gt->sriov.pf.service.version.latest.minor - 1, + &major, &minor)); + KUNIT_ASSERT_EQ(test, major, gt->sriov.pf.service.version.latest.major); + KUNIT_ASSERT_EQ(test, minor, gt->sriov.pf.service.version.latest.minor - 1); +} + +static void pf_negotiate_latest_prev(struct kunit *test) +{ + struct xe_gt *gt = test->priv; + u32 major, minor; + + if (gt->sriov.pf.service.version.base.major == gt->sriov.pf.service.version.latest.major) + kunit_skip(test, "no prev major"); + + KUNIT_ASSERT_EQ(test, 0, + pf_negotiate_version(gt, + gt->sriov.pf.service.version.latest.major - 1, + gt->sriov.pf.service.version.base.minor + 1, + &major, &minor)); + KUNIT_ASSERT_EQ(test, major, gt->sriov.pf.service.version.latest.major - 1); + KUNIT_ASSERT_GE(test, major, gt->sriov.pf.service.version.base.major); +} + +static struct kunit_case pf_service_test_cases[] = { + KUNIT_CASE(pf_negotiate_any), + KUNIT_CASE(pf_negotiate_base_match), + KUNIT_CASE(pf_negotiate_base_newer), + KUNIT_CASE(pf_negotiate_base_next), + KUNIT_CASE(pf_negotiate_base_older), + KUNIT_CASE(pf_negotiate_base_prev), + KUNIT_CASE(pf_negotiate_latest_match), + KUNIT_CASE(pf_negotiate_latest_newer), + KUNIT_CASE(pf_negotiate_latest_next), + KUNIT_CASE(pf_negotiate_latest_older), + KUNIT_CASE(pf_negotiate_latest_prev), + {} +}; + +static struct kunit_suite pf_service_suite = { + .name = "pf_service", + .test_cases = pf_service_test_cases, + .init = pf_service_test_init, +}; + +kunit_test_suite(pf_service_suite); diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c index 977d5f4e4490..962f6438e219 100644 --- a/drivers/gpu/drm/xe/tests/xe_migrate.c +++ b/drivers/gpu/drm/xe/tests/xe_migrate.c @@ -62,36 +62,6 @@ static int run_sanity_job(struct xe_migrate *m, struct xe_device *xe, return 0; } -static void -sanity_populate_cb(struct xe_migrate_pt_update *pt_update, - struct xe_tile *tile, struct iosys_map *map, void *dst, - u32 qword_ofs, u32 num_qwords, - const struct xe_vm_pgtable_update *update) -{ - struct migrate_test_params *p = - to_migrate_test_params(xe_cur_kunit_priv(XE_TEST_LIVE_MIGRATE)); - int i; - u64 *ptr = dst; - u64 value; - - for (i = 0; i < num_qwords; i++) { - value = (qword_ofs + i - update->ofs) * 0x1111111111111111ULL; - if (map) - xe_map_wr(tile_to_xe(tile), map, (qword_ofs + i) * - sizeof(u64), u64, value); - else - ptr[i] = value; - } - - kunit_info(xe_cur_kunit(), "Used %s.\n", map ? 
"CPU" : "GPU"); - if (p->force_gpu && map) - KUNIT_FAIL(xe_cur_kunit(), "GPU pagetable update used CPU.\n"); -} - -static const struct xe_migrate_pt_update_ops sanity_ops = { - .populate = sanity_populate_cb, -}; - #define check(_retval, _expected, str, _test) \ do { if ((_retval) != (_expected)) { \ KUNIT_FAIL(_test, "Sanity check failed: " str \ @@ -209,57 +179,6 @@ static void test_copy_vram(struct xe_migrate *m, struct xe_bo *bo, test_copy(m, bo, test, region); } -static void test_pt_update(struct xe_migrate *m, struct xe_bo *pt, - struct kunit *test, bool force_gpu) -{ - struct xe_device *xe = tile_to_xe(m->tile); - struct dma_fence *fence; - u64 retval, expected; - ktime_t then, now; - int i; - - struct xe_vm_pgtable_update update = { - .ofs = 1, - .qwords = 0x10, - .pt_bo = pt, - }; - struct xe_migrate_pt_update pt_update = { - .ops = &sanity_ops, - }; - struct migrate_test_params p = { - .base.id = XE_TEST_LIVE_MIGRATE, - .force_gpu = force_gpu, - }; - - test->priv = &p; - /* Test xe_migrate_update_pgtables() updates the pagetable as expected */ - expected = 0xf0f0f0f0f0f0f0f0ULL; - xe_map_memset(xe, &pt->vmap, 0, (u8)expected, pt->size); - - then = ktime_get(); - fence = xe_migrate_update_pgtables(m, m->q->vm, NULL, m->q, &update, 1, - NULL, 0, &pt_update); - now = ktime_get(); - if (sanity_fence_failed(xe, fence, "Migration pagetable update", test)) - return; - - kunit_info(test, "Updating without syncing took %llu us,\n", - (unsigned long long)ktime_to_us(ktime_sub(now, then))); - - dma_fence_put(fence); - retval = xe_map_rd(xe, &pt->vmap, 0, u64); - check(retval, expected, "PTE[0] must stay untouched", test); - - for (i = 0; i < update.qwords; i++) { - retval = xe_map_rd(xe, &pt->vmap, (update.ofs + i) * 8, u64); - check(retval, i * 0x1111111111111111ULL, "PTE update", test); - } - - retval = xe_map_rd(xe, &pt->vmap, 8 * (update.ofs + update.qwords), - u64); - check(retval, expected, "PTE[0x11] must stay untouched", test); -} - static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test) { struct xe_tile *tile = m->tile; @@ -398,11 +317,6 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test) test_copy_vram(m, big, test); } - kunit_info(test, "Testing page table update using CPU if GPU idle.\n"); - test_pt_update(m, pt, test, false); - kunit_info(test, "Testing page table update using GPU\n"); - test_pt_update(m, pt, test, true); - out: xe_bb_free(bb, NULL); free_tiny: @@ -430,7 +344,7 @@ static int migrate_test_run_device(struct xe_device *xe) struct xe_migrate *m = tile->migrate; kunit_info(test, "Testing tile id %d.\n", id); - xe_vm_lock(m->q->vm, true); + xe_vm_lock(m->q->vm, false); xe_migrate_sanity_test(m, test); xe_vm_unlock(m->q->vm); } diff --git a/drivers/gpu/drm/xe/xe_args.h b/drivers/gpu/drm/xe/xe_args.h new file mode 100644 index 000000000000..4dbc7e53c624 --- /dev/null +++ b/drivers/gpu/drm/xe/xe_args.h @@ -0,0 +1,143 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2024 Intel Corporation + */ + +#ifndef _XE_ARGS_H_ +#define _XE_ARGS_H_ + +#include <linux/args.h> + +/* + * Why don't the following macros have the XE prefix? + * + * Once we find more potential users outside of the Xe driver, we plan to move + * all of the following macros unchanged to linux/args.h. + */ + +/** + * CALL_ARGS - Invoke a macro, but allow parameters to be expanded beforehand. 
+ * @f: name of the macro to invoke + * @args: arguments for the macro + * + * This macro allows calling macros which names might generated or we want to + * make sure it's arguments will be correctly expanded. + * + * Example: + * + * #define foo X,Y,Z,Q + * #define bar COUNT_ARGS(foo) + * #define buz CALL_ARGS(COUNT_ARGS, foo) + * + * With above definitions bar expands to 1 while buz expands to 4. + */ +#define CALL_ARGS(f, args...) __CALL_ARGS(f, args) +#define __CALL_ARGS(f, args...) f(args) + +/** + * DROP_FIRST_ARG - Returns all arguments except the first one. + * @args: arguments + * + * This helper macro allows manipulation the argument list before passing it + * to the next level macro. + * + * Example: + * + * #define foo X,Y,Z,Q + * #define bar CALL_ARGS(COUNT_ARGS, DROP_FIRST_ARG(foo)) + * + * With above definitions bar expands to 3. + */ +#define DROP_FIRST_ARG(args...) __DROP_FIRST_ARG(args) +#define __DROP_FIRST_ARG(a, b...) b + +/** + * FIRST_ARG - Returns the first argument. + * @args: arguments + * + * This helper macro allows manipulation the argument list before passing it + * to the next level macro. + * + * Example: + * + * #define foo X,Y,Z,Q + * #define bar FIRST_ARG(foo) + * + * With above definitions bar expands to X. + */ +#define FIRST_ARG(args...) __FIRST_ARG(args) +#define __FIRST_ARG(a, b...) a + +/** + * LAST_ARG - Returns the last argument. + * @args: arguments + * + * This helper macro allows manipulation the argument list before passing it + * to the next level macro. + * + * Like COUNT_ARGS() this macro works up to 12 arguments. + * + * Example: + * + * #define foo X,Y,Z,Q + * #define bar LAST_ARG(foo) + * + * With above definitions bar expands to Q. + */ +#define LAST_ARG(args...) __LAST_ARG(args) +#define __LAST_ARG(args...) PICK_ARG(COUNT_ARGS(args), args) + +/** + * PICK_ARG - Returns the n-th argument. + * @n: argument number to be returned + * @args: arguments + * + * This helper macro allows manipulation the argument list before passing it + * to the next level macro. + * + * Like COUNT_ARGS() this macro supports n up to 12. + * Specialized macros PICK_ARG1() to PICK_ARG12() are also available. + * + * Example: + * + * #define foo X,Y,Z,Q + * #define bar PICK_ARG(2, foo) + * #define buz PICK_ARG3(foo) + * + * With above definitions bar expands to Y and buz expands to Z. + */ +#define PICK_ARG(n, args...) __PICK_ARG(n, args) +#define __PICK_ARG(n, args...) CALL_ARGS(CONCATENATE(PICK_ARG, n), args) +#define PICK_ARG1(args...) FIRST_ARG(args) +#define PICK_ARG2(args...) PICK_ARG1(DROP_FIRST_ARG(args)) +#define PICK_ARG3(args...) PICK_ARG2(DROP_FIRST_ARG(args)) +#define PICK_ARG4(args...) PICK_ARG3(DROP_FIRST_ARG(args)) +#define PICK_ARG5(args...) PICK_ARG4(DROP_FIRST_ARG(args)) +#define PICK_ARG6(args...) PICK_ARG5(DROP_FIRST_ARG(args)) +#define PICK_ARG7(args...) PICK_ARG6(DROP_FIRST_ARG(args)) +#define PICK_ARG8(args...) PICK_ARG7(DROP_FIRST_ARG(args)) +#define PICK_ARG9(args...) PICK_ARG8(DROP_FIRST_ARG(args)) +#define PICK_ARG10(args...) PICK_ARG9(DROP_FIRST_ARG(args)) +#define PICK_ARG11(args...) PICK_ARG10(DROP_FIRST_ARG(args)) +#define PICK_ARG12(args...) PICK_ARG11(DROP_FIRST_ARG(args)) + +/** + * ARGS_SEP_COMMA - Definition of a comma character. + * + * This definition can be used in cases where any intermediate macro expects + * fixed number of arguments, but we want to pass more arguments which can + * be properly evaluated only by the next level macro. 
+ * + * Example: + * + * #define foo(f) f(X) f(Y) f(Z) f(Q) + * #define bar DROP_FIRST_ARG(foo(ARGS_SEP_COMMA __stringify)) + * #define buz CALL_ARGS(COUNT_ARGS, DROP_FIRST_ARG(foo(ARGS_SEP_COMMA))) + * + * With above definitions bar expands to + * "X", "Y", "Z", "Q" + * and buz expands to 4. + */ +#define ARGS_SEP_COMMA , + +#endif diff --git a/drivers/gpu/drm/xe/xe_assert.h b/drivers/gpu/drm/xe/xe_assert.h index 34c142e6cfb0..8b0cc1bc9327 100644 --- a/drivers/gpu/drm/xe/xe_assert.h +++ b/drivers/gpu/drm/xe/xe_assert.h @@ -109,11 +109,11 @@ #define xe_assert_msg(xe, condition, msg, arg...) ({ \ const struct xe_device *__xe = (xe); \ __xe_assert_msg(__xe, condition, \ - "platform: %d subplatform: %d\n" \ + "platform: %s subplatform: %d\n" \ "graphics: %s %u.%02u step %s\n" \ "media: %s %u.%02u step %s\n" \ msg, \ - __xe->info.platform, __xe->info.subplatform, \ + __xe->info.platform_name, __xe->info.subplatform, \ __xe->info.graphics_name, \ __xe->info.graphics_verx100 / 100, \ __xe->info.graphics_verx100 % 100, \ diff --git a/drivers/gpu/drm/xe/xe_bb.c b/drivers/gpu/drm/xe/xe_bb.c index 541361caff3b..a13e0b3a169e 100644 --- a/drivers/gpu/drm/xe/xe_bb.c +++ b/drivers/gpu/drm/xe/xe_bb.c @@ -6,7 +6,7 @@ #include "xe_bb.h" #include "instructions/xe_mi_commands.h" -#include "regs/xe_gpu_commands.h" +#include "xe_assert.h" #include "xe_device.h" #include "xe_exec_queue_types.h" #include "xe_gt.h" diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c index bc1f794e3e61..2bae01ce4e5b 100644 --- a/drivers/gpu/drm/xe/xe_bo.c +++ b/drivers/gpu/drm/xe/xe_bo.c @@ -96,6 +96,20 @@ bool xe_bo_is_stolen(struct xe_bo *bo) } /** + * xe_bo_has_single_placement - check if BO is placed only in one memory location + * @bo: The BO + * + * This function checks whether a given BO is placed in only one memory location. + * + * Returns: true if the BO is placed in a single memory location, false otherwise. + * + */ +bool xe_bo_has_single_placement(struct xe_bo *bo) +{ + return bo->placement.num_placement == 1; +} + +/** * xe_bo_is_stolen_devmem - check if BO is of stolen type accessed via PCI BAR * @bo: The BO * @@ -302,6 +316,18 @@ static int xe_tt_map_sg(struct ttm_tt *tt) return 0; } +static void xe_tt_unmap_sg(struct ttm_tt *tt) +{ + struct xe_ttm_tt *xe_tt = container_of(tt, struct xe_ttm_tt, ttm); + + if (xe_tt->sg) { + dma_unmap_sgtable(xe_tt->dev, xe_tt->sg, + DMA_BIDIRECTIONAL, 0); + sg_free_table(xe_tt->sg); + xe_tt->sg = NULL; + } +} + struct sg_table *xe_bo_sg(struct xe_bo *bo) { struct ttm_tt *tt = bo->ttm.ttm; @@ -377,27 +403,15 @@ static int xe_ttm_tt_populate(struct ttm_device *ttm_dev, struct ttm_tt *tt, if (err) return err; - /* A follow up may move this xe_bo_move when BO is moved to XE_PL_TT */ - err = xe_tt_map_sg(tt); - if (err) - ttm_pool_free(&ttm_dev->pool, tt); - return err; } static void xe_ttm_tt_unpopulate(struct ttm_device *ttm_dev, struct ttm_tt *tt) { - struct xe_ttm_tt *xe_tt = container_of(tt, struct xe_ttm_tt, ttm); - if (tt->page_flags & TTM_TT_FLAG_EXTERNAL) return; - if (xe_tt->sg) { - dma_unmap_sgtable(xe_tt->dev, xe_tt->sg, - DMA_BIDIRECTIONAL, 0); - sg_free_table(xe_tt->sg); - xe_tt->sg = NULL; - } + xe_tt_unmap_sg(tt); return ttm_pool_free(&ttm_dev->pool, tt); } @@ -628,17 +642,21 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict, bool handle_system_ccs = (!IS_DGFX(xe) && xe_bo_needs_ccs_pages(bo) && ttm && ttm_tt_is_populated(ttm)) ? true : false; int ret = 0; + /* Bo creation path, moving to system or TT. 
*/ if ((!old_mem && ttm) && !handle_system_ccs) { - ttm_bo_move_null(ttm_bo, new_mem); - return 0; + if (new_mem->mem_type == XE_PL_TT) + ret = xe_tt_map_sg(ttm); + if (!ret) + ttm_bo_move_null(ttm_bo, new_mem); + goto out; } if (ttm_bo->type == ttm_bo_type_sg) { ret = xe_bo_move_notify(bo, ctx); if (!ret) ret = xe_bo_move_dmabuf(ttm_bo, new_mem); - goto out; + return ret; } tt_has_data = ttm && (ttm_tt_is_populated(ttm) || @@ -650,6 +668,12 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict, needs_clear = (ttm && ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC) || (!ttm && ttm_bo->type == ttm_bo_type_device); + if (new_mem->mem_type == XE_PL_TT) { + ret = xe_tt_map_sg(ttm); + if (ret) + goto out; + } + if ((move_lacks_source && !needs_clear)) { ttm_bo_move_null(ttm_bo, new_mem); goto out; @@ -786,8 +810,11 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict, xe_pm_runtime_put(xe); out: - return ret; + if ((!ttm_bo->resource || ttm_bo->resource->mem_type == XE_PL_SYSTEM) && + ttm_bo->ttm) + xe_tt_unmap_sg(ttm_bo->ttm); + return ret; } /** @@ -1731,11 +1758,10 @@ void xe_bo_unpin_external(struct xe_bo *bo) xe_assert(xe, xe_bo_is_pinned(bo)); xe_assert(xe, xe_bo_is_user(bo)); - if (bo->ttm.pin_count == 1 && !list_empty(&bo->pinned_link)) { - spin_lock(&xe->pinned.lock); + spin_lock(&xe->pinned.lock); + if (bo->ttm.pin_count == 1 && !list_empty(&bo->pinned_link)) list_del_init(&bo->pinned_link); - spin_unlock(&xe->pinned.lock); - } + spin_unlock(&xe->pinned.lock); ttm_bo_unpin(&bo->ttm); @@ -1758,9 +1784,8 @@ void xe_bo_unpin(struct xe_bo *bo) struct ttm_place *place = &(bo->placements[0]); if (mem_type_is_vram(place->mem_type)) { - xe_assert(xe, !list_empty(&bo->pinned_link)); - spin_lock(&xe->pinned.lock); + xe_assert(xe, !list_empty(&bo->pinned_link)); list_del_init(&bo->pinned_link); spin_unlock(&xe->pinned.lock); } diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h index a885b14bf595..6de894c728f5 100644 --- a/drivers/gpu/drm/xe/xe_bo.h +++ b/drivers/gpu/drm/xe/xe_bo.h @@ -206,6 +206,7 @@ bool mem_type_is_vram(u32 mem_type); bool xe_bo_is_vram(struct xe_bo *bo); bool xe_bo_is_stolen(struct xe_bo *bo); bool xe_bo_is_stolen_devmem(struct xe_bo *bo); +bool xe_bo_has_single_placement(struct xe_bo *bo); uint64_t vram_region_gpu_offset(struct ttm_resource *res); bool xe_bo_can_migrate(struct xe_bo *bo, u32 mem_type); diff --git a/drivers/gpu/drm/xe/xe_debugfs.c b/drivers/gpu/drm/xe/xe_debugfs.c index 0b7aebaae843..1011e5d281fa 100644 --- a/drivers/gpu/drm/xe/xe_debugfs.c +++ b/drivers/gpu/drm/xe/xe_debugfs.c @@ -12,7 +12,10 @@ #include "xe_bo.h" #include "xe_device.h" +#include "xe_force_wake.h" #include "xe_gt_debugfs.h" +#include "xe_gt_printk.h" +#include "xe_guc_ads.h" #include "xe_pm.h" #include "xe_sriov.h" #include "xe_step.h" @@ -118,6 +121,58 @@ static const struct file_operations forcewake_all_fops = { .release = forcewake_release, }; +static ssize_t wedged_mode_show(struct file *f, char __user *ubuf, + size_t size, loff_t *pos) +{ + struct xe_device *xe = file_inode(f)->i_private; + char buf[32]; + int len = 0; + + len = scnprintf(buf, sizeof(buf), "%d\n", xe->wedged.mode); + + return simple_read_from_buffer(ubuf, size, pos, buf, len); +} + +static ssize_t wedged_mode_set(struct file *f, const char __user *ubuf, + size_t size, loff_t *pos) +{ + struct xe_device *xe = file_inode(f)->i_private; + struct xe_gt *gt; + u32 wedged_mode; + ssize_t ret; + u8 id; + + ret = kstrtouint_from_user(ubuf, size, 0, &wedged_mode); + if (ret) + return 
ret; + + if (wedged_mode > 2) + return -EINVAL; + + if (xe->wedged.mode == wedged_mode) + return 0; + + xe->wedged.mode = wedged_mode; + + xe_pm_runtime_get(xe); + for_each_gt(gt, xe, id) { + ret = xe_guc_ads_scheduler_policy_toggle_reset(>->uc.guc.ads); + if (ret) { + xe_gt_err(gt, "Failed to update GuC ADS scheduler policy. GuC may still cause engine reset even with wedged_mode=2\n"); + return -EIO; + } + } + xe_pm_runtime_put(xe); + + return size; +} + +static const struct file_operations wedged_mode_fops = { + .owner = THIS_MODULE, + .read = wedged_mode_show, + .write = wedged_mode_set, +}; + void xe_debugfs_register(struct xe_device *xe) { struct ttm_device *bdev = &xe->ttm; @@ -135,6 +190,9 @@ void xe_debugfs_register(struct xe_device *xe) debugfs_create_file("forcewake_all", 0400, root, xe, &forcewake_all_fops); + debugfs_create_file("wedged_mode", 0400, root, xe, + &wedged_mode_fops); + for (mem_type = XE_PL_VRAM0; mem_type <= XE_PL_VRAM1; ++mem_type) { man = ttm_manager_type(bdev, mem_type); diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c index 3d7980232be1..d7f2d19a77c1 100644 --- a/drivers/gpu/drm/xe/xe_devcoredump.c +++ b/drivers/gpu/drm/xe/xe_devcoredump.c @@ -110,6 +110,7 @@ static ssize_t xe_devcoredump_read(char *buffer, loff_t offset, drm_printf(&p, "Snapshot time: %lld.%09ld\n", ts.tv_sec, ts.tv_nsec); ts = ktime_to_timespec64(ss->boot_time); drm_printf(&p, "Uptime: %lld.%09ld\n", ts.tv_sec, ts.tv_nsec); + drm_printf(&p, "Process: %s\n", ss->process_name); xe_device_snapshot_print(xe, &p); drm_printf(&p, "\n**** GuC CT ****\n"); @@ -166,12 +167,24 @@ static void devcoredump_snapshot(struct xe_devcoredump *coredump, enum xe_hw_engine_id id; u32 adj_logical_mask = q->logical_mask; u32 width_mask = (0x1 << q->width) - 1; + const char *process_name = "no process"; + struct task_struct *task = NULL; + int i; bool cookie; ss->snapshot_time = ktime_get_real(); ss->boot_time = ktime_get_boottime(); + if (q->vm && q->vm->xef) { + task = get_pid_task(q->vm->xef->drm->pid, PIDTYPE_PID); + if (task) + process_name = task->comm; + } + strscpy(ss->process_name, process_name); + if (task) + put_task_struct(task); + ss->gt = q->gt; INIT_WORK(&ss->work, xe_devcoredump_deferred_snap_work); @@ -238,13 +251,15 @@ void xe_devcoredump(struct xe_sched_job *job) xe_devcoredump_read, xe_devcoredump_free); } -static void xe_driver_devcoredump_fini(struct drm_device *drm, void *arg) +static void xe_driver_devcoredump_fini(void *arg) { + struct drm_device *drm = arg; + dev_coredump_put(drm->dev); } int xe_devcoredump_init(struct xe_device *xe) { - return drmm_add_action_or_reset(&xe->drm, xe_driver_devcoredump_fini, xe); + return devm_add_action_or_reset(xe->drm.dev, xe_driver_devcoredump_fini, &xe->drm); } #endif diff --git a/drivers/gpu/drm/xe/xe_devcoredump_types.h b/drivers/gpu/drm/xe/xe_devcoredump_types.h index 6f654b63c7f1..923cdf72a816 100644 --- a/drivers/gpu/drm/xe/xe_devcoredump_types.h +++ b/drivers/gpu/drm/xe/xe_devcoredump_types.h @@ -26,6 +26,8 @@ struct xe_devcoredump_snapshot { ktime_t snapshot_time; /** @boot_time: Relative boot time so the uptime can be calculated. 
*/ ktime_t boot_time; + /** @process_name: Name of process that triggered this gpu hang */ + char process_name[TASK_COMM_LEN]; /** @gt: Affected GT, used by forcewake for delayed capture */ struct xe_gt *gt; diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c index 5ef9b50a20d0..64691a56d59c 100644 --- a/drivers/gpu/drm/xe/xe_device.c +++ b/drivers/gpu/drm/xe/xe_device.c @@ -5,6 +5,7 @@ #include "xe_device.h" +#include <linux/delay.h> #include <linux/units.h> #include <drm/drm_aperture.h> @@ -17,6 +18,7 @@ #include <drm/xe_drm.h> #include "display/xe_display.h" +#include "instructions/xe_gpu_commands.h" #include "regs/xe_gt_regs.h" #include "regs/xe_regs.h" #include "xe_bo.h" @@ -27,10 +29,14 @@ #include "xe_drv.h" #include "xe_exec.h" #include "xe_exec_queue.h" +#include "xe_force_wake.h" #include "xe_ggtt.h" #include "xe_gsc_proxy.h" #include "xe_gt.h" #include "xe_gt_mcr.h" +#include "xe_gt_printk.h" +#include "xe_gt_sriov_vf.h" +#include "xe_guc.h" #include "xe_hwmon.h" #include "xe_irq.h" #include "xe_memirq.h" @@ -45,6 +51,7 @@ #include "xe_ttm_stolen_mgr.h" #include "xe_ttm_sys_mgr.h" #include "xe_vm.h" +#include "xe_vram.h" #include "xe_wait_user_fence.h" static int xe_file_open(struct drm_device *dev, struct drm_file *file) @@ -90,12 +97,16 @@ static void xe_file_close(struct drm_device *dev, struct drm_file *file) struct xe_exec_queue *q; unsigned long idx; - mutex_lock(&xef->exec_queue.lock); + /* + * No need for exec_queue.lock here as there is no contention for it + * when FD is closing as IOCTLs presumably can't be modifying the + * xarray. Taking exec_queue.lock here causes undue dependency on + * vm->lock taken during xe_exec_queue_kill(). + */ xa_for_each(&xef->exec_queue.xa, idx, q) { xe_exec_queue_kill(q); xe_exec_queue_put(q); } - mutex_unlock(&xef->exec_queue.lock); xa_destroy(&xef->exec_queue.xa); mutex_destroy(&xef->exec_queue.lock); mutex_lock(&xef->vm.lock); @@ -138,6 +149,9 @@ static long xe_drm_ioctl(struct file *file, unsigned int cmd, unsigned long arg) struct xe_device *xe = to_xe_device(file_priv->minor->dev); long ret; + if (xe_device_wedged(xe)) + return -ECANCELED; + ret = xe_pm_runtime_get_ioctl(xe); if (ret >= 0) ret = drm_ioctl(file, cmd, arg); @@ -153,6 +167,9 @@ static long xe_drm_compat_ioctl(struct file *file, unsigned int cmd, unsigned lo struct xe_device *xe = to_xe_device(file_priv->minor->dev); long ret; + if (xe_device_wedged(xe)) + return -ECANCELED; + ret = xe_pm_runtime_get_ioctl(xe); if (ret >= 0) ret = drm_compat_ioctl(file, cmd, arg); @@ -180,13 +197,6 @@ static const struct file_operations xe_driver_fops = { #endif }; -static void xe_driver_release(struct drm_device *dev) -{ - struct xe_device *xe = to_xe_device(dev); - - pci_set_drvdata(to_pci_dev(xe->drm.dev), NULL); -} - static struct drm_driver driver = { /* Don't use MTRRs here; the Xserver or userspace app should * deal with them for Intel hardware. 
@@ -205,8 +215,6 @@ static struct drm_driver driver = { #ifdef CONFIG_PROC_FS .show_fdinfo = xe_drm_client_fdinfo, #endif - .release = &xe_driver_release, - .ioctls = xe_ioctls, .num_ioctls = ARRAY_SIZE(xe_ioctls), .fops = &xe_driver_fops, @@ -269,7 +277,10 @@ struct xe_device *xe_device_create(struct pci_dev *pdev, init_waitqueue_head(&xe->ufence_wq); - drmm_mutex_init(&xe->drm, &xe->usm.lock); + err = drmm_mutex_init(&xe->drm, &xe->usm.lock); + if (err) + goto err; + xa_init_flags(&xe->usm.asid_to_vm, XA_FLAGS_ALLOC); if (IS_ENABLED(CONFIG_DRM_XE_DEBUG)) { @@ -378,7 +389,7 @@ static void xe_driver_flr(struct xe_device *xe) xe_mmio_write32(gt, GU_DEBUG, DRIVERFLR_STATUS); } -static void xe_driver_flr_fini(struct drm_device *drm, void *arg) +static void xe_driver_flr_fini(void *arg) { struct xe_device *xe = arg; @@ -386,7 +397,7 @@ static void xe_driver_flr_fini(struct drm_device *drm, void *arg) xe_driver_flr(xe); } -static void xe_device_sanitize(struct drm_device *drm, void *arg) +static void xe_device_sanitize(void *arg) { struct xe_device *xe = arg; struct xe_gt *gt; @@ -501,6 +512,8 @@ int xe_device_probe_early(struct xe_device *xe) if (err) return err; + xe->wedged.mode = xe_modparam.wedged_mode; + return 0; } @@ -551,14 +564,28 @@ int xe_device_probe(struct xe_device *xe) if (err) return err; - xe_mmio_probe_tiles(xe); + err = xe_mmio_probe_tiles(xe); + if (err) + return err; xe_ttm_sys_mgr_init(xe); - for_each_gt(gt, xe, id) - xe_force_wake_init_gt(gt, gt_to_fw(gt)); + for_each_gt(gt, xe, id) { + err = xe_gt_init_early(gt); + if (err) + return err; + } for_each_tile(tile, xe, id) { + if (IS_SRIOV_VF(xe)) { + xe_guc_comm_init_early(&tile->primary_gt->uc.guc); + err = xe_gt_sriov_vf_bootstrap(tile->primary_gt); + if (err) + return err; + err = xe_gt_sriov_vf_query_config(tile->primary_gt); + if (err) + return err; + } err = xe_ggtt_init_early(tile->mem.ggtt); if (err) return err; @@ -578,13 +605,10 @@ int xe_device_probe(struct xe_device *xe) err = xe_devcoredump_init(xe); if (err) return err; - err = drmm_add_action_or_reset(&xe->drm, xe_driver_flr_fini, xe); + err = devm_add_action_or_reset(xe->drm.dev, xe_driver_flr_fini, xe); if (err) return err; - for_each_gt(gt, xe, id) - xe_pcode_init(gt); - err = xe_display_init_noirq(xe); if (err) return err; @@ -593,17 +617,11 @@ int xe_device_probe(struct xe_device *xe) if (err) goto err; - for_each_gt(gt, xe, id) { - err = xe_gt_init_early(gt); - if (err) - goto err_irq_shutdown; - } - err = xe_device_set_has_flat_ccs(xe); if (err) goto err_irq_shutdown; - err = xe_mmio_probe_vram(xe); + err = xe_vram_probe(xe); if (err) goto err_irq_shutdown; @@ -650,7 +668,7 @@ int xe_device_probe(struct xe_device *xe) xe_hwmon_register(xe); - return drmm_add_action_or_reset(&xe->drm, xe_device_sanitize, xe); + return devm_add_action_or_reset(xe->drm.dev, xe_device_sanitize, xe); err_fini_display: xe_display_driver_remove(xe); @@ -759,3 +777,34 @@ u64 xe_device_uncanonicalize_addr(struct xe_device *xe, u64 address) { return address & GENMASK_ULL(xe->info.va_bits - 1, 0); } + +/** + * xe_device_declare_wedged - Declare device wedged + * @xe: xe device instance + * + * This is a final state that can only be cleared with a mudule + * re-probe (unbind + bind). + * In this state every IOCTL will be blocked so the GT cannot be used. + * In general it will be called upon any critical error such as gt reset + * failure or guc loading failure. 
+ * If xe.wedged module parameter is set to 2, this function will be called + * on every single execution timeout (a.k.a. GPU hang) right after devcoredump + * snapshot capture. In this mode, GT reset won't be attempted so the state of + * the issue is preserved for further debugging. + */ +void xe_device_declare_wedged(struct xe_device *xe) +{ + if (xe->wedged.mode == 0) { + drm_dbg(&xe->drm, "Wedged mode is forcibly disabled\n"); + return; + } + + if (!atomic_xchg(&xe->wedged.flag, 1)) { + xe->needs_flr_on_fini = true; + drm_err(&xe->drm, + "CRITICAL: Xe has declared device %s as wedged.\n" + "IOCTLs and executions are blocked. Only a rebind may clear the failure\n" + "Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/xe/kernel/issues/new\n", + dev_name(xe->drm.dev)); + } +} diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h index 36d4434ebccc..3ed14072d8d1 100644 --- a/drivers/gpu/drm/xe/xe_device.h +++ b/drivers/gpu/drm/xe/xe_device.h @@ -6,15 +6,9 @@ #ifndef _XE_DEVICE_H_ #define _XE_DEVICE_H_ -struct xe_exec_queue; -struct xe_file; - #include <drm/drm_util.h> -#include "regs/xe_gpu_commands.h" #include "xe_device_types.h" -#include "xe_force_wake.h" -#include "xe_macros.h" static inline struct xe_device *to_xe_device(const struct drm_device *dev) { @@ -167,4 +161,11 @@ void xe_device_snapshot_print(struct xe_device *xe, struct drm_printer *p); u64 xe_device_canonicalize_addr(struct xe_device *xe, u64 address); u64 xe_device_uncanonicalize_addr(struct xe_device *xe, u64 address); +static inline bool xe_device_wedged(struct xe_device *xe) +{ + return atomic_read(&xe->wedged.flag); +} + +void xe_device_declare_wedged(struct xe_device *xe); + #endif diff --git a/drivers/gpu/drm/xe/xe_device_sysfs.c b/drivers/gpu/drm/xe/xe_device_sysfs.c index 21677b8cd977..7375937934fa 100644 --- a/drivers/gpu/drm/xe/xe_device_sysfs.c +++ b/drivers/gpu/drm/xe/xe_device_sysfs.c @@ -69,7 +69,7 @@ vram_d3cold_threshold_store(struct device *dev, struct device_attribute *attr, static DEVICE_ATTR_RW(vram_d3cold_threshold); -static void xe_device_sysfs_fini(struct drm_device *drm, void *arg) +static void xe_device_sysfs_fini(void *arg) { struct xe_device *xe = arg; @@ -85,5 +85,5 @@ int xe_device_sysfs_init(struct xe_device *xe) if (ret) return ret; - return drmm_add_action_or_reset(&xe->drm, xe_device_sysfs_fini, xe); + return devm_add_action_or_reset(dev, xe_device_sysfs_fini, xe); } diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h index 2e62450d86e1..52bc461171d5 100644 --- a/drivers/gpu/drm/xe/xe_device_types.h +++ b/drivers/gpu/drm/xe/xe_device_types.h @@ -196,6 +196,9 @@ struct xe_tile { struct { /** @sriov.vf.memirq: Memory Based Interrupts. */ struct xe_memirq memirq; + + /** @sriov.vf.ggtt_balloon: GGTT regions excluded from use. 
*/ + struct drm_mm_node ggtt_balloon[2]; } vf; } sriov; @@ -218,6 +221,8 @@ struct xe_device { /** @info: device info */ struct intel_device_info { + /** @info.platform_name: platform name */ + const char *platform_name; /** @info.graphics_name: graphics IP name */ const char *graphics_name; /** @info.media_name: media IP name */ @@ -281,6 +286,10 @@ struct xe_device { u8 has_heci_gscfi:1; /** @info.skip_guc_pc: Skip GuC based PM feature init */ u8 skip_guc_pc:1; + /** @info.has_atomic_enable_pte_bit: Device has atomic enable PTE bit */ + u8 has_atomic_enable_pte_bit:1; + /** @info.has_device_atomics_on_smem: Supports device atomics on SMEM */ + u8 has_device_atomics_on_smem:1; #if IS_ENABLED(CONFIG_DRM_XE_DISPLAY) struct { @@ -427,9 +436,6 @@ struct xe_device { /** @d3cold.allowed: Indicates if d3cold is a valid device state */ bool allowed; - /** @d3cold.power_lost: Indicates if card has really lost power. */ - bool power_lost; - /** * @d3cold.vram_threshold: * @@ -459,6 +465,14 @@ struct xe_device { /** @needs_flr_on_fini: requests function-reset on fini */ bool needs_flr_on_fini; + /** @wedged: Struct to control Wedged States and mode */ + struct { + /** @wedged.flag: Xe device faced a critical error and is now blocked. */ + atomic_t flag; + /** @wedged.mode: Mode controlled by kernel parameter and debugfs */ + int mode; + } wedged; + /* private: */ #if IS_ENABLED(CONFIG_DRM_XE_DISPLAY) @@ -547,6 +561,9 @@ struct xe_file { struct mutex lock; } exec_queue; + /** @run_ticks: hw engine class run time in ticks for this drm client */ + u64 run_ticks[XE_ENGINE_CLASS_MAX]; + /** @client: drm client */ struct xe_drm_client *client; }; diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c index 08f0b7c95901..4a19b771e3a0 100644 --- a/drivers/gpu/drm/xe/xe_drm_client.c +++ b/drivers/gpu/drm/xe/xe_drm_client.c @@ -2,6 +2,7 @@ /* * Copyright © 2023 Intel Corporation */ +#include "xe_drm_client.h" #include <drm/drm_print.h> #include <drm/xe_drm.h> @@ -12,10 +13,67 @@ #include "xe_bo.h" #include "xe_bo_types.h" #include "xe_device_types.h" -#include "xe_drm_client.h" +#include "xe_exec_queue.h" +#include "xe_force_wake.h" +#include "xe_gt.h" +#include "xe_hw_engine.h" +#include "xe_pm.h" #include "xe_trace.h" /** + * DOC: DRM Client usage stats + * + * The drm/xe driver implements the DRM client usage stats specification as + * documented in :ref:`drm-client-usage-stats`. 
+ * + * Example of the output showing the implemented key value pairs and entirety of + * the currently possible format options: + * + * :: + * + * pos: 0 + * flags: 0100002 + * mnt_id: 26 + * ino: 685 + * drm-driver: xe + * drm-client-id: 3 + * drm-pdev: 0000:03:00.0 + * drm-total-system: 0 + * drm-shared-system: 0 + * drm-active-system: 0 + * drm-resident-system: 0 + * drm-purgeable-system: 0 + * drm-total-gtt: 192 KiB + * drm-shared-gtt: 0 + * drm-active-gtt: 0 + * drm-resident-gtt: 192 KiB + * drm-total-vram0: 23992 KiB + * drm-shared-vram0: 16 MiB + * drm-active-vram0: 0 + * drm-resident-vram0: 23992 KiB + * drm-total-stolen: 0 + * drm-shared-stolen: 0 + * drm-active-stolen: 0 + * drm-resident-stolen: 0 + * drm-cycles-rcs: 28257900 + * drm-total-cycles-rcs: 7655183225 + * drm-cycles-bcs: 0 + * drm-total-cycles-bcs: 7655183225 + * drm-cycles-vcs: 0 + * drm-total-cycles-vcs: 7655183225 + * drm-engine-capacity-vcs: 2 + * drm-cycles-vecs: 0 + * drm-total-cycles-vecs: 7655183225 + * drm-engine-capacity-vecs: 2 + * drm-cycles-ccs: 0 + * drm-total-cycles-ccs: 7655183225 + * drm-engine-capacity-ccs: 4 + * + * Possible `drm-cycles-` key names are: `rcs`, `ccs`, `bcs`, `vcs`, `vecs` and + * "other". + */ + +/** * xe_drm_client_alloc() - Allocate drm client * @void: No arg * @@ -179,6 +237,69 @@ static void show_meminfo(struct drm_printer *p, struct drm_file *file) } } +static void show_run_ticks(struct drm_printer *p, struct drm_file *file) +{ + unsigned long class, i, gt_id, capacity[XE_ENGINE_CLASS_MAX] = { }; + struct xe_file *xef = file->driver_priv; + struct xe_device *xe = xef->xe; + struct xe_gt *gt; + struct xe_hw_engine *hwe; + struct xe_exec_queue *q; + u64 gpu_timestamp; + + xe_pm_runtime_get(xe); + + /* Accumulate all the exec queues from this client */ + mutex_lock(&xef->exec_queue.lock); + xa_for_each(&xef->exec_queue.xa, i, q) { + xe_exec_queue_update_run_ticks(q); + xef->run_ticks[q->class] += q->run_ticks - q->old_run_ticks; + q->old_run_ticks = q->run_ticks; + } + mutex_unlock(&xef->exec_queue.lock); + + /* Get the total GPU cycles */ + for_each_gt(gt, xe, gt_id) { + hwe = xe_gt_any_hw_engine(gt); + if (!hwe) + continue; + + xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); + gpu_timestamp = xe_hw_engine_read_timestamp(hwe); + xe_force_wake_put(gt_to_fw(gt), XE_FW_GT); + break; + } + + xe_pm_runtime_put(xe); + + if (unlikely(!hwe)) + return; + + for (class = 0; class < XE_ENGINE_CLASS_MAX; class++) { + const char *class_name; + + for_each_gt(gt, xe, gt_id) + capacity[class] += gt->user_engines.instances_per_class[class]; + + /* + * Engines may be fused off or not exposed to userspace. 
Don't + * return anything if this entire class is not available + */ + if (!capacity[class]) + continue; + + class_name = xe_hw_engine_class_to_str(class); + drm_printf(p, "drm-cycles-%s:\t%llu\n", + class_name, xef->run_ticks[class]); + drm_printf(p, "drm-total-cycles-%s:\t%llu\n", + class_name, gpu_timestamp); + + if (capacity[class] > 1) + drm_printf(p, "drm-engine-capacity-%s:\t%lu\n", + class_name, capacity[class]); + } +} + /** * xe_drm_client_fdinfo() - Callback for fdinfo interface * @p: The drm_printer ptr @@ -192,5 +313,6 @@ static void show_meminfo(struct drm_printer *p, struct drm_file *file) void xe_drm_client_fdinfo(struct drm_printer *p, struct drm_file *file) { show_meminfo(p, file); + show_run_ticks(p, file); } #endif diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c index 395de93579fa..27215075c799 100644 --- a/drivers/gpu/drm/xe/xe_exec_queue.c +++ b/drivers/gpu/drm/xe/xe_exec_queue.c @@ -86,7 +86,7 @@ static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device *xe, if (extensions) { /* - * may set q->usm, must come before xe_lrc_init(), + * may set q->usm, must come before xe_lrc_create(), * may overwrite q->sched_props, must come before q->ops->init() */ err = exec_queue_user_extensions(xe, q, extensions, 0); @@ -96,45 +96,30 @@ static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device *xe, } } - if (xe_exec_queue_is_parallel(q)) { - q->parallel.composite_fence_ctx = dma_fence_context_alloc(1); - q->parallel.composite_fence_seqno = XE_FENCE_INITIAL_SEQNO; - } - return q; } static int __xe_exec_queue_init(struct xe_exec_queue *q) { - struct xe_device *xe = gt_to_xe(q->gt); int i, err; for (i = 0; i < q->width; ++i) { - err = xe_lrc_init(q->lrc + i, q->hwe, q, q->vm, SZ_16K); - if (err) + q->lrc[i] = xe_lrc_create(q->hwe, q->vm, SZ_16K); + if (IS_ERR(q->lrc[i])) { + err = PTR_ERR(q->lrc[i]); goto err_lrc; + } } err = q->ops->init(q); if (err) goto err_lrc; - /* - * Normally the user vm holds an rpm ref to keep the device - * awake, and the context holds a ref for the vm, however for - * some engines we use the kernels migrate vm underneath which offers no - * such rpm ref, or we lack a vm. Make sure we keep a ref here, so we - * can perform GuC CT actions when needed. Caller is expected to have - * already grabbed the rpm ref outside any sensitive locks. 
- */ - if (!(q->flags & EXEC_QUEUE_FLAG_PERMANENT) && (q->flags & EXEC_QUEUE_FLAG_VM || !q->vm)) - xe_pm_runtime_get_noresume(xe); - return 0; err_lrc: for (i = i - 1; i >= 0; --i) - xe_lrc_finish(q->lrc + i); + xe_lrc_put(q->lrc[i]); return err; } @@ -215,9 +200,7 @@ void xe_exec_queue_fini(struct xe_exec_queue *q) int i; for (i = 0; i < q->width; ++i) - xe_lrc_finish(q->lrc + i); - if (!(q->flags & EXEC_QUEUE_FLAG_PERMANENT) && (q->flags & EXEC_QUEUE_FLAG_VM || !q->vm)) - xe_pm_runtime_put(gt_to_xe(q->gt)); + xe_lrc_put(q->lrc[i]); __xe_exec_queue_free(q); } @@ -720,7 +703,7 @@ bool xe_exec_queue_is_lr(struct xe_exec_queue *q) static s32 xe_exec_queue_num_job_inflight(struct xe_exec_queue *q) { - return q->lrc->fence_ctx.next_seqno - xe_lrc_seqno(q->lrc) - 1; + return q->lrc[0]->fence_ctx.next_seqno - xe_lrc_seqno(q->lrc[0]) - 1; } /** @@ -731,7 +714,7 @@ static s32 xe_exec_queue_num_job_inflight(struct xe_exec_queue *q) */ bool xe_exec_queue_ring_full(struct xe_exec_queue *q) { - struct xe_lrc *lrc = q->lrc; + struct xe_lrc *lrc = q->lrc[0]; s32 max_job = lrc->ring.size / MAX_JOB_SIZE_BYTES; return xe_exec_queue_num_job_inflight(q) >= max_job; @@ -757,16 +740,50 @@ bool xe_exec_queue_is_idle(struct xe_exec_queue *q) int i; for (i = 0; i < q->width; ++i) { - if (xe_lrc_seqno(&q->lrc[i]) != - q->lrc[i].fence_ctx.next_seqno - 1) + if (xe_lrc_seqno(q->lrc[i]) != + q->lrc[i]->fence_ctx.next_seqno - 1) return false; } return true; } - return xe_lrc_seqno(&q->lrc[0]) == - q->lrc[0].fence_ctx.next_seqno - 1; + return xe_lrc_seqno(q->lrc[0]) == + q->lrc[0]->fence_ctx.next_seqno - 1; +} + +/** + * xe_exec_queue_update_run_ticks() - Update run time in ticks for this exec queue + * from hw + * @q: The exec queue + * + * Update the timestamp saved by HW for this exec queue and save run ticks + * calculated by using the delta from last update. + */ +void xe_exec_queue_update_run_ticks(struct xe_exec_queue *q) +{ + struct xe_lrc *lrc; + u32 old_ts, new_ts; + + /* + * Jobs that are run during driver load may use an exec_queue, but are + * not associated with a user xe file, so avoid accumulating busyness + * for kernel specific work. + */ + if (!q->vm || !q->vm->xef) + return; + + /* + * Only sample the first LRC. For parallel submission, all of them are + * scheduled together and we compensate that below by multiplying by + * width - this may introduce errors if that premise is not true and + * they don't exit 100% aligned. On the other hand, looping through + * the LRCs and reading them in different time could also introduce + * errors. 
+ */ + lrc = q->lrc[0]; + new_ts = xe_lrc_update_timestamp(lrc, &old_ts); + q->run_ticks += (new_ts - old_ts) * q->width; } void xe_exec_queue_kill(struct xe_exec_queue *q) diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h index 02ce8d204622..289a3a51d2a2 100644 --- a/drivers/gpu/drm/xe/xe_exec_queue.h +++ b/drivers/gpu/drm/xe/xe_exec_queue.h @@ -26,6 +26,15 @@ void xe_exec_queue_fini(struct xe_exec_queue *q); void xe_exec_queue_destroy(struct kref *ref); void xe_exec_queue_assign_name(struct xe_exec_queue *q, u32 instance); +static inline struct xe_exec_queue * +xe_exec_queue_get_unless_zero(struct xe_exec_queue *q) +{ + if (kref_get_unless_zero(&q->refcount)) + return q; + + return NULL; +} + struct xe_exec_queue *xe_exec_queue_lookup(struct xe_file *xef, u32 id); static inline struct xe_exec_queue *xe_exec_queue_get(struct xe_exec_queue *q) @@ -66,5 +75,6 @@ struct dma_fence *xe_exec_queue_last_fence_get(struct xe_exec_queue *e, struct xe_vm *vm); void xe_exec_queue_last_fence_set(struct xe_exec_queue *e, struct xe_vm *vm, struct dma_fence *fence); +void xe_exec_queue_update_run_ticks(struct xe_exec_queue *q); #endif diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h index ee78d497d838..18d8b2a60928 100644 --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h @@ -103,16 +103,6 @@ struct xe_exec_queue { struct xe_guc_exec_queue *guc; }; - /** - * @parallel: parallel submission state - */ - struct { - /** @parallel.composite_fence_ctx: context composite fence */ - u64 composite_fence_ctx; - /** @parallel.composite_fence_seqno: seqno for composite fence */ - u32 composite_fence_seqno; - } parallel; - /** @sched_props: scheduling properties */ struct { /** @sched_props.timeslice_us: timeslice period in micro-seconds */ @@ -151,8 +141,12 @@ struct xe_exec_queue { * Protected by @vm's resv. Unused if @vm == NULL. 
*/ u64 tlb_flush_seqno; + /** @old_run_ticks: prior hw engine class run time in ticks for this exec queue */ + u64 old_run_ticks; + /** @run_ticks: hw engine class run time in ticks for this exec queue */ + u64 run_ticks; /** @lrc: logical ring context for this exec queue */ - struct xe_lrc lrc[]; + struct xe_lrc *lrc[]; }; /** diff --git a/drivers/gpu/drm/xe/xe_execlist.c b/drivers/gpu/drm/xe/xe_execlist.c index dece2785933c..db906117db6d 100644 --- a/drivers/gpu/drm/xe/xe_execlist.c +++ b/drivers/gpu/drm/xe/xe_execlist.c @@ -9,7 +9,6 @@ #include "instructions/xe_mi_commands.h" #include "regs/xe_engine_regs.h" -#include "regs/xe_gpu_commands.h" #include "regs/xe_gt_regs.h" #include "regs/xe_lrc_layout.h" #include "xe_assert.h" @@ -110,7 +109,7 @@ static void __xe_execlist_port_start(struct xe_execlist_port *port, port->last_ctx_id = 1; } - __start_lrc(port->hwe, exl->q->lrc, port->last_ctx_id); + __start_lrc(port->hwe, exl->q->lrc[0], port->last_ctx_id); port->running_exl = exl; exl->has_run = true; } @@ -124,14 +123,14 @@ static void __xe_execlist_port_idle(struct xe_execlist_port *port) if (!port->running_exl) return; - xe_lrc_write_ring(&port->hwe->kernel_lrc, noop, sizeof(noop)); - __start_lrc(port->hwe, &port->hwe->kernel_lrc, 0); + xe_lrc_write_ring(port->hwe->kernel_lrc, noop, sizeof(noop)); + __start_lrc(port->hwe, port->hwe->kernel_lrc, 0); port->running_exl = NULL; } static bool xe_execlist_is_idle(struct xe_execlist_exec_queue *exl) { - struct xe_lrc *lrc = exl->q->lrc; + struct xe_lrc *lrc = exl->q->lrc[0]; return lrc->ring.tail == lrc->ring.old_tail; } @@ -307,6 +306,7 @@ static void execlist_job_free(struct drm_sched_job *drm_job) { struct xe_sched_job *job = to_xe_sched_job(drm_job); + xe_exec_queue_update_run_ticks(job->q); xe_sched_job_put(job); } @@ -333,7 +333,7 @@ static int execlist_exec_queue_init(struct xe_exec_queue *q) exl->q = q; err = drm_sched_init(&exl->sched, &drm_sched_ops, NULL, 1, - q->lrc[0].ring.size / MAX_JOB_SIZE_BYTES, + q->lrc[0]->ring.size / MAX_JOB_SIZE_BYTES, XE_SCHED_HANG_LIMIT, XE_SCHED_JOB_TIMEOUT, NULL, NULL, q->hwe->name, gt_to_xe(q->gt)->drm.dev); diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c index 0d541f55b4fc..b01a670fecb8 100644 --- a/drivers/gpu/drm/xe/xe_ggtt.c +++ b/drivers/gpu/drm/xe/xe_ggtt.c @@ -8,6 +8,7 @@ #include <linux/io-64-nonatomic-lo-hi.h> #include <linux/sizes.h> +#include <drm/drm_drv.h> #include <drm/drm_managed.h> #include <drm/i915_drm.h> @@ -19,6 +20,7 @@ #include "xe_device.h" #include "xe_gt.h" #include "xe_gt_printk.h" +#include "xe_gt_sriov_vf.h" #include "xe_gt_tlb_invalidation.h" #include "xe_map.h" #include "xe_pm.h" @@ -140,6 +142,7 @@ int xe_ggtt_init_early(struct xe_ggtt *ggtt) struct xe_device *xe = tile_to_xe(ggtt->tile); struct pci_dev *pdev = to_pci_dev(xe->drm.dev); unsigned int gsm_size; + int err; if (IS_SRIOV_VF(xe)) gsm_size = SZ_8M; /* GGTT is expected to be 4GiB */ @@ -193,7 +196,17 @@ int xe_ggtt_init_early(struct xe_ggtt *ggtt) mutex_init(&ggtt->lock); primelockdep(ggtt); - return drmm_add_action_or_reset(&xe->drm, ggtt_fini_early, ggtt); + err = drmm_add_action_or_reset(&xe->drm, ggtt_fini_early, ggtt); + if (err) + return err; + + if (IS_SRIOV_VF(xe)) { + err = xe_gt_sriov_vf_prepare_ggtt(xe_tile_get_gt(ggtt->tile, 0)); + if (err) + return err; + } + + return 0; } static void xe_ggtt_invalidate(struct xe_ggtt *ggtt); @@ -433,18 +446,29 @@ int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo) void xe_ggtt_remove_node(struct xe_ggtt *ggtt, struct drm_mm_node 
*node, bool invalidate) { - xe_pm_runtime_get_noresume(tile_to_xe(ggtt->tile)); + struct xe_device *xe = tile_to_xe(ggtt->tile); + bool bound; + int idx; + + bound = drm_dev_enter(&xe->drm, &idx); + if (bound) + xe_pm_runtime_get_noresume(xe); mutex_lock(&ggtt->lock); - xe_ggtt_clear(ggtt, node->start, node->size); + if (bound) + xe_ggtt_clear(ggtt, node->start, node->size); drm_mm_remove_node(node); node->size = 0; mutex_unlock(&ggtt->lock); + if (!bound) + return; + if (invalidate) xe_ggtt_invalidate(ggtt); - xe_pm_runtime_put(tile_to_xe(ggtt->tile)); + xe_pm_runtime_put(xe); + drm_dev_exit(idx); } void xe_ggtt_remove_bo(struct xe_ggtt *ggtt, struct xe_bo *bo) diff --git a/drivers/gpu/drm/xe/xe_gsc.c b/drivers/gpu/drm/xe/xe_gsc.c index 60202b903687..80a61934decc 100644 --- a/drivers/gpu/drm/xe/xe_gsc.c +++ b/drivers/gpu/drm/xe/xe_gsc.c @@ -5,6 +5,8 @@ #include "xe_gsc.h" +#include <linux/delay.h> + #include <drm/drm_managed.h> #include <generated/xe_wa_oob.h> @@ -14,6 +16,7 @@ #include "xe_bo.h" #include "xe_device.h" #include "xe_exec_queue.h" +#include "xe_force_wake.h" #include "xe_gsc_proxy.h" #include "xe_gsc_submit.h" #include "xe_gt.h" diff --git a/drivers/gpu/drm/xe/xe_gsc.h b/drivers/gpu/drm/xe/xe_gsc.h index dd16e9b8b894..1c7a623faf11 100644 --- a/drivers/gpu/drm/xe/xe_gsc.h +++ b/drivers/gpu/drm/xe/xe_gsc.h @@ -6,8 +6,9 @@ #ifndef _XE_GSC_H_ #define _XE_GSC_H_ -#include "xe_gsc_types.h" +#include <linux/types.h> +struct xe_gsc; struct xe_gt; struct xe_hw_engine; diff --git a/drivers/gpu/drm/xe/xe_gsc_proxy.c b/drivers/gpu/drm/xe/xe_gsc_proxy.c index 1b908d238bd1..6d6d1068cf23 100644 --- a/drivers/gpu/drm/xe/xe_gsc_proxy.c +++ b/drivers/gpu/drm/xe/xe_gsc_proxy.c @@ -15,6 +15,7 @@ #include "abi/gsc_proxy_commands_abi.h" #include "regs/xe_gsc_regs.h" #include "xe_bo.h" +#include "xe_force_wake.h" #include "xe_gsc.h" #include "xe_gsc_submit.h" #include "xe_gt.h" diff --git a/drivers/gpu/drm/xe/xe_gsc_submit.c b/drivers/gpu/drm/xe/xe_gsc_submit.c index d34d03248843..9ede483d37ef 100644 --- a/drivers/gpu/drm/xe/xe_gsc_submit.c +++ b/drivers/gpu/drm/xe/xe_gsc_submit.c @@ -8,6 +8,7 @@ #include <linux/poison.h> #include "abi/gsc_command_header_abi.h" +#include "xe_assert.h" #include "xe_bb.h" #include "xe_exec_queue.h" #include "xe_gt_printk.h" diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c index 491d0413de15..57d84751e160 100644 --- a/drivers/gpu/drm/xe/xe_gt.c +++ b/drivers/gpu/drm/xe/xe_gt.c @@ -44,6 +44,7 @@ #include "xe_migrate.h" #include "xe_mmio.h" #include "xe_pat.h" +#include "xe_pcode.h" #include "xe_pm.h" #include "xe_mocs.h" #include "xe_reg_sr.h" @@ -57,9 +58,17 @@ #include "xe_wa.h" #include "xe_wopcm.h" +static void gt_fini(struct drm_device *drm, void *arg) +{ + struct xe_gt *gt = arg; + + destroy_workqueue(gt->ordered_wq); +} + struct xe_gt *xe_gt_alloc(struct xe_tile *tile) { struct xe_gt *gt; + int err; gt = drmm_kzalloc(&tile_to_xe(tile)->drm, sizeof(*gt), GFP_KERNEL); if (!gt) @@ -68,6 +77,10 @@ struct xe_gt *xe_gt_alloc(struct xe_tile *tile) gt->tile = tile; gt->ordered_wq = alloc_ordered_workqueue("gt-ordered-wq", 0); + err = drmm_add_action_or_reset(>_to_xe(gt)->drm, gt_fini, gt); + if (err) + return ERR_PTR(err); + return gt; } @@ -90,15 +103,9 @@ void xe_gt_sanitize(struct xe_gt *gt) */ void xe_gt_remove(struct xe_gt *gt) { - xe_uc_remove(>->uc); -} - -static void gt_fini(struct drm_device *drm, void *arg) -{ - struct xe_gt *gt = arg; int i; - destroy_workqueue(gt->ordered_wq); + xe_uc_remove(>->uc); for (i = 0; i < XE_ENGINE_CLASS_MAX; 
++i) xe_hw_fence_irq_finish(>->fence_irq[i]); @@ -160,7 +167,7 @@ static int emit_wa_job(struct xe_gt *gt, struct xe_exec_queue *q) if (q->hwe->class == XE_ENGINE_CLASS_RENDER) /* Big enough to emit all of the context's 3DSTATE */ - bb = xe_bb_new(gt, xe_lrc_size(gt_to_xe(gt), q->hwe->class), false); + bb = xe_bb_new(gt, xe_gt_lrc_size(gt, q->hwe->class), false); else /* Just pick a large BB size */ bb = xe_bb_new(gt, SZ_4K, false); @@ -244,7 +251,7 @@ int xe_gt_record_default_lrcs(struct xe_gt *gt) xe_tuning_process_lrc(hwe); default_lrc = drmm_kzalloc(&xe->drm, - xe_lrc_size(xe, hwe->class), + xe_gt_lrc_size(gt, hwe->class), GFP_KERNEL); if (!default_lrc) return -ENOMEM; @@ -292,9 +299,9 @@ int xe_gt_record_default_lrcs(struct xe_gt *gt) } xe_map_memcpy_from(xe, default_lrc, - &q->lrc[0].bo->vmap, - xe_lrc_pphwsp_offset(&q->lrc[0]), - xe_lrc_size(xe, hwe->class)); + &q->lrc[0]->bo->vmap, + xe_lrc_pphwsp_offset(q->lrc[0]), + xe_gt_lrc_size(gt, hwe->class)); gt->default_lrc[hwe->class] = default_lrc; put_nop_q: @@ -318,14 +325,6 @@ int xe_gt_init_early(struct xe_gt *gt) return err; } - err = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); - if (err) - return err; - - err = xe_force_wake_put(gt_to_fw(gt), XE_FW_GT); - if (err) - return err; - xe_reg_sr_init(>->reg_sr, "GT", gt_to_xe(gt)); err = xe_wa_init(gt); @@ -336,6 +335,9 @@ int xe_gt_init_early(struct xe_gt *gt) xe_wa_process_oob(gt); xe_tuning_process_gt(gt); + xe_force_wake_init_gt(gt, gt_to_fw(gt)); + xe_pcode_init(gt); + return 0; } @@ -366,10 +368,6 @@ static int gt_fw_domain_init(struct xe_gt *gt) xe_lmtt_init(>_to_tile(gt)->sriov.pf.lmtt); } - err = xe_gt_idle_sysfs_init(>->gtidle); - if (err) - goto err_force_wake; - /* Enable per hw engine IRQs */ xe_irq_enable_hwe(gt); @@ -434,6 +432,10 @@ static int all_fw_domain_init(struct xe_gt *gt) if (err) goto err_force_wake; + err = xe_uc_init_post_hwconfig(>->uc); + if (err) + goto err_force_wake; + if (!xe_gt_is_media_type(gt)) { /* * USM has its only SA pool to non-block behind user operations @@ -460,10 +462,6 @@ static int all_fw_domain_init(struct xe_gt *gt) } } - err = xe_uc_init_post_hwconfig(>->uc); - if (err) - goto err_force_wake; - err = xe_uc_init_hw(>->uc); if (err) goto err_force_wake; @@ -477,6 +475,9 @@ static int all_fw_domain_init(struct xe_gt *gt) if (IS_SRIOV_PF(gt_to_xe(gt)) && !xe_gt_is_media_type(gt)) xe_lmtt_init_hw(>_to_tile(gt)->sriov.pf.lmtt); + if (IS_SRIOV_PF(gt_to_xe(gt))) + xe_gt_sriov_pf_init_hw(gt); + err = xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL); XE_WARN_ON(err); @@ -503,8 +504,7 @@ int xe_gt_init_hwconfig(struct xe_gt *gt) if (err) goto out; - xe_gt_topology_init(gt); - xe_gt_mcr_init(gt); + xe_gt_mcr_init_early(gt); xe_pat_init(gt); err = xe_uc_init(>->uc); @@ -515,8 +515,8 @@ int xe_gt_init_hwconfig(struct xe_gt *gt) if (err) goto out_fw; - /* XXX: Fake that we pull the engine mask from hwconfig blob */ - gt->info.engine_mask = gt->info.__engine_mask; + xe_gt_topology_init(gt); + xe_gt_mcr_init(gt); out_fw: xe_force_wake_put(gt_to_fw(gt), XE_FW_GT); @@ -554,6 +554,10 @@ int xe_gt_init(struct xe_gt *gt) if (err) return err; + err = xe_gt_idle_init(>->gtidle); + if (err) + return err; + err = xe_gt_freq_init(gt); if (err) return err; @@ -564,7 +568,30 @@ int xe_gt_init(struct xe_gt *gt) if (err) return err; - return drmm_add_action_or_reset(>_to_xe(gt)->drm, gt_fini, gt); + xe_gt_record_user_engines(gt); + + return 0; +} + +void xe_gt_record_user_engines(struct xe_gt *gt) +{ + struct xe_hw_engine *hwe; + enum xe_hw_engine_id id; + + 
gt->user_engines.mask = 0; + memset(gt->user_engines.instances_per_class, 0, + sizeof(gt->user_engines.instances_per_class)); + + for_each_hw_engine(hwe, gt, id) { + if (xe_hw_engine_is_reserved(hwe)) + continue; + + gt->user_engines.mask |= BIT_ULL(id); + gt->user_engines.instances_per_class[hwe->class]++; + } + + xe_gt_assert(gt, (gt->user_engines.mask | gt->info.engine_mask) + == gt->info.engine_mask); } static int do_gt_reset(struct xe_gt *gt) @@ -584,12 +611,34 @@ static int do_gt_reset(struct xe_gt *gt) return err; } +static int vf_gt_restart(struct xe_gt *gt) +{ + int err; + + err = xe_uc_sanitize_reset(>->uc); + if (err) + return err; + + err = xe_uc_init_hw(>->uc); + if (err) + return err; + + err = xe_uc_start(>->uc); + if (err) + return err; + + return 0; +} + static int do_gt_restart(struct xe_gt *gt) { struct xe_hw_engine *hwe; enum xe_hw_engine_id id; int err; + if (IS_SRIOV_VF(gt_to_xe(gt))) + return vf_gt_restart(gt); + xe_pat_init(gt); xe_gt_mcr_set_implicit_defaults(gt); @@ -613,6 +662,9 @@ static int do_gt_restart(struct xe_gt *gt) if (IS_SRIOV_PF(gt_to_xe(gt)) && !xe_gt_is_media_type(gt)) xe_lmtt_init_hw(>_to_tile(gt)->sriov.pf.lmtt); + if (IS_SRIOV_PF(gt_to_xe(gt))) + xe_gt_sriov_pf_init_hw(gt); + xe_mocs_init(gt); err = xe_uc_start(>->uc); if (err) @@ -633,6 +685,9 @@ static int gt_reset(struct xe_gt *gt) { int err; + if (xe_device_wedged(gt_to_xe(gt))) + return -ECANCELED; + /* We only support GT resets with GuC submission */ if (!xe_device_uc_enabled(gt_to_xe(gt))) return -ENODEV; @@ -655,9 +710,7 @@ static int gt_reset(struct xe_gt *gt) xe_uc_stop_prepare(>->uc); xe_gt_pagefault_reset(gt); - err = xe_uc_stop(>->uc); - if (err) - goto err_out; + xe_uc_stop(>->uc); xe_gt_tlb_invalidation_reset(gt); @@ -685,7 +738,7 @@ err_msg: err_fail: xe_gt_err(gt, "reset failed (%pe)\n", ERR_PTR(err)); - gt_to_xe(gt)->needs_flr_on_fini = true; + xe_device_declare_wedged(gt_to_xe(gt)); return err; } @@ -733,6 +786,8 @@ int xe_gt_suspend(struct xe_gt *gt) if (err) goto err_force_wake; + xe_gt_idle_disable_pg(gt); + XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL)); xe_gt_dbg(gt, "suspended\n"); @@ -759,6 +814,8 @@ int xe_gt_resume(struct xe_gt *gt) if (err) goto err_force_wake; + xe_gt_idle_enable_pg(gt); + XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL)); xe_gt_dbg(gt, "resumed\n"); @@ -810,3 +867,14 @@ struct xe_hw_engine *xe_gt_any_hw_engine_by_reset_domain(struct xe_gt *gt, return NULL; } + +struct xe_hw_engine *xe_gt_any_hw_engine(struct xe_gt *gt) +{ + struct xe_hw_engine *hwe; + enum xe_hw_engine_id id; + + for_each_hw_engine(hwe, gt, id) + return hwe; + + return NULL; +} diff --git a/drivers/gpu/drm/xe/xe_gt.h b/drivers/gpu/drm/xe/xe_gt.h index ed6ea8057e35..9073ac68a777 100644 --- a/drivers/gpu/drm/xe/xe_gt.h +++ b/drivers/gpu/drm/xe/xe_gt.h @@ -8,6 +8,7 @@ #include <drm/drm_util.h> +#include "xe_device.h" #include "xe_device_types.h" #include "xe_hw_engine.h" @@ -37,6 +38,19 @@ int xe_gt_init_hwconfig(struct xe_gt *gt); int xe_gt_init_early(struct xe_gt *gt); int xe_gt_init(struct xe_gt *gt); int xe_gt_record_default_lrcs(struct xe_gt *gt); + +/** + * xe_gt_record_user_engines - save data related to engines available to + * usersapce + * @gt: GT structure + * + * Walk the available HW engines from gt->info.engine_mask and calculate data + * related to those engines that may be used by userspace. To be used whenever + * available engines change in runtime (e.g. 
with ccs_mode) or during + * initialization + */ +void xe_gt_record_user_engines(struct xe_gt *gt); + void xe_gt_suspend_prepare(struct xe_gt *gt); int xe_gt_suspend(struct xe_gt *gt); int xe_gt_resume(struct xe_gt *gt); @@ -53,11 +67,24 @@ void xe_gt_remove(struct xe_gt *gt); struct xe_hw_engine * xe_gt_any_hw_engine_by_reset_domain(struct xe_gt *gt, enum xe_engine_class class); +/** + * xe_gt_any_hw_engine - scan the list of engines and return the + * first available + * @gt: GT structure + */ +struct xe_hw_engine *xe_gt_any_hw_engine(struct xe_gt *gt); + struct xe_hw_engine *xe_gt_hw_engine(struct xe_gt *gt, enum xe_engine_class class, u16 instance, bool logical); +static inline bool xe_gt_has_indirect_ring_state(struct xe_gt *gt) +{ + return gt->info.has_indirect_ring_state && + xe_device_uc_enabled(gt_to_xe(gt)); +} + static inline bool xe_gt_is_media_type(struct xe_gt *gt) { return gt->info.type == XE_GT_TYPE_MEDIA; diff --git a/drivers/gpu/drm/xe/xe_gt_ccs_mode.c b/drivers/gpu/drm/xe/xe_gt_ccs_mode.c index 396aeb5b9924..5d4cdbd69bc3 100644 --- a/drivers/gpu/drm/xe/xe_gt_ccs_mode.c +++ b/drivers/gpu/drm/xe/xe_gt_ccs_mode.c @@ -9,6 +9,7 @@ #include "xe_assert.h" #include "xe_gt.h" #include "xe_gt_ccs_mode.h" +#include "xe_gt_printk.h" #include "xe_gt_sysfs.h" #include "xe_mmio.h" @@ -68,8 +69,8 @@ static void __xe_gt_apply_ccs_mode(struct xe_gt *gt, u32 num_engines) xe_mmio_write32(gt, CCS_MODE, mode); - xe_gt_info(gt, "CCS_MODE=%x config:%08x, num_engines:%d, num_slices:%d\n", - mode, config, num_engines, num_slices); + xe_gt_dbg(gt, "CCS_MODE=%x config:%08x, num_engines:%d, num_slices:%d\n", + mode, config, num_engines, num_slices); } void xe_gt_apply_ccs_mode(struct xe_gt *gt) @@ -134,6 +135,7 @@ ccs_mode_store(struct device *kdev, struct device_attribute *attr, if (gt->ccs_mode != num_engines) { xe_gt_info(gt, "Setting compute mode to %d\n", num_engines); gt->ccs_mode = num_engines; + xe_gt_record_user_engines(gt); xe_gt_reset_async(gt); } @@ -150,7 +152,7 @@ static const struct attribute *gt_ccs_mode_attrs[] = { NULL, }; -static void xe_gt_ccs_mode_sysfs_fini(struct drm_device *drm, void *arg) +static void xe_gt_ccs_mode_sysfs_fini(void *arg) { struct xe_gt *gt = arg; @@ -182,5 +184,5 @@ int xe_gt_ccs_mode_sysfs_init(struct xe_gt *gt) if (err) return err; - return drmm_add_action_or_reset(&xe->drm, xe_gt_ccs_mode_sysfs_fini, gt); + return devm_add_action_or_reset(xe->drm.dev, xe_gt_ccs_mode_sysfs_fini, gt); } diff --git a/drivers/gpu/drm/xe/xe_gt_clock.c b/drivers/gpu/drm/xe/xe_gt_clock.c index c7bca20f6b65..9ff2061133df 100644 --- a/drivers/gpu/drm/xe/xe_gt_clock.c +++ b/drivers/gpu/drm/xe/xe_gt_clock.c @@ -7,6 +7,7 @@ #include "regs/xe_gt_regs.h" #include "regs/xe_regs.h" +#include "xe_assert.h" #include "xe_device.h" #include "xe_gt.h" #include "xe_macros.h" diff --git a/drivers/gpu/drm/xe/xe_gt_debugfs.c b/drivers/gpu/drm/xe/xe_gt_debugfs.c index 8cf0b2625efc..5e7fd937917a 100644 --- a/drivers/gpu/drm/xe/xe_gt_debugfs.c +++ b/drivers/gpu/drm/xe/xe_gt_debugfs.c @@ -15,14 +15,18 @@ #include "xe_ggtt.h" #include "xe_gt.h" #include "xe_gt_mcr.h" +#include "xe_gt_sriov_pf_debugfs.h" +#include "xe_gt_sriov_vf_debugfs.h" #include "xe_gt_topology.h" #include "xe_hw_engine.h" #include "xe_lrc.h" #include "xe_macros.h" +#include "xe_mocs.h" #include "xe_pat.h" #include "xe_pm.h" #include "xe_reg_sr.h" #include "xe_reg_whitelist.h" +#include "xe_sriov.h" #include "xe_uc_debugfs.h" #include "xe_wa.h" @@ -112,6 +116,17 @@ static int force_reset(struct xe_gt *gt, struct drm_printer *p) 
return 0; } +static int force_reset_sync(struct xe_gt *gt, struct drm_printer *p) +{ + xe_pm_runtime_get(gt_to_xe(gt)); + xe_gt_reset_async(gt); + xe_pm_runtime_put(gt_to_xe(gt)); + + flush_work(>->reset.worker); + + return 0; +} + static int sa_info(struct xe_gt *gt, struct drm_printer *p) { struct xe_tile *tile = gt_to_tile(gt); @@ -200,6 +215,15 @@ static int pat(struct xe_gt *gt, struct drm_printer *p) return 0; } +static int mocs(struct xe_gt *gt, struct drm_printer *p) +{ + xe_pm_runtime_get(gt_to_xe(gt)); + xe_mocs_dump(gt, p); + xe_pm_runtime_put(gt_to_xe(gt)); + + return 0; +} + static int rcs_default_lrc(struct xe_gt *gt, struct drm_printer *p) { xe_pm_runtime_get(gt_to_xe(gt)); @@ -248,6 +272,7 @@ static int vecs_default_lrc(struct xe_gt *gt, struct drm_printer *p) static const struct drm_info_list debugfs_list[] = { {"hw_engines", .show = xe_gt_debugfs_simple_show, .data = hw_engines}, {"force_reset", .show = xe_gt_debugfs_simple_show, .data = force_reset}, + {"force_reset_sync", .show = xe_gt_debugfs_simple_show, .data = force_reset_sync}, {"sa_info", .show = xe_gt_debugfs_simple_show, .data = sa_info}, {"topology", .show = xe_gt_debugfs_simple_show, .data = topology}, {"steering", .show = xe_gt_debugfs_simple_show, .data = steering}, @@ -255,6 +280,7 @@ static const struct drm_info_list debugfs_list[] = { {"register-save-restore", .show = xe_gt_debugfs_simple_show, .data = register_save_restore}, {"workarounds", .show = xe_gt_debugfs_simple_show, .data = workarounds}, {"pat", .show = xe_gt_debugfs_simple_show, .data = pat}, + {"mocs", .show = xe_gt_debugfs_simple_show, .data = mocs}, {"default_lrc_rcs", .show = xe_gt_debugfs_simple_show, .data = rcs_default_lrc}, {"default_lrc_ccs", .show = xe_gt_debugfs_simple_show, .data = ccs_default_lrc}, {"default_lrc_bcs", .show = xe_gt_debugfs_simple_show, .data = bcs_default_lrc}, @@ -290,4 +316,9 @@ void xe_gt_debugfs_register(struct xe_gt *gt) root, minor); xe_uc_debugfs_register(>->uc, root); + + if (IS_SRIOV_PF(xe)) + xe_gt_sriov_pf_debugfs_register(gt, root); + else if (IS_SRIOV_VF(xe)) + xe_gt_sriov_vf_debugfs_register(gt, root); } diff --git a/drivers/gpu/drm/xe/xe_gt_freq.c b/drivers/gpu/drm/xe/xe_gt_freq.c index 855de40e40ea..68a5778b4319 100644 --- a/drivers/gpu/drm/xe/xe_gt_freq.c +++ b/drivers/gpu/drm/xe/xe_gt_freq.c @@ -13,7 +13,7 @@ #include "xe_device_types.h" #include "xe_gt_sysfs.h" -#include "xe_gt_throttle_sysfs.h" +#include "xe_gt_throttle.h" #include "xe_guc_pc.h" #include "xe_pm.h" @@ -209,7 +209,7 @@ static const struct attribute *freq_attrs[] = { NULL }; -static void freq_fini(struct drm_device *drm, void *arg) +static void freq_fini(void *arg) { struct kobject *kobj = arg; @@ -237,7 +237,7 @@ int xe_gt_freq_init(struct xe_gt *gt) if (!gt->freq) return -ENOMEM; - err = drmm_add_action_or_reset(&xe->drm, freq_fini, gt->freq); + err = devm_add_action(xe->drm.dev, freq_fini, gt->freq); if (err) return err; @@ -245,5 +245,5 @@ int xe_gt_freq_init(struct xe_gt *gt) if (err) return err; - return xe_gt_throttle_sysfs_init(gt); + return xe_gt_throttle_init(gt); } diff --git a/drivers/gpu/drm/xe/xe_gt_idle.c b/drivers/gpu/drm/xe/xe_gt_idle.c index 8fc0f3f6ecc5..5d6181117ab2 100644 --- a/drivers/gpu/drm/xe/xe_gt_idle.c +++ b/drivers/gpu/drm/xe/xe_gt_idle.c @@ -5,12 +5,14 @@ #include <drm/drm_managed.h> +#include "xe_force_wake.h" #include "xe_device.h" #include "xe_gt.h" #include "xe_gt_idle.h" #include "xe_gt_sysfs.h" #include "xe_guc_pc.h" #include "regs/xe_gt_regs.h" +#include "xe_macros.h" #include "xe_mmio.h" 
#include "xe_pm.h" @@ -92,6 +94,50 @@ static u64 get_residency_ms(struct xe_gt_idle *gtidle, u64 cur_residency) return cur_residency; } +void xe_gt_idle_enable_pg(struct xe_gt *gt) +{ + struct xe_device *xe = gt_to_xe(gt); + u32 pg_enable; + int i, j; + + /* Disable CPG for PVC */ + if (xe->info.platform == XE_PVC) + return; + + xe_device_assert_mem_access(gt_to_xe(gt)); + + pg_enable = RENDER_POWERGATE_ENABLE | MEDIA_POWERGATE_ENABLE; + + for (i = XE_HW_ENGINE_VCS0, j = 0; i <= XE_HW_ENGINE_VCS7; ++i, ++j) { + if ((gt->info.engine_mask & BIT(i))) + pg_enable |= (VDN_HCP_POWERGATE_ENABLE(j) | + VDN_MFXVDENC_POWERGATE_ENABLE(j)); + } + + XE_WARN_ON(xe_force_wake_get(gt_to_fw(gt), XE_FW_GT)); + if (xe->info.skip_guc_pc) { + /* + * GuC sets the hysteresis value when GuC PC is enabled + * else set it to 25 (25 * 1.28us) + */ + xe_mmio_write32(gt, MEDIA_POWERGATE_IDLE_HYSTERESIS, 25); + xe_mmio_write32(gt, RENDER_POWERGATE_IDLE_HYSTERESIS, 25); + } + + xe_mmio_write32(gt, POWERGATE_ENABLE, pg_enable); + XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FW_GT)); +} + +void xe_gt_idle_disable_pg(struct xe_gt *gt) +{ + xe_device_assert_mem_access(gt_to_xe(gt)); + XE_WARN_ON(xe_force_wake_get(gt_to_fw(gt), XE_FW_GT)); + + xe_mmio_write32(gt, POWERGATE_ENABLE, 0); + + XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FW_GT)); +} + static ssize_t name_show(struct device *dev, struct device_attribute *attr, char *buff) { @@ -144,15 +190,24 @@ static const struct attribute *gt_idle_attrs[] = { NULL, }; -static void gt_idle_sysfs_fini(struct drm_device *drm, void *arg) +static void gt_idle_fini(void *arg) { struct kobject *kobj = arg; + struct xe_gt *gt = kobj_to_gt(kobj->parent); + + xe_gt_idle_disable_pg(gt); + + if (gt_to_xe(gt)->info.skip_guc_pc) { + XE_WARN_ON(xe_force_wake_get(gt_to_fw(gt), XE_FW_GT)); + xe_gt_idle_disable_c6(gt); + xe_force_wake_put(gt_to_fw(gt), XE_FW_GT); + } sysfs_remove_files(kobj, gt_idle_attrs); kobject_put(kobj); } -int xe_gt_idle_sysfs_init(struct xe_gt_idle *gtidle) +int xe_gt_idle_init(struct xe_gt_idle *gtidle) { struct xe_gt *gt = gtidle_to_gt(gtidle); struct xe_device *xe = gt_to_xe(gt); @@ -181,7 +236,9 @@ int xe_gt_idle_sysfs_init(struct xe_gt_idle *gtidle) return err; } - return drmm_add_action_or_reset(&xe->drm, gt_idle_sysfs_fini, kobj); + xe_gt_idle_enable_pg(gt); + + return devm_add_action_or_reset(xe->drm.dev, gt_idle_fini, kobj); } void xe_gt_idle_enable_c6(struct xe_gt *gt) @@ -199,9 +256,8 @@ void xe_gt_idle_enable_c6(struct xe_gt *gt) void xe_gt_idle_disable_c6(struct xe_gt *gt) { xe_device_assert_mem_access(gt_to_xe(gt)); - xe_force_wake_assert_held(gt_to_fw(gt), XE_FORCEWAKE_ALL); + xe_force_wake_assert_held(gt_to_fw(gt), XE_FW_GT); - xe_mmio_write32(gt, PG_ENABLE, 0); xe_mmio_write32(gt, RC_CONTROL, 0); xe_mmio_write32(gt, RC_STATE, 0); } diff --git a/drivers/gpu/drm/xe/xe_gt_idle.h b/drivers/gpu/drm/xe/xe_gt_idle.h index 75bd99659b1b..554447b5d46d 100644 --- a/drivers/gpu/drm/xe/xe_gt_idle.h +++ b/drivers/gpu/drm/xe/xe_gt_idle.h @@ -10,8 +10,10 @@ struct xe_gt; -int xe_gt_idle_sysfs_init(struct xe_gt_idle *gtidle); +int xe_gt_idle_init(struct xe_gt_idle *gtidle); void xe_gt_idle_enable_c6(struct xe_gt *gt); void xe_gt_idle_disable_c6(struct xe_gt *gt); +void xe_gt_idle_enable_pg(struct xe_gt *gt); +void xe_gt_idle_disable_pg(struct xe_gt *gt); #endif /* _XE_GT_IDLE_H_ */ diff --git a/drivers/gpu/drm/xe/xe_gt_mcr.c b/drivers/gpu/drm/xe/xe_gt_mcr.c index 577bd7043740..386ac3269909 100644 --- a/drivers/gpu/drm/xe/xe_gt_mcr.c +++ b/drivers/gpu/drm/xe/xe_gt_mcr.c 
@@ -375,18 +375,35 @@ static const struct {
 	[IMPLICIT_STEERING] = { "IMPLICIT", NULL },
 };
-void xe_gt_mcr_init(struct xe_gt *gt)
+/**
+ * xe_gt_mcr_init_early - Early initialization of the MCR support
+ * @gt: GT structure
+ *
+ * Perform early software only initialization of the MCR lock to allow
+ * the synchronization on accessing the STEER_SEMAPHORE register and
+ * use the xe_gt_mcr_multicast_write() function.
+ */
+void xe_gt_mcr_init_early(struct xe_gt *gt)
 {
-	struct xe_device *xe = gt_to_xe(gt);
-
 	BUILD_BUG_ON(IMPLICIT_STEERING + 1 != NUM_STEERING_TYPES);
 	BUILD_BUG_ON(ARRAY_SIZE(xe_steering_types) != NUM_STEERING_TYPES);
+	spin_lock_init(&gt->mcr_lock);
+}
+
+/**
+ * xe_gt_mcr_init - Normal initialization of the MCR support
+ * @gt: GT structure
+ *
+ * Perform normal initialization of the MCR for all usages.
+ */
+void xe_gt_mcr_init(struct xe_gt *gt)
+{
+	struct xe_device *xe = gt_to_xe(gt);
+
 	if (IS_SRIOV_VF(xe))
 		return;
-	spin_lock_init(&gt->mcr_lock);
-
 	if (gt->info.type == XE_GT_TYPE_MEDIA) {
 		drm_WARN_ON(&xe->drm, MEDIA_VER(xe) < 13);
diff --git a/drivers/gpu/drm/xe/xe_gt_mcr.h b/drivers/gpu/drm/xe/xe_gt_mcr.h
index a7f4ab1aa584..8d119a0d5493 100644
--- a/drivers/gpu/drm/xe/xe_gt_mcr.h
+++ b/drivers/gpu/drm/xe/xe_gt_mcr.h
@@ -12,6 +12,7 @@ struct drm_printer;
 struct xe_gt;
+void xe_gt_mcr_init_early(struct xe_gt *gt);
 void xe_gt_mcr_init(struct xe_gt *gt);
 void xe_gt_mcr_set_implicit_defaults(struct xe_gt *gt);
@@ -40,4 +41,28 @@ void xe_gt_mcr_get_dss_steering(struct xe_gt *gt, unsigned int dss, u16 *group,
 	for_each_dss((dss), (gt)) \
 		for_each_if((xe_gt_mcr_get_dss_steering((gt), (dss), &(group), &(instance)), true))
+/*
+ * Loop over each DSS available for geometry and determine the group and
+ * instance IDs that should be used to steer MCR accesses toward this DSS.
+ * @dss: DSS ID to obtain steering for
+ * @gt: GT structure
+ * @group: steering group ID, data type: u16
+ * @instance: steering instance ID, data type: u16
+ */
+#define for_each_geometry_dss(dss, gt, group, instance) \
+	for_each_dss_steering(dss, gt, group, instance) \
+		if (xe_gt_has_geometry_dss(gt, dss))
+
+/*
+ * Loop over each DSS available for compute and determine the group and
+ * instance IDs that should be used to steer MCR accesses toward this DSS.
+ * @dss: DSS ID to obtain steering for + * @gt: GT structure + * @group: steering group ID, data type: u16 + * @instance: steering instance ID, data type: u16 + */ +#define for_each_compute_dss(dss, gt, group, instance) \ + for_each_dss_steering(dss, gt, group, instance) \ + if (xe_gt_has_compute_dss(gt, dss)) + #endif /* _XE_GT_MCR_H_ */ diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c index fa9e9853c53b..040dd142c49c 100644 --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c @@ -19,7 +19,6 @@ #include "xe_guc.h" #include "xe_guc_ct.h" #include "xe_migrate.h" -#include "xe_pt.h" #include "xe_trace.h" #include "xe_vm.h" @@ -204,15 +203,14 @@ retry_userptr: drm_exec_retry_on_contention(&exec); if (ret) goto unlock_dma_resv; - } - /* Bind VMA only to the GT that has faulted */ - trace_xe_vma_pf_bind(vma); - fence = __xe_pt_bind_vma(tile, vma, xe_tile_migrate_engine(tile), NULL, 0, - vma->tile_present & BIT(tile->id)); - if (IS_ERR(fence)) { - ret = PTR_ERR(fence); - goto unlock_dma_resv; + /* Bind VMA only to the GT that has faulted */ + trace_xe_vma_pf_bind(vma); + fence = xe_vma_rebind(vm, vma, BIT(tile->id)); + if (IS_ERR(fence)) { + ret = PTR_ERR(fence); + goto unlock_dma_resv; + } } /* diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf.c index 791dcdd767e2..7decf71c2b7d 100644 --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf.c +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf.c @@ -5,8 +5,12 @@ #include <drm/drm_managed.h> +#include "regs/xe_sriov_regs.h" + #include "xe_gt_sriov_pf.h" #include "xe_gt_sriov_pf_helpers.h" +#include "xe_gt_sriov_pf_service.h" +#include "xe_mmio.h" /* * VF's metadata is maintained in the flexible array where: @@ -48,5 +52,33 @@ int xe_gt_sriov_pf_init_early(struct xe_gt *gt) if (err) return err; + err = xe_gt_sriov_pf_service_init(gt); + if (err) + return err; + return 0; } + +static bool pf_needs_enable_ggtt_guest_update(struct xe_device *xe) +{ + return GRAPHICS_VERx100(xe) == 1200; +} + +static void pf_enable_ggtt_guest_update(struct xe_gt *gt) +{ + xe_mmio_write32(gt, VIRTUAL_CTRL_REG, GUEST_GTT_UPDATE_EN); +} + +/** + * xe_gt_sriov_pf_init_hw - Initialize SR-IOV hardware support. + * @gt: the &xe_gt to initialize + * + * On some platforms the PF must explicitly enable VF's access to the GGTT. 
+ */ +void xe_gt_sriov_pf_init_hw(struct xe_gt *gt) +{ + if (pf_needs_enable_ggtt_guest_update(gt_to_xe(gt))) + pf_enable_ggtt_guest_update(gt); + + xe_gt_sriov_pf_service_update(gt); +} diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf.h index 05142ffc4319..37d7d6c3df03 100644 --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf.h +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf.h @@ -10,11 +10,16 @@ struct xe_gt; #ifdef CONFIG_PCI_IOV int xe_gt_sriov_pf_init_early(struct xe_gt *gt); +void xe_gt_sriov_pf_init_hw(struct xe_gt *gt); #else static inline int xe_gt_sriov_pf_init_early(struct xe_gt *gt) { return 0; } + +static inline void xe_gt_sriov_pf_init_hw(struct xe_gt *gt) +{ +} #endif #endif diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c index 476d613333a9..f49fc2917f93 100644 --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c @@ -25,6 +25,7 @@ #include "xe_guc_fwif.h" #include "xe_guc_id_mgr.h" #include "xe_guc_klv_helpers.h" +#include "xe_guc_klv_thresholds_set.h" #include "xe_guc_submit.h" #include "xe_lmtt.h" #include "xe_map.h" @@ -187,14 +188,20 @@ static int pf_push_vf_cfg_dbs(struct xe_gt *gt, unsigned int vfid, u32 begin, u3 return pf_push_vf_cfg_klvs(gt, vfid, 2, klvs, ARRAY_SIZE(klvs)); } -static int pf_push_vf_cfg_exec_quantum(struct xe_gt *gt, unsigned int vfid, u32 exec_quantum) +static int pf_push_vf_cfg_exec_quantum(struct xe_gt *gt, unsigned int vfid, u32 *exec_quantum) { - return pf_push_vf_cfg_u32(gt, vfid, GUC_KLV_VF_CFG_EXEC_QUANTUM_KEY, exec_quantum); + /* GuC will silently clamp values exceeding max */ + *exec_quantum = min_t(u32, *exec_quantum, GUC_KLV_VF_CFG_EXEC_QUANTUM_MAX_VALUE); + + return pf_push_vf_cfg_u32(gt, vfid, GUC_KLV_VF_CFG_EXEC_QUANTUM_KEY, *exec_quantum); } -static int pf_push_vf_cfg_preempt_timeout(struct xe_gt *gt, unsigned int vfid, u32 preempt_timeout) +static int pf_push_vf_cfg_preempt_timeout(struct xe_gt *gt, unsigned int vfid, u32 *preempt_timeout) { - return pf_push_vf_cfg_u32(gt, vfid, GUC_KLV_VF_CFG_PREEMPT_TIMEOUT_KEY, preempt_timeout); + /* GuC will silently clamp values exceeding max */ + *preempt_timeout = min_t(u32, *preempt_timeout, GUC_KLV_VF_CFG_PREEMPT_TIMEOUT_MAX_VALUE); + + return pf_push_vf_cfg_u32(gt, vfid, GUC_KLV_VF_CFG_PREEMPT_TIMEOUT_KEY, *preempt_timeout); } static int pf_push_vf_cfg_lmem(struct xe_gt *gt, unsigned int vfid, u64 size) @@ -202,6 +209,15 @@ static int pf_push_vf_cfg_lmem(struct xe_gt *gt, unsigned int vfid, u64 size) return pf_push_vf_cfg_u64(gt, vfid, GUC_KLV_VF_CFG_LMEM_SIZE_KEY, size); } +static int pf_push_vf_cfg_threshold(struct xe_gt *gt, unsigned int vfid, + enum xe_guc_klv_threshold_index index, u32 value) +{ + u32 key = xe_guc_klv_threshold_index_to_key(index); + + xe_gt_assert(gt, key); + return pf_push_vf_cfg_u32(gt, vfid, key, value); +} + static struct xe_gt_sriov_config *pf_pick_vf_config(struct xe_gt *gt, unsigned int vfid) { xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt))); @@ -1604,7 +1620,7 @@ static int pf_provision_exec_quantum(struct xe_gt *gt, unsigned int vfid, struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid); int err; - err = pf_push_vf_cfg_exec_quantum(gt, vfid, exec_quantum); + err = pf_push_vf_cfg_exec_quantum(gt, vfid, &exec_quantum); if (unlikely(err)) return err; @@ -1674,7 +1690,7 @@ static int pf_provision_preempt_timeout(struct xe_gt *gt, unsigned int vfid, struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid); int err; - err = 
pf_push_vf_cfg_preempt_timeout(gt, vfid, preempt_timeout); + err = pf_push_vf_cfg_preempt_timeout(gt, vfid, &preempt_timeout); if (unlikely(err)) return err; @@ -1742,6 +1758,83 @@ static void pf_reset_config_sched(struct xe_gt *gt, struct xe_gt_sriov_config *c config->preempt_timeout = 0; } +static int pf_provision_threshold(struct xe_gt *gt, unsigned int vfid, + enum xe_guc_klv_threshold_index index, u32 value) +{ + struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid); + int err; + + err = pf_push_vf_cfg_threshold(gt, vfid, index, value); + if (unlikely(err)) + return err; + + config->thresholds[index] = value; + + return 0; +} + +static int pf_get_threshold(struct xe_gt *gt, unsigned int vfid, + enum xe_guc_klv_threshold_index index) +{ + struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid); + + return config->thresholds[index]; +} + +static const char *threshold_unit(u32 threshold) +{ + return threshold ? "" : "(disabled)"; +} + +/** + * xe_gt_sriov_pf_config_set_threshold - Configure threshold for the VF. + * @gt: the &xe_gt + * @vfid: the VF identifier + * @index: the threshold index + * @value: requested value (0 means disabled) + * + * This function can only be called on PF. + * + * Return: 0 on success or a negative error code on failure. + */ +int xe_gt_sriov_pf_config_set_threshold(struct xe_gt *gt, unsigned int vfid, + enum xe_guc_klv_threshold_index index, u32 value) +{ + u32 key = xe_guc_klv_threshold_index_to_key(index); + const char *name = xe_guc_klv_key_to_string(key); + int err; + + mutex_lock(xe_gt_sriov_pf_master_mutex(gt)); + err = pf_provision_threshold(gt, vfid, index, value); + mutex_unlock(xe_gt_sriov_pf_master_mutex(gt)); + + return pf_config_set_u32_done(gt, vfid, value, + xe_gt_sriov_pf_config_get_threshold(gt, vfid, index), + name, threshold_unit, err); +} + +/** + * xe_gt_sriov_pf_config_get_threshold - Get VF's threshold. + * @gt: the &xe_gt + * @vfid: the VF identifier + * @index: the threshold index + * + * This function can only be called on PF. + * + * Return: value of VF's (or PF's) threshold. 
+ */ +u32 xe_gt_sriov_pf_config_get_threshold(struct xe_gt *gt, unsigned int vfid, + enum xe_guc_klv_threshold_index index) +{ + u32 value; + + mutex_lock(xe_gt_sriov_pf_master_mutex(gt)); + value = pf_get_threshold(gt, vfid, index); + mutex_unlock(xe_gt_sriov_pf_master_mutex(gt)); + + return value; +} + static void pf_release_vf_config(struct xe_gt *gt, unsigned int vfid) { struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid); diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h index 5e6b36f00b5b..e8238c1ad06a 100644 --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h @@ -8,6 +8,7 @@ #include <linux/types.h> +enum xe_guc_klv_threshold_index; struct drm_printer; struct xe_gt; @@ -43,6 +44,11 @@ u32 xe_gt_sriov_pf_config_get_preempt_timeout(struct xe_gt *gt, unsigned int vfi int xe_gt_sriov_pf_config_set_preempt_timeout(struct xe_gt *gt, unsigned int vfid, u32 preempt_timeout); +u32 xe_gt_sriov_pf_config_get_threshold(struct xe_gt *gt, unsigned int vfid, + enum xe_guc_klv_threshold_index index); +int xe_gt_sriov_pf_config_set_threshold(struct xe_gt *gt, unsigned int vfid, + enum xe_guc_klv_threshold_index index, u32 value); + int xe_gt_sriov_pf_config_set_fair(struct xe_gt *gt, unsigned int vfid, unsigned int num_vfs); int xe_gt_sriov_pf_config_release(struct xe_gt *gt, unsigned int vfid, bool force); int xe_gt_sriov_pf_config_push(struct xe_gt *gt, unsigned int vfid, bool refresh); diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config_types.h index d3745c355957..7bc66656fcc7 100644 --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config_types.h +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config_types.h @@ -8,6 +8,8 @@ #include <drm/drm_mm.h> +#include "xe_guc_klv_thresholds_set_types.h" + struct xe_bo; /** @@ -32,6 +34,8 @@ struct xe_gt_sriov_config { u32 exec_quantum; /** @preempt_timeout: preemption timeout in microseconds. */ u32 preempt_timeout; + /** @thresholds: GuC thresholds for adverse events notifications. */ + u32 thresholds[XE_GUC_KLV_NUM_THRESHOLDS]; }; /** diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c new file mode 100644 index 000000000000..2290ddaf9594 --- /dev/null +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c @@ -0,0 +1,427 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2023-2024 Intel Corporation + */ + +#include <linux/debugfs.h> + +#include <drm/drm_print.h> +#include <drm/drm_debugfs.h> + +#include "xe_bo.h" +#include "xe_debugfs.h" +#include "xe_device.h" +#include "xe_gt.h" +#include "xe_gt_debugfs.h" +#include "xe_gt_sriov_pf_config.h" +#include "xe_gt_sriov_pf_control.h" +#include "xe_gt_sriov_pf_debugfs.h" +#include "xe_gt_sriov_pf_helpers.h" +#include "xe_gt_sriov_pf_monitor.h" +#include "xe_gt_sriov_pf_policy.h" +#include "xe_gt_sriov_pf_service.h" +#include "xe_pm.h" + +/* + * /sys/kernel/debug/dri/0/ + * ├── gt0 # d_inode->i_private = gt + * │ ├── pf # d_inode->i_private = gt + * │ ├── vf1 # d_inode->i_private = VFID(1) + * : : + * │ ├── vfN # d_inode->i_private = VFID(N) + */ + +static void *extract_priv(struct dentry *d) +{ + return d->d_inode->i_private; +} + +static struct xe_gt *extract_gt(struct dentry *d) +{ + return extract_priv(d->d_parent); +} + +static unsigned int extract_vfid(struct dentry *d) +{ + return extract_priv(d) == extract_gt(d) ? 
PFID : (uintptr_t)extract_priv(d); +} + +/* + * /sys/kernel/debug/dri/0/ + * ├── gt0 + * │ ├── pf + * │ │ ├── ggtt_available + * │ │ ├── ggtt_provisioned + * │ │ ├── contexts_provisioned + * │ │ ├── doorbells_provisioned + * │ │ ├── runtime_registers + * │ │ ├── negotiated_versions + * │ │ ├── adverse_events + */ + +static const struct drm_info_list pf_info[] = { + { + "ggtt_available", + .show = xe_gt_debugfs_simple_show, + .data = xe_gt_sriov_pf_config_print_available_ggtt, + }, + { + "ggtt_provisioned", + .show = xe_gt_debugfs_simple_show, + .data = xe_gt_sriov_pf_config_print_ggtt, + }, + { + "contexts_provisioned", + .show = xe_gt_debugfs_simple_show, + .data = xe_gt_sriov_pf_config_print_ctxs, + }, + { + "doorbells_provisioned", + .show = xe_gt_debugfs_simple_show, + .data = xe_gt_sriov_pf_config_print_dbs, + }, + { + "runtime_registers", + .show = xe_gt_debugfs_simple_show, + .data = xe_gt_sriov_pf_service_print_runtime, + }, + { + "negotiated_versions", + .show = xe_gt_debugfs_simple_show, + .data = xe_gt_sriov_pf_service_print_version, + }, + { + "adverse_events", + .show = xe_gt_debugfs_simple_show, + .data = xe_gt_sriov_pf_monitor_print_events, + }, +}; + +/* + * /sys/kernel/debug/dri/0/ + * ├── gt0 + * │ ├── pf + * │ │ ├── reset_engine + * │ │ ├── sample_period + * │ │ ├── sched_if_idle + */ + +#define DEFINE_SRIOV_GT_POLICY_DEBUGFS_ATTRIBUTE(POLICY, TYPE, FORMAT) \ + \ +static int POLICY##_set(void *data, u64 val) \ +{ \ + struct xe_gt *gt = extract_gt(data); \ + struct xe_device *xe = gt_to_xe(gt); \ + int err; \ + \ + if (val > (TYPE)~0ull) \ + return -EOVERFLOW; \ + \ + xe_pm_runtime_get(xe); \ + err = xe_gt_sriov_pf_policy_set_##POLICY(gt, val); \ + xe_pm_runtime_put(xe); \ + \ + return err; \ +} \ + \ +static int POLICY##_get(void *data, u64 *val) \ +{ \ + struct xe_gt *gt = extract_gt(data); \ + \ + *val = xe_gt_sriov_pf_policy_get_##POLICY(gt); \ + return 0; \ +} \ + \ +DEFINE_DEBUGFS_ATTRIBUTE(POLICY##_fops, POLICY##_get, POLICY##_set, FORMAT) + +DEFINE_SRIOV_GT_POLICY_DEBUGFS_ATTRIBUTE(reset_engine, bool, "%llu\n"); +DEFINE_SRIOV_GT_POLICY_DEBUGFS_ATTRIBUTE(sched_if_idle, bool, "%llu\n"); +DEFINE_SRIOV_GT_POLICY_DEBUGFS_ATTRIBUTE(sample_period, u32, "%llu\n"); + +static void pf_add_policy_attrs(struct xe_gt *gt, struct dentry *parent) +{ + xe_gt_assert(gt, gt == extract_gt(parent)); + xe_gt_assert(gt, PFID == extract_vfid(parent)); + + debugfs_create_file_unsafe("reset_engine", 0644, parent, parent, &reset_engine_fops); + debugfs_create_file_unsafe("sched_if_idle", 0644, parent, parent, &sched_if_idle_fops); + debugfs_create_file_unsafe("sample_period_ms", 0644, parent, parent, &sample_period_fops); +} + +/* + * /sys/kernel/debug/dri/0/ + * ├── gt0 + * │ ├── pf + * │ │ ├── ggtt_spare + * │ │ ├── lmem_spare + * │ │ ├── doorbells_spare + * │ │ ├── contexts_spare + * │ │ ├── exec_quantum_ms + * │ │ ├── preempt_timeout_us + * │ ├── vf1 + * │ │ ├── ggtt_quota + * │ │ ├── lmem_quota + * │ │ ├── doorbells_quota + * │ │ ├── contexts_quota + * │ │ ├── exec_quantum_ms + * │ │ ├── preempt_timeout_us + */ + +#define DEFINE_SRIOV_GT_CONFIG_DEBUGFS_ATTRIBUTE(CONFIG, TYPE, FORMAT) \ + \ +static int CONFIG##_set(void *data, u64 val) \ +{ \ + struct xe_gt *gt = extract_gt(data); \ + unsigned int vfid = extract_vfid(data); \ + struct xe_device *xe = gt_to_xe(gt); \ + int err; \ + \ + if (val > (TYPE)~0ull) \ + return -EOVERFLOW; \ + \ + xe_pm_runtime_get(xe); \ + err = xe_gt_sriov_pf_config_set_##CONFIG(gt, vfid, val); \ + xe_pm_runtime_put(xe); \ + \ + return err; \ +} \ + \ +static 
int CONFIG##_get(void *data, u64 *val) \ +{ \ + struct xe_gt *gt = extract_gt(data); \ + unsigned int vfid = extract_vfid(data); \ + \ + *val = xe_gt_sriov_pf_config_get_##CONFIG(gt, vfid); \ + return 0; \ +} \ + \ +DEFINE_DEBUGFS_ATTRIBUTE(CONFIG##_fops, CONFIG##_get, CONFIG##_set, FORMAT) + +DEFINE_SRIOV_GT_CONFIG_DEBUGFS_ATTRIBUTE(ggtt, u64, "%llu\n"); +DEFINE_SRIOV_GT_CONFIG_DEBUGFS_ATTRIBUTE(lmem, u64, "%llu\n"); +DEFINE_SRIOV_GT_CONFIG_DEBUGFS_ATTRIBUTE(ctxs, u32, "%llu\n"); +DEFINE_SRIOV_GT_CONFIG_DEBUGFS_ATTRIBUTE(dbs, u32, "%llu\n"); +DEFINE_SRIOV_GT_CONFIG_DEBUGFS_ATTRIBUTE(exec_quantum, u32, "%llu\n"); +DEFINE_SRIOV_GT_CONFIG_DEBUGFS_ATTRIBUTE(preempt_timeout, u32, "%llu\n"); + +/* + * /sys/kernel/debug/dri/0/ + * ├── gt0 + * │ ├── pf + * │ │ ├── threshold_cat_error_count + * │ │ ├── threshold_doorbell_time_us + * │ │ ├── threshold_engine_reset_count + * │ │ ├── threshold_guc_time_us + * │ │ ├── threshold_irq_time_us + * │ │ ├── threshold_page_fault_count + * │ ├── vf1 + * │ │ ├── threshold_cat_error_count + * │ │ ├── threshold_doorbell_time_us + * │ │ ├── threshold_engine_reset_count + * │ │ ├── threshold_guc_time_us + * │ │ ├── threshold_irq_time_us + * │ │ ├── threshold_page_fault_count + */ + +static int set_threshold(void *data, u64 val, enum xe_guc_klv_threshold_index index) +{ + struct xe_gt *gt = extract_gt(data); + unsigned int vfid = extract_vfid(data); + struct xe_device *xe = gt_to_xe(gt); + int err; + + if (val > (u32)~0ull) + return -EOVERFLOW; + + xe_pm_runtime_get(xe); + err = xe_gt_sriov_pf_config_set_threshold(gt, vfid, index, val); + xe_pm_runtime_put(xe); + + return err; +} + +static int get_threshold(void *data, u64 *val, enum xe_guc_klv_threshold_index index) +{ + struct xe_gt *gt = extract_gt(data); + unsigned int vfid = extract_vfid(data); + + *val = xe_gt_sriov_pf_config_get_threshold(gt, vfid, index); + return 0; +} + +#define DEFINE_SRIOV_GT_THRESHOLD_DEBUGFS_ATTRIBUTE(THRESHOLD, INDEX) \ + \ +static int THRESHOLD##_set(void *data, u64 val) \ +{ \ + return set_threshold(data, val, INDEX); \ +} \ + \ +static int THRESHOLD##_get(void *data, u64 *val) \ +{ \ + return get_threshold(data, val, INDEX); \ +} \ + \ +DEFINE_DEBUGFS_ATTRIBUTE(THRESHOLD##_fops, THRESHOLD##_get, THRESHOLD##_set, "%llu\n") + +/* generate all threshold attributes */ +#define define_threshold_attribute(TAG, NAME, ...) \ + DEFINE_SRIOV_GT_THRESHOLD_DEBUGFS_ATTRIBUTE(NAME, MAKE_XE_GUC_KLV_THRESHOLD_INDEX(TAG)); +MAKE_XE_GUC_KLV_THRESHOLDS_SET(define_threshold_attribute) +#undef define_threshold_attribute + +static void pf_add_config_attrs(struct xe_gt *gt, struct dentry *parent, unsigned int vfid) +{ + xe_gt_assert(gt, gt == extract_gt(parent)); + xe_gt_assert(gt, vfid == extract_vfid(parent)); + + if (!xe_gt_is_media_type(gt)) { + debugfs_create_file_unsafe(vfid ? "ggtt_quota" : "ggtt_spare", + 0644, parent, parent, &ggtt_fops); + if (IS_DGFX(gt_to_xe(gt))) + debugfs_create_file_unsafe(vfid ? "lmem_quota" : "lmem_spare", + 0644, parent, parent, &lmem_fops); + } + debugfs_create_file_unsafe(vfid ? "doorbells_quota" : "doorbells_spare", + 0644, parent, parent, &dbs_fops); + debugfs_create_file_unsafe(vfid ? "contexts_quota" : "contexts_spare", + 0644, parent, parent, &ctxs_fops); + debugfs_create_file_unsafe("exec_quantum_ms", 0644, parent, parent, + &exec_quantum_fops); + debugfs_create_file_unsafe("preempt_timeout_us", 0644, parent, parent, + &preempt_timeout_fops); + + /* register all threshold attributes */ +#define register_threshold_attribute(TAG, NAME, ...) 
\ + debugfs_create_file_unsafe("threshold_" #NAME, 0644, parent, parent, \ + &NAME##_fops); + MAKE_XE_GUC_KLV_THRESHOLDS_SET(register_threshold_attribute) +#undef register_threshold_attribute +} + +/* + * /sys/kernel/debug/dri/0/ + * ├── gt0 + * │ ├── vf1 + * │ │ ├── control { stop, pause, resume } + */ + +static const struct { + const char *cmd; + int (*fn)(struct xe_gt *gt, unsigned int vfid); +} control_cmds[] = { + { "stop", xe_gt_sriov_pf_control_stop_vf }, + { "pause", xe_gt_sriov_pf_control_pause_vf }, + { "resume", xe_gt_sriov_pf_control_resume_vf }, +}; + +static ssize_t control_write(struct file *file, const char __user *buf, size_t count, loff_t *pos) +{ + struct dentry *dent = file_dentry(file); + struct dentry *parent = dent->d_parent; + struct xe_gt *gt = extract_gt(parent); + struct xe_device *xe = gt_to_xe(gt); + unsigned int vfid = extract_vfid(parent); + int ret = -EINVAL; + char cmd[32]; + size_t n; + + xe_gt_assert(gt, vfid); + xe_gt_sriov_pf_assert_vfid(gt, vfid); + + if (*pos) + return -ESPIPE; + + if (count > sizeof(cmd) - 1) + return -EINVAL; + + ret = simple_write_to_buffer(cmd, sizeof(cmd) - 1, pos, buf, count); + if (ret < 0) + return ret; + cmd[ret] = '\0'; + + for (n = 0; n < ARRAY_SIZE(control_cmds); n++) { + xe_gt_assert(gt, sizeof(cmd) > strlen(control_cmds[n].cmd)); + + if (sysfs_streq(cmd, control_cmds[n].cmd)) { + xe_pm_runtime_get(xe); + ret = control_cmds[n].fn ? (*control_cmds[n].fn)(gt, vfid) : 0; + xe_pm_runtime_put(xe); + break; + } + } + + return (ret < 0) ? ret : count; +} + +static ssize_t control_read(struct file *file, char __user *buf, size_t count, loff_t *ppos) +{ + char help[128]; + size_t n; + + help[0] = '\0'; + for (n = 0; n < ARRAY_SIZE(control_cmds); n++) { + strlcat(help, control_cmds[n].cmd, sizeof(help)); + strlcat(help, "\n", sizeof(help)); + } + + return simple_read_from_buffer(buf, count, ppos, help, strlen(help)); +} + +static const struct file_operations control_ops = { + .owner = THIS_MODULE, + .open = simple_open, + .write = control_write, + .read = control_read, + .llseek = default_llseek, +}; + +/** + * xe_gt_sriov_pf_debugfs_register - Register SR-IOV PF specific entries in GT debugfs. + * @gt: the &xe_gt to register + * @root: the &dentry that represents the GT directory + * + * Register SR-IOV PF entries that are GT related and must be shown under GT debugfs. 
+ */ +void xe_gt_sriov_pf_debugfs_register(struct xe_gt *gt, struct dentry *root) +{ + struct xe_device *xe = gt_to_xe(gt); + struct drm_minor *minor = xe->drm.primary; + int n, totalvfs = xe_sriov_pf_get_totalvfs(xe); + struct dentry *pfdentry; + struct dentry *vfdentry; + char buf[14]; /* should be enough up to "vf%u\0" for 2^32 - 1 */ + + xe_gt_assert(gt, IS_SRIOV_PF(xe)); + xe_gt_assert(gt, root->d_inode->i_private == gt); + + /* + * /sys/kernel/debug/dri/0/ + * ├── gt0 + * │ ├── pf + */ + pfdentry = debugfs_create_dir("pf", root); + if (IS_ERR(pfdentry)) + return; + pfdentry->d_inode->i_private = gt; + + drm_debugfs_create_files(pf_info, ARRAY_SIZE(pf_info), pfdentry, minor); + pf_add_policy_attrs(gt, pfdentry); + pf_add_config_attrs(gt, pfdentry, PFID); + + for (n = 1; n <= totalvfs; n++) { + /* + * /sys/kernel/debug/dri/0/ + * ├── gt0 + * │ ├── vf1 + * │ ├── vf2 + */ + snprintf(buf, sizeof(buf), "vf%u", n); + vfdentry = debugfs_create_dir(buf, root); + if (IS_ERR(vfdentry)) + break; + vfdentry->d_inode->i_private = (void *)(uintptr_t)n; + + pf_add_config_attrs(gt, vfdentry, VFID(n)); + debugfs_create_file("control", 0600, vfdentry, NULL, &control_ops); + } +} diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.h new file mode 100644 index 000000000000..038cc8ddc244 --- /dev/null +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.h @@ -0,0 +1,18 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2023-2024 Intel Corporation + */ + +#ifndef _XE_GT_SRIOV_PF_DEBUGFS_H_ +#define _XE_GT_SRIOV_PF_DEBUGFS_H_ + +struct xe_gt; +struct dentry; + +#ifdef CONFIG_PCI_IOV +void xe_gt_sriov_pf_debugfs_register(struct xe_gt *gt, struct dentry *root); +#else +static inline void xe_gt_sriov_pf_debugfs_register(struct xe_gt *gt, struct dentry *root) { } +#endif + +#endif diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_monitor.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_monitor.c new file mode 100644 index 000000000000..7d532bded02a --- /dev/null +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_monitor.c @@ -0,0 +1,147 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2023-2024 Intel Corporation + */ + +#include "abi/guc_actions_sriov_abi.h" +#include "abi/guc_messages_abi.h" + +#include "xe_gt_sriov_pf_config.h" +#include "xe_gt_sriov_pf_helpers.h" +#include "xe_gt_sriov_pf_monitor.h" +#include "xe_gt_sriov_printk.h" +#include "xe_guc_klv_helpers.h" +#include "xe_guc_klv_thresholds_set.h" + +/** + * xe_gt_sriov_pf_monitor_flr - Cleanup VF data after VF FLR. + * @gt: the &xe_gt + * @vfid: the VF identifier + * + * On FLR this function will reset all event data related to the VF. + * This function is for PF only. + */ +void xe_gt_sriov_pf_monitor_flr(struct xe_gt *gt, u32 vfid) +{ + int e; + + xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt))); + xe_gt_sriov_pf_assert_vfid(gt, vfid); + + for (e = 0; e < XE_GUC_KLV_NUM_THRESHOLDS; e++) + gt->sriov.pf.vfs[vfid].monitor.guc.events[e] = 0; +} + +static void pf_update_event_counter(struct xe_gt *gt, u32 vfid, + enum xe_guc_klv_threshold_index e) +{ + xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt))); + xe_gt_assert(gt, e < XE_GUC_KLV_NUM_THRESHOLDS); + + gt->sriov.pf.vfs[vfid].monitor.guc.events[e]++; +} + +static int pf_handle_vf_threshold_event(struct xe_gt *gt, u32 vfid, u32 threshold) +{ + char origin[8]; + int e; + + e = xe_guc_klv_threshold_key_to_index(threshold); + xe_sriov_function_name(vfid, origin, sizeof(origin)); + + /* was there a new KEY added that we missed? 
*/ + if (unlikely(e < 0)) { + xe_gt_sriov_notice(gt, "unknown threshold key %#x reported for %s\n", + threshold, origin); + return -ENOTCONN; + } + + xe_gt_sriov_dbg(gt, "%s exceeded threshold %u %s\n", + origin, xe_gt_sriov_pf_config_get_threshold(gt, vfid, e), + xe_guc_klv_key_to_string(threshold)); + + pf_update_event_counter(gt, vfid, e); + + return 0; +} + +/** + * xe_gt_sriov_pf_monitor_process_guc2pf - Handle adverse event notification from the GuC. + * @gt: the &xe_gt + * @msg: G2H event message + * @len: length of the message + * + * This function is intended for PF only. + * + * Return: 0 on success or a negative error code on failure. + */ +int xe_gt_sriov_pf_monitor_process_guc2pf(struct xe_gt *gt, const u32 *msg, u32 len) +{ + struct xe_device *xe = gt_to_xe(gt); + u32 vfid; + u32 threshold; + + xe_gt_assert(gt, len >= GUC_HXG_MSG_MIN_LEN); + xe_gt_assert(gt, FIELD_GET(GUC_HXG_MSG_0_ORIGIN, msg[0]) == GUC_HXG_ORIGIN_GUC); + xe_gt_assert(gt, FIELD_GET(GUC_HXG_MSG_0_TYPE, msg[0]) == GUC_HXG_TYPE_EVENT); + xe_gt_assert(gt, FIELD_GET(GUC_HXG_EVENT_MSG_0_ACTION, msg[0]) == + GUC_ACTION_GUC2PF_ADVERSE_EVENT); + + if (unlikely(!IS_SRIOV_PF(xe))) + return -EPROTO; + + if (unlikely(FIELD_GET(GUC2PF_ADVERSE_EVENT_EVENT_MSG_0_MBZ, msg[0]))) + return -EPFNOSUPPORT; + + if (unlikely(len < GUC2PF_ADVERSE_EVENT_EVENT_MSG_LEN)) + return -EPROTO; + + vfid = FIELD_GET(GUC2PF_ADVERSE_EVENT_EVENT_MSG_1_VFID, msg[1]); + threshold = FIELD_GET(GUC2PF_ADVERSE_EVENT_EVENT_MSG_2_THRESHOLD, msg[2]); + + if (unlikely(vfid > xe_gt_sriov_pf_get_totalvfs(gt))) + return -EINVAL; + + return pf_handle_vf_threshold_event(gt, vfid, threshold); +} + +/** + * xe_gt_sriov_pf_monitor_print_events - Print adverse events counters. + * @gt: the &xe_gt to print events from + * @p: the &drm_printer + * + * Print adverse events counters for all VFs. + * VFs with no events are not printed. + * + * This function can only be called on PF. + */ +void xe_gt_sriov_pf_monitor_print_events(struct xe_gt *gt, struct drm_printer *p) +{ + unsigned int n, total_vfs = xe_gt_sriov_pf_get_totalvfs(gt); + const struct xe_gt_sriov_monitor *data; + int e; + + xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt))); + + for (n = 1; n <= total_vfs; n++) { + data = >->sriov.pf.vfs[n].monitor; + + for (e = 0; e < XE_GUC_KLV_NUM_THRESHOLDS; e++) + if (data->guc.events[e]) + break; + + /* skip empty unless in debug mode */ + if (e >= XE_GUC_KLV_NUM_THRESHOLDS && + !IS_ENABLED(CONFIG_DRM_XE_DEBUG_SRIOV)) + continue; + +#define __format(...) "%s:%u " +#define __value(TAG, NAME, ...) 
, #NAME, data->guc.events[MAKE_XE_GUC_KLV_THRESHOLD_INDEX(TAG)] + + drm_printf(p, "VF%u:\t" MAKE_XE_GUC_KLV_THRESHOLDS_SET(__format) "\n", + n MAKE_XE_GUC_KLV_THRESHOLDS_SET(__value)); + +#undef __format +#undef __value + } +} diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_monitor.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_monitor.h new file mode 100644 index 000000000000..7ca9351a271b --- /dev/null +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_monitor.h @@ -0,0 +1,27 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2023-2024 Intel Corporation + */ + +#ifndef _XE_GT_SRIOV_PF_MONITOR_H_ +#define _XE_GT_SRIOV_PF_MONITOR_H_ + +#include <linux/errno.h> +#include <linux/types.h> + +struct drm_printer; +struct xe_gt; + +void xe_gt_sriov_pf_monitor_flr(struct xe_gt *gt, u32 vfid); +void xe_gt_sriov_pf_monitor_print_events(struct xe_gt *gt, struct drm_printer *p); + +#ifdef CONFIG_PCI_IOV +int xe_gt_sriov_pf_monitor_process_guc2pf(struct xe_gt *gt, const u32 *msg, u32 len); +#else +static inline int xe_gt_sriov_pf_monitor_process_guc2pf(struct xe_gt *gt, const u32 *msg, u32 len) +{ + return -EPROTO; +} +#endif + +#endif diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_monitor_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_monitor_types.h new file mode 100644 index 000000000000..e27c0308c5db --- /dev/null +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_monitor_types.h @@ -0,0 +1,22 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2023-2024 Intel Corporation + */ + +#ifndef _XE_GT_SRIOV_PF_MONITOR_TYPES_H_ +#define _XE_GT_SRIOV_PF_MONITOR_TYPES_H_ + +#include "xe_guc_klv_thresholds_set_types.h" + +/** + * struct xe_gt_sriov_monitor - GT level per-VF monitoring data. + */ +struct xe_gt_sriov_monitor { + /** @guc: monitoring data related to the GuC. */ + struct { + /** @guc.events: number of adverse events reported by the GuC. */ + unsigned int events[XE_GUC_KLV_NUM_THRESHOLDS]; + } guc; +}; + +#endif diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_service.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_service.c new file mode 100644 index 000000000000..0e23b7ea4f3e --- /dev/null +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_service.c @@ -0,0 +1,550 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2023-2024 Intel Corporation + */ + +#include <drm/drm_managed.h> + +#include "abi/guc_actions_sriov_abi.h" +#include "abi/guc_relay_actions_abi.h" + +#include "regs/xe_gt_regs.h" +#include "regs/xe_guc_regs.h" +#include "regs/xe_regs.h" + +#include "xe_mmio.h" +#include "xe_gt_sriov_printk.h" +#include "xe_gt_sriov_pf_helpers.h" +#include "xe_gt_sriov_pf_service.h" +#include "xe_gt_sriov_pf_service_types.h" +#include "xe_guc_ct.h" +#include "xe_guc_hxg_helpers.h" + +static void pf_init_versions(struct xe_gt *gt) +{ + BUILD_BUG_ON(!GUC_RELAY_VERSION_BASE_MAJOR && !GUC_RELAY_VERSION_BASE_MINOR); + BUILD_BUG_ON(GUC_RELAY_VERSION_BASE_MAJOR > GUC_RELAY_VERSION_LATEST_MAJOR); + + /* base versions may differ between platforms */ + gt->sriov.pf.service.version.base.major = GUC_RELAY_VERSION_BASE_MAJOR; + gt->sriov.pf.service.version.base.minor = GUC_RELAY_VERSION_BASE_MINOR; + + /* latest version is same for all platforms */ + gt->sriov.pf.service.version.latest.major = GUC_RELAY_VERSION_LATEST_MAJOR; + gt->sriov.pf.service.version.latest.minor = GUC_RELAY_VERSION_LATEST_MINOR; +} + +/* Return: 0 on success or a negative error code on failure. 
*/ +static int pf_negotiate_version(struct xe_gt *gt, + u32 wanted_major, u32 wanted_minor, + u32 *major, u32 *minor) +{ + struct xe_gt_sriov_pf_service_version base = gt->sriov.pf.service.version.base; + struct xe_gt_sriov_pf_service_version latest = gt->sriov.pf.service.version.latest; + + xe_gt_assert(gt, base.major); + xe_gt_assert(gt, base.major <= latest.major); + xe_gt_assert(gt, (base.major < latest.major) || (base.minor <= latest.minor)); + + /* VF doesn't care - return our latest */ + if (wanted_major == VF2PF_HANDSHAKE_MAJOR_ANY && + wanted_minor == VF2PF_HANDSHAKE_MINOR_ANY) { + *major = latest.major; + *minor = latest.minor; + return 0; + } + + /* VF wants newer than our - return our latest */ + if (wanted_major > latest.major) { + *major = latest.major; + *minor = latest.minor; + return 0; + } + + /* VF wants older than min required - reject */ + if (wanted_major < base.major || + (wanted_major == base.major && wanted_minor < base.minor)) { + return -EPERM; + } + + /* previous major - return wanted, as we should still support it */ + if (wanted_major < latest.major) { + /* XXX: we are not prepared for multi-versions yet */ + xe_gt_assert(gt, base.major == latest.major); + return -ENOPKG; + } + + /* same major - return common minor */ + *major = wanted_major; + *minor = min_t(u32, latest.minor, wanted_minor); + return 0; +} + +static void pf_connect(struct xe_gt *gt, u32 vfid, u32 major, u32 minor) +{ + xe_gt_sriov_pf_assert_vfid(gt, vfid); + xe_gt_assert(gt, major || minor); + + gt->sriov.pf.vfs[vfid].version.major = major; + gt->sriov.pf.vfs[vfid].version.minor = minor; +} + +static void pf_disconnect(struct xe_gt *gt, u32 vfid) +{ + xe_gt_sriov_pf_assert_vfid(gt, vfid); + + gt->sriov.pf.vfs[vfid].version.major = 0; + gt->sriov.pf.vfs[vfid].version.minor = 0; +} + +static bool pf_is_negotiated(struct xe_gt *gt, u32 vfid, u32 major, u32 minor) +{ + xe_gt_sriov_pf_assert_vfid(gt, vfid); + + return major == gt->sriov.pf.vfs[vfid].version.major && + minor <= gt->sriov.pf.vfs[vfid].version.minor; +} + +static const struct xe_reg tgl_runtime_regs[] = { + RPM_CONFIG0, /* _MMIO(0x0d00) */ + MIRROR_FUSE3, /* _MMIO(0x9118) */ + XELP_EU_ENABLE, /* _MMIO(0x9134) */ + XELP_GT_SLICE_ENABLE, /* _MMIO(0x9138) */ + XELP_GT_GEOMETRY_DSS_ENABLE, /* _MMIO(0x913c) */ + GT_VEBOX_VDBOX_DISABLE, /* _MMIO(0x9140) */ + CTC_MODE, /* _MMIO(0xa26c) */ + HUC_KERNEL_LOAD_INFO, /* _MMIO(0xc1dc) */ + TIMESTAMP_OVERRIDE, /* _MMIO(0x44074) */ +}; + +static const struct xe_reg ats_m_runtime_regs[] = { + RPM_CONFIG0, /* _MMIO(0x0d00) */ + MIRROR_FUSE3, /* _MMIO(0x9118) */ + MIRROR_FUSE1, /* _MMIO(0x911c) */ + XELP_EU_ENABLE, /* _MMIO(0x9134) */ + XELP_GT_GEOMETRY_DSS_ENABLE, /* _MMIO(0x913c) */ + GT_VEBOX_VDBOX_DISABLE, /* _MMIO(0x9140) */ + XEHP_GT_COMPUTE_DSS_ENABLE, /* _MMIO(0x9144) */ + CTC_MODE, /* _MMIO(0xa26c) */ + HUC_KERNEL_LOAD_INFO, /* _MMIO(0xc1dc) */ + TIMESTAMP_OVERRIDE, /* _MMIO(0x44074) */ +}; + +static const struct xe_reg pvc_runtime_regs[] = { + RPM_CONFIG0, /* _MMIO(0x0d00) */ + MIRROR_FUSE3, /* _MMIO(0x9118) */ + XELP_EU_ENABLE, /* _MMIO(0x9134) */ + XELP_GT_GEOMETRY_DSS_ENABLE, /* _MMIO(0x913c) */ + GT_VEBOX_VDBOX_DISABLE, /* _MMIO(0x9140) */ + XEHP_GT_COMPUTE_DSS_ENABLE, /* _MMIO(0x9144) */ + XEHPC_GT_COMPUTE_DSS_ENABLE_EXT,/* _MMIO(0x9148) */ + CTC_MODE, /* _MMIO(0xA26C) */ + HUC_KERNEL_LOAD_INFO, /* _MMIO(0xc1dc) */ + TIMESTAMP_OVERRIDE, /* _MMIO(0x44074) */ +}; + +static const struct xe_reg ver_1270_runtime_regs[] = { + RPM_CONFIG0, /* _MMIO(0x0d00) */ + XEHP_FUSE4, /* _MMIO(0x9114) */ 
+ MIRROR_FUSE3, /* _MMIO(0x9118) */ + MIRROR_FUSE1, /* _MMIO(0x911c) */ + XELP_EU_ENABLE, /* _MMIO(0x9134) */ + XELP_GT_GEOMETRY_DSS_ENABLE, /* _MMIO(0x913c) */ + GT_VEBOX_VDBOX_DISABLE, /* _MMIO(0x9140) */ + XEHP_GT_COMPUTE_DSS_ENABLE, /* _MMIO(0x9144) */ + XEHPC_GT_COMPUTE_DSS_ENABLE_EXT,/* _MMIO(0x9148) */ + CTC_MODE, /* _MMIO(0xa26c) */ + HUC_KERNEL_LOAD_INFO, /* _MMIO(0xc1dc) */ + TIMESTAMP_OVERRIDE, /* _MMIO(0x44074) */ +}; + +static const struct xe_reg ver_2000_runtime_regs[] = { + RPM_CONFIG0, /* _MMIO(0x0d00) */ + XEHP_FUSE4, /* _MMIO(0x9114) */ + MIRROR_FUSE3, /* _MMIO(0x9118) */ + MIRROR_FUSE1, /* _MMIO(0x911c) */ + XELP_EU_ENABLE, /* _MMIO(0x9134) */ + XELP_GT_GEOMETRY_DSS_ENABLE, /* _MMIO(0x913c) */ + GT_VEBOX_VDBOX_DISABLE, /* _MMIO(0x9140) */ + XEHP_GT_COMPUTE_DSS_ENABLE, /* _MMIO(0x9144) */ + XEHPC_GT_COMPUTE_DSS_ENABLE_EXT,/* _MMIO(0x9148) */ + XE2_GT_COMPUTE_DSS_2, /* _MMIO(0x914c) */ + XE2_GT_GEOMETRY_DSS_1, /* _MMIO(0x9150) */ + XE2_GT_GEOMETRY_DSS_2, /* _MMIO(0x9154) */ + CTC_MODE, /* _MMIO(0xa26c) */ + HUC_KERNEL_LOAD_INFO, /* _MMIO(0xc1dc) */ + TIMESTAMP_OVERRIDE, /* _MMIO(0x44074) */ +}; + +static const struct xe_reg *pick_runtime_regs(struct xe_device *xe, unsigned int *count) +{ + const struct xe_reg *regs; + + if (GRAPHICS_VERx100(xe) >= 2000) { + *count = ARRAY_SIZE(ver_2000_runtime_regs); + regs = ver_2000_runtime_regs; + } else if (GRAPHICS_VERx100(xe) >= 1270) { + *count = ARRAY_SIZE(ver_1270_runtime_regs); + regs = ver_1270_runtime_regs; + } else if (GRAPHICS_VERx100(xe) == 1260) { + *count = ARRAY_SIZE(pvc_runtime_regs); + regs = pvc_runtime_regs; + } else if (GRAPHICS_VERx100(xe) == 1255) { + *count = ARRAY_SIZE(ats_m_runtime_regs); + regs = ats_m_runtime_regs; + } else if (GRAPHICS_VERx100(xe) == 1200) { + *count = ARRAY_SIZE(tgl_runtime_regs); + regs = tgl_runtime_regs; + } else { + regs = ERR_PTR(-ENOPKG); + *count = 0; + } + + return regs; +} + +static int pf_alloc_runtime_info(struct xe_gt *gt) +{ + struct xe_device *xe = gt_to_xe(gt); + const struct xe_reg *regs; + unsigned int size; + u32 *values; + + xe_gt_assert(gt, IS_SRIOV_PF(xe)); + xe_gt_assert(gt, !gt->sriov.pf.service.runtime.size); + xe_gt_assert(gt, !gt->sriov.pf.service.runtime.regs); + xe_gt_assert(gt, !gt->sriov.pf.service.runtime.values); + + regs = pick_runtime_regs(xe, &size); + if (IS_ERR(regs)) + return PTR_ERR(regs); + + if (unlikely(!size)) + return 0; + + values = drmm_kcalloc(&xe->drm, size, sizeof(u32), GFP_KERNEL); + if (!values) + return -ENOMEM; + + gt->sriov.pf.service.runtime.size = size; + gt->sriov.pf.service.runtime.regs = regs; + gt->sriov.pf.service.runtime.values = values; + + return 0; +} + +static void read_many(struct xe_gt *gt, unsigned int count, + const struct xe_reg *regs, u32 *values) +{ + while (count--) + *values++ = xe_mmio_read32(gt, *regs++); +} + +static void pf_prepare_runtime_info(struct xe_gt *gt) +{ + const struct xe_reg *regs; + unsigned int size; + u32 *values; + + if (!gt->sriov.pf.service.runtime.size) + return; + + size = gt->sriov.pf.service.runtime.size; + regs = gt->sriov.pf.service.runtime.regs; + values = gt->sriov.pf.service.runtime.values; + + read_many(gt, size, regs, values); + + if (IS_ENABLED(CONFIG_DRM_XE_DEBUG_SRIOV)) { + struct drm_printer p = xe_gt_info_printer(gt); + + xe_gt_sriov_pf_service_print_runtime(gt, &p); + } +} + +/** + * xe_gt_sriov_pf_service_init - Early initialization of the GT SR-IOV PF services. 
+ * @gt: the &xe_gt to initialize + * + * Performs early initialization of the GT SR-IOV PF services, including preparation + * of the runtime info that will be shared with VFs. + * + * This function can only be called on PF. + */ +int xe_gt_sriov_pf_service_init(struct xe_gt *gt) +{ + int err; + + pf_init_versions(gt); + + err = pf_alloc_runtime_info(gt); + if (unlikely(err)) + goto failed; + + return 0; +failed: + xe_gt_sriov_err(gt, "Failed to initialize service (%pe)\n", ERR_PTR(err)); + return err; +} + +/** + * xe_gt_sriov_pf_service_update - Update PF SR-IOV services. + * @gt: the &xe_gt to update + * + * Updates runtime data shared with VFs. + * + * This function can be called more than once. + * This function can only be called on PF. + */ +void xe_gt_sriov_pf_service_update(struct xe_gt *gt) +{ + pf_prepare_runtime_info(gt); +} + +/** + * xe_gt_sriov_pf_service_reset - Reset a connection with the VF. + * @gt: the &xe_gt + * @vfid: the VF identifier + * + * Reset a VF driver negotiated VF/PF ABI version. + * After that point, the VF driver will have to perform new version handshake + * to continue use of the PF services again. + * + * This function can only be called on PF. + */ +void xe_gt_sriov_pf_service_reset(struct xe_gt *gt, unsigned int vfid) +{ + pf_disconnect(gt, vfid); +} + +/* Return: 0 on success or a negative error code on failure. */ +static int pf_process_handshake(struct xe_gt *gt, u32 vfid, + u32 wanted_major, u32 wanted_minor, + u32 *major, u32 *minor) +{ + int err; + + xe_gt_sriov_dbg_verbose(gt, "VF%u wants ABI version %u.%u\n", + vfid, wanted_major, wanted_minor); + + err = pf_negotiate_version(gt, wanted_major, wanted_minor, major, minor); + + if (err < 0) { + xe_gt_sriov_notice(gt, "VF%u failed to negotiate ABI %u.%u (%pe)\n", + vfid, wanted_major, wanted_minor, ERR_PTR(err)); + pf_disconnect(gt, vfid); + } else { + xe_gt_sriov_dbg(gt, "VF%u negotiated ABI version %u.%u\n", + vfid, *major, *minor); + pf_connect(gt, vfid, *major, *minor); + } + + return 0; +} + +/* Return: length of the response message or a negative error code on failure. */ +static int pf_process_handshake_msg(struct xe_gt *gt, u32 origin, + const u32 *request, u32 len, u32 *response, u32 size) +{ + u32 wanted_major, wanted_minor; + u32 major, minor; + u32 mbz; + int err; + + if (unlikely(len != VF2PF_HANDSHAKE_REQUEST_MSG_LEN)) + return -EMSGSIZE; + + mbz = FIELD_GET(VF2PF_HANDSHAKE_REQUEST_MSG_0_MBZ, request[0]); + if (unlikely(mbz)) + return -EPFNOSUPPORT; + + wanted_major = FIELD_GET(VF2PF_HANDSHAKE_REQUEST_MSG_1_MAJOR, request[1]); + wanted_minor = FIELD_GET(VF2PF_HANDSHAKE_REQUEST_MSG_1_MINOR, request[1]); + + err = pf_process_handshake(gt, origin, wanted_major, wanted_minor, &major, &minor); + if (err < 0) + return err; + + xe_gt_assert(gt, major || minor); + xe_gt_assert(gt, size >= VF2PF_HANDSHAKE_RESPONSE_MSG_LEN); + + response[0] = FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) | + FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_RESPONSE_SUCCESS) | + FIELD_PREP(GUC_HXG_RESPONSE_MSG_0_DATA0, 0); + response[1] = FIELD_PREP(VF2PF_HANDSHAKE_RESPONSE_MSG_1_MAJOR, major) | + FIELD_PREP(VF2PF_HANDSHAKE_RESPONSE_MSG_1_MINOR, minor); + + return VF2PF_HANDSHAKE_RESPONSE_MSG_LEN; +} + +struct reg_data { + u32 offset; + u32 value; +} __packed; +static_assert(hxg_sizeof(struct reg_data) == 2); + +/* Return: number of entries copied or negative error code on failure. 
*/ +static int pf_service_runtime_query(struct xe_gt *gt, u32 start, u32 limit, + struct reg_data *data, u32 *remaining) +{ + struct xe_gt_sriov_pf_service_runtime_regs *runtime; + unsigned int count, i; + u32 addr; + + xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt))); + + runtime = >->sriov.pf.service.runtime; + + if (start > runtime->size) + return -ERANGE; + + count = min_t(u32, runtime->size - start, limit); + + for (i = 0; i < count; ++i, ++data) { + addr = runtime->regs[start + i].addr; + data->offset = xe_mmio_adjusted_addr(gt, addr); + data->value = runtime->values[start + i]; + } + + *remaining = runtime->size - start - count; + return count; +} + +/* Return: length of the response message or a negative error code on failure. */ +static int pf_process_runtime_query_msg(struct xe_gt *gt, u32 origin, + const u32 *msg, u32 msg_len, u32 *response, u32 resp_size) +{ + const u32 chunk_size = hxg_sizeof(struct reg_data); + struct reg_data *reg_data_buf; + u32 limit, start, max_chunks; + u32 remaining = 0; + int ret; + + if (!pf_is_negotiated(gt, origin, 1, 0)) + return -EACCES; + if (unlikely(msg_len > VF2PF_QUERY_RUNTIME_REQUEST_MSG_LEN)) + return -EMSGSIZE; + if (unlikely(msg_len < VF2PF_QUERY_RUNTIME_REQUEST_MSG_LEN)) + return -EPROTO; + if (unlikely(resp_size < VF2PF_QUERY_RUNTIME_RESPONSE_MSG_MIN_LEN)) + return -EINVAL; + + limit = FIELD_GET(VF2PF_QUERY_RUNTIME_REQUEST_MSG_0_LIMIT, msg[0]); + start = FIELD_GET(VF2PF_QUERY_RUNTIME_REQUEST_MSG_1_START, msg[1]); + + resp_size = min_t(u32, resp_size, VF2PF_QUERY_RUNTIME_RESPONSE_MSG_MAX_LEN); + max_chunks = (resp_size - VF2PF_QUERY_RUNTIME_RESPONSE_MSG_MIN_LEN) / chunk_size; + limit = limit == VF2PF_QUERY_RUNTIME_NO_LIMIT ? max_chunks : min_t(u32, max_chunks, limit); + reg_data_buf = (void *)(response + VF2PF_QUERY_RUNTIME_RESPONSE_MSG_MIN_LEN); + + ret = pf_service_runtime_query(gt, start, limit, reg_data_buf, &remaining); + if (ret < 0) + return ret; + + response[0] = FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) | + FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_RESPONSE_SUCCESS) | + FIELD_PREP(VF2PF_QUERY_RUNTIME_RESPONSE_MSG_0_COUNT, ret); + response[1] = FIELD_PREP(VF2PF_QUERY_RUNTIME_RESPONSE_MSG_1_REMAINING, remaining); + + return VF2PF_QUERY_RUNTIME_RESPONSE_MSG_MIN_LEN + ret * hxg_sizeof(struct reg_data); +} + +/** + * xe_gt_sriov_pf_service_process_request - Service GT level SR-IOV request message from the VF. + * @gt: the &xe_gt that provides the service + * @origin: VF number that is requesting the service + * @msg: request message + * @msg_len: length of the request message (in dwords) + * @response: placeholder for the response message + * @resp_size: length of the response message buffer (in dwords) + * + * This function processes `Relay Message`_ request from the VF. + * + * Return: length of the response message or a negative error code on failure. 
+ */ +int xe_gt_sriov_pf_service_process_request(struct xe_gt *gt, u32 origin, + const u32 *msg, u32 msg_len, + u32 *response, u32 resp_size) +{ + u32 action, data __maybe_unused; + int ret; + + xe_gt_assert(gt, msg_len >= GUC_HXG_MSG_MIN_LEN); + xe_gt_assert(gt, FIELD_GET(GUC_HXG_MSG_0_TYPE, msg[0]) == GUC_HXG_TYPE_REQUEST); + + action = FIELD_GET(GUC_HXG_REQUEST_MSG_0_ACTION, msg[0]); + data = FIELD_GET(GUC_HXG_REQUEST_MSG_0_DATA0, msg[0]); + xe_gt_sriov_dbg_verbose(gt, "service action %#x:%u from VF%u\n", + action, data, origin); + + switch (action) { + case GUC_RELAY_ACTION_VF2PF_HANDSHAKE: + ret = pf_process_handshake_msg(gt, origin, msg, msg_len, response, resp_size); + break; + case GUC_RELAY_ACTION_VF2PF_QUERY_RUNTIME: + ret = pf_process_runtime_query_msg(gt, origin, msg, msg_len, response, resp_size); + break; + default: + ret = -EOPNOTSUPP; + break; + } + + return ret; +} + +/** + * xe_gt_sriov_pf_service_print_runtime - Print PF runtime data shared with VFs. + * @gt: the &xe_gt + * @p: the &drm_printer + * + * This function is for PF use only. + */ +int xe_gt_sriov_pf_service_print_runtime(struct xe_gt *gt, struct drm_printer *p) +{ + const struct xe_reg *regs; + unsigned int size; + u32 *values; + + xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt))); + + size = gt->sriov.pf.service.runtime.size; + regs = gt->sriov.pf.service.runtime.regs; + values = gt->sriov.pf.service.runtime.values; + + for (; size--; regs++, values++) { + drm_printf(p, "reg[%#x] = %#x\n", + xe_mmio_adjusted_addr(gt, regs->addr), *values); + } + + return 0; +} + +/** + * xe_gt_sriov_pf_service_print_version - Print ABI versions negotiated with VFs. + * @gt: the &xe_gt + * @p: the &drm_printer + * + * This function is for PF use only. + */ +int xe_gt_sriov_pf_service_print_version(struct xe_gt *gt, struct drm_printer *p) +{ + struct xe_device *xe = gt_to_xe(gt); + unsigned int n, total_vfs = xe_sriov_pf_get_totalvfs(xe); + struct xe_gt_sriov_pf_service_version *version; + + xe_gt_assert(gt, IS_SRIOV_PF(xe)); + + for (n = 1; n <= total_vfs; n++) { + version = >->sriov.pf.vfs[n].version; + if (!version->major && !version->minor) + continue; + + drm_printf(p, "VF%u:\t%u.%u\n", n, version->major, version->minor); + } + + return 0; +} + +#if IS_BUILTIN(CONFIG_DRM_XE_KUNIT_TEST) +#include "tests/xe_gt_sriov_pf_service_test.c" +#endif diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_service.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_service.h new file mode 100644 index 000000000000..56aaadf0360d --- /dev/null +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_service.h @@ -0,0 +1,36 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2023-2024 Intel Corporation + */ + +#ifndef _XE_GT_SRIOV_PF_SERVICE_H_ +#define _XE_GT_SRIOV_PF_SERVICE_H_ + +#include <linux/errno.h> +#include <linux/types.h> + +struct drm_printer; +struct xe_gt; + +int xe_gt_sriov_pf_service_init(struct xe_gt *gt); +void xe_gt_sriov_pf_service_update(struct xe_gt *gt); +void xe_gt_sriov_pf_service_reset(struct xe_gt *gt, unsigned int vfid); + +int xe_gt_sriov_pf_service_print_version(struct xe_gt *gt, struct drm_printer *p); +int xe_gt_sriov_pf_service_print_runtime(struct xe_gt *gt, struct drm_printer *p); + +#ifdef CONFIG_PCI_IOV +int xe_gt_sriov_pf_service_process_request(struct xe_gt *gt, u32 origin, + const u32 *msg, u32 msg_len, + u32 *response, u32 resp_size); +#else +static inline int +xe_gt_sriov_pf_service_process_request(struct xe_gt *gt, u32 origin, + const u32 *msg, u32 msg_len, + u32 *response, u32 resp_size) +{ + return -EPROTO; +} +#endif + +#endif 
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_service_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_service_types.h new file mode 100644 index 000000000000..ad6dd75f0056 --- /dev/null +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_service_types.h @@ -0,0 +1,52 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2023-2024 Intel Corporation + */ + +#ifndef _XE_GT_SRIOV_PF_SERVICE_TYPES_H_ +#define _XE_GT_SRIOV_PF_SERVICE_TYPES_H_ + +#include <linux/types.h> + +struct xe_reg; + +/** + * struct xe_gt_sriov_pf_service_version - VF/PF ABI Version. + * @major: the major version of the VF/PF ABI + * @minor: the minor version of the VF/PF ABI + * + * See `GuC Relay Communication`_. + */ +struct xe_gt_sriov_pf_service_version { + u16 major; + u16 minor; +}; + +/** + * struct xe_gt_sriov_pf_service_runtime_regs - Runtime data shared with VFs. + * @regs: pointer to static array with register offsets. + * @values: pointer to array with captured register values. + * @size: size of the regs and value arrays. + */ +struct xe_gt_sriov_pf_service_runtime_regs { + const struct xe_reg *regs; + u32 *values; + u32 size; +}; + +/** + * struct xe_gt_sriov_pf_service - Data used by the PF service. + * @version: information about VF/PF ABI versions for current platform. + * @version.base: lowest VF/PF ABI version that could be negotiated with VF. + * @version.latest: latest VF/PF ABI version supported by the PF driver. + * @runtime: runtime data shared with VFs. + */ +struct xe_gt_sriov_pf_service { + struct { + struct xe_gt_sriov_pf_service_version base; + struct xe_gt_sriov_pf_service_version latest; + } version; + struct xe_gt_sriov_pf_service_runtime_regs runtime; +}; + +#endif diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_types.h index faf9ee8266ce..40cbaea3ef44 100644 --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_types.h +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_types.h @@ -9,7 +9,9 @@ #include <linux/types.h> #include "xe_gt_sriov_pf_config_types.h" +#include "xe_gt_sriov_pf_monitor_types.h" #include "xe_gt_sriov_pf_policy_types.h" +#include "xe_gt_sriov_pf_service_types.h" /** * struct xe_gt_sriov_metadata - GT level per-VF metadata. @@ -17,15 +19,23 @@ struct xe_gt_sriov_metadata { /** @config: per-VF provisioning data. */ struct xe_gt_sriov_config config; + + /** @monitor: per-VF monitoring data. */ + struct xe_gt_sriov_monitor monitor; + + /** @version: negotiated VF/PF ABI version */ + struct xe_gt_sriov_pf_service_version version; }; /** * struct xe_gt_sriov_pf - GT level PF virtualization data. + * @service: service data. * @policy: policy data. * @spare: PF-only provisioning configuration. * @vfs: metadata for all VFs. 
*/ struct xe_gt_sriov_pf { + struct xe_gt_sriov_pf_service service; struct xe_gt_sriov_pf_policy policy; struct xe_gt_sriov_spare_config spare; struct xe_gt_sriov_metadata *vfs; diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c new file mode 100644 index 000000000000..41e46a00c01e --- /dev/null +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c @@ -0,0 +1,979 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2023-2024 Intel Corporation + */ + +#include <linux/bitfield.h> +#include <linux/bsearch.h> + +#include <drm/drm_managed.h> +#include <drm/drm_print.h> + +#include "abi/guc_actions_sriov_abi.h" +#include "abi/guc_communication_mmio_abi.h" +#include "abi/guc_klvs_abi.h" +#include "abi/guc_relay_actions_abi.h" +#include "regs/xe_gt_regs.h" +#include "regs/xe_gtt_defs.h" + +#include "xe_assert.h" +#include "xe_device.h" +#include "xe_ggtt.h" +#include "xe_gt_sriov_printk.h" +#include "xe_gt_sriov_vf.h" +#include "xe_gt_sriov_vf_types.h" +#include "xe_guc.h" +#include "xe_guc_hxg_helpers.h" +#include "xe_guc_relay.h" +#include "xe_mmio.h" +#include "xe_sriov.h" +#include "xe_uc_fw.h" +#include "xe_wopcm.h" + +#define make_u64_from_u32(hi, lo) ((u64)((u64)(u32)(hi) << 32 | (u32)(lo))) + +static int guc_action_vf_reset(struct xe_guc *guc) +{ + u32 request[GUC_HXG_REQUEST_MSG_MIN_LEN] = { + FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) | + FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) | + FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION, GUC_ACTION_VF2GUC_VF_RESET), + }; + int ret; + + ret = xe_guc_mmio_send(guc, request, ARRAY_SIZE(request)); + + return ret > 0 ? -EPROTO : ret; +} + +static int vf_reset_guc_state(struct xe_gt *gt) +{ + struct xe_guc *guc = >->uc.guc; + int err; + + err = guc_action_vf_reset(guc); + if (unlikely(err)) + xe_gt_sriov_err(gt, "Failed to reset GuC state (%pe)\n", ERR_PTR(err)); + return err; +} + +static int guc_action_match_version(struct xe_guc *guc, + u32 wanted_branch, u32 wanted_major, u32 wanted_minor, + u32 *branch, u32 *major, u32 *minor, u32 *patch) +{ + u32 request[VF2GUC_MATCH_VERSION_REQUEST_MSG_LEN] = { + FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) | + FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) | + FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION, + GUC_ACTION_VF2GUC_MATCH_VERSION), + FIELD_PREP(VF2GUC_MATCH_VERSION_REQUEST_MSG_1_BRANCH, wanted_branch) | + FIELD_PREP(VF2GUC_MATCH_VERSION_REQUEST_MSG_1_MAJOR, wanted_major) | + FIELD_PREP(VF2GUC_MATCH_VERSION_REQUEST_MSG_1_MINOR, wanted_minor), + }; + u32 response[GUC_MAX_MMIO_MSG_LEN]; + int ret; + + BUILD_BUG_ON(VF2GUC_MATCH_VERSION_RESPONSE_MSG_LEN > GUC_MAX_MMIO_MSG_LEN); + + ret = xe_guc_mmio_send_recv(guc, request, ARRAY_SIZE(request), response); + if (unlikely(ret < 0)) + return ret; + + if (unlikely(FIELD_GET(VF2GUC_MATCH_VERSION_RESPONSE_MSG_0_MBZ, response[0]))) + return -EPROTO; + + *branch = FIELD_GET(VF2GUC_MATCH_VERSION_RESPONSE_MSG_1_BRANCH, response[1]); + *major = FIELD_GET(VF2GUC_MATCH_VERSION_RESPONSE_MSG_1_MAJOR, response[1]); + *minor = FIELD_GET(VF2GUC_MATCH_VERSION_RESPONSE_MSG_1_MINOR, response[1]); + *patch = FIELD_GET(VF2GUC_MATCH_VERSION_RESPONSE_MSG_1_PATCH, response[1]); + + return 0; +} + +static void vf_minimum_guc_version(struct xe_gt *gt, u32 *branch, u32 *major, u32 *minor) +{ + struct xe_device *xe = gt_to_xe(gt); + + switch (xe->info.platform) { + case XE_TIGERLAKE ... 
XE_PVC: + /* 1.1 this is current baseline for Xe driver */ + *branch = 0; + *major = 1; + *minor = 1; + break; + default: + /* 1.2 has support for the GMD_ID KLV */ + *branch = 0; + *major = 1; + *minor = 2; + break; + } +} + +static void vf_wanted_guc_version(struct xe_gt *gt, u32 *branch, u32 *major, u32 *minor) +{ + /* for now it's the same as minimum */ + return vf_minimum_guc_version(gt, branch, major, minor); +} + +static int vf_handshake_with_guc(struct xe_gt *gt) +{ + struct xe_gt_sriov_vf_guc_version *guc_version = >->sriov.vf.guc_version; + struct xe_guc *guc = >->uc.guc; + u32 wanted_branch, wanted_major, wanted_minor; + u32 branch, major, minor, patch; + int err; + + xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt))); + + /* select wanted version - prefer previous (if any) */ + if (guc_version->major || guc_version->minor) { + wanted_branch = guc_version->branch; + wanted_major = guc_version->major; + wanted_minor = guc_version->minor; + } else { + vf_wanted_guc_version(gt, &wanted_branch, &wanted_major, &wanted_minor); + xe_gt_assert(gt, wanted_major != GUC_VERSION_MAJOR_ANY); + } + + err = guc_action_match_version(guc, wanted_branch, wanted_major, wanted_minor, + &branch, &major, &minor, &patch); + if (unlikely(err)) + goto fail; + + /* we don't support interface version change */ + if ((guc_version->major || guc_version->minor) && + (guc_version->branch != branch || guc_version->major != major || + guc_version->minor != minor)) { + xe_gt_sriov_err(gt, "New GuC interface version detected: %u.%u.%u.%u\n", + branch, major, minor, patch); + xe_gt_sriov_info(gt, "Previously used version was: %u.%u.%u.%u\n", + guc_version->branch, guc_version->major, + guc_version->minor, guc_version->patch); + err = -EREMCHG; + goto fail; + } + + /* illegal */ + if (major > wanted_major) { + err = -EPROTO; + goto unsupported; + } + + /* there's no fallback on major version. */ + if (major != wanted_major) { + err = -ENOPKG; + goto unsupported; + } + + /* check against minimum version supported by us */ + vf_minimum_guc_version(gt, &wanted_branch, &wanted_major, &wanted_minor); + xe_gt_assert(gt, major != GUC_VERSION_MAJOR_ANY); + if (major < wanted_major || (major == wanted_major && minor < wanted_minor)) { + err = -ENOKEY; + goto unsupported; + } + + xe_gt_sriov_dbg(gt, "using GuC interface version %u.%u.%u.%u\n", + branch, major, minor, patch); + + guc_version->branch = branch; + guc_version->major = major; + guc_version->minor = minor; + guc_version->patch = patch; + return 0; + +unsupported: + xe_gt_sriov_err(gt, "Unsupported GuC version %u.%u.%u.%u (%pe)\n", + branch, major, minor, patch, ERR_PTR(err)); +fail: + xe_gt_sriov_err(gt, "Unable to confirm GuC version %u.%u (%pe)\n", + wanted_major, wanted_minor, ERR_PTR(err)); + + /* try again with *any* just to query which version is supported */ + if (!guc_action_match_version(guc, GUC_VERSION_BRANCH_ANY, + GUC_VERSION_MAJOR_ANY, GUC_VERSION_MINOR_ANY, + &branch, &major, &minor, &patch)) + xe_gt_sriov_notice(gt, "GuC reports interface version %u.%u.%u.%u\n", + branch, major, minor, patch); + return err; +} + +/** + * xe_gt_sriov_vf_bootstrap - Query and setup GuC ABI interface version. + * @gt: the &xe_gt + * + * This function is for VF use only. + * It requires functional `GuC MMIO based communication`_. + * + * Return: 0 on success or a negative error code on failure. 
+ */ +int xe_gt_sriov_vf_bootstrap(struct xe_gt *gt) +{ + int err; + + err = vf_reset_guc_state(gt); + if (unlikely(err)) + return err; + + err = vf_handshake_with_guc(gt); + if (unlikely(err)) + return err; + + return 0; +} + +static int guc_action_query_single_klv(struct xe_guc *guc, u32 key, + u32 *value, u32 value_len) +{ + u32 request[VF2GUC_QUERY_SINGLE_KLV_REQUEST_MSG_LEN] = { + FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) | + FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) | + FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION, + GUC_ACTION_VF2GUC_QUERY_SINGLE_KLV), + FIELD_PREP(VF2GUC_QUERY_SINGLE_KLV_REQUEST_MSG_1_KEY, key), + }; + u32 response[GUC_MAX_MMIO_MSG_LEN]; + u32 length; + int ret; + + BUILD_BUG_ON(VF2GUC_QUERY_SINGLE_KLV_RESPONSE_MSG_MAX_LEN > GUC_MAX_MMIO_MSG_LEN); + ret = xe_guc_mmio_send_recv(guc, request, ARRAY_SIZE(request), response); + if (unlikely(ret < 0)) + return ret; + + if (unlikely(FIELD_GET(VF2GUC_QUERY_SINGLE_KLV_RESPONSE_MSG_0_MBZ, response[0]))) + return -EPROTO; + + length = FIELD_GET(VF2GUC_QUERY_SINGLE_KLV_RESPONSE_MSG_0_LENGTH, response[0]); + if (unlikely(length > value_len)) + return -EOVERFLOW; + if (unlikely(length < value_len)) + return -ENODATA; + + switch (value_len) { + default: + xe_gt_WARN_ON(guc_to_gt(guc), value_len > 3); + fallthrough; + case 3: + value[2] = FIELD_GET(VF2GUC_QUERY_SINGLE_KLV_RESPONSE_MSG_3_VALUE96, response[3]); + fallthrough; + case 2: + value[1] = FIELD_GET(VF2GUC_QUERY_SINGLE_KLV_RESPONSE_MSG_2_VALUE64, response[2]); + fallthrough; + case 1: + value[0] = FIELD_GET(VF2GUC_QUERY_SINGLE_KLV_RESPONSE_MSG_1_VALUE32, response[1]); + fallthrough; + case 0: + break; + } + + return 0; +} + +static int guc_action_query_single_klv32(struct xe_guc *guc, u32 key, u32 *value32) +{ + return guc_action_query_single_klv(guc, key, value32, hxg_sizeof(u32)); +} + +static int guc_action_query_single_klv64(struct xe_guc *guc, u32 key, u64 *value64) +{ + u32 value[2]; + int err; + + err = guc_action_query_single_klv(guc, key, value, hxg_sizeof(value)); + if (unlikely(err)) + return err; + + *value64 = make_u64_from_u32(value[1], value[0]); + return 0; +} + +static bool has_gmdid(struct xe_device *xe) +{ + return GRAPHICS_VERx100(xe) >= 1270; +} + +/** + * xe_gt_sriov_vf_gmdid - Query GMDID over MMIO. + * @gt: the &xe_gt + * + * This function is for VF use only. + * + * Return: value of GMDID KLV on success or 0 on failure. + */ +u32 xe_gt_sriov_vf_gmdid(struct xe_gt *gt) +{ + const char *type = xe_gt_is_media_type(gt) ? 
"media" : "graphics"; + struct xe_guc *guc = >->uc.guc; + u32 value; + int err; + + xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt))); + xe_gt_assert(gt, !GRAPHICS_VERx100(gt_to_xe(gt)) || has_gmdid(gt_to_xe(gt))); + xe_gt_assert(gt, gt->sriov.vf.guc_version.major > 1 || gt->sriov.vf.guc_version.minor >= 2); + + err = guc_action_query_single_klv32(guc, GUC_KLV_GLOBAL_CFG_GMD_ID_KEY, &value); + if (unlikely(err)) { + xe_gt_sriov_err(gt, "Failed to obtain %s GMDID (%pe)\n", + type, ERR_PTR(err)); + return 0; + } + + xe_gt_sriov_dbg(gt, "%s GMDID = %#x\n", type, value); + return value; +} + +static int vf_get_ggtt_info(struct xe_gt *gt) +{ + struct xe_gt_sriov_vf_selfconfig *config = >->sriov.vf.self_config; + struct xe_guc *guc = >->uc.guc; + u64 start, size; + int err; + + xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt))); + + err = guc_action_query_single_klv64(guc, GUC_KLV_VF_CFG_GGTT_START_KEY, &start); + if (unlikely(err)) + return err; + + err = guc_action_query_single_klv64(guc, GUC_KLV_VF_CFG_GGTT_SIZE_KEY, &size); + if (unlikely(err)) + return err; + + if (config->ggtt_size && config->ggtt_size != size) { + xe_gt_sriov_err(gt, "Unexpected GGTT reassignment: %lluK != %lluK\n", + size / SZ_1K, config->ggtt_size / SZ_1K); + return -EREMCHG; + } + + xe_gt_sriov_dbg_verbose(gt, "GGTT %#llx-%#llx = %lluK\n", + start, start + size - 1, size / SZ_1K); + + config->ggtt_base = start; + config->ggtt_size = size; + + return config->ggtt_size ? 0 : -ENODATA; +} + +static int vf_get_lmem_info(struct xe_gt *gt) +{ + struct xe_gt_sriov_vf_selfconfig *config = >->sriov.vf.self_config; + struct xe_guc *guc = >->uc.guc; + char size_str[10]; + u64 size; + int err; + + xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt))); + + err = guc_action_query_single_klv64(guc, GUC_KLV_VF_CFG_LMEM_SIZE_KEY, &size); + if (unlikely(err)) + return err; + + if (config->lmem_size && config->lmem_size != size) { + xe_gt_sriov_err(gt, "Unexpected LMEM reassignment: %lluM != %lluM\n", + size / SZ_1M, config->lmem_size / SZ_1M); + return -EREMCHG; + } + + string_get_size(size, 1, STRING_UNITS_2, size_str, sizeof(size_str)); + xe_gt_sriov_dbg_verbose(gt, "LMEM %lluM %s\n", size / SZ_1M, size_str); + + config->lmem_size = size; + + return config->lmem_size ? 0 : -ENODATA; +} + +static int vf_get_submission_cfg(struct xe_gt *gt) +{ + struct xe_gt_sriov_vf_selfconfig *config = >->sriov.vf.self_config; + struct xe_guc *guc = >->uc.guc; + u32 num_ctxs, num_dbs; + int err; + + xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt))); + + err = guc_action_query_single_klv32(guc, GUC_KLV_VF_CFG_NUM_CONTEXTS_KEY, &num_ctxs); + if (unlikely(err)) + return err; + + err = guc_action_query_single_klv32(guc, GUC_KLV_VF_CFG_NUM_DOORBELLS_KEY, &num_dbs); + if (unlikely(err)) + return err; + + if (config->num_ctxs && config->num_ctxs != num_ctxs) { + xe_gt_sriov_err(gt, "Unexpected CTXs reassignment: %u != %u\n", + num_ctxs, config->num_ctxs); + return -EREMCHG; + } + if (config->num_dbs && config->num_dbs != num_dbs) { + xe_gt_sriov_err(gt, "Unexpected DBs reassignment: %u != %u\n", + num_dbs, config->num_dbs); + return -EREMCHG; + } + + xe_gt_sriov_dbg_verbose(gt, "CTXs %u DBs %u\n", num_ctxs, num_dbs); + + config->num_ctxs = num_ctxs; + config->num_dbs = num_dbs; + + return config->num_ctxs ? 
0 : -ENODATA; +} + +static void vf_cache_gmdid(struct xe_gt *gt) +{ + xe_gt_assert(gt, has_gmdid(gt_to_xe(gt))); + xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt))); + + gt->sriov.vf.runtime.gmdid = xe_gt_sriov_vf_gmdid(gt); +} + +/** + * xe_gt_sriov_vf_query_config - Query SR-IOV config data over MMIO. + * @gt: the &xe_gt + * + * This function is for VF use only. + * + * Return: 0 on success or a negative error code on failure. + */ +int xe_gt_sriov_vf_query_config(struct xe_gt *gt) +{ + struct xe_device *xe = gt_to_xe(gt); + int err; + + err = vf_get_ggtt_info(gt); + if (unlikely(err)) + return err; + + if (IS_DGFX(xe) && !xe_gt_is_media_type(gt)) { + err = vf_get_lmem_info(gt); + if (unlikely(err)) + return err; + } + + err = vf_get_submission_cfg(gt); + if (unlikely(err)) + return err; + + if (has_gmdid(xe)) + vf_cache_gmdid(gt); + + return 0; +} + +/** + * xe_gt_sriov_vf_guc_ids - VF GuC context IDs configuration. + * @gt: the &xe_gt + * + * This function is for VF use only. + * + * Return: number of GuC context IDs assigned to VF. + */ +u16 xe_gt_sriov_vf_guc_ids(struct xe_gt *gt) +{ + xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt))); + xe_gt_assert(gt, gt->sriov.vf.guc_version.major); + xe_gt_assert(gt, gt->sriov.vf.self_config.num_ctxs); + + return gt->sriov.vf.self_config.num_ctxs; +} + +/** + * xe_gt_sriov_vf_lmem - VF LMEM configuration. + * @gt: the &xe_gt + * + * This function is for VF use only. + * + * Return: size of the LMEM assigned to VF. + */ +u64 xe_gt_sriov_vf_lmem(struct xe_gt *gt) +{ + xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt))); + xe_gt_assert(gt, gt->sriov.vf.guc_version.major); + xe_gt_assert(gt, gt->sriov.vf.self_config.lmem_size); + + return gt->sriov.vf.self_config.lmem_size; +} + +static int vf_balloon_ggtt(struct xe_gt *gt) +{ + struct xe_gt_sriov_vf_selfconfig *config = >->sriov.vf.self_config; + struct xe_tile *tile = gt_to_tile(gt); + struct xe_ggtt *ggtt = tile->mem.ggtt; + struct xe_device *xe = gt_to_xe(gt); + u64 start, end; + int err; + + xe_gt_assert(gt, IS_SRIOV_VF(xe)); + xe_gt_assert(gt, !xe_gt_is_media_type(gt)); + + if (!config->ggtt_size) + return -ENODATA; + + /* + * VF can only use part of the GGTT as allocated by the PF: + * + * WOPCM GUC_GGTT_TOP + * |<------------ Total GGTT size ------------------>| + * + * VF GGTT base -->|<- size ->| + * + * +--------------------+----------+-----------------+ + * |////////////////////| block |\\\\\\\\\\\\\\\\\| + * +--------------------+----------+-----------------+ + * + * |<--- balloon[0] --->|<-- VF -->|<-- balloon[1] ->| + */ + + start = xe_wopcm_size(xe); + end = config->ggtt_base; + if (end != start) { + err = xe_ggtt_balloon(ggtt, start, end, &tile->sriov.vf.ggtt_balloon[0]); + if (err) + goto failed; + } + + start = config->ggtt_base + config->ggtt_size; + end = GUC_GGTT_TOP; + if (end != start) { + err = xe_ggtt_balloon(ggtt, start, end, &tile->sriov.vf.ggtt_balloon[1]); + if (err) + goto deballoon; + } + + return 0; + +deballoon: + xe_ggtt_deballoon(ggtt, &tile->sriov.vf.ggtt_balloon[0]); +failed: + return err; +} + +static void deballoon_ggtt(struct drm_device *drm, void *arg) +{ + struct xe_tile *tile = arg; + struct xe_ggtt *ggtt = tile->mem.ggtt; + + xe_tile_assert(tile, IS_SRIOV_VF(tile_to_xe(tile))); + xe_ggtt_deballoon(ggtt, &tile->sriov.vf.ggtt_balloon[1]); + xe_ggtt_deballoon(ggtt, &tile->sriov.vf.ggtt_balloon[0]); +} + +/** + * xe_gt_sriov_vf_prepare_ggtt - Prepare a VF's GGTT configuration. + * @gt: the &xe_gt + * + * This function is for VF use only. 
+ * + * Return: 0 on success or a negative error code on failure. + */ +int xe_gt_sriov_vf_prepare_ggtt(struct xe_gt *gt) +{ + struct xe_tile *tile = gt_to_tile(gt); + struct xe_device *xe = tile_to_xe(tile); + int err; + + if (xe_gt_is_media_type(gt)) + return 0; + + err = vf_balloon_ggtt(gt); + if (err) + return err; + + return drmm_add_action_or_reset(&xe->drm, deballoon_ggtt, tile); +} + +static int relay_action_handshake(struct xe_gt *gt, u32 *major, u32 *minor) +{ + u32 request[VF2PF_HANDSHAKE_REQUEST_MSG_LEN] = { + FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) | + FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) | + FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION, GUC_RELAY_ACTION_VF2PF_HANDSHAKE), + FIELD_PREP(VF2PF_HANDSHAKE_REQUEST_MSG_1_MAJOR, *major) | + FIELD_PREP(VF2PF_HANDSHAKE_REQUEST_MSG_1_MINOR, *minor), + }; + u32 response[VF2PF_HANDSHAKE_RESPONSE_MSG_LEN]; + int ret; + + xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt))); + + ret = xe_guc_relay_send_to_pf(>->uc.guc.relay, + request, ARRAY_SIZE(request), + response, ARRAY_SIZE(response)); + if (unlikely(ret < 0)) + return ret; + + if (unlikely(ret != VF2PF_HANDSHAKE_RESPONSE_MSG_LEN)) + return -EPROTO; + + if (unlikely(FIELD_GET(VF2PF_HANDSHAKE_RESPONSE_MSG_0_MBZ, response[0]))) + return -EPROTO; + + *major = FIELD_GET(VF2PF_HANDSHAKE_RESPONSE_MSG_1_MAJOR, response[1]); + *minor = FIELD_GET(VF2PF_HANDSHAKE_RESPONSE_MSG_1_MINOR, response[1]); + + return 0; +} + +static void vf_connect_pf(struct xe_gt *gt, u16 major, u16 minor) +{ + xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt))); + + gt->sriov.vf.pf_version.major = major; + gt->sriov.vf.pf_version.minor = minor; +} + +static void vf_disconnect_pf(struct xe_gt *gt) +{ + vf_connect_pf(gt, 0, 0); +} + +static int vf_handshake_with_pf(struct xe_gt *gt) +{ + u32 major_wanted = GUC_RELAY_VERSION_LATEST_MAJOR; + u32 minor_wanted = GUC_RELAY_VERSION_LATEST_MINOR; + u32 major = major_wanted, minor = minor_wanted; + int err; + + err = relay_action_handshake(gt, &major, &minor); + if (unlikely(err)) + goto failed; + + if (!major && !minor) { + err = -ENODATA; + goto failed; + } + + xe_gt_sriov_dbg(gt, "using VF/PF ABI %u.%u\n", major, minor); + vf_connect_pf(gt, major, minor); + return 0; + +failed: + xe_gt_sriov_err(gt, "Unable to confirm VF/PF ABI version %u.%u (%pe)\n", + major, minor, ERR_PTR(err)); + vf_disconnect_pf(gt); + return err; +} + +/** + * xe_gt_sriov_vf_connect - Establish connection with the PF driver. + * @gt: the &xe_gt + * + * This function is for VF use only. + * + * Return: 0 on success or a negative error code on failure. 
+ */ +int xe_gt_sriov_vf_connect(struct xe_gt *gt) +{ + int err; + + err = vf_handshake_with_pf(gt); + if (unlikely(err)) + goto failed; + + return 0; + +failed: + xe_gt_sriov_err(gt, "Failed to get version info (%pe)\n", ERR_PTR(err)); + return err; +} + +static bool vf_is_negotiated(struct xe_gt *gt, u16 major, u16 minor) +{ + xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt))); + + return major == gt->sriov.vf.pf_version.major && + minor <= gt->sriov.vf.pf_version.minor; +} + +static int vf_prepare_runtime_info(struct xe_gt *gt, unsigned int num_regs) +{ + struct vf_runtime_reg *regs = gt->sriov.vf.runtime.regs; + unsigned int regs_size = round_up(num_regs, 4); + struct xe_device *xe = gt_to_xe(gt); + + xe_gt_assert(gt, IS_SRIOV_VF(xe)); + + if (regs) { + if (num_regs <= gt->sriov.vf.runtime.regs_size) { + memset(regs, 0, num_regs * sizeof(*regs)); + gt->sriov.vf.runtime.num_regs = num_regs; + return 0; + } + + drmm_kfree(&xe->drm, regs); + gt->sriov.vf.runtime.regs = NULL; + gt->sriov.vf.runtime.num_regs = 0; + gt->sriov.vf.runtime.regs_size = 0; + } + + regs = drmm_kcalloc(&xe->drm, regs_size, sizeof(*regs), GFP_KERNEL); + if (unlikely(!regs)) + return -ENOMEM; + + gt->sriov.vf.runtime.regs = regs; + gt->sriov.vf.runtime.num_regs = num_regs; + gt->sriov.vf.runtime.regs_size = regs_size; + return 0; +} + +static int vf_query_runtime_info(struct xe_gt *gt) +{ + u32 request[VF2PF_QUERY_RUNTIME_REQUEST_MSG_LEN]; + u32 response[VF2PF_QUERY_RUNTIME_RESPONSE_MSG_MIN_LEN + 32]; /* up to 16 regs */ + u32 limit = (ARRAY_SIZE(response) - VF2PF_QUERY_RUNTIME_RESPONSE_MSG_MIN_LEN) / 2; + u32 count, remaining, num, i; + u32 start = 0; + int ret; + + xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt))); + xe_gt_assert(gt, limit); + + /* this is part of the 1.0 PF/VF ABI */ + if (!vf_is_negotiated(gt, 1, 0)) + return -ENOPKG; + + request[0] = FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) | + FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) | + FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION, + GUC_RELAY_ACTION_VF2PF_QUERY_RUNTIME) | + FIELD_PREP(VF2PF_QUERY_RUNTIME_REQUEST_MSG_0_LIMIT, limit); + +repeat: + request[1] = FIELD_PREP(VF2PF_QUERY_RUNTIME_REQUEST_MSG_1_START, start); + ret = xe_guc_relay_send_to_pf(>->uc.guc.relay, + request, ARRAY_SIZE(request), + response, ARRAY_SIZE(response)); + if (unlikely(ret < 0)) + goto failed; + + if (unlikely(ret < VF2PF_QUERY_RUNTIME_RESPONSE_MSG_MIN_LEN)) { + ret = -EPROTO; + goto failed; + } + if (unlikely((ret - VF2PF_QUERY_RUNTIME_RESPONSE_MSG_MIN_LEN) % 2)) { + ret = -EPROTO; + goto failed; + } + + num = (ret - VF2PF_QUERY_RUNTIME_RESPONSE_MSG_MIN_LEN) / 2; + count = FIELD_GET(VF2PF_QUERY_RUNTIME_RESPONSE_MSG_0_COUNT, response[0]); + remaining = FIELD_GET(VF2PF_QUERY_RUNTIME_RESPONSE_MSG_1_REMAINING, response[1]); + + xe_gt_sriov_dbg_verbose(gt, "count=%u num=%u ret=%d start=%u remaining=%u\n", + count, num, ret, start, remaining); + + if (unlikely(count != num)) { + ret = -EPROTO; + goto failed; + } + + if (start == 0) { + ret = vf_prepare_runtime_info(gt, num + remaining); + if (unlikely(ret < 0)) + goto failed; + } else if (unlikely(start + num > gt->sriov.vf.runtime.num_regs)) { + ret = -EPROTO; + goto failed; + } + + for (i = 0; i < num; ++i) { + struct vf_runtime_reg *reg = >->sriov.vf.runtime.regs[start + i]; + + reg->offset = response[VF2PF_QUERY_RUNTIME_RESPONSE_MSG_MIN_LEN + 2 * i]; + reg->value = response[VF2PF_QUERY_RUNTIME_RESPONSE_MSG_MIN_LEN + 2 * i + 1]; + } + + if (remaining) { + start += num; + goto repeat; + } + + return 0; + +failed: + 
vf_prepare_runtime_info(gt, 0); + return ret; +} + +static void vf_show_runtime_info(struct xe_gt *gt) +{ + struct vf_runtime_reg *vf_regs = gt->sriov.vf.runtime.regs; + unsigned int size = gt->sriov.vf.runtime.num_regs; + + xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt))); + + for (; size--; vf_regs++) + xe_gt_sriov_dbg(gt, "runtime(%#x) = %#x\n", + vf_regs->offset, vf_regs->value); +} + +/** + * xe_gt_sriov_vf_query_runtime - Query SR-IOV runtime data. + * @gt: the &xe_gt + * + * This function is for VF use only. + * + * Return: 0 on success or a negative error code on failure. + */ +int xe_gt_sriov_vf_query_runtime(struct xe_gt *gt) +{ + int err; + + err = vf_query_runtime_info(gt); + if (unlikely(err)) + goto failed; + + if (IS_ENABLED(CONFIG_DRM_XE_DEBUG)) + vf_show_runtime_info(gt); + + return 0; + +failed: + xe_gt_sriov_err(gt, "Failed to get runtime info (%pe)\n", + ERR_PTR(err)); + return err; +} + +static int vf_runtime_reg_cmp(const void *a, const void *b) +{ + const struct vf_runtime_reg *ra = a; + const struct vf_runtime_reg *rb = b; + + return (int)ra->offset - (int)rb->offset; +} + +static struct vf_runtime_reg *vf_lookup_reg(struct xe_gt *gt, u32 addr) +{ + struct xe_gt_sriov_vf_runtime *runtime = >->sriov.vf.runtime; + struct vf_runtime_reg key = { .offset = addr }; + + xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt))); + + return bsearch(&key, runtime->regs, runtime->regs_size, sizeof(key), + vf_runtime_reg_cmp); +} + +/** + * xe_gt_sriov_vf_read32 - Get a register value from the runtime data. + * @gt: the &xe_gt + * @reg: the register to read + * + * This function is for VF use only. + * This function shall be called after VF has connected to PF. + * This function is dedicated for registers that VFs can't read directly. + * + * Return: register value obtained from the PF or 0 if not found. + */ +u32 xe_gt_sriov_vf_read32(struct xe_gt *gt, struct xe_reg reg) +{ + u32 addr = xe_mmio_adjusted_addr(gt, reg.addr); + struct vf_runtime_reg *rr; + + xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt))); + xe_gt_assert(gt, gt->sriov.vf.pf_version.major); + xe_gt_assert(gt, !reg.vf); + + if (reg.addr == GMD_ID.addr) { + xe_gt_sriov_dbg_verbose(gt, "gmdid(%#x) = %#x\n", + addr, gt->sriov.vf.runtime.gmdid); + return gt->sriov.vf.runtime.gmdid; + } + + rr = vf_lookup_reg(gt, addr); + if (!rr) { + xe_gt_WARN(gt, IS_ENABLED(CONFIG_DRM_XE_DEBUG), + "VF is trying to read an inaccessible register %#x+%#x\n", + reg.addr, addr - reg.addr); + return 0; + } + + xe_gt_sriov_dbg_verbose(gt, "runtime[%#x] = %#x\n", addr, rr->value); + return rr->value; +} + +/** + * xe_gt_sriov_vf_print_config - Print VF self config. + * @gt: the &xe_gt + * @p: the &drm_printer + * + * This function is for VF use only. 
+ */ +void xe_gt_sriov_vf_print_config(struct xe_gt *gt, struct drm_printer *p) +{ + struct xe_gt_sriov_vf_selfconfig *config = >->sriov.vf.self_config; + struct xe_device *xe = gt_to_xe(gt); + char buf[10]; + + xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt))); + + drm_printf(p, "GGTT range:\t%#llx-%#llx\n", + config->ggtt_base, + config->ggtt_base + config->ggtt_size - 1); + + string_get_size(config->ggtt_size, 1, STRING_UNITS_2, buf, sizeof(buf)); + drm_printf(p, "GGTT size:\t%llu (%s)\n", config->ggtt_size, buf); + + if (IS_DGFX(xe) && !xe_gt_is_media_type(gt)) { + string_get_size(config->lmem_size, 1, STRING_UNITS_2, buf, sizeof(buf)); + drm_printf(p, "LMEM size:\t%llu (%s)\n", config->lmem_size, buf); + } + + drm_printf(p, "GuC contexts:\t%u\n", config->num_ctxs); + drm_printf(p, "GuC doorbells:\t%u\n", config->num_dbs); +} + +/** + * xe_gt_sriov_vf_print_runtime - Print VF's runtime regs received from PF. + * @gt: the &xe_gt + * @p: the &drm_printer + * + * This function is for VF use only. + */ +void xe_gt_sriov_vf_print_runtime(struct xe_gt *gt, struct drm_printer *p) +{ + struct vf_runtime_reg *vf_regs = gt->sriov.vf.runtime.regs; + unsigned int size = gt->sriov.vf.runtime.num_regs; + + xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt))); + + for (; size--; vf_regs++) + drm_printf(p, "%#x = %#x\n", vf_regs->offset, vf_regs->value); +} + +/** + * xe_gt_sriov_vf_print_version - Print VF ABI versions. + * @gt: the &xe_gt + * @p: the &drm_printer + * + * This function is for VF use only. + */ +void xe_gt_sriov_vf_print_version(struct xe_gt *gt, struct drm_printer *p) +{ + struct xe_gt_sriov_vf_guc_version *guc_version = >->sriov.vf.guc_version; + struct xe_gt_sriov_vf_relay_version *pf_version = >->sriov.vf.pf_version; + u32 branch, major, minor; + + xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt))); + + drm_printf(p, "GuC ABI:\n"); + + vf_minimum_guc_version(gt, &branch, &major, &minor); + drm_printf(p, "\tbase:\t%u.%u.%u.*\n", branch, major, minor); + + vf_wanted_guc_version(gt, &branch, &major, &minor); + drm_printf(p, "\twanted:\t%u.%u.%u.*\n", branch, major, minor); + + drm_printf(p, "\thandshake:\t%u.%u.%u.%u\n", + guc_version->branch, guc_version->major, + guc_version->minor, guc_version->patch); + + drm_printf(p, "PF ABI:\n"); + + drm_printf(p, "\tbase:\t%u.%u\n", + GUC_RELAY_VERSION_BASE_MAJOR, GUC_RELAY_VERSION_BASE_MINOR); + drm_printf(p, "\twanted:\t%u.%u\n", + GUC_RELAY_VERSION_LATEST_MAJOR, GUC_RELAY_VERSION_LATEST_MINOR); + drm_printf(p, "\thandshake:\t%u.%u\n", + pf_version->major, pf_version->minor); +} diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h new file mode 100644 index 000000000000..0de7f8cbcfa6 --- /dev/null +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h @@ -0,0 +1,30 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2023-2024 Intel Corporation + */ + +#ifndef _XE_GT_SRIOV_VF_H_ +#define _XE_GT_SRIOV_VF_H_ + +#include <linux/types.h> + +struct drm_printer; +struct xe_gt; +struct xe_reg; + +int xe_gt_sriov_vf_bootstrap(struct xe_gt *gt); +int xe_gt_sriov_vf_query_config(struct xe_gt *gt); +int xe_gt_sriov_vf_connect(struct xe_gt *gt); +int xe_gt_sriov_vf_query_runtime(struct xe_gt *gt); +int xe_gt_sriov_vf_prepare_ggtt(struct xe_gt *gt); + +u32 xe_gt_sriov_vf_gmdid(struct xe_gt *gt); +u16 xe_gt_sriov_vf_guc_ids(struct xe_gt *gt); +u64 xe_gt_sriov_vf_lmem(struct xe_gt *gt); +u32 xe_gt_sriov_vf_read32(struct xe_gt *gt, struct xe_reg reg); + +void xe_gt_sriov_vf_print_config(struct xe_gt *gt, struct drm_printer *p); +void 
xe_gt_sriov_vf_print_runtime(struct xe_gt *gt, struct drm_printer *p); +void xe_gt_sriov_vf_print_version(struct xe_gt *gt, struct drm_printer *p); + +#endif diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf_debugfs.c b/drivers/gpu/drm/xe/xe_gt_sriov_vf_debugfs.c new file mode 100644 index 000000000000..f3ddcbefc6bc --- /dev/null +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf_debugfs.c @@ -0,0 +1,72 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2023-2024 Intel Corporation + */ + +#include <linux/debugfs.h> + +#include <drm/drm_debugfs.h> + +#include "xe_gt_debugfs.h" +#include "xe_gt_sriov_vf.h" +#include "xe_gt_sriov_vf_debugfs.h" +#include "xe_gt_types.h" +#include "xe_sriov.h" + +/* + * /sys/kernel/debug/dri/0/ + * ├── gt0 + * │ ├── vf + * │ │ ├── self_config + * │ │ ├── abi_versions + * │ │ ├── runtime_regs + */ + +static const struct drm_info_list vf_info[] = { + { + "self_config", + .show = xe_gt_debugfs_simple_show, + .data = xe_gt_sriov_vf_print_config, + }, + { + "abi_versions", + .show = xe_gt_debugfs_simple_show, + .data = xe_gt_sriov_vf_print_version, + }, +#if defined(CONFIG_DRM_XE_DEBUG) || defined(CONFIG_DRM_XE_DEBUG_SRIOV) + { + "runtime_regs", + .show = xe_gt_debugfs_simple_show, + .data = xe_gt_sriov_vf_print_runtime, + }, +#endif +}; + +/** + * xe_gt_sriov_vf_debugfs_register - Register SR-IOV VF specific entries in GT debugfs. + * @gt: the &xe_gt to register + * @root: the &dentry that represents the GT directory + * + * Register SR-IOV VF entries that are GT related and must be shown under GT debugfs. + */ +void xe_gt_sriov_vf_debugfs_register(struct xe_gt *gt, struct dentry *root) +{ + struct xe_device *xe = gt_to_xe(gt); + struct drm_minor *minor = xe->drm.primary; + struct dentry *vfdentry; + + xe_assert(xe, IS_SRIOV_VF(xe)); + xe_assert(xe, root->d_inode->i_private == gt); + + /* + * /sys/kernel/debug/dri/0/ + * ├── gt0 + * │ ├── vf + */ + vfdentry = debugfs_create_dir("vf", root); + if (IS_ERR(vfdentry)) + return; + vfdentry->d_inode->i_private = gt; + + drm_debugfs_create_files(vf_info, ARRAY_SIZE(vf_info), vfdentry, minor); +} diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf_debugfs.h b/drivers/gpu/drm/xe/xe_gt_sriov_vf_debugfs.h new file mode 100644 index 000000000000..b2cff7ef5c78 --- /dev/null +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf_debugfs.h @@ -0,0 +1,14 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2023-2024 Intel Corporation + */ + +#ifndef _XE_GT_SRIOV_VF_DEBUGFS_H_ +#define _XE_GT_SRIOV_VF_DEBUGFS_H_ + +struct xe_gt; +struct dentry; + +void xe_gt_sriov_vf_debugfs_register(struct xe_gt *gt, struct dentry *root); + +#endif diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h new file mode 100644 index 000000000000..a57f13b5afcd --- /dev/null +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h @@ -0,0 +1,84 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2023-2024 Intel Corporation + */ + +#ifndef _XE_GT_SRIOV_VF_TYPES_H_ +#define _XE_GT_SRIOV_VF_TYPES_H_ + +#include <linux/types.h> + +/** + * struct xe_gt_sriov_vf_guc_version - GuC ABI version details. + */ +struct xe_gt_sriov_vf_guc_version { + /** @branch: branch version. */ + u8 branch; + /** @major: major version. */ + u8 major; + /** @minor: minor version. */ + u8 minor; + /** @patch: patch version. */ + u8 patch; +}; + +/** + * struct xe_gt_sriov_vf_relay_version - PF ABI version details. + */ +struct xe_gt_sriov_vf_relay_version { + /** @major: major version. */ + u16 major; + /** @minor: minor version. 
*/ + u16 minor; +}; + +/** + * struct xe_gt_sriov_vf_selfconfig - VF configuration data. + */ +struct xe_gt_sriov_vf_selfconfig { + /** @ggtt_base: assigned base offset of the GGTT region. */ + u64 ggtt_base; + /** @ggtt_size: assigned size of the GGTT region. */ + u64 ggtt_size; + /** @lmem_size: assigned size of the LMEM. */ + u64 lmem_size; + /** @num_ctxs: assigned number of GuC submission context IDs. */ + u16 num_ctxs; + /** @num_dbs: assigned number of GuC doorbells IDs. */ + u16 num_dbs; +}; + +/** + * struct xe_gt_sriov_vf_runtime - VF runtime data. + */ +struct xe_gt_sriov_vf_runtime { + /** @gmdid: cached value of the GDMID register. */ + u32 gmdid; + /** @regs_size: size of runtime register array. */ + u32 regs_size; + /** @num_regs: number of runtime registers in the array. */ + u32 num_regs; + /** @regs: pointer to array of register offset/value pairs. */ + struct vf_runtime_reg { + /** @regs.offset: register offset. */ + u32 offset; + /** @regs.value: register value. */ + u32 value; + } *regs; +}; + +/** + * struct xe_gt_sriov_vf - GT level VF virtualization data. + */ +struct xe_gt_sriov_vf { + /** @guc_version: negotiated GuC ABI version. */ + struct xe_gt_sriov_vf_guc_version guc_version; + /** @self_config: resource configurations. */ + struct xe_gt_sriov_vf_selfconfig self_config; + /** @pf_version: negotiated VF/PF ABI version. */ + struct xe_gt_sriov_vf_relay_version pf_version; + /** @runtime: runtime data retrieved from the PF. */ + struct xe_gt_sriov_vf_runtime runtime; +}; + +#endif diff --git a/drivers/gpu/drm/xe/xe_gt_sysfs.c b/drivers/gpu/drm/xe/xe_gt_sysfs.c index 1e5971072bc8..a05c3699e8b9 100644 --- a/drivers/gpu/drm/xe/xe_gt_sysfs.c +++ b/drivers/gpu/drm/xe/xe_gt_sysfs.c @@ -22,7 +22,7 @@ static const struct kobj_type xe_gt_sysfs_kobj_type = { .sysfs_ops = &kobj_sysfs_ops, }; -static void gt_sysfs_fini(struct drm_device *drm, void *arg) +static void gt_sysfs_fini(void *arg) { struct xe_gt *gt = arg; @@ -51,5 +51,5 @@ int xe_gt_sysfs_init(struct xe_gt *gt) gt->sysfs = &kg->base; - return drmm_add_action_or_reset(&xe->drm, gt_sysfs_fini, gt); + return devm_add_action(xe->drm.dev, gt_sysfs_fini, gt); } diff --git a/drivers/gpu/drm/xe/xe_gt_throttle_sysfs.c b/drivers/gpu/drm/xe/xe_gt_throttle.c index fbe21a8599ca..25963e33a383 100644 --- a/drivers/gpu/drm/xe/xe_gt_throttle_sysfs.c +++ b/drivers/gpu/drm/xe/xe_gt_throttle.c @@ -9,14 +9,14 @@ #include "xe_device.h" #include "xe_gt.h" #include "xe_gt_sysfs.h" -#include "xe_gt_throttle_sysfs.h" +#include "xe_gt_throttle.h" #include "xe_mmio.h" #include "xe_pm.h" /** * DOC: Xe GT Throttle * - * Provides sysfs entries for frequency throttle reasons in GT + * Provides sysfs entries and other helpers for frequency throttle reasons in GT * * device/gt#/freq0/throttle/status - Overall status * device/gt#/freq0/throttle/reason_pl1 - Frequency throttle due to PL1 @@ -35,7 +35,7 @@ dev_to_gt(struct device *dev) return kobj_to_gt(dev->kobj.parent); } -static u32 read_perf_limit_reasons(struct xe_gt *gt) +u32 xe_gt_throttle_get_limit_reasons(struct xe_gt *gt) { u32 reg; @@ -51,63 +51,63 @@ static u32 read_perf_limit_reasons(struct xe_gt *gt) static u32 read_status(struct xe_gt *gt) { - u32 status = read_perf_limit_reasons(gt) & GT0_PERF_LIMIT_REASONS_MASK; + u32 status = xe_gt_throttle_get_limit_reasons(gt) & GT0_PERF_LIMIT_REASONS_MASK; return status; } static u32 read_reason_pl1(struct xe_gt *gt) { - u32 pl1 = read_perf_limit_reasons(gt) & POWER_LIMIT_1_MASK; + u32 pl1 = xe_gt_throttle_get_limit_reasons(gt) & 
POWER_LIMIT_1_MASK; return pl1; } static u32 read_reason_pl2(struct xe_gt *gt) { - u32 pl2 = read_perf_limit_reasons(gt) & POWER_LIMIT_2_MASK; + u32 pl2 = xe_gt_throttle_get_limit_reasons(gt) & POWER_LIMIT_2_MASK; return pl2; } static u32 read_reason_pl4(struct xe_gt *gt) { - u32 pl4 = read_perf_limit_reasons(gt) & POWER_LIMIT_4_MASK; + u32 pl4 = xe_gt_throttle_get_limit_reasons(gt) & POWER_LIMIT_4_MASK; return pl4; } static u32 read_reason_thermal(struct xe_gt *gt) { - u32 thermal = read_perf_limit_reasons(gt) & THERMAL_LIMIT_MASK; + u32 thermal = xe_gt_throttle_get_limit_reasons(gt) & THERMAL_LIMIT_MASK; return thermal; } static u32 read_reason_prochot(struct xe_gt *gt) { - u32 prochot = read_perf_limit_reasons(gt) & PROCHOT_MASK; + u32 prochot = xe_gt_throttle_get_limit_reasons(gt) & PROCHOT_MASK; return prochot; } static u32 read_reason_ratl(struct xe_gt *gt) { - u32 ratl = read_perf_limit_reasons(gt) & RATL_MASK; + u32 ratl = xe_gt_throttle_get_limit_reasons(gt) & RATL_MASK; return ratl; } static u32 read_reason_vr_thermalert(struct xe_gt *gt) { - u32 thermalert = read_perf_limit_reasons(gt) & VR_THERMALERT_MASK; + u32 thermalert = xe_gt_throttle_get_limit_reasons(gt) & VR_THERMALERT_MASK; return thermalert; } static u32 read_reason_vr_tdc(struct xe_gt *gt) { - u32 tdc = read_perf_limit_reasons(gt) & VR_TDC_MASK; + u32 tdc = xe_gt_throttle_get_limit_reasons(gt) & VR_TDC_MASK; return tdc; } @@ -229,14 +229,14 @@ static const struct attribute_group throttle_group_attrs = { .attrs = throttle_attrs, }; -static void gt_throttle_sysfs_fini(struct drm_device *drm, void *arg) +static void gt_throttle_sysfs_fini(void *arg) { struct xe_gt *gt = arg; sysfs_remove_group(gt->freq, &throttle_group_attrs); } -int xe_gt_throttle_sysfs_init(struct xe_gt *gt) +int xe_gt_throttle_init(struct xe_gt *gt) { struct xe_device *xe = gt_to_xe(gt); int err; @@ -245,5 +245,5 @@ int xe_gt_throttle_sysfs_init(struct xe_gt *gt) if (err) return err; - return drmm_add_action_or_reset(&xe->drm, gt_throttle_sysfs_fini, gt); + return devm_add_action_or_reset(xe->drm.dev, gt_throttle_sysfs_fini, gt); } diff --git a/drivers/gpu/drm/xe/xe_gt_throttle.h b/drivers/gpu/drm/xe/xe_gt_throttle.h new file mode 100644 index 000000000000..02277494715d --- /dev/null +++ b/drivers/gpu/drm/xe/xe_gt_throttle.h @@ -0,0 +1,17 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2023 Intel Corporation + */ + +#ifndef _XE_GT_THROTTLE_H_ +#define _XE_GT_THROTTLE_H_ + +#include <linux/types.h> + +struct xe_gt; + +int xe_gt_throttle_init(struct xe_gt *gt); + +u32 xe_gt_throttle_get_limit_reasons(struct xe_gt *gt); + +#endif /* _XE_GT_THROTTLE_H_ */ diff --git a/drivers/gpu/drm/xe/xe_gt_throttle_sysfs.h b/drivers/gpu/drm/xe/xe_gt_throttle_sysfs.h deleted file mode 100644 index 6c61e6f228a8..000000000000 --- a/drivers/gpu/drm/xe/xe_gt_throttle_sysfs.h +++ /dev/null @@ -1,16 +0,0 @@ -/* SPDX-License-Identifier: MIT */ -/* - * Copyright © 2023 Intel Corporation - */ - -#ifndef _XE_GT_THROTTLE_SYSFS_H_ -#define _XE_GT_THROTTLE_SYSFS_H_ - -#include <drm/drm_managed.h> - -struct xe_gt; - -int xe_gt_throttle_sysfs_init(struct xe_gt *gt); - -#endif /* _XE_GT_THROTTLE_SYSFS_H_ */ - diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c index 93df2d7969b3..105797776a6c 100644 --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c @@ -7,6 +7,7 @@ #include "abi/guc_actions_abi.h" #include "xe_device.h" +#include "xe_force_wake.h" #include "xe_gt.h" #include 
"xe_gt_printk.h" #include "xe_guc.h" @@ -245,7 +246,7 @@ int xe_gt_tlb_invalidation_ggtt(struct xe_gt *gt) return seqno; xe_gt_tlb_invalidation_wait(gt, seqno); - } else if (xe_device_uc_enabled(xe)) { + } else if (xe_device_uc_enabled(xe) && !xe_device_wedged(xe)) { xe_gt_WARN_ON(gt, xe_force_wake_get(gt_to_fw(gt), XE_FW_GT)); if (xe->info.platform == XE_PVC || GRAPHICS_VER(xe) >= 20) { xe_mmio_write32(gt, PVC_GUC_TLB_INV_DESC1, @@ -263,11 +264,15 @@ int xe_gt_tlb_invalidation_ggtt(struct xe_gt *gt) } /** - * xe_gt_tlb_invalidation_vma - Issue a TLB invalidation on this GT for a VMA + * xe_gt_tlb_invalidation_range - Issue a TLB invalidation on this GT for an + * address range + * * @gt: graphics tile * @fence: invalidation fence which will be signal on TLB invalidation * completion, can be NULL - * @vma: VMA to invalidate + * @start: start address + * @end: end address + * @asid: address space id * * Issue a range based TLB invalidation if supported, if not fallback to a full * TLB invalidation. Completion of TLB is asynchronous and caller can either use @@ -277,17 +282,15 @@ int xe_gt_tlb_invalidation_ggtt(struct xe_gt *gt) * Return: Seqno which can be passed to xe_gt_tlb_invalidation_wait on success, * negative error code on error. */ -int xe_gt_tlb_invalidation_vma(struct xe_gt *gt, - struct xe_gt_tlb_invalidation_fence *fence, - struct xe_vma *vma) +int xe_gt_tlb_invalidation_range(struct xe_gt *gt, + struct xe_gt_tlb_invalidation_fence *fence, + u64 start, u64 end, u32 asid) { struct xe_device *xe = gt_to_xe(gt); #define MAX_TLB_INVALIDATION_LEN 7 u32 action[MAX_TLB_INVALIDATION_LEN]; int len = 0; - xe_gt_assert(gt, vma); - /* Execlists not supported */ if (gt_to_xe(gt)->info.force_execlist) { if (fence) @@ -301,9 +304,9 @@ int xe_gt_tlb_invalidation_vma(struct xe_gt *gt, if (!xe->info.has_range_tlb_invalidation) { action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL); } else { - u64 start = xe_vma_start(vma); - u64 length = xe_vma_size(vma); - u64 align, end; + u64 orig_start = start; + u64 length = end - start; + u64 align; if (length < SZ_4K) length = SZ_4K; @@ -315,12 +318,12 @@ int xe_gt_tlb_invalidation_vma(struct xe_gt *gt, * address mask covering the required range. 
*/ align = roundup_pow_of_two(length); - start = ALIGN_DOWN(xe_vma_start(vma), align); - end = ALIGN(xe_vma_end(vma), align); + start = ALIGN_DOWN(start, align); + end = ALIGN(end, align); length = align; while (start + length < end) { length <<= 1; - start = ALIGN_DOWN(xe_vma_start(vma), length); + start = ALIGN_DOWN(orig_start, length); } /* @@ -329,16 +332,17 @@ int xe_gt_tlb_invalidation_vma(struct xe_gt *gt, */ if (length >= SZ_2M) { length = max_t(u64, SZ_16M, length); - start = ALIGN_DOWN(xe_vma_start(vma), length); + start = ALIGN_DOWN(orig_start, length); } xe_gt_assert(gt, length >= SZ_4K); xe_gt_assert(gt, is_power_of_2(length)); - xe_gt_assert(gt, !(length & GENMASK(ilog2(SZ_16M) - 1, ilog2(SZ_2M) + 1))); + xe_gt_assert(gt, !(length & GENMASK(ilog2(SZ_16M) - 1, + ilog2(SZ_2M) + 1))); xe_gt_assert(gt, IS_ALIGNED(start, length)); action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE); - action[len++] = xe_vma_vm(vma)->usm.asid; + action[len++] = asid; action[len++] = lower_32_bits(start); action[len++] = upper_32_bits(start); action[len++] = ilog2(length) - ilog2(SZ_4K); @@ -350,6 +354,32 @@ int xe_gt_tlb_invalidation_vma(struct xe_gt *gt, } /** + * xe_gt_tlb_invalidation_vma - Issue a TLB invalidation on this GT for a VMA + * @gt: graphics tile + * @fence: invalidation fence which will be signal on TLB invalidation + * completion, can be NULL + * @vma: VMA to invalidate + * + * Issue a range based TLB invalidation if supported, if not fallback to a full + * TLB invalidation. Completion of TLB is asynchronous and caller can either use + * the invalidation fence or seqno + xe_gt_tlb_invalidation_wait to wait for + * completion. + * + * Return: Seqno which can be passed to xe_gt_tlb_invalidation_wait on success, + * negative error code on error. 
+ */ +int xe_gt_tlb_invalidation_vma(struct xe_gt *gt, + struct xe_gt_tlb_invalidation_fence *fence, + struct xe_vma *vma) +{ + xe_gt_assert(gt, vma); + + return xe_gt_tlb_invalidation_range(gt, fence, xe_vma_start(vma), + xe_vma_end(vma), + xe_vma_vm(vma)->usm.asid); +} + +/** * xe_gt_tlb_invalidation_wait - Wait for TLB to complete * @gt: graphics tile * @seqno: seqno to wait which was returned from xe_gt_tlb_invalidation diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h index fbb743d80d2c..bf3bebd9f985 100644 --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h @@ -20,6 +20,9 @@ int xe_gt_tlb_invalidation_ggtt(struct xe_gt *gt); int xe_gt_tlb_invalidation_vma(struct xe_gt *gt, struct xe_gt_tlb_invalidation_fence *fence, struct xe_vma *vma); +int xe_gt_tlb_invalidation_range(struct xe_gt *gt, + struct xe_gt_tlb_invalidation_fence *fence, + u64 start, u64 end, u32 asid); int xe_gt_tlb_invalidation_wait(struct xe_gt *gt, int seqno); int xe_guc_tlb_invalidation_done_handler(struct xe_guc *guc, u32 *msg, u32 len); diff --git a/drivers/gpu/drm/xe/xe_gt_topology.c b/drivers/gpu/drm/xe/xe_gt_topology.c index 3733e7a6860d..25ff03ab8448 100644 --- a/drivers/gpu/drm/xe/xe_gt_topology.c +++ b/drivers/gpu/drm/xe/xe_gt_topology.c @@ -108,7 +108,9 @@ gen_l3_mask_from_pattern(struct xe_device *xe, xe_l3_bank_mask_t dst, { unsigned long bit; - xe_assert(xe, fls(mask) <= patternbits); + xe_assert(xe, find_last_bit(pattern, XE_MAX_L3_BANK_MASK_BITS) < patternbits || + bitmap_empty(pattern, XE_MAX_L3_BANK_MASK_BITS)); + xe_assert(xe, !mask || patternbits * (__fls(mask) + 1) <= XE_MAX_L3_BANK_MASK_BITS); for_each_set_bit(bit, &mask, 32) { xe_l3_bank_mask_t shifted_pattern = {}; @@ -278,3 +280,13 @@ bool xe_gt_topology_has_dss_in_quadrant(struct xe_gt *gt, int quad) return quad_first < (quad + 1) * dss_per_quad; } + +bool xe_gt_has_geometry_dss(struct xe_gt *gt, unsigned int dss) +{ + return test_bit(dss, gt->fuse_topo.g_dss_mask); +} + +bool xe_gt_has_compute_dss(struct xe_gt *gt, unsigned int dss) +{ + return test_bit(dss, gt->fuse_topo.c_dss_mask); +} diff --git a/drivers/gpu/drm/xe/xe_gt_topology.h b/drivers/gpu/drm/xe/xe_gt_topology.h index b3e357777a6e..746b325bbf6e 100644 --- a/drivers/gpu/drm/xe/xe_gt_topology.h +++ b/drivers/gpu/drm/xe/xe_gt_topology.h @@ -33,4 +33,7 @@ bool xe_dss_mask_empty(const xe_dss_mask_t mask); bool xe_gt_topology_has_dss_in_quadrant(struct xe_gt *gt, int quad); +bool xe_gt_has_geometry_dss(struct xe_gt *gt, unsigned int dss); +bool xe_gt_has_compute_dss(struct xe_gt *gt, unsigned int dss); + #endif /* _XE_GT_TOPOLOGY_H_ */ diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h index cfdc761ff7f4..10a9a9529377 100644 --- a/drivers/gpu/drm/xe/xe_gt_types.h +++ b/drivers/gpu/drm/xe/xe_gt_types.h @@ -9,6 +9,7 @@ #include "xe_force_wake_types.h" #include "xe_gt_idle_types.h" #include "xe_gt_sriov_pf_types.h" +#include "xe_gt_sriov_vf_types.h" #include "xe_hw_engine_types.h" #include "xe_hw_fence_types.h" #include "xe_reg_sr_types.h" @@ -110,20 +111,20 @@ struct xe_gt { struct { /** @info.type: type of GT */ enum xe_gt_type type; - /** @info.id: Unique ID of this GT within the PCI Device */ - u8 id; /** @info.reference_clock: clock frequency */ u32 reference_clock; - /** @info.engine_mask: mask of engines present on GT */ - u64 engine_mask; /** - * @info.__engine_mask: mask of engines present on GT read from - * xe_pci.c, used to fake reading the 
engine_mask from the - * hwconfig blob. + * @info.engine_mask: mask of engines present on GT. Some of + * them may be reserved in runtime and not available for user. + * See @user_engines.mask */ - u64 __engine_mask; + u64 engine_mask; /** @info.gmdid: raw GMD_ID value from hardware */ u32 gmdid; + /** @info.id: Unique ID of this GT within the PCI Device */ + u8 id; + /** @info.has_indirect_ring_state: GT has indirect ring state support */ + u8 has_indirect_ring_state:1; } info; /** @@ -147,6 +148,8 @@ struct xe_gt { union { /** @sriov.pf: PF data. Valid only if driver is running as PF */ struct xe_gt_sriov_pf pf; + /** @sriov.vf: VF data. Valid only if driver is running as VF */ + struct xe_gt_sriov_vf vf; } sriov; /** @@ -369,6 +372,21 @@ struct xe_gt { /** @wa_active.oob: bitmap with active OOB workaroudns */ unsigned long *oob; } wa_active; + + /** @user_engines: engines present in GT and available to userspace */ + struct { + /** + * @user_engines.mask: like @info->engine_mask, but take in + * consideration only engines available to userspace + */ + u64 mask; + + /** + * @user_engines.instances_per_class: aggregate per class the + * number of engines available to userspace + */ + u8 instances_per_class[XE_ENGINE_CLASS_MAX]; + } user_engines; }; #endif diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c index 240e7a4bbff1..0bf6e01b8910 100644 --- a/drivers/gpu/drm/xe/xe_guc.c +++ b/drivers/gpu/drm/xe/xe_guc.c @@ -19,8 +19,11 @@ #include "xe_force_wake.h" #include "xe_gt.h" #include "xe_gt_printk.h" +#include "xe_gt_sriov_vf.h" +#include "xe_gt_throttle.h" #include "xe_guc_ads.h" #include "xe_guc_ct.h" +#include "xe_guc_db_mgr.h" #include "xe_guc_hwconfig.h" #include "xe_guc_log.h" #include "xe_guc_pc.h" @@ -239,7 +242,7 @@ static void guc_write_params(struct xe_guc *guc) xe_mmio_write32(gt, SOFT_SCRATCH(1 + i), guc->params[i]); } -static void guc_fini(struct drm_device *drm, void *arg) +static void guc_fini_hw(void *arg) { struct xe_guc *guc = arg; struct xe_gt *gt = guc_to_gt(guc); @@ -293,6 +296,23 @@ static int xe_guc_realloc_post_hwconfig(struct xe_guc *guc) return 0; } +static int vf_guc_init(struct xe_guc *guc) +{ + int err; + + xe_guc_comm_init_early(guc); + + err = xe_guc_ct_init(&guc->ct); + if (err) + return err; + + err = xe_guc_relay_init(&guc->relay); + if (err) + return err; + + return 0; +} + int xe_guc_init(struct xe_guc *guc) { struct xe_device *xe = guc_to_xe(guc); @@ -307,6 +327,13 @@ int xe_guc_init(struct xe_guc *guc) if (!xe_uc_fw_is_enabled(&guc->fw)) return 0; + if (IS_SRIOV_VF(xe)) { + ret = vf_guc_init(guc); + if (ret) + goto out; + return 0; + } + ret = xe_guc_log_init(&guc->log); if (ret) goto out; @@ -323,7 +350,7 @@ int xe_guc_init(struct xe_guc *guc) if (ret) goto out; - ret = drmm_add_action_or_reset(&xe->drm, guc_fini, guc); + ret = devm_add_action_or_reset(xe->drm.dev, guc_fini_hw, guc); if (ret) goto out; @@ -340,6 +367,19 @@ out: return ret; } +static int vf_guc_init_post_hwconfig(struct xe_guc *guc) +{ + int err; + + err = xe_guc_submit_init(guc, xe_gt_sriov_vf_guc_ids(guc_to_gt(guc))); + if (err) + return err; + + /* XXX xe_guc_db_mgr_init not needed for now */ + + return 0; +} + /** * xe_guc_init_post_hwconfig - initialize GuC post hwconfig load * @guc: The GuC object @@ -350,12 +390,23 @@ int xe_guc_init_post_hwconfig(struct xe_guc *guc) { int ret; + if (IS_SRIOV_VF(guc_to_xe(guc))) + return vf_guc_init_post_hwconfig(guc); + ret = xe_guc_realloc_post_hwconfig(guc); if (ret) return ret; guc_init_params_post_hwconfig(guc); + ret 
= xe_guc_submit_init(guc, ~0); + if (ret) + return ret; + + ret = xe_guc_db_mgr_init(&guc->dbm, ~0); + if (ret) + return ret; + ret = xe_guc_pc_init(&guc->pc); if (ret) return ret; @@ -379,6 +430,9 @@ int xe_guc_reset(struct xe_guc *guc) xe_force_wake_assert_held(gt_to_fw(gt), XE_FW_GT); + if (IS_SRIOV_VF(gt_to_xe(gt))) + return xe_gt_sriov_vf_bootstrap(gt); + xe_mmio_write32(gt, GDRST, GRDOM_GUC); ret = xe_mmio_wait32(gt, GDRST, GRDOM_GUC, 0, 5000, &gdrst, false); @@ -451,58 +505,194 @@ static int guc_xfer_rsa(struct xe_guc *guc) return 0; } -static int guc_wait_ucode(struct xe_guc *guc) +/* + * Check a previously read GuC status register (GUC_STATUS) looking for + * known terminal states (either completion or failure) of either the + * microkernel status field or the boot ROM status field. Returns +1 for + * successful completion, -1 for failure and 0 for any intermediate state. + */ +static int guc_load_done(u32 status) { - struct xe_gt *gt = guc_to_gt(guc); - u32 status; - int ret; + u32 uk_val = REG_FIELD_GET(GS_UKERNEL_MASK, status); + u32 br_val = REG_FIELD_GET(GS_BOOTROM_MASK, status); + + switch (uk_val) { + case XE_GUC_LOAD_STATUS_READY: + return 1; + + case XE_GUC_LOAD_STATUS_ERROR_DEVID_BUILD_MISMATCH: + case XE_GUC_LOAD_STATUS_GUC_PREPROD_BUILD_MISMATCH: + case XE_GUC_LOAD_STATUS_ERROR_DEVID_INVALID_GUCTYPE: + case XE_GUC_LOAD_STATUS_HWCONFIG_ERROR: + case XE_GUC_LOAD_STATUS_DPC_ERROR: + case XE_GUC_LOAD_STATUS_EXCEPTION: + case XE_GUC_LOAD_STATUS_INIT_DATA_INVALID: + case XE_GUC_LOAD_STATUS_MPU_DATA_INVALID: + case XE_GUC_LOAD_STATUS_INIT_MMIO_SAVE_RESTORE_INVALID: + return -1; + } + + switch (br_val) { + case XE_BOOTROM_STATUS_NO_KEY_FOUND: + case XE_BOOTROM_STATUS_RSA_FAILED: + case XE_BOOTROM_STATUS_PAVPC_FAILED: + case XE_BOOTROM_STATUS_WOPCM_FAILED: + case XE_BOOTROM_STATUS_LOADLOC_FAILED: + case XE_BOOTROM_STATUS_JUMP_FAILED: + case XE_BOOTROM_STATUS_RC6CTXCONFIG_FAILED: + case XE_BOOTROM_STATUS_MPUMAP_INCORRECT: + case XE_BOOTROM_STATUS_EXCEPTION: + case XE_BOOTROM_STATUS_PROD_KEY_CHECK_FAILURE: + return -1; + } + + return 0; +} + +static s32 guc_pc_get_cur_freq(struct xe_guc_pc *guc_pc) +{ + u32 freq; + int ret = xe_guc_pc_get_cur_freq(guc_pc, &freq); + + return ret ? ret : freq; +} +/* + * Wait for the GuC to start up. + * + * Measurements indicate this should take no more than 20ms (assuming the GT + * clock is at maximum frequency). However, thermal throttling and other issues + * can prevent the clock hitting max and thus making the load take significantly + * longer. Allow up to 200ms as a safety margin for real world worst case situations. + * + * However, bugs anywhere from KMD to GuC to PCODE to fan failure in a CI farm can + * lead to even longer times. E.g. if the GT is clamped to minimum frequency then + * the load times can be in the seconds range. So the timeout is increased for debug + * builds to ensure that problems can be correctly analysed. For release builds, the + * timeout is kept short so that users don't wait forever to find out that there is a + * problem. In either case, if the load took longer than is reasonable even with some + * 'sensible' throttling, then flag a warning because something is not right. + * + * Note that there is a limit on how long an individual usleep_range() can wait for, + * hence longer waits require wrapping a shorter wait in a loop. + * + * Note that the only reason an end user should hit the shorter timeout is in case of + * extreme thermal throttling. 
And a system that is that hot during boot is probably + * dead anyway! + */ +#if defined(CONFIG_DRM_XE_DEBUG) +#define GUC_LOAD_RETRY_LIMIT 20 +#else +#define GUC_LOAD_RETRY_LIMIT 3 +#endif +#define GUC_LOAD_TIME_WARN_MS 200 + +static void guc_wait_ucode(struct xe_guc *guc) +{ + struct xe_gt *gt = guc_to_gt(guc); + struct xe_guc_pc *guc_pc = >->uc.guc.pc; + ktime_t before, after, delta; + int load_done; + u32 status = 0; + int count = 0; + u64 delta_ms; + u32 before_freq; + + before_freq = xe_guc_pc_get_act_freq(guc_pc); + before = ktime_get(); /* - * Wait for the GuC to start up. - * NB: Docs recommend not using the interrupt for completion. - * Measurements indicate this should take no more than 20ms - * (assuming the GT clock is at maximum frequency). So, a - * timeout here indicates that the GuC has failed and is unusable. - * (Higher levels of the driver may decide to reset the GuC and - * attempt the ucode load again if this happens.) - * - * FIXME: There is a known (but exceedingly unlikely) race condition - * where the asynchronous frequency management code could reduce - * the GT clock while a GuC reload is in progress (during a full - * GT reset). A fix is in progress but there are complex locking - * issues to be resolved. In the meantime bump the timeout to - * 200ms. Even at slowest clock, this should be sufficient. And - * in the working case, a larger timeout makes no difference. + * Note, can't use any kind of timing information from the call to xe_mmio_wait. + * It could return a thousand intermediate stages at random times. Instead, must + * manually track the total time taken and locally implement the timeout. */ - ret = xe_mmio_wait32(gt, GUC_STATUS, GS_UKERNEL_MASK, - FIELD_PREP(GS_UKERNEL_MASK, XE_GUC_LOAD_STATUS_READY), - 200000, &status, false); + do { + u32 last_status = status & (GS_UKERNEL_MASK | GS_BOOTROM_MASK); + int ret; - if (ret) { - xe_gt_info(gt, "GuC load failed: status = 0x%08X\n", status); - xe_gt_info(gt, "GuC status: Reset = %u, BootROM = %#X, UKernel = %#X, MIA = %#X, Auth = %#X\n", - REG_FIELD_GET(GS_MIA_IN_RESET, status), - REG_FIELD_GET(GS_BOOTROM_MASK, status), - REG_FIELD_GET(GS_UKERNEL_MASK, status), - REG_FIELD_GET(GS_MIA_MASK, status), - REG_FIELD_GET(GS_AUTH_STATUS_MASK, status)); - - if ((status & GS_BOOTROM_MASK) == GS_BOOTROM_RSA_FAILED) { - xe_gt_info(gt, "GuC firmware signature verification failed\n"); - ret = -ENOEXEC; + /* + * Wait for any change (intermediate or terminal) in the status register. + * Note, the return value is a don't care. The only failure code is timeout + * but the timeouts need to be accumulated over all the intermediate partial + * timeouts rather than allowing a huge timeout each time. So basically, need + * to treat a timeout no different to a value change. 
+ */ + ret = xe_mmio_wait32_not(gt, GUC_STATUS, GS_UKERNEL_MASK | GS_BOOTROM_MASK, + last_status, 1000 * 1000, &status, false); + if (ret < 0) + count++; + after = ktime_get(); + delta = ktime_sub(after, before); + delta_ms = ktime_to_ms(delta); + + load_done = guc_load_done(status); + if (load_done != 0) + break; + + if (delta_ms >= (GUC_LOAD_RETRY_LIMIT * 1000)) + break; + + xe_gt_dbg(gt, "load still in progress, timeouts = %d, freq = %dMHz (req %dMHz), status = 0x%08X [0x%02X/%02X]\n", + count, xe_guc_pc_get_act_freq(guc_pc), + guc_pc_get_cur_freq(guc_pc), status, + REG_FIELD_GET(GS_BOOTROM_MASK, status), + REG_FIELD_GET(GS_UKERNEL_MASK, status)); + } while (1); + + if (load_done != 1) { + u32 ukernel = REG_FIELD_GET(GS_UKERNEL_MASK, status); + u32 bootrom = REG_FIELD_GET(GS_BOOTROM_MASK, status); + + xe_gt_err(gt, "load failed: status = 0x%08X, time = %lldms, freq = %dMHz (req %dMHz), done = %d\n", + status, delta_ms, xe_guc_pc_get_act_freq(guc_pc), + guc_pc_get_cur_freq(guc_pc), load_done); + xe_gt_err(gt, "load failed: status: Reset = %d, BootROM = 0x%02X, UKernel = 0x%02X, MIA = 0x%02X, Auth = 0x%02X\n", + REG_FIELD_GET(GS_MIA_IN_RESET, status), + bootrom, ukernel, + REG_FIELD_GET(GS_MIA_MASK, status), + REG_FIELD_GET(GS_AUTH_STATUS_MASK, status)); + + switch (bootrom) { + case XE_BOOTROM_STATUS_NO_KEY_FOUND: + xe_gt_err(gt, "invalid key requested, header = 0x%08X\n", + xe_mmio_read32(gt, GUC_HEADER_INFO)); + break; + + case XE_BOOTROM_STATUS_RSA_FAILED: + xe_gt_err(gt, "firmware signature verification failed\n"); + break; + + case XE_BOOTROM_STATUS_PROD_KEY_CHECK_FAILURE: + xe_gt_err(gt, "firmware production part check failure\n"); + break; } - if (REG_FIELD_GET(GS_UKERNEL_MASK, status) == - XE_GUC_LOAD_STATUS_EXCEPTION) { - xe_gt_info(gt, "GuC firmware exception. EIP: %#x\n", - xe_mmio_read32(gt, SOFT_SCRATCH(13))); - ret = -ENXIO; + switch (ukernel) { + case XE_GUC_LOAD_STATUS_EXCEPTION: + xe_gt_err(gt, "firmware exception. EIP: %#x\n", + xe_mmio_read32(gt, SOFT_SCRATCH(13))); + break; + + case XE_GUC_LOAD_STATUS_INIT_MMIO_SAVE_RESTORE_INVALID: + xe_gt_err(gt, "illegal register in save/restore workaround list\n"); + break; + + case XE_GUC_LOAD_STATUS_HWCONFIG_START: + xe_gt_err(gt, "still extracting hwconfig table.\n"); + break; } + + xe_device_declare_wedged(gt_to_xe(gt)); + } else if (delta_ms > GUC_LOAD_TIME_WARN_MS) { + xe_gt_warn(gt, "excessive init time: %lldms! 
[status = 0x%08X, timeouts = %d]\n", + delta_ms, status, count); + xe_gt_warn(gt, "excessive init time: [freq = %dMHz (req = %dMHz), before = %dMHz, perf_limit_reasons = 0x%08X]\n", + xe_guc_pc_get_act_freq(guc_pc), guc_pc_get_cur_freq(guc_pc), + before_freq, xe_gt_throttle_get_limit_reasons(gt)); } else { - xe_gt_dbg(gt, "GuC successfully loaded\n"); + xe_gt_dbg(gt, "init took %lldms, freq = %dMHz (req = %dMHz), before = %dMHz, status = 0x%08X, timeouts = %d\n", + delta_ms, xe_guc_pc_get_act_freq(guc_pc), guc_pc_get_cur_freq(guc_pc), + before_freq, status, count); } - - return ret; } static int __xe_guc_upload(struct xe_guc *guc) @@ -532,9 +722,7 @@ static int __xe_guc_upload(struct xe_guc *guc) goto out; /* Wait for authentication */ - ret = guc_wait_ucode(guc); - if (ret) - goto out; + guc_wait_ucode(guc); xe_uc_fw_change_status(&guc->fw, XE_UC_FIRMWARE_RUNNING); return 0; @@ -544,6 +732,38 @@ out: return 0 /* FIXME: ret, don't want to stop load currently */; } +static int vf_guc_min_load_for_hwconfig(struct xe_guc *guc) +{ + struct xe_gt *gt = guc_to_gt(guc); + int ret; + + ret = xe_gt_sriov_vf_bootstrap(gt); + if (ret) + return ret; + + ret = xe_gt_sriov_vf_query_config(gt); + if (ret) + return ret; + + ret = xe_guc_hwconfig_init(guc); + if (ret) + return ret; + + ret = xe_guc_enable_communication(guc); + if (ret) + return ret; + + ret = xe_gt_sriov_vf_connect(gt); + if (ret) + return ret; + + ret = xe_gt_sriov_vf_query_runtime(gt); + if (ret) + return ret; + + return 0; +} + /** * xe_guc_min_load_for_hwconfig - load minimal GuC and read hwconfig table * @guc: The GuC object @@ -559,6 +779,9 @@ int xe_guc_min_load_for_hwconfig(struct xe_guc *guc) { int ret; + if (IS_SRIOV_VF(guc_to_xe(guc))) + return vf_guc_min_load_for_hwconfig(guc); + xe_guc_ads_populate_minimal(&guc->ads); /* Raise GT freq to speed up HuC/GuC load */ @@ -891,17 +1114,11 @@ void xe_guc_stop_prepare(struct xe_guc *guc) XE_WARN_ON(xe_guc_pc_stop(&guc->pc)); } -int xe_guc_stop(struct xe_guc *guc) +void xe_guc_stop(struct xe_guc *guc) { - int ret; - xe_guc_ct_stop(&guc->ct); - ret = xe_guc_submit_stop(guc); - if (ret) - return ret; - - return 0; + xe_guc_submit_stop(guc); } int xe_guc_start(struct xe_guc *guc) @@ -950,30 +1167,3 @@ void xe_guc_print_info(struct xe_guc *guc, struct drm_printer *p) xe_guc_ct_print(&guc->ct, p, false); xe_guc_submit_print(guc, p); } - -/** - * xe_guc_in_reset() - Detect if GuC MIA is in reset. - * @guc: The GuC object - * - * This function detects runtime resume from d3cold by leveraging - * GUC_STATUS, GUC doesn't get reset during d3hot, - * it strictly to be called from RPM resume handler. - * - * Return: true if failed to get forcewake or GuC MIA is in Reset, - * otherwise false. 
- */ -bool xe_guc_in_reset(struct xe_guc *guc) -{ - struct xe_gt *gt = guc_to_gt(guc); - u32 status; - int err; - - err = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); - if (err) - return true; - - status = xe_mmio_read32(gt, GUC_STATUS); - xe_force_wake_put(gt_to_fw(gt), XE_FW_GT); - - return status & GS_MIA_IN_RESET; -} diff --git a/drivers/gpu/drm/xe/xe_guc.h b/drivers/gpu/drm/xe/xe_guc.h index 94f2dc5f6f90..af59c9545753 100644 --- a/drivers/gpu/drm/xe/xe_guc.h +++ b/drivers/gpu/drm/xe/xe_guc.h @@ -35,9 +35,8 @@ void xe_guc_print_info(struct xe_guc *guc, struct drm_printer *p); int xe_guc_reset_prepare(struct xe_guc *guc); void xe_guc_reset_wait(struct xe_guc *guc); void xe_guc_stop_prepare(struct xe_guc *guc); -int xe_guc_stop(struct xe_guc *guc); +void xe_guc_stop(struct xe_guc *guc); int xe_guc_start(struct xe_guc *guc); -bool xe_guc_in_reset(struct xe_guc *guc); static inline u16 xe_engine_class_to_guc_class(enum xe_engine_class class) { diff --git a/drivers/gpu/drm/xe/xe_guc_ads.c b/drivers/gpu/drm/xe/xe_guc_ads.c index 7f5a523795c8..1c60b685dbc6 100644 --- a/drivers/gpu/drm/xe/xe_guc_ads.c +++ b/drivers/gpu/drm/xe/xe_guc_ads.c @@ -9,13 +9,16 @@ #include <generated/xe_wa_oob.h> +#include "abi/guc_actions_abi.h" #include "regs/xe_engine_regs.h" #include "regs/xe_gt_regs.h" #include "regs/xe_guc_regs.h" #include "xe_bo.h" #include "xe_gt.h" #include "xe_gt_ccs_mode.h" +#include "xe_gt_printk.h" #include "xe_guc.h" +#include "xe_guc_ct.h" #include "xe_hw_engine.h" #include "xe_lrc.h" #include "xe_map.h" @@ -265,7 +268,6 @@ static u32 engine_enable_mask(struct xe_gt *gt, enum xe_engine_class class) static size_t calculate_golden_lrc_size(struct xe_guc_ads *ads) { - struct xe_device *xe = ads_to_xe(ads); struct xe_gt *gt = ads_to_gt(ads); size_t total_size = 0, alloc_size, real_size; int class; @@ -274,7 +276,7 @@ static size_t calculate_golden_lrc_size(struct xe_guc_ads *ads) if (!engine_enable_mask(gt, class)) continue; - real_size = xe_lrc_size(xe, class); + real_size = xe_gt_lrc_size(gt, class); alloc_size = PAGE_ALIGN(real_size); total_size += alloc_size; } @@ -440,11 +442,18 @@ int xe_guc_ads_init_post_hwconfig(struct xe_guc_ads *ads) static void guc_policies_init(struct xe_guc_ads *ads) { + struct xe_device *xe = ads_to_xe(ads); + u32 global_flags = 0; + ads_blob_write(ads, policies.dpc_promote_time, GLOBAL_POLICY_DEFAULT_DPC_PROMOTE_TIME_US); ads_blob_write(ads, policies.max_num_work_items, GLOBAL_POLICY_MAX_NUM_WI); - ads_blob_write(ads, policies.global_flags, 0); + + if (xe->wedged.mode == 2) + global_flags |= GLOBAL_POLICY_DISABLE_ENGINE_RESET; + + ads_blob_write(ads, policies.global_flags, global_flags); ads_blob_write(ads, policies.is_valid, 1); } @@ -765,7 +774,7 @@ static void guc_populate_golden_lrc(struct xe_guc_ads *ads) xe_gt_assert(gt, gt->default_lrc[class]); - real_size = xe_lrc_size(xe, class); + real_size = xe_gt_lrc_size(gt, class); alloc_size = PAGE_ALIGN(real_size); total_size += alloc_size; @@ -799,3 +808,57 @@ void xe_guc_ads_populate_post_load(struct xe_guc_ads *ads) { guc_populate_golden_lrc(ads); } + +static int guc_ads_action_update_policies(struct xe_guc_ads *ads, u32 policy_offset) +{ + struct xe_guc_ct *ct = &ads_to_guc(ads)->ct; + u32 action[] = { + XE_GUC_ACTION_GLOBAL_SCHED_POLICY_CHANGE, + policy_offset + }; + + return xe_guc_ct_send(ct, action, ARRAY_SIZE(action), 0, 0); +} + +/** + * xe_guc_ads_scheduler_policy_toggle_reset - Toggle reset policy + * @ads: Additional data structures object + * + * This function update the GuC's engine reset policy 
based on wedged.mode. + * + * Return: 0 on success, and negative error code otherwise. + */ +int xe_guc_ads_scheduler_policy_toggle_reset(struct xe_guc_ads *ads) +{ + struct xe_device *xe = ads_to_xe(ads); + struct xe_gt *gt = ads_to_gt(ads); + struct xe_tile *tile = gt_to_tile(gt); + struct guc_policies *policies; + struct xe_bo *bo; + int ret = 0; + + policies = kmalloc(sizeof(*policies), GFP_KERNEL); + if (!policies) + return -ENOMEM; + + policies->dpc_promote_time = ads_blob_read(ads, policies.dpc_promote_time); + policies->max_num_work_items = ads_blob_read(ads, policies.max_num_work_items); + policies->is_valid = 1; + if (xe->wedged.mode == 2) + policies->global_flags |= GLOBAL_POLICY_DISABLE_ENGINE_RESET; + else + policies->global_flags &= ~GLOBAL_POLICY_DISABLE_ENGINE_RESET; + + bo = xe_managed_bo_create_from_data(xe, tile, policies, sizeof(struct guc_policies), + XE_BO_FLAG_VRAM_IF_DGFX(tile) | + XE_BO_FLAG_GGTT); + if (IS_ERR(bo)) { + ret = PTR_ERR(bo); + goto out; + } + + ret = guc_ads_action_update_policies(ads, xe_bo_ggtt_addr(bo)); +out: + kfree(policies); + return ret; +} diff --git a/drivers/gpu/drm/xe/xe_guc_ads.h b/drivers/gpu/drm/xe/xe_guc_ads.h index 138ef6267671..2e6674c760ff 100644 --- a/drivers/gpu/drm/xe/xe_guc_ads.h +++ b/drivers/gpu/drm/xe/xe_guc_ads.h @@ -6,12 +6,13 @@ #ifndef _XE_GUC_ADS_H_ #define _XE_GUC_ADS_H_ -#include "xe_guc_ads_types.h" +struct xe_guc_ads; int xe_guc_ads_init(struct xe_guc_ads *ads); int xe_guc_ads_init_post_hwconfig(struct xe_guc_ads *ads); void xe_guc_ads_populate(struct xe_guc_ads *ads); void xe_guc_ads_populate_minimal(struct xe_guc_ads *ads); void xe_guc_ads_populate_post_load(struct xe_guc_ads *ads); +int xe_guc_ads_scheduler_policy_toggle_reset(struct xe_guc_ads *ads); #endif diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c index 0151d29b3c58..c1f258348f5c 100644 --- a/drivers/gpu/drm/xe/xe_guc_ct.c +++ b/drivers/gpu/drm/xe/xe_guc_ct.c @@ -22,6 +22,7 @@ #include "xe_gt_pagefault.h" #include "xe_gt_printk.h" #include "xe_gt_sriov_pf_control.h" +#include "xe_gt_sriov_pf_monitor.h" #include "xe_gt_tlb_invalidation.h" #include "xe_guc.h" #include "xe_guc_relay.h" @@ -1071,6 +1072,9 @@ static int process_g2h_msg(struct xe_guc_ct *ct, u32 *msg, u32 len) case GUC_ACTION_GUC2PF_VF_STATE_NOTIFY: ret = xe_gt_sriov_pf_control_process_guc2pf(gt, hxg, hxg_len); break; + case GUC_ACTION_GUC2PF_ADVERSE_EVENT: + ret = xe_gt_sriov_pf_monitor_process_guc2pf(gt, hxg, hxg_len); + break; default: xe_gt_err(gt, "unexpected G2H action 0x%04x\n", action); } diff --git a/drivers/gpu/drm/xe/xe_guc_db_mgr.c b/drivers/gpu/drm/xe/xe_guc_db_mgr.c index 8d9a0287df6b..6767e8076e6b 100644 --- a/drivers/gpu/drm/xe/xe_guc_db_mgr.c +++ b/drivers/gpu/drm/xe/xe_guc_db_mgr.c @@ -106,7 +106,8 @@ int xe_guc_db_mgr_init(struct xe_guc_db_mgr *dbm, unsigned int count) if (ret) return ret; done: - xe_gt_dbg(dbm_to_gt(dbm), "using %u doorbell(s)\n", dbm->count); + xe_gt_dbg(dbm_to_gt(dbm), "using %u doorbell%s\n", + dbm->count, str_plural(dbm->count)); return 0; } diff --git a/drivers/gpu/drm/xe/xe_guc_id_mgr.c b/drivers/gpu/drm/xe/xe_guc_id_mgr.c index 0fb7c6b78c31..cd0549d0ef89 100644 --- a/drivers/gpu/drm/xe/xe_guc_id_mgr.c +++ b/drivers/gpu/drm/xe/xe_guc_id_mgr.c @@ -97,7 +97,8 @@ int xe_guc_id_mgr_init(struct xe_guc_id_mgr *idm, unsigned int limit) if (ret) return ret; - xe_gt_info(idm_to_gt(idm), "using %u GUC ID(s)\n", idm->total); + xe_gt_info(idm_to_gt(idm), "using %u GUC ID%s\n", + idm->total, str_plural(idm->total)); return 0; } diff 
--git a/drivers/gpu/drm/xe/xe_guc_klv_helpers.c b/drivers/gpu/drm/xe/xe_guc_klv_helpers.c index ceca949932a0..9d99fe266d97 100644 --- a/drivers/gpu/drm/xe/xe_guc_klv_helpers.c +++ b/drivers/gpu/drm/xe/xe_guc_klv_helpers.c @@ -8,6 +8,7 @@ #include "abi/guc_klvs_abi.h" #include "xe_guc_klv_helpers.h" +#include "xe_guc_klv_thresholds_set.h" #define make_u64(hi, lo) ((u64)((u64)(u32)(hi) << 32 | (u32)(lo))) @@ -48,6 +49,17 @@ const char *xe_guc_klv_key_to_string(u16 key) return "begin_db_id"; case GUC_KLV_VF_CFG_BEGIN_CONTEXT_ID_KEY: return "begin_ctx_id"; + + /* VF CFG threshold keys */ +#define define_threshold_key_to_string_case(TAG, NAME, ...) \ + \ + case MAKE_GUC_KLV_VF_CFG_THRESHOLD_KEY(TAG): \ + return #NAME; + + /* private: auto-generated case statements */ + MAKE_XE_GUC_KLV_THRESHOLDS_SET(define_threshold_key_to_string_case) +#undef define_threshold_key_to_string_case + default: return "(unknown)"; } diff --git a/drivers/gpu/drm/xe/xe_guc_klv_helpers.h b/drivers/gpu/drm/xe/xe_guc_klv_helpers.h index b835e0ebe6db..c676d21c173b 100644 --- a/drivers/gpu/drm/xe/xe_guc_klv_helpers.h +++ b/drivers/gpu/drm/xe/xe_guc_klv_helpers.h @@ -6,6 +6,7 @@ #ifndef _XE_GUC_KLV_HELPERS_H_ #define _XE_GUC_KLV_HELPERS_H_ +#include <linux/args.h> #include <linux/types.h> struct drm_printer; @@ -38,6 +39,18 @@ int xe_guc_klv_count(const u32 *klvs, u32 num_dwords); FIELD_PREP_CONST(GUC_KLV_0_LEN, (len))) /** + * MAKE_GUC_KLV_KEY - Prepare KLV KEY name based on unique KLV definition tag. + * @TAG: unique tag of the KLV definition + */ +#define MAKE_GUC_KLV_KEY(TAG) CONCATENATE(CONCATENATE(GUC_KLV_, TAG), _KEY) + +/** + * MAKE_GUC_KLV_LEN - Prepare KLV LEN name based on unique KLV definition tag. + * @TAG: unique tag of the KLV definition + */ +#define MAKE_GUC_KLV_LEN(TAG) CONCATENATE(CONCATENATE(GUC_KLV_, TAG), _LEN) + +/** * PREP_GUC_KLV_TAG - Prepare KLV header value based on unique KLV definition tag. * @TAG: unique tag of the KLV definition * @@ -46,6 +59,6 @@ int xe_guc_klv_count(const u32 *klvs, u32 num_dwords); * Return: value of the KLV header (u32). */ #define PREP_GUC_KLV_TAG(TAG) \ - PREP_GUC_KLV_CONST(GUC_KLV_##TAG##_KEY, GUC_KLV_##TAG##_LEN) + PREP_GUC_KLV_CONST(MAKE_GUC_KLV_KEY(TAG), MAKE_GUC_KLV_LEN(TAG)) #endif diff --git a/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set.h b/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set.h new file mode 100644 index 000000000000..da0fedbbdbaf --- /dev/null +++ b/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set.h @@ -0,0 +1,64 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2024 Intel Corporation + */ + +#ifndef _XE_GUC_KLV_THRESHOLDS_SET_H_ +#define _XE_GUC_KLV_THRESHOLDS_SET_H_ + +#include "abi/guc_klvs_abi.h" +#include "xe_guc_klv_helpers.h" +#include "xe_guc_klv_thresholds_set_types.h" + +/** + * MAKE_GUC_KLV_VF_CFG_THRESHOLD_KEY - Prepare the name of the KLV key constant. + * @TAG: unique tag of the GuC threshold KLV key. + */ +#define MAKE_GUC_KLV_VF_CFG_THRESHOLD_KEY(TAG) \ + MAKE_GUC_KLV_KEY(CONCATENATE(VF_CFG_THRESHOLD_, TAG)) + +/** + * xe_guc_klv_threshold_key_to_index - Find index of the tracked GuC threshold. + * @key: GuC threshold KLV key. + * + * This translation is automatically generated using &MAKE_XE_GUC_KLV_THRESHOLDS_SET. + * Return: index of the GuC threshold KLV or -1 if not found. + */ +static inline int xe_guc_klv_threshold_key_to_index(u32 key) +{ + switch (key) { +#define define_xe_guc_klv_threshold_key_to_index_case(TAG, ...) 
\ + \ + case MAKE_GUC_KLV_VF_CFG_THRESHOLD_KEY(TAG): \ + return MAKE_XE_GUC_KLV_THRESHOLD_INDEX(TAG); + + /* private: auto-generated case statements */ + MAKE_XE_GUC_KLV_THRESHOLDS_SET(define_xe_guc_klv_threshold_key_to_index_case) + } + return -1; +#undef define_xe_guc_klv_threshold_key_to_index_case +} + +/** + * xe_guc_klv_threshold_index_to_key - Get tracked GuC threshold KLV key. + * @index: GuC threshold KLV index. + * + * This translation is automatically generated using &MAKE_XE_GUC_KLV_THRESHOLDS_SET. + * Return: key of the GuC threshold KLV or 0 on malformed index. + */ +static inline u32 xe_guc_klv_threshold_index_to_key(enum xe_guc_klv_threshold_index index) +{ + switch (index) { +#define define_xe_guc_klv_threshold_index_to_key_case(TAG, ...) \ + \ + case MAKE_XE_GUC_KLV_THRESHOLD_INDEX(TAG): \ + return MAKE_GUC_KLV_VF_CFG_THRESHOLD_KEY(TAG); + + /* private: auto-generated case statements */ + MAKE_XE_GUC_KLV_THRESHOLDS_SET(define_xe_guc_klv_threshold_index_to_key_case) + } + return 0; /* unreachable */ +#undef define_xe_guc_klv_threshold_index_to_key_case +} + +#endif diff --git a/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h b/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h new file mode 100644 index 000000000000..0a028c94756d --- /dev/null +++ b/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h @@ -0,0 +1,68 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2024 Intel Corporation + */ + +#ifndef _XE_GUC_KLV_THRESHOLDS_SET_TYPES_H_ +#define _XE_GUC_KLV_THRESHOLDS_SET_TYPES_H_ + +#include "xe_args.h" + +/** + * MAKE_XE_GUC_KLV_THRESHOLDS_SET - Generate various GuC thresholds definitions. + * @define: name of the inner macro to expand. + * + * The GuC firmware is able to monitor VF's adverse activity and will notify the + * PF driver once any threshold is exceeded. + * + * This super macro allows various conversions between the GuC adverse event + * threshold KLV definitions and the driver code without repeating similar code + * or risking missing some cases. + * + * For each GuC threshold definition, the inner macro &define will be provided + * with the &TAG, that corresponds to the GuC threshold KLV key name defined by + * ABI and the associated &NAME, that may be used in code or debugfs/sysfs:: + * + * define(TAG, NAME) + */ +#define MAKE_XE_GUC_KLV_THRESHOLDS_SET(define) \ + define(CAT_ERR, cat_error_count) \ + define(ENGINE_RESET, engine_reset_count) \ + define(PAGE_FAULT, page_fault_count) \ + define(H2G_STORM, guc_time_us) \ + define(IRQ_STORM, irq_time_us) \ + define(DOORBELL_STORM, doorbell_time_us) \ + /* end */ + +/** + * XE_GUC_KLV_NUM_THRESHOLDS - Number of GuC thresholds KLVs. + * + * Calculated automatically using &MAKE_XE_GUC_KLV_THRESHOLDS_SET. + */ +#define XE_GUC_KLV_NUM_THRESHOLDS \ + (CALL_ARGS(COUNT_ARGS, MAKE_XE_GUC_KLV_THRESHOLDS_SET(ARGS_SEP_COMMA)) - 1) + +/** + * MAKE_XE_GUC_KLV_THRESHOLD_INDEX - Create enumerator name. + * @TAG: unique TAG of the enum xe_guc_klv_threshold_index. + */ +#define MAKE_XE_GUC_KLV_THRESHOLD_INDEX(TAG) \ + CONCATENATE(XE_GUC_KLV_THRESHOLD_INDEX_, TAG) + +/** + * enum xe_guc_klv_threshold_index - Index of the tracked GuC threshold. + * + * This enum is automatically generated using &MAKE_XE_GUC_KLV_THRESHOLDS_SET. + * All these generated enumerators will only be used by the also generated code. + */ +enum xe_guc_klv_threshold_index { +#define define_xe_guc_klv_threshold_index_enum(TAG, ...) 
\ + \ + MAKE_XE_GUC_KLV_THRESHOLD_INDEX(TAG), + + /* private: auto-generated enum definitions */ + MAKE_XE_GUC_KLV_THRESHOLDS_SET(define_xe_guc_klv_threshold_index_enum) +#undef define_xe_guc_klv_threshold_index_enum +}; + +#endif diff --git a/drivers/gpu/drm/xe/xe_guc_pc.c b/drivers/gpu/drm/xe/xe_guc_pc.c index 509649d0e65e..508f0d39b4ad 100644 --- a/drivers/gpu/drm/xe/xe_guc_pc.c +++ b/drivers/gpu/drm/xe/xe_guc_pc.c @@ -15,6 +15,7 @@ #include "regs/xe_regs.h" #include "xe_bo.h" #include "xe_device.h" +#include "xe_force_wake.h" #include "xe_gt.h" #include "xe_gt_idle.h" #include "xe_gt_sysfs.h" @@ -888,19 +889,16 @@ int xe_guc_pc_stop(struct xe_guc_pc *pc) } /** - * xe_guc_pc_fini - Finalize GuC's Power Conservation component - * @drm: DRM device + * xe_guc_pc_fini_hw - Finalize GuC's Power Conservation component * @arg: opaque pointer that should point to Xe_GuC_PC instance */ -static void xe_guc_pc_fini(struct drm_device *drm, void *arg) +static void xe_guc_pc_fini_hw(void *arg) { struct xe_guc_pc *pc = arg; struct xe_device *xe = pc_to_xe(pc); - if (xe->info.skip_guc_pc) { - xe_gt_idle_disable_c6(pc_to_gt(pc)); + if (xe_device_wedged(xe)) return; - } XE_WARN_ON(xe_force_wake_get(gt_to_fw(pc_to_gt(pc)), XE_FORCEWAKE_ALL)); XE_WARN_ON(xe_guc_pc_gucrc_disable(pc)); @@ -937,5 +935,5 @@ int xe_guc_pc_init(struct xe_guc_pc *pc) pc->bo = bo; - return drmm_add_action_or_reset(&xe->drm, xe_guc_pc_fini, pc); + return devm_add_action_or_reset(xe->drm.dev, xe_guc_pc_fini_hw, pc); } diff --git a/drivers/gpu/drm/xe/xe_guc_pc.h b/drivers/gpu/drm/xe/xe_guc_pc.h index d3680d89490e..532cac985a6d 100644 --- a/drivers/gpu/drm/xe/xe_guc_pc.h +++ b/drivers/gpu/drm/xe/xe_guc_pc.h @@ -6,7 +6,9 @@ #ifndef _XE_GUC_PC_H_ #define _XE_GUC_PC_H_ -#include "xe_guc_pc_types.h" +#include <linux/types.h> + +struct xe_guc_pc; int xe_guc_pc_init(struct xe_guc_pc *pc); int xe_guc_pc_start(struct xe_guc_pc *pc); @@ -27,4 +29,5 @@ enum xe_gt_idle_state xe_guc_pc_c_status(struct xe_guc_pc *pc); u64 xe_guc_pc_rc6_residency(struct xe_guc_pc *pc); u64 xe_guc_pc_mc6_residency(struct xe_guc_pc *pc); void xe_guc_pc_init_early(struct xe_guc_pc *pc); + #endif /* _XE_GUC_PC_H_ */ diff --git a/drivers/gpu/drm/xe/xe_guc_relay.c b/drivers/gpu/drm/xe/xe_guc_relay.c index c0a2d8d5d3b3..ade6162dc259 100644 --- a/drivers/gpu/drm/xe/xe_guc_relay.c +++ b/drivers/gpu/drm/xe/xe_guc_relay.c @@ -19,6 +19,7 @@ #include "xe_device.h" #include "xe_gt.h" #include "xe_gt_sriov_printk.h" +#include "xe_gt_sriov_pf_service.h" #include "xe_guc.h" #include "xe_guc_ct.h" #include "xe_guc_hxg_helpers.h" @@ -664,6 +665,7 @@ static int relay_testloop_action_handler(struct xe_guc_relay *relay, u32 origin, static int relay_action_handler(struct xe_guc_relay *relay, u32 origin, const u32 *msg, u32 len, u32 *response, u32 size) { + struct xe_gt *gt = relay_to_gt(relay); u32 type; int ret; @@ -674,8 +676,10 @@ static int relay_action_handler(struct xe_guc_relay *relay, u32 origin, type = FIELD_GET(GUC_HXG_MSG_0_TYPE, msg[0]); - /* XXX: PF services will be added later */ - ret = -EOPNOTSUPP; + if (IS_SRIOV_PF(relay_to_xe(relay))) + ret = xe_gt_sriov_pf_service_process_request(gt, origin, msg, len, response, size); + else + ret = -EOPNOTSUPP; if (type == GUC_HXG_TYPE_EVENT) relay_assert(relay, ret <= 0); @@ -757,7 +761,14 @@ static void relay_process_incoming_action(struct xe_guc_relay *relay) static bool relay_needs_worker(struct xe_guc_relay *relay) { - return !list_empty(&relay->incoming_actions); + bool is_empty; + + spin_lock(&relay->lock); + is_empty = 
list_empty(&relay->incoming_actions); + spin_unlock(&relay->lock); + + return !is_empty; + } static void relay_kick_worker(struct xe_guc_relay *relay) diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c index e4e3658e6a13..47aab04cf34f 100644 --- a/drivers/gpu/drm/xe/xe_guc_submit.c +++ b/drivers/gpu/drm/xe/xe_guc_submit.c @@ -35,6 +35,7 @@ #include "xe_macros.h" #include "xe_map.h" #include "xe_mocs.h" +#include "xe_pm.h" #include "xe_ring_ops_types.h" #include "xe_sched_job.h" #include "xe_trace.h" @@ -52,13 +53,14 @@ exec_queue_to_guc(struct xe_exec_queue *q) * engine done being processed). */ #define EXEC_QUEUE_STATE_REGISTERED (1 << 0) -#define ENGINE_STATE_ENABLED (1 << 1) -#define EXEC_QUEUE_STATE_PENDING_ENABLE (1 << 2) +#define EXEC_QUEUE_STATE_ENABLED (1 << 1) +#define EXEC_QUEUE_STATE_PENDING_ENABLE (1 << 2) #define EXEC_QUEUE_STATE_PENDING_DISABLE (1 << 3) #define EXEC_QUEUE_STATE_DESTROYED (1 << 4) -#define ENGINE_STATE_SUSPENDED (1 << 5) -#define EXEC_QUEUE_STATE_RESET (1 << 6) -#define ENGINE_STATE_KILLED (1 << 7) +#define EXEC_QUEUE_STATE_SUSPENDED (1 << 5) +#define EXEC_QUEUE_STATE_RESET (1 << 6) +#define EXEC_QUEUE_STATE_KILLED (1 << 7) +#define EXEC_QUEUE_STATE_WEDGED (1 << 8) static bool exec_queue_registered(struct xe_exec_queue *q) { @@ -77,17 +79,17 @@ static void clear_exec_queue_registered(struct xe_exec_queue *q) static bool exec_queue_enabled(struct xe_exec_queue *q) { - return atomic_read(&q->guc->state) & ENGINE_STATE_ENABLED; + return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_ENABLED; } static void set_exec_queue_enabled(struct xe_exec_queue *q) { - atomic_or(ENGINE_STATE_ENABLED, &q->guc->state); + atomic_or(EXEC_QUEUE_STATE_ENABLED, &q->guc->state); } static void clear_exec_queue_enabled(struct xe_exec_queue *q) { - atomic_and(~ENGINE_STATE_ENABLED, &q->guc->state); + atomic_and(~EXEC_QUEUE_STATE_ENABLED, &q->guc->state); } static bool exec_queue_pending_enable(struct xe_exec_queue *q) @@ -142,17 +144,17 @@ static void set_exec_queue_banned(struct xe_exec_queue *q) static bool exec_queue_suspended(struct xe_exec_queue *q) { - return atomic_read(&q->guc->state) & ENGINE_STATE_SUSPENDED; + return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_SUSPENDED; } static void set_exec_queue_suspended(struct xe_exec_queue *q) { - atomic_or(ENGINE_STATE_SUSPENDED, &q->guc->state); + atomic_or(EXEC_QUEUE_STATE_SUSPENDED, &q->guc->state); } static void clear_exec_queue_suspended(struct xe_exec_queue *q) { - atomic_and(~ENGINE_STATE_SUSPENDED, &q->guc->state); + atomic_and(~EXEC_QUEUE_STATE_SUSPENDED, &q->guc->state); } static bool exec_queue_reset(struct xe_exec_queue *q) @@ -167,17 +169,28 @@ static void set_exec_queue_reset(struct xe_exec_queue *q) static bool exec_queue_killed(struct xe_exec_queue *q) { - return atomic_read(&q->guc->state) & ENGINE_STATE_KILLED; + return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_KILLED; } static void set_exec_queue_killed(struct xe_exec_queue *q) { - atomic_or(ENGINE_STATE_KILLED, &q->guc->state); + atomic_or(EXEC_QUEUE_STATE_KILLED, &q->guc->state); } -static bool exec_queue_killed_or_banned(struct xe_exec_queue *q) +static bool exec_queue_wedged(struct xe_exec_queue *q) { - return exec_queue_killed(q) || exec_queue_banned(q); + return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_WEDGED; +} + +static void set_exec_queue_wedged(struct xe_exec_queue *q) +{ + atomic_or(EXEC_QUEUE_STATE_WEDGED, &q->guc->state); +} + +static bool exec_queue_killed_or_banned_or_wedged(struct xe_exec_queue *q) +{ + 
return exec_queue_banned(q) || (atomic_read(&q->guc->state) & + (EXEC_QUEUE_STATE_WEDGED | EXEC_QUEUE_STATE_KILLED)); } #ifdef CONFIG_PROVE_LOCKING @@ -240,6 +253,17 @@ static void guc_submit_fini(struct drm_device *drm, void *arg) free_submit_wq(guc); } +static void guc_submit_wedged_fini(struct drm_device *drm, void *arg) +{ + struct xe_guc *guc = arg; + struct xe_exec_queue *q; + unsigned long index; + + xa_for_each(&guc->submission_state.exec_queue_lookup, index, q) + if (exec_queue_wedged(q)) + xe_exec_queue_put(q); +} + static const struct xe_exec_queue_ops guc_exec_queue_ops; static void primelockdep(struct xe_guc *guc) @@ -250,13 +274,25 @@ static void primelockdep(struct xe_guc *guc) fs_reclaim_acquire(GFP_KERNEL); mutex_lock(&guc->submission_state.lock); - might_lock(&guc->submission_state.suspend.lock); mutex_unlock(&guc->submission_state.lock); fs_reclaim_release(GFP_KERNEL); } -int xe_guc_submit_init(struct xe_guc *guc) +/** + * xe_guc_submit_init() - Initialize GuC submission. + * @guc: the &xe_guc to initialize + * @num_ids: number of GuC context IDs to use + * + * The bare-metal or PF driver can pass ~0 as &num_ids to indicate that all + * GuC context IDs supported by the GuC firmware should be used for submission. + * + * Only VF drivers will have to provide explicit number of GuC context IDs + * that they can use for submission. + * + * Return: 0 on success or a negative error code on failure. + */ +int xe_guc_submit_init(struct xe_guc *guc, unsigned int num_ids) { struct xe_device *xe = guc_to_xe(guc); struct xe_gt *gt = guc_to_gt(guc); @@ -266,7 +302,7 @@ int xe_guc_submit_init(struct xe_guc *guc) if (err) return err; - err = xe_guc_id_mgr_init(&guc->submission_state.idm, ~0); + err = xe_guc_id_mgr_init(&guc->submission_state.idm, num_ids); if (err) return err; @@ -278,9 +314,6 @@ int xe_guc_submit_init(struct xe_guc *guc) xa_init(&guc->submission_state.exec_queue_lookup); - spin_lock_init(&guc->submission_state.suspend.lock); - guc->submission_state.suspend.context = dma_fence_context_alloc(1); - primelockdep(guc); return drmm_add_action_or_reset(&xe->drm, guc_submit_fini, guc); @@ -430,9 +463,9 @@ static void set_min_preemption_timeout(struct xe_guc *guc, struct xe_exec_queue xe_map_wr_field(xe_, &map_, 0, struct guc_submit_parallel_scratch, \ field_, val_) -static void __register_mlrc_engine(struct xe_guc *guc, - struct xe_exec_queue *q, - struct guc_ctxt_registration_info *info) +static void __register_mlrc_exec_queue(struct xe_guc *guc, + struct xe_exec_queue *q, + struct guc_ctxt_registration_info *info) { #define MAX_MLRC_REG_SIZE (13 + XE_HW_ENGINE_MAX_INSTANCE * 2) struct xe_device *xe = guc_to_xe(guc); @@ -457,7 +490,7 @@ static void __register_mlrc_engine(struct xe_guc *guc, action[len++] = info->hwlrca_hi; for (i = 1; i < q->width; ++i) { - struct xe_lrc *lrc = q->lrc + i; + struct xe_lrc *lrc = q->lrc[i]; action[len++] = lower_32_bits(xe_lrc_descriptor(lrc)); action[len++] = upper_32_bits(xe_lrc_descriptor(lrc)); @@ -469,8 +502,8 @@ static void __register_mlrc_engine(struct xe_guc *guc, xe_guc_ct_send(&guc->ct, action, len, 0, 0); } -static void __register_engine(struct xe_guc *guc, - struct guc_ctxt_registration_info *info) +static void __register_exec_queue(struct xe_guc *guc, + struct guc_ctxt_registration_info *info) { u32 action[] = { XE_GUC_ACTION_REGISTER_CONTEXT, @@ -490,11 +523,11 @@ static void __register_engine(struct xe_guc *guc, xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0); } -static void register_engine(struct xe_exec_queue *q) 
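Side note on the EXEC_QUEUE_STATE_* rename and the new WEDGED bit above: all of these lifecycle flags live in a single atomic word, so a combined check such as exec_queue_killed_or_banned_or_wedged() costs one atomic load. The standalone sketch below mirrors that pattern using C11 stdatomic purely for illustration; the struct, flag values and helper names are invented here, and the kernel itself uses atomic_t with atomic_or()/atomic_and()/atomic_read().

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define Q_STATE_ENABLED  (1u << 1)
#define Q_STATE_KILLED   (1u << 7)
#define Q_STATE_WEDGED   (1u << 8)	/* analogous to the new wedged state */

struct queue {
	atomic_uint state;	/* all lifecycle flags packed into one word */
};

static void queue_set_wedged(struct queue *q)
{
	atomic_fetch_or(&q->state, Q_STATE_WEDGED);
}

static void queue_clear_enabled(struct queue *q)
{
	atomic_fetch_and(&q->state, ~Q_STATE_ENABLED);
}

static bool queue_killed_or_wedged(struct queue *q)
{
	/* a single atomic load answers the combined question */
	return atomic_load(&q->state) & (Q_STATE_KILLED | Q_STATE_WEDGED);
}

int main(void)
{
	struct queue q;

	atomic_init(&q.state, Q_STATE_ENABLED);
	queue_set_wedged(&q);
	queue_clear_enabled(&q);
	printf("killed_or_wedged = %d\n", queue_killed_or_wedged(&q));
	return 0;
}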
+static void register_exec_queue(struct xe_exec_queue *q) { struct xe_guc *guc = exec_queue_to_guc(q); struct xe_device *xe = guc_to_xe(guc); - struct xe_lrc *lrc = q->lrc; + struct xe_lrc *lrc = q->lrc[0]; struct guc_ctxt_registration_info info; xe_assert(xe, !exec_queue_registered(q)); @@ -538,9 +571,9 @@ static void register_engine(struct xe_exec_queue *q) set_exec_queue_registered(q); trace_xe_exec_queue_register(q); if (xe_exec_queue_is_parallel(q)) - __register_mlrc_engine(guc, q, &info); + __register_mlrc_exec_queue(guc, q, &info); else - __register_engine(guc, &info); + __register_exec_queue(guc, &info); init_policies(guc, q); } @@ -553,7 +586,7 @@ static int wq_wait_for_space(struct xe_exec_queue *q, u32 wqi_size) { struct xe_guc *guc = exec_queue_to_guc(q); struct xe_device *xe = guc_to_xe(guc); - struct iosys_map map = xe_lrc_parallel_map(q->lrc); + struct iosys_map map = xe_lrc_parallel_map(q->lrc[0]); unsigned int sleep_period_ms = 1; #define AVAILABLE_SPACE \ @@ -581,7 +614,7 @@ static int wq_noop_append(struct xe_exec_queue *q) { struct xe_guc *guc = exec_queue_to_guc(q); struct xe_device *xe = guc_to_xe(guc); - struct iosys_map map = xe_lrc_parallel_map(q->lrc); + struct iosys_map map = xe_lrc_parallel_map(q->lrc[0]); u32 len_dw = wq_space_until_wrap(q) / sizeof(u32) - 1; if (wq_wait_for_space(q, wq_space_until_wrap(q))) @@ -601,7 +634,7 @@ static void wq_item_append(struct xe_exec_queue *q) { struct xe_guc *guc = exec_queue_to_guc(q); struct xe_device *xe = guc_to_xe(guc); - struct iosys_map map = xe_lrc_parallel_map(q->lrc); + struct iosys_map map = xe_lrc_parallel_map(q->lrc[0]); #define WQ_HEADER_SIZE 4 /* Includes 1 LRC address too */ u32 wqi[XE_HW_ENGINE_MAX_INSTANCE + (WQ_HEADER_SIZE - 1)]; u32 wqi_size = (q->width + (WQ_HEADER_SIZE - 1)) * sizeof(u32); @@ -617,12 +650,12 @@ static void wq_item_append(struct xe_exec_queue *q) wqi[i++] = FIELD_PREP(WQ_TYPE_MASK, WQ_TYPE_MULTI_LRC) | FIELD_PREP(WQ_LEN_MASK, len_dw); - wqi[i++] = xe_lrc_descriptor(q->lrc); + wqi[i++] = xe_lrc_descriptor(q->lrc[0]); wqi[i++] = FIELD_PREP(WQ_GUC_ID_MASK, q->guc->id) | - FIELD_PREP(WQ_RING_TAIL_MASK, q->lrc->ring.tail / sizeof(u64)); + FIELD_PREP(WQ_RING_TAIL_MASK, q->lrc[0]->ring.tail / sizeof(u64)); wqi[i++] = 0; for (j = 1; j < q->width; ++j) { - struct xe_lrc *lrc = q->lrc + j; + struct xe_lrc *lrc = q->lrc[j]; wqi[i++] = lrc->ring.tail / sizeof(u64); } @@ -637,7 +670,7 @@ static void wq_item_append(struct xe_exec_queue *q) xe_device_wmb(xe); - map = xe_lrc_parallel_map(q->lrc); + map = xe_lrc_parallel_map(q->lrc[0]); parallel_write(xe, map, wq_desc.tail, q->guc->wqi_tail); } @@ -646,7 +679,7 @@ static void submit_exec_queue(struct xe_exec_queue *q) { struct xe_guc *guc = exec_queue_to_guc(q); struct xe_device *xe = guc_to_xe(guc); - struct xe_lrc *lrc = q->lrc; + struct xe_lrc *lrc = q->lrc[0]; u32 action[3]; u32 g2h_len = 0; u32 num_g2h = 0; @@ -658,7 +691,7 @@ static void submit_exec_queue(struct xe_exec_queue *q) if (xe_exec_queue_is_parallel(q)) wq_item_append(q); else - xe_lrc_write_ctx_reg(lrc, CTX_RING_TAIL, lrc->ring.tail); + xe_lrc_set_ring_tail(lrc, lrc->ring.tail); if (exec_queue_suspended(q) && !xe_exec_queue_is_parallel(q)) return; @@ -708,9 +741,9 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job) trace_xe_sched_job_run(job); - if (!exec_queue_killed_or_banned(q) && !xe_sched_job_is_error(job)) { + if (!exec_queue_killed_or_banned_or_wedged(q) && !xe_sched_job_is_error(job)) { if (!exec_queue_registered(q)) - register_engine(q); + register_exec_queue(q); if (!lr) 
/* LR jobs are emitted in the exec IOCTL */ q->ring_ops->emit_job(job); submit_exec_queue(q); @@ -730,6 +763,8 @@ static void guc_exec_queue_free_job(struct drm_sched_job *drm_job) { struct xe_sched_job *job = to_xe_sched_job(drm_job); + xe_exec_queue_update_run_ticks(job->q); + trace_xe_sched_job_free(job); xe_sched_job_put(job); } @@ -781,55 +816,6 @@ static void disable_scheduling_deregister(struct xe_guc *guc, G2H_LEN_DW_DEREGISTER_CONTEXT, 2); } -static void guc_exec_queue_print(struct xe_exec_queue *q, struct drm_printer *p); - -#if IS_ENABLED(CONFIG_DRM_XE_SIMPLE_ERROR_CAPTURE) -static void simple_error_capture(struct xe_exec_queue *q) -{ - struct xe_guc *guc = exec_queue_to_guc(q); - struct xe_device *xe = guc_to_xe(guc); - struct drm_printer p = drm_err_printer(&xe->drm, NULL); - struct xe_hw_engine *hwe; - enum xe_hw_engine_id id; - u32 adj_logical_mask = q->logical_mask; - u32 width_mask = (0x1 << q->width) - 1; - int i; - bool cookie; - - if (q->vm && !q->vm->error_capture.capture_once) { - q->vm->error_capture.capture_once = true; - cookie = dma_fence_begin_signalling(); - for (i = 0; q->width > 1 && i < XE_HW_ENGINE_MAX_INSTANCE;) { - if (adj_logical_mask & BIT(i)) { - adj_logical_mask |= width_mask << i; - i += q->width; - } else { - ++i; - } - } - - if (xe_force_wake_get(gt_to_fw(guc_to_gt(guc)), XE_FORCEWAKE_ALL)) - xe_gt_info(guc_to_gt(guc), - "failed to get forcewake for error capture"); - xe_guc_ct_print(&guc->ct, &p, true); - guc_exec_queue_print(q, &p); - for_each_hw_engine(hwe, guc_to_gt(guc), id) { - if (hwe->class != q->hwe->class || - !(BIT(hwe->logical_instance) & adj_logical_mask)) - continue; - xe_hw_engine_print(hwe, &p); - } - xe_analyze_vm(&p, q->vm, q->gt->info.id); - xe_force_wake_put(gt_to_fw(guc_to_gt(guc)), XE_FORCEWAKE_ALL); - dma_fence_end_signalling(cookie); - } -} -#else -static void simple_error_capture(struct xe_exec_queue *q) -{ -} -#endif - static void xe_guc_exec_queue_trigger_cleanup(struct xe_exec_queue *q) { struct xe_guc *guc = exec_queue_to_guc(q); @@ -844,6 +830,40 @@ static void xe_guc_exec_queue_trigger_cleanup(struct xe_exec_queue *q) xe_sched_tdr_queue_imm(&q->guc->sched); } +static bool guc_submit_hint_wedged(struct xe_guc *guc) +{ + struct xe_device *xe = guc_to_xe(guc); + struct xe_exec_queue *q; + unsigned long index; + int err; + + if (xe->wedged.mode != 2) + return false; + + if (xe_device_wedged(xe)) + return true; + + xe_device_declare_wedged(xe); + + xe_guc_submit_reset_prepare(guc); + xe_guc_ct_stop(&guc->ct); + + err = drmm_add_action_or_reset(&guc_to_xe(guc)->drm, + guc_submit_wedged_fini, guc); + if (err) { + drm_err(&xe->drm, "Failed to register xe_guc_submit clean-up on wedged.mode=2. 
Although device is wedged.\n"); + return true; /* Device is wedged anyway */ + } + + mutex_lock(&guc->submission_state.lock); + xa_for_each(&guc->submission_state.exec_queue_lookup, index, q) + if (xe_exec_queue_get_unless_zero(q)) + set_exec_queue_wedged(q); + mutex_unlock(&guc->submission_state.lock); + + return true; +} + static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w) { struct xe_guc_exec_queue *ge = @@ -852,10 +872,13 @@ static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w) struct xe_guc *guc = exec_queue_to_guc(q); struct xe_device *xe = guc_to_xe(guc); struct xe_gpu_scheduler *sched = &ge->sched; + bool wedged; xe_assert(xe, xe_exec_queue_is_lr(q)); trace_xe_exec_queue_lr_cleanup(q); + wedged = guc_submit_hint_wedged(exec_queue_to_guc(q)); + /* Kill the run_job / process_msg entry points */ xe_sched_submission_stop(sched); @@ -870,7 +893,7 @@ static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w) * xe_guc_deregister_done_handler() which treats it as an unexpected * state. */ - if (exec_queue_registered(q) && !exec_queue_destroyed(q)) { + if (!wedged && exec_queue_registered(q) && !exec_queue_destroyed(q)) { struct xe_guc *guc = exec_queue_to_guc(q); int ret; @@ -905,6 +928,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job) struct xe_device *xe = guc_to_xe(exec_queue_to_guc(q)); int err = -ETIME; int i = 0; + bool wedged; /* * TDR has fired before free job worker. Common if exec queue @@ -916,18 +940,21 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job) return DRM_GPU_SCHED_STAT_NOMINAL; } - drm_notice(&xe->drm, "Timedout job: seqno=%u, guc_id=%d, flags=0x%lx", - xe_sched_job_seqno(job), q->guc->id, q->flags); + drm_notice(&xe->drm, "Timedout job: seqno=%u, lrc_seqno=%u, guc_id=%d, flags=0x%lx", + xe_sched_job_seqno(job), xe_sched_job_lrc_seqno(job), + q->guc->id, q->flags); xe_gt_WARN(q->gt, q->flags & EXEC_QUEUE_FLAG_KERNEL, "Kernel-submitted job timed out\n"); xe_gt_WARN(q->gt, q->flags & EXEC_QUEUE_FLAG_VM && !exec_queue_killed(q), "VM job timed out on non-killed execqueue\n"); - simple_error_capture(q); - xe_devcoredump(job); + if (!exec_queue_killed(q)) + xe_devcoredump(job); trace_xe_sched_job_timedout(job); + wedged = guc_submit_hint_wedged(exec_queue_to_guc(q)); + /* Kill the run_job entry point */ xe_sched_submission_stop(sched); @@ -935,8 +962,8 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job) * Kernel jobs should never fail, nor should VM jobs if they do * somethings has gone wrong and the GT needs a reset */ - if (q->flags & EXEC_QUEUE_FLAG_KERNEL || - (q->flags & EXEC_QUEUE_FLAG_VM && !exec_queue_killed(q))) { + if (!wedged && (q->flags & EXEC_QUEUE_FLAG_KERNEL || + (q->flags & EXEC_QUEUE_FLAG_VM && !exec_queue_killed(q)))) { if (!xe_sched_invalidate_job(job, 2)) { xe_sched_add_pending_job(sched, job); xe_sched_submission_start(sched); @@ -946,7 +973,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job) } /* Engine state now stable, disable scheduling if needed */ - if (exec_queue_registered(q)) { + if (!wedged && exec_queue_registered(q)) { struct xe_guc *guc = exec_queue_to_guc(q); int ret; @@ -989,6 +1016,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job) */ xe_sched_add_pending_job(sched, job); xe_sched_submission_start(sched); + xe_guc_exec_queue_trigger_cleanup(q); /* Mark all outstanding jobs as bad, thus completing them */ @@ -1011,6 +1039,7 @@ static void __guc_exec_queue_fini_async(struct work_struct *w) struct xe_exec_queue *q = ge->q; struct xe_guc *guc = 
exec_queue_to_guc(q); + xe_pm_runtime_get(guc_to_xe(guc)); trace_xe_exec_queue_destroy(q); if (xe_exec_queue_is_lr(q)) @@ -1021,6 +1050,7 @@ static void __guc_exec_queue_fini_async(struct work_struct *w) kfree(ge); xe_exec_queue_fini(q); + xe_pm_runtime_put(guc_to_xe(guc)); } static void guc_exec_queue_fini_async(struct xe_exec_queue *q) @@ -1028,7 +1058,7 @@ static void guc_exec_queue_fini_async(struct xe_exec_queue *q) INIT_WORK(&q->guc->fini_async, __guc_exec_queue_fini_async); /* We must block on kernel engines so slabs are empty on driver unload */ - if (q->flags & EXEC_QUEUE_FLAG_PERMANENT) + if (q->flags & EXEC_QUEUE_FLAG_PERMANENT || exec_queue_wedged(q)) __guc_exec_queue_fini_async(&q->guc->fini_async); else queue_work(system_wq, &q->guc->fini_async); @@ -1063,7 +1093,7 @@ static void __guc_exec_queue_process_msg_cleanup(struct xe_sched_msg *msg) static bool guc_exec_queue_allowed_to_change_state(struct xe_exec_queue *q) { - return !exec_queue_killed_or_banned(q) && exec_queue_registered(q); + return !exec_queue_killed_or_banned_or_wedged(q) && exec_queue_registered(q); } static void __guc_exec_queue_process_msg_set_sched_props(struct xe_sched_msg *msg) @@ -1206,7 +1236,7 @@ static int guc_exec_queue_init(struct xe_exec_queue *q) msecs_to_jiffies(q->sched_props.job_timeout_ms); err = xe_sched_init(&ge->sched, &drm_sched_ops, &xe_sched_ops, get_submit_wq(guc), - q->lrc[0].ring.size / MAX_JOB_SIZE_BYTES, 64, + q->lrc[0]->ring.size / MAX_JOB_SIZE_BYTES, 64, timeout, guc_to_gt(guc)->ordered_wq, NULL, q->name, gt_to_xe(q->gt)->drm.dev); if (err) @@ -1275,7 +1305,7 @@ static void guc_exec_queue_fini(struct xe_exec_queue *q) { struct xe_sched_msg *msg = q->guc->static_msgs + STATIC_MSG_CLEANUP; - if (!(q->flags & EXEC_QUEUE_FLAG_PERMANENT)) + if (!(q->flags & EXEC_QUEUE_FLAG_PERMANENT) && !exec_queue_wedged(q)) guc_exec_queue_add_msg(q, msg, CLEANUP); else __guc_exec_queue_fini(exec_queue_to_guc(q), q); @@ -1286,7 +1316,8 @@ static int guc_exec_queue_set_priority(struct xe_exec_queue *q, { struct xe_sched_msg *msg; - if (q->sched_props.priority == priority || exec_queue_killed_or_banned(q)) + if (q->sched_props.priority == priority || + exec_queue_killed_or_banned_or_wedged(q)) return 0; msg = kmalloc(sizeof(*msg), GFP_KERNEL); @@ -1304,7 +1335,7 @@ static int guc_exec_queue_set_timeslice(struct xe_exec_queue *q, u32 timeslice_u struct xe_sched_msg *msg; if (q->sched_props.timeslice_us == timeslice_us || - exec_queue_killed_or_banned(q)) + exec_queue_killed_or_banned_or_wedged(q)) return 0; msg = kmalloc(sizeof(*msg), GFP_KERNEL); @@ -1323,7 +1354,7 @@ static int guc_exec_queue_set_preempt_timeout(struct xe_exec_queue *q, struct xe_sched_msg *msg; if (q->sched_props.preempt_timeout_us == preempt_timeout_us || - exec_queue_killed_or_banned(q)) + exec_queue_killed_or_banned_or_wedged(q)) return 0; msg = kmalloc(sizeof(*msg), GFP_KERNEL); @@ -1340,7 +1371,7 @@ static int guc_exec_queue_suspend(struct xe_exec_queue *q) { struct xe_sched_msg *msg = q->guc->static_msgs + STATIC_MSG_SUSPEND; - if (exec_queue_killed_or_banned(q) || q->guc->suspend_pending) + if (exec_queue_killed_or_banned_or_wedged(q) || q->guc->suspend_pending) return -EINVAL; q->guc->suspend_pending = true; @@ -1370,7 +1401,7 @@ static void guc_exec_queue_resume(struct xe_exec_queue *q) static bool guc_exec_queue_reset_status(struct xe_exec_queue *q) { - return exec_queue_reset(q); + return exec_queue_reset(q) || exec_queue_killed_or_banned_or_wedged(q); } /* @@ -1411,7 +1442,7 @@ static void guc_exec_queue_stop(struct xe_guc 
*guc, struct xe_exec_queue *q) set_exec_queue_suspended(q); suspend_fence_signal(q); } - atomic_and(EXEC_QUEUE_STATE_DESTROYED | ENGINE_STATE_SUSPENDED, + atomic_and(EXEC_QUEUE_STATE_DESTROYED | EXEC_QUEUE_STATE_SUSPENDED, &q->guc->state); q->guc->resume_time = 0; trace_xe_exec_queue_stop(q); @@ -1423,15 +1454,23 @@ static void guc_exec_queue_stop(struct xe_guc *guc, struct xe_exec_queue *q) */ if (!(q->flags & (EXEC_QUEUE_FLAG_KERNEL | EXEC_QUEUE_FLAG_VM))) { struct xe_sched_job *job = xe_sched_first_pending_job(sched); + bool ban = false; if (job) { if ((xe_sched_job_started(job) && !xe_sched_job_completed(job)) || xe_sched_invalidate_job(job, 2)) { trace_xe_sched_job_ban(job); - xe_sched_tdr_queue_imm(&q->guc->sched); - set_exec_queue_banned(q); + ban = true; } + } else if (xe_exec_queue_is_lr(q) && + (xe_lrc_ring_head(q->lrc[0]) != xe_lrc_ring_tail(q->lrc[0]))) { + ban = true; + } + + if (ban) { + set_exec_queue_banned(q); + xe_guc_exec_queue_trigger_cleanup(q); } } } @@ -1459,7 +1498,7 @@ void xe_guc_submit_reset_wait(struct xe_guc *guc) wait_event(guc->ct.wq, !guc_read_stopped(guc)); } -int xe_guc_submit_stop(struct xe_guc *guc) +void xe_guc_submit_stop(struct xe_guc *guc) { struct xe_exec_queue *q; unsigned long index; @@ -1479,19 +1518,18 @@ int xe_guc_submit_stop(struct xe_guc *guc) * creation which is protected by guc->submission_state.lock. */ - return 0; } static void guc_exec_queue_start(struct xe_exec_queue *q) { struct xe_gpu_scheduler *sched = &q->guc->sched; - if (!exec_queue_killed_or_banned(q)) { + if (!exec_queue_killed_or_banned_or_wedged(q)) { int i; trace_xe_exec_queue_resubmit(q); for (i = 0; i < q->width; ++i) - xe_lrc_set_ring_head(q->lrc + i, q->lrc[i].ring.tail); + xe_lrc_set_ring_head(q->lrc[i], q->lrc[i]->ring.tail); xe_sched_resubmit_jobs(sched); } @@ -1643,6 +1681,7 @@ int xe_guc_deregister_done_handler(struct xe_guc *guc, u32 *msg, u32 len) int xe_guc_exec_queue_reset_handler(struct xe_guc *guc, u32 *msg, u32 len) { + struct xe_gt *gt = guc_to_gt(guc); struct xe_device *xe = guc_to_xe(guc); struct xe_exec_queue *q; u32 guc_id = msg[0]; @@ -1656,7 +1695,8 @@ int xe_guc_exec_queue_reset_handler(struct xe_guc *guc, u32 *msg, u32 len) if (unlikely(!q)) return -EPROTO; - drm_info(&xe->drm, "Engine reset: guc_id=%d", guc_id); + xe_gt_info(gt, "Engine reset: engine_class=%s, logical_mask: 0x%x, guc_id=%d", + xe_hw_engine_class_to_str(q->class), q->logical_mask, guc_id); /* FIXME: Do error capture, most likely async */ @@ -1678,6 +1718,7 @@ int xe_guc_exec_queue_reset_handler(struct xe_guc *guc, u32 *msg, u32 len) int xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32 *msg, u32 len) { + struct xe_gt *gt = guc_to_gt(guc); struct xe_device *xe = guc_to_xe(guc); struct xe_exec_queue *q; u32 guc_id = msg[0]; @@ -1691,7 +1732,9 @@ int xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32 *msg, if (unlikely(!q)) return -EPROTO; - drm_dbg(&xe->drm, "Engine memory cat error: guc_id=%d", guc_id); + xe_gt_dbg(gt, "Engine memory cat error: engine_class=%s, logical_mask: 0x%x, guc_id=%d", + xe_hw_engine_class_to_str(q->class), q->logical_mask, guc_id); + trace_xe_exec_queue_memory_cat_error(q); /* Treat the same as engine reset */ @@ -1732,7 +1775,7 @@ guc_exec_queue_wq_snapshot_capture(struct xe_exec_queue *q, { struct xe_guc *guc = exec_queue_to_guc(q); struct xe_device *xe = guc_to_xe(guc); - struct iosys_map map = xe_lrc_parallel_map(q->lrc); + struct iosys_map map = xe_lrc_parallel_map(q->lrc[0]); int i; snapshot->guc.wqi_head = 
q->guc->wqi_head; @@ -1812,7 +1855,7 @@ xe_guc_exec_queue_snapshot_capture(struct xe_exec_queue *q) if (snapshot->lrc) { for (i = 0; i < q->width; ++i) { - struct xe_lrc *lrc = q->lrc + i; + struct xe_lrc *lrc = q->lrc[i]; snapshot->lrc[i] = xe_lrc_snapshot_capture(lrc); } diff --git a/drivers/gpu/drm/xe/xe_guc_submit.h b/drivers/gpu/drm/xe/xe_guc_submit.h index fad0421ead36..4ad5f4c1b084 100644 --- a/drivers/gpu/drm/xe/xe_guc_submit.h +++ b/drivers/gpu/drm/xe/xe_guc_submit.h @@ -12,11 +12,11 @@ struct drm_printer; struct xe_exec_queue; struct xe_guc; -int xe_guc_submit_init(struct xe_guc *guc); +int xe_guc_submit_init(struct xe_guc *guc, unsigned int num_ids); int xe_guc_submit_reset_prepare(struct xe_guc *guc); void xe_guc_submit_reset_wait(struct xe_guc *guc); -int xe_guc_submit_stop(struct xe_guc *guc); +void xe_guc_submit_stop(struct xe_guc *guc); int xe_guc_submit_start(struct xe_guc *guc); int xe_guc_sched_done_handler(struct xe_guc *guc, u32 *msg, u32 len); diff --git a/drivers/gpu/drm/xe/xe_guc_types.h b/drivers/gpu/drm/xe/xe_guc_types.h index 82bd93f7867d..546ac6350a31 100644 --- a/drivers/gpu/drm/xe/xe_guc_types.h +++ b/drivers/gpu/drm/xe/xe_guc_types.h @@ -72,15 +72,6 @@ struct xe_guc { atomic_t stopped; /** @submission_state.lock: protects submission state */ struct mutex lock; - /** @submission_state.suspend: suspend fence state */ - struct { - /** @submission_state.suspend.lock: suspend fences lock */ - spinlock_t lock; - /** @submission_state.suspend.context: suspend fences context */ - u64 context; - /** @submission_state.suspend.seqno: suspend fences seqno */ - u32 seqno; - } suspend; #ifdef CONFIG_PROVE_LOCKING #define NUM_SUBMIT_WQ 256 /** @submission_state.submit_wq_pool: submission ordered workqueues pool */ diff --git a/drivers/gpu/drm/xe/xe_huc.c b/drivers/gpu/drm/xe/xe_huc.c index 39a484a57585..b039ff49341b 100644 --- a/drivers/gpu/drm/xe/xe_huc.c +++ b/drivers/gpu/drm/xe/xe_huc.c @@ -5,6 +5,8 @@ #include "xe_huc.h" +#include <linux/delay.h> + #include <drm/drm_managed.h> #include "abi/gsc_pxp_commands_abi.h" diff --git a/drivers/gpu/drm/xe/xe_huc.h b/drivers/gpu/drm/xe/xe_huc.h index 3ab56cc14b00..fa1c45e70443 100644 --- a/drivers/gpu/drm/xe/xe_huc.h +++ b/drivers/gpu/drm/xe/xe_huc.h @@ -6,9 +6,10 @@ #ifndef _XE_HUC_H_ #define _XE_HUC_H_ -#include "xe_huc_types.h" +#include <linux/types.h> struct drm_printer; +struct xe_huc; enum xe_huc_auth_types { XE_HUC_AUTH_VIA_GUC = 0, diff --git a/drivers/gpu/drm/xe/xe_hw_engine.c b/drivers/gpu/drm/xe/xe_hw_engine.c index 455f375c1cbd..0a83506e1ad8 100644 --- a/drivers/gpu/drm/xe/xe_hw_engine.c +++ b/drivers/gpu/drm/xe/xe_hw_engine.c @@ -18,6 +18,7 @@ #include "xe_gt.h" #include "xe_gt_ccs_mode.h" #include "xe_gt_printk.h" +#include "xe_gt_mcr.h" #include "xe_gt_topology.h" #include "xe_hw_fence.h" #include "xe_irq.h" @@ -267,7 +268,7 @@ static void hw_engine_fini(struct drm_device *drm, void *arg) if (hwe->exl_port) xe_execlist_port_destroy(hwe->exl_port); - xe_lrc_finish(&hwe->kernel_lrc); + xe_lrc_put(hwe->kernel_lrc); hwe->gt = NULL; } @@ -341,7 +342,7 @@ xe_hw_engine_setup_default_lrc_state(struct xe_hw_engine *hwe) u32 blit_cctl_val = REG_FIELD_PREP(BLIT_CCTL_DST_MOCS_MASK, mocs_write_idx) | REG_FIELD_PREP(BLIT_CCTL_SRC_MOCS_MASK, mocs_read_idx); struct xe_rtp_process_ctx ctx = XE_RTP_PROCESS_CTX_INITIALIZER(hwe); - const struct xe_rtp_entry_sr lrc_was[] = { + const struct xe_rtp_entry_sr lrc_setup[] = { /* * Some blitter commands do not have a field for MOCS, those * commands will use MOCS index pointed by BLIT_CCTL. 
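The BLIT_CCTL MOCS programming just above, like the GUC_STATUS decoding earlier in this series, relies on the mask-based REG_FIELD_PREP()/REG_FIELD_GET() idiom: a register field is described only by its mask, and the shift is derived from the mask's lowest set bit. A minimal self-contained illustration follows; the mask values and names are made up for the example (the real layouts live in the xe register headers), and __builtin_ctz is a GCC/Clang builtin.

#include <stdint.h>
#include <stdio.h>

#define FIELD_SHIFT(mask)      __builtin_ctz(mask)
#define FIELD_PREP(mask, val)  (((uint32_t)(val) << FIELD_SHIFT(mask)) & (mask))
#define FIELD_GET(mask, reg)   (((reg) & (mask)) >> FIELD_SHIFT(mask))

#define DST_MOCS_MASK  0x00003f00u	/* bits 13:8, example layout only */
#define SRC_MOCS_MASK  0x0000003fu	/* bits 5:0,  example layout only */

int main(void)
{
	/* compose a register value field by field, then decode it again */
	uint32_t reg = FIELD_PREP(DST_MOCS_MASK, 3) | FIELD_PREP(SRC_MOCS_MASK, 2);

	printf("reg = 0x%08x dst = %u src = %u\n",
	       reg, FIELD_GET(DST_MOCS_MASK, reg), FIELD_GET(SRC_MOCS_MASK, reg));
	return 0;
}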
@@ -373,7 +374,7 @@ xe_hw_engine_setup_default_lrc_state(struct xe_hw_engine *hwe) {} }; - xe_rtp_process_to_sr(&ctx, lrc_was, &hwe->reg_lrc); + xe_rtp_process_to_sr(&ctx, lrc_setup, &hwe->reg_lrc); } static void @@ -526,9 +527,11 @@ static int hw_engine_init(struct xe_gt *gt, struct xe_hw_engine *hwe, goto err_name; } - err = xe_lrc_init(&hwe->kernel_lrc, hwe, NULL, NULL, SZ_16K); - if (err) + hwe->kernel_lrc = xe_lrc_create(hwe, NULL, SZ_16K); + if (IS_ERR(hwe->kernel_lrc)) { + err = PTR_ERR(hwe->kernel_lrc); goto err_hwsp; + } if (!xe_device_uc_enabled(xe)) { hwe->exl_port = xe_execlist_port_create(xe, hwe); @@ -553,7 +556,7 @@ static int hw_engine_init(struct xe_gt *gt, struct xe_hw_engine *hwe, return drmm_add_action_or_reset(&xe->drm, hw_engine_fini, hwe); err_kernel_lrc: - xe_lrc_finish(&hwe->kernel_lrc); + xe_lrc_put(hwe->kernel_lrc); err_hwsp: xe_bo_unpin_map_no_vm(hwe->hwsp); err_name: @@ -716,6 +719,11 @@ static void check_gsc_availability(struct xe_gt *gt) */ if (!xe_uc_fw_is_available(>->uc.gsc.fw)) { gt->info.engine_mask &= ~BIT(XE_HW_ENGINE_GSCCS0); + + /* interrupts where previously enabled, so turn them off */ + xe_mmio_write32(gt, GUNIT_GSC_INTR_ENABLE, 0); + xe_mmio_write32(gt, GUNIT_GSC_INTR_MASK, ~0); + drm_info(&xe->drm, "gsccs disabled due to lack of FW\n"); } } @@ -766,6 +774,57 @@ void xe_hw_engine_handle_irq(struct xe_hw_engine *hwe, u16 intr_vec) xe_hw_fence_irq_run(hwe->fence_irq); } +static bool +is_slice_common_per_gslice(struct xe_device *xe) +{ + return GRAPHICS_VERx100(xe) >= 1255; +} + +static void +xe_hw_engine_snapshot_instdone_capture(struct xe_hw_engine *hwe, + struct xe_hw_engine_snapshot *snapshot) +{ + struct xe_gt *gt = hwe->gt; + struct xe_device *xe = gt_to_xe(gt); + unsigned int dss; + u16 group, instance; + + snapshot->reg.instdone.ring = hw_engine_mmio_read32(hwe, RING_INSTDONE(0)); + + if (snapshot->hwe->class != XE_ENGINE_CLASS_RENDER) + return; + + if (is_slice_common_per_gslice(xe) == false) { + snapshot->reg.instdone.slice_common[0] = + xe_mmio_read32(gt, SC_INSTDONE); + snapshot->reg.instdone.slice_common_extra[0] = + xe_mmio_read32(gt, SC_INSTDONE_EXTRA); + snapshot->reg.instdone.slice_common_extra2[0] = + xe_mmio_read32(gt, SC_INSTDONE_EXTRA2); + } else { + for_each_geometry_dss(dss, gt, group, instance) { + snapshot->reg.instdone.slice_common[dss] = + xe_gt_mcr_unicast_read(gt, XEHPG_SC_INSTDONE, group, instance); + snapshot->reg.instdone.slice_common_extra[dss] = + xe_gt_mcr_unicast_read(gt, XEHPG_SC_INSTDONE_EXTRA, group, instance); + snapshot->reg.instdone.slice_common_extra2[dss] = + xe_gt_mcr_unicast_read(gt, XEHPG_SC_INSTDONE_EXTRA2, group, instance); + } + } + + for_each_geometry_dss(dss, gt, group, instance) { + snapshot->reg.instdone.sampler[dss] = + xe_gt_mcr_unicast_read(gt, SAMPLER_INSTDONE, group, instance); + snapshot->reg.instdone.row[dss] = + xe_gt_mcr_unicast_read(gt, ROW_INSTDONE, group, instance); + + if (GRAPHICS_VERx100(xe) >= 1255) + snapshot->reg.instdone.geom_svg[dss] = + xe_gt_mcr_unicast_read(gt, XEHPG_INSTDONE_GEOM_SVGUNIT, + group, instance); + } +} + /** * xe_hw_engine_snapshot_capture - Take a quick snapshot of the HW Engine. * @hwe: Xe HW Engine. 
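For context on the kernel LRC handling changed above: hw_engine_init()/hw_engine_fini() switch from an embedded object set up with xe_lrc_init()/xe_lrc_finish() to a pointer obtained from xe_lrc_create() and released with xe_lrc_put(), so error paths and teardown can share a single put. The sketch below shows the general create()/put() lifetime idiom under the assumption of simple reference counting; the type, helper names and the non-atomic counter are illustrative only, since the driver's own implementation is not part of this diff.

#include <stdlib.h>
#include <stdio.h>

struct lrc {
	int refcount;
	size_t ring_size;
};

static struct lrc *lrc_create(size_t ring_size)
{
	struct lrc *lrc = calloc(1, sizeof(*lrc));

	if (!lrc)
		return NULL;
	lrc->refcount = 1;		/* creator holds the first reference */
	lrc->ring_size = ring_size;
	return lrc;
}

static struct lrc *lrc_get(struct lrc *lrc)
{
	lrc->refcount++;		/* additional user takes a reference */
	return lrc;
}

static void lrc_put(struct lrc *lrc)
{
	if (lrc && --lrc->refcount == 0)
		free(lrc);		/* last put releases the object */
}

int main(void)
{
	struct lrc *kernel_lrc = lrc_create(16 * 1024);

	if (!kernel_lrc)
		return 1;
	lrc_get(kernel_lrc);
	lrc_put(kernel_lrc);
	lrc_put(kernel_lrc);
	printf("done\n");
	return 0;
}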
@@ -780,6 +839,7 @@ struct xe_hw_engine_snapshot * xe_hw_engine_snapshot_capture(struct xe_hw_engine *hwe) { struct xe_hw_engine_snapshot *snapshot; + size_t len; u64 val; if (!xe_hw_engine_is_valid(hwe)) @@ -790,8 +850,30 @@ xe_hw_engine_snapshot_capture(struct xe_hw_engine *hwe) if (!snapshot) return NULL; + /* Because XE_MAX_DSS_FUSE_BITS is defined in xe_gt_types.h and it + * includes xe_hw_engine_types.h the length of this 3 registers can't be + * set in struct xe_hw_engine_snapshot, so here doing additional + * allocations. + */ + len = (XE_MAX_DSS_FUSE_BITS * sizeof(u32)); + snapshot->reg.instdone.slice_common = kzalloc(len, GFP_ATOMIC); + snapshot->reg.instdone.slice_common_extra = kzalloc(len, GFP_ATOMIC); + snapshot->reg.instdone.slice_common_extra2 = kzalloc(len, GFP_ATOMIC); + snapshot->reg.instdone.sampler = kzalloc(len, GFP_ATOMIC); + snapshot->reg.instdone.row = kzalloc(len, GFP_ATOMIC); + snapshot->reg.instdone.geom_svg = kzalloc(len, GFP_ATOMIC); + if (!snapshot->reg.instdone.slice_common || + !snapshot->reg.instdone.slice_common_extra || + !snapshot->reg.instdone.slice_common_extra2 || + !snapshot->reg.instdone.sampler || + !snapshot->reg.instdone.row || + !snapshot->reg.instdone.geom_svg) { + xe_hw_engine_snapshot_free(snapshot); + return NULL; + } + snapshot->name = kstrdup(hwe->name, GFP_ATOMIC); - snapshot->class = hwe->class; + snapshot->hwe = hwe; snapshot->logical_instance = hwe->logical_instance; snapshot->forcewake.domain = hwe->domain; snapshot->forcewake.ref = xe_force_wake_ref(gt_to_fw(hwe->gt), @@ -828,6 +910,15 @@ xe_hw_engine_snapshot_capture(struct xe_hw_engine *hwe) snapshot->reg.ring_hwstam = hw_engine_mmio_read32(hwe, RING_HWSTAM(0)); snapshot->reg.ring_hws_pga = hw_engine_mmio_read32(hwe, RING_HWS_PGA(0)); snapshot->reg.ring_start = hw_engine_mmio_read32(hwe, RING_START(0)); + if (GRAPHICS_VERx100(hwe->gt->tile->xe) >= 2000) { + val = hw_engine_mmio_read32(hwe, RING_START_UDW(0)); + snapshot->reg.ring_start |= val << 32; + } + if (xe_gt_has_indirect_ring_state(hwe->gt)) { + snapshot->reg.indirect_ring_state = + hw_engine_mmio_read32(hwe, INDIRECT_RING_STATE(0)); + } + snapshot->reg.ring_head = hw_engine_mmio_read32(hwe, RING_HEAD(0)) & HEAD_ADDR; snapshot->reg.ring_tail = @@ -841,13 +932,57 @@ xe_hw_engine_snapshot_capture(struct xe_hw_engine *hwe) snapshot->reg.ring_emr = hw_engine_mmio_read32(hwe, RING_EMR(0)); snapshot->reg.ring_eir = hw_engine_mmio_read32(hwe, RING_EIR(0)); snapshot->reg.ipehr = hw_engine_mmio_read32(hwe, RING_IPEHR(0)); + xe_hw_engine_snapshot_instdone_capture(hwe, snapshot); - if (snapshot->class == XE_ENGINE_CLASS_COMPUTE) + if (snapshot->hwe->class == XE_ENGINE_CLASS_COMPUTE) snapshot->reg.rcu_mode = xe_mmio_read32(hwe->gt, RCU_MODE); return snapshot; } +static void +xe_hw_engine_snapshot_instdone_print(struct xe_hw_engine_snapshot *snapshot, struct drm_printer *p) +{ + struct xe_gt *gt = snapshot->hwe->gt; + struct xe_device *xe = gt_to_xe(gt); + u16 group, instance; + unsigned int dss; + + drm_printf(p, "\tRING_INSTDONE: 0x%08x\n", snapshot->reg.instdone.ring); + + if (snapshot->hwe->class != XE_ENGINE_CLASS_RENDER) + return; + + if (is_slice_common_per_gslice(xe) == false) { + drm_printf(p, "\tSC_INSTDONE[0]: 0x%08x\n", + snapshot->reg.instdone.slice_common[0]); + drm_printf(p, "\tSC_INSTDONE_EXTRA[0]: 0x%08x\n", + snapshot->reg.instdone.slice_common_extra[0]); + drm_printf(p, "\tSC_INSTDONE_EXTRA2[0]: 0x%08x\n", + snapshot->reg.instdone.slice_common_extra2[0]); + } else { + for_each_geometry_dss(dss, gt, group, instance) { 
+ drm_printf(p, "\tSC_INSTDONE[%u]: 0x%08x\n", dss, + snapshot->reg.instdone.slice_common[dss]); + drm_printf(p, "\tSC_INSTDONE_EXTRA[%u]: 0x%08x\n", dss, + snapshot->reg.instdone.slice_common_extra[dss]); + drm_printf(p, "\tSC_INSTDONE_EXTRA2[%u]: 0x%08x\n", dss, + snapshot->reg.instdone.slice_common_extra2[dss]); + } + } + + for_each_geometry_dss(dss, gt, group, instance) { + drm_printf(p, "\tSAMPLER_INSTDONE[%u]: 0x%08x\n", dss, + snapshot->reg.instdone.sampler[dss]); + drm_printf(p, "\tROW_INSTDONE[%u]: 0x%08x\n", dss, + snapshot->reg.instdone.row[dss]); + + if (GRAPHICS_VERx100(xe) >= 1255) + drm_printf(p, "\tINSTDONE_GEOM_SVGUNIT[%u]: 0x%08x\n", + dss, snapshot->reg.instdone.geom_svg[dss]); + } +} + /** * xe_hw_engine_snapshot_print - Print out a given Xe HW Engine snapshot. * @snapshot: Xe HW Engine snapshot object. @@ -872,7 +1007,7 @@ void xe_hw_engine_snapshot_print(struct xe_hw_engine_snapshot *snapshot, snapshot->reg.ring_execlist_status); drm_printf(p, "\tRING_EXECLIST_SQ_CONTENTS: 0x%016llx\n", snapshot->reg.ring_execlist_sq_contents); - drm_printf(p, "\tRING_START: 0x%08x\n", snapshot->reg.ring_start); + drm_printf(p, "\tRING_START: 0x%016llx\n", snapshot->reg.ring_start); drm_printf(p, "\tRING_HEAD: 0x%08x\n", snapshot->reg.ring_head); drm_printf(p, "\tRING_TAIL: 0x%08x\n", snapshot->reg.ring_tail); drm_printf(p, "\tRING_CTL: 0x%08x\n", snapshot->reg.ring_ctl); @@ -886,10 +1021,15 @@ void xe_hw_engine_snapshot_print(struct xe_hw_engine_snapshot *snapshot, drm_printf(p, "\tACTHD: 0x%016llx\n", snapshot->reg.ring_acthd); drm_printf(p, "\tBBADDR: 0x%016llx\n", snapshot->reg.ring_bbaddr); drm_printf(p, "\tDMA_FADDR: 0x%016llx\n", snapshot->reg.ring_dma_fadd); + drm_printf(p, "\tINDIRECT_RING_STATE: 0x%08x\n", + snapshot->reg.indirect_ring_state); drm_printf(p, "\tIPEHR: 0x%08x\n", snapshot->reg.ipehr); - if (snapshot->class == XE_ENGINE_CLASS_COMPUTE) + xe_hw_engine_snapshot_instdone_print(snapshot, p); + + if (snapshot->hwe->class == XE_ENGINE_CLASS_COMPUTE) drm_printf(p, "\tRCU_MODE: 0x%08x\n", snapshot->reg.rcu_mode); + drm_puts(p, "\n"); } /** @@ -904,6 +1044,12 @@ void xe_hw_engine_snapshot_free(struct xe_hw_engine_snapshot *snapshot) if (!snapshot) return; + kfree(snapshot->reg.instdone.slice_common); + kfree(snapshot->reg.instdone.slice_common_extra); + kfree(snapshot->reg.instdone.slice_common_extra2); + kfree(snapshot->reg.instdone.sampler); + kfree(snapshot->reg.instdone.row); + kfree(snapshot->reg.instdone.geom_svg); kfree(snapshot->name); kfree(snapshot); } @@ -955,3 +1101,30 @@ bool xe_hw_engine_is_reserved(struct xe_hw_engine *hwe) return xe->info.has_usm && hwe->class == XE_ENGINE_CLASS_COPY && hwe->instance == gt->usm.reserved_bcs_instance; } + +const char *xe_hw_engine_class_to_str(enum xe_engine_class class) +{ + switch (class) { + case XE_ENGINE_CLASS_RENDER: + return "rcs"; + case XE_ENGINE_CLASS_VIDEO_DECODE: + return "vcs"; + case XE_ENGINE_CLASS_VIDEO_ENHANCE: + return "vecs"; + case XE_ENGINE_CLASS_COPY: + return "bcs"; + case XE_ENGINE_CLASS_OTHER: + return "other"; + case XE_ENGINE_CLASS_COMPUTE: + return "ccs"; + case XE_ENGINE_CLASS_MAX: + break; + } + + return NULL; +} + +u64 xe_hw_engine_read_timestamp(struct xe_hw_engine *hwe) +{ + return xe_mmio_read64_2x32(hwe->gt, RING_TIMESTAMP(hwe->mmio_base)); +} diff --git a/drivers/gpu/drm/xe/xe_hw_engine.h b/drivers/gpu/drm/xe/xe_hw_engine.h index 71968ee2f600..7f2d27c0ba1a 100644 --- a/drivers/gpu/drm/xe/xe_hw_engine.h +++ b/drivers/gpu/drm/xe/xe_hw_engine.h @@ -67,4 +67,7 @@ static inline bool 
xe_hw_engine_is_valid(struct xe_hw_engine *hwe) return hwe->name; } +const char *xe_hw_engine_class_to_str(enum xe_engine_class class); +u64 xe_hw_engine_read_timestamp(struct xe_hw_engine *hwe); + #endif diff --git a/drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c b/drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c index 844ec68cbbb8..b53e8d2accdb 100644 --- a/drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c +++ b/drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c @@ -492,7 +492,7 @@ static const struct attribute * const files[] = { NULL }; -static void kobj_xe_hw_engine_class_fini(struct drm_device *drm, void *arg) +static void kobj_xe_hw_engine_class_fini(void *arg) { struct kobject *kobj = arg; @@ -517,7 +517,7 @@ kobj_xe_hw_engine_class(struct xe_device *xe, struct kobject *parent, const char } keclass->xe = xe; - err = drmm_add_action_or_reset(&xe->drm, kobj_xe_hw_engine_class_fini, + err = devm_add_action_or_reset(xe->drm.dev, kobj_xe_hw_engine_class_fini, &keclass->base); if (err) return NULL; @@ -525,7 +525,7 @@ kobj_xe_hw_engine_class(struct xe_device *xe, struct kobject *parent, const char return keclass; } -static void hw_engine_class_defaults_fini(struct drm_device *drm, void *arg) +static void hw_engine_class_defaults_fini(void *arg) { struct kobject *kobj = arg; @@ -552,7 +552,7 @@ static int xe_add_hw_engine_class_defaults(struct xe_device *xe, if (err) goto err_object; - return drmm_add_action_or_reset(&xe->drm, hw_engine_class_defaults_fini, kobj); + return devm_add_action_or_reset(xe->drm.dev, hw_engine_class_defaults_fini, kobj); err_object: kobject_put(kobj); @@ -611,31 +611,13 @@ static const struct kobj_type xe_hw_engine_sysfs_kobj_type = { .sysfs_ops = &xe_hw_engine_class_sysfs_ops, }; -static void hw_engine_class_sysfs_fini(struct drm_device *drm, void *arg) +static void hw_engine_class_sysfs_fini(void *arg) { struct kobject *kobj = arg; kobject_put(kobj); } -static const char *xe_hw_engine_class_to_str(enum xe_engine_class class) -{ - switch (class) { - case XE_ENGINE_CLASS_RENDER: - return "rcs"; - case XE_ENGINE_CLASS_VIDEO_DECODE: - return "vcs"; - case XE_ENGINE_CLASS_VIDEO_ENHANCE: - return "vecs"; - case XE_ENGINE_CLASS_COPY: - return "bcs"; - case XE_ENGINE_CLASS_COMPUTE: - return "ccs"; - default: - return NULL; - } -} - /** * xe_hw_engine_class_sysfs_init - Init HW engine classes on GT. * @gt: Xe GT. 
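The *_fini() conversions in this file follow the devm pattern adopted across this series: the release callback takes a bare void * and is registered against the backing struct device rather than the drm_device, so it also runs on driver unbind (hot-unplug) instead of only when the last drm_device reference is dropped. A minimal sketch of the pattern, assuming an already-added kobject; example_kobj_fini() and example_register() are illustrative names, not functions from the patch:

#include <linux/device.h>
#include <linux/kobject.h>

static void example_kobj_fini(void *arg)
{
	struct kobject *kobj = arg;

	kobject_put(kobj);	/* drop the reference taken at init time */
}

static int example_register(struct xe_device *xe, struct kobject *kobj)
{
	/*
	 * If the action cannot be registered, devm_add_action_or_reset()
	 * invokes it immediately, so no manual kobject_put() is needed in
	 * the error path.
	 */
	return devm_add_action_or_reset(xe->drm.dev, example_kobj_fini, kobj);
}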
@@ -698,7 +680,7 @@ int xe_hw_engine_class_sysfs_init(struct xe_gt *gt) goto err_object; } - return drmm_add_action_or_reset(&xe->drm, hw_engine_class_sysfs_fini, kobj); + return devm_add_action_or_reset(xe->drm.dev, hw_engine_class_sysfs_fini, kobj); err_object: kobject_put(kobj); diff --git a/drivers/gpu/drm/xe/xe_hw_engine_types.h b/drivers/gpu/drm/xe/xe_hw_engine_types.h index d7f828c76cc5..580bbd7e83b2 100644 --- a/drivers/gpu/drm/xe/xe_hw_engine_types.h +++ b/drivers/gpu/drm/xe/xe_hw_engine_types.h @@ -137,7 +137,7 @@ struct xe_hw_engine { /** @hwsp: hardware status page buffer object */ struct xe_bo *hwsp; /** @kernel_lrc: Kernel LRC (should be replaced /w an xe_engine) */ - struct xe_lrc kernel_lrc; + struct xe_lrc *kernel_lrc; /** @exl_port: execlists port */ struct xe_execlist_port *exl_port; /** @fence_irq: fence IRQ to run when a hw engine IRQ is received */ @@ -158,8 +158,8 @@ struct xe_hw_engine { struct xe_hw_engine_snapshot { /** @name: name of the hw engine */ char *name; - /** @class: class of this hw engine */ - enum xe_engine_class class; + /** @hwe: hw engine */ + struct xe_hw_engine *hwe; /** @logical_instance: logical instance of this hw engine */ u16 logical_instance; /** @forcewake: Force Wake information snapshot */ @@ -188,7 +188,7 @@ struct xe_hw_engine_snapshot { /** @reg.ring_hws_pga: RING_HWS_PGA */ u32 ring_hws_pga; /** @reg.ring_start: RING_START */ - u32 ring_start; + u64 ring_start; /** @reg.ring_head: RING_HEAD */ u32 ring_head; /** @reg.ring_tail: RING_TAIL */ @@ -207,10 +207,28 @@ struct xe_hw_engine_snapshot { u32 ring_emr; /** @reg.ring_eir: RING_EIR */ u32 ring_eir; + /** @reg.indirect_ring_state: INDIRECT_RING_STATE */ + u32 indirect_ring_state; /** @reg.ipehr: IPEHR */ u32 ipehr; /** @reg.rcu_mode: RCU_MODE */ u32 rcu_mode; + struct { + /** @reg.instdone.ring: RING_INSTDONE */ + u32 ring; + /** @reg.instdone.slice_common: SC_INSTDONE */ + u32 *slice_common; + /** @reg.instdone.slice_common_extra: SC_INSTDONE_EXTRA */ + u32 *slice_common_extra; + /** @reg.instdone.slice_common_extra2: SC_INSTDONE_EXTRA2 */ + u32 *slice_common_extra2; + /** @reg.instdone.sampler: SAMPLER_INSTDONE */ + u32 *sampler; + /** @reg.instdone.row: ROW_INSTDONE */ + u32 *row; + /** @reg.instdone.geom_svg: INSTDONE_GEOM_SVGUNIT */ + u32 *geom_svg; + } instdone; } reg; }; diff --git a/drivers/gpu/drm/xe/xe_hw_fence.c b/drivers/gpu/drm/xe/xe_hw_fence.c index f872ef103127..35c0063a831a 100644 --- a/drivers/gpu/drm/xe/xe_hw_fence.c +++ b/drivers/gpu/drm/xe/xe_hw_fence.c @@ -208,23 +208,58 @@ static struct xe_hw_fence *to_xe_hw_fence(struct dma_fence *fence) return container_of(fence, struct xe_hw_fence, dma); } -struct xe_hw_fence *xe_hw_fence_create(struct xe_hw_fence_ctx *ctx, - struct iosys_map seqno_map) +/** + * xe_hw_fence_alloc() - Allocate an hw fence. + * + * Allocate but don't initialize an hw fence. + * + * Return: Pointer to the allocated fence or + * negative error pointer on error. + */ +struct dma_fence *xe_hw_fence_alloc(void) { - struct xe_hw_fence *fence; + struct xe_hw_fence *hw_fence = fence_alloc(); - fence = fence_alloc(); - if (!fence) + if (!hw_fence) return ERR_PTR(-ENOMEM); - fence->ctx = ctx; - fence->seqno_map = seqno_map; - INIT_LIST_HEAD(&fence->irq_link); + return &hw_fence->dma; +} - dma_fence_init(&fence->dma, &xe_hw_fence_ops, &ctx->irq->lock, - ctx->dma_fence_ctx, ctx->next_seqno++); +/** + * xe_hw_fence_free() - Free an hw fence. + * @fence: Pointer to the fence to free. + * + * Frees an hw fence that hasn't yet been + * initialized. 
+ */ +void xe_hw_fence_free(struct dma_fence *fence) +{ + fence_free(&fence->rcu); +} - trace_xe_hw_fence_create(fence); +/** + * xe_hw_fence_init() - Initialize an hw fence. + * @fence: Pointer to the fence to initialize. + * @ctx: Pointer to the struct xe_hw_fence_ctx fence context. + * @seqno_map: Pointer to the map into where the seqno is blitted. + * + * Initializes a pre-allocated hw fence. + * After initialization, the fence is subject to normal + * dma-fence refcounting. + */ +void xe_hw_fence_init(struct dma_fence *fence, struct xe_hw_fence_ctx *ctx, + struct iosys_map seqno_map) +{ + struct xe_hw_fence *hw_fence = + container_of(fence, typeof(*hw_fence), dma); + + hw_fence->ctx = ctx; + hw_fence->seqno_map = seqno_map; + INIT_LIST_HEAD(&hw_fence->irq_link); + + dma_fence_init(fence, &xe_hw_fence_ops, &ctx->irq->lock, + ctx->dma_fence_ctx, ctx->next_seqno++); - return fence; + trace_xe_hw_fence_create(hw_fence); } diff --git a/drivers/gpu/drm/xe/xe_hw_fence.h b/drivers/gpu/drm/xe/xe_hw_fence.h index cfe5fd603787..f13a1c4982c7 100644 --- a/drivers/gpu/drm/xe/xe_hw_fence.h +++ b/drivers/gpu/drm/xe/xe_hw_fence.h @@ -24,7 +24,10 @@ void xe_hw_fence_ctx_init(struct xe_hw_fence_ctx *ctx, struct xe_gt *gt, struct xe_hw_fence_irq *irq, const char *name); void xe_hw_fence_ctx_finish(struct xe_hw_fence_ctx *ctx); -struct xe_hw_fence *xe_hw_fence_create(struct xe_hw_fence_ctx *ctx, - struct iosys_map seqno_map); +struct dma_fence *xe_hw_fence_alloc(void); +void xe_hw_fence_free(struct dma_fence *fence); + +void xe_hw_fence_init(struct dma_fence *fence, struct xe_hw_fence_ctx *ctx, + struct iosys_map seqno_map); #endif diff --git a/drivers/gpu/drm/xe/xe_hwmon.c b/drivers/gpu/drm/xe/xe_hwmon.c index 453e601ddd5e..222c651ee1f8 100644 --- a/drivers/gpu/drm/xe/xe_hwmon.c +++ b/drivers/gpu/drm/xe/xe_hwmon.c @@ -86,19 +86,29 @@ static struct xe_reg xe_hwmon_get_reg(struct xe_hwmon *hwmon, enum xe_hwmon_reg switch (hwmon_reg) { case REG_PKG_RAPL_LIMIT: - if (xe->info.platform == XE_PVC && channel == CHANNEL_PKG) + if (xe->info.platform == XE_BATTLEMAGE) { + if (channel == CHANNEL_PKG) + return BMG_PACKAGE_RAPL_LIMIT; + else + return BMG_PLATFORM_POWER_LIMIT; + } else if (xe->info.platform == XE_PVC && channel == CHANNEL_PKG) { return PVC_GT0_PACKAGE_RAPL_LIMIT; - else if ((xe->info.platform == XE_DG2) && (channel == CHANNEL_PKG)) + } else if ((xe->info.platform == XE_DG2) && (channel == CHANNEL_PKG)) { return PCU_CR_PACKAGE_RAPL_LIMIT; + } break; case REG_PKG_POWER_SKU: - if (xe->info.platform == XE_PVC && channel == CHANNEL_PKG) + if (xe->info.platform == XE_BATTLEMAGE) + return BMG_PACKAGE_POWER_SKU; + else if (xe->info.platform == XE_PVC && channel == CHANNEL_PKG) return PVC_GT0_PACKAGE_POWER_SKU; else if ((xe->info.platform == XE_DG2) && (channel == CHANNEL_PKG)) return PCU_CR_PACKAGE_POWER_SKU; break; case REG_PKG_POWER_SKU_UNIT: - if (xe->info.platform == XE_PVC) + if (xe->info.platform == XE_BATTLEMAGE) + return BMG_PACKAGE_POWER_SKU_UNIT; + else if (xe->info.platform == XE_PVC) return PVC_GT0_PACKAGE_POWER_SKU_UNIT; else if (xe->info.platform == XE_DG2) return PCU_CR_PACKAGE_POWER_SKU_UNIT; @@ -108,10 +118,16 @@ static struct xe_reg xe_hwmon_get_reg(struct xe_hwmon *hwmon, enum xe_hwmon_reg return GT_PERF_STATUS; break; case REG_PKG_ENERGY_STATUS: - if (xe->info.platform == XE_PVC && channel == CHANNEL_PKG) + if (xe->info.platform == XE_BATTLEMAGE) { + if (channel == CHANNEL_PKG) + return BMG_PACKAGE_ENERGY_STATUS; + else + return BMG_PLATFORM_ENERGY_STATUS; + } else if (xe->info.platform == 
XE_PVC && channel == CHANNEL_PKG) { return PVC_GT0_PLATFORM_ENERGY_STATUS; - else if ((xe->info.platform == XE_DG2) && (channel == CHANNEL_PKG)) + } else if ((xe->info.platform == XE_DG2) && (channel == CHANNEL_PKG)) { return PCU_CR_PACKAGE_ENERGY_STATUS; + } break; default: drm_warn(&xe->drm, "Unknown xe hwmon reg id: %d\n", hwmon_reg); @@ -550,12 +566,17 @@ xe_hwmon_curr_is_visible(const struct xe_hwmon *hwmon, u32 attr, int channel) { u32 uval; + /* hwmon sysfs attribute of current available only for package */ + if (channel != CHANNEL_PKG) + return 0; + switch (attr) { case hwmon_curr_crit: - case hwmon_curr_label: - if (channel == CHANNEL_PKG) return (xe_hwmon_pcode_read_i1(hwmon->gt, &uval) || (uval & POWER_SETUP_I1_WATTS)) ? 0 : 0644; + case hwmon_curr_label: + return (xe_hwmon_pcode_read_i1(hwmon->gt, &uval) || + (uval & POWER_SETUP_I1_WATTS)) ? 0 : 0444; break; default: return 0; diff --git a/drivers/gpu/drm/xe/xe_irq.c b/drivers/gpu/drm/xe/xe_irq.c index 996806353171..8ee3c300c5e4 100644 --- a/drivers/gpu/drm/xe/xe_irq.c +++ b/drivers/gpu/drm/xe/xe_irq.c @@ -663,7 +663,7 @@ static irq_handler_t xe_irq_handler(struct xe_device *xe) return xelp_irq_handler; } -static void irq_uninstall(struct drm_device *drm, void *arg) +static void irq_uninstall(void *arg) { struct xe_device *xe = arg; struct pci_dev *pdev = to_pci_dev(xe->drm.dev); @@ -723,7 +723,7 @@ int xe_irq_install(struct xe_device *xe) xe_irq_postinstall(xe); - err = drmm_add_action_or_reset(&xe->drm, irq_uninstall, xe); + err = devm_add_action_or_reset(xe->drm.dev, irq_uninstall, xe); if (err) goto free_irq_handler; @@ -737,7 +737,7 @@ free_irq_handler: void xe_irq_shutdown(struct xe_device *xe) { - irq_uninstall(&xe->drm, xe); + irq_uninstall(xe); } void xe_irq_suspend(struct xe_device *xe) diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c index 615bbc372ac6..c1bb85d2e243 100644 --- a/drivers/gpu/drm/xe/xe_lrc.c +++ b/drivers/gpu/drm/xe/xe_lrc.c @@ -11,7 +11,6 @@ #include "instructions/xe_gfxpipe_commands.h" #include "instructions/xe_gfx_state_commands.h" #include "regs/xe_engine_regs.h" -#include "regs/xe_gpu_commands.h" #include "regs/xe_lrc_layout.h" #include "xe_bb.h" #include "xe_bo.h" @@ -34,12 +33,15 @@ #define LRC_ENGINE_CLASS GENMASK_ULL(63, 61) #define LRC_ENGINE_INSTANCE GENMASK_ULL(53, 48) +#define LRC_INDIRECT_RING_STATE_SIZE SZ_4K + struct xe_lrc_snapshot { struct xe_bo *lrc_bo; void *lrc_snapshot; unsigned long lrc_size, lrc_offset; u32 context_desc; + u32 indirect_context_desc; u32 head; struct { u32 internal; @@ -55,20 +57,25 @@ lrc_to_xe(struct xe_lrc *lrc) return gt_to_xe(lrc->fence_ctx.gt); } -size_t xe_lrc_size(struct xe_device *xe, enum xe_engine_class class) +size_t xe_gt_lrc_size(struct xe_gt *gt, enum xe_engine_class class) { + struct xe_device *xe = gt_to_xe(gt); + size_t size; + switch (class) { case XE_ENGINE_CLASS_RENDER: if (GRAPHICS_VER(xe) >= 20) - return 4 * SZ_4K; + size = 4 * SZ_4K; else - return 14 * SZ_4K; + size = 14 * SZ_4K; + break; case XE_ENGINE_CLASS_COMPUTE: /* 14 pages since graphics_ver == 11 */ if (GRAPHICS_VER(xe) >= 20) - return 3 * SZ_4K; + size = 3 * SZ_4K; else - return 14 * SZ_4K; + size = 14 * SZ_4K; + break; default: WARN(1, "Unknown engine class: %d", class); fallthrough; @@ -76,8 +83,14 @@ size_t xe_lrc_size(struct xe_device *xe, enum xe_engine_class class) case XE_ENGINE_CLASS_VIDEO_DECODE: case XE_ENGINE_CLASS_VIDEO_ENHANCE: case XE_ENGINE_CLASS_OTHER: - return 2 * SZ_4K; + size = 2 * SZ_4K; } + + /* Add indirect ring state page */ + if 
(xe_gt_has_indirect_ring_state(gt)) + size += LRC_INDIRECT_RING_STATE_SIZE; + + return size; } /* @@ -508,6 +521,32 @@ static const u8 xe2_xcs_offsets[] = { 0 }; +static const u8 xe2_indirect_ring_state_offsets[] = { + NOP(1), /* [0x00] */ + LRI(5, POSTED), /* [0x01] */ + REG(0x034), /* [0x02] RING_BUFFER_HEAD */ + REG(0x030), /* [0x04] RING_BUFFER_TAIL */ + REG(0x038), /* [0x06] RING_BUFFER_START */ + REG(0x048), /* [0x08] RING_BUFFER_START_UDW */ + REG(0x03c), /* [0x0a] RING_BUFFER_CONTROL */ + + NOP(5), /* [0x0c] */ + LRI(9, POSTED), /* [0x11] */ + REG(0x168), /* [0x12] BB_ADDR_UDW */ + REG(0x140), /* [0x14] BB_ADDR */ + REG(0x110), /* [0x16] BB_STATE */ + REG16(0x588), /* [0x18] BB_STACK_WRITE_PORT */ + REG16(0x588), /* [0x20] BB_STACK_WRITE_PORT */ + REG16(0x588), /* [0x22] BB_STACK_WRITE_PORT */ + REG16(0x588), /* [0x24] BB_STACK_WRITE_PORT */ + REG16(0x588), /* [0x26] BB_STACK_WRITE_PORT */ + REG16(0x588), /* [0x28] BB_STACK_WRITE_PORT */ + + NOP(12), /* [0x00] */ + + 0 +}; + #undef REG16 #undef REG #undef LRI @@ -546,6 +585,10 @@ static void set_context_control(u32 *regs, struct xe_hw_engine *hwe) regs[CTX_CONTEXT_CONTROL] = _MASKED_BIT_ENABLE(CTX_CTRL_INHIBIT_SYN_CTX_SWITCH | CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT); + if (xe_gt_has_indirect_ring_state(hwe->gt)) + regs[CTX_CONTEXT_CONTROL] |= + _MASKED_BIT_ENABLE(CTX_CTRL_INDIRECT_RING_STATE_ENABLE); + /* TODO: Timestamp */ } @@ -589,6 +632,11 @@ static void reset_stop_ring(u32 *regs, struct xe_hw_engine *hwe) regs[x + 1] |= STOP_RING << 16; } +static inline bool xe_lrc_has_indirect_ring_state(struct xe_lrc *lrc) +{ + return lrc->flags & XE_LRC_FLAG_INDIRECT_RING_STATE; +} + static inline u32 __xe_lrc_ring_offset(struct xe_lrc *lrc) { return 0; @@ -643,6 +691,12 @@ static inline u32 __xe_lrc_regs_offset(struct xe_lrc *lrc) return xe_lrc_pphwsp_offset(lrc) + LRC_PPHWSP_SIZE; } +static inline u32 __xe_lrc_indirect_ring_offset(struct xe_lrc *lrc) +{ + /* Indirect ring state page is at the very end of LRC */ + return lrc->size - LRC_INDIRECT_RING_STATE_SIZE; +} + #define DECL_MAP_ADDR_HELPERS(elem) \ static inline struct iosys_map __xe_lrc_##elem##_map(struct xe_lrc *lrc) \ { \ @@ -663,6 +717,7 @@ DECL_MAP_ADDR_HELPERS(seqno) DECL_MAP_ADDR_HELPERS(regs) DECL_MAP_ADDR_HELPERS(start_seqno) DECL_MAP_ADDR_HELPERS(parallel) +DECL_MAP_ADDR_HELPERS(indirect_ring) #undef DECL_MAP_ADDR_HELPERS @@ -671,6 +726,35 @@ u32 xe_lrc_ggtt_addr(struct xe_lrc *lrc) return __xe_lrc_pphwsp_ggtt_addr(lrc); } +u32 xe_lrc_indirect_ring_ggtt_addr(struct xe_lrc *lrc) +{ + if (!xe_lrc_has_indirect_ring_state(lrc)) + return 0; + + return __xe_lrc_indirect_ring_ggtt_addr(lrc); +} + +static u32 xe_lrc_read_indirect_ctx_reg(struct xe_lrc *lrc, int reg_nr) +{ + struct xe_device *xe = lrc_to_xe(lrc); + struct iosys_map map; + + map = __xe_lrc_indirect_ring_map(lrc); + iosys_map_incr(&map, reg_nr * sizeof(u32)); + return xe_map_read32(xe, &map); +} + +static void xe_lrc_write_indirect_ctx_reg(struct xe_lrc *lrc, + int reg_nr, u32 val) +{ + struct xe_device *xe = lrc_to_xe(lrc); + struct iosys_map map; + + map = __xe_lrc_indirect_ring_map(lrc); + iosys_map_incr(&map, reg_nr * sizeof(u32)); + xe_map_write32(xe, &map, val); +} + u32 xe_lrc_read_ctx_reg(struct xe_lrc *lrc, int reg_nr) { struct xe_device *xe = lrc_to_xe(lrc); @@ -693,20 +777,25 @@ void xe_lrc_write_ctx_reg(struct xe_lrc *lrc, int reg_nr, u32 val) static void *empty_lrc_data(struct xe_hw_engine *hwe) { - struct xe_device *xe = gt_to_xe(hwe->gt); + struct xe_gt *gt = hwe->gt; void *data; u32 *regs; - data = 
kzalloc(xe_lrc_size(xe, hwe->class), GFP_KERNEL); + data = kzalloc(xe_gt_lrc_size(gt, hwe->class), GFP_KERNEL); if (!data) return NULL; /* 1st page: Per-Process of HW status Page */ regs = data + LRC_PPHWSP_SIZE; - set_offsets(regs, reg_offsets(xe, hwe->class), hwe); + set_offsets(regs, reg_offsets(gt_to_xe(gt), hwe->class), hwe); set_context_control(regs, hwe); set_memory_based_intr(regs, hwe); reset_stop_ring(regs, hwe); + if (xe_gt_has_indirect_ring_state(gt)) { + regs = data + xe_gt_lrc_size(gt, hwe->class) - + LRC_INDIRECT_RING_STATE_SIZE; + set_offsets(regs, xe2_indirect_ring_state_offsets, hwe); + } return data; } @@ -719,11 +808,20 @@ static void xe_lrc_set_ppgtt(struct xe_lrc *lrc, struct xe_vm *vm) xe_lrc_write_ctx_reg(lrc, CTX_PDP0_LDW, lower_32_bits(desc)); } +static void xe_lrc_finish(struct xe_lrc *lrc) +{ + xe_hw_fence_ctx_finish(&lrc->fence_ctx); + xe_bo_lock(lrc->bo, false); + xe_bo_unpin(lrc->bo); + xe_bo_unlock(lrc->bo); + xe_bo_put(lrc->bo); +} + #define PVC_CTX_ASID (0x2e + 1) #define PVC_CTX_ACC_CTR_THOLD (0x2a + 1) -int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe, - struct xe_exec_queue *q, struct xe_vm *vm, u32 ring_size) +static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe, + struct xe_vm *vm, u32 ring_size) { struct xe_gt *gt = hwe->gt; struct xe_tile *tile = gt_to_tile(gt); @@ -731,26 +829,32 @@ int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe, struct iosys_map map; void *init_data = NULL; u32 arb_enable; + u32 lrc_size; int err; + kref_init(&lrc->refcount); lrc->flags = 0; + lrc_size = ring_size + xe_gt_lrc_size(gt, hwe->class); + if (xe_gt_has_indirect_ring_state(gt)) + lrc->flags |= XE_LRC_FLAG_INDIRECT_RING_STATE; /* * FIXME: Perma-pinning LRC as we don't yet support moving GGTT address * via VM bind calls. 
*/ - lrc->bo = xe_bo_create_pin_map(xe, tile, vm, - ring_size + xe_lrc_size(xe, hwe->class), - ttm_bo_type_kernel, - XE_BO_FLAG_VRAM_IF_DGFX(tile) | - XE_BO_FLAG_GGTT | - XE_BO_FLAG_GGTT_INVALIDATE); + lrc->bo = xe_bo_create_pin_map(xe, tile, vm, lrc_size, + ttm_bo_type_kernel, + XE_BO_FLAG_VRAM_IF_DGFX(tile) | + XE_BO_FLAG_GGTT | + XE_BO_FLAG_GGTT_INVALIDATE); if (IS_ERR(lrc->bo)) return PTR_ERR(lrc->bo); + lrc->size = lrc_size; lrc->tile = gt_to_tile(hwe->gt); lrc->ring.size = ring_size; lrc->ring.tail = 0; + lrc->ctx_timestamp = 0; xe_hw_fence_ctx_init(&lrc->fence_ctx, hwe->gt, hwe->fence_irq, hwe->name); @@ -772,10 +876,10 @@ int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe, xe_map_memset(xe, &map, 0, 0, LRC_PPHWSP_SIZE); /* PPHWSP */ xe_map_memcpy_to(xe, &map, LRC_PPHWSP_SIZE, gt->default_lrc[hwe->class] + LRC_PPHWSP_SIZE, - xe_lrc_size(xe, hwe->class) - LRC_PPHWSP_SIZE); + xe_gt_lrc_size(gt, hwe->class) - LRC_PPHWSP_SIZE); } else { xe_map_memcpy_to(xe, &map, 0, init_data, - xe_lrc_size(xe, hwe->class)); + xe_gt_lrc_size(gt, hwe->class)); kfree(init_data); } @@ -786,11 +890,27 @@ int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe, xe_drm_client_add_bo(vm->xef->client, lrc->bo); } - xe_lrc_write_ctx_reg(lrc, CTX_RING_START, __xe_lrc_ring_ggtt_addr(lrc)); - xe_lrc_write_ctx_reg(lrc, CTX_RING_HEAD, 0); - xe_lrc_write_ctx_reg(lrc, CTX_RING_TAIL, lrc->ring.tail); - xe_lrc_write_ctx_reg(lrc, CTX_RING_CTL, - RING_CTL_SIZE(lrc->ring.size) | RING_VALID); + if (xe_gt_has_indirect_ring_state(gt)) { + xe_lrc_write_ctx_reg(lrc, CTX_INDIRECT_RING_STATE, + __xe_lrc_indirect_ring_ggtt_addr(lrc)); + + xe_lrc_write_indirect_ctx_reg(lrc, INDIRECT_CTX_RING_START, + __xe_lrc_ring_ggtt_addr(lrc)); + xe_lrc_write_indirect_ctx_reg(lrc, INDIRECT_CTX_RING_START_UDW, 0); + xe_lrc_write_indirect_ctx_reg(lrc, INDIRECT_CTX_RING_HEAD, 0); + xe_lrc_write_indirect_ctx_reg(lrc, INDIRECT_CTX_RING_TAIL, lrc->ring.tail); + xe_lrc_write_indirect_ctx_reg(lrc, INDIRECT_CTX_RING_CTL, + RING_CTL_SIZE(lrc->ring.size) | RING_VALID); + } else { + xe_lrc_write_ctx_reg(lrc, CTX_RING_START, __xe_lrc_ring_ggtt_addr(lrc)); + xe_lrc_write_ctx_reg(lrc, CTX_RING_HEAD, 0); + xe_lrc_write_ctx_reg(lrc, CTX_RING_TAIL, lrc->ring.tail); + xe_lrc_write_ctx_reg(lrc, CTX_RING_CTL, + RING_CTL_SIZE(lrc->ring.size) | RING_VALID); + } + + xe_lrc_write_ctx_reg(lrc, CTX_TIMESTAMP, 0); + if (xe->info.has_asid && vm) xe_lrc_write_ctx_reg(lrc, PVC_CTX_ASID, vm->usm.asid); @@ -825,23 +945,81 @@ err_lrc_finish: return err; } -void xe_lrc_finish(struct xe_lrc *lrc) +/** + * xe_lrc_create - Create a LRC + * @hwe: Hardware Engine + * @vm: The VM (address space) + * @ring_size: LRC ring size + * + * Allocate and initialize the Logical Ring Context (LRC). + * + * Return pointer to created LRC upon success and an error pointer + * upon failure. + */ +struct xe_lrc *xe_lrc_create(struct xe_hw_engine *hwe, struct xe_vm *vm, + u32 ring_size) { - xe_hw_fence_ctx_finish(&lrc->fence_ctx); - xe_bo_lock(lrc->bo, false); - xe_bo_unpin(lrc->bo); - xe_bo_unlock(lrc->bo); - xe_bo_put(lrc->bo); + struct xe_lrc *lrc; + int err; + + lrc = kzalloc(sizeof(*lrc), GFP_KERNEL); + if (!lrc) + return ERR_PTR(-ENOMEM); + + err = xe_lrc_init(lrc, hwe, vm, ring_size); + if (err) { + kfree(lrc); + return ERR_PTR(err); + } + + return lrc; +} + +/** + * xe_lrc_destroy - Destroy the LRC + * @ref: reference to LRC + * + * Called when ref == 0, release resources held by the Logical Ring Context + * (LRC) and free the LRC memory. 
+ */ +void xe_lrc_destroy(struct kref *ref) +{ + struct xe_lrc *lrc = container_of(ref, struct xe_lrc, refcount); + + xe_lrc_finish(lrc); + kfree(lrc); +} + +void xe_lrc_set_ring_tail(struct xe_lrc *lrc, u32 tail) +{ + if (xe_lrc_has_indirect_ring_state(lrc)) + xe_lrc_write_indirect_ctx_reg(lrc, INDIRECT_CTX_RING_TAIL, tail); + else + xe_lrc_write_ctx_reg(lrc, CTX_RING_TAIL, tail); +} + +u32 xe_lrc_ring_tail(struct xe_lrc *lrc) +{ + if (xe_lrc_has_indirect_ring_state(lrc)) + return xe_lrc_read_indirect_ctx_reg(lrc, INDIRECT_CTX_RING_TAIL) & TAIL_ADDR; + else + return xe_lrc_read_ctx_reg(lrc, CTX_RING_TAIL) & TAIL_ADDR; } void xe_lrc_set_ring_head(struct xe_lrc *lrc, u32 head) { - xe_lrc_write_ctx_reg(lrc, CTX_RING_HEAD, head); + if (xe_lrc_has_indirect_ring_state(lrc)) + xe_lrc_write_indirect_ctx_reg(lrc, INDIRECT_CTX_RING_HEAD, head); + else + xe_lrc_write_ctx_reg(lrc, CTX_RING_HEAD, head); } u32 xe_lrc_ring_head(struct xe_lrc *lrc) { - return xe_lrc_read_ctx_reg(lrc, CTX_RING_HEAD) & HEAD_ADDR; + if (xe_lrc_has_indirect_ring_state(lrc)) + return xe_lrc_read_indirect_ctx_reg(lrc, INDIRECT_CTX_RING_HEAD) & HEAD_ADDR; + else + return xe_lrc_read_ctx_reg(lrc, CTX_RING_HEAD) & HEAD_ADDR; } u32 xe_lrc_ring_space(struct xe_lrc *lrc) @@ -901,10 +1079,43 @@ u32 xe_lrc_seqno_ggtt_addr(struct xe_lrc *lrc) return __xe_lrc_seqno_ggtt_addr(lrc); } -struct dma_fence *xe_lrc_create_seqno_fence(struct xe_lrc *lrc) +/** + * xe_lrc_alloc_seqno_fence() - Allocate an lrc seqno fence. + * + * Allocate but don't initialize an lrc seqno fence. + * + * Return: Pointer to the allocated fence or + * negative error pointer on error. + */ +struct dma_fence *xe_lrc_alloc_seqno_fence(void) { - return &xe_hw_fence_create(&lrc->fence_ctx, - __xe_lrc_seqno_map(lrc))->dma; + return xe_hw_fence_alloc(); +} + +/** + * xe_lrc_free_seqno_fence() - Free an lrc seqno fence. + * @fence: Pointer to the fence to free. + * + * Frees an lrc seqno fence that hasn't yet been + * initialized. + */ +void xe_lrc_free_seqno_fence(struct dma_fence *fence) +{ + xe_hw_fence_free(fence); +} + +/** + * xe_lrc_init_seqno_fence() - Initialize an lrc seqno fence. + * @lrc: Pointer to the lrc. + * @fence: Pointer to the fence to initialize. + * + * Initializes a pre-allocated lrc seqno fence. + * After initialization, the fence is subject to normal + * dma-fence refcounting. + */ +void xe_lrc_init_seqno_fence(struct xe_lrc *lrc, struct dma_fence *fence) +{ + xe_hw_fence_init(fence, &lrc->fence_ctx, __xe_lrc_seqno_map(lrc)); } s32 xe_lrc_seqno(struct xe_lrc *lrc) @@ -1214,7 +1425,7 @@ void xe_lrc_dump_default(struct drm_printer *p, * hardware status page. 
*/ dw = gt->default_lrc[hwe_class] + LRC_PPHWSP_SIZE; - remaining_dw = (xe_lrc_size(gt_to_xe(gt), hwe_class) - LRC_PPHWSP_SIZE) / 4; + remaining_dw = (xe_gt_lrc_size(gt, hwe_class) - LRC_PPHWSP_SIZE) / 4; while (remaining_dw > 0) { if ((*dw & XE_INSTR_CMD_TYPE) == XE_INSTR_MI) { @@ -1354,10 +1565,11 @@ struct xe_lrc_snapshot *xe_lrc_snapshot_capture(struct xe_lrc *lrc) if (!snapshot) return NULL; - snapshot->context_desc = lower_32_bits(xe_lrc_ggtt_addr(lrc)); + snapshot->context_desc = xe_lrc_ggtt_addr(lrc); + snapshot->indirect_context_desc = xe_lrc_indirect_ring_ggtt_addr(lrc); snapshot->head = xe_lrc_ring_head(lrc); snapshot->tail.internal = lrc->ring.tail; - snapshot->tail.memory = xe_lrc_read_ctx_reg(lrc, CTX_RING_TAIL); + snapshot->tail.memory = xe_lrc_ring_tail(lrc); snapshot->start_seqno = xe_lrc_start_seqno(lrc); snapshot->seqno = xe_lrc_seqno(lrc); snapshot->lrc_bo = xe_bo_get(lrc->bo); @@ -1382,7 +1594,7 @@ void xe_lrc_snapshot_capture_delayed(struct xe_lrc_snapshot *snapshot) if (!snapshot->lrc_snapshot) goto put_bo; - dma_resv_lock(bo->ttm.base.resv, NULL); + xe_bo_lock(bo, false); if (!ttm_bo_vmap(&bo->ttm, &src)) { xe_map_memcpy_from(xe_bo_device(bo), snapshot->lrc_snapshot, &src, snapshot->lrc_offset, @@ -1392,7 +1604,7 @@ void xe_lrc_snapshot_capture_delayed(struct xe_lrc_snapshot *snapshot) kvfree(snapshot->lrc_snapshot); snapshot->lrc_snapshot = NULL; } - dma_resv_unlock(bo->ttm.base.resv); + xe_bo_unlock(bo); put_bo: xe_bo_put(bo); } @@ -1405,6 +1617,8 @@ void xe_lrc_snapshot_print(struct xe_lrc_snapshot *snapshot, struct drm_printer return; drm_printf(p, "\tHW Context Desc: 0x%08x\n", snapshot->context_desc); + drm_printf(p, "\tHW Indirect Ring State: 0x%08x\n", + snapshot->indirect_context_desc); drm_printf(p, "\tLRC Head: (memory) %u\n", snapshot->head); drm_printf(p, "\tLRC Tail: (internal) %u, (memory) %u\n", snapshot->tail.internal, snapshot->tail.memory); @@ -1444,3 +1658,12 @@ void xe_lrc_snapshot_free(struct xe_lrc_snapshot *snapshot) xe_bo_put(snapshot->lrc_bo); kfree(snapshot); } + +u32 xe_lrc_update_timestamp(struct xe_lrc *lrc, u32 *old_ts) +{ + *old_ts = lrc->ctx_timestamp; + + lrc->ctx_timestamp = xe_lrc_read_ctx_reg(lrc, CTX_TIMESTAMP); + + return lrc->ctx_timestamp; +} diff --git a/drivers/gpu/drm/xe/xe_lrc.h b/drivers/gpu/drm/xe/xe_lrc.h index d32fa31faa2c..882c3437ba5c 100644 --- a/drivers/gpu/drm/xe/xe_lrc.h +++ b/drivers/gpu/drm/xe/xe_lrc.h @@ -5,6 +5,8 @@ #ifndef _XE_LRC_H_ #define _XE_LRC_H_ +#include <linux/types.h> + #include "xe_lrc_types.h" struct drm_printer; @@ -12,23 +14,53 @@ struct xe_bb; struct xe_device; struct xe_exec_queue; enum xe_engine_class; +struct xe_gt; struct xe_hw_engine; +struct xe_lrc; +struct xe_lrc_snapshot; struct xe_vm; #define LRC_PPHWSP_SCRATCH_ADDR (0x34 * 4) -int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe, - struct xe_exec_queue *q, struct xe_vm *vm, u32 ring_size); -void xe_lrc_finish(struct xe_lrc *lrc); +struct xe_lrc *xe_lrc_create(struct xe_hw_engine *hwe, struct xe_vm *vm, + u32 ring_size); +void xe_lrc_destroy(struct kref *ref); + +/** + * xe_lrc_get - Get reference to the LRC + * @lrc: Logical Ring Context + * + * Increment reference count of @lrc + */ +static inline struct xe_lrc *xe_lrc_get(struct xe_lrc *lrc) +{ + kref_get(&lrc->refcount); + return lrc; +} + +/** + * xe_lrc_put - Put reference of the LRC + * @lrc: Logical Ring Context + * + * Decrement reference count of @lrc, call xe_lrc_destroy when + * reference count reaches 0. 
+ */ +static inline void xe_lrc_put(struct xe_lrc *lrc) +{ + kref_put(&lrc->refcount, xe_lrc_destroy); +} -size_t xe_lrc_size(struct xe_device *xe, enum xe_engine_class class); +size_t xe_gt_lrc_size(struct xe_gt *gt, enum xe_engine_class class); u32 xe_lrc_pphwsp_offset(struct xe_lrc *lrc); +void xe_lrc_set_ring_tail(struct xe_lrc *lrc, u32 tail); +u32 xe_lrc_ring_tail(struct xe_lrc *lrc); void xe_lrc_set_ring_head(struct xe_lrc *lrc, u32 head); u32 xe_lrc_ring_head(struct xe_lrc *lrc); u32 xe_lrc_ring_space(struct xe_lrc *lrc); void xe_lrc_write_ring(struct xe_lrc *lrc, const void *data, size_t size); +u32 xe_lrc_indirect_ring_ggtt_addr(struct xe_lrc *lrc); u32 xe_lrc_ggtt_addr(struct xe_lrc *lrc); u32 *xe_lrc_regs(struct xe_lrc *lrc); @@ -38,7 +70,9 @@ void xe_lrc_write_ctx_reg(struct xe_lrc *lrc, int reg_nr, u32 val); u64 xe_lrc_descriptor(struct xe_lrc *lrc); u32 xe_lrc_seqno_ggtt_addr(struct xe_lrc *lrc); -struct dma_fence *xe_lrc_create_seqno_fence(struct xe_lrc *lrc); +struct dma_fence *xe_lrc_alloc_seqno_fence(void); +void xe_lrc_free_seqno_fence(struct dma_fence *fence); +void xe_lrc_init_seqno_fence(struct xe_lrc *lrc, struct dma_fence *fence); s32 xe_lrc_seqno(struct xe_lrc *lrc); u32 xe_lrc_start_seqno_ggtt_addr(struct xe_lrc *lrc); @@ -60,4 +94,18 @@ void xe_lrc_snapshot_capture_delayed(struct xe_lrc_snapshot *snapshot); void xe_lrc_snapshot_print(struct xe_lrc_snapshot *snapshot, struct drm_printer *p); void xe_lrc_snapshot_free(struct xe_lrc_snapshot *snapshot); +/** + * xe_lrc_update_timestamp - readout LRC timestamp and update cached value + * @lrc: logical ring context for this exec queue + * @old_ts: pointer where to save the previous timestamp + * + * Read the current timestamp for this LRC and update the cached value. The + * previous cached value is also returned in @old_ts so the caller can calculate + * the delta between 2 updates. Note that this is not intended to be called from + * any place, but just by the paths updating the drm client utilization. 
+ * + * Returns the current LRC timestamp + */ +u32 xe_lrc_update_timestamp(struct xe_lrc *lrc, u32 *old_ts); + #endif diff --git a/drivers/gpu/drm/xe/xe_lrc_types.h b/drivers/gpu/drm/xe/xe_lrc_types.h index b716df0dfb4e..71ecb453f811 100644 --- a/drivers/gpu/drm/xe/xe_lrc_types.h +++ b/drivers/gpu/drm/xe/xe_lrc_types.h @@ -6,6 +6,8 @@ #ifndef _XE_LRC_TYPES_H_ #define _XE_LRC_TYPES_H_ +#include <linux/kref.h> + #include "xe_hw_fence_types.h" struct xe_bo; @@ -20,12 +22,19 @@ struct xe_lrc { */ struct xe_bo *bo; + /** @size: size of lrc including any indirect ring state page */ + u32 size; + /** @tile: tile which this LRC belongs to */ struct xe_tile *tile; /** @flags: LRC flags */ +#define XE_LRC_FLAG_INDIRECT_RING_STATE 0x1 u32 flags; + /** @refcount: ref count of this lrc */ + struct kref refcount; + /** @ring: submission ring state */ struct { /** @ring.size: size of submission ring */ @@ -41,6 +50,9 @@ struct xe_lrc { /** @fence_ctx: context for hw fence */ struct xe_hw_fence_ctx fence_ctx; + + /** @ctx_timestamp: readout value of CTX_TIMESTAMP on last update */ + u32 ctx_timestamp; }; struct xe_lrc_snapshot; diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c index 65e5a3f4c340..7e3fb33110d9 100644 --- a/drivers/gpu/drm/xe/xe_migrate.c +++ b/drivers/gpu/drm/xe/xe_migrate.c @@ -14,8 +14,8 @@ #include <generated/xe_wa_oob.h> +#include "instructions/xe_gpu_commands.h" #include "instructions/xe_mi_commands.h" -#include "regs/xe_gpu_commands.h" #include "regs/xe_gtt_defs.h" #include "tests/xe_test.h" #include "xe_assert.h" @@ -69,7 +69,7 @@ struct xe_migrate { #define MAX_PREEMPTDISABLE_TRANSFER SZ_8M /* Around 1ms. */ #define MAX_CCS_LIMITED_TRANSFER SZ_4M /* XE_PAGE_SIZE * (FIELD_MAX(XE2_CCS_SIZE_MASK) + 1) */ -#define NUM_KERNEL_PDE 17 +#define NUM_KERNEL_PDE 15 #define NUM_PT_SLOTS 32 #define LEVEL0_PAGE_TABLE_ENCODE_SIZE SZ_2M #define MAX_NUM_PTE 512 @@ -137,10 +137,11 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m, struct xe_device *xe = tile_to_xe(tile); u16 pat_index = xe->pat.idx[XE_CACHE_WB]; u8 id = tile->id; - u32 num_entries = NUM_PT_SLOTS, num_level = vm->pt_root[id]->level; + u32 num_entries = NUM_PT_SLOTS, num_level = vm->pt_root[id]->level, + num_setup = num_level + 1; u32 map_ofs, level, i; struct xe_bo *bo, *batch = tile->mem.kernel_bb_pool->bo; - u64 entry; + u64 entry, pt30_ofs; /* Can't bump NUM_PT_SLOTS too high */ BUILD_BUG_ON(NUM_PT_SLOTS > SZ_2M/XE_PAGE_SIZE); @@ -160,10 +161,12 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m, if (IS_ERR(bo)) return PTR_ERR(bo); - entry = vm->pt_ops->pde_encode_bo(bo, bo->size - XE_PAGE_SIZE, pat_index); + /* PT31 reserved for 2M identity map */ + pt30_ofs = bo->size - 2 * XE_PAGE_SIZE; + entry = vm->pt_ops->pde_encode_bo(bo, pt30_ofs, pat_index); xe_pt_write(xe, &vm->pt_root[id]->bo->vmap, 0, entry); - map_ofs = (num_entries - num_level) * XE_PAGE_SIZE; + map_ofs = (num_entries - num_setup) * XE_PAGE_SIZE; /* Map the entire BO in our level 0 pt */ for (i = 0, level = 0; i < num_entries; level++) { @@ -234,7 +237,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m, } /* Write PDE's that point to our BO. 
*/ - for (i = 0; i < num_entries - num_level; i++) { + for (i = 0; i < map_ofs / PAGE_SIZE; i++) { entry = vm->pt_ops->pde_encode_bo(bo, (u64)i * XE_PAGE_SIZE, pat_index); @@ -252,28 +255,54 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m, /* Identity map the entire vram at 256GiB offset */ if (IS_DGFX(xe)) { u64 pos, ofs, flags; + /* XXX: Unclear if this should be usable_size? */ + u64 vram_limit = xe->mem.vram.actual_physical_size + + xe->mem.vram.dpa_base; level = 2; ofs = map_ofs + XE_PAGE_SIZE * level + 256 * 8; flags = vm->pt_ops->pte_encode_addr(xe, 0, pat_index, level, true, 0); + xe_assert(xe, IS_ALIGNED(xe->mem.vram.usable_size, SZ_2M)); + /* - * Use 1GB pages, it shouldn't matter the physical amount of - * vram is less, when we don't access it. + * Use 1GB pages when possible, last chunk always use 2M + * pages as mixing reserved memory (stolen, WOCPM) with a single + * mapping is not allowed on certain platforms. */ - for (pos = xe->mem.vram.dpa_base; - pos < xe->mem.vram.actual_physical_size + xe->mem.vram.dpa_base; - pos += SZ_1G, ofs += 8) + for (pos = xe->mem.vram.dpa_base; pos < vram_limit; + pos += SZ_1G, ofs += 8) { + if (pos + SZ_1G >= vram_limit) { + u64 pt31_ofs = bo->size - XE_PAGE_SIZE; + + entry = vm->pt_ops->pde_encode_bo(bo, pt31_ofs, + pat_index); + xe_map_wr(xe, &bo->vmap, ofs, u64, entry); + + flags = vm->pt_ops->pte_encode_addr(xe, 0, + pat_index, + level - 1, + true, 0); + + for (ofs = pt31_ofs; pos < vram_limit; + pos += SZ_2M, ofs += 8) + xe_map_wr(xe, &bo->vmap, ofs, u64, pos | flags); + break; /* Ensure pos == vram_limit assert correct */ + } + xe_map_wr(xe, &bo->vmap, ofs, u64, pos | flags); + } + + xe_assert(xe, pos == vram_limit); } /* * Example layout created above, with root level = 3: * [PT0...PT7]: kernel PT's for copy/clear; 64 or 4KiB PTE's * [PT8]: Kernel PT for VM_BIND, 4 KiB PTE's - * [PT9...PT28]: Userspace PT's for VM_BIND, 4 KiB PTE's - * [PT29 = PDE 0] [PT30 = PDE 1] [PT31 = PDE 2] + * [PT9...PT27]: Userspace PT's for VM_BIND, 4 KiB PTE's + * [PT28 = PDE 0] [PT29 = PDE 1] [PT30 = PDE 2] [PT31 = 2M vram identity map] * * This makes the lowest part of the VM point to the pagetables. * Hence the lowest 2M in the vm should point to itself, with a few writes @@ -383,6 +412,9 @@ struct xe_migrate *xe_migrate_init(struct xe_tile *tile) } mutex_init(&m->job_mutex); + fs_reclaim_acquire(GFP_KERNEL); + might_lock(&m->job_mutex); + fs_reclaim_release(GFP_KERNEL); err = drmm_add_action_or_reset(&xe->drm, xe_migrate_fini, m); if (err) @@ -807,7 +839,6 @@ struct dma_fence *xe_migrate_copy(struct xe_migrate *m, IS_DGFX(xe) ? dst_is_vram : dst_is_pltt, src_L0, ccs_ofs, copy_ccs); - mutex_lock(&m->job_mutex); job = xe_bb_create_migration_job(m->q, bb, xe_migrate_batch_base(m, usm), update_idx); @@ -827,6 +858,7 @@ struct dma_fence *xe_migrate_copy(struct xe_migrate *m, goto err_job; } + mutex_lock(&m->job_mutex); xe_sched_job_arm(job); dma_fence_put(fence); fence = dma_fence_get(&job->drm.s_fence->finished); @@ -844,7 +876,6 @@ struct dma_fence *xe_migrate_copy(struct xe_migrate *m, err_job: xe_sched_job_put(job); err: - mutex_unlock(&m->job_mutex); xe_bb_free(bb, NULL); err_sync: @@ -934,8 +965,8 @@ static bool has_service_copy_support(struct xe_gt *gt) * all of the actual service copy engines (BCS1-BCS8) have been fused * off. 
*/ - return gt->info.__engine_mask & GENMASK(XE_HW_ENGINE_BCS8, - XE_HW_ENGINE_BCS1); + return gt->info.engine_mask & GENMASK(XE_HW_ENGINE_BCS8, + XE_HW_ENGINE_BCS1); } static u32 emit_clear_cmd_len(struct xe_gt *gt) @@ -1044,7 +1075,6 @@ struct dma_fence *xe_migrate_clear(struct xe_migrate *m, flush_flags = MI_FLUSH_DW_CCS; } - mutex_lock(&m->job_mutex); job = xe_bb_create_migration_job(m->q, bb, xe_migrate_batch_base(m, usm), update_idx); @@ -1067,6 +1097,7 @@ struct dma_fence *xe_migrate_clear(struct xe_migrate *m, goto err_job; } + mutex_lock(&m->job_mutex); xe_sched_job_arm(job); dma_fence_put(fence); fence = dma_fence_get(&job->drm.s_fence->finished); @@ -1083,7 +1114,6 @@ struct dma_fence *xe_migrate_clear(struct xe_migrate *m, err_job: xe_sched_job_put(job); err: - mutex_unlock(&m->job_mutex); xe_bb_free(bb, NULL); err_sync: /* Sync partial copies if any. FIXME: job_mutex? */ @@ -1377,9 +1407,6 @@ xe_migrate_update_pgtables(struct xe_migrate *m, write_pgtable(tile, bb, 0, &updates[i], pt_update); } - if (!q) - mutex_lock(&m->job_mutex); - job = xe_bb_create_migration_job(q ?: m->q, bb, xe_migrate_batch_base(m, usm), update_idx); @@ -1420,6 +1447,9 @@ xe_migrate_update_pgtables(struct xe_migrate *m, if (err) goto err_job; } + if (!q) + mutex_lock(&m->job_mutex); + xe_sched_job_arm(job); fence = dma_fence_get(&job->drm.s_fence->finished); xe_sched_job_push(job); @@ -1435,8 +1465,6 @@ xe_migrate_update_pgtables(struct xe_migrate *m, err_job: xe_sched_job_put(job); err_bb: - if (!q) - mutex_unlock(&m->job_mutex); xe_bb_free(bb, NULL); err: drm_suballoc_free(sa_bo, NULL); diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c index 334637511e75..7962eeb9adb7 100644 --- a/drivers/gpu/drm/xe/xe_mmio.c +++ b/drivers/gpu/drm/xe/xe_mmio.c @@ -3,337 +3,36 @@ * Copyright © 2021-2023 Intel Corporation */ -#include <linux/minmax.h> - #include "xe_mmio.h" +#include <linux/delay.h> +#include <linux/io-64-nonatomic-lo-hi.h> +#include <linux/minmax.h> +#include <linux/pci.h> + #include <drm/drm_managed.h> -#include <drm/xe_drm.h> +#include <drm/drm_print.h> -#include "regs/xe_engine_regs.h" -#include "regs/xe_gt_regs.h" +#include "regs/xe_bars.h" #include "regs/xe_regs.h" -#include "xe_bo.h" #include "xe_device.h" -#include "xe_ggtt.h" #include "xe_gt.h" -#include "xe_gt_mcr.h" +#include "xe_gt_printk.h" +#include "xe_gt_sriov_vf.h" #include "xe_macros.h" -#include "xe_module.h" #include "xe_sriov.h" -#include "xe_tile.h" - -#define XEHP_MTCFG_ADDR XE_REG(0x101800) -#define TILE_COUNT REG_GENMASK(15, 8) - -#define BAR_SIZE_SHIFT 20 - -static void -_resize_bar(struct xe_device *xe, int resno, resource_size_t size) -{ - struct pci_dev *pdev = to_pci_dev(xe->drm.dev); - int bar_size = pci_rebar_bytes_to_size(size); - int ret; - - if (pci_resource_len(pdev, resno)) - pci_release_resource(pdev, resno); - - ret = pci_resize_resource(pdev, resno, bar_size); - if (ret) { - drm_info(&xe->drm, "Failed to resize BAR%d to %dM (%pe). 
Consider enabling 'Resizable BAR' support in your BIOS\n", - resno, 1 << bar_size, ERR_PTR(ret)); - return; - } - - drm_info(&xe->drm, "BAR%d resized to %dM\n", resno, 1 << bar_size); -} - -/* - * if force_vram_bar_size is set, attempt to set to the requested size - * else set to maximum possible size - */ -static void xe_resize_vram_bar(struct xe_device *xe) -{ - u64 force_vram_bar_size = xe_modparam.force_vram_bar_size; - struct pci_dev *pdev = to_pci_dev(xe->drm.dev); - struct pci_bus *root = pdev->bus; - resource_size_t current_size; - resource_size_t rebar_size; - struct resource *root_res; - u32 bar_size_mask; - u32 pci_cmd; - int i; - - /* gather some relevant info */ - current_size = pci_resource_len(pdev, LMEM_BAR); - bar_size_mask = pci_rebar_get_possible_sizes(pdev, LMEM_BAR); - - if (!bar_size_mask) - return; - - /* set to a specific size? */ - if (force_vram_bar_size) { - u32 bar_size_bit; - - rebar_size = force_vram_bar_size * (resource_size_t)SZ_1M; - - bar_size_bit = bar_size_mask & BIT(pci_rebar_bytes_to_size(rebar_size)); - - if (!bar_size_bit) { - drm_info(&xe->drm, - "Requested size: %lluMiB is not supported by rebar sizes: 0x%x. Leaving default: %lluMiB\n", - (u64)rebar_size >> 20, bar_size_mask, (u64)current_size >> 20); - return; - } - - rebar_size = 1ULL << (__fls(bar_size_bit) + BAR_SIZE_SHIFT); - - if (rebar_size == current_size) - return; - } else { - rebar_size = 1ULL << (__fls(bar_size_mask) + BAR_SIZE_SHIFT); - - /* only resize if larger than current */ - if (rebar_size <= current_size) - return; - } - - drm_info(&xe->drm, "Attempting to resize bar from %lluMiB -> %lluMiB\n", - (u64)current_size >> 20, (u64)rebar_size >> 20); - - while (root->parent) - root = root->parent; - - pci_bus_for_each_resource(root, root_res, i) { - if (root_res && root_res->flags & (IORESOURCE_MEM | IORESOURCE_MEM_64) && - (u64)root_res->start > 0x100000000ul) - break; - } - - if (!root_res) { - drm_info(&xe->drm, "Can't resize VRAM BAR - platform support is missing. Consider enabling 'Resizable BAR' support in your BIOS\n"); - return; - } - - pci_read_config_dword(pdev, PCI_COMMAND, &pci_cmd); - pci_write_config_dword(pdev, PCI_COMMAND, pci_cmd & ~PCI_COMMAND_MEMORY); - - _resize_bar(xe, LMEM_BAR, rebar_size); - - pci_assign_unassigned_bus_resources(pdev->bus); - pci_write_config_dword(pdev, PCI_COMMAND, pci_cmd); -} - -static bool xe_pci_resource_valid(struct pci_dev *pdev, int bar) -{ - if (!pci_resource_flags(pdev, bar)) - return false; - - if (pci_resource_flags(pdev, bar) & IORESOURCE_UNSET) - return false; - - if (!pci_resource_len(pdev, bar)) - return false; - - return true; -} - -static int xe_determine_lmem_bar_size(struct xe_device *xe) -{ - struct pci_dev *pdev = to_pci_dev(xe->drm.dev); - - if (!xe_pci_resource_valid(pdev, LMEM_BAR)) { - drm_err(&xe->drm, "pci resource is not valid\n"); - return -ENXIO; - } - - xe_resize_vram_bar(xe); - - xe->mem.vram.io_start = pci_resource_start(pdev, LMEM_BAR); - xe->mem.vram.io_size = pci_resource_len(pdev, LMEM_BAR); - if (!xe->mem.vram.io_size) - return -EIO; - - /* XXX: Need to change when xe link code is ready */ - xe->mem.vram.dpa_base = 0; - - /* set up a map to the total memory area. 
*/ - xe->mem.vram.mapping = ioremap_wc(xe->mem.vram.io_start, xe->mem.vram.io_size); - - return 0; -} - -static inline u64 get_flat_ccs_offset(struct xe_gt *gt, u64 tile_size) -{ - struct xe_device *xe = gt_to_xe(gt); - u64 offset; - u32 reg; - - if (GRAPHICS_VER(xe) >= 20) { - u64 ccs_size = tile_size / 512; - u64 offset_hi, offset_lo; - u32 nodes, num_enabled; - - reg = xe_mmio_read32(gt, MIRROR_FUSE3); - nodes = REG_FIELD_GET(XE2_NODE_ENABLE_MASK, reg); - num_enabled = hweight32(nodes); /* Number of enabled l3 nodes */ - - reg = xe_gt_mcr_unicast_read_any(gt, XE2_FLAT_CCS_BASE_RANGE_LOWER); - offset_lo = REG_FIELD_GET(XE2_FLAT_CCS_BASE_LOWER_ADDR_MASK, reg); - - reg = xe_gt_mcr_unicast_read_any(gt, XE2_FLAT_CCS_BASE_RANGE_UPPER); - offset_hi = REG_FIELD_GET(XE2_FLAT_CCS_BASE_UPPER_ADDR_MASK, reg); - - offset = offset_hi << 32; /* HW view bits 39:32 */ - offset |= offset_lo << 6; /* HW view bits 31:6 */ - offset *= num_enabled; /* convert to SW view */ - - /* We don't expect any holes */ - xe_assert_msg(xe, offset == (xe_mmio_read64_2x32(gt, GSMBASE) - ccs_size), - "Hole between CCS and GSM.\n"); - } else { - reg = xe_gt_mcr_unicast_read_any(gt, XEHP_FLAT_CCS_BASE_ADDR); - offset = (u64)REG_FIELD_GET(XEHP_FLAT_CCS_PTR, reg) * SZ_64K; - } - - return offset; -} - -/** - * xe_mmio_tile_vram_size() - Collect vram size and offset information - * @tile: tile to get info for - * @vram_size: available vram (size - device reserved portions) - * @tile_size: actual vram size - * @tile_offset: physical start point in the vram address space - * - * There are 4 places for size information: - * - io size (from pci_resource_len of LMEM bar) (only used for small bar and DG1) - * - TILEx size (actual vram size) - * - GSMBASE offset (TILEx - "stolen") - * - CSSBASE offset (TILEx - CSS space necessary) - * - * CSSBASE is always a lower/smaller offset then GSMBASE. - * - * The actual available size of memory is to the CCS or GSM base. - * NOTE: multi-tile bases will include the tile offset. 
- * - */ -static int xe_mmio_tile_vram_size(struct xe_tile *tile, u64 *vram_size, - u64 *tile_size, u64 *tile_offset) -{ - struct xe_device *xe = tile_to_xe(tile); - struct xe_gt *gt = tile->primary_gt; - u64 offset; - int err; - u32 reg; - - err = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); - if (err) - return err; - - /* actual size */ - if (unlikely(xe->info.platform == XE_DG1)) { - *tile_size = pci_resource_len(to_pci_dev(xe->drm.dev), LMEM_BAR); - *tile_offset = 0; - } else { - reg = xe_gt_mcr_unicast_read_any(gt, XEHP_TILE_ADDR_RANGE(gt->info.id)); - *tile_size = (u64)REG_FIELD_GET(GENMASK(14, 8), reg) * SZ_1G; - *tile_offset = (u64)REG_FIELD_GET(GENMASK(7, 1), reg) * SZ_1G; - } - - /* minus device usage */ - if (xe->info.has_flat_ccs) { - offset = get_flat_ccs_offset(gt, *tile_size); - } else { - offset = xe_mmio_read64_2x32(gt, GSMBASE); - } - - /* remove the tile offset so we have just the available size */ - *vram_size = offset - *tile_offset; - - return xe_force_wake_put(gt_to_fw(gt), XE_FW_GT); -} -int xe_mmio_probe_vram(struct xe_device *xe) +static void tiles_fini(void *arg) { + struct xe_device *xe = arg; struct xe_tile *tile; - resource_size_t io_size; - u64 available_size = 0; - u64 total_size = 0; - u64 tile_offset; - u64 tile_size; - u64 vram_size; - int err; - u8 id; - - if (!IS_DGFX(xe)) - return 0; - - /* Get the size of the root tile's vram for later accessibility comparison */ - tile = xe_device_get_root_tile(xe); - err = xe_mmio_tile_vram_size(tile, &vram_size, &tile_size, &tile_offset); - if (err) - return err; - - err = xe_determine_lmem_bar_size(xe); - if (err) - return err; - - drm_info(&xe->drm, "VISIBLE VRAM: %pa, %pa\n", &xe->mem.vram.io_start, - &xe->mem.vram.io_size); - - io_size = xe->mem.vram.io_size; - - /* tile specific ranges */ - for_each_tile(tile, xe, id) { - err = xe_mmio_tile_vram_size(tile, &vram_size, &tile_size, &tile_offset); - if (err) - return err; - - tile->mem.vram.actual_physical_size = tile_size; - tile->mem.vram.io_start = xe->mem.vram.io_start + tile_offset; - tile->mem.vram.io_size = min_t(u64, vram_size, io_size); - - if (!tile->mem.vram.io_size) { - drm_err(&xe->drm, "Tile without any CPU visible VRAM. 
Aborting.\n"); - return -ENODEV; - } - - tile->mem.vram.dpa_base = xe->mem.vram.dpa_base + tile_offset; - tile->mem.vram.usable_size = vram_size; - tile->mem.vram.mapping = xe->mem.vram.mapping + tile_offset; - - if (tile->mem.vram.io_size < tile->mem.vram.usable_size) - drm_info(&xe->drm, "Small BAR device\n"); - drm_info(&xe->drm, "VRAM[%u, %u]: Actual physical size %pa, usable size exclude stolen %pa, CPU accessible size %pa\n", id, - tile->id, &tile->mem.vram.actual_physical_size, &tile->mem.vram.usable_size, &tile->mem.vram.io_size); - drm_info(&xe->drm, "VRAM[%u, %u]: DPA range: [%pa-%llx], io range: [%pa-%llx]\n", id, tile->id, - &tile->mem.vram.dpa_base, tile->mem.vram.dpa_base + (u64)tile->mem.vram.actual_physical_size, - &tile->mem.vram.io_start, tile->mem.vram.io_start + (u64)tile->mem.vram.io_size); - - /* calculate total size using tile size to get the correct HW sizing */ - total_size += tile_size; - available_size += vram_size; - - if (total_size > xe->mem.vram.io_size) { - drm_info(&xe->drm, "VRAM: %pa is larger than resource %pa\n", - &total_size, &xe->mem.vram.io_size); - } - - io_size -= min_t(u64, tile_size, io_size); - } - - xe->mem.vram.actual_physical_size = total_size; - - drm_info(&xe->drm, "Total VRAM: %pa, %pa\n", &xe->mem.vram.io_start, - &xe->mem.vram.actual_physical_size); - drm_info(&xe->drm, "Available VRAM: %pa, %pa\n", &xe->mem.vram.io_start, - &available_size); + int id; - return 0; + for_each_tile(tile, xe, id) + tile->mmio.regs = NULL; } -void xe_mmio_probe_tiles(struct xe_device *xe) +int xe_mmio_probe_tiles(struct xe_device *xe) { size_t tile_mmio_size = SZ_16M, tile_mmio_ext_size = xe->info.tile_mmio_ext_size; u8 id, tile_count = xe->info.tile_count; @@ -384,15 +83,16 @@ add_mmio_ext: regs += tile_mmio_ext_size; } } + + return devm_add_action_or_reset(xe->drm.dev, tiles_fini, xe); } -static void mmio_fini(struct drm_device *drm, void *arg) +static void mmio_fini(void *arg) { struct xe_device *xe = arg; pci_iounmap(to_pci_dev(xe->drm.dev), xe->mmio.regs); - if (xe->mem.vram.mapping) - iounmap(xe->mem.vram.mapping); + xe->mmio.regs = NULL; } int xe_mmio_init(struct xe_device *xe) @@ -407,7 +107,7 @@ int xe_mmio_init(struct xe_device *xe) * registers (0-4MB), reserved space (4MB-8MB) and GGTT (8MB-16MB). */ xe->mmio.size = pci_resource_len(pdev, mmio_bar); - xe->mmio.regs = pci_iomap(pdev, mmio_bar, 0); + xe->mmio.regs = pci_iomap(pdev, mmio_bar, GTTMMADR_BAR); if (xe->mmio.regs == NULL) { drm_err(&xe->drm, "failed to map registers\n"); return -EIO; @@ -417,47 +117,42 @@ int xe_mmio_init(struct xe_device *xe) root_tile->mmio.size = SZ_16M; root_tile->mmio.regs = xe->mmio.regs; - return drmm_add_action_or_reset(&xe->drm, mmio_fini, xe); + return devm_add_action_or_reset(xe->drm.dev, mmio_fini, xe); } u8 xe_mmio_read8(struct xe_gt *gt, struct xe_reg reg) { struct xe_tile *tile = gt_to_tile(gt); + u32 addr = xe_mmio_adjusted_addr(gt, reg.addr); - if (reg.addr < gt->mmio.adj_limit) - reg.addr += gt->mmio.adj_offset; - - return readb((reg.ext ? tile->mmio_ext.regs : tile->mmio.regs) + reg.addr); + return readb((reg.ext ? tile->mmio_ext.regs : tile->mmio.regs) + addr); } u16 xe_mmio_read16(struct xe_gt *gt, struct xe_reg reg) { struct xe_tile *tile = gt_to_tile(gt); + u32 addr = xe_mmio_adjusted_addr(gt, reg.addr); - if (reg.addr < gt->mmio.adj_limit) - reg.addr += gt->mmio.adj_offset; - - return readw((reg.ext ? tile->mmio_ext.regs : tile->mmio.regs) + reg.addr); + return readw((reg.ext ? 
tile->mmio_ext.regs : tile->mmio.regs) + addr); } void xe_mmio_write32(struct xe_gt *gt, struct xe_reg reg, u32 val) { struct xe_tile *tile = gt_to_tile(gt); + u32 addr = xe_mmio_adjusted_addr(gt, reg.addr); - if (reg.addr < gt->mmio.adj_limit) - reg.addr += gt->mmio.adj_offset; - - writel(val, (reg.ext ? tile->mmio_ext.regs : tile->mmio.regs) + reg.addr); + writel(val, (reg.ext ? tile->mmio_ext.regs : tile->mmio.regs) + addr); } u32 xe_mmio_read32(struct xe_gt *gt, struct xe_reg reg) { struct xe_tile *tile = gt_to_tile(gt); + u32 addr = xe_mmio_adjusted_addr(gt, reg.addr); - if (reg.addr < gt->mmio.adj_limit) - reg.addr += gt->mmio.adj_offset; + if (!reg.vf && IS_SRIOV_VF(gt_to_xe(gt))) + return xe_gt_sriov_vf_read32(gt, reg); - return readl((reg.ext ? tile->mmio_ext.regs : tile->mmio.regs) + reg.addr); + return readl((reg.ext ? tile->mmio_ext.regs : tile->mmio.regs) + addr); } u32 xe_mmio_rmw32(struct xe_gt *gt, struct xe_reg reg, u32 clr, u32 set) @@ -486,10 +181,9 @@ bool xe_mmio_in_range(const struct xe_gt *gt, const struct xe_mmio_range *range, struct xe_reg reg) { - if (reg.addr < gt->mmio.adj_limit) - reg.addr += gt->mmio.adj_offset; + u32 addr = xe_mmio_adjusted_addr(gt, reg.addr); - return range && reg.addr >= range->start && reg.addr <= range->end; + return range && addr >= range->start && addr <= range->end; } /** @@ -519,10 +213,11 @@ u64 xe_mmio_read64_2x32(struct xe_gt *gt, struct xe_reg reg) struct xe_reg reg_udw = { .addr = reg.addr + 0x4 }; u32 ldw, udw, oldudw, retries; - if (reg.addr < gt->mmio.adj_limit) { - reg.addr += gt->mmio.adj_offset; - reg_udw.addr += gt->mmio.adj_offset; - } + reg.addr = xe_mmio_adjusted_addr(gt, reg.addr); + reg_udw.addr = xe_mmio_adjusted_addr(gt, reg_udw.addr); + + /* we shouldn't adjust just one register address */ + xe_gt_assert(gt, reg_udw.addr == reg.addr + 0x4); oldudw = xe_mmio_read32(gt, reg_udw); for (retries = 5; retries; --retries) { @@ -601,3 +296,64 @@ int xe_mmio_wait32(struct xe_gt *gt, struct xe_reg reg, u32 mask, u32 val, u32 t return ret; } + +/** + * xe_mmio_wait32_not() - Wait for a register to return anything other than the given masked value + * @gt: MMIO target GT + * @reg: register to read value from + * @mask: mask to be applied to the value read from the register + * @val: value to match after applying the mask + * @timeout_us: time out after this period of time. Wait logic tries to be + * smart, applying an exponential backoff until @timeout_us is reached. + * @out_val: if not NULL, points where to store the last unmasked value + * @atomic: needs to be true if calling from an atomic context + * + * This function polls for a masked value to change from a given value and + * returns zero on success or -ETIMEDOUT if timed out. + * + * Note that @timeout_us represents the minimum amount of time to wait before + * giving up. The actual time taken by this function can be a little more than + * @timeout_us for different reasons, specially in non-atomic contexts. Thus, + * it is possible that this function succeeds even after @timeout_us has passed. 
+ */ +int xe_mmio_wait32_not(struct xe_gt *gt, struct xe_reg reg, u32 mask, u32 val, u32 timeout_us, + u32 *out_val, bool atomic) +{ + ktime_t cur = ktime_get_raw(); + const ktime_t end = ktime_add_us(cur, timeout_us); + int ret = -ETIMEDOUT; + s64 wait = 10; + u32 read; + + for (;;) { + read = xe_mmio_read32(gt, reg); + if ((read & mask) != val) { + ret = 0; + break; + } + + cur = ktime_get_raw(); + if (!ktime_before(cur, end)) + break; + + if (ktime_after(ktime_add_us(cur, wait), end)) + wait = ktime_us_delta(end, cur); + + if (atomic) + udelay(wait); + else + usleep_range(wait, wait << 1); + wait <<= 1; + } + + if (ret != 0) { + read = xe_mmio_read32(gt, reg); + if ((read & mask) != val) + ret = 0; + } + + if (out_val) + *out_val = read; + + return ret; +} diff --git a/drivers/gpu/drm/xe/xe_mmio.h b/drivers/gpu/drm/xe/xe_mmio.h index a3cd7b3036c7..6ae0cc32c651 100644 --- a/drivers/gpu/drm/xe/xe_mmio.h +++ b/drivers/gpu/drm/xe/xe_mmio.h @@ -6,22 +6,13 @@ #ifndef _XE_MMIO_H_ #define _XE_MMIO_H_ -#include <linux/delay.h> -#include <linux/io-64-nonatomic-lo-hi.h> - -#include "regs/xe_reg_defs.h" -#include "xe_device_types.h" -#include "xe_gt_printk.h" #include "xe_gt_types.h" -struct drm_device; -struct drm_file; struct xe_device; - -#define LMEM_BAR 2 +struct xe_reg; int xe_mmio_init(struct xe_device *xe); -void xe_mmio_probe_tiles(struct xe_device *xe); +int xe_mmio_probe_tiles(struct xe_device *xe); u8 xe_mmio_read8(struct xe_gt *gt, struct xe_reg reg); u16 xe_mmio_read16(struct xe_gt *gt, struct xe_reg reg); @@ -35,5 +26,14 @@ int xe_mmio_probe_vram(struct xe_device *xe); u64 xe_mmio_read64_2x32(struct xe_gt *gt, struct xe_reg reg); int xe_mmio_wait32(struct xe_gt *gt, struct xe_reg reg, u32 mask, u32 val, u32 timeout_us, u32 *out_val, bool atomic); +int xe_mmio_wait32_not(struct xe_gt *gt, struct xe_reg reg, u32 mask, u32 val, u32 timeout_us, + u32 *out_val, bool atomic); + +static inline u32 xe_mmio_adjusted_addr(const struct xe_gt *gt, u32 addr) +{ + if (addr < gt->mmio.adj_limit) + addr += gt->mmio.adj_offset; + return addr; +} #endif diff --git a/drivers/gpu/drm/xe/xe_mocs.c b/drivers/gpu/drm/xe/xe_mocs.c index 1e92f8ee07ba..de3f2d3f1b04 100644 --- a/drivers/gpu/drm/xe/xe_mocs.c +++ b/drivers/gpu/drm/xe/xe_mocs.c @@ -9,10 +9,13 @@ #include "xe_bo.h" #include "xe_device.h" #include "xe_exec_queue.h" +#include "xe_force_wake.h" #include "xe_gt.h" #include "xe_gt_mcr.h" +#include "xe_gt_printk.h" #include "xe_mmio.h" #include "xe_platform_types.h" +#include "xe_pm.h" #include "xe_sriov.h" #include "xe_step_types.h" @@ -36,34 +39,23 @@ struct xe_mocs_entry { u16 used; }; +struct xe_mocs_info; + +struct xe_mocs_ops { + void (*dump)(struct xe_mocs_info *mocs, unsigned int flags, + struct xe_gt *gt, struct drm_printer *p); +}; + struct xe_mocs_info { unsigned int size; unsigned int n_entries; const struct xe_mocs_entry *table; + const struct xe_mocs_ops *ops; u8 uc_index; u8 wb_index; u8 unused_entries_index; }; -/* Defines for the tables (XXX_MOCS_0 - XXX_MOCS_63) */ -#define _LE_CACHEABILITY(value) ((value) << 0) -#define _LE_TGT_CACHE(value) ((value) << 2) -#define LE_LRUM(value) ((value) << 4) -#define LE_AOM(value) ((value) << 6) -#define LE_RSC(value) ((value) << 7) -#define LE_SCC(value) ((value) << 8) -#define LE_PFM(value) ((value) << 11) -#define LE_SCF(value) ((value) << 14) -#define LE_COS(value) ((value) << 15) -#define LE_SSE(value) ((value) << 17) - -/* Defines for the tables (LNCFMOCS0 - LNCFMOCS31) - two entries per word */ -#define L3_ESC(value) ((value) << 0) -#define 
L3_SCC(value) ((value) << 1) -#define _L3_CACHEABILITY(value) ((value) << 4) -#define L3_GLBGO(value) ((value) << 6) -#define L3_LKUP(value) ((value) << 7) - /* Defines for the tables (GLOB_MOCS_0 - GLOB_MOCS_16) */ #define IG_PAT REG_BIT(8) #define L3_CACHE_POLICY_MASK REG_GENMASK(5, 4) @@ -80,22 +72,22 @@ struct xe_mocs_info { * Note: LE_0_PAGETABLE works only up to Gen11; for newer gens it means * the same as LE_UC */ -#define LE_0_PAGETABLE _LE_CACHEABILITY(0) -#define LE_1_UC _LE_CACHEABILITY(1) -#define LE_2_WT _LE_CACHEABILITY(2) -#define LE_3_WB _LE_CACHEABILITY(3) +#define LE_0_PAGETABLE LE_CACHEABILITY(0) +#define LE_1_UC LE_CACHEABILITY(1) +#define LE_2_WT LE_CACHEABILITY(2) +#define LE_3_WB LE_CACHEABILITY(3) /* Target cache */ -#define LE_TC_0_PAGETABLE _LE_TGT_CACHE(0) -#define LE_TC_1_LLC _LE_TGT_CACHE(1) -#define LE_TC_2_LLC_ELLC _LE_TGT_CACHE(2) -#define LE_TC_3_LLC_ELLC_ALT _LE_TGT_CACHE(3) +#define LE_TC_0_PAGETABLE LE_TGT_CACHE(0) +#define LE_TC_1_LLC LE_TGT_CACHE(1) +#define LE_TC_2_LLC_ELLC LE_TGT_CACHE(2) +#define LE_TC_3_LLC_ELLC_ALT LE_TGT_CACHE(3) /* L3 caching options */ -#define L3_0_DIRECT _L3_CACHEABILITY(0) -#define L3_1_UC _L3_CACHEABILITY(1) -#define L3_2_RESERVED _L3_CACHEABILITY(2) -#define L3_3_WB _L3_CACHEABILITY(3) +#define L3_0_DIRECT L3_CACHEABILITY(0) +#define L3_1_UC L3_CACHEABILITY(1) +#define L3_2_RESERVED L3_CACHEABILITY(2) +#define L3_3_WB L3_CACHEABILITY(3) /* L4 caching options */ #define L4_0_WB REG_FIELD_PREP(L4_CACHE_POLICY_MASK, 0) @@ -107,6 +99,8 @@ struct xe_mocs_info { #define XE2_L3_1_XD REG_FIELD_PREP(L3_CACHE_POLICY_MASK, 1) #define XE2_L3_3_UC REG_FIELD_PREP(L3_CACHE_POLICY_MASK, 3) +#define XE2_L3_CLOS_MASK REG_GENMASK(7, 6) + #define MOCS_ENTRY(__idx, __control_value, __l3cc_value) \ [__idx] = { \ .control_value = __control_value, \ @@ -255,6 +249,84 @@ static const struct xe_mocs_entry gen12_mocs_desc[] = { L3_1_UC) }; +static bool regs_are_mcr(struct xe_gt *gt) +{ + struct xe_device *xe = gt_to_xe(gt); + + if (xe_gt_is_media_type(gt)) + return MEDIA_VER(xe) >= 20; + else + return GRAPHICS_VERx100(xe) >= 1250; +} + +static void xelp_lncf_dump(struct xe_mocs_info *info, struct xe_gt *gt, struct drm_printer *p) +{ + unsigned int i, j; + u32 reg_val; + + drm_printf(p, "LNCFCMOCS[idx] = [ESC, SCC, L3CC] (value)\n\n"); + + for (i = 0, j = 0; i < (info->n_entries + 1) / 2; i++, j++) { + if (regs_are_mcr(gt)) + reg_val = xe_gt_mcr_unicast_read_any(gt, XEHP_LNCFCMOCS(i)); + else + reg_val = xe_mmio_read32(gt, XELP_LNCFCMOCS(i)); + + drm_printf(p, "LNCFCMOCS[%2d] = [%u, %u, %u] (%#8x)\n", + j++, + !!(reg_val & L3_ESC_MASK), + REG_FIELD_GET(L3_SCC_MASK, reg_val), + REG_FIELD_GET(L3_CACHEABILITY_MASK, reg_val), + reg_val); + + drm_printf(p, "LNCFCMOCS[%2d] = [%u, %u, %u] (%#8x)\n", + j, + !!(reg_val & L3_UPPER_IDX_ESC_MASK), + REG_FIELD_GET(L3_UPPER_IDX_SCC_MASK, reg_val), + REG_FIELD_GET(L3_UPPER_IDX_CACHEABILITY_MASK, reg_val), + reg_val); + } +} + +static void xelp_mocs_dump(struct xe_mocs_info *info, unsigned int flags, + struct xe_gt *gt, struct drm_printer *p) +{ + unsigned int i; + u32 reg_val; + + if (flags & HAS_GLOBAL_MOCS) { + drm_printf(p, "Global mocs table configuration:\n"); + drm_printf(p, "GLOB_MOCS[idx] = [LeCC, TC, LRUM, AOM, RSC, SCC, PFM, SCF, CoS, SSE] (value)\n\n"); + + for (i = 0; i < info->n_entries; i++) { + if (regs_are_mcr(gt)) + reg_val = xe_gt_mcr_unicast_read_any(gt, XEHP_GLOBAL_MOCS(i)); + else + reg_val = xe_mmio_read32(gt, XELP_GLOBAL_MOCS(i)); + + drm_printf(p, "GLOB_MOCS[%2d] = [%u, %u, %u, %u, %u, %u, 
%u, %u, %u, %u ] (%#8x)\n", + i, + REG_FIELD_GET(LE_CACHEABILITY_MASK, reg_val), + REG_FIELD_GET(LE_TGT_CACHE_MASK, reg_val), + REG_FIELD_GET(LE_LRUM_MASK, reg_val), + !!(reg_val & LE_AOM_MASK), + !!(reg_val & LE_RSC_MASK), + REG_FIELD_GET(LE_SCC_MASK, reg_val), + REG_FIELD_GET(LE_PFM_MASK, reg_val), + !!(reg_val & LE_SCF_MASK), + REG_FIELD_GET(LE_COS_MASK, reg_val), + REG_FIELD_GET(LE_SSE_MASK, reg_val), + reg_val); + } + } + + xelp_lncf_dump(info, gt, p); +} + +static const struct xe_mocs_ops xelp_mocs_ops = { + .dump = xelp_mocs_dump, +}; + static const struct xe_mocs_entry dg1_mocs_desc[] = { /* UC */ MOCS_ENTRY(1, 0, L3_1_UC), @@ -291,6 +363,40 @@ static const struct xe_mocs_entry dg2_mocs_desc[] = { MOCS_ENTRY(3, 0, L3_3_WB | L3_LKUP(1)), }; +static void xehp_lncf_dump(struct xe_mocs_info *info, unsigned int flags, + struct xe_gt *gt, struct drm_printer *p) +{ + unsigned int i, j; + u32 reg_val; + + drm_printf(p, "LNCFCMOCS[idx] = [UCL3LOOKUP, GLBGO, L3CC] (value)\n\n"); + + for (i = 0, j = 0; i < (info->n_entries + 1) / 2; i++, j++) { + if (regs_are_mcr(gt)) + reg_val = xe_gt_mcr_unicast_read_any(gt, XEHP_LNCFCMOCS(i)); + else + reg_val = xe_mmio_read32(gt, XELP_LNCFCMOCS(i)); + + drm_printf(p, "LNCFCMOCS[%2d] = [%u, %u, %u] (%#8x)\n", + j++, + !!(reg_val & L3_LKUP_MASK), + !!(reg_val & L3_GLBGO_MASK), + REG_FIELD_GET(L3_CACHEABILITY_MASK, reg_val), + reg_val); + + drm_printf(p, "LNCFCMOCS[%2d] = [%u, %u, %u] (%#8x)\n", + j, + !!(reg_val & L3_UPPER_LKUP_MASK), + !!(reg_val & L3_UPPER_GLBGO_MASK), + REG_FIELD_GET(L3_UPPER_IDX_CACHEABILITY_MASK, reg_val), + reg_val); + } +} + +static const struct xe_mocs_ops xehp_mocs_ops = { + .dump = xehp_lncf_dump, +}; + static const struct xe_mocs_entry pvc_mocs_desc[] = { /* Error */ MOCS_ENTRY(0, 0, L3_3_WB), @@ -302,6 +408,36 @@ static const struct xe_mocs_entry pvc_mocs_desc[] = { MOCS_ENTRY(2, 0, L3_3_WB), }; +static void pvc_mocs_dump(struct xe_mocs_info *info, unsigned int flags, struct xe_gt *gt, + struct drm_printer *p) +{ + unsigned int i, j; + u32 reg_val; + + drm_printf(p, "LNCFCMOCS[idx] = [ L3CC ] (value)\n\n"); + + for (i = 0, j = 0; i < (info->n_entries + 1) / 2; i++, j++) { + if (regs_are_mcr(gt)) + reg_val = xe_gt_mcr_unicast_read_any(gt, XEHP_LNCFCMOCS(i)); + else + reg_val = xe_mmio_read32(gt, XELP_LNCFCMOCS(i)); + + drm_printf(p, "LNCFCMOCS[%2d] = [ %u ] (%#8x)\n", + j++, + REG_FIELD_GET(L3_CACHEABILITY_MASK, reg_val), + reg_val); + + drm_printf(p, "LNCFCMOCS[%2d] = [ %u ] (%#8x)\n", + j, + REG_FIELD_GET(L3_UPPER_IDX_CACHEABILITY_MASK, reg_val), + reg_val); + } +} + +static const struct xe_mocs_ops pvc_mocs_ops = { + .dump = pvc_mocs_dump, +}; + static const struct xe_mocs_entry mtl_mocs_desc[] = { /* Error - Reserved for Non-Use */ MOCS_ENTRY(0, @@ -353,6 +489,36 @@ static const struct xe_mocs_entry mtl_mocs_desc[] = { L3_GLBGO(1) | L3_1_UC), }; +static void mtl_mocs_dump(struct xe_mocs_info *info, unsigned int flags, + struct xe_gt *gt, struct drm_printer *p) +{ + unsigned int i; + u32 reg_val; + + drm_printf(p, "Global mocs table configuration:\n"); + drm_printf(p, "GLOB_MOCS[idx] = [IG_PAT, L4_CACHE_POLICY] (value)\n\n"); + + for (i = 0; i < info->n_entries; i++) { + if (regs_are_mcr(gt)) + reg_val = xe_gt_mcr_unicast_read_any(gt, XEHP_GLOBAL_MOCS(i)); + else + reg_val = xe_mmio_read32(gt, XELP_GLOBAL_MOCS(i)); + + drm_printf(p, "GLOB_MOCS[%2d] = [%u, %u] (%#8x)\n", + i, + !!(reg_val & IG_PAT), + REG_FIELD_GET(L4_CACHE_POLICY_MASK, reg_val), + reg_val); + } + + /* MTL lncf mocs table pattern is similar to that of xehp */ + 
xehp_lncf_dump(info, flags, gt, p); +} + +static const struct xe_mocs_ops mtl_mocs_ops = { + .dump = mtl_mocs_dump, +}; + static const struct xe_mocs_entry xe2_mocs_table[] = { /* Defer to PAT */ MOCS_ENTRY(0, XE2_L3_0_WB | L4_3_UC, 0), @@ -366,6 +532,34 @@ static const struct xe_mocs_entry xe2_mocs_table[] = { MOCS_ENTRY(4, IG_PAT | XE2_L3_0_WB | L4_0_WB, 0), }; +static void xe2_mocs_dump(struct xe_mocs_info *info, unsigned int flags, + struct xe_gt *gt, struct drm_printer *p) +{ + unsigned int i; + u32 reg_val; + + drm_printf(p, "Global mocs table configuration:\n"); + drm_printf(p, "GLOB_MOCS[idx] = [IG_PAT, L3_CLOS, L3_CACHE_POLICY, L4_CACHE_POLICY] (value)\n\n"); + + for (i = 0; i < info->n_entries; i++) { + if (regs_are_mcr(gt)) + reg_val = xe_gt_mcr_unicast_read_any(gt, XEHP_GLOBAL_MOCS(i)); + else + reg_val = xe_mmio_read32(gt, XELP_GLOBAL_MOCS(i)); + + drm_printf(p, "GLOB_MOCS[%2d] = [%u, %u, %u] (%#8x)\n", + i, + !!(reg_val & IG_PAT), + REG_FIELD_GET(XE2_L3_CLOS_MASK, reg_val), + REG_FIELD_GET(L4_CACHE_POLICY_MASK, reg_val), + reg_val); + } +} + +static const struct xe_mocs_ops xe2_mocs_ops = { + .dump = xe2_mocs_dump, +}; + static unsigned int get_mocs_settings(struct xe_device *xe, struct xe_mocs_info *info) { @@ -376,6 +570,7 @@ static unsigned int get_mocs_settings(struct xe_device *xe, switch (xe->info.platform) { case XE_LUNARLAKE: case XE_BATTLEMAGE: + info->ops = &xe2_mocs_ops; info->size = ARRAY_SIZE(xe2_mocs_table); info->table = xe2_mocs_table; info->n_entries = XE2_NUM_MOCS_ENTRIES; @@ -384,6 +579,7 @@ static unsigned int get_mocs_settings(struct xe_device *xe, info->unused_entries_index = 4; break; case XE_PVC: + info->ops = &pvc_mocs_ops; info->size = ARRAY_SIZE(pvc_mocs_desc); info->table = pvc_mocs_desc; info->n_entries = PVC_NUM_MOCS_ENTRIES; @@ -392,6 +588,7 @@ static unsigned int get_mocs_settings(struct xe_device *xe, info->unused_entries_index = 2; break; case XE_METEORLAKE: + info->ops = &mtl_mocs_ops; info->size = ARRAY_SIZE(mtl_mocs_desc); info->table = mtl_mocs_desc; info->n_entries = MTL_NUM_MOCS_ENTRIES; @@ -399,6 +596,7 @@ static unsigned int get_mocs_settings(struct xe_device *xe, info->unused_entries_index = 1; break; case XE_DG2: + info->ops = &xehp_mocs_ops; info->size = ARRAY_SIZE(dg2_mocs_desc); info->table = dg2_mocs_desc; info->uc_index = 1; @@ -410,6 +608,7 @@ static unsigned int get_mocs_settings(struct xe_device *xe, info->unused_entries_index = 3; break; case XE_DG1: + info->ops = &xelp_mocs_ops; info->size = ARRAY_SIZE(dg1_mocs_desc); info->table = dg1_mocs_desc; info->uc_index = 1; @@ -421,6 +620,7 @@ static unsigned int get_mocs_settings(struct xe_device *xe, case XE_ALDERLAKE_S: case XE_ALDERLAKE_P: case XE_ALDERLAKE_N: + info->ops = &xelp_mocs_ops; info->size = ARRAY_SIZE(gen12_mocs_desc); info->table = gen12_mocs_desc; info->n_entries = XELP_NUM_MOCS_ENTRIES; @@ -442,6 +642,8 @@ static unsigned int get_mocs_settings(struct xe_device *xe, */ xe_assert(xe, info->unused_entries_index != 0); + xe_assert(xe, !info->ops || info->ops->dump); + if (XE_WARN_ON(info->size > info->n_entries)) { info->table = NULL; return 0; @@ -467,16 +669,6 @@ static u32 get_entry_control(const struct xe_mocs_info *info, return info->table[info->unused_entries_index].control_value; } -static bool regs_are_mcr(struct xe_gt *gt) -{ - struct xe_device *xe = gt_to_xe(gt); - - if (xe_gt_is_media_type(gt)) - return MEDIA_VER(xe) >= 20; - else - return GRAPHICS_VERx100(xe) >= 1250; -} - static void __init_mocs_table(struct xe_gt *gt, const struct xe_mocs_info *info) { 
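/*
 * A minimal sketch of how the dump hook added in the next hunk could be
 * driven from a debugfs node, assuming a seq_file-backed drm_printer; the
 * mocs_show() name and the use of m->private for the GT pointer are
 * illustrative assumptions rather than names taken from this patch.
 */
static int mocs_show(struct seq_file *m, void *data)
{
	struct xe_gt *gt = m->private;
	struct drm_printer p = drm_seq_file_printer(m);

	xe_mocs_dump(gt, &p);

	return 0;
}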
@@ -578,6 +770,33 @@ void xe_mocs_init(struct xe_gt *gt) init_l3cc_table(gt, &table); } +void xe_mocs_dump(struct xe_gt *gt, struct drm_printer *p) +{ + struct xe_mocs_info table; + unsigned int flags; + u32 ret; + struct xe_device *xe = gt_to_xe(gt); + + flags = get_mocs_settings(xe, &table); + + if (!table.ops->dump) + return; + + xe_pm_runtime_get_noresume(xe); + ret = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); + + if (ret) + goto err_fw; + + table.ops->dump(&table, flags, gt, p); + + xe_force_wake_put(gt_to_fw(gt), XE_FW_GT); + +err_fw: + xe_assert(xe, !ret); + xe_pm_runtime_put(xe); +} + #if IS_ENABLED(CONFIG_DRM_XE_KUNIT_TEST) #include "tests/xe_mocs.c" #endif diff --git a/drivers/gpu/drm/xe/xe_mocs.h b/drivers/gpu/drm/xe/xe_mocs.h index 053754c5a94e..dc972ffd4d07 100644 --- a/drivers/gpu/drm/xe/xe_mocs.h +++ b/drivers/gpu/drm/xe/xe_mocs.h @@ -6,12 +6,17 @@ #ifndef _XE_MOCS_H_ #define _XE_MOCS_H_ -#include <linux/types.h> - -struct xe_exec_queue; +struct drm_printer; struct xe_gt; void xe_mocs_init_early(struct xe_gt *gt); void xe_mocs_init(struct xe_gt *gt); +/** + * xe_mocs_dump - Dump mocs table + * @gt: GT structure + * @p: Printer to dump info to + */ +void xe_mocs_dump(struct xe_gt *gt, struct drm_printer *p); + #endif diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c index ceb8345cbca6..3edeb30d5ccb 100644 --- a/drivers/gpu/drm/xe/xe_module.c +++ b/drivers/gpu/drm/xe/xe_module.c @@ -17,6 +17,7 @@ struct xe_modparam xe_modparam = { .enable_display = true, .guc_log_level = 5, .force_probe = CONFIG_DRM_XE_FORCE_PROBE, + .wedged_mode = 1, /* the rest are 0 by default */ }; @@ -55,6 +56,10 @@ MODULE_PARM_DESC(max_vfs, "(0 = no VFs [default]; N = allow up to N VFs)"); #endif +module_param_named_unsafe(wedged_mode, xe_modparam.wedged_mode, int, 0600); +MODULE_PARM_DESC(wedged_mode, + "Module's default policy for the wedged mode - 0=never, 1=upon-critical-errors[default], 2=upon-any-hang"); + struct init_funcs { int (*init)(void); void (*exit)(void); diff --git a/drivers/gpu/drm/xe/xe_module.h b/drivers/gpu/drm/xe/xe_module.h index b369984f08ec..61a0d28a28c8 100644 --- a/drivers/gpu/drm/xe/xe_module.h +++ b/drivers/gpu/drm/xe/xe_module.h @@ -21,6 +21,7 @@ struct xe_modparam { #ifdef CONFIG_PCI_IOV unsigned int max_vfs; #endif + int wedged_mode; }; extern struct xe_modparam xe_modparam; diff --git a/drivers/gpu/drm/xe/xe_pat.c b/drivers/gpu/drm/xe/xe_pat.c index d5b516f115ad..4ee32ee1cc88 100644 --- a/drivers/gpu/drm/xe/xe_pat.c +++ b/drivers/gpu/drm/xe/xe_pat.c @@ -10,6 +10,7 @@ #include "regs/xe_reg_defs.h" #include "xe_assert.h" #include "xe_device.h" +#include "xe_force_wake.h" #include "xe_gt.h" #include "xe_gt_mcr.h" #include "xe_mmio.h" diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c index f326dbb1cecd..e84da0cbb8e9 100644 --- a/drivers/gpu/drm/xe/xe_pci.c +++ b/drivers/gpu/drm/xe/xe_pci.c @@ -20,9 +20,12 @@ #include "xe_device.h" #include "xe_drv.h" #include "xe_gt.h" +#include "xe_gt_sriov_vf.h" +#include "xe_guc.h" #include "xe_macros.h" #include "xe_mmio.h" #include "xe_module.h" +#include "xe_pci_sriov.h" #include "xe_pci_types.h" #include "xe_pm.h" #include "xe_sriov.h" @@ -40,12 +43,6 @@ struct xe_subplatform_desc { const u16 *pciidlist; }; -struct xe_gt_desc { - enum xe_gt_type type; - u32 mmio_adj_limit; - u32 mmio_adj_offset; -}; - struct xe_device_desc { /* Should only ever be set for platforms without GMD_ID */ const struct xe_graphics_desc *graphics; @@ -74,7 +71,7 @@ __diag_push(); 
__diag_ignore_all("-Woverride-init", "Allow field overrides in table"); #define PLATFORM(x) \ - .platform = (x), \ + .platform = XE_##x, \ .platform_name = #x #define NOP(x) x @@ -146,6 +143,7 @@ static const struct xe_graphics_desc graphics_xehpc = { .vram_flags = XE_VRAM_FLAGS_NEED64K, .has_asid = 1, + .has_atomic_enable_pte_bit = 1, .has_flat_ccs = 0, .has_usm = 1, }; @@ -163,7 +161,9 @@ static const struct xe_graphics_desc graphics_xelpg = { #define XE2_GFX_FEATURES \ .dma_mask_size = 46, \ .has_asid = 1, \ + .has_atomic_enable_pte_bit = 1, \ .has_flat_ccs = 1, \ + .has_indirect_ring_state = 1, \ .has_range_tlb_invalidation = 1, \ .has_usm = 1, \ .va_bits = 48, \ @@ -211,13 +211,14 @@ static const struct xe_media_desc media_xe2 = { .name = "Xe2_LPM / Xe2_HPM", .hw_engine_mask = GENMASK(XE_HW_ENGINE_VCS7, XE_HW_ENGINE_VCS0) | - GENMASK(XE_HW_ENGINE_VECS3, XE_HW_ENGINE_VECS0), /* TODO: GSC0 */ + GENMASK(XE_HW_ENGINE_VECS3, XE_HW_ENGINE_VECS0) | + BIT(XE_HW_ENGINE_GSCCS0) }; static const struct xe_device_desc tgl_desc = { .graphics = &graphics_xelp, .media = &media_xem, - PLATFORM(XE_TIGERLAKE), + PLATFORM(TIGERLAKE), .has_display = true, .has_llc = true, .require_force_probe = true, @@ -226,7 +227,7 @@ static const struct xe_device_desc tgl_desc = { static const struct xe_device_desc rkl_desc = { .graphics = &graphics_xelp, .media = &media_xem, - PLATFORM(XE_ROCKETLAKE), + PLATFORM(ROCKETLAKE), .has_display = true, .has_llc = true, .require_force_probe = true, @@ -237,7 +238,7 @@ static const u16 adls_rpls_ids[] = { XE_RPLS_IDS(NOP), 0 }; static const struct xe_device_desc adl_s_desc = { .graphics = &graphics_xelp, .media = &media_xem, - PLATFORM(XE_ALDERLAKE_S), + PLATFORM(ALDERLAKE_S), .has_display = true, .has_llc = true, .require_force_probe = true, @@ -252,7 +253,7 @@ static const u16 adlp_rplu_ids[] = { XE_RPLU_IDS(NOP), 0 }; static const struct xe_device_desc adl_p_desc = { .graphics = &graphics_xelp, .media = &media_xem, - PLATFORM(XE_ALDERLAKE_P), + PLATFORM(ALDERLAKE_P), .has_display = true, .has_llc = true, .require_force_probe = true, @@ -265,7 +266,7 @@ static const struct xe_device_desc adl_p_desc = { static const struct xe_device_desc adl_n_desc = { .graphics = &graphics_xelp, .media = &media_xem, - PLATFORM(XE_ALDERLAKE_N), + PLATFORM(ALDERLAKE_N), .has_display = true, .has_llc = true, .require_force_probe = true, @@ -278,7 +279,7 @@ static const struct xe_device_desc dg1_desc = { .graphics = &graphics_xelpp, .media = &media_xem, DGFX_FEATURES, - PLATFORM(XE_DG1), + PLATFORM(DG1), .has_display = true, .has_heci_gscfi = 1, .require_force_probe = true, @@ -290,7 +291,7 @@ static const u16 dg2_g12_ids[] = { XE_DG2_G12_IDS(NOP), 0 }; #define DG2_FEATURES \ DGFX_FEATURES, \ - PLATFORM(XE_DG2), \ + PLATFORM(DG2), \ .has_heci_gscfi = 1, \ .subplatforms = (const struct xe_subplatform_desc[]) { \ { XE_SUBPLATFORM_DG2_G10, "G10", dg2_g10_ids }, \ @@ -320,7 +321,7 @@ static const struct xe_device_desc dg2_desc = { static const __maybe_unused struct xe_device_desc pvc_desc = { .graphics = &graphics_xehpc, DGFX_FEATURES, - PLATFORM(XE_PVC), + PLATFORM(PVC), .has_display = false, .has_heci_gscfi = 1, .require_force_probe = true, @@ -329,19 +330,19 @@ static const __maybe_unused struct xe_device_desc pvc_desc = { static const struct xe_device_desc mtl_desc = { /* .graphics and .media determined via GMD_ID */ .require_force_probe = true, - PLATFORM(XE_METEORLAKE), + PLATFORM(METEORLAKE), .has_display = true, }; static const struct xe_device_desc lnl_desc = { - PLATFORM(XE_LUNARLAKE), + 
PLATFORM(LUNARLAKE), .has_display = true, .require_force_probe = true, }; static const struct xe_device_desc bmg_desc __maybe_unused = { DGFX_FEATURES, - PLATFORM(XE_BATTLEMAGE), + PLATFORM(BATTLEMAGE), .require_force_probe = true, }; @@ -470,10 +471,52 @@ static void read_gmdid(struct xe_device *xe, enum xe_gmdid_type type, u32 *ver, KUNIT_STATIC_STUB_REDIRECT(read_gmdid, xe, type, ver, revid); - if (type == GMDID_MEDIA) - gmdid_reg.addr += MEDIA_GT_GSI_OFFSET; + if (IS_SRIOV_VF(xe)) { + /* + * To get the value of the GMDID register, VFs must obtain it + * from the GuC using MMIO communication. + * + * Note that at this point the xe_gt is not fully uninitialized + * and only basic access to MMIO registers is possible. To use + * our existing GuC communication functions we must perform at + * least basic xe_gt and xe_guc initialization. + * + * Since to obtain the value of GMDID_MEDIA we need to use the + * media GuC, temporarly tweak the gt type. + */ + xe_gt_assert(gt, gt->info.type == XE_GT_TYPE_UNINITIALIZED); + + if (type == GMDID_MEDIA) { + gt->info.id = 1; + gt->info.type = XE_GT_TYPE_MEDIA; + } else { + gt->info.id = 0; + gt->info.type = XE_GT_TYPE_MAIN; + } + + xe_guc_comm_init_early(>->uc.guc); + + /* Don't bother with GMDID if failed to negotiate the GuC ABI */ + val = xe_gt_sriov_vf_bootstrap(gt) ? 0 : xe_gt_sriov_vf_gmdid(gt); + + /* + * Only undo xe_gt.info here, the remaining changes made above + * will be overwritten as part of the regular initialization. + */ + gt->info.id = 0; + gt->info.type = XE_GT_TYPE_UNINITIALIZED; + } else { + /* + * We need to apply the GSI offset explicitly here as at this + * point the xe_gt is not fully uninitialized and only basic + * access to MMIO registers is possible. + */ + if (type == GMDID_MEDIA) + gmdid_reg.addr += MEDIA_GT_GSI_OFFSET; + + val = xe_mmio_read32(gt, gmdid_reg); + } - val = xe_mmio_read32(gt, gmdid_reg); *ver = REG_FIELD_GET(GMD_ID_ARCH_MASK, val) * 100 + REG_FIELD_GET(GMD_ID_RELEASE_MASK, val); *revid = REG_FIELD_GET(GMD_ID_REVID, val); } @@ -554,6 +597,7 @@ static int xe_info_init_early(struct xe_device *xe, { int err; + xe->info.platform_name = desc->platform_name; xe->info.platform = desc->platform; xe->info.subplatform = subplatform_desc ? 
subplatform_desc->subplatform : XE_SUBPLATFORM_NONE; @@ -628,6 +672,9 @@ static int xe_info_init(struct xe_device *xe, xe->info.va_bits = graphics_desc->va_bits; xe->info.vm_max_level = graphics_desc->vm_max_level; xe->info.has_asid = graphics_desc->has_asid; + xe->info.has_atomic_enable_pte_bit = graphics_desc->has_atomic_enable_pte_bit; + if (xe->info.platform != XE_PVC) + xe->info.has_device_atomics_on_smem = 1; xe->info.has_flat_ccs = graphics_desc->has_flat_ccs; xe->info.has_range_tlb_invalidation = graphics_desc->has_range_tlb_invalidation; xe->info.has_usm = graphics_desc->has_usm; @@ -655,9 +702,10 @@ static int xe_info_init(struct xe_device *xe, gt = tile->primary_gt; gt->info.id = xe->info.gt_count++; gt->info.type = XE_GT_TYPE_MAIN; - gt->info.__engine_mask = graphics_desc->hw_engine_mask; + gt->info.has_indirect_ring_state = graphics_desc->has_indirect_ring_state; + gt->info.engine_mask = graphics_desc->hw_engine_mask; if (MEDIA_VER(xe) < 13 && media_desc) - gt->info.__engine_mask |= media_desc->hw_engine_mask; + gt->info.engine_mask |= media_desc->hw_engine_mask; if (MEDIA_VER(xe) < 13 || !media_desc) continue; @@ -672,7 +720,8 @@ static int xe_info_init(struct xe_device *xe, gt = tile->media_gt; gt->info.type = XE_GT_TYPE_MEDIA; - gt->info.__engine_mask = media_desc->hw_engine_mask; + gt->info.has_indirect_ring_state = media_desc->has_indirect_ring_state; + gt->info.engine_mask = media_desc->hw_engine_mask; gt->mmio.adj_offset = MEDIA_GT_GSI_OFFSET; gt->mmio.adj_limit = MEDIA_GT_GSI_LENGTH; @@ -757,7 +806,9 @@ static int xe_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent) if (err) return err; - xe_display_probe(xe); + err = xe_display_probe(xe); + if (err) + return err; drm_dbg(&xe->drm, "%s %s %04x:%04x dgfx:%d gfx:%s (%d.%02d) media:%s (%d.%02d) display:%s dma_m_s:%d tc:%d gscfi:%d", desc->platform_name, @@ -951,6 +1002,9 @@ static struct pci_driver xe_pci_driver = { .probe = xe_pci_probe, .remove = xe_pci_remove, .shutdown = xe_pci_shutdown, +#ifdef CONFIG_PCI_IOV + .sriov_configure = xe_pci_sriov_configure, +#endif #ifdef CONFIG_PM_SLEEP .driver.pm = &xe_pm_ops, #endif diff --git a/drivers/gpu/drm/xe/xe_pci_sriov.c b/drivers/gpu/drm/xe/xe_pci_sriov.c new file mode 100644 index 000000000000..06d0fceb5114 --- /dev/null +++ b/drivers/gpu/drm/xe/xe_pci_sriov.c @@ -0,0 +1,143 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2023-2024 Intel Corporation + */ + +#include "xe_assert.h" +#include "xe_device.h" +#include "xe_gt_sriov_pf_config.h" +#include "xe_pci_sriov.h" +#include "xe_pm.h" +#include "xe_sriov.h" +#include "xe_sriov_pf_helpers.h" +#include "xe_sriov_printk.h" + +static int pf_provision_vfs(struct xe_device *xe, unsigned int num_vfs) +{ + struct xe_gt *gt; + unsigned int id; + int result = 0, err; + + for_each_gt(gt, xe, id) { + err = xe_gt_sriov_pf_config_set_fair(gt, VFID(1), num_vfs); + result = result ?: err; + } + + return result; +} + +static void pf_unprovision_vfs(struct xe_device *xe, unsigned int num_vfs) +{ + struct xe_gt *gt; + unsigned int id; + unsigned int n; + + for_each_gt(gt, xe, id) + for (n = 1; n <= num_vfs; n++) + xe_gt_sriov_pf_config_release(gt, n, true); +} + +static int pf_enable_vfs(struct xe_device *xe, int num_vfs) +{ + struct pci_dev *pdev = to_pci_dev(xe->drm.dev); + int total_vfs = xe_sriov_pf_get_totalvfs(xe); + int err; + + xe_assert(xe, IS_SRIOV_PF(xe)); + xe_assert(xe, num_vfs > 0); + xe_assert(xe, num_vfs <= total_vfs); + xe_sriov_dbg(xe, "enabling %u VF%s\n", num_vfs, str_plural(num_vfs)); + + /* + * We 
must hold additional reference to the runtime PM to keep PF in D0 + * during VFs lifetime, as our VFs do not implement the PM capability. + * + * With PF being in D0 state, all VFs will also behave as in D0 state. + * This will also keep GuC alive with all VFs' configurations. + * + * We will release this additional PM reference in pf_disable_vfs(). + */ + xe_pm_runtime_get_noresume(xe); + + err = pf_provision_vfs(xe, num_vfs); + if (err < 0) + goto failed; + + err = pci_enable_sriov(pdev, num_vfs); + if (err < 0) + goto failed; + + xe_sriov_info(xe, "Enabled %u of %u VF%s\n", + num_vfs, total_vfs, str_plural(total_vfs)); + return num_vfs; + +failed: + pf_unprovision_vfs(xe, num_vfs); + xe_pm_runtime_put(xe); + + xe_sriov_notice(xe, "Failed to enable %u VF%s (%pe)\n", + num_vfs, str_plural(num_vfs), ERR_PTR(err)); + return err; +} + +static int pf_disable_vfs(struct xe_device *xe) +{ + struct device *dev = xe->drm.dev; + struct pci_dev *pdev = to_pci_dev(dev); + u16 num_vfs = pci_num_vf(pdev); + + xe_assert(xe, IS_SRIOV_PF(xe)); + xe_sriov_dbg(xe, "disabling %u VF%s\n", num_vfs, str_plural(num_vfs)); + + if (!num_vfs) + return 0; + + pci_disable_sriov(pdev); + + pf_unprovision_vfs(xe, num_vfs); + + /* not needed anymore - see pf_enable_vfs() */ + xe_pm_runtime_put(xe); + + xe_sriov_info(xe, "Disabled %u VF%s\n", num_vfs, str_plural(num_vfs)); + return 0; +} + +/** + * xe_pci_sriov_configure - Configure SR-IOV (enable/disable VFs). + * @pdev: the &pci_dev + * @num_vfs: number of VFs to enable or zero to disable all VFs + * + * This is the Xe implementation of struct pci_driver.sriov_configure callback. + * + * This callback will be called by the PCI subsystem to enable or disable SR-IOV + * Virtual Functions (VFs) as requested by the used via the PCI sysfs interface. + * + * Return: number of configured VFs or a negative error code on failure. 
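+ *
+ * For example (the PCI address is illustrative), enabling two VFs from user
+ * space goes through the standard PCI sysfs attribute and reaches this hook
+ * with num_vfs == 2:
+ *
+ *	echo 2 > /sys/bus/pci/devices/0000:03:00.0/sriov_numvfs
+ *
+ * Writing 0 to the same attribute disables all VFs again.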
+ */ +int xe_pci_sriov_configure(struct pci_dev *pdev, int num_vfs) +{ + struct xe_device *xe = pdev_to_xe_device(pdev); + int ret; + + if (!IS_SRIOV_PF(xe)) + return -ENODEV; + + if (num_vfs < 0) + return -EINVAL; + + if (num_vfs > xe_sriov_pf_get_totalvfs(xe)) + return -ERANGE; + + if (num_vfs && pci_num_vf(pdev)) + return -EBUSY; + + xe_pm_runtime_get(xe); + if (num_vfs > 0) + ret = pf_enable_vfs(xe, num_vfs); + else + ret = pf_disable_vfs(xe); + xe_pm_runtime_put(xe); + + return ret; +} diff --git a/drivers/gpu/drm/xe/xe_pci_sriov.h b/drivers/gpu/drm/xe/xe_pci_sriov.h new file mode 100644 index 000000000000..3b8bfbf7e1d9 --- /dev/null +++ b/drivers/gpu/drm/xe/xe_pci_sriov.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2023-2024 Intel Corporation + */ + +#ifndef _XE_PCI_SRIOV_H_ +#define _XE_PCI_SRIOV_H_ + +struct pci_dev; + +int xe_pci_sriov_configure(struct pci_dev *pdev, int num_vfs); + +#endif diff --git a/drivers/gpu/drm/xe/xe_pci_types.h b/drivers/gpu/drm/xe/xe_pci_types.h index b1ad12fa22d6..79b0f80376a4 100644 --- a/drivers/gpu/drm/xe/xe_pci_types.h +++ b/drivers/gpu/drm/xe/xe_pci_types.h @@ -25,7 +25,9 @@ struct xe_graphics_desc { u8 max_remote_tiles:2; u8 has_asid:1; + u8 has_atomic_enable_pte_bit:1; u8 has_flat_ccs:1; + u8 has_indirect_ring_state:1; u8 has_range_tlb_invalidation:1; u8 has_usm:1; }; @@ -36,6 +38,8 @@ struct xe_media_desc { u8 rel; u64 hw_engine_mask; /* hardware engines provided by media IP */ + + u8 has_indirect_ring_state:1; }; struct gmdid_map { diff --git a/drivers/gpu/drm/xe/xe_pcode.c b/drivers/gpu/drm/xe/xe_pcode.c index a5e7da8cf944..9c4eefdf6642 100644 --- a/drivers/gpu/drm/xe/xe_pcode.c +++ b/drivers/gpu/drm/xe/xe_pcode.c @@ -10,6 +10,7 @@ #include <drm/drm_managed.h> +#include "xe_assert.h" #include "xe_device.h" #include "xe_gt.h" #include "xe_mmio.h" @@ -124,6 +125,8 @@ static int pcode_try_request(struct xe_gt *gt, u32 mbox, { int slept, wait = 10; + xe_gt_assert(gt, timeout_us > 0); + for (slept = 0; slept < timeout_us; slept += wait) { if (locked) *status = pcode_mailbox_rw(gt, mbox, &request, NULL, 1, true, @@ -169,6 +172,8 @@ int xe_pcode_request(struct xe_gt *gt, u32 mbox, u32 request, u32 status; int ret; + xe_gt_assert(gt, timeout_base_ms <= 3); + mutex_lock(>->pcode.lock); ret = pcode_try_request(gt, mbox, request, reply_mask, reply, &status, @@ -188,7 +193,6 @@ int xe_pcode_request(struct xe_gt *gt, u32 mbox, u32 request, */ drm_err(>_to_xe(gt)->drm, "PCODE timeout, retrying with preemption disabled\n"); - drm_WARN_ON_ONCE(>_to_xe(gt)->drm, timeout_base_ms > 1); preempt_disable(); ret = pcode_try_request(gt, mbox, request, reply_mask, reply, &status, true, 50 * 1000, true); diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c index 37fbeda12d3b..de3b5df65e48 100644 --- a/drivers/gpu/drm/xe/xe_pm.c +++ b/drivers/gpu/drm/xe/xe_pm.c @@ -69,7 +69,7 @@ */ #ifdef CONFIG_LOCKDEP -struct lockdep_map xe_pm_runtime_lockdep_map = { +static struct lockdep_map xe_pm_runtime_lockdep_map = { .name = "xe_pm_runtime_lockdep_map" }; #endif @@ -96,12 +96,12 @@ int xe_pm_suspend(struct xe_device *xe) if (err) goto err; - xe_display_pm_suspend(xe); + xe_display_pm_suspend(xe, false); for_each_gt(gt, xe, id) { err = xe_gt_suspend(gt); if (err) { - xe_display_pm_resume(xe); + xe_display_pm_resume(xe, false); goto err; } } @@ -151,7 +151,7 @@ int xe_pm_resume(struct xe_device *xe) xe_irq_resume(xe); - xe_display_pm_resume(xe); + xe_display_pm_resume(xe, false); for_each_gt(gt, xe, id) xe_gt_resume(gt); @@ -366,6 
+366,7 @@ int xe_pm_runtime_suspend(struct xe_device *xe) err = xe_bo_evict_all(xe); if (err) goto out; + xe_display_pm_suspend(xe, true); } for_each_gt(gt, xe, id) { @@ -375,7 +376,12 @@ int xe_pm_runtime_suspend(struct xe_device *xe) } xe_irq_suspend(xe); + + if (xe->d3cold.allowed) + xe_display_pm_suspend_late(xe); out: + if (err) + xe_display_pm_resume(xe, true); lock_map_release(&xe_pm_runtime_lockdep_map); xe_pm_write_callback_task(xe, NULL); return err; @@ -398,19 +404,13 @@ int xe_pm_runtime_resume(struct xe_device *xe) lock_map_acquire(&xe_pm_runtime_lockdep_map); - /* - * It can be possible that xe has allowed d3cold but other pcie devices - * in gfx card soc would have blocked d3cold, therefore card has not - * really lost power. Detecting primary Gt power is sufficient. - */ - gt = xe_device_get_gt(xe, 0); - xe->d3cold.power_lost = xe_guc_in_reset(>->uc.guc); - - if (xe->d3cold.allowed && xe->d3cold.power_lost) { + if (xe->d3cold.allowed) { err = xe_pcode_ready(xe, true); if (err) goto out; + xe_display_pm_resume_early(xe); + /* * This only restores pinned memory which is the memory * required for the GT(s) to resume. @@ -425,7 +425,8 @@ int xe_pm_runtime_resume(struct xe_device *xe) for_each_gt(gt, xe, id) xe_gt_resume(gt); - if (xe->d3cold.allowed && xe->d3cold.power_lost) { + if (xe->d3cold.allowed) { + xe_display_pm_resume(xe, true); err = xe_bo_restore_user(xe); if (err) goto out; @@ -505,19 +506,20 @@ int xe_pm_runtime_get_ioctl(struct xe_device *xe) * xe_pm_runtime_get_if_active - Get a runtime_pm reference if device active * @xe: xe device instance * - * Returns: Any number greater than or equal to 0 for success, negative error - * code otherwise. + * Return: True if device is awake (regardless the previous number of references) + * and a new reference was taken, false otherwise. */ -int xe_pm_runtime_get_if_active(struct xe_device *xe) +bool xe_pm_runtime_get_if_active(struct xe_device *xe) { - return pm_runtime_get_if_active(xe->drm.dev); + return pm_runtime_get_if_active(xe->drm.dev) > 0; } /** - * xe_pm_runtime_get_if_in_use - Get a runtime_pm reference and resume if needed + * xe_pm_runtime_get_if_in_use - Get a new reference if device is active with previous ref taken * @xe: xe device instance * - * Returns: True if device is awake and the reference was taken, false otherwise. + * Return: True if device is awake, a previous reference had been already taken, + * and a new reference was now taken, false otherwise. */ bool xe_pm_runtime_get_if_in_use(struct xe_device *xe) { diff --git a/drivers/gpu/drm/xe/xe_pm.h b/drivers/gpu/drm/xe/xe_pm.h index 18b0613fe57b..104a21ae6dfd 100644 --- a/drivers/gpu/drm/xe/xe_pm.h +++ b/drivers/gpu/drm/xe/xe_pm.h @@ -8,12 +8,7 @@ #include <linux/pm_runtime.h> -/* - * TODO: Threshold = 0 will block D3Cold. - * Before we can move this to a higher value (like 300), we need to: - * 1. 
rewrite the VRAM save / restore to avoid buffer object locks - */ -#define DEFAULT_VRAM_THRESHOLD 0 /* in MB */ +#define DEFAULT_VRAM_THRESHOLD 300 /* in MB */ struct xe_device; @@ -29,7 +24,7 @@ int xe_pm_runtime_resume(struct xe_device *xe); void xe_pm_runtime_get(struct xe_device *xe); int xe_pm_runtime_get_ioctl(struct xe_device *xe); void xe_pm_runtime_put(struct xe_device *xe); -int xe_pm_runtime_get_if_active(struct xe_device *xe); +bool xe_pm_runtime_get_if_active(struct xe_device *xe); bool xe_pm_runtime_get_if_in_use(struct xe_device *xe); void xe_pm_runtime_get_noresume(struct xe_device *xe); bool xe_pm_runtime_resume_and_get(struct xe_device *xe); diff --git a/drivers/gpu/drm/xe/xe_preempt_fence.c b/drivers/gpu/drm/xe/xe_preempt_fence.c index 7d50c6e89d8e..5b243b7feb59 100644 --- a/drivers/gpu/drm/xe/xe_preempt_fence.c +++ b/drivers/gpu/drm/xe/xe_preempt_fence.c @@ -23,11 +23,19 @@ static void preempt_fence_work_func(struct work_struct *w) q->ops->suspend_wait(q); dma_fence_signal(&pfence->base); - dma_fence_end_signalling(cookie); - + /* + * Opt for keep everything in the fence critical section. This looks really strange since we + * have just signalled the fence, however the preempt fences are all signalled via single + * global ordered-wq, therefore anything that happens in this callback can easily block + * progress on the entire wq, which itself may prevent other published preempt fences from + * ever signalling. Therefore try to keep everything here in the callback in the fence + * critical section. For example if something below grabs a scary lock like vm->lock, + * lockdep should complain since we also hold that lock whilst waiting on preempt fences to + * complete. + */ xe_vm_queue_rebind_worker(q->vm); - xe_exec_queue_put(q); + dma_fence_end_signalling(cookie); } static const char * diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c index 5b7930f46cf3..cd60c009b679 100644 --- a/drivers/gpu/drm/xe/xe_pt.c +++ b/drivers/gpu/drm/xe/xe_pt.c @@ -619,9 +619,40 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma, struct xe_pt *pt = xe_vma_vm(vma)->pt_root[tile->id]; int ret; - if ((vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT) && - (is_devmem || !IS_DGFX(xe))) - xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE; + /** + * Default atomic expectations for different allocation scenarios are as follows: + * + * 1. Traditional API: When the VM is not in LR mode: + * - Device atomics are expected to function with all allocations. + * + * 2. Compute/SVM API: When the VM is in LR mode: + * - Device atomics are the default behavior when the bo is placed in a single region. + * - In all other cases device atomics will be disabled with AE=0 until an application + * request differently using a ioctl like madvise. + */ + if (vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT) { + if (xe_vm_in_lr_mode(xe_vma_vm(vma))) { + if (bo && xe_bo_has_single_placement(bo)) + xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE; + /** + * If a SMEM+LMEM allocation is backed by SMEM, a device + * atomics will cause a gpu page fault and which then + * gets migrated to LMEM, bind such allocations with + * device atomics enabled. 
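+ *
+ * Net effect for a multi-placement (SMEM + LMEM) bo in LR mode: AE is
+ * set only while the bo is placed in LMEM; while it is backed by SMEM
+ * the atomic access faults, the migration described above happens, and
+ * the subsequent rebind takes the is_devmem path and sets AE.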
+ */ + else if (is_devmem && !xe_bo_has_single_placement(bo)) + xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE; + } else { + xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE; + } + + /** + * Unset AE if the platform(PVC) doesn't support it on an + * allocation + */ + if (!xe->info.has_device_atomics_on_smem && !is_devmem) + xe_walk.default_pte &= ~XE_USM_PPGTT_PTE_AE; + } if (is_devmem) { xe_walk.default_pte |= XE_PPGTT_PTE_DM; @@ -732,7 +763,7 @@ static int xe_pt_zap_ptes_entry(struct xe_ptw *parent, pgoff_t offset, pgoff_t end_offset; XE_WARN_ON(!*child); - XE_WARN_ON(!level && xe_child->is_compact); + XE_WARN_ON(!level); /* * Note that we're called from an entry callback, and we're dealing @@ -781,8 +812,9 @@ bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma) .tile = tile, }; struct xe_pt *pt = xe_vma_vm(vma)->pt_root[tile->id]; + u8 pt_mask = (vma->tile_present & ~vma->tile_invalidated); - if (!(vma->tile_present & BIT(tile->id))) + if (!(pt_mask & BIT(tile->id))) return false; (void)xe_pt_walk_shared(&pt->base, pt->level, xe_vma_start(vma), @@ -1075,10 +1107,12 @@ static const struct xe_migrate_pt_update_ops userptr_bind_ops = { struct invalidation_fence { struct xe_gt_tlb_invalidation_fence base; struct xe_gt *gt; - struct xe_vma *vma; struct dma_fence *fence; struct dma_fence_cb cb; struct work_struct work; + u64 start; + u64 end; + u32 asid; }; static const char * @@ -1121,13 +1155,14 @@ static void invalidation_fence_work_func(struct work_struct *w) container_of(w, struct invalidation_fence, work); trace_xe_gt_tlb_invalidation_fence_work_func(&ifence->base); - xe_gt_tlb_invalidation_vma(ifence->gt, &ifence->base, ifence->vma); + xe_gt_tlb_invalidation_range(ifence->gt, &ifence->base, ifence->start, + ifence->end, ifence->asid); } static int invalidation_fence_init(struct xe_gt *gt, struct invalidation_fence *ifence, struct dma_fence *fence, - struct xe_vma *vma) + u64 start, u64 end, u32 asid) { int ret; @@ -1144,7 +1179,9 @@ static int invalidation_fence_init(struct xe_gt *gt, dma_fence_get(&ifence->base.base); /* Ref for caller */ ifence->fence = fence; ifence->gt = gt; - ifence->vma = vma; + ifence->start = start; + ifence->end = end; + ifence->asid = asid; INIT_WORK(&ifence->work, invalidation_fence_work_func); ret = dma_fence_add_callback(fence, &ifence->cb, invalidation_fence_cb); @@ -1295,8 +1332,11 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue /* TLB invalidation must be done before signaling rebind */ if (ifence) { - int err = invalidation_fence_init(tile->primary_gt, ifence, fence, - vma); + int err = invalidation_fence_init(tile->primary_gt, + ifence, fence, + xe_vma_start(vma), + xe_vma_end(vma), + xe_vma_vm(vma)->usm.asid); if (err) { dma_fence_put(fence); kfree(ifence); @@ -1405,7 +1445,7 @@ static int xe_pt_stage_unbind_entry(struct xe_ptw *parent, pgoff_t offset, struct xe_pt *xe_child = container_of(*child, typeof(*xe_child), base); XE_WARN_ON(!*child); - XE_WARN_ON(!level && xe_child->is_compact); + XE_WARN_ON(!level); xe_pt_check_kill(addr, next, level - 1, xe_child, action, walk); @@ -1641,7 +1681,10 @@ __xe_pt_unbind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queu dma_fence_wait(fence, false); /* TLB invalidation must be done before signaling unbind */ - err = invalidation_fence_init(tile->primary_gt, ifence, fence, vma); + err = invalidation_fence_init(tile->primary_gt, ifence, fence, + xe_vma_start(vma), + xe_vma_end(vma), + xe_vma_vm(vma)->usm.asid); if (err) { dma_fence_put(fence); kfree(ifence); diff 
--git a/drivers/gpu/drm/xe/xe_query.c b/drivers/gpu/drm/xe/xe_query.c index df407d73e5f5..995effcb904b 100644 --- a/drivers/gpu/drm/xe/xe_query.c +++ b/drivers/gpu/drm/xe/xe_query.c @@ -16,6 +16,7 @@ #include "xe_bo.h" #include "xe_device.h" #include "xe_exec_queue.h" +#include "xe_force_wake.h" #include "xe_ggtt.h" #include "xe_gt.h" #include "xe_guc_hwconfig.h" @@ -454,9 +455,10 @@ static int query_hwconfig(struct xe_device *xe, static size_t calc_topo_query_size(struct xe_device *xe) { return xe->info.gt_count * - (3 * sizeof(struct drm_xe_query_topology_mask) + + (4 * sizeof(struct drm_xe_query_topology_mask) + sizeof_field(struct xe_gt, fuse_topo.g_dss_mask) + sizeof_field(struct xe_gt, fuse_topo.c_dss_mask) + + sizeof_field(struct xe_gt, fuse_topo.l3_bank_mask) + sizeof_field(struct xe_gt, fuse_topo.eu_mask_per_dss)); } @@ -510,6 +512,12 @@ static int query_gt_topology(struct xe_device *xe, if (err) return err; + topo.type = DRM_XE_TOPO_L3_BANK; + err = copy_mask(&query_ptr, &topo, gt->fuse_topo.l3_bank_mask, + sizeof(gt->fuse_topo.l3_bank_mask)); + if (err) + return err; + topo.type = DRM_XE_TOPO_EU_PER_DSS; err = copy_mask(&query_ptr, &topo, gt->fuse_topo.eu_mask_per_dss, diff --git a/drivers/gpu/drm/xe/xe_reg_sr.h b/drivers/gpu/drm/xe/xe_reg_sr.h index e3197c33afe2..51fbba423e27 100644 --- a/drivers/gpu/drm/xe/xe_reg_sr.h +++ b/drivers/gpu/drm/xe/xe_reg_sr.h @@ -6,8 +6,6 @@ #ifndef _XE_REG_SR_ #define _XE_REG_SR_ -#include "xe_reg_sr_types.h" - /* * Reg save/restore bookkeeping */ @@ -15,6 +13,8 @@ struct xe_device; struct xe_gt; struct xe_hw_engine; +struct xe_reg_sr; +struct xe_reg_sr_entry; struct drm_printer; int xe_reg_sr_init(struct xe_reg_sr *sr, const char *name, struct xe_device *xe); diff --git a/drivers/gpu/drm/xe/xe_res_cursor.h b/drivers/gpu/drm/xe/xe_res_cursor.h index 0a306963aa8e..655af89b31a9 100644 --- a/drivers/gpu/drm/xe/xe_res_cursor.h +++ b/drivers/gpu/drm/xe/xe_res_cursor.h @@ -157,8 +157,6 @@ static inline void xe_res_first_sg(const struct sg_table *sg, struct xe_res_cursor *cur) { XE_WARN_ON(!sg); - XE_WARN_ON(!IS_ALIGNED(start, PAGE_SIZE) || - !IS_ALIGNED(size, PAGE_SIZE)); cur->node = NULL; cur->start = start; cur->remaining = size; diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c index d42b3f33bd7a..db630d27beba 100644 --- a/drivers/gpu/drm/xe/xe_ring_ops.c +++ b/drivers/gpu/drm/xe/xe_ring_ops.c @@ -7,9 +7,9 @@ #include <generated/xe_wa_oob.h> +#include "instructions/xe_gpu_commands.h" #include "instructions/xe_mi_commands.h" #include "regs/xe_engine_regs.h" -#include "regs/xe_gpu_commands.h" #include "regs/xe_gt_regs.h" #include "regs/xe_lrc_layout.h" #include "xe_exec_queue_types.h" @@ -80,6 +80,16 @@ static int emit_store_imm_ggtt(u32 addr, u32 value, u32 *dw, int i) return i; } +static int emit_flush_dw(u32 *dw, int i) +{ + dw[i++] = MI_FLUSH_DW | MI_FLUSH_IMM_DW; + dw[i++] = 0; + dw[i++] = 0; + dw[i++] = 0; + + return i; +} + static int emit_flush_imm_ggtt(u32 addr, u32 value, bool invalidate_tlb, u32 *dw, int i) { @@ -234,10 +244,12 @@ static void __emit_job_gen12_simple(struct xe_sched_job *job, struct xe_lrc *lrc i = emit_bb_start(batch_addr, ppgtt_flag, dw, i); - if (job->user_fence.used) + if (job->user_fence.used) { + i = emit_flush_dw(dw, i); i = emit_store_imm_ppgtt_posted(job->user_fence.addr, job->user_fence.value, dw, i); + } i = emit_flush_imm_ggtt(xe_lrc_seqno_ggtt_addr(lrc), seqno, false, dw, i); @@ -293,10 +305,12 @@ static void __emit_job_gen12_video(struct xe_sched_job *job, struct xe_lrc *lrc, i = 
emit_bb_start(batch_addr, ppgtt_flag, dw, i); - if (job->user_fence.used) + if (job->user_fence.used) { + i = emit_flush_dw(dw, i); i = emit_store_imm_ppgtt_posted(job->user_fence.addr, job->user_fence.value, dw, i); + } i = emit_flush_imm_ggtt(xe_lrc_seqno_ggtt_addr(lrc), seqno, false, dw, i); @@ -366,7 +380,7 @@ static void emit_migration_job_gen12(struct xe_sched_job *job, dw[i++] = MI_ARB_ON_OFF | MI_ARB_DISABLE; /* Enabled again below */ - i = emit_bb_start(job->batch_addr[0], BIT(8), dw, i); + i = emit_bb_start(job->ptrs[0].batch_addr, BIT(8), dw, i); if (!IS_SRIOV_VF(gt_to_xe(job->q->gt))) { /* XXX: Do we need this? Leaving for now. */ @@ -375,7 +389,7 @@ static void emit_migration_job_gen12(struct xe_sched_job *job, dw[i++] = preparser_disable(false); } - i = emit_bb_start(job->batch_addr[1], BIT(8), dw, i); + i = emit_bb_start(job->ptrs[1].batch_addr, BIT(8), dw, i); dw[i++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | job->migrate_flush_flags | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_IMM_DW; @@ -396,9 +410,9 @@ static void emit_job_gen12_gsc(struct xe_sched_job *job) xe_gt_assert(gt, job->q->width <= 1); /* no parallel submission for GSCCS */ - __emit_job_gen12_simple(job, job->q->lrc, - job->batch_addr[0], - xe_sched_job_seqno(job)); + __emit_job_gen12_simple(job, job->q->lrc[0], + job->ptrs[0].batch_addr, + xe_sched_job_lrc_seqno(job)); } static void emit_job_gen12_copy(struct xe_sched_job *job) @@ -406,15 +420,15 @@ static void emit_job_gen12_copy(struct xe_sched_job *job) int i; if (xe_sched_job_is_migration(job->q)) { - emit_migration_job_gen12(job, job->q->lrc, - xe_sched_job_seqno(job)); + emit_migration_job_gen12(job, job->q->lrc[0], + xe_sched_job_lrc_seqno(job)); return; } for (i = 0; i < job->q->width; ++i) - __emit_job_gen12_simple(job, job->q->lrc + i, - job->batch_addr[i], - xe_sched_job_seqno(job)); + __emit_job_gen12_simple(job, job->q->lrc[i], + job->ptrs[i].batch_addr, + xe_sched_job_lrc_seqno(job)); } static void emit_job_gen12_video(struct xe_sched_job *job) @@ -423,9 +437,9 @@ static void emit_job_gen12_video(struct xe_sched_job *job) /* FIXME: Not doing parallel handshake for now */ for (i = 0; i < job->q->width; ++i) - __emit_job_gen12_video(job, job->q->lrc + i, - job->batch_addr[i], - xe_sched_job_seqno(job)); + __emit_job_gen12_video(job, job->q->lrc[i], + job->ptrs[i].batch_addr, + xe_sched_job_lrc_seqno(job)); } static void emit_job_gen12_render_compute(struct xe_sched_job *job) @@ -433,9 +447,9 @@ static void emit_job_gen12_render_compute(struct xe_sched_job *job) int i; for (i = 0; i < job->q->width; ++i) - __emit_job_gen12_render_compute(job, job->q->lrc + i, - job->batch_addr[i], - xe_sched_job_seqno(job)); + __emit_job_gen12_render_compute(job, job->q->lrc[i], + job->ptrs[i].batch_addr, + xe_sched_job_lrc_seqno(job)); } static const struct xe_ring_ops ring_ops_gen12_gsc = { diff --git a/drivers/gpu/drm/xe/xe_rtp.c b/drivers/gpu/drm/xe/xe_rtp.c index fb44cc7521d8..01c32a932780 100644 --- a/drivers/gpu/drm/xe/xe_rtp.c +++ b/drivers/gpu/drm/xe/xe_rtp.c @@ -323,3 +323,9 @@ bool xe_rtp_match_first_gslice_fused_off(const struct xe_gt *gt, return dss >= dss_per_gslice; } + +bool xe_rtp_match_when_media2000(const struct xe_gt *gt, + const struct xe_hw_engine *hwe) +{ + return (gt_to_xe(gt))->info.media_verx100 == 2000; +} diff --git a/drivers/gpu/drm/xe/xe_rtp.h b/drivers/gpu/drm/xe/xe_rtp.h index c56fedd126e6..a32645f5f80b 100644 --- a/drivers/gpu/drm/xe/xe_rtp.h +++ b/drivers/gpu/drm/xe/xe_rtp.h @@ -341,7 +341,7 @@ struct xe_reg_sr; * }; */ #define XE_RTP_RULES(...) 
\ - .n_rules = _XE_COUNT_ARGS(__VA_ARGS__), \ + .n_rules = COUNT_ARGS(__VA_ARGS__), \ .rules = (const struct xe_rtp_rule[]) { \ XE_RTP_PASTE_FOREACH(RULE_, COMMA, (__VA_ARGS__)) \ } @@ -366,7 +366,7 @@ struct xe_reg_sr; * }; */ #define XE_RTP_ACTIONS(...) \ - .n_actions = _XE_COUNT_ARGS(__VA_ARGS__), \ + .n_actions = COUNT_ARGS(__VA_ARGS__), \ .actions = (const struct xe_rtp_action[]) { \ XE_RTP_PASTE_FOREACH(ACTION_, COMMA, (__VA_ARGS__)) \ } @@ -427,4 +427,18 @@ bool xe_rtp_match_first_render_or_compute(const struct xe_gt *gt, bool xe_rtp_match_first_gslice_fused_off(const struct xe_gt *gt, const struct xe_hw_engine *hwe); +/* + * xe_rtp_match_when_media2000 - Match when media GT version 2000 + * + * @gt: GT structure + * @hwe: Engine instance + * + * Its one of the case where we need to apply workaround on primary GT + * based on if media GT version 2000 is present. Thus this API will help + * us to match media version 2000. + * + * Returns: true if media GT version 2000, false otherwise. + */ +bool xe_rtp_match_when_media2000(const struct xe_gt *gt, + const struct xe_hw_engine *hwe); #endif diff --git a/drivers/gpu/drm/xe/xe_rtp_helpers.h b/drivers/gpu/drm/xe/xe_rtp_helpers.h index 181b6290fac3..7735f217ba71 100644 --- a/drivers/gpu/drm/xe/xe_rtp_helpers.h +++ b/drivers/gpu/drm/xe/xe_rtp_helpers.h @@ -10,22 +10,16 @@ #error "This header is supposed to be included by xe_rtp.h only" #endif +#include "xe_args.h" + /* * Helper macros - not to be used outside this header. */ #define _XE_ESC(...) __VA_ARGS__ -#define _XE_COUNT_ARGS(...) _XE_ESC(__XE_COUNT_ARGS(__VA_ARGS__, 5, 4, 3, 2, 1,)) -#define __XE_COUNT_ARGS(_, _5, _4, _3, _2, X_, ...) X_ - -#define _XE_FIRST(...) _XE_ESC(__XE_FIRST(__VA_ARGS__,)) -#define __XE_FIRST(x_, ...) x_ -#define _XE_TUPLE_TAIL(...) _XE_ESC(__XE_TUPLE_TAIL(__VA_ARGS__)) -#define __XE_TUPLE_TAIL(x_, ...) (__VA_ARGS__) -#define _XE_DROP_FIRST(x_, ...) __VA_ARGS__ +#define _XE_TUPLE_TAIL(...) 
(DROP_FIRST_ARG(__VA_ARGS__)) -#define _XE_RTP_CONCAT(a, b) __XE_RTP_CONCAT(a, b) -#define __XE_RTP_CONCAT(a, b) XE_RTP_ ## a ## b +#define _XE_RTP_CONCAT(a, b) CONCATENATE(XE_RTP_, CONCATENATE(a, b)) #define __XE_RTP_PASTE_SEP_COMMA , #define __XE_RTP_PASTE_SEP_BITWISE_OR | @@ -59,11 +53,11 @@ * * XE_RTP_TEST_FOO BANANA XE_RTP_TEST_BAR */ -#define XE_RTP_PASTE_FOREACH(prefix_, sep_, args_) _XE_ESC(_XE_RTP_CONCAT(PASTE_, _XE_COUNT_ARGS args_)(prefix_, sep_, args_)) -#define XE_RTP_PASTE_1(prefix_, sep_, args_) _XE_RTP_CONCAT(prefix_, _XE_FIRST args_) -#define XE_RTP_PASTE_2(prefix_, sep_, args_) _XE_RTP_CONCAT(prefix_, _XE_FIRST args_) __XE_RTP_PASTE_SEP_ ## sep_ XE_RTP_PASTE_1(prefix_, sep_, _XE_TUPLE_TAIL args_) -#define XE_RTP_PASTE_3(prefix_, sep_, args_) _XE_RTP_CONCAT(prefix_, _XE_FIRST args_) __XE_RTP_PASTE_SEP_ ## sep_ XE_RTP_PASTE_2(prefix_, sep_, _XE_TUPLE_TAIL args_) -#define XE_RTP_PASTE_4(prefix_, sep_, args_) _XE_RTP_CONCAT(prefix_, _XE_FIRST args_) __XE_RTP_PASTE_SEP_ ## sep_ XE_RTP_PASTE_3(prefix_, sep_, _XE_TUPLE_TAIL args_) +#define XE_RTP_PASTE_FOREACH(prefix_, sep_, args_) _XE_RTP_CONCAT(PASTE_, COUNT_ARGS args_)(prefix_, sep_, args_) +#define XE_RTP_PASTE_1(prefix_, sep_, args_) _XE_RTP_CONCAT(prefix_, FIRST_ARG args_) +#define XE_RTP_PASTE_2(prefix_, sep_, args_) _XE_RTP_CONCAT(prefix_, FIRST_ARG args_) __XE_RTP_PASTE_SEP_ ## sep_ XE_RTP_PASTE_1(prefix_, sep_, _XE_TUPLE_TAIL args_) +#define XE_RTP_PASTE_3(prefix_, sep_, args_) _XE_RTP_CONCAT(prefix_, FIRST_ARG args_) __XE_RTP_PASTE_SEP_ ## sep_ XE_RTP_PASTE_2(prefix_, sep_, _XE_TUPLE_TAIL args_) +#define XE_RTP_PASTE_4(prefix_, sep_, args_) _XE_RTP_CONCAT(prefix_, FIRST_ARG args_) __XE_RTP_PASTE_SEP_ ## sep_ XE_RTP_PASTE_3(prefix_, sep_, _XE_TUPLE_TAIL args_) /* * XE_RTP_DROP_CAST - Drop cast to convert a compound statement to a initializer @@ -76,6 +70,6 @@ * * { .a = 10 } */ -#define XE_RTP_DROP_CAST(...) _XE_ESC(_XE_DROP_FIRST _XE_ESC __VA_ARGS__) +#define XE_RTP_DROP_CAST(...) 
_XE_ESC(DROP_FIRST_ARG _XE_ESC __VA_ARGS__) #endif diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c index cd8a2fba5438..5c013904877a 100644 --- a/drivers/gpu/drm/xe/xe_sched_job.c +++ b/drivers/gpu/drm/xe/xe_sched_job.c @@ -6,7 +6,7 @@ #include "xe_sched_job.h" #include <drm/xe_drm.h> -#include <linux/dma-fence-array.h> +#include <linux/dma-fence-chain.h> #include <linux/slab.h> #include "xe_device.h" @@ -29,7 +29,7 @@ int __init xe_sched_job_module_init(void) xe_sched_job_slab = kmem_cache_create("xe_sched_job", sizeof(struct xe_sched_job) + - sizeof(u64), 0, + sizeof(struct xe_job_ptrs), 0, SLAB_HWCACHE_ALIGN, NULL); if (!xe_sched_job_slab) return -ENOMEM; @@ -37,7 +37,7 @@ int __init xe_sched_job_module_init(void) xe_sched_job_parallel_slab = kmem_cache_create("xe_sched_job_parallel", sizeof(struct xe_sched_job) + - sizeof(u64) * + sizeof(struct xe_job_ptrs) * XE_HW_ENGINE_MAX_INSTANCE, 0, SLAB_HWCACHE_ALIGN, NULL); if (!xe_sched_job_parallel_slab) { @@ -79,26 +79,33 @@ static struct xe_device *job_to_xe(struct xe_sched_job *job) return gt_to_xe(job->q->gt); } +/* Free unused pre-allocated fences */ +static void xe_sched_job_free_fences(struct xe_sched_job *job) +{ + int i; + + for (i = 0; i < job->q->width; ++i) { + struct xe_job_ptrs *ptrs = &job->ptrs[i]; + + if (ptrs->lrc_fence) + xe_lrc_free_seqno_fence(ptrs->lrc_fence); + if (ptrs->chain_fence) + dma_fence_chain_free(ptrs->chain_fence); + } +} + struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q, u64 *batch_addr) { - struct xe_sched_job *job; - struct dma_fence **fences; bool is_migration = xe_sched_job_is_migration(q); + struct xe_sched_job *job; int err; - int i, j; + int i; u32 width; /* only a kernel context can submit a vm-less job */ XE_WARN_ON(!q->vm && !(q->flags & EXEC_QUEUE_FLAG_KERNEL)); - /* Migration and kernel engines have their own locking */ - if (!(q->flags & (EXEC_QUEUE_FLAG_KERNEL | EXEC_QUEUE_FLAG_VM))) { - lockdep_assert_held(&q->vm->lock); - if (!xe_vm_in_lr_mode(q->vm)) - xe_vm_assert_held(q->vm); - } - job = job_alloc(xe_exec_queue_is_parallel(q) || is_migration); if (!job) return ERR_PTR(-ENOMEM); @@ -111,44 +118,25 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q, if (err) goto err_free; - if (!xe_exec_queue_is_parallel(q)) { - job->fence = xe_lrc_create_seqno_fence(q->lrc); - if (IS_ERR(job->fence)) { - err = PTR_ERR(job->fence); - goto err_sched_job; - } - } else { - struct dma_fence_array *cf; + for (i = 0; i < q->width; ++i) { + struct dma_fence *fence = xe_lrc_alloc_seqno_fence(); + struct dma_fence_chain *chain; - fences = kmalloc_array(q->width, sizeof(*fences), GFP_KERNEL); - if (!fences) { - err = -ENOMEM; + if (IS_ERR(fence)) { + err = PTR_ERR(fence); goto err_sched_job; } + job->ptrs[i].lrc_fence = fence; - for (j = 0; j < q->width; ++j) { - fences[j] = xe_lrc_create_seqno_fence(q->lrc + j); - if (IS_ERR(fences[j])) { - err = PTR_ERR(fences[j]); - goto err_fences; - } - } + if (i + 1 == q->width) + continue; - cf = dma_fence_array_create(q->width, fences, - q->parallel.composite_fence_ctx, - q->parallel.composite_fence_seqno++, - false); - if (!cf) { - --q->parallel.composite_fence_seqno; + chain = dma_fence_chain_alloc(); + if (!chain) { err = -ENOMEM; - goto err_fences; + goto err_sched_job; } - - /* Sanity check */ - for (j = 0; j < q->width; ++j) - xe_assert(job_to_xe(job), cf->base.seqno == fences[j]->seqno); - - job->fence = &cf->base; + job->ptrs[i].chain_fence = chain; } width = q->width; @@ -156,23 +144,14 @@ struct 
xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q, width = 2; for (i = 0; i < width; ++i) - job->batch_addr[i] = batch_addr[i]; - - /* All other jobs require a VM to be open which has a ref */ - if (unlikely(q->flags & EXEC_QUEUE_FLAG_KERNEL)) - xe_pm_runtime_get_noresume(job_to_xe(job)); - xe_device_assert_mem_access(job_to_xe(job)); + job->ptrs[i].batch_addr = batch_addr[i]; + xe_pm_runtime_get_noresume(job_to_xe(job)); trace_xe_sched_job_create(job); return job; -err_fences: - for (j = j - 1; j >= 0; --j) { - --q->lrc[j].fence_ctx.next_seqno; - dma_fence_put(fences[j]); - } - kfree(fences); err_sched_job: + xe_sched_job_free_fences(job); drm_sched_job_cleanup(&job->drm); err_free: xe_exec_queue_put(q); @@ -191,36 +170,42 @@ void xe_sched_job_destroy(struct kref *ref) { struct xe_sched_job *job = container_of(ref, struct xe_sched_job, refcount); + struct xe_device *xe = job_to_xe(job); - if (unlikely(job->q->flags & EXEC_QUEUE_FLAG_KERNEL)) - xe_pm_runtime_put(job_to_xe(job)); + xe_sched_job_free_fences(job); xe_exec_queue_put(job->q); dma_fence_put(job->fence); drm_sched_job_cleanup(&job->drm); job_free(job); + xe_pm_runtime_put(xe); } -void xe_sched_job_set_error(struct xe_sched_job *job, int error) +/* Set the error status under the fence to avoid racing with signaling */ +static bool xe_fence_set_error(struct dma_fence *fence, int error) { - if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &job->fence->flags)) - return; + unsigned long irq_flags; + bool signaled; - dma_fence_set_error(job->fence, error); + spin_lock_irqsave(fence->lock, irq_flags); + signaled = test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags); + if (!signaled) + dma_fence_set_error(fence, error); + spin_unlock_irqrestore(fence->lock, irq_flags); - if (dma_fence_is_array(job->fence)) { - struct dma_fence_array *array = - to_dma_fence_array(job->fence); - struct dma_fence **child = array->fences; - unsigned int nchild = array->num_fences; + return signaled; +} + +void xe_sched_job_set_error(struct xe_sched_job *job, int error) +{ + if (xe_fence_set_error(job->fence, error)) + return; - do { - struct dma_fence *current_fence = *child++; + if (dma_fence_is_chain(job->fence)) { + struct dma_fence *iter; - if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, - ¤t_fence->flags)) - continue; - dma_fence_set_error(current_fence, error); - } while (--nchild); + dma_fence_chain_for_each(iter, job->fence) + xe_fence_set_error(dma_fence_chain_contained(iter), + error); } trace_xe_sched_job_set_error(job); @@ -231,30 +216,42 @@ void xe_sched_job_set_error(struct xe_sched_job *job, int error) bool xe_sched_job_started(struct xe_sched_job *job) { - struct xe_lrc *lrc = job->q->lrc; + struct xe_lrc *lrc = job->q->lrc[0]; - return !__dma_fence_is_later(xe_sched_job_seqno(job), + return !__dma_fence_is_later(xe_sched_job_lrc_seqno(job), xe_lrc_start_seqno(lrc), - job->fence->ops); + dma_fence_chain_contained(job->fence)->ops); } bool xe_sched_job_completed(struct xe_sched_job *job) { - struct xe_lrc *lrc = job->q->lrc; + struct xe_lrc *lrc = job->q->lrc[0]; /* * Can safely check just LRC[0] seqno as that is last seqno written when * parallel handshake is done. 
*/ - return !__dma_fence_is_later(xe_sched_job_seqno(job), xe_lrc_seqno(lrc), - job->fence->ops); + return !__dma_fence_is_later(xe_sched_job_lrc_seqno(job), + xe_lrc_seqno(lrc), + dma_fence_chain_contained(job->fence)->ops); } void xe_sched_job_arm(struct xe_sched_job *job) { struct xe_exec_queue *q = job->q; + struct dma_fence *fence, *prev; struct xe_vm *vm = q->vm; + u64 seqno = 0; + int i; + + /* Migration and kernel engines have their own locking */ + if (IS_ENABLED(CONFIG_LOCKDEP) && + !(q->flags & (EXEC_QUEUE_FLAG_KERNEL | EXEC_QUEUE_FLAG_VM))) { + lockdep_assert_held(&q->vm->lock); + if (!xe_vm_in_lr_mode(q->vm)) + xe_vm_assert_held(q->vm); + } if (vm && !xe_sched_job_is_migration(q) && !xe_vm_in_lr_mode(vm) && (vm->batch_invalidate_tlb || vm->tlb_flush_seqno != q->tlb_flush_seqno)) { @@ -263,6 +260,27 @@ void xe_sched_job_arm(struct xe_sched_job *job) job->ring_ops_flush_tlb = true; } + /* Arm the pre-allocated fences */ + for (i = 0; i < q->width; prev = fence, ++i) { + struct dma_fence_chain *chain; + + fence = job->ptrs[i].lrc_fence; + xe_lrc_init_seqno_fence(q->lrc[i], fence); + job->ptrs[i].lrc_fence = NULL; + if (!i) { + job->lrc_seqno = fence->seqno; + continue; + } else { + xe_assert(gt_to_xe(q->gt), job->lrc_seqno == fence->seqno); + } + + chain = job->ptrs[i - 1].chain_fence; + dma_fence_chain_init(chain, prev, fence, seqno++); + job->ptrs[i - 1].chain_fence = NULL; + fence = &chain->base; + } + + job->fence = fence; drm_sched_job_arm(&job->drm); } @@ -322,7 +340,8 @@ xe_sched_job_snapshot_capture(struct xe_sched_job *job) snapshot->batch_addr_len = q->width; for (i = 0; i < q->width; i++) - snapshot->batch_addr[i] = xe_device_uncanonicalize_addr(xe, job->batch_addr[i]); + snapshot->batch_addr[i] = + xe_device_uncanonicalize_addr(xe, job->ptrs[i].batch_addr); return snapshot; } diff --git a/drivers/gpu/drm/xe/xe_sched_job.h b/drivers/gpu/drm/xe/xe_sched_job.h index c75018f4660d..f362e28455db 100644 --- a/drivers/gpu/drm/xe/xe_sched_job.h +++ b/drivers/gpu/drm/xe/xe_sched_job.h @@ -70,7 +70,12 @@ to_xe_sched_job(struct drm_sched_job *drm) static inline u32 xe_sched_job_seqno(struct xe_sched_job *job) { - return job->fence->seqno; + return job->fence ? job->fence->seqno : 0; +} + +static inline u32 xe_sched_job_lrc_seqno(struct xe_sched_job *job) +{ + return job->lrc_seqno; } static inline void diff --git a/drivers/gpu/drm/xe/xe_sched_job_types.h b/drivers/gpu/drm/xe/xe_sched_job_types.h index 5e12724219fd..0d3f76fb05ce 100644 --- a/drivers/gpu/drm/xe/xe_sched_job_types.h +++ b/drivers/gpu/drm/xe/xe_sched_job_types.h @@ -11,6 +11,20 @@ #include <drm/gpu_scheduler.h> struct xe_exec_queue; +struct dma_fence; +struct dma_fence_chain; + +/** + * struct xe_job_ptrs - Per hw engine instance data + */ +struct xe_job_ptrs { + /** @lrc_fence: Pre-allocated uninitialized lrc fence.*/ + struct dma_fence *lrc_fence; + /** @chain_fence: Pre-allocated uninitialized fence chain node. */ + struct dma_fence_chain *chain_fence; + /** @batch_addr: Batch buffer address. */ + u64 batch_addr; +}; /** * struct xe_sched_job - XE schedule job (batch buffer tracking) @@ -37,12 +51,14 @@ struct xe_sched_job { /** @user_fence.value: write back value */ u64 value; } user_fence; + /** @lrc_seqno: LRC seqno */ + u32 lrc_seqno; /** @migrate_flush_flags: Additional flush flags for migration jobs */ u32 migrate_flush_flags; /** @ring_ops_flush_tlb: The ring ops need to flush TLB before payload. 
*/ bool ring_ops_flush_tlb; - /** @batch_addr: batch buffer address of job */ - u64 batch_addr[]; + /** @ptrs: per instance pointers. */ + struct xe_job_ptrs ptrs[]; }; struct xe_sched_job_snapshot { diff --git a/drivers/gpu/drm/xe/xe_sriov.c b/drivers/gpu/drm/xe/xe_sriov.c index 1c3fa84b6adb..a274a5fb1401 100644 --- a/drivers/gpu/drm/xe/xe_sriov.c +++ b/drivers/gpu/drm/xe/xe_sriov.c @@ -53,6 +53,7 @@ static bool test_is_vf(struct xe_device *xe) */ void xe_sriov_probe_early(struct xe_device *xe) { + struct pci_dev *pdev = to_pci_dev(xe->drm.dev); enum xe_sriov_mode mode = XE_SRIOV_MODE_NONE; bool has_sriov = xe->info.has_sriov; @@ -61,6 +62,16 @@ void xe_sriov_probe_early(struct xe_device *xe) mode = XE_SRIOV_MODE_VF; else if (xe_sriov_pf_readiness(xe)) mode = XE_SRIOV_MODE_PF; + } else if (pci_sriov_get_totalvfs(pdev)) { + /* + * Even if we have not enabled SR-IOV support using the + * platform specific has_sriov flag, the hardware may still + * report SR-IOV capability and the PCI layer may wrongly + * advertise driver support to enable VFs. Explicitly reset + * the number of supported VFs to zero to avoid confusion. + */ + drm_info(&xe->drm, "Support for SR-IOV is not available\n"); + pci_sriov_set_totalvfs(pdev, 0); } xe_assert(xe, !xe->sriov.__mode); diff --git a/drivers/gpu/drm/xe/xe_sync.c b/drivers/gpu/drm/xe/xe_sync.c index 65f1f1628235..2883d9aca404 100644 --- a/drivers/gpu/drm/xe/xe_sync.c +++ b/drivers/gpu/drm/xe/xe_sync.c @@ -339,6 +339,21 @@ err_out: } /** + * __xe_sync_ufence_get() - Get user fence from user fence + * @ufence: input user fence + * + * Get a user fence reference from user fence + * + * Return: xe_user_fence pointer with reference + */ +struct xe_user_fence *__xe_sync_ufence_get(struct xe_user_fence *ufence) +{ + user_fence_get(ufence); + + return ufence; +} + +/** * xe_sync_ufence_get() - Get user fence from sync * @sync: input sync * diff --git a/drivers/gpu/drm/xe/xe_sync.h b/drivers/gpu/drm/xe/xe_sync.h index 3e03396af2c6..006dbf780793 100644 --- a/drivers/gpu/drm/xe/xe_sync.h +++ b/drivers/gpu/drm/xe/xe_sync.h @@ -37,6 +37,7 @@ static inline bool xe_sync_is_ufence(struct xe_sync_entry *sync) return !!sync->ufence; } +struct xe_user_fence *__xe_sync_ufence_get(struct xe_user_fence *ufence); struct xe_user_fence *xe_sync_ufence_get(struct xe_sync_entry *sync); void xe_sync_ufence_put(struct xe_user_fence *ufence); int xe_sync_ufence_get_status(struct xe_user_fence *ufence); diff --git a/drivers/gpu/drm/xe/xe_tile_sysfs.c b/drivers/gpu/drm/xe/xe_tile_sysfs.c index 64661403afcd..b804234a6551 100644 --- a/drivers/gpu/drm/xe/xe_tile_sysfs.c +++ b/drivers/gpu/drm/xe/xe_tile_sysfs.c @@ -22,7 +22,7 @@ static const struct kobj_type xe_tile_sysfs_kobj_type = { .sysfs_ops = &kobj_sysfs_ops, }; -static void tile_sysfs_fini(struct drm_device *drm, void *arg) +static void tile_sysfs_fini(void *arg) { struct xe_tile *tile = arg; @@ -55,5 +55,5 @@ int xe_tile_sysfs_init(struct xe_tile *tile) if (err) return err; - return drmm_add_action_or_reset(&xe->drm, tile_sysfs_fini, tile); + return devm_add_action_or_reset(xe->drm.dev, tile_sysfs_fini, tile); } diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h index 2d56cfc09e42..e4cba64474e6 100644 --- a/drivers/gpu/drm/xe/xe_trace.h +++ b/drivers/gpu/drm/xe/xe_trace.h @@ -254,6 +254,7 @@ DECLARE_EVENT_CLASS(xe_sched_job, TP_STRUCT__entry( __field(u32, seqno) + __field(u32, lrc_seqno) __field(u16, guc_id) __field(u32, guc_state) __field(u32, flags) @@ -264,17 +265,19 @@ DECLARE_EVENT_CLASS(xe_sched_job, 
TP_fast_assign( __entry->seqno = xe_sched_job_seqno(job); + __entry->lrc_seqno = xe_sched_job_lrc_seqno(job); __entry->guc_id = job->q->guc->id; __entry->guc_state = atomic_read(&job->q->guc->state); __entry->flags = job->q->flags; - __entry->error = job->fence->error; + __entry->error = job->fence ? job->fence->error : 0; __entry->fence = job->fence; - __entry->batch_addr = (u64)job->batch_addr[0]; + __entry->batch_addr = (u64)job->ptrs[0].batch_addr; ), - TP_printk("fence=%p, seqno=%u, guc_id=%d, batch_addr=0x%012llx, guc_state=0x%x, flags=0x%x, error=%d", - __entry->fence, __entry->seqno, __entry->guc_id, + TP_printk("fence=%p, seqno=%u, lrc_seqno=%u, guc_id=%d, batch_addr=0x%012llx, guc_state=0x%x, flags=0x%x, error=%d", + __entry->fence, __entry->seqno, + __entry->lrc_seqno, __entry->guc_id, __entry->batch_addr, __entry->guc_state, __entry->flags, __entry->error) ); diff --git a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c index f77367329760..f46fd2df84de 100644 --- a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c +++ b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c @@ -13,11 +13,13 @@ #include <generated/xe_wa_oob.h> +#include "regs/xe_bars.h" #include "regs/xe_gt_regs.h" #include "regs/xe_regs.h" #include "xe_bo.h" #include "xe_device.h" #include "xe_gt.h" +#include "xe_gt_printk.h" #include "xe_mmio.h" #include "xe_res_cursor.h" #include "xe_sriov.h" diff --git a/drivers/gpu/drm/xe/xe_uc.c b/drivers/gpu/drm/xe/xe_uc.c index 4feb35c95a1c..0f240534fb72 100644 --- a/drivers/gpu/drm/xe/xe_uc.c +++ b/drivers/gpu/drm/xe/xe_uc.c @@ -5,15 +5,17 @@ #include "xe_uc.h" +#include "xe_assert.h" #include "xe_device.h" #include "xe_gsc.h" #include "xe_gsc_proxy.h" #include "xe_gt.h" +#include "xe_gt_printk.h" +#include "xe_gt_sriov_vf.h" #include "xe_guc.h" -#include "xe_guc_db_mgr.h" #include "xe_guc_pc.h" -#include "xe_guc_submit.h" #include "xe_huc.h" +#include "xe_sriov.h" #include "xe_uc_fw.h" #include "xe_wopcm.h" @@ -51,19 +53,19 @@ int xe_uc_init(struct xe_uc *uc) goto err; if (!xe_device_uc_enabled(uc_to_xe(uc))) - goto err; + return 0; - ret = xe_wopcm_init(&uc->wopcm); - if (ret) - goto err; + if (IS_SRIOV_VF(uc_to_xe(uc))) + return 0; - ret = xe_guc_submit_init(&uc->guc); + ret = xe_wopcm_init(&uc->wopcm); if (ret) goto err; - ret = xe_guc_db_mgr_init(&uc->guc.dbm, ~0); + return 0; err: + xe_gt_err(uc_to_gt(uc), "Failed to initialize uC (%pe)\n", ERR_PTR(ret)); return ret; } @@ -144,6 +146,31 @@ int xe_uc_init_hwconfig(struct xe_uc *uc) return 0; } +static int vf_uc_init_hw(struct xe_uc *uc) +{ + int err; + + err = xe_uc_sanitize_reset(uc); + if (err) + return err; + + err = xe_guc_enable_communication(&uc->guc); + if (err) + return err; + + err = xe_gt_sriov_vf_connect(uc_to_gt(uc)); + if (err) + return err; + + uc->guc.submission_state.enabled = true; + + err = xe_gt_record_default_lrcs(uc_to_gt(uc)); + if (err) + return err; + + return 0; +} + /* * Should be called during driver load, after every GT reset, and after every * suspend to reload / auth the firmwares. 
@@ -156,6 +183,9 @@ int xe_uc_init_hw(struct xe_uc *uc) if (!xe_device_uc_enabled(uc_to_xe(uc))) return 0; + if (IS_SRIOV_VF(uc_to_xe(uc))) + return vf_uc_init_hw(uc); + ret = xe_huc_upload(&uc->huc); if (ret) return ret; @@ -215,13 +245,13 @@ void xe_uc_stop_prepare(struct xe_uc *uc) xe_guc_stop_prepare(&uc->guc); } -int xe_uc_stop(struct xe_uc *uc) +void xe_uc_stop(struct xe_uc *uc) { /* GuC submission not enabled, nothing to do */ if (!xe_device_uc_enabled(uc_to_xe(uc))) - return 0; + return; - return xe_guc_stop(&uc->guc); + xe_guc_stop(&uc->guc); } int xe_uc_start(struct xe_uc *uc) @@ -247,17 +277,13 @@ again: int xe_uc_suspend(struct xe_uc *uc) { - int ret; - /* GuC submission not enabled, nothing to do */ if (!xe_device_uc_enabled(uc_to_xe(uc))) return 0; uc_reset_wait(uc); - ret = xe_uc_stop(uc); - if (ret) - return ret; + xe_uc_stop(uc); return xe_guc_suspend(&uc->guc); } diff --git a/drivers/gpu/drm/xe/xe_uc.h b/drivers/gpu/drm/xe/xe_uc.h index e4d4e3c99f0e..11856f24e6f9 100644 --- a/drivers/gpu/drm/xe/xe_uc.h +++ b/drivers/gpu/drm/xe/xe_uc.h @@ -6,7 +6,7 @@ #ifndef _XE_UC_H_ #define _XE_UC_H_ -#include "xe_uc_types.h" +struct xe_uc; int xe_uc_init(struct xe_uc *uc); int xe_uc_init_hwconfig(struct xe_uc *uc); @@ -16,7 +16,7 @@ int xe_uc_fini_hw(struct xe_uc *uc); void xe_uc_gucrc_disable(struct xe_uc *uc); int xe_uc_reset_prepare(struct xe_uc *uc); void xe_uc_stop_prepare(struct xe_uc *uc); -int xe_uc_stop(struct xe_uc *uc); +void xe_uc_stop(struct xe_uc *uc); int xe_uc_start(struct xe_uc *uc); int xe_uc_suspend(struct xe_uc *uc); int xe_uc_sanitize_reset(struct xe_uc *uc); diff --git a/drivers/gpu/drm/xe/xe_uc_fw.c b/drivers/gpu/drm/xe/xe_uc_fw.c index 186f81640cef..5f23ecd98376 100644 --- a/drivers/gpu/drm/xe/xe_uc_fw.c +++ b/drivers/gpu/drm/xe/xe_uc_fw.c @@ -14,6 +14,7 @@ #include "xe_force_wake.h" #include "xe_gsc.h" #include "xe_gt.h" +#include "xe_gt_printk.h" #include "xe_map.h" #include "xe_mmio.h" #include "xe_module.h" @@ -653,6 +654,10 @@ static int uc_fw_request(struct xe_uc_fw *uc_fw, const struct firmware **firmwar uc_fw_auto_select(xe, uc_fw); if (IS_SRIOV_VF(xe)) { + /* Only GuC/HuC are supported */ + if (uc_fw->type != XE_UC_FW_TYPE_GUC && + uc_fw->type != XE_UC_FW_TYPE_HUC) + uc_fw->path = NULL; /* VF will support only firmwares that driver can autoselect */ xe_uc_fw_change_status(uc_fw, uc_fw->path ? 
XE_UC_FIRMWARE_PRELOADED : diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c index 4aa3943e6f29..99bf7412475c 100644 --- a/drivers/gpu/drm/xe/xe_vm.c +++ b/drivers/gpu/drm/xe/xe_vm.c @@ -85,8 +85,8 @@ static bool preempt_fences_waiting(struct xe_vm *vm) list_for_each_entry(q, &vm->preempt.exec_queues, compute.link) { if (!q->compute.pfence || - (q->compute.pfence && test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, - &q->compute.pfence->flags))) { + test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, + &q->compute.pfence->flags)) { return true; } } @@ -315,19 +315,23 @@ int __xe_vm_userptr_needs_repin(struct xe_vm *vm) #define XE_VM_REBIND_RETRY_TIMEOUT_MS 1000 -static void xe_vm_kill(struct xe_vm *vm) +static void xe_vm_kill(struct xe_vm *vm, bool unlocked) { struct xe_exec_queue *q; lockdep_assert_held(&vm->lock); - xe_vm_lock(vm, false); + if (unlocked) + xe_vm_lock(vm, false); + vm->flags |= XE_VM_FLAG_BANNED; trace_xe_vm_kill(vm); list_for_each_entry(q, &vm->preempt.exec_queues, compute.link) q->ops->kill(q); - xe_vm_unlock(vm); + + if (unlocked) + xe_vm_unlock(vm); /* TODO: Inform user the VM is banned */ } @@ -557,7 +561,7 @@ out_unlock_outer: if (err) { drm_warn(&vm->xe->drm, "VM worker error: %d\n", err); - xe_vm_kill(vm); + xe_vm_kill(vm, true); } up_write(&vm->lock); @@ -708,37 +712,116 @@ int xe_vm_userptr_check_repin(struct xe_vm *vm) list_empty_careful(&vm->userptr.invalidated)) ? 0 : -EAGAIN; } -static struct dma_fence * -xe_vm_bind_vma(struct xe_vma *vma, struct xe_exec_queue *q, - struct xe_sync_entry *syncs, u32 num_syncs, - bool first_op, bool last_op); +static void xe_vm_populate_rebind(struct xe_vma_op *op, struct xe_vma *vma, + u8 tile_mask) +{ + INIT_LIST_HEAD(&op->link); + op->tile_mask = tile_mask; + op->base.op = DRM_GPUVA_OP_MAP; + op->base.map.va.addr = vma->gpuva.va.addr; + op->base.map.va.range = vma->gpuva.va.range; + op->base.map.gem.obj = vma->gpuva.gem.obj; + op->base.map.gem.offset = vma->gpuva.gem.offset; + op->map.vma = vma; + op->map.immediate = true; + op->map.dumpable = vma->gpuva.flags & XE_VMA_DUMPABLE; + op->map.is_null = xe_vma_is_null(vma); +} + +static int xe_vm_ops_add_rebind(struct xe_vma_ops *vops, struct xe_vma *vma, + u8 tile_mask) +{ + struct xe_vma_op *op; + + op = kzalloc(sizeof(*op), GFP_KERNEL); + if (!op) + return -ENOMEM; + + xe_vm_populate_rebind(op, vma, tile_mask); + list_add_tail(&op->link, &vops->list); + + return 0; +} + +static struct dma_fence *ops_execute(struct xe_vm *vm, + struct xe_vma_ops *vops); +static void xe_vma_ops_init(struct xe_vma_ops *vops, struct xe_vm *vm, + struct xe_exec_queue *q, + struct xe_sync_entry *syncs, u32 num_syncs); int xe_vm_rebind(struct xe_vm *vm, bool rebind_worker) { struct dma_fence *fence; struct xe_vma *vma, *next; + struct xe_vma_ops vops; + struct xe_vma_op *op, *next_op; + int err; lockdep_assert_held(&vm->lock); - if (xe_vm_in_lr_mode(vm) && !rebind_worker) + if ((xe_vm_in_lr_mode(vm) && !rebind_worker) || + list_empty(&vm->rebind_list)) return 0; + xe_vma_ops_init(&vops, vm, NULL, NULL, 0); + xe_vm_assert_held(vm); - list_for_each_entry_safe(vma, next, &vm->rebind_list, - combined_links.rebind) { + list_for_each_entry(vma, &vm->rebind_list, combined_links.rebind) { xe_assert(vm->xe, vma->tile_present); - list_del_init(&vma->combined_links.rebind); if (rebind_worker) trace_xe_vma_rebind_worker(vma); else trace_xe_vma_rebind_exec(vma); - fence = xe_vm_bind_vma(vma, NULL, NULL, 0, false, false); - if (IS_ERR(fence)) - return PTR_ERR(fence); + + err = xe_vm_ops_add_rebind(&vops, vma, 
+ vma->tile_present); + if (err) + goto free_ops; + } + + fence = ops_execute(vm, &vops); + if (IS_ERR(fence)) { + err = PTR_ERR(fence); + } else { dma_fence_put(fence); + list_for_each_entry_safe(vma, next, &vm->rebind_list, + combined_links.rebind) + list_del_init(&vma->combined_links.rebind); + } +free_ops: + list_for_each_entry_safe(op, next_op, &vops.list, link) { + list_del(&op->link); + kfree(op); } - return 0; + return err; +} + +struct dma_fence *xe_vma_rebind(struct xe_vm *vm, struct xe_vma *vma, u8 tile_mask) +{ + struct dma_fence *fence = NULL; + struct xe_vma_ops vops; + struct xe_vma_op *op, *next_op; + int err; + + lockdep_assert_held(&vm->lock); + xe_vm_assert_held(vm); + xe_assert(vm->xe, xe_vm_in_fault_mode(vm)); + + xe_vma_ops_init(&vops, vm, NULL, NULL, 0); + + err = xe_vm_ops_add_rebind(&vops, vma, tile_mask); + if (err) + return ERR_PTR(err); + + fence = ops_execute(vm, &vops); + + list_for_each_entry_safe(op, next_op, &vops.list, link) { + list_del(&op->link); + kfree(op); + } + + return fence; } static void xe_vma_free(struct xe_vma *vma) @@ -805,7 +888,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm, for_each_tile(tile, vm->xe, id) vma->tile_mask |= 0x1 << id; - if (GRAPHICS_VER(vm->xe) >= 20 || vm->xe->info.platform == XE_PVC) + if (vm->xe->info.has_atomic_enable_pte_bit) vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT; vma->pat_index = pat_index; @@ -1173,6 +1256,8 @@ static const struct xe_pt_ops xelp_pt_ops = { .pde_encode_bo = xelp_pde_encode_bo, }; +static void vm_destroy_work_func(struct work_struct *w); + /** * xe_vm_create_scratch() - Setup a scratch memory pagetable tree for the * given tile and vm. @@ -1252,6 +1337,8 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags) init_rwsem(&vm->userptr.notifier_lock); spin_lock_init(&vm->userptr.invalidated_lock); + INIT_WORK(&vm->destroy_work, vm_destroy_work_func); + INIT_LIST_HEAD(&vm->preempt.exec_queues); vm->preempt.min_run_period_ms = 10; /* FIXME: Wire up to uAPI */ @@ -1260,7 +1347,13 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags) vm->pt_ops = &xelp_pt_ops; - if (!(flags & XE_VM_FLAG_MIGRATION)) + /* + * Long-running workloads are not protected by the scheduler references. + * By design, run_job for long-running workloads returns NULL and the + * scheduler drops all the references of it, hence protecting the VM + * for this case is necessary. + */ + if (flags & XE_VM_FLAG_LR_MODE) xe_pm_runtime_get_noresume(xe); vm_resv_obj = drm_gpuvm_resv_object_alloc(&xe->drm); @@ -1274,7 +1367,7 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags) drm_gem_object_put(vm_resv_obj); - err = dma_resv_lock_interruptible(xe_vm_resv(vm), NULL); + err = xe_vm_lock(vm, true); if (err) goto err_close; @@ -1318,7 +1411,7 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags) xe_pt_populate_empty(tile, vm, vm->pt_root[id]); } - dma_resv_unlock(xe_vm_resv(vm)); + xe_vm_unlock(vm); /* Kernel migration VM shouldn't have a circular loop.. 
*/ if (!(flags & XE_VM_FLAG_MIGRATION)) { @@ -1360,7 +1453,7 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags) return vm; err_unlock_close: - dma_resv_unlock(xe_vm_resv(vm)); + xe_vm_unlock(vm); err_close: xe_vm_close_and_put(vm); return ERR_PTR(err); @@ -1370,7 +1463,7 @@ err_no_resv: for_each_tile(tile, xe, id) xe_range_fence_tree_fini(&vm->rftree[id]); kfree(vm); - if (!(flags & XE_VM_FLAG_MIGRATION)) + if (flags & XE_VM_FLAG_LR_MODE) xe_pm_runtime_put(xe); return ERR_PTR(err); } @@ -1489,9 +1582,10 @@ void xe_vm_close_and_put(struct xe_vm *vm) xe_vm_put(vm); } -static void xe_vm_free(struct drm_gpuvm *gpuvm) +static void vm_destroy_work_func(struct work_struct *w) { - struct xe_vm *vm = container_of(gpuvm, struct xe_vm, gpuvm); + struct xe_vm *vm = + container_of(w, struct xe_vm, destroy_work); struct xe_device *xe = vm->xe; struct xe_tile *tile; u8 id; @@ -1504,7 +1598,7 @@ static void xe_vm_free(struct drm_gpuvm *gpuvm) mutex_destroy(&vm->snap_mutex); - if (!(vm->flags & XE_VM_FLAG_MIGRATION)) + if (vm->flags & XE_VM_FLAG_LR_MODE) xe_pm_runtime_put(xe); for_each_tile(tile, xe, id) @@ -1514,6 +1608,14 @@ static void xe_vm_free(struct drm_gpuvm *gpuvm) kfree(vm); } +static void xe_vm_free(struct drm_gpuvm *gpuvm) +{ + struct xe_vm *vm = container_of(gpuvm, struct xe_vm, gpuvm); + + /* To destroy the VM we need to be able to sleep */ + queue_work(system_unbound_wq, &vm->destroy_work); +} + struct xe_vm *xe_vm_lookup(struct xe_file *xef, u32 id) { struct xe_vm *vm; @@ -1550,23 +1652,13 @@ xe_vm_unbind_vma(struct xe_vma *vma, struct xe_exec_queue *q, struct dma_fence *fence = NULL; struct dma_fence **fences = NULL; struct dma_fence_array *cf = NULL; - int cur_fence = 0, i; + int cur_fence = 0; int number_tiles = hweight8(vma->tile_present); int err; u8 id; trace_xe_vma_unbind(vma); - if (vma->ufence) { - struct xe_user_fence * const f = vma->ufence; - - if (!xe_sync_ufence_get_status(f)) - return ERR_PTR(-EBUSY); - - vma->ufence = NULL; - xe_sync_ufence_put(f); - } - if (number_tiles > 1) { fences = kmalloc_array(number_tiles, sizeof(*fences), GFP_KERNEL); @@ -1608,10 +1700,6 @@ next: fence = cf ? &cf->base : !fence ? xe_exec_queue_last_fence_get(wait_exec_queue, vm) : fence; - if (last_op) { - for (i = 0; i < num_syncs; i++) - xe_sync_entry_signal(&syncs[i], fence); - } return fence; @@ -1628,15 +1716,15 @@ err_fences: static struct dma_fence * xe_vm_bind_vma(struct xe_vma *vma, struct xe_exec_queue *q, struct xe_sync_entry *syncs, u32 num_syncs, - bool first_op, bool last_op) + u8 tile_mask, bool first_op, bool last_op) { struct xe_tile *tile; struct dma_fence *fence; struct dma_fence **fences = NULL; struct dma_fence_array *cf = NULL; struct xe_vm *vm = xe_vma_vm(vma); - int cur_fence = 0, i; - int number_tiles = hweight8(vma->tile_mask); + int cur_fence = 0; + int number_tiles = hweight8(tile_mask); int err; u8 id; @@ -1650,7 +1738,7 @@ xe_vm_bind_vma(struct xe_vma *vma, struct xe_exec_queue *q, } for_each_tile(tile, vm->xe, id) { - if (!(vma->tile_mask & BIT(id))) + if (!(tile_mask & BIT(id))) goto next; fence = __xe_pt_bind_vma(tile, vma, q ? q : vm->q[id], @@ -1682,12 +1770,6 @@ next: } } - if (last_op) { - for (i = 0; i < num_syncs; i++) - xe_sync_entry_signal(&syncs[i], - cf ? &cf->base : fence); - } - return cf ? 
&cf->base : fence; err_fences: @@ -1715,87 +1797,46 @@ find_ufence_get(struct xe_sync_entry *syncs, u32 num_syncs) return NULL; } -static int __xe_vm_bind(struct xe_vm *vm, struct xe_vma *vma, - struct xe_exec_queue *q, struct xe_sync_entry *syncs, - u32 num_syncs, bool immediate, bool first_op, - bool last_op) +static struct dma_fence * +xe_vm_bind(struct xe_vm *vm, struct xe_vma *vma, struct xe_exec_queue *q, + struct xe_bo *bo, struct xe_sync_entry *syncs, u32 num_syncs, + u8 tile_mask, bool immediate, bool first_op, bool last_op) { struct dma_fence *fence; struct xe_exec_queue *wait_exec_queue = to_wait_exec_queue(vm, q); - struct xe_user_fence *ufence; xe_vm_assert_held(vm); - - ufence = find_ufence_get(syncs, num_syncs); - if (vma->ufence && ufence) - xe_sync_ufence_put(vma->ufence); - - vma->ufence = ufence ?: vma->ufence; + xe_bo_assert_held(bo); if (immediate) { - fence = xe_vm_bind_vma(vma, q, syncs, num_syncs, first_op, - last_op); + fence = xe_vm_bind_vma(vma, q, syncs, num_syncs, tile_mask, + first_op, last_op); if (IS_ERR(fence)) - return PTR_ERR(fence); + return fence; } else { - int i; - xe_assert(vm->xe, xe_vm_in_fault_mode(vm)); fence = xe_exec_queue_last_fence_get(wait_exec_queue, vm); - if (last_op) { - for (i = 0; i < num_syncs; i++) - xe_sync_entry_signal(&syncs[i], fence); - } } - if (last_op) - xe_exec_queue_last_fence_set(wait_exec_queue, vm, fence); - dma_fence_put(fence); - - return 0; -} - -static int xe_vm_bind(struct xe_vm *vm, struct xe_vma *vma, struct xe_exec_queue *q, - struct xe_bo *bo, struct xe_sync_entry *syncs, - u32 num_syncs, bool immediate, bool first_op, - bool last_op) -{ - int err; - - xe_vm_assert_held(vm); - xe_bo_assert_held(bo); - - if (bo && immediate) { - err = xe_bo_validate(bo, vm, true); - if (err) - return err; - } - - return __xe_vm_bind(vm, vma, q, syncs, num_syncs, immediate, first_op, - last_op); + return fence; } -static int xe_vm_unbind(struct xe_vm *vm, struct xe_vma *vma, - struct xe_exec_queue *q, struct xe_sync_entry *syncs, - u32 num_syncs, bool first_op, bool last_op) +static struct dma_fence * +xe_vm_unbind(struct xe_vm *vm, struct xe_vma *vma, + struct xe_exec_queue *q, struct xe_sync_entry *syncs, + u32 num_syncs, bool first_op, bool last_op) { struct dma_fence *fence; - struct xe_exec_queue *wait_exec_queue = to_wait_exec_queue(vm, q); xe_vm_assert_held(vm); xe_bo_assert_held(xe_vma_bo(vma)); fence = xe_vm_unbind_vma(vma, q, syncs, num_syncs, first_op, last_op); if (IS_ERR(fence)) - return PTR_ERR(fence); + return fence; - xe_vma_destroy(vma, fence); - if (last_op) - xe_exec_queue_last_fence_set(wait_exec_queue, vm, fence); - dma_fence_put(fence); - - return 0; + return fence; } #define ALL_DRM_XE_VM_CREATE_FLAGS (DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE | \ @@ -1938,40 +1979,18 @@ static const u32 region_to_mem_type[] = { XE_PL_VRAM1, }; -static int xe_vm_prefetch(struct xe_vm *vm, struct xe_vma *vma, - struct xe_exec_queue *q, u32 region, - struct xe_sync_entry *syncs, u32 num_syncs, - bool first_op, bool last_op) +static struct dma_fence * +xe_vm_prefetch(struct xe_vm *vm, struct xe_vma *vma, + struct xe_exec_queue *q, struct xe_sync_entry *syncs, + u32 num_syncs, bool first_op, bool last_op) { struct xe_exec_queue *wait_exec_queue = to_wait_exec_queue(vm, q); - int err; - - xe_assert(vm->xe, region < ARRAY_SIZE(region_to_mem_type)); - - if (!xe_vma_has_no_bo(vma)) { - err = xe_bo_migrate(xe_vma_bo(vma), region_to_mem_type[region]); - if (err) - return err; - } if (vma->tile_mask != (vma->tile_present & 
~vma->tile_invalidated)) { return xe_vm_bind(vm, vma, q, xe_vma_bo(vma), syncs, num_syncs, - true, first_op, last_op); + vma->tile_mask, true, first_op, last_op); } else { - int i; - - /* Nothing to do, signal fences now */ - if (last_op) { - for (i = 0; i < num_syncs; i++) { - struct dma_fence *fence = - xe_exec_queue_last_fence_get(wait_exec_queue, vm); - - xe_sync_entry_signal(&syncs[i], fence); - dma_fence_put(fence); - } - } - - return 0; + return xe_exec_queue_last_fence_get(wait_exec_queue, vm); } } @@ -2267,23 +2286,28 @@ static int xe_vma_op_commit(struct xe_vm *vm, struct xe_vma_op *op) static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q, struct drm_gpuva_ops *ops, struct xe_sync_entry *syncs, u32 num_syncs, - struct list_head *ops_list, bool last) + struct xe_vma_ops *vops, bool last) { struct xe_device *xe = vm->xe; struct xe_vma_op *last_op = NULL; struct drm_gpuva_op *__op; + struct xe_tile *tile; + u8 id, tile_mask = 0; int err = 0; lockdep_assert_held_write(&vm->lock); + for_each_tile(tile, vm->xe, id) + tile_mask |= 0x1 << id; + drm_gpuva_for_each_op(__op, ops) { struct xe_vma_op *op = gpuva_op_to_vma_op(__op); struct xe_vma *vma; - bool first = list_empty(ops_list); + bool first = list_empty(&vops->list); unsigned int flags = 0; INIT_LIST_HEAD(&op->link); - list_add_tail(&op->link, ops_list); + list_add_tail(&op->link, &vops->list); if (first) { op->flags |= XE_VMA_OP_FIRST; @@ -2292,6 +2316,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q, } op->q = q; + op->tile_mask = tile_mask; switch (op->base.op) { case DRM_GPUVA_OP_MAP: @@ -2409,12 +2434,11 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q, } /* FIXME: Unhandled corner case */ - XE_WARN_ON(!last_op && last && !list_empty(ops_list)); + XE_WARN_ON(!last_op && last && !list_empty(&vops->list)); if (!last_op) return 0; - last_op->ops = ops; if (last) { last_op->flags |= XE_VMA_OP_LAST; last_op->num_syncs = num_syncs; @@ -2424,27 +2448,24 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q, return 0; } -static int op_execute(struct drm_exec *exec, struct xe_vm *vm, - struct xe_vma *vma, struct xe_vma_op *op) +static struct dma_fence *op_execute(struct xe_vm *vm, struct xe_vma *vma, + struct xe_vma_op *op) { - int err; - - lockdep_assert_held_write(&vm->lock); + struct dma_fence *fence = NULL; - err = xe_vm_lock_vma(exec, vma); - if (err) - return err; + lockdep_assert_held(&vm->lock); xe_vm_assert_held(vm); xe_bo_assert_held(xe_vma_bo(vma)); switch (op->base.op) { case DRM_GPUVA_OP_MAP: - err = xe_vm_bind(vm, vma, op->q, xe_vma_bo(vma), - op->syncs, op->num_syncs, - op->map.immediate || !xe_vm_in_fault_mode(vm), - op->flags & XE_VMA_OP_FIRST, - op->flags & XE_VMA_OP_LAST); + fence = xe_vm_bind(vm, vma, op->q, xe_vma_bo(vma), + op->syncs, op->num_syncs, + op->tile_mask, + op->map.immediate || !xe_vm_in_fault_mode(vm), + op->flags & XE_VMA_OP_FIRST, + op->flags & XE_VMA_OP_LAST); break; case DRM_GPUVA_OP_REMAP: { @@ -2454,37 +2475,41 @@ static int op_execute(struct drm_exec *exec, struct xe_vm *vm, if (!op->remap.unmap_done) { if (prev || next) vma->gpuva.flags |= XE_VMA_FIRST_REBIND; - err = xe_vm_unbind(vm, vma, op->q, op->syncs, - op->num_syncs, - op->flags & XE_VMA_OP_FIRST, - op->flags & XE_VMA_OP_LAST && - !prev && !next); - if (err) + fence = xe_vm_unbind(vm, vma, op->q, op->syncs, + op->num_syncs, + op->flags & XE_VMA_OP_FIRST, + op->flags & XE_VMA_OP_LAST && + !prev && !next); + if (IS_ERR(fence)) 
break; op->remap.unmap_done = true; } if (prev) { op->remap.prev->gpuva.flags |= XE_VMA_LAST_REBIND; - err = xe_vm_bind(vm, op->remap.prev, op->q, - xe_vma_bo(op->remap.prev), op->syncs, - op->num_syncs, true, false, - op->flags & XE_VMA_OP_LAST && !next); + dma_fence_put(fence); + fence = xe_vm_bind(vm, op->remap.prev, op->q, + xe_vma_bo(op->remap.prev), op->syncs, + op->num_syncs, + op->remap.prev->tile_mask, true, + false, + op->flags & XE_VMA_OP_LAST && !next); op->remap.prev->gpuva.flags &= ~XE_VMA_LAST_REBIND; - if (err) + if (IS_ERR(fence)) break; op->remap.prev = NULL; } if (next) { op->remap.next->gpuva.flags |= XE_VMA_LAST_REBIND; - err = xe_vm_bind(vm, op->remap.next, op->q, - xe_vma_bo(op->remap.next), - op->syncs, op->num_syncs, - true, false, - op->flags & XE_VMA_OP_LAST); + dma_fence_put(fence); + fence = xe_vm_bind(vm, op->remap.next, op->q, + xe_vma_bo(op->remap.next), + op->syncs, op->num_syncs, + op->remap.next->tile_mask, true, + false, op->flags & XE_VMA_OP_LAST); op->remap.next->gpuva.flags &= ~XE_VMA_LAST_REBIND; - if (err) + if (IS_ERR(fence)) break; op->remap.next = NULL; } @@ -2492,43 +2517,35 @@ static int op_execute(struct drm_exec *exec, struct xe_vm *vm, break; } case DRM_GPUVA_OP_UNMAP: - err = xe_vm_unbind(vm, vma, op->q, op->syncs, - op->num_syncs, op->flags & XE_VMA_OP_FIRST, - op->flags & XE_VMA_OP_LAST); + fence = xe_vm_unbind(vm, vma, op->q, op->syncs, + op->num_syncs, op->flags & XE_VMA_OP_FIRST, + op->flags & XE_VMA_OP_LAST); break; case DRM_GPUVA_OP_PREFETCH: - err = xe_vm_prefetch(vm, vma, op->q, op->prefetch.region, - op->syncs, op->num_syncs, - op->flags & XE_VMA_OP_FIRST, - op->flags & XE_VMA_OP_LAST); + fence = xe_vm_prefetch(vm, vma, op->q, op->syncs, op->num_syncs, + op->flags & XE_VMA_OP_FIRST, + op->flags & XE_VMA_OP_LAST); break; default: drm_warn(&vm->xe->drm, "NOT POSSIBLE"); } - if (err) + if (IS_ERR(fence)) trace_xe_vma_fail(vma); - return err; + return fence; } -static int __xe_vma_op_execute(struct xe_vm *vm, struct xe_vma *vma, - struct xe_vma_op *op) +static struct dma_fence * +__xe_vma_op_execute(struct xe_vm *vm, struct xe_vma *vma, + struct xe_vma_op *op) { - struct drm_exec exec; + struct dma_fence *fence; int err; retry_userptr: - drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0); - drm_exec_until_all_locked(&exec) { - err = op_execute(&exec, vm, vma, op); - drm_exec_retry_on_contention(&exec); - if (err) - break; - } - drm_exec_fini(&exec); - - if (err == -EAGAIN) { + fence = op_execute(vm, vma, op); + if (IS_ERR(fence) && PTR_ERR(fence) == -EAGAIN) { lockdep_assert_held_write(&vm->lock); if (op->base.op == DRM_GPUVA_OP_REMAP) { @@ -2545,22 +2562,24 @@ retry_userptr: if (!err) goto retry_userptr; + fence = ERR_PTR(err); trace_xe_vma_fail(vma); } } - return err; + return fence; } -static int xe_vma_op_execute(struct xe_vm *vm, struct xe_vma_op *op) +static struct dma_fence * +xe_vma_op_execute(struct xe_vm *vm, struct xe_vma_op *op) { - int ret = 0; + struct dma_fence *fence = ERR_PTR(-ENOMEM); - lockdep_assert_held_write(&vm->lock); + lockdep_assert_held(&vm->lock); switch (op->base.op) { case DRM_GPUVA_OP_MAP: - ret = __xe_vma_op_execute(vm, op->map.vma, op); + fence = __xe_vma_op_execute(vm, op->map.vma, op); break; case DRM_GPUVA_OP_REMAP: { @@ -2573,42 +2592,23 @@ static int xe_vma_op_execute(struct xe_vm *vm, struct xe_vma_op *op) else vma = op->remap.next; - ret = __xe_vma_op_execute(vm, vma, op); + fence = __xe_vma_op_execute(vm, vma, op); break; } case DRM_GPUVA_OP_UNMAP: - ret = __xe_vma_op_execute(vm, 
gpuva_to_vma(op->base.unmap.va), - op); + fence = __xe_vma_op_execute(vm, gpuva_to_vma(op->base.unmap.va), + op); break; case DRM_GPUVA_OP_PREFETCH: - ret = __xe_vma_op_execute(vm, - gpuva_to_vma(op->base.prefetch.va), - op); + fence = __xe_vma_op_execute(vm, + gpuva_to_vma(op->base.prefetch.va), + op); break; default: drm_warn(&vm->xe->drm, "NOT POSSIBLE"); } - return ret; -} - -static void xe_vma_op_cleanup(struct xe_vm *vm, struct xe_vma_op *op) -{ - bool last = op->flags & XE_VMA_OP_LAST; - - if (last) { - while (op->num_syncs--) - xe_sync_entry_cleanup(&op->syncs[op->num_syncs]); - kfree(op->syncs); - if (op->q) - xe_exec_queue_put(op->q); - } - if (!list_empty(&op->link)) - list_del(&op->link); - if (op->ops) - drm_gpuva_ops_free(&vm->gpuvm, op->ops); - if (last) - xe_vm_put(vm); + return fence; } static void xe_vma_op_unwind(struct xe_vm *vm, struct xe_vma_op *op, @@ -2687,34 +2687,223 @@ static void vm_bind_ioctl_ops_unwind(struct xe_vm *vm, op->flags & XE_VMA_OP_PREV_COMMITTED, op->flags & XE_VMA_OP_NEXT_COMMITTED); } + } +} - drm_gpuva_ops_free(&vm->gpuvm, __ops); +static int vma_lock_and_validate(struct drm_exec *exec, struct xe_vma *vma, + bool validate) +{ + struct xe_bo *bo = xe_vma_bo(vma); + int err = 0; + + if (bo) { + if (!bo->vm) + err = drm_exec_lock_obj(exec, &bo->ttm.base); + if (!err && validate) + err = xe_bo_validate(bo, xe_vma_vm(vma), true); } + + return err; } -static int vm_bind_ioctl_ops_execute(struct xe_vm *vm, - struct list_head *ops_list) +static int check_ufence(struct xe_vma *vma) +{ + if (vma->ufence) { + struct xe_user_fence * const f = vma->ufence; + + if (!xe_sync_ufence_get_status(f)) + return -EBUSY; + + vma->ufence = NULL; + xe_sync_ufence_put(f); + } + + return 0; +} + +static int op_lock_and_prep(struct drm_exec *exec, struct xe_vm *vm, + struct xe_vma_op *op) +{ + int err = 0; + + switch (op->base.op) { + case DRM_GPUVA_OP_MAP: + err = vma_lock_and_validate(exec, op->map.vma, + !xe_vm_in_fault_mode(vm) || + op->map.immediate); + break; + case DRM_GPUVA_OP_REMAP: + err = check_ufence(gpuva_to_vma(op->base.remap.unmap->va)); + if (err) + break; + + err = vma_lock_and_validate(exec, + gpuva_to_vma(op->base.remap.unmap->va), + false); + if (!err && op->remap.prev) + err = vma_lock_and_validate(exec, op->remap.prev, true); + if (!err && op->remap.next) + err = vma_lock_and_validate(exec, op->remap.next, true); + break; + case DRM_GPUVA_OP_UNMAP: + err = check_ufence(gpuva_to_vma(op->base.unmap.va)); + if (err) + break; + + err = vma_lock_and_validate(exec, + gpuva_to_vma(op->base.unmap.va), + false); + break; + case DRM_GPUVA_OP_PREFETCH: + { + struct xe_vma *vma = gpuva_to_vma(op->base.prefetch.va); + u32 region = op->prefetch.region; + + xe_assert(vm->xe, region <= ARRAY_SIZE(region_to_mem_type)); + + err = vma_lock_and_validate(exec, + gpuva_to_vma(op->base.prefetch.va), + false); + if (!err && !xe_vma_has_no_bo(vma)) + err = xe_bo_migrate(xe_vma_bo(vma), + region_to_mem_type[region]); + break; + } + default: + drm_warn(&vm->xe->drm, "NOT POSSIBLE"); + } + + return err; +} + +static int vm_bind_ioctl_ops_lock_and_prep(struct drm_exec *exec, + struct xe_vm *vm, + struct xe_vma_ops *vops) +{ + struct xe_vma_op *op; + int err; + + err = drm_exec_lock_obj(exec, xe_vm_obj(vm)); + if (err) + return err; + + list_for_each_entry(op, &vops->list, link) { + err = op_lock_and_prep(exec, vm, op); + if (err) + return err; + } + + return 0; +} + +static struct dma_fence *ops_execute(struct xe_vm *vm, + struct xe_vma_ops *vops) { struct xe_vma_op *op, *next; + 
struct dma_fence *fence = NULL; + + list_for_each_entry_safe(op, next, &vops->list, link) { + dma_fence_put(fence); + fence = xe_vma_op_execute(vm, op); + if (IS_ERR(fence)) { + drm_warn(&vm->xe->drm, "VM op(%d) failed with %ld", + op->base.op, PTR_ERR(fence)); + fence = ERR_PTR(-ENOSPC); + break; + } + } + + return fence; +} + +static void vma_add_ufence(struct xe_vma *vma, struct xe_user_fence *ufence) +{ + if (vma->ufence) + xe_sync_ufence_put(vma->ufence); + vma->ufence = __xe_sync_ufence_get(ufence); +} + +static void op_add_ufence(struct xe_vm *vm, struct xe_vma_op *op, + struct xe_user_fence *ufence) +{ + switch (op->base.op) { + case DRM_GPUVA_OP_MAP: + vma_add_ufence(op->map.vma, ufence); + break; + case DRM_GPUVA_OP_REMAP: + if (op->remap.prev) + vma_add_ufence(op->remap.prev, ufence); + if (op->remap.next) + vma_add_ufence(op->remap.next, ufence); + break; + case DRM_GPUVA_OP_UNMAP: + break; + case DRM_GPUVA_OP_PREFETCH: + vma_add_ufence(gpuva_to_vma(op->base.prefetch.va), ufence); + break; + default: + drm_warn(&vm->xe->drm, "NOT POSSIBLE"); + } +} + +static void vm_bind_ioctl_ops_fini(struct xe_vm *vm, struct xe_vma_ops *vops, + struct dma_fence *fence) +{ + struct xe_exec_queue *wait_exec_queue = to_wait_exec_queue(vm, vops->q); + struct xe_user_fence *ufence; + struct xe_vma_op *op; + int i; + + ufence = find_ufence_get(vops->syncs, vops->num_syncs); + list_for_each_entry(op, &vops->list, link) { + if (ufence) + op_add_ufence(vm, op, ufence); + + if (op->base.op == DRM_GPUVA_OP_UNMAP) + xe_vma_destroy(gpuva_to_vma(op->base.unmap.va), fence); + else if (op->base.op == DRM_GPUVA_OP_REMAP) + xe_vma_destroy(gpuva_to_vma(op->base.remap.unmap->va), + fence); + } + if (ufence) + xe_sync_ufence_put(ufence); + for (i = 0; i < vops->num_syncs; i++) + xe_sync_entry_signal(vops->syncs + i, fence); + xe_exec_queue_last_fence_set(wait_exec_queue, vm, fence); + dma_fence_put(fence); +} + +static int vm_bind_ioctl_ops_execute(struct xe_vm *vm, + struct xe_vma_ops *vops) +{ + struct drm_exec exec; + struct dma_fence *fence; int err; lockdep_assert_held_write(&vm->lock); - list_for_each_entry_safe(op, next, ops_list, link) { - err = xe_vma_op_execute(vm, op); - if (err) { - drm_warn(&vm->xe->drm, "VM op(%d) failed with %d", - op->base.op, err); - /* - * FIXME: Killing VM rather than proper error handling - */ - xe_vm_kill(vm); - return -ENOSPC; + drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT | + DRM_EXEC_IGNORE_DUPLICATES, 0); + drm_exec_until_all_locked(&exec) { + err = vm_bind_ioctl_ops_lock_and_prep(&exec, vm, vops); + drm_exec_retry_on_contention(&exec); + if (err) + goto unlock; + + fence = ops_execute(vm, vops); + if (IS_ERR(fence)) { + err = PTR_ERR(fence); + /* FIXME: Killing VM rather than proper error handling */ + xe_vm_kill(vm, false); + goto unlock; + } else { + vm_bind_ioctl_ops_fini(vm, vops, fence); } - xe_vma_op_cleanup(vm, op); } - return 0; +unlock: + drm_exec_fini(&exec); + return err; } #define SUPPORTED_FLAGS \ @@ -2862,6 +3051,58 @@ static int vm_bind_ioctl_signal_fences(struct xe_vm *vm, return err; } +static void xe_vma_ops_init(struct xe_vma_ops *vops, struct xe_vm *vm, + struct xe_exec_queue *q, + struct xe_sync_entry *syncs, u32 num_syncs) +{ + memset(vops, 0, sizeof(*vops)); + INIT_LIST_HEAD(&vops->list); + vops->vm = vm; + vops->q = q; + vops->syncs = syncs; + vops->num_syncs = num_syncs; +} + +static int xe_vm_bind_ioctl_validate_bo(struct xe_device *xe, struct xe_bo *bo, + u64 addr, u64 range, u64 obj_offset, + u16 pat_index) +{ + u16 coh_mode; + + if 
(XE_IOCTL_DBG(xe, range > bo->size) || + XE_IOCTL_DBG(xe, obj_offset > + bo->size - range)) { + return -EINVAL; + } + + if (bo->flags & XE_BO_FLAG_INTERNAL_64K) { + if (XE_IOCTL_DBG(xe, obj_offset & + XE_64K_PAGE_MASK) || + XE_IOCTL_DBG(xe, addr & XE_64K_PAGE_MASK) || + XE_IOCTL_DBG(xe, range & XE_64K_PAGE_MASK)) { + return -EINVAL; + } + } + + coh_mode = xe_pat_index_get_coh_mode(xe, pat_index); + if (bo->cpu_caching) { + if (XE_IOCTL_DBG(xe, coh_mode == XE_COH_NONE && + bo->cpu_caching == DRM_XE_GEM_CPU_CACHING_WB)) { + return -EINVAL; + } + } else if (XE_IOCTL_DBG(xe, coh_mode == XE_COH_NONE)) { + /* + * Imported dma-buf from a different device should + * require 1way or 2way coherency since we don't know + * how it was mapped on the CPU. Just assume is it + * potentially cached on CPU side. + */ + return -EINVAL; + } + + return 0; +} + int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file) { struct xe_device *xe = to_xe_device(dev); @@ -2875,7 +3116,7 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file) u32 num_syncs, num_ufence = 0; struct xe_sync_entry *syncs = NULL; struct drm_xe_vm_bind_op *bind_ops; - LIST_HEAD(ops_list); + struct xe_vma_ops vops; int err; int i; @@ -2945,7 +3186,6 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file) u32 obj = bind_ops[i].obj; u64 obj_offset = bind_ops[i].obj_offset; u16 pat_index = bind_ops[i].pat_index; - u16 coh_mode; if (!obj) continue; @@ -2957,40 +3197,10 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file) } bos[i] = gem_to_xe_bo(gem_obj); - if (XE_IOCTL_DBG(xe, range > bos[i]->size) || - XE_IOCTL_DBG(xe, obj_offset > - bos[i]->size - range)) { - err = -EINVAL; - goto put_obj; - } - - if (bos[i]->flags & XE_BO_FLAG_INTERNAL_64K) { - if (XE_IOCTL_DBG(xe, obj_offset & - XE_64K_PAGE_MASK) || - XE_IOCTL_DBG(xe, addr & XE_64K_PAGE_MASK) || - XE_IOCTL_DBG(xe, range & XE_64K_PAGE_MASK)) { - err = -EINVAL; - goto put_obj; - } - } - - coh_mode = xe_pat_index_get_coh_mode(xe, pat_index); - if (bos[i]->cpu_caching) { - if (XE_IOCTL_DBG(xe, coh_mode == XE_COH_NONE && - bos[i]->cpu_caching == DRM_XE_GEM_CPU_CACHING_WB)) { - err = -EINVAL; - goto put_obj; - } - } else if (XE_IOCTL_DBG(xe, coh_mode == XE_COH_NONE)) { - /* - * Imported dma-buf from a different device should - * require 1way or 2way coherency since we don't know - * how it was mapped on the CPU. Just assume is it - * potentially cached on CPU side. 
- */ - err = -EINVAL; + err = xe_vm_bind_ioctl_validate_bo(xe, bos[i], addr, range, + obj_offset, pat_index); + if (err) goto put_obj; - } } if (args->num_syncs) { @@ -3026,6 +3236,7 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file) goto free_syncs; } + xe_vma_ops_init(&vops, vm, q, syncs, num_syncs); for (i = 0; i < args->num_binds; ++i) { u64 range = bind_ops[i].range; u64 addr = bind_ops[i].addr; @@ -3045,42 +3256,25 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file) } err = vm_bind_ioctl_ops_parse(vm, q, ops[i], syncs, num_syncs, - &ops_list, - i == args->num_binds - 1); + &vops, i == args->num_binds - 1); if (err) goto unwind_ops; } /* Nothing to do */ - if (list_empty(&ops_list)) { + if (list_empty(&vops.list)) { err = -ENODATA; goto unwind_ops; } - xe_vm_get(vm); - if (q) - xe_exec_queue_get(q); - - err = vm_bind_ioctl_ops_execute(vm, &ops_list); - - up_write(&vm->lock); - - if (q) - xe_exec_queue_put(q); - xe_vm_put(vm); - - for (i = 0; bos && i < args->num_binds; ++i) - xe_bo_put(bos[i]); - - kvfree(bos); - kvfree(ops); - if (args->num_binds > 1) - kvfree(bind_ops); - - return err; + err = vm_bind_ioctl_ops_execute(vm, &vops); unwind_ops: - vm_bind_ioctl_ops_unwind(vm, ops, args->num_binds); + if (err && err != -ENODATA) + vm_bind_ioctl_ops_unwind(vm, ops, args->num_binds); + for (i = args->num_binds - 1; i >= 0; --i) + if (ops[i]) + drm_gpuva_ops_free(&vm->gpuvm, ops[i]); free_syncs: if (err == -ENODATA) err = vm_bind_ioctl_signal_fences(vm, q, syncs, num_syncs); @@ -3201,55 +3395,6 @@ int xe_vm_invalidate_vma(struct xe_vma *vma) return 0; } -int xe_analyze_vm(struct drm_printer *p, struct xe_vm *vm, int gt_id) -{ - struct drm_gpuva *gpuva; - bool is_vram; - uint64_t addr; - - if (!down_read_trylock(&vm->lock)) { - drm_printf(p, " Failed to acquire VM lock to dump capture"); - return 0; - } - if (vm->pt_root[gt_id]) { - addr = xe_bo_addr(vm->pt_root[gt_id]->bo, 0, XE_PAGE_SIZE); - is_vram = xe_bo_is_vram(vm->pt_root[gt_id]->bo); - drm_printf(p, " VM root: A:0x%llx %s\n", addr, - is_vram ? "VRAM" : "SYS"); - } - - drm_gpuvm_for_each_va(gpuva, &vm->gpuvm) { - struct xe_vma *vma = gpuva_to_vma(gpuva); - bool is_userptr = xe_vma_is_userptr(vma); - bool is_null = xe_vma_is_null(vma); - - if (is_null) { - addr = 0; - } else if (is_userptr) { - struct sg_table *sg = to_userptr_vma(vma)->userptr.sg; - struct xe_res_cursor cur; - - if (sg) { - xe_res_first_sg(sg, 0, XE_PAGE_SIZE, &cur); - addr = xe_res_dma(&cur); - } else { - addr = 0; - } - } else { - addr = __xe_bo_addr(xe_vma_bo(vma), 0, XE_PAGE_SIZE); - is_vram = xe_bo_is_vram(xe_vma_bo(vma)); - } - drm_printf(p, " [%016llx-%016llx] S:0x%016llx A:%016llx %s\n", - xe_vma_start(vma), xe_vma_end(vma) - 1, - xe_vma_size(vma), - addr, is_null ? "NULL" : is_userptr ? "USR" : - is_vram ? 
"VRAM" : "SYS"); - } - up_read(&vm->lock); - - return 0; -} - struct xe_vm_snapshot { unsigned long num_snaps; struct { @@ -3339,7 +3484,7 @@ void xe_vm_snapshot_capture_delayed(struct xe_vm_snapshot *snap) } if (bo) { - dma_resv_lock(bo->ttm.base.resv, NULL); + xe_bo_lock(bo, false); err = ttm_bo_vmap(&bo->ttm, &src); if (!err) { xe_map_memcpy_from(xe_bo_device(bo), @@ -3348,7 +3493,7 @@ void xe_vm_snapshot_capture_delayed(struct xe_vm_snapshot *snap) snap->snap[i].len); ttm_bo_vunmap(&bo->ttm, &src); } - dma_resv_unlock(bo->ttm.base.resv); + xe_bo_unlock(bo); } else { void __user *userptr = (void __user *)(size_t)snap->snap[i].bo_ofs; diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h index 306cd0934a19..b481608b12f1 100644 --- a/drivers/gpu/drm/xe/xe_vm.h +++ b/drivers/gpu/drm/xe/xe_vm.h @@ -6,6 +6,7 @@ #ifndef _XE_VM_H_ #define _XE_VM_H_ +#include "xe_assert.h" #include "xe_bo_types.h" #include "xe_macros.h" #include "xe_map.h" @@ -208,6 +209,8 @@ int __xe_vm_userptr_needs_repin(struct xe_vm *vm); int xe_vm_userptr_check_repin(struct xe_vm *vm); int xe_vm_rebind(struct xe_vm *vm, bool rebind_worker); +struct dma_fence *xe_vma_rebind(struct xe_vm *vm, struct xe_vma *vma, + u8 tile_mask); int xe_vm_invalidate_vma(struct xe_vma *vma); @@ -240,8 +243,6 @@ int xe_vma_userptr_check_repin(struct xe_userptr_vma *uvma); bool xe_vm_validate_should_retry(struct drm_exec *exec, int err, ktime_t *end); -int xe_analyze_vm(struct drm_printer *p, struct xe_vm *vm, int gt_id); - int xe_vm_lock_vma(struct drm_exec *exec, struct xe_vma *vma); int xe_vm_validate_rebind(struct xe_vm *vm, struct drm_exec *exec, diff --git a/drivers/gpu/drm/xe/xe_vm_doc.h b/drivers/gpu/drm/xe/xe_vm_doc.h index bdc6659891a5..4d33f310b653 100644 --- a/drivers/gpu/drm/xe/xe_vm_doc.h +++ b/drivers/gpu/drm/xe/xe_vm_doc.h @@ -25,7 +25,7 @@ * VM bind (create GPU mapping for a BO or userptr) * ================================================ * - * Creates GPU mapings for a BO or userptr within a VM. VM binds uses the same + * Creates GPU mappings for a BO or userptr within a VM. VM binds uses the same * in / out fence interface (struct drm_xe_sync) as execs which allows users to * think of binds and execs as more or less the same operation. * @@ -190,8 +190,8 @@ * Deferred binds in fault mode * ---------------------------- * - * In a VM is in fault mode (TODO: link to fault mode), new bind operations that - * create mappings are by default are deferred to the page fault handler (first + * If a VM is in fault mode (TODO: link to fault mode), new bind operations that + * create mappings are by default deferred to the page fault handler (first * use). This behavior can be overriden by setting the flag * DRM_XE_VM_BIND_FLAG_IMMEDIATE which indicates to creating the mapping * immediately. @@ -225,7 +225,7 @@ * * A VM in compute mode enables long running workloads and ultra low latency * submission (ULLS). ULLS is implemented via a continuously running batch + - * semaphores. This enables to the user to insert jump to new batch commands + * semaphores. This enables the user to insert jump to new batch commands * into the continuously running batch. In both cases these batches exceed the * time a dma fence is allowed to exist for before signaling, as such dma fences * are not used when a VM is in compute mode. User fences (TODO: link user fence @@ -244,7 +244,7 @@ * Once all preempt fences are signaled for a VM the kernel can safely move the * memory and kick the rebind worker which resumes all the engines execution. 
* - * A preempt fence, for every engine using the VM, is installed the VM's + * A preempt fence, for every engine using the VM, is installed into the VM's * dma-resv DMA_RESV_USAGE_PREEMPT_FENCE slot. The same preempt fence, for every * engine using the VM, is also installed into the same dma-resv slot of every * external BO mapped in the VM. @@ -314,7 +314,7 @@ * signaling, and memory allocation is usually required to resolve a page * fault, but memory allocation is not allowed to gate dma fence signaling. As * such, dma fences are not allowed when VM is in fault mode. Because dma-fences - * are not allowed, long running workloads and ULLS are enabled on a faulting + * are not allowed, only long running workloads and ULLS are enabled on a faulting * VM. * * Defered VM binds @@ -399,14 +399,14 @@ * Notice no rebind is issued in the access counter handler as the rebind will * be issued on next page fault. * - * Cavets with eviction / user pointer invalidation - * ------------------------------------------------ + * Caveats with eviction / user pointer invalidation + * ------------------------------------------------- * * In the case of eviction and user pointer invalidation on a faulting VM, there * is no need to issue a rebind rather we just need to blow away the page tables * for the VMAs and the page fault handler will rebind the VMAs when they fault. - * The cavet is to update / read the page table structure the VM global lock is - * neeeed. In both the case of eviction and user pointer invalidation locks are + * The caveat is to update / read the page table structure the VM global lock is + * needed. In both the case of eviction and user pointer invalidation locks are * held which make acquiring the VM global lock impossible. To work around this * every VMA maintains a list of leaf page table entries which should be written * to zero to blow away the VMA's page tables. After writing zero to these @@ -427,9 +427,9 @@ * VM global lock (vm->lock) - rw semaphore lock. Outer most lock which protects * the list of userptrs mapped in the VM, the list of engines using this VM, and * the array of external BOs mapped in the VM. When adding or removing any of the - * aforemented state from the VM should acquire this lock in write mode. The VM + * aforementioned state from the VM should acquire this lock in write mode. The VM * bind path also acquires this lock in write while the exec / compute mode - * rebind worker acquire this lock in read mode. + * rebind worker acquires this lock in read mode. * * VM dma-resv lock (vm->ttm.base.resv->lock) - WW lock. Protects VM dma-resv * slots which is shared with any private BO in the VM. Expected to be acquired diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h index 72a100671e5d..ce1a63a5e3e7 100644 --- a/drivers/gpu/drm/xe/xe_vm_types.h +++ b/drivers/gpu/drm/xe/xe_vm_types.h @@ -178,6 +178,13 @@ struct xe_vm { struct list_head rebind_list; /** + * @destroy_work: worker to destroy VM, needed as a dma_fence signaling + * from an irq context can be last put and the destroy needs to be able + * to sleep. + */ + struct work_struct destroy_work; + + /** * @rftree: range fence tree to track updates to page table structure. * Used to implement conflict tracking between independent bind engines. 
*/ @@ -323,11 +330,6 @@ enum xe_vma_op_flags { struct xe_vma_op { /** @base: GPUVA base operation */ struct drm_gpuva_op base; - /** - * @ops: GPUVA ops, when set call drm_gpuva_ops_free after this - * operations is processed - */ - struct drm_gpuva_ops *ops; /** @q: exec queue for this operation */ struct xe_exec_queue *q; /** @@ -341,6 +343,8 @@ struct xe_vma_op { struct list_head link; /** @flags: operation flags */ enum xe_vma_op_flags flags; + /** @tile_mask: Tile mask for operation */ + u8 tile_mask; union { /** @map: VMA map operation specific data */ @@ -351,4 +355,19 @@ struct xe_vma_op { struct xe_vma_op_prefetch prefetch; }; }; + +/** struct xe_vma_ops - VMA operations */ +struct xe_vma_ops { + /** @list: list of VMA operations */ + struct list_head list; + /** @vm: VM */ + struct xe_vm *vm; + /** @q: exec queue these operations */ + struct xe_exec_queue *q; + /** @syncs: syncs these operation */ + struct xe_sync_entry *syncs; + /** @num_syncs: number of syncs */ + u32 num_syncs; +}; + #endif diff --git a/drivers/gpu/drm/xe/xe_vram.c b/drivers/gpu/drm/xe/xe_vram.c new file mode 100644 index 000000000000..5bcd59190353 --- /dev/null +++ b/drivers/gpu/drm/xe/xe_vram.c @@ -0,0 +1,368 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2021-2024 Intel Corporation + */ + +#include <linux/pci.h> + +#include <drm/drm_managed.h> +#include <drm/drm_print.h> + +#include "regs/xe_bars.h" +#include "regs/xe_gt_regs.h" +#include "regs/xe_regs.h" +#include "xe_assert.h" +#include "xe_device.h" +#include "xe_force_wake.h" +#include "xe_gt_mcr.h" +#include "xe_gt_sriov_vf.h" +#include "xe_mmio.h" +#include "xe_module.h" +#include "xe_sriov.h" +#include "xe_vram.h" + +#define BAR_SIZE_SHIFT 20 + +static void +_resize_bar(struct xe_device *xe, int resno, resource_size_t size) +{ + struct pci_dev *pdev = to_pci_dev(xe->drm.dev); + int bar_size = pci_rebar_bytes_to_size(size); + int ret; + + if (pci_resource_len(pdev, resno)) + pci_release_resource(pdev, resno); + + ret = pci_resize_resource(pdev, resno, bar_size); + if (ret) { + drm_info(&xe->drm, "Failed to resize BAR%d to %dM (%pe). Consider enabling 'Resizable BAR' support in your BIOS\n", + resno, 1 << bar_size, ERR_PTR(ret)); + return; + } + + drm_info(&xe->drm, "BAR%d resized to %dM\n", resno, 1 << bar_size); +} + +/* + * if force_vram_bar_size is set, attempt to set to the requested size + * else set to maximum possible size + */ +static void resize_vram_bar(struct xe_device *xe) +{ + u64 force_vram_bar_size = xe_modparam.force_vram_bar_size; + struct pci_dev *pdev = to_pci_dev(xe->drm.dev); + struct pci_bus *root = pdev->bus; + resource_size_t current_size; + resource_size_t rebar_size; + struct resource *root_res; + u32 bar_size_mask; + u32 pci_cmd; + int i; + + /* gather some relevant info */ + current_size = pci_resource_len(pdev, LMEM_BAR); + bar_size_mask = pci_rebar_get_possible_sizes(pdev, LMEM_BAR); + + if (!bar_size_mask) + return; + + /* set to a specific size? */ + if (force_vram_bar_size) { + u32 bar_size_bit; + + rebar_size = force_vram_bar_size * (resource_size_t)SZ_1M; + + bar_size_bit = bar_size_mask & BIT(pci_rebar_bytes_to_size(rebar_size)); + + if (!bar_size_bit) { + drm_info(&xe->drm, + "Requested size: %lluMiB is not supported by rebar sizes: 0x%x. 
Leaving default: %lluMiB\n", + (u64)rebar_size >> 20, bar_size_mask, (u64)current_size >> 20); + return; + } + + rebar_size = 1ULL << (__fls(bar_size_bit) + BAR_SIZE_SHIFT); + + if (rebar_size == current_size) + return; + } else { + rebar_size = 1ULL << (__fls(bar_size_mask) + BAR_SIZE_SHIFT); + + /* only resize if larger than current */ + if (rebar_size <= current_size) + return; + } + + drm_info(&xe->drm, "Attempting to resize bar from %lluMiB -> %lluMiB\n", + (u64)current_size >> 20, (u64)rebar_size >> 20); + + while (root->parent) + root = root->parent; + + pci_bus_for_each_resource(root, root_res, i) { + if (root_res && root_res->flags & (IORESOURCE_MEM | IORESOURCE_MEM_64) && + (u64)root_res->start > 0x100000000ul) + break; + } + + if (!root_res) { + drm_info(&xe->drm, "Can't resize VRAM BAR - platform support is missing. Consider enabling 'Resizable BAR' support in your BIOS\n"); + return; + } + + pci_read_config_dword(pdev, PCI_COMMAND, &pci_cmd); + pci_write_config_dword(pdev, PCI_COMMAND, pci_cmd & ~PCI_COMMAND_MEMORY); + + _resize_bar(xe, LMEM_BAR, rebar_size); + + pci_assign_unassigned_bus_resources(pdev->bus); + pci_write_config_dword(pdev, PCI_COMMAND, pci_cmd); +} + +static bool resource_is_valid(struct pci_dev *pdev, int bar) +{ + if (!pci_resource_flags(pdev, bar)) + return false; + + if (pci_resource_flags(pdev, bar) & IORESOURCE_UNSET) + return false; + + if (!pci_resource_len(pdev, bar)) + return false; + + return true; +} + +static int determine_lmem_bar_size(struct xe_device *xe) +{ + struct pci_dev *pdev = to_pci_dev(xe->drm.dev); + + if (!resource_is_valid(pdev, LMEM_BAR)) { + drm_err(&xe->drm, "pci resource is not valid\n"); + return -ENXIO; + } + + resize_vram_bar(xe); + + xe->mem.vram.io_start = pci_resource_start(pdev, LMEM_BAR); + xe->mem.vram.io_size = pci_resource_len(pdev, LMEM_BAR); + if (!xe->mem.vram.io_size) + return -EIO; + + /* XXX: Need to change when xe link code is ready */ + xe->mem.vram.dpa_base = 0; + + /* set up a map to the total memory area. 
+	xe->mem.vram.mapping = ioremap_wc(xe->mem.vram.io_start, xe->mem.vram.io_size);
+
+	return 0;
+}
+
+static inline u64 get_flat_ccs_offset(struct xe_gt *gt, u64 tile_size)
+{
+	struct xe_device *xe = gt_to_xe(gt);
+	u64 offset;
+	u32 reg;
+
+	if (GRAPHICS_VER(xe) >= 20) {
+		u64 ccs_size = tile_size / 512;
+		u64 offset_hi, offset_lo;
+		u32 nodes, num_enabled;
+
+		reg = xe_mmio_read32(gt, MIRROR_FUSE3);
+		nodes = REG_FIELD_GET(XE2_NODE_ENABLE_MASK, reg);
+		num_enabled = hweight32(nodes); /* Number of enabled l3 nodes */
+
+		reg = xe_gt_mcr_unicast_read_any(gt, XE2_FLAT_CCS_BASE_RANGE_LOWER);
+		offset_lo = REG_FIELD_GET(XE2_FLAT_CCS_BASE_LOWER_ADDR_MASK, reg);
+
+		reg = xe_gt_mcr_unicast_read_any(gt, XE2_FLAT_CCS_BASE_RANGE_UPPER);
+		offset_hi = REG_FIELD_GET(XE2_FLAT_CCS_BASE_UPPER_ADDR_MASK, reg);
+
+		offset = offset_hi << 32; /* HW view bits 39:32 */
+		offset |= offset_lo << 6; /* HW view bits 31:6 */
+
+		offset *= num_enabled; /* convert to SW view */
+
+		/* We don't expect any holes */
+		xe_assert_msg(xe, offset == (xe_mmio_read64_2x32(gt, GSMBASE) - ccs_size),
+			      "Hole between CCS and GSM.\n");
+	} else {
+		reg = xe_gt_mcr_unicast_read_any(gt, XEHP_FLAT_CCS_BASE_ADDR);
+		offset = (u64)REG_FIELD_GET(XEHP_FLAT_CCS_PTR, reg) * SZ_64K;
+	}
+
+	return offset;
+}
+
+/*
+ * tile_vram_size() - Collect vram size and offset information
+ * @tile: tile to get info for
+ * @vram_size: available vram (size - device reserved portions)
+ * @tile_size: actual vram size
+ * @tile_offset: physical start point in the vram address space
+ *
+ * There are 4 places for size information:
+ * - io size (from pci_resource_len of LMEM bar) (only used for small bar and DG1)
+ * - TILEx size (actual vram size)
+ * - GSMBASE offset (TILEx - "stolen")
+ * - CSSBASE offset (TILEx - CSS space necessary)
+ *
+ * CSSBASE is always a lower/smaller offset than GSMBASE.
+ *
+ * The actual available size of memory is to the CCS or GSM base.
+ * NOTE: multi-tile bases will include the tile offset.
+ *
+ */
+static int tile_vram_size(struct xe_tile *tile, u64 *vram_size,
+			  u64 *tile_size, u64 *tile_offset)
+{
+	struct xe_device *xe = tile_to_xe(tile);
+	struct xe_gt *gt = tile->primary_gt;
+	u64 offset;
+	int err;
+	u32 reg;
+
+	if (IS_SRIOV_VF(xe)) {
+		struct xe_tile *t;
+		int id;
+
+		offset = 0;
+		for_each_tile(t, xe, id)
+			for_each_if(t->id < tile->id)
+				offset += xe_gt_sriov_vf_lmem(t->primary_gt);
+
+		*tile_size = xe_gt_sriov_vf_lmem(gt);
+		*vram_size = *tile_size;
+		*tile_offset = offset;
+
+		return 0;
+	}
+
+	err = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
+	if (err)
+		return err;
+
+	/* actual size */
+	if (unlikely(xe->info.platform == XE_DG1)) {
+		*tile_size = pci_resource_len(to_pci_dev(xe->drm.dev), LMEM_BAR);
+		*tile_offset = 0;
+	} else {
+		reg = xe_gt_mcr_unicast_read_any(gt, XEHP_TILE_ADDR_RANGE(gt->info.id));
+		*tile_size = (u64)REG_FIELD_GET(GENMASK(14, 8), reg) * SZ_1G;
+		*tile_offset = (u64)REG_FIELD_GET(GENMASK(7, 1), reg) * SZ_1G;
+	}
+
+	/* minus device usage */
+	if (xe->info.has_flat_ccs) {
+		offset = get_flat_ccs_offset(gt, *tile_size);
+	} else {
+		offset = xe_mmio_read64_2x32(gt, GSMBASE);
+	}
+
+	/* remove the tile offset so we have just the available size */
+	*vram_size = offset - *tile_offset;
+
+	return xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
+}
+
+static void vram_fini(void *arg)
+{
+	struct xe_device *xe = arg;
+	struct xe_tile *tile;
+	int id;
+
+	if (xe->mem.vram.mapping)
+		iounmap(xe->mem.vram.mapping);
+
+	xe->mem.vram.mapping = NULL;
+
+	for_each_tile(tile, xe, id)
+		tile->mem.vram.mapping = NULL;
+}
+
+/**
+ * xe_vram_probe() - Probe VRAM configuration
+ * @xe: the &xe_device
+ *
+ * Collect VRAM size and offset information for all tiles.
+ *
+ * Return: 0 on success, error code on failure
+ */
+int xe_vram_probe(struct xe_device *xe)
+{
+	struct xe_tile *tile;
+	resource_size_t io_size;
+	u64 available_size = 0;
+	u64 total_size = 0;
+	u64 tile_offset;
+	u64 tile_size;
+	u64 vram_size;
+	int err;
+	u8 id;
+
+	if (!IS_DGFX(xe))
+		return 0;
+
+	/* Get the size of the root tile's vram for later accessibility comparison */
+	tile = xe_device_get_root_tile(xe);
+	err = tile_vram_size(tile, &vram_size, &tile_size, &tile_offset);
+	if (err)
+		return err;
+
+	err = determine_lmem_bar_size(xe);
+	if (err)
+		return err;
+
+	drm_info(&xe->drm, "VISIBLE VRAM: %pa, %pa\n", &xe->mem.vram.io_start,
+		 &xe->mem.vram.io_size);
+
+	io_size = xe->mem.vram.io_size;
+
+	/* tile specific ranges */
+	for_each_tile(tile, xe, id) {
+		err = tile_vram_size(tile, &vram_size, &tile_size, &tile_offset);
+		if (err)
+			return err;
+
+		tile->mem.vram.actual_physical_size = tile_size;
+		tile->mem.vram.io_start = xe->mem.vram.io_start + tile_offset;
+		tile->mem.vram.io_size = min_t(u64, vram_size, io_size);
+
+		if (!tile->mem.vram.io_size) {
+			drm_err(&xe->drm, "Tile without any CPU visible VRAM. Aborting.\n");
Aborting.\n"); + return -ENODEV; + } + + tile->mem.vram.dpa_base = xe->mem.vram.dpa_base + tile_offset; + tile->mem.vram.usable_size = vram_size; + tile->mem.vram.mapping = xe->mem.vram.mapping + tile_offset; + + if (tile->mem.vram.io_size < tile->mem.vram.usable_size) + drm_info(&xe->drm, "Small BAR device\n"); + drm_info(&xe->drm, "VRAM[%u, %u]: Actual physical size %pa, usable size exclude stolen %pa, CPU accessible size %pa\n", id, + tile->id, &tile->mem.vram.actual_physical_size, &tile->mem.vram.usable_size, &tile->mem.vram.io_size); + drm_info(&xe->drm, "VRAM[%u, %u]: DPA range: [%pa-%llx], io range: [%pa-%llx]\n", id, tile->id, + &tile->mem.vram.dpa_base, tile->mem.vram.dpa_base + (u64)tile->mem.vram.actual_physical_size, + &tile->mem.vram.io_start, tile->mem.vram.io_start + (u64)tile->mem.vram.io_size); + + /* calculate total size using tile size to get the correct HW sizing */ + total_size += tile_size; + available_size += vram_size; + + if (total_size > xe->mem.vram.io_size) { + drm_info(&xe->drm, "VRAM: %pa is larger than resource %pa\n", + &total_size, &xe->mem.vram.io_size); + } + + io_size -= min_t(u64, tile_size, io_size); + } + + xe->mem.vram.actual_physical_size = total_size; + + drm_info(&xe->drm, "Total VRAM: %pa, %pa\n", &xe->mem.vram.io_start, + &xe->mem.vram.actual_physical_size); + drm_info(&xe->drm, "Available VRAM: %pa, %pa\n", &xe->mem.vram.io_start, + &available_size); + + return devm_add_action_or_reset(xe->drm.dev, vram_fini, xe); +} diff --git a/drivers/gpu/drm/xe/xe_vram.h b/drivers/gpu/drm/xe/xe_vram.h new file mode 100644 index 000000000000..e31cc04ec0db --- /dev/null +++ b/drivers/gpu/drm/xe/xe_vram.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2024 Intel Corporation + */ + +#ifndef _XE_VRAM_H_ +#define _XE_VRAM_H_ + +struct xe_device; + +int xe_vram_probe(struct xe_device *xe); + +#endif diff --git a/drivers/gpu/drm/xe/xe_vram_freq.c b/drivers/gpu/drm/xe/xe_vram_freq.c index 3e21ddc6e60c..99ff95e408e0 100644 --- a/drivers/gpu/drm/xe/xe_vram_freq.c +++ b/drivers/gpu/drm/xe/xe_vram_freq.c @@ -87,7 +87,7 @@ static const struct attribute_group freq_group_attrs = { .attrs = freq_attrs, }; -static void vram_freq_sysfs_fini(struct drm_device *drm, void *arg) +static void vram_freq_sysfs_fini(void *arg) { struct kobject *kobj = arg; @@ -122,5 +122,5 @@ int xe_vram_freq_sysfs_init(struct xe_tile *tile) return err; } - return drmm_add_action_or_reset(&xe->drm, vram_freq_sysfs_fini, kobj); + return devm_add_action_or_reset(xe->drm.dev, vram_freq_sysfs_fini, kobj); } diff --git a/drivers/gpu/drm/xe/xe_wa.c b/drivers/gpu/drm/xe/xe_wa.c index dd214d95e4b6..26b170a0cdc7 100644 --- a/drivers/gpu/drm/xe/xe_wa.c +++ b/drivers/gpu/drm/xe/xe_wa.c @@ -238,11 +238,11 @@ static const struct xe_rtp_entry_sr gt_was[] = { }, { XE_RTP_NAME("14020316580"), XE_RTP_RULES(MEDIA_VERSION(1301)), - XE_RTP_ACTIONS(CLR(PG_ENABLE, - VD0_HCP_POWERGATE_ENABLE | - VD0_MFXVDENC_POWERGATE_ENABLE | - VD2_HCP_POWERGATE_ENABLE | - VD2_MFXVDENC_POWERGATE_ENABLE)), + XE_RTP_ACTIONS(CLR(POWERGATE_ENABLE, + VDN_HCP_POWERGATE_ENABLE(0) | + VDN_MFXVDENC_POWERGATE_ENABLE(0) | + VDN_HCP_POWERGATE_ENABLE(2) | + VDN_MFXVDENC_POWERGATE_ENABLE(2))), }, { XE_RTP_NAME("14019449301"), XE_RTP_RULES(MEDIA_VERSION(1301), ENGINE_CLASS(VIDEO_DECODE)), @@ -533,6 +533,10 @@ static const struct xe_rtp_entry_sr engine_was[] = { FUNC(xe_rtp_match_first_render_or_compute)), XE_RTP_ACTIONS(SET(LSC_CHICKEN_BIT_0, WR_REQ_CHAINING_DIS)) }, + { XE_RTP_NAME("14021402888"), + 
XE_RTP_RULES(GRAPHICS_VERSION(2001), ENGINE_CLASS(RENDER)), + XE_RTP_ACTIONS(SET(HALF_SLICE_CHICKEN7, CLEAR_OPTIMIZATION_DISABLE)) + }, /* Xe2_HPM */ @@ -668,6 +672,15 @@ static const struct xe_rtp_entry_sr lrc_was[] = { XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 2004), ENGINE_CLASS(RENDER)), XE_RTP_ACTIONS(SET(COMMON_SLICE_CHICKEN1, DISABLE_BOTTOM_CLIP_RECTANGLE_TEST)) }, + { XE_RTP_NAME("14021567978"), + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, XE_RTP_END_VERSION_UNDEFINED), + ENGINE_CLASS(RENDER)), + XE_RTP_ACTIONS(SET(CHICKEN_RASTER_2, TBIMR_FAST_CLIP)) + }, + { XE_RTP_NAME("14020756599"), + XE_RTP_RULES(GRAPHICS_VERSION(2004), ENGINE_CLASS(RENDER)), + XE_RTP_ACTIONS(SET(WM_CHICKEN3, HIZ_PLANE_COMPRESSION_DIS)) + }, /* Xe2_HPG */ { XE_RTP_NAME("15010599737"), @@ -682,6 +695,22 @@ static const struct xe_rtp_entry_sr lrc_was[] = { XE_RTP_RULES(GRAPHICS_VERSION(2001), ENGINE_CLASS(RENDER)), XE_RTP_ACTIONS(SET(WM_CHICKEN3, HIZ_PLANE_COMPRESSION_DIS)) }, + { XE_RTP_NAME("14021490052"), + XE_RTP_RULES(GRAPHICS_VERSION(2001), ENGINE_CLASS(RENDER)), + XE_RTP_ACTIONS(SET(FF_MODE, + DIS_MESH_PARTIAL_AUTOSTRIP | + DIS_MESH_AUTOSTRIP), + SET(VFLSKPD, + DIS_PARTIAL_AUTOSTRIP | + DIS_AUTOSTRIP)) + }, + + /* Xe2_LPM */ + + { XE_RTP_NAME("14020756599"), + XE_RTP_RULES(ENGINE_CLASS(RENDER), FUNC(xe_rtp_match_when_media2000)), + XE_RTP_ACTIONS(SET(WM_CHICKEN3, HIZ_PLANE_COMPRESSION_DIS)) + }, {} }; |
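
Editorial aside (not part of the patch above): the two pieces of arithmetic in the new xe_vram.c are easy to sanity-check in isolation. The resizable-BAR capability mask returned by pci_rebar_get_possible_sizes() has bit n set when a size of 2^(n+20) bytes is supported, which is why the patch computes rebar_size as 1ULL << (__fls(mask) + BAR_SIZE_SHIFT); and on Xe2 the flat-CCS base is reassembled from an UPPER register field (HW address bits 39:32) and a LOWER field (bits 31:6), then scaled by the number of enabled L3 nodes. The standalone C sketch below mirrors only that arithmetic; the helper names, the example mask and the example register values are invented for illustration and are not kernel code.

/* Standalone sketch of the arithmetic above; not kernel code. */
#include <stdint.h>
#include <stdio.h>

#define BAR_SIZE_SHIFT 20	/* rebar mask bit n => 2^(n + 20) bytes */

/* Largest BAR size advertised by a resizable-BAR capability mask, in bytes. */
static uint64_t largest_rebar_bytes(uint32_t bar_size_mask)
{
	int msb = 31;

	while (msb >= 0 && !(bar_size_mask & (1u << msb)))
		msb--;			/* open-coded __fls() */

	return msb < 0 ? 0 : 1ULL << (msb + BAR_SIZE_SHIFT);
}

/*
 * Xe2 flat-CCS base: the UPPER register field carries HW address bits 39:32,
 * the LOWER field carries bits 31:6; the combined offset is then scaled by
 * the number of enabled L3 nodes to obtain the SW view.
 */
static uint64_t flat_ccs_offset(uint64_t offset_hi, uint64_t offset_lo,
				uint32_t num_enabled_l3_nodes)
{
	uint64_t offset = (offset_hi << 32) | (offset_lo << 6);

	return offset * num_enabled_l3_nodes;
}

int main(void)
{
	/* A mask with bits 5..12 set advertises 32 MiB..4 GiB; the largest wins. */
	printf("largest BAR: %llu MiB\n",
	       (unsigned long long)(largest_rebar_bytes(0x1fe0) >> 20));

	/* Made-up register field values, 4 enabled L3 nodes. */
	printf("flat CCS offset: %#llx\n",
	       (unsigned long long)flat_ccs_offset(0x3, 0x100000, 4));
	return 0;
}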