ROCm (Radeon Open Compute) 是由 AMD 开发的开源软件平台,专为加速计算而设计,开源用于深度学习、机器学习和图形处理等应用程序。定位和 NVIDIA 的 CUDA 相同

安装 ROCm

添加 rocm 源

  • 添加 gpg 密钥
sudo mkdir --parents --mode=0755 /etc/apt/keyrings
wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | \
    gpg --dearmor | sudo tee /etc/apt/keyrings/rocm.gpg > /dev/nul
  • 添加源

分别添加 AMDGPU 驱动程序的软件源和 ROCm 软件源,并设置软件源的优先级,确保在有多个版本可用时,系统会优先安装来自 AMD 官方仓库的软件包

echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/6.4/ubuntu jammy main" \
    | sudo tee /etc/apt/sources.list.d/amdgpu.list
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.4 noble main" \
    | sudo tee --append /etc/apt/sources.list.d/rocm.list
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
    | sudo tee /etc/apt/preferences.d/rocm-pin-600
  • 更新软件包
sudo apt update

安装 ROCm

  • 安装 ROCm

使用 apt 安装 ROCm 时会自动安装相关的依赖项,包括驱动、内核模块和相关工具

sudo apt install rocm

检查

  • rocminfo 查看显卡信息

rocminfo 是一个用于查询 ROCm 设备信息的工具,它可以列出系统中可用的 ROCm 平台和设备,以及每个设备的属性和特性。

rocminfo
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version:         1.15
Runtime Ext Version:     1.7
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE
System Endianness:       LITTLE
Mwaitx:                  DISABLED
XNACK enabled:           NO
DMAbuf Support:          YES
VMM Support:             YES
 # ...
  • clinfo

clinfo 是一个用于查询 OpenCL 设备信息的工具。它可以列出系统中可用的 OpenCL 平台和设备,以及每个设备的属性和特性。 使用 clinfo 可以查看设备的名称、供应商、版本、计算单元数量、最大工作项数量、最大工作组大小、最大内存分配等信息。这对于选择适合的 OpenCL 设备和优化程序性能非常有用

clinfo
Number of platforms:				 1
  Platform Profile:				 FULL_PROFILE
  Platform Version:				 OpenCL 2.1 AMD-APP (3649.0)
  Platform Name:				 AMD Accelerated Parallel Processing
  Platform Vendor:				 Advanced Micro Devices, Inc.
  Platform Extensions:				 cl_khr_icd cl_amd_event_callback


  Platform Name:				 AMD Accelerated Parallel Processing
Number of devices:				 1
  Device Type:					 CL_DEVICE_TYPE_GPU
  Vendor ID:					 1002h
  Board name:					 AMD Radeon Graphics

# ...
  • rocm-smi

rocm-smi 是一个用于监视和管理 ROCm 设备的工具。它可以显示设备的当前状态、温度、功耗、内存使用情况等信息

rocm-smi
============================================ ROCm System Management Interface ============================================
====================================================== Concise Info ======================================================
Device  Node  IDs              Temp    Power     Partitions          SCLK  MCLK     Fan  Perf  PwrCap       VRAM%  GPU%
              (DID,     GUID)  (Edge)  (Socket)  (Mem, Compute, ID)
==========================================================================================================================
0       1     0x1900,   58154  45.0°C  20.054W   N/A, N/A, 0         None  2400Mhz  0%   auto  Unsupported  4%     0%
==========================================================================================================================
================================================== End of ROCm SMI Log ===================================================