Replace usage of vis_vram with vram throughout the AMD GPU-related codebase
Requested to merge `364-incorrect-vram-displayed-for-amd-radeon-gpus-after-nunet-onboarding` into `develop`
Description
This Merge Request (MR) corrects how the device-management-service reports GPU VRAM for AMD GPUs. Previously, our codebase passed the `vis_vram` parameter when invoking `rocm-smi`. That parameter reports only the CPU-visible portion of VRAM, not the total, so the service underreported VRAM, especially on GPUs whose total VRAM is much larger than the visible window, such as the AMD Radeon RX 5700 XT. To ensure accurate reporting and utilization of AMD GPUs, the following changes have been made:
- Replaced `vis_vram` with `vram` in every `rocm-smi` command execution across the codebase, so that the total VRAM is reported, consistent with the GPU's specifications.
- Adjusted the regular expression patterns in `amd.go` and related files to match the output format of `rocm-smi` when using the `vram` parameter.
- Updated the GPU status reporting functions in `nunet` scripts to calculate and display total and used VRAM based on the `vram` parameter.
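The regex and reporting changes can be illustrated with a small Go sketch. It parses `rocm-smi --showmeminfo vram` output into per-GPU total and used VRAM, then derives free VRAM. The line format it matches (`GPU[0] : VRAM Total Memory (B): ...` and `VRAM Total Used Memory (B)`) is an assumption and may vary between ROCm releases. The sample output is made up. `VRAMInfo`, `parseVRAM` and `collect` are hypothetical names, not the patterns actually used in `amd.go`.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strconv"
)

// Assumed line formats of `rocm-smi --showmeminfo vram`; labels may differ
// between ROCm releases, so treat these patterns as illustrative.
var (
	totalRe = regexp.MustCompile(`GPU\[(\d+)\]\s*:\s*VRAM Total Memory \(B\):\s*(\d+)`)
	usedRe  = regexp.MustCompile(`GPU\[(\d+)\]\s*:\s*VRAM Total Used Memory \(B\):\s*(\d+)`)
)

// VRAMInfo holds per-GPU VRAM figures in bytes (hypothetical type).
type VRAMInfo struct {
	Total uint64
	Used  uint64
}

// Free returns unused VRAM in bytes, clamped at zero.
func (v VRAMInfo) Free() uint64 {
	if v.Used > v.Total {
		return 0
	}
	return v.Total - v.Used
}

// collect applies re to out and stores each captured byte count via set,
// keyed by the GPU index captured in the first group.
func collect(out string, re *regexp.Regexp, info map[int]VRAMInfo, set func(*VRAMInfo, uint64)) error {
	for _, m := range re.FindAllStringSubmatch(out, -1) {
		gpu, err := strconv.Atoi(m[1])
		if err != nil {
			return err
		}
		n, err := strconv.ParseUint(m[2], 10, 64)
		if err != nil {
			return err
		}
		v := info[gpu]
		set(&v, n)
		info[gpu] = v
	}
	return nil
}

// parseVRAM extracts total and used VRAM per GPU index from rocm-smi output.
func parseVRAM(out string) (map[int]VRAMInfo, error) {
	info := make(map[int]VRAMInfo)
	if err := collect(out, totalRe, info, func(v *VRAMInfo, n uint64) { v.Total = n }); err != nil {
		return nil, err
	}
	if err := collect(out, usedRe, info, func(v *VRAMInfo, n uint64) { v.Used = n }); err != nil {
		return nil, err
	}
	if len(info) == 0 {
		return nil, errors.New("no VRAM lines found in rocm-smi output")
	}
	return info, nil
}

// sample is made-up output in the assumed format, for illustration only.
const sample = "GPU[0]\t\t: VRAM Total Memory (B): 8573157376\n" +
	"GPU[0]\t\t: VRAM Total Used Memory (B): 12713984\n"

func main() {
	info, err := parseVRAM(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// prints: GPU 0: total=8573157376 B used=12713984 B free=8560443392 B
	for gpu, v := range info {
		fmt.Printf("GPU %d: total=%d B used=%d B free=%d B\n", gpu, v.Total, v.Used, v.Free())
	}
}
```

Requiring `VRAM` to follow the `:` directly means a `VIS_VRAM Total Memory` line is not mistaken for the total, even though it contains `VRAM Total Memory` as a substring.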
These changes address the discrepancy between reported and actual VRAM, ensuring that the device-management-service accurately reflects the hardware capabilities of AMD GPUs. More details here.
Checklist
- I have updated the `@version` string in `main.go`. See https://semver.org/
- I have updated `CHANGELOG.md` with a short description of the changes. This is for end-user quick reference.
- I have run `swag init` to update the swagger docs. Note: This is only applicable if you added or deleted any REST endpoints.
Closes #364
Edited by Dagim Sisay