Incorrect VRAM Displayed for AMD Radeon GPUs After NuNet Onboarding
Estimation
Story points: 1 SP
Estimated focus duration (perfect conditions): 1 day
Estimated pessimistic duration (worst case scenario): 1 day
Summary
Incorrect VRAM Displayed for AMD Radeon GPU After NuNet Onboarding
Steps to reproduce
- Run the NuNet CLI onboarding command on a device equipped with an AMD Radeon RX 5700 XT GPU.
- Observe the reported VRAM size during the onboarding process.
- Verify the reported VRAM with rocm-smi --showmeminfo vis_vram, noting the displayed size.
- Check the actual VRAM using rocm-smi --showmeminfo vram.
What is the current bug behavior?
The onboarding process reports the GPU's VRAM as 256 MB (tot_vram: 256) instead of the correct value of roughly 8 GB. The discrepancy arises because onboarding displays the visible VRAM (vis_vram) rather than the total VRAM (vram) available on the GPU.
What is the expected correct behavior?
The onboarding process should accurately report the total VRAM of the GPU, in this case, approximately 8GB for the AMD Radeon RX 5700 XT.
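To make "approximately 8GB" concrete, the following minimal sketch parses the rocm-smi line quoted in the logs section and converts it to MiB. The function names are hypothetical, and the assumption that onboarding reports tot_vram in MiB is inferred from the value 256 in the log, not confirmed by the NuNet code.

```python
import re

# Matches a line such as "VRAM Total Memory (B): 8573157376" (the only line
# shape taken from this report). The lookbehind rejects labels that merely
# end in "VRAM"; the exact wording of such labels is an assumption here.
_TOTAL_RE = re.compile(r"(?<![A-Za-z_])VRAM Total Memory \(B\):\s*(\d+)")


def parse_total_vram_bytes(text: str) -> int:
    """Return the total VRAM in bytes from rocm-smi style text output."""
    match = _TOTAL_RE.search(text)
    if match is None:
        raise ValueError("no 'VRAM Total Memory (B)' line found")
    return int(match.group(1))


def bytes_to_mib(n: int) -> int:
    """Convert bytes to whole MiB (assumed unit of tot_vram)."""
    return n // (1024 * 1024)


total = parse_total_vram_bytes("VRAM Total Memory (B): 8573157376")
print(bytes_to_mib(total))  # 8176
```

Under that assumption, the expected tot_vram for this card would be 8176 rather than 256.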
Relevant logs and/or screenshots
Incorrect VRAM reported during onboarding:
"Successfully Onboarded. {"update_timestamp":1707830646,"resource":{"memory_max":128722,"total_core":32,"cpu_max":152359},"available":{"cpu":52359,"memory":28722},"reserved":{"cpu":100000,"memory":100000},"network":"nunet-test","public_key":"addr_test...","gpu_info":[{"name":"AMD Navi...","tot_vram":256,"free_vram":233}]}"
Correct VRAM reporting command output:
"VRAM Total Memory (B): 8573157376"
Version number of NuNet components
- The issue is not tied to a specific component version; it stems from how the NuNet CLI queries the GPU hardware.
OS version, emulator/virtual machine type and version, network type (including NAT type), environment variables, parameters, etc
- The issue was observed on a system running a compatible version of ROCm and targeting AMD Radeon RX 5700 XT GPUs.
Possible fixes
Review and update the NuNet CLI so that onboarding queries and displays the total VRAM (vram) rather than the visible VRAM (vis_vram), and update the documentation to make the distinction between the two explicit.
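One possible direction, sketched under assumptions: on Linux the amdgpu driver exposes both counters in sysfs as mem_info_vram_total and mem_info_vis_vram_total, so a fix could read the former. How the NuNet CLI currently obtains the value is not stated in this report, and the helper name read_vram_mib is hypothetical.

```python
from pathlib import Path


def read_vram_mib(device_dir, filename="mem_info_vram_total"):
    """Read an amdgpu sysfs memory counter (in bytes) and return whole MiB.

    device_dir is the DRM device directory; the default filename selects
    total VRAM. Passing "mem_info_vis_vram_total" would return the visible
    VRAM instead, which is the value this issue says should not be reported.
    """
    raw = Path(device_dir, filename).read_text().strip()
    return int(raw) // (1024 * 1024)
```

A call might look like read_vram_mib("/sys/class/drm/card0/device"); the card index is an assumption and varies between systems.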
Brief Explanation of vis_vram vs vram:
- vis_vram (Visible VRAM): The portion of the GPU's video memory that the CPU can address directly through the PCIe BAR aperture. It can be far smaller than the total VRAM: on systems without Resizable BAR the aperture is commonly 256 MB, which matches the value reported during onboarding.
- vram (Video RAM): Represents the total amount of dedicated video memory available on the GPU. This is the full capacity of memory that can be used by the GPU for storing textures, frame buffers, and other graphics-related data. The total VRAM is what should ideally be reported during system checks and onboarding processes to accurately reflect the GPU's capabilities.
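The distinction above can be turned into a small regression check that tells which counter an onboarding result appears to reflect. This is only a sketch: the function name is hypothetical, tot_vram is assumed to be in MiB, and the 256 MiB visible size is inferred from the onboarding log, not measured.

```python
MIB = 1024 * 1024


def classify_reported_vram(reported_mib, total_bytes, vis_bytes):
    """Guess which counter a reported tot_vram value came from.

    Total is checked first, so when visible and total VRAM are equal the
    result is "total".
    """
    if reported_mib == total_bytes // MIB:
        return "total"
    if reported_mib == vis_bytes // MIB:
        return "visible"
    return "unknown"


# 8573157376 is the total from this report; 256 * MIB is an assumed
# visible-VRAM size matching the tot_vram value in the onboarding log.
print(classify_reported_vram(256, 8573157376, 256 * MIB))  # visible
```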