Estimated Focus Duration: 4 days
Estimated Pessimistic Duration: 9 days
Description
Who
What
How
Why
When
Acceptance Criteria
Work Breakdown Structure (WBS)
offboard command
| Task | Description | Duration | Status | Start Date | End Date | Comment |
| --- | --- | --- | --- | --- | --- | --- |
| A | Create offboard basic structure | 2 Hrs | Done | September 4 | September 5 | |
| B | Set flags and cases (`-f`, `--force` flag) | 2 Hrs | Done | September 5 | September 5 | |
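Task B's `-f`/`--force` handling can be sketched with `argparse`; this is a hypothetical illustration, since the WBS does not show which CLI framework DMS actually uses:

```python
import argparse

def build_offboard_parser() -> argparse.ArgumentParser:
    # Hypothetical sketch of the offboard command's flag handling; the
    # real command's framework, name, and options are not shown in this WBS.
    parser = argparse.ArgumentParser(prog="offboard")
    parser.add_argument(
        "-f", "--force",
        action="store_true",
        help="skip confirmation and offboard immediately",
    )
    return parser
```

`build_offboard_parser().parse_args(["-f"]).force` evaluates to `True`, and to `False` when the flag is absent.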
onboard-ml command
| Task | Description | Duration | Status | Start Date | End Date | Comment |
| --- | --- | --- | --- | --- | --- | --- |
| A | Outline porting strategy for commands | 3 Hrs | Done | September 5 | September 6 | |
| B | Create function for checking if machine is on WSL | 1 Hr | Done | September 6 | September 6 | |
| C | Check for AMD and NVIDIA GPUs | 4 Hrs | Done | September 6 | September 6 | |
| D | Pull Docker images using the Docker SDK | 5 Hrs | Done | September 6 | September 6 | |
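Task B's WSL check can be sketched from the standard heuristic that WSL kernels include "microsoft" in the kernel release string; the helper name is illustrative, and the release string is injectable so the check is testable off-WSL:

```python
import platform
from typing import Optional

def is_wsl(kernel_release: Optional[str] = None) -> bool:
    # WSL kernels report a release like "5.15.90.1-microsoft-standard-WSL2",
    # so the presence of "microsoft" is a common WSL heuristic.
    release = kernel_release if kernel_release is not None else platform.release()
    return "microsoft" in release.lower()
```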
onboard-gpu command
| Task | Description | Duration | Status | Start Date | End Date | Comment |
| --- | --- | --- | --- | --- | --- | --- |
| A | Create basic structure for command | 2 Hrs | Done | September 8 | September 8 | |
| A1 | Check for WSL and detect GPUs | 1 Hr | Done | September 8 | September 8 | |
| A2 | Retrieve data from GPUs and print those available | 1 Hr | Done | September 8 | September 8 | |
| B | Drivers/container runtime installation | 3 Hrs | Done | September 8 | September 11 | |
| B1 | Prompt the user to proceed with installation | 1 Hr | Done | September 8 | September 8 | |
| B2 | Move the existing `install_x_drivers` function from the bash CLI into a separate file | 1 Hr | Done | September 8 | September 8 | |
| B3 | Encapsulate function for container runtime installation | 1 Hr | Done | September 11 | September 11 | |
| C | Refactor and modularize | 3 Hrs | Done | September 11 | September 11 | |
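The detection step in task A1 can be sketched by probing for each vendor's management tool on PATH; the function name is illustrative, and the lookup is injectable so the logic is testable without real GPUs:

```python
import shutil
from typing import Callable, List, Optional

def detect_gpu_vendors(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    # Infer GPU vendors from the presence of their management CLIs:
    # nvidia-smi ships with the NVIDIA driver, rocm-smi with ROCm.
    vendors = []
    if which("nvidia-smi"):
        vendors.append("NVIDIA")
    if which("rocm-smi"):
        vendors.append("AMD")
    return vendors
```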
capacity command (add GPU flags)
| Task | Description | Duration | Status | Start Date | End Date | Comment |
| --- | --- | --- | --- | --- | --- | --- |
| A | Research the DMS code for reusable functions (Docker, GPUs, etc.) | 3 Hrs | Done | September 12 | September 12 | |
| B | `check_cuda_tensor` function | 6 Hrs | Done | September 12 | September 13 | Although the function is simple in the bash script, porting it took longer than expected because custom Docker methods had to be defined to replicate the `docker run` command and all of its options. The `docker` and `service` packages within DMS were tried but did not match what was needed. The function was also refactored after the prototype. |
| B1 | Define function for running a Docker container | 4 Hrs | Done | September 13 | September 13 | |
| B2 | Refactor for reusability | 2 Hrs | Done | September 13 | September 13 | |
| C | `check_rocm_hip` function | 2 Hrs | Done | September 13 | September 13 | After refactoring the previous function, this one was easier to implement. |
| Task | Description | Duration | Status | Start Date | End Date | Comment |
| --- | --- | --- | --- | --- | --- | --- |
| D | `gpu_status` function | 16 Hrs | In Progress | September 15 | | |
| D1 | Create initial structure with NVML | 4 Hrs | Done | September 18 | September 18 | |
| D2 | Add checking for NVIDIA/AMD GPUs before displaying info | 1 Hr | Done | September 18 | September 18 | |
| D3 | Refactor and organize code in order to insert AMD info | 4 Hrs | Done | September 19 | September 19 | |
| D4 | Insert AMD information using the `rocm_smi` library | 3 Hrs | Done | September 19 | September 22 | |
| D5 | Testing | 4 Hrs | Done | September 22 | September 25 | The `rocm_smi` package only works with a specific version of the ROCm library. After talking to Avi, we agreed to stick with the `rocm-smi` command and work around its output. |
| D6 | Update the interface methods with the `rocm-smi` output workaround | 4 Hrs | Done | September 27 | September 29 | |
| D7 | Format the GPU monitoring output in the `gpu_status` file | 1 Hr | Done | September 29 | September 29 | |
| D8 | Testing | 3 Hrs | In Progress | September 29 | | |
| E | Fix bug where `gpu capacity` cannot download missing Docker images | | | | | |
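The `rocm-smi` workaround from tasks D5/D6 can be sketched as a thin parser over the command's JSON output; the exact flags and field names vary by `rocm-smi` version, so the sample keys and helper names below are assumptions, not the actual DMS implementation:

```python
import json
import subprocess
from typing import Dict

def parse_rocm_smi(raw_json: str) -> Dict[str, dict]:
    # rocm-smi's JSON output typically keys each GPU as "card0", "card1", ...
    # alongside non-GPU entries; keep only the per-card records.
    data = json.loads(raw_json)
    return {key: value for key, value in data.items() if key.startswith("card")}

def query_rocm_smi() -> Dict[str, dict]:
    # Shell out to rocm-smi instead of importing the version-locked
    # rocm_smi Python package (the workaround agreed in task D5).
    result = subprocess.run(
        ["rocm-smi", "--showuse", "--json"],
        capture_output=True, text=True, check=True,
    )
    return parse_rocm_smi(result.stdout)
```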