The platform should support GPU: support a single GPU device
Description
This user story covers the tasks needed to enable training or inference of a machine learning model that uses a GPU from the grid of connected devices.

Acceptance Criteria
- Detect and show the type of GPU with its specifications
- Check whether the necessary libraries are installed to allow use of CUDA cores
- Onboard the device and show its GPU stats in Nomad
- Since a GPU cannot be shared among tasks, the onboarding criteria should require tasks to request it explicitly
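The first two criteria (detect the GPU and check for CUDA libraries) could be sketched roughly as below. This is a minimal illustration, assuming an NVIDIA GPU with the standard `nvidia-smi` and `ldconfig` tools available; the exact checks in the real onboarding script may differ.

```shell
#!/bin/sh
# Sketch: detect the GPU and check for the CUDA runtime library.
# Assumes an NVIDIA GPU; all messages and checks are illustrative.

detect_gpu() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        # Show GPU model, total memory, and driver version as CSV.
        nvidia-smi --query-gpu=name,memory.total,driver_version \
                   --format=csv,noheader
    else
        echo "nvidia-smi not found: no NVIDIA GPU driver detected"
    fi

    # Check whether the CUDA runtime library is visible to the linker.
    if ldconfig -p 2>/dev/null | grep -q libcudart; then
        echo "CUDA runtime library found"
    else
        echo "CUDA runtime library not found"
    fi
}

detect_gpu
```

Either branch prints a human-readable status line, so the onboarding script can log the result whether or not a GPU is present.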
| # | Task | Description | Duration | Status | Comment |
|---|---|---|---|---|---|
| 1 | Explore onboarding script | Explore how the onboarding script works | 4 hrs | Done | |
| 2 | Onboard GPU | Edit the current onboarding script to check for a GPU and onboard it | 8 hrs | Done | |
| 3 | Show GPU stats | Show GPU stats in Nomad | 8 hrs | Done | Nomad has no way of showing GPU stats |
| 4 | Save GPU stats | Save GPU stats to a file on the machine | 1 hr | Done | |
| 5 | Test usage | Run a GPU-intensive task and verify that GPU stats can be tracked and saved while it runs | 8 hrs | Done | |
| 6 | Explore GPU-related issues | Check the Nomad GitHub issues to solidify decisions on GPU usage and Nomad GPU stats | 8 hrs | Done | |
| 7 | Update Nomad | Update Nomad to the latest version and test whether it supports GPU | 8 hrs | Done | |
| 8 | Test GPU usage | Modify the Docker containers to use the GPU and check whether Nomad can schedule them | 16 hrs | Done | |
| 9 | Test on machines | Verify that the onboarding script identifies GPUs | 4 hrs | Done | Tested on Dagim's and Tedros's machines; output recorded in the comment |
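Since Nomad turned out to have no way of showing GPU stats, tasks 4 and 5 fall back to saving them to a file on the machine. A minimal sketch of that sampling loop, again assuming `nvidia-smi` is available (the file path, sample count, and interval below are illustrative assumptions, not the actual values used):

```shell
#!/bin/sh
# Sketch: periodically sample GPU stats with nvidia-smi and append them
# to a CSV file, as a workaround for Nomad not exposing GPU stats.
# STATS_FILE, SAMPLES, and INTERVAL are illustrative defaults.

STATS_FILE="${STATS_FILE:-gpu_stats.csv}"
SAMPLES="${SAMPLES:-2}"     # number of samples to take
INTERVAL="${INTERVAL:-1}"   # seconds between samples

# Write the CSV header once.
if [ ! -f "$STATS_FILE" ]; then
    echo "timestamp,utilization.gpu [%],memory.used [MiB]" > "$STATS_FILE"
fi

i=0
while [ "$i" -lt "$SAMPLES" ]; do
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used \
                   --format=csv,noheader >> "$STATS_FILE"
    else
        # No GPU on this machine: record that instead of failing.
        echo "$(date -u +%Y-%m-%dT%H:%M:%SZ),no-gpu,no-gpu" >> "$STATS_FILE"
    fi
    i=$((i + 1))
    sleep "$INTERVAL"
done
```

Running this alongside a GPU-intensive task (as in task 5) leaves a CSV trace of utilization and memory use that can be inspected after the run.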
Edited by Janaina Senna