Milestone Jul 1, 2022–Dec 15, 2022
Decentralized GPU ML Cloud -- Phase 1
Milestone Owner: Avimanyu Bandyopadhyay - @avimanyu786
A NuNet platform that connects decentralized GPU hardware providers and enables secure, safe and decentralized access to GPUs for Cardano.
- Onboard/manage a GPU device in NuNet;
- Run an ML model using the onboarded GPU device;
- Compensate compute providers with NTX;
- Support data storage providers in NuNet;
- Build GPU clusters to run ML models.
- A GPU ML Prototype: A workflow for running standard ML models with TensorFlow/PyTorch [done]
- Stable workloads: Execution of lengthy ML workloads to ensure stability and feasibility [partially done]
- Rootless Containers: Ability to create rootless containers for GPU ML users [to review]
- Encompassing the entire workflow within the three primary APIs: Telemetry, Tokenomics & Compute [in progress]
Sequence workflow [in progress] to be based on:
- Link to ML model (preferably a git repository)
- Whether the model uses PyTorch or TensorFlow; the model runs in the corresponding container on the onboarded device with an exact list of commands
- Where to upload results (e.g. send the results back as a JSON or CSV file; this can be sent as text and then parsed on the user's end)
- In the background, DMS allocates the job based on the available GPU devices
- Estimated CPU & memory usage
- Estimated time for training/inference (an approximate maximum)
- User sees the estimated NTX corresponding to the calculated computational cost (based on the allocated GPU device, the estimated CPU and memory usage, the estimated time, and any other relevant parameters).
- Run Model
- Notify on completion via email/NuNet app
- Download a PyTorch or TensorFlow Docker image and run a container on the selected GPU node on the NuNet network, passing all necessary parameters to the Docker container (such as the URL of the ML model and any dependencies).
- Compute the computational cost from the time taken by the complete sequence of commands, or from the time the user spends inside the rootless container.
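
The workflow above can be sketched in Python as a rough illustration, not as the DMS implementation. The `JobRequest` fields, the NTX rates and the helper names are all hypothetical, and the image names are the public Docker Hub images rather than confirmed NuNet choices. The sketch only builds the `docker run` command; it does not execute it.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: public Docker Hub images, one per supported framework.
IMAGES = {
    "pytorch": "pytorch/pytorch:latest",
    "tensorflow": "tensorflow/tensorflow:latest-gpu",
}

# Assumption: placeholder rates in NTX per second. Real pricing is not
# specified in this milestone.
NTX_PER_GPU_SECOND = 0.01
NTX_PER_CPU_CORE_SECOND = 0.001
NTX_PER_GB_RAM_SECOND = 0.0005


@dataclass
class JobRequest:
    model_url: str        # link to the ML model, preferably a git repository
    framework: str        # "pytorch" or "tensorflow"
    commands: List[str]   # exact list of commands to run in the container
    results_format: str   # "json" or "csv"
    est_cpu_cores: float  # estimated CPU usage
    est_memory_gb: float  # estimated memory usage
    max_seconds: int      # approximate maximum training/inference time


def estimate_ntx(job: JobRequest, seconds: Optional[float] = None) -> float:
    """Estimate the cost up front, or pass the measured time afterwards."""
    duration = job.max_seconds if seconds is None else seconds
    rate = (
        NTX_PER_GPU_SECOND
        + job.est_cpu_cores * NTX_PER_CPU_CORE_SECOND
        + job.est_memory_gb * NTX_PER_GB_RAM_SECOND
    )
    return round(rate * duration, 6)


def docker_command(job: JobRequest) -> List[str]:
    """Build the argument list for a GPU container running the job's commands."""
    if job.framework not in IMAGES:
        raise ValueError(f"unsupported framework: {job.framework}")
    script = " && ".join(job.commands)
    return [
        "docker", "run", "--rm", "--gpus", "all",
        "-e", f"MODEL_URL={job.model_url}",  # model link passed as a parameter
        IMAGES[job.framework],
        "sh", "-c", script,
    ]
```

Calling `estimate_ntx(job)` before the run gives the figure shown to the user, and calling it again with the measured `seconds` gives the cost for the time actually taken. Fetching the model from `MODEL_URL` is left to the user-supplied commands.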
Demos showing some ML on GPU features
- Using a WebApp to request an ML job on a machine with a GPU
- Running ML training on a GPU onboarded on NuNet
- First run of ML training on a GPU onboarded on NuNet, in collaboration with @pgwadapool