Milestone: Decentralized GPU ML Cloud -- Phase 1
Status: Closed
Duration: Jul 1, 2022 – Dec 15, 2022
Milestone Owner: Avimanyu Bandyopadhyay - @avimanyu786
Summary
A NuNet platform that connects decentralized GPU hardware providers and enables secure, safe, and decentralized access to GPUs for Cardano.
See the project scoping discussion with external stakeholders and the full description in the Cardano Catalyst Fund8 proposal on the Catalyst platform.
Functionalities
- On-board/manage a GPU device in NuNet;
- Run a ML model using the on-boarded GPU device;
- Compensate compute providers with NTX;
- Support data storage providers in NuNet;
- Build GPU clusters to run ML models.
Main Issues
- A GPU ML Prototype: A workflow for running standard ML models with TensorFlow/PyTorch [done]
- Stable workloads: Execution of lengthy ML workloads to ensure stability and feasibility [partially done]
- Rootless Containers: Ability to create rootless containers for GPU ML users [to review]
- Encompassing the entire workflow within the three primary APIs: Telemetry, Tokenomics & Compute [in progress]
Implementation plan
The sequence workflow [in progress] is to be based on:
User's Perspective
- Link to ML model (preferably a git repository)
- Whether the model uses PyTorch or TensorFlow; the model would run in the corresponding container on the onboarded device, with an exact list of commands
- Where to upload results (e.g. send the results back as a JSON or CSV file; this can be sent as text and then parsed on the user's end)
- In the background, DMS will allocate the job based on the available GPU devices
- Estimated CPU & memory usage
- Estimated time for training/inference (an approximate maximum)
- User sees the estimated NTX corresponding to the calculated computational cost (based on the device allocation, CPU, memory, and time estimates above, and any other relevant parameters).
- Run Model
- Notify on completion via email/NuNet app
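The user-side steps above could be captured in a single job-request structure handed to DMS. The sketch below is illustrative only: the field names, the `estimate_ntx` formula, and the NTX rate are assumptions for this example, not NuNet's actual job schema or pricing.

```python
# Hypothetical sketch of a user's GPU ML job request.
# All field names and the NTX rate are illustrative assumptions,
# not the actual NuNet DMS schema or tokenomics.
NTX_PER_GPU_HOUR = 10.0  # assumed example rate

def estimate_ntx(cpu_cores: float, memory_gb: float, max_hours: float,
                 rate: float = NTX_PER_GPU_HOUR) -> float:
    """Toy cost estimate: scale an hourly rate by the requested resources."""
    resource_factor = 1.0 + 0.1 * cpu_cores + 0.05 * memory_gb
    return round(rate * max_hours * resource_factor, 2)

job_request = {
    "model_repo": "https://github.com/example/ml-model.git",  # link to ML model
    "framework": "pytorch",      # selects the PyTorch or TensorFlow container
    "commands": ["pip install -r requirements.txt", "python train.py"],
    "result_format": "json",     # results returned as text, parsed client-side
    "estimated_cpu_cores": 4,
    "estimated_memory_gb": 16,
    "estimated_max_hours": 2.0,  # approximate maximum for training/inference
}
# User sees the estimated NTX before choosing to run the model.
job_request["estimated_ntx"] = estimate_ntx(
    job_request["estimated_cpu_cores"],
    job_request["estimated_memory_gb"],
    job_request["estimated_max_hours"],
)
print(job_request["estimated_ntx"])
```

After the job completes, the same structure would carry enough context for DMS to notify the user via email or the NuNet app.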
Onboarder's Perspective
- Download a PyTorch or TensorFlow docker image and run a container on the selected GPU node on the NuNet network, passing all necessary parameters (such as the URL of the ML model and any dependencies) to the docker container.
- Compute the computational cost based on the time taken by the complete sequence of commands, or the time the user spends inside the rootless container.
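The onboarder-side steps might be sketched as constructing and timing the container invocation. This is a minimal sketch under stated assumptions: rootless Docker with the NVIDIA container toolkit (`--gpus all`), stock framework images, and a hypothetical NTX-per-second rate; none of these are confirmed as NuNet's actual setup.

```python
import shlex
import subprocess
import time

def build_container_cmd(framework: str, model_url: str,
                        commands: list[str]) -> list[str]:
    """Build a docker run invocation for a GPU ML job.

    Assumes rootless Docker plus the NVIDIA container toolkit
    (`--gpus all`); images and env vars are illustrative, not NuNet's.
    """
    image = {"pytorch": "pytorch/pytorch:latest",
             "tensorflow": "tensorflow/tensorflow:latest-gpu"}[framework]
    script = " && ".join(commands)  # the user's exact list of commands
    return ["docker", "run", "--rm", "--gpus", "all",
            "-e", f"MODEL_URL={model_url}",
            image, "bash", "-c", script]

def run_and_cost(cmd: list[str], ntx_per_second: float, runner=None) -> float:
    """Time the complete command sequence and convert elapsed time to NTX.

    `runner` defaults to subprocess.run; it is injectable so the sketch
    can be exercised without Docker installed.
    """
    runner = runner or (lambda c: subprocess.run(c, check=True))
    start = time.monotonic()
    runner(cmd)
    elapsed = time.monotonic() - start
    return elapsed * ntx_per_second

cmd = build_container_cmd(
    "pytorch",
    "https://github.com/example/ml-model.git",
    ["pip install -r requirements.txt", "python train.py"],
)
print(shlex.join(cmd))
```

Timing the full command sequence (rather than wall-clock session time) keeps the cost tied to work actually done inside the rootless container.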