Create docker images pytorch-gpu and tensorflow-gpu with ML models to run on the compute provider
Description
Who
- @avimanyu786 -- developer
- @kabir.kbr, @janaina.senna -- architecture advice and review
What
- Create two Docker images, pytorch-gpu and tensorflow-gpu, each bundling the ML model, so that the chosen image can be downloaded and run on a NuNet compute provider machine. This is required to implement the Decentralized GPU ML Cloud milestone.
How
- Create the two Docker images, pytorch-gpu and tensorflow-gpu, using the official pytorch and tensorflow images on Docker Hub as base images, and add the ML model on top.
- Store the images on our GitLab container registry.
- Run the images on Nomad and solve the problem reported here related to large images.
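As a sketch of the first step, a minimal Dockerfile for the pytorch-gpu image might look like the following. The base image tag, file names, and entrypoint are assumptions for illustration, not the actual repository layout:

```dockerfile
# Sketch only: base tag, file names, and entrypoint are assumptions.
# Start from the official CUDA-enabled PyTorch runtime image on Docker Hub.
FROM pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime

WORKDIR /app

# Copy the ML model code and its dependency list into the image.
COPY cifar10_train.py /app/
COPY requirements.txt /app/
RUN pip install --no-cache-dir -r requirements.txt

# Default command; Nomad can override the arguments per ML user.
ENTRYPOINT ["python", "cifar10_train.py"]
```

The tensorflow-gpu image would follow the same pattern on top of a `tensorflow/tensorflow:*-gpu` base tag.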
Why
- In the Decentralized GPU ML Cloud milestone, it becomes possible to run an ML application on NuNet using a compute provider machine with a GPU. Nomad, running on the compute provider machine, uses one of these two images to run the ML model with the PyTorch or TensorFlow framework.
When
This issue is related to the issues implementing the sequence diagram for running an ML model on a GPU, described here:
- we pre-build containers with TensorFlow and PyTorch (separate containers, tagged appropriately) via GitLab CI/CD and push them to the GitLab registry;
- when parameters are passed to the compute provider, it downloads the required pre-built image from our GitLab registry;
- the compute provider runs that image with the unique parameters needed for that specific ML user.
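The pre-build step above could be sketched in a `.gitlab-ci.yml` along these lines; job names, Dockerfile paths, and tags are assumptions, while the `$CI_REGISTRY*` variables are GitLab's predefined CI variables:

```yaml
# Sketch only: paths and tags are assumptions, not the project's actual CI config.
stages:
  - build

build-pytorch-gpu:
  stage: build
  image: docker:latest
  services:
    - docker:dind
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker build -t "$CI_REGISTRY_IMAGE/pytorch-gpu:latest" -f pytorch-gpu/Dockerfile .
    - docker push "$CI_REGISTRY_IMAGE/pytorch-gpu:latest"

build-tensorflow-gpu:
  stage: build
  image: docker:latest
  services:
    - docker:dind
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker build -t "$CI_REGISTRY_IMAGE/tensorflow-gpu:latest" -f tensorflow-gpu/Dockerfile .
    - docker push "$CI_REGISTRY_IMAGE/tensorflow-gpu:latest"
```

Tagging each image per framework keeps the compute provider's download limited to the one image its job actually needs.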
Acceptance Criteria
- pytorch-gpu and tensorflow-gpu Docker images, including the ML model, stored in the GitLab container registry.
- Successful tests on Nomad using these images.
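For the Nomad test, a job spec along these lines could pull one of the images from the registry and run it on a GPU node. The job name, image path placeholders, and resource figures are assumptions; registry authentication is omitted:

```hcl
# Sketch only: names and resource figures are assumptions.
job "ml-pytorch-gpu" {
  datacenters = ["dc1"]
  type        = "batch"

  group "train" {
    task "cifar10" {
      driver = "docker"

      config {
        image = "registry.gitlab.com/<group>/<project>/pytorch-gpu:latest"
      }

      resources {
        cpu    = 2000
        memory = 4096

        # Request one NVIDIA GPU via Nomad's device plugin.
        device "nvidia/gpu" {
          count = 1
        }
      }
    }
  }
}
```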
Work Breakdown Structure (WBS)
Task | Description | Duration (Tentative) | Status | Start Date | End Date | Comment |
---|---|---|---|---|---|---|
A | Create Docker images for PyTorch and TensorFlow including their respective ML models and prerequisites | 6 Days | Done | Aug 24 2022 | Aug 30 2022 | |
A.1. | Researching on the process of building Docker images on the GitLab registry | 1 Day | Done | Aug 24 2022 | Aug 24 2022 | |
A.2. | Preparing TensorFlow FMNIST Dockerfile | 1 Day | Done | Aug 25 2022 | Aug 25 2022 | |
A.3. | Building & Testing TensorFlow FMNIST Dockerfile | 1 Day | Done | Aug 26 2022 | Aug 26 2022 | |
A.4. | Preparing PyTorch CIFAR10 Dockerfile | 1 Day | Done | Aug 26 2022 | Aug 26 2022 | |
A.5. | Building & Testing PyTorch CIFAR10 Dockerfile | 1 Day | Done | Aug 26 2022 | Aug 26 2022 | |
A.6. | Uploading both Dockerfiles to corresponding repository with execution and results | 1 Day | Done | Aug 30 2022 | Aug 30 2022 | |
A.7. | Documentation on how the building and execution stages work | 1 Day | Done | Aug 30 2022 | Aug 30 2022 | |
A.8. | Send merge request to ml-on-gpu-service develop branch | 1 Day | Done | Aug 30 2022 | Aug 30 2022 | |
Edited by Avimanyu Bandyopadhyay