Create docker images pytorch-gpu and tensorflow-gpu with ML models to run on the compute provider
Description
Who
- @avimanyu786 -- developer
- @kabir.kbr, @janaina.senna -- architecture advice and review
What
- Create two Docker images, pytorch-gpu and tensorflow-gpu, each bundling the ML model, so that the chosen image can be downloaded and run on a NuNet compute provider machine. This is required to implement the Decentralized GPU ML Cloud milestone.
How
- Create the two Docker images, pytorch-gpu and tensorflow-gpu, using the official pytorch and tensorflow images on Docker Hub as base images, and add the ML model on top.
- Store the images on our GitLab container registry.
- Run the images on Nomad and solve the problem reported here related to large images.
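As a sketch of the first step, a minimal Dockerfile for the pytorch-gpu image might look like the following. The base image tag, file names, and entrypoint are assumptions for illustration, not the actual repository layout:

```dockerfile
# Sketch only: base tag, file names, and entrypoint are assumptions.
# Start from the official CUDA-enabled PyTorch runtime image on Docker Hub.
FROM pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime

WORKDIR /app

# Copy the ML model code and its dependency list into the image.
COPY cifar10_train.py /app/
COPY requirements.txt /app/
RUN pip install --no-cache-dir -r requirements.txt

# Default command; Nomad can override the arguments per ML user.
ENTRYPOINT ["python", "cifar10_train.py"]
```

The tensorflow-gpu image would follow the same pattern on top of a `tensorflow/tensorflow:*-gpu` base tag.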
Why
- In the Decentralized GPU ML Cloud milestone, it becomes possible to run an ML application on NuNet using a compute provider machine with a GPU. Nomad, running on the compute provider machine, uses one of these two images to run the ML model with the PyTorch or TensorFlow framework.
When
This issue is related to the issues implementing the sequence diagram for running an ML model on a GPU, described here:
- we pre-build containers with TensorFlow and PyTorch (separate containers, tagged appropriately) via GitLab CI/CD and push them to the GitLab registry;
- when parameters are passed to the compute provider, it downloads the required pre-built image from our GitLab registry;
- the compute provider runs that image with the unique parameters needed for that specific ML user.
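The pre-build step above could be sketched in a `.gitlab-ci.yml` along these lines; job names, Dockerfile paths, and tags are assumptions, while the `$CI_REGISTRY*` variables are GitLab's predefined CI variables:

```yaml
# Sketch only: paths and tags are assumptions, not the project's actual CI config.
stages:
  - build

build-pytorch-gpu:
  stage: build
  image: docker:latest
  services:
    - docker:dind
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker build -t "$CI_REGISTRY_IMAGE/pytorch-gpu:latest" -f pytorch-gpu/Dockerfile .
    - docker push "$CI_REGISTRY_IMAGE/pytorch-gpu:latest"

build-tensorflow-gpu:
  stage: build
  image: docker:latest
  services:
    - docker:dind
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker build -t "$CI_REGISTRY_IMAGE/tensorflow-gpu:latest" -f tensorflow-gpu/Dockerfile .
    - docker push "$CI_REGISTRY_IMAGE/tensorflow-gpu:latest"
```

Tagging each image per framework keeps the compute provider's download limited to the one image its job actually needs.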
Acceptance Criteria
- pytorch-gpu and tensorflow-gpu Docker images, including the ML model, stored in the GitLab container registry.
- Successful tests on Nomad using these images.
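For the Nomad test, a job spec along these lines could pull one of the images from the registry and run it on a GPU node. The job name, image path placeholders, and resource figures are assumptions; registry authentication is omitted:

```hcl
# Sketch only: names and resource figures are assumptions.
job "ml-pytorch-gpu" {
  datacenters = ["dc1"]
  type        = "batch"

  group "train" {
    task "cifar10" {
      driver = "docker"

      config {
        image = "registry.gitlab.com/<group>/<project>/pytorch-gpu:latest"
      }

      resources {
        cpu    = 2000
        memory = 4096

        # Request one NVIDIA GPU via Nomad's device plugin.
        device "nvidia/gpu" {
          count = 1
        }
      }
    }
  }
}
```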
Work Breakdown Structure (WBS)
Task | Description | Duration (Tentative) | Status | Start Date | End Date | Comment |
---|---|---|---|---|---|---|
A | Create Docker images for PyTorch and TensorFlow including their respective ML models and prerequisites | 6 Days | Done | Aug 24 2022 | Aug 30 2022 | |
A.1. | Researching on the process of building Docker images on the GitLab registry | 1 Day | Done | Aug 24 2022 | Aug 24 2022 | |
A.2. | Preparing TensorFlow FMNIST Dockerfile | 1 Day | Done | Aug 25 2022 | Aug 25 2022 | |
A.3. | Building & Testing TensorFlow FMNIST Dockerfile | 1 Day | Done | Aug 26 2022 | Aug 26 2022 | |
A.4. | Preparing PyTorch CIFAR10 Dockerfile | 1 Day | Done | Aug 26 2022 | Aug 26 2022 | |
A.5. | Building & Testing PyTorch CIFAR10 Dockerfile | 1 Day | Done | Aug 26 2022 | Aug 26 2022 | |
A.6. | Uploading both Dockerfiles to corresponding repository with execution and results | 1 Day | Done | Aug 30 2022 | Aug 30 2022 | |
A.7. | Documentation on how the building and execution stages work | 1 Day | Done | Aug 30 2022 | Aug 30 2022 | |
A.8. | Send merge request to ml-on-gpu-service develop branch | 1 Day | Done | Aug 30 2022 | Aug 30 2022 | |
Edited by Avimanyu Bandyopadhyay