Easy MLFlow deployment

What

Deploy MLFlow as a GitLab component (https://gitlab.com/gitlab-org/incubation-engineering/mlops/gitlab-mlflow)

Context

https://www.youtube.com/watch?v=V4hos3VFeC4

A critical piece of infrastructure for MLOps is the Model Registry.

A model registry is a repository used to store and version trained machine learning (ML) models. Model registries greatly simplify the task of tracking models as they move through the ML lifecycle, from training to production deployments and ultimately retirement. (https://www.phdata.io/blog/what-is-a-model-registry/)

One of the most popular open source model registries is MLFlow, from Databricks (https://www.mlflow.org/). We believe we can give users a better experience by combining MLFlow and GitLab.

Why

What do Data Scientists gain from this?

  • Initially, an easy-to-use Model Registry
  • Later, MLFlow API information surfaced in different areas of GitLab. For example, an MR could display the model's new accuracy
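
As a rough illustration of the MR idea, the sketch below compares the metrics of two runs and produces one summary line per metric. The function name, the dictionary inputs, and the output wording are all hypothetical; nothing here reflects an existing GitLab or MLFlow API.

```python
def describe_metric_changes(baseline: dict, candidate: dict) -> list:
    """Summarize how each metric of a candidate run differs from a baseline run.

    Both arguments map metric names to float values (hypothetical input shape).
    """
    lines = []
    for name in sorted(candidate):
        new = candidate[name]
        old = baseline.get(name)
        if old is None:
            # The metric was not tracked on the baseline run.
            lines.append(f"{name}: {new:.3f} (new metric)")
        else:
            # Show the old value, the new value, and the signed difference.
            lines.append(f"{name}: {old:.3f} -> {new:.3f} ({new - old:+.3f})")
    return lines


if __name__ == "__main__":
    for line in describe_metric_changes({"accuracy": 0.90}, {"accuracy": 0.93, "f1": 0.88}):
        print(line)
```

The returned lines could then be posted as an MR note or rendered in a widget; which of the two fits better is left open here.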

What do Platform Engineers gain from this?

Account Management

For Platform Engineers, one of the worst drawbacks of MLFlow is that it doesn't provide authentication in any form, so it has to be implemented manually. By routing through a GitLab project, we get user management for free. We might want to add a specific permission level for this later, but for now
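
To make the "user management for free" point concrete, here is a minimal sketch of how a request could be authorized from the caller's existing project role. The numeric access levels match GitLab's documented project roles; the action names and the role each action requires are assumptions made up for illustration, not a decided design.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    # Values match GitLab's documented project access levels.
    GUEST = 10
    REPORTER = 20
    DEVELOPER = 30
    MAINTAINER = 40
    OWNER = 50


# Hypothetical mapping from MLFlow actions to the minimum project role.
REQUIRED_LEVEL = {
    "view_runs": AccessLevel.REPORTER,
    "track_run": AccessLevel.DEVELOPER,
    "delete_model": AccessLevel.MAINTAINER,
}


def can_perform(user_level: AccessLevel, action: str) -> bool:
    """Return True if a member with user_level may perform the action."""
    required = REQUIRED_LEVEL.get(action)
    if required is None:
        # Deny anything that has not been mapped explicitly.
        return False
    return user_level >= required


if __name__ == "__main__":
    print(can_perform(AccessLevel.DEVELOPER, "track_run"))
    print(can_perform(AccessLevel.GUEST, "view_runs"))
```

Reusing project roles this way avoids a separate user database for MLFlow; a dedicated permission level could later replace the hypothetical mapping.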

Object Storage

We already provide storage management for Docker containers, library registries, and others. The same storage can also back MLFlow.

MVP

  • Install gitlab-mlflow alongside the GitLab installation
  • Start MLFlow tracking server and UI with GitLab
  • my-gitlab/my_project/-/mlflow opens MLFlow UI
  • my-gitlab/my_project/-/mlflow shows the runs tracked within that project
  • my-gitlab/my_project/-/mlflow/track tracks runs on MLFlow
  • Artifacts saved on GitLab storage by default
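
The per-project routes in the list above can be sketched as a small URL helper. The function name and the example host are hypothetical; only the /-/mlflow and /-/mlflow/track suffixes come from the list.

```python
from urllib.parse import quote


def mlflow_routes(gitlab_base_url: str, project_path: str) -> dict:
    """Build the per-project MLFlow URLs named in the MVP list."""
    base = gitlab_base_url.rstrip("/")
    # Keep "/" unescaped so namespaced paths such as "group/project" stay intact.
    project = quote(project_path.strip("/"), safe="/")
    root = f"{base}/{project}/-/mlflow"
    return {"ui": root, "track": f"{root}/track"}


if __name__ == "__main__":
    routes = mlflow_routes("https://my-gitlab.example.com/", "my_project")
    for name, url in routes.items():
        print(name, url)
```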

User Experience

  1. Installation

Create a component that installs MLFlow alongside GitLab.

  2. Usage
  • Accessing MLFlow: <my-gitlab-installation>/<my-project>/-/mlflow routes to the UI of the created MLFlow instance
  • Accessing the MLFlow API: <my-api-url>/mlflow/<my-call> routes API calls to the created MLFlow instance
  • Registering Models: <my-gitlab-installation>/<my-project>/-/mlflow.mlflow; this requires users to have the proper authentication keys set up locally
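
On the client side, setting up authentication keys locally might look like the sketch below. MLFLOW_TRACKING_URI and MLFLOW_TRACKING_TOKEN are environment variables the MLFlow Python client reads for the server address and a bearer token. That a GitLab access token would be accepted this way, and the exact tracking URI to use, are assumptions; the URI is left as the placeholder used above.

```python
import os


def mlflow_client_env(tracking_uri: str, token: str) -> dict:
    """Return the environment variables the MLflow client reads for server and token."""
    if not token:
        raise ValueError("an access token is required")
    return {
        "MLFLOW_TRACKING_URI": tracking_uri.rstrip("/"),
        "MLFLOW_TRACKING_TOKEN": token,
    }


def log_example_run(env: dict) -> None:
    """Log one parameter and one metric; needs the mlflow package and a reachable server."""
    os.environ.update(env)
    import mlflow  # imported lazily so the helper above works without mlflow installed

    with mlflow.start_run():
        mlflow.log_param("learning_rate", 0.01)
        mlflow.log_metric("accuracy", 0.93)


if __name__ == "__main__":
    # Placeholder values: substitute the real API URL and a personal token.
    env = mlflow_client_env("<my-api-url>/mlflow", "<my-access-token>")
    print(env["MLFLOW_TRACKING_URI"])
```

Calling log_example_run(env) would then send the run to whatever server the tracking URI points at, with the token attached to each request.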

Questions that need answers

  1. What about other Model Registries?

While the first iteration will be done with MLFlow, if we find this is a feature users are looking for, we can add connections to other model registries down the road.

Some examples:

  2. How will we measure success?

Qualitatively through user engagement (issues created, bugs raised, social media, etc.) and quantitatively through Service Ping (number of installations and how often they are used). A threshold for success still needs to be decided.

  3. Shouldn't we start by connecting to existing installations of MLFlow and extracting value from those?

Perhaps. Installing is the initial step of the funnel, and it's time-consuming. But adding features that depend on an external API might not be that easy either.

Edited by Eduardo Bonet