[Incubating Feature] Machine Learning Model Experiments
[[_TOC_]]

DRI: @eduardobonet

This Epic serves as the Single Source of Truth for updates, evolution, and feedback on the incubating Machine Learning Experiment Tracking feature.

## [Job To Be Done](https://about.gitlab.com/handbook/engineering/incubation/mlops/jtbd.html)

When creating machine learning models, I want to compare the outcomes of potential hyperparameter variations, so that I can choose the best candidate (IE_MLOPS_CR_3).

## Latest Updates

- [2022/10/10] Using the MLFlow Client to log experiments to GitLab - https://www.youtube.com/watch?v=baUIOexfcmA

## Context

### What is Machine Learning Experiment Tracking?

When working on machine learning, it is common for Data Scientists to train the same model using different configurations. These configurations can take many forms, ranging from a single hyperparameter to the learning algorithm or the data used for training, all of which can significantly impact the final performance.

In this context, we call each trial a Candidate, and an Experiment a collection of comparable Candidates. A Candidate can eventually be promoted to a model to be released, or the whole experiment can happen purely for exploration.

The current open source alternative for this is [MLFlow](https://mlflow.org/), by Databricks. While it does the job of tracking different trials well, it fails to address some common requirements, such as multi-user support with permissions.

https://www.youtube.com/watch?v=V4hos3VFeC4

## Why

### What do Data Scientists gain from this?

- **Ready-to-use Tracking Server**: no support needed from Platform Engineers
- **Minimal changes in their code**: we intend to provide full support for MLFlow's client
- **Surface the information across the platform**: for example, an MR can be an experiment, and we can display the different parameter results for that MR

### What do Platform Engineers gain from this?
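To illustrate the "minimal changes" point, here is a sketch of what pointing the MLFlow client at a GitLab project could look like. The host, project ID, and token values are placeholders, and the exact endpoint path is an assumption of this sketch, not a finalized API:

```python
import os

# Hypothetical configuration: replace the host, <project_id>, and the
# token with values for your own instance and project. The MLFlow client
# reads these environment variables, so existing training code can stay
# unchanged.
os.environ["MLFLOW_TRACKING_URI"] = (
    "https://gitlab.example.com/api/v4/projects/<project_id>/ml/mlflow"
)
os.environ["MLFLOW_TRACKING_TOKEN"] = "<personal-access-token>"

# From here on, standard MLFlow client calls would log to GitLab, e.g.:
#   import mlflow
#   with mlflow.start_run():
#       mlflow.log_param("learning_rate", 0.01)
#       mlflow.log_metric("accuracy", 0.93)
```

Each such run would map to a Candidate in the terminology above, grouped with comparable runs into an Experiment.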
- **Multi-user support**: One of the worst drawbacks of MLFlow for Platform Engineers is that it doesn't provide authentication in any form; it needs to be implemented manually. By routing through a GitLab project, we get user management for free. We might want to add a specific permission level for this later, but for now project membership is sufficient.
- **Object Storage**: We already provide storage management for Docker containers, library registries, and others; we can do the same for experiment data.

## MVP

- https://gitlab.com/groups/gitlab-org/-/epics/9342+

## MRs

Under Review:

- https://gitlab.com/gitlab-org/gitlab/-/merge_requests/101442+
- https://gitlab.com/gitlab-org/gitlab/-/merge_requests/104166+

Merged:

- https://gitlab.com/gitlab-org/gitlab/-/merge_requests/95168+
- https://gitlab.com/gitlab-org/gitlab/-/merge_requests/95689+
- https://gitlab.com/gitlab-org/gitlab/-/merge_requests/97003+
- https://gitlab.com/gitlab-org/gitlab/-/merge_requests/97394+
- https://gitlab.com/gitlab-org/gitlab/-/merge_requests/97815+
- https://gitlab.com/gitlab-org/gitlab/-/merge_requests/98106+
- https://gitlab.com/gitlab-org/gitlab/-/merge_requests/101251+
- https://gitlab.com/gitlab-org/gitlab/-/merge_requests/98664+
- https://gitlab.com/gitlab-org/gitlab/-/merge_requests/103451+
- https://gitlab.com/gitlab-org/gitlab/-/merge_requests/102903+
- https://gitlab.com/gitlab-org/gitlab/-/merge_requests/104267+

## Questions to answer

1. How will we measure success? Qualitatively, through user engagement (issues created, bugs raised, social media, etc.) and Service Ping (number of experiments created). A threshold for success still needs to be decided.

## Discarded Options

1. MLFlow is MIT licensed, so we could package and deploy it alongside GitLab.
   - Advantages:
     - Not needing to code our own tracking feature
     - The server stays in sync with future client releases
     - Easier to integrate with existing MLFlow installations
   - Disadvantages:
     - Databricks might decide it's not MIT licensed anymore
     - Integrating with the rest of the application becomes harder
     - The tracking service itself is very simple; packaging it as a separate service is considerably more complex
     - By using RoR instead, we can rely on the experience of our devs