
MLflow model registry integration breaking errors



Release notes

Summary: Improve the integration of GitLab's model registry with MLflow so that it aligns with MLflow's functionality and removes roadblocks for users of the MLflow API.
Proposed Solution: Update GitLab's model registry to stop enforcing semantic versioning, use artifact-based storage for model versions, and improve alias support for environment tracking (production, pre-prod, etc.). Currently, these differences prevent several native MLflow API functions from working, which makes it difficult to integrate GitLab's MLflow client support into applications. Programmatic access to registered models would be a significant productivity boost for ML teams.


Problem to solve

As a Data Scientist using MLflow's model registry, I encounter several challenges with GitLab's model registry that limit usability:

  1. Semantic versioning incompatibility: MLflow's pyfunc.load_model() and other basic API calls fail due to GitLab enforcing semantic versioning, which MLflow does not support.
  2. Artifact handling: GitLab stores artifacts in the package registry, which is inconsistent with MLflow's direct artifact-based storage for models.
  3. Alias limitations: GitLab's model registry does not allow me to use aliases (e.g., production, pre-prod) effectively, making it difficult to track deployment stages.
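
The mismatch in item 1 can be illustrated with a small sketch. MLflow assigns model versions as incrementing integers (1, 2, 3, ...), whereas a registry that enforces semantic versioning expects strings such as 1.2.0. The helper name `classify_version` and the simplified patterns are assumptions for illustration only; they are not GitLab's or MLflow's actual validation logic.

```python
import re

# MLflow-style version: a plain non-negative integer such as "3".
INTEGER_RE = re.compile(r"^[0-9]+$")

# Simplified MAJOR.MINOR.PATCH pattern with an optional pre-release suffix.
# Illustrative assumption: this is not the validator GitLab actually uses.
SEMVER_RE = re.compile(r"^[0-9]+\.[0-9]+\.[0-9]+(-[0-9A-Za-z.-]+)?$")


def classify_version(version: str) -> str:
    """Return which versioning scheme a version string looks like."""
    if INTEGER_RE.match(version):
        return "mlflow-integer"
    if SEMVER_RE.match(version):
        return "semver"
    return "other"


if __name__ == "__main__":
    for v in ["3", "1.2.0", "2.0.0-rc.1", "latest"]:
        print(f"{v!r}: {classify_version(v)}")
```

A client that sends integer versions and a registry that accepts only semantic versions will disagree on every string in the first category, which is the kind of failure described above.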

User story:
As a Data Scientist using MLflow with GitLab, I want seamless compatibility between GitLab's model registry and MLflow's native functionalities so I can manage models efficiently without workarounds.


Intended users

  • Sasha (Software Developer): Building and deploying machine learning solutions.
  • Parker (Product Manager): Managing ML workflows and deployments.
  • Priyanka (Platform Engineer): Maintaining infrastructure for ML deployments.

User experience goal

The user should be able to:

  1. Register and retrieve models using MLflow's API (mlflow.pyfunc.load_model()) without semantic versioning issues, with support for other key functions such as client.get_model_version_by_alias(model_name, "champion").
  2. Store and manage model artifacts directly in the model registry, aligning with MLflow's artifact structure.
  3. Use aliases like champion, challenger, etc., to tag models for specific deployment stages effectively.
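
The calls named in the list above could look like the following sketch. It assumes an MLflow release with model alias support (including the `models:/<name>@<alias>` URI form) and a client that is already pointed at the GitLab project, for example through the `MLFLOW_TRACKING_URI` and `MLFLOW_TRACKING_TOKEN` environment variables. The helper names `model_uri` and `load_by_alias` are hypothetical.

```python
from typing import Optional


def model_uri(name: str, version: Optional[int] = None, alias: Optional[str] = None) -> str:
    """Build a model registry URI in MLflow's `models:/` scheme."""
    if (version is None) == (alias is None):
        raise ValueError("pass exactly one of version or alias")
    if alias is not None:
        return f"models:/{name}@{alias}"
    return f"models:/{name}/{version}"


def load_by_alias(name: str, alias: str = "champion"):
    """Resolve an alias to a version, then load that model as a pyfunc."""
    # Imported lazily so the URI helper above works without MLflow installed.
    import mlflow.pyfunc
    from mlflow import MlflowClient

    client = MlflowClient()
    mv = client.get_model_version_by_alias(name, alias)
    print(f"{name}@{alias} -> version {mv.version}")
    return mlflow.pyfunc.load_model(model_uri(name, alias=alias))
```

With the proposed changes in place, `load_by_alias("my-model")` would be expected to behave against GitLab as it does against a standard MLflow registry; today the issue reports that such calls fail.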

Proposal

Functional Changes:

  1. Disable semantic versioning for model registry versions.
    Allow users to register models without forcing semantic versioning, ensuring compatibility with MLflow.

  2. Store artifacts as model artifacts.
    Shift storage from the package registry to the model registry so artifacts align with MLflow's structure.

  3. Enhance alias support.
    Introduce a feature to assign and manage aliases (production, pre-prod, etc.) for models to track deployment environments.

User Journey:

  • Users can interact with GitLab's model registry via the MLflow API without breaking functionality.
  • Models will be stored as artifacts and retrieved directly for seamless deployment.
  • Aliases will be managed in the GitLab UI/API for better deployment tracking.

Further details

Benefits:

  • Improved compatibility with MLflow's core functionality.
  • Reduced friction in workflows for ML model management.
  • Better tracking of models across deployment stages.

Risks:

  • Possible migration challenges for existing model registry users.

Permissions and Security

No changes to permissions are required for this update. Existing model registry permissions should apply consistently.


Documentation

  • Update documentation for the model registry to reflect changes in artifact storage, semantic versioning options, and alias support.
  • Include examples of MLflow integration with GitLab's updated model registry.

Availability & Testing

Testing:

  • Unit tests for MLflow integration points (e.g., mlflow.pyfunc.load_model()).
  • End-to-end tests for registering, retrieving, and aliasing models via GitLab's UI and API.
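
As a hedged sketch of the alias portion of such tests, the check below runs against any object exposing `set_registered_model_alias` and `get_model_version_by_alias`, the method names used by MLflow's `MlflowClient`. `FakeRegistryClient` and `check_alias_round_trip` are hypothetical names: the fake is an in-memory stand-in so the check can run without a server, and an end-to-end variant would pass a real client configured for a GitLab project instead.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class FakeModelVersion:
    name: str
    version: str


class FakeRegistryClient:
    """Hypothetical in-memory double mirroring two MlflowClient alias methods."""

    def __init__(self) -> None:
        self._aliases: Dict[Tuple[str, str], str] = {}

    def set_registered_model_alias(self, name: str, alias: str, version: str) -> None:
        # An alias points at exactly one version; reassigning moves it.
        self._aliases[(name, alias)] = str(version)

    def get_model_version_by_alias(self, name: str, alias: str) -> FakeModelVersion:
        return FakeModelVersion(name, self._aliases[(name, alias)])


def check_alias_round_trip(client, name: str, version: str, alias: str) -> None:
    """Assign an alias, then verify it resolves to the same version."""
    client.set_registered_model_alias(name, alias, version)
    resolved = client.get_model_version_by_alias(name, alias)
    assert str(resolved.version) == str(version), resolved


if __name__ == "__main__":
    fake = FakeRegistryClient()
    check_alias_round_trip(fake, "churn", "1", "champion")
    check_alias_round_trip(fake, "churn", "2", "champion")  # alias moves to version 2
    print(fake.get_model_version_by_alias("churn", "champion").version)
```

Keeping the check independent of the client implementation lets the same assertion serve both as a fast unit test and as an end-to-end test once a real registry is available.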

Risks:

  • Users may have legacy models stored with semantic versioning. Provide clear migration guidance.

Available Tier

This feature should be available on all tiers that currently include the model registry.


Feature Usage Metrics

Track the following metrics:

  • Number of models registered without semantic versioning.
  • Alias usage frequency (production, pre-prod, etc.).
  • API calls for artifact retrieval.

Success Criteria

Metrics:

  • Increase in the number of native MLflow API functions that work with the GitLab integration.

Outcomes:

  • User feedback indicating smoother workflows with MLflow and GitLab.
  • Reduced support tickets related to MLflow integration.

Links / references