
Incorporate Model Registry into DS Pipelines

We have investigated the Model Registry and, after thorough testing, we are not able to adopt it for our DS/ML pipelines for two main reasons:

  1. Incompatibility between Model Experiments and Model Registry
    • Currently, all our models in development and in production live in Model Experiments, and there is no way to import them into the Model Registry. We can manually download the model artifacts from Model Experiments and then manually upload them to the Model Registry, but that does not preserve the model's metadata (performance metrics, etc.)
  2. Inability to deploy models from the Model Registry
    • Once a model is in the Model Registry, there is little we can do with it, because there are no APIs that let us retrieve data from the Model Registry. That is, unlike Model Experiments, there is no `mlflow.artifacts.download_artifacts` command, and unlike generic packages, there is no `/projects/:id/packages/generic/:package_name/:package_version/:file_name` API to retrieve model artifacts and use them in a production pipeline.
    • There is one suggested workaround, but it is cumbersome, requires knowledge of the APIs, and is not compatible with our Python workflows. It involves first using the GitLab REST API to retrieve the model name and version number from the Model Registry, then the GraphQL API to retrieve the download path of each model artifact, and finally looping through the download URLs to download each artifact to the production environment (rough example below). Even if we could get the workaround to work, it is a lot of hoops for any data scientist to jump through when the easier solution is to just store the model directly in the repository. It would be great to see a seamless way (both in the UI and via a single API) to deploy a model.
## Determine Model Registry Package ID from Model Name and Model Version

```python
import os

import requests
from pandas import json_normalize

gitlab_api_token = os.getenv("MLFLOW_TRACKING_TOKEN")
project_id = 35142505
model_name = 'test'
model_version = '2.1.0'

# Get package id
url = f'https://gitlab.com/api/v4/projects/{project_id}/packages?package_name={model_name}&package_type=ml_model&package_version={model_version}'

headers = {'Authorization': f'Bearer {gitlab_api_token}'}
get_package_id = requests.get(url, headers=headers)
package_id = json_normalize(get_package_id.json())['id'][0]
print(package_id)
```

## Determine Download Path for Each Artifact

### CANNOT GET TO WORK IN PYTHON

```shell
curl "https://gitlab.com/api/graphql" --header "Authorization: Bearer GRAPHQL_TOKEN" \
     --header "Content-Type: application/json" --request POST \
     --data '{"query": "query getPackageFiles($id: PackagesPackageID!, $first: Int, $last: Int, $after: String, $before: String) {\n  package(id: $id) {\n    id\n    packageFiles(after: $after, before: $before, first: $first, last: $last) {\n      pageInfo {\n        ...PageInfo\n        __typename\n      }\n      nodes {\n        id\n        fileMd5\n        fileName\n        fileSha1\n        fileSha256\n        size\n        createdAt\n        downloadPath\n        __typename\n      }\n      __typename\n    }\n    __typename\n  }\n}\n\nfragment PageInfo on PageInfo {\n  hasNextPage\n  hasPreviousPage\n  startCursor\n  endCursor\n  __typename\n}\n", "variables": {"first": 20, "id": "gid://gitlab/Packages::Package/26314133"}}'
```
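For reference, here is a rough Python sketch of the same GraphQL call. We have not been able to verify this against the live API, so treat it as an untested approximation; `build_payload` and `extract_download_paths` are helper names we made up, and `gitlab_api_token` / `package_id` are assumed to come from the previous step.

```python
import requests

GRAPHQL_URL = "https://gitlab.com/api/graphql"

# Same query as the curl example above, trimmed of the UI-only
# __typename / pagination fields we do not need here.
PACKAGE_FILES_QUERY = """
query getPackageFiles($id: PackagesPackageID!, $first: Int) {
  package(id: $id) {
    id
    packageFiles(first: $first) {
      nodes {
        id
        fileName
        downloadPath
      }
    }
  }
}
"""


def build_payload(package_id, first=20):
    """Build the GraphQL request body for a numeric package id."""
    return {
        "query": PACKAGE_FILES_QUERY,
        "variables": {
            "first": first,
            # GraphQL wants a global id, not the bare numeric id
            "id": f"gid://gitlab/Packages::Package/{package_id}",
        },
    }


def extract_download_paths(response_json):
    """Pull the downloadPath of every artifact out of the GraphQL response."""
    nodes = response_json["data"]["package"]["packageFiles"]["nodes"]
    return [node["downloadPath"] for node in nodes]


# Untested against the live API:
# resp = requests.post(
#     GRAPHQL_URL,
#     headers={"Authorization": f"Bearer {gitlab_api_token}"},
#     json=build_payload(package_id),
# )
# paths = extract_download_paths(resp.json())
```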

## Loop and download each model artifact

e.g. www.gitlab.com/gitlab-data/data-science-projects/propensity-to-purchase/-/package_files/132885863/download, www.gitlab.com/gitlab-data/data-science-projects/propensity-to-purchase/-/package_files/132885851/download
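If the GraphQL step worked, the final loop could look roughly like the sketch below. This is hypothetical: `artifact_urls` and `download_artifacts` are names we invented, and we assume `downloadPath` values are relative to the GitLab host and that the same bearer token works for artifact downloads.

```python
import os

import requests


def artifact_urls(files, base_url="https://gitlab.com"):
    """Map (file_name, download_path) pairs to (file_name, absolute URL) pairs.

    downloadPath in the GraphQL response is relative to the GitLab host,
    so the base URL has to be prepended before requesting it.
    """
    return [(name, base_url + path) for name, path in files]


def download_artifacts(files, token, base_url="https://gitlab.com", dest_dir="."):
    """Download each model artifact into dest_dir; return the local paths."""
    local_paths = []
    for name, url in artifact_urls(files, base_url):
        resp = requests.get(url, headers={"Authorization": f"Bearer {token}"})
        resp.raise_for_status()
        local_path = os.path.join(dest_dir, name)
        with open(local_path, "wb") as f:
            f.write(resp.content)
        local_paths.append(local_path)
    return local_paths
```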
Edited by Kevin Dietz