
Add MLflow REST API with 3 endpoints

Eduardo Bonet requested to merge 370478-add-mlflow-compatibility into master

What does this MR do and why?

We are incubating ML Experiment Tracking as a GitLab feature and, to ease adoption, we aim to provide API parity with the MLflow API (#370478 (closed)). This MR adds the first three endpoints: create an experiment, get an experiment by ID, and get an experiment by name.

How to set up and validate locally

  1. Create a project and a personal access token:

    export PROJECT_ID=<Your Project Id>
    export GITLAB_PAT=<your api token>
  2. Create an Experiment:

    curl -X POST -H "Authorization: Bearer $GITLAB_PAT" -d name=my_cool_experiment http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mflow/api/2.0/mlflow/experiments/create
  3. This should return 404, because the ml_experiment_tracking feature flag is off.

  4. Enable the feature flag:

    echo "Feature.enable(:ml_experiment_tracking)" | bundle exec rails c
  5. Create the experiment again; this time it should return {"experiment_id":"1"}:

    curl -X POST -H "Authorization: Bearer $GITLAB_PAT" -d name=my_cool_experiment http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mflow/api/2.0/mlflow/experiments/create
    {"experiment_id":"1"} 
  6. Confirm that the experiment was created:

    echo "::Ml::Experiment.all" | bundle exec rails c
  7. Query by name (the URL is quoted so the shell does not treat ? as a glob):

    curl -X GET -H "Authorization: Bearer $GITLAB_PAT" "http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mflow/api/2.0/mlflow/experiments/get-by-name?experiment_name=my_cool_experiment"
    {"experiment":{"experiment_id":"1","name":"my_cool_experiment","lifecycle_stage":"active","artifact_location":"not_implemented"}}
  8. Query by experiment_id:

    curl -X GET -H "Authorization: Bearer $GITLAB_PAT" "http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mflow/api/2.0/mlflow/experiments/get?experiment_id=1"
    {"experiment":{"experiment_id":"1","name":"my_cool_experiment","lifecycle_stage":"active","artifact_location":"not_implemented"}}
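For scripting the validation steps above, the same three calls can be prepared in plain Ruby with the standard library's Net::HTTP. This is a minimal sketch, assuming the gdk.test:3000 host and the ml/mflow route prefix used in the curl commands; the helper names (mlflow_base, create_experiment_request, get_by_name_request, get_by_id_request, send_request) are hypothetical and not part of this MR.

```ruby
require "net/http"
require "uri"

# Hypothetical helpers mirroring the curl commands above; not part of the MR.
# Base path of the MLflow-compatible routes, as used in the curl examples.
def mlflow_base(host, project_id)
  "#{host}/api/v4/projects/#{project_id}/ml/mflow/api/2.0/mlflow"
end

# POST experiments/create with a form-encoded name, like `curl -d name=...`.
def create_experiment_request(base, token, name)
  req = Net::HTTP::Post.new(URI("#{base}/experiments/create"))
  req["Authorization"] = "Bearer #{token}"
  req.set_form_data("name" => name)
  req
end

# GET experiments/get-by-name?experiment_name=...
def get_by_name_request(base, token, name)
  uri = URI("#{base}/experiments/get-by-name")
  uri.query = URI.encode_www_form(experiment_name: name)
  req = Net::HTTP::Get.new(uri)
  req["Authorization"] = "Bearer #{token}"
  req
end

# GET experiments/get?experiment_id=...
def get_by_id_request(base, token, id)
  uri = URI("#{base}/experiments/get")
  uri.query = URI.encode_www_form(experiment_id: id)
  req = Net::HTTP::Get.new(uri)
  req["Authorization"] = "Bearer #{token}"
  req
end

# Sends a prepared request to the host it was built for; returns [status_code, body].
def send_request(req)
  uri = req.uri
  res = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(req)
  end
  [res.code.to_i, res.body]
end
```

Usage against a local GDK would look like base = mlflow_base("http://gdk.test:3000", ENV.fetch("PROJECT_ID")) followed by send_request(create_experiment_request(base, ENV.fetch("GITLAB_PAT"), "my_cool_experiment")). Building the requests separately from sending them keeps the URL and form encoding checkable without a running server.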

DB Migration

Up

rails db:migrate:up:main VERSION=20220818132108

main: == 20220818132108 AddDeletedOnToMlExperiments: migrating ======================
main: -- add_column(:ml_experiments, :deleted_on, :datetime_with_timezone, {:index=>true})
main:    -> 0.0073s
main: == 20220818132108 AddDeletedOnToMlExperiments: migrated (0.0087s) =============

Down

rails db:migrate:down:main VERSION=20220818132108

main: == 20220818132108 AddDeletedOnToMlExperiments: reverting ======================
main: -- remove_column(:ml_experiments, :deleted_on, :datetime_with_timezone, {:index=>true})
main:    -> 0.0061s
main: == 20220818132108 AddDeletedOnToMlExperiments: reverted (0.0138s) =============
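The migration adds a nullable deleted_on timestamp to ml_experiments. MLflow reports an experiment's lifecycle_stage as either "active" or "deleted", and the GitLab responses below always show "active". One plausible way to map the new column onto that field is to treat a nil deleted_on as active and to soft-delete by stamping the column. The plain-Ruby sketch below illustrates that mapping; it is an assumption, not behaviour this MR documents, and Experiment here is a stand-in Struct rather than the ActiveRecord model.

```ruby
require "time"

# Plain-Ruby stand-in for an ml_experiments row; not the real ActiveRecord model.
Experiment = Struct.new(:iid, :name, :deleted_on, keyword_init: true)

# ASSUMPTION: deleted_on drives MLflow's lifecycle_stage
# (nil => "active", any timestamp => "deleted").
def lifecycle_stage(experiment)
  experiment.deleted_on.nil? ? "active" : "deleted"
end

# Soft delete: stamp deleted_on instead of removing the row.
# ||= keeps the original timestamp if the experiment was already deleted.
def soft_delete(experiment, now = Time.now.utc)
  experiment.deleted_on ||= now
  experiment
end
```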

DB Queries

Ml::Experiment.by_project_id_and_iid

Ml::Experiment.by_project_id_and_iid(1, 1)
  Ml::Experiment Load (12.8ms)  SELECT "ml_experiments".* FROM "ml_experiments" WHERE "ml_experiments"."project_id" = 1 AND "ml_experiments"."iid" = 1 LIMIT 1 /*application:console,db_config_name:main,line:/app/models/ml/experiment.rb:21:in `by_project_id_and_iid'*/

Plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/11796/commands/41821

Ml::Experiment.by_project_id_and_name

Ml::Experiment.by_project_id_and_name(1, "abc")
  Ml::Experiment Load (6.0ms)  SELECT "ml_experiments".* FROM "ml_experiments" WHERE "ml_experiments"."project_id" = 1 AND "ml_experiments"."name" = 'abc' LIMIT 1

Plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/11796/commands/41819

Ml::Experiment.has_record?

::Ml::Experiment.has_record?(29, 'my_cool_experiment')
  Ml::Experiment Exists? (1.5ms)  SELECT 1 AS one FROM "ml_experiments" WHERE "ml_experiments"."project_id" = 29 AND "ml_experiments"."name" = 'my_cool_experiment' LIMIT 1 /*application:console,db_config_name:main,line:/app/models/ml/experiment.rb:29:in `has_record?'*/
=> true

Plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/11796/commands/41820
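The three queries above each filter on project_id plus one other column and stop at the first match (LIMIT 1). Their semantics can be sketched over an in-memory list as below. This only illustrates what each lookup returns; it is not the ActiveRecord code in app/models/ml/experiment.rb, and Row is a hypothetical stand-in for a table row.

```ruby
# In-memory stand-in for ml_experiments rows (illustration only).
Row = Struct.new(:project_id, :iid, :name, keyword_init: true)

# Like: WHERE project_id = ? AND iid = ? LIMIT 1 (nil when nothing matches).
def by_project_id_and_iid(rows, project_id, iid)
  rows.find { |r| r.project_id == project_id && r.iid == iid }
end

# Like: WHERE project_id = ? AND name = ? LIMIT 1
def by_project_id_and_name(rows, project_id, name)
  rows.find { |r| r.project_id == project_id && r.name == name }
end

# Like: SELECT 1 AS one ... LIMIT 1, reduced to true/false.
def has_record?(rows, project_id, name)
  rows.any? { |r| r.project_id == project_id && r.name == name }
end
```

Because every lookup is scoped by project_id, the same iid or name can exist in different projects without colliding, which the usage below relies on.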

Differences between the APIs

The responses below were generated with the script at https://gitlab.com/gitlab-org/incubation-engineering/mlops/mlflow_experiment/-/blob/main/api_parity.py
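Each GitLab/MLflow pair below is easiest to read by comparing the status codes and the top-level keys of the two response bodies. A minimal Ruby sketch of that comparison follows; it is not the linked api_parity.py script, and compare_responses is a hypothetical name.

```ruby
# Compare a GitLab response with an MLflow response for the same request.
# Each response is a Hash with :status (Integer) and :body (Hash).
def compare_responses(gitlab, mlflow)
  g_keys = gitlab[:body].keys
  m_keys = mlflow[:body].keys
  {
    status_matches: gitlab[:status] == mlflow[:status],
    missing_in_gitlab: m_keys - g_keys, # keys only MLflow returns
    extra_in_gitlab: g_keys - m_keys    # keys only GitLab returns
  }
end
```

Applied to the first pair it flags only the status difference (201 from GitLab, likely Grape's default for POST routes, versus 200 from MLflow); applied to "When name already exists" it reports that GitLab omits the message key.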

Create Experiment

When name does not yet exist

GITLAB

POST -- http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/experiments/create
Params: {}
Body: {'name': 'aa2da357-282b-4a81-a39e-322f0eb70af9'}
Status Code: 201
Response Body: {'experiment_id': '12'}

MLFLOW

POST -- http://127.0.0.1:5000/api/2.0/mlflow/experiments/create
Params: {}
Body: {'name': 'aa2da357-282b-4a81-a39e-322f0eb70af9'}
Status Code: 200
Response Body: {'experiment_id': '21'}

When name already exists

GITLAB

POST -- http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/experiments/create
Params: {}
Body: {'name': 'aa2da357-282b-4a81-a39e-322f0eb70af9'}
Status Code: 400
Response Body: {'error_code': 'RESOURCE_ALREADY_EXISTS'}

MLFLOW

POST -- http://127.0.0.1:5000/api/2.0/mlflow/experiments/create
Params: {}
Body: {'name': 'aa2da357-282b-4a81-a39e-322f0eb70af9'}
Status Code: 400
Response Body: {'error_code': 'RESOURCE_ALREADY_EXISTS', 'message': "Experiment(name=aa2da357-282b-4a81-a39e-322f0eb70af9) already exists. Error: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)\n(sqlite3.IntegrityError) UNIQUE constraint failed: experiments.name\n[SQL: INSERT INTO experiments (name, artifact_location, lifecycle_stage) VALUES (?, ?, ?)]\n[parameters: ('aa2da357-282b-4a81-a39e-322f0eb70af9', '', 'active')]\n(Background on this error at: https://sqlalche.me/e/14/gkpj)"}
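Both servers return RESOURCE_ALREADY_EXISTS with status 400, but MLflow also includes a human-readable message. If closer parity is wanted, an error helper could always attach one. This is a hypothetical sketch: the status codes in ERROR_STATUS are the ones observed in the responses in this MR, and mlflow_error and experiment_already_exists are illustrative names.

```ruby
# HTTP statuses observed for each MLflow error code in the responses in this MR.
ERROR_STATUS = {
  "RESOURCE_ALREADY_EXISTS" => 400,
  "INVALID_PARAMETER_VALUE" => 400,
  "RESOURCE_DOES_NOT_EXIST" => 404
}.freeze

# Returns [status, body] in MLflow's error shape: error_code plus message.
def mlflow_error(error_code, message)
  status = ERROR_STATUS.fetch(error_code) # raises KeyError for unknown codes
  [status, { "error_code" => error_code, "message" => message }]
end

# Reuses the first sentence of MLflow's message, without the database detail it appends.
def experiment_already_exists(name)
  mlflow_error("RESOURCE_ALREADY_EXISTS", "Experiment(name=#{name}) already exists.")
end
```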

When name is missing

GITLAB

POST -- http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/experiments/create
Params: {}
Body: {'other_key': 'aa2da357-282b-4a81-a39e-322f0eb70af9'}
Status Code: 400
Response Body: {'error': 'name is missing'}

MLFLOW

POST -- http://127.0.0.1:5000/api/2.0/mlflow/experiments/create
Params: {}
Body: {'other_key': 'aa2da357-282b-4a81-a39e-322f0eb70af9'}
Status Code: 400
Response Body: {'error_code': 'INVALID_PARAMETER_VALUE', 'message': "Missing value for required parameter 'name'. See the API docs for more information about request parameters."}
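Here the response shapes diverge: GitLab returns a generic error string (apparently the format produced by Grape's parameter validation), while MLflow returns INVALID_PARAMETER_VALUE with a message. The sketch below validates name in MLflow's format instead, assuming the request parameters arrive as a plain Hash; validate_create_params is hypothetical, and the message text is copied from the MLflow response above.

```ruby
# Returns nil when `name` is present, otherwise [status, body] in MLflow's error shape.
def validate_create_params(params)
  return nil unless params["name"].nil?

  [400, {
    "error_code" => "INVALID_PARAMETER_VALUE",
    "message" => "Missing value for required parameter 'name'. " \
                 "See the API docs for more information about request parameters."
  }]
end
```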

Get By Id

When id exists

GITLAB

GET -- http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/experiments/get
Params: {'experiment_id': '12'}
Body: {}
Status Code: 200
Response Body: {'experiment': {'experiment_id': '12', 'name': 'aa2da357-282b-4a81-a39e-322f0eb70af9', 'lifecycle_stage': 'active', 'artifact_location': 'not_implemented'}}

MLFLOW

GET -- http://127.0.0.1:5000/api/2.0/mlflow/experiments/get
Params: {'experiment_id': '21'}
Body: {}
Status Code: 200
Response Body: {'experiment': {'experiment_id': '21', 'name': 'aa2da357-282b-4a81-a39e-322f0eb70af9', 'artifact_location': './mlruns2/21', 'lifecycle_stage': 'active'}}
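When the experiment exists, both bodies carry the same four fields; the differences are key order (immaterial in JSON) and artifact_location, which GitLab does not implement yet. A sketch of serializing an experiment into that shape follows; experiment_json is an illustrative name, and the not_implemented placeholder is copied from the GitLab response.

```ruby
require "json"

# Serialize an experiment into MLflow's `experiment` object.
# experiment_id is a String in both APIs' responses, even though it is numeric.
def experiment_json(id:, name:, lifecycle_stage: "active")
  {
    "experiment" => {
      "experiment_id" => id.to_s,
      "name" => name,
      "lifecycle_stage" => lifecycle_stage,
      "artifact_location" => "not_implemented" # placeholder, as in the GitLab response
    }
  }.to_json
end
```

For id 1 and name my_cool_experiment this produces the same JSON string as the curl output in step 8 of the setup instructions.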

When id does not exist

GITLAB

GET -- http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/experiments/get
Params: {'experiment_id': 'asasdfsadf'}
Body: {}
Status Code: 404
Response Body: {'error_code': 'RESOURCE_DOES_NOT_EXIST'}

MLFLOW

GET -- http://127.0.0.1:5000/api/2.0/mlflow/experiments/get
Params: {'experiment_id': 'asasdfsadf'}
Body: {}
Status Code: 404
Response Body: {'error_code': 'RESOURCE_DOES_NOT_EXIST', 'message': 'No Experiment with id=asasdfsadf exists'}

When id is missing

GITLAB

GET -- http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/experiments/get
Params: {'yolo': '12'}
Body: {}
Status Code: 404
Response Body: {'error_code': 'RESOURCE_DOES_NOT_EXIST'}

MLFLOW

GET -- http://127.0.0.1:5000/api/2.0/mlflow/experiments/get
Params: {'yolo': '12'}
Body: {}
Status Code: 200
Response Body: {'experiment': {'experiment_id': '0', 'name': 'Default', 'artifact_location': './mlruns2/0', 'lifecycle_stage': 'active'}}
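This is the largest behavioural gap in this section: with no experiment_id, MLflow (in the run captured above) answered 200 with its Default experiment (id 0), whereas GitLab returns 404 both for a missing and for an unknown id. The GitLab-side behaviour observed here can be sketched as a lookup over a Hash keyed by the string id; get_experiment and the store are hypothetical stand-ins, not the Grape endpoint.

```ruby
# Hypothetical store: experiment_id (String) => experiment name.
# Returns [status, body] matching the GitLab responses captured above.
def get_experiment(store, params)
  id = params["experiment_id"] # nil when the parameter is absent
  name = id && store[id]
  return [404, { "error_code" => "RESOURCE_DOES_NOT_EXIST" }] if name.nil?

  [200, { "experiment" => { "experiment_id" => id, "name" => name,
                            "lifecycle_stage" => "active",
                            "artifact_location" => "not_implemented" } }]
end
```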

Get By Name

When name exists

GITLAB

GET -- http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/experiments/get-by-name
Params: {'experiment_name': 'aa2da357-282b-4a81-a39e-322f0eb70af9'}
Body: {}
Status Code: 200
Response Body: {'experiment': {'experiment_id': '12', 'name': 'aa2da357-282b-4a81-a39e-322f0eb70af9', 'lifecycle_stage': 'active', 'artifact_location': 'not_implemented'}}

MLFLOW

GET -- http://127.0.0.1:5000/api/2.0/mlflow/experiments/get-by-name
Params: {'experiment_name': 'aa2da357-282b-4a81-a39e-322f0eb70af9'}
Body: {}
Status Code: 200
Response Body: {'experiment': {'experiment_id': '21', 'name': 'aa2da357-282b-4a81-a39e-322f0eb70af9', 'artifact_location': './mlruns2/21', 'lifecycle_stage': 'active'}}

When name does not exist

GITLAB

GET -- http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/experiments/get-by-name
Params: {'experiment_name': 'abcde'}
Body: {}
Status Code: 404
Response Body: {'error_code': 'RESOURCE_DOES_NOT_EXIST'}

MLFLOW

GET -- http://127.0.0.1:5000/api/2.0/mlflow/experiments/get-by-name
Params: {'experiment_name': 'abcde'}
Body: {}
Status Code: 404
Response Body: {'error_code': 'RESOURCE_DOES_NOT_EXIST', 'message': "Could not find experiment with name 'abcde'"}

When name is missing

GITLAB

GET -- http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/experiments/get-by-name
Params: {'yolo': 'aa2da357-282b-4a81-a39e-322f0eb70af9'}
Body: {}
Status Code: 404
Response Body: {'error_code': 'RESOURCE_DOES_NOT_EXIST'}

MLFLOW

GET -- http://127.0.0.1:5000/api/2.0/mlflow/experiments/get-by-name
Params: {'yolo': 'aa2da357-282b-4a81-a39e-322f0eb70af9'}
Body: {}
Status Code: 404
Response Body: {'error_code': 'RESOURCE_DOES_NOT_EXIST', 'message': "Could not find experiment with name ''"}
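For get-by-name, both servers agree on 404 and RESOURCE_DOES_NOT_EXIST; only MLflow adds a message, and when the parameter is absent it appears to treat the name as an empty string (name ''). A sketch reproducing MLflow's message text on the GitLab side, under that reading; get_by_name_error is a hypothetical helper.

```ruby
# Builds MLflow's not-found body for get-by-name.
# A missing experiment_name is rendered as '' to match MLflow's observed message.
def get_by_name_error(params)
  name = params.fetch("experiment_name", "")
  [404, {
    "error_code" => "RESOURCE_DOES_NOT_EXIST",
    "message" => "Could not find experiment with name '#{name}'"
  }]
end
```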

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #370478 (closed)

Edited by Eduardo Bonet
