Add MLflow REST API with 3 endpoints
- What does this MR do and why?
- How to set up and validate locally
- DB Migration
- DB Queries
- Difference between APIs
- MR acceptance checklist
What does this MR do and why?
We are incubating ML Experiment Tracking as a GitLab feature. To ease users in, we aim to provide API parity with the MLflow API (#370478 (closed)). This MR adds the first three endpoints: create an experiment, get an experiment by ID, and get an experiment by name.
How to set up and validate locally
- Create a project and a personal access token:
export PROJECT_ID=<Your Project Id>
export GITLAB_PAT=<your api token>
- Create an experiment:
curl -X POST -H "Authorization: Bearer $GITLAB_PAT" -d name=my_cool_experiment http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mflow/api/2.0/mlflow/experiments/create
- This should return a 404 because the feature flag is off.
- Enable the feature flag:
echo "Feature.enable(:ml_experiment_tracking)" | bundle exec rails c
- Create the experiment again; it should now output {"experiment_id":"1"}:
curl -X POST -H "Authorization: Bearer $GITLAB_PAT" -d name=my_cool_experiment http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mflow/api/2.0/mlflow/experiments/create
{"experiment_id":"1"}
- You should now be able to see the created experiment:
echo "::Ml::Experiment.all" | bundle exec rails c
- Query by name:
curl -X GET -H "Authorization: Bearer $GITLAB_PAT" "http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mflow/api/2.0/mlflow/experiments/get-by-name?experiment_name=my_cool_experiment"
{"experiment":{"experiment_id":"1","name":"my_cool_experiment","lifecycle_stage":"active","artifact_location":"not_implemented"}}
- Query by experiment_id:
curl -X GET -H "Authorization: Bearer $GITLAB_PAT" "http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mflow/api/2.0/mlflow/experiments/get?experiment_id=1"
{"experiment":{"experiment_id":"1","name":"my_cool_experiment","lifecycle_stage":"active","artifact_location":"not_implemented"}}
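The manual steps above can also be scripted. The following is a minimal Python sketch, assuming the same gdk.test:3000 host and route prefix as the curl commands; the helper names (base_url, build_request) are hypothetical and not part of this MR, and project ID 29 is only a sample value. It only builds the requests; pass one to urllib.request.urlopen to actually send it.

```python
import urllib.parse
import urllib.request

# Host copied from the curl commands above; adjust for your own GDK.
GDK_HOST = "http://gdk.test:3000"


def base_url(project_id):
    """Return the experiments API prefix for a project (hypothetical helper)."""
    return f"{GDK_HOST}/api/v4/projects/{project_id}/ml/mflow/api/2.0/mlflow/experiments"


def build_request(project_id, token, endpoint, params=None, data=None):
    """Build a GET request (query string) or, when data is given, a POST request (form body)."""
    url = f"{base_url(project_id)}/{endpoint}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    body = urllib.parse.urlencode(data).encode() if data is not None else None
    method = "POST" if data is not None else "GET"
    request = urllib.request.Request(url, data=body, method=method)
    request.add_header("Authorization", f"Bearer {token}")
    return request


# Sample values; replace with your own PROJECT_ID and GITLAB_PAT.
token = "<your api token>"
create = build_request(29, token, "create", data={"name": "my_cool_experiment"})
by_name = build_request(29, token, "get-by-name", params={"experiment_name": "my_cool_experiment"})
by_id = build_request(29, token, "get", params={"experiment_id": 1})
print(create.get_method(), create.full_url)
print(by_name.get_method(), by_name.full_url)
print(by_id.get_method(), by_id.full_url)
```

Building the query string with urlencode also sidesteps shell quoting issues around the ? character in the URL.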
DB Migration
Up
rails db:migrate:up:main VERSION=20220818132108
main: == 20220818132108 AddDeletedOnToMlExperiments: migrating ======================
main: -- add_column(:ml_experiments, :deleted_on, :datetime_with_timezone, {:index=>true})
main: -> 0.0073s
main: == 20220818132108 AddDeletedOnToMlExperiments: migrated (0.0087s) =============
Down
rails db:migrate:down:main VERSION=20220818132108
main: == 20220818132108 AddDeletedOnToMlExperiments: reverting ======================
main: -- remove_column(:ml_experiments, :deleted_on, :datetime_with_timezone, {:index=>true})
main: -> 0.0061s
main: == 20220818132108 AddDeletedOnToMlExperiments: reverted (0.0138s) =============
DB Queries
Ml::Experiment.by_project_id_and_iid
Ml::Experiment.by_project_id_and_iid(1, 1)
Ml::Experiment Load (12.8ms) SELECT "ml_experiments".* FROM "ml_experiments" WHERE "ml_experiments"."project_id" = 1 AND "ml_experiments"."iid" = 1 LIMIT 1 /*application:console,db_config_name:main,line:/app/models/ml/experiment.rb:21:in `by_project_id_and_iid'*/
Plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/11796/commands/41821
Ml::Experiment.by_project_id_and_name
Ml::Experiment.by_project_id_and_name(1, "abc")
Ml::Experiment Load (6.0ms) SELECT "ml_experiments".* FROM "ml_experiments" WHERE "ml_experiments"."project_id" = 1 AND "ml_experiments"."name" = 'abc' LIMIT 1
Plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/11796/commands/41819
Ml::Experiment.has_record?
::Ml::Experiment.has_record?(29, 'my_cool_experiment')
Ml::Experiment Exists? (1.5ms) SELECT 1 AS one FROM "ml_experiments" WHERE "ml_experiments"."project_id" = 29 AND "ml_experiments"."name" = 'my_cool_experiment' LIMIT 1 /*application:console,db_config_name:main,line:/app/models/ml/experiment.rb:29:in `has_record?'*/
=> true
Plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/11796/commands/41820
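To make the three lookups concrete outside a Rails console, here is a rough Python sketch that runs equivalent SQL against an in-memory SQLite table. The table is a simplified stand-in (only id, project_id, iid, and name; the real ml_experiments schema has more columns), and the function names merely mirror the methods above. None of this is the MR's Ruby implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for ml_experiments; the real table has more columns.
conn.execute(
    "CREATE TABLE ml_experiments ("
    "id INTEGER PRIMARY KEY, project_id INTEGER, iid INTEGER, name TEXT)"
)
conn.execute(
    "INSERT INTO ml_experiments (project_id, iid, name) VALUES (29, 1, 'my_cool_experiment')"
)


def by_project_id_and_iid(project_id, iid):
    """Return the first row matching project_id and iid, or None."""
    return conn.execute(
        "SELECT * FROM ml_experiments WHERE project_id = ? AND iid = ? LIMIT 1",
        (project_id, iid),
    ).fetchone()


def by_project_id_and_name(project_id, name):
    """Return the first row matching project_id and name, or None."""
    return conn.execute(
        "SELECT * FROM ml_experiments WHERE project_id = ? AND name = ? LIMIT 1",
        (project_id, name),
    ).fetchone()


def has_record(project_id, name):
    """Existence check: select a constant instead of loading the row."""
    row = conn.execute(
        "SELECT 1 AS one FROM ml_experiments WHERE project_id = ? AND name = ? LIMIT 1",
        (project_id, name),
    ).fetchone()
    return row is not None


print(by_project_id_and_iid(29, 1))
print(by_project_id_and_name(1, "abc"))
print(has_record(29, "my_cool_experiment"))
```

All three queries filter on project_id first and stop at the first match, which is why each ends in LIMIT 1.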
Difference between APIs
The responses below were generated with the script at https://gitlab.com/gitlab-org/incubation-engineering/mlops/mlflow_experiment/-/blob/main/api_parity.py
Create Experiment
When name does not yet exist
GITLAB
POST -- http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/experiments/create
Params: {}
Body: {'name': 'aa2da357-282b-4a81-a39e-322f0eb70af9'}
Status Code: 201
Response Body: {'experiment_id': '12'}
MLFLOW
POST -- http://127.0.0.1:5000/api/2.0/mlflow/experiments/create
Params: {}
Body: {'name': 'aa2da357-282b-4a81-a39e-322f0eb70af9'}
Status Code: 200
Response Body: {'experiment_id': '21'}
When name already exists
GITLAB
POST -- http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/experiments/create
Params: {}
Body: {'name': 'aa2da357-282b-4a81-a39e-322f0eb70af9'}
Status Code: 400
Response Body: {'error_code': 'RESOURCE_ALREADY_EXISTS'}
MLFLOW
POST -- http://127.0.0.1:5000/api/2.0/mlflow/experiments/create
Params: {}
Body: {'name': 'aa2da357-282b-4a81-a39e-322f0eb70af9'}
Status Code: 400
Response Body: {'error_code': 'RESOURCE_ALREADY_EXISTS', 'message': "Experiment(name=aa2da357-282b-4a81-a39e-322f0eb70af9) already exists. Error: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)\n(sqlite3.IntegrityError) UNIQUE constraint failed: experiments.name\n[SQL: INSERT INTO experiments (name, artifact_location, lifecycle_stage) VALUES (?, ?, ?)]\n[parameters: ('aa2da357-282b-4a81-a39e-322f0eb70af9', '', 'active')]\n(Background on this error at: https://sqlalche.me/e/14/gkpj)"}
When name is missing
GITLAB
POST -- http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/experiments/create
Params: {}
Body: {'other_key': 'aa2da357-282b-4a81-a39e-322f0eb70af9'}
Status Code: 400
Response Body: {'error': 'name is missing'}
MLFLOW
POST -- http://127.0.0.1:5000/api/2.0/mlflow/experiments/create
Params: {}
Body: {'other_key': 'aa2da357-282b-4a81-a39e-322f0eb70af9'}
Status Code: 400
Response Body: {'error_code': 'INVALID_PARAMETER_VALUE', 'message': "Missing value for required parameter 'name'. See the API docs for more information about request parameters."}
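Clients written against MLflow may need to tolerate the differences recorded for the create endpoint: GitLab answers a successful create with 201 instead of 200, omits message on RESOURCE_ALREADY_EXISTS, and reports a missing name under an error key without an error_code. Below is a minimal Python sketch of one way to handle both response shapes. ExperimentApiError, parse_create_response, and the "UNKNOWN" fallback code are hypothetical and belong to neither API.

```python
class ExperimentApiError(Exception):
    """Hypothetical error type carrying the status, error code, and message."""

    def __init__(self, status, code, message):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message


def parse_create_response(status, body):
    """Return the experiment_id for any 2xx status; raise ExperimentApiError otherwise."""
    if 200 <= status < 300:
        return body["experiment_id"]
    # GitLab may omit error_code (missing name) or message (duplicate name).
    code = body.get("error_code", "UNKNOWN")
    message = body.get("message") or body.get("error") or ""
    raise ExperimentApiError(status, code, message)


# Status codes and bodies copied from the recorded responses above.
print(parse_create_response(201, {"experiment_id": "12"}))  # GitLab
print(parse_create_response(200, {"experiment_id": "21"}))  # MLflow
try:
    parse_create_response(400, {"error": "name is missing"})
except ExperimentApiError as error:
    print(error.code, error.message)
```

Accepting any 2xx status keeps the same client code working against both servers.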
Get By Id
When id exists
GITLAB
GET -- http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/experiments/get
Params: {'experiment_id': '12'}
Body: {}
Status Code: 200
Response Body: {'experiment': {'experiment_id': '12', 'name': 'aa2da357-282b-4a81-a39e-322f0eb70af9', 'lifecycle_stage': 'active', 'artifact_location': 'not_implemented'}}
MLFLOW
GET -- http://127.0.0.1:5000/api/2.0/mlflow/experiments/get
Params: {'experiment_id': '21'}
Body: {}
Status Code: 200
Response Body: {'experiment': {'experiment_id': '21', 'name': 'aa2da357-282b-4a81-a39e-322f0eb70af9', 'artifact_location': './mlruns2/21', 'lifecycle_stage': 'active'}}
When id does not exist
GITLAB
GET -- http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/experiments/get
Params: {'experiment_id': 'asasdfsadf'}
Body: {}
Status Code: 404
Response Body: {'error_code': 'RESOURCE_DOES_NOT_EXIST'}
MLFLOW
GET -- http://127.0.0.1:5000/api/2.0/mlflow/experiments/get
Params: {'experiment_id': 'asasdfsadf'}
Body: {}
Status Code: 404
Response Body: {'error_code': 'RESOURCE_DOES_NOT_EXIST', 'message': 'No Experiment with id=asasdfsadf exists'}
When id is missing
GITLAB
GET -- http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/experiments/get
Params: {'yolo': '12'}
Body: {}
Status Code: 404
Response Body: {'error_code': 'RESOURCE_DOES_NOT_EXIST'}
MLFLOW
GET -- http://127.0.0.1:5000/api/2.0/mlflow/experiments/get
Params: {'yolo': '12'}
Body: {}
Status Code: 200
Response Body: {'experiment': {'experiment_id': '0', 'name': 'Default', 'artifact_location': './mlruns2/0', 'lifecycle_stage': 'active'}}
Get By Name
When name exists
GITLAB
GET -- http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/experiments/get-by-name
Params: {'experiment_name': 'aa2da357-282b-4a81-a39e-322f0eb70af9'}
Body: {}
Status Code: 200
Response Body: {'experiment': {'experiment_id': '12', 'name': 'aa2da357-282b-4a81-a39e-322f0eb70af9', 'lifecycle_stage': 'active', 'artifact_location': 'not_implemented'}}
MLFLOW
GET -- http://127.0.0.1:5000/api/2.0/mlflow/experiments/get-by-name
Params: {'experiment_name': 'aa2da357-282b-4a81-a39e-322f0eb70af9'}
Body: {}
Status Code: 200
Response Body: {'experiment': {'experiment_id': '21', 'name': 'aa2da357-282b-4a81-a39e-322f0eb70af9', 'artifact_location': './mlruns2/21', 'lifecycle_stage': 'active'}}
When name does not exist
GITLAB
GET -- http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/experiments/get-by-name
Params: {'experiment_name': 'abcde'}
Body: {}
Status Code: 404
Response Body: {'error_code': 'RESOURCE_DOES_NOT_EXIST'}
MLFLOW
GET -- http://127.0.0.1:5000/api/2.0/mlflow/experiments/get-by-name
Params: {'experiment_name': 'abcde'}
Body: {}
Status Code: 404
Response Body: {'error_code': 'RESOURCE_DOES_NOT_EXIST', 'message': "Could not find experiment with name 'abcde'"}
When name is missing
GITLAB
GET -- http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/experiments/get-by-name
Params: {'yolo': 'aa2da357-282b-4a81-a39e-322f0eb70af9'}
Body: {}
Status Code: 404
Response Body: {'error_code': 'RESOURCE_DOES_NOT_EXIST'}
MLFLOW
GET -- http://127.0.0.1:5000/api/2.0/mlflow/experiments/get-by-name
Params: {'yolo': 'aa2da357-282b-4a81-a39e-322f0eb70af9'}
Body: {}
Status Code: 404
Response Body: {'error_code': 'RESOURCE_DOES_NOT_EXIST', 'message': "Could not find experiment with name ''"}
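The comparison above can be condensed into a list of differences per case. The following Python sketch is an illustration only and is not taken from api_parity.py; diff_responses is a hypothetical helper that compares the status code and the top-level response keys of one GitLab and one MLflow response.

```python
def diff_responses(gitlab, mlflow):
    """Compare two (status_code, body) pairs and list the differences found."""
    gitlab_status, gitlab_body = gitlab
    mlflow_status, mlflow_body = mlflow
    differences = []
    if gitlab_status != mlflow_status:
        differences.append(f"status: GitLab {gitlab_status}, MLflow {mlflow_status}")
    # Only top-level keys are compared; values such as IDs are expected to differ.
    missing = sorted(set(mlflow_body) - set(gitlab_body))
    extra = sorted(set(gitlab_body) - set(mlflow_body))
    if missing:
        differences.append(f"missing in GitLab: {missing}")
    if extra:
        differences.append(f"extra in GitLab: {extra}")
    return differences


# Responses copied from the recorded cases above.
created = diff_responses(
    (201, {"experiment_id": "12"}),
    (200, {"experiment_id": "21"}),
)
id_missing = diff_responses(
    (404, {"error_code": "RESOURCE_DOES_NOT_EXIST"}),
    (200, {"experiment": {"experiment_id": "0", "name": "Default"}}),
)
name_not_found = diff_responses(
    (404, {"error_code": "RESOURCE_DOES_NOT_EXIST"}),
    (404, {"error_code": "RESOURCE_DOES_NOT_EXIST", "message": "Could not find experiment with name 'abcde'"}),
)
print(created)
print(id_missing)
print(name_not_found)
```

On the recorded data this surfaces three kinds of gaps: the 201 versus 200 status on create, the absent message field on GitLab errors, and the missing-id case where MLflow falls back to the Default experiment while GitLab returns 404.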
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #370478 (closed)