Adds Run endpoints for MLflow integration
What does this MR do and why?
Adds MLflow endpoints for the Run entity, which maps to GitLab Candidates.
This MR is part of a larger feature to be released in 15.5, gated by the :ml_experiment_tracking feature flag.
Database
Migrations
Up:
bin/rails db:migrate RAILS_ENV=test
main: == 20220902155105 AddStartTimeAndEndTimeAndStatusToMlCandidates: migrating ====
main: -- add_column(:ml_candidates, :start_time, :bigint)
main: -> 0.0065s
main: -- add_column(:ml_candidates, :end_time, :bigint)
main: -> 0.0010s
main: -- add_column(:ml_candidates, :status, :smallint, {:default=>0})
main: -> 0.0047s
main: == 20220902155105 AddStartTimeAndEndTimeAndStatusToMlCandidates: migrated (0.0135s)
Down:
bin/rails db:rollback:main RAILS_ENV=test
main: == 20220902155105 AddStartTimeAndEndTimeAndStatusToMlCandidates: reverting ====
main: -- remove_column(:ml_candidates, :status, :integer, {:default=>0})
main: -> 0.0056s
main: -- remove_column(:ml_candidates, :end_time, :bigint)
main: -> 0.0008s
main: -- remove_column(:ml_candidates, :start_time, :bigint)
main: -> 0.0019s
main: == 20220902155105 AddStartTimeAndEndTimeAndStatusToMlCandidates: reverted (0.0114s)
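The status column is a smallint with default 0, and newly created runs report RUNNING in the API comparison further down, so 0 presumably stands for RUNNING. The following Python sketch shows one way such a mapping could look. The status names are MLflow's run statuses; the integer values other than 0 and the helper name `parse_status` are assumptions for illustration, not taken from this MR.

```python
from enum import IntEnum


class RunStatus(IntEnum):
    """MLflow run statuses stored as small integers (hypothetical mapping)."""

    RUNNING = 0    # matches the column default; new runs report RUNNING
    SCHEDULED = 1  # assumed value
    FINISHED = 2   # assumed value
    FAILED = 3     # assumed value
    KILLED = 4     # assumed value


def parse_status(raw):
    """Translate an MLflow status string into its stored integer."""
    try:
        return RunStatus[raw]
    except KeyError:
        # Unknown names such as "YOLO" are rejected instead of stored.
        raise ValueError(f"Invalid status: {raw!r}") from None
```

Storing the status as a small integer keeps the column compact, while the API can keep accepting and returning the MLflow status strings.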
Queries
- ::Ml::Candidate.with_project_id_and_iid
SELECT
"ml_candidates".*
FROM
"ml_candidates"
INNER JOIN "ml_experiments" "experiment" ON "experiment"."id" = "ml_candidates"."experiment_id"
WHERE
"experiment"."project_id" = 29
AND "ml_candidates"."iid" = 'fe220020-0314-4bc5-b189-718ca9615285';
EXPLAIN ANALYZE
SELECT
"ml_candidates".*
FROM
"ml_candidates"
INNER JOIN "ml_experiments" "experiment" ON "experiment"."id" = "ml_candidates"."experiment_id"
WHERE
"experiment"."project_id" = 29
AND "ml_candidates"."iid" = 'fe220020-0314-4bc5-b189-718ca9615285';
Nested Loop (cost=0.30..12.51 rows=1 width=76) (actual time=0.046..0.058 rows=1 loops=1)
-> Index Scan using index_ml_experiments_on_project_id_and_name on ml_experiments experiment (cost=0.15..5.22 rows=4 width=8) (actual time=0.018..0.024 rows=38 loops=1)
Index Cond: (project_id = 29)
-> Index Scan using index_ml_candidates_on_experiment_id_and_iid on ml_candidates (cost=0.15..1.67 rows=1 width=76) (actual time=0.001..0.001 rows=0 loops=38)
Index Cond: ((experiment_id = experiment.id) AND (iid = 'fe220020-0314-4bc5-b189-718ca9615285'::uuid))
Planning Time: 0.163 ms
Execution Time: 0.089 ms
How to Reproduce
How to set up and validate locally
- Create a Project and a project access token with the api scope:
  export PROJECT_ID=<Your Project Id>
  export GITLAB_PAT=<your api token>
- Create an Experiment:
  curl -X POST -H "Authorization: Bearer $GITLAB_PAT" -d name=my_cool_experiment http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mflow/api/2.0/mlflow/experiments/create
- This should return a 404, because the feature flag is off.
- Enable the feature flag:
  echo "Feature.enable(:ml_experiment_tracking)" | bundle exec rails c
- Create the Experiment again. It should now output {"experiment_id":"1"}:
  curl -X POST -H "Authorization: Bearer $GITLAB_PAT" -d name=my_cool_experiment http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mflow/api/2.0/mlflow/experiments/create
  {"experiment_id":"1"}
- Create a Run:
  curl -X POST -H "Authorization: Bearer $GITLAB_PAT" -d experiment_id=1 http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mflow/api/2.0/mlflow/runs/create
- Query the Run by id, using the run_id returned by the previous step:
  curl -X GET -H "Authorization: Bearer $GITLAB_PAT" "http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mflow/api/2.0/mlflow/runs/get?run_id=<RUN_ID>"
- Update the Run:
  curl -X POST -H "Authorization: Bearer $GITLAB_PAT" -d run_id=<RUN_ID> -d status=FAILED -d end_time=12345678 http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mflow/api/2.0/mlflow/runs/update
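The same steps can be scripted. Below is a minimal Python sketch using only the standard library. It assumes a GDK reachable at http://gdk.test:3000 and the URL prefix from the curl commands above. The helper names (`endpoint_url`, `build_request`, `call`, `walkthrough`) are hypothetical and not part of GitLab or MLflow.

```python
import json
import urllib.parse
import urllib.request

# Path prefix copied from the curl commands above.
API_PREFIX = "api/v4/projects/{project_id}/ml/mflow/api/2.0/mlflow"


def endpoint_url(host, project_id, endpoint, params=None):
    """Build the URL for an endpoint such as 'runs/create'."""
    prefix = API_PREFIX.format(project_id=project_id)
    url = f"{host.rstrip('/')}/{prefix}/{endpoint}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def build_request(host, project_id, token, endpoint, method="POST", params=None, form=None):
    """Return an unsent Request that mirrors one curl call from the steps."""
    data = urllib.parse.urlencode(form).encode() if form else None
    request = urllib.request.Request(
        endpoint_url(host, project_id, endpoint, params), data=data, method=method
    )
    request.add_header("Authorization", f"Bearer {token}")
    return request


def call(request):
    """Send a request and decode the JSON body (needs a running GDK)."""
    with urllib.request.urlopen(request) as response:
        return response.status, json.loads(response.read())


def walkthrough(host, project_id, token):
    """Create an experiment and a run, fetch the run, then mark it FAILED."""
    def send(endpoint, **kwargs):
        return call(build_request(host, project_id, token, endpoint, **kwargs))

    _, experiment = send("experiments/create", form={"name": "my_cool_experiment"})
    _, created = send("runs/create", form={"experiment_id": experiment["experiment_id"]})
    run_id = created["run"]["info"]["run_id"]
    send("runs/get", method="GET", params={"run_id": run_id})
    return send(
        "runs/update",
        form={"run_id": run_id, "status": "FAILED", "end_time": 12345678},
    )
```

Calling `walkthrough("http://gdk.test:3000", project_id, token)` with the feature flag enabled should end with the updated run info. With the flag disabled, `urlopen` raises `urllib.error.HTTPError` for the 404 on the first call.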
Differences between APIs
Autogenerated in https://gitlab.com/gitlab-org/incubation-engineering/mlops/mlflow_experiment/-/blob/main/results.md
POST /runs/create
When experiment exists
| | MLflow | GitLab |
|---|---|---|
| URL | http://127.0.0.1:5000/api/2.0/mlflow/runs/create | http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/runs/create |
| Params | {} | {} |
| Body | {'experiment_id': '78', 'start_time': 1234} | {'experiment_id': '38', 'start_time': 1234} |
| Status Code | 200 | 201 |
| Response | { "run": { "info": { "run_uuid": "d38aec53aabf4cf39d4673432f5dff32", "experiment_id": "78", "user_id": "", "status": "RUNNING", "start_time": 1234, "artifact_uri": "./mlruns2/78/d38aec53aabf4cf39d4673432f5dff32/artifacts", "lifecycle_stage": "active", "run_id": "d38aec53aabf4cf39d4673432f5dff32" }, "data": {} } } | { "run": { "info": { "run_id": "7f634288-5109-4561-91f4-77d7b6435d6d", "run_uuid": "7f634288-5109-4561-91f4-77d7b6435d6d", "experiment_id": "38", "start_time": 1234, "status": "RUNNING", "artifact_uri": "not_implemented", "lifecycle_stage": "active", "user_id": "45" }, "data": {} } } |
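In the two responses above, the run identifiers, experiment id, user id and artifact URI differ between servers, while the remaining info fields agree. A small Python sketch of that comparison follows. The field list and the helper name `portable_info` are assumptions drawn from this one example, not a rule defined by either API.

```python
# Fields whose values differ between the two servers in the example above.
SERVER_SPECIFIC_FIELDS = {"run_id", "run_uuid", "experiment_id", "user_id", "artifact_uri"}


def portable_info(response):
    """Return the run info with server-specific fields removed."""
    info = response["run"]["info"]
    return {key: value for key, value in info.items() if key not in SERVER_SPECIFIC_FIELDS}


# Responses copied from the comparison above.
mlflow_response = {"run": {"info": {
    "run_uuid": "d38aec53aabf4cf39d4673432f5dff32", "experiment_id": "78",
    "user_id": "", "status": "RUNNING", "start_time": 1234,
    "artifact_uri": "./mlruns2/78/d38aec53aabf4cf39d4673432f5dff32/artifacts",
    "lifecycle_stage": "active", "run_id": "d38aec53aabf4cf39d4673432f5dff32",
}, "data": {}}}

gitlab_response = {"run": {"info": {
    "run_id": "7f634288-5109-4561-91f4-77d7b6435d6d",
    "run_uuid": "7f634288-5109-4561-91f4-77d7b6435d6d", "experiment_id": "38",
    "start_time": 1234, "status": "RUNNING", "artifact_uri": "not_implemented",
    "lifecycle_stage": "active", "user_id": "45",
}, "data": {}}}

same_portable_fields = portable_info(mlflow_response) == portable_info(gitlab_response)
```

Here `same_portable_fields` is `True`: once the server-specific fields are dropped, both responses reduce to the same status, start_time and lifecycle_stage.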
When experiment does not exist
| | MLflow | GitLab |
|---|---|---|
| URL | http://127.0.0.1:5000/api/2.0/mlflow/runs/create | http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/runs/create |
| Params | {} | {} |
| Body | {'experiment_id': 'asasdfsadf'} | {'experiment_id': 'asasdfsadf'} |
| Status Code | 404 | 400 |
| Response | { "error_code": "RESOURCE_DOES_NOT_EXIST", "message": "No Experiment with id=asasdfsadf exists" } | { "error": "experiment_id is invalid" } |
When experiment is not passed
| | MLflow | GitLab |
|---|---|---|
| URL | http://127.0.0.1:5000/api/2.0/mlflow/runs/create | http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/runs/create |
| Params | {} | {} |
| Body | {'yolo': 'asasdfsadf'} | {'yolo': 'asasdfsadf'} |
| Status Code | 400 | 400 |
| Response | { "error_code": "BAD_REQUEST", "message": "(sqlite3.IntegrityError) FOREIGN KEY constraint failed\n[SQL: INSERT INTO runs (run_uuid, name, source_type, source_name, entry_point_name, user_id, status, start_time, end_time, source_version, lifecycle_stage, artifact_uri, experiment_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)]\n[parameters: ('247f569dc50b4093a526cc1ca48f7c16', '', 'UNKNOWN', '', '', '', 'RUNNING', 0, None, '', 'active', './mlruns2/0/247f569dc50b4093a526cc1ca48f7c16/artifacts', '')]\n(Background on this error at: https://sqlalche.me/e/14/gkpj)" } | { "error": "experiment_id is missing" } |
GET /runs/get
When run exists
| | MLflow | GitLab |
|---|---|---|
| URL | http://127.0.0.1:5000/api/2.0/mlflow/runs/get | http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/runs/get |
| Params | {'run_id': 'd38aec53aabf4cf39d4673432f5dff32'} | {'run_id': '7f634288-5109-4561-91f4-77d7b6435d6d'} |
| Body | {} | {} |
| Status Code | 200 | 200 |
| Response | { "run": { "info": { "run_uuid": "d38aec53aabf4cf39d4673432f5dff32", "experiment_id": "78", "user_id": "", "status": "RUNNING", "start_time": 1234, "artifact_uri": "./mlruns2/78/d38aec53aabf4cf39d4673432f5dff32/artifacts", "lifecycle_stage": "active", "run_id": "d38aec53aabf4cf39d4673432f5dff32" }, "data": {} } } | { "run": { "info": { "run_id": "7f634288-5109-4561-91f4-77d7b6435d6d", "run_uuid": "7f634288-5109-4561-91f4-77d7b6435d6d", "experiment_id": "38", "start_time": 1234, "status": "RUNNING", "artifact_uri": "not_implemented", "lifecycle_stage": "active", "user_id": "45" }, "data": {} } } |
When run does not exist
| | MLflow | GitLab |
|---|---|---|
| URL | http://127.0.0.1:5000/api/2.0/mlflow/runs/get | http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/runs/get |
| Params | {'run_id': 'asasdfsadf'} | {'run_id': 'asasdfsadf'} |
| Body | {} | {} |
| Status Code | 404 | 404 |
| Response | { "error_code": "RESOURCE_DOES_NOT_EXIST", "message": "Run with id=asasdfsadf not found" } | { "error_code": "RESOURCE_DOES_NOT_EXIST" } |
POST /runs/update
When run exists
| | MLflow | GitLab |
|---|---|---|
| URL | http://127.0.0.1:5000/api/2.0/mlflow/runs/update | http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/runs/update |
| Params | {} | {} |
| Body | {'run_id': 'd38aec53aabf4cf39d4673432f5dff32', 'status': 'FAILED', 'end_time': 12345678} | {'run_id': '7f634288-5109-4561-91f4-77d7b6435d6d', 'status': 'FAILED', 'end_time': 12345678} |
| Status Code | 200 | 201 |
| Response | { "run_info": { "run_uuid": "d38aec53aabf4cf39d4673432f5dff32", "experiment_id": "78", "user_id": "", "status": "FAILED", "start_time": 1234, "end_time": 12345678, "artifact_uri": "./mlruns2/78/d38aec53aabf4cf39d4673432f5dff32/artifacts", "lifecycle_stage": "active", "run_id": "d38aec53aabf4cf39d4673432f5dff32" } } | { "run_info": { "run_id": "7f634288-5109-4561-91f4-77d7b6435d6d", "run_uuid": "7f634288-5109-4561-91f4-77d7b6435d6d", "experiment_id": "38", "start_time": 1234, "end_time": 12345678, "status": "FAILED", "artifact_uri": "not_implemented", "lifecycle_stage": "active", "user_id": "45" } } |
When run does not exist
| | MLflow | GitLab |
|---|---|---|
| URL | http://127.0.0.1:5000/api/2.0/mlflow/runs/update | http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/runs/update |
| Params | {} | {} |
| Body | {'run_id': 'asasdfsadf'} | {'run_id': 'asasdfsadf'} |
| Status Code | 404 | 404 |
| Response | { "error_code": "RESOURCE_DOES_NOT_EXIST", "message": "Run with id=asasdfsadf not found" } | { "error_code": "RESOURCE_DOES_NOT_EXIST" } |
When run exists, but state is invalid
| | MLflow | GitLab |
|---|---|---|
| URL | http://127.0.0.1:5000/api/2.0/mlflow/runs/update | http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/runs/update |
| Params | {} | {} |
| Body | {'run_id': 'd38aec53aabf4cf39d4673432f5dff32', 'status': 'YOLO'} | {'run_id': '7f634288-5109-4561-91f4-77d7b6435d6d', 'status': 'YOLO'} |
| Status Code | 200 | 400 |
| Response | { "run_info": { "run_uuid": "d38aec53aabf4cf39d4673432f5dff32", "experiment_id": "78", "user_id": "", "status": "RUNNING", "start_time": 1234, "artifact_uri": "./mlruns2/78/d38aec53aabf4cf39d4673432f5dff32/artifacts", "lifecycle_stage": "active", "run_id": "d38aec53aabf4cf39d4673432f5dff32" } } | { "message": "400 Bad request - Invalid status" } |
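Across the comparisons above, success is reported as 200 by MLflow and as 200 or 201 by GitLab, and error bodies carry their text under `error_code`, `error` or `message` depending on the server and the failure. A client that has to work against both could normalize these, roughly as in the Python sketch below. The helper names `is_success` and `error_summary` are hypothetical, and the key precedence is an assumption based only on the examples listed here.

```python
def is_success(status_code):
    """Treat both success codes seen in the comparison as success."""
    return status_code in (200, 201)


def error_summary(status_code, body):
    """Collapse either server's error body into (status_code, text)."""
    # Prefer the machine-readable code, then fall back to free-text keys.
    for key in ("error_code", "error", "message"):
        if key in body:
            return status_code, body[key]
    return status_code, ""
```

For the "run does not exist" case both servers then reduce to `(404, "RESOURCE_DOES_NOT_EXIST")`, while the invalid-status case still differs: MLflow answers 200 and GitLab answers 400.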
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #370478 (closed)
Edited by Eduardo Bonet