Adds Run endpoints for MLFlow Integration

Eduardo Bonet requested to merge 370478-add-mlflow-endpoints-2 into master

What does this MR do and why?

Adds MLflow endpoints for the Run entity, which maps to GitLab Candidates.

This MR is part of a larger feature planned for release in 15.5, gated by the :ml_experiment_tracking feature flag.

Database

Migrations

Up:

bin/rails db:migrate RAILS_ENV=test
main: == 20220902155105 AddStartTimeAndEndTimeAndStatusToMlCandidates: migrating ====
main: -- add_column(:ml_candidates, :start_time, :bigint)
main:    -> 0.0065s
main: -- add_column(:ml_candidates, :end_time, :bigint)
main:    -> 0.0010s
main: -- add_column(:ml_candidates, :status, :smallint, {:default=>0})
main:    -> 0.0047s
main: == 20220902155105 AddStartTimeAndEndTimeAndStatusToMlCandidates: migrated (0.0135s)

Down:

bin/rails db:rollback:main RAILS_ENV=test
main: == 20220902155105 AddStartTimeAndEndTimeAndStatusToMlCandidates: reverting ====
main: -- remove_column(:ml_candidates, :status, :integer, {:default=>0})
main:    -> 0.0056s
main: -- remove_column(:ml_candidates, :end_time, :bigint)
main:    -> 0.0008s
main: -- remove_column(:ml_candidates, :start_time, :bigint)
main:    -> 0.0019s
main: == 20220902155105 AddStartTimeAndEndTimeAndStatusToMlCandidates: reverted (0.0114s)
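
The new status column is a smallint with a default of 0, while the API exchanges status strings such as RUNNING and FAILED. A minimal sketch of one possible mapping follows. The integer codes are hypothetical: only the default of 0 comes from the migration, and pairing it with RUNNING is an assumption based on newly created runs reporting RUNNING. The names CandidateStatus and parse_status are illustrative and are not taken from this MR.

```python
from enum import IntEnum


class CandidateStatus(IntEnum):
    """Hypothetical integer codes for the smallint `status` column."""

    RUNNING = 0    # assumed to match the column default of 0
    SCHEDULED = 1  # the remaining codes are illustrative only
    FINISHED = 2
    FAILED = 3
    KILLED = 4


def parse_status(name: str) -> CandidateStatus:
    """Map an MLflow status string to its (assumed) stored integer code."""
    try:
        return CandidateStatus[name.upper()]
    except KeyError:
        # Unknown names are rejected instead of being silently stored.
        raise ValueError(f"Invalid status: {name!r}") from None


print(parse_status("FAILED").value)
```

Rejecting unknown names mirrors the 400 that the GitLab endpoint returns for an invalid status in the comparison further down.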

Queries

  • ::Ml::Candidate.with_project_id_and_iid
SELECT
    "ml_candidates".*
FROM
    "ml_candidates"
    INNER JOIN "ml_experiments" "experiment" ON "experiment"."id" = "ml_candidates"."experiment_id"
WHERE
    "experiment"."project_id" = 29
    AND "ml_candidates"."iid" = 'fe220020-0314-4bc5-b189-718ca9615285';
EXPLAIN ANALYZE
SELECT
    "ml_candidates".*
FROM
    "ml_candidates"
    INNER JOIN "ml_experiments" "experiment" ON "experiment"."id" = "ml_candidates"."experiment_id"
WHERE
    "experiment"."project_id" = 29
    AND "ml_candidates"."iid" = 'fe220020-0314-4bc5-b189-718ca9615285';
Nested Loop  (cost=0.30..12.51 rows=1 width=76) (actual time=0.046..0.058 rows=1 loops=1)
  ->  Index Scan using index_ml_experiments_on_project_id_and_name on ml_experiments experiment  (cost=0.15..5.22 rows=4 width=8) (actual time=0.018..0.024 rows=38 loops=1)
        Index Cond: (project_id = 29)
  ->  Index Scan using index_ml_candidates_on_experiment_id_and_iid on ml_candidates  (cost=0.15..1.67 rows=1 width=76) (actual time=0.001..0.001 rows=0 loops=38)
        Index Cond: ((experiment_id = experiment.id) AND (iid = 'fe220020-0314-4bc5-b189-718ca9615285'::uuid))
Planning Time: 0.163 ms
Execution Time: 0.089 ms
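
To illustrate what the scope does, here is a small sketch that runs the same join against an in-memory SQLite database. The schema is a simplified assumption that keeps only the columns the query touches, and find_candidate is a hypothetical helper name. The point is that a candidate is only found when its experiment belongs to the requested project.

```python
import sqlite3

# Simplified, hypothetical schema: only the columns the query touches.
SCHEMA = """
CREATE TABLE ml_experiments (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL,
    name TEXT NOT NULL
);
CREATE TABLE ml_candidates (
    id INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES ml_experiments (id),
    iid TEXT NOT NULL
);
CREATE INDEX index_ml_experiments_on_project_id_and_name
    ON ml_experiments (project_id, name);
CREATE INDEX index_ml_candidates_on_experiment_id_and_iid
    ON ml_candidates (experiment_id, iid);
"""

# Same shape as the query above, with bound parameters.
LOOKUP = """
SELECT ml_candidates.*
FROM ml_candidates
INNER JOIN ml_experiments AS experiment
    ON experiment.id = ml_candidates.experiment_id
WHERE experiment.project_id = ?
  AND ml_candidates.iid = ?
"""


def find_candidate(conn, project_id, iid):
    """Return the candidate row for (project_id, iid), or None."""
    return conn.execute(LOOKUP, (project_id, iid)).fetchone()


IID = "fe220020-0314-4bc5-b189-718ca9615285"

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO ml_experiments VALUES (38, 29, 'my_cool_experiment')")
conn.execute("INSERT INTO ml_candidates VALUES (1, 38, ?)", (IID,))

print(find_candidate(conn, 29, IID))  # candidate in project 29
print(find_candidate(conn, 30, IID))  # same iid, different project
```

The second lookup finds nothing because the iid is only matched within experiments of the given project.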

How to Reproduce

How to set up and validate locally

  1. Create a project and a project access token with the api scope, then export both:

    export PROJECT_ID=<Your Project Id>
    export GITLAB_PAT=<your api token>
  2. Create an Experiment:

    curl -X POST -H "Authorization: Bearer $GITLAB_PAT" -d name=my_cool_experiment http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mflow/api/2.0/mlflow/experiments/create
  3. The request should return 404, because the feature flag is still disabled.

  4. Enable the feature flag:

    echo "Feature.enable(:ml_experiment_tracking)" | bundle exec rails c
  5. Create the experiment again; it should now output {"experiment_id":"1"}:

    curl -X POST -H "Authorization: Bearer $GITLAB_PAT" -d name=my_cool_experiment http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mflow/api/2.0/mlflow/experiments/create
    {"experiment_id":"1"} 
  6. Create a Run

    curl -X POST -H "Authorization: Bearer $GITLAB_PAT" -d experiment_id=1 http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mflow/api/2.0/mlflow/runs/create
  7. Query Run By Id

    curl -X GET -H "Authorization: Bearer $GITLAB_PAT" "http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mflow/api/2.0/mlflow/runs/get?run_id=<RUN_ID>"
  8. Update the Run

    curl -X POST -H "Authorization: Bearer $GITLAB_PAT" -d run_id=<RUN_ID> -d status=FAILED -d end_time=12345678 http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mflow/api/2.0/mlflow/runs/update
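
The same walkthrough can be sketched in Python. The snippet below only builds the requests with the standard library and prints them; passing each one to urllib.request.urlopen would send it. The helper names mlflow_url and build_request are hypothetical, and the fallback values for PROJECT_ID, GITLAB_PAT and RUN_ID are placeholders taken from the examples in this description.

```python
import os
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://gdk.test:3000"  # local GDK instance used in the steps above


def mlflow_url(project_id, endpoint, base_url=BASE_URL):
    """Build the project-scoped URL for an MLflow-compatible endpoint."""
    return f"{base_url}/api/v4/projects/{project_id}/ml/mflow/api/2.0/mlflow/{endpoint}"


def build_request(token, project_id, endpoint, *, method="POST", fields=None):
    """Create (but do not send) a request equivalent to the curl calls."""
    url = mlflow_url(project_id, endpoint)
    data = None
    if fields:
        encoded = urlencode(fields)
        if method == "GET":
            url = f"{url}?{encoded}"        # query string, like runs/get
        else:
            data = encoded.encode("ascii")  # form body, like curl -d
    headers = {"Authorization": f"Bearer {token}"}
    return Request(url, data=data, headers=headers, method=method)


project_id = os.environ.get("PROJECT_ID", "29")
token = os.environ.get("GITLAB_PAT", "placeholder-token")
run_id = os.environ.get("RUN_ID", "7f634288-5109-4561-91f4-77d7b6435d6d")

steps = [
    build_request(token, project_id, "experiments/create",
                  fields={"name": "my_cool_experiment"}),
    build_request(token, project_id, "runs/create",
                  fields={"experiment_id": "1"}),
    build_request(token, project_id, "runs/get", method="GET",
                  fields={"run_id": run_id}),
    build_request(token, project_id, "runs/update",
                  fields={"run_id": run_id, "status": "FAILED",
                          "end_time": 12345678}),
]

for request in steps:
    print(request.get_method(), request.full_url, request.data)
```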

Differences between APIs

Autogenerated in https://gitlab.com/gitlab-org/incubation-engineering/mlops/mlflow_experiment/-/blob/main/results.md

POST /runs/create

When experiment exists

Field | MLflow | GitLab
URL | http://127.0.0.1:5000/api/2.0/mlflow/runs/create | http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/runs/create
Params | {} | {}
Body | {'experiment_id': '78', 'start_time': 1234} | {'experiment_id': '38', 'start_time': 1234}
Status Code | 200 | 201
Response | { "run": { "info": { "run_uuid": "d38aec53aabf4cf39d4673432f5dff32", "experiment_id": "78", "user_id": "", "status": "RUNNING", "start_time": 1234, "artifact_uri": "./mlruns2/78/d38aec53aabf4cf39d4673432f5dff32/artifacts", "lifecycle_stage": "active", "run_id": "d38aec53aabf4cf39d4673432f5dff32" }, "data": {} } } | { "run": { "info": { "run_id": "7f634288-5109-4561-91f4-77d7b6435d6d", "run_uuid": "7f634288-5109-4561-91f4-77d7b6435d6d", "experiment_id": "38", "start_time": 1234, "status": "RUNNING", "artifact_uri": "not_implemented", "lifecycle_stage": "active", "user_id": "45" }, "data": {} } }

When experiment does not exist

Field | MLflow | GitLab
URL | http://127.0.0.1:5000/api/2.0/mlflow/runs/create | http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/runs/create
Params | {} | {}
Body | {'experiment_id': 'asasdfsadf'} | {'experiment_id': 'asasdfsadf'}
Status Code | 404 | 400
Response | { "error_code": "RESOURCE_DOES_NOT_EXIST", "message": "No Experiment with id=asasdfsadf exists" } | { "error": "experiment_id is invalid" }

When experiment is not passed

Field | MLflow | GitLab
URL | http://127.0.0.1:5000/api/2.0/mlflow/runs/create | http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/runs/create
Params | {} | {}
Body | {'yolo': 'asasdfsadf'} | {'yolo': 'asasdfsadf'}
Status Code | 400 | 400
Response | { "error_code": "BAD_REQUEST", "message": "(sqlite3.IntegrityError) FOREIGN KEY constraint failed\n[SQL: INSERT INTO runs (run_uuid, name, source_type, source_name, entry_point_name, user_id, status, start_time, end_time, source_version, lifecycle_stage, artifact_uri, experiment_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)]\n[parameters: ('247f569dc50b4093a526cc1ca48f7c16', '', 'UNKNOWN', '', '', '', 'RUNNING', 0, None, '', 'active', './mlruns2/0/247f569dc50b4093a526cc1ca48f7c16/artifacts', '')]\n(Background on this error at: https://sqlalche.me/e/14/gkpj)" } | { "error": "experiment_id is missing" }

GET /runs/get

When run exists

Field | MLflow | GitLab
URL | http://127.0.0.1:5000/api/2.0/mlflow/runs/get | http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/runs/get
Params | {'run_id': 'd38aec53aabf4cf39d4673432f5dff32'} | {'run_id': '7f634288-5109-4561-91f4-77d7b6435d6d'}
Body | {} | {}
Status Code | 200 | 200
Response | { "run": { "info": { "run_uuid": "d38aec53aabf4cf39d4673432f5dff32", "experiment_id": "78", "user_id": "", "status": "RUNNING", "start_time": 1234, "artifact_uri": "./mlruns2/78/d38aec53aabf4cf39d4673432f5dff32/artifacts", "lifecycle_stage": "active", "run_id": "d38aec53aabf4cf39d4673432f5dff32" }, "data": {} } } | { "run": { "info": { "run_id": "7f634288-5109-4561-91f4-77d7b6435d6d", "run_uuid": "7f634288-5109-4561-91f4-77d7b6435d6d", "experiment_id": "38", "start_time": 1234, "status": "RUNNING", "artifact_uri": "not_implemented", "lifecycle_stage": "active", "user_id": "45" }, "data": {} } }
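
For a successful lookup the two payloads share the same keys and differ only in instance-specific values. A minimal sketch of a field-by-field comparison, using the "info" objects from the responses above; diff_info is a hypothetical helper and not part of the script that generated these results.

```python
def diff_info(mlflow_info: dict, gitlab_info: dict) -> dict:
    """Return {key: (mlflow_value, gitlab_value)} for every differing key."""
    keys = sorted(set(mlflow_info) | set(gitlab_info))
    return {
        key: (mlflow_info.get(key), gitlab_info.get(key))
        for key in keys
        if mlflow_info.get(key) != gitlab_info.get(key)
    }


# "info" objects copied from the two responses above.
mlflow_info = {
    "run_uuid": "d38aec53aabf4cf39d4673432f5dff32",
    "experiment_id": "78",
    "user_id": "",
    "status": "RUNNING",
    "start_time": 1234,
    "artifact_uri": "./mlruns2/78/d38aec53aabf4cf39d4673432f5dff32/artifacts",
    "lifecycle_stage": "active",
    "run_id": "d38aec53aabf4cf39d4673432f5dff32",
}
gitlab_info = {
    "run_id": "7f634288-5109-4561-91f4-77d7b6435d6d",
    "run_uuid": "7f634288-5109-4561-91f4-77d7b6435d6d",
    "experiment_id": "38",
    "start_time": 1234,
    "status": "RUNNING",
    "artifact_uri": "not_implemented",
    "lifecycle_stage": "active",
    "user_id": "45",
}

for key, (mlflow_value, gitlab_value) in diff_info(mlflow_info, gitlab_info).items():
    print(f"{key}: MLflow={mlflow_value!r} GitLab={gitlab_value!r}")
```

The differing keys are the identifiers (run_id, run_uuid, experiment_id, user_id) and artifact_uri; status, start_time and lifecycle_stage match.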

When run does not exist

Field | MLflow | GitLab
URL | http://127.0.0.1:5000/api/2.0/mlflow/runs/get | http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/runs/get
Params | {'run_id': 'asasdfsadf'} | {'run_id': 'asasdfsadf'}
Body | {} | {}
Status Code | 404 | 404
Response | { "error_code": "RESOURCE_DOES_NOT_EXIST", "message": "Run with id=asasdfsadf not found" } | { "error_code": "RESOURCE_DOES_NOT_EXIST" }

POST /runs/update

When run exists

Field | MLflow | GitLab
URL | http://127.0.0.1:5000/api/2.0/mlflow/runs/update | http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/runs/update
Params | {} | {}
Body | {'run_id': 'd38aec53aabf4cf39d4673432f5dff32', 'status': 'FAILED', 'end_time': 12345678} | {'run_id': '7f634288-5109-4561-91f4-77d7b6435d6d', 'status': 'FAILED', 'end_time': 12345678}
Status Code | 200 | 201
Response | { "run_info": { "run_uuid": "d38aec53aabf4cf39d4673432f5dff32", "experiment_id": "78", "user_id": "", "status": "FAILED", "start_time": 1234, "end_time": 12345678, "artifact_uri": "./mlruns2/78/d38aec53aabf4cf39d4673432f5dff32/artifacts", "lifecycle_stage": "active", "run_id": "d38aec53aabf4cf39d4673432f5dff32" } } | { "run_info": { "run_id": "7f634288-5109-4561-91f4-77d7b6435d6d", "run_uuid": "7f634288-5109-4561-91f4-77d7b6435d6d", "experiment_id": "38", "start_time": 1234, "end_time": 12345678, "status": "FAILED", "artifact_uri": "not_implemented", "lifecycle_stage": "active", "user_id": "45" } }

When run does not exist

Field | MLflow | GitLab
URL | http://127.0.0.1:5000/api/2.0/mlflow/runs/update | http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/runs/update
Params | {} | {}
Body | {'run_id': 'asasdfsadf'} | {'run_id': 'asasdfsadf'}
Status Code | 404 | 404
Response | { "error_code": "RESOURCE_DOES_NOT_EXIST", "message": "Run with id=asasdfsadf not found" } | { "error_code": "RESOURCE_DOES_NOT_EXIST" }

When run exists, but state is invalid

Field | MLflow | GitLab
URL | http://127.0.0.1:5000/api/2.0/mlflow/runs/update | http://gdk.test:3000/api/v4/projects/29/ml/mflow/api/2.0/mlflow/runs/update
Params | {} | {}
Body | {'run_id': 'd38aec53aabf4cf39d4673432f5dff32', 'status': 'YOLO'} | {'run_id': '7f634288-5109-4561-91f4-77d7b6435d6d', 'status': 'YOLO'}
Status Code | 200 | 400
Response | { "run_info": { "run_uuid": "d38aec53aabf4cf39d4673432f5dff32", "experiment_id": "78", "user_id": "", "status": "RUNNING", "start_time": 1234, "artifact_uri": "./mlruns2/78/d38aec53aabf4cf39d4673432f5dff32/artifacts", "lifecycle_stage": "active", "run_id": "d38aec53aabf4cf39d4673432f5dff32" } } | { "message": "400 Bad request - Invalid status" }
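
Taken together, the comparisons show two things a client may need to tolerate: success is reported as 200 by MLflow and as 201 by GitLab for the POST endpoints, and error bodies come in three shapes (error_code with message, error_code alone, or a bare error or message string). A minimal sketch of a client-side normalizer, assuming the client already has the status code and the parsed JSON body; is_success and summarize_error are hypothetical names.

```python
def is_success(status_code: int) -> bool:
    """Treat any 2xx as success, covering both 200 and 201."""
    return 200 <= status_code < 300


def summarize_error(status_code: int, body: dict):
    """Return a uniform error summary, or None for successful responses."""
    if is_success(status_code):
        return None
    return {
        "status": status_code,
        # Present in MLflow errors and in GitLab's 404 responses.
        "error_code": body.get("error_code"),
        # MLflow uses "message"; GitLab uses "error" or "message".
        "message": body.get("message") or body.get("error"),
    }


# Status codes and bodies copied from the comparisons above.
samples = [
    (201, {"run_info": {"status": "FAILED"}}),
    (404, {"error_code": "RESOURCE_DOES_NOT_EXIST",
           "message": "Run with id=asasdfsadf not found"}),
    (404, {"error_code": "RESOURCE_DOES_NOT_EXIST"}),
    (400, {"error": "experiment_id is invalid"}),
    (400, {"message": "400 Bad request - Invalid status"}),
]

for status_code, body in samples:
    print(status_code, summarize_error(status_code, body))
```

This does not cover the invalid-status case on the MLflow side, where the server answers 200 and returns the run unchanged; detecting that would require comparing the returned status with the requested one.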

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #370478 (closed)

Edited by Eduardo Bonet