Eduardo Bonet requested to merge 370478-add-mlflow-endpoints-3 into master Sep 08, 2022

What does this MR do and why?

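Enables logging metrics to an ML Experiment Candidate (the GitLab counterpart of an MLflow run) through the MLflow-compatible POST runs/log-metric endpoint. The metric's timestamp is stored in a new tracked_at column on ml_candidate_metrics, and GET runs/get now returns the candidate's metrics in run.data.metrics.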

Database

Migration

❯ bin/rails db:migrate:main RAILS_ENV=test
main: == 20220913084123 AddTimestampToMlCandidateMetrics: migrating =================
main: -- add_column(:ml_candidate_metrics, :tracked_at, :bigint)
main:    -> 0.0059s
main: == 20220913084123 AddTimestampToMlCandidateMetrics: migrated (0.0072s) ========

Down

bin/rails db:rollback:ci RAILS_ENV=test

ci: == 20220913084123 AddTimestampToMlCandidateMetrics: reverting =================
ci: -- remove_column(:ml_candidate_metrics, :tracked_at, :bigint)
ci:    -> 0.0078s
ci: == 20220913084123 AddTimestampToMlCandidateMetrics: reverted (0.0120s) ========
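The migration adds tracked_at as a bigint. This sketch assumes two things: the column stores the timestamp sent with each metric, and clients follow MLflow's convention of milliseconds since the Unix epoch. Under those assumptions, a 32-bit integer would already be too small. The check below uses the date encoded in the migration version 20220913084123:

```python
from datetime import datetime, timezone

# Largest value a PostgreSQL "integer" (int4) column can hold.
INT4_MAX = 2**31 - 1


def to_epoch_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch."""
    return int(moment.timestamp() * 1000)


# 2022-09-13 08:41:23 UTC, the moment encoded in the migration version.
migration_time = datetime(2022, 9, 13, 8, 41, 23, tzinfo=timezone.utc)
millis = to_epoch_millis(migration_time)

print(millis)             # 1663058483000
print(millis > INT4_MAX)  # True: the value needs a bigint (int8) column
```

Seconds since the epoch still fit in int4 until 2038, but millisecond values passed that limit decades ago.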

Queries

Ml::Candidate.with_project_id_and_iid(31, "0b4a000b-566e-445d-9228-4cc1aafa7d3e", include_associations: true)

SELECT
    "ml_candidates".*
FROM
    "ml_candidates"
    INNER JOIN "ml_experiments" "experiment" ON "experiment"."id" = "ml_candidates"."experiment_id"
WHERE
    "experiment"."project_id" = 31
    AND "ml_candidates"."iid" = '0b4a000b-566e-445d-9228-4cc1aafa7d3e'
ORDER BY
    "ml_candidates"."id" ASC
LIMIT 1;

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/12119/commands/42998
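The sketch below makes the scoping of that lookup concrete by running the same query shape against an in-memory SQLite database. The two-table schema is a hypothetical, trimmed-down stand-in for ml_experiments and ml_candidates, and the second candidate id is made up. The point is that a candidate iid resolves only when its experiment belongs to the requested project:

```python
import sqlite3

# Hypothetical, trimmed-down schema: only the columns the query reads.
SCHEMA = """
CREATE TABLE ml_experiments (id INTEGER PRIMARY KEY, project_id INTEGER, name TEXT);
CREATE TABLE ml_candidates (id INTEGER PRIMARY KEY, experiment_id INTEGER, iid TEXT);
"""

# Same shape as the SQL above, with bind parameters instead of literals.
QUERY = """
SELECT ml_candidates.*
FROM ml_candidates
INNER JOIN ml_experiments experiment ON experiment.id = ml_candidates.experiment_id
WHERE experiment.project_id = ? AND ml_candidates.iid = ?
ORDER BY ml_candidates.id ASC
LIMIT 1
"""

RUN_IID = "0b4a000b-566e-445d-9228-4cc1aafa7d3e"
OTHER_IID = "11111111-2222-3333-4444-555555555555"  # made-up id


def find_candidate(conn: sqlite3.Connection, project_id: int, iid: str):
    """Return the first matching candidate row, or None."""
    return conn.execute(QUERY, (project_id, iid)).fetchone()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO ml_experiments VALUES (?, ?, ?)",
    [(1, 31, "my_cool_experiment"), (2, 99, "experiment_in_another_project")],
)
conn.executemany(
    "INSERT INTO ml_candidates VALUES (?, ?, ?)",
    [(1, 1, RUN_IID), (2, 2, OTHER_IID)],
)

print(find_candidate(conn, 31, RUN_IID))    # the candidate row with id 1
print(find_candidate(conn, 31, OTHER_IID))  # None: its experiment is in project 99
```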

Running EXPLAIN ANALYZE locally:

  ->  Sort  (cost=12.52..12.52 rows=1 width=74) (actual time=0.031..0.032 rows=1 loops=1)
        Sort Key: ml_candidates.id
        Sort Method: quicksort  Memory: 25kB
        ->  Nested Loop  (cost=0.30..12.51 rows=1 width=74) (actual time=0.023..0.027 rows=1 loops=1)
              ->  Index Scan using index_ml_experiments_on_project_id_and_name on ml_experiments experiment  (cost=0.15..5.22 rows=4 width=8) (actual time=0.009..0.012 rows=11 loops=1)
                    Index Cond: (project_id = 31)
              ->  Index Scan using index_ml_candidates_on_experiment_id_and_iid on ml_candidates  (cost=0.15..1.67 rows=1 width=74) (actual time=0.001..0.001 rows=0 loops=11)
                    Index Cond: ((experiment_id = experiment.id) AND (iid = '0b4a000b-566e-445d-9228-4cc1aafa7d3e'::uuid))
Planning Time: 0.148 ms
Execution Time: 0.053 ms

candidate.metrics

SELECT
    "ml_candidate_metrics".*
FROM
    "ml_candidate_metrics"
WHERE
    "ml_candidate_metrics"."candidate_id" = 1;

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/12119/commands/43002

How to Reproduce

How to set up and validate locally

Create a project and a project access token with the api scope, then export their values:

export PROJECT_ID=<Your Project Id>
export GITLAB_PAT=<your api token>

Enable the feature flag:

echo "Feature.enable(:ml_experiment_tracking)" | bundle exec rails c

Create an Experiment:

curl -X POST -H "Authorization: Bearer $GITLAB_PAT" -d name=my_cool_experiment http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mflow/api/2.0/mlflow/experiments/create
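The same request can be scripted with only the Python standard library. This sketch builds, but does not send, the POST that the curl command above issues. The host and the ml/mflow path prefix are copied from the examples in this MR, and form_post and mlflow_base_url are hypothetical helper names:

```python
import os
from urllib.parse import urlencode
from urllib.request import Request


def mlflow_base_url(project_id, host: str = "http://gdk.test:3000") -> str:
    # Path prefix copied verbatim from the curl examples in this MR.
    return f"{host}/api/v4/projects/{project_id}/ml/mflow/api/2.0/mlflow"


def form_post(path: str, project_id, token: str, **fields) -> Request:
    """Build a form-encoded POST, like `curl -X POST -d key=value`."""
    return Request(
        f"{mlflow_base_url(project_id)}/{path}",
        # urlopen adds Content-Type: application/x-www-form-urlencoded when sending.
        data=urlencode(fields).encode(),
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )


req = form_post(
    "experiments/create",
    os.environ.get("PROJECT_ID", "31"),
    os.environ.get("GITLAB_PAT", "<your api token>"),
    name="my_cool_experiment",
)
print(req.get_method(), req.full_url)
# Send it with urllib.request.urlopen(req) once the GDK is running.
```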

Create a run in that experiment (set experiment_id to the id returned by the previous request) and note the run_id in the response:

curl -X POST -H "Authorization: Bearer $GITLAB_PAT" -d experiment_id=1 http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mflow/api/2.0/mlflow/runs/create
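When scripting this step, the run id can be read from the JSON response. This sketch assumes runs/create returns the same run document as runs/get, as MLflow's CreateRun does. run_id_from_response is a hypothetical helper. It falls back to run_uuid, the older field that both servers still return:

```python
import json


def run_id_from_response(body: str) -> str:
    """Extract run.info.run_id from an MLflow-style run document."""
    info = json.loads(body)["run"]["info"]
    # Prefer run_id; fall back to the older run_uuid field.
    return info.get("run_id") or info["run_uuid"]


# Assumed response shape, modelled on the runs/get output in this MR.
sample = json.dumps({
    "run": {
        "info": {
            "run_id": "0b4a000b-566e-445d-9228-4cc1aafa7d3e",
            "run_uuid": "0b4a000b-566e-445d-9228-4cc1aafa7d3e",
            "experiment_id": "1",
            "status": "RUNNING",
        },
        "data": {},
    }
})
print(run_id_from_response(sample))  # 0b4a000b-566e-445d-9228-4cc1aafa7d3e
```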

Log a metric:

curl -X POST -H "Authorization: Bearer $GITLAB_PAT" -d run_id="<RUN_ID>" -d key=hello -d value=10.0 -d timestamp=12345 http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mflow/api/2.0/mlflow/runs/log-metric
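The log-metric form fields can also be built programmatically. In this sketch, log_metric_form is a hypothetical helper. The field names come from the curl command and the comparison tables in this MR. Defaulting timestamp to the current time in milliseconds follows MLflow's convention; this MR does not specify it:

```python
import time
from typing import Optional
from urllib.parse import urlencode


def log_metric_form(run_id: str, key: str, value: float,
                    timestamp: Optional[int] = None,
                    step: Optional[int] = None) -> str:
    """Build the form body for POST runs/log-metric, like the curl -d flags."""
    if not run_id or not key:
        # The server rejects these requests with a 400 anyway; fail early.
        raise ValueError("run_id and key are required")
    fields = {
        "run_id": run_id,
        "key": key,
        "value": float(value),
        # MLflow clients send milliseconds since the Unix epoch.
        "timestamp": int(time.time() * 1000) if timestamp is None else int(timestamp),
    }
    if step is not None:
        fields["step"] = int(step)
    return urlencode(fields)


print(log_metric_form("<RUN_ID>", "hello", 10.0, timestamp=12345))
# run_id=%3CRUN_ID%3E&key=hello&value=10.0&timestamp=12345
```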

Get the run; the response now includes the metric in the run.data.metrics field:

curl -X GET -H "Authorization: Bearer $GITLAB_PAT" "http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mflow/api/2.0/mlflow/runs/get?run_id=<RUN_ID>"

{
 "run": {
  "info": {
   "run_id": "<RUN_ID>",
   "run_uuid": "<RUN_ID>",
   "experiment_id": "3",
   "start_time": 0,
   "status": "RUNNING",
   "artifact_uri": "not_implemented",
   "lifecycle_stage": "active",
   "user_id": "1"
  },
  "data": {
   "metrics": [
     {
       "key": "hello",
       "value": 10,
       "timestamp": 12345
     }
   ]
  }
 }
}
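A script can turn that response into a lookup table of logged metrics. In this sketch, metrics_by_key is a hypothetical helper, and the sample is a trimmed copy of the response shown above. Treating a missing step as 0 mirrors MLflow's default:

```python
import json


def metrics_by_key(run_json: str) -> dict:
    """Map each metric key to (value, timestamp, step) from a runs/get response."""
    metrics = json.loads(run_json)["run"]["data"].get("metrics", [])
    # If a key appears more than once, the last entry wins.
    return {m["key"]: (m["value"], m["timestamp"], m.get("step", 0)) for m in metrics}


response = """
{"run": {"info": {"run_id": "<RUN_ID>", "status": "RUNNING"},
         "data": {"metrics": [{"key": "hello", "value": 10, "timestamp": 12345}]}}}
"""
print(metrics_by_key(response))  # {'hello': (10, 12345, 0)}
```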

Differences in API responses between MLflow and GitLab

POST /runs/log-metric

When the run exists

	MLflow	GitLab
URL	http://127.0.0.1:5000/api/2.0/mlflow/runs/log-metric	http://gdk.test:3000/api/v4/projects/31/ml/mflow/api/2.0/mlflow/runs/log-metric
Params	{}	{}
Body	{'run_id': '9ada49268205448a8396004819309379', 'key': 'hello', 'value': 10.0, 'timestamp': 12345678, 'step': 3}	{'run_id': '0b4a000b-566e-445d-9228-4cc1aafa7d3e', 'key': 'hello', 'value': 10.0, 'timestamp': 12345678, 'step': 3}
Status Code	200	201
Response	{}	{}

When run_id is not passed

	MLflow	GitLab
URL	http://127.0.0.1:5000/api/2.0/mlflow/runs/log-metric	http://gdk.test:3000/api/v4/projects/31/ml/mflow/api/2.0/mlflow/runs/log-metric
Params	{}	{}
Body	{'key': 'hello', 'value': 10.0, 'timestamp': 12345678, 'step': 3}	{'key': 'hello', 'value': 10.0, 'timestamp': 12345678, 'step': 3}
Status Code	400	400
Response	{ "error_code": "INVALID_PARAMETER_VALUE", "message": "Missing value for required parameter 'run_id'. See the API docs for more information about request parameters." }	{ "error": "run_id is missing" }

When key is not passed

	MLflow	GitLab
URL	http://127.0.0.1:5000/api/2.0/mlflow/runs/log-metric	http://gdk.test:3000/api/v4/projects/31/ml/mflow/api/2.0/mlflow/runs/log-metric
Params	{}	{}
Body	{'run_id': '9ada49268205448a8396004819309379', 'value': 10.0, 'timestamp': 12345678, 'step': 3}	{'run_id': '0b4a000b-566e-445d-9228-4cc1aafa7d3e', 'value': 10.0, 'timestamp': 12345678, 'step': 3}
Status Code	400	400
Response	{ "error_code": "INVALID_PARAMETER_VALUE", "message": "Missing value for required parameter 'key'. See the API docs for more information about request parameters." }	{ "error": "key is missing" }
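The two servers differ in both the success status and the error body, so a client that targets both has to normalize them. In this sketch, error_message is a hypothetical helper, and the body shapes are the ones in the log-metric tables:

```python
from typing import Optional


def error_message(status: int, body: dict) -> Optional[str]:
    """Return None for a successful call, else an error string from either server."""
    if 200 <= status < 300:
        # log-metric: MLflow answers 200 and GitLab 201, both with an empty {} body.
        return None
    if "message" in body:
        # MLflow shape: {"error_code": ..., "message": ...}
        return f'{body.get("error_code", "ERROR")}: {body["message"]}'
    # GitLab shape: {"error": ...}
    return body.get("error", f"HTTP {status}")


print(error_message(201, {}))                           # None
print(error_message(400, {"error": "key is missing"}))  # key is missing
```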

GET /runs/get

When the run exists

	MLflow	GitLab
URL	http://127.0.0.1:5000/api/2.0/mlflow/runs/get	http://gdk.test:3000/api/v4/projects/31/ml/mflow/api/2.0/mlflow/runs/get
Params	{'run_id': '9ada49268205448a8396004819309379'}	{'run_id': '0b4a000b-566e-445d-9228-4cc1aafa7d3e'}
Body	{}	{}
Status Code	200	200
Response	{ "run": { "info": { "run_uuid": "9ada49268205448a8396004819309379", "experiment_id": "103", "user_id": "", "status": "RUNNING", "start_time": 1234, "artifact_uri": "./mlruns2/103/9ada49268205448a8396004819309379/artifacts", "lifecycle_stage": "active", "run_id": "9ada49268205448a8396004819309379" }, "data": { "metrics": [ { "key": "hello", "value": 10.0, "timestamp": 12345678, "step": 3 } ] } } }	{ "run": { "info": { "run_id": "0b4a000b-566e-445d-9228-4cc1aafa7d3e", "run_uuid": "0b4a000b-566e-445d-9228-4cc1aafa7d3e", "experiment_id": "2", "start_time": 1234, "status": "RUNNING", "artifact_uri": "not_implemented", "lifecycle_stage": "active", "user_id": "1" }, "data": { "metrics": [ { "key": "hello", "value": 10.0, "timestamp": 12345678, "step": 3 } ] } } }
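In the runs/get comparison, every difference sits in run.info. This sketch pastes both responses from the table in as Python dicts and uses a hypothetical compare_runs helper. It lists the info fields whose values differ and checks that the metrics payloads are identical:

```python
from typing import Set, Tuple

# Responses copied from the GET /runs/get table.
MLFLOW_RUN = {"run": {
    "info": {"run_uuid": "9ada49268205448a8396004819309379", "experiment_id": "103",
             "user_id": "", "status": "RUNNING", "start_time": 1234,
             "artifact_uri": "./mlruns2/103/9ada49268205448a8396004819309379/artifacts",
             "lifecycle_stage": "active", "run_id": "9ada49268205448a8396004819309379"},
    "data": {"metrics": [{"key": "hello", "value": 10.0, "timestamp": 12345678, "step": 3}]},
}}
GITLAB_RUN = {"run": {
    "info": {"run_id": "0b4a000b-566e-445d-9228-4cc1aafa7d3e",
             "run_uuid": "0b4a000b-566e-445d-9228-4cc1aafa7d3e", "experiment_id": "2",
             "start_time": 1234, "status": "RUNNING", "artifact_uri": "not_implemented",
             "lifecycle_stage": "active", "user_id": "1"},
    "data": {"metrics": [{"key": "hello", "value": 10.0, "timestamp": 12345678, "step": 3}]},
}}


def compare_runs(a: dict, b: dict) -> Tuple[Set[str], bool]:
    """Return (run.info keys whose values differ, whether run.data.metrics match)."""
    info_a, info_b = a["run"]["info"], b["run"]["info"]
    differing = {k for k in info_a.keys() | info_b.keys() if info_a.get(k) != info_b.get(k)}
    return differing, a["run"]["data"]["metrics"] == b["run"]["data"]["metrics"]


differing, metrics_match = compare_runs(MLFLOW_RUN, GITLAB_RUN)
print(sorted(differing))  # ['artifact_uri', 'experiment_id', 'run_id', 'run_uuid', 'user_id']
print(metrics_match)      # True
```

The ids differ by design (GitLab uses the candidate iid as the run id), and artifact_uri is still a placeholder on the GitLab side.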

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

I have evaluated the MR acceptance checklist for this MR.

Related to #370478 (closed)

Edited Sep 15, 2022 by Eduardo Bonet

Add LogMetric endpoint
