
Add LogMetric endpoint

Eduardo Bonet requested to merge 370478-add-mlflow-endpoints-3 into master

What does this MR do and why?

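Enables logging metrics to an ML Experiment Candidate through the MLflow-compatible runs/log-metric endpoint. A new tracked_at column on ml_candidate_metrics stores the timestamp sent with each metric.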

Screenshots or screen recordings

Database

Migration

  • Up
❯ bin/rails db:migrate:main RAILS_ENV=test
main: == 20220913084123 AddTimestampToMlCandidateMetrics: migrating =================
main: -- add_column(:ml_candidate_metrics, :tracked_at, :bigint)
main:    -> 0.0059s
main: == 20220913084123 AddTimestampToMlCandidateMetrics: migrated (0.0072s) ========
  • Down
bin/rails db:rollback:ci RAILS_ENV=test
ci: == 20220913084123 AddTimestampToMlCandidateMetrics: reverting =================
ci: -- remove_column(:ml_candidate_metrics, :tracked_at, :bigint)
ci:    -> 0.0078s
ci: == 20220913084123 AddTimestampToMlCandidateMetrics: reverted (0.0120s) ========
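
MLflow clients send metric timestamps as milliseconds since the Unix epoch, which is presumably why tracked_at is a bigint: present-day values are far larger than a 32-bit integer column can hold. The sketch below only illustrates that arithmetic. The constants and helper names are hypothetical and not part of this MR, and it assumes tracked_at stores the MLflow timestamp unchanged.

```ruby
# Hypothetical helpers illustrating why tracked_at is stored as a bigint.
# Assumption: tracked_at holds MLflow's metric timestamp, in milliseconds since the Unix epoch.

PG_INTEGER_MAX = 2**31 - 1 # upper bound of a PostgreSQL "integer" column
PG_BIGINT_MAX  = 2**63 - 1 # upper bound of a PostgreSQL "bigint" column

# Converts an MLflow millisecond timestamp into a UTC Time.
def tracked_at_to_time(ms)
  Time.at(ms / 1000, ms % 1000, :millisecond).utc
end

# True when the value fits in a signed column whose maximum is `max`.
def fits?(ms, max)
  ms.between?(-max - 1, max)
end

now_ms = (Time.now.to_r * 1000).to_i
puts fits?(now_ms, PG_INTEGER_MAX) # false: too large for integer
puts fits?(now_ms, PG_BIGINT_MAX)  # true
```

A 32-bit column already overflows for any millisecond timestamp later than roughly 25 days after the epoch, so bigint is the narrowest PostgreSQL integer type that works here.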

Queries

  • Ml::Candidate.with_project_id_and_iid(31, "0b4a000b-566e-445d-9228-4cc1aafa7d3e", include_associations: true)
SELECT
    "ml_candidates".*
FROM
    "ml_candidates"
    INNER JOIN "ml_experiments" "experiment" ON "experiment"."id" = "ml_candidates"."experiment_id"
WHERE
    "experiment"."project_id" = 31
    AND "ml_candidates"."iid" = '0b4a000b-566e-445d-9228-4cc1aafa7d3e'
ORDER BY
    "ml_candidates"."id" ASC
LIMIT 1;

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/12119/commands/42998

Running explain analyse locally:

  ->  Sort  (cost=12.52..12.52 rows=1 width=74) (actual time=0.031..0.032 rows=1 loops=1)
        Sort Key: ml_candidates.id
        Sort Method: quicksort  Memory: 25kB
        ->  Nested Loop  (cost=0.30..12.51 rows=1 width=74) (actual time=0.023..0.027 rows=1 loops=1)
              ->  Index Scan using index_ml_experiments_on_project_id_and_name on ml_experiments experiment  (cost=0.15..5.22 rows=4 width=8) (actual time=0.009..0.012 rows=11 loops=1)
                    Index Cond: (project_id = 31)
              ->  Index Scan using index_ml_candidates_on_experiment_id_and_iid on ml_candidates  (cost=0.15..1.67 rows=1 width=74) (actual time=0.001..0.001 rows=0 loops=11)
                    Index Cond: ((experiment_id = experiment.id) AND (iid = '0b4a000b-566e-445d-9228-4cc1aafa7d3e'::uuid))
Planning Time: 0.148 ms
Execution Time: 0.053 ms
  • candidate.metrics
SELECT
    "ml_candidate_metrics".*
FROM
    "ml_candidate_metrics"
WHERE
    "ml_candidate_metrics"."candidate_id" = 1;

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/12119/commands/43002
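
To make the semantics of the two queries explicit, here is an in-memory Ruby analogue: a candidate is looked up by iid only among experiments that belong to the project, the lowest id wins (ORDER BY id ASC LIMIT 1), and its metrics are then fetched by candidate_id. This is an illustration only; the Struct types and helper names are hypothetical, and the metric fields other than candidate_id and tracked_at are assumed.

```ruby
# Hypothetical in-memory stand-ins for the ml_experiments, ml_candidates
# and ml_candidate_metrics tables (illustration only, not GitLab models).
Experiment = Struct.new(:id, :project_id, :name, keyword_init: true)
Candidate  = Struct.new(:id, :experiment_id, :iid, keyword_init: true)
Metric     = Struct.new(:candidate_id, :name, :value, :tracked_at, keyword_init: true)

# Analogue of the first query: join candidates to experiments, filter on
# project_id and iid, then ORDER BY id ASC LIMIT 1.
def find_candidate(experiments, candidates, project_id:, iid:)
  experiment_ids = experiments.select { |e| e.project_id == project_id }.map(&:id)
  candidates
    .select { |c| experiment_ids.include?(c.experiment_id) && c.iid == iid }
    .min_by(&:id)
end

# Analogue of the second query: WHERE candidate_id = ?
def metrics_for(metrics, candidate)
  metrics.select { |m| m.candidate_id == candidate.id }
end

experiments = [Experiment.new(id: 1, project_id: 31, name: "my_cool_experiment")]
candidates  = [Candidate.new(id: 1, experiment_id: 1, iid: "0b4a000b-566e-445d-9228-4cc1aafa7d3e")]
metrics     = [Metric.new(candidate_id: 1, name: "hello", value: 10.0, tracked_at: 12_345_678)]

candidate = find_candidate(experiments, candidates, project_id: 31,
                           iid: "0b4a000b-566e-445d-9228-4cc1aafa7d3e")
p metrics_for(metrics, candidate).map(&:name) # => ["hello"]
```

Scoping the lookup through the experiment is what prevents a run ID from one project resolving to a candidate in another, which the index scans on (project_id, name) and (experiment_id, iid) in the plan above serve efficiently.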

How to Reproduce

How to set up and validate locally

  1. Create a project and a project access token with the api scope:

    export PROJECT_ID=<Your Project Id>
    export GITLAB_PAT=<your api token>
  2. Enable the feature flag:

    echo "Feature.enable(:ml_experiment_tracking)" | bundle exec rails c
  3. Create an Experiment:

    curl -X POST -H "Authorization: Bearer $GITLAB_PAT" -d name=my_cool_experiment http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mflow/api/2.0/mlflow/experiments/create
  4. Create a run and note the run ID it returns (experiment_id=1 below assumes the experiment from step 3 has ID 1; use the ID returned there):

    curl -X POST -H "Authorization: Bearer $GITLAB_PAT" -d experiment_id=1 http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mflow/api/2.0/mlflow/runs/create
  5. Log a metric:

    curl -X POST -H "Authorization: Bearer $GITLAB_PAT" -d run_id="<RUN_ID>" -d key=hello -d value=10.0 -d timestamp=12345 http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mflow/api/2.0/mlflow/runs/log-metric
  6. Get the run and check that run.data.metrics now contains the metric:

    curl -X GET -H "Authorization: Bearer $GITLAB_PAT" "http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mflow/api/2.0/mlflow/runs/get?run_id=<RUN_ID>"
    {
     "run": {
      "info": {
       "run_id": "<RUN_ID>",
       "run_uuid": "<RUN_ID>",
       "experiment_id": "3",
       "start_time": 0,
       "status": "RUNNING",
       "artifact_uri": "not_implemented",
       "lifecycle_stage": "active",
       "user_id": "1"
      },
      "data": {
       "metrics": [
         {
           "key": "hello",
           "value": 10,
           "timestamp": 12345
         }
       ]
      }
     }
    }
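
Steps 5 and 6 can also be scripted. The Ruby sketch below uses only the standard library: it builds the same form-encoded POST that curl -d sends to runs/log-metric, and pulls run.data.metrics out of a runs/get response body. The base URL and route are copied from the steps above; the helper names are hypothetical.

```ruby
require "json"
require "net/http"
require "uri"

# Route copied from the curl examples above (GDK host, project-scoped MLflow API).
BASE_URL = "http://gdk.test:3000/api/v4/projects/%<project_id>s/ml/mflow/api/2.0/mlflow"

# Builds (but does not send) the POST /runs/log-metric request from step 5.
def log_metric_request(project_id:, token:, run_id:, key:, value:, timestamp:, step: nil)
  uri = URI("#{format(BASE_URL, project_id: project_id)}/runs/log-metric")
  request = Net::HTTP::Post.new(uri)
  request["Authorization"] = "Bearer #{token}"
  form = { "run_id" => run_id, "key" => key, "value" => value, "timestamp" => timestamp }
  form["step"] = step unless step.nil?
  # Same application/x-www-form-urlencoded body that `curl -d` produces.
  request.set_form_data(form)
  request
end

# Extracts run.data.metrics from a GET /runs/get response body (step 6).
def metrics_from_run(body)
  JSON.parse(body).dig("run", "data", "metrics") || []
end
```

To send the request against a running GDK, pass it to Net::HTTP.start(request.uri.hostname, request.uri.port) { |http| http.request(request) }; that part needs a live server, so the sketch stops at building the request.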

Difference in API responses

POST /runs/log-metric

When run exists

| | MLflow | GitLab |
| --- | --- | --- |
| URL | http://127.0.0.1:5000/api/2.0/mlflow/runs/log-metric | http://gdk.test:3000/api/v4/projects/31/ml/mflow/api/2.0/mlflow/runs/log-metric |
| Params | {} | {} |
| Body | {'run_id': '9ada49268205448a8396004819309379', 'key': 'hello', 'value': 10.0, 'timestamp': 12345678, 'step': 3} | {'run_id': '0b4a000b-566e-445d-9228-4cc1aafa7d3e', 'key': 'hello', 'value': 10.0, 'timestamp': 12345678, 'step': 3} |
| Status code | 200 | 201 |
| Response | {} | {} |

When run id is not passed

| | MLflow | GitLab |
| --- | --- | --- |
| URL | http://127.0.0.1:5000/api/2.0/mlflow/runs/log-metric | http://gdk.test:3000/api/v4/projects/31/ml/mflow/api/2.0/mlflow/runs/log-metric |
| Params | {} | {} |
| Body | {'key': 'hello', 'value': 10.0, 'timestamp': 12345678, 'step': 3} | {'key': 'hello', 'value': 10.0, 'timestamp': 12345678, 'step': 3} |
| Status code | 400 | 400 |
| Response | { "error_code": "INVALID_PARAMETER_VALUE", "message": "Missing value for required parameter 'run_id'. See the API docs for more information about request parameters." } | { "error": "run_id is missing" } |

When key is not passed

| | MLflow | GitLab |
| --- | --- | --- |
| URL | http://127.0.0.1:5000/api/2.0/mlflow/runs/log-metric | http://gdk.test:3000/api/v4/projects/31/ml/mflow/api/2.0/mlflow/runs/log-metric |
| Params | {} | {} |
| Body | {'run_id': '9ada49268205448a8396004819309379', 'value': 10.0, 'timestamp': 12345678, 'step': 3} | {'run_id': '0b4a000b-566e-445d-9228-4cc1aafa7d3e', 'value': 10.0, 'timestamp': 12345678, 'step': 3} |
| Status code | 400 | 400 |
| Response | { "error_code": "INVALID_PARAMETER_VALUE", "message": "Missing value for required parameter 'key'. See the API docs for more information about request parameters." } | { "error": "key is missing" } |
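
In both failure cases the status code matches and only the error body differs: MLflow returns an error_code plus a message, while GitLab returns a single error string. The hypothetical helpers below reproduce both shapes exactly as recorded in the comparison, which may be handy when asserting against either server in a test. They are not part of GitLab or MLflow, and only run_id and key are listed as required because those are the only cases exercised above.

```ruby
# Hypothetical helpers reproducing the two error bodies recorded above.
# Only run_id and key are exercised in the comparison, so only they are listed.
REQUIRED_LOG_METRIC_PARAMS = %w[run_id key].freeze

# Returns the first required parameter that is absent or blank, or nil.
def first_missing_param(params, required = REQUIRED_LOG_METRIC_PARAMS)
  required.find { |name| params[name].nil? || params[name].to_s.empty? }
end

# Error body GitLab returned in the comparison.
def gitlab_error(name)
  { "error" => "#{name} is missing" }
end

# Error body MLflow returned in the comparison.
def mlflow_error(name)
  {
    "error_code" => "INVALID_PARAMETER_VALUE",
    "message" => "Missing value for required parameter '#{name}'. " \
                 "See the API docs for more information about request parameters."
  }
end

params = { "key" => "hello", "value" => 10.0, "timestamp" => 12_345_678, "step" => 3 }
missing = first_missing_param(params) # => "run_id"
p gitlab_error(missing)
p mlflow_error(missing)
```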

GET /runs/get

When run exists

| | MLflow | GitLab |
| --- | --- | --- |
| URL | http://127.0.0.1:5000/api/2.0/mlflow/runs/get | http://gdk.test:3000/api/v4/projects/31/ml/mflow/api/2.0/mlflow/runs/get |
| Params | {'run_id': '9ada49268205448a8396004819309379'} | {'run_id': '0b4a000b-566e-445d-9228-4cc1aafa7d3e'} |
| Body | {} | {} |
| Status code | 200 | 200 |

Response (MLflow):

{ "run": { "info": { "run_uuid": "9ada49268205448a8396004819309379", "experiment_id": "103", "user_id": "", "status": "RUNNING", "start_time": 1234, "artifact_uri": "./mlruns2/103/9ada49268205448a8396004819309379/artifacts", "lifecycle_stage": "active", "run_id": "9ada49268205448a8396004819309379" }, "data": { "metrics": [ { "key": "hello", "value": 10.0, "timestamp": 12345678, "step": 3 } ] } } }

Response (GitLab):

{ "run": { "info": { "run_id": "0b4a000b-566e-445d-9228-4cc1aafa7d3e", "run_uuid": "0b4a000b-566e-445d-9228-4cc1aafa7d3e", "experiment_id": "2", "start_time": 1234, "status": "RUNNING", "artifact_uri": "not_implemented", "lifecycle_stage": "active", "user_id": "1" }, "data": { "metrics": [ { "key": "hello", "value": 10.0, "timestamp": 12345678, "step": 3 } ] } } }
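
The two runs/get responses differ in values (run IDs, experiment_id, user_id, artifact_uri) but should agree in structure for MLflow clients to parse them. A small Ruby sketch, hypothetical and not part of this MR, that reduces each JSON document to its key structure and compares the two:

```ruby
require "json"

# Replaces every leaf value with :value, keeping only the key structure.
# Ruby Hash equality ignores key order, so field ordering does not matter.
def key_shape(node)
  case node
  when Hash  then node.transform_values { |v| key_shape(v) }
  when Array then node.map { |v| key_shape(v) }.uniq
  else :value
  end
end

# True when both JSON documents have the same nested keys.
def same_shape?(json_a, json_b)
  key_shape(JSON.parse(json_a)) == key_shape(JSON.parse(json_b))
end
```

Because leaves are ignored, values such as artifact_uri "not_implemented" are not flagged. The check is also only meaningful when both runs logged the same fields: a metric logged without a step, as in step 5 of the walkthrough, comes back without a step key.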

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #370478 (closed)

Edited by Eduardo Bonet
