Add LogMetric endpoint
What does this MR do and why?
Enables logging metrics to an ML experiment candidate.
Database
Migration
- Up
bin/rails db:migrate:main RAILS_ENV=test
main: == 20220913084123 AddTimestampToMlCandidateMetrics: migrating =================
main: -- add_column(:ml_candidate_metrics, :tracked_at, :bigint)
main: -> 0.0059s
main: == 20220913084123 AddTimestampToMlCandidateMetrics: migrated (0.0072s) ========
- Down
bin/rails db:rollback:ci RAILS_ENV=test
ci: == 20220913084123 AddTimestampToMlCandidateMetrics: reverting =================
ci: -- remove_column(:ml_candidate_metrics, :tracked_at, :bigint)
ci: -> 0.0078s
ci: == 20220913084123 AddTimestampToMlCandidateMetrics: reverted (0.0120s) ========
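The migration adds `tracked_at` as a `bigint`. A likely reason, stated here as an assumption rather than something this MR confirms, is that MLflow clients send `timestamp` as milliseconds since the Unix epoch, which does not fit in a 4-byte integer column. A minimal Python sketch of that arithmetic (the helper name is hypothetical):

```python
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1  # upper bound of a 4-byte signed integer column


def to_mlflow_timestamp(moment: datetime) -> int:
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch."""
    return int(moment.timestamp() * 1000)


# Date taken from the migration version 20220913084123, read as UTC (assumption).
migration_time = datetime(2022, 9, 13, 8, 41, 23, tzinfo=timezone.utc)
millis = to_mlflow_timestamp(migration_time)

# A present-day millisecond timestamp is far beyond the 32-bit range.
print(millis > INT32_MAX)
```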
Queries
- ::Ml::Candidate.with_project_id_and_iid(31, "0b4a000b-566e-445d-9228-4cc1aafa7d3e", include_associations: true)
SELECT
"ml_candidates".*
FROM
"ml_candidates"
INNER JOIN "ml_experiments" "experiment" ON "experiment"."id" = "ml_candidates"."experiment_id"
WHERE
"experiment"."project_id" = 31
AND "ml_candidates"."iid" = '0b4a000b-566e-445d-9228-4cc1aafa7d3e'
ORDER BY
"ml_candidates"."id" ASC
LIMIT 1;
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/12119/commands/42998
Running EXPLAIN ANALYZE locally:
-> Sort (cost=12.52..12.52 rows=1 width=74) (actual time=0.031..0.032 rows=1 loops=1)
     Sort Key: ml_candidates.id
     Sort Method: quicksort Memory: 25kB
     -> Nested Loop (cost=0.30..12.51 rows=1 width=74) (actual time=0.023..0.027 rows=1 loops=1)
          -> Index Scan using index_ml_experiments_on_project_id_and_name on ml_experiments experiment (cost=0.15..5.22 rows=4 width=8) (actual time=0.009..0.012 rows=11 loops=1)
               Index Cond: (project_id = 31)
          -> Index Scan using index_ml_candidates_on_experiment_id_and_iid on ml_candidates (cost=0.15..1.67 rows=1 width=74) (actual time=0.001..0.001 rows=0 loops=11)
               Index Cond: ((experiment_id = experiment.id) AND (iid = '0b4a000b-566e-445d-9228-4cc1aafa7d3e'::uuid))
Planning Time: 0.148 ms
Execution Time: 0.053 ms
- candidate.metrics
SELECT
"ml_candidate_metrics".*
FROM
"ml_candidate_metrics"
WHERE
"ml_candidate_metrics"."candidate_id" = 1;
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/12119/commands/43002
How to Reproduce
How to set up and validate locally
- Create a project and a project access token with the api scope, then export both:
export PROJECT_ID=<Your Project Id>
export GITLAB_PAT=<your api token>
- Enable the feature flag:
echo "Feature.enable(:ml_experiment_tracking)" | bundle exec rails c
- Create an experiment:
curl -X POST -H "Authorization: Bearer $GITLAB_PAT" -d name=my_cool_experiment http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mflow/api/2.0/mlflow/experiments/create
- Create a run, and make a note of the run ID returned:
curl -X POST -H "Authorization: Bearer $GITLAB_PAT" -d experiment_id=1 http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mflow/api/2.0/mlflow/runs/create
- Log a metric:
curl -X POST -H "Authorization: Bearer $GITLAB_PAT" -d run_id="<RUN_ID>" -d key=hello -d value=10.0 -d timestamp=12345 http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mflow/api/2.0/mlflow/runs/log-metric
- Get the run and confirm that it now has a metric in the run.data.metrics field:
curl -X GET -H "Authorization: Bearer $GITLAB_PAT" "http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mflow/api/2.0/mlflow/runs/get?run_id=<RUN_ID>"
{ "run": { "info": { "run_id": "<RUN_ID>", "run_uuid": "<RUN_ID>", "experiment_id": "3", "start_time": 0, "status": "RUNNING", "artifact_uri": "not_implemented", "lifecycle_stage": "active", "user_id": "1" }, "data": { "metrics": [ { "key": "hello", "value": 10, "timestamp": 12345 } ] } } }
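The last two validation steps can also be scripted. The following is a minimal sketch in Python using only the standard library, not part of this MR: the helper names `build_request`, `call`, and `log_and_fetch` are hypothetical, and the base URL is copied from the curl commands above.

```python
import json
import urllib.parse
import urllib.request

# Base URL copied from the curl commands in the validation steps.
BASE = "http://gdk.test:3000/api/v4/projects/{project_id}/ml/mflow/api/2.0/mlflow"


def build_request(project_id, token, path, params, method="POST"):
    """Build (but do not send) a request equivalent to the curl calls."""
    url = BASE.format(project_id=project_id) + path
    encoded = urllib.parse.urlencode(params)
    headers = {"Authorization": f"Bearer {token}"}
    if method == "GET":
        # GET parameters travel in the query string.
        return urllib.request.Request(f"{url}?{encoded}", headers=headers, method="GET")
    # curl -d sends form-encoded data, so the POST body is form-encoded too.
    return urllib.request.Request(url, data=encoded.encode(), headers=headers, method="POST")


def call(request):
    """Send the request and decode the JSON body (empty bodies become {})."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read() or b"{}")


def log_and_fetch(project_id, token, run_id):
    """Log one metric, then return the metrics list reported for the run."""
    metric = {"run_id": run_id, "key": "hello", "value": 10.0, "timestamp": 12345}
    call(build_request(project_id, token, "/runs/log-metric", metric))
    run = call(build_request(project_id, token, "/runs/get", {"run_id": run_id}, method="GET"))
    return run["run"]["data"]["metrics"]
```

Calling `log_and_fetch(PROJECT_ID, GITLAB_PAT, RUN_ID)` with the values from the earlier steps should return a list containing the `hello` metric, assuming the GDK instance is running and the feature flag is enabled.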
Difference in API responses
POST /runs/log-metric
When run exists
Field | MLflow | GitLab |
URL | http://127.0.0.1:5000/api/2.0/mlflow/runs/log-metric | http://gdk.test:3000/api/v4/projects/31/ml/mflow/api/2.0/mlflow/runs/log-metric |
Params | {} | {} |
Body | {'run_id': '9ada49268205448a8396004819309379', 'key': 'hello', 'value': 10.0, 'timestamp': 12345678, 'step': 3} | {'run_id': '0b4a000b-566e-445d-9228-4cc1aafa7d3e', 'key': 'hello', 'value': 10.0, 'timestamp': 12345678, 'step': 3} |
Status Code | 200 | 201 |
Response | {} | {} |
When run id is not passed
Field | MLflow | GitLab |
URL | http://127.0.0.1:5000/api/2.0/mlflow/runs/log-metric | http://gdk.test:3000/api/v4/projects/31/ml/mflow/api/2.0/mlflow/runs/log-metric |
Params | {} | {} |
Body | {'key': 'hello', 'value': 10.0, 'timestamp': 12345678, 'step': 3} | {'key': 'hello', 'value': 10.0, 'timestamp': 12345678, 'step': 3} |
Status Code | 400 | 400 |
Response | { "error_code": "INVALID_PARAMETER_VALUE", "message": "Missing value for required parameter 'run_id'. See the API docs for more information about request parameters." } | { "error": "run_id is missing" } |
When key is not passed
Field | MLflow | GitLab |
URL | http://127.0.0.1:5000/api/2.0/mlflow/runs/log-metric | http://gdk.test:3000/api/v4/projects/31/ml/mflow/api/2.0/mlflow/runs/log-metric |
Params | {} | {} |
Body | {'run_id': '9ada49268205448a8396004819309379', 'value': 10.0, 'timestamp': 12345678, 'step': 3} | {'run_id': '0b4a000b-566e-445d-9228-4cc1aafa7d3e', 'value': 10.0, 'timestamp': 12345678, 'step': 3} |
Status Code | 400 | 400 |
Response | { "error_code": "INVALID_PARAMETER_VALUE", "message": "Missing value for required parameter 'key'. See the API docs for more information about request parameters." } | { "error": "key is missing" } |
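The log-metric comparisons show two differences a client may need to tolerate: the success status (200 from MLflow, 201 from GitLab) and the shape of the error body. A minimal Python sketch of how a caller could normalize both, with hypothetical helper names that are not part of this MR or of the MLflow client:

```python
from typing import Optional


def is_success(status_code: int) -> bool:
    """Treat any 2xx as success, which covers MLflow's 200 and GitLab's 201."""
    return 200 <= status_code < 300


def error_message(body: dict) -> Optional[str]:
    """Return the error text from either response shape, or None if absent."""
    if "message" in body:  # MLflow shape: {"error_code": ..., "message": ...}
        return body["message"]
    if "error" in body:  # GitLab shape: {"error": ...}
        return body["error"]
    return None


# Error bodies taken from the "When key is not passed" comparison.
mlflow_error = {
    "error_code": "INVALID_PARAMETER_VALUE",
    "message": "Missing value for required parameter 'key'. "
    "See the API docs for more information about request parameters.",
}
gitlab_error = {"error": "key is missing"}

print(error_message(mlflow_error))
print(error_message(gitlab_error))
```

Checking the 2xx range rather than an exact code keeps the caller independent of which backend answered.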
GET /runs/get
When run exists
Field | MLflow | GitLab |
URL | http://127.0.0.1:5000/api/2.0/mlflow/runs/get | http://gdk.test:3000/api/v4/projects/31/ml/mflow/api/2.0/mlflow/runs/get |
Params | {'run_id': '9ada49268205448a8396004819309379'} | {'run_id': '0b4a000b-566e-445d-9228-4cc1aafa7d3e'} |
Body | {} | {} |
Status Code | 200 | 200 |
Response | { "run": { "info": { "run_uuid": "9ada49268205448a8396004819309379", "experiment_id": "103", "user_id": "", "status": "RUNNING", "start_time": 1234, "artifact_uri": "./mlruns2/103/9ada49268205448a8396004819309379/artifacts", "lifecycle_stage": "active", "run_id": "9ada49268205448a8396004819309379" }, "data": { "metrics": [ { "key": "hello", "value": 10.0, "timestamp": 12345678, "step": 3 } ] } } } | { "run": { "info": { "run_id": "0b4a000b-566e-445d-9228-4cc1aafa7d3e", "run_uuid": "0b4a000b-566e-445d-9228-4cc1aafa7d3e", "experiment_id": "2", "start_time": 1234, "status": "RUNNING", "artifact_uri": "not_implemented", "lifecycle_stage": "active", "user_id": "1" }, "data": { "metrics": [ { "key": "hello", "value": 10.0, "timestamp": 12345678, "step": 3 } ] } } } |
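To make the runs/get comparison easier to read, the two responses can be diffed field by field. This is a minimal Python sketch with a hypothetical helper, `differing_paths`, applied to the two response bodies from the comparison; it is illustrative and not part of this MR.

```python
def differing_paths(left, right, prefix=""):
    """Recursively list dotted paths whose values differ between two JSON-like objects."""
    if isinstance(left, dict) and isinstance(right, dict):
        paths = []
        for key in sorted(set(left) | set(right)):
            paths += differing_paths(left.get(key), right.get(key), f"{prefix}{key}.")
        return paths
    # Leaves (and lists) are compared by plain equality.
    return [] if left == right else [prefix.rstrip(".")]


metrics = [{"key": "hello", "value": 10.0, "timestamp": 12345678, "step": 3}]

mlflow_run = {"run": {"info": {
    "run_uuid": "9ada49268205448a8396004819309379", "experiment_id": "103",
    "user_id": "", "status": "RUNNING", "start_time": 1234,
    "artifact_uri": "./mlruns2/103/9ada49268205448a8396004819309379/artifacts",
    "lifecycle_stage": "active", "run_id": "9ada49268205448a8396004819309379",
}, "data": {"metrics": metrics}}}

gitlab_run = {"run": {"info": {
    "run_id": "0b4a000b-566e-445d-9228-4cc1aafa7d3e",
    "run_uuid": "0b4a000b-566e-445d-9228-4cc1aafa7d3e", "experiment_id": "2",
    "start_time": 1234, "status": "RUNNING", "artifact_uri": "not_implemented",
    "lifecycle_stage": "active", "user_id": "1",
}, "data": {"metrics": metrics}}}

for path in differing_paths(mlflow_run, gitlab_run):
    print(path)
```

For these two bodies, only identifier-like fields under `run.info` differ (IDs, `user_id`, `artifact_uri`); `run.data.metrics` is identical.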
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #370478 (closed)
Edited by Eduardo Bonet