
Adds relation package_id to ml_candidates

Eduardo Bonet requested to merge mlops/precreate-candidate-package into master

What does this MR do and why?

Adds relation package_id to ml_candidates

Replaces an implicit relation between ml_candidates and packages_packages with an explicit one.

Adds a foreign key from ml_candidates to packages_packages. This key is populated when the package is created, using a new event, PackageCreatedEvent. Existing ml_candidate packages are linked to their candidates by a backfill migration.
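
To illustrate the idea, here is a minimal, framework-free sketch of how a package-created handler could link a candidate to its package. It is not the code from this MR: `Candidate`, `candidate_id_from_package` and `handle_package_created` are hypothetical names, and the naming convention (package name `ml_candidate_<id>`, version `-`) is assumed from the artifact_uri and the backfill query shown further down.

```ruby
# Hypothetical sketch, not the implementation in this MR.
Candidate = Struct.new(:id, :package_id)

# Assumed convention: package name "ml_candidate_<id>" with version "-".
CANDIDATE_PACKAGE_NAME = /\Aml_candidate_(\d+)\z/
CANDIDATE_PACKAGE_VERSION = '-'

# Returns the candidate id encoded in the package name, or nil when the
# package does not look like a candidate package.
def candidate_id_from_package(name, version)
  return nil unless version == CANDIDATE_PACKAGE_VERSION

  match = CANDIDATE_PACKAGE_NAME.match(name)
  match && match[1].to_i
end

# Reacts to a "package created" notification by setting package_id on the
# matching candidate. `candidates` is a plain Hash of id => Candidate that
# stands in for the ml_candidates table.
def handle_package_created(candidates, package_id:, name:, version:)
  candidate_id = candidate_id_from_package(name, version)
  return nil if candidate_id.nil?

  candidate = candidates[candidate_id]
  return nil if candidate.nil?

  candidate.package_id = package_id
  candidate
end
```

Packages that do not follow the assumed convention are ignored, so unrelated generic packages leave candidates untouched.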

Database

Queries

This query replaces: !104166 (merged)

SELECT
    "packages_packages".*
FROM
    "packages_packages"
WHERE
    "packages_packages"."id" IN (126, 125)
Index Scan using packages_packages_pkey on packages_packages  (cost=0.14..3.32 rows=2 width=89) (actual time=0.043..0.049 rows=2 loops=1)
  Index Cond: (id = ANY ('{126,125}'::bigint[]))

Migrations

Up
❯ bundle exec rails db:migrate
main: == 20230308154243 AddPackageIdToMlCandidates: migrating =======================
main: -- add_column(:ml_candidates, :package_id, :bigint, {:null=>true})
main:    -> 0.0088s
main: == 20230308154243 AddPackageIdToMlCandidates: migrated (0.0346s) ==============

main: == 20230308154244 AddPackageIdForeignKeyToMlCandidates: migrating =============
main: -- transaction_open?()
main:    -> 0.0000s
main: -- transaction_open?()
main:    -> 0.0000s
main: -- execute("ALTER TABLE ml_candidates ADD CONSTRAINT fk_a1d5f1bc45 FOREIGN KEY (package_id) REFERENCES packages_packages (id) ON DELETE SET NULL NOT VALID;")
main:    -> 0.0032s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0002s
main: -- execute("ALTER TABLE ml_candidates VALIDATE CONSTRAINT fk_a1d5f1bc45;")
main:    -> 0.0032s
main: -- execute("RESET statement_timeout")
main:    -> 0.0003s
main: == 20230308154244 AddPackageIdForeignKeyToMlCandidates: migrated (0.1146s) ====

main: == 20230308154245 AddIndexOnPackageIdForMlCandidates: migrating ===============
main: -- transaction_open?()
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.0028s
main: -- index_exists?(:ml_candidates, :package_id, {:name=>"index_ml_candidates_on_package_id", :algorithm=>:concurrently})
main:    -> 0.0131s
main: -- add_index(:ml_candidates, :package_id, {:name=>"index_ml_candidates_on_package_id", :algorithm=>:concurrently})
main:    -> 0.0082s
main: == 20230308154245 AddIndexOnPackageIdForMlCandidates: migrated (0.0870s) ======

main: == 20230313142631 BackfillMlCandidatesPackageId: migrating ====================
main: -- execute("      UPDATE ml_candidates\n      SET package_id = candidate_id_to_package_id.package_id\n      FROM (SELECT id as package_id, TRIM(LEADING 'ml_candidates_' FROM name) as candidate_id\n            FROM packages_packages\n            WHERE name LIKE 'ml_candidate_%'\n              and version = '-') AS candidate_id_to_package_id\n      WHERE cast(ml_candidates.id as text) = candidate_id_to_package_id.candidate_id\n")
main:    -> 0.1023s
main: == 20230313142631 BackfillMlCandidatesPackageId: migrated (0.1615s) ===========
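
A note for reviewers reading the backfill query: in PostgreSQL, TRIM(LEADING 'ml_candidates_' FROM name) strips leading characters that belong to the given set rather than a literal prefix, and each `_` in the LIKE pattern matches any single character. The plain-Ruby sketch below emulates that matching in memory under those semantics. The helper names and sample data are hypothetical and not part of this MR.

```ruby
# Hypothetical in-memory emulation of the backfill, for illustration only.
TRIM_SET = 'ml_candidates_'.chars.uniq.freeze

# Mirrors TRIM(LEADING chars FROM str): drops leading characters for as
# long as they belong to the set, instead of removing a literal prefix.
def trim_leading(str, chars)
  index = str.chars.index { |c| !chars.include?(c) }
  index ? str[index..-1] : ''
end

# Mirrors LIKE 'ml_candidate_%': each "_" matches any single character
# and "%" matches any sequence of characters.
LIKE_PATTERN = /\Aml.candidate..*\z/m

# Returns a Hash of candidate id => package id, i.e. the package_id each
# candidate row would receive from the UPDATE.
def backfill(candidate_ids, packages)
  packages.each_with_object({}) do |pkg, mapping|
    next unless pkg[:version] == '-' && LIKE_PATTERN.match?(pkg[:name])

    key = trim_leading(pkg[:name], TRIM_SET)
    id = candidate_ids.find { |cid| cid.to_s == key }
    mapping[id] = pkg[:id] if id
  end
end
```

Because candidate ids are numeric and digits are not in the trimmed character set, the set-based trim leaves exactly the id for names of the form ml_candidate_<id>.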
Down
❯ bundle exec rails db:migrate:down:main RAILS_ENV=development VERSION=20230313142631
main: == 20230313142631 BackfillMlCandidatesPackageId: reverting ====================
main: == 20230313142631 BackfillMlCandidatesPackageId: reverted (0.0233s) ===========

❯ bundle exec rails db:migrate:down:main RAILS_ENV=development VERSION=20230308154245
main: == 20230308154245 AddIndexOnPackageIdForMlCandidates: reverting ===============
main: -- transaction_open?()
main:    -> 0.0003s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.1933s
main: -- indexes(:ml_candidates)
main:    -> 0.0063s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0007s
main: -- remove_index(:ml_candidates, {:algorithm=>:concurrently, :name=>"index_ml_candidates_on_package_id"})
main:    -> 0.0051s
main: -- execute("RESET statement_timeout")
main:    -> 0.0008s
main: == 20230308154245 AddIndexOnPackageIdForMlCandidates: reverted (0.2388s) ======

❯ bundle exec rails db:migrate:down:main RAILS_ENV=development VERSION=20230308154244
main: == 20230308154244 AddPackageIdForeignKeyToMlCandidates: reverting =============
main: -- transaction_open?()
main:    -> 0.0000s
main: -- remove_foreign_key(:ml_candidates, {:column=>:package_id})
main:    -> 0.0062s
main: == 20230308154244 AddPackageIdForeignKeyToMlCandidates: reverted (0.2534s) ====

❯ bundle exec rails db:migrate:down:main RAILS_ENV=development VERSION=20230308154243
main: == 20230308154243 AddPackageIdToMlCandidates: reverting =======================
main: -- remove_column(:ml_candidates, :package_id, :bigint, {:null=>true})
main:    -> 0.0045s
main: == 20230308154243 AddPackageIdToMlCandidates: reverted (0.0168s) ==============

How to set up and validate locally

  1. Enable the feature flag

    echo "Feature.enable(:ml_experiment_tracking)" | bundle exec rails c
  2. Create a project and a project access token with the api scope:

    export PROJECT_ID=<Your Project Id>
    export GITLAB_PAT=<your api token>
  3. Create an Experiment:

    curl -X POST -H "Authorization: Bearer $GITLAB_PAT" -d name=my_cool_experiment http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mlflow/api/2.0/mlflow/experiments/create
  4. Create a Run. The artifact_uri in the response should be a URL to the generic packages API and look something like http://gdk.test:3000/api/v4/projects/21/packages/generic/ml_candidate_{id}/-/

    curl -X POST -H "Authorization: Bearer $GITLAB_PAT" -d experiment_id=1 http://gdk.test:3000/api/v4/projects/$PROJECT_ID/ml/mlflow/api/2.0/mlflow/runs/create
  5. In the Rails console, make sure the candidate has no package associated yet (iid is the run_iid returned in the previous call):

    Ml::Candidate.last.package_id
  6. Upload a file:

    curl --header "PRIVATE-TOKEN: $GITLAB_PAT" --upload-file file.txt "{CANDIDATE_UPLOAD_URL}/file.txt"
  7. Check that the worker AssociateMlCandidateToPackageWorker has been called by looking at logs/sidekiq.log

  8. In the Rails console, make sure the candidate now has a package associated:

    Ml::Candidate.last.package_id
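
The URLs used in the steps above can be assembled as in the following sketch, which may help when scripting the validation. The helper names are hypothetical, the host is the GDK host used in the curl commands, and treating {CANDIDATE_UPLOAD_URL} as the artifact_uri from step 4 is an assumption rather than something this MR states.

```ruby
# Hypothetical helpers that only build URLs; nothing is sent over the network.
GITLAB_BASE = 'http://gdk.test:3000' # host used in the curl commands above

# MLflow-compatible endpoint, e.g. path "experiments/create" or "runs/create".
def mlflow_endpoint(project_id, path)
  "#{GITLAB_BASE}/api/v4/projects/#{project_id}/ml/mlflow/api/2.0/mlflow/#{path}"
end

# Shape of the artifact_uri described in step 4.
def expected_artifact_uri(project_id, candidate_id)
  "#{GITLAB_BASE}/api/v4/projects/#{project_id}/packages/generic/ml_candidate_#{candidate_id}/-/"
end

# Assumption: the upload URL is the artifact_uri plus the file name.
# A trailing slash is dropped first to avoid a double slash.
def upload_url(artifact_uri, filename)
  "#{artifact_uri.chomp('/')}/#{filename}"
end
```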

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Eduardo Bonet
