Spike: Decide whether to remove the Candidate table and model

Proposal

  • Add column is_candidate to ml_model_versions
  • Remove table ml_candidates, creating an entry on ml_model_versions for each candidate
  • Rename table ml_candidate_params to ml_model_version_params (optional; we can simply rename the model and keep the underlying table)
  • Rename table ml_candidate_metrics to ml_model_version_metrics (optional; we can simply rename the model and keep the underlying table)
  • Remove table ml_candidate_metadata, creating an entry on ml_model_version_metadata for each ml_candidate_metadata row
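
As a rough illustration of the data move described above, the following plain-Ruby sketch folds candidates into model versions using in-memory stand-ins rather than real ActiveRecord models or migrations. The struct names, every column other than is_candidate, and the helper fold_candidates are hypothetical. It also assumes that a candidate already backing a model version reuses that row instead of getting a second one, which the proposal does not spell out.

```ruby
# Hypothetical in-memory stand-ins for the tables named in the proposal.
# All columns except is_candidate are assumptions made for illustration.
ModelVersionRow = Struct.new(:id, :name, :is_candidate, keyword_init: true)
CandidateRow    = Struct.new(:id, :name, :model_version_id, keyword_init: true)
MetadataRow     = Struct.new(:owner_id, :name, :value, keyword_init: true)

# Folds candidates into model_versions and copies candidate metadata into
# version_metadata. Returns a map of candidate id => model version id.
def fold_candidates(model_versions, candidates, candidate_metadata, version_metadata)
  next_id = (model_versions.map(&:id).max || 0) + 1
  candidate_to_version = {}

  candidates.each do |candidate|
    if candidate.model_version_id
      # Assumption: an already promoted candidate reuses its model version row.
      candidate_to_version[candidate.id] = candidate.model_version_id
    else
      # Unpromoted candidates get a new row flagged with is_candidate.
      row = ModelVersionRow.new(id: next_id, name: candidate.name, is_candidate: true)
      model_versions << row
      candidate_to_version[candidate.id] = row.id
      next_id += 1
    end
  end

  # Each candidate metadata row becomes a model version metadata row.
  candidate_metadata.each do |meta|
    version_metadata << MetadataRow.new(
      owner_id: candidate_to_version.fetch(meta.owner_id),
      name: meta.name,
      value: meta.value
    )
  end

  candidate_to_version
end
```

A real change would more likely be a batched background migration; the sketch only shows the row mapping that such a migration would need to preserve.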

Context

Version candidates are entities used to hold metadata and artifacts for model training runs that are not yet production ready. Each is a combination of code, params and data, created to support Model experiments, and their main purpose is to form comparable sets (experiments) out of which the best are chosen to become model versions. This design was influenced by our mlflow compatibility layer, where candidates mirror the mlflow concept of runs: https://mlflow.org/docs/latest/tracking.html#runs.

In terms of database setup, this has several implications:

  • An Ml::ModelVersion always has one Ml::Candidate, which holds params, metrics and metadata
  • Promoting a candidate to a model version always means creating a new row in the ModelVersion table
  • A ModelVersion can have metadata stored both on Ml::ModelVersionMetadata and its Ml::CandidateMetadata
  • Currently, a model version uses a different package type (ml_model) than ml_candidates (generic), which complicates the promotion process
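
To make the split-metadata point in the list above concrete, here is a minimal plain-Ruby sketch using hashes in place of the real models. The hash layout and the helper metadata_sources are hypothetical. It deliberately does not decide which source takes precedence, since that is not stated here.

```ruby
# Hypothetical stand-in: a model version as a hash that carries its own
# metadata plus the metadata of its single candidate.
def metadata_sources(model_version)
  {
    # Corresponds to Ml::ModelVersionMetadata in the current design.
    model_version: model_version[:metadata],
    # Corresponds to Ml::CandidateMetadata, reached through the candidate.
    candidate: model_version[:candidate][:metadata]
  }
end
```

Any caller that wants the full picture has to consult both sources, which is the kind of duplication the list describes.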

However, as we absorb Model experiments into the model registry, which will remove the Ml::Experiment table, a separate table for candidates becomes unnecessary. This has no implications for users, but for developers the additional table creates unnecessary complexity.