Discussion: Model registry data topology

Context

Model registry artefact management will use the Package Registry, but the Package Registry does not handle metadata and other features we might need. We also need a path for promoting a Candidate (which will use the same Model package type) into a model version.

Model Experiments data topology

Model Experiments is in a similar situation: it uses the Package Registry to hold the artefacts, with a layer of tables on top that provides information for the UI:

  • Well-defined GitLab metadata (such as Merge Request ID or Build ID) goes into Ml::Candidate
  • MlCandidateMetadata gives users a flexible way to add any metadata that is not covered by GitLab. One improvement to MlCandidateMetadata would be a type column, so that we can render some specific types better (e.g. image_url, image, int)
erDiagram
    Project ||--o{ MlExperiment : owns
    Project ||--o{ MlCandidate : owns
    User ||--|{ MlCandidate : creates
    User ||--|{ MlExperiment : creates
    MlExperiment ||--o{ MlCandidate : compares
    MlExperiment ||--o{ MlExperimentMetadata : has
    MlCandidate ||--o{ MlCandidateParam : has
    MlCandidate ||--o{ MlCandidateMetric : has
    MlCandidate ||--o{ MlCandidateMetadata : has
    MlCandidate ||--o{ PackagesPackage : stores
    MlCandidateParam {
        string name
        string value
    }
    MlCandidateMetric {
        string name
        float value
        int step
    }
    MlCandidateMetadata {
        string name
        string value
    }
    MlExperimentMetadata {
        string name
        string value
    }
    MlCandidate {
        bigint id
        bigint iid
        string name
        uuid eid
    }
    MlExperiment {
        bigint id
        bigint iid
        string name
    }
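
The type column suggested for MlCandidateMetadata could drive rendering roughly as follows. This is a minimal plain-Ruby sketch, not GitLab code: the `CandidateMetadata` struct, the `RENDERERS` table, `render_metadata`, and the output formats are all hypothetical, and only the type names `int` and `image_url` come from the discussion above.

```ruby
# Hypothetical stand-in for a metadata row with the proposed type column.
CandidateMetadata = Struct.new(:name, :value, :type, keyword_init: true)

# One renderer per assumed type. The output formats are illustrative only.
RENDERERS = {
  'int'       => ->(value) { Integer(value).to_s },
  'image_url' => ->(value) { "[image: #{value}]" }
}.freeze

# Rows with a missing or unknown type fall back to plain text.
def render_metadata(entry)
  renderer = RENDERERS.fetch(entry.type) { ->(value) { value.to_s } }
  renderer.call(entry.value)
end

entries = [
  CandidateMetadata.new(name: 'epochs', value: '10', type: 'int'),
  CandidateMetadata.new(name: 'confusion_matrix', value: 'https://example.com/cm.png', type: 'image_url'),
  CandidateMetadata.new(name: 'note', value: 'baseline run', type: nil)
]

entries.each { |entry| puts "#{entry.name}: #{render_metadata(entry)}" }
```

Falling back to plain text keeps existing untyped rows renderable, so the column could be added without backfilling old data.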

Suggested solution:

Follow the same general idea as above:

  1. Add Ml::Model. This holds the description of a model and other model-specific attributes. It has a name, iid, description, and 0 or many versions.
  2. A model version is a Packages::Package of type ml_model, where the package name is the model name
  3. Add Packages::MlModelMetadata, which has general metadata about that specific version of the model (e.g. model_id)
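
The three steps above can be sketched in plain Ruby to show how the records relate. This is an illustration only, not Rails models: `Package`, `ModelMetadata`, `Model#add_version`, and the package `version` field are hypothetical names chosen for the sketch.

```ruby
# Hypothetical stand-in for Packages::Package (the version field is assumed).
Package = Struct.new(:name, :version, :package_type, keyword_init: true)

# Hypothetical stand-in for Packages::MlModelMetadata: one row per version.
ModelMetadata = Struct.new(:package, :model_id, keyword_init: true)

# Hypothetical stand-in for Ml::Model.
class Model
  attr_reader :id, :iid, :name, :description, :versions

  def initialize(id:, iid:, name:, description: nil)
    @id = id
    @iid = iid
    @name = name
    @description = description
    @versions = [] # zero or many metadata rows, one per model version
  end

  # A version is a package of type ml_model whose name is the model name.
  def add_version(version)
    package = Package.new(name: name, version: version, package_type: :ml_model)
    metadata = ModelMetadata.new(package: package, model_id: id)
    @versions << metadata
    metadata
  end
end

model = Model.new(id: 1, iid: 1, name: 'churn-classifier', description: 'Predicts churn')
model.add_version('1.0.0')
model.add_version('1.1.0')
model.versions.each { |v| puts "#{v.package.name}@#{v.package.version}" }
```

In this sketch the metadata row is the only link between a package and its model, which mirrors the `model_id` example in step 3.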

Eventually:

Connecting Model Experiments to Model Registry:

  1. Add association from Ml::Experiment to Ml::Model.
  2. Each model has a default Ml::Experiment.
  3. Each Ml::ModelVersion can have 0 or 1 Ml::Candidate.
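
As a rough illustration of these associations, the plain-Ruby sketch below gives each model a default experiment and lets a model version point at an optional candidate. All names are hypothetical stand-ins, and the choice to create the default experiment together with the model, sharing its name, is an assumption not stated above.

```ruby
# Hypothetical stand-ins for Ml::Experiment, Ml::Candidate and Ml::ModelVersion.
Experiment = Struct.new(:name, :model, keyword_init: true)
Candidate = Struct.new(:name, :experiment, keyword_init: true)
ModelVersion = Struct.new(:model, :version, :candidate, keyword_init: true)

# Hypothetical stand-in for Ml::Model.
class Model
  attr_reader :name, :default_experiment

  def initialize(name:)
    @name = name
    # Assumption: the default experiment is created with the model and shares its name.
    @default_experiment = Experiment.new(name: name, model: self)
  end
end

model = Model.new(name: 'churn-classifier')
candidate = Candidate.new(name: 'run-42', experiment: model.default_experiment)

# A version promoted from a candidate, and one created without any candidate.
from_candidate = ModelVersion.new(model: model, version: '1.0.0', candidate: candidate)
uploaded = ModelVersion.new(model: model, version: '1.1.0', candidate: nil)

puts from_candidate.candidate.name
puts uploaded.candidate.nil?
```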

From the data layer perspective, promoting a Candidate to Version would mean:

  • The package associated with the candidate becomes the package for the model version. Candidates without a package cannot be promoted.
  • The metadata for the Ml::ModelVersion is its own Ml::ModelVersionMetadata together with the Ml::CandidateMetadata of the candidate associated with the model version. The same applies to Metrics and Params.
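
A minimal sketch of promotion under these rules, in plain Ruby, might look like the following. The `promote` method, `PromotionError`, and the struct names are hypothetical. Two further assumptions are made: on a key clash the version's own metadata wins, and metrics and params are read through the candidate rather than copied.

```ruby
# Hypothetical stand-ins for Packages::Package and Ml::Candidate.
Package = Struct.new(:name, :version, keyword_init: true)
Candidate = Struct.new(:name, :package, :metadata, :metrics, :params, keyword_init: true)

class PromotionError < StandardError; end

# Hypothetical stand-in for Ml::ModelVersion.
class ModelVersion
  attr_reader :package, :candidate

  def initialize(package:, candidate:, metadata: {})
    @package = package
    @candidate = candidate
    @own_metadata = metadata
  end

  # Version metadata combined with the candidate's metadata.
  # Assumption: on a key clash the version's own value wins.
  def metadata
    candidate.metadata.merge(@own_metadata)
  end

  # Assumption: metrics and params are read through the candidate.
  def metrics
    candidate.metrics
  end

  def params
    candidate.params
  end
end

def promote(candidate, metadata: {})
  raise PromotionError, 'candidate has no package' if candidate.package.nil?

  # The candidate's package becomes the version's package; nothing is copied.
  ModelVersion.new(package: candidate.package, candidate: candidate, metadata: metadata)
end

candidate = Candidate.new(
  name: 'run-42',
  package: Package.new(name: 'churn-classifier', version: '1.0.0'),
  metadata: { 'framework' => 'sklearn' },
  metrics: { 'auc' => 0.91 },
  params: { 'max_depth' => '5' }
)
version = promote(candidate, metadata: { 'stage' => 'production' })
puts version.metadata.inspect
```

Reading candidate data through the association avoids duplicating rows, at the cost of the version depending on the candidate staying in place.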
Edited by Eduardo Bonet