Add manifests to deploy the v2 model

Tan Le requested to merge deploy-model-v2 into main

This MR adds the k8s YAML manifests to deploy model v2, which was fine-tuned for 7 additional languages.

Steps to reproduce

  1. Create a new disk to store models in the k8s cluster

    gcloud compute disks create --size=250GB --zone=us-central1-c nfs-code-suggestions-models-disk
  2. Get cluster credentials

    gcloud container clusters get-credentials ai-assist --zone us-central1-c --project unreview-poc-390200e5
    kubectl config set-context --current --namespace fauxpilot
  3. Deploy the service account JSON key as a secret

    export GOOGLE_APPLICATION_CREDENTIALS=<path to gcp application credentials>
    kubectl create secret generic gcp-storage-credentials \
        --from-file=key.json="$GOOGLE_APPLICATION_CREDENTIALS"
  4. Deploy the NFS server so the model is accessible across the cluster. Note: we deploy an NFS server to support the ReadWriteMany access mode, which lets us scale up replicas safely when pods are scheduled onto different nodes.

    kubectl apply -f ./manifests/fauxpilot/v2/models-nfs-server.yaml
  5. Create the persistent volume, persistent volume claim, and start the model loader k8s job

    kubectl apply -f ./manifests/fauxpilot/v2/models-persistense-volumes.yaml
    kubectl apply -f ./manifests/fauxpilot/v2/model-loader.yaml
    kubectl wait --for=condition=complete --timeout=30m job/model-loader-job-v2
  6. Deploy the Triton server

    kubectl apply -f ./manifests/fauxpilot/v2/model-triton.yaml
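
For context, the persistent-volume manifests in step 5 follow the usual NFS-backed ReadWriteMany pattern. The sketch below is illustrative only — the resource names, the NFS Service DNS name, and the mount path are assumptions; the authoritative definitions are in `./manifests/fauxpilot/v2/`:

```yaml
# Assumed sketch: an NFS-backed PersistentVolume plus a claim with
# ReadWriteMany, so every replica can mount the shared model directory.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: models-pv                 # assumed name
spec:
  capacity:
    storage: 250Gi                # matches the disk created in step 1
  accessModes:
    - ReadWriteMany
  nfs:
    server: nfs-server.fauxpilot.svc.cluster.local  # assumed Service DNS
    path: "/"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: models-pvc                # assumed name
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""            # bind to the pre-provisioned PV above
  resources:
    requests:
      storage: 250Gi
```

ReadWriteMany is the key property here: a GCE persistent disk alone only supports ReadWriteOnce, which is why the disk is fronted by an NFS server in step 4.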
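
Once step 6 completes, a quick sanity check against a live cluster might look like the following. The Service name `triton` and port `8000` are assumptions — adjust them to whatever `model-triton.yaml` actually creates; the `/v2/health/ready` endpoint is Triton's standard HTTP readiness probe:

```shell
# Confirm the Triton pods reached Running state
kubectl get pods -n fauxpilot

# Port-forward the (assumed) Service and probe Triton's readiness endpoint
kubectl port-forward svc/triton 8000:8000 &
curl -sf http://localhost:8000/v2/health/ready && echo "triton ready"
```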

Ref: https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/issues/82

Edited by Alexander Chueshev
