Add manifests to deploy the v2 model
This MR provides the k8s YAML manifests to deploy the v2 model after fine-tuning on 7 additional languages.
Steps to reproduce
- Create a new disk to store models in the k8s cluster:

  ```shell
  gcloud compute disks create --size=250GB --zone=us-central1-c nfs-code-suggestions-models-disk
  ```
- Get cluster credentials:

  ```shell
  gcloud container clusters get-credentials ai-assist --zone us-central1-c --project unreview-poc-390200e5
  kubectl config set-context --current --namespace fauxpilot
  ```
- Deploy the service account JSON key as a secret:

  ```shell
  export GOOGLE_APPLICATION_CREDENTIALS=<path to gcp application credentials>
  kubectl create secret generic gcp-storage-credentials \
    --from-file=key.json="$GOOGLE_APPLICATION_CREDENTIALS"
  ```
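Pods that need GCS access can consume this secret as a mounted volume. A minimal sketch of such a mount is below; the pod, container, and image names are hypothetical and not taken from the MR manifests:

```yaml
# Illustrative only: how a pod can consume the gcp-storage-credentials secret.
# Pod/container/image names here are hypothetical, not from the MR manifests.
apiVersion: v1
kind: Pod
metadata:
  name: gcs-reader-example
spec:
  containers:
    - name: app
      image: google/cloud-sdk:slim
      env:
        # Point client libraries at the mounted key file.
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /secrets/gcp/key.json
      volumeMounts:
        - name: gcp-credentials
          mountPath: /secrets/gcp
          readOnly: true
  volumes:
    - name: gcp-credentials
      secret:
        secretName: gcp-storage-credentials
```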
- Deploy the NFS server to access the model across the cluster. Note: we deploy the NFS server to support the `ReadWriteMany` access mode, which allows us to safely increase replicas when pods are scheduled on different nodes.

  ```shell
  kubectl apply -f ./manifests/fauxpilot/v2/models-nfs-server.yaml
  ```
- Create the persistent volume, persistent volume claim, and start the model loader k8s job:

  ```shell
  kubectl apply -f ./manifests/fauxpilot/v2/models-persistense-volumes.yaml
  kubectl apply -f ./manifests/fauxpilot/v2/model-loader.yaml
  kubectl wait --for=condition=complete --timeout=30m job/model-loader-job-v2
  ```
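The model loader job presumably downloads the fine-tuned model from GCS into the NFS-backed volume using the secret created earlier. A rough sketch under that assumption follows; the bucket path, image, and claim name are hypothetical:

```yaml
# Sketch of a model-loader Job copying models from GCS into the shared volume.
# Bucket path, image, and claim/secret wiring are assumptions, not from the MR.
apiVersion: batch/v1
kind: Job
metadata:
  name: model-loader-job-v2
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: loader
          image: google/cloud-sdk:slim
          command: ["sh", "-c"]
          args:
            # Authenticate with the mounted key, then sync models to the PVC.
            - gcloud auth activate-service-account --key-file=/secrets/gcp/key.json &&
              gsutil -m rsync -r gs://<models-bucket>/v2 /models
          volumeMounts:
            - name: models
              mountPath: /models
            - name: gcp-credentials
              mountPath: /secrets/gcp
              readOnly: true
      volumes:
        - name: models
          persistentVolumeClaim:
            claimName: models-nfs-pvc   # hypothetical claim name
        - name: gcp-credentials
          secret:
            secretName: gcp-storage-credentials
```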
- Deploy the Triton server:

  ```shell
  kubectl apply -f ./manifests/fauxpilot/v2/model-triton.yaml
  ```
Ref: https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/issues/82