
Deploy triton to the k8s cluster

Alexander Chueshev requested to merge deploy-triton into main

This MR adds manifests and utility Docker images to deploy the Triton inference server to the k8s cluster. Please note that we use the k8s cluster in the Applied ML namespace. By default, Triton serves the codegen-2B-multi model using one NVIDIA T4 GPU (16 GB GDDR6). See README.md for more details.
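For context, a deployment like this typically boils down to a Kubernetes Deployment that requests a GPU and points Triton at a model repository. The sketch below is illustrative only: the resource names, namespace label, image tag, and model-repository path are assumptions, not the actual manifests in this MR.

```yaml
# Hypothetical sketch of a Triton Deployment; names, image tag, and paths
# are illustrative assumptions, not the MR's actual manifests.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: triton-inference-server
  namespace: applied-ml          # MR notes the Applied ML namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: triton
  template:
    metadata:
      labels:
        app: triton
    spec:
      containers:
        - name: triton
          image: nvcr.io/nvidia/tritonserver:22.08-py3   # assumed tag
          command: ["tritonserver", "--model-repository=/models"]
          ports:
            - containerPort: 8000   # HTTP
            - containerPort: 8001   # gRPC
            - containerPort: 8002   # Prometheus metrics
          resources:
            limits:
              nvidia.com/gpu: 1     # one T4, per the MR description
```

The `nvidia.com/gpu: 1` limit is what binds the pod to a GPU node; the actual manifests and model-repository layout are described in the README.md referenced above.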

Ref: https://gitlab.com/groups/gitlab-org/modelops/applied-ml/ai-assist/-/epics/1 cc @mray2020 @fdegier

Edited by Alexander Chueshev
