
Deploy triton to the k8s cluster

Alexander Chueshev requested to merge deploy-triton into main

This MR adds manifests and utility Docker images to deploy the Triton inference server to the k8s cluster. Please note that we use the k8s cluster in the Applied ML namespace. By default, Triton serves the codegen-2B-multi model using one NVIDIA T4 GPU (16 GB GDDR6). See README.md for more details.
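For context, a deployment like this typically boils down to a Kubernetes Deployment that requests a GPU and points Triton at a model repository. The sketch below is illustrative only: the resource names, namespace label, image tag, and model-repository path are assumptions, not the actual manifests in this MR.

```yaml
# Hypothetical sketch of a Triton Deployment; names, image tag, and paths
# are illustrative assumptions, not the MR's actual manifests.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: triton-inference-server
  namespace: applied-ml          # MR notes the Applied ML namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: triton
  template:
    metadata:
      labels:
        app: triton
    spec:
      containers:
        - name: triton
          image: nvcr.io/nvidia/tritonserver:22.08-py3   # assumed tag
          command: ["tritonserver", "--model-repository=/models"]
          ports:
            - containerPort: 8000   # HTTP
            - containerPort: 8001   # gRPC
            - containerPort: 8002   # Prometheus metrics
          resources:
            limits:
              nvidia.com/gpu: 1     # one T4, per the MR description
```

The `nvidia.com/gpu: 1` limit is what binds the pod to a GPU node; the actual manifests and model-repository layout are described in the README.md referenced above.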

Ref: https://gitlab.com/groups/gitlab-org/modelops/applied-ml/ai-assist/-/epics/1 cc @mray2020 @fdegier

Edited by Alexander Chueshev
