Add readinessProbe for Triton with rolling updates (!81) · Merge requests · GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway

Dylan Bernardi requested to merge add-readiness-probe into main May 18, 2023

🧩 Problem to solve

Currently, the Triton server in the CS Model Gateway has no readinessProbe. This MR adds one.

💡 Proposal

The recent effort to add liveness/readiness probes to Triton could not keep pods running at all times while still checking their health. This MR fixes that by introducing rolling updates, which do not terminate an old pod until its replacement is ready to take over its services. Readiness is determined by the readinessProbe, which uses Triton's built-in health endpoint: /v2/health/ready.
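
For illustration, a readiness probe against that endpoint could look roughly like the fragment below. This is a sketch, not the manifest from this MR: the container name, the timing values, and the use of port 8000 (Triton's default HTTP port) are assumptions.

```yaml
# Hypothetical container spec fragment; names and timings are illustrative.
containers:
  - name: triton                # assumed container name
    ports:
      - name: http
        containerPort: 8000     # Triton's default HTTP port; adjust if overridden
    readinessProbe:
      httpGet:
        path: /v2/health/ready  # Triton's built-in readiness endpoint
        port: http
      initialDelaySeconds: 30   # assumed; allow time for models to load
      periodSeconds: 10         # assumed
      failureThreshold: 3       # assumed
```

Until the probe succeeds, Kubernetes does not mark the pod Ready, so it receives no Service traffic and a rolling update does not count it as available.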

⚔️ Plan of attack

Use a combination of this effort, the information on rolling updates, and documentation to implement a readiness probe in Model Gateway.

Note for Reviewer

The values maxSurge and maxUnavailable are set to their defaults. They do not need to appear in the YAML file when left at the defaults, but I included them for transparency.
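
As a point of reference, a rolling update strategy with both values written out at the Kubernetes defaults could look like the fragment below. It assumes the workload is a Deployment; the surrounding fields of the actual manifest are not shown.

```yaml
# Deployment spec fragment (assumed resource kind); values are the Kubernetes defaults.
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%         # extra pods allowed above the desired replica count
      maxUnavailable: 25%   # pods allowed to be unavailable during the update
```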

cc @mray2020 @tle_gitlab @AndrasHerczeg @srayner

Relates to #14 (closed)

Edited May 31, 2023 by Andras Herczeg
