Scalability: Liveness and readiness checks in all deployments in production environments
⚠️ Issue
At present there are no liveness or readiness checks in our production Kubernetes manifests for ai-assist.
As discussed in the Engineering Meeting: https://docs.google.com/document/d/1sxdhiUjiHVKh-ylch2JtdZLOvtykGJmWRrhscJNiB_4/edit#heading=h.2kwp2wdbgmxo
💡 Proposal
We should add all three Kubernetes checks to the code suggestions deployment. The notes below explain what each check should verify.
✔️ Checks
- **startup**: A startup probe verifies whether the application within a container has started. Startup probes run before any other probe and, until they finish successfully, disable the other probes. If a container fails its startup probe, the container is killed and follows the pod's `restartPolicy`. This probe should verify that the codegen model is ready; the required gRPC endpoint is available through the `InferenceServerClient`.
- **readiness**: Readiness probes determine whether a container is ready to serve requests. If the readiness probe returns a failed state, Kubernetes removes the container's IP address from the endpoints of all Services. As with `startup`, we need to verify that the codegen model is ready.
- **liveness**: Liveness probes determine whether an application running in a container is in a healthy state. If the liveness probe detects an unhealthy state, Kubernetes kills the container and restarts it. This probe should verify that the Triton server is available and online.
- !42 (merged)
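The manifest changes for the three probes could look roughly like the sketch below. The endpoint paths, port, and timing values are assumptions for illustration, not the service's actual configuration; in particular, the `startupProbe` budget would need tuning to the codegen model's real load time.

```yaml
# Sketch only: paths, port, and timings are assumed, not taken from the
# real ai-assist manifests.
containers:
  - name: ai-assist
    ports:
      - containerPort: 8080
    startupProbe:
      httpGet:
        path: /startup        # should report whether the codegen model is loaded
        port: 8080
      periodSeconds: 10
      failureThreshold: 30    # allow up to 30 * 10s for model loading
    readinessProbe:
      httpGet:
        path: /readiness      # model ready to serve requests
        port: 8080
      periodSeconds: 10
    livenessProbe:
      httpGet:
        path: /liveness       # Triton server reachable and live
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
```

Until the startup probe succeeds, Kubernetes does not run the readiness or liveness probes, which is why the model-loading window belongs in `startupProbe` rather than in a long `initialDelaySeconds` on the other two.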
For examples of this implemented elsewhere, see the following: