Refinement of Instrumentation of Model Inference Server Logs for GPU, CPU, and Latency
Add more instrumentation and timing information to the model gateway logs to help with debugging. The goal is proper latency attribution in the logs, e.g. XX seconds spent on CPU, XX seconds spent waiting on Triton, XX seconds spent in the GitLab API.
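One way to get per-stage latency attribution is to wrap each phase of request handling in a named timer and emit the accumulated durations as structured log fields. The sketch below is a minimal illustration of that idea; `StageTimer` and the stage names (`cpu`, `triton`) are hypothetical, not part of the existing gateway code, and the bodies of the `with` blocks are stand-ins for the real processing and inference calls.

```python
import json
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named stage of a single request."""

    def __init__(self):
        self.durations = defaultdict(float)

    @contextmanager
    def stage(self, name):
        # Monotonic clock avoids jumps from NTP adjustments.
        start = time.monotonic()
        try:
            yield
        finally:
            self.durations[name] += time.monotonic() - start

    def log_fields(self):
        # Flatten into fields suitable for a structured log line,
        # e.g. {"duration_cpu_s": 0.01, "duration_triton_s": 0.05}.
        return {f"duration_{name}_s": round(secs, 4)
                for name, secs in self.durations.items()}


timer = StageTimer()

with timer.stage("cpu"):
    sum(i * i for i in range(100_000))  # stand-in for pre/post-processing

with timer.stage("triton"):
    time.sleep(0.05)  # stand-in for the call out to the Triton server

# In the gateway these fields would be merged into the request log entry.
print(json.dumps(timer.log_fields()))
```

Because the timer accumulates rather than overwrites, a stage entered multiple times (e.g. several Triton calls per request) still yields a single total per stage in the final log line.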
This would be very similar to https://cloud.google.com/ai-platform/prediction/docs/load-testing-and-monitoring-aiplatform-models
GPU Metrics for Triton Server: https://github.com/triton-inference-server/server/blob/main/docs/user_guide/metrics.md
Edited by Mon Ray