Backend API Refinement
In this iteration, we decided to stick with the Python implementation of the API backend for the following reasons:
- The API backend uses the transformers Python dependency.
- Triton provides an already generated gRPC stub for Python. For Golang, we would need to generate it ourselves, which can take a significant amount of time due to the large number of dependencies.
However, we need to apply the following changes to better structure the API backend:
- make components loosely coupled so that business logic can be added on top safely
- remove hardcoded paths to various files (e.g., /python-docker/cgtok/tokenizer.json)
- refine the authentication layer by moving it to the middlewares
- update return status codes
- cover critical functions with tests
- update gitlab.yaml to add GitLab scanning jobs
- introduce the poetry dependency management system. We already use poetry with Suggested-Reviewer.
- store Triton model-related files under the models/ folder
- remove the transformers Python dependency from model-gateway and move it to Triton as part of the ensemble model
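
To illustrate three of the items above (no hardcoded paths, authentication in a middleware, explicit status codes), here is a minimal sketch using only the standard library. It is not the actual model-gateway code. It assumes the backend can be served through the ASGI interface, and every name in it (Settings, AuthMiddleware, suggestions_app, the TOKENIZER_PATH and API_TOKEN variables, the bearer-token scheme) is hypothetical.

```python
import asyncio
import hmac
import json
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Runtime configuration. Field and variable names are hypothetical."""
    tokenizer_path: str
    api_token: str

    @classmethod
    def from_env(cls, env=os.environ):
        # Paths come from the environment instead of being hardcoded in the code.
        # The fallback is the path currently mentioned in this issue.
        return cls(
            tokenizer_path=env.get(
                "TOKENIZER_PATH", "/python-docker/cgtok/tokenizer.json"
            ),
            api_token=env.get("API_TOKEN", ""),
        )


class AuthMiddleware:
    """Plain ASGI middleware: answers 401 unless the bearer token matches."""

    def __init__(self, app, settings):
        self.app = app
        self.settings = settings

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Pass lifespan and other non-HTTP events through untouched.
            await self.app(scope, receive, send)
            return
        headers = dict(scope.get("headers", []))
        expected = f"Bearer {self.settings.api_token}".encode()
        provided = headers.get(b"authorization", b"")
        # An empty configured token never authenticates anyone.
        authorized = bool(self.settings.api_token) and hmac.compare_digest(
            provided, expected
        )
        if not authorized:
            body = json.dumps({"detail": "unauthorized"}).encode()
            await send({
                "type": "http.response.start",
                "status": 401,
                "headers": [(b"content-type", b"application/json")],
            })
            await send({"type": "http.response.body", "body": body})
            return
        await self.app(scope, receive, send)


async def suggestions_app(scope, receive, send):
    """Stand-in for the business logic. It knows nothing about authentication."""
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"application/json")],
    })
    await send({"type": "http.response.body", "body": b'{"ok": true}'})


def call(app, headers):
    """Test helper: drive an ASGI app once and return the response status."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "POST", "path": "/", "headers": headers}
    asyncio.run(app(scope, receive, send))
    return sent[0]["status"]
```

Because the middleware wraps the application instead of living inside the handlers, the business logic can be tested without credentials and the authentication rules can be tested without a model. For example, call(AuthMiddleware(suggestions_app, Settings.from_env({"API_TOKEN": "s3cret"})), []) returns the 401 status, while the same call with a matching authorization header reaches the handler.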
UPD #1 Things that can be addressed with follow-up issues:
- Development environment.
- Integration tests.
- Building Triton Docker image with FasterTransformer backend using gitlab-ci.
- Documentation.
Edited by Alexander Chueshev