Document Mistral model serving
Choose a serving framework, document how to serve Mistral and Mixtral models with it, and make them available to AI Gateway. Identify the interface to this serving framework.
Mistral recommends either:
- vLLM (Apache-2.0 license)
  - "A high-throughput and memory-efficient inference and serving engine for LLMs"
  - One of our design collab customers is using vLLM for their inferencing.
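As a sketch of what serving with vLLM could look like: vLLM ships an OpenAI-compatible HTTP server entrypoint, so one plausible setup is to launch it directly against a Mistral checkpoint. The model name and port below are illustrative assumptions, and this assumes vLLM is installed on a GPU host.

```shell
# Sketch only: start vLLM's OpenAI-compatible API server.
# Assumes vLLM is installed and a suitable GPU is available;
# the model ID and port are example values, not a decided config.
python -m vllm.entrypoints.openai.api_server \
  --model mistralai/Mixtral-8x7B-Instruct-v0.1 \
  --port 8000
```

If we go this route, AI Gateway would talk to the server over the OpenAI-style REST interface rather than a framework-specific API.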
- TensorRT-LLM (Apache-2.0 license)
  - "TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines."
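Whichever framework we pick, the interface AI Gateway would see is most likely an OpenAI-style chat-completions request (vLLM exposes this natively; TensorRT-LLM deployments typically front it with Triton or a similar shim). A minimal sketch of building such a request body, assuming an OpenAI-compatible endpoint and using an example Mixtral model ID:

```python
import json

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> str:
    """Build a JSON body for an OpenAI-compatible /v1/chat/completions
    endpoint. Sketch only: field values here are illustrative defaults,
    not a decided gateway contract."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }
    return json.dumps(body)

# Example model ID is an assumption, not a chosen deployment target.
payload = build_chat_request("mistralai/Mixtral-8x7B-Instruct-v0.1", "Hello")
print(payload)
```

AI Gateway would POST this body to the serving framework's `/v1/chat/completions` route and parse the standard completion response.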
Edited by Sean Carroll