Draft: Resolve "Self-Hosted Model Expertise: vLLM Setup, Quantization, and Performance Optimization"
Closes #4

This merge request adds a comprehensive internal guide for GitLab team members on setting up and running large language models on their own infrastructure with vLLM. The guide covers:

- choosing GPUs based on model memory requirements,
- selecting appropriate models for different use cases, and
- understanding the performance trade-offs of quantization.

It includes detailed tables mapping models to the hardware configurations that can run them, explains the underlying technical concepts in accessible terms, and provides specific setup instructions for serving these models efficiently. The guide is designed to help GitLab's customer-facing teams demonstrate AI capabilities effectively during sales presentations and proof-of-concept projects. It also includes performance benchmarks showing how well different models perform on software engineering tasks, so teams can choose the best model for their specific needs and available hardware resources.
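As a rough illustration of the GPU-sizing question the guide addresses, the sketch below estimates the VRAM needed for a model's weights at different quantization widths. This is a back-of-the-envelope rule of thumb, not a formula from the guide itself; the `overhead` multiplier for KV cache and activations is an assumption and varies with context length and batch size.

```python
def estimate_vram_gb(params_b: float, bits: int, overhead: float = 1.2) -> float:
    """Rough VRAM estimate (in GB) for serving a model's weights.

    params_b -- parameter count in billions (e.g. 70 for a 70B model)
    bits     -- bits per weight: 16 (fp16/bf16), 8 (int8), 4 (4-bit quant)
    overhead -- headroom multiplier for KV cache/activations (assumption)
    """
    weight_gb = params_b * 1e9 * bits / 8 / 1e9  # bytes per weight -> GB
    return weight_gb * overhead

# Weights alone: a 70B model needs ~140 GB at fp16 but ~35 GB at 4-bit,
# which is why quantization decides what fits on a given GPU.
for bits in (16, 8, 4):
    print(f"70B @ {bits}-bit ≈ {estimate_vram_gb(70, bits):.0f} GB")
```

This kind of estimate is what lets the guide's hardware tables pair, say, a 4-bit 70B model with a single 48 GB card while reserving multi-GPU setups for full-precision serving.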