Draft: Resolve "Self-Hosted Model Expertise: vLLM Setup, Quantization, and Performance Optimization"
Closes #4

This merge request adds a comprehensive internal guide for GitLab team members on setting up and running large language models on their own infrastructure with vLLM. The guide covers:

- choosing GPUs based on model memory requirements,
- selecting appropriate models for different use cases, and
- understanding the performance trade-offs of quantization.

It includes detailed tables mapping models to the hardware configurations that can run them, explains the underlying technical concepts in accessible terms, and provides specific setup instructions for serving these models efficiently. The guide is designed to help GitLab's customer-facing teams demonstrate AI capabilities effectively during sales presentations and proof-of-concept projects. It also includes performance benchmarks showing how well different models perform on software engineering tasks, so teams can choose the best model for their specific needs and available hardware resources.
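As a rough illustration of the GPU-sizing question the guide addresses, the sketch below estimates the VRAM needed for a model's weights at different quantization widths. This is a back-of-the-envelope rule of thumb, not a formula from the guide itself; the `overhead` multiplier for KV cache and activations is an assumption and varies with context length and batch size.

```python
def estimate_vram_gb(params_b: float, bits: int, overhead: float = 1.2) -> float:
    """Rough VRAM estimate (in GB) for serving a model's weights.

    params_b -- parameter count in billions (e.g. 70 for a 70B model)
    bits     -- bits per weight: 16 (fp16/bf16), 8 (int8), 4 (4-bit quant)
    overhead -- headroom multiplier for KV cache/activations (assumption)
    """
    weight_gb = params_b * 1e9 * bits / 8 / 1e9  # bytes per weight -> GB
    return weight_gb * overhead

# Weights alone: a 70B model needs ~140 GB at fp16 but ~35 GB at 4-bit,
# which is why quantization decides what fits on a given GPU.
for bits in (16, 8, 4):
    print(f"70B @ {bits}-bit ≈ {estimate_vram_gb(70, bits):.0f} GB")
```

This kind of estimate is what lets the guide's hardware tables pair, say, a 4-bit 70B model with a single 48 GB card while reserving multi-GPU setups for full-precision serving.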