Load quantized GGUF models on vllm

Loading GGUF models on vllm is not straightforward, because some files that vllm tries to fetch seem to be missing from some HuggingFace repos.

For example, trying to load https://huggingface.co/bartowski/Mistral-22B-v0.2-GGUF with the command below fails:

python -m vllm.entrypoints.api_server --host 0.0.0.0 \
  --model bartowski/Mistral-22B-v0.2-GGUF \
  --served-model-name Mixtral-8x22B-v0.1 \
  --enforce-eager --trust-remote-code --gpu-memory-utilization 0.95 --tensor-parallel-size 2 --config-format mistral
...

Entry Not Found for url: https://huggingface.co/bartowski/Mistral-22B-v0.2-GGUF/resolve/main/params.json.
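
To narrow this down, it may help to check which files the repo actually ships before starting the server. The sketch below is only an illustration: the helper names (missing_files, gguf_files, check_repo) are hypothetical, and it assumes the huggingface_hub package is installed. As far as I can tell, params.json is the config file requested because of --config-format mistral, so that is the file it looks for.

```python
from typing import Iterable, List


def missing_files(repo_files: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required file names that are absent from repo_files, in order."""
    present = set(repo_files)
    return [name for name in required if name not in present]


def gguf_files(repo_files: Iterable[str]) -> List[str]:
    """Return the sorted list of .gguf weight files found in repo_files."""
    return sorted(name for name in repo_files if name.lower().endswith(".gguf"))


def check_repo(repo_id: str, required: Iterable[str] = ("params.json",)) -> None:
    """Print which required files are missing and which GGUF files exist.

    Hypothetical helper: needs network access and the huggingface_hub package.
    """
    # Imported lazily so the pure helpers above work without huggingface_hub.
    from huggingface_hub import list_repo_files

    files = list_repo_files(repo_id)
    print("missing:", missing_files(files, required))
    print("gguf files:", gguf_files(files))
```

Calling check_repo("bartowski/Mistral-22B-v0.2-GGUF") should show whether params.json is really absent and which .gguf files are available to point vllm at.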

Definition of Done

  1. Serve a GGUF model through vllm.