I would recommend using `llama3.1:8b` ([link](https://ai.meta.com/blog/meta-llama-3-1/)) for this experiment, especially if you are running a machine with limited RAM. It is quite compact and works reasonably well[^1]. Also note that some models, such as `gemma3:12b`, do not support embeddings, so choose carefully.
Get `llama3.1:8b` with:
```bash
ollama pull llama3.1:8b
```
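
If you are unsure whether a model supports embeddings, a quick sanity check is to ask Ollama for one and see if you get a vector back. Here is a minimal sketch using the `ollama` Python package; the test prompt is just a placeholder, and a model without embedding support will typically fail at this call rather than return a vector:

```python
import ollama

# Request an embedding for a short test string. Models that do not
# support embeddings usually raise an error here.
response = ollama.embeddings(model="llama3.1:8b", prompt="hello world")

# The length of the vector is the embedding dimensionality of the model.
print(len(response["embedding"]))
```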
...
...
## Vector storage
Chroma DB is a vector database designed and optimized for storing and searching **vector embeddings**, which are central to RAG. Vector embeddings are numerical representations of data (text, images, audio, etc.) that capture the *meaning*, or *semantic* content, of that data. They are generated by machine learning models (often transformer models like those used in LLMs), which are trained to map similar data points to nearby points in a high-dimensional vector space. That is precisely what we need to do with our context data for it to be useful to the LLM, and that is what we use Chroma DB for.
So we need to initialize the Chroma DB client with persistent storage and get it ready to embed our context data. We persist the storage to disk, in the `chroma_db/` directory.
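
As a rough sketch of how this fits together (the collection name `docs`, the document id, and the sample text are placeholders, not what the final script uses), assuming the `chromadb` and `ollama` packages are installed:

```python
import chromadb
import ollama

# Create (or reopen) a Chroma DB client persisted in the chroma_db/ directory.
client = chromadb.PersistentClient(path="chroma_db")

# A collection groups related embeddings; get_or_create makes this idempotent.
collection = client.get_or_create_collection(name="docs")

# Embed a piece of context data with the model we pulled earlier
# and store the vector alongside the original text.
text = "Ollama is a tool for running large language models locally."
embedding = ollama.embeddings(model="llama3.1:8b", prompt=text)["embedding"]
collection.add(ids=["doc-1"], embeddings=[embedding], documents=[text])

# At query time, the question is embedded the same way and Chroma DB
# returns the stored documents whose vectors are closest to it.
question = ollama.embeddings(model="llama3.1:8b", prompt="What is Ollama?")["embedding"]
results = collection.query(query_embeddings=[question], n_results=1)
print(results["documents"])
```

Because the client is persistent, the embeddings survive restarts: re-running the script reuses whatever is already stored in `chroma_db/` instead of re-embedding everything.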