Spike on using LLMs for API Discovery
Pros
- LLMs are excellent at finding API endpoints scattered throughout a codebase (via decorators, imports, etc.)
- You can ask the LLM to output an OpenAPI spec directly
- Zero configuration (no framework-specific setup)
- An open-source/private LLM would be ideal to keep code secure
- Could be done with Retrieval-Augmented Generation (RAG)
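As a rough illustration of asking an LLM for an OpenAPI spec directly, here is a minimal Python sketch. It only builds the prompt and sanity-checks the reply; the call to the model is left out because no provider has been chosen. The names `PROMPT_TEMPLATE`, `build_prompt`, and `parse_spec` are hypothetical, and the prompt wording is an assumption, not a tested prompt.

```python
import json

# Hypothetical prompt wording; would need tuning per model.
PROMPT_TEMPLATE = (
    "You are given source files from a web service.\n"
    "List every HTTP endpoint you find and respond with a single "
    "OpenAPI 3.0 document in JSON, with no surrounding prose.\n\n"
    "{sources}"
)


def build_prompt(files: dict[str, str]) -> str:
    """Concatenate source files into one prompt, labeling each by path."""
    sources = "\n\n".join(
        f"### File: {path}\n{code}" for path, code in sorted(files.items())
    )
    return PROMPT_TEMPLATE.format(sources=sources)


def parse_spec(response_text: str) -> dict:
    """Parse the model's reply and run a minimal sanity check.

    This only checks the top-level fields that OpenAPI 3.0 requires;
    it is not full schema validation.
    """
    spec = json.loads(response_text)
    if not isinstance(spec, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in ("openapi", "info", "paths") if key not in spec]
    if missing:
        raise ValueError(f"missing top-level keys: {missing}")
    return spec
```

In practice the reply would likely need a proper OpenAPI validator on top of this check, since a model can produce JSON that parses but is not a valid spec.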
Cons
- We need to find the best LLM to use: one that is trained on the right programming languages and is cost-effective
- The LLM's context window would need to accommodate codebases of various sizes, including large ones
- We would need optimizations or heuristics for deciding which pieces of the codebase to send to the LLM's context, based on where parts of the API are likely to live
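One possible shape for such a heuristic, sketched in Python: score each file by how many routing-like patterns it contains, then greedily pick the highest-scoring files that fit a token budget. Everything here is an assumption for illustration: the regex patterns, the 4-characters-per-token estimate, and the names `ROUTE_PATTERNS`, `score_source`, `estimate_tokens`, and `select_files` are all hypothetical.

```python
import re

# Hypothetical routing patterns; a real list would be tuned per framework.
ROUTE_PATTERNS = [
    # Flask/FastAPI-style decorators, e.g. @app.route( or @router.get(
    re.compile(r"@\w+(?:\.\w+)*\.(?:route|get|post|put|patch|delete)\("),
    # Django-style URL configuration, e.g. path( or re_path(
    re.compile(r"\b(?:path|re_path|url)\("),
    # Spring-style annotations, e.g. @GetMapping
    re.compile(r"@(?:Get|Post|Put|Patch|Delete|Request)Mapping\b"),
]


def score_source(code: str) -> int:
    """Count how many routing-like matches appear in a source file."""
    return sum(1 for pattern in ROUTE_PATTERNS for _ in pattern.finditer(code))


def estimate_tokens(code: str) -> int:
    """Rough size estimate, assuming about 4 characters per token.

    Real tokenizers differ; this is only for budgeting.
    """
    return max(1, len(code) // 4)


def select_files(files: dict[str, str], token_budget: int) -> list[str]:
    """Greedily pick the highest-scoring files that fit in the budget."""
    candidates = [
        (score_source(code), path) for path, code in files.items()
    ]
    # Drop files with no matches; sort by score (high first), then path.
    candidates = sorted(
        ((score, path) for score, path in candidates if score > 0),
        key=lambda item: (-item[0], item[1]),
    )
    chosen: list[str] = []
    used = 0
    for _, path in candidates:
        cost = estimate_tokens(files[path])
        if used + cost <= token_budget:
            chosen.append(path)
            used += cost
    return chosen
```

A regex pre-filter like this is cheap but will miss endpoints registered dynamically, so it is better seen as a first pass to shrink the input than as a replacement for the LLM.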
LLM/Project Contenders for Experimentation
- Codex (Closed-source, OpenAI)
- StarCoder (Open-source, Hugging Face)
- Code Llama (Open-source, Meta)
- DeepSeek Coder (Open-source)
Edited by Alex Groleau