Experiment with LLM judges for code search results

Experiment with different models as LLM judges to see how well each one can rerank code search results.
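The reranking step can be sketched as follows, under some assumptions. A judge is any callable that takes a query and one search result and returns a relevance score. The results are then sorted by that score, best first. The names `rerank`, `Judge` and `toy_judge` are hypothetical. In the real experiment the judge would wrap a call to an LLM. Here `toy_judge` is a keyword-overlap stand-in so the example runs without any model or API.

```python
from typing import Callable, List, Tuple

# Hypothetical judge signature: (query, result snippet) -> relevance score.
# In the experiment this would wrap an LLM call; here it can be any callable.
Judge = Callable[[str, str], float]


def rerank(query: str, results: List[str], judge: Judge) -> List[Tuple[str, float]]:
    """Score every result with the judge and return (snippet, score) pairs, best first.

    Python's sort is stable even with reverse=True, so results with equal
    scores keep the order the original search engine returned them in.
    """
    scored = [(snippet, judge(query, snippet)) for snippet in results]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_judge(query: str, snippet: str) -> float:
    """Stand-in for an LLM judge: fraction of query words found in the snippet."""
    words = query.lower().split()
    return sum(word in snippet.lower() for word in words) / len(words)


# Usage: rerank three candidate snippets for the query "parse json".
results = [
    "def add(a, b): return a + b",
    "def parse_json(text): return json.loads(text)",
    "class JsonParser: ...",
]
ranked = rerank("parse json", results, toy_judge)
for snippet, score in ranked:
    print(f"{score:.2f}  {snippet}")
```

Keeping the judge as a plain callable makes it easy to swap models in and out. The same `rerank` function is used for every model, so the comparison depends only on the scores each judge produces.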

Edited by Ben Venker