Foundation Model Analysis for Workflow Tasks and Actions
Problem to solve
We need to understand which of the available models are suitable for agentic actions, and run a foundation model comparison with CEF.
This would include:
- Anthropic models (Claude 3.5 Sonnet, Claude 3 family)
- Google Gemini series
- OpenAI (GPT-4o, GPT-Turbo). These are always part of our eval comparison as the benchmark.
- Later iterations: open-source models (short- and long-term memory models), possibly Orca. This would depend on future tasks.
Proposal
Further details
GPT-4 has been leading in agentic actions based on SWE-bench. We should also analyse Claude 3.5 to see whether it is on par with GPT-4o.
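
One possible shape for a side-by-side comparison is sketched below. This is only an illustration and not the CEF interface: `compare_models`, the `Task` type, and the stub models are all hypothetical names, and the stubs stand in for real API clients so the sketch runs without network access. It reports, per model, the fraction of tasks whose output passes a task-specific check.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical task type: a prompt plus a checker that decides
# whether a model's output counts as a pass.
Task = Tuple[str, Callable[[str], bool]]


def compare_models(
    models: Dict[str, Callable[[str], str]],
    tasks: List[Task],
) -> Dict[str, float]:
    """Return, for each model, the fraction of tasks it passes."""
    if not tasks:
        raise ValueError("tasks must not be empty")
    scores: Dict[str, float] = {}
    for name, generate in models.items():
        passed = sum(1 for prompt, check in tasks if check(generate(prompt)))
        scores[name] = passed / len(tasks)
    return scores


# Stub models (assumptions for illustration only); a real run would
# wrap calls to each provider's API behind the same signature.
def stub_model_a(prompt: str) -> str:
    return prompt.upper()


def stub_model_b(prompt: str) -> str:
    return prompt


# Toy tasks (also assumptions); real tasks would be agentic actions.
tasks: List[Task] = [
    ("hello", lambda out: out == "HELLO"),
    ("world", lambda out: out.islower()),
]

scores = compare_models(
    {"model_a": stub_model_a, "model_b": stub_model_b}, tasks
)
for name, rate in sorted(scores.items()):
    print(f"{name}: {rate:.0%}")
```

Keeping every model behind the same `str -> str` callable means a new candidate can be added to the comparison without changing the scoring loop.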
Links / references
Edited by Mon Ray