Foundation Model Analysis for Workflow Tasks and Actions

Problem to solve

We need to understand which of the available models are suitable for agentic actions, and run a foundation model comparison with CEF.

This would include:

Anthropic models (Claude 3.5 Sonnet, Claude 3 family)
Google Gemini series
OpenAI (GPT-4o, GPT-Turbo); this is always part of our eval comparison as the benchmark
Later iterations: open-source models (short- and long-term memory models), possibly Orca; this would depend on future tasks

Proposal

Further details

GPT-4 has been leading in agentic actions based on SWE-bench. We should also analyse Claude 3.5 to see whether it is on par with GPT-4o.
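
A minimal sketch of how such a comparison could be scored, assuming each model is wrapped as a function from prompt to output text and each task carries its own pass/fail check. All names here (Task, resolve_rate, compare_models, rank, and the two stub runners) are hypothetical and for illustration only; the stubs stand in for real API clients, and the resolve rate is just one possible metric in the spirit of SWE-bench style scoring.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical type: a model runner maps a prompt to the model's output text.
ModelRunner = Callable[[str], str]


@dataclass(frozen=True)
class Task:
    """One evaluation task with a pass/fail check on the model output."""
    task_id: str
    prompt: str
    check: Callable[[str], bool]


def resolve_rate(runner: ModelRunner, tasks: List[Task]) -> float:
    """Fraction of tasks whose check passes on the runner's output."""
    if not tasks:
        return 0.0
    solved = sum(1 for task in tasks if task.check(runner(task.prompt)))
    return solved / len(tasks)


def compare_models(runners: Dict[str, ModelRunner], tasks: List[Task]) -> Dict[str, float]:
    """Score every model on the same task list."""
    return {name: resolve_rate(runner, tasks) for name, runner in runners.items()}


def rank(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort models by score (highest first), breaking ties by name."""
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Stub runners used only to exercise the harness; they are NOT real model clients.
def echo_runner(prompt: str) -> str:
    return prompt


def upper_runner(prompt: str) -> str:
    return prompt.upper()


if __name__ == "__main__":
    demo_tasks = [
        Task("t1", "hello", lambda out: out == "hello"),
        Task("t2", "world", lambda out: out == "world"),
        Task("t3", "abc", lambda out: out.lower() == "abc"),
    ]
    demo_scores = compare_models(
        {"stub-echo": echo_runner, "stub-upper": upper_runner}, demo_tasks
    )
    for name, score in rank(demo_scores):
        print(f"{name}: {score:.2f}")
```

Keeping the baseline model in the same runners dictionary means every candidate is scored on the identical task list, which is what makes the benchmark comparison meaningful.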

Links / references

Edited Jul 08, 2024 by Mon Ray