Foundational Model Analysis for Various Workflow Tasks and Actions

Problem to solve

We need to understand which of the available models are suitable for agentic actions, and run a foundational model comparison with CEF.

This would include:

  1. Anthropic models (Claude 3.5 Sonnet, Claude 3 ensemble)
  2. Google Gemini series
  3. OpenAI (GPT-4o, GPT-Turbo) (always included in our eval comparison as the benchmark)
  4. Later iterations: open-source models (short- and long-term memory models), e.g. Orca? (This would depend on future tasks.)

Proposal

Further details

GPT-4 has been leading in agentic actions based on SWE-bench. We should also analyse Claude 3.5 to see whether it is on par with GPT-4o.
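
One possible way to frame the "on par with the benchmark" check is a small sketch like the one below. It assumes each model is run over the same task set and that every run yields a pass/fail outcome per task. The names `TaskResult`, `resolved_rate`, and `compare_to_baseline`, the model labels, and the sample data are all hypothetical placeholders for illustration, not real benchmark results or an existing harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """One pass/fail outcome for a model on a single task (hypothetical schema)."""
    model: str      # placeholder model label, not a real API model id
    task_id: str
    resolved: bool


def resolved_rate(results, model):
    """Fraction of tasks the given model resolved; 0.0 if it has no results."""
    outcomes = [r.resolved for r in results if r.model == model]
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def compare_to_baseline(results, baseline):
    """Map each non-baseline model to its resolved-rate delta versus the baseline."""
    base = resolved_rate(results, baseline)
    models = sorted({r.model for r in results} - {baseline})
    return {m: resolved_rate(results, m) - base for m in models}


# Placeholder data only: these are NOT measured results.
results = [
    TaskResult("baseline-model", "task-1", True),
    TaskResult("baseline-model", "task-2", False),
    TaskResult("candidate-model", "task-1", True),
    TaskResult("candidate-model", "task-2", True),
]

deltas = compare_to_baseline(results, "baseline-model")
for model, delta in deltas.items():
    print(f"{model}: {delta:+.2f} vs baseline")
```

A positive delta would mean the candidate resolved a larger share of tasks than the benchmark model. A real comparison would also need enough tasks per model for the difference to be meaningful.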

Links / references

Edited by Mon Ray