Model Validation Weekly Report -02-11

Overview

Model Validation is dedicated to developing a centralized Evaluation and Decision Science Framework for Gen AI models and features. In 16.9, the main focus is close collaboration with the Chat team to streamline large-scale evaluations for code- and search-related tasks. Further, we plan to integrate Gemini into the ensemble of models and to document the architecture in a blueprint.

📣 Completed Last Week

Chat Evaluation

🎯 Focus for This Week

Chat Evaluation

📖 Gen AI Reading List

Any Tool

This paper explores an LLM-based agent that can utilize 16K APIs from Rapid API. It proposes a simple framework consisting of 1) a hierarchical API retriever to identify API candidates relevant to a query, 2) a solver to resolve user queries, and 3) a self-reflection mechanism that reactivates AnyTool if the initial solution is impractical. The tool leverages the function-calling capability of GPT-4, so no further training is needed. The hierarchical API retriever takes a divide-and-conquer approach that reduces the agents' search scope, which helps overcome context-length limitations in LLMs, while the self-reflection component helps resolve both easy and complex queries efficiently.
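The retrieve/solve/self-reflect cycle described above can be sketched as a simple loop. This is a minimal illustration, not the paper's implementation: `retrieve`, `solve`, and `is_feasible` are hypothetical callables standing in for the hierarchical API retriever, the GPT-4 solver, and the feasibility check.

```python
def anytool_loop(query, retrieve, solve, is_feasible, max_rounds=3):
    """Sketch of AnyTool's retrieve -> solve -> self-reflect cycle.

    All three callables are hypothetical placeholders for LLM-backed
    components; they are assumptions for illustration only.
    """
    for round_index in range(max_rounds):
        # The hierarchical retriever narrows (or, on reflection rounds,
        # widens) the set of candidate APIs for this query.
        candidates = retrieve(query, round_index)
        answer = solve(query, candidates)
        if is_feasible(answer):
            return answer
        # Self-reflection: the solution was impractical, so the loop
        # reactivates with the next round's search scope.
    return None
```

The key design point is that self-reflection is just another retrieval round with an expanded scope, so no model fine-tuning is required.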

More Agents Is All You Need

This paper presents a study of the scaling properties of agents instantiated from LLMs; it finds that performance scales with the number of agents when simply using a sampling-and-voting method.
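The sampling-and-voting method amounts to sampling several answers and taking a majority vote. A minimal sketch, assuming `generate` is a placeholder for any LLM call that returns an answer string:

```python
from collections import Counter

def sample_and_vote(query, generate, n_samples=5):
    """Majority vote over repeated samples for the same query.

    `generate` is a hypothetical stand-in for an LLM sampling call;
    increasing `n_samples` plays the role of adding more agents.
    """
    answers = [generate(query) for _ in range(n_samples)]
    # Most frequent answer wins; ties resolve by first occurrence.
    return Counter(answers).most_common(1)[0][0]
```

Because votes are over exact answer strings, this sketch suits tasks with canonical short answers (e.g. math results); free-form outputs would need an equivalence check before voting.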

LLM-Based Multi-Agent Systems

This paper discusses the essential aspects of LLM-based multi-agent systems, including a summary of recent applications in problem-solving and world simulation. It also covers datasets, benchmarks, challenges, and future opportunities to encourage further work by researchers and practitioners.

Indirect Reasoning with LLMs

This paper proposes an indirect reasoning method to strengthen the reasoning power of LLMs; it employs the logic of contrapositives and contradictions to tackle IR tasks such as factual reasoning and mathematical proof. It consists of two key steps: 1) enhance the comprehensibility of LLMs by augmenting data and rules (i.e., the logical equivalence of the contrapositive), and 2) design prompt templates that stimulate LLMs to perform indirect reasoning based on proof by contradiction. Experiments on LLMs like GPT-3.5-turbo and Gemini Pro show that the proposed method enhances the overall accuracy of factual reasoning by 27.33% and mathematical proof by 31.43% compared to traditional direct reasoning methods. Could this be tested for Chat and Code Suggestions?
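The two steps above can be illustrated with a toy helper: rewriting a rule as its contrapositive (the rule-augmentation step) and building a proof-by-contradiction prompt. Both functions are hypothetical sketches; the paper's actual templates differ.

```python
def contrapositive(premise: str, conclusion: str) -> str:
    """Rewrite 'if P then Q' as its logically equivalent contrapositive
    'if not Q then not P' (a simplified version of the rule-augmentation step)."""
    return (f"If it is not the case that {conclusion}, "
            f"then it is not the case that {premise}.")

def indirect_reasoning_prompt(claim: str) -> str:
    """Hypothetical proof-by-contradiction prompt template."""
    return (f"Claim: {claim}\n"
            "Assume the claim is false and derive the consequences step by step.\n"
            "If the assumption leads to a contradiction, conclude the claim is true.")
```

For example, `contrapositive("it rains", "the ground gets wet")` yields the equivalent rule stated in its negated form, which the paper's method feeds to the model alongside the original.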

👀 What's Happening in AI Company-Wide?

Praise

We would like to extend our gratitude to the entire team and to the extended AI Core teams for their dedicated efforts. 👏🏻 👏🏼 Thanks, all, for making it through the reading as well. If you would like to subscribe to this report, tag yourself in the issue or ping us in #g_ai_model_validation.

Edited by Mon Ray