Model Validation Weekly Report -04-15

Overview

Model Validation is dedicated to developing a centralised Evaluation and Decision Science Framework for Gen AI models and features. In 16.10, the main focus is on collaborating closely with the Chat team to streamline large-scale evaluations for code- and search-related tasks. Further, we plan to integrate Code-Gemma into the ensemble of models, validate the solution for Explain this Vulnerability, and document the architecture in a blueprint.

📣 Completed Last Week

Chat Evaluation

Code Completion

🎯 Focus for This Week

Chat Evaluation

  1. We will continue working on expanding the doc, collaborating with Legal, and building out the code generation dataset.
  2. We will continue to support the pattern analysis and investigation for the experiments here.
  3. We are improving the developer experience, following the knowledge-sharing session on the Prompt Library, by making inputs and outputs much easier to follow: Remove input_adapter to improve usability of pr... (gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/prompt-library#217 - closed) and Auto detect input schema, remove input_adapter (gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/prompt-library!369 - merged).

Code Completion and Competitive Intelligence

  1. We will start working on Competitive Intelligence for Duo Features https://gitlab.com/gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/ai-experiments/-/issues/21+ now that we have iterated on the Code Suggestion Pipeline.

Foundational Model Evaluations

  1. We are working on the Mistral and Code-Gemma evaluations this week: https://gitlab.com/groups/gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/-/epics/4 and Adding Mistral OS Mixtral Models to Prompt Library (gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/prompt-library#187 - closed).

📖 Gen AI Reading List

LM-Guided Chain-of-Thought

This paper applies knowledge distillation to a small LM using rationales generated by a large LM, with the hope of narrowing the gap in reasoning capabilities; the rationale is generated by the lightweight LM and the answer prediction is then left to the frozen large LM; this resource-efficient approach avoids the need to fine-tune the large model and instead offloads rationale generation to the small language model; the knowledge-distilled LM is further optimized with reinforcement learning using several rationale-oriented and task-oriented reward signals; the LM-guided CoT prompting approach proposed in this paper outperforms both standard prompting and CoT prompting; self-consistency decoding further enhances performance.
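
For illustration, the sketch below wires up the two-stage setup described above with Hugging Face pipelines: a small LM drafts the rationale and a frozen larger LM produces the answer. The model names, prompt templates, and helper function are stand-ins chosen for this example, not the paper's actual setup.

```python
# Minimal sketch of LM-guided CoT inference: a small LM writes the rationale,
# a frozen larger LM predicts the answer. Model names are illustrative stand-ins.
from transformers import pipeline

rationale_lm = pipeline("text2text-generation", model="google/flan-t5-small")
answer_lm = pipeline("text2text-generation", model="google/flan-t5-xl")

def lm_guided_cot(question: str) -> str:
    # Step 1: the lightweight LM generates a chain-of-thought rationale.
    rationale = rationale_lm(
        f"Question: {question}\nLet's think step by step.",
        max_new_tokens=128,
    )[0]["generated_text"]

    # Step 2: the frozen large LM answers, conditioned on the question
    # plus the small-LM rationale (the large model is never fine-tuned).
    answer = answer_lm(
        f"Question: {question}\nRationale: {rationale}\nAnswer:",
        max_new_tokens=32,
    )[0]["generated_text"]
    return answer

print(lm_guided_cot("If a train travels 60 km in 45 minutes, what is its average speed in km/h?"))
```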

Best Practices on Synthetic Data

This paper from Google DeepMind provides an overview of synthetic data research, covering applications, challenges, and future directions; it discusses important topics when working with synthetic data, such as ensuring quality, factuality, fidelity, unbiasedness, trustworthiness, privacy, and more.

Representation Finetuning for LMs

This paper proposes representation fine-tuning (ReFT), a method that operates on a frozen base model and learns task-specific interventions on hidden representations; in other words, by manipulating a small fraction of model representations it is possible to effectively steer model behavior and achieve better downstream performance at inference time; it also proposes LoReFT as a drop-in replacement for existing PEFT methods that is 10-50x more parameter-efficient.
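
To make the "interventions on hidden representations" idea concrete, here is a minimal PyTorch sketch of a LoReFT-style edit, assuming the published form h + R^T(Wh + b − Rh). It skips details such as the orthonormality constraint on R and how interventions are attached to specific layers and token positions, so treat it as a reading aid rather than the paper's implementation.

```python
# Minimal LoReFT-style intervention sketch (assumed edit form: h + R^T(Wh + b - Rh)).
# Only the intervention parameters are trained; the base model stays frozen.
import torch
import torch.nn as nn

class LoReFTIntervention(nn.Module):
    def __init__(self, hidden_dim: int, rank: int):
        super().__init__()
        # R spans a low-rank subspace of the hidden space (orthonormality omitted here).
        self.R = nn.Parameter(torch.randn(rank, hidden_dim) * 0.02)
        # W and b define the learned target values inside that subspace.
        self.proj = nn.Linear(hidden_dim, rank)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, hidden_dim) hidden states from a frozen layer.
        subspace_value = h @ self.R.T          # R h
        target_value = self.proj(h)            # W h + b
        edit = target_value - subspace_value   # change to apply, in the subspace
        return h + edit @ self.R               # write the edit back into hidden space

# Typically attached to a frozen transformer layer via a forward hook so that
# only these few parameters receive gradients during fine-tuning.
```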

CodeGemma

This paper walks through a family of open code LLMs based on Gemma; the CodeGemma 7B models excel in mathematical reasoning and match the code capabilities of other open models; the instruction-tuned CodeGemma 7B model is the more powerful model for Python coding, as assessed via the HumanEval benchmark; results also suggest that the model performs best on GSM8K among 7B models; the CodeGemma 2B model achieves SoTA code completion and is designed for fast code infilling and deployment in latency-sensitive settings.
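
Since the 2B model is pitched at latency-sensitive infilling, the snippet below sketches how a fill-in-the-middle (FIM) prompt is typically assembled for such models. The sentinel token names are assumed to follow the CodeGemma convention; confirm them against the model card before relying on them.

```python
# Sketch of a fill-in-the-middle (FIM) prompt for a code-infilling model.
# Sentinel token names are assumptions based on the CodeGemma convention.
def build_fim_prompt(prefix: str, suffix: str) -> str:
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

prompt = build_fim_prompt(
    prefix="def mean(xs):\n    total = ",
    suffix="\n    return total / len(xs)\n",
)
# The model is expected to generate the missing middle span (e.g. "sum(xs)"),
# stopping at an end-of-sequence or file-separator token.
print(prompt)
```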

👀 What's happening in AI company-wide?

Praise

We would like to extend our gratitude to the entire team and to the extended AI Core teams for their dedicated efforts. 👏🏻 👏🏼 Thanks, all, for making it through the reading as well. If anyone would like to subscribe to this report, tag yourself in the issue or ping us in #g_ai_model_validation.
