
[Handoff issue] - Evals for context features

How to use

Duo Chat context evals are supported in the ELI5 project

poetry run eli5 duo-chat evaluate context-use --help
poetry run eli5 duo-chat evaluate context-use

Currently only file context is available in the dataset, but any supported additional context can be run with this eval

How it works

The dataset used is chat-context-dataset in LangSmith

Each example in the dataset supplies 3 input fields

  • input: The input text from the user (e.g. "What is happening with this code?")
  • context: The additional context supplied to chat, in the JSON format specified by the chat REST API. id, category and content fields are required.
  • current_text: The text or code in the current file that's selected by the user

Each example also supplies an expected_answer in its output, which is the answer chat is expected to give.

Example

  • input: What's the max model length for this model?
  • current_text: model=providers.Factory(vertex_code_gecko,name=KindVertexTextModel.CODE_GECKO_002),
  • context: [{"category": "file", "id": "vertex_text.py", "type": "file", "content": (python class definition for VertexTextModel)}]
  • expected_answer: The maximum model length for this model is 2048
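
The example shape described above can be checked before it is added to the dataset. The following is a minimal sketch, not part of ELI5: validate_example is a hypothetical helper, it assumes the context field is held as an already-parsed list of dicts, and the content value is a placeholder for the real class definition.

```python
# Hypothetical helper, not part of ELI5 or LangSmith.
# Required context fields as stated in this issue: id, category and content.
REQUIRED_CONTEXT_FIELDS = ("id", "category", "content")


def validate_example(inputs, outputs):
    """Return a list of problems found in one dataset example (empty if none)."""
    problems = []
    # Each example supplies three input fields.
    for field in ("input", "context", "current_text"):
        if field not in inputs:
            problems.append(f"missing input field: {field}")
    # Assumption: context is a list of dicts, one per context item.
    for i, item in enumerate(inputs.get("context", [])):
        for field in REQUIRED_CONTEXT_FIELDS:
            if not item.get(field):
                problems.append(f"context[{i}] missing required field: {field}")
    if not outputs.get("expected_answer"):
        problems.append("missing expected_answer")
    return problems


example_inputs = {
    "input": "What's the max model length for this model?",
    "current_text": "model=providers.Factory(vertex_code_gecko,name=KindVertexTextModel.CODE_GECKO_002),",
    "context": [
        {
            "category": "file",
            "id": "vertex_text.py",
            "type": "file",
            # Placeholder: the real example holds the Python class definition.
            "content": "<python class definition for VertexTextModel>",
        }
    ],
}
example_outputs = {"expected_answer": "The maximum model length for this model is 2048"}

print(validate_example(example_inputs, example_outputs))  # prints []
```

An empty list means the example has all three input fields, every context item carries the required fields, and an expected_answer is present.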

Tips for good test cases

The example above works because it asks for specific information that is not available to the model in its training data without the context.

Asking "What's the max model length for code-gecko?" could cause it to find the answer in its training data from Google's public documentation, which is why you want to ask about "this model" instead.

Likewise, asking for information that exists in the current_text or current file would mean it could find the answer without referring to its context.

For a test case to be an effective test of context usage, it needs to ask for information that is

  1. Not in the training data, such as a specific detail about the supplied context
  2. Not in the current_text or current file supplied to chat
  3. Concrete, with a clear answer
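
Part of the criteria above can be checked mechanically. The sketch below is a rough heuristic under stated assumptions, not an ELI5 feature: fact_locations and is_effective are hypothetical helpers, the key facts are picked by hand from the expected answer, and the context content is an invented stand-in for the real file. It only covers criterion 2 and confirms that the answer is grounded in the context. Whether a fact is in the training data (criterion 1) still needs human judgment.

```python
# Hypothetical helpers for reviewing a candidate test case by hand.
def fact_locations(key_facts, current_text, context):
    """Report, for each key fact, whether it appears in current_text or in the context."""
    context_text = " ".join(item.get("content", "") for item in context)
    return {
        fact: {
            "in_current_text": fact in current_text,
            "in_context": fact in context_text,
        }
        for fact in key_facts
    }


def is_effective(report):
    """True when every fact is found in the context and not in current_text."""
    return all(
        loc["in_context"] and not loc["in_current_text"] for loc in report.values()
    )


current_text = "model=providers.Factory(vertex_code_gecko,name=KindVertexTextModel.CODE_GECKO_002),"
context = [
    {
        "category": "file",
        "id": "vertex_text.py",
        # Invented stand-in content, used only to make the sketch runnable.
        "content": "class VertexTextModel:\n    MAX_MODEL_LEN = 2048",
    }
]

report = fact_locations(["2048"], current_text, context)
print(is_effective(report))  # prints True
```

A plain substring match is deliberately simple. It will miss paraphrased facts, so a False result is a prompt to look closer and not a verdict.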
  • Merge requests
  • Issues
  • Dependencies
  • Imports
  • Files
  • Git diffs
Edited by Allen Cook