Low-quality answers due to a prompt template incompatible with null context
## Problem to solve
There are many answers that appear to result from a wrong prompt template. They often start with _Unfortunately, there is no context provided to answer..._. This is due to the empty `<context>` block for use cases that do not provide/need context in the prompt, e.g. Code Generation and Documentation.
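A minimal sketch of the failure mode (the template below is hypothetical, not the actual Duo Chat prompt): when a use case supplies no context, the `<context>` block renders empty, which models then comment on instead of answering.

```python
# Hypothetical prompt template with a <context> slot, for illustration only.
PROMPT_TEMPLATE = """\
<context>
{context}
</context>

{question}"""


def build_prompt(question: str, context: str = "") -> str:
    # Code Generation / Documentation tasks pass no context, leaving an
    # empty <context></context> block in the rendered prompt.
    return PROMPT_TEMPLATE.format(context=context, question=question)


prompt = build_prompt("Write a function that reverses a string.")
# The rendered prompt contains an empty <context> block, which tends to
# trigger answers like "Unfortunately, there is no context provided...".
```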
Below are the percentages of such answers from the `dev-ai-research-0e2f8974.duo_chat_foundation_models.llm_judge` table.
| created_at | task | answering_model | percentage |
|---|---|---|---|
| 2024-03-08 | Code Explanation | gemini-1.0-pro-001 | 5.85 |
| 2024-03-08 | Code Explanation | gemini-1.5-pro-preview-0215 | 0.57 |
| 2024-03-07 | Documentation | gemini-1.0-pro-001 | 85.8 |
| 2024-03-07 | Documentation | gemini-1.5-pro-preview-0215 | 39.05 |
| 2024-03-07 | Code Generation | gemini-1.0-pro-001 | 9.84 |
| 2024-03-07 | Code Generation | gemini-1.5-pro-preview-0215 | 1.25 |
| 2024-03-07 | Code Explanation | gemini-1.0-pro-001 | 5.34 |
| 2024-03-07 | Code Explanation | gemini-1.5-pro-preview-0215 | 1.0 |
| 2024-03-06 | Issue/Epic | claude-2 | 7.75 |
| 2024-03-06 | Issue/Epic | gpt-4 | 3.06 |
| 2024-03-06 | Issue/Epic | claude-3-sonnet | 4.55 |
| 2024-03-06 | Issue/Epic | claude-3-opus | 1.61 |
| 2024-03-06 | Code Explanation | claude-2 | 64.5 |
| 2024-03-06 | Code Explanation | claude-3-opus | 0.32 |
| 2024-03-06 | Code Explanation | claude-3-sonnet | 4.78 |
| 2024-03-05 | Documentation | claude-2 | 100.0 |
| 2024-03-05 | Documentation | gpt-4 | 9.02 |
| 2024-03-05 | Documentation | claude-3-sonnet | 92.62 |
| 2024-03-05 | Code Generation | claude-2 | 78.62 |
| 2024-03-05 | Code Generation | claude-3-sonnet | 45.37 |
| 2024-03-05 | Issue/Epic | claude-2 | 7.58 |
| 2024-03-05 | Issue/Epic | claude-3-sonnet | 5.68 |
| 2024-03-05 | Issue/Epic | claude-3-opus | 2.26 |
| 2024-03-05 | Issue/Epic | gpt-4 | 2.99 |
## SQL to generate the data

```sql
WITH totalCount AS (
  SELECT
    COUNT(*) AS total,
    EXTRACT(date FROM created_at) AS created_at,
    task,
    answering_model
  FROM
    `dev-ai-research-0e2f8974.duo_chat_foundation_models.llm_judge`
  GROUP BY
    created_at,
    task,
    answering_model
), totalErrorCount AS (
  SELECT
    COUNT(*) AS total,
    EXTRACT(date FROM created_at) AS created_at,
    task,
    answering_model
  FROM
    `dev-ai-research-0e2f8974.duo_chat_foundation_models.llm_judge`
  WHERE
    answer LIKE "Unfortunately%context%"
    OR answer LIKE "The provided context%"
  GROUP BY
    created_at,
    task,
    answering_model
)
SELECT
  ROUND(totalErrorCount.total / totalCount.total * 100, 2) AS percentage,
  totalErrorCount.created_at,
  totalErrorCount.task,
  totalErrorCount.answering_model
FROM
  totalErrorCount
JOIN
  totalCount
ON
  totalErrorCount.task = totalCount.task
  AND totalErrorCount.created_at = totalCount.created_at
  AND totalErrorCount.answering_model = totalCount.answering_model
```
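The same classification can be applied outside BigQuery. Below is a small sketch that translates the two `LIKE` patterns from the `WHERE` clause into Python, e.g. for spot-checking individual answers:

```python
import re

# Regex equivalents of the SQL patterns:
#   answer LIKE "Unfortunately%context%"  ->  starts with "Unfortunately",
#                                             contains "context" later
#   answer LIKE "The provided context%"   ->  starts with "The provided context"
NULL_CONTEXT_PATTERNS = [
    re.compile(r"^Unfortunately.*context", re.DOTALL),
    re.compile(r"^The provided context"),
]


def is_null_context_answer(answer: str) -> bool:
    """Return True if the answer matches one of the null-context patterns."""
    return any(p.search(answer) for p in NULL_CONTEXT_PATTERNS)
```

Like the SQL `LIKE` patterns, the match is case-sensitive and anchored to the start of the answer.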
## Proposal
- Revisit the answering prompt template.
- Consider using different templates for different tasks and models.
- Run the test on a problematic set of questions and compare results before and after.
## Further details
## Links / references
Edited by Tan Le