Duo Chat adds unrelated sources to its answers

Summary

Steps to reproduce

The prompt was: "Please show an example for real-time processing on an Arduino hardware, including source code and hardware requirements." Context: https://gitlab.com/gitlab-com/marketing/developer-relations/developer-advocacy/developer-advocacy-meta/-/issues/415#note_1925859117

(Screenshots attached: image, image, image.png)

What is the current bug behavior?

Initially, there was a long list of (unrelated) source references (see dark screenshots). Then @shinya.maeda reduced the number of source references to four (see white screenshots). However, the referenced sources can still be unrelated to the question/answer; the list is just shorter.

It looks like the chat:

  1. attempted to answer the question with the docs tool,
  2. then decided these snippets did not help answer the question,
  3. then answered the question based on its training data,
  4. but still added the links to all snippets that were considered.

What is the expected correct behavior?

Chat should only show those snippets that were actually used to answer the question.
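The expected behavior could be sketched roughly as follows. This is a minimal illustration in Python, not Duo Chat's actual implementation: the `Snippet` class, the `sources_to_display` function, and the idea that the answer step reports the IDs of the snippets it used are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    """A documentation snippet returned by the docs tool (hypothetical shape)."""
    snippet_id: str
    url: str


def sources_to_display(considered, used_ids):
    """Keep only the considered snippets whose IDs the answer step reported as used."""
    used = set(used_ids)
    return [s for s in considered if s.snippet_id in used]


# Placeholder data: two snippets were retrieved and considered.
considered = [
    Snippet("s1", "https://example.com/docs/ci"),
    Snippet("s2", "https://example.com/docs/runners"),
]

# The answer came from training data, so no snippet was used
# and no source links should be shown.
print(sources_to_display(considered, []))

# Only snippet s2 contributed to the answer, so only s2 is shown.
print(sources_to_display(considered, ["s2"]))
```

With this shape, the case described in this issue (all snippets rejected, answer produced from training data) yields an empty source list instead of links to every snippet that was considered.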

Additional recommendation: CEF should test not only the answer but also whether the references are the right ones (#466662).
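One possible way to score references in such a test is sketched below. It is only an illustration: how CEF represents test cases is not described here, so the `reference_scores` function and the idea of a per-prompt set of expected reference URLs are assumptions.

```python
def reference_scores(shown_urls, expected_urls):
    """Return (precision, recall) of the shown references against the expected ones.

    An empty side is scored as 1.0: showing nothing contains no wrong link,
    and expecting nothing means no link can be missing.
    """
    shown, expected = set(shown_urls), set(expected_urls)
    hits = shown & expected
    precision = len(hits) / len(shown) if shown else 1.0
    recall = len(hits) / len(expected) if expected else 1.0
    return precision, recall


# A prompt that the docs cannot answer: no references are expected,
# yet four unrelated links are shown (placeholder URLs).
shown = [f"https://example.com/docs/page{i}" for i in range(4)]
print(reference_scores(shown, []))   # (0.0, 1.0)
print(reference_scores([], []))      # (1.0, 1.0)
```

A precision threshold would flag the behavior reported in this issue, because every shown link counts as a wrong reference when none is expected.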

Relevant logs and/or screenshots

Output of checks

Results of GitLab environment info

(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:env:info`)

(For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production`)

Results of GitLab application Check

(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:check SANITIZE=true`)

(For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true`)

(we will only investigate if the tests are passing)

Possible fixes

Edited by Torsten Linz