
Drop regex based testing of Duo Chat answers

Bruno Cardoso requested to merge bc/fix-chat-rspec-tests into master

What does this MR do and why?

Following up on #438177.

Drop the regex-based tests from ee/spec/lib/gitlab/llm/completions/chat_real_requests_spec.rb, leaving only the tool matcher (match_llm_tools).

  • The rationale is that we know which tool we want Duo Chat to select for a given question, so tool selection can be asserted reliably.
  • Looking for specific words within the answer itself is finicky, given the probabilistic behaviour of LLMs.
  • A single answer-matching test remains: the one that asks Duo Chat for its name.
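
To illustrate the reasoning above, here is a minimal sketch in plain Ruby. It is not the code from the spec: `tools_match?` and `answer_matches?` are hypothetical helpers, the two answers are made up, and the order-insensitive comparison is an assumption about how a tool check could work rather than a description of match_llm_tools.

```ruby
# Hypothetical stand-in for a tool-selection check; NOT GitLab's match_llm_tools.
# Assumption: the check passes when the same tools were used, in any order.
def tools_match?(expected_tools, used_tools)
  expected_tools.sort == used_tools.sort
end

# Hypothetical regex-style check on the free-text answer.
def answer_matches?(answer, pattern)
  answer.match?(pattern)
end

# Two made-up answers that say the same thing with different words.
answers = [
  "This issue is about login failures.",
  "The ticket describes users who cannot sign in."
]

# Tool names as they appear in the spec's expectations.
expected_tools = %w[IssueIdentifier ResourceReader]

# The regex passes or fails depending on the wording the model happened to pick.
regex_results = answers.map { |answer| answer_matches?(answer, /login/i) }

# The tool check does not look at the wording at all.
tool_result = tools_match?(expected_tools, %w[ResourceReader IssueIdentifier])

puts "regex results: #{regex_results.inspect}"
puts "tool result:   #{tool_result}"
```

Both answers would be acceptable to a reader, yet the regex accepts only one of them, while the tool check depends only on which tools were selected.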

How to set up and validate locally


  1. Run the following RSpec test:
LLM_DEBUG=1 \
ANTHROPIC_API_KEY="<your-anthropic-api-key>"  \
VERTEX_AI_PROJECT='<your-gcp-project-id>' \
REAL_AI_REQUEST=1 \
bin/rspec ee/spec/lib/gitlab/llm/completions/chat_real_requests_spec.rb
Full log
LLM_DEBUG=1 \
ANTHROPIC_API_KEY="<your-anthropic-api-key>"  \
VERTEX_AI_PROJECT='<your-gcp-project-id>' \
REAL_AI_REQUEST=1 \
bin/rspec ee/spec/lib/gitlab/llm/completions/chat_real_requests_spec.rb
/Users/bruno/gitlab/gitlab-development-kit/gitlab/spec/deprecation_warnings.rb:13: warning: Ignored warnings for Ruby < 3.2 are no longer necessary.
Run options: include {:focus=>true}

All examples were filtered out; ignoring {:focus=>true}

Test environment set up in 1.450435 seconds
......................F...F.................F......F...............

Failures:

  1) Gitlab::Llm::Completions::Chat real requests with predefined issue with chat history input_template: "Can you provide more details about that issue?", tools: ["IssueIdentifier", "ResourceReader"] behaves like successful prompt processing answers query using expected tools
     Failure/Error: expect(executor.context).to match_llm_tools(tools)

       expected tools: ["IssueIdentifier", "ResourceReader"]
                  got: ["IssueIdentifier"]
     Shared Example Group: "successful prompt processing" called from ./ee/spec/lib/gitlab/llm/completions/chat_real_requests_spec.rb:212
     # ./ee/spec/lib/gitlab/llm/completions/chat_real_requests_spec.rb:53:in `block (4 levels) in <top (required)>'
     # ./spec/spec_helper.rb:419:in `block (3 levels) in <top (required)>'
     # ./spec/support/sidekiq_middleware.rb:9:in `with_sidekiq_server_middleware'
     # ./spec/spec_helper.rb:410:in `block (2 levels) in <top (required)>'
     # ./spec/spec_helper.rb:406:in `block (3 levels) in <top (required)>'
     # ./lib/gitlab/application_context.rb:68:in `with_raw_context'
     # ./spec/spec_helper.rb:406:in `block (2 levels) in <top (required)>'
     # ./spec/spec_helper.rb:262:in `block (2 levels) in <top (required)>'
     # ./gems/gitlab-rspec/lib/gitlab/rspec/configurations/time_travel.rb:25:in `block (3 levels) in configure!'
     # ./gems/gitlab-rspec/lib/gitlab/rspec/configurations/time_travel.rb:25:in `block (2 levels) in configure!'
     # ./spec/support/system_exit_detected.rb:7:in `block (2 levels) in <main>'
     # ./spec/support/redis.rb:17:in `block (3 levels) in <main>'
     # ./spec/support/database/prevent_cross_joins.rb:106:in `block (3 levels) in <main>'
     # ./spec/support/database/prevent_cross_joins.rb:60:in `with_cross_joins_prevented'
     # ./spec/support/database/prevent_cross_joins.rb:106:in `block (2 levels) in <main>'
     # ./ee/spec/support/llm.rb:14:in `block (3 levels) in <main>'
     # ./spec/support/webmock.rb:41:in `with_net_connect_allowed'
     # ./ee/spec/support/llm.rb:13:in `block (2 levels) in <main>'

  2) Gitlab::Llm::Completions::Chat real requests with predefined issue with chat history input_template: "Can you identify the unique use cases the commenters have raised on this issue?", tools: ["IssueIdentifier", "ResourceReader"] behaves like successful prompt processing answers query using expected tools
     Failure/Error: expect(executor.context).to match_llm_tools(tools)

       expected tools: ["IssueIdentifier", "ResourceReader"]
                  got: ["IssueIdentifier"]
     Shared Example Group: "successful prompt processing" called from ./ee/spec/lib/gitlab/llm/completions/chat_real_requests_spec.rb:212
     # ./ee/spec/lib/gitlab/llm/completions/chat_real_requests_spec.rb:53:in `block (4 levels) in <top (required)>'
     # ./spec/spec_helper.rb:419:in `block (3 levels) in <top (required)>'
     # ./spec/support/sidekiq_middleware.rb:9:in `with_sidekiq_server_middleware'
     # ./spec/spec_helper.rb:410:in `block (2 levels) in <top (required)>'
     # ./spec/spec_helper.rb:406:in `block (3 levels) in <top (required)>'
     # ./lib/gitlab/application_context.rb:68:in `with_raw_context'
     # ./spec/spec_helper.rb:406:in `block (2 levels) in <top (required)>'
     # ./spec/spec_helper.rb:262:in `block (2 levels) in <top (required)>'
     # ./gems/gitlab-rspec/lib/gitlab/rspec/configurations/time_travel.rb:25:in `block (3 levels) in configure!'
     # ./gems/gitlab-rspec/lib/gitlab/rspec/configurations/time_travel.rb:25:in `block (2 levels) in configure!'
     # ./spec/support/system_exit_detected.rb:7:in `block (2 levels) in <main>'
     # ./spec/support/redis.rb:17:in `block (3 levels) in <main>'
     # ./spec/support/database/prevent_cross_joins.rb:106:in `block (3 levels) in <main>'
     # ./spec/support/database/prevent_cross_joins.rb:60:in `with_cross_joins_prevented'
     # ./spec/support/database/prevent_cross_joins.rb:106:in `block (2 levels) in <main>'
     # ./ee/spec/support/llm.rb:14:in `block (3 levels) in <main>'
     # ./spec/support/webmock.rb:41:in `with_net_connect_allowed'
     # ./ee/spec/support/llm.rb:13:in `block (2 levels) in <main>'

  3) Gitlab::Llm::Completions::Chat real requests when asking about how to use GitLab input_template: "What is DevOps? What is DevSecOps?", tools: ["GitlabDocumentation"] behaves like successful prompt processing answers query using expected tools
     Failure/Error: expect(executor.context).to match_llm_tools(tools)

       expected tools: ["GitlabDocumentation"]
                  got: []
     Shared Example Group: "successful prompt processing" called from ./ee/spec/lib/gitlab/llm/completions/chat_real_requests_spec.rb:289
     # ./ee/spec/lib/gitlab/llm/completions/chat_real_requests_spec.rb:53:in `block (4 levels) in <top (required)>'
     # ./spec/spec_helper.rb:419:in `block (3 levels) in <top (required)>'
     # ./spec/support/sidekiq_middleware.rb:9:in `with_sidekiq_server_middleware'
     # ./spec/spec_helper.rb:410:in `block (2 levels) in <top (required)>'
     # ./spec/spec_helper.rb:406:in `block (3 levels) in <top (required)>'
     # ./lib/gitlab/application_context.rb:68:in `with_raw_context'
     # ./spec/spec_helper.rb:406:in `block (2 levels) in <top (required)>'
     # ./spec/spec_helper.rb:262:in `block (2 levels) in <top (required)>'
     # ./spec/support/system_exit_detected.rb:7:in `block (2 levels) in <main>'
     # ./spec/support/redis.rb:17:in `block (3 levels) in <main>'
     # ./spec/support/database/prevent_cross_joins.rb:106:in `block (3 levels) in <main>'
     # ./spec/support/database/prevent_cross_joins.rb:60:in `with_cross_joins_prevented'
     # ./spec/support/database/prevent_cross_joins.rb:106:in `block (2 levels) in <main>'
     # ./ee/spec/support/llm.rb:14:in `block (3 levels) in <main>'
     # ./spec/support/webmock.rb:41:in `with_net_connect_allowed'
     # ./ee/spec/support/llm.rb:13:in `block (2 levels) in <main>'

  4) Gitlab::Llm::Completions::Chat real requests with predefined epic with chat history input_template: "Can you provide more details about that epic?", tools: ["EpicIdentifier", "ResourceReader"] behaves like successful prompt processing answers query using expected tools
     Failure/Error: expect(executor.context).to match_llm_tools(tools)

       expected tools: ["EpicIdentifier", "ResourceReader"]
                  got: ["EpicIdentifier"]
     Shared Example Group: "successful prompt processing" called from ./ee/spec/lib/gitlab/llm/completions/chat_real_requests_spec.rb:377
     # ./ee/spec/lib/gitlab/llm/completions/chat_real_requests_spec.rb:53:in `block (4 levels) in <top (required)>'
     # ./spec/spec_helper.rb:419:in `block (3 levels) in <top (required)>'
     # ./spec/support/sidekiq_middleware.rb:9:in `with_sidekiq_server_middleware'
     # ./spec/spec_helper.rb:410:in `block (2 levels) in <top (required)>'
     # ./spec/spec_helper.rb:406:in `block (3 levels) in <top (required)>'
     # ./lib/gitlab/application_context.rb:68:in `with_raw_context'
     # ./spec/spec_helper.rb:406:in `block (2 levels) in <top (required)>'
     # ./spec/spec_helper.rb:262:in `block (2 levels) in <top (required)>'
     # ./spec/support/system_exit_detected.rb:7:in `block (2 levels) in <main>'
     # ./spec/support/redis.rb:17:in `block (3 levels) in <main>'
     # ./spec/support/database/prevent_cross_joins.rb:106:in `block (3 levels) in <main>'
     # ./spec/support/database/prevent_cross_joins.rb:60:in `with_cross_joins_prevented'
     # ./spec/support/database/prevent_cross_joins.rb:106:in `block (2 levels) in <main>'
     # ./ee/spec/support/llm.rb:14:in `block (3 levels) in <main>'
     # ./spec/support/webmock.rb:41:in `with_net_connect_allowed'
     # ./ee/spec/support/llm.rb:13:in `block (2 levels) in <main>'

Finished in 15 minutes 50 seconds (files took 14.49 seconds to load)
67 examples, 4 failures

Failed examples:

rspec './ee/spec/lib/gitlab/llm/completions/chat_real_requests_spec.rb[1:1:3:2:1:1:1]' # Gitlab::Llm::Completions::Chat real requests with predefined issue with chat history input_template: "Can you provide more details about that issue?", tools: ["IssueIdentifier", "ResourceReader"] behaves like successful prompt processing answers query using expected tools
rspec './ee/spec/lib/gitlab/llm/completions/chat_real_requests_spec.rb[1:1:3:2:5:1:1]' # Gitlab::Llm::Completions::Chat real requests with predefined issue with chat history input_template: "Can you identify the unique use cases the commenters have raised on this issue?", tools: ["IssueIdentifier", "ResourceReader"] behaves like successful prompt processing answers query using expected tools
rspec './ee/spec/lib/gitlab/llm/completions/chat_real_requests_spec.rb[1:1:5:5:1:1]' # Gitlab::Llm::Completions::Chat real requests when asking about how to use GitLab input_template: "What is DevOps? What is DevSecOps?", tools: ["GitlabDocumentation"] behaves like successful prompt processing answers query using expected tools
rspec './ee/spec/lib/gitlab/llm/completions/chat_real_requests_spec.rb[1:1:6:2:1:1:1]' # Gitlab::Llm::Completions::Chat real requests with predefined epic with chat history input_template: "Can you provide more details about that epic?", tools: ["EpicIdentifier", "ResourceReader"] behaves like successful prompt processing answers query using expected tools

Randomized with seed 42200
Edited by Bruno Cardoso
