Support enablement of Fireworks/Qwen model by top-level group (!176841)

What does this MR do and why?

We recently introduced Fireworks/Qwen as a new Code Completions model behind the fireworks_qwen_code_completion beta-type feature flag (FF). We plan to start rolling it out to GitLab.com soon. However, some organizations cannot switch to a new model on our rollout timeline, so we need to let them opt out of Fireworks/Qwen while they evaluate the model on their end.

In this MR:

Introduce an ops Feature Flag for opting out of Fireworks/Qwen
- As mentioned, Fireworks/Qwen already has a beta FF. The beta FF is for rollout, while this new ops FF is for long-term opt-in/opt-out operations.
Use the new ops FF in combination with the existing beta FF in the model selection logic.
- On GitLab SaaS, the ops FF is checked against the top-level groups that are giving the user Duo access. The logic for the check is:
  1. gather all the top-level groups that are giving the current user Duo access
  2. for each group, check if they have opted out of Fireworks
  3. if at least one group has opted out of Fireworks, disable Fireworks
- On GitLab self-managed, the ops FF is checked against the current user. While we can check the FF on the instance level, we will follow FF development recommendations and check against the user actor.

The FF checks for SaaS and Self-Managed differ because the subscription structure differs. On SaaS, the subscription to "GitLab with Duo Addon" is managed per group. On Self-Managed, the subscription is managed per instance.
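
The decision described above can be sketched in plain Ruby. This is only an illustration, not the code in this MR: `FakeFeature` is a hypothetical stand-in for GitLab's `Feature` API, `use_fireworks_qwen?` is a made-up method name, and checking the beta FF against the user actor is an assumption.

```ruby
require 'set'

# Hypothetical in-memory stand-in for GitLab's Feature API (illustration only).
class FakeFeature
  def initialize
    @enabled = Hash.new { |hash, flag| hash[flag] = Set.new }
  end

  # Enables a flag globally, or only for one actor when an actor is given.
  def enable(flag, actor = :global)
    @enabled[flag] << actor
  end

  # A flag is on for an actor if it is on globally or for that actor.
  def enabled?(flag, actor = :global)
    @enabled[flag].include?(:global) || @enabled[flag].include?(actor)
  end
end

BETA_FLAG = :fireworks_qwen_code_completion
OPT_OUT_FLAG = :code_completion_model_opt_out_from_fireworks_qwen

# Hypothetical helper: true when Fireworks/Qwen should serve code completions.
# duo_groups is the list of top-level groups giving the user Duo access (SaaS only).
def use_fireworks_qwen?(feature, user:, saas:, duo_groups: [])
  # Assumption: the beta (rollout) FF is evaluated for the user.
  return false unless feature.enabled?(BETA_FLAG, user)

  opted_out =
    if saas
      # One opted-out group is enough to disable Fireworks/Qwen.
      duo_groups.any? { |group| feature.enabled?(OPT_OUT_FLAG, group) }
    else
      # Self-managed: check the ops FF against the user actor.
      feature.enabled?(OPT_OUT_FLAG, user)
    end

  !opted_out
end
```

With this sketch, a SaaS user who gets Duo access from two groups stops receiving Fireworks/Qwen completions as soon as either group has the ops FF enabled.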

New Query

This MR introduces one new query through Ai::UserAuthorizable#duo_available_namespace_ids.

Query Plan

Raw SQL:

SELECT "subscription_add_on_purchases"."namespace_id" 
FROM "subscription_user_add_on_assignments" 
INNER JOIN "subscription_add_on_purchases" ON "subscription_add_on_purchases"."id" = "subscription_user_add_on_assignments"."add_on_purchase_id" 
WHERE "subscription_user_add_on_assignments"."user_id" = 2036978
AND "subscription_add_on_purchases"."subscription_add_on_id" IN (
  SELECT "subscription_add_ons"."id" 
  FROM "subscription_add_ons" 
  WHERE "subscription_add_ons"."name" IN (1, 3)
) 
AND (started_at IS NULL OR started_at <= '2025-01-07') 
AND ('2025-01-07' < expires_on)
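
To make the filter conditions easier to read, here is a rough in-memory equivalent in plain Ruby. It is a sketch of the query's semantics only, not the ActiveRecord code in this MR: the `Purchase` and `Assignment` structs and the `duo_namespace_ids` method are hypothetical, and the add-on subquery is simplified to a list of already-resolved add-on IDs.

```ruby
require 'date'

# Hypothetical in-memory rows mirroring only the columns the query touches.
Purchase = Struct.new(:id, :namespace_id, :add_on_id, :started_at, :expires_on, keyword_init: true)
Assignment = Struct.new(:user_id, :add_on_purchase_id, keyword_init: true)

# Returns the namespace IDs of active Duo add-on purchases that have a seat
# assigned to the given user.
def duo_namespace_ids(user_id, assignments, purchases, duo_add_on_ids, today)
  purchases_by_id = purchases.to_h { |purchase| [purchase.id, purchase] }

  assignments
    .select { |assignment| assignment.user_id == user_id }
    # INNER JOIN: assignments without a matching purchase are dropped.
    .filter_map { |assignment| purchases_by_id[assignment.add_on_purchase_id] }
    .select { |purchase| duo_add_on_ids.include?(purchase.add_on_id) }
    # started_at IS NULL OR started_at <= today
    .select { |purchase| purchase.started_at.nil? || purchase.started_at <= today }
    # today < expires_on (a NULL expires_on never matches, as in SQL)
    .select { |purchase| !purchase.expires_on.nil? && today < purchase.expires_on }
    .map(&:namespace_id)
end
```

Note that a purchase expiring on the query date itself is excluded, because the comparison on expires_on is strict.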

References

Issues related to this MR:

Feature change issue: Allow Organizations to opt out of Fireworks / Q... (#509365 - closed)
Feature Flag issue: [Feature flag] Rollout of `code_completion_prim... (#510875 - closed)

Preceding Fireworks/Qwen issues:

Fireworks/Qwen epic: Code Completion: Fireworks platform & Qwen 2.5 ... (&15850 - closed)
Main Fireworks issue: Add Fireworks Code Completion Support (#500742 - closed)
- Main Rails MR: Add fireworks qwen code completion support (!170503 - merged)
- Main AIGW MR: feat: add fireworks code completion support (gitlab-org/modelops/applied-ml/code-suggestions/ai-assist!1517 - merged)
beta FF issue: [Feature flag] Rollout of `fireworks_qwen_code_... (#500744 - closed)

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

N/A

How to set up and validate locally

Setup - AI Gateway

Make sure AI Gateway is running on your GDK, and you have Code Suggestions enabled (see instructions).

You can set up a Fireworks/Qwen connection on your local machine, but the Fireworks/Qwen test server may be down. The quickest way to test is to mock the Fireworks/Qwen response (Option 1).

Option 1: Mock Fireworks Response

Apply the following change on AIGW:

diff --git a/ai_gateway/models/litellm.py b/ai_gateway/models/litellm.py
index e78e788f..344dd4cc 100644
--- a/ai_gateway/models/litellm.py
+++ b/ai_gateway/models/litellm.py
@@ -333,37 +333,12 @@ class LiteLlmTextGenModel(TextGenModelBase):
         code_context: Optional[Sequence[str]] = None,
         snowplow_event_context: Optional[SnowplowEventContext] = None,
     ) -> Union[TextGenModelOutput, AsyncIterator[TextGenModelChunk]]:
-        should_stream = not self.disable_streaming and stream
-
-        with self.instrumentator.watch(stream=should_stream) as watcher:
-            try:
-                suggestion = await self._get_suggestion(
-                    prefix=prefix,
-                    suffix=suffix,
-                    stream=should_stream,
-                    temperature=temperature,
-                    max_output_tokens=max_output_tokens,
-                    top_p=top_p,
-                    snowplow_event_context=snowplow_event_context,
-                )
-            except APIConnectionError as ex:
-                raise LiteLlmAPIConnectionError.from_exception(ex)
-            except InternalServerError as ex:
-                raise LiteLlmInternalServerError.from_exception(ex)
-
-            if should_stream:
-                return self._handle_stream(
-                    suggestion,
-                    watcher.finish,
-                    watcher.register_error,
-                )
-
         return TextGenModelOutput(
-            text=self._extract_suggestion_text(suggestion),
+            text="mock suggestion",
             # Give a high value, the model doesn't return scores.
             score=10**5,
             safety_attributes=SafetyAttributes(),
-            metadata=self._extract_suggestion_metadata(suggestion),
+            metadata=TokensConsumptionMetadata(output_tokens=0),
         )
 
     async def _handle_stream(

Option 2: Set up a Fireworks/Qwen connection on your AI Gateway

Setup - set the following environment variables:

AIGW_MODEL_ENDPOINTS__FIREWORKS_REGIONAL_ENDPOINTS='{ "us": { "endpoint": "https://gitlab-ab7e8cb8.us-texas-1.direct.fireworks.ai:30443/v1", "identifier": "accounts/fireworks/models/qwen2p5-coder-7b#accounts/gitlab/deployments/ab7e8cb8"} }'
AIGW_MODEL_KEYS__FIREWORKS_API_KEY=<the fireworks API key found in 1Password>

Command:

# example request
curl -X "POST" "http://gdk.test:5052/v2/code/completions" \
     -H 'Content-Type: application/json; charset=utf-8' \
     -d $'{
  "current_file": {
    "content_below_cursor": "",
    "file_name": "main.go",
    "language_identifier": "go",
    "content_above_cursor": "print(\\"hello"
  },
  "model_provider": "fireworks_ai",
  "model_name": "qwen2p5-coder-7b",
  "prompt_version": 1
}' \
| json_pp -json_opt pretty,canonical

# expected response
{
   "choices" : [
      {
         "finish_reason" : "length",
         "index" : 0,
         "text" : " world\")\nfor i in range(5):\n\tprint(i)"
      }
   ],
   "created" : 1734490639,
   "experiments" : [],
   "id" : "id",
   "model" : {
      "engine" : "fireworks_ai",
      "lang" : "go",
      "name" : "text-completion-fireworks_ai/qwen2p5-coder-7b",
      "tokens_consumption_metadata" : {
         "context_tokens_sent" : 0,
         "context_tokens_used" : 0,
         "input_tokens" : 3,
         "output_tokens" : 12
      }
   },
   "object" : "text_completion"
}

Tests

The checks and operations for the ops Feature Flag differ between GitLab SaaS and Self-Managed, so we need to test both modes.

Testing on GitLab SaaS

Setup

Make sure your GitLab instance is running in SaaS mode by setting the GITLAB_SIMULATE_SAAS=1 environment variable.
Create a top-level group.

Set up a GitLab Subscription with Duo Addon by running the relevant rake task:

GITLAB_SIMULATE_SAAS=1 bundle exec 'rake gitlab:duo:setup[test-group-name]'

Make sure that the Fireworks/Qwen beta FF is enabled:
```
Feature.enable(:fireworks_qwen_code_completion)
```

Test - opted in to Fireworks/Qwen

Make sure that the ops FF is disabled globally and for the group you are testing:

pry(main)> Feature.enabled?(:code_completion_model_opt_out_from_fireworks_qwen)
=> false

pry(main)> Feature.enabled?(:code_completion_model_opt_out_from_fireworks_qwen, group)
=> false

Call the /code_suggestions/completions API and verify that it is using Fireworks/Qwen

  # refer to the "Example Requests" section below for the example API call

  # expected response, note the model engine and name
  {
     ...other fields...,
     "model" : {
        "engine" : "fireworks_ai",
        "name" : "text-completion-fireworks_ai/qwen2p5-coder-7b",
        ...other fields...
     },
  }

Call the /code_suggestions/direct_access API and verify that it includes the Fireworks/Qwen model details in the response

  # refer to the "Example Requests" section below for the example API call

  # expected response
  {
     ...other fields...,
     "model_details" : {
        "model_name" : "qwen2p5-coder-7b",
        "model_provider" : "fireworks_ai"
     },
  }

Test - opted out of Fireworks/Qwen

Make sure that the ops FF is enabled for the group you are testing:

# find your test group, for example by the path passed to the rake task above
group = Group.find_by_full_path('test-group-name')
Feature.enable(:code_completion_model_opt_out_from_fireworks_qwen, group)

# should be disabled globally
pry(main)> Feature.enabled?(:code_completion_model_opt_out_from_fireworks_qwen)
=> false

# should be enabled for your test group
pry(main)> Feature.enabled?(:code_completion_model_opt_out_from_fireworks_qwen, group)
=> true

Call the /code_suggestions/completions API and verify that it is NOT using Fireworks/Qwen

  # refer to the "Example Requests" section below for the example API call

  # example response, note the model engine and name
  {
     ...other fields...,
     "model" : {
        "engine" : "vertex-ai",
        "name" : "code-gecko@002",
     },
  }

Call the /code_suggestions/direct_access API and verify that it does not include a model_details field

Testing on GitLab Self-Managed

Setup

Make sure your GitLab instance is running in self-managed mode by setting the GITLAB_SIMULATE_SAAS=0 environment variable.
Make sure that the Fireworks/Qwen beta FF is enabled:
```
Feature.enable(:fireworks_qwen_code_completion)
```

Test - opted in to Fireworks/Qwen

Make sure that the ops FF is disabled globally and for the user you are testing:

pry(main)> Feature.enabled?(:code_completion_model_opt_out_from_fireworks_qwen)
=> false

pry(main)> Feature.enabled?(:code_completion_model_opt_out_from_fireworks_qwen, your_user)
=> false

Call the /code_suggestions/completions API and verify that it is using Fireworks/Qwen

  # refer to the "Example Requests" section below for the example API call

  # expected response, note the model engine and name
  {
     ...other fields...,
     "model" : {
        "engine" : "fireworks_ai",
        "name" : "text-completion-fireworks_ai/qwen2p5-coder-7b",
        ...other fields...
     },
  }

Call the /code_suggestions/direct_access API and verify that it includes the Fireworks/Qwen model details in the response

  # refer to the "Example Requests" section below for the example API call

  # expected response
  {
     ...other fields...,
     "model_details" : {
        "model_name" : "qwen2p5-coder-7b",
        "model_provider" : "fireworks_ai"
     },
  }

Test - opted out of Fireworks/Qwen

Make sure that the ops FF is enabled globally and/or for the user you are testing:

Feature.enable(:code_completion_model_opt_out_from_fireworks_qwen)

# OR

Feature.enable(:code_completion_model_opt_out_from_fireworks_qwen, your_user)

Call the /code_suggestions/completions API and verify that it is NOT using Fireworks/Qwen

  # refer to the "Example Requests" section below for the example API call

  # example response, note the model engine and name
  {
     ...other fields...,
     "model" : {
        "engine" : "vertex-ai",
        "name" : "code-gecko@002",
     },
  }

Call the /code_suggestions/direct_access API and verify that it does not include a model_details field

Example Requests

/code_suggestions/completions

curl "http://gdk.test:3000/api/v4/code_suggestions/completions" \
-X POST \
--header "Authorization: Bearer $PERSONAL_ACCESS_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "current_file": {
    "file_name": "hello.rb",
    "content_above_cursor": "class HelloWorld\n    def hello_world",
    "content_below_cursor": "end"
  },
  "prompt_version": 1,
  "stream": false,
  "intent": "completion"
}' \
| json_pp -json_opt pretty,canonical

/code_suggestions/direct_access

curl "http://gdk.test:3000/api/v4/code_suggestions/direct_access" \
-X POST \
--header "Authorization: Bearer $PERSONAL_ACCESS_TOKEN" \
--header "Content-Type: application/json" \
| json_pp -json_opt pretty,canonical
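
When repeating these checks across the opted-in and opted-out scenarios, a small script can classify the saved responses instead of reading the JSON by eye. The sketch below is optional tooling, not part of this MR: both method names are hypothetical, and it assumes the response shapes shown in the test steps above (`model.engine` for completions, `model_details.model_provider` for direct access).

```ruby
require 'json'

# Hypothetical helper: true when a /code_suggestions/completions response body
# reports the Fireworks engine.
def completions_uses_fireworks?(body)
  JSON.parse(body).dig('model', 'engine') == 'fireworks_ai'
end

# Hypothetical helper: true when a /code_suggestions/direct_access response body
# carries Fireworks model details. An absent model_details field means opted out.
def direct_access_uses_fireworks?(body)
  details = JSON.parse(body)['model_details']
  !details.nil? && details['model_provider'] == 'fireworks_ai'
end
```

Each helper takes the raw response body as a string, for example the contents of a file written with curl's --output option.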

Related to #509365 (closed)

Edited Jan 07, 2025 by Pam Artiaga
