Support enablement of Fireworks/Qwen model by top-level group
What does this MR do and why?
We recently introduced Fireworks/Qwen as a new Code Completions model behind the `fireworks_qwen_code_completion` beta-type feature flag (FF). We plan to start rolling it out to GitLab.com soon. However, some organizations cannot switch to a new model on our rollout timeline, so we need to let organizations opt out of Fireworks/Qwen while they evaluate the model on their end.
In this MR:
- Introduce an `ops` Feature Flag for opting out of Fireworks/Qwen.
  - As mentioned, Fireworks already has a `beta` FF. The `beta` FF is for rollout, while this new `ops` FF is for long-term opt-in/opt-out operations.
- Use the new `ops` FF in combination with the existing `beta` FF in the model selection logic.
  - On GitLab SaaS, the `ops` FF is checked against the top-level groups that give the user Duo access. The logic for the check is:
    - gather all the top-level groups that give the current user Duo access
    - for each group, check whether it has opted out of Fireworks
    - if at least one group has opted out of Fireworks, disable Fireworks
  - On GitLab Self-Managed, the `ops` FF is checked against the current user. While we could check the FF at the instance level, we follow the FF development recommendations and check against the user actor.
The FF checks for SaaS and Self-Managed differ because their subscription structures differ: on SaaS, the "GitLab with Duo Addon" subscription is managed per group; on Self-Managed, it is managed per instance.
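The opt-out decision described above can be sketched in plain Ruby. This is a minimal, self-contained illustration under stated assumptions, not the code in this MR: `FakeFeatureFlags` is a hypothetical in-memory stand-in for GitLab's `Feature` API, `use_fireworks_qwen?` is a hypothetical helper, and checking the `beta` FF against the user actor is an assumption. Only the two flag names come from this MR.

```ruby
require 'set'

BETA_FLAG = :fireworks_qwen_code_completion
OPS_FLAG  = :code_completion_model_opt_out_from_fireworks_qwen

# Hypothetical in-memory stand-in for GitLab's Feature API: a flag is either
# enabled globally or enabled for specific actors (groups or users).
class FakeFeatureFlags
  def initialize
    @global = Set.new
    @actors = Hash.new { |hash, flag| hash[flag] = Set.new }
  end

  def enable(flag, actor = nil)
    if actor.nil?
      @global.add(flag)
    else
      @actors[flag].add(actor)
    end
  end

  # Enabled if the flag is on globally, or on for this specific actor.
  def enabled?(flag, actor = nil)
    return true if @global.include?(flag)

    !actor.nil? && @actors.key?(flag) && @actors[flag].include?(actor)
  end
end

# Hypothetical model-selection helper. On SaaS, `duo_groups` are the top-level
# groups giving the user Duo access; on Self-Managed only `user` is checked.
def use_fireworks_qwen?(flags, user:, duo_groups:, saas:)
  # Rollout gate (assumption: the beta FF is checked against the user actor).
  return false unless flags.enabled?(BETA_FLAG, user)

  opted_out =
    if saas
      # Disable Fireworks if at least one Duo-granting group has opted out.
      duo_groups.any? { |group| flags.enabled?(OPS_FLAG, group) }
    else
      # Self-Managed: check the opt-out flag against the user actor.
      flags.enabled?(OPS_FLAG, user)
    end

  !opted_out
end
```

In this sketch, `flags.enable(OPS_FLAG, 'group-b')` opts out every user whose Duo access comes from `group-b`, even when another of their groups has not opted out, which mirrors the "at least one group" rule above.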
New Query
This MR introduces one new query through Ai::UserAuthorizable#duo_available_namespace_ids.
```sql
SELECT "subscription_add_on_purchases"."namespace_id"
FROM "subscription_user_add_on_assignments"
INNER JOIN "subscription_add_on_purchases"
  ON "subscription_add_on_purchases"."id" = "subscription_user_add_on_assignments"."add_on_purchase_id"
WHERE "subscription_user_add_on_assignments"."user_id" = 2036978
  AND "subscription_add_on_purchases"."subscription_add_on_id" IN (
    SELECT "subscription_add_ons"."id"
    FROM "subscription_add_ons"
    WHERE "subscription_add_ons"."name" IN (1, 3)
  )
  AND (started_at IS NULL OR started_at <= '2025-01-07')
  AND ('2025-01-07' < expires_on)
```
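To make the query's filters concrete, here is a hypothetical in-memory Ruby equivalent. The `Struct` records and the `duo_available_namespace_ids` function below are illustrative assumptions that only mirror the columns and `WHERE` clauses in the SQL; they are not the ActiveRecord implementation of `Ai::UserAuthorizable#duo_available_namespace_ids`. `add_on_names` takes the integer enum values the query uses (`1, 3`).

```ruby
require 'date'

# Hypothetical plain-Ruby records holding only the columns the query reads.
AddOn      = Struct.new(:id, :name)
Purchase   = Struct.new(:id, :namespace_id, :subscription_add_on_id, :started_at, :expires_on)
Assignment = Struct.new(:user_id, :add_on_purchase_id)

# Mirrors the SQL, step by step:
#   1. keep the user's seat assignments (WHERE user_id = ?)
#   2. join each assignment to its purchase (INNER JOIN drops missing rows)
#   3. keep purchases of the listed add-ons (the IN subquery)
#   4. keep purchases active on `today`; a NULL expires_on never passes
#      `today < expires_on`, just as in SQL
def duo_available_namespace_ids(user_id, assignments:, purchases:, add_ons:, add_on_names:, today:)
  add_on_ids = add_ons.select { |add_on| add_on_names.include?(add_on.name) }.map(&:id)
  purchases_by_id = purchases.to_h { |purchase| [purchase.id, purchase] }

  assignments
    .select { |assignment| assignment.user_id == user_id }
    .filter_map { |assignment| purchases_by_id[assignment.add_on_purchase_id] }
    .select { |purchase| add_on_ids.include?(purchase.subscription_add_on_id) }
    .select { |purchase| purchase.started_at.nil? || purchase.started_at <= today }
    .select { |purchase| !purchase.expires_on.nil? && today < purchase.expires_on }
    .map(&:namespace_id)
end
```

Like the SQL, this sketch does not de-duplicate namespace IDs; a user with two qualifying purchases in the same namespace would see that ID twice.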
References
Issues related to this MR:
- Feature change issue: Allow Organizations to opt out of Fireworks / Q... (#509365 - closed)
- Feature Flag issue: [Feature flag] Rollout of `code_completion_prim... (#510875 - closed)
Preceding Fireworks/Qwen issues:
- Fireworks/Qwen epic: Code Completion: Fireworks platform & Qwen 2.5 ... (&15850 - closed)
- Main Fireworks issue: Add Fireworks Code Completion Support (#500742 - closed)
- `beta` FF issue: [Feature flag] Rollout of `fireworks_qwen_code_... (#500744 - closed)
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Screenshots or screen recordings
N/A
How to set up and validate locally
Setup - AI Gateway
Make sure AI Gateway is running on your GDK, and you have Code Suggestions enabled (see instructions).
You can set up a Fireworks/Qwen connection on your local machine, but the Fireworks/Qwen test server may be down. The quickest way to test is to mock the Fireworks/Qwen response (Option 1).
Option 1: Mock Fireworks Response
Apply the following change to the AI Gateway (AIGW):
```diff
diff --git a/ai_gateway/models/litellm.py b/ai_gateway/models/litellm.py
index e78e788f..344dd4cc 100644
--- a/ai_gateway/models/litellm.py
+++ b/ai_gateway/models/litellm.py
@@ -333,37 +333,12 @@ class LiteLlmTextGenModel(TextGenModelBase):
code_context: Optional[Sequence[str]] = None,
snowplow_event_context: Optional[SnowplowEventContext] = None,
) -> Union[TextGenModelOutput, AsyncIterator[TextGenModelChunk]]:
- should_stream = not self.disable_streaming and stream
-
- with self.instrumentator.watch(stream=should_stream) as watcher:
- try:
- suggestion = await self._get_suggestion(
- prefix=prefix,
- suffix=suffix,
- stream=should_stream,
- temperature=temperature,
- max_output_tokens=max_output_tokens,
- top_p=top_p,
- snowplow_event_context=snowplow_event_context,
- )
- except APIConnectionError as ex:
- raise LiteLlmAPIConnectionError.from_exception(ex)
- except InternalServerError as ex:
- raise LiteLlmInternalServerError.from_exception(ex)
-
- if should_stream:
- return self._handle_stream(
- suggestion,
- watcher.finish,
- watcher.register_error,
- )
-
return TextGenModelOutput(
- text=self._extract_suggestion_text(suggestion),
+ text="mock suggestion",
# Give a high value, the model doesn't return scores.
score=10**5,
safety_attributes=SafetyAttributes(),
- metadata=self._extract_suggestion_metadata(suggestion),
+ metadata=TokensConsumptionMetadata(output_tokens=0),
)
async def _handle_stream(
```
Option 2: Set up a Fireworks/Qwen connection on your AI Gateway
Set the following environment variables:
```shell
AIGW_MODEL_ENDPOINTS__FIREWORKS_REGIONAL_ENDPOINTS='{ "us": { "endpoint": "https://gitlab-ab7e8cb8.us-texas-1.direct.fireworks.ai:30443/v1", "identifier": "accounts/fireworks/models/qwen2p5-coder-7b#accounts/gitlab/deployments/ab7e8cb8"} }'
AIGW_MODEL_KEYS__FIREWORKS_API_KEY=<the fireworks API key found in 1Password>
```
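Since `AIGW_MODEL_ENDPOINTS__FIREWORKS_REGIONAL_ENDPOINTS` is a JSON document inside a shell variable, a quoting slip is easy to miss. The hypothetical Ruby check below only confirms that the value parses and that each region has the `endpoint` and `identifier` keys shown above; `check_fireworks_endpoints` is an illustrative helper, not how the AI Gateway itself validates its settings.

```ruby
require 'json'

# Hypothetical pre-flight check for the regional-endpoints variable: it must be
# a JSON object keyed by region, and each region needs the "endpoint" and
# "identifier" keys shown in the example above. Returns the region names.
def check_fireworks_endpoints(raw)
  regions = JSON.parse(raw)
  raise ArgumentError, 'expected a JSON object keyed by region' unless regions.is_a?(Hash)

  regions.each do |region, config|
    present = config.is_a?(Hash) ? config.keys : []
    missing = %w[endpoint identifier] - present
    raise ArgumentError, "region #{region} is missing: #{missing.join(', ')}" unless missing.empty?
  end

  regions.keys
end

# Usage, after exporting the variable in the same shell:
# p check_fireworks_endpoints(ENV.fetch('AIGW_MODEL_ENDPOINTS__FIREWORKS_REGIONAL_ENDPOINTS'))
```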
Command:
```shell
# example request
curl -X "POST" "http://gdk.test:5052/v2/code/completions" \
  -H 'Content-Type: application/json; charset=utf-8' \
  -d $'{
  "current_file": {
    "content_below_cursor": "",
    "file_name": "main.go",
    "language_identifier": "go",
    "content_above_cursor": "print(\\"hello"
  },
  "model_provider": "fireworks_ai",
  "model_name": "qwen2p5-coder-7b",
  "prompt_version": 1
}' \
  | json_pp -json_opt pretty,canonical

# expected response
{
  "choices" : [
    {
      "finish_reason" : "length",
      "index" : 0,
      "text" : " world\")\nfor i in range(5):\n\tprint(i)"
    }
  ],
  "created" : 1734490639,
  "experiments" : [],
  "id" : "id",
  "model" : {
    "engine" : "fireworks_ai",
    "lang" : "go",
    "name" : "text-completion-fireworks_ai/qwen2p5-coder-7b",
    "tokens_consumption_metadata" : {
      "context_tokens_sent" : 0,
      "context_tokens_used" : 0,
      "input_tokens" : 3,
      "output_tokens" : 12
    }
  },
  "object" : "text_completion"
}
```
Tests
The `ops` Feature Flag is checked and operated differently on GitLab SaaS and Self-Managed, so we need to test both modes.
Testing on GitLab SaaS
Setup
- Make sure your GitLab instance is running in SaaS mode by setting the `GITLAB_SIMULATE_SAAS=1` environment variable.
- Create a top-level group.
- Set up a GitLab Subscription with Duo Addon by running the relevant rake task:
  ```shell
  GITLAB_SIMULATE_SAAS=1 bundle exec 'rake gitlab:duo:setup[test-group-name]'
  ```
- Make sure that the Fireworks/Qwen beta FF is enabled:
  ```ruby
  Feature.enable(:fireworks_qwen_code_completion)
  ```
Test - opted in to Fireworks/Qwen
- Make sure that the ops FF is disabled globally and for the group you are testing:
  ```ruby
  pry(main)> Feature.enabled?(:code_completion_model_opt_out_from_fireworks_qwen)
  => false
  pry(main)> Feature.enabled?(:code_completion_model_opt_out_from_fireworks_qwen, group)
  => false
  ```
- Call the `/code_suggestions/completions` API and verify that it is using Fireworks/Qwen:
  ```
  # refer to the "Example Requests" section below for the example API call
  # expected response, note the model engine and name
  {
    ...other fields...,
    "model" : {
      "engine" : "fireworks_ai",
      "name" : "text-completion-fireworks_ai/qwen2p5-coder-7b",
      ...other fields...
    },
  }
  ```
- Call the `/code_suggestions/direct_access` API and verify that it includes the Fireworks/Qwen model details in the response:
  ```
  # refer to the "Example Requests" section below for the example API call
  # expected response
  {
    ...other fields...,
    "model_details" : {
      "model_name" : "qwen2p5-coder-7b",
      "model_provider" : "fireworks_ai"
    },
  }
  ```
Test - opted out of Fireworks/Qwen
- Make sure that the ops FF is enabled for the group you are testing:
  ```ruby
  group = Group.find_by_full_path('test-group-name') # your test group
  Feature.enable(:code_completion_model_opt_out_from_fireworks_qwen, group)

  # should be disabled globally
  pry(main)> Feature.enabled?(:code_completion_model_opt_out_from_fireworks_qwen)
  => false
  # should be enabled for your test group
  pry(main)> Feature.enabled?(:code_completion_model_opt_out_from_fireworks_qwen, group)
  => true
  ```
- Call the `/code_suggestions/completions` API and verify that it is NOT using Fireworks/Qwen:
  ```
  # refer to the "Example Requests" section below for the example API call
  # example response, note the model engine and name
  {
    ...other fields...,
    "model" : {
      "engine" : "vertex-ai",
      "name" : "code-gecko@002",
    },
  }
  ```
- Call the `/code_suggestions/direct_access` API and verify that the response does not include a `model_details` field.
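The response checks in the tests above can also be scripted. The sketch below uses hypothetical Ruby helpers (`completions_model`, `completions_uses_fireworks?`, and `direct_access_uses_fireworks?` are not part of GitLab) that parse a saved JSON response body and look only at the `model.engine` and `model_details.model_provider` fields these steps describe.

```ruby
require 'json'

# Hypothetical helpers for the manual checks above. They parse a saved
# response body and look only at the fields the test steps mention.

# /code_suggestions/completions: returns [engine, name] from the "model" object.
def completions_model(body)
  model = JSON.parse(body)['model'] || {}
  [model['engine'], model['name']]
end

# True when the completion was served by Fireworks/Qwen.
def completions_uses_fireworks?(body)
  engine, _name = completions_model(body)
  engine == 'fireworks_ai'
end

# /code_suggestions/direct_access: Fireworks/Qwen is selected only when a
# "model_details" object with the fireworks_ai provider is present.
def direct_access_uses_fireworks?(body)
  details = JSON.parse(body)['model_details']
  details.is_a?(Hash) && details['model_provider'] == 'fireworks_ai'
end
```

For example, save the completions output to a file and call `completions_uses_fireworks?(File.read('completions.json'))`; per the expectations above, it should be `true` in the opted-in test and `false` in the opted-out test.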
Testing on GitLab Self-Managed
Setup
- Make sure your GitLab instance is running in self-managed mode by setting the `GITLAB_SIMULATE_SAAS=0` environment variable.
- Make sure that the Fireworks/Qwen beta FF is enabled:
  ```ruby
  Feature.enable(:fireworks_qwen_code_completion)
  ```
Test - opted in to Fireworks/Qwen
- Make sure that the ops FF is disabled globally and for the user you are testing:
  ```ruby
  pry(main)> Feature.enabled?(:code_completion_model_opt_out_from_fireworks_qwen)
  => false
  pry(main)> Feature.enabled?(:code_completion_model_opt_out_from_fireworks_qwen, your_user)
  => false
  ```
- Call the `/code_suggestions/completions` API and verify that it is using Fireworks/Qwen:
  ```
  # refer to the "Example Requests" section below for the example API call
  # expected response, note the model engine and name
  {
    ...other fields...,
    "model" : {
      "engine" : "fireworks_ai",
      "name" : "text-completion-fireworks_ai/qwen2p5-coder-7b",
      ...other fields...
    },
  }
  ```
- Call the `/code_suggestions/direct_access` API and verify that it includes the Fireworks/Qwen model details in the response:
  ```
  # refer to the "Example Requests" section below for the example API call
  # expected response
  {
    ...other fields...,
    "model_details" : {
      "model_name" : "qwen2p5-coder-7b",
      "model_provider" : "fireworks_ai"
    },
  }
  ```
Test - opted out of Fireworks/Qwen
- Make sure that the ops FF is enabled globally and/or for the user you are testing:
  ```ruby
  Feature.enable(:code_completion_model_opt_out_from_fireworks_qwen)
  # OR
  Feature.enable(:code_completion_model_opt_out_from_fireworks_qwen, your_user)
  ```
- Call the `/code_suggestions/completions` API and verify that it is NOT using Fireworks/Qwen:
  ```
  # refer to the "Example Requests" section below for the example API call
  # example response, note the model engine and name
  {
    ...other fields...,
    "model" : {
      "engine" : "vertex-ai",
      "name" : "code-gecko@002",
    },
  }
  ```
- Call the `/code_suggestions/direct_access` API and verify that the response does not include a `model_details` field.
Example Requests
/code_suggestions/completions
```shell
curl "http://gdk.test:3000/api/v4/code_suggestions/completions" \
  -X POST \
  --header "Authorization: Bearer $PERSONAL_ACCESS_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
    "current_file": {
      "file_name": "hello.rb",
      "content_above_cursor": "class HelloWorld\n def hello_world",
      "content_below_cursor": "end"
    },
    "prompt_version": 1,
    "stream": false,
    "intent": "completion"
  }' \
  | json_pp -json_opt pretty,canonical
```
/code_suggestions/direct_access
```shell
curl "http://gdk.test:3000/api/v4/code_suggestions/direct_access" \
  -X POST \
  --header "Authorization: Bearer $PERSONAL_ACCESS_TOKEN" \
  --header "Content-Type: application/json" \
  | json_pp -json_opt pretty,canonical
```
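If you prefer scripting the request over `curl`, the `/code_suggestions/completions` call can be built with Ruby's standard `Net::HTTP`. This is a sketch under the same assumptions as the `curl` example (GDK at `http://gdk.test:3000`, a personal access token in `PERSONAL_ACCESS_TOKEN`); `build_completions_request` is a hypothetical helper name, and it only builds the request so it can be inspected before sending.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical Ruby equivalent of the completions curl call: same endpoint,
# headers, and JSON payload. It builds the request without sending it.
def build_completions_request(base_url, token)
  uri = URI.join(base_url, '/api/v4/code_suggestions/completions')
  request = Net::HTTP::Post.new(uri)
  request['Authorization'] = "Bearer #{token}"
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate({
    current_file: {
      file_name: 'hello.rb',
      content_above_cursor: "class HelloWorld\n def hello_world",
      content_below_cursor: 'end'
    },
    prompt_version: 1,
    stream: false,
    intent: 'completion'
  })
  request
end

# Usage (assumes your GDK is running and PERSONAL_ACCESS_TOKEN is set):
# request = build_completions_request('http://gdk.test:3000', ENV.fetch('PERSONAL_ACCESS_TOKEN'))
# response = Net::HTTP.start(request.uri.host, request.uri.port) { |http| http.request(request) }
# puts JSON.pretty_generate(JSON.parse(response.body))
```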
Related to #509365 (closed)