Allow Organizations to opt out of Fireworks / Qwen 2.5

Context

We have a customer with a dedicated model risk management process who is asking for more time (about 6 months) to review Fireworks and the Qwen2.5 7B model that we plan to enable in January. We need to allow this customer (and possibly other SaaS customers) to opt out of the Fireworks model.

Technical context

Fireworks Qwen is introduced behind a Feature Flag, which is checked against the current user.

References

Feature Flags guide: https://docs.gitlab.com/ee/development/feature_flags/

These are the places where the Code Completion model is selected:

Proposal

We should have an additional Feature Flag check on whether Fireworks is enabled for the current user with the current project. Since this is going to be a long-term check, we should introduce a new ops Feature Flag. This new FF will be used in combination with the current Fireworks feature flag.
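As a rough sketch of how the two flags could combine: Fireworks is chosen only when the existing rollout flag is on for the user and the new opt-out flag is off. The `FakeFlags` class is an in-memory stand-in written for this example (it is not GitLab's `Feature` class), and `fireworks_opt_out_example` is a hypothetical flag name, since the real name has not been decided in this issue.

```ruby
require 'set'

# Minimal in-memory stand-in for a feature flag store. This is NOT GitLab's
# Feature class; it only mimics per-actor enablement for illustration.
class FakeFlags
  def initialize
    @enabled = Hash.new { |hash, flag| hash[flag] = Set.new }
  end

  def enable(flag, actor)
    @enabled[flag] << actor
  end

  def enabled?(flag, actor)
    @enabled[flag].include?(actor)
  end
end

ROLLOUT_FLAG = :fireworks_qwen_code_completion # existing flag named in this issue
OPT_OUT_FLAG = :fireworks_opt_out_example      # hypothetical name for the new ops flag

# Fireworks is used only when the rollout flag is on for the user AND
# the opt-out flag is off for the actor the opt-out is checked against.
def use_fireworks?(flags, user:, opt_out_actor:)
  flags.enabled?(ROLLOUT_FLAG, user) && !flags.enabled?(OPT_OUT_FLAG, opt_out_actor)
end

flags = FakeFlags.new
flags.enable(ROLLOUT_FLAG, 'alice')
flags.enable(ROLLOUT_FLAG, 'bob')
flags.enable(OPT_OUT_FLAG, 'group-b') # group-b has opted out

puts use_fireworks?(flags, user: 'alice', opt_out_actor: 'group-a') # true
puts use_fireworks?(flags, user: 'bob', opt_out_actor: 'group-b')   # false
```

Keeping the opt-out as a separate flag means the rollout flag can later be removed without touching the opt-out logic.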

Actual implementation

- Introduce an ops Feature Flag for opting out of Fireworks.
- When selecting models, check the Feature Flag against the top-level groups that are giving the user Duo access:
  - gather all the top-level groups that are giving the current user Duo access
  - for each group, check if it has opted out of Fireworks
  - if at least one group has opted out of Fireworks, disable Fireworks

Since we are dealing with multiple groups, this has the potential to be expensive in terms of Feature Flag checks. However, we expect that for most, if not all, users only 1 group provides Duo access. So, in practice, we would query for only 1 group and check the Feature Flag against that 1 group.
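The group check described above can be sketched as follows. Everything here is illustrative: `DuoGroup`, `fireworks_opted_out?`, and `code_completion_provider` are hypothetical names, and the set of opted-out group IDs stands in for a per-group feature flag lookup. The sketch also assumes the existing rollout flag is already on for the user.

```ruby
require 'set'

# Hypothetical stand-in for a top-level group that gives a user Duo access.
DuoGroup = Struct.new(:id, :name)

# Returns true when at least one of the given groups has opted out of
# Fireworks. `any?` stops at the first match, so later groups are not checked.
def fireworks_opted_out?(duo_groups, opted_out_group_ids)
  duo_groups.any? { |group| opted_out_group_ids.include?(group.id) }
end

# Picks the code completion provider based on the opt-out result.
def code_completion_provider(duo_groups, opted_out_group_ids)
  fireworks_opted_out?(duo_groups, opted_out_group_ids) ? :vertex_ai : :fireworks
end

opted_out_ids = Set[2] # group 2 has the opt-out flag enabled

single_group = [DuoGroup.new(1, 'acme')]
mixed_groups = [DuoGroup.new(1, 'acme'), DuoGroup.new(2, 'riskco')]

puts code_completion_provider(single_group, opted_out_ids) # fireworks
puts code_completion_provider(mixed_groups, opted_out_ids) # vertex_ai
```

In the expected common case of a single group per user, the loop performs exactly one check.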

For further details, see #509365 (comment 2282111238)

Further Details

Scope of this issue

The scope of this issue is to introduce an ops feature flag that allows a customer to opt out of using Qwen 2.5 via Fireworks for code completion. Code completion requests for opted-out customers would be routed to Vertex AI instead. We're planning to maintain this ops feature flag for about 6 months, until the middle of calendar year 2025, with flexibility to change this timing based on customer demand. More details on the motivation are below. With this ops feature flag in place, we can remove the existing fireworks_qwen_code_completion feature flag in early 2025.

Acceptance criteria

- We have a new ops feature flag established, separate from the current fireworks_qwen_code_completion feature flag
- The ops feature flag is default_enabled = false, which routes code completion requests to Fireworks
- The ops feature flag operates on the group or project actor
  - As an exception on behalf of a SaaS customer, we can update the feature flag on their behalf.
  - A self-managed customer can update the feature flag directly.
- A customer can enable the feature flag, which will ensure code completion requests are not routed to Fireworks
  - Happy to discuss this if we prefer the inverse logic (i.e. disabling the flag to route traffic to Fireworks); my proposal above seems most consistent with https://docs.gitlab.com/ee/development/feature_flags/#constraints-3.

Background and motivation

This ops feature flag is intended as an exception option for customers who have their own internal approval process for GitLab subprocessors, which may take longer than the 2-3 months we typically persist a derisk or beta feature flag. We've heard this specific concern from a large TCV customer. We want to ensure customers don't have to confront the decision to potentially disable Duo while they're working through their approval process. Feature flags are available to self-managed customers but not to SaaS customers. The proposed scope includes an exception path for SaaS customers where GitLab configures the feature flag on their behalf.

Further in the future, we might consider an admin option to select from a finite list of models and providers. This would be an alternative solution to the ops feature flag but we haven’t yet committed to this solution pattern.

Separately, we have an existing failover solution in place to move traffic to a backup model/provider. This is orthogonal to the scope of this issue but is mentioned here for completeness: we’ll use this solution to direct traffic away from Fireworks in the case of an outage, highly elevated latency, or a similar problem.

Edited Jan 07, 2025 by Pam Artiaga