Add prompt caching opt-out backend
What does this MR do and why?
We would like to implement an opt-out mechanism for users who do not want to use prompt caching for code completion.
This MR does the following:
- Add a top-level namespace setting (in GitLab Rails) to let admins opt out of prompt caching
- This setting should then apply to all groups and projects within the top-level namespace.
- Pass that information along in the headers returned to the client when fetching direct_access (see code here). We need to make sure this information is added for indirect connections as well.
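The header addition can be sketched roughly as follows. This is an illustrative sketch only; the method name and call site are hypothetical, not the actual GitLab code:

```ruby
# Hypothetical sketch: the resolved opt-out setting is serialized as a
# string-valued header in the direct_access response. Names are illustrative.
def model_prompt_cache_header(enabled)
  { 'X-Gitlab-Model-Prompt-Cache-Enabled' => enabled.to_s }
end

model_prompt_cache_header(false)
# => {"X-Gitlab-Model-Prompt-Cache-Enabled"=>"false"}
```

Serializing the boolean to `"true"`/`"false"` keeps the header shape consistent with the other `X-Gitlab-*` headers, which are all strings.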
Implementation
Followed the guide here: !143278 (diffs)
AI Settings notes
We will also need to work across teams to ensure there's an AI Settings admin option to support the opt-out. Prior details on this related to Chat are captured here: &16708 (closed)
Note: using !188732 (diffs) as an example.
```ruby
let(:top_level_namespace) { create(:group) }
let(:enabled_group) { create(:group, parent: top_level_namespace) }
let(:project_setting) { create(:project_setting, model_prompt_cache_enabled: false) }
let(:disabled_project) { create(:project, group: enabled_group, project_setting: project_setting) }
```
```ruby
# set at project_setting
project.read_attribute_before_type_cast(:model_prompt_cache_enabled)
# => false
group.read_attribute_before_type_cast(:model_prompt_cache_enabled)
# => true
application_setting.read_attribute_before_type_cast(:model_prompt_cache_enabled)
# => true
project.model_prompt_cache_enabled # => false
group.model_prompt_cache_enabled # => true
application_setting.model_prompt_cache_enabled # => true

# set at group namespace_setting
project.read_attribute_before_type_cast(:model_prompt_cache_enabled)
# => nil
group.read_attribute_before_type_cast(:model_prompt_cache_enabled)
# => false
application_setting.read_attribute_before_type_cast(:model_prompt_cache_enabled)
# => true
project.model_prompt_cache_enabled # => false
group.model_prompt_cache_enabled # => false
application_setting.model_prompt_cache_enabled # => true

# set at application_setting
project.read_attribute_before_type_cast(:model_prompt_cache_enabled)
# => nil
group.read_attribute_before_type_cast(:model_prompt_cache_enabled)
# => nil
application_setting.read_attribute_before_type_cast(:model_prompt_cache_enabled)
# => true
project.model_prompt_cache_enabled # => true
group.model_prompt_cache_enabled # => true
application_setting.model_prompt_cache_enabled # => true
```
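The cascade above can be sketched as a simple nil-fallthrough, where a stored `nil` at one level defers to the next level up. This is a standalone sketch only; the actual implementation relies on the cascading-settings framework referenced in !143278:

```ruby
# Sketch of the cascading resolution: nil at one level falls through to the
# next (project -> group -> application setting). Illustrative only.
# Note that `false` is a real value and must NOT fall through, which is why
# we compact out nils rather than testing truthiness.
def resolve_model_prompt_cache_enabled(project_value, group_value, app_value)
  [project_value, group_value, app_value].compact.first
end

resolve_model_prompt_cache_enabled(false, true, true) # => false (set at project)
resolve_model_prompt_cache_enabled(nil, false, true)  # => false (set at group)
resolve_model_prompt_cache_enabled(nil, nil, true)    # => true  (application default)
```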
References
gitlab-org/modelops/applied-ml/code-suggestions/ai-assist!2376 (merged)
Screenshots or screen recordings
Verify:
- set the project setting to false:

```ruby
[33] pry(main)> p = Project.where(name: "Gitlab Shell")
[33] pry(main)> p = p.first
[33] pry(main)> ps = p.project_setting
[33] pry(main)> ps.model_prompt_cache_enabled = false
[33] pry(main)> ps.save
```
```shell
curl --request POST \
  --url 127.0.0.1:3000/api/v4/code_suggestions/direct_access \
  --header "Authorization: Bearer glpat-4dXV1h" \
  --header "Content-Type: application/json" \
  -d '{"project_path":"gitlab-org/gitlab-shell"}'
```
```json
{
  "base_url":"http://0.0.0.0:5052",
  "token":"e",
  "expires_at":1745421283,
  "headers":{
    "X-Gitlab-Feature-Enablement-Type":"add_on",
    "x-gitlab-enabled-feature-flags":"",
    "x-gitlab-enabled-instance-verbose-ai-logs":"true",
    "X-Gitlab-Host-Name":"127.0.0.1",
    "X-Gitlab-Instance-Id":"5b4465f2-986e-44b1-adcf-35492a19369a",
    "X-Gitlab-Realm":"self-managed",
    "X-Gitlab-Version":"18.0.0",
    "X-Gitlab-Global-User-Id":"pYtWOLH65PxMJSvaiKZhAESiyW2Gi/I/c+/0QjgQls8=",
    "X-Gitlab-Duo-Seat-Count":"100",
    "X-Gitlab-Feature-Enabled-By-Namespace-Ids":"1000000",
    "X-Gitlab-Model-Prompt-Cache-Enabled":"false",
    "X-Gitlab-Authentication-Type":"oidc"
  },
  "model_details":{
    "model_provider":"fireworks_ai",
    "model_name":"codestral-2501"
  }
}
```
- set the top-level namespace setting to false:

```ruby
[33] pry(main)> ps.model_prompt_cache_enabled = nil
[33] pry(main)> ps.save
[33] pry(main)> g = p.parent
[33] pry(main)> g.parent # => nil, so g is the top-level namespace
[33] pry(main)> gn = g.namespace_settings
[33] pry(main)> gn.model_prompt_cache_enabled = false
[33] pry(main)> gn.save
```
```shell
curl --request POST \
  --url 127.0.0.1:3000/api/v4/code_suggestions/direct_access \
  --header "Authorization: Bearer glpat-4dXV1hMz" \
  --header "Content-Type: application/json" \
  -d '{"project_path":"gitlab-org/gitlab-shell"}'
```
```json
{
  "base_url":"http://0.0.0.0:5052",
  "token":"e",
  "expires_at":1745421283,
  "headers":{
    "X-Gitlab-Feature-Enablement-Type":"add_on",
    "x-gitlab-enabled-feature-flags":"",
    "x-gitlab-enabled-instance-verbose-ai-logs":"true",
    "X-Gitlab-Host-Name":"127.0.0.1",
    "X-Gitlab-Instance-Id":"5b4465f2-986e-44b1-adcf-35492a19369a",
    "X-Gitlab-Realm":"self-managed",
    "X-Gitlab-Version":"18.0.0",
    "X-Gitlab-Global-User-Id":"pYtWOLH65PxMJSvaiKZhAESiyW2Gi/I/c+/0QjgQls8=",
    "X-Gitlab-Duo-Seat-Count":"100",
    "X-Gitlab-Feature-Enabled-By-Namespace-Ids":"1000000",
    "X-Gitlab-Model-Prompt-Cache-Enabled":"false",
    "X-Gitlab-Authentication-Type":"oidc"
  },
  "model_details":{
    "model_provider":"fireworks_ai",
    "model_name":"codestral-2501"
  }
}
```
- leave the project and namespace settings unset, so the application setting is used as the default:

```ruby
[33] pry(main)> gn.model_prompt_cache_enabled = nil
[33] pry(main)> gn.save
[42] pry(main)> ApplicationSetting.first.model_prompt_cache_enabled
# => true
```
```json
{
  "base_url":"http://0.0.0.0:5052",
  "token":"e",
  "expires_at":1745422111,
  "headers":{
    "X-Gitlab-Feature-Enablement-Type":"add_on",
    "x-gitlab-enabled-feature-flags":"",
    "x-gitlab-enabled-instance-verbose-ai-logs":"true",
    "X-Gitlab-Host-Name":"127.0.0.1",
    "X-Gitlab-Instance-Id":"5b4465f2-986e-44b1-adcf-35492a19369a",
    "X-Gitlab-Realm":"self-managed",
    "X-Gitlab-Version":"18.0.0",
    "X-Gitlab-Global-User-Id":"pYtWOLH65PxMJSvaiKZhAESiyW2Gi/I/c+/0QjgQls8=",
    "X-Gitlab-Duo-Seat-Count":"100",
    "X-Gitlab-Feature-Enabled-By-Namespace-Ids":"1000000",
    "X-Gitlab-Model-Prompt-Cache-Enabled":"true",
    "X-Gitlab-Authentication-Type":"oidc"
  },
  "model_details":{
    "model_provider":"fireworks_ai",
    "model_name":"codestral-2501"
  }
}
```
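To spot-check the returned header without eyeballing the full payload, the response body can be inspected with a short script (a convenience sketch for local verification, not part of the MR):

```ruby
require 'json'

# Parse a direct_access response body (truncated here to the relevant part)
# and pull out the prompt-cache header with Hash#dig.
body = '{"headers":{"X-Gitlab-Model-Prompt-Cache-Enabled":"true"}}'
value = JSON.parse(body).dig('headers', 'X-Gitlab-Model-Prompt-Cache-Enabled')
puts value # prints "true"
```

Piping the curl output into a script like this (or `jq '.headers'`) makes it quick to re-check the header after toggling each settings level.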
How to set up and validate locally
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Related to #535651 (closed)
Edited by Allen Cook