## Let the client (IDE) request Code Suggestions (completions only) directly from the AI Gateway (skip the monolith)
Currently, the client (in this case the IDE) requests Code Suggestions by talking to the monolith (either GitLab.com or the user's own instance), which in turn forwards the request to the AI Gateway. As part of the effort to [decrease latency](https://gitlab.com/groups/gitlab-org/-/epics/12224 "Improved Code Suggestions Architecture to Decrease Latency"), we want to skip the monolith and let the client talk directly to Cloud Connector, which forwards the request to the AI Gateway. Initially, this path will only be used for Code Completion requests, because they are more latency-sensitive than Code Generation requests and need less context from Rails.
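The client-side routing decision this implies can be sketched in a few lines. This is a minimal sketch, not the GitLab Language Server implementation: the `Intent` enum and `route_request` function are hypothetical names, the direct URL is the `cloud.gitlab.com/ai/completions` endpoint from the diagram, and we assume `/api/v4/code_suggestions/completions` as the existing monolith route that Code Generation keeps using.

```python
from enum import Enum


class Intent(Enum):
    """Hypothetical classification of a Code Suggestions request."""
    COMPLETION = "completion"
    GENERATION = "generation"


# Direct path from the diagram: Cloud Connector forwards this to the AI Gateway.
DIRECT_COMPLETIONS_URL = "https://cloud.gitlab.com/ai/completions"
# Assumed existing monolith route for requests that still go through Rails.
MONOLITH_PATH = "/api/v4/code_suggestions/completions"


def route_request(intent: Intent, instance_url: str) -> str:
    """Return the URL the client sends a Code Suggestions request to."""
    if intent is Intent.COMPLETION:
        # Latency-sensitive completions skip the monolith entirely.
        return DIRECT_COMPLETIONS_URL
    # Code Generation needs more context from Rails, so it keeps the old path.
    return instance_url.rstrip("/") + MONOLITH_PATH
```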
The flow is visualised in the following diagram:
<details>
<summary>Click to expand diagram</summary>
```mermaid
sequenceDiagram
autonumber
actor C as Client (IDE)
participant G as GitLab
participant CD as CustomersDot
box rgb(245, 245, 245) cloud.gitlab.com
participant AI as AI Gateway
end
rect rgba(179, 240, 255, .2)
alt SM instance?
Note over G,CD: Daily refresh instance JWT (IJWT) token sync job
loop Sync instance service token (daily)
activate G
G->>CD: POST `graphql/cloudConnectorAccess`<br>with `licenseKey`
activate CD
CD->>CD: Verify cloud license and whether the CS add-on is purchased
CD-->>G: return IJWT with correct scopes, expires in 3 days
deactivate CD
G->>G: store the IJWT token to the database
deactivate G
end
end
end
rect rgba(179, 240, 255, .2)
Note over C,AI: End user authentication
activate C
C->>G: POST GitLab (instance)<br>with "/api/v4/code_suggestions/direct_access"
activate G
G->>G: authenticate with PAT
G->>G: verify user assigned to seat
alt SM instance?
G->>AI: POST cloud.gitlab.com/cloud_connector_auth<br>with IJWT
activate AI
alt no cache for public keys?
AI->>CD: GET .well-known/openid-configuration
activate CD
CD->>AI: JWK URI
AI->>CD: GET oauth/discovery/keys
CD->>AI: public keys
deactivate CD
end
AI->>AI: Decode IJWT token<br>with public keys
AI->>AI: issue 'user_jwt'
AI->>G: return 'user_jwt'
deactivate AI
else SaaS?
G->>G: issue 'user_jwt', expires: 1h
end
G->>C: return `user_jwt`
deactivate G
C->>C: store 'user_jwt'
deactivate C
end
rect rgba(179, 240, 255, .2)
Note over C, AI: Code suggestions requests
loop Each code suggestions request
activate C
alt token is about to expire?
C->>G: POST GitLab (instance)<br>with "/api/v4/code_suggestions/direct_access"
activate G
alt SM instance?
G->>AI: POST cloud.gitlab.com/cloud_connector_auth<br>with IJWT
activate AI
alt no cache for public keys?
AI->>CD: GET .well-known/openid-configuration
activate CD
CD->>AI: JWK URI
AI->>CD: GET oauth/discovery/keys
CD->>AI: public keys
deactivate CD
end
AI->>AI: Decode IJWT token<br>with public keys
AI->>AI: issue 'user_jwt', expires: 1h
AI->>G: return 'user_jwt', expires: 1h
deactivate AI
else SaaS?
G->>G: issue 'user_jwt', expires: 1h
end
G->>C: return `user_jwt`, expires: 1h
deactivate G
deactivate C
end
critical request completions with 'user_jwt'
activate C
C->>AI: POST cloud.gitlab.com/ai/completions<br>with `user_jwt`
activate AI
alt no cache for public keys?
alt SM instance?
AI->>AI: public keys
else SaaS?
AI->>G: GET .well-known/openid-configuration
activate G
G->>AI: JWK URI
AI->>G: GET oauth/discovery/keys
G->>AI: public keys
deactivate G
end
end
AI->>AI: decode 'user_jwt' using <br>CloudConnector or Gitlab public keys
AI->>AI: POST <br>codesuggestions.gitlab.com/completions
activate AI
deactivate AI
AI->>C: Response
deactivate AI
end
deactivate C
end
end
```
In this sequence diagram, the AI Gateway is shown under `cloud.gitlab.com`. All calls to the AI Gateway are made to Cloud Connector, that is, the `cloud.gitlab.com` endpoint, which then forwards them to the AI Gateway. Because Cloud Connector is not a material service or backend, it does not appear explicitly in the diagram.
</details>
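The client side of the "token is about to expire?" step can be sketched as a small cache. This is a minimal sketch, assuming the client knows the token's expiry time (the diagram uses a 1h lifetime) and that `fetch_token` wraps the `POST /api/v4/code_suggestions/direct_access` call. `UserToken`, `UserJwtCache`, the 5-minute `refresh_margin`, and the `Bearer` header format are illustrative assumptions, not the actual IDE implementation.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class UserToken:
    value: str
    expires_at: float  # Unix timestamp in seconds


class UserJwtCache:
    """Caches the user_jwt and refreshes it shortly before it expires."""

    def __init__(
        self,
        fetch_token: Callable[[], UserToken],
        refresh_margin: float = 300.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        # fetch_token is assumed to wrap POST /api/v4/code_suggestions/direct_access.
        self._fetch_token = fetch_token
        self._refresh_margin = refresh_margin
        self._clock = clock
        self._token: Optional[UserToken] = None

    def get(self) -> str:
        now = self._clock()
        expiring = (
            self._token is None
            or self._token.expires_at - now <= self._refresh_margin
        )
        if expiring:
            # "Token is about to expire?" branch: ask the instance for a new one.
            self._token = self._fetch_token()
        return self._token.value


def completion_headers(cache: UserJwtCache) -> Dict[str, str]:
    # Assumed header format for POST cloud.gitlab.com/ai/completions.
    return {"Authorization": f"Bearer {cache.get()}"}
```

Because the refresh happens lazily on the next request, an idle IDE makes no token calls, and an active one calls the instance roughly once per token lifetime.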
### Required changes
To realize this, several components need to change. The AI Gateway has to become a token authority ([issue](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/issues/429 "Support creation of short-lived user JWTs for Client <-> AIGW connection")) so that it can create user JWTs (UJWTs). This is similar to what we implemented in CustomersDot for UJWTs. The major difference is that the AI Gateway also validates these tokens itself ([issue](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/issues/430 "Support authentication of self-served short-lived user JWTs")), so it does not need to fetch a JWKS from another backend.
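To illustrate why a token authority that validates its own tokens needs no external JWKS lookup, here is a minimal sketch in which the same key material issues and verifies a UJWT. It swaps in HMAC-SHA256 (HS256) from the Python standard library to stay dependency-free; a token authority that publishes public keys, as in the diagram, would use an asymmetric algorithm instead. The function names and the `scopes` claim are hypothetical, and the 1h default lifetime follows the diagram.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import List, Optional


def _b64url(data: bytes) -> str:
    # JWTs use unpadded base64url encoding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that _b64url stripped.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, key: bytes) -> bytes:
    return hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_user_jwt(subject: str, scopes: List[str], key: bytes,
                   ttl: int = 3600, now: Optional[float] = None) -> str:
    """Issue a short-lived UJWT signed with the gateway's own key."""
    issued_at = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": subject, "scopes": scopes,
              "iat": issued_at, "exp": issued_at + ttl}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return signing_input + "." + _b64url(_sign(signing_input, key))


def verify_user_jwt(token: str, key: bytes, now: Optional[float] = None) -> dict:
    """Validate a UJWT with the same key; no external JWKS lookup is needed."""
    current = time.time() if now is None else now
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, claims_b64, sig_b64 = parts
    expected = _sign(f"{header_b64}.{claims_b64}", key)
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("invalid signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(claims_b64))
    if claims["exp"] <= current:
        raise ValueError("token expired")
    return claims
```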
We want to hide this behind a `cloud.gitlab.com` endpoint ([issue](https://gitlab.com/gitlab-org/gitlab/-/issues/452364 "Cloudflare: Create an endpoint to request short-lived user JWTs that is forwarded to AIGW")), so that self-managed (SM) customers do not have to add every new backend to their proxy allowlist, and so that we can swap in other backends later if needed.
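The benefit of this abstraction is that clients and customer proxies only ever see `cloud.gitlab.com`, while the mapping to backends stays on our side. A minimal sketch of such a mapping, assuming simple path-prefix routing; the routing table, the internal origin `ai-gateway.internal.example`, and `resolve_backend` are hypothetical and do not describe the actual Cloudflare configuration.

```python
from typing import Dict

# Hypothetical routing table: public path prefix on cloud.gitlab.com -> backend origin.
ROUTES: Dict[str, str] = {
    "/ai/": "https://ai-gateway.internal.example",
    "/cloud_connector_auth": "https://ai-gateway.internal.example",
}


def resolve_backend(path: str, routes: Dict[str, str] = ROUTES) -> str:
    """Return the backend URL a request to cloud.gitlab.com is forwarded to."""
    # Check longer prefixes first so more specific routes win.
    for prefix in sorted(routes, key=len, reverse=True):
        if path.startswith(prefix):
            return routes[prefix] + path
    raise LookupError(f"no backend configured for {path}")
```

Replacing a backend means changing one entry in the routing table; the hostname that SM customers allowlist stays the same.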
GitLab Rails (the instance) needs a dedicated endpoint that clients (IDEs) can call ([issue](https://gitlab.com/gitlab-org/gitlab/-/issues/452044 "Create an endpoint for clients to auth and get a user JWT")) to obtain a UJWT, which Rails then returns to the client. The client, in turn, must call this endpoint ([issue](https://gitlab.com/gitlab-org/editor-extensions/gitlab-lsp/-/issues/182 "IDE: Call to the monolith for required information to make requests to Cloud Connector")) and, once it has the UJWT, call the AI Gateway directly through the `cloud.gitlab.com` endpoint ([issue](https://gitlab.com/gitlab-org/editor-extensions/gitlab-lsp/-/issues/183 "IDE: Call Cloud Connector directly instead of going through the monolith first")).
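The decision logic of the Rails endpoint, following the end-user authentication part of the diagram, can be sketched as follows. This is a minimal sketch in Python for illustration only; the real endpoint lives in GitLab Rails. `User`, `direct_access`, the injected callables (`authenticate`, `exchange_ijwt`, `issue_locally`), and the HTTP status codes are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class User:
    id: int
    has_seat: bool  # user is assigned to a Code Suggestions seat


def direct_access(
    pat: str,
    authenticate: Callable[[str], Optional[User]],
    self_managed: bool,
    exchange_ijwt: Callable[[User], str],
    issue_locally: Callable[[User], str],
) -> Dict[str, object]:
    """Handle POST /api/v4/code_suggestions/direct_access (hypothetical shape)."""
    user = authenticate(pat)  # authenticate with PAT
    if user is None:
        return {"status": 401}
    if not user.has_seat:  # verify user assigned to seat
        return {"status": 403}
    if self_managed:
        # SM: send the stored IJWT to cloud.gitlab.com/cloud_connector_auth,
        # which returns a user_jwt issued by the AI Gateway.
        token = exchange_ijwt(user)
    else:
        # SaaS: GitLab.com issues the user_jwt itself (1h expiry in the diagram).
        token = issue_locally(user)
    return {"status": 201, "token": token}
```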
### Outstanding Issue tracker
**Notes:** [Daily Status updates posted here.](https://gitlab.com/gitlab-org/ai-powered/daily-updates/-/issues/1 "Monolith Bypass Daily Updates")
<table>
<tr>
<th>Team</th>
<th>Issue</th>
<th>Description</th>
<th>Status</th>
<th>Priority</th>
</tr>
<tr>
<td>Cloud Connector</td>
<td>
https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/issues/461+
https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/issues/430+
</td>
<td>Two issues covering authentication of user-level tokens in the AI Gateway</td>
<td>Complete</td>
<td>High</td>
</tr>
<tr>
<td>Cloud Connector</td>
<td>
https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/issues/429+
</td>
<td>Create user-level tokens in the AI Gateway</td>
<td>Complete</td>
<td>High</td>
</tr>
<tr>
<td>Cloud Connector</td>
<td>
https://gitlab.com/gitlab-org/gitlab/-/issues/452364+
</td>
<td>Update the Cloudflare config to create a new cloud.gitlab.com endpoint that passes through to the AI Gateway, so that IDEs can get user tokens</td>
<td>Complete</td>
<td>High</td>
</tr>
<tr>
<td>Cloud Connector</td>
<td>
https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/issues/455+
</td>
<td>Documentation on the JWK rotation process</td>
<td>Complete</td>
<td>Medium</td>
</tr>
<tr>
<td>Cloud Connector</td>
<td>
https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/issues/449+
</td>
<td>
</td>
<td>Complete</td>
<td>High</td>
</tr>
<tr>
<td>Cloud Connector</td>
<td>
https://gitlab.com/gitlab-org/gitlab/-/issues/456471+
</td>
<td>Update blueprint</td>
<td>Complete</td>
<td>Medium</td>
</tr>
<tr>
<td>Code Creation</td>
<td>
https://gitlab.com/gitlab-org/gitlab/-/issues/456443+
</td>
<td>
</td>
<td>Complete</td>
<td>
</td>
</tr>
<tr>
<td>Code Creation</td>
<td>
https://gitlab.com/gitlab-org/gitlab/-/issues/455607+
</td>
<td>Once the Cloud Connector endpoint is set up (first three items in this table) and working for AI Gateway user tokens, update the Rails code to start calling it</td>
<td>Complete</td>
<td>High</td>
</tr>
<tr>
<td>Code Creation</td>
<td>
https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/issues/428+
</td>
<td>Ensure we don't lose usage tracking once we remove Rails from the request path</td>
<td>Complete</td>
<td>High</td>
</tr>
<tr>
<td>Editor Extensions</td>
<td>
https://gitlab.com/gitlab-org/editor-extensions/gitlab-lsp/-/issues/182+
</td>
<td>Implement IDE initialisation call that will get a new user token on a regular basis</td>
<td>Complete</td>
<td>High</td>
</tr>
<tr>
<td>Editor Extensions</td>
<td>
https://gitlab.com/gitlab-org/editor-extensions/gitlab-lsp/-/issues/183+
</td>
<td>Call cloud.gitlab.com/ai directly from the IDE instead of going through Rails</td>
<td>Complete</td>
<td>High</td>
</tr>
</table>