## Let the client (IDE) request Code Suggestions (completions only) directly from the AI Gateway (skip the monolith)
Currently, the client (in this case the IDE) requests Code Suggestions from the monolith (either GitLab.com or the user's own instance), which in turn forwards the request to the AI Gateway. As part of the effort to [decrease latency](https://gitlab.com/groups/gitlab-org/-/epics/12224 "Improved Code Suggestions Architecture to Decrease Latency"), we want to skip the monolith and let the client talk directly to Cloud Connector, which forwards the request to the AI Gateway. Initially, this path would be used only for Code Completion requests, because they are more latency-sensitive than Code Generation requests and need less context from Rails.

The flow is visualised in the following diagram:

<details>
<summary>Click to expand diagram</summary>

Click 'Display' at the bottom of the diagram text

```mermaid
sequenceDiagram
    autonumber
    actor C as Client (IDE)
    participant G as GitLab
    participant CD as CustomersDot
    box rgb(245, 245, 245) cloud.gitlab.com
        participant AI as AI Gateway
    end

    rect rgba(179, 240, 255, .2)
        alt SM instance?
            Note over G,CD: Daily refresh instance JWT (IJWT) token sync job
            loop Sync instance service token (daily)
                activate G
                G->>CD: POST `graphql/cloudConnectorAccess`<br>with `licenseKey`
                activate CD
                CD->>CD: Verify cloud license, and if CS add-on is purchased
                CD-->>G: return IJWT with correct scopes, expires in 3 days
                deactivate CD
                G->>G: store the IJWT token to the database
                deactivate G
            end
        end
    end

    rect rgba(179, 240, 255, .2)
        Note over C,AI: End user authentication
        activate C
        C->>G: POST GitLab (instance)<br>with "/api/v4/code_suggestions/direct_access"
        activate G
        G->>G: authenticate with PAT
        G->>G: verify user assigned to seat
        alt SM instance?
            G->>AI: POST cloud.gitlab.com/cloud_connector_auth<br>with IJWT
            activate AI
            alt no cache for public keys?
                AI->>CD: GET .well-known/openid-configuration
                activate CD
                CD->>AI: JWK URI
                AI->>CD: GET oauth/discovery/keys
                CD->>AI: public keys
                deactivate CD
            end
            AI->>AI: Decode IJWT token<br>with public keys
            AI->>AI: issue 'user_jwt'
            AI->>G: return 'user_jwt'
            deactivate AI
        else SaaS?
            G->>G: issue 'user_jwt', expires: 1h
        end
        G->>C: return `user_jwt`
        deactivate G
        C->>C: store 'user_jwt'
        deactivate C
    end

    rect rgba(179, 240, 255, .2)
        Note over C, AI: Code suggestions requests
        loop Each code suggestions request
            activate C
            alt token is about to expire?
                C->>G: POST GitLab (instance)<br>with "/api/v4/code_suggestions/direct_access"
                alt SM instance?
                    activate G
                    G->>AI: POST cloud.gitlab.com/cloud_connector_auth<br>with IJWT
                    activate AI
                    alt no cache for public keys?
                        AI->>CD: GET .well-known/openid-configuration
                        activate CD
                        CD->>AI: JWK URI
                        AI->>CD: GET oauth/discovery/keys
                        CD->>AI: public keys
                        deactivate CD
                    end
                    AI->>AI: Decode IJWT token<br>with public keys
                    AI->>AI: issue 'user_jwt', expires: 1h
                    AI->>G: return 'user_jwt', expires: 1h
                    deactivate AI
                else SaaS?
                    G->>G: issue 'user_jwt', expires: 1h
                end
                G->>C: return `user_jwt`, expires: 1h
                deactivate G
                deactivate C
            end
            critical request completions with 'user jwt'
                activate C
                C->>AI: POST cloud.gitlab.com/ai/completions<br>with `user_jwt`
                activate AI
                alt no cache for public keys?
                    alt SM instance?
                        AI->>AI: public keys
                    else SaaS?
                        AI->>G: GET .well-known/openid-configuration
                        activate G
                        G->>AI: JWK URI
                        AI->>G: GET oauth/discovery/keys
                        G->>AI: public keys
                        deactivate G
                    end
                end
                AI->>AI: decode 'user_jwt' using <br>CloudConnector or GitLab public keys
                AI->>AI: POST <br>codesuggestions.gitlab.com/completions
                activate AI
                deactivate AI
                AI->>C: Response
                deactivate AI
            end
            deactivate C
        end
    end
```

In this sequence diagram, the AI Gateway appears under `cloud.gitlab.com`. All calls to the AI Gateway are made to Cloud Connector, that is, the `cloud.gitlab.com` endpoint, which then forwards them to the AI Gateway. Because Cloud Connector is not a material service or backend, it does not appear explicitly in this diagram.

</details>

### Required changes

To realize this, we need to make the following changes.

The AI Gateway has to become a token authority ([issue](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/issues/429 "Support creation of short-lived user JWTs for Client <-> AIGW connection")) so that it can create user JWTs (UJWTs). This change will be similar to what we implemented in CustomersDot for UJWTs. The major difference is that the AI Gateway will also validate these tokens ([issue](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/issues/430 "Support authentication of self-served short-lived user JWTs")), so it does not need to fetch a JWKS from another backend.

We want to abstract this behind a `cloud.gitlab.com` endpoint ([issue](https://gitlab.com/gitlab-org/gitlab/-/issues/452364 "Cloudflare: Create an endpoint to request short-lived user JWTs that is forwarded to AIGW")), so that self-managed (SM) customers do not have to add every new backend to their proxy allowlist, and so that we can easily swap in other backends in the future if needed.

GitLab Rails (instance) needs a dedicated endpoint that clients (IDEs) can call ([issue](https://gitlab.com/gitlab-org/gitlab/-/issues/452044 "Create an endpoint for clients to auth and get a user JWT")) to obtain UJWTs, which Rails then forwards to the client. Finally, the client must be able to call this endpoint ([issue](https://gitlab.com/gitlab-org/editor-extensions/gitlab-lsp/-/issues/182 "IDE: Call to the monolith for required information to make requests to Cloud Connector")) and, once it has a UJWT, call the AI Gateway directly through the `cloud.gitlab.com` endpoint ([issue](https://gitlab.com/gitlab-org/editor-extensions/gitlab-lsp/-/issues/183 "IDE: Call Cloud Connector directly instead of going through the monolith first")).
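The token-authority role can be illustrated with a minimal sketch, assuming a single service that both issues and validates short-lived user JWTs, which is why it needs no JWKS from another backend to validate them. The class `UserTokenAuthority`, the issuer string, and the `scopes` claim are hypothetical names chosen for illustration. To stay self-contained and standard-library-only, the sketch signs with HS256 (a shared HMAC secret); this is a swapped-in simplification, not necessarily the algorithm the AI Gateway uses. The one-hour lifetime mirrors the `expires: 1h` shown in the diagram.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import List, Optional


def _b64url_encode(data: bytes) -> str:
    # JWS uses base64url without padding (RFC 7515)
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # Re-add the padding that was stripped when encoding
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


class UserTokenAuthority:
    """Hypothetical sketch: one service issues AND validates short-lived user JWTs.

    HS256 (shared HMAC secret) is an illustrative stand-in for whatever signing
    scheme a real deployment uses; all names here are assumptions.
    """

    def __init__(self, secret: bytes, issuer: str, ttl_seconds: int = 3600):
        self._secret = secret
        self._issuer = issuer
        self._ttl = ttl_seconds  # 1h, matching "expires: 1h" in the diagram

    def _sign(self, signing_input: bytes) -> str:
        digest = hmac.new(self._secret, signing_input, hashlib.sha256).digest()
        return _b64url_encode(digest)

    def issue(self, subject: str, scopes: List[str], now: Optional[float] = None) -> str:
        issued_at = int(time.time() if now is None else now)
        header = {"alg": "HS256", "typ": "JWT"}
        claims = {"iss": self._issuer, "sub": subject, "scopes": scopes,
                  "iat": issued_at, "exp": issued_at + self._ttl}
        signing_input = ".".join(
            _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
            for part in (header, claims))
        return f"{signing_input}.{self._sign(signing_input.encode('ascii'))}"

    def validate(self, token: str, required_scope: str, now: Optional[float] = None) -> dict:
        current = time.time() if now is None else now
        parts = token.split(".")
        if len(parts) != 3:
            raise ValueError("malformed token")
        header_b64, claims_b64, signature = parts
        # Only accept tokens signed the way this authority signs them
        if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
            raise ValueError("unexpected algorithm")
        expected = self._sign(f"{header_b64}.{claims_b64}".encode("ascii"))
        # Constant-time comparison so the check does not leak timing information
        if not hmac.compare_digest(signature.encode("ascii"), expected.encode("ascii")):
            raise ValueError("invalid signature")
        claims = json.loads(_b64url_decode(claims_b64))
        if claims.get("iss") != self._issuer:
            raise ValueError("unexpected issuer")
        if current >= claims["exp"]:
            raise ValueError("token expired")
        if required_scope not in claims.get("scopes", []):
            raise ValueError("missing scope")
        return claims
```

For example, a token from `UserTokenAuthority(b"secret", "ai-gateway").issue("user-1", ["code_suggestions"])` is accepted by `validate(token, "code_suggestions")` until one hour after issuance; the scope name is again only illustrative.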
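On the client side, the diagram's `alt token is about to expire?` branch amounts to caching the user JWT and going back to the GitLab instance only when the token is missing or close to expiry; every other completion request goes straight to `cloud.gitlab.com`. The following minimal Python sketch shows that logic under stated assumptions: `fetch_token` is a hypothetical stand-in for the call to `POST /api/v4/code_suggestions/direct_access`, and the 60-second refresh margin is an arbitrary illustrative value, not one taken from the GitLab Language Server.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class UserToken:
    value: str
    expires_at: float  # Unix timestamp in seconds


class DirectAccessTokenCache:
    """Hypothetical client-side cache for the short-lived user JWT."""

    def __init__(self, fetch_token: Callable[[], UserToken],
                 refresh_margin_seconds: float = 60.0,
                 clock: Callable[[], float] = time.time):
        # fetch_token stands in for POST /api/v4/code_suggestions/direct_access
        self._fetch_token = fetch_token
        self._refresh_margin = refresh_margin_seconds
        self._clock = clock
        self._token: Optional[UserToken] = None

    def _about_to_expire(self) -> bool:
        return (self._token is None
                or self._clock() >= self._token.expires_at - self._refresh_margin)

    def get(self) -> str:
        # Contact the GitLab instance only when there is no token or it expires soon
        if self._about_to_expire():
            self._token = self._fetch_token()
        return self._token.value
```

Refreshing slightly before expiry avoids sending a completion request with a token that lapses in flight, and the injectable `clock` makes the behavior easy to test without waiting.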
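The `alt no cache for public keys?` branches in the diagram follow the standard OpenID Connect discovery pattern: fetch `.well-known/openid-configuration`, follow its `jwks_uri` (the diagram's `oauth/discovery/keys`), and cache the returned keys. A minimal sketch, assuming a hypothetical `http_get_json` callable in place of a real HTTP client and an illustrative one-day cache TTL; keys are indexed by their `kid`, as defined for JWK sets in RFC 7517. It refetches when the TTL lapses or when an unknown `kid` appears, which covers key rotation; a production version would also rate-limit those refetches.

```python
import time
from typing import Callable, Dict, Optional


class JwksCache:
    """Hypothetical sketch of the public-key lookup shown in the diagram."""

    def __init__(self, issuer_url: str, http_get_json: Callable[[str], dict],
                 ttl_seconds: float = 86400.0, clock: Callable[[], float] = time.time):
        # http_get_json is an assumed helper: URL in, decoded JSON body out
        self._issuer_url = issuer_url.rstrip("/")
        self._http_get_json = http_get_json
        self._ttl = ttl_seconds
        self._clock = clock
        self._keys: Dict[str, dict] = {}
        self._fetched_at: Optional[float] = None

    def _refresh(self) -> None:
        # Step 1: the discovery document advertises where the keys live
        config = self._http_get_json(f"{self._issuer_url}/.well-known/openid-configuration")
        # Step 2: fetch the JWK set from the advertised jwks_uri
        jwks = self._http_get_json(config["jwks_uri"])
        self._keys = {key["kid"]: key for key in jwks["keys"]}
        self._fetched_at = self._clock()

    def get_key(self, kid: str) -> dict:
        stale = (self._fetched_at is None
                 or self._clock() - self._fetched_at >= self._ttl)
        # Refetch when the cache is stale or the key id is unknown (e.g. after rotation)
        if stale or kid not in self._keys:
            self._refresh()
        if kid not in self._keys:
            raise KeyError(f"unknown key id: {kid}")
        return self._keys[kid]
```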
### Outstanding Issue tracker

**Notes:** [Daily Status updates posted here.](https://gitlab.com/gitlab-org/ai-powered/daily-updates/-/issues/1 "Monolith Bypass Daily Updates")

<table>
<tr>
<th>Team</th>
<th>Issue</th>
<th>Description</th>
<th>Status</th>
<th>Priority</th>
</tr>
<tr>
<td>Cloud Connector</td>
<td>

https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/issues/461+
https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/issues/430+

</td>
<td>Two issues covering authentication of user-level tokens in the AI Gateway</td>
<td>Complete</td>
<td>High</td>
</tr>
<tr>
<td>Cloud Connector</td>
<td>

https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/issues/429+

</td>
<td>Create user-level tokens in the AI Gateway</td>
<td>Complete</td>
<td>High</td>
</tr>
<tr>
<td>Cloud Connector</td>
<td>

https://gitlab.com/gitlab-org/gitlab/-/issues/452364+

</td>
<td>Update the Cloudflare config to create a new cloud.gitlab.com endpoint that passes through to the AI Gateway, so that IDEs can get user tokens</td>
<td>Complete</td>
<td>High</td>
</tr>
<tr>
<td>Cloud Connector</td>
<td>

https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/issues/455+

</td>
<td>Documentation on the JWK rotation process</td>
<td>Complete</td>
<td>Medium</td>
</tr>
<tr>
<td>Cloud Connector</td>
<td>

https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/issues/449+

</td>
<td> </td>
<td>Complete</td>
<td>High</td>
</tr>
<tr>
<td>Cloud Connector</td>
<td>

https://gitlab.com/gitlab-org/gitlab/-/issues/456471+

</td>
<td>Update blueprint</td>
<td>Complete</td>
<td>Medium</td>
</tr>
<tr>
<td>Code Creation</td>
<td>

https://gitlab.com/gitlab-org/gitlab/-/issues/456443+

</td>
<td> </td>
<td>Complete</td>
<td> </td>
</tr>
<tr>
<td>Code Creation</td>
<td>

https://gitlab.com/gitlab-org/gitlab/-/issues/455607+

</td>
<td>Once the Cloud Connector endpoint is set up (first three items in this table) and working for AI Gateway user tokens, update the Rails code to start calling it</td>
<td>Complete</td>
<td>High</td>
</tr>
<tr>
<td>Code Creation</td>
<td>

https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/issues/428+

</td>
<td>Ensure we don't lose usage tracking once we remove Rails from the request path</td>
<td>Complete</td>
<td>High</td>
</tr>
<tr>
<td>Editor Extensions</td>
<td>

https://gitlab.com/gitlab-org/editor-extensions/gitlab-lsp/-/issues/182+

</td>
<td>Implement the IDE initialisation call that fetches a new user token on a regular basis</td>
<td>Done</td>
<td>High</td>
</tr>
<tr>
<td>Editor Extensions</td>
<td>

https://gitlab.com/gitlab-org/editor-extensions/gitlab-lsp/-/issues/183+

</td>
<td>Call cloud.gitlab.com/ai directly from the IDE instead of Rails</td>
<td>Done</td>
<td>High</td>
</tr>
</table>