[Proposal] Use API keys to control GraphQL API access
Context
The discussion originated while we worked on this issue: https://gitlab.com/gitlab-org/gitlab/-/issues/405002
Problem
Currently, we are very generous with our GraphQL API access: we allow even unauthenticated users (and apps/projects) to access our GraphQL API and, in turn, query our database. Moreover, the query complexity limit we impose on unauthenticated users is not strict. It is very similar to the limit for authenticated users: https://gitlab.com/gitlab-org/gitlab/-/issues/405002#note_1358677795
That leaves ample room for DDoS attacks and opens GitLab.com up to uncontrolled workloads.
In the past, we observed significant load coming from projects/apps such as https://www.gitkraken.com/
In addition, we currently label anonymous GraphQL API access by json.meta.client_id (the user agent) + operation_name, and the user agent is easy to spoof.
Proposal
We want better segregation between our GraphQL API consumers.
To do that, we should consider using API keys to control access to our GraphQL API.
We want to rate-limit and complexity-limit API usage per client, not per user, especially for the traffic that is currently anonymous.
This will separate the User abstraction from the API Consumer (Client) abstraction.
It will improve monitoring and allow us to explore better complexity-limiting and rate-limiting models.
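To make the per-client idea concrete, here is a minimal sketch of limits keyed by API key rather than by user. It is not how GitLab implements rate limiting; the class names, the fixed-window counter, and all numeric limits are assumptions chosen for illustration. Requests without a known key fall into one shared, strict "anonymous" bucket.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ClientLimits:
    max_requests_per_window: int  # requests allowed per fixed window
    max_query_complexity: int     # highest complexity accepted for one query


@dataclass
class PerClientLimiter:
    """Hypothetical limiter: limits are looked up by API key, not by user."""
    limits: Dict[str, ClientLimits]   # API key -> limits (illustrative values)
    anonymous: ClientLimits           # strict fallback for keyless requests
    window_seconds: int = 60
    # (client, window index) -> requests seen; old windows are never pruned
    # in this sketch.
    _counts: Dict[Tuple[str, int], int] = field(default_factory=dict)

    def check(self, api_key: Optional[str], complexity: int, now: float) -> str:
        """Return "ok", "too_complex", or "rate_limited" for one request."""
        # Missing or unknown keys share the single anonymous bucket.
        client = api_key if api_key in self.limits else "anonymous"
        limit = self.limits.get(client, self.anonymous)
        if complexity > limit.max_query_complexity:
            return "too_complex"
        bucket = (client, int(now // self.window_seconds))
        used = self._counts.get(bucket, 0)
        if used >= limit.max_requests_per_window:
            return "rate_limited"
        self._counts[bucket] = used + 1
        return "ok"
```

Because the bucket is chosen by key, a spoofed user agent no longer changes which limit applies; only presenting a different key does.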
Additionally, I strongly believe that we should either disallow anonymous API access entirely, or significantly limit it and improve its monitoring, to avoid potential availability issues.
With API keys, we could, for example, issue keys with higher limits to our partners when it is safe to do so.
Conversely, we could revoke or temporarily suspend a key easily, and even automatically, in case of abuse.
This makes API usage control flexible, dynamic, and data-driven.
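The key lifecycle described above (higher limits for partners, manual revocation, automatic suspension on abuse) could be sketched roughly as follows. Everything here is hypothetical: the `KeyRegistry` name, the violation counter, and the `auto_suspend_after` threshold are assumptions for illustration, not an existing GitLab API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class KeyStatus(Enum):
    ACTIVE = "active"
    SUSPENDED = "suspended"  # temporary, can be reinstated
    REVOKED = "revoked"      # permanent


@dataclass
class ApiKey:
    client_name: str
    complexity_limit: int    # partners could be issued a higher value
    status: KeyStatus = KeyStatus.ACTIVE
    violations: int = 0


class KeyRegistry:
    """Hypothetical in-memory registry of API keys."""

    def __init__(self, auto_suspend_after: int = 3) -> None:
        # Number of recorded violations that triggers automatic suspension
        # (an assumed policy, not taken from the proposal).
        self.auto_suspend_after = auto_suspend_after
        self._keys: Dict[str, ApiKey] = {}

    def issue(self, key: str, client_name: str, complexity_limit: int) -> None:
        self._keys[key] = ApiKey(client_name, complexity_limit)

    def is_allowed(self, key: str) -> bool:
        entry = self._keys.get(key)
        return entry is not None and entry.status is KeyStatus.ACTIVE

    def record_violation(self, key: str) -> KeyStatus:
        """Count one abuse signal; suspend the key once the threshold is hit."""
        entry = self._keys[key]
        if entry.status is KeyStatus.ACTIVE:
            entry.violations += 1
            if entry.violations >= self.auto_suspend_after:
                entry.status = KeyStatus.SUSPENDED
        return entry.status

    def reinstate(self, key: str) -> None:
        """Lift a temporary suspension; revoked keys stay revoked."""
        entry = self._keys[key]
        if entry.status is KeyStatus.SUSPENDED:
            entry.status = KeyStatus.ACTIVE
            entry.violations = 0

    def revoke(self, key: str) -> None:
        self._keys[key].status = KeyStatus.REVOKED
```

Keeping suspension and revocation as separate states lets an automated system act quickly on abuse signals while leaving the permanent decision to a human.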
Notes
We may want to keep GraphiQL (the query explorer) open to unauthenticated access just for playground purposes. But even there, we could apply very strict limits and invite the user to obtain an API key when too much data is requested or the API is under load. That said, from an availability perspective, I would disallow anonymous GraphQL API access even for the GraphiQL query explorer.
Of course, we don't want to restrict or rate-limit the usage coming from our own frontend, because GitLab needs it to function correctly. In that case, we could set no limits, but still monitor and alert on patterns that could hint at potential performance issues.
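As a rough illustration of "no limits, but monitor and alert" for first-party traffic, the sketch below never rejects a query and only records an alert when a query looks expensive. The class name and the `alert_complexity` threshold are assumptions, and how a request is recognized as coming from our own frontend is deliberately left out.

```python
from collections import defaultdict
from typing import DefaultDict, List


class FirstPartyMonitor:
    """Hypothetical monitor-only policy for our own frontend traffic."""

    def __init__(self, alert_complexity: int) -> None:
        self.alert_complexity = alert_complexity  # assumed alerting threshold
        self.alerts: List[str] = []
        # operation_name -> total complexity observed so far
        self.totals: DefaultDict[str, int] = defaultdict(int)

    def observe(self, operation_name: str, complexity: int) -> bool:
        """Record the query and always allow it; alert if it looks expensive."""
        self.totals[operation_name] += complexity
        if complexity > self.alert_complexity:
            self.alerts.append(
                f"{operation_name}: complexity {complexity} "
                f"exceeds {self.alert_complexity}"
            )
        return True  # first-party traffic is never blocked in this sketch
```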
More on the topic: https://cloud.google.com/endpoints/docs/openapi/when-why-api-key#when_to_use_api_keys