Paginate Wiki APIs
Problems
In #357651 (closed), we discovered that the Wiki APIs don't support pagination. Pagination is an essential part of the GitLab API, as specified in the documentation. If the length of a collection is fixed, it's reasonable to return the full list of resources, but that is not the case for wikis.
GitLab supports two Wiki APIs: one for project wikis and one for group wikis.
Issuing some quick commands, we can see the problem:

```shell
❯ curl --header "PRIVATE-TOKEN: X1iiA_1nQofE-28Jjvz5" "http://localhost:3000/api/v4/projects/6/wikis?with_content=1&page=1" | jq '. | length'
376

❯ curl --header "PRIVATE-TOKEN: X1iiA_1nQofE-28Jjvz5" "http://localhost:3000/api/v4/projects/6/wikis?with_content=1&page=1" | jq '.[0]'
{
  "format": "markdown",
  "slug": "Example-page-1",
  "title": "Example page 1",
  "content": "Content of page 1: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
  "encoding": "UTF-8"
}
```
The API may return the full content of every page. Wiki page content is persisted in a full-fledged Git repository: when a page is created or updated, the corresponding file in the Git repository is touched, so the content of a single file can be significant. Combined with returning the full list of pages in one response, this could open a denial-of-service attack surface.
Solution
Currently, the list of pages is fetched from Gitaly via a Wiki RPC. Unfortunately, that RPC doesn't support pagination; we are working on that in #357651 (closed). Therefore, in the short term, the coding solution is simple: we just need to apply Kaminari pagination to the returned pages (source).
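The short-term approach amounts to slicing the already-fetched array of pages. A minimal sketch of that logic in plain Ruby is below; in GitLab itself this would go through the API's `paginate` helper backed by `Kaminari.paginate_array`, so the method name and the `per_page` cap of 100 here are illustrative assumptions, not the actual implementation.

```ruby
# Hypothetical sketch of in-memory pagination over the full list of
# wiki pages returned by Gitaly. Kaminari's `paginate_array` provides
# the same behavior; this standalone version shows the slicing logic.
def paginate_array(items, page:, per_page:)
  page = [page.to_i, 1].max               # pages are 1-indexed
  per_page = per_page.to_i.clamp(1, 100)  # assumed API-style cap
  items.slice((page - 1) * per_page, per_page) || []
end

# 376 pages, as in the curl example above
pages = (1..376).map { |i| { slug: "Example-page-#{i}", title: "Example page #{i}" } }

first = paginate_array(pages, page: 1, per_page: 20)
puts first.length        # => 20
puts first.first[:slug]  # => Example-page-1
```

Note that this still fetches all pages from Gitaly on every request; it only bounds the response size, which is why the long-term fix below is needed.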
In the longer term, when the above issue is done, we simply pass `limit` and `offset` to the underlying page fetcher method.
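The translation from the API's `page`/`per_page` parameters to that `limit`/`offset` pair is straightforward. A hedged sketch, where the method name and the cap of 100 are illustrative assumptions rather than the actual GitLab signatures:

```ruby
# Hypothetical helper: convert offset-pagination parameters from the
# API layer into the limit/offset pair a paginated Gitaly RPC would
# accept, so only the requested slice of pages is ever fetched.
def pagination_params(page:, per_page:)
  page = [page.to_i, 1].max               # pages are 1-indexed
  per_page = per_page.to_i.clamp(1, 100)  # assumed API-style cap
  { limit: per_page, offset: (page - 1) * per_page }
end

pagination_params(page: 3, per_page: 20)
# => { limit: 20, offset: 40 }
```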
Customer impact
Although these APIs don't support pagination, the logs show interesting information: a reasonable portion (need quote) of API requests already attach `page` or `per_page` parameters.
At first glance, this behavior is the result of two things:
- Bot users dominate the overall wiki API traffic
- Clients set `page` and `per_page` parameters at a global level
Therefore, if we introduce pagination, I think most clients will not be affected by the change.