Currently the per_page parameter for API pagination defaults to 20 and can be set to a maximum of 100. The intent of this limit was to protect performance by restricting very large queries. However, there are many specific cases in which requesting a very large number of items is actually more performant than issuing many small queries.
Proposal
In the Admin settings, add a setting that makes the per_page default a configurable value
In the Admin settings, add an option to allow a per_page flag to be passed as a parameter on individual requests
UX should determine the right place in the admin settings hierarchy for this. Potentially under /admin/application_settings/network/#User and IP Rate Limits
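A minimal sketch of what the proposed setting implies server-side, in Python for illustration (the names `DEFAULT_PER_PAGE`, `ADMIN_MAX_PER_PAGE`, and `effective_per_page` are hypothetical, not GitLab's actual code): a requested page size is clamped between 1 and an admin-configured ceiling.

```python
# Hypothetical sketch: clamp a requested per_page against an
# admin-configured maximum, falling back to a configurable default.

DEFAULT_PER_PAGE = 20
ADMIN_MAX_PER_PAGE = 100  # would come from the proposed Admin setting


def effective_per_page(requested=None):
    """Return the page size to use for a request."""
    if requested is None:
        return DEFAULT_PER_PAGE
    # Never exceed the admin-configured ceiling, never go below 1.
    return max(1, min(int(requested), ADMIN_MAX_PER_PAGE))


print(effective_per_page())     # 20
print(effective_per_page(500))  # 100
```

Under this proposal an admin could raise `ADMIN_MAX_PER_PAGE` (or effectively disable the cap) for their own instance without changing the defaults that GitLab.com relies on.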
It depends on how the OS on the server is configured…
It may well not be a recipe for timeouts…
@wojciechlisik Then it will be a recipe for a huge memory consumption since all the Ruby objects will need to be created before the response is sent back. :)
Make per_page default and maximum value configurable
@athar I'd be OK with that, but I'd also like a rationale for why this is needed: if you want more than 100 results, can't you just issue multiple requests?
Keeping pagination on by default is a reasonable way to prevent people from unknowingly causing themselves problems. However, it also seems reasonable to at least let a knowledgeable user pass a parameter to turn off pagination for a specific request.
The arbitrary limit of 100 records on our self-hosted Enterprise Edition of GitLab adds unnecessary complexity to the automation of tasks we intend to execute across groups with 100+ projects via the API using CLI tools and shell scripting.
I understand that a change like this could blur expectations and accountability for performance, but, ceteris paribus, available server resources and performance are our problem and not GitLab's.
Allowing the user to decide the limits (or even to turn pagination off) is greatly preferable to writing any number of workarounds that then need to be maintained and remembered, and that break when upgrading to a new version.
What workarounds have you had to use? 2.5 years ago there was an issue where you asked for a page and GitLab returned everything (causing infinite loops in improperly error-checked code), but since then it's been rock-solid for us (from both Python and Rust). We made our GitLab API callers do the depagination for us, so that when something asked for all projects, it got all projects rather than each API user having to depaginate manually.
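The depagination wrapper described above can be sketched as a small generator. This is illustrative only: `fetch_page` is an injected callable with a hypothetical signature, so the loop stays independent of any particular HTTP library, and a fake in-memory backend stands in for the API.

```python
# Sketch of a client-side depagination helper: walk pages until the
# endpoint returns a short or empty page, yielding every item.

def depaginate(fetch_page, per_page=100):
    """Yield every item across all pages of a paginated endpoint."""
    page = 1
    while True:
        items = fetch_page(page=page, per_page=per_page)
        if not items:
            break
        yield from items
        if len(items) < per_page:  # a short page must be the last page
            break
        page += 1


# Demonstration with a fake backend of 250 items split into pages:
data = list(range(250))


def fake_fetch(page, per_page):
    start = (page - 1) * per_page
    return data[start:start + per_page]


assert list(depaginate(fake_fetch)) == data
```

In a real client, `fetch_page` would issue the HTTP request (GitLab's API also exposes `X-Next-Page` and `Link` headers that can drive the same loop more robustly than counting items).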
This issue has been around for a year, and I'm surprised it hasn't gotten more traction. My company actually had to roll back the version we are using because the upgrade broke most of our interactions with GitLab's API.
As has been mentioned in this thread already: GitLab is source control, not the guardian of my system's resources.
Here's an example. I do not feel it is unreasonable to want to query a list of a project's files in its entirety. A "recursive" parameter even exists to help with this scenario. Previously, a project with 10,000 files would return in a single sub-second request. Now it takes a minimum of 100 requests, which can take many seconds (even when run in parallel). Compared to this forced alternative, arguments about GitLab's performance carry less weight.
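The arithmetic behind that example, for concreteness (the figures come from the comment above; nothing here is GitLab-specific):

```python
import math

# Listing 10,000 files at the capped page size of 100 items per request.
files = 10_000
per_page = 100
requests_needed = math.ceil(files / per_page)
print(requests_needed)  # 100 requests, versus 1 uncapped request before
```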
Timeouts can be configured. The only workaround for this seems to be an excessive number of API requests, which is an objectively terrible solution to a problem that doesn't seem necessary to begin with.
Thanks for the ping. Let's see if there is more widespread interest in this change. One of the requests was around improving the performance of an integration with GitLab by making fewer larger requests, but that use case wouldn't benefit most users unless we lifted the limit on GitLab.com.
What parts of this issue are important to you and why? – We would like to be able to configure the per_page default and maximum values in the admin panel, and also to go past the current per_page limit of 100 to a value that fits our needs.
Have you tried any workarounds? – There are no workarounds for our use case. Having more items per page would reduce the total number of API calls on some expensive endpoints.
What is the priority of this issue to your organization? – Increasingly useful and needed.
We don't want the API to randomly time out or to reduce the performance of GitLab for other users. Allowing very large or unbounded response sizes is a recipe for all sorts of failures that the client would need to handle. Pagination, unlike slow responses and poor performance, is predictable and a common restriction in many other APIs.
I don't care @jramsay, let me assume the risk by adding a knob to disable it. By all means, keep the current settings, but allow me to live dangerously when I want. This "safety" feature effectively turns a quick curl into a script.
This is for my 3000 seat instance (which is edging very close to becoming a 0 seat license).
@jramsay Since most of us run our own installation in our own infrastructure, I would say it is our risk to take when increasing the per_page item limit. The proposed change is to add a setting in the admin panel so that each customer can tailor API calls as needed. This way GitLab.com and other customers would not be impacted if they keep the default of 100 per page, while customers that need more items per page could raise it at their own risk. As you said, there are ways to mitigate slow responses, overloaded worker queues, timeouts, etc.
We're discussing this now, and someone pointed out that if we had an SDK to make it easy to handle pagination, maybe it would make this less of an issue.
Not sure what you have in mind with the SDK @markpundsack, but unless I can craft a quick curl that can disable pagination, it's still broken in my mind.
Not that it matters to me any more. We've decided to move to a different platform.
They would like to see a "max api pagination size" configurable in the Admin panel. Let end users use whatever pagination size they want, but allow the GitLab admins to set an upper limit to prevent folks from getting out of hand.