Currently the per_page parameter for API pagination defaults to 20 and can be set to a maximum of 100. The spirit of this limit was to help with performance by restricting very large queries. However, there are many specific cases in which requesting a very large number of items is actually more performant than making many small queries.
Proposal
In the Admin settings, add a setting that reflects the per_page setting, with a configurable value
In the Admin settings, add option to allow a per_page flag to be passed as a parameter on individual requests
UX should determine the right place in the admin settings hierarchy to put this. Potentially under /admin/application_settings/network/#User and IP Rate Limits
It depends on how the OS on the server is configured...
It may well not be a recipe for timeouts...
@wojciechlisik Then it will be a recipe for huge memory consumption, since all the Ruby objects will need to be created before the response is sent back. :)
Make per_page default and maximum value configurable
@athar I'd be ok with that but I'd also like to have a rationale on why this is needed: if you want more than 100 results, you can just issue multiple requests?
Keeping pagination on by default is a reasonable way to prevent people from unknowingly causing themselves problems. However, it also seems reasonable to at least let a knowledgeable user pass a parameter to turn off pagination for a specific request.
The arbitrary limit of 100 records on our self-hosted Enterprise Edition of GitLab adds unnecessary complexity to the automation of tasks we intend to execute across groups with 100+ projects via the API using CLI tools and shell scripting.
I understand that a change like this could blur expectations and accountability around performance but, ceteris paribus, available server resources/performance are our problem and not GitLab's.
Allowing the user to decide the limits (or even to turn pagination off) is greatly preferable to writing any number of workarounds that then need to be maintained/remembered when upgrading to a new version and the workarounds stop working.
What workarounds have you had to use? 2.5 years ago there was an issue where you asked for a page and GitLab returned everything (causing infinite loops in improperly error-checked code), but since then it's been rock-solid for us (from both Python and Rust). We made our GitLab API callers do the depagination for us, so that when something asks for all projects it gets all projects, rather than each API user having to depaginate manually.
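For reference, a minimal depagination helper along those lines might look like the sketch below. It uses Ruby's standard library and GitLab's documented `X-Next-Page` response header; the instance URL and token handling are placeholders.

```ruby
require "net/http"
require "json"
require "uri"

# Walk every page of a REST endpoint by following the X-Next-Page header,
# so callers get the full collection back from a single call.
def fetch_all(path, token:, base: "https://gitlab.example.com/api/v4", per_page: 100)
  items = []
  page = 1

  loop do
    uri = URI("#{base}#{path}")
    uri.query = URI.encode_www_form(per_page: per_page, page: page)

    response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
      request = Net::HTTP::Get.new(uri)
      request["PRIVATE-TOKEN"] = token
      http.request(request)
    end

    items.concat(JSON.parse(response.body))

    # GitLab leaves X-Next-Page empty on the last page.
    next_page = response["X-Next-Page"]
    break if next_page.nil? || next_page.empty?

    page = next_page.to_i
  end

  items
end

all_projects = fetch_all("/projects", token: ENV.fetch("GITLAB_TOKEN"))
```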
This issue has been around for a year, and I'm surprised it hasn't gotten more traction. My company actually had to roll back the version we are using because the upgrade broke most of our interactions with GitLab's API.
As has been mentioned in this thread already: GitLab is source control, not the guardian of my system's resources.
Here's an example. I do not feel it is unreasonable to want to query a list of a project's files in its entirety. A "recursive" parameter even exists to help this scenario. Previously, a project with 10,000 files would return in a single sub-second request. Now it takes a minimum of 100 requests, which could take many seconds (even when run in parallel). When compared to this forced alternative, arguments relating to GitLab's performance carry less weight.
Timeouts can be configured. The only workaround for this seems to be an excessive number of API requests, which is an objectively terrible solution to a problem that doesn't seem necessary to begin with.
Thanks for the ping. Let's see if there is more widespread interest in this change. One of the requests was around improving the performance of an integration with GitLab by making fewer larger requests, but that use case wouldn't benefit most users unless we lifted the limit on GitLab.com.
What parts of this issue are important to you and why? – We would like to be able to configure the per_page default and maximum values in the admin panel, and also to be able to go past the current 100 per_page limit to a value that suits our needs.
Have you tried any workarounds? – There are no workarounds for our use case. Having more items per_page would reduce the total number of API calls on some expensive endpoints.
What is the priority of this issue to your organization? – Increasingly useful and needed.
We don't want the API to randomly time out or reduce the performance of GitLab for other users. Allowing very large or unbounded response sizes is a recipe for all sorts of failures that the client would need to handle. Pagination, unlike slow responses and poor performance, is predictable and a common restriction in many other APIs.
I don't care @jramsay; let me assume the risk. Add a knob to disable it. By all means, keep the current settings, but allow me to live dangerously when I want. This "safety" feature effectively turns a quick curl into a script.
This is for my 3000 seat instance (which is edging very close to being a 0 seat license).
@jramsay Since most of us run our own installation/setup in our own infrastructure, I would say that it is our risk to take when increasing the per_page item limits. The proposed change is to add a setting in the admin panel so each customer can tailor API calls as needed; this way GitLab.com and other customers would not be impacted if they keep the default of 100 per page, while the customers that need more items per page get the ability to do so at their own risk. As you said, there are ways to mitigate slow responses, overall worker queues, timeouts, etc.
We're discussing this now, and someone pointed out that if we had an SDK to make it easy to handle pagination, maybe it would make this less of an issue.
Not sure what you have in mind with the SDK @markpundsack, but unless I can craft a quick curl that can disable pagination, it's still broken in my mind.
Not that it matters to me any more. We've decided to move to a different platform.
They would like to see a "max api pagination size" configurable in the Admin panel. Let end users use whatever pagination size they want, but allow the GitLab admins to set an upper limit to prevent folks from getting out of hand.
@nhxnguyen I'm dropping this into %"Next 3-4 releases", but I'd like to spend some time weighing this out. I could see it being either relatively straightforward or, alternatively, a huge lift. Can we sort out (broad strokes) which it is?
Additionally, I know we've been doing some other limit-configuration work in #34634 (closed); are there any tie-ins to those efforts that we can leverage?
Adding something like an api_per_page_limit field to application settings, plus the UI to update it in the Admin panel, seems straightforward. We use Kaminari for pagination and should be able to use max_paginates_per to apply the application setting as an override when we call paginate from API::Helpers::Pagination. What do you think about this @abrandl? I see you recently extracted pagination code from API::Helpers::Pagination.
Global defaults are configured here. We would have to figure out how to set different defaults for the API. But I agree, the rest should be straightforward.
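To make the idea concrete, here is a rough sketch, not the actual GitLab helper, of capping per_page where the REST API paginates with Kaminari; api_per_page_limit is the hypothetical setting name from above:

```ruby
# Hypothetical sketch: clamp the requested per_page to an admin-configurable
# ceiling before handing the relation to Kaminari.
module API
  module Helpers
    module Pagination
      DEFAULT_PER_PAGE = 20
      FALLBACK_MAX_PER_PAGE = 100

      def paginate(relation)
        # api_per_page_limit is the proposed (not yet existing) application setting.
        max_per_page = Gitlab::CurrentSettings.api_per_page_limit || FALLBACK_MAX_PER_PAGE
        per_page = (params[:per_page] || DEFAULT_PER_PAGE).to_i.clamp(1, max_per_page)

        relation.page(params[:page]).per(per_page)
      end
    end
  end
end
```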
Can I have this for the normal GUI, please? I'm tired of clicking on the page number buttons after having scrolled through a short page of just 20 items. The page could load just as fast with 50 items and I could just keep on scrolling through without the need to position the mouse, click and wait so often. It's a little annoying.
I'm surprised this hasn't been mentioned in the 3 years this issue has existed, but pagination causes race conditions.
If I list the files of a project with 10,000 files, it requires 100 requests (with the limit). But suppose someone else removes a file that has already been returned to me while I'm partway through those requests. The server regenerates the list for each page (there is no state, which is essentially what pagination implies), so the remaining files shift across pages: a file can move from a page I haven't fetched yet to a page I already have, where it wasn't when I fetched it. That file is then never returned to me, so the result I assemble is wrong (for any point in time).
Keyset pagination would alleviate the risk of races as described, but it still means we have to make many requests and do some client-side merging, so I still favour disabling pagination altogether.
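For endpoints that support it (e.g. the Projects API ordered by id), keyset pagination is driven by the Link response header rather than page numbers. A rough client sketch, with the instance URL as a placeholder:

```ruby
require "net/http"
require "json"
require "uri"

token = ENV.fetch("GITLAB_TOKEN")
url = URI("https://gitlab.example.com/api/v4/projects?pagination=keyset&per_page=100&order_by=id&sort=asc")
projects = []

while url
  response = Net::HTTP.start(url.host, url.port, use_ssl: url.scheme == "https") do |http|
    request = Net::HTTP::Get.new(url)
    request["PRIVATE-TOKEN"] = token
    http.request(request)
  end

  projects.concat(JSON.parse(response.body))

  # The cursor for the next page is advertised via the Link header;
  # the last page has no rel="next" entry.
  next_link = response["Link"]&.split(",")&.find { |part| part.include?('rel="next"') }
  url = next_link && URI(next_link[/<([^>]+)>/, 1])
end
```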
I don't know that that is advisable for many endpoints due to the database and server load this could cause (user lists, project lists, commit listings, etc.). I personally think disabling pagination client-side is too big of a foot-gun for such things. An option for admins to turn it off would be fine, but everyone will have to support pagination anyways (unless they know they're talking to a given instance with support for a single fetch).
@ben.boeckel I've personally not seen a load issue, even with thousands of users and projects. I'm sure it's possible, but probably unlikely unless someone is running an overloaded all-in-one system. Be that as it may, this should still be an option IMO. I don't need Gitlab to protect me from a boogeyman. Limit it to admin users being able to disable pagination, at least.
Although to be fair, this doesn't really affect me any more as I've moved on from Gitlab.
Nobody thinks about lazy loading a.k.a. endless scrolling? Even with classic pagination, what's the use of having 100 pages? If you get that as a user, you know "that was too broad", but you certainly won't go through all 100 pages. So with or without pagination, too many results are unusable. If performance is an issue then, a result limit should be used.
Oops, sorry, forgot about the API. But what, pagination in an API? That sounds weird. Wouldn't you use configurable limits for that? Or is the API designed to be used primarily as a UI backend?
That's the gist of this request; to be able to disable pagination. You can set records per page, but that number is capped at (I think) 100. Probably it's there to make the UI cleaner/faster, but at the cost of complexity when writing quick scripts.
@hcgrove I don't quite get it. Why are you so against a configurable result page size? It could be done in many ways: with lazy loading (the non-API way), or with a parameter added to the API request.
@devorgint Although not directed at me, I'm ok with whatever...so long as it can be disabled on a per-call basis. The reasons have been made clear throughout this thread.
I'm not against configurable sizes; that's what I tried to clarify by emphasising the possibility. The only thing I'm against is a limit on that configurability (configuring the limit to give a page of 1,000,000,000 results is almost always equivalent to disabling pagination), and I find &no_pagination a more natural way to express what I want than &per_page=1000000000.
(I'm also somewhat against people using features they don't understand the limitations of, like most uses of pagination.)
@troyengel Thanks so much for your valuable feedback. I agree this could be better documented, and we received the same feedback in the past, so feel free to follow/comment on this dedicated issue: #22976.
The word "all" has connotations which indicate that by accessing that API URL, I will receive a JSON of all snippets, not an invisible pagination limit that isn't documented here. The per_page stuff is also four-fifths of the way down this doc page, not up top where it should be, telling me I'm only getting 20 results.
Pagination is a common practice for APIs (e.g. what if you had 1 million snippets?) but I agree this can be better documented (#22976).
Hey @.luke, depending on the weight of this feature, I would argue that due to requests by very large customers, this item should still have some priority. I only removed the milestone because it had been in %"Next 3-4 releases" for roughly a year. Not sure if you want to reconsider, @deuley?
@luke @mnohr This really has been languishing for a long time, and I've talked to a number of customers who are affected by it.
Can we get a proper weight on this and get it queued up for a future milestone? I know we don't tend to put a ton of time into Category:API needs, but this is a major issue and well worth doing.
The max_per_page is set in an initializer, but it looks like we can dynamically update the configuration without the app needing to be restarted by calling .configure again on it. E.g.:
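A plausible illustration (assuming the hypothetical api_per_page_limit application setting discussed earlier in this thread) would be re-running the Kaminari configuration whenever the setting changes:

```ruby
# Re-apply Kaminari's global configuration using the (hypothetical)
# admin-configurable limit, e.g. from an after-update hook on the setting.
Kaminari.configure do |config|
  config.default_per_page = 20
  config.max_per_page = Gitlab::CurrentSettings.api_per_page_limit
end
```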
The frontend might use our official APIs without specifying a pagination size, implicitly relying on a page size of 20. If the API defaulted to larger sizes, certain pages could start looking weird, take longer to load, or even break.
Do we have a complete list of places where we are relying on a hard coded upper limit?
@wortschi No, and I think the list would grow over time. If we count our GraphQL API too, we set a max_page_size: for some GraphQL connections. I find 12 hard-coded limits in our GraphQL API, ranging from 2000 down to 20 (the default is 100).
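For context, these GraphQL caps are declared per connection field, roughly like this with graphql-ruby (illustrative type and field names, not copied from the GitLab schema):

```ruby
# Illustrative graphql-ruby connection field with a per-connection cap.
module Types
  class ProjectType < BaseObject
    field :issues, Types::IssueType.connection_type,
          null: true,
          max_page_size: 100,
          description: "Issues of the project, at most 100 per page"
  end
end
```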
There's a variety of reasons why we set these values in GraphQL, and not just for performance. For example, the 2000 limit was set for UI reasons !27467 (merged). As @leipert mentioned in #17329 (comment 549722900), overriding all our paging could make the GitLab UI buggy, or it might not; it's a bit unpredictable and depends on the limit that was set. For example, if a customer set a new API page limit of 250 for their instance and we applied it to the hard-coded limits set in !27467 (merged), it sounds like the UI would fail to display boards when there were >250 of them.
That makes it tricky to predict what bugs might happen when overriding these values.
Perhaps we either:
Ignore the places where the max page size is hard-coded, so the limit would only apply when the API is using the defaults.
Or, apply logic like: allow the setting to be null, which means always use GitLab's limits. When set: where we hard-code a page limit, take the max of either the setting or our override (so, if an admin set it to 500, it would override the max_page_size: 20 but not the max_page_size: 2000; see the sketch after this list). Generally, instances would want to keep it null in this case.
We could perhaps have separate REST and GraphQL limits?
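A tiny sketch of the second option's rule (hypothetical setting and helper names):

```ruby
# nil means "always use GitLab's own limits"; otherwise the larger of the
# admin setting and the hard-coded override wins.
def effective_max_page_size(hard_coded_limit)
  admin_limit = Gitlab::CurrentSettings.api_per_page_limit # hypothetical setting
  return hard_coded_limit if admin_limit.nil?

  [hard_coded_limit, admin_limit].max
end

effective_max_page_size(20)   # => 500 when the admin setting is 500
effective_max_page_size(2000) # => 2000; the larger hard-coded UI limit still applies
```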
Is the weight for this issue still accurate?
At a guess I think:
If we want to tackle all the hard-coded page limits in our API, the weight is probably about a 4-5.
If we just apply it to the defaults and not to the hard-coded page limits, it's probably about a 3.