API method to list Projects and relevant metadata details based on selection criteria
Problem to solve
As a GitLab admin, I would like to be able to programmatically acquire a selection of projects based on a number of different criteria, such as:
- Projects with the largest repositories.
  - (This is the most important selection aspect. The rest of these would be very nice to have, but would be more difficult to implement, and would probably require additional database tables, caching, and asynchronous workers doing some metadata analysis or querying.)
- Projects with the highest commit activity over a given period of time.
- Projects with the most operational activity of any kind.
- Projects with the least operational activity of any kind.
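To make the selection criteria concrete, here is a minimal Python sketch of the "largest" and "dormant" selections applied to project records already held in memory. The sample records are invented for illustration, and the `last_activity_at` field and the helper names `largest` and `dormant` are assumptions rather than part of any existing API; only `statistics.repository_size` is named in this proposal.

```python
import heapq
from datetime import datetime, timezone

# Illustrative records only. The statistics.repository_size field comes from
# this proposal; last_activity_at is an assumed per-project timestamp.
PROJECTS = [
    {"id": 1, "path": "group/alpha",
     "statistics": {"repository_size": 52_428_800},
     "last_activity_at": "2019-06-01T12:00:00+00:00"},
    {"id": 2, "path": "group/beta",
     "statistics": {"repository_size": 1_073_741_824},
     "last_activity_at": "2018-01-15T08:30:00+00:00"},
    {"id": 3, "path": "group/gamma",
     "statistics": {"repository_size": 734_003_200},
     "last_activity_at": "2019-06-20T17:45:00+00:00"},
]


def largest(projects, limit=10):
    """Return up to `limit` projects, biggest repository first."""
    return heapq.nlargest(
        limit, projects, key=lambda p: p["statistics"]["repository_size"]
    )


def dormant(projects, since, limit=10):
    """Return up to `limit` projects with no activity since `since`,
    longest-idle first. `since` must be a timezone-aware datetime."""
    def last_seen(project):
        return datetime.fromisoformat(project["last_activity_at"])

    idle = [p for p in projects if last_seen(p) < since]
    idle.sort(key=last_seen)
    return idle[:limit]
```

For example, `largest(PROJECTS, limit=2)` selects the two biggest repositories, and `dormant(PROJECTS, since=datetime(2019, 1, 1, tzinfo=timezone.utc))` selects projects untouched since the start of 2019. A real implementation would push this ordering into the database rather than sorting in application memory.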
Intended users
Sidney (the systems administrator persona), or any administrator of a GitLab installation.
Further details
- I often need to identify which projects have the largest repositories, because I may be running out of space on a storage shard. This means that I may wish to "migrate" the project repository to another shard, or really, clone it and delete the original.
- I would very much like to determine which projects have no activity, so that I can archive them, put them in cold storage, or otherwise cull them from the shard filesystem to make room for other project repositories.
- It would also be nice to simply have access to a list of the most popular or active projects in the inventory of my installation.
Proposal
As an admin, I've already tried to go about solving this problem using a custom script which executes within the context of a gitlab runner. This is a brittle, non-sustainable approach, however, since there is no contract for the internal Rails ActiveRecord model classes.
So instead, I'd prefer to be able to invoke an HTTP API method which responds with a JSON-formatted list of project details (especially including the statistics.repository_size field, which is currently not included in the existing GET /projects API) according to an order and set of conditions that are either provided as parameters or else pre-built into specific API route resources/actions, such as:
GET /projects/largest[?limit=N]
GET /projects/most_commits[?since=date&limit=N]
GET /projects/busiest[?since=date&limit=N]
GET /projects/dormant[?since=date&limit=N]
Where date defaults to, say, one month ago.
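As a rough sketch of how the optional parameters on these proposed routes might be resolved, the Python function below applies the "one month" default for `since`. Everything here is hypothetical: the routes do not exist yet, the helper name `resolve_params` is invented, dates are assumed to be ISO 8601 (`YYYY-MM-DD`), "a month" is approximated as 30 days, and no default for `limit` is assumed because the proposal does not state one.

```python
from datetime import date, timedelta
from urllib.parse import parse_qs

# Route names taken from the proposal above; none of them exist today.
ROUTES_WITH_SINCE = {"most_commits", "busiest", "dormant"}
ROUTES = ROUTES_WITH_SINCE | {"largest"}


def resolve_params(route: str, query_string: str, today: date) -> dict:
    """Turn a raw query string into effective parameters for a route."""
    if route not in ROUTES:
        raise ValueError(f"unknown route: {route}")

    query = parse_qs(query_string)
    params = {"limit": None}  # None means "no limit was requested"

    if "limit" in query:
        limit = int(query["limit"][0])
        if limit < 1:
            raise ValueError("limit must be a positive integer")
        params["limit"] = limit

    if route in ROUTES_WITH_SINCE:
        if "since" in query:
            params["since"] = date.fromisoformat(query["since"][0])
        else:
            # Assumed default: roughly one month before today.
            params["since"] = today - timedelta(days=30)

    return params
```

Passing `today` explicitly keeps the default deterministic, which makes the behaviour easy to cover in the functional API tests mentioned later in this issue.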
Permissions and Security
The permission level here should be admin, and a Private-Token header should be required, the value of which should be a token set with the lowest level appropriate for read-only access to metadata for all project repositories in the GitLab-managed inventory.
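A client call under these rules might be prepared as in the following Python sketch. It only builds the request and does not send it. The base URL, the `largest` route, and the helper name `build_request` are assumptions for illustration; the `PRIVATE-TOKEN` header is the one GitLab's REST API already accepts for token authentication.

```python
import urllib.request


def build_request(base_url: str, route: str, token: str) -> urllib.request.Request:
    """Prepare an authenticated GET for one of the proposed project routes."""
    if not token:
        # The proposal requires a token, so refuse to build an anonymous call.
        raise ValueError("an admin read-only token is required")

    url = f"{base_url.rstrip('/')}/projects/{route}"
    return urllib.request.Request(
        url,
        headers={"PRIVATE-TOKEN": token},
        method="GET",
    )
```

The request could then be sent with `urllib.request.urlopen(request)`; a server implementing this proposal would be expected to reject the call unless the token belongs to an administrator.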
Documentation
Are the ~devops::create and ~group::source code labels sufficient here?
Availability & Testing
- What risks does this change pose to our availability?
  - There should be no risks to availability. If gathering this information turns out to be processing-intensive (long-running queries and so forth), then a cache or datastore populated by asynchronous job workers would be advisable.
- How might it affect the quality of the product?
  - If some of the queries for metadata aggregation or calculation are long-running, they could create resource contention with other requests being served.
- What additional test coverage or changes to tests will be needed?
  - Unit tests for the backend controller and service/worker implementations.
  - Functional API tests to confirm that the output matches the database and on-disk information.
- Will it require cross-browser testing?
  - No.
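The cache-plus-worker idea raised under the availability question could look roughly like the Python sketch below: request handlers only read a precomputed snapshot, while a background job calls `refresh()` on its own schedule. The class name `CachedRanking`, its methods, and the five-minute default age are all assumptions made for illustration, not an existing GitLab component.

```python
import threading
import time


class CachedRanking:
    """Holds the latest result of an expensive ranking query."""

    def __init__(self, compute, max_age_seconds=300.0, clock=time.monotonic):
        self._compute = compute          # the expensive query, supplied by the caller
        self._max_age = max_age_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._snapshot = []
        self._computed_at = None

    def refresh(self):
        """Recompute the ranking. Intended to be called by a background worker."""
        result = self._compute()         # run outside the lock so readers never wait
        with self._lock:
            self._snapshot = list(result)
            self._computed_at = self._clock()

    def read(self):
        """Return the last snapshot without triggering any computation."""
        with self._lock:
            return list(self._snapshot)

    def is_stale(self):
        """True if no snapshot exists yet or the snapshot is older than max_age."""
        with self._lock:
            if self._computed_at is None:
                return True
            return self._clock() - self._computed_at > self._max_age
```

Because `read()` never runs the query itself, a slow aggregation cannot hold up an API request; the trade-off is that responses may lag behind the true state by up to the refresh interval.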
Test areas (unit, integration and end-to-end) that need to be added or updated to ensure that this feature will work as intended:
- Unit test changes
- End-to-end test change
What does success look like, and how can we measure that?
Success looks like a fast, sustainable API method with a stable contract that provides the desired metadata about the projects stored and managed by a GitLab installation.
What is the type of buyer?
Not sure.
Is this a cross-stage feature?
I don't think so.