Corpus Management - API Contract
Requests
Avoiding N+1 fetches from the client side
Some of the data we need, such as the file size of each individual corpus (package), is only available when we fetch one package at a time, through a different endpoint from the one that lists corpuses (packages). That is not sufficient: the client would have to make one list request plus one request per corpus (N+1 requests). We need this data in the endpoint that returns the array of corpuses.
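To make the cost concrete, here is a minimal sketch that counts requests when sizes are only available per package. `listPackages` and `getPackage` are hypothetical in-memory stand-ins for the list and single-package endpoints, not real client functions; each call counts as one HTTP request.

```typescript
// Hypothetical stand-ins for the two REST endpoints; each call = one request.
interface PackageSummary {
  id: number;
  name: string;
}

interface PackageDetail extends PackageSummary {
  size: number; // bytes; today only returned by the single-package endpoint
}

const store: PackageDetail[] = [
  { id: 1, name: "corpus-a", size: 100 },
  { id: 2, name: "corpus-b", size: 250 },
  { id: 3, name: "corpus-c", size: 71 },
];

let requestCount = 0;

// List endpoint stand-in: returns summaries without sizes.
function listPackages(): PackageSummary[] {
  requestCount += 1;
  return store.map(({ id, name }) => ({ id, name }));
}

// Single-package endpoint stand-in: returns one package including its size.
function getPackage(id: number): PackageDetail {
  requestCount += 1;
  const found = store.find((p) => p.id === id);
  if (!found) throw new Error(`package ${id} not found`);
  return found;
}

// Current situation: 1 list request + 1 detail request per corpus.
function fetchSizesNPlusOne(): Map<number, number> {
  const sizeById = new Map<number, number>();
  for (const summary of listPackages()) {
    sizeById.set(summary.id, getPackage(summary.id).size);
  }
  return sizeById;
}

requestCount = 0;
const sizeById = fetchSizesNPlusOne();
console.log(sizeById.size, requestCount); // 3 corpuses, 4 requests
```

If the list endpoint returned `size` for every element, the same table would need exactly one request no matter how many corpuses a page holds.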
Fetch Corpuses
Data Needed:
- Corpus name
- Last job that used the corpus
- Aggregate corpus size (all corpuses combined)
- File size of each corpus
- Target branch of the last pipeline that ran with the corpus
- Timestamp when the corpus was last used
- Timestamp when the corpus was last updated
- A way to distinguish a corpus from other packages
Existing Package Registry Endpoints:
REST API: https://docs.gitlab.com/ee/api/packages.html#list-packages
Proposed additions:
[
  {
    "id": 3,
    "name": "Test Corpus",
    "version": "0.1",
    "package_type": "internal",
    "_links": {
      "web_path": "/foo/bar/-/packages/3",
      "delete_api_path": "https://gitlab.example.com/api/v4/projects/1/packages/3",
      "last_job_path": "https://gitlab.com/gitlab-org/gitlab/-/jobs/1079624703",
      "target": "294444-apollo-management-table"
    },
    "last_updated_at": "2029-12-16T20:33:34.316Z",
    "created_at": "2029-12-16T20:33:34.316Z",
    "last_used": "2029-12-16T20:33:34.316Z",
    "tags": [],
    "size": "123"
  }
]
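A sketch of how the client might type and flatten one element of the proposed response into a table row. `ProposedCorpusPackage` mirrors the JSON above; `CorpusRow` and `toCorpusRow` are hypothetical UI-side names. Because the proposal returns `size` as a string, the sketch converts it to a number and rejects values that are not numeric.

```typescript
// One element of the proposed list response. last_job_path, target,
// last_used and size are the proposed additions, so they are optional here.
interface ProposedCorpusPackage {
  id: number;
  name: string;
  version: string;
  package_type: string;
  _links: {
    web_path: string;
    delete_api_path: string;
    last_job_path?: string;
    target?: string;
  };
  last_updated_at: string;
  created_at: string;
  last_used?: string;
  tags: string[];
  size: string;
}

// Hypothetical flattened row for the corpus table.
interface CorpusRow {
  id: number;
  name: string;
  sizeBytes: number;
  lastJobPath: string | null;
  targetBranch: string | null;
  lastUsed: Date | null;
  lastUpdated: Date;
}

function toCorpusRow(pkg: ProposedCorpusPackage): CorpusRow {
  // The proposal encodes size as a string ("123"), so convert explicitly.
  const sizeBytes = Number(pkg.size);
  if (!Number.isFinite(sizeBytes)) {
    throw new Error(`invalid size for package ${pkg.id}: ${pkg.size}`);
  }
  return {
    id: pkg.id,
    name: pkg.name,
    sizeBytes,
    lastJobPath: pkg._links.last_job_path ?? null,
    targetBranch: pkg._links.target ?? null,
    lastUsed: pkg.last_used ? new Date(pkg.last_used) : null,
    lastUpdated: new Date(pkg.last_updated_at),
  };
}
```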
- If we extend the existing API, it is not clear where the aggregate file size field would go, because the endpoint returns a bare array of packages/corpuses. I suggest returning it as a response header, since that is where the pagination info (for example x-total) is already returned:
x-total-file-size: 421
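A sketch of reading the proposed header on the client, assuming its value is a plain integer count of bytes (the unit is an assumption; the proposal only shows `421`). HTTP header names are case-insensitive, so the lookup lowercases them. The helper `pageFileSize` shows why a client-side sum is not a replacement: with pagination it only covers the current page.

```typescript
// Reads the proposed x-total-file-size header (assumed: integer bytes).
// Returns null when the header is missing or not a non-negative integer.
function totalFileSize(headers: Record<string, string | undefined>): number | null {
  for (const [key, value] of Object.entries(headers)) {
    if (key.toLowerCase() === "x-total-file-size" && value !== undefined) {
      const trimmed = value.trim();
      return /^\d+$/.test(trimmed) ? Number(trimmed) : null;
    }
  }
  return null;
}

// Sum of sizes on the current page only; NOT the aggregate across all pages.
function pageFileSize(sizes: number[]): number {
  return sizes.reduce((sum, s) => sum + s, 0);
}
```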
graphQL: https://docs.gitlab.com/ee/api/graphql/reference/index.html#package
Proposed graphQL Query:
query getCorpuses($projectPath: ID!, $first: Int, $last: Int, $before: String, $after: String) {
  project(fullPath: $projectPath) {
    packages(first: $first, last: $last, before: $before, after: $after) {
      nodes {
        id
        name
        lastUpdatedAt
        createdAt
        lastUsed
        links {
          webPath
        }
        fileSize
        downloadPath
        lastJobPath
        target
      }
      aggregateSize {
        total
      }
      pageInfo {
        ...PageInfo
      }
    }
  }
}

fragment PageInfo on PageInfo {
  hasNextPage
  hasPreviousPage
  startCursor
  endCursor
}
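The query accepts `first`/`after` and `last`/`before`, which is standard cursor pagination. A sketch of deriving the next and previous page variables from `pageInfo`, assuming the `PageInfo` fragment exposes the usual Relay fields (`hasNextPage`, `hasPreviousPage`, `startCursor`, `endCursor`). `PAGE_SIZE` is a hypothetical value, and `graphqlBody` only builds the JSON body that would be POSTed to GitLab's `/api/graphql` endpoint.

```typescript
// Relay-style page info, as assumed for the PageInfo fragment.
interface PageInfo {
  hasNextPage: boolean;
  hasPreviousPage: boolean;
  startCursor: string | null;
  endCursor: string | null;
}

// Variables accepted by the getCorpuses query.
interface GetCorpusesVariables {
  projectPath: string;
  first?: number;
  last?: number;
  before?: string;
  after?: string;
}

const PAGE_SIZE = 20; // hypothetical page size

// Forward pagination: the next page starts after the current endCursor.
function nextPageVariables(projectPath: string, pageInfo: PageInfo): GetCorpusesVariables | null {
  if (!pageInfo.hasNextPage || pageInfo.endCursor === null) return null;
  return { projectPath, first: PAGE_SIZE, after: pageInfo.endCursor };
}

// Backward pagination: the previous page ends before the current startCursor.
function previousPageVariables(projectPath: string, pageInfo: PageInfo): GetCorpusesVariables | null {
  if (!pageInfo.hasPreviousPage || pageInfo.startCursor === null) return null;
  return { projectPath, last: PAGE_SIZE, before: pageInfo.startCursor };
}

// JSON body for a POST to /api/graphql.
function graphqlBody(query: string, variables: GetCorpusesVariables): string {
  return JSON.stringify({ query, variables });
}
```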
Download Corpus
Data Needed:
- Corpus download path
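I believe it is available as {_links:{web_path:'/url'}} in the REST response. Note, however, that in the example below web_path points to the package's page (/foo/bar/-/packages/3) rather than to the file itself.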
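If corpuses are stored as generic packages (an assumption; the upload section below uses the generic packages API), a direct file download URL could be built from the generic packages endpoint `GET /projects/:id/packages/generic/:package_name/:package_version/:file_name`. A minimal sketch; the base URL, project path, and file names in the usage are hypothetical examples.

```typescript
// Builds a generic package file download URL. The project may be a numeric ID
// or a full path; full paths must be URL-encoded (foo/bar -> foo%2Fbar).
function genericPackageDownloadUrl(
  baseUrl: string,
  project: number | string,
  packageName: string,
  packageVersion: string,
  fileName: string,
): string {
  const segments = [
    "api",
    "v4",
    "projects",
    encodeURIComponent(String(project)),
    "packages",
    "generic",
    encodeURIComponent(packageName),
    encodeURIComponent(packageVersion),
    encodeURIComponent(fileName),
  ];
  // Strip trailing slashes from the base URL to avoid a double slash.
  return `${baseUrl.replace(/\/+$/, "")}/${segments.join("/")}`;
}

const url = genericPackageDownloadUrl(
  "https://gitlab.example.com/", "foo/bar", "my_corpus", "0.0.1", "corpus.zip",
);
```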
Existing Package Registry Endpoints:
REST API: https://docs.gitlab.com/ee/api/packages.html#within-a-project
Existing API response:
[
  {
    "id": 3,
    "name": "Test Corpus",
    "version": "0.1",
    "package_type": "internal",
    "_links": {
      "web_path": "/foo/bar/-/packages/3",
      "delete_api_path": "https://gitlab.example.com/api/v4/projects/1/packages/3"
    },
    "last_updated_at": "2029-12-16T20:33:34.316Z",
    "created_at": "2029-12-16T20:33:34.316Z",
    "last_used": "2029-12-16T20:33:34.316Z",
    "tags": [],
    "size": "123"
  }
]
graphQL: https://docs.gitlab.com/ee/api/graphql/reference/index.html#package
Proposed graphQL:
query getCorpuses($projectPath: ID!, $first: Int, $last: Int, $before: String, $after: String) {
  project(fullPath: $projectPath) {
    packages(first: $first, last: $last, before: $before, after: $after) {
      nodes {
        id
        name
        lastUpdatedAt
        createdAt
        lastUsed
        links {
          webPath
        }
        fileSize
      }
      aggregateSize {
        total
      }
      pageInfo {
        ...PageInfo
      }
    }
  }
}

fragment PageInfo on PageInfo {
  hasNextPage
  hasPreviousPage
  startCursor
  endCursor
}
Upload Corpuses
Data Needed:
- A way to flag the "package" as a corpus
- A validation error when an upload exceeds the 10 GB file size limit
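A client-side pre-check could reject oversized files before uploading, while the backend remains the source of truth. This sketch assumes "10 GB" means 10 GiB (10 × 1024³ bytes); the exact limit and unit still need to be confirmed. In a browser, the value to check would be the `size` property of the selected `File`, which is in bytes. `checkCorpusSize` and the error strings are hypothetical.

```typescript
// Assumption: 10 GB interpreted as 10 GiB. Confirm with the backend limit.
const MAX_CORPUS_BYTES = 10 * 1024 * 1024 * 1024;

type UploadCheck = { ok: true } | { ok: false; error: string };

// Validates a file size in bytes against the assumed upload limit.
function checkCorpusSize(sizeBytes: number): UploadCheck {
  if (!Number.isInteger(sizeBytes) || sizeBytes < 0) {
    return { ok: false, error: "Invalid file size" };
  }
  if (sizeBytes > MAX_CORPUS_BYTES) {
    return { ok: false, error: "File exceeds the 10 GB upload limit" };
  }
  return { ok: true };
}
```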
Existing Package Registry Endpoints:
REST: https://docs.gitlab.com/ee/user/packages/generic_packages/#publish-a-package-file
PUT /projects/:id/packages/generic/:package_name/:package_version/:file_name
Attribute | Type | Required | Description |
---|---|---|---|
id | integer/string | yes | The ID or URL-encoded path of the project. |
package_name | string | yes | The package name. It can contain only lowercase letters (a-z), uppercase letters (A-Z), numbers (0-9), dots (.), hyphens (-), or underscores (_). |
package_version | string | yes | The package version. It can contain only numbers (0-9) and dots (.), and must be in the format X.Y.Z, that is, it must match the /\A\d+\.\d+\.\d+\z/ regular expression. |
file_name | string | yes | The filename. It can contain only lowercase letters (a-z), uppercase letters (A-Z), numbers (0-9), dots (.), hyphens (-), or underscores (_). |
status | string | no | The package status. It can be default (default) or hidden. Hidden packages do not appear in the UI or package API list endpoints. |
corpus | integer | yes | (Proposed) Indicates that the upload is a corpus. |
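The naming rules in the table can be checked on the client before calling the PUT endpoint. A sketch translating them into JavaScript regular expressions (Ruby's `\A...\z` anchors correspond to `^...$` without the `m` flag in JavaScript). `validateGenericUpload` and `genericUploadPath` are hypothetical helpers, and the proposed `corpus` attribute is left out because how it would be sent is not specified yet.

```typescript
// Allowed characters for package_name and file_name, per the table.
const NAME_PATTERN = /^[A-Za-z0-9._-]+$/;
// package_version must be X.Y.Z (digits separated by literal dots).
const VERSION_PATTERN = /^\d+\.\d+\.\d+$/;

// Returns a list of validation errors; empty means the inputs look valid.
function validateGenericUpload(packageName: string, packageVersion: string, fileName: string): string[] {
  const errors: string[] = [];
  if (!NAME_PATTERN.test(packageName)) errors.push("invalid package_name");
  if (!VERSION_PATTERN.test(packageVersion)) errors.push("invalid package_version");
  if (!NAME_PATTERN.test(fileName)) errors.push("invalid file_name");
  return errors;
}

// Path for PUT /projects/:id/packages/generic/:package_name/:package_version/:file_name
function genericUploadPath(project: number | string, packageName: string, packageVersion: string, fileName: string): string {
  const parts = [packageName, packageVersion, fileName].map(encodeURIComponent);
  return `/projects/${encodeURIComponent(String(project))}/packages/generic/${parts.join("/")}`;
}
```

Note that the example package earlier in this document ("Test Corpus", version "0.1") would fail both the name rule (it contains a space) and the X.Y.Z version rule.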
graphQL: No upload mutation exists
Proposed upload mutation:
mutation uploadPackage($files: [Upload!]!, $projectPath: ID!) {
  packageUpload(input: { projectPath: $projectPath, files: $files }) {
    errors
  }
}
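GraphQL has no built-in file type, so an `Upload!` argument is usually sent as a multipart request following the GraphQL multipart request specification: an `operations` field with the files set to `null`, a `map` field pointing each file part at its variable path, then the files themselves. A sketch that builds the two JSON fields for the proposed (not yet existing) `packageUpload` mutation; `buildUploadParts` is a hypothetical helper.

```typescript
// The proposed mutation; packageUpload does not exist yet.
const UPLOAD_MUTATION = `mutation uploadPackage($files: [Upload!]!, $projectPath: ID!) {
  packageUpload(input: { projectPath: $projectPath, files: $files }) {
    errors
  }
}`;

interface MultipartParts {
  operations: string; // JSON: query + variables, with files replaced by null
  map: string; // JSON: form field name -> variable path(s)
  fileFieldNames: string[]; // form field names under which files are appended
}

function buildUploadParts(projectPath: string, fileCount: number): MultipartParts {
  const operations = {
    query: UPLOAD_MUTATION,
    variables: { projectPath, files: new Array(fileCount).fill(null) },
  };
  const map: Record<string, string[]> = {};
  const fileFieldNames: string[] = [];
  for (let i = 0; i < fileCount; i++) {
    map[String(i)] = [`variables.files.${i}`];
    fileFieldNames.push(String(i));
  }
  return { operations: JSON.stringify(operations), map: JSON.stringify(map), fileFieldNames };
}
```

In the browser, these would be appended to a `FormData` in order (`operations`, then `map`, then each file under its field name) and POSTed to the GraphQL endpoint.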
Delete Corpus
Data Needed:
- A GraphQL delete mutation needs to be added, or we use the existing REST endpoint.
Existing Package Registry Endpoints:
REST: https://docs.gitlab.com/ee/api/packages.html#delete-a-project-package
DELETE /projects/:id/packages/:package_id
Attribute | Type | Required | Description |
---|---|---|---|
id | integer/string | yes | ID or URL-encoded path of the project |
package_id | integer | yes | ID of a package. |
Can return the following status codes:
- 204 No Content, if the package was deleted successfully.
- 404 Not Found, if the package was not found.
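A sketch of how the client could build the delete path and interpret the documented status codes. `deletePackagePath` and `deleteOutcome` are hypothetical helpers; the response's `status` would be passed in after the DELETE request completes.

```typescript
type DeleteOutcome = "deleted" | "not_found" | "error";

// Path for DELETE /projects/:id/packages/:package_id
function deletePackagePath(project: number | string, packageId: number): string {
  return `/projects/${encodeURIComponent(String(project))}/packages/${packageId}`;
}

// Maps the documented status codes; anything else is treated as an error.
function deleteOutcome(status: number): DeleteOutcome {
  if (status === 204) return "deleted";
  if (status === 404) return "not_found";
  return "error";
}
```

The UI may reasonably treat `not_found` like a successful delete (the corpus is gone either way) and just refresh the list.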
graphQL: No delete mutation exists
Proposed mutation:
mutation deletePackage($id: PackagesPackageID!) {
  packageDestroy(input: { id: $id }) {
    errors
  }
}
Nothing new needed if we use the existing REST endpoint.
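The REST endpoints identify packages by numeric ID (3 in the example response), whereas GitLab GraphQL identifies objects with global IDs of the form `gid://gitlab/<ModelClass>/<id>`. If the GraphQL delete mutation takes the package's global ID, the client needs to convert between the two. A sketch, assuming the model class is `Packages::Package`; the helper names are hypothetical.

```typescript
// Assumption: package global IDs use the Packages::Package model class.
const PACKAGE_GID_PREFIX = "gid://gitlab/Packages::Package/";

// REST numeric id -> GraphQL global ID.
function toPackageGlobalId(restId: number): string {
  return `${PACKAGE_GID_PREFIX}${restId}`;
}

// GraphQL global ID -> REST numeric id, or null if it is not a package GID.
function fromPackageGlobalId(gid: string): number | null {
  if (!gid.startsWith(PACKAGE_GID_PREFIX)) return null;
  const rest = gid.slice(PACKAGE_GID_PREFIX.length);
  return /^\d+$/.test(rest) ? Number(rest) : null;
}
```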
Corpus Configuration Status
Data Needed:
- ee/app/presenters/projects/security/configuration_presenter.rb needs to be updated to include an entry for Corpus Management, along with any other logic needed to determine whether it is enabled.
The presenter output is passed to JavaScript via the HAML template as an HTML data attribute.
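On the JavaScript side, the data attribute arrives as a string that has to be parsed. A sketch assuming the presenter serializes a JSON array of feature entries with `type` and `configured` fields, and that Corpus Management uses the type `corpus_management`; the entry shape and type name are hypothetical and must match whatever the presenter actually emits. The raw value would come from something like `element.dataset`.

```typescript
// Hypothetical shape of one feature entry serialized by the presenter.
interface SecurityFeature {
  type: string;
  configured: boolean;
}

// Parses the JSON data attribute value; drops entries with the wrong shape.
// JSON.parse throws on malformed JSON, which surfaces presenter bugs early.
function parseFeatures(raw: string | undefined): SecurityFeature[] {
  if (!raw) return [];
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed)) return [];
  return parsed.filter(
    (f): f is SecurityFeature =>
      typeof f === "object" &&
      f !== null &&
      typeof (f as SecurityFeature).type === "string" &&
      typeof (f as SecurityFeature).configured === "boolean",
  );
}

// "corpus_management" is a hypothetical type name.
function isCorpusManagementEnabled(features: SecurityFeature[]): boolean {
  return features.some((f) => f.type === "corpus_management" && f.configured);
}
```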