Investigate usage of npm metadata cache for groups
Description
Currently, the npm metadata API endpoints at the group and instance levels are the slowest endpoints in the Package registry logs (internal), and they contribute significantly to the Error budget.
In Improve Package Registry metadata generation (&9835 - closed), we introduced the npm metadata cache and started using it for project-level API endpoints. This significantly reduced load by returning cached metadata.
Given the hierarchical nature of groups and projects in GitLab and its comprehensive permissions system, implementing a metadata cache at the group (root group) level might be insufficient, because access to projects is dynamic: the set of projects visible within a group can differ from one user to the next, so a single group-level cache entry may not be correct for everyone.
Proposal with the idea
What if, instead of finding all packages within a group for the projects the user has access to, we find all npm metadata caches for a given package in those projects?
Below are the examples of responses:
Group API metadata response
{
  "name": "@gitlab-org/berlin",
  "versions": {
    "1.0.0": {
      "dist": {
        "shasum": "318f1cb2dc10849fe098601d53751ad5c50cda58",
        "tarball": "http://gdk.test:3000/api/v4/projects/1/packages/npm/@gitlab-org/berlin/-/@gitlab-org/berlin-1.0.0.tgz"
      },
      "name": "@gitlab-org/berlin",
      "version": "1.0.0"
    },
    "2.0.0": {
      "dist": {
        "shasum": "e7338fd83986200f3f3be378058a93c3c1a9df33",
        "tarball": "http://gdk.test:3000/api/v4/projects/611/packages/npm/@gitlab-org/berlin/-/@gitlab-org/berlin-2.0.0.tgz"
      },
      "name": "@gitlab-org/berlin",
      "version": "2.0.0"
    },
    "4.0.0": {
      "dist": {
        "shasum": "4718384b0a52603c0f798fc719c02327f427e89b",
        "tarball": "http://gdk.test:3000/api/v4/projects/1/packages/npm/@gitlab-org/berlin/-/@gitlab-org/berlin-4.0.0.tgz"
      },
      "name": "@gitlab-org/berlin",
      "version": "4.0.0"
    }
  },
  "dist-tags": {
    "latest": "3.0.0"
  }
}
Project 1 API metadata response
{
  "name": "@gitlab-org/berlin",
  "versions": {
    "1.0.0": {
      "dist": {
        "shasum": "318f1cb2dc10849fe098601d53751ad5c50cda58",
        "tarball": "http://gdk.test:3000/api/v4/projects/1/packages/npm/@gitlab-org/berlin/-/@gitlab-org/berlin-1.0.0.tgz"
      },
      "name": "@gitlab-org/berlin",
      "version": "1.0.0"
    },
    "4.0.0": {
      "dist": {
        "shasum": "4718384b0a52603c0f798fc719c02327f427e89b",
        "tarball": "http://gdk.test:3000/api/v4/projects/1/packages/npm/@gitlab-org/berlin/-/@gitlab-org/berlin-4.0.0.tgz"
      },
      "name": "@gitlab-org/berlin",
      "version": "4.0.0"
    }
  },
  "dist-tags": {
    "latest": "4.0.0"
  }
}
Project 611 API metadata response
{
  "name": "@gitlab-org/berlin",
  "versions": {
    "2.0.0": {
      "dist": {
        "shasum": "e7338fd83986200f3f3be378058a93c3c1a9df33",
        "tarball": "http://gdk.test:3000/api/v4/projects/611/packages/npm/@gitlab-org/berlin/-/@gitlab-org/berlin-2.0.0.tgz"
      },
      "name": "@gitlab-org/berlin",
      "version": "2.0.0"
    }
  },
  "dist-tags": {
    "latest": "2.0.0"
  }
}
The group-level response is a combination of the project-level responses. Could we leverage this and, instead of generating the metadata from scratch every time, combine the metadata caches from the projects?
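To make the idea concrete, here is a minimal sketch in Python (not GitLab's actual implementation) of combining already-cached project-level metadata documents into one group-level document. The names `merge_project_metadata` and `version_key` are hypothetical. The sketch assumes the caller passes in only the caches of projects the current user can read, and that versions are plain `MAJOR.MINOR.PATCH` strings. How `dist-tags` should be combined is left open in this issue; picking the highest version per tag is only a placeholder assumption.

```python
from typing import Any, Dict, List, Tuple

Metadata = Dict[str, Any]


def version_key(version: str) -> Tuple[int, ...]:
    # Simplification (assumption): plain "MAJOR.MINOR.PATCH" only,
    # pre-release and build suffixes are not handled.
    return tuple(int(part) for part in version.split("."))


def merge_project_metadata(name: str, project_caches: List[Metadata]) -> Metadata:
    """Combine project-level metadata documents into one group-level document.

    Hypothetical helper: `project_caches` must already be filtered down to
    the projects the requesting user is allowed to read.
    """
    versions: Dict[str, Any] = {}
    dist_tags: Dict[str, str] = {}

    for cache in project_caches:
        # The first project that provides a version wins; later duplicates are ignored.
        for version, payload in cache.get("versions", {}).items():
            versions.setdefault(version, payload)
        # Placeholder rule (assumption): keep the highest version seen for each tag.
        for tag, version in cache.get("dist-tags", {}).items():
            current = dist_tags.get(tag)
            if current is None or version_key(version) > version_key(current):
                dist_tags[tag] = version

    ordered = dict(sorted(versions.items(), key=lambda item: version_key(item[0])))
    return {"name": name, "versions": ordered, "dist-tags": dist_tags}


# Abbreviated versions of the two project responses shown above.
project_1 = {
    "name": "@gitlab-org/berlin",
    "versions": {
        "1.0.0": {"name": "@gitlab-org/berlin", "version": "1.0.0"},
        "4.0.0": {"name": "@gitlab-org/berlin", "version": "4.0.0"},
    },
    "dist-tags": {"latest": "4.0.0"},
}
project_611 = {
    "name": "@gitlab-org/berlin",
    "versions": {
        "2.0.0": {"name": "@gitlab-org/berlin", "version": "2.0.0"},
    },
    "dist-tags": {"latest": "2.0.0"},
}

group = merge_project_metadata("@gitlab-org/berlin", [project_1, project_611])
```

With this approach the per-user permission check only has to decide which project caches to include; the expensive per-package metadata generation is skipped whenever the project caches are already populated.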