Helm Metadata cache

🔥 Problem

The Helm package registry currently generates the metadata endpoint response on the fly. This involves loading package records from the database to build the JSON response.

The problem is that the number of records can be very large, which hurts performance.

To counter this, we implemented a hard limit on the number of records considered.

As documented at https://docs.gitlab.com/user/packages/helm_repository/#install-a-package, we consider only the 1000 most recent packages when building the response for the metadata endpoint.

This limit helps but is not a definitive fix. Users can still have more than 1000 packages and request one that falls outside the window of the 1000 most recent packages. When that happens, the GitLab Helm package registry returns a 404 response.
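To make the failure mode concrete, here is a minimal sketch of the windowing behaviour described above. It is a simplified model, not the GitLab implementation: the `Package` class, the `build_metadata` and `lookup_status` helpers, and the integer `created_at` timestamps are all hypothetical names introduced for illustration. Only the limit of 1000 comes from the linked documentation.

```python
from dataclasses import dataclass

MAX_PACKAGES = 1000  # limit quoted in the documentation linked above


@dataclass(frozen=True)
class Package:
    name: str
    version: str
    created_at: int  # hypothetical, monotonically increasing timestamp


def build_metadata(packages, limit=MAX_PACKAGES):
    """Index only the `limit` most recent packages, keyed by (name, version)."""
    recent = sorted(packages, key=lambda p: p.created_at, reverse=True)[:limit]
    return {(p.name, p.version): p for p in recent}


def lookup_status(metadata, name, version):
    """Return the HTTP status a client would see in this simplified model."""
    return 200 if (name, version) in metadata else 404


# 1500 versions of one chart; version 0.0.<n> was created at time n.
packages = [Package("mychart", f"0.0.{n}", n) for n in range(1500)]
metadata = build_metadata(packages)

newest = lookup_status(metadata, "mychart", "0.0.1499")  # inside the window
oldest = lookup_status(metadata, "mychart", "0.0.0")     # outside the window
```

In this model the 500 oldest versions still exist in the registry but are never indexed, so a request for any of them is answered as if the package were missing.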

🚒 Solution

We should apply the same approach that was used for npm in &9835 (closed).

In short, the metadata response should not be generated on the fly but precomputed and stored in object storage.

This precomputation can happen in the Helm background jobs. Background jobs have more lenient execution time limits, so there we can use a much higher limit or even no limit at all.

This change can also be seen as a slight performance improvement: the metadata response is precomputed once instead of being generated for each metadata request. The tradeoff is the object storage space that the cached response takes up, but that cost is very reasonable compared to the benefits.
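The split between the background job and the request path could look roughly like the sketch below. It rests on stated assumptions: `InMemoryObjectStore` is a hypothetical stand-in for real object storage, the `cache_key` layout and the shape of the JSON payload are invented for illustration, and the per-project, per-channel granularity is a guess rather than something this issue specifies.

```python
import json


class InMemoryObjectStore:
    """Hypothetical stand-in for object storage: a key/value blob store."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs.get(key)


def cache_key(project_id, channel):
    # Hypothetical key layout: one cached document per project and channel.
    return f"helm/{project_id}/{channel}/metadata.json"


def refresh_metadata_cache(store, project_id, channel, packages):
    """Background-job step: build the full response once, with no record limit."""
    entries = {}
    for pkg in packages:
        entries.setdefault(pkg["name"], []).append({"version": pkg["version"]})
    payload = json.dumps({"entries": entries}, sort_keys=True).encode("utf-8")
    store.put(cache_key(project_id, channel), payload)
    return payload


def serve_metadata(store, project_id, channel):
    """Request-time step: return the stored bytes without touching the database."""
    return store.get(cache_key(project_id, channel))


store = InMemoryObjectStore()
packages = [{"name": "mychart", "version": f"0.0.{n}"} for n in range(1500)]
refresh_metadata_cache(store, 42, "stable", packages)
body = serve_metadata(store, 42, "stable")
```

The request path reduces to a single read from storage, and all 1500 versions are present in the cached document because the job is not bound by the 1000-record window.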

Update the message on https://docs.gitlab.com/user/packages/helm_repository/#install-a-package accordingly.

This change should be hidden behind a feature flag, helm_metadata_cache.
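As a rough illustration of the gating, the sketch below picks between the two code paths based on the flag state. Everything except the flag name is an assumption: `metadata_response` and its three parameters are hypothetical, and falling back to on-the-fly generation when nothing is cached yet is one possible design choice, not a requirement stated in this issue.

```python
def metadata_response(flag_enabled, read_cached, generate_on_the_fly):
    """Pick the metadata source based on the helm_metadata_cache flag state.

    All three parameters are hypothetical: a boolean flag state and two
    zero-argument callables standing in for the two code paths.
    """
    if flag_enabled:
        cached = read_cached()
        if cached is not None:
            return cached
    # Flag off, or nothing cached yet (assumed fallback): keep today's behaviour.
    return generate_on_the_fly()


with_flag = metadata_response(True, lambda: "cached", lambda: "live")
without_flag = metadata_response(False, lambda: "cached", lambda: "live")
```

Keeping the existing path as the default means the flag can be turned off at any time without changing what clients receive today.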

Edited by Sylvia Shen