Maven virtual registry: technical debts
🔥 Problem
Due to the high implementation velocity we want to achieve, several aspects of the maven virtual registry were put to a side.
Here is the list:
-
lib/api/virtual_registries/packages/maven.rbcode organization. See !162614 (comment 2066648290).- In the same context, I opened Maven VReg: extract api specs to their own files (!172865 - merged) to extract each entity's specs into its own file.
- Moved in its own issue Virtual Maven registries: API endpoints code or... (#506835 - closed).
- [-]
expose errors to maven package manager clients. For example, if the file is not found on upstream, we should surface that.- not sure that this is possible though.
- this is not possible.
-
Add feature specs for lib/api/virtual_registries/packages/maven.rb. This allows us to test the🏓 with workhorse and increase our confidence in the accuracy of the implementation. (!173250 (merged)) -
Root group destruction will trigger 🌊 of deletes oncached_responses. Workaround that. See #468113 (comment 2075345374).- Implemented in Maven VReg: use LFK update_column_to (!170064 - merged)
-
Make sure that group statistics are properly updated. When a cache response is created or destroyed. - [-]
The download/upload endpoints accept basic auth and http headers. Http headers leads to a custom configuration of$ mvnwhereas basic auth is more straightforward. See https://docs.gitlab.com/ee/user/packages/maven_repository/#edit-the-client-configuration. Accepting only the basic auth might be worth it.- Won't do. Let's follow what we have for the maven package registry: http and basic auth headers.
-
Update the default cached validity hours to 24? (1 full day) -
Cache validity set to 0properly considered as immutable cache? (no pings to upstream, ever) -
credentials are stored in a dedicated credentialsstructure. The idea is that the class was a common one across different package types. However, during the implementation, we opted for a dedicated upstream table. Thus, maven upstreams have their own table. We don't really need a generic approach anymore. It could be worth to simplify this and simply use two encrypted columns (username/password). (Handled in !171702 (merged))-
⚠️ Keep this callback -
⚠️ Also https://gitlab.com/gitlab-org/security/gitlab/-/merge_requests/4424#note_2090640615.
-
-
Move thecache_validity_periodto the upstream table. See !164076 (comment 2097064510).- Moved in its own issue: Maven VReg: Move the cache_validity_period colu... (#500664 - closed).
- [-]
Make theCachedResponse.find_or_update_by!method trigger aIndex Only Scanquery. See !164076 (comment 2097859293).- Won't do. Not possible: the
group_idis not part of the composite primary key and given that accessing this table with agroup_idis not really part of the usage patterns = we will never add it to the composite primary key.
- Won't do. Not possible: the
-
Strange behavior observed with $ mvn. It seems that the.sha1file is not downloaded on the first time a package is downloaded. Instead, the.sha1is requested when the package is downloaded a second time. This triggers a cache creation when the package is downloaded a second time. If it is pulled a third time, then all good: all cache entries are used and the upstream is not used.- This behavior is now not impacting the performance after the delivery of Maven virtual registry: do not cache digests (#497290 - closed).
-
Implement support for: cached response exists but upstream is not responding or fails -> use the cached response. -
Verify that users can leave a trailing /in upstream URLs. This should not cause any error.- handled by this method
-
In the upload confirmation endpoint, could we get away with 0 interactions with object storage?- Moved in its own issue: Maven Virtual Registry: optimize the upload con... (#499280 - closed).
-
Does the labels of the group statistics, dependency_proxy_size, need an update to include virtual registries? -
Detect that an upstream being created is using the official repository url and automatically set the cache validity period to 0(eg. files are never checked again with upstream). This is leveraging that fact that files on Maven Central are immutable (no overwrites and no deletes). See https://central.sonatype.org/faq/can-i-change-a-component/#can-i-change-modify-delete-remove-or-update-a-component-on-central. (!174548 (merged)) - [-]
Currently, we only have a single upstream. As such, we don't really need to head the upstream first before using thesend_dependencylogic of workhorse. We could let workhorse handle all cases already (if404 Not Foundis returned by the upstream, it will return that to the client too). We save oneHEADrequests.- This will obviously not work with multiple upstreams support but could be seen as an optimization for the "single upstream" case.
- Won't do. The multiple upstream support will quickly follow up once we're done with the single upstream support. Thus, this could be a premature optimization that will be canceled in the next iteration. Effort is not worth it.
-
This line can raise errors (example: content_type missing) due to using CachedResponse.create_or_update_by!. These errors are not catched at all and end up in a500response. That's not great. The service should catch them and return a properServiceResponse.error. (!173284 (merged))
🚒 Solution
For each point above:
- open an MR to fix it.
- link the MR above.
Edited by David Fernandez