Investigate: How to add support for multiple upstreams

Investigation: Multiple Upstream Support for Maven Virtual Registry

Background

Currently, the Maven virtual registry MVC supports a single upstream repository. This investigation explores the implementation of multiple upstream support, considering performance, user experience, and potential feature refinements.

Investigation Areas

Performance Considerations

Evaluate different strategies for checking artifact existence across multiple upstreams:
- Sequential vs parallel requests
- Potential for request timeouts and failure handling
- Caching strategies for upstream availability and artifact existence
- Impact on memory usage and request latency
- Connection pooling strategies for multiple upstreams
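
As a rough illustration of the sequential versus parallel trade-off, the Python sketch below resolves an artifact against an ordered list of upstreams in both styles. It is not GitLab code: the `Upstream` tuple, the probe callables, and the `budget_s` parameter are all hypothetical stand-ins for real HTTP existence checks. Both variants return the highest-priority upstream that has the artifact and treat a failing or timed-out upstream as a miss.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical shape: (upstream name, probe returning True if the artifact exists).
Upstream = Tuple[str, Callable[[str], bool]]


def resolve_sequential(upstreams: Sequence[Upstream], path: str) -> Optional[str]:
    """Probe upstreams one at a time in priority order; stop at the first hit."""
    for name, probe in upstreams:
        try:
            if probe(path):
                return name
        except Exception:
            # A failing upstream counts as a miss; fall through to the next one.
            continue
    return None


def resolve_parallel(
    upstreams: Sequence[Upstream], path: str, budget_s: float = 2.0
) -> Optional[str]:
    """Probe all upstreams at once, then return the highest-priority hit.

    budget_s is an overall time budget shared by every probe (an assumption).
    """
    if not upstreams:
        return None
    deadline = time.monotonic() + budget_s
    pool = ThreadPoolExecutor(max_workers=len(upstreams))
    try:
        futures = [(name, pool.submit(probe, path)) for name, probe in upstreams]
        # Read results in priority order so a faster, lower-priority upstream
        # cannot win over a slower, higher-priority one that also has the file.
        for name, future in futures:
            remaining = max(0.0, deadline - time.monotonic())
            try:
                if future.result(timeout=remaining):
                    return name
            except Exception:
                # Timeouts and probe errors are both treated as a miss.
                continue
        return None
    finally:
        # Do not block the caller on probes that are still running.
        pool.shutdown(wait=False, cancel_futures=True)
```

The sequential variant issues at most one request at a time, but its worst-case latency is the sum of all probe times. The parallel variant bounds latency by the budget at the cost of one thread and one outbound request per upstream for every lookup.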

Implementation Approaches for Multi-upstream Resolution

User Interface and Configuration

- Design API endpoints for managing multiple upstreams
- Consider the following configuration options:
  - Priority/order of upstreams
  - Individual timeout settings
  - Health check configurations
  - Authentication settings per upstream
- Evaluate UX patterns for managing multiple upstreams in the UI (STRETCH)
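
To make the configuration options concrete, here is a minimal Python sketch of what a per-upstream settings record could look like. Every field name (`position`, `timeout_seconds`, `health_check_path`, and so on) and both helper functions are assumptions for illustration, not an existing GitLab API or schema.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class UpstreamConfig:
    """Hypothetical per-upstream settings record."""

    name: str
    url: str
    position: int                             # lower values are tried first
    timeout_seconds: float = 5.0              # individual timeout setting
    health_check_path: Optional[str] = None   # None means no health check
    username: Optional[str] = None            # per-upstream credentials
    password: Optional[str] = None

    def validate(self) -> List[str]:
        """Return a list of human-readable problems; empty means valid."""
        errors: List[str] = []
        if not self.url.startswith(("http://", "https://")):
            errors.append("url must start with http:// or https://")
        if self.timeout_seconds <= 0:
            errors.append("timeout_seconds must be positive")
        if (self.username is None) != (self.password is None):
            errors.append("username and password must be set together")
        return errors


def in_priority_order(upstreams: Sequence[UpstreamConfig]) -> List[UpstreamConfig]:
    """Sort upstreams by position, rejecting ambiguous (duplicate) positions."""
    positions = [u.position for u in upstreams]
    if len(set(positions)) != len(positions):
        raise ValueError("each upstream needs a distinct position")
    return sorted(upstreams, key=lambda u: u.position)
```

Keeping the order in an explicit `position` field, rather than relying on insertion order, would let an API reorder upstreams without deleting and recreating them.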

Shared Upstreams Implementation

Following the pattern established by JFrog Artifactory and Sonatype Nexus, implement a shared upstream repositories feature:

Implementation Considerations

- Design database schema for shared upstream repositories
- Determine permission model for shared upstream access
- Define the scope of sharing (group-level vs instance-level)
- Plan migration path for existing single-upstream configurations
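
One possible shape for the schema is a join table between registries and upstreams that also carries the priority order. The sketch below uses Python's built-in `sqlite3` module only so that it is runnable. The table and column names are hypothetical, group-level scoping is assumed purely for the example, and a real implementation would go through GitLab's own database migrations.

```python
import sqlite3

# Hypothetical schema: an upstream row can be referenced by many registries.
SCHEMA = """
CREATE TABLE registries (
    id       INTEGER PRIMARY KEY,
    group_id INTEGER NOT NULL,
    name     TEXT    NOT NULL
);
CREATE TABLE upstreams (
    id       INTEGER PRIMARY KEY,
    group_id INTEGER NOT NULL,
    url      TEXT    NOT NULL
);
CREATE TABLE registry_upstreams (
    registry_id INTEGER NOT NULL REFERENCES registries(id),
    upstream_id INTEGER NOT NULL REFERENCES upstreams(id),
    position    INTEGER NOT NULL,
    PRIMARY KEY (registry_id, upstream_id),
    UNIQUE (registry_id, position)
);
"""


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database with two registries sharing one upstream."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO registries VALUES (?, ?, ?)",
        [(1, 10, "team-a"), (2, 10, "team-b")],
    )
    conn.executemany(
        "INSERT INTO upstreams VALUES (?, ?, ?)",
        [(1, 10, "https://repo.maven.apache.org/maven2"),
         (2, 10, "https://example.com/maven")],
    )
    # Upstream 1 is shared by both registries; registry 1 also uses upstream 2.
    conn.executemany(
        "INSERT INTO registry_upstreams VALUES (?, ?, ?)",
        [(1, 1, 1), (1, 2, 2), (2, 1, 1)],
    )
    return conn


def upstream_urls(conn: sqlite3.Connection, registry_id: int) -> list:
    """Return the upstream URLs of a registry in priority order."""
    rows = conn.execute(
        "SELECT u.url FROM registry_upstreams ru "
        "JOIN upstreams u ON u.id = ru.upstream_id "
        "WHERE ru.registry_id = ? ORDER BY ru.position",
        (registry_id,),
    )
    return [row[0] for row in rows]
```

Under this layout, an existing single-upstream configuration would migrate to one `registry_upstreams` row with `position = 1`, so no upstream data needs to be rewritten.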

Key Implementation Areas

- Database schema changes to support shared repository references
- API endpoints for managing shared upstreams
- UI components for shared upstream management (STRETCH)
- Permission system integration
- Caching strategy for shared upstream metadata
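
For the caching item, one option worth evaluating is to key cache entries by upstream rather than by registry, so that registries sharing an upstream also share its cached results. The Python sketch below shows the idea with a small in-memory TTL cache. The class name, the `(upstream_id, path)` key, and the injectable clock are assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class UpstreamExistenceCache:
    """Hypothetical TTL cache of 'does this upstream have this path?' answers."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        # (upstream_id, path) -> (exists, expires_at)
        self._entries: Dict[Tuple[int, str], Tuple[bool, float]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, upstream_id: int, path: str) -> Optional[bool]:
        """Return the cached answer, or None if absent or expired."""
        entry = self._entries.get((upstream_id, path))
        if entry is not None and entry[1] > self._clock():
            self.hits += 1
            return entry[0]
        self.misses += 1
        return None

    def put(self, upstream_id: int, path: str, exists: bool) -> None:
        """Store an answer; negative results are cached too."""
        self._entries[(upstream_id, path)] = (exists, self._clock() + self._ttl)

    def invalidate_upstream(self, upstream_id: int) -> None:
        """Drop every entry for one upstream, e.g. after its URL changes."""
        for key in [k for k in self._entries if k[0] == upstream_id]:
            del self._entries[key]
```

Because the key contains no registry identifier, a lookup made through one registry warms the cache for every other registry that references the same upstream. Whether that is acceptable depends on the permission model, since a cached answer could reveal that another registry requested the artifact.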

Technical Considerations

Performance Metrics to Consider

- Response time for artifact resolution
- Memory usage during parallel requests
- Cache hit rates
- Error rates and timeout frequency
- Impact on GitLab instance resources
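
Several of these metrics can be derived from per-request samples. The Python sketch below computes latency percentiles, cache hit rate, error rate, and timeout rate from a list of hypothetical `RequestSample` records. The record fields and the choice of the nearest-rank percentile method are assumptions, and memory or instance resource usage would need separate tooling.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class RequestSample:
    """Hypothetical record of one artifact resolution request."""

    latency_ms: float
    cache_hit: bool
    error: bool
    timed_out: bool


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% at or below it."""
    if not values:
        raise ValueError("percentile of an empty sequence is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples: List[RequestSample]) -> Dict[str, float]:
    """Aggregate samples into the headline metrics."""
    if not samples:
        raise ValueError("no samples to summarize")
    total = len(samples)
    latencies = [s.latency_ms for s in samples]
    return {
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "cache_hit_rate": sum(s.cache_hit for s in samples) / total,
        "error_rate": sum(s.error for s in samples) / total,
        "timeout_rate": sum(s.timed_out for s in samples) / total,
    }
```

Reporting percentiles rather than a mean is a deliberate choice here: a single slow upstream tends to affect tail latency first, and an average would hide it.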

Technical Questions to Resolve

- What is the optimal balance between parallel requests and system resources?
- How should we handle timeouts and failures in a multi-upstream environment?
- Should shared upstreams be scoped to group level or instance level?
- How can we ensure efficient caching with shared upstreams?
- What's the most efficient way to handle permissions for shared upstreams?
- How should we handle updates to shared upstream configurations?
- What monitoring and observability features are needed?

Edited Jan 30, 2025 by Tim Rizzi