
Investigate: How to add support for multiple upstreams

Investigation: Multiple Upstream Support for Maven Virtual Registry

Background

Currently, the Maven virtual registry MVC supports only a single upstream repository. This investigation explores how to add support for multiple upstreams, with attention to performance, user experience, and possible feature refinements.

Investigation Areas

Performance Considerations

  • Evaluate different strategies for checking artifact existence across multiple upstreams:
    • Sequential vs parallel requests
    • Potential for request timeouts and failure handling
    • Caching strategies for upstream availability and artifact existence
    • Impact on memory usage and request latency
    • Connection pooling strategies for multiple upstreams
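To make the sequential vs parallel trade-off concrete, here is a minimal sketch in Python. It is an illustration only, not the registry's implementation. The names `Checker`, `find_sequential`, and `find_parallel` are hypothetical, and each upstream is represented by an injected callable that a real system would back with an HTTP HEAD request and its own timeout. A failed or timed-out upstream is treated as a miss in both variants, which is one possible failure policy among several.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Optional, Sequence

# Hypothetical type: a checker takes an artifact path and returns True when
# the upstream has the artifact. It may raise on network errors or timeouts.
Checker = Callable[[str], bool]
Upstreams = Sequence[tuple[str, Checker]]  # (name, checker), in priority order


def find_sequential(upstreams: Upstreams, path: str) -> Optional[str]:
    """Probe upstreams one at a time in priority order; stop at the first hit."""
    for name, check in upstreams:
        try:
            if check(path):
                return name
        except Exception:
            continue  # assumption: a failing upstream counts as a miss
    return None


def find_parallel(upstreams: Upstreams, path: str, timeout: float = 5.0) -> Optional[str]:
    """Probe all upstreams at once; return the highest-priority hit."""
    if not upstreams:
        return None
    pool = ThreadPoolExecutor(max_workers=len(upstreams))
    try:
        futures = [(name, pool.submit(check, path)) for name, check in upstreams]
        # Wait up to `timeout` seconds overall; slow upstreams are skipped.
        done, _ = wait([future for _, future in futures], timeout=timeout)
        for name, future in futures:  # still evaluated in priority order
            if future in done and future.exception() is None and future.result():
                return name
        return None
    finally:
        # Do not block the caller on upstreams that are still running.
        pool.shutdown(wait=False, cancel_futures=True)
```

The sequential variant issues the fewest requests but its worst-case latency is the sum of all upstream timeouts. The parallel variant bounds latency by a single overall timeout at the cost of one thread and one outbound request per upstream for every lookup.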

Implementation Approaches for Multi-upstream Resolution

User Interface and Configuration

  • Design API endpoints for managing multiple upstreams
  • Consider the following configuration options:
    • Priority/order of upstreams
    • Individual timeout settings
    • Health check configurations
    • Authentication settings per upstream
  • Evaluate UX patterns for managing multiple upstreams in the UI (STRETCH)
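The configuration options listed above could be modelled roughly as follows. This is a sketch under assumptions: every field name, default value, and validation rule is hypothetical and does not describe an existing GitLab schema or API. It only shows how priority, per-upstream timeout, health check, and authentication settings might sit together in one record.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class UpstreamConfig:
    # All fields are illustrative assumptions, not an existing schema.
    name: str
    url: str
    priority: int                              # lower value is tried first
    timeout_seconds: float = 5.0               # individual timeout
    health_check_interval_seconds: int = 60    # how often to probe availability
    username: Optional[str] = None             # per-upstream credentials
    password: Optional[str] = None

    def __post_init__(self) -> None:
        if not self.url.startswith(("http://", "https://")):
            raise ValueError(f"{self.name}: url must be http(s)")
        if self.timeout_seconds <= 0:
            raise ValueError(f"{self.name}: timeout must be positive")
        if (self.username is None) != (self.password is None):
            raise ValueError(f"{self.name}: username and password go together")


def resolution_order(upstreams: Sequence[UpstreamConfig]) -> list[UpstreamConfig]:
    """Return upstreams in the order they would be queried."""
    priorities = [u.priority for u in upstreams]
    if len(set(priorities)) != len(priorities):
        raise ValueError("upstream priorities must be unique")
    return sorted(upstreams, key=lambda u: u.priority)
```

Requiring unique priorities keeps the resolution order deterministic. Whether ties should instead be allowed and broken by creation order is one of the decisions this investigation would need to make.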

Shared Upstreams Implementation

Following the pattern established by JFrog Artifactory and Sonatype Nexus, implement a shared upstream repositories feature.

Implementation Considerations

  • Design database schema for shared upstream repositories
  • Determine permission model for shared upstream access
  • Define the scope of sharing (group-level vs instance-level)
  • Plan migration path for existing single-upstream configurations

Key Implementation Areas

  • Database schema changes to support shared repository references
  • API endpoints for managing shared upstreams
  • UI components for shared upstream management (STRETCH)
  • Permission system integration
  • Caching strategy for shared upstream metadata
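One possible shape for the schema changes is a join table between registries and upstreams, so that a single upstream row can be referenced by several registries. The sketch below uses Python's built-in sqlite3 module purely for illustration. The table names, column names, and the choice of group-level scoping are all assumptions, and the real schema would be defined through GitLab's own migrations.

```python
import sqlite3

# Hypothetical schema: names and the group-level scope are assumptions.
SCHEMA = """
CREATE TABLE upstreams (
    id INTEGER PRIMARY KEY,
    group_id INTEGER NOT NULL,   -- assumed sharing scope
    url TEXT NOT NULL
);
CREATE TABLE registries (
    id INTEGER PRIMARY KEY,
    group_id INTEGER NOT NULL
);
CREATE TABLE registry_upstreams (
    registry_id INTEGER NOT NULL REFERENCES registries(id),
    upstream_id INTEGER NOT NULL REFERENCES upstreams(id),
    position INTEGER NOT NULL,   -- resolution order within one registry
    PRIMARY KEY (registry_id, upstream_id),
    UNIQUE (registry_id, position)
);
"""


def build_demo_database() -> sqlite3.Connection:
    """Create an in-memory database where two registries share one upstream."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO upstreams VALUES (?, ?, ?)", [
        (1, 10, "https://repo.maven.apache.org/maven2"),
        (2, 10, "https://maven.example.com/releases"),
    ])
    conn.executemany("INSERT INTO registries VALUES (?, ?)", [(100, 10), (101, 10)])
    conn.executemany("INSERT INTO registry_upstreams VALUES (?, ?, ?)", [
        (100, 2, 1), (100, 1, 2),   # registry 100: internal first, then central
        (101, 1, 1),                # registry 101: central only (shared row)
    ])
    return conn


def ordered_upstream_urls(conn: sqlite3.Connection, registry_id: int) -> list[str]:
    """Return the upstream URLs of a registry in resolution order."""
    rows = conn.execute(
        "SELECT u.url FROM registry_upstreams ru "
        "JOIN upstreams u ON u.id = ru.upstream_id "
        "WHERE ru.registry_id = ? ORDER BY ru.position",
        (registry_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Under this layout, an existing single-upstream configuration would migrate to one `registry_upstreams` row with `position` 1, which is one way to approach the migration path mentioned above.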

Technical Considerations

Performance Metrics to Consider

  • Response time for artifact resolution
  • Memory usage during parallel requests
  • Cache hit rates
  • Error rates and timeout frequency
  • Impact on GitLab instance resources
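As a rough illustration of how some of these metrics could be derived from per-request records, the sketch below aggregates cache hit rate, failure rate, and mean latency in memory. The class and its method names are hypothetical. A production setup would more likely export counters and histograms to the existing monitoring stack than compute them this way.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ResolutionMetrics:
    """Hypothetical in-memory aggregator for artifact resolution requests."""
    cache_hits: int = 0
    cache_misses: int = 0
    timeouts: int = 0
    errors: int = 0
    latencies_ms: list = field(default_factory=list)

    def record(self, latency_ms: float, cache_hit: bool, outcome: str = "ok") -> None:
        """Record one request; outcome is "ok", "timeout", or "error"."""
        self.latencies_ms.append(latency_ms)
        if cache_hit:
            self.cache_hits += 1
        else:
            self.cache_misses += 1
        if outcome == "timeout":
            self.timeouts += 1
        elif outcome == "error":
            self.errors += 1

    @property
    def total(self) -> int:
        return self.cache_hits + self.cache_misses

    def cache_hit_rate(self) -> float:
        return self.cache_hits / self.total if self.total else 0.0

    def failure_rate(self) -> float:
        """Share of requests that ended in a timeout or an error."""
        return (self.timeouts + self.errors) / self.total if self.total else 0.0

    def mean_latency_ms(self) -> float:
        return mean(self.latencies_ms) if self.latencies_ms else 0.0
```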

Technical Questions to Resolve

  1. What is the optimal balance between parallel requests and system resources?
  2. How should we handle timeouts and failures in a multi-upstream environment?
  3. Should shared upstreams be scoped to group level or instance level?
  4. How can we ensure efficient caching with shared upstreams?
  5. What's the most efficient way to handle permissions for shared upstreams?
  6. How should we handle updates to shared upstream configurations?
  7. What monitoring and observability features are needed?
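
For question 4, one option worth evaluating is to key cached existence results by upstream rather than by registry, so that every registry referencing a shared upstream reuses the same entries. The sketch below is a minimal in-process version of that idea with a time-to-live. The class name, the default TTL, and the decision to cache negative results are all assumptions made for illustration.

```python
import time
from typing import Callable, Optional


class ExistenceCache:
    """Hypothetical TTL cache of artifact existence, keyed by (upstream, path)."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries: dict = {}

    def put(self, upstream_id: int, path: str, exists: bool) -> None:
        """Store a positive or negative result with the current timestamp."""
        self._entries[(upstream_id, path)] = (exists, self._clock())

    def get(self, upstream_id: int, path: str) -> Optional[bool]:
        """Return the cached result, or None when absent or expired."""
        key = (upstream_id, path)
        entry = self._entries.get(key)
        if entry is None:
            return None
        exists, stored_at = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]   # expired: force a fresh upstream check
            return None
        return exists
```

Because the key contains no registry identifier, a lookup made through one registry warms the cache for every other registry that shares the upstream. Any permission checks would therefore have to happen before the cache is consulted.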
Edited by Tim Rizzi