Support for downloading sources from mirrors
This describes the client-side support for source mirroring as discussed in this thread. For details on the automation of project-driven mirroring, see separate issue #330.
This is largely based on the extension of my counter proposal in this email.
The following draft assumes that a given source alias is mirrored as a single unit. This makes the implementation a bit simpler, but requires that a project.conf declare its aliases in a way that matches the chosen mirroring solution. That is to say, if you have a lot of git repositories in the same upstream location but only wish to mirror a portion of them, you must use separate source aliases to refer to them in your project.
What is a "mirror"
We are going to treat a "mirror" as an abstract thing. Its configuration is elaborate enough to interoperate with any mirroring solution, while also allowing a simpler configuration for a more turnkey solution, where BuildStream has an opportunity to provide something more practical and easy to use.
A mirror, in BuildStream terms, is a configuration object declared in a project.conf, which may define zero or more mirrors associated with the given project.
Some general rules about a mirror:
- A single mirror is related to a geographical location
- The first defined mirror is the default mirror
- User configuration, or possibly command line options, should allow the user to select a preferred mirror
- A mirror defines a single alias mapping for a given source alias (see below for a description of an alias mapping)
- There is no restriction on the domains of alias mappings within a single mirror. We expect them to be grouped together by geographical location, but this need not be true.
- A mirror definition is allowed to be sparse, such that not all source aliases defined by the project need to have an alias mapping
- This is mostly a provision for projects which manage their own central VCS repositories for a handful of sources
- A mirror may address and override alias mappings in subprojects accessed via junctions
- This allows higher level projects to easily redirect fetches for subproject sources from a more efficient location
Structure of an alias mapping:
- An alias mapping is used to indicate one or more alternative URLs with which BuildStream can resolve a source alias
- The ability to list multiple URLs for an alias inside a single mirror definition is a provision to group multiple mirrors into a single logical geographic location. It allows fallback repositories for cases where an upstream source has been modified, which is required when:
- An upstream tarball retains the same name, but was manually updated, such that the upstream URI remains the same, but the sha256sum has changed
- An upstream VCS repository has undergone surgery, such that for example, a git commit sha which a project is using has disappeared from upstream by means of a history rewrite.
- There is an open question as to whether using multiple URIs in an alias mapping for the purpose described above should be a rule rather than just a provision. If we enforce this structure as a rule, we can make some assumptions, such as:
- The order of URIs in the mapping is meaningful, such that earlier URIs in the list represent older versions of the same repo, meaning that it is always appropriate to bst track with later URIs in the list
Configuration example for project.conf
aliases:
  foo-git: git://git.foo.com/git
  foo-tar: https://download.foo.com/sources
mirrors:
- location-name: united-kingdom
  aliases:
    foo-git:
    - git://git.foo.co.uk/git
    foo-tar:
    - https://download.foo.co.uk/sources0
    - https://download.foo.co.uk/sources1
    # Subproject "bar" gits
    bar:bar-git:
    - git://git.bar.co.uk/git
- location-name: korea
  aliases:
    foo-git:
    - git://git.foo.co.kr/git
    foo-tar:
    - https://download.foo.co.kr/sources0
    - https://download.foo.co.kr/sources1
    # Subproject "bar" gits
    bar:bar-git:
    - git://git.bar.co.uk/git
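To make the rules about default mirrors, user preference and sparse mappings concrete, here is a minimal Python sketch. It is not BuildStream code: the PROJECT dictionary is a trimmed, hand-written copy of the example configuration, and the helper names select_mirror() and alias_mapping() are hypothetical. It also assumes that subproject mappings are looked up with a "subproject:alias" key, as the example suggests.

```python
# Hypothetical in-memory form of a project.conf; the keys follow the YAML
# example, but nothing here is BuildStream's actual loader or data model.
PROJECT = {
    "aliases": {
        "foo-git": "git://git.foo.com/git",
        "foo-tar": "https://download.foo.com/sources",
    },
    "mirrors": [
        {
            "location-name": "united-kingdom",
            "aliases": {
                "foo-git": ["git://git.foo.co.uk/git"],
                "foo-tar": [
                    "https://download.foo.co.uk/sources0",
                    "https://download.foo.co.uk/sources1",
                ],
                # Mapping that overrides an alias of subproject "bar"
                "bar:bar-git": ["git://git.bar.co.uk/git"],
            },
        },
        {
            # Deliberately sparse: no foo-tar mapping in this mirror
            "location-name": "korea",
            "aliases": {"foo-git": ["git://git.foo.co.kr/git"]},
        },
    ],
}


def select_mirror(project, preferred=None):
    """Return the user's preferred mirror, else the first declared mirror."""
    mirrors = project.get("mirrors", [])
    for mirror in mirrors:
        if mirror["location-name"] == preferred:
            return mirror
    # The first defined mirror is the default; None if no mirrors exist.
    return mirrors[0] if mirrors else None


def alias_mapping(mirror, alias, subproject=None):
    """Return the mirror's URL list for an alias, or [] if it has none."""
    key = f"{subproject}:{alias}" if subproject else alias
    return list(mirror["aliases"].get(key, []))


if __name__ == "__main__":
    default = select_mirror(PROJECT)
    print(default["location-name"], alias_mapping(default, "foo-tar"))
```

Returning an empty list for a missing alias keeps sparse mirrors cheap to handle: callers can simply skip to the next mirror or to the upstream URL.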
Runtime behaviors
When running a fetch or track task for a given Source, multiple iterations must now be made such that we can try multiple aliases.
First, we can compose a normalized list of URIs to traverse for a given source alias somewhere in the pipeline initialization phase.
At fetch time, the general order should usually be:
- First try the default mirror
- Iterate through alias mappings for the given mirror
- Open question: While iterating through alias mappings, should we optimize by assuming that if we have not yet found the ref we are looking for in a fetch task, and that the repository is unreachable for the next URI, we can safely move on to the next mirror? This would turn our expectation of how the alias mappings work into a rule.
- Try the remaining mirrors in the order they are declared in the project.conf
- Resort to the true upstream URL, i.e. default expansion of the source alias itself.
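The fetch order above can be sketched as a function that flattens the mirrors into one list of base URLs. This is only an illustration under stated assumptions: MIRRORS is a hand-written stand-in for the parsed configuration, fetch_candidates() is a hypothetical name, and dropping duplicate URLs is an assumption about what "normalized" should mean.

```python
# Hypothetical parsed form of the "mirrors" section of a project.conf.
MIRRORS = [
    {
        "location-name": "united-kingdom",
        "aliases": {
            "foo-tar": [
                "https://download.foo.co.uk/sources0",
                "https://download.foo.co.uk/sources1",
            ],
        },
    },
    {
        "location-name": "korea",
        "aliases": {
            "foo-tar": [
                "https://download.foo.co.kr/sources0",
                "https://download.foo.co.kr/sources1",
            ],
        },
    },
]


def fetch_candidates(upstream, mirrors, alias, preferred=None):
    """List the base URLs to try for `alias`, in fetch order."""
    names = [m["location-name"] for m in mirrors]
    # Use the preferred mirror if it exists, else the first declared one.
    first = preferred if preferred in names else (names[0] if names else None)
    ordered = [m for m in mirrors if m["location-name"] == first]
    # Remaining mirrors keep their declaration order.
    ordered += [m for m in mirrors if m["location-name"] != first]

    candidates = []
    for mirror in ordered:
        # A sparse mirror contributes nothing for an alias it does not map.
        for url in mirror["aliases"].get(alias, []):
            if url not in candidates:  # assumption: duplicates are dropped
                candidates.append(url)
    # The true upstream URL is the last resort.
    if upstream not in candidates:
        candidates.append(upstream)
    return candidates


if __name__ == "__main__":
    for url in fetch_candidates(
        "https://download.foo.com/sources", MIRRORS, "foo-tar", preferred="korea"
    ):
        print(url)
```

Building the list once, for example during pipeline initialization, keeps the fetch task itself a plain loop over candidates.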
At track time, things are a bit more complicated. We may want to do the whole thing in reverse, such that we are guaranteed to always make an attempt to track the latest by default.
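One possible reading of "in reverse" is to take the fetch-ordered list of URLs and walk it backwards, so that the true upstream is tried first and, within a mirror, later (newer) URIs come before earlier ones. The sketch below assumes exactly that; track_candidates() and track_first_available() are hypothetical names, and try_track stands in for whatever a real track task would do.

```python
def track_candidates(fetch_order):
    """Reverse a fetch-ordered URL list so upstream is tried first."""
    return list(reversed(fetch_order))


def track_first_available(candidates, try_track):
    """Return (url, ref) for the first candidate that yields a ref.

    `try_track` is a caller-supplied callable taking a URL and returning
    a ref string, or None when the location cannot be tracked.
    """
    for url in candidates:
        ref = try_track(url)
        if ref is not None:
            return url, ref
    return None, None


if __name__ == "__main__":
    fetch_order = [
        "git://git.foo.co.uk/git",  # mirror
        "git://git.foo.com/git",    # true upstream, last for fetch
    ]
    # Pretend upstream is unreachable and only the mirror answers.
    fake_refs = {"git://git.foo.co.uk/git": "abc123"}
    print(track_first_available(track_candidates(fetch_order), fake_refs.get))
```

Whether a mirror is an acceptable fallback for tracking at all is left open here, since a mirror may lag behind upstream.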
Implementation of iterating over aliases
The Source plugin facing API should remain unchanged for this; existing plugins can continue to function without any added bells or whistles.
Initially, Source objects are instantiated in the main data model with source aliases resolved to the true upstream default URLs in the normal way.
During this initial instantiation for the main data model, we can infer which source alias was used by the given Source object by observing its call to Source.translate_url(). This allows us to construct a list of URLs later on which should be tried.
Inside the TrackQueue and FetchQueue tasks, the Source may be reinstantiated multiple times. For each try in this context, we make Source.translate_url() resolve the URI to something different.
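The mechanism described above can be sketched with a stand-in class. SketchSource is not BuildStream's Source class and does not reproduce its constructor; only the method name translate_url() is taken from the text. The override argument, the used_aliases list, the attempt_each() helper and the "alias:rest" URL form with plain concatenation are all assumptions made for illustration.

```python
class SketchSource:
    """Stand-in for a Source plugin; NOT BuildStream's real class."""

    def __init__(self, url, aliases, override=None):
        self._aliases = aliases          # alias -> default upstream base URL
        self._override = override or {}  # alias -> base URL for this attempt
        self.used_aliases = []           # aliases observed by translate_url()
        self.url = self.translate_url(url)

    def translate_url(self, url):
        """Expand "alias:rest" and record which alias was used."""
        alias, sep, rest = url.partition(":")
        if not sep or alias not in self._aliases:
            return url  # not an aliased URL, leave it untouched
        self.used_aliases.append(alias)
        base = self._override.get(alias, self._aliases[alias])
        return base + rest


def attempt_each(url, aliases, bases, try_fetch):
    """Reinstantiate the source once per base URL until a fetch succeeds."""
    # Initial instantiation: resolves to upstream and reveals the alias.
    probe = SketchSource(url, aliases)
    if not probe.used_aliases:
        return probe.url if try_fetch(probe.url) else None
    alias = probe.used_aliases[0]
    for base in bases:
        # Each try gets a fresh object whose alias resolves differently.
        source = SketchSource(url, aliases, override={alias: base})
        if try_fetch(source.url):
            return source.url
    return None


if __name__ == "__main__":
    aliases = {"foo-git": "git://git.foo.com/git"}
    bases = ["git://git.foo.co.uk/git", "git://git.foo.com/git"]
    reachable = {"git://git.foo.com/git/project.git"}
    print(attempt_each("foo-git:/project.git", aliases, bases,
                       lambda u: u in reachable))
```

Because the plugin only ever calls translate_url(), the retry logic stays outside the plugin, which is what lets the plugin facing API remain unchanged.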