Geo: runners cloning from secondary sites with proxying fail to find new pipeline refs

As discovered in #345267 (closed), the following QA test fails:

rspec ./qa/specs/features/browser_ui/6_release/pages/pages_pipeline_spec.rb:34 # Release Pages runs a Pages-specific pipeline

The error is:

[image: screenshot of the error]

There's some longer discussion in #345267 (comment 904358176), but essentially:

We manually create the ref in Ci::PersistentRef#create, which is called when the pipeline transitions from pending to running. No event here schedules a sync, nor does last_repository_updated_at get updated as it does in other places.

I don't think scheduling a sync for each running pipeline would be helpful or sufficient: in most cases the sync won't have finished by the time a runner picks up a job from that pipeline.

The last_repository_updated_at route can end in a chicken-and-egg problem: if we update last_repository_updated_at here in order to "bypass" pipelines, then for active repositories the value will keep changing and Git operations will always be proxied.
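
To make the trade-off concrete, here is a minimal sketch of a toy model, not GitLab code. It assumes a secondary proxies a fetch whenever its last completed sync is older than the primary's last_repository_updated_at, that every pipeline start bumps that timestamp and triggers a sync, and that a sync takes a fixed time. The method name and all parameters are hypothetical.

```ruby
# Toy model (hypothetical, not GitLab code): count how many runner fetches
# would be proxied if every pipeline start bumped last_repository_updated_at.
#
# pipeline_interval: seconds between pipeline starts on the primary
# sync_duration:     seconds a sync needs before the secondary is caught up
# fetch_delay:       seconds between a pipeline start and the runner's fetch
# pipelines:         number of pipelines to simulate
def proxied_fetches(pipeline_interval:, sync_duration:, fetch_delay:, pipelines:)
  starts = Array.new(pipelines) { |i| i * pipeline_interval }

  starts.count do |start|
    fetch_at = start + fetch_delay

    # Primary: timestamp of the most recent pipeline start seen so far.
    updated_at = starts.select { |t| t <= fetch_at }.max

    # Secondary: most recent start whose sync has already completed.
    synced_at = starts.select { |t| t + sync_duration <= fetch_at }.max || -1

    # The secondary is stale, so the fetch is proxied to the primary.
    synced_at < updated_at
  end
end

# Active repository: pipelines start faster than a sync can complete.
puts proxied_fetches(pipeline_interval: 10, sync_duration: 30, fetch_delay: 5, pipelines: 6)

# Quiet repository: the sync finishes before the runner fetches.
puts proxied_fetches(pipeline_interval: 600, sync_duration: 30, fetch_delay: 60, pipelines: 6)
```

Under these assumptions every fetch in the active repository is proxied, while the quiet repository is served locally only because the runner happens to fetch after the sync completes.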

@mkozono suggested:

Some initial thoughts on handling the special case:

  • Only the first job picked should move the pipeline to running, so perhaps subsequent jobs have a chance to avoid having their Git pull proxied?
  • Just to be clear, does a sync replicate this ref?
  • If a sync is too slow, I wonder if a special Geo event that just adds the ref on secondaries would be significantly faster
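
As a rough illustration of the last suggestion, the sketch below models a dedicated event that only adds the pipeline ref on a secondary. Everything here is hypothetical: `RefCreatedEvent`, `SecondaryRepository`, and the `refs/pipelines/<id>` naming are assumptions for illustration, not existing Geo classes, and the in-memory hash stands in for a real Git repository.

```ruby
# Hypothetical sketch: none of these names are existing Geo code.
# The event carries just enough data to recreate the ref on a secondary.
RefCreatedEvent = Struct.new(:project_id, :ref, :sha)

# In-memory stand-in for a secondary's copy of a repository.
class SecondaryRepository
  attr_reader :refs

  # objects: SHAs of commits that have already been replicated
  def initialize(objects)
    @objects = objects
    @refs = {}
  end

  # Adds the ref when the target commit is already present. Otherwise
  # reports that a regular sync is still needed, so callers can keep
  # proxying to the primary in the meantime.
  def apply(event)
    return :needs_sync unless @objects.include?(event.sha)

    @refs[event.ref] = event.sha
    :created
  end
end

repo = SecondaryRepository.new(["abc123"])
puts repo.apply(RefCreatedEvent.new(1, "refs/pipelines/42", "abc123"))
puts repo.apply(RefCreatedEvent.new(1, "refs/pipelines/43", "def456"))
```

The point of the sketch is that adding a ref is cheap when the commit has already been replicated; when it has not, the event alone is not enough and a full sync (or proxying) is still required.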