Push-based job dispatch in Job Router
Before we can implement any meaningful Job Router features described in the design doc at https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/runner_job_router (like prioritization, custom queues, scaling signals, etc.), we need to change from a pull-based job routing model to a push-based model.

Today, the runner uses a pull-based model even when connected to the Job Router: it calls a `GetJob` gRPC method, which internally uses an internal REST API endpoint similar to the existing public REST API endpoint used without the Job Router. So the Job Router is just a **proxy** right now. We need to change that.

The majority of this work, and most of its complexity, lies in how to trigger job dispatch to the runner. How will the Job Router know when a job needs to be dispatched? Likely, (some) state has to move from Rails to the Job Router so that the Job Router has authority / ownership over the current state of jobs that need to be run. The Job Router, and GitLab Relay (previously KAS) in general, is stateless (apart from a Redis cache), so introducing state is non-trivial and needs refinement. This also likely requires close collaboration with someone who has expertise in the current job routing in Rails (someone from ~"group::pipeline execution"?).

We can use this epic to refine this work and should update the design documentation accordingly.

### Business Case

Completing this epic unlocks the full potential of the Job Router. Today the router is a proxy with no real authority over job dispatch, meaning features like intelligent job routing, pipeline affinity, and job prioritization cannot be built on top of it. With push-based dispatch in place, these capabilities become independently deliverable, enabling us to address active enterprise customer requirements and positioning Runner as a scalable, policy-aware execution platform.
See https://gitlab.com/groups/gitlab-com/account-management/emea/barclays/-/work_items/6+ for some direct customer demand for these features.

### Success Metrics

* Job scheduling latency with push-based dispatch should be the same as or lower than the current polling baseline.

### Dependencies

* ~"group::pipeline execution": Moving job assignment state from Rails into the Job Router requires a clear understanding of how job state is currently stored and transitioned in the database. We might require collaboration with Pipeline Execution on these questions as we continue to break this down.
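
### Illustrative Sketch

To make the pull-versus-push distinction in this epic more concrete, here is a minimal in-memory Python sketch. It is not the Job Router implementation or its API: `PullProxy`, `PushRouter`, `Job`, the `priority` field, and the `send` callback are all hypothetical names chosen for illustration, and the in-process lists stand in for whatever state store the refinement eventually settles on.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List, Optional


@dataclass
class Job:
    id: int
    priority: int = 0  # hypothetical field; a higher value is dispatched first


class PullProxy:
    """Pull model: the router only forwards a runner's poll to the backing queue."""

    def __init__(self, rails_queue: Deque[Job]) -> None:
        # Stand-in for the job state that lives in Rails today.
        self._rails_queue = rails_queue

    def get_job(self, runner_id: str) -> Optional[Job]:
        # The runner has to ask; the router holds no state of its own.
        return self._rails_queue.popleft() if self._rails_queue else None


class PushRouter:
    """Push model: the router owns pending-job state and decides when to dispatch."""

    def __init__(self) -> None:
        self._pending: List[Job] = []
        self._idle: Deque[str] = deque()
        self._send: Dict[str, Callable[[Job], None]] = {}

    def connect(self, runner_id: str, send: Callable[[Job], None]) -> None:
        # `send` is an assumed callback representing the runner's open connection.
        self._send[runner_id] = send
        self._idle.append(runner_id)
        self._dispatch()

    def enqueue(self, job: Job) -> None:
        self._pending.append(job)
        self._dispatch()

    def job_finished(self, runner_id: str) -> None:
        self._idle.append(runner_id)
        self._dispatch()

    def _dispatch(self) -> None:
        # Push the highest-priority pending job to each idle runner.
        # Ties go to the job that was enqueued first.
        while self._pending and self._idle:
            job = max(self._pending, key=lambda j: j.priority)
            self._pending.remove(job)
            runner_id = self._idle.popleft()
            self._send[runner_id](job)
```

In the push variant, dispatch is triggered by state changes inside the router (a job arriving, a runner connecting or becoming idle) rather than by a runner poll, which is why the router needs authority over pending-job state before features such as prioritization can be layered on top.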
epic