Last edited by Martin Blanchard Jan 22, 2019

execution service

This page describes a new distributed architecture for a remote-execution service. This is a work in progress!

Data Model

BuildGrid's execution service manipulates jobs submitted by peers and processed by bots:

  • Peers submit Actions and get streamed Operations back, ultimately carrying an ActionResult.
  • Bots receive Leases and produce ActionResults out of them.
  • The Job class ties the REAPI and RWAPI messages together.
classDiagram
Job *-- Action: 1:1
Job *-- Operation: 1:n
Job *-- Lease: 1:(0,1)
Job o-- ActionResult: 1:(0,1)
Job: name [str]
Job: priority [int]
Job: n_tries [int]
Job: done [bool]
Operation: name [str]
Lease: name [str]
Peer <--> Operation
Bot <--> Lease
Peer: id [str]
Bot: id [str]

Setting aside any timeline considerations:

  • An Action maps to one and only one Job.
  • The Job holds one Operation for each peer interested in its Action result.
  • The Job holds a Lease if at least one of its Operations has reached the EXECUTING stage.
  • The Job may hold an ActionResult if either:
    • The ActionCache has a hit for its Action.
    • Its Lease has reached COMPLETED and hasn't been cancelled.
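
The cardinalities above can be sketched as a small Python class. This is only an illustration, not BuildGrid's actual Job class: the string stand-ins for the Action, Lease and ActionResult messages, the name formats, and the methods register_peer, create_lease and complete are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Job:
    """Ties one Action to its Operations, optional Lease and optional ActionResult."""
    name: str
    action_digest: str                    # stand-in for the Action message (1:1)
    priority: int = 0
    n_tries: int = 0
    done: bool = False
    # peer id -> operation name: one Operation per interested peer (1:n)
    operations: Dict[str, str] = field(default_factory=dict)
    lease_name: Optional[str] = None      # at most one Lease (1:(0,1))
    action_result: Optional[str] = None   # at most one ActionResult (1:(0,1))

    def register_peer(self, peer_id: str) -> str:
        """Return this peer's Operation name, creating the Operation if needed."""
        if peer_id not in self.operations:
            # Hypothetical naming scheme, for illustration only.
            self.operations[peer_id] = f"{self.name}/operations/{len(self.operations)}"
        return self.operations[peer_id]

    def create_lease(self) -> str:
        """Create the Job's single Lease on first call; later calls reuse it."""
        if self.lease_name is None:
            self.lease_name = f"{self.name}/lease"
        return self.lease_name

    def complete(self, result: str) -> None:
        """Attach the ActionResult and mark the Job as done."""
        self.action_result = result
        self.done = True


job = Job(name="job-1", action_digest="abc/42")
job.register_peer("peer-a")
job.register_peer("peer-b")   # second peer, same Action: a second Operation
job.create_lease()            # still a single Lease for the whole Job
```

Two peers asking for the same Action end up sharing one Job and one Lease, while each keeps its own Operation to stream updates from.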

Operations life-cycle:

An Operation passes through different stages:

graph LR;
  UNKNOWN -- 1 --> CACHE_CHECK;
  CACHE_CHECK -- 3 --> QUEUED;
  UNKNOWN -- 2 --> QUEUED;
  CACHE_CHECK -- 4 --> COMPLETED;
  QUEUED -- 5 --> COMPLETED;
  QUEUED -- 6 --> EXECUTING;
  EXECUTING -- 7 --> QUEUED;
  EXECUTING -- 8 --> COMPLETED;

Stage transition details:

  1. If ExecuteRequest.skip_cache_lookup is False
  2. If ExecuteRequest.skip_cache_lookup is True
  3. On ActionCache miss
  4. On ActionCache hit or cancellation
  5. On cancellation
  6. On scheduling
  7. On bot failure
  8. On success or build failure
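
As a minimal sketch, the eight numbered edges can be encoded as a lookup table that rejects any other stage change. The names Stage, TRANSITIONS, advance and initial_stage are hypothetical and not taken from the BuildGrid code base.

```python
from enum import Enum


class Stage(Enum):
    UNKNOWN = 0
    CACHE_CHECK = 1
    QUEUED = 2
    EXECUTING = 3
    COMPLETED = 4


# Edge number from the diagram -> (source stage, destination stage)
TRANSITIONS = {
    1: (Stage.UNKNOWN, Stage.CACHE_CHECK),    # skip_cache_lookup is False
    2: (Stage.UNKNOWN, Stage.QUEUED),         # skip_cache_lookup is True
    3: (Stage.CACHE_CHECK, Stage.QUEUED),     # ActionCache miss
    4: (Stage.CACHE_CHECK, Stage.COMPLETED),  # ActionCache hit or cancellation
    5: (Stage.QUEUED, Stage.COMPLETED),       # cancellation
    6: (Stage.QUEUED, Stage.EXECUTING),       # scheduling
    7: (Stage.EXECUTING, Stage.QUEUED),       # bot failure
    8: (Stage.EXECUTING, Stage.COMPLETED),    # success or build failure
}
ALLOWED = set(TRANSITIONS.values())


def advance(current: Stage, target: Stage) -> Stage:
    """Return target if the diagram has an edge current -> target, else raise."""
    if (current, target) not in ALLOWED:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


def initial_stage(skip_cache_lookup: bool) -> Stage:
    """Pick edge 1 or 2 depending on ExecuteRequest.skip_cache_lookup."""
    target = Stage.QUEUED if skip_cache_lookup else Stage.CACHE_CHECK
    return advance(Stage.UNKNOWN, target)
```

COMPLETED has no outgoing edge in the table, so any attempt to move a finished Operation raises.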

Leases life-cycle:

A Lease passes through different states:

graph LR;
  UNSPECIFIED -- 1 --> PENDING;
  PENDING -- 2 --> CANCELLED;
  PENDING -- 3 --> ACTIVE;
  ACTIVE -- 4 --> PENDING;
  ACTIVE -- 5 --> CANCELLED;
  ACTIVE -- 6 --> COMPLETED;

State transition details:

  1. On initial emission
  2. On cancellation (server-side)
  3. On acceptance (bot-side)
  4. On bot failure
  5. On cancellation (server-side)
  6. On success or build failure
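
One way to exercise these transitions is to replay a sequence of events against a (state, event) table. The sketch below assumes hypothetical event names ("emit", "accept", "cancel", "bot_failure", "finish") chosen to match the six descriptions above; they are not part of the RWAPI.

```python
from enum import Enum
from typing import Iterable


class LeaseState(Enum):
    UNSPECIFIED = "UNSPECIFIED"
    PENDING = "PENDING"
    ACTIVE = "ACTIVE"
    COMPLETED = "COMPLETED"
    CANCELLED = "CANCELLED"


# (current state, event) -> next state; event names are illustrative only
LEASE_TABLE = {
    (LeaseState.UNSPECIFIED, "emit"): LeaseState.PENDING,   # 1. initial emission
    (LeaseState.PENDING, "cancel"): LeaseState.CANCELLED,   # 2. server-side cancellation
    (LeaseState.PENDING, "accept"): LeaseState.ACTIVE,      # 3. bot-side acceptance
    (LeaseState.ACTIVE, "bot_failure"): LeaseState.PENDING, # 4. bot failure
    (LeaseState.ACTIVE, "cancel"): LeaseState.CANCELLED,    # 5. server-side cancellation
    (LeaseState.ACTIVE, "finish"): LeaseState.COMPLETED,    # 6. success or build failure
}


def replay(events: Iterable[str],
           state: LeaseState = LeaseState.UNSPECIFIED) -> LeaseState:
    """Apply events in order and return the final state, raising on an illegal step."""
    for event in events:
        key = (state, event)
        if key not in LEASE_TABLE:
            raise ValueError(f"event {event!r} is not allowed in state {state.name}")
        state = LEASE_TABLE[key]
    return state


# A lease whose first bot fails, then gets accepted again and finishes:
final = replay(["emit", "accept", "bot_failure", "accept", "finish"])
```

Transition 4 is the only edge that moves a Lease backwards, which is what lets a Job be rescheduled on another bot.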

High-level architecture

For the execution service to scale with the number of peers and bots it handles simultaneously, the proposed architecture exposes the REAPI and RWAPI through two sets of gRPC front-end nodes, n and m of them respectively, that can be dynamically spun up and torn down. The global state is stored in a central database:

graph LR;
  DB((DB));
  Peer-1 -.-> RE-node-1;
  Peer-2 -.-> RE-node-1;
  Peer-3 -.-> RE-node-3;
subgraph Service;
  RE-node-1 --- DB;
  RE-node-2 --- DB;
  RE-node-3 --- DB;
  RE-node-n -.- DB;
  DB --- RW-node-2;
  DB --- RW-node-1;
  DB -.- RW-node-m;
end;
  RW-node-2 -.- Bot-1;
  RW-node-2 -.- Bot-2;
  RW-node-1 -.- Bot-3;
  RW-node-1 -.- Bot-4;

Considering:

  • Peer: any REAPI client (BuildStream, Bazel, RECC...)
  • Bot: any RWAPI bot (buildbox-worker, bgd-bot...)
  • RE-node: remote-execution (REAPI) front-end node
  • RW-node: remote-worker (RWAPI) front-end node
  • DB: central state database
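
The idea that front-end nodes hold no state of their own can be illustrated with a toy model in which an in-memory SQLite database stands in for the central DB. Everything here is an assumption made for the example: the jobs table layout, the ReNode and RwNode classes, and the rule that a lower priority value is served first.

```python
import sqlite3
from typing import Optional


def make_db() -> sqlite3.Connection:
    """Create the stand-in central database with a hypothetical jobs table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE jobs ("
        "name TEXT PRIMARY KEY, priority INTEGER, lease_state TEXT, bot_id TEXT)"
    )
    return db


class ReNode:
    """Toy REAPI front-end: keeps nothing but a handle on the central DB."""

    def __init__(self, db: sqlite3.Connection) -> None:
        self.db = db

    def execute(self, job_name: str, priority: int = 0) -> None:
        """Record a new job with a PENDING lease for any RW-node to pick up."""
        self.db.execute(
            "INSERT INTO jobs (name, priority, lease_state) VALUES (?, ?, 'PENDING')",
            (job_name, priority),
        )
        self.db.commit()


class RwNode:
    """Toy RWAPI front-end: hands pending jobs to bots, again via the DB only."""

    def __init__(self, db: sqlite3.Connection) -> None:
        self.db = db

    def assign(self, bot_id: str) -> Optional[str]:
        """Give the next pending job to bot_id, or return None if there is none."""
        row = self.db.execute(
            "SELECT name FROM jobs WHERE lease_state = 'PENDING' "
            "ORDER BY priority, rowid LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        self.db.execute(
            "UPDATE jobs SET lease_state = 'ACTIVE', bot_id = ? WHERE name = ?",
            (bot_id, row[0]),
        )
        self.db.commit()
        return row[0]


db = make_db()
re_node_1, re_node_2, rw_node_1 = ReNode(db), ReNode(db), RwNode(db)
re_node_1.execute("job-a", priority=5)
re_node_2.execute("job-b", priority=1)
first_job = rw_node_1.assign("bot-1")
```

Because a job submitted through either RE-node is visible to any RW-node, nodes can be added or removed without migrating state. With several RW-nodes running concurrently against a real database, the select and update steps would have to be made atomic, which this sketch does not attempt.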

RE front-end nodes

graph LR;
  DB((DB));
  CC{Controller};
  Peer -.- Capabilities;
  Peer -.- ActionCache;
  Peer -.- Execution;
  Peer -.- Operations;
subgraph RE-node;
  Capabilities --- CC;
  ActionCache  --- CC;
  Execution --- CC;
  Operations --- CC;
  CC --- DB-driver;
  CC --- CAS-client;
end;
  DB-driver -.- DB;
  CAS-client -.- CAS;

RW front-end nodes

graph LR;
  DB((DB));
  CC{Controller};
  DB -.- DB-driver;
subgraph RW-node;
  CC --- Capabilities;
  CC --- Bots;
  DB-driver --- CC;
end;
  Capabilities -.- Bot;
  Bots -.- Bot;