Create a Ruby script to populate the GDK with a representative sample of namespaces, projects, and runners

Problem

As evidenced by Slow page load times for Admin Area > Runners (#384066 - closed), backend and frontend developers lack a GDK work environment that is representative of the complex relationships found in real-world deployments (both self-managed instances and GitLab.com). We normally create projects, namespaces, and runners ad hoc, using the simplest setup that lets us test the functionality at hand (this also spares reviewers a heavy workload in reproduction steps).

Requirements

The N+1 issues uncovered in the issue above have a common theme - they happen when we have:

  • lots of runners (3000+);
  • runners spread across several projects rather than a single one, with those projects belonging to different parent groups and even different root namespaces;
  • the runners have hundreds of thousands of executed jobs (ci_builds records).

Proposal

MVP

One solution is a rake task that lives in the GitLab repo. It would create the simplest representation of the different scenarios that we must support in our daily work in Category:Runner Fleet, optionally adding enough runners and jobs to approximate the load of a production system. This could be something like the following (where runner count and job count are configurable arguments):

graph TD
    G1[Top level group 1] --> G11
    G2[Top level group 2] --> G21
    G11[Group 1.1] --> G111
    G11[Group 1.1] --> G112
    G111[Group 1.1.1] --> P1111
    G112[Group 1.1.2] --> P1121
    G21[Group 2.1] --> P211

    P1111[Project 1.1.1.1<br><i>70% of jobs, sent to first 5 runners</i>]
    P1121[Project 1.1.2.1<br><i>15% of jobs, sent to first 5 runners</i>]
    P211[Project 2.1.1<br><i>15% of jobs, sent to first 5 runners</i>]

    IR1[Instance runner]
    P1111R1[Shared runner]
    P1111R[Project 1.1.1.1 runners<br>20% total runners]
    P1121R[Project 1.1.2.1 runners<br>49% total runners]
    G111R[Group 1.1.1 runners<br>30% total runners<br><i>remaining jobs</i>]
    G21R[Group 2.1 runners<br>1% total runners]

    P1111 --> P1111R1
    P1111 --> G111R
    P1111 --> IR1
    P1111 --> P1111R
    P1121 --> P1111R1
    P1121 --> IR1
    P1121 --> P1121R
    P211 --> P1111R1
    P211 --> G21R
    P211 --> IR1

    classDef groups fill:#09f6,color:#000000,stroke:#333,stroke-width:3px;
    classDef projects fill:#f96a,color:#000000,stroke:#333,stroke-width:2px;
    class G1,G2,G11,G111,G112,G21 groups
    class P1111,P1121,P211 projects
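
The percentages in the diagram can be turned into concrete counts once the runner and job totals are known. The following is a minimal sketch in plain Ruby, not the actual rake task: the constant names and the `distribute` helper are hypothetical, and the shares are simply copied from the diagram. It uses the largest-remainder method so that the per-scope counts always add up to the requested total.

```ruby
# Shares copied from the diagram above; the constant names are hypothetical.
RUNNER_SHARES = {
  'Project 1.1.1.1 runners' => 20,
  'Project 1.1.2.1 runners' => 49,
  'Group 1.1.1 runners'     => 30,
  'Group 2.1 runners'       => 1
}.freeze

JOB_SHARES = {
  'Project 1.1.1.1' => 70,
  'Project 1.1.2.1' => 15,
  'Project 2.1.1'   => 15
}.freeze

# Splits `total` into integer counts proportional to `shares` (integer
# percentages summing to 100). Uses the largest-remainder method so the
# counts always sum to exactly `total`.
def distribute(total, shares)
  raise ArgumentError, 'shares must sum to 100' unless shares.values.sum == 100

  # For each entry keep [whole units, remainder out of 100] in integer math.
  parts = shares.transform_values { |pct| (total * pct).divmod(100) }
  counts = parts.transform_values(&:first)

  # Hand the units lost to rounding down to the largest remainders first.
  leftover = total - counts.values.sum
  parts.sort_by { |_, (_, rem)| -rem }.first(leftover).each do |name, _|
    counts[name] += 1
  end

  counts
end

runner_counts = distribute(3000, RUNNER_SHARES)
job_counts = distribute(100_000, JOB_SHARES)
```

With 3000 runners this yields 600, 1470, 900, and 30 runners for the four scopes, and 100,000 jobs split into 70,000, 15,000, and 15,000. A real task would presumably take both totals as rake arguments.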

Sub-tasks

  • Create rake task
  • Option to set a group/project prefix, to avoid clashes on subsequent runs and to allow creating more load
  • Create group/project hierarchy
  • Create runners with random versions and ci_runner_versions records
  • Assign random tags to runners
  • Assign random executors to runners
  • Create fake jobs with actual durations
  • Create an instance runner as part of the seed
  • Create fake merge requests associated with pipelines
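
Several of the sub-tasks above (name prefix, random versions, tags, and executors, and fake job durations) come down to generating randomized attributes. The sketch below shows one way this could look in plain Ruby. It only builds attribute hashes and does not touch the database; the helper names, the value pools, and the hash keys are all assumptions made for illustration, not the task's real implementation. Passing a seeded `Random` keeps the generated data reproducible between runs.

```ruby
# Illustrative value pools (assumptions, not taken from the issue).
EXECUTORS = %w[shell docker docker+machine kubernetes custom].freeze
TAGS      = %w[linux macos windows docker gpu arm64].freeze
VERSIONS  = %w[14.10.1 15.0.0 15.5.1 15.6.0 15.7.1].freeze

# Builds `count` attribute hashes for fake runners. `prefix` keeps names
# from clashing across runs; `rng` makes the output reproducible.
def fake_runner_attributes(count, prefix:, rng: Random.new)
  Array.new(count) do |i|
    {
      description: "#{prefix}runner-#{i + 1}",
      version: VERSIONS.sample(random: rng),
      executor: EXECUTORS.sample(random: rng),
      # Between one and three distinct tags per runner.
      tag_list: TAGS.sample(rng.rand(1..3), random: rng).sort
    }
  end
end

# Picks a finish time within the last `max_age` seconds and a start time
# up to `max_duration` seconds before it, so every fake job has a
# non-zero duration that ends in the past.
def fake_job_timestamps(rng:, now: Time.now, max_age: 30 * 24 * 3600, max_duration: 3600)
  finished_at = now - rng.rand(0..max_age)
  started_at = finished_at - rng.rand(1..max_duration)
  { started_at: started_at, finished_at: finished_at }
end

runners = fake_runner_attributes(3, prefix: 'seed1-', rng: Random.new(42))
timestamps = fake_job_timestamps(rng: Random.new(42))
```

Rerunning with a different prefix (for example `seed2-`) would produce a second, non-clashing batch, which is how the prefix option could also be used to add more load.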

Future iterations

  1. Option to nuke existing data and replace it with an up-to-date version.

Advantages

  • Working day-to-day with a database that more closely resembles that of a production system;
  • Having a standardized work environment that is common to all involved stakeholders (backend/frontend/UX/product). This allows them to more easily communicate about usage scenarios and be aware of performance issues earlier in the development cycle;
  • Any future improvements are made in a single source of truth (SSOT) and can be quickly distributed to other team members;
  • Having this tool also significantly improves the MR creation/review experience: instead of walking a reviewer through creating a couple of projects and then running gitlab-runner register to create a shared project runner, we could just link the tool, have them do a simple run, and refer them to runner 3.1.1 🎉
Edited by Pedro Pombeiro