
WIP: Use BuildMetadata to store build configuration in OLD/YAML serialized form

Kamil Trzciński requested to merge kamil-refactor-ci-builds-v3 into master

What does this MR do?

This is my initial work on introducing a Ci::BuildConfig model, which should be used for presenting all options oriented toward the CI job and the runner.

Ideally, we should move everything that is crucial only for CI job processing to this class:

  • all variables,
  • all helper generation, and so on.

Why re-use serialization?

This is the second, simpler iteration after https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/21450. The other MR uses serialization, but it became too complex. I took another approach that allows us to be more iterative and reduces the risk of such a change. This is pretty much a no-risk change, as it touches the implementation in a very specific and targeted way.

The general outline of the plan:

  1. let us introduce the base model now,
  2. let us hide usage of the new model behind a feature flag: we can confidently test it on a large-scale system and reduce the impact,
  3. let us migrate all logic in the next iterations,
  4. let us then add columns for the new model, so that it no longer uses serialization.
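Step 2 of the outline could look roughly like the sketch below. It is plain Ruby rather than the actual GitLab code: `FeatureFlags`, `BuildSketch`, `BuildMetadataSketch`, and the flag name `:ci_build_metadata_config` are hypothetical names chosen for illustration. The idea is that a reader falls back to the legacy value unless the flag is enabled and the new record exists.

```ruby
require 'yaml'

# Hypothetical stand-in for a feature-flag store (not GitLab's Feature API).
module FeatureFlags
  @enabled = {}

  def self.enable(name)
    @enabled[name] = true
  end

  def self.disable(name)
    @enabled[name] = false
  end

  def self.enabled?(name)
    @enabled.fetch(name, false)
  end
end

# Hypothetical metadata record holding the configuration in YAML-serialized form.
BuildMetadataSketch = Struct.new(:serialized_options)

# Hypothetical build with both the legacy value and the optional new record.
class BuildSketch
  attr_reader :legacy_options, :metadata

  def initialize(legacy_options:, metadata: nil)
    @legacy_options = legacy_options
    @metadata = metadata
  end

  # Read from the new model only when the flag is on and the record exists;
  # otherwise keep the existing behaviour.
  def options
    if FeatureFlags.enabled?(:ci_build_metadata_config) && metadata
      YAML.safe_load(metadata.serialized_options)
    else
      legacy_options
    end
  end
end
```

With this shape, turning the flag off restores the old read path without touching stored data, which is what keeps the rollout low-risk.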

I consider this a minimal change, because:

  1. it allows us to introduce a life-cycle for data: this model is disposable, and we should be able to remove it aggressively (after 1-3 months),
  2. it allows us to add additional columns to store new data in a new form,
  3. it allows us to not migrate data, but rather assume that after "3 months" we no longer care about existing data,
  4. it lets us move away from serialization by adding explicit columns next, in a backward-compatible way.

Next steps

The next step will be:

  1. we are going to add soft-archiving: all builds older than 3 months will no longer be retryable or playable: https://gitlab.com/gitlab-org/gitlab-ce/issues/50939
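As a rough illustration of the soft-archiving rule, the sketch below treats a build as archived once its creation date is more than three months in the past. The helper names (`archived?`, `retryable?`, `playable?`) and the constant are assumptions made for this example; the linked issue defines the real behaviour.

```ruby
require 'date'

# Assumed cut-off, taken from the "3 months" mentioned in this description.
ARCHIVE_AFTER_MONTHS = 3

# A build is considered archived when it was created before the cut-off date.
# `Date#<<` shifts a date back by the given number of months.
def archived?(created_on, today = Date.today)
  created_on < (today << ARCHIVE_AFTER_MONTHS)
end

# Archived builds can be neither retried nor played.
def retryable?(created_on, today = Date.today)
  !archived?(created_on, today)
end

def playable?(created_on, today = Date.today)
  !archived?(created_on, today)
end
```

Because the rule only compares dates, it needs no data migration: old rows simply stop being actionable once they cross the cut-off.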

The state of different models

ci_builds - we use this for long-term, frequently updated data. Each row in this table can be updated 10-20 times, as it holds various relations and information about who performed an action on the subject and when.

ci_build_metadata - we use this to store mid-term data that is sometimes updated and contains very specific information for jobs. Right now that is the timeout, but next would be: features available on the runner, runner version, and the executor/shell/platform/system used. Each row of this table can be updated 3-4 times during the lifecycle of a build. Generally: 1. created once, 2. updated when the job is picked by a runner, 3. updated when the job is marked as finished.

What are the relevant issue numbers?

Relates to https://gitlab.com/gitlab-org/gitlab-ce/issues/50195

Does this MR meet the acceptance criteria?

Edited by Kamil Trzciński
