Document how we're using Knapsack in our development documentation

We'll look at this pipeline in this issue: https://gitlab.com/gitlab-org/gitlab-ce/pipelines/7050679

As I understand our current CI setup, we have:

  1. The knapsack job in the prepare stage, which is supposed to ensure we have a knapsack/rspec_report.json file
  • the knapsack/rspec_report.json file is fetched from the cache with the knapsack key; if it's not there, we initialize the file with {}
  2. Each rspec x y job uses knapsack rspec and should get an evenly distributed share of the tests
  • it works because the jobs have access to knapsack/rspec_report.json, since "Note that artifacts from all previous stages are passed by default." [1]
  • the jobs set their own report path to KNAPSACK_REPORT_PATH=knapsack/rspec_node_${CI_NODE_INDEX}_${CI_NODE_TOTAL}_report.json
  • Note: if knapsack is doing its job, test files that are run should be listed under Report specs, not under Leftover specs
  3. The update-knapsack job takes all the knapsack/rspec_node_${CI_NODE_INDEX}_${CI_NODE_TOTAL}_report.json files from the rspec x y jobs and merges them into a single knapsack/rspec_report.json, which is then cached with the knapsack key
  4. The next pipeline will use the up-to-date knapsack/rspec_report.json file.
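
To make the merge step concrete, here is a minimal Python sketch of what the update-knapsack job is expected to do conceptually. It assumes each per-node report is a flat JSON object mapping a spec file path to its runtime in seconds; the file names and timings below are hypothetical, and this is not the actual script our CI runs.

```python
import json


def merge_reports(node_reports):
    """Merge per-node reports (spec file -> runtime in seconds) into one dict."""
    merged = {}
    for report in node_reports:
        # Each spec file runs on exactly one node, so keys should not collide.
        merged.update(report)
    return merged


# Hypothetical per-node reports, as they might look after json.load().
node_0 = {"spec/models/user_spec.rb": 12.5}
node_1 = {"spec/models/project_spec.rb": 30.0}

merged = merge_reports([node_0, node_1])
merged_json = json.dumps(merged, indent=2, sort_keys=True)
```

The merged dictionary is what would be written to knapsack/rspec_report.json and cached for the next pipeline.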

The problem

If I'm understanding the setup correctly, there's something that I found strange: if you look at the RSpec jobs at https://gitlab.com/gitlab-org/gitlab-ce/pipelines/7050679/builds, the 20 jobs took 45, 30, 40, 68, 25, 27, 27, 38, 29, 56, 28, 78, 32, 26, 38, 43, 44, 45, 55, 45 minutes respectively. This is a total of 819 minutes (more than 13 hours!), and if the tests were distributed evenly, each job would take about 41 minutes. This is clearly not the case: the durations range from 25 to 78 minutes.
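
The arithmetic can be checked with a few lines of Python, using only the job durations listed above:

```python
# Job durations in minutes, copied from the pipeline's RSpec jobs.
durations = [45, 30, 40, 68, 25, 27, 27, 38, 29, 56,
             28, 78, 32, 26, 38, 43, 44, 45, 55, 45]

total = sum(durations)                    # total minutes across all jobs
mean = total / len(durations)             # ideal duration per job if balanced
shortest, longest = min(durations), max(durations)

print(f"{len(durations)} jobs, {total} min total, {mean:.2f} min mean")
print(f"shortest {shortest} min, longest {longest} min")
```

Since the jobs run in parallel, the stage takes as long as its slowest job: 78 minutes, nearly twice the balanced ideal of about 41.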

A look into the artifacts

  1. Looking at the artifacts from the knapsack job, I noticed something strange: knapsack/rspec_report.json contains only {}, so either it was not in the cache, or the cached file contained {}.
  2. The artifacts from the rspec x y jobs look good: they include the list of test files that were run and their runtime, e.g. https://gitlab.com/gitlab-org/gitlab-ce/builds/12349862/artifacts/file/knapsack/rspec_node_0_20_report.json
  3. The artifact from the update-knapsack job looks good: it includes the list of all the test files that were run from all the rspec x y jobs, and their runtime, e.g. https://gitlab.com/gitlab-org/gitlab-ce/builds/12349985/artifacts/file/knapsack/rspec_report.json
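
To illustrate why an empty report matters, here is a rough Python sketch. It is an assumption-laden model, not Knapsack's actual implementation: it uses a simple greedy "slowest file to the least-loaded node" strategy, and the spec files and timings are hypothetical. The point is only that with {} as the report, every spec file is a leftover and there is no timing data to balance on.

```python
import heapq


def split_report_and_leftover(spec_files, report):
    """Partition spec files into those with a recorded runtime and those without."""
    report_specs = [f for f in spec_files if f in report]
    leftover_specs = [f for f in spec_files if f not in report]
    return report_specs, leftover_specs


def distribute_by_time(report, node_total):
    """Greedy balancing: give the slowest remaining file to the least-loaded node."""
    heap = [(0.0, node, []) for node in range(node_total)]
    heapq.heapify(heap)
    for spec, seconds in sorted(report.items(), key=lambda kv: kv[1], reverse=True):
        load, node, files = heapq.heappop(heap)
        files.append(spec)
        heapq.heappush(heap, (load + seconds, node, files))
    return {node: (load, files) for load, node, files in heap}


# Hypothetical spec files and runtimes in seconds.
spec_files = ["spec/a_spec.rb", "spec/b_spec.rb", "spec/c_spec.rb", "spec/d_spec.rb"]
full_report = {"spec/a_spec.rb": 60.0, "spec/b_spec.rb": 30.0,
               "spec/c_spec.rb": 20.0, "spec/d_spec.rb": 10.0}

balanced = distribute_by_time(full_report, 2)
known, leftover = split_report_and_leftover(spec_files, {})  # empty report
```

With the full report, both nodes end up with the same total runtime. With the empty report, `known` is empty and all four files land in `leftover`, so any distribution has to fall back on something other than measured runtimes.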

What could go wrong?

Since reports are generated correctly in the rspec x y and update-knapsack jobs, the only possibility I see is that our cache mechanism doesn't work (or that we configured it incorrectly) and knapsack/rspec_report.json is not retrieved correctly by the knapsack job...

For reference, both the knapsack and update-knapsack jobs share this common definition:

.knapsack-state: &knapsack-state
  services: []
  variables:
    SETUP_DB: "false"
    USE_BUNDLE_INSTALL: "false"
  cache:
    key: "knapsack"
    paths:
    - knapsack/
  artifacts:
    expire_in: 31d
    paths:
    - knapsack/

This allows the knapsack job to retrieve knapsack/rspec_report.json from the cache and pass it to the later stages as an artifact, and it also allows the update-knapsack job to cache the up-to-date knapsack/rspec_report.json.

@ayufan Any thoughts on this?

  1. https://docs.gitlab.com/ce/ci/yaml/README.html#dependencies