
Optimise `builds.each` of `RegisterJobService`

Problem

Currently `RegisterJobService` uses `builds.each` to iterate over the candidate builds when picking the next one, but:

  • sometimes we fetch 1k objects
  • we allocate and populate all of them at once
  • we over-allocate memory
  • sometimes the queue depth is 1-3, which means that the object returned is already stale; this increases the number of 409s, `can_pick?` failures and `InvalidStateMachine`-type errors
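To illustrate the staleness point, here is a minimal plain-Ruby simulation (no Rails; `Build`, `store` and `try_pick` are hypothetical stand-ins, not GitLab code). All candidates are copied into memory up front, another runner picks some of them in the meantime, and our pick attempts on the stale in-memory copies then fail, which is roughly what shows up as a 409 or a failed `can_pick?` check.

```ruby
# Hypothetical in-memory stand-in for the builds table (not GitLab code).
Build = Struct.new(:id, :status)

# Compare-and-set pick: succeeds only if the *stored* row is still pending,
# mimicking the state check that fails when we act on a stale object.
def try_pick(store, build)
  row = store[build.id]
  return false unless row.status == :pending

  row.status = :running
  true
end

store = (1..5).to_h { |id| [id, Build.new(id, :pending)] }

# Eager load: copy every candidate into memory at once, as builds.each does.
snapshot = store.values.map(&:dup)

# Meanwhile, another runner picks builds 1 and 2.
store[1].status = :running
store[2].status = :running

conflicts = 0
picked = snapshot.find do |build|
  # The in-memory copy still says :pending, so we attempt the pick anyway.
  next false unless build.status == :pending

  if try_pick(store, build)
    true
  else
    conflicts += 1 # stale object: the row was already picked elsewhere
    false
  end
end

puts "picked=#{picked.id} conflicts=#{conflicts}"
```

Every failed attempt here costs work (and, in the real service, a state-machine error) only because the decision was made on data loaded before the loop started.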

Possible optimisations

We cannot really use `.find_each(batch_size: 1)`, as it imposes a limit and requires predictable sorting (it iterates in primary-key order), whereas builds are sorted by different criteria, some of which do not follow a sequential ordering. What we can do:

  • fetch each build individually
  • this should still have a very large positive effect on the system, by reducing picking violations
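The ordering problem with batching can be shown without Rails. `find_each` walks the table in primary-key order, so a queue sorted by another criterion (here a hypothetical `priority` field) comes back in a different order, while plucking the IDs of the already-sorted relation and looking each one up keeps the intended order. A minimal sketch, with `Build` and `BUILDS` as illustrative stand-ins:

```ruby
# Hypothetical stand-ins: a build with an id and a queue priority.
Build = Struct.new(:id, :priority)

BUILDS = [
  Build.new(1, 3),
  Build.new(2, 1),
  Build.new(3, 2),
  Build.new(4, 1)
].freeze

# The order the queue wants: lowest priority value first, ties broken by id.
queue = BUILDS.sort_by { |b| [b.priority, b.id] }

# find_each-style batching ignores that order and walks primary keys instead.
def pk_batched_ids(rows, batch_size)
  rows.sort_by(&:id).each_slice(batch_size).flat_map { |batch| batch.map(&:id) }
end

batched_order = pk_batched_ids(BUILDS, 1)

# Proposed scheme: pluck ids from the already-sorted queue, then look each one up.
by_id = BUILDS.to_h { |b| [b.id, b] }
plucked_order = queue.map(&:id).map { |id| by_id.fetch(id).id }

puts "find_each-style order: #{batched_order.inspect}"
puts "pluck + per-id order:  #{plucked_order.inspect}"
```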

Change the loop to:

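```ruby
build_ids = builds.pluck(:id)
histogram_queue_size.observe(build_ids.size)

build_ids.each do |build_id|
  # Load each build just before considering it; skip it if it was removed meanwhile.
  build = Ci::Build.find_by(id: build_id)
  next unless build

  # ... existing picking logic, unchanged ...
end
```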
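The per-ID loop can be exercised outside Rails with a small stand-in. In this sketch `store` plays the role of the table, the hypothetical `fetch_build` plays the role of a fresh per-ID lookup (returning `nil` when the row is gone, like `find_by`), and `queue_size` / `queue_depth` are the two numbers discussed in this issue. Because each build is read right before we decide on it, a build picked by another runner after the IDs were plucked is simply skipped instead of producing a conflict.

```ruby
# Hypothetical in-memory stand-in for the builds table (not GitLab code).
Build = Struct.new(:id, :status)

store = (1..5).to_h { |id| [id, Build.new(id, :pending)] }

# Step 1: pluck only the IDs (no full build objects are built yet).
build_ids = store.keys

# After plucking: another runner picks build 1, and build 2 is deleted.
store[1].status = :running
store.delete(2)

# Hypothetical fresh lookup; returns nil if the row no longer exists (like find_by).
def fetch_build(store, id)
  store[id]&.dup
end

queue_size = build_ids.size
queue_depth = 0
picked = nil

build_ids.each do |build_id|
  queue_depth += 1
  build = fetch_build(store, build_id)
  next unless build                    # deleted since we plucked the IDs
  next unless build.status == :pending # already picked elsewhere: skip, no conflict

  store[build_id].status = :running    # pick it
  picked = build
  break
end

puts "queue_size=#{queue_size} queue_depth=#{queue_depth} picked=#{picked&.id}"
```

In the real service a window between the read and the pick still exists; it is just much shorter than with a snapshot taken before the loop starts.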

We could test this hypothesis by switching the iteration scheme behind a feature flag.
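One way to run that experiment is to keep both iteration schemes side by side and choose between them with a flag. In GitLab this would be a `Feature.enabled?` check; in the self-contained sketch below the flag is a plain boolean, and `eager_builds`, `builds_by_id`, `pick_next` and `per_id_iteration` are hypothetical names. Each scheme also counts its queries and the build objects it materialises, as rough stand-ins for `db_count` and `mem_bytes`.

```ruby
# Hypothetical stand-ins (not GitLab code).
Build = Struct.new(:id, :status)
Stats = Struct.new(:queries, :objects)

# Current scheme: one query that materialises every candidate up front.
def eager_builds(store, stats)
  stats.queries += 1
  builds = store.values.map(&:dup)
  stats.objects += builds.size
  builds
end

# Proposed scheme: one query for the IDs, then one query per build actually visited.
def builds_by_id(store, stats)
  stats.queries += 1
  ids = store.keys
  Enumerator.new do |yielder|
    ids.each do |id|
      stats.queries += 1
      row = store[id]
      next unless row

      stats.objects += 1
      yielder << row.dup
    end
  end
end

# Pick the first pending build; `per_id_iteration` plays the role of the feature flag.
def pick_next(store, per_id_iteration:)
  stats = Stats.new(0, 0)
  builds = per_id_iteration ? builds_by_id(store, stats) : eager_builds(store, stats)
  picked = builds.find { |build| build.status == :pending }
  [picked&.id, stats]
end

store = (1..1000).to_h { |id| [id, Build.new(id, id < 3 ? :running : :pending)] }

[false, true].each do |flag|
  id, stats = pick_next(store, per_id_iteration: flag)
  puts "flag=#{flag} picked=#{id} queries=#{stats.queries} objects=#{stats.objects}"
end
```

In this toy model the per-ID scheme issues more, smaller queries per request while building far fewer objects; whether total `db_count` drops for the whole system depends on how many conflicting picks and retries it removes, which is what the flag experiment would measure.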

Metrics to look at

  • Number of SQL queries (`db_count`)
  • Amount of allocated memory (`mem_bytes`)
  • The queue depth (`queue_depth`, i.e. how many build IDs we had to iterate over to pick the next build)

For each of them we should see a reduction in total resource usage across the whole system.
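Queue depth is easiest to compare between the two schemes as a distribution rather than a single number. The sketch below buckets observations the way a Prometheus histogram does (cumulative `le` buckets plus an implicit `+Inf` bucket); `DepthHistogram`, the bucket bounds and the sample depths are made up purely for illustration and are not measurements.

```ruby
# Minimal cumulative histogram in the style of a Prometheus histogram:
# each bucket counts observations <= its upper bound ("le").
class DepthHistogram
  attr_reader :count, :sum

  def initialize(bounds)
    @bounds = bounds.sort
    @buckets = Array.new(@bounds.size, 0)
    @count = 0
    @sum = 0
  end

  def observe(value)
    @count += 1
    @sum += value
    @bounds.each_with_index do |bound, i|
      @buckets[i] += 1 if value <= bound
    end
  end

  # Returns { bound => cumulative count }, plus the implicit +Inf bucket.
  def buckets
    @bounds.zip(@buckets).to_h.merge(Float::INFINITY => @count)
  end
end

# Hypothetical bounds and sample depths, purely for illustration.
histogram = DepthHistogram.new([1, 2, 5, 10])
[1, 1, 3, 2, 12, 1].each { |depth| histogram.observe(depth) }

puts histogram.buckets.inspect
puts "mean depth: #{histogram.sum.fdiv(histogram.count)}"
```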

Edited by Kamil Trzciński