Cache size calculation thrashes IO too much

Summary

BuildStream performs a full calculation of the cache size every time an artifact is committed to or pulled into the cache.

Though the page cache helps on subsequent calculations, it is still a wasteful operation to do after every commit and pull. In addition, this operation doesn't scale as the artifact cache grows in size.
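
To illustrate why a full calculation is expensive, here is a minimal sketch of what such a scan amounts to. It is not BuildStream's actual code; `full_cache_size` is a hypothetical name, and the sketch only assumes the cache is a directory tree of regular files.

```python
import os


def full_cache_size(cache_dir):
    """Return the total size in bytes of all files under cache_dir.

    Hypothetical helper for illustration only. Every file in the tree
    costs one lstat() call, so the work is proportional to the number
    of files in the cache.
    """
    total = 0
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            # lstat() does not follow symlinks, so a link contributes
            # its own size rather than the size of its target.
            total += os.lstat(os.path.join(root, name)).st_size
    return total
```

Running a scan like this after every commit and pull repeats the whole walk even when only a single artifact has changed.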

Possible fixes

It would be more efficient to:

  1. Calculate the full cache size once at startup.
  2. Modify ArtifactCache.commit() and ArtifactCache.pull() to return the number of bytes added to the cache.
  3. Dynamically update the cache size by adding up the bytes returned from ArtifactCache.commit() and ArtifactCache.pull().
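
The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: `CacheSizeTracker` and its methods are hypothetical names, and it assumes `ArtifactCache.commit()` and `ArtifactCache.pull()` have already been changed to return the number of bytes they added, which is the proposal here and not the current behaviour.

```python
class CacheSizeTracker:
    """Keep a running estimate of the cache size (hypothetical sketch)."""

    def __init__(self, compute_full_size):
        # Step 1: run the expensive full calculation exactly once.
        # compute_full_size is any callable returning a size in bytes.
        self._size = compute_full_size()

    def record(self, bytes_added):
        # Step 3: add the byte count reported by a commit or pull
        # (step 2 assumes those operations return that count).
        self._size += bytes_added
        return self._size

    @property
    def size(self):
        return self._size
```

A caller would then do something like `tracker.record(artifacts.commit(...))`, where the return value of `commit()` is the assumed byte count. The sketch ignores artifacts being removed from the cache, which would need a corresponding subtraction.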

Other relevant information

Some experiments:

sync; echo 3 > /proc/sys/vm/drop_caches
time du -hs ~/.cache/buildstream/artifacts/
11G	/home/tiagogomes/.cache/buildstream/artifacts/

real	0m5.912s
user	0m0.398s
sys	0m2.926s

time du -hs ~/.cache/buildstream/artifacts/
11G	/home/tiagogomes/.cache/buildstream/artifacts/

real	0m1.120s
user	0m0.245s
sys	0m0.864s

I expect the time taken by the calculation to grow linearly as the artifact cache grows.

  • BuildStream version affected: /milestone %BuildStream_v1.x

Edited by Jürg Billeter