Cache size calculation thrashes I/O too much
Summary
BuildStream does a full calculation of the cache size every time an artifact is committed to or pulled into the cache.
Though the page cache helps on subsequent calculations, it is still a wasteful operation to do after every commit and pull. In addition, this operation doesn't scale as the artifact cache grows in size.
Possible fixes
It would be more efficient to:
- Calculate the full cache size once at startup.
- Modify ArtifactCache.commit() and ArtifactCache.pull() to return the number of bytes added to the cache.
- Dynamically update the cache size by adding up the bytes returned from ArtifactCache.commit() and ArtifactCache.pull().
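The steps above could be sketched roughly as follows. This is only an illustration, not BuildStream's actual code: `directory_size()` and `CacheSizeTracker` are hypothetical names, and the sketch assumes that commit() and pull() have already been changed to return a byte count. It sums apparent file sizes, so the figure can differ from what `du` reports (for example with hard-linked or sparse files).

```python
import os


def directory_size(path):
    """Full scan: sum the apparent size of every file under path.

    This is the expensive operation (one stat per file), so it should
    run only once, at startup.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except FileNotFoundError:
                # The file disappeared between listing and stat; skip it.
                pass
    return total


class CacheSizeTracker:
    """Hypothetical helper that keeps a running total of the cache size."""

    def __init__(self, cache_dir):
        # Step 1: calculate the full cache size once at startup.
        self.size = directory_size(cache_dir)

    def add(self, nbytes):
        # Step 3: update the total with the byte count that commit()
        # or pull() is assumed to return (step 2).
        self.size += nbytes
        return self.size
```

With something like this in place, each commit or pull would cost a single addition, for example `tracker.add(bytes_added)`, instead of a walk over the whole artifact directory.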
Other relevant information
Some experiments:
sync; echo 3 > /proc/sys/vm/drop_caches
time du -hs ~/.cache/buildstream/artifacts/
11G /home/tiagogomes/.cache/buildstream/artifacts/
real 0m5.912s
user 0m0.398s
sys 0m2.926s
time du -hs ~/.cache/buildstream/artifacts/
11G /home/tiagogomes/.cache/buildstream/artifacts/
real 0m1.120s
user 0m0.245s
sys 0m0.864s
I expect the time taken by the calculation to grow linearly as the artifact cache grows.
- BuildStream version affected: /milestone %BuildStream_v1.x
Edited by Jürg Billeter