
WIP: Resolve "Expire artifacts in local cache"

Tristan Maat requested to merge 135-expire-artifacts-in-local-cache into master

This will implement the cache expiry described in #135 (closed) as follows:

User configuration

  • Adding "cache-quota" to userconfig.yaml
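For illustration, here is a minimal sketch of how a "cache-quota" value could be turned into a byte count once it has been read from userconfig.yaml. The accepted syntax (a whole number with an optional K/M/G/T suffix) and the helper name `parse_quota` are assumptions for this example, not necessarily what this branch implements.

```python
import re

# Hypothetical unit table: binary multiples, empty suffix means bytes.
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_quota(value):
    """Convert a quota such as "10G" or 512 into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMGT]?)\s*", str(value).upper())
    if match is None:
        raise ValueError("invalid cache-quota: {!r}".format(value))
    number, unit = match.groups()
    return int(number) * _UNITS[unit]
```

With this sketch, `parse_quota("10G")` gives the quota in bytes, and a malformed value fails early with a `ValueError` rather than being silently ignored.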

Guarding the quota

  • Intercept commit
  • OSTree cache
    • Calculate cache size
    • Sort artifacts by atime
    • Expire artifacts from the cache
  • Tar cache
    • Calculate cache size
    • Sort artifacts by atime
    • Expire artifacts from the cache
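The three steps listed for each cache can be sketched roughly as below. This is a simplified illustration that assumes every artifact is a single file directly inside a cache directory; the real OSTree and tar caches store artifacts differently, and the function name `expire_artifacts` is hypothetical.

```python
import os


def expire_artifacts(cache_dir, quota):
    """Remove least recently accessed files until the cache fits the quota.

    Returns the list of removed paths, oldest first.
    """
    entries = []
    for name in os.listdir(cache_dir):
        path = os.path.join(cache_dir, name)
        if os.path.isfile(path):
            st = os.stat(path)
            entries.append((st.st_atime, st.st_size, path))

    # Step 1: calculate the cache size.
    total = sum(size for _, size, _ in entries)

    removed = []
    # Step 2: sort by atime, so the least recently used artifact comes first.
    for _, size, path in sorted(entries):
        if total <= quota:
            break
        # Step 3: expire the artifact and account for the freed space.
        os.remove(path)
        total -= size
        removed.append(path)
    return removed
```

Note that atime is only a useful ordering if the filesystem actually updates it; on mounts using `noatime` the ordering would need to come from somewhere else.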

UX

  • Have a useful default quota
  • Ensure multiple simultaneously running instances at least avoid issues

After some recent bugfixes to element-related changes I made, I realized that almost nothing element-related runs on the main thread, so the previous version doesn't seem viable. Instead, this also adds some new scheduler logic:

Adding an interface to run jobs after a queue's job finishes

  • Make jobs element-agnostic
  • Add cleanup condition checks to the scheduler
  • Launch "cleanup" jobs after normal build/pull jobs when the above is met
  • Move current deletion methods to the scheduler
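The intended flow can be sketched as a toy scheduler that checks a cleanup condition whenever a job finishes. Everything here (the `Scheduler` class, its methods, and the "shrink to half the quota" policy) is hypothetical and synchronous for brevity; the real scheduler runs jobs in child processes and is considerably more involved.

```python
class Scheduler:
    """Toy scheduler that launches a cleanup job when the quota is exceeded."""

    def __init__(self, cache_size, quota):
        self.cache_size = cache_size
        self.quota = quota
        self.log = []  # names of jobs in the order they ran

    def needs_cleanup(self):
        # The cleanup condition: the cache has grown past the quota.
        return self.cache_size > self.quota

    def run_job(self, name, produced_bytes):
        # Jobs are element-agnostic here: just a name and a size delta.
        self.log.append(name)
        self.cache_size += produced_bytes
        self.job_finished()

    def job_finished(self):
        # Hook that runs after every normal build/pull job.
        if self.needs_cleanup():
            self.run_cleanup()

    def run_cleanup(self):
        # Assumed policy: expire artifacts until half the quota is in use.
        self.log.append("cleanup")
        self.cache_size = self.quota // 2
```

Because the check happens in one place after each job, several artifacts can be expired in a single cleanup job rather than one deletion per commit.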

We should also do some performance testing, though the new implementation may well add only negligible overhead after this refactor, since it allows for batch jobs.


After those features were added and everything was rebased to master, the following tests started failing with a non-trivial issue:

  • tests/artifactcache/junctions.py::test_push_pull
  • tests/frontend/overlaps.py::test_overlaps

BuildStream suddenly crashes trying to call scheduler.loop.stop because loop has become None. From initial debugging, it seems this can only occur if BuildStream attempts to finish scheduler cleanup twice.

These tests only fail if they are run after a previous test has run - this leads me to believe that we do not create a new asyncio loop under certain circumstances.


The problem was indeed asyncio-related, but it was actually that we closed the loop when we shouldn't have. This looks to have been a bug since the beginning of time, never discovered because BuildStream couldn't reach the state in which it occurs - until now.
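As a general illustration of this class of bug (not a reproduction of the actual scheduler code), the snippet below shows what asyncio does when a loop is closed and then used again, and that a fresh loop avoids the problem. The `work` coroutine is a placeholder.

```python
import asyncio


async def work():
    return "done"


loop = asyncio.new_event_loop()
first = loop.run_until_complete(work())
loop.close()  # closing too early: the loop can never be used again

coro = work()
try:
    loop.run_until_complete(coro)
    error = None
except RuntimeError as exc:
    error = str(exc)
    coro.close()  # avoid a "coroutine was never awaited" warning

# Creating a new loop for the next run works as expected.
fresh = asyncio.new_event_loop()
second = fresh.run_until_complete(work())
fresh.close()
```

Here the second call on the closed loop raises `RuntimeError`, which matches the general symptom that a later run in the same process fails while the first run is fine.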

Since then, enough time has passed that another expiry branch has landed. I am rebasing and fixing the issues that have cropped up since, and will tackle review comments afterwards.

Edited by Tristan Maat
