Commit 0791480a, authored and committed by Adam Coldrick

Add some documentation of the CAS cleanup functionality

This documents how the cleanup tool should be used, and describes some
of the tradeoffs involved in its configuration.
parent d7bb6f63
Pipeline #155665380 passed with stages
in 24 minutes and 1 second
@@ -40,5 +40,6 @@ Other
:maxdepth: 3

.. _cas-cleanup:

CAS Cleanup
-----------

If you use an :ref:`Indexed CAS <indexed-cas>` with an S3 storage backend,
you can run a separate daemon that handles least recently used (LRU) cleanup
and expiry of blobs from S3, for example to stay within usage quotas. Non-S3
backends can generally rely on other mechanisms to expire old blobs.

The daemon continuously monitors the current size of the CAS contents and
triggers cleanup if the contents reach a specified "high water mark" size. The
cleanup deletes blobs (in configurably-sized batches) in least recently used
order until the size of the CAS contents reaches a specified "low water mark",
at which point it stops deleting and goes back to monitoring the size.
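
The high/low water mark behaviour described above can be modelled in a few
lines of Python. This is only an illustrative sketch, not the cleanup tool's
actual implementation: ``cleanup_pass`` is a hypothetical name, and an
in-memory ``OrderedDict`` stands in for the index and the S3 backend.

.. code-block:: python

   from collections import OrderedDict


   def cleanup_pass(blobs, high_watermark, low_watermark):
       """Delete least recently used blobs once the high water mark is reached.

       ``blobs`` maps digest -> size in bytes, ordered from least to most
       recently used (a stand-in for the real index). Returns the digests
       that were deleted, oldest first.
       """
       total = sum(blobs.values())
       deleted = []
       if total < high_watermark:
           # Below the trigger point: nothing to do, keep monitoring.
           return deleted
       while blobs and total > low_watermark:
           digest, size = blobs.popitem(last=False)  # oldest entry first
           total -= size
           deleted.append(digest)
       return deleted

The gap between the two marks is what stops the daemon from deleting a little
on every check: once a pass has brought the size down to the low water mark,
nothing more is deleted until the size climbs back to the high water mark.
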

To run the cleanup daemon:

.. code-block:: sh

   bgd cleanup --high-watermark 10G --low-watermark 7.5G --batch-size 100M \
       --sleep-interval 10 deployment.conf

The batch size and high/low water mark parameters take sizes in bytes. The
suffixes K, M, G, and T are available as shorthand for kB, MB, GB, and TB
respectively, as seen in the example.
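
To make the shorthand concrete, the following sketch converts such values to
bytes. It is not the parser used by ``bgd``: ``parse_size`` is a hypothetical
helper, and it assumes decimal multiples (1 K = 1000 bytes), which may differ
from what the tool actually does.

.. code-block:: python

   from decimal import Decimal

   # Assumption: decimal (SI) multipliers. The real tool may use other values.
   _UNITS = {"K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


   def parse_size(text):
       """Convert a value such as '7.5G' or '2048' to a whole number of bytes."""
       text = text.strip()
       suffix = text[-1:].upper()
       if suffix in _UNITS:
           # Decimal avoids floating-point rounding for values like 7.5G.
           return int(Decimal(text[:-1]) * _UNITS[suffix])
       return int(Decimal(text))

Under that assumption, the ``7.5G`` low water mark in the example command
corresponds to 7,500,000,000 bytes.
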

The batch size is the minimum amount of space cleared in one go. The cleanup
tool tries to stay as close as possible to the configured batch size but,
depending on the sizes of the blobs in the CAS, it will sometimes delete more
than the specified batch size in one go.

A smaller batch size adds more load to the database and the storage backend,
but space starts being freed sooner than with a large batch size.

If the batch size is larger than the difference between the current CAS size
and the low water mark, all of the required deletions are done in a single
batch.
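
The batch rules above can be illustrated with a short sketch. It only models
the documented behaviour and is not the tool's own code: ``select_batch`` and
its parameters are hypothetical names.

.. code-block:: python

   def select_batch(lru_sizes, batch_size, current_size, low_watermark):
       """Choose how many of the oldest blobs go into the next batch.

       ``lru_sizes`` lists blob sizes in bytes, least recently used first.
       Returns ``(number_of_blobs, bytes_freed)``.
       """
       # Aim for the batch size, or for whatever is left to reach the low
       # water mark if that is smaller (the "one batch" case).
       target = min(batch_size, current_size - low_watermark)
       freed = 0
       count = 0
       for size in lru_sizes:
           if freed >= target:
               break
           # Blobs are deleted whole, so the last one may overshoot the target.
           freed += size
           count += 1
       return count, freed

For instance, with 60-byte blobs and a batch size of 100, two blobs (120
bytes) are selected, because a single blob would fall short of the minimum.
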

The sleep interval is the time in seconds to sleep after each check of whether
the CAS size has reached the configured high water mark. A lower sleep
interval makes cleanup more responsive, at the cost of more database load.
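
The trade-off is easier to see in a sketch of the polling loop. This is an
assumed shape rather than the daemon's real code: ``monitor``, ``get_size``
and ``run_cleanup`` are hypothetical names, with ``get_size`` standing in for
the database query that reports the current CAS size.

.. code-block:: python

   import time


   def monitor(get_size, run_cleanup, high_watermark, sleep_interval,
               max_checks=None, sleep=time.sleep):
       """Poll the CAS size and trigger cleanup at the high water mark.

       ``max_checks`` and ``sleep`` exist only so that the sketch can be
       exercised without running forever or actually sleeping.
       """
       checks = 0
       while max_checks is None or checks < max_checks:
           # Each check is one size query, so a shorter interval means
           # more queries per hour against the database.
           if get_size() >= high_watermark:
               run_cleanup()
           checks += 1
           sleep(sleep_interval)
       return checks

With ``--sleep-interval 10`` a loop of this shape queries the size about six
times a minute while idle; halving the interval doubles that rate.
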

The configuration file must contain the index and backend storage
definitions. The easiest way to achieve this is to use the same config file
that was used to deploy the indexed CAS in the first place.

Note that if monitoring is configured in the provided config file (see
:ref:`monitoring-configuration`), any metrics produced by the cleanup tool are
published to the configured destination. If that should not be the same
destination as the indexed CAS metrics, the config will need to be changed.
\ No newline at end of file