
Speed up the cleanup process

Jimmy F requested to merge speedier-cleanup into master

Before raising this MR, consider whether the following are required, and complete if so:

  • Unit tests
  • Metrics
  • Documentation update(s)

If not required, please explain in brief why not.

Description

The cleanup process executes a SQL query to obtain the "windowed LRU digests" and then uses only a small portion of the resulting data. This query can be very expensive on a large database. This MR mitigates the performance hit of the expensive SQL call by changing the frequency of its execution from once per batch cleanup to once per cleanup-worker launch.

There are work-in-progress ideas to speed up the cleanup process by changing the query or the logic for obtaining windows, which may or may not be included in this MR. Note that any such change needs to keep returning the premarked entries in the results so that they are eventually picked up for BCS cleanup.

  • Replace the modulo operation (row_number % 1000 = 1, as used in _column_windows()) with a direct LIMIT n, since the modulo is possibly the most time-intensive operation in the query execution; see the sketch below.
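
To make the idea concrete, here is a minimal sketch contrasting the two approaches. It is illustrative only: the table and column names (index_record, digest_hash, accessed_timestamp) are assumptions, not the actual BuildGrid schema.

from sqlalchemy import func, select

def window_starts_modulo(session, index_record, window_size=1000):
    # Current approach: number every row, then keep each window boundary.
    rownum = func.row_number().over(order_by=index_record.c.accessed_timestamp)
    subq = select(index_record.c.digest_hash, rownum.label("rownum")).subquery()
    # The modulo filter forces the database to materialize row numbers for
    # the whole table, which is what makes the query expensive.
    query = select(subq.c.digest_hash).where(subq.c.rownum % window_size == 1)
    return session.execute(query).scalars().all()

def window_starts_limit(session, index_record, n_windows=100, window_size=1000):
    # Proposed idea: fetch only the oldest n_windows * window_size rows...
    query = (
        select(index_record.c.digest_hash)
        .order_by(index_record.c.accessed_timestamp)
        .limit(n_windows * window_size)
    )
    rows = session.execute(query).scalars().all()
    # ...and take every window_size-th digest client-side instead of in SQL.
    return rows[::window_size]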

Changes proposed in this merge request:

  • The list of whereclauses is saved as a data member of the SQLIndex class.
  • A "refresh" parameter is added to mark_n_bytes_as_deleted() to choose between generating new whereclauses (and hence new windows) by executing the expensive SQL query, and reusing the whereclauses stored on the SQLIndex instance.
  • The cleanup worker invokes mark_n_bytes_as_deleted() with "refresh" only the first time; subsequent loops reuse the stored windows (see the sketch after this list).
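
In simplified and hypothetical form, the pattern looks like this (the real SQLIndex method has more parameters and bookkeeping; the helper names below are stand-ins):

class SQLIndex:
    def __init__(self):
        self._windows = []  # cached window whereclauses

    def _generate_windows(self):
        # Stand-in for the expensive "windowed LRU digests" SQL query.
        return [f"window-{i}" for i in range(4)]

    def _mark_window_deleted(self, window):
        # Stand-in: mark one window's rows as deleted; return bytes marked.
        return 1000

    def mark_n_bytes_as_deleted(self, n_bytes, refresh=False):
        if refresh or not self._windows:
            # The expensive query runs here, once per cleanup-worker launch.
            self._windows = self._generate_windows()
        # Subsequent batch cleanups consume the stored windows instead.
        while n_bytes > 0 and self._windows:
            n_bytes -= self._mark_window_deleted(self._windows.pop(0))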

Validation

  • Pass the CI/CD tests.
  • Pass the stress test defined by docker-compose-cleanup.yml.

To run stress test:

cd buildgrid/tests/stress-testing-dockerfiles/compose-files

# Build the docker containers.
docker compose -f docker-compose-cleanup.yml build

# Launch the test (with parametrized env).
CLEANUP_DELETE_BATCH_SIZE_IN_BYTES=5000 \
CLEANUP_START_HIGH_WATERMARK=100000 \
CLEANUP_STOP_LOW_WATERMARK=50000 \
CLEANUP_MAX_UPLOAD_FILE_CONTENT_SIZE=1000 \
CLEANUP_TEST_FOR_N_SECONDS=100 \
docker compose -f docker-compose-cleanup.yml up | tee output.log


# Useful container greps
cat output.log | grep "compose-files-bgd"                 | less -N
cat output.log | grep "compose-files-database"            | less -N
cat output.log | grep "compose-files-minio"               | less -N
cat output.log | grep "compose-files-cleanup"             | less -N
cat output.log | grep "compose-files-test-cleanup"        | less -N
cat output.log | grep "compose-files-upload-random-blobs" | less -N

# Docker cleanup as needed.
docker container prune

Observe from output.log that the cleanup triggers upon reaching the high watermark, and that mark_n_bytes_as_deleted():

  • emits either:
    • Requesting new LRU windows. Number of windows remaining: n when a new cleanup worker is launched.
    • Using stored LRU windows. Number of windows remaining: n when merely another batch cleanup is ongoing.
  • exits when usage drops below the low watermark of 50,000 bytes.
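
For reference, the loop behaviour being validated looks roughly like this (a hypothetical sketch: the function and parameter names are illustrative, with watermark values taken from the stress-test environment variables above):

def cleanup_worker(index, get_total_size,
                   high_watermark=100_000, low_watermark=50_000,
                   batch_size=5_000):
    if get_total_size() < high_watermark:
        return  # cleanup only triggers at the high watermark
    refresh = True  # the expensive window query runs once per launch
    while get_total_size() > low_watermark:
        index.mark_n_bytes_as_deleted(batch_size, refresh=refresh)
        refresh = False  # later batches reuse the stored windows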

Side note: the mark_n_bytes_as_deleted() method is the one most affected by this MR. On macOS, the steps below show how to run the group of tests related to this method.

brew install postgres
brew install rabbitmq
cd buildgrid
tox -e venv -- pytest tests/cas/index/test_index.py -k "mark_n_bytes_as_deleted"
