[BB-3775] Setup periodic task to kill zombie app servers
Created by: spokerman12
We want to set up a periodic task for the kill_zombies command.
Requirements summarized after discussion:
- Frequency configurable via Django settings (opencraft/settings.py)
- Defaults to once a day
- Call the kill_zombies management command using call_command
- Have a reasonable, configurable threshold; if the number of VMs to be deleted exceeds it, alert by sending an email to ops@opencraft.com
- Have a settings toggle (configurable via environment variables) to disable the job
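As an illustration of the configurable frequency, the default schedule used in this PR (`0 0 */1 * *`) is a standard five-field cron string. The sketch below splits such a string into named fields; `cron_to_kwargs` is a hypothetical helper, not code from this branch. The field names are chosen to match the keyword arguments of Huey's `crontab()`, on the assumption that the task is scheduled that way.

```python
# Hypothetical helper: split a five-field cron string into named fields.
# Field order follows standard cron: minute, hour, day of month, month, day of week.
CRON_FIELDS = ("minute", "hour", "day", "month", "day_of_week")


def cron_to_kwargs(schedule):
    """Return a dict mapping cron field names to their string values."""
    parts = schedule.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError("expected 5 cron fields, got {}".format(len(parts)))
    return dict(zip(CRON_FIELDS, parts))


# The default from this PR: minute 0, hour 0, every day.
kwargs = cron_to_kwargs("0 0 */1 * *")
# Assumption: with Huey installed, this could be passed on as crontab(**kwargs).
```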
kill_zombies has no tests, so we agreed on testing the following for the periodic task:
- Picks up and applies all the proposed settings
- Sends email if we're over the threshold
- Sends email on failure
- Invokes the management command to do the actual cleanup
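The behaviours listed above can be sketched as plain control flow. This is a minimal illustration, not the implementation in this branch: `run_kill_zombies` and its callable parameters are hypothetical stand-ins for the real dry-run count, the `call_command` invocation, and the email helper. Whether cleanup still proceeds after a threshold warning is an assumption made here.

```python
def run_kill_zombies(enabled, threshold, count_zombies, kill_zombies, send_email):
    """Hypothetical control flow for the periodic task.

    enabled       -- settings toggle; when False the job does nothing
    threshold     -- warn when more than this many VMs would be deleted
    count_zombies -- callable returning how many VMs would be deleted
    kill_zombies  -- callable performing the actual cleanup
    send_email    -- callable taking a single message string
    Returns True if the cleanup was invoked successfully.
    """
    if not enabled:
        return False
    try:
        count = count_zombies()
        if count > threshold:
            send_email(
                "kill_zombies: {} zombies exceeds threshold {}".format(count, threshold)
            )
        # Assumption: the cleanup still runs after the warning is sent.
        kill_zombies()
        return True
    except Exception as exc:
        # Any failure in counting or cleanup is reported by email.
        send_email("kill_zombies failed: {}".format(exc))
        return False
```

Passing the collaborators in as callables keeps the sketch testable without OpenStack access, which mirrors the black-box approach taken for kill_zombies in the tests.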
JIRA tickets: BB-3775
Dependencies: None
Screenshots: See Gitlab issue
Merge deadline: "None"
Testing instructions:
- Pull this branch to your local OCIM environment
- Inside the vagrant VM, run unit tests with
./manage.py test instance.tests.management.test_kill_zombies.KillZombiesPeriodicallyTestCase
- On normal execution, the task will overwrite the default arguments with whatever env variables are available. Set these in opencraft/settings.py to test manually. These are the defaults:
KILL_ZOMBIES_SCHEDULE="0 0 */1 * *" # Daily at 12AM
KILL_ZOMBIES_ENABLED=False
OPENSTACK_REGION="gra" # This is already set in the file
ADMINS = env.json('ADMINS', default=set()) # Already set; a collection of (name, email) tuples
KILL_ZOMBIES_WARNING_THRESHOLD=10
These defaults can be changed per the reviewer's request.
- Edit Procfile.dev so that the low-priority worker entry reads:
worker_low_priority: HUEY_QUEUE_NAME=opencraft_low_priority python3 manage.py run_huey
- Because kill_zombies is not tested, we have to override its behavior and treat it as a black box. To do so, hack this into instance/management/commands/kill_zombies.Command.handle:
...
# Set options
self.region = options.get("region")
self.dry_run = options.get("dry_run", False)
if self.dry_run:  # This block of code is lower in the handle method
    result = "Would have terminated {} zombies if this weren't a dry run.".format(death_count)
else:
    result = "Terminated {} zombies.".format(death_count)
self.log(result)
return result
...
Then manually change death_count so you can test.
- Run make run.dev inside the vagrant VM. You should see entries similar to:
21:31:01 worker_low_priority.1 | [2021-02-26 21:31:01,880] INFO:root:Worker-1:Executing periodic task `kill_zombies_periodically`
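With the hack above, handle returns its result string, and Django's call_command passes a command's return value back to the caller. A caller could therefore recover the zombie count from that string. The sketch below shows one way to do so; `parse_zombie_count` is a hypothetical helper, and the pattern assumes the two message formats shown in the hack.

```python
import re

# Matches both messages produced by the hacked handle method:
#   "Would have terminated N zombies if this weren't a dry run."
#   "Terminated N zombies."
_COUNT_RE = re.compile(r"(?:Terminated|Would have terminated) (\d+) zombies")


def parse_zombie_count(result):
    """Extract the zombie count from a kill_zombies result string (hypothetical helper)."""
    match = _COUNT_RE.search(result)
    if match is None:
        raise ValueError("unrecognised kill_zombies output: {!r}".format(result))
    return int(match.group(1))


dry_run_count = parse_zombie_count(
    "Would have terminated 12 zombies if this weren't a dry run."
)
real_count = parse_zombie_count("Terminated 3 zombies.")
```

Raising on unrecognised output, rather than returning zero, would let a format change surface through the failure email instead of being silently ignored.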
Author notes and concerns:
- kill_zombies needs some tests.
Reviewers
- @lgp171188