[BB-3775] Setup periodic task to kill zombie app servers

Boros Gábor requested to merge danielf/BB-3775 into master

Created by: spokerman12

We want to set up a periodic task for the kill_zombies command.

Requirements summarized after discussion:

  • Frequency configurable via Django settings (opencraft/settings.py)
  • Defaults to once a day
  • Call the kill_zombies management command using call_command
  • Have a reasonable, configurable threshold; if the number of VMs to be deleted exceeds it, alert by sending an email to ops@opencraft.com
  • Have a settings toggle (configurable via environment variables) to disable the job
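The requirements above can be sketched roughly as follows. This is a minimal illustration, not the code in this branch: `KillZombiesConfig`, `count_zombies`, `run_kill_zombies`, and the injected `run_command` / `send_alert` callables are all hypothetical names. It assumes the zombie count is obtained from a dry run first, and that deletion is skipped when the threshold is exceeded; the requirements do not settle either point.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class KillZombiesConfig:
    """Hypothetical container mirroring the proposed settings."""
    enabled: bool = False
    region: str = "gra"
    warning_threshold: int = 10


def count_zombies(command_output: str) -> int:
    """Extract the zombie count from the command's result message (assumed format)."""
    match = re.search(r"(\d+) zombies", command_output)
    return int(match.group(1)) if match else 0


def run_kill_zombies(
    config: KillZombiesConfig,
    run_command: Callable[..., str],
    send_alert: Callable[[str], None],
) -> Optional[str]:
    """Run the cleanup once, honouring the toggle and the warning threshold.

    In the real task, run_command would presumably wrap
    call_command("kill_zombies", ...) and send_alert would send the email.
    """
    if not config.enabled:
        return None  # The job is disabled via the settings toggle.
    try:
        # Assumption: a dry run reports how many VMs would be deleted.
        preview = run_command(region=config.region, dry_run=True)
        zombies = count_zombies(preview)
        if zombies > config.warning_threshold:
            # Assumption: alert and skip the deletion when over the threshold.
            send_alert(
                f"{zombies} zombies found in {config.region}, above the "
                f"threshold of {config.warning_threshold}; nothing was deleted."
            )
            return None
        return run_command(region=config.region, dry_run=False)
    except Exception as exc:
        # Any failure of the command is reported by email as well.
        send_alert(f"kill_zombies failed in {config.region}: {exc}")
        return None
```

Passing the command runner and the alert sender in as callables keeps the decision logic testable without touching OpenStack or a mail server.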

kill_zombies has no tests, so we agreed on testing the following for the periodic task:

  • Picks up and applies all the proposed settings
  • Sends email if we're over the threshold
  • Sends email on failure
  • Invokes the management command to do the actual cleanup

JIRA tickets: BB-3775

Dependencies: None

Screenshots: See Gitlab issue

Merge deadline: "None"

Testing instructions:

  1. Pull this branch to your local OCIM environment
  2. Inside the vagrant VM, run unit tests with
./manage.py test instance.tests.management.test_kill_zombies.KillZombiesPeriodicallyTestCase
  3. In normal execution, the task overrides its default arguments with whatever env variables are available. Set these in opencraft/settings.py to test manually. These are the defaults:
KILL_ZOMBIES_SCHEDULE="0 0 */1 * *"       # Daily at midnight
KILL_ZOMBIES_ENABLED=False
OPENSTACK_REGION="gra"                          # This is already set in the file
ADMINS = env.json('ADMINS', default=set())   # Already set, in the form of (name, email) tuples
KILL_ZOMBIES_WARNING_THRESHOLD=10

These defaults can be changed per the reviewer's request.

  4. Edit Procfile.dev so the worker_low_priority entry reads:
worker_low_priority: HUEY_QUEUE_NAME=opencraft_low_priority python3 manage.py run_huey
  5. Because kill_zombies is not tested, we have to override its behavior and treat it as a black box. To do so, hack this into instance/management/commands/kill_zombies.Command.handle:
...
# Set options
self.region = options.get("region")
self.dry_run = options.get("dry_run", False)
if self.dry_run:    # This block of code is lower in the handle method
    result = "Would have terminated {} zombies if this weren't a dry run.".format(death_count)
else:
    result = "Terminated {} zombies.".format(death_count)
self.log(result)
return result
...

Then manually change death_count so you can test.

  6. Run make run.dev inside the vagrant VM. You should see entries similar to:

21:31:01 worker_low_priority.1 | [2021-02-26 21:31:01,880] INFO:root:Worker-1:Executing periodic task `kill_zombies_periodically`
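As a rough illustration of how the KILL_ZOMBIES_* settings listed in the testing instructions could be read and how the schedule string maps onto cron fields, here is a small sketch. The helper names `schedule_to_crontab_kwargs` and `load_kill_zombies_settings` are hypothetical, and reading straight from `os.environ` is an assumption; the project's settings file uses its own `env` helper. The field names follow the keyword arguments of huey's `crontab` helper (minute, hour, day, month, day_of_week), but check them against the installed huey version.

```python
import os
from typing import Dict, Mapping

# Order of the five fields in a cron expression such as "0 0 */1 * *".
CRON_FIELDS = ("minute", "hour", "day", "month", "day_of_week")


def schedule_to_crontab_kwargs(schedule: str) -> Dict[str, str]:
    """Split a five-field cron expression into named fields."""
    parts = schedule.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"Expected 5 cron fields, got {len(parts)}: {schedule!r}")
    return dict(zip(CRON_FIELDS, parts))


def load_kill_zombies_settings(environ: Mapping[str, str] = os.environ) -> dict:
    """Read the proposed settings, falling back to the defaults in this MR."""
    enabled_raw = environ.get("KILL_ZOMBIES_ENABLED", "False")
    return {
        "KILL_ZOMBIES_ENABLED": enabled_raw.strip().lower() in ("1", "true", "yes"),
        "KILL_ZOMBIES_SCHEDULE": environ.get("KILL_ZOMBIES_SCHEDULE", "0 0 */1 * *"),
        "KILL_ZOMBIES_WARNING_THRESHOLD": int(
            environ.get("KILL_ZOMBIES_WARNING_THRESHOLD", "10")
        ),
    }
```

With the default schedule, the day field is `*/1` and minute and hour are both `0`, which means the task fires once a day at midnight.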

Author notes and concerns:

  1. kill_zombies needs some tests.

Reviewers

  • @lgp171188
