
WIP: New application to periodically build edx/edx-platform, and notification system when deployment fails

Daniel Clemente Laboreo requested to merge clemente/periodic-builds into master

Prototype for https://tasks.opencraft.com/browse/OC-2167 (discovery document).

This adds two features:

  • a task that regularly (every two hours) builds edx/edx-platform (master branch) in its current state, using edx/configuration.
  • a notification system for appserver deployment failures. It sends e-mails to OpenCraft DevOps when OVH or the ansible playbook fails. This applies to all appservers. For the servers built by the previous task (edX CI), it also sends e-mails to edX DevOps.
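The recipient rules can be sketched as follows. This is only an illustration based on the description and the testing instructions in this merge request, not the code in the branch: `FailureKind`, `failure_recipients` and all parameter names are hypothetical, and the sketch assumes that edX is notified only about playbook failures on the periodic-build servers.

```python
from enum import Enum
from typing import List


class FailureKind(Enum):
    """Hypothetical classification of a deployment failure."""
    INFRASTRUCTURE = "infrastructure"  # e.g. OVH or DNS problems
    OPENEDX = "openedx"                # the ansible playbook failed


def failure_recipients(
    kind: FailureKind,
    is_periodic_build: bool,
    infrastructure_email: str,
    openedx_email: str,
) -> List[str]:
    """Return the addresses to notify about a failed appserver deployment.

    Assumption: OpenCraft is notified about every failure, while edX is
    notified only when the playbook fails on a periodic-build server.
    """
    recipients = [infrastructure_email]
    if kind is FailureKind.OPENEDX and is_periodic_build:
        recipients.append(openedx_email)
    return recipients


if __name__ == "__main__":
    # Placeholder addresses, matching the example values used for testing.
    print(failure_recipients(
        FailureKind.OPENEDX,
        is_periodic_build=True,
        infrastructure_email="ops@example.com",
        openedx_email="edx-devops@example.com",
    ))
```

Keeping the decision in a pure function like this would make the routing easy to unit test without spawning servers; whether the prototype is structured this way is an open question.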

WIP because there are still FIXMEs for things that need to be decided or improved.

JIRA tickets: None
Discussions: None
Dependencies: Please merge https://github.com/open-craft/documentation/pull/240 too
Screenshots: None
Sandbox URL: None
Merge deadline: None

Testing instructions:

  1. Define variables in your .env
INFRASTRUCTURE_DEPLOYMENT_PROBLEMS_EMAIL = "ops@example.com"
OPENEDX_DEPLOYMENT_PROBLEMS_EMAIL = "edx-devops@example.com"
  2. honcho run python3 ./manage.py shell_plus
  3. Run from periodic_builds.tasks import deploy_edx_edxplatform; deploy_edx_edxplatform() to test that the task runs. On later runs, you can instead use instance=OpenEdXInstance.objects.get(internal_lms_domain__startswith='master') and then call spawn_appserver(instance.ref.pk, mark_active_on_success=False, num_attempts=2) each time you test (there is no need to reload shell_plus, but Ocim itself needs to be reloaded)
  4. The first time, an OpenEdXInstance will be created and a server spawned (check it in Django's admin). Watch the logs from the console or from Ocim's admin to check that it's deploying
  5. Repeat the previous steps in several other situations, including different types of failures. Details follow:
  6. To simulate an OVH failure, go to instance/models/mixins/load_balanced.py and, in the set_dns_records function, add a raise Exception("DNS on strike today"). This should cause an infrastructure error, and an e-mail with a stack trace should be sent to OpenCraft only
  7. To simulate an ansible playbook failure, go to instance/models/openedx_instance.py and, in spawn_appserver, add a return None very early. This should cause an openedx error, with an e-mail sent to both OpenCraft and edX
  8. Verify that the server deploys correctly. WIP: this won't happen yet in the prototype due to a missing "openstack" role; see the discovery document or https://github.com/edx/configuration/pull/3522
  9. Delete all test servers from OVH

Reviewers

  • TBD, not yet

Author concerns: None
Settings: None
