WIP: New application to periodically build edx/edx-platform, and notification system when deployment fails
Prototype for https://tasks.opencraft.com/browse/OC-2167 (discovery document).
This adds two features:
- a task that regularly (every two hours) builds `edx/edx-platform` (`master` branch) at its current state, using `edx/configuration`.
- a notification system for appserver deployment failures: it sends e-mails to OpenCraft DevOps when OVH or the Ansible playbook fails. This applies to all appservers. For the servers built by the periodic task above (edX CI), it also sends e-mails to edX DevOps.
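The notification rule described above can be sketched as a small pure function. This is an illustrative sketch, not the code in this PR: `FailureKind`, `recipients_for_failure`, and the `is_periodic_build` flag are hypothetical names, and the exact rule (edX is notified only about playbook failures on periodically built servers) is an assumption pieced together from this description.

```python
from enum import Enum


class FailureKind(Enum):
    """Hypothetical classification of a deployment failure."""
    INFRASTRUCTURE = "infrastructure"  # e.g. OVH / DNS problems
    OPENEDX = "openedx"                # e.g. the Ansible playbook failed


def recipients_for_failure(kind, is_periodic_build, infrastructure_email, openedx_email):
    """Return the list of addresses to notify for a failed deployment.

    Assumed rule: OpenCraft is always notified; edX is notified only for
    playbook failures on servers created by the periodic build task.
    """
    recipients = [infrastructure_email]
    if kind is FailureKind.OPENEDX and is_periodic_build and openedx_email:
        recipients.append(openedx_email)
    return recipients
```

For example, `recipients_for_failure(FailureKind.INFRASTRUCTURE, True, "ops@example.com", "edx-devops@example.com")` yields only the OpenCraft address, whatever kind of server failed.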
WIP because there are still FIXMEs for things that need to be decided or improved.
JIRA tickets: None
Discussions: None
Dependencies: Please merge https://github.com/open-craft/documentation/pull/240 too
Screenshots: None
Sandbox URL: None
Merge deadline: None
Testing instructions:
- Define these variables in your `.env`:
  INFRASTRUCTURE_DEPLOYMENT_PROBLEMS_EMAIL = "ops@example.com"
  OPENEDX_DEPLOYMENT_PROBLEMS_EMAIL = "edx-devops@example.com"
- Run `honcho run python3 ./manage.py shell_plus`
- Run `from periodic_builds.tasks import deploy_edx_edxplatform; deploy_edx_edxplatform()` to test that the task runs. From then on, you can also use `instance=OpenEdXInstance.objects.get(internal_lms_domain__startswith='master')` and then just `spawn_appserver(instance.ref.pk, mark_active_on_success=False, num_attempts=2)` each time you test (no need to reload `shell_plus`, but Ocim itself needs to be reloaded).
- The first time, an `OpenEdXInstance` will be created and a server spawned (check it in Django's admin). Watch the logs from the console or from Ocim's admin to check that it is deploying.
- Repeat the previous steps in several more situations, including different types of failures. Details follow:
  - To simulate an OVH failure, go to `instance/models/mixins/load_balanced.py`, at the `set_dns_records` function, and add a `raise Exception("DNS on strike today")`. This should cause an infrastructure error, and an e-mail with a stack trace should be sent to OpenCraft only.
  - To simulate an Ansible playbook failure, go to `instance/models/openedx_instance.py`, at `spawn_appserver`, and add a `return None` very early. This should cause an openedx error, with an e-mail sent to both OpenCraft and edX.
  - Verify that the server deploys correctly. WIP: this won't happen yet in the prototype due to a missing "openstack" role; see the discovery document or https://github.com/edx/configuration/pull/3522
- Delete all test servers from OVH
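The failure simulations above edit source files by hand. As a possible alternative for automated tests, the same kind of fault can be injected with `unittest.mock`. The sketch below is self-contained and uses stand-ins: `LoadBalancedInstance`, `deploy`, and `simulate_dns_failure` are hypothetical and are not the Ocim classes or functions; only the method name `set_dns_records` and the exception message come from the steps above.

```python
from unittest import mock


class LoadBalancedInstance:
    """Stand-in for the real model mixin; not the Ocim class."""

    def set_dns_records(self):
        return "dns ok"


def deploy(instance):
    """Hypothetical deployment step that classifies a failure instead of crashing."""
    try:
        instance.set_dns_records()
    except Exception as exc:  # broad on purpose: any provider error counts
        return ("infrastructure_error", str(exc))
    return ("ok", None)


def simulate_dns_failure():
    """Run deploy() while set_dns_records is patched to raise."""
    with mock.patch.object(
        LoadBalancedInstance,
        "set_dns_records",
        side_effect=Exception("DNS on strike today"),
    ):
        return deploy(LoadBalancedInstance())
```

A patch like this only affects the process in which it is applied, so it would likely not reach a deployment running in a separate worker process; for the manual run described above, editing the file remains the simpler route.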
Reviewers
- TBD, not yet
Author concerns: None
Settings: None