[BB-1537] Migrate to New Relic Alerts for monitoring alert notifications
Created by: lgp171188
The legacy New Relic Synthetics notifications reach EOL in September 2019, and we have had issues with not getting notified about failing heartbeat checks. A migration to New Relic Alerts is therefore required, and this PR implements the changes needed for it.
Testing instructions: Though the production environment is the ideal place to test this, it is possible to test the changes in the stage environment with some configuration and tweaks.
- Create a new `.env.<suffix>` file in the stage server by copying the existing `.env` file and add the variable configuring the New Relic API admin key. It's okay to use the production key but ensure that all the alert policies, notification channels, alert conditions and monitors created for testing purposes are deleted after testing.
- Run `honcho -e <the created .env.* file> run python3 manage.py shell_plus` command to start the Django shell.
- Play around with the new functions added in `instance/newrelic.py` for creating/deleting New Relic Alert policies, email notification channels, adding them to policies, creating alert conditions for existing monitors etc. and verify that everything works as expected.
- Apply the migrations and test the following scenarios through Ocim.
- Open the Django shell using the `honcho -e <the created .env.* file> run python3 manage.py shell_plus` command.
- Add `NewRelicEmailNotificationChannel` instances for the `settings.ADMINS` email addresses with the `shared` flag set to `True`. Note that this is likely to send a few emails to these addresses.
- Run the `enable_monitoring()` method on an existing, successfully provisioned instance and verify that everything works okay and the monitoring is enabled properly.
- Add some additional monitoring emails and repeat the previous step and verify the result.
- Run the `disable_monitoring()` method and verify that all the resources created by `enable_monitoring()` are deleted. Verify that the shared email addresses are not removed from New Relic or deleted from the database.
- Try a series of `enable_monitoring()` and `disable_monitoring()` calls in random order and verify that nothing unexpected happens. Ensure that at the end the `disable_monitoring()` method is called to clean up any test resources created.
- Update the `.env` file with the New Relic admin API key (get it from production or use `.env.newrelic` on stage). It is a good idea to inform the team not to create new sandboxes or spawn new appservers while testing this as they will end up creating more resources (including the monitors which may cost money) to be cleaned up.
- Restart the app by killing the `make run` process and restarting it.
- Restart the Django shell as well.
- Create a new sandbox instance, spawn a new appserver and verify that everything works okay. Confirm and verify that the monitoring has been set up properly.
- Disable monitoring on the new sandbox and verify that it works okay. Confirm and verify that all the monitoring resources are deleted.
- Archive the instance and verify that the cleanup is performed and nothing related to that instance is left undeleted.
- Also verify that the `NewRelicEmailNotificationChannel` instances with the `shared` flag set to `True` are not deleted when disabling the monitoring and archiving instances.
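
Since several of the steps above ask to verify that test resources were cleaned up, a small standalone check against the New Relic REST API v2 may help. This is only a sketch and not part of the PR: it assumes the test resources can be recognised by a substring of their name (the `marker` argument is hypothetical, as are the helper names `fetch`, `leftovers` and `report`), and it only reads the first page of results from the `alerts_policies` and `alerts_channels` endpoints.

```python
import json
import urllib.request

# Base URL of the New Relic REST API v2 (Alerts policies and channels).
ALERTS_API = "https://api.newrelic.com/v2"


def fetch(resource, api_key):
    """Return the first page of `resource`, e.g. "alerts_policies".

    Pagination is not handled here; accounts with many resources would
    need to follow the paging links as well.
    """
    request = urllib.request.Request(
        f"{ALERTS_API}/{resource}.json",
        headers={"X-Api-Key": api_key},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def leftovers(items, marker):
    """Sorted names of the items whose name contains `marker` (case-insensitive)."""
    needle = marker.lower()
    return sorted(
        item["name"] for item in items if needle in item.get("name", "").lower()
    )


def report(api_key, marker):
    """Print leftover policies and channels; return True when none are found."""
    found = []
    for resource, key in (
        ("alerts_policies", "policies"),
        ("alerts_channels", "channels"),
    ):
        names = leftovers(fetch(resource, api_key)[key], marker)
        for name in names:
            print(f"{key}: {name}")
        found.extend(names)
    return not found
```

Calling `report(<admin API key>, <marker>)` after the final `disable_monitoring()` call or after archiving the instance should print nothing and return `True` if the cleanup worked. The shared notification channels are expected to remain, so pick a marker that does not match them. Synthetics monitors are not covered by this sketch and still need to be checked separately.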
Author notes and concerns:
- All the new models use the integer type for the primary key to match the type returned by the New Relic API. This could require changes if New Relic changes the data types.
- This might require additional testing scenarios not listed in the testing instructions above.