[BB-1537] Migrate to New Relic Alerts for monitoring alert notifications

Boros Gábor requested to merge guruprasad/BB-1537-New-Relic-Alerts into master

Created by: lgp171188

The legacy New Relic Synthetics notifications reach EOL in September 2019, and we have had issues with not being notified of failing heartbeat checks. A migration to New Relic Alerts is therefore required, and this PR implements the changes needed for it.

Testing instructions: Though the production environment is the ideal place to test this, it is possible to test the changes in the stage environment with some configuration and tweaks.

  • Create a new .env.<suffix> file on the stage server by copying the existing .env file, and add the variable that configures the New Relic admin API key. It's okay to use the production key, but ensure that all the alert policies, notification channels, alert conditions and monitors created for testing purposes are deleted after testing.
  • Run the `honcho -e <the created .env.* file> run python3 manage.py shell_plus` command to start the Django shell.
  • Play around with the new functions added in instance/newrelic.py for creating/deleting New Relic Alert policies, email notification channels, adding them to policies, creating alert conditions for existing monitors etc. and verify that everything works as expected.
  • Apply the migrations and test the following scenarios through Ocim.
    • Open the Django shell using the `honcho -e <the created .env.* file> run python3 manage.py shell_plus` command.
    • Add NewRelicEmailNotificationChannel instances for the settings.ADMINS email addresses with the shared flag set to True. Note that this is likely to send a few emails to these addresses.
    • Run the enable_monitoring() method on an existing, successfully provisioned instance and verify that everything works okay and the monitoring is enabled properly.
    • Add some additional monitoring emails and repeat the previous step and verify the result.
    • Run the disable_monitoring() method and verify that all the resources created by enable_monitoring() are deleted. Verify that the shared email addresses are not removed from New Relic or deleted from the database.
    • Try a series of enable_monitoring() and disable_monitoring() calls in random order and verify that nothing unexpected happens. Ensure that at the end the disable_monitoring() method is called to clean up any test resources created.
    • Update the .env file with the New Relic admin API key (get it from production or use .env.newrelic on stage). It is a good idea to inform the team not to create new sandboxes or spawn new appservers while testing this as they will end up creating more resources (including the monitors which may cost money) to be cleaned up.
    • Restart the app by killing the make run process and restarting it.
    • Restart the Django shell as well.
    • Create a new sandbox instance, spawn a new appserver and verify that everything works okay. Confirm and verify that the monitoring has been set up properly.
    • Disable monitoring on the new sandbox and verify that it works okay. Confirm and verify that all the monitoring resources are deleted.
    • Archive the instance and verify that the cleanup is performed and nothing related to that instance is left undeleted.
    • Also verify that the NewRelicEmailNotificationChannel instances with the shared flag set to True are not deleted when disabling the monitoring and archiving instances.
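
The enable/disable scenarios above boil down to two invariants: repeated calls must not leak resources, and shared channels must survive cleanup. Below is a minimal, self-contained sketch of that check against an in-memory stand-in. It does not use the code from this PR or the real New Relic API; `FakeNewRelic`, `Channel` and `Instance` are hypothetical names made up for illustration, and only the `enable_monitoring()` / `disable_monitoring()` method names and the shared flag come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class FakeNewRelic:
    """In-memory stand-in for a New Relic account (hypothetical, not the real API)."""
    channels: dict = field(default_factory=dict)  # integer channel id -> email
    policies: set = field(default_factory=set)
    next_id: int = 1

    def add_email_channel(self, email):
        channel_id = self.next_id
        self.next_id += 1
        self.channels[channel_id] = email
        return channel_id


@dataclass
class Channel:
    """Loosely mirrors the idea of an email channel with an integer id and a shared flag."""
    id: int
    email: str
    shared: bool = False


class Instance:
    """Hypothetical instance owning a policy and its non-shared channels."""

    def __init__(self, name, newrelic, shared_channels, extra_emails=()):
        self.name = name
        self.newrelic = newrelic
        self.shared_channels = list(shared_channels)
        self.extra_emails = list(extra_emails)
        self.own_channels = []
        self.policy = None

    def enable_monitoring(self):
        # Calling this twice in a row must not create duplicate resources.
        if self.policy is None:
            self.policy = f"policy-{self.name}"
            self.newrelic.policies.add(self.policy)
        known = {channel.email for channel in self.own_channels}
        for email in self.extra_emails:
            if email not in known:
                channel_id = self.newrelic.add_email_channel(email)
                self.own_channels.append(Channel(channel_id, email))

    def disable_monitoring(self):
        # Only resources owned by this instance are removed; shared channels stay.
        for channel in self.own_channels:
            self.newrelic.channels.pop(channel.id, None)
        self.own_channels = []
        if self.policy is not None:
            self.newrelic.policies.discard(self.policy)
            self.policy = None


newrelic = FakeNewRelic()
admin = Channel(newrelic.add_email_channel("admin@example.com"), "admin@example.com", shared=True)
instance = Instance("sandbox", newrelic, [admin], ["ops@example.com"])

# Mixed order of calls, always ending with a disable to clean up.
for call in ("enable", "enable", "disable", "disable", "enable", "disable"):
    getattr(instance, f"{call}_monitoring")()

assert newrelic.policies == set()
assert list(newrelic.channels.values()) == ["admin@example.com"]
```

When testing the real implementation, the same two assertions translate to checking in the New Relic UI (or via the API) that no policies or channels for the instance remain, and that the shared admin channels are still present both there and in the database.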

Author notes and concerns:

  • All the new models use the integer type for the primary key to match the type returned by the New Relic API. This could require changes if New Relic changes the data types.
  • This might require additional testing scenarios not listed in the testing instructions above.
