[SE-3487] enable LOS for NRQL alert conditions
Based on NewRelic's announcement, we must change our NRQL alert conditions to enable loss of signal detection. Also, based on my understanding, in our use case it is worth filling the gaps with static "0" values to indicate that we did not receive data during that time period. This way we remain "backward compatible" in terms of the shape of the data.
This change requires us to "re-install" all the conditions we have, which means that we first delete them, then add them again. An alternative solution would be to update all conditions with the "Migrator" application NewRelic provided to help with the migration. Both approaches can work, but I'd go with re-installing the conditions OCIM knows about.
For conditions that were added manually (such as those that check OCIM itself), we need to use the Migrator application.
Screenshots:
Sandbox URL: N/A
Testing instructions:
- Go to NewRelic and check the policy created from stage env and check the "Thresholds" section. It must contain the new "Loss of Signal" related changes (Signal lost after settings)
- Go to OCIM production and get the NewRelic related settings
- Go to OCIM stage, checkout this branch and add the copied settings to .env
- Restart the shell on stage
- Get a random (successfully provisioned) instance's ID
- Get the instance object from OpenEdXInstance model
- Call the `enable_monitoring()` function on the instance
- Check that NewRelic created the monitoring policy and set the desired parameters
- Revert the changes on stage to use master branch - including deleting the copied NewRelic settings
Author notes and concerns:
- The REST documentation is not updated yet, but the API already knows the new options. For the API's request schema, please check its playground. For the NerdGraph documentation, which already covers the new functionality, you can check this documentation.
- Based on the UI (I did not find documentation about that though), the `expiration_duration` must be at least `60` seconds.
- I set `expiration.closeViolationsOnExpiration` to `False` to be more explicit, though its default value is `False`
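
To make the notes above concrete, here is a minimal sketch of the loss-of-signal and gap-filling part of a condition payload. It is not the code in this branch: the helper name `build_los_settings`, the snake_case key names, the integer value types and the `open_violation_on_expiration` default are all assumptions for illustration, so please verify them against the API playground before relying on them.

```python
# Hypothetical helper, not part of OCIM. Key names and value types are
# assumptions; check them against the NewRelic API playground.

MIN_EXPIRATION_SECONDS = 60  # lower bound observed in the NewRelic UI


def build_los_settings(expiration_duration=MIN_EXPIRATION_SECONDS, fill_value=0):
    """Return the loss-of-signal and gap-filling sections of an NRQL condition."""
    if expiration_duration < MIN_EXPIRATION_SECONDS:
        raise ValueError(
            "expiration_duration must be at least %d seconds" % MIN_EXPIRATION_SECONDS
        )
    return {
        "expiration": {
            # Seconds without data before the signal is considered lost.
            "expiration_duration": expiration_duration,
            # Assumed: open a violation when the signal is lost.
            "open_violation_on_expiration": True,
            # Set explicitly, although False is already the default.
            "close_violations_on_expiration": False,
        },
        "signal": {
            # Fill gaps with a static value so the shape of the data stays the same.
            "fill_option": "static",
            "fill_value": fill_value,
        },
    }
```

The returned dictionary would be merged into the rest of the condition definition (name, NRQL query, terms) before it is sent to NewRelic.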
Reviewers