fix: use nerdgraph for fetching new relic monitors
Description
This PR fixes a bug in the deployment of New Relic changes. If an account has a large number of monitors, the Grove deployment errors out, as seen in https://gitlab.com/opencraft/ops/grove-stage-digitalocean/-/jobs/2912584123#L855, with an error similar to:
Retrieving email notification channels.
Creating synthetic monitors if required.
Adding monitor: https://as-ill-documentation-committee.trycloudflare.com/.
NewRelic Error : {"errors":[{"error":"Invalid name specified: '821114-https://as-ill-documentation-committee.trycloudflare.com/'; a monitor with that name already exists."}],"count":1}
400 Client Error: Bad Request for url: https://synthetics.eu.newrelic.com/synthetics/api/v3/monitors
The bug occurred because the New Relic REST API cannot fetch monitors for a specific policy, and the current implementation doesn't check past the first page of results. Existing monitors beyond the first page are therefore not detected, and Grove tries to create them again.
The newer GraphQL API (NerdGraph) does allow filtering by policy ID, so using it bypasses this limitation and fixes the deployment issue.
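To illustrate the approach described above, here is a minimal sketch of fetching every matching monitor through NerdGraph while following the pagination cursor. It is not necessarily the code in this PR. The helper names (`build_query`, `fetch_monitor_names`, `run_query`) are hypothetical, the query shape follows NerdGraph's `entitySearch`, and the name filter assumes monitors are named `<policy id>-<url>`, as the error message above suggests.

```python
import json
from typing import Callable, Optional

# Assumed query shape: NerdGraph entitySearch restricted to synthetic monitors.
# The name filter relies on the "<policy id>-<url>" naming seen in the error.
QUERY_TEMPLATE = """
{
  actor {
    entitySearch(query: "domain = 'SYNTH' AND type = 'MONITOR' AND name LIKE '%s-'") {
      results%s {
        entities { guid name }
        nextCursor
      }
    }
  }
}
"""


def build_query(policy_id: int, cursor: Optional[str] = None) -> str:
    """Render the query, passing the cursor only when fetching a later page."""
    cursor_arg = f"(cursor: {json.dumps(cursor)})" if cursor else ""
    return QUERY_TEMPLATE % (policy_id, cursor_arg)


def fetch_monitor_names(run_query: Callable[[str], dict], policy_id: int) -> set:
    """Collect monitor names from every page, not just the first one.

    run_query is a hypothetical callable that sends the query string to
    NerdGraph and returns the decoded JSON response.
    """
    names = set()
    cursor = None
    while True:
        data = run_query(build_query(policy_id, cursor))
        results = data["data"]["actor"]["entitySearch"]["results"]
        names.update(entity["name"] for entity in results["entities"])
        cursor = results.get("nextCursor")
        if not cursor:  # no further pages
            return names
```

In a real run, `run_query` would POST the query to the NerdGraph endpoint with an API key. Passing it in as a callable keeps the pagination loop easy to test with canned responses.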
Note that this PR also contains pyproject.toml for the black configuration. I then ran black on the project to make sure the formatting is consistent.
Supporting information
- https://docs.newrelic.com/docs/apis/synthetics-rest-api/monitor-examples/manage-synthetics-monitors-rest-api/
- https://docs.newrelic.com/docs/apis/nerdgraph/get-started/introduction-new-relic-nerdgraph/
- https://docs.newrelic.com/docs/apis/nerdgraph/examples/nerdgraph-synthetics-tutorial/
Testing instructions
Using current main branch
- Deploy an instance on your local Grove install. Make sure that at least one of LMS_HOST, CMS_HOST or PREVIEW_LMS_HOST is defined for your instance (the actual value doesn't matter for this PR).
- Add the New Relic settings to your private.yml from https://gitlab.com/opencraft/ops/grove-stage-digitalocean/-/settings/ci_cd
- Run ./grove postdeploy [instance name]
- You will see messages that your New Relic monitors were added:
Enabling monitoring for minitest
Adding an alert policy...
No active alert policy found. Creating a new one...
Retrieving email notification channels.
Adding notification channel 327221 to policy.
Adding notification channel 326947 to policy.
Creating synthetic monitors if required.
Adding monitor: https://as-ill-documentation-committee.trycloudflare.com/.
Monitor https://as-ill-documentation-committee.trycloudflare.com/ added successfully.
Adding monitor: https://as-ill-documentation-committee.trycloudflare.com/heartbeat?extended.
Monitor https://as-ill-documentation-committee.trycloudflare.com/heartbeat?extended added successfully.
Monitors created successfully.
- Run ./grove postdeploy [instance name] again. This time you'll get an error in the pipeline.
- Switch to the keith/fix-new-relic-monitor-pagination branch.
- Run the postdeploy command again.
- You will now see that it skips adding monitors, because they've already been added.
Enabling monitoring for minitest
Adding an alert policy...
Found existing alert policy {'id': 821114, 'incident_preference': 'PER_POLICY', 'name': 'minitest', 'created_at': 1661519735585, 'updated_at': 1661519739773}
Retrieving email notification channels.
Creating synthetic monitors if required.
Skipping monitor https://as-ill-documentation-committee.trycloudflare.com/ as it is already added.
Skipping monitor https://as-ill-documentation-committee.trycloudflare.com/heartbeat?extended as it is already added.
Checklist
If any of the items below is not applicable, do not remove it; put a check in it instead.
- All providers include the new feature/change
- All affected providers can provision new clusters
- Unit tests are added/updated
- Documentation is added/updated
- The TOOLS_CONTAINER_IMAGE_VERSION in ci_vars.yml is updated
- The grove-template repository is updated