fix: use nerdgraph for fetching new relic monitors

Keith Grootboom requested to merge keith/fix-new-relic-monitor-pagination into main

Description

This PR fixes a bug in the deployment of New Relic changes. If an account has a large number of monitors, the Grove deployment fails, as seen in https://gitlab.com/opencraft/ops/grove-stage-digitalocean/-/jobs/2912584123#L855, with an error similar to:

Retrieving email notification channels.
Creating synthetic monitors if required.
Adding monitor: https://as-ill-documentation-committee.trycloudflare.com/.
NewRelic Error : {"errors":[{"error":"Invalid name specified: '821114-https://as-ill-documentation-committee.trycloudflare.com/'; a monitor with that name already exists."}],"count":1}
400 Client Error: Bad Request for url: https://synthetics.eu.newrelic.com/synthetics/api/v3/monitors

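The bug occurs because the New Relic REST API cannot fetch the monitors for a specific policy, and the current implementation does not check past the first page of results. A monitor that already exists but is not on the first page is therefore treated as missing, and the attempt to create it again is rejected.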

The newer GraphQL API (NerdGraph) does allow filtering by policy ID, so using it bypasses this limitation and fixes the deployment issue.
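A minimal sketch of the "keep reading until there are no more pages" idea, not the code in this PR. It assumes a cursor-paginated API; `fetch_page` is a hypothetical callable standing in for the real request, and the page shape (a list of monitor names plus an optional next cursor) is an assumption made for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical page shape: (monitor names on this page, cursor of the next page or None).
Page = Tuple[List[str], Optional[str]]


def fetch_all_monitor_names(fetch_page: Callable[[Optional[str]], Page]) -> List[str]:
    """Collect monitor names from every page, following cursors until none is returned."""
    names: List[str] = []
    cursor: Optional[str] = None
    while True:
        page_names, cursor = fetch_page(cursor)
        names.extend(page_names)
        if cursor is None:
            # No further pages: everything has been collected.
            return names


# Stand-in for the real API: three fake pages keyed by the cursor that requests them.
_FAKE_PAGES: Dict[Optional[str], Page] = {
    None: (["821114-https://a.example/"], "c1"),
    "c1": (["821114-https://b.example/"], "c2"),
    "c2": (["821114-https://c.example/"], None),
}


def fake_fetch_page(cursor: Optional[str]) -> Page:
    """Return the fake page for the given cursor."""
    return _FAKE_PAGES[cursor]


all_names = fetch_all_monitor_names(fake_fetch_page)
```

Passing the fetch function in as a parameter keeps the loop testable without network access; a real caller would supply a function that issues the GraphQL request and extracts the names and the next cursor from the response.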

Note that this PR also contains a pyproject.toml with the black configuration. I then ran black on the project to make the formatting consistent.

Supporting information

Testing instructions

Using current main branch

  • Deploy an instance on your local Grove install. Make sure that at least one of LMS_HOST, CMS_HOST or PREVIEW_LMS_HOST is defined for your instance (the actual value doesn't matter for this PR).
  • Add the New Relic settings to your private.yml from https://gitlab.com/opencraft/ops/grove-stage-digitalocean/-/settings/ci_cd
  • Run ./grove postdeploy [instance name]
  • You will see messages that your New Relic monitors were added:
    Enabling monitoring for minitest
    Adding an alert policy...
    No active alert policy found. Creating a new one...
    Retrieving email notification channels.
    Adding notification channel 327221 to policy.
    Adding notification channel 326947 to policy.
    Creating synthetic monitors if required.
    Adding monitor: https://as-ill-documentation-committee.trycloudflare.com/.
    Monitor https://as-ill-documentation-committee.trycloudflare.com/ added successfully.
    Adding monitor: https://as-ill-documentation-committee.trycloudflare.com/heartbeat?extended.
    Monitor https://as-ill-documentation-committee.trycloudflare.com/heartbeat?extended added successfully.
    Monitors created successfully.
  • Run ./grove postdeploy [instance name] again. This time you'll get an error in the pipeline.
  • Switch to the keith/fix-new-relic-monitor-pagination branch.
  • Run the postdeploy command again.
  • You will now see that it skips adding the monitors, because they've already been added:
    Enabling monitoring for minitest
    Adding an alert policy...
    Found existing alert policy {'id': 821114, 'incident_preference': 'PER_POLICY', 'name': 'minitest', 'created_at': 1661519735585, 'updated_at': 1661519739773}
    Retrieving email notification channels.
    Creating synthetic monitors if required.
    Skipping monitor https://as-ill-documentation-committee.trycloudflare.com/ as it is already added.
    Skipping monitor https://as-ill-documentation-committee.trycloudflare.com/heartbeat?extended as it is already added.
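The skip behaviour in the last step can be sketched as follows. This is an illustration only, not the code in this PR: the function names are hypothetical, and the `<policy id>-<url>` naming scheme is inferred from the monitor name in the error message in the description (`821114-https://...`) together with the policy id shown in the log output.

```python
from typing import Iterable, List, Set, Tuple


def monitor_name(policy_id: int, url: str) -> str:
    """Build a monitor name; the "<policy id>-<url>" scheme is an assumption."""
    return f"{policy_id}-{url}"


def split_monitors(
    policy_id: int, urls: Iterable[str], existing_names: Set[str]
) -> Tuple[List[str], List[str]]:
    """Return (urls to add, urls to skip) given the names that already exist."""
    to_add: List[str] = []
    to_skip: List[str] = []
    for url in urls:
        if monitor_name(policy_id, url) in existing_names:
            to_skip.append(url)  # already present, creating it again would fail
        else:
            to_add.append(url)
    return to_add, to_skip


# Example with placeholder URLs: one monitor already exists, one does not.
existing = {"821114-https://lms.example/"}
to_add, to_skip = split_monitors(
    821114,
    ["https://lms.example/", "https://lms.example/heartbeat?extended"],
    existing,
)
```

The check is only as reliable as `existing_names`: if that set is built from the first page of results alone, monitors on later pages end up in `to_add` and trigger the "already exists" error.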

Checklist

If any of the items below are not applicable, do not remove them; check them instead.

  • All providers include the new feature/change
  • All affected providers can provision new clusters
  • Unit tests are added/updated
  • Documentation is added/updated
  • The TOOLS_CONTAINER_IMAGE_VERSION in ci_vars.yml is updated
  • The grove-template repository is updated
Edited by Keith Grootboom
