Prometheus server fails to get metrics from deployed environments

Summary

The Metrics dashboard shows the message "No data found".

Steps to reproduce

Go to the Metrics section under Operations.

Conditions:

✅ Kubernetes set up correctly
✅ Pipeline green
✅ Runners set up correctly

What is the current bug behavior?

There are no metrics shown.

Possible fixes

This can happen for a variety of reasons, but the main one is that there are no PrometheusMetric records in your database.

These can be re-added by running ::Gitlab::DatabaseImporters::CommonMetrics::Importer.new.execute in your Rails console.
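
The check and fix described above can be sketched as a small helper that only runs the importer when no metrics exist. This is a sketch, not GitLab code: the helper name and its return symbols are made up for illustration, and it assumes PrometheusMetric is an ActiveRecord model that responds to count.

```ruby
# Sketch: re-import the common metrics only when none are present.
# `metric_model` must respond to `count` (assumed: the PrometheusMetric model).
# `importer` must respond to `execute`.
# The helper name and the returned symbols are hypothetical.
def reimport_common_metrics_if_missing(metric_model:, importer:)
  return :already_present if metric_model.count.positive?

  importer.execute
  :imported
end

# In a Rails console this would be called as (not executed here):
#   reimport_common_metrics_if_missing(
#     metric_model: PrometheusMetric,
#     importer: ::Gitlab::DatabaseImporters::CommonMetrics::Importer.new
#   )
```

Passing the model and importer as arguments keeps the helper independent of the Rails environment, so the logic can be exercised with stand-in objects.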

Tasks to complete

Add a rake task that wraps ::Gitlab::DatabaseImporters::CommonMetrics::Importer.new.execute so that it is easier to run.
Document that the rake task above can be run if this issue occurs for developers or on a user's GitLab instance.
Briefly investigate why this occurs, and identify any further causes and preventative steps that could be taken.
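
A rough sketch of what the rake task from the first item could look like. The namespace and task name are hypothetical placeholders, and the :environment prerequisite assumes the usual Rails convention for loading the application before the task body runs.

```ruby
require 'rake'

# Hypothetical namespace and task name, chosen only for illustration.
namespace :metrics do
  desc 'Re-import the common Prometheus metrics into the database'
  # :environment is the standard Rails prerequisite that loads the app.
  task setup_common_metrics: :environment do
    ::Gitlab::DatabaseImporters::CommonMetrics::Importer.new.execute
  end
end
```

With a task like this in place, the fix could be run from a shell with bundle exec rake metrics:setup_common_metrics instead of opening a Rails console.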

Previous description:

The only way of getting it to work is to completely reinstall GDK.
These are the things that have been tried without success:

Reconnecting an existing cluster with a project

Recreating the project and connecting it to the existing cluster

Recreating both the cluster and the project and doing everything from scratch

Some clues 🔎 that may help:

This message appeared in the logs for metrics-server pod: unable to fully collect metrics: unable to fully scrape metrics from source

/additional_metrics.json returns a 200 response, but the data is empty. For example:

{"success":true,"data":[],"last_update":"2019-05-27T21:06:10:640Z"}

All responses are empty when hitting the Prometheus proxy API directly
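
To spot the empty /additional_metrics.json response shown above programmatically, a check along these lines may help. It is a minimal sketch: the helper name is hypothetical, and it only assumes the success and data keys visible in the sample response.

```ruby
require 'json'

# Returns true when the endpoint reported success but carried no metric data.
# The helper name is hypothetical; the keys come from the sample response.
def empty_metrics_response?(body)
  payload = JSON.parse(body)
  payload['success'] == true && Array(payload['data']).empty?
end

# Sample body copied from the response quoted in this issue.
body = '{"success":true,"data":[],"last_update":"2019-05-27T21:06:10:640Z"}'
puts empty_metrics_response?(body)
```

A true result here matches the symptom in this issue: the request succeeds, but no metrics come back.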

More information if you want to dig deeper into this issue:

In the Rails console, run Deployment.find(<id>).additional_metrics and step through the method. It may help narrow down the cause.

A comment made on Slack about the issue: "The most likely theory I think is that the reactive cache is still waiting for results so GitLab is returning nothing."

If you encounter this issue, please contact @tkuah

Edited Nov 19, 2019 by Sean Arnold