Prometheus server fails to get metrics from deployed environments

Summary

The Metrics dashboard shows the message "No data found".

[Screenshot: Screen_Shot_2019-06-26_at_6.19.10_PM]

Steps to reproduce

Go to the Metrics section under Operations.

Conditions:

  • Kubernetes set up correctly
  • Pipeline green
  • Runners set up correctly

What is the current bug behavior?

No metrics are shown.

Possible fixes

The only known workaround is to completely reinstall the GDK (GitLab Development Kit).

The following have been tried without success:

  • Reconnecting an existing cluster with a project
  • Recreating the project and connecting it to the existing cluster
  • Recreating both the cluster and the project and doing everything from scratch

Some clues 🔎 that may help:

  • The metrics-server pod logs contain this message: "unable to fully collect metrics: unable to fully scrape metrics from source"
  • The /additional_metrics.json endpoint returns a 200 response, but the data array is empty, e.g.:
    {"success":true,"data":[],"last_update":"2019-05-27T21:06:10:640Z"}
  • Querying the Prometheus proxy API directly also returns only empty responses
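The two empty-response symptoms can be checked mechanically when comparing environments. The sketch below is a minimal illustration, assuming the /additional_metrics.json body has the shape shown in the example ("success", "data", "last_update") and that the Prometheus proxy passes through the standard Prometheus HTTP API query response ("status", then "data" containing "result"). The function names are hypothetical helpers, not part of GitLab.

```python
import json


def additional_metrics_is_empty(payload: str) -> bool:
    """Return True when an /additional_metrics.json body reports success but no data.

    Assumed body shape (as seen in this issue):
    {"success": true, "data": [...], "last_update": "..."}
    """
    body = json.loads(payload)
    return bool(body.get("success")) and not body.get("data")


def prometheus_result_is_empty(payload: str) -> bool:
    """Return True when a Prometheus HTTP API query response contains no series.

    Assumed body shape (standard Prometheus query API):
    {"status": "success", "data": {"resultType": "...", "result": [...]}}
    """
    body = json.loads(payload)
    if body.get("status") != "success":
        # An error response is a different failure mode than "no data".
        return False
    return not body.get("data", {}).get("result")


if __name__ == "__main__":
    sample = '{"success":true,"data":[],"last_update":"2019-05-27T21:06:10:640Z"}'
    print(additional_metrics_is_empty(sample))  # True
    empty_query = '{"status":"success","data":{"resultType":"vector","result":[]}}'
    print(prometheus_result_is_empty(empty_query))  # True
```

Treating a Prometheus "error" status as not empty keeps a misconfigured query from being mistaken for the "no data" symptom described here.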

More information if you want to dig deeper into this issue:

  • In the Rails console, run Deployment.find(<id>).additional_metrics and work your way into the method. This may help narrow down the cause.
  • A comment on Slack about the issue: "The most likely theory I think is that the reactive cache is still waiting for results so GitLab is returning nothing."
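One way to test the reactive-cache theory is to poll the endpoint and separate "cache still computing" from "cache filled, but empty". The sketch below assumes the endpoint answers 204 No Content while the reactive cache is still being populated and 200 once a result is cached; that status-code behaviour and the hypothetical `classify_response` / `poll` helpers are illustrative assumptions, not documented GitLab API. The HTTP call is injected as `fetch` so the logic runs without a live GitLab instance.

```python
import json
import time
from typing import Callable, Tuple

# A fetch function returns (HTTP status code, response body text).
Fetch = Callable[[], Tuple[int, str]]


def classify_response(status: int, body: str) -> str:
    """Classify one /additional_metrics.json response.

    Assumption: 204 No Content means the reactive cache has not been filled yet.
    """
    if status == 204:
        return "pending"
    if status != 200:
        return "error"
    data = json.loads(body).get("data")
    return "ready" if data else "empty"


def poll(fetch: Fetch, attempts: int = 5, delay_s: float = 2.0) -> str:
    """Poll until the response is no longer pending, or attempts run out.

    A final "pending" suggests the cache never completed; a final "empty"
    means a result was cached but it contained no metrics.
    """
    state = "pending"
    for attempt in range(attempts):
        status, body = fetch()
        state = classify_response(status, body)
        if state != "pending":
            return state
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return state


if __name__ == "__main__":
    # Simulated responses: two "still computing" answers, then an empty result.
    responses = iter([(204, ""), (204, ""), (200, '{"success":true,"data":[]}')])
    print(poll(lambda: next(responses), attempts=5, delay_s=0))  # empty
```

Under these assumptions, the 200-with-empty-data response reported in this issue would classify as "empty" rather than "pending", which would point away from a cache that is merely slow to fill.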

If you encounter this issue, please contact @tkuah

Edited by Tristan Read