Added Prometheus Service and Prometheus graphs
What does this MR do?
Part of https://gitlab.com/gitlab-org/gitlab-ce/issues/26910
This MR adds a Prometheus Service, with support for fetching metrics for an environment and displaying them on the environments page.
Actual data fetching is still missing; this is a placeholder that can be built on.
Are there points in the code the reviewer needs to double check?
Why was this MR needed?
Screenshots (if relevant)
Does this MR meet the acceptance criteria?
- Changelog entry added
- Documentation created/updated
- API support added
- Tests
  - Added for this feature/bug
  - All builds are passing
- Conform by the merge request performance guides
- Conform by the style guides
- Branch has no merge conflicts with master (if it does - rebase it please)
- Squashed related commits together
What are the relevant issue numbers?
Functionality Notes
Kamil: Please edit the queries; @jivanvl can add graphs to the metrics action of environments_controller as part of my MR: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/8935
Kamil: For development, make sure Sidekiq is running, because the data is fetched in the background. The API can return HTTP status code 204, which means the data is not ready yet; retry after a delay if that happens. Under normal circumstances you will receive a 200 HTTP status code with the JSON status: true.
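The retry behaviour described above can be sketched as a small polling helper. This is only an illustration, not code from this MR: `poll_metrics`, its parameters, and the injected `sleeper` are hypothetical names, and the block passed in is assumed to perform the HTTP request and return a `[status_code, body]` pair.

```ruby
require 'json'

# Hypothetical helper (not part of this MR): calls the given block, which is
# assumed to perform the request and return [status_code, body].
# A 204 means the data is not ready yet, so we wait and try again.
def poll_metrics(max_attempts: 5, delay: 1.0, sleeper: ->(seconds) { sleep(seconds) })
  max_attempts.times do
    status, body = yield

    case status
    when 200
      return JSON.parse(body) # data is ready
    when 204
      sleeper.call(delay)     # background job has not finished yet
    else
      raise "Unexpected HTTP status #{status}"
    end
  end

  nil # still not ready after max_attempts tries
end
```

A caller would wrap its own HTTP request in the block, for example one made with `Net::HTTP` against the environment metrics endpoint, and treat a `nil` result as "data still not available".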
Josh: I spun up a Prometheus server outside of the omnibus package here: https://kubeprom.35.185.3.210.nip.io/graph. It has the k8s metrics we want to use for this.
/cc @joshlambert @ayufan
Activity
We need to add the actual API queries here: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/8935/diffs#b335d1d914057fc352d2a831094df64281e2ec82_0_47. One tests the Prometheus service, the second fetches data.
This data will be returned to the frontend when it requests this endpoint:
metrics_namespace_project_environment_path(@project.namespace, @project, environment, format: :json)
Since this is a single endpoint, all metrics will be returned in a single response.
mentioned in issue #26910 (closed)
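As a rough sketch of the two queries mentioned in the comment above (one to test the service, one to fetch data), the Prometheus HTTP API exposes `/api/v1/query` for instant queries and `/api/v1/query_range` for ranges. The helper names, the base URL, and the `up` expression below are assumptions for illustration only; the real queries are still to be decided in this MR.

```ruby
require 'uri'

# Hypothetical helper: URL for an instant query, e.g. to check that the
# Prometheus server answers at all.
def instant_query_url(base_url, query, time: nil)
  params = { query: query }
  params[:time] = time.to_f if time
  "#{base_url.chomp('/')}/api/v1/query?#{URI.encode_www_form(params)}"
end

# Hypothetical helper: URL for a range query that returns a series of
# [timestamp, value] samples between start_time and end_time.
def range_query_url(base_url, query, start_time:, end_time:, step: 60)
  params = {
    query: query,
    start: start_time.to_f, # Unix timestamp in seconds
    end: end_time.to_f,
    step: step              # resolution in seconds
  }
  "#{base_url.chomp('/')}/api/v1/query_range?#{URI.encode_www_form(params)}"
end

# Assumed local Prometheus address and placeholder expression.
puts instant_query_url('http://localhost:9090', 'up')
puts range_query_url('http://localhost:9090', 'up',
                     start_time: Time.now - 8 * 3600, end_time: Time.now)
```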
@Kamil This is amazing, thank you!!
I have an issue open for the Prometheus specific service, #27550 (closed), to have an issue trail. I've made some suggestions on the text, what do you think?
@bjk-gitlab can you help with the exact query? I have some underlying questions about the scrape config that we need answered in omnibus-gitlab#1936 (closed).
Okay, let's do 30s. We'll pick up every other scrape interval. @bjk-gitlab is that okay with you?
added 2 commits
This is how the API response looks:
{
  "success": true,
  "metrics": {
    "memory_values": [
      {
        "metric": { "__name__": "node_netstat_Ip_Forwarding", "instance": "localhost:9100", "job": "node" },
        "values": [ [ 1486123050.874, "1" ], [ 1486123110.874, "1" ] ]
      }
    ],
    "memory_current": [
      {
        "metric": { "__name__": "node_netstat_Ip_Forwarding", "instance": "localhost:9100", "job": "node" },
        "value": [ 1486151851.481, "1" ]
      }
    ],
    "cpu_values": [
      {
        "metric": { "__name__": "node_netstat_Ip_Forwarding", "instance": "localhost:9100", "job": "node" },
        "values": [ [ 1486123052.087, "1" ], [ 1486123112.087, "1" ] ]
      }
    ],
    "cpu_current": [
      {
        "metric": { "__name__": "node_netstat_Ip_Forwarding", "instance": "localhost:9100", "job": "node" },
        "value": [ 1486151852.662, "1" ]
      }
    ]
  },
  "last_update": "2017-02-03T19:57:33.231Z"
}
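A consumer of this response might turn one of the `*_values` series into timestamp/number pairs along these lines. This is a minimal sketch assuming the response shape shown above; `series_points` is a hypothetical name, not something defined in this MR.

```ruby
require 'json'

# Hypothetical helper: flattens every series stored under `key`
# (for example "memory_values") into [Time, Float] pairs.
# Sample values arrive as strings, so they are converted with Float().
def series_points(payload, key)
  payload.fetch('metrics').fetch(key, []).flat_map do |series|
    series.fetch('values', []).map do |timestamp, value|
      [Time.at(timestamp).utc, Float(value)]
    end
  end
end

# Trimmed-down copy of the response above, used only as sample input.
body = <<~JSON
  { "success": true,
    "metrics": { "memory_values": [ { "metric": { "job": "node" },
      "values": [ [ 1486123050.874, "1" ], [ 1486123110.874, "1" ] ] } ] } }
JSON

series_points(JSON.parse(body), 'memory_values').each do |time, value|
  puts "#{time.iso8601} #{value}"
end
```

Note that the `*_current` entries carry a single `value` pair rather than a `values` array, so they would need separate handling.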
Scrape interval only has a slight effect on how long we should cache queries. Say you have a scrape interval of 15s and 15 task instances for a project. Prometheus will attempt to spread the scrapes of the instances evenly over the 15s for the job. This means you will get, on average, one new project sample per second.
Since there will be no invalidation pipeline, the cache should reflect the UI update rate that users expect. I think for these graphs, 30s is a good starting point. These are not going to be highly detailed graphs to start.
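To illustrate the idea of caching without invalidation, here is a minimal time-based cache with a 30s lifetime. It is a sketch only: `TtlCache` and the injectable `clock` are hypothetical, and the MR itself may cache in a completely different way.

```ruby
# Hypothetical in-memory cache: an entry is reused until it is older than
# `ttl` seconds, then recomputed on the next fetch. Nothing invalidates
# entries early, matching the "no invalidation pipeline" assumption.
class TtlCache
  def initialize(ttl:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl
    @clock = clock
    @store = {}
  end

  # Returns the cached value for `key`, or runs the block and stores its
  # result when the entry is missing or expired.
  def fetch(key)
    now = @clock.call
    entry = @store[key]
    return entry[:value] if entry && (now - entry[:stored_at]) < @ttl

    value = yield
    @store[key] = { value: value, stored_at: now }
    value
  end
end

cache = TtlCache.new(ttl: 30)
cache.fetch(:environment_1) { 'expensive Prometheus query result' }
```

With a 15s scrape interval, a 30s lifetime means a cached graph can lag by up to two scrapes, which fits the "every other scrape interval" trade-off discussed above.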
@joshlambert @jivanvl Is anything happening with this MR?
@ayufan Are you closing this down in favor of another one? I'm making the graphs using this branch as a base; has something changed?
@jivanvl Hmm, no. I'm asking because I'm not seeing a mention of this MR. We should probably ship both of them in the same MR, or move everything related to the queries to your MR and leave only the Prometheus Service here.
@ayufan I think it would be best to leave the query related stuff and the frontend in a separate MR so we don't have to review a gigantic MR
added 1766 commits
added 1 commit
- fda3a7b7 - Change Prometheus test ping to a functional query so it returns success.
- Resolved by Joshua Lambert