Added Prometheus Service and Prometheus graphs
What does this MR do?
Part of https://gitlab.com/gitlab-org/gitlab-ce/issues/26910
This MR adds a Prometheus Service, with support for fetching metrics for an environment and displaying them on the environments page.
Actual data fetching is still missing; this is a placeholder that can be built on.
Are there points in the code the reviewer needs to double check?
Why was this MR needed?
Screenshots (if relevant)
Does this MR meet the acceptance criteria?
- Changelog entry added
- Documentation created/updated
- API support added
- Tests
  - Added for this feature/bug
  - All builds are passing
- Conforms to the merge request performance guides
- Conforms to the style guides
- Branch has no merge conflicts with master (if it does - rebase it please)
- Squashed related commits together
What are the relevant issue numbers?
Functionality Notes
Kamil: Please edit the queries; @jivanvl can add graphs to the metrics action of environments_controller as part of my MR: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/8935
Kamil: For development, make sure Sidekiq is running, because the data is fetched in the background. The API can return HTTP status code 204, which means the data is not ready yet; retry after a delay if that happens. In normal circumstances you will receive a 200 HTTP status code with JSON containing success: true.
Josh: I spun up a Prometheus server outside of the Omnibus package, here: https://kubeprom.35.185.3.210.nip.io/graph. This one has the k8s metrics we want to use for this.
/cc @joshlambert @ayufan
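The retry behaviour described above (204 means "not ready yet", 200 carries the JSON) could be sketched roughly as follows. This is only an illustration: `Response`, `poll_metrics`, the `delay` and `max_attempts` defaults, and the injectable `sleeper` are all hypothetical names and values, not part of this MR.

```ruby
require 'json'

# Hypothetical response value: HTTP status code plus raw body.
Response = Struct.new(:status, :body)

# Polls until the endpoint stops answering 204 (data not ready yet).
# `fetch` is any callable returning a Response. `delay` and `max_attempts`
# are illustrative defaults, not values taken from the MR.
def poll_metrics(fetch, delay: 2, max_attempts: 5, sleeper: ->(seconds) { sleep(seconds) })
  max_attempts.times do |attempt|
    response = fetch.call
    case response.status
    when 200
      return JSON.parse(response.body)
    when 204
      # Wait before the next attempt, but not after the final one.
      sleeper.call(delay) if attempt < max_attempts - 1
    else
      raise "unexpected HTTP status #{response.status}"
    end
  end
  nil # still not ready after max_attempts
end
```

Passing the HTTP call in as a callable keeps the retry logic independent of any particular HTTP client, so it can be exercised with canned responses.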
Activity
We need to add the actual API queries here: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/8935/diffs#b335d1d914057fc352d2a831094df64281e2ec82_0_47. One to test the Prometheus service, and a second to fetch data.
This data will be returned to the frontend when it requests this endpoint:
metrics_namespace_project_environment_path(@project.namespace, @project, environment, format: :json)
Since this is a single endpoint, all metrics will be returned in a single response.
mentioned in issue #26910 (closed)
@Kamil This is amazing, thank you!!
I have an issue open for the Prometheus specific service, #27550 (closed), to have an issue trail. I've made some suggestions on the text, what do you think?
@bjk-gitlab can you help with the exact query? I have some underlying questions we need to get answered in the scrape config in omnibus-gitlab#1936 (closed).
Okay, let's do 30s. We'll pick up every other scrape interval. @bjk-gitlab is that okay with you?
added 2 commits
This is how the API response looks:
{
  "success": true,
  "metrics": {
    "memory_values": [
      {
        "metric": { "__name__": "node_netstat_Ip_Forwarding", "instance": "localhost:9100", "job": "node" },
        "values": [ [ 1486123050.874, "1" ], [ 1486123110.874, "1" ] ]
      }
    ],
    "memory_current": [
      {
        "metric": { "__name__": "node_netstat_Ip_Forwarding", "instance": "localhost:9100", "job": "node" },
        "value": [ 1486151851.481, "1" ]
      }
    ],
    "cpu_values": [
      {
        "metric": { "__name__": "node_netstat_Ip_Forwarding", "instance": "localhost:9100", "job": "node" },
        "values": [ [ 1486123052.087, "1" ], [ 1486123112.087, "1" ] ]
      }
    ],
    "cpu_current": [
      {
        "metric": { "__name__": "node_netstat_Ip_Forwarding", "instance": "localhost:9100", "job": "node" },
        "value": [ 1486151852.662, "1" ]
      }
    ]
  },
  "last_update": "2017-02-03T19:57:33.231Z"
}
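For anyone consuming this response, a minimal sketch of turning the samples into plottable points might look like the following. It assumes only the shape shown above: each sample is a `[unix_seconds, "value"]` pair, range series carry `values` and current series carry a single `value`. The helper names `to_point` and `points_for` are made up for illustration, and the body is a trimmed copy of the response.

```ruby
require 'json'

# Trimmed copy of the response shown above.
body = <<~JSON
  {
    "success": true,
    "metrics": {
      "memory_values": [
        { "metric": { "job": "node" },
          "values": [[1486123050.874, "1"], [1486123110.874, "1"]] }
      ],
      "memory_current": [
        { "metric": { "job": "node" }, "value": [1486151851.481, "1"] }
      ]
    },
    "last_update": "2017-02-03T19:57:33.231Z"
  }
JSON

# Converts one sample, [unix_seconds, "value"], into [Time, Float].
def to_point(sample)
  timestamp, value = sample
  [Time.at(timestamp).utc, Float(value)]
end

# Range series carry "values" (many samples); current series carry "value" (one).
def points_for(series)
  if series.key?("values")
    series["values"].map { |sample| to_point(sample) }
  else
    [to_point(series["value"])]
  end
end

data = JSON.parse(body)
memory_history = data["metrics"]["memory_values"].flat_map { |series| points_for(series) }
memory_now = data["metrics"]["memory_current"].flat_map { |series| points_for(series) }
```

Note that the sample values arrive as strings, so they need an explicit numeric conversion before charting.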
Scrape interval only has a slight effect on how long we should cache queries. Say you have a scrape interval of 15s, and 15 task instances for a project. Prometheus will attempt to spread the scrape of each instance evenly over the 15s for the job. This means you will have an average of one new project sample per second.
Since there will be no invalidation pipeline, the cache should reflect user-expected UI updates. I think for these graphs, 30s is a good starting point. These are not going to be highly detailed graphs to start.
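To make the numbers concrete, here is a rough sketch of the arithmetic plus a time-based cache with no explicit invalidation. `samples_per_second` and `TtlCache` are hypothetical names used only for illustration; this is not how the MR implements caching.

```ruby
# Average rate of new samples when the scrapes of `instances` targets are
# spread evenly over one scrape interval (in seconds).
def samples_per_second(instances:, scrape_interval:)
  instances.to_f / scrape_interval
end

# Minimal time-based cache: entries expire after `ttl` seconds and are
# never invalidated earlier. `clock` is injectable so it can be tested.
class TtlCache
  def initialize(ttl:, clock: -> { Time.now.to_f })
    @ttl = ttl
    @clock = clock
    @entries = {}
  end

  # Returns the cached value for `key`, recomputing it with the block
  # once the stored value is at least `ttl` seconds old.
  def fetch(key)
    now = @clock.call
    entry = @entries[key]
    return entry[:value] if entry && now - entry[:stored_at] < @ttl

    value = yield
    @entries[key] = { value: value, stored_at: now }
    value
  end
end

samples_per_second(instances: 15, scrape_interval: 15) # 1.0, as in the example above
```

With a 30s TTL and a 15s scrape interval, each cached result spans two scrapes per instance, which matches the "every other scrape interval" remark earlier in the thread.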
@joshlambert @jivanvl Is anything happening with this MR?
@ayufan Are you closing this down in favor of another one? I'm making the graphs using this branch as a base, has something changed?
@jivanvl Hmm, no. I'm asking because I'm not seeing a mention of this MR. Probably we should ship both of them in the same MR. Or move everything that is related to the queries to your MR, and leave only the Prometheus Service here.
@ayufan I think it would be best to leave the query related stuff and the frontend in a separate MR so we don't have to review a gigantic MR
added 1766 commits
added 1 commit
- fda3a7b7 - Change Prometheus test ping to a functional query so it returns success.
- Resolved by Joshua Lambert
added 1 commit
- fb2447cb - Change query to be a simple scalar to reduce load and any chance for failure.
added 1 commit
- a9b4bcf8 - Update Prometheus queries for CPU and Memory.
- Resolved by Joshua Lambert
- Resolved by Joshua Lambert
added 660 commits
- a947eebc...4aa66428 - 650 commits from branch master
- 1703e926 - Add PrometheusService with API URL
- f6846dbc - Added metrics endpoint to EnvironmentsController
- 46b59386 - Added metrics views
- 5ef7bdc6 - Fetch monitoring data from Prometheus
- 96b5a9bb - Include last_update in response
- 5e1989b2 - Change Prometheus test ping to a functional query so it returns success.
- 8f6d8b74 - Change query to be a simple scalar to reduce load and any chance for failure.
- b412efe6 - Update Prometheus queries for CPU and Memory.
- f4d7fa16 - Adjust CPU metrics for 2min rate.
- 8cac0bb3 - Fix wording of API URL default text
added 1 commit
- ff918058 - Add a controller test and tweak the view a bit
- Resolved by Rémy Coutable
added 7 commits
- 50ba999f - Created initial version of the graph
- 804e527a - Improvements to the design and code cleanup
- dc800eed - Code cleanup
- 4983a21c - Changed the getData method a $.ajax call
- 2e8807e5 - Added initial version of specs
- af17318b - Improved spec coverage on prometheus_graph.js
- f9390414 - Merge branch 'prometheus-graphs' into prometheus-monitoring
added 1 commit
- 69866fc6 - Removed the median from the current graphs, also, fixed tests
added 1 commit
- 9c893e54 - Update monitoring service comment text, and add Prometheus service help text.
assigned to @jschatz1
added 1 commit
- c95f1a15 - Update metrics button icon, clarify legend text and page title.
@joshlambert you have conflicts and it is a WIP so I cannot merge. Remove the WIP and fix the conflicts and assign to me.
assigned to @joshlambert
added 284 commits
- 44325b48...9f908cfc - 283 commits from branch master
- 17296cb2 - Sync with Master, resolve merge conflicts, remove es6 extension.
added 1 commit
- fe9e5103 - Resolve javascript es6 extension issues, indenting.
assigned to @jschatz1
@joshlambert LGTM. Would you like me to merge?
- Resolved by Kamil Trzciński
- Resolved by Rémy Coutable
- Resolved by Rémy Coutable
- Resolved by Rémy Coutable
added 255 commits
- d9917e54...56814482 - 253 commits from branch master
- a2f7a719 - Remove metrics expectations from the environment feature spec
- fda6a54a - Merge remote-tracking branch 'origin/master' into prometheus-monitoring
assigned to @rymai
added 1 commit
- 01a6c1b9 - Add PrometheusService and metrics page for environment
changed milestone to %9.0
added 1 commit
- c78e7fe3 - Add PrometheusService and metrics page for environment
- Resolved by Rémy Coutable
added 1 commit
- 9310bc99 - Add PrometheusService and metrics page for environment
The last commit should fix the V3 API failure. I've also opened an EE MR: https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/1374
added 1 commit
- 87d3cdcd - Add PrometheusService and metrics page for environment
mentioned in issue #26944 (closed)
mentioned in commit xhang/gitlab@4998f151
mentioned in issue #28717 (closed)
mentioned in issue #42251 (closed)
added devopsmonitor label
mentioned in merge request gitlab!84351 (merged)