GitLab issues
https://gitlab.com/gitlab-org/gitlab/-/issues
2021-04-06T00:07:58Z

https://gitlab.com/gitlab-org/gitlab/-/issues/326245
Add project badges to operations dashboard
2021-04-06T00:07:58Z
Josh Miller
### Proposal
The Operations Dashboard should incorporate project badges. Badges are everything a dashboard could want: Helpful deep links with tiny graphical "status" colors for a variety of project-specific metrics.
Backlog

https://gitlab.com/gitlab-org/gitlab/-/issues/301096
Proposal: Add new metrics for pipelines and jobs and export metrics to OpenMetrics format (ideally) or Prometheus
2024-03-15T00:20:27Z
Brie Carranza
<!-- This template is a great use for issues that are feature::additions or technical tasks for larger issues.-->
### Proposal
I am opening this feature proposal on behalf of a subscriber who would like to have a number of [new metrics exported by GitLab](https://docs.gitlab.com/ee/administration/monitoring/prometheus/gitlab_metrics.html). It would be ideal to export these metrics [in OpenMetrics format](https://openmetrics.io/) but Prometheus would be acceptable.
Today, the requester retrieves the requisite data via the GitLab API, converts it to Prometheus format, and sends it to Prometheus. The goal of this feature proposal is to remove the need for that workaround, which adds overhead.
The idea is that these metrics will be reported as time series data. These metrics fall into these two categories:
- Pipelines
- Jobs
Here's a list of the desired metrics, by category:
#### Pipelines
**Metric 1: total number of pipelines per project**
Total number of pipelines / project / timeseries: 1hr, 4hr, 8hr, 24hr -> specific interval is flexible.
- Include project ID, project name, branch/tag/ref/kind, reference name, pipeline status (one of: created, waiting_for_resource, preparing, pending, running, success, failed, canceled, skipped, manual, scheduled)
- Optionally: include all project info made available via the [project API](https://docs.gitlab.com/ee/api/projects.html#get-single-project) in timeseries format. This would be a collection of how that information looked at the point in time when the data was collected.
**Metric 2: average number of pipelines per project per time series**
Avg number of pipelines / project / timeseries: 1hr, 4hr, 8hr, 24hr -> specific interval is flexible.
- Include project ID, project name, branch/tag/ref/kind, reference name, pipeline status (one of: created, waiting_for_resource, preparing, pending, running, success, failed, canceled, skipped, manual, scheduled)
- Optionally: include all project info made available via the [project API](https://docs.gitlab.com/ee/api/projects.html#get-single-project) in timeseries format. This would be a collection of how that information looked at the point in time when the data was collected.
**Metric 3: average pipeline duration per project per timeseries**
Avg pipeline duration / project / timeseries: 1hr, 4hr, 8hr, 24hr -> specific interval is flexible.
- Include project ID, project name, branch/tag/ref/kind, reference name, pipeline status (one of: created, waiting_for_resource, preparing, pending, running, success, failed, canceled, skipped, manual, scheduled)
- Optionally: include all project info made available via the [project API](https://docs.gitlab.com/ee/api/projects.html#get-single-project) in timeseries format. This would be a collection of how that information looked at the point in time when the data was collected.
#### Jobs
**Metric 4: Total number of jobs per project**
Total number of jobs / project / timeseries: 1hr, 4hr, 8hr, 24hr -> specific interval is flexible.
- Include job status
- Include project ID, project name, branch/tag/ref/kind, reference name, pipeline status (one of: created, waiting_for_resource, preparing, pending, running, success, failed, canceled, skipped, manual, scheduled)
**Metric 5: Average job duration per project**
Avg job duration / project / timeseries: 1hr, 4hr, 8hr, 24hr -> specific interval is flexible.
- Include job status
- Include project ID, project name, branch/tag/ref/kind, reference name, status of pipeline that the job is running in (one of: created, waiting_for_resource, preparing, pending, running, success, failed, canceled, skipped, manual, scheduled)
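As a rough illustration of the conversion the requester performs today, the sketch below counts pipelines per project, ref, and status and renders the result in the Prometheus text exposition format. The metric name `gitlab_ci_pipelines`, the record keys, and the sample data are hypothetical and only serve the example; this is not an existing GitLab metric.

```python
from collections import Counter


def escape_label(value: str) -> str:
    """Escape a label value for the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace("\n", "\\n").replace('"', '\\"')


def render_pipeline_counts(pipelines) -> str:
    """Render per-project pipeline counts as Prometheus text-format lines.

    `pipelines` is an iterable of dicts with the (hypothetical) keys
    project_id, project_name, ref, and status.
    """
    counts = Counter(
        (p["project_id"], p["project_name"], p["ref"], p["status"]) for p in pipelines
    )
    lines = [
        "# HELP gitlab_ci_pipelines Pipelines seen in the sampling window (hypothetical name).",
        "# TYPE gitlab_ci_pipelines gauge",
    ]
    for (project_id, name, ref, status), n in sorted(counts.items()):
        pairs = (
            ("project_id", project_id),
            ("project_name", name),
            ("ref", ref),
            ("status", status),
        )
        labels = ",".join(f'{k}="{escape_label(str(v))}"' for k, v in pairs)
        lines.append(f"gitlab_ci_pipelines{{{labels}}} {n}")
    return "\n".join(lines) + "\n"


# Made-up sample records, standing in for data fetched from the GitLab API.
sample = [
    {"project_id": 1, "project_name": "demo", "ref": "main", "status": "success"},
    {"project_id": 1, "project_name": "demo", "ref": "main", "status": "success"},
    {"project_id": 1, "project_name": "demo", "ref": "main", "status": "failed"},
]
output = render_pipeline_counts(sample)
```

The same shape would extend to the job metrics by adding a job status label; averages and durations would need their own metric names.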
<!-- Use this section to explain the feature and how it will work. It can be helpful to add technical details, design proposals, and links to related epics or issues. -->
<!-- Consider adding related issues and epics to this issue. You can also reference the Feature Proposal Template (https://gitlab.com/gitlab-org/gitlab/-/blob/master/.gitlab/issue_templates/Feature%20proposal.md) for additional details to consider adding to this issue. Additionally, as a data oriented organization, when your feature exits planning breakdown, consider adding the `What does success look like, and how can we measure that?` section.
-->
Backlog

https://gitlab.com/gitlab-org/gitlab/-/issues/299203
Docs - product feedback: Health checks failing on Load balancer - whats the fix
2021-01-20T10:11:59Z
Gaurav Gawande
Need more input on the load balancer setup: after performing all the steps for the GitLab install on AWS, the health checks keep failing. The document needs to be updated to use the new load balancer, or needs to provide more detail on how to troubleshoot health check failures on the load balancer.

https://gitlab.com/gitlab-org/gitlab/-/issues/299102
Fix "Add a to do" spelling on alerts
2021-02-02T14:25:37Z
Marcin Sedlak-Jakubowski
Change the text on the "Add a To Do" button on alerts to "Add a to do", to follow the agreed forms: https://gitlab.com/gitlab-org/technical-writing/-/issues/252.
https://gitlab.com/gitlab-org/gitlab/-/merge_requests/42488/ fixes it for MRs and issues.
## Possible files to modify
- `[file with the actual code]`
- https://gitlab.com/gitlab-org/gitlab/blob/master/spec/frontend/alert_management/components/alert_management_sidebar_todo_spec.js
- https://gitlab.com/gitlab-org/gitlab/blob/master/spec/features/alert_management/alert_details_spec.rb
- https://gitlab.com/gitlab-org/gitlab/blob/master/doc/operations/incident_management/alerts.md
## Related:
- https://gitlab.com/gitlab-org/gitlab/-/issues/230890
- https://gitlab.com/gitlab-org/gitlab/-/issues/263253
- https://gitlab.com/gitlab-org/technical-writing/-/issues/252
13.9
yoginth.lens yoginth@hey.com

https://gitlab.com/gitlab-org/gitlab/-/issues/280564
Operations menu goes to 404 when logged out
2023-03-18T06:45:36Z
Mark Pundsack
<!---
Please read this!
Before opening a new issue, make sure to search for keywords in the issues
filtered by the "regression" or "bug" label:
- https://gitlab.com/gitlab-org/gitlab/issues?label_name%5B%5D=regression
- https://gitlab.com/gitlab-org/gitlab/issues?label_name%5B%5D=bug
and verify the issue you're about to submit isn't a duplicate.
--->
### Summary
<!-- Summarize the bug encountered concisely. -->
When logged out, the Operations menu goes to `/-/metrics` and 404s.
### Steps to reproduce
<!-- Describe how one can reproduce the issue - this is very important. Please use an ordered list. -->
### Example Project
<!-- If possible, please create an example project here on GitLab.com that exhibits the problematic
behavior, and link to it here in the bug report. If you are using an older version of GitLab, this
will also determine whether the bug is fixed in a more recent version. -->
https://gitlab.com/markpundsack/docker-example
### What is the current *bug* behavior?
<!-- Describe what actually happens. -->
Operations goes to metrics page, which is not available when not logged in.
### What is the expected *correct* behavior?
<!-- Describe what you should see instead. -->
Operations should go to some page that *is* available when not logged in, or at least show a message that Metrics is only available to project/group members.
### Relevant logs and/or screenshots
<!-- Paste any relevant logs - please use code blocks (```) to format console output, logs, and code
as it's tough to read otherwise. -->
### Output of checks
<!-- If you are reporting a bug on GitLab.com, write: This bug happens on GitLab.com -->
#### Results of GitLab environment info
<!-- Input any relevant GitLab environment information if needed. -->
<details>
<summary>Expand for output related to GitLab environment info</summary>
<pre>
(For installations with omnibus-gitlab package run and paste the output of:
`sudo gitlab-rake gitlab:env:info`)
(For installations from source run and paste the output of:
`sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production`)
</pre>
</details>
#### Results of GitLab application Check
<!-- Input any relevant GitLab application check information if needed. -->
<details>
<summary>Expand for output related to the GitLab application check</summary>
<pre>
(For installations with omnibus-gitlab package run and paste the output of:
`sudo gitlab-rake gitlab:check SANITIZE=true`)
(For installations from source run and paste the output of:
`sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true`)
(we will only investigate if the tests are passing)
</pre>
</details>
### Proposal
<!-- If you can, link to the line of code that might be responsible for the problem. -->
1. Change default visibility setting for Operations from: Everyone with access --> Only project members
1. When someone has the visibility set to `only project members` in place:
- Non-project members would not see Operations at all.
- Project members would see the Operations nav item. For permissions level Developers and up, clicking on Operations would display the metrics page. But, Guests and Reporters would instead see the Incidents page when clicking on Operations. Guests would not have the "create new incident" button on the Incident page.
3. When someone changes the visibility to `everyone with access`:
- For project members: Operations would link to the Metrics page for Developers and up. Operations would link to incidents for Guests and Reporters. Guests would not see the "Create an incident" button, as only Reporters can create an Incident.
- For non-project members: clicking Operations would show the Incident page (the only page available in the section, other than Environments), but the "Create an incident" button would be hidden (as we are limiting the creation of incidents to project Reporters only).
Backlog

https://gitlab.com/gitlab-org/gitlab/-/issues/267758
Validate existing monitor metrics
2022-01-27T16:29:26Z
Sean Arnold
The metrics in `usage_data.rb` ([CE](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/usage_data.rb), [EE](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/ee/gitlab/usage_data.rb)) often use a time period query, with the intention that we can see the value of those metric queries within a given time period.
For example, my interpretation of the intention of `distinct_count(::Project.with_enabled_error_tracking.where(time_period), :creator_id)` is: "Count the distinct projects, where the error tracking setting was enabled within the time period".
However, when this runs the SQL query actually looks like:
```sql
SELECT COUNT(DISTINCT "projects"."creator_id")
FROM "projects"
INNER JOIN "project_error_tracking_settings" ON "project_error_tracking_settings"."project_id" = "projects"."id"
WHERE "project_error_tracking_settings"."enabled" = true
AND "projects"."created_at" BETWEEN "2020-09-14 19:59:15.179470"
AND "2020-10-12 19:59:15.179636"
AND "projects"."creator_id" BETWEEN 0
AND 100000
```
The important part is `AND "projects"."created_at" BETWEEN "2020-09-14 19:59:15.179470" AND "2020-10-12 19:59:15.179636"`.
We are actually scoping by when the _Project was created_, not when the error tracking setting was added or enabled.
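To make the difference concrete, here is a small self-contained sketch using an in-memory SQLite database. The two-table schema is a simplified stand-in for the real tables, and the `created_at` column on the settings table is an assumption made for illustration; it only approximates "when the setting was added" and says nothing about when it was enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE projects (id INTEGER PRIMARY KEY, creator_id INTEGER, created_at TEXT);
    CREATE TABLE project_error_tracking_settings (
        project_id INTEGER, enabled INTEGER, created_at TEXT
    );
    -- Project 1: old project, setting added inside the reporting window.
    INSERT INTO projects VALUES (1, 10, '2019-01-01 00:00:00');
    INSERT INTO project_error_tracking_settings VALUES (1, 1, '2020-09-20 00:00:00');
    -- Project 2: project created and setting added inside the window.
    INSERT INTO projects VALUES (2, 20, '2020-09-15 00:00:00');
    INSERT INTO project_error_tracking_settings VALUES (2, 1, '2020-09-16 00:00:00');
    """
)

WINDOW = ("2020-09-14 19:59:15", "2020-10-12 19:59:15")


def distinct_creators(time_column: str) -> int:
    """Count distinct creators, scoping the time window to `time_column`."""
    query = f"""
        SELECT COUNT(DISTINCT projects.creator_id)
        FROM projects
        INNER JOIN project_error_tracking_settings AS s ON s.project_id = projects.id
        WHERE s.enabled = 1 AND {time_column} BETWEEN ? AND ?
    """
    return conn.execute(query, WINDOW).fetchone()[0]


# Scoped by project creation time, as the generated SQL does today.
by_project_created = distinct_creators("projects.created_at")
# Scoped by the (assumed) settings timestamp, closer to the stated intention.
by_setting_created = distinct_creators("s.created_at")
```

With this sample data the first count misses the older project whose setting was added inside the window, which is the discrepancy described above.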
This is the case for a lot of our metrics, including:
- `projects_with_tracing_enabled: distinct_count(::Project.with_tracing_enabled.where(time_period), :creator_id)`
- `projects_prometheus_active: distinct_count(::Project.with_active_prometheus_service.where(time_period), :creator_id)`
- `projects_with_error_tracking_enabled: distinct_count(::Project.with_enabled_error_tracking.where(time_period), :creator_id)`
Sean Arnold

https://gitlab.com/gitlab-org/gitlab/-/issues/231497
Prometheus alerts delivered from alertmanager into GitLab issues are silently being dropped
2021-10-27T10:40:31Z
Andrew Newdigate andrew@gitlab.com
See https://gitlab.com/gitlab-com/runbooks/-/merge_requests/2592 and https://gitlab.com/gitlab-com/gl-infra/production/-/issues/2451 for more details.
GitLab.com's AlertManager infrastructure delivers some alerts to GitLab.com issues, but these alerts are being silently dropped.
On `Jul 23, 2020 @ 00:40:07.791`, AlertManager delivered a webhook alert to GitLab.com:
Log entry (while it lasts) https://log.gprd.gitlab.net/app/kibana#/discover/doc/AW5F1e45qthdGjPJueGO/pubsub-rails-inf-gprd-003224?id=lWQceXMBOELd9C8V9tGa
The server responded with a 200.
The following params were delivered to GitLab.com:
```
{
"key": "receiver",
"value": "issue:gitlab\\.com/gitlab-com/gl-infra/production"
},
{
"key": "status",
"value": "firing"
},
{
"key": "alerts",
"value": "[{\"status\"=>\"firing\", \"labels\"=>{\"alert_type\"=>\"cause\", \"alertname\"=>\"SSLCertExpiresSoon\", \"env\"=>\"gprd\", \"environment\"=>\"gprd\", \"instance\"=>\"https://status.gitlab.com\", \"job\"=>\"blackbox\", \"monitor\"=>\"default\", \"pager\"=>\"issue\", \"project\"=>\"gitlab.com/gitlab-com/gl-infra/production\", \"provider\"=>\"gcp\", \"region\"=>\"us-east\", \"severity\"=>\"s2\", \"shard\"=>\"default\", \"stage\"=>\"main\", \"tier\"=>\"sv\", \"type\"=>\"blackbox\"}, \"annotations\"=>{\"description\"=>\"[FILTERED]\", \"runbook\"=>\"docs/frontend/ssl_cert.md\", \"title\"=>\"[FILTERED]\"}, \"startsAt\"=>\"2020-07-23T00:30:00.587237764Z\", \"endsAt\"=>\"0001-01-01T00:00:00Z\", \"generatorURL\"=>\"https://prometheus.gprd.gitlab.net/graph?g0.expr=probe_ssl_earliest_cert_expiry%7Bjob%3D%22blackbox%22%7D+-+time%28%29+%3C+14+%2A+86400&g0.tab=1\", \"fingerprint\"=>\"1f00c90951546e3b\"}]"
},
{
"key": "groupLabels",
"value": "{\"alertname\"=>\"SSLCertExpiresSoon\", \"env\"=>\"gprd\", \"stage\"=>\"main\", \"tier\"=>\"sv\", \"type\"=>\"blackbox\"}"
},
{
"key": "commonLabels",
"value": "{\"alert_type\"=>\"cause\", \"alertname\"=>\"SSLCertExpiresSoon\", \"env\"=>\"gprd\", \"environment\"=>\"gprd\", \"instance\"=>\"https://status.gitlab.com\", \"job\"=>\"blackbox\", \"monitor\"=>\"default\", \"pager\"=>\"issue\", \"project\"=>\"gitlab.com/gitlab-com/gl-infra/production\", \"provider\"=>\"gcp\", \"region\"=>\"us-east\", \"severity\"=>\"s2\", \"shard\"=>\"default\", \"stage\"=>\"main\", \"tier\"=>\"sv\", \"type\"=>\"blackbox\"}"
},
{
"key": "commonAnnotations",
"value": "{\"description\"=>\"[FILTERED]\", \"runbook\"=>\"docs/frontend/ssl_cert.md\", \"title\"=>\"[FILTERED]\"}"
},
{
"key": "externalURL",
"value": "http://alerts-01-inf-ops:9093"
},
{
"key": "version",
"value": "4"
},
{
"key": "groupKey",
"value": "{}/{env=\"gprd\",pager=\"issue\",project=\"gitlab.com/gitlab-com/gl-infra/production\"}:{alertname=\"SSLCertExpiresSoon\", env=\"gprd\", stage=\"main\", tier=\"sv\", type=\"blackbox\"}"
},
{
"key": "namespace_id",
"value": "gitlab-com/gl-infra"
},
{
"key": "project_id",
"value": "production"
},
{
"key": "alert",
"value": "{\"receiver\"=>\"issue:gitlab\\\\.com/gitlab-com/gl-infra/production\", \"status\"=>\"firing\", \"alerts\"=>[{\"status\"=>\"firing\", \"labels\"=>{\"alert_type\"=>\"cause\", \"alertname\"=>\"SSLCertExpiresSoon\", \"env\"=>\"gprd\", \"environment\"=>\"gprd\", \"instance\"=>\"https://status.gitlab.com\", \"job\"=>\"blackbox\", \"monitor\"=>\"default\", \"pager\"=>\"issue\", \"project\"=>\"gitlab.com/gitlab-com/gl-infra/production\", \"provider\"=>\"gcp\", \"region\"=>\"us-east\", \"severity\"=>\"s2\", \"shard\"=>\"default\", \"stage\"=>\"main\", \"tier\"=>\"sv\", \"type\"=>\"blackbox\"}, \"annotations\"=>{\"description\"=>\"[FILTERED]\", \"runbook\"=>\"docs/frontend/ssl_cert.md\", \"title\"=>\"[FILTERED]\"}, \"startsAt\"=>\"2020-07-23T00:30:00.587237764Z\", \"endsAt\"=>\"0001-01-01T00:00:00Z\", \"generatorURL\"=>\"https://prometheus.gprd.gitlab.net/graph?g0.expr=probe_ssl_earliest_cert_expiry%7Bjob%3D%22blackbox%22%7D+-+time%28%29+%3C+14+%2A+86400&g0.tab=1\", \"fingerprint\"=>\"1f00c90951546e3b\"}], \"groupLabels\"=>{\"alertname\"=>\"SSLCertExpiresSoon\", \"env\"=>\"gprd\", \"stage\"=>\"main\", \"tier\"=>\"sv\", \"type\"=>\"blackbox\"}, \"commonLabels\"=>{\"alert_type\"=>\"cause\", \"alertname\"=>\"SSLCertExpiresSoon\", \"env\"=>\"gprd\", \"environment\"=>\"gprd\", \"instance\"=>\"https://status.gitlab.com\", \"job\"=>\"blackbox\", \"monitor\"=>\"default\", \"pager\"=>\"issue\", \"project\"=>\"gitlab.com/gitlab-com/gl-infra/production\", \"provider\"=>\"gcp\", \"region\"=>\"us-east\", \"severity\"=>\"s2\", \"shard\"=>\"default\", \"stage\"=>\"main\", \"tier\"=>\"sv\", \"type\"=>\"blackbox\"}, \"commonAnnotations\"=>{\"description\"=>\"[FILTERED]\", \"runbook\"=>\"docs/frontend/ssl_cert.md\", \"title\"=>\"[FILTERED]\"}, \"externalURL\"=>\"http://alerts-01-inf-ops:9093\", \"version\"=>\"4\", \"groupKey\"=>\"{}/{env=\\\"gprd\\\",pager=\\\"issue\\\",project=\\\"gitlab.com/gitlab-com/gl-infra/production\\\"}:{alertname=\\\"SSLCertExpiresSoon\\\", 
env=\\\"gprd\\\", stage=\\\"main\\\", tier=\\\"sv\\\", type=\\\"blackbox\\\"}\"}"
}
```
The webhook was successfully delivered, but did not create an issue.
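One way a receiver could make such drops visible is to report which alerts it accepted and which it skipped, instead of only returning a 200. The following is a minimal sketch under stated assumptions, not GitLab's implementation: `summarize_webhook` is a hypothetical helper, and the rule that an alert needs a `title` annotation is an invented example of a validation step. The field names (`alerts`, `labels`, `annotations`, `status`) come from the payload shown above.

```python
import json


def summarize_webhook(body: str) -> dict:
    """Split the alerts in an Alertmanager webhook body into accepted and skipped.

    The "must have a title annotation" rule is a hypothetical example of a
    check that would otherwise drop an alert without any trace.
    """
    payload = json.loads(body)
    accepted, skipped = [], []
    for alert in payload.get("alerts", []):
        name = alert.get("labels", {}).get("alertname", "<unnamed>")
        if alert.get("annotations", {}).get("title"):
            accepted.append(name)
        else:
            skipped.append(name)
    return {"status": payload.get("status"), "accepted": accepted, "skipped": skipped}


# Made-up body with the same top-level shape as the delivered payload.
body = json.dumps(
    {
        "version": "4",
        "status": "firing",
        "alerts": [
            {
                "status": "firing",
                "labels": {"alertname": "SSLCertExpiresSoon", "severity": "s2"},
                "annotations": {"title": "certificate expires soon"},
            },
            {"status": "firing", "labels": {"alertname": "NoTitle"}, "annotations": {}},
        ],
    }
)
result = summarize_webhook(body)
```

Returning or logging the `skipped` list gives the sender something to alert on when a delivery succeeds at the HTTP level but produces no issue.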
The alertmanager configuration is as follows:
```
- name: issue:gitlab.com/gitlab-com/gl-infra/production
  webhook_configs:
  - http_config:
      bearer_token: SECRET
    send_resolved: true
    url: https://gitlab.com/gitlab-com/gl-infra/production/prometheus/alerts/notify.json
```
In the case of the GitLab.com alert that was lost, we could easily have missed an SSL certificate renewal alert had it not been noticed through other means. It is critical for the availability of GitLab.com that our alerting infrastructure works as expected.
Therefore I'm marking this as ~P2 ~S2
cc @crystalpoole @sarahwaldner @bjk-gitlab
13.3
Andrew Newdigate andrew@gitlab.com

https://gitlab.com/gitlab-org/gitlab/-/issues/229973
Remove the Epic feature from the side bar on Incidents
2020-09-18T07:12:02Z
Sarah Waldner
## Overview
Incidents have been established as a type of Issue. The next step is to remove features from Incidents that are not relevant to responding to and remediating an IT service outage in a fire-fight.
### Scope
- Remove the Epic feature from the side bar on Incidents (https://gitlab.com/gitlab-org/gitlab/-/merge_requests/40501)
- Prevent incidents from being linked to epics (https://gitlab.com/gitlab-org/gitlab/-/merge_requests/40501)
### Follow-up Issues:
- Hide related quick actions https://gitlab.com/gitlab-org/gitlab/-/merge_requests/40509 (#244937)
- Prevent incidents from being linked to epics in "New issue" form (#244938)
- Filter incidents from "Epics and Issues" autocomplete input field (#244939)
13.4
Peter Leitzen pleitzen@gitlab.com
David O'Regan

https://gitlab.com/gitlab-org/gitlab/-/issues/208900
Check license for Status Page operations settings
2020-03-09T15:02:43Z
Peter Leitzen pleitzen@gitlab.com
The following discussion from !25863 should be addressed:
- [ ] @splattael started a [discussion](c):
> @seanarnold @engwan As Status Page is ~"GitLab Ultimate" we need to check if the feature is available. Similar to https://gitlab.com/gitlab-org/gitlab/-/merge_requests/25863/diffs#60551febe7c92dcd2a178fd08c8f7d4170207fb9_40_40
>
> I suggest to name it :drum: `:status_page` :nerd:
12.9
Sean Arnold

https://gitlab.com/gitlab-org/gitlab/-/issues/55260
cache health check returns HTTP 500
2021-08-23T03:04:32Z
Bram Daams
We experience cache health check issues with GitLab (omnibus) since upgrading to gitlab-cd_12.5.3.
We run two GitLab servers; they were both updated from gitlab-ce_12.3.4 to gitlab-cd_12.5.3 on December 6th. Since then, the health check page has returned 500. At the moment we run 12.5.4 with the same issue.
The screenshots below show the size of the health-check output. The larger size occurs when the check returns a 500 error page, which is larger than just the bytes "success".
(The weekly spikes are the 500 pages returned during the startup phase of GitLab. These are scheduled restarts.)
![image](/uploads/e096951656b60ae164b1462f3025a7ab/image.png)
The check has been running quite stably since we started running it...
![image](/uploads/e4b2c9dfd2f6aa5b8366d468c66583c7/image.png)
Backlog

https://gitlab.com/gitlab-org/gitlab/-/issues/34891
Allow alerts to connect to other observability artifacts via a link formatter
2020-08-14T11:05:27Z
Kenny Johnston kencjohnston@gitlab.com
### Problem to solve
When an alert passes relevant information from various observability sources (logs, metrics, errors, traces), I have to manually search for those other sources in order to view and interact with them.
### Intended users
* [Devon (DevOps Engineer)](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#devon-devops-engineer)
Personas are described at https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/
### Further details
<!-- Include use cases, benefits, and/or goals (contributes to our vision?) -->
### Proposal
Include the ability to interpret common IDs from other observability tools when they are passed in alerts, and generate links directly to the appropriate systems when creating incidents.
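A link formatter along these lines could be as simple as a mapping from alert field names to URL templates. The sketch below is only an illustration of the idea: the field names `trace_id` and `error_id`, the `example.com` URLs, and the `format_links` helper are all hypothetical.

```python
from urllib.parse import quote

# Hypothetical URL templates, keyed by the alert field that carries the ID.
LINK_TEMPLATES = {
    "trace_id": "https://tracing.example.com/trace/{value}",
    "error_id": "https://errors.example.com/issues/{value}",
}


def format_links(alert_fields: dict) -> dict:
    """Return a link for every alert field that has a known URL template."""
    return {
        key: LINK_TEMPLATES[key].format(value=quote(str(value), safe=""))
        for key, value in alert_fields.items()
        if key in LINK_TEMPLATES
    }


# Fields without a template (here: severity) are ignored.
links = format_links({"trace_id": "abc/123", "severity": "s2"})
```

Keeping the templates as configuration rather than code would let each project point at its own tracing or error-tracking systems.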
### Permissions and Security
<!-- What permissions are required to perform the described actions? Are they consistent with the existing permissions as documented for users, groups, and projects as appropriate? Is the proposed behavior consistent between the UI, API, and other access methods (e.g. email replies)?-->
### Documentation
<!-- See the Feature Change Documentation Workflow https://docs.gitlab.com/ee/development/documentation/feature-change-workflow.html
Add all known Documentation Requirements here, per https://docs.gitlab.com/ee/development/documentation/feature-change-workflow.html#documentation-requirements
If this feature requires changing permissions, this document https://docs.gitlab.com/ee/user/permissions.html must be updated accordingly. -->
### Testing
<!-- What risks does this change pose? How might it affect the quality of the product? What additional test coverage or changes to tests will be needed? Will it require cross-browser testing? See the test engineering process for further help: https://about.gitlab.com/handbook/engineering/quality/test-engineering/ -->
### What does success look like, and how can we measure that?
<!-- Define both the success metrics and acceptance criteria. Note that success metrics indicate the desired business outcomes, while acceptance criteria indicate when the solution is working correctly. If there is no way to measure success, link to an issue that will implement a way to measure this. -->
### What is the type of buyer?
<!-- Which leads to: in which enterprise tier should this feature go? See https://about.gitlab.com/handbook/product/pricing/#four-tiers -->
### Links / references
Backlog

https://gitlab.com/gitlab-org/gitlab/-/issues/30768
Embed specific metrics chart in issue - follow-up
2019-11-18T17:20:48Z
Sarah Yasonik
### Summary
https://gitlab.com/gitlab-org/gitlab-ce/issues/62971 adds support for embedding a particular metric panel in any markdown field. There are a few follow-up items that came out of that.
- [x] refactor `CustomMetricEmbedService` to use a new `PrometheusMetricFinder` class
12.5
Sarah Yasonik

https://gitlab.com/gitlab-org/gitlab/-/issues/12086
Too many SQL queries were executed in Admin::ApplicationSettingsController#usage_data
2019-08-12T14:59:04Z
Peter Leitzen pleitzen@gitlab.com
Triggered by `rspec ./spec/features/admin/admin_settings_spec.rb:324`
Example https://gitlab.com/gitlab-org/gitlab-ee/-/jobs/229121980.
```
Failures:
1) Admin updates settings Metrics and profiling page loads usage ping payload on click
Got 0 failures and 2 other errors:
1.1) Failure/Error: raise(error) if raise_error?
Gitlab::QueryLimiting::Transaction::ThresholdExceededError:
Too many SQL queries were executed in Admin::ApplicationSettingsController#usage_data: a maximum of 100 is allowed but 102 SQL queries were executed
# ./lib/gitlab/query_limiting/transaction.rb:56:in `act_upon_results'
# ./lib/gitlab/query_limiting/middleware.rb:21:in `call'
# ./ee/lib/gitlab/jira/middleware.rb:17:in `call'
# ./lib/gitlab/middleware/go.rb:20:in `call'
# ./lib/gitlab/etag_caching/middleware.rb:13:in `call'
# ./lib/gitlab/middleware/correlation_id.rb:16:in `block in call'
# ./vendor/ruby/2.6.0/gems/gitlab-labkit-0.2.0/lib/labkit/correlation/correlation_id.rb:18:in `use_id'
# ./lib/gitlab/middleware/correlation_id.rb:15:in `call'
# ./vendor/ruby/2.6.0/gems/batch-loader-1.4.0/lib/batch_loader/middleware.rb:11:in `call'
# ./vendor/ruby/2.6.0/gems/apollo_upload_server-2.0.0.beta.3/lib/apollo_upload_server/middleware.rb:20:in `call'
# ./vendor/ruby/2.6.0/gems/rack-attack-4.4.1/lib/rack/attack.rb:107:in `call'
# ./vendor/ruby/2.6.0/gems/warden-1.2.7/lib/warden/manager.rb:36:in `block in call'
# ./vendor/ruby/2.6.0/gems/warden-1.2.7/lib/warden/manager.rb:35:in `catch'
# ./vendor/ruby/2.6.0/gems/warden-1.2.7/lib/warden/manager.rb:35:in `call'
# ./vendor/ruby/2.6.0/gems/rack-cors-1.0.2/lib/rack/cors.rb:97:in `call'
# ./vendor/ruby/2.6.0/gems/rack-2.0.7/lib/rack/etag.rb:25:in `call'
# ./vendor/ruby/2.6.0/gems/rack-2.0.7/lib/rack/conditional_get.rb:25:in `call'
# ./vendor/ruby/2.6.0/gems/rack-2.0.7/lib/rack/head.rb:12:in `call'
# ./lib/gitlab/middleware/read_only/controller.rb:42:in `call'
# ./lib/gitlab/middleware/read_only.rb:18:in `call'
# ./vendor/ruby/2.6.0/gems/rack-2.0.7/lib/rack/session/abstract/id.rb:232:in `context'
# ./vendor/ruby/2.6.0/gems/rack-2.0.7/lib/rack/session/abstract/id.rb:226:in `call'
# ./vendor/ruby/2.6.0/gems/actionpack-5.1.7/lib/action_dispatch/middleware/cookies.rb:613:in `call'
# ./vendor/ruby/2.6.0/gems/actionpack-5.1.7/lib/action_dispatch/middleware/callbacks.rb:26:in `block in call'
# ./vendor/ruby/2.6.0/gems/activesupport-5.1.7/lib/active_support/callbacks.rb:97:in `run_callbacks'
# ./vendor/ruby/2.6.0/gems/actionpack-5.1.7/lib/action_dispatch/middleware/callbacks.rb:24:in `call'
# ./vendor/ruby/2.6.0/gems/actionpack-5.1.7/lib/action_dispatch/middleware/debug_exceptions.rb:59:in `call'
# ./vendor/ruby/2.6.0/gems/actionpack-5.1.7/lib/action_dispatch/middleware/show_exceptions.rb:31:in `call'
# ./lib/gitlab/middleware/basic_health_check.rb:25:in `call'
# ./vendor/ruby/2.6.0/gems/railties-5.1.7/lib/rails/rack/logger.rb:36:in `call_app'
# ./vendor/ruby/2.6.0/gems/railties-5.1.7/lib/rails/rack/logger.rb:24:in `block in call'
# ./vendor/ruby/2.6.0/gems/activesupport-5.1.7/lib/active_support/tagged_logging.rb:69:in `block in tagged'
# ./vendor/ruby/2.6.0/gems/activesupport-5.1.7/lib/active_support/tagged_logging.rb:26:in `tagged'
# ./vendor/ruby/2.6.0/gems/activesupport-5.1.7/lib/active_support/tagged_logging.rb:69:in `tagged'
# ./vendor/ruby/2.6.0/gems/railties-5.1.7/lib/rails/rack/logger.rb:24:in `call'
# ./vendor/ruby/2.6.0/gems/actionpack-5.1.7/lib/action_dispatch/middleware/remote_ip.rb:79:in `call'
# ./lib/gitlab/request_context.rb:26:in `call'
# ./vendor/ruby/2.6.0/gems/request_store-1.3.1/lib/request_store/middleware.rb:9:in `call'
# ./vendor/ruby/2.6.0/gems/actionpack-5.1.7/lib/action_dispatch/middleware/request_id.rb:25:in `call'
# ./vendor/ruby/2.6.0/gems/rack-2.0.7/lib/rack/method_override.rb:22:in `call'
# ./vendor/ruby/2.6.0/gems/rack-2.0.7/lib/rack/runtime.rb:22:in `call'
# ./config/initializers/fix_local_cache_middleware.rb:9:in `call'
# ./vendor/ruby/2.6.0/gems/actionpack-5.1.7/lib/action_dispatch/middleware/executor.rb:12:in `call'
# ./vendor/ruby/2.6.0/gems/actionpack-5.1.7/lib/action_dispatch/middleware/static.rb:125:in `call'
# ./lib/gitlab/middleware/static.rb:11:in `call'
# ./lib/gitlab/testing/request_inspector_middleware.rb:33:in `call'
# ./lib/gitlab/testing/request_blocker_middleware.rb:47:in `call'
# ./vendor/ruby/2.6.0/gems/rack-2.0.7/lib/rack/sendfile.rb:111:in `call'
# ./lib/gitlab/metrics/requests_rack_middleware.rb:29:in `call'
# ./vendor/ruby/2.6.0/gems/sentry-raven-2.9.0/lib/raven/integrations/rack.rb:51:in `call'
# ./vendor/ruby/2.6.0/gems/railties-5.1.7/lib/rails/engine.rb:522:in `call'
# ./vendor/ruby/2.6.0/gems/rack-2.0.7/lib/rack/urlmap.rb:68:in `block in call'
# ./vendor/ruby/2.6.0/gems/rack-2.0.7/lib/rack/urlmap.rb:53:in `each'
# ./vendor/ruby/2.6.0/gems/rack-2.0.7/lib/rack/urlmap.rb:53:in `call'
# ./vendor/ruby/2.6.0/gems/capybara-2.18.0/lib/capybara/server.rb:44:in `call'
# ./vendor/ruby/2.6.0/gems/rack-2.0.7/lib/rack/handler/webrick.rb:86:in `service'
# /usr/local/lib/ruby/2.6.0/webrick/httpserver.rb:140:in `service'
# /usr/local/lib/ruby/2.6.0/webrick/httpserver.rb:96:in `run'
# /usr/local/lib/ruby/2.6.0/webrick/server.rb:307:in `block in start_thread'
# ------------------
# --- Caused by: ---
# Capybara::ExpectationNotMet:
# expected to find visible css ".js-usage-ping-payload" but there were no matches. Also found "", which matched the selector but not all filters.
# ./vendor/ruby/2.6.0/gems/capybara-2.18.0/lib/capybara/node/matchers.rb:95:in `block in assert_selector'
1.2) Failure/Error: raise JSConsoleError, message
JSConsoleError:
Unexpected browser console output:
http://127.0.0.1:38425/admin/application_settings/usage_data.html - Failed to load resource: the server responded with a status of 500 (Internal Server Error)
# ./spec/support/capybara.rb:100:in `block (2 levels) in <top (required)>'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/example.rb:447:in `instance_exec'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/example.rb:447:in `instance_exec'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/hooks.rb:357:in `run'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/hooks.rb:509:in `block in run_owned_hooks_for'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/hooks.rb:508:in `each'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/hooks.rb:508:in `run_owned_hooks_for'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/hooks.rb:595:in `block in run_example_hooks_for'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/hooks.rb:594:in `each'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/hooks.rb:594:in `run_example_hooks_for'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/hooks.rb:465:in `run'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/example.rb:507:in `run_after_example'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/example.rb:273:in `block in run'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/example.rb:500:in `block in with_around_and_singleton_context_hooks'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/example.rb:457:in `block in with_around_example_hooks'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/hooks.rb:466:in `block in run'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/hooks.rb:606:in `block in run_around_example_hooks_for'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/example.rb:342:in `call'
# ./vendor/ruby/2.6.0/gems/rspec-rails-3.7.2/lib/rspec/rails/adapters.rb:127:in `block (2 levels) in <module:MinitestLifecycleAdapter>'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/example.rb:447:in `instance_exec'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/example.rb:447:in `instance_exec'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/hooks.rb:375:in `execute_with'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/hooks.rb:608:in `block (2 levels) in run_around_example_hooks_for'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/example.rb:342:in `call'
# ./vendor/ruby/2.6.0/gems/rspec-retry-0.6.1/lib/rspec/retry.rb:123:in `block in run'
# ./vendor/ruby/2.6.0/gems/rspec-retry-0.6.1/lib/rspec/retry.rb:110:in `loop'
# ./vendor/ruby/2.6.0/gems/rspec-retry-0.6.1/lib/rspec/retry.rb:110:in `run'
# ./vendor/ruby/2.6.0/gems/rspec-retry-0.6.1/lib/rspec_ext/rspec_ext.rb:12:in `run_with_retry'
# ./vendor/ruby/2.6.0/gems/rspec-retry-0.6.1/lib/rspec/retry.rb:37:in `block (2 levels) in setup'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/example.rb:447:in `instance_exec'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/example.rb:447:in `instance_exec'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/hooks.rb:375:in `execute_with'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/hooks.rb:608:in `block (2 levels) in run_around_example_hooks_for'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/example.rb:342:in `call'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/hooks.rb:609:in `run_around_example_hooks_for'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/hooks.rb:466:in `run'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/example.rb:457:in `with_around_example_hooks'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/example.rb:500:in `with_around_and_singleton_context_hooks'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/example.rb:251:in `run'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/example_group.rb:628:in `block in run_examples'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/example_group.rb:624:in `map'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/example_group.rb:624:in `run_examples'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/example_group.rb:590:in `run'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/example_group.rb:591:in `block in run'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/example_group.rb:591:in `map'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/example_group.rb:591:in `run'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/runner.rb:118:in `block (3 levels) in run_specs'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/runner.rb:118:in `map'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/runner.rb:118:in `block (2 levels) in run_specs'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/configuration.rb:1926:in `with_suite_hooks'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/runner.rb:113:in `block in run_specs'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/reporter.rb:79:in `report'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/runner.rb:112:in `run_specs'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/runner.rb:87:in `run'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/runner.rb:71:in `run'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/lib/rspec/core/runner.rb:45:in `invoke'
# ./vendor/ruby/2.6.0/gems/rspec-core-3.7.1/exe/rspec:4:in `<top (required)>'
# ./vendor/ruby/2.6.0/bin/rspec:23:in `load'
# ./vendor/ruby/2.6.0/bin/rspec:23:in `<top (required)>'
# /usr/local/lib/ruby/2.6.0/bundler/cli/exec.rb:74:in `load'
# /usr/local/lib/ruby/2.6.0/bundler/cli/exec.rb:74:in `kernel_load'
# /usr/local/lib/ruby/2.6.0/bundler/cli/exec.rb:28:in `run'
# /usr/local/lib/ruby/2.6.0/bundler/cli.rb:463:in `exec'
# /usr/local/lib/ruby/2.6.0/bundler/vendor/thor/lib/thor/command.rb:27:in `run'
# /usr/local/lib/ruby/2.6.0/bundler/vendor/thor/lib/thor/invocation.rb:126:in `invoke_command'
# /usr/local/lib/ruby/2.6.0/bundler/vendor/thor/lib/thor.rb:387:in `dispatch'
# /usr/local/lib/ruby/2.6.0/bundler/cli.rb:27:in `dispatch'
# /usr/local/lib/ruby/2.6.0/bundler/vendor/thor/lib/thor/base.rb:466:in `start'
# /usr/local/lib/ruby/2.6.0/bundler/cli.rb:18:in `start'
# /usr/local/lib/ruby/gems/2.6.0/gems/bundler-1.17.2/exe/bundle:30:in `block in <top (required)>'
# /usr/local/lib/ruby/2.6.0/bundler/friendly_errors.rb:124:in `with_friendly_errors'
# /usr/local/lib/ruby/gems/2.6.0/gems/bundler-1.17.2/exe/bundle:22:in `<top (required)>'
# /usr/local/bin/bundle:23:in `load'
# /usr/local/bin/bundle:23:in `<main>'
Finished in 8 minutes 55 seconds (files took 16.33 seconds to load)
158 examples, 1 failure, 3 pending
Failed examples:
rspec ./spec/features/admin/admin_settings_spec.rb:324 # Admin updates settings Metrics and profiling page loads usage ping payload on click
```12.1https://gitlab.com/gitlab-org/gitlab/-/issues/11753Cluster health memory usage graph goes from horizontal to vertical line2023-12-18T05:34:23ZClement HoCluster health memory usage graph goes from horizontal to vertical line![2019-05-24_13.37.40](/uploads/379615fc4799a4078ee8dca29a38ece5/2019-05-24_13.37.40.gif)![2019-05-24_13.37.40](/uploads/379615fc4799a4078ee8dca29a38ece5/2019-05-24_13.37.40.gif)Next 4-6 releaseshttps://gitlab.com/gitlab-org/gitlab/-/issues/10891Usage ping for Incident Management2019-08-12T15:03:02ZJoshua LambertUsage ping for Incident Management### Problem to solve
We should add telemetry for incident management, so we can know how many users are using it.
### Intended users
<!-- Who will use this feature? If known, include any of the following: types of users (e.g. Developer), personas, or specific company roles (e.g. Release Manager). It's okay to write "Unknown" and fill this field in later.
Personas can be found at https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/ -->
### Further details
<!-- Include use cases, benefits, and/or goals (contributes to our vision?) -->
### Proposal
We should track usage. We can do this by tracking how many issues are created from alerts, or potentially another solution.
### Permissions and Security
<!-- What permissions are required to perform the described actions? Are they consistent with the existing permissions as documented for users, groups, and projects as appropriate? Is the proposed behavior consistent between the UI, API, and other access methods (e.g. email replies)? -->
### Documentation
<!-- See the Feature Change Documentation Workflow https://docs.gitlab.com/ee/development/documentation/feature-change-workflow.html
Add all known Documentation Requirements here, per https://docs.gitlab.com/ee/development/documentation/feature-change-workflow.html#documentation-requirements -->
### What does success look like, and how can we measure that?
<!-- Define both the success metrics and acceptance criteria. Note that success metrics indicate the desired business outcomes, while acceptance criteria indicate when the solution is working correctly. If there is no way to measure success, link to an issue that will implement a way to measure this. -->
### What is the type of buyer?
<!-- Which leads to: in which enterprise tier should this feature go? See https://about.gitlab.com/handbook/product/pricing/#four-tiers -->
### Links / references12.0Peter Leitzenpleitzen@gitlab.comPeter Leitzenpleitzen@gitlab.comhttps://gitlab.com/gitlab-org/gitlab/-/issues/9764Operations icon is shown twice on small viewport2019-08-12T15:08:44ZGhost UserOperations icon is shown twice on small viewport### Summary
The operations icon is shown twice on a mobile viewport. It's shown in the menu just like on bigger viewports, but also in the "More" menu.
### Steps to reproduce
Open GitLab.com on your mobile device and have a look at the "More" menu.
### Relevant logs and/or screenshots
![Screenshot_20190123-095502](/uploads/5a58b2e315a2dff1ecdeca770cc777b4/Screenshot_20190123-095502.jpg)12.0Tristan ReadDhiraj BodicherlaTristan Readhttps://gitlab.com/gitlab-org/gitlab/-/issues/7821Design track post-mortem issues from Incidents2019-08-16T23:05:08ZJoshua LambertDesign track post-mortem issues from IncidentsWith the introduction of Incident Management, we can now create and link issues directly to an incident within GitLab.
We plan to include a post-mortem section of the Incident, which offers a location to create and link issues that need follow-up. Typically these issues focus on ensuring that whatever caused the incident does not happen again. Because of this, it is important that they are actually scheduled and delivered in a timely fashion.
We should display the status of the linked issues in the post-mortem itself, and also display summary-level reporting in the incidents overview section. This would show whether the backlog of post-mortem issues is growing or staying at a maintainable level.
In some cases, making time to address these can be challenging given the desire to ship roadmap features. By providing this reporting, we can make it easier to determine the appropriate amount of attention.12.0Amelia BauerlySVAmelia Bauerlyhttps://gitlab.com/gitlab-org/gitlab/-/issues/4925Open issues based on Prometheus alerts2020-05-06T00:01:47ZJoshua LambertOpen issues based on Prometheus alertsMany organizations use Issues for tracking incidents, like GitLab. Currently users have to manually create an issue, label it in some way as an incident, mention the proper folks, and then communicate that issue out to other channels.
This is very inconvenient, and we should automate this process.
### Proposal
We can automate these tasks relatively simply, by automatically opening an issue when an alert is received from Prometheus.
The configuration can start off very simple: a single checkbox to enable this functionality, which should default to on.
* New settings section `Operations -> Incidents`; this should be the top section in Operations
* Short text blurb describing what this does
* Checkbox to enable automatic issue creation
* Checkbox to enable email messages sent to developers (current default)
The contents of the issue can then be:
For GitLab alerts: `labels/gitlab_alert_id`
* `title` - Metric title
* `metric_query` - Configured metric query
* `environment_name` - associated deployment environment
For external alerts:
* title - `annotations/title` or `annotations/summary`
* If it does not exist, build title out of `<receiver>: <startsAt>` just to have something that works out of the box
* description - `annotations/descriptions`
* Also include any other annotations that are present, for example `severity`, `runbook`, etc. We can do special things with these later
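The external-alert mapping above could be sketched roughly as follows. This is a minimal illustration, not GitLab's implementation: the function name `build_issue_fields` and the returned dictionary shape are hypothetical, and the alert is assumed to be one entry of an Alertmanager webhook payload (with `annotations` and `startsAt`), with `receiver` taken from the payload's top level. The description key is used exactly as written in this proposal.

```python
from typing import Any, Dict

# Annotation keys checked for the issue title, in priority order (from the proposal).
TITLE_KEYS = ("title", "summary")
# Annotation key for the issue description, spelled as in the proposal.
DESCRIPTION_KEY = "descriptions"


def build_issue_fields(receiver: str, alert: Dict[str, Any]) -> Dict[str, str]:
    """Derive an issue title and description from one external alert.

    Hypothetical helper: the name and return shape are assumptions
    made for illustration only.
    """
    annotations = dict(alert.get("annotations", {}))

    # Prefer annotations/title, then annotations/summary.
    title = next((annotations[k] for k in TITLE_KEYS if annotations.get(k)), None)
    if title is None:
        # Fallback so something works out of the box: "<receiver>: <startsAt>".
        title = f"{receiver}: {alert.get('startsAt', '')}"

    description = annotations.get(DESCRIPTION_KEY, "")

    # Any remaining annotations (severity, runbook, ...) are appended as
    # "key: value" lines, sorted by key for stable output.
    extras = {
        k: v
        for k, v in annotations.items()
        if k not in TITLE_KEYS and k != DESCRIPTION_KEY
    }
    lines = [description] if description else []
    lines += [f"{k}: {v}" for k, v in sorted(extras.items())]

    return {"title": title, "description": "\n".join(lines)}
```

An alert with no annotations at all still yields a usable title, for example `build_issue_fields("ops", {"startsAt": "2019-02-22T09:59:28Z"})` produces the title `ops: 2019-02-22T09:59:28Z` and an empty description.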
### Design
![Operations_Settings_Collapsed](/uploads/c2ab5d04ae05fd3333b58d90c79728c0/Operations_Settings_Collapsed.png)
![Operations_Settings](/uploads/66eaa899f3e651da04666243db7989a5/Operations_Settings.png)
#### WIP screenshot
![Screen_Shot_2019-02-22_at_9.59.28_am](/uploads/6b8d30c923c76aa4ef74928b5c15906a/Screen_Shot_2019-02-22_at_9.59.28_am.png)
### Documentation
There are a few places we should add documentation:
* We should create a new section in Settings -> Operations, which includes instructions for how to create issues from alerts
* We should include instructions for how to set this up for managed Prometheus instances, which is essentially no further action
* We should also document how to set this up for external Prometheus instances, as well as example configuration
* We should crosslink here from the Prometheus integration section11.10Peter Leitzenpleitzen@gitlab.comSimon KnoxMatej LatinSVPeter Leitzenpleitzen@gitlab.com