Summary of issues not in Epics (autogenerated)

Summary of issues that are not in an Epic

Total Issues: 103

Team Tasks

Topic	Board	Workflow Status
Keeping documentation up to date #4016		workflow-infraProposal
Document functionality to create GitLab issues from prometheus alerts #4013	boardplanning	workflow-infraTriage
Dreaming of 2025: Observability Wish List! #4012		workflow-infraStalled
Metrics data for consumption and analysis #3967	boardplanning	workflow-infraTriage
Add script to update feature categories to stage-groups-index #3962		workflow-infraTriage
Remove promtool from the runbooks image #3960		workflow-infraTriage
Route saturation alerts to service owners #3941		workflow-infraTriage
Metric / o11y on inactive sidekiq threads #3808		workflow-infraStalled
jsonnet-tool should pass along JSONNET_PATH #3678		workflow-infraTriage
Link the unwinded source metrics of an alert in an alert message #3662		workflow-infraTriage
Commit and push feature categories should alert on failure #3616		workflow-infraTriage
Tamland documentation: a day-in-the-life of Capacity planning #3578	boardbuild	workflow-infraReady
Automate creation of MR to update feature categories #3400		workflow-infraStalled
Summary of issues not in Epics (autogenerated) #538

Service::AIGateway

Issues: 1 ServiceAIGateway

Topic	Team	Service::AIGateway	Board	Workflow Status
Analysis of frequency and duration of specific AI Gateway errors with code 429 #4210		ServiceAIGateway		workflow-infraTriage

Service::AlertManager

Issues: 2 ServiceAlertManager

Topic	Team	Service::AlertManager	Board	Workflow Status
Corrective action: Workhorse and Load Balancer SLI interdependency for alerts #2955		ServiceAlertManager		workflow-infraTriage
Traffic absent alerts causing pager noise #3276		ServiceAlertManager		workflow-infraProposal

Service::ClickHouseCloud

Issues: 1 ServiceClickHouseCloud

Topic	Team	Service::ClickHouseCloud	Board	Workflow Status
Help setup Clickhouse Rails logs #2982		ServiceClickHouseCloud		workflow-infraStalled

Service::Container Registry

Issues: 1 ServiceContainer Registry

Topic	Team	Service::Container Registry	Board	Workflow Status
Include a link to a specific kibana error log search in the alert definition for the garbage collection component of the container registry service #3293		ServiceContainer Registry		workflow-infraTriage

Service::Database

Issues: 1 ServiceDatabase

Topic	Team	Service::Database	Board	Workflow Status
Stage group index is broken again. #4043		ServiceDatabase		workflow-infraStalled

Service::Elasticsearch

Issues: 2 ServiceElasticsearch

Topic	Team	Service::Elasticsearch	Board	Workflow Status
Create Elastic Cloud Serverless Project with Elasticsearch project `gitlab-docs-website` for TW team #4370		ServiceElasticsearch
Create test deployment for gitlab-docs-website for Localization team #4141		ServiceElasticsearch		workflow-infraTriage

Service::GCP

Issues: 2 ServiceGCP

Topic	Team	Service::GCP	Board	Workflow Status
Review new Google Cloud Logging regional ingestion quotas #4056		ServiceGCP		workflow-infraTriage
Implement GCP scheduled snapshots health check #3257		ServiceGCP		workflow-infraTriage

Service::GitLab Rails

Issues: 1 ServiceGitLab Rails

Topic	Team	Service::GitLab Rails	Board	Workflow Status
Review Request: maven virtual registry, multiple upstreams support #4096	teamScalability	ServiceGitLab Rails	boardbuild	workflow-infraReady

Service::Gitaly

Issues: 1 ServiceGitaly

Topic	Team	Service::Gitaly	Board	Workflow Status
Corrective action: alert on GitLab pipeline failures due to load. #3245		ServiceGitaly		workflow-infraTriage

Service::Grafana

Issues: 11 ServiceGrafana

Topic	Service::Grafana	Workflow Status
Adding a grafana datasource through configuration fails on secrets #4214	ServiceGrafana	workflow-infraTriage
SLI detail panels should apply the same selectors as the SLI itself does #4189	ServiceGrafana	workflow-infraTriage
Replace redis-sidekiq shard template to use the generic shard template #4171	ServiceGrafana	workflow-infraTriage
Make the service overview show the SLI for each shard in a different colour #4169	ServiceGrafana	workflow-infraTriage
Confidential Issue https://gitlab.com/gitlab-com/gl-infra/observability/team/-/issues/4079	ServiceGrafana	workflow-infraBacklog
Confidential Issue https://gitlab.com/gitlab-com/gl-infra/observability/team/-/issues/4044	ServiceGrafana	workflow-infraTriage
Migrate Grafana to Okta #4003	ServiceGrafana	workflow-infraProposal
Webservice dashboard link to kibana slow rails requests broken #3855	ServiceGrafana	workflow-infraTriage
Escaping of promql queries in alertmanager Slack alerts broken #3854	ServiceGrafana	workflow-infraTriage
Streamline latency attribution via service dashboards #3849	ServiceGrafana	workflow-infraTriage
Review grafana monitoring and alerting rules. #2971	ServiceGrafana	workflow-infraTriage

Service::Kube

Issues: 2 ServiceKube

Topic	Team	Service::Kube	Board	Workflow Status
Create process to periodically review nodepool instance families in kubernetes #4173		ServiceKube		workflow-infraProposal
Monitor kubernetes node CPU wait / noisy neighbour #4172		ServiceKube		workflow-infraProposal

Service::Logging

Issues: 10 ServiceLogging

Topic	Service::Logging	Workflow Status
Update runbooks and docs-hub logging documentation #4217	ServiceLogging	workflow-infraProposal
Audit GCP Cloud Logs usage #4118	ServiceLogging	workflow-infraTriage
Decommission Loki #4116	ServiceLogging	workflow-infraBlocked
Ingest sampled logs for some percentage of gitlab rails sql queries #4107	ServiceLogging	workflow-infraTriage
Differentiate disk space in Elastic by data tier #4066	ServiceLogging	workflow-infraTriage
How should we handle large json.message fields in Elastic. #4052	ServiceLogging	workflow-infraTriage
Confidential Issue https://gitlab.com/gitlab-com/gl-infra/observability/team/-/issues/3880	ServiceLogging	workflow-infraTriage
Improve the pubsubbeat deployment #3255	ServiceLogging	workflow-infraTriage
Push Elasticsearch ILM policies and index templates on a schedule #3292	ServiceLogging	workflow-infraTriage
Add Kibana fields for Postgres autovacuum auto-analyze log messages #3232	ServiceLogging	workflow-infraTriage

Service::Mimir

Issues: 9 ServiceMimir

Topic	Service::Mimir	Board	Workflow Status
Implement aggregation for metrics with endpoint_id #4139	ServiceMimir		workflow-infraProposal
Increase in Mimir store getRange latencies since upgrade to 2.15.0 #4124	ServiceMimir		workflow-infraStalled
Validating alert recording rules on live metrics #3853	ServiceMimir		workflow-infraTriage
Create a testing framework for recording- and alerting rules #3851	ServiceMimir		workflow-infraTriage
Add Metric Management information to Monitoring section of handbook #3704	ServiceMimir		workflow-infraTriage
Request to update prometheus blackbox config for handbook website #3675	ServiceMimir		workflow-infraTriage
Rename GCP bucket thanos-periodic-queries to periodic-queries #3519	ServiceMimir		workflow-infraTriage
Move periodic queries execution from ops to GitLab.com #3512	ServiceMimir		workflow-infraReady
Combine enqueued_jobs and sidekiq_queueing SLI in Sidekiq #3488	ServiceMimir	boardplanning	workflow-infraTriage

Service::Monitoring-Other

Issues: 2 ServiceMonitoring-Other

Topic	Team	Service::Monitoring-Other	Board	Workflow Status
Provide capability to backtest new alert definitions #4143		ServiceMonitoring-Other		workflow-infraTriage
Reduce GitLab's histograms to 3-5 buckets for most histograms #476		ServiceMonitoring-Other		workflow-infraTriage

Service::Patroni

Issues: 1 ServicePatroni

Topic	Team	Service::Patroni	Board	Workflow Status
Monitor Postgres TOAST oid exhaustion #3180		ServicePatroni	boardplanning	workflow-infraNeeds More Info

Service::Postgres

Issues: 1 ServicePostgres

Topic	Team	Service::Postgres	Board	Workflow Status
Patroni main, data growth drill-down #3784		ServicePostgres	boardplanning	workflow-infraTriage

Service::Prometheus

Issues: 4 ServicePrometheus

Topic	Service::Prometheus	Workflow Status
Update to Prometheus 3.0 #4000	ServicePrometheus	workflow-infraTriage
Create a more general purpose stackdriver-exporter for teams #2997	ServicePrometheus	workflow-infraReady
Deploy Prometheus Rules and Alertmanager from gitlab-helmfiles instead of runbooks #3267	ServicePrometheus	workflow-infraTriage
Add monitoring for OAuth2 login endpoints #3228	ServicePrometheus	workflow-infraTriage

Service::Redis

Issues: 2 ServiceRedis

Topic	Team	Service::Redis	Board	Workflow Status
Mirror process-exporter image to be resilient to docker registry failure #1709		ServiceRedis		workflow-infraTriage
Evaluate porting scheduled CPU profiles for redis observability on Kubernetes #1633		ServiceRedis		workflow-infraTriage

Service::Runbooks

Issues: 2 ServiceRunbooks

Topic	Team	Service::Runbooks	Board	Workflow Status
Report availability per service and overall GitLab availability. #4082		ServiceRunbooks		workflow-infraTriage
Fix update feature categories script on runbooks #3961		ServiceRunbooks		workflow-infraTriage

Service::Sentry

Issues: 2 ServiceSentry

Topic	Team	Service::Sentry	Board	Workflow Status
Fix gap in sentry monitoring #4224		ServiceSentry		workflow-infraTriage
Corrective Action: relieve memory pressure issues with Sentry's kafka #4026		ServiceSentry		workflow-infraTriage

Service::Sidekiq

Issues: 2 ServiceSidekiq

Topic	Team	Service::Sidekiq	Board	Workflow Status
Continuous profiling for Ruby Projects #3827		ServiceSidekiq		workflow-infraTriage
Discuss removal of histogram metrics on Sidekiq for self-managed #2474		ServiceSidekiq		workflow-infraProposal

Service::Thanos

Issues: 1 ServiceThanos

Topic	Team	Service::Thanos	Board	Workflow Status
Remove remaining Thanos components #4008		ServiceThanos		workflow-infraProposal

Service::Unknown

Issues: 4 ServiceUnknown

Topic	Service::Unknown	Workflow Status
Make the pager stop melting if the world is on fire. #4222	ServiceUnknown	workflow-infraTriage
tenant-observability-stack: Add support for node selector in tenant-observability-config-manager job #4158	ServiceUnknown	workflow-infraTriage
tenant-observability-stack: Make images configurable for ARM support #4157	ServiceUnknown	workflow-infraTriage
Creating test dashboard do not properly work #4144	ServiceUnknown	workflow-infraTriage

Service::Web

Issues: 2 ServiceWeb

Topic	Team	Service::Web	Board	Workflow Status
Web pods are being throttled #4205		ServiceWeb		workflow-infraTriage
Add job to update feature categories to the rails app #3963		ServiceWeb		workflow-infraTriage

Other

Topic	Team	Board	Workflow Status
Can we make the triage dashboards useful? #4221
Introduce floor threshold into our Capacity Planning process to improve financial efficiency #4108
Confidential Issue https://gitlab.com/gitlab-com/gl-infra/observability/team/-/issues/4104
Implement a feature registry #4099
Dimension lookup during reporting and issue management #4033
Discussion: Observability Service topology for metrics in cells #4029
patroni.disk_sustained_write_iops and patroni.disk_sustained_read_iops missing graphs #4032
Error Budgets should be based on full calendar month #3979
Stewardship for common-ci-tasks and related projects #3948
Confidential Issue https://gitlab.com/gitlab-com/gl-infra/observability/team/-/issues/3840
Confidential Issue https://gitlab.com/gitlab-com/gl-infra/observability/team/-/issues/3835
Confidential Issue https://gitlab.com/gitlab-com/gl-infra/observability/team/-/issues/3729
Synthetic Monitoring / Testing #3637
Move product error budget dashboards out of the current folder #3618
Confidential Issue https://gitlab.com/gitlab-com/gl-infra/observability/team/-/issues/3441			boardplanning
Use expanded labels recording rule for alerting dashboards #3426			boardplanning
Observability Feedback from Engineering Productivity Pulse Survey - FY25Q1 #2953
Turn the get-hybrid monitoring config into a monitoring mixin #2832			boardplanning
Labkit as the in-application platform toolkit #2793
Introduce open_fds saturation point for process_exporter #2778
Rename SLOs we use in saturation points #2168
Remove custom feature category recordings for the puma component #1481

Edited Dec 12, 2025 by service-epic-status-automation