Group Security Dashboard `history` endpoint timing out
What is the problem
We are seeing regular timeouts on the vulnerabilities/history
endpoint for the gitlab-org
group security dashboard: 17 occurrences within the last 2 days https://gitlab.com/groups/gitlab-org/-/security/dashboard/
We are also seeing timouts on the vulnerabilities/index
and vulnerabilities/summary
endpoints, but not nearly as often.
Over the past 4 months, we've seen the following timeouts (as of 6/13/2019):
vulnerabilities/history - 244 occurrences
vulnerabilities/index - 32 occurrences
vulnerabilities/summary - 20 occurrences
Details
The attached files include the sql statements for the corresponding endpoints as well as the output of EXPLAIN ANALYZE
ran against production using chatops.
Example of a representative issue from Sentry:
https://sentry.gitlab.net/gitlab/gitlabcom/issues/756092/
PG::QueryCanceled: ERROR: canceling statement due to statement timeout
/opt/gitlab/embedded/service/gitlab-rails/config/initializers/peek.rb:18:in `async_exec_params'
super(*args)
/opt/gitlab/embedded/service/gitlab-rails/config/initializers/peek.rb:18:in `exec_params'
super(*args)
active_record/connection_adapters/postgresql_adapter.rb:624:in `block (2 levels) in exec_no_cache'
@connection.exec_params(sql, type_casted_binds)
active_support/dependencies/interlock.rb:46:in `block in permit_concurrent_loads'
yield
active_support/concurrency/share_lock.rb:185:in `yield_shares'
yield
...
(160 additional frame(s) were not displayed)
ActiveRecord::StatementInvalid: PG::QueryCanceled: ERROR: canceling statement due to statement timeout
: SELECT CAST(vulnerability_occurrence_pipelines.created_at AS DATE) AS day, "vulnerability_occurrences"."severity", COUNT(distinct vulnerability_occurrences.id) as count FROM "vulnerability_occurrences" INNER JOIN "vulnerability_occurrence_pipelines" ON "vulnerability_occurrence_pipelines"."occurrence_id" = "vulnerability_occurrences"."id" WHERE "vulnerability_occurrence_pipelines"."pipeline_id" IN (SELECT "ci_pipelines"."id" FROM "ci_pipelines" WHERE "ci_pipelines"."project_id" IN (SELECT "projects"."id" FROM "projects" INNER JOIN routes rs ON rs.source_id = projects.id AND rs.source_type = 'Project' WHERE (rs.path LIKE 'gitlab-org/%')) AND (EXISTS (SELECT 1 FROM "vulnerability_occurrence_pipelines" WHERE (ci_pipelines.id=vulnerability_occurrence_pipelines.pipeline_id))) AND "ci_pipelines"."status" = 'success') AND (vulnerability_occurrence_pipelines.created_at >= '2019-02-21 00:00:00') GROUP BY "day", "vulnerability_occurrences"."severity" ORDER BY day
ActiveRecord::StatementInvalid: PG::QueryCanceled: ERROR: canceling statement due to statement timeout
: SELECT CAST(vulnerability_occurrence_pipelines.created_at AS DATE) AS day, "vulnerability_occurrences"."severity", COUNT(distinct vulnerability_occurrences.id) as count FROM "vulnerability_occurrences" INNER JOIN "vulnerability_occurrence_pipelines" ON "vulnerability_occurrence_pipelines"."occurrence_id" = "vulnerability_occurrences"."id" WHERE "vulnerability_occurrence_pipelines"."pipeline_id" IN (SELECT "ci_pipelines"."id" FROM "ci_pipelines" WHERE "ci_pipelines"."project_id" IN (SELECT "projects"."id" FROM "projects" INNER JOIN routes rs ON rs.source_id = projects.id AND rs.source_type = 'Project' WHERE (rs.path LIKE 'gitlab-org/%')) AND (EXISTS (SELECT 1 FROM "vulnerability_occurrence_pipelines" WHERE (ci_pipelines.id=vulnerability_occurrence_pipelines.pipeline_id))) AND "ci_pipelines"."status" = 'success') AND (vulnerability_occurrence_pipelines.created_at >= '2019-02-21 00:00:00') GROUP BY "day", "vulnerability_occurrences"."severity" ORDER BY day