Group Security Dashboard `history` endpoint timing out

What is the problem

We are seeing regular timeouts on the vulnerabilities/history endpoint for the gitlab-org group security dashboard: 17 occurrences within the last 2 days https://gitlab.com/groups/gitlab-org/-/security/dashboard/

We are also seeing timouts on the vulnerabilities/index and vulnerabilities/summary endpoints, but not nearly as often.

Over the past 4 months, we've seen the following timeouts (as of 6/13/2019):

vulnerabilities/history - 244 occurrences

vulnerabilities/index - 32 occurrences

vulnerabilities/summary - 20 occurrences

Details

history_analyze.sql

index_analyze.sql

summary_analyze.sql

The attached files include the sql statements for the corresponding endpoints as well as the output of EXPLAIN ANALYZE ran against production using chatops.

Example of a representative issue from Sentry:

https://sentry.gitlab.net/gitlab/gitlabcom/issues/756092/

PG::QueryCanceled: ERROR:  canceling statement due to statement timeout

  /opt/gitlab/embedded/service/gitlab-rails/config/initializers/peek.rb:18:in `async_exec_params'
    super(*args)
  /opt/gitlab/embedded/service/gitlab-rails/config/initializers/peek.rb:18:in `exec_params'
    super(*args)
  active_record/connection_adapters/postgresql_adapter.rb:624:in `block (2 levels) in exec_no_cache'
    @connection.exec_params(sql, type_casted_binds)
  active_support/dependencies/interlock.rb:46:in `block in permit_concurrent_loads'
    yield
  active_support/concurrency/share_lock.rb:185:in `yield_shares'
    yield
...
(160 additional frame(s) were not displayed)

ActiveRecord::StatementInvalid: PG::QueryCanceled: ERROR:  canceling statement due to statement timeout
: SELECT CAST(vulnerability_occurrence_pipelines.created_at AS DATE) AS day, "vulnerability_occurrences"."severity", COUNT(distinct vulnerability_occurrences.id) as count FROM "vulnerability_occurrences" INNER JOIN "vulnerability_occurrence_pipelines" ON "vulnerability_occurrence_pipelines"."occurrence_id" = "vulnerability_occurrences"."id" WHERE "vulnerability_occurrence_pipelines"."pipeline_id" IN (SELECT "ci_pipelines"."id" FROM "ci_pipelines" WHERE "ci_pipelines"."project_id" IN (SELECT "projects"."id" FROM "projects" INNER JOIN routes rs ON rs.source_id = projects.id AND rs.source_type = 'Project' WHERE (rs.path LIKE 'gitlab-org/%')) AND (EXISTS (SELECT 1 FROM "vulnerability_occurrence_pipelines" WHERE (ci_pipelines.id=vulnerability_occurrence_pipelines.pipeline_id))) AND "ci_pipelines"."status" = 'success') AND (vulnerability_occurrence_pipelines.created_at >= '2019-02-21 00:00:00') GROUP BY "day", "vulnerability_occurrences"."severity" ORDER BY day

ActiveRecord::StatementInvalid: PG::QueryCanceled: ERROR:  canceling statement due to statement timeout
: SELECT CAST(vulnerability_occurrence_pipelines.created_at AS DATE) AS day, "vulnerability_occurrences"."severity", COUNT(distinct vulnerability_occurrences.id) as count FROM "vulnerability_occurrences" INNER JOIN "vulnerability_occurrence_pipelines" ON "vulnerability_occurrence_pipelines"."occurrence_id" = "vulnerability_occurrences"."id" WHERE "vulnerability_occurrence_pipelines"."pipeline_id" IN (SELECT "ci_pipelines"."id" FROM "ci_pipelines" WHERE "ci_pipelines"."project_id" IN (SELECT "projects"."id" FROM "projects" INNER JOIN routes rs ON rs.source_id = projects.id AND rs.source_type = 'Project' WHERE (rs.path LIKE 'gitlab-org/%')) AND (EXISTS (SELECT 1 FROM "vulnerability_occurrence_pipelines" WHERE (ci_pipelines.id=vulnerability_occurrence_pipelines.pipeline_id))) AND "ci_pipelines"."status" = 'success') AND (vulnerability_occurrence_pipelines.created_at >= '2019-02-21 00:00:00') GROUP BY "day", "vulnerability_occurrences"."severity" ORDER BY day
Edited by rossfuhrman