Improve security dashboards' performance

The Problem

The endpoints that fetch vulnerability findings data on the security dashboards are slow, and on larger group and instance dashboards they often time out.

Technical Details

A large chunk of the request time is spent querying the database, and it is the database timeout that results in 500s from these endpoints. Specifically, the part of the query that fetches the appropriate pipelines can take a long time when the DB does not already have the pipelines cached.

Example

The findings endpoints for the gitlab-org group security dashboard have begun timing out frequently.

Here is the query for the findings list for that group: https://gitlab.com/snippets/1930742
And here is an EXPLAIN plan for that query: https://explain.depesz.com/s/Vyp2

Note: the reasonable time reported by the plan is the result of caching. Normally the query would take too long for us to create a plan using chatops.

A solution

In a recent instance dashboard MR, it was suggested that we separate the query for pipelines from the query for vulnerabilities.

Example

This example uses the gitlab-org group dashboard findings query:

Query for pipelines: https://gitlab.com/snippets/1936492
Plan: https://explain.depesz.com/s/0aB9

Query for vulnerabilities using pipeline IDs: https://gitlab.com/snippets/1936496
Plan: https://explain.depesz.com/s/6Baq

Pros

  1. Each query will take less time, so we should avoid the 15s from DB timeouts.
  2. We only need to fetch the pipeline data once, and then we can reuse it in each of the findings queries (index, summary, and history).

Cons

  1. Initial page load may be longer as we fetch the pipeline IDs
Edited by Avielle Wolfe