Improve security dashboards' performance
The Problem
The endpoints that fetch vulnerability findings data on the security dashboards are slow, and on larger group and instance dashboards they often time out.
Technical Details
A large chunk of the request time is spent querying the database, and it is the database timeout that results in 500s from these endpoints. Specifically, the part of the query that fetches the appropriate pipelines can take a long time when the DB does not already have the pipelines cached.
Example
The findings endpoints for the gitlab-org
group security dashboard have begun timing out frequently.
Here is the query for the findings list for that group: https://gitlab.com/snippets/1930742
And here is an EXPLAIN
plan for that query: https://explain.depesz.com/s/Vyp2
Note: the reasonable time reported by the plan is the result of caching. Normally the query would take too long for us to create a plan using chatops.
A solution
In a recent instance dashboard MR, it was suggested that we separate the query for pipelines from the query for vulnerabilities.
Example
This example uses the gitlab-org
group dashboard findings query:
Query for pipelines: https://gitlab.com/snippets/1936492
Plan: https://explain.depesz.com/s/0aB9
Query for vulnerabilities using pipeline IDs: https://gitlab.com/snippets/1936496
Plan: https://explain.depesz.com/s/6Baq
Pros
- Each query will take less time, so we should avoid the 15s from DB timeouts.
- We only need to fetch the pipeline data once, and then we can reuse it in each of the findings queries (
index
,summary
, andhistory
).
Cons
- Initial page load may be longer as we fetch the pipeline IDs