Group Security Dashboard shows incorrect vulnerability counts
We are evaluating vulnerabilities in our analyzer projects and totals on the dashboard don't line up with the project counts (19 vs. ~100 critical shown below).
Steps to reproduce
- Navigate to a project level security dashboard (example)
- Notice that the count # in the
Vulnerabilities over timeis much lower than the counts per project in
Project security status(expand the
What is the current bug behavior?
Timeline counts are much lower than the actual project counts.
What is the expected correct behavior?
Timeline counts should match project counts.
Relevant logs and/or screenshots
Output of checks
Results of GitLab environment info
Expand for output related to GitLab environment info
(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:env:info`) (For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production`)
Results of GitLab application Check
Expand for output related to the GitLab application check
(For installations with omnibus-gitlab package run and paste the output of:
sudo gitlab-rake gitlab:check SANITIZE=true)
(For installations from source run and paste the output of:
sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true)
(we will only investigate if the tests are passing)
In https://gitlab.slack.com/archives/CV09DAXEW/p1598556709070100 it was determined that this aggregate method may be the cause https://gitlab.com/gitlab-org/gitlab/-/blob/8d0dff57d77883aad64ec4e3a3814f602ec6d89b/ee/app/models/vulnerabilities/historical_statistic.rb#L28-34
The above logic assumes that we have historical statistic entries in the table for all the projects for each day which might not correct based on the implementation. If there is no record for a date then the previous date entry should be used in the sum.
The root cause of this issue might be the query timeout exceptions we get in
When we get an exception in this worker, the historical vulnerability statistics will not be generated for that day which causes wrong results to be calculated while aggregating the historical statistics for a group as there are gaps in the time series.
Find a better performance query to prevent having timeout issues or introduce
has_vulnerabilitiesflag for projects and get the list of projects with the following query;
SELECT * FROM projects WHERE has_vulnerabilities IS TRUE
- Find a better performance query to prevent having timeout issues or introduce
- Write a migration to populate missing historical vulnerability statistics
Since the daily adjustment worker can fail for various reasons, it's better to have a more robust way of aggregating the values so maybe we can change the query with something like proposed in the #244380 (comment 406700787)