Group Security Dashboard shows incorrect vulnerability counts

Summary

We are evaluating vulnerabilities in our analyzer projects and totals on the dashboard don't line up with the project counts (19 vs. ~100 critical shown below).

Steps to reproduce

  1. Navigate to a project level security dashboard (example)
  2. Notice that the count # in the Vulnerabilities over time is much lower than the counts per project in Project security status (expand the F grade)

Example Project

https://gitlab.com/groups/gitlab-org/security-products/analyzers/-/security/dashboard

What is the current bug behavior?

Timeline counts are much lower than the actual project counts.

What is the expected correct behavior?

Timeline counts should match project counts.

Relevant logs and/or screenshots

image

Output of checks

Results of GitLab environment info

Expand for output related to GitLab environment info

(For installations with omnibus-gitlab package run and paste the output of:
`sudo gitlab-rake gitlab:env:info`)

(For installations from source run and paste the output of:
`sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production`)

Results of GitLab application Check

Expand for output related to the GitLab application check

(For installations with omnibus-gitlab package run and paste the output of: sudo gitlab-rake gitlab:check SANITIZE=true)

(For installations from source run and paste the output of: sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true)

(we will only investigate if the tests are passing)

Possible fixes

In https://gitlab.slack.com/archives/CV09DAXEW/p1598556709070100 it was determined that this aggregate method may be the cause https://gitlab.com/gitlab-org/gitlab/-/blob/8d0dff57d77883aad64ec4e3a3814f602ec6d89b/ee/app/models/vulnerabilities/historical_statistic.rb#L28-34

Per @minac:

The above logic assumes that we have historical statistic entries in the table for all the projects for each day which might not correct based on the implementation. If there is no record for a date then the previous date entry should be used in the sum.

Implementation plan

The root cause of this issue might be the query timeout exceptions we get in Vulnerabilities::Statistics::ScheduleWorker#L19.

When we get an exception in this worker, the historical vulnerability statistics will not be generated for that day which causes wrong results to be calculated while aggregating the historical statistics for a group as there are gaps in the time series.

    • Find a better performance query to prevent having timeout issues or introduce has_vulnerabilities flag for projects and get the list of projects with the following query;
      SELECT * FROM projects WHERE has_vulnerabilities IS TRUE
    • Write a migration to populate missing historical vulnerability statistics
    • [-] Since the daily adjustment worker can fail for various reasons, it's better to have a more robust way of aggregating the values so maybe we can change the query with something like proposed in the #244380 (comment 406700787)
Edited by Mehmet Emin INAC