Use approximate counter for CI/CD related big tables

Problem

GitLab instances send usage pings from on-premises/gitlab.com servers to https://version.gitlab.com/usage_data. The tracked data is listed in https://gitlab.com/gitlab-org/gitlab-ce/blob/master/lib/gitlab/usage_data.rb.

In this routinue, GitLab counts a number of records for each tables/conditions, however, CI/CD related tables are too big (e.g. 170M~) to count. We should use approximate counter (Gitlab::Database::Count.approximate_counts) for those tables.

At the glance, those relations should be counted via approximate_counts.

  • Ci::Build
  • Ci::Pipeline.internal
  • Ci::Pipeline.external
  • Ci::Pipeline.auto_devops_source
  • Ci::Pipeline.repository_source

We also want to add more queires which related to ci_builds, such as

  • SELECT COUNT(DISTINCT "ci_builds"."project_id") FROM "ci_builds" WHERE "ci_builds"."name" = 'pages:deploy'

We tried to implement the query in https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/25014, but concluded that it doesn't work on gitlab.com scale.