Refactor Complex Queries for project_imports(from)
project_imports metrics are identified as having complex queries and discrepancies with Automated Service Ping
Implementation https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/usage_data.rb#L891
Metrics to fix
usage_activity_by_stage_monthly => manage: => {:gitlab_project=>0, :gitlab=>0, :github=>0, :bitbucket=>0, :bitbucket_server=>0, :gitea=>0, :git=>0, :manifest=>0, :gitlab_migration=>0, :total=>0}
- usage_activity_by_stage { manage { project_imports { total } } source code
"total": "SELECT (SELECT COUNT(\"projects\".\"id\") FROM \"projects\" WHERE \"projects\".\"import_type\" = 'gitlab_project' AND \"projects\".\"import_type\" IS NOT NULL) + (SELECT COUNT(\"projects\".\"id\") FROM \"projects\" WHERE \"projects\".\"import_type\" = 'gitlab' AND \"projects\".\"import_type\" IS NOT NULL) + (SELECT COUNT(\"projects\".\"id\") FROM \"projects\" WHERE \"projects\".\"import_type\" = 'github' AND \"projects\".\"import_type\" IS NOT NULL) + (SELECT COUNT(\"projects\".\"id\") FROM \"projects\" WHERE \"projects\".\"import_type\" = 'bitbucket' AND \"projects\".\"import_type\" IS NOT NULL) + (SELECT COUNT(\"projects\".\"id\") FROM \"projects\" WHERE \"projects\".\"import_type\" = 'bitbucket_server' AND \"projects\".\"import_type\" IS NOT NULL) + (SELECT COUNT(\"projects\".\"id\") FROM \"projects\" WHERE \"projects\".\"import_type\" = 'gitea' AND \"projects\".\"import_type\" IS NOT NULL) + (SELECT COUNT(\"projects\".\"id\") FROM \"projects\" WHERE \"projects\".\"import_type\" = 'git' AND \"projects\".\"import_type\" IS NOT NULL) + (SELECT COUNT(\"projects\".\"id\") FROM \"projects\" WHERE \"projects\".\"import_type\" = 'manifest' AND \"projects\".\"import_type\" IS NOT NULL) + (SELECT COUNT(\"bulk_import_entities\".\"id\") FROM \"bulk_import_entities\" WHERE \"bulk_import_entities\".\"source_type\" = 1)"
- usage_activity_by_stage_monthly { manage { project_imports { "total" } } source code
"total": "SELECT (SELECT COUNT(\"projects\".\"id\") FROM \"projects\" WHERE \"projects\".\"import_type\" = 'gitlab_project' AND \"projects\".\"import_type\" IS NOT NULL) + (SELECT COUNT(\"projects\".\"id\") FROM \"projects\" WHERE \"projects\".\"import_type\" = 'gitlab' AND \"projects\".\"import_type\" IS NOT NULL) + (SELECT COUNT(\"projects\".\"id\") FROM \"projects\" WHERE \"projects\".\"import_type\" = 'github' AND \"projects\".\"import_type\" IS NOT NULL) + (SELECT COUNT(\"projects\".\"id\") FROM \"projects\" WHERE \"projects\".\"import_type\" = 'bitbucket' AND \"projects\".\"import_type\" IS NOT NULL) + (SELECT COUNT(\"projects\".\"id\") FROM \"projects\" WHERE \"projects\".\"import_type\" = 'bitbucket_server' AND \"projects\".\"import_type\" IS NOT NULL) + (SELECT COUNT(\"projects\".\"id\") FROM \"projects\" WHERE \"projects\".\"import_type\" = 'gitea' AND \"projects\".\"import_type\" IS NOT NULL) + (SELECT COUNT(\"projects\".\"id\") FROM \"projects\" WHERE \"projects\".\"import_type\" = 'git' AND \"projects\".\"import_type\" IS NOT NULL) + (SELECT COUNT(\"projects\".\"id\") FROM \"projects\" WHERE \"projects\".\"import_type\" = 'manifest' AND \"projects\".\"import_type\" IS NOT NULL) + (SELECT COUNT(\"bulk_import_entities\".\"id\") FROM \"bulk_import_entities\" WHERE \"bulk_import_entities\".\"created_at\" BETWEEN '2021-12-27 06:42:59.923192' AND '2022-01-24 06:42:59.923266' AND \"bulk_import_entities\".\"source_type\" = 1)"
Acceptance criteria
For each identified complex database metric, we want to look into:
- Refactor to Make the logic simple, to only use
count
ordistinct_count
- Overwrite
to_sql
if required, with the correct query for DWH - The logic for the query generated for DWH should be the same, we should be able to get in the end the same values
Related to #349858 (closed)
Edited by Luis Mejia