Skip to content

Change Manage Import metrics to total of imports instead of unique users

Kassio Borges requested to merge kassio/manage-import-metrics into master

What does this MR do?

Change Manage Import metrics to total of imports instead of unique users

There were some confusion about the manage metrics when they were originally created. This data have the intent of provide the "North Star" metrics information, which is the total of imported resources.

The monthly unique user usage (GMAU) will be added separately in another MR in the unique_users_all_imports key

Related to:

  • #283175 (closed) (discussion regarding the current confusion between the metric names and values returned)
  • #297432 (closed) (request to change how to calculate these metrics)

Screenshots (strongly suggested)

Original Queries

bulk_imports.gitlab

Query:

SELECT
    COUNT(DISTINCT "bulk_imports"."user_id")
FROM
    "bulk_imports"
WHERE
    "bulk_imports"."created_at" BETWEEN '2020-12-21 22:34:42.084581'
    AND '2021-01-18 22:34:42.084663'
    AND "bulk_imports"."user_id" >= 0
    AND "bulk_imports"."user_id" < 1

Plan:

 Aggregate  (cost=3.17..3.18 rows=1 width=8) (actual time=0.026..0.026 rows=1 loops=1)
   Buffers: shared hit=1
   ->  Index Scan using index_bulk_imports_on_user_id on public.bulk_imports  (cost=0.14..3.17 rows=1 width=4) (actual time=0.015..0.015 rows=0 loops=1)
         Index Cond: ((bulk_imports.user_id >= 0) AND (bulk_imports.user_id < 1))
         Filter: ((bulk_imports.created_at >= '2020-12-21 22:34:42.084581+00'::timestamp with time zone) AND (bulk_imports.created_at <= '2021-01-18 22:34:42.084663+00'::timestamp with time zone))
         Rows Removed by Filter: 0
         Buffers: shared hit=1

Recommendations: Looks good

Statistics:

Time: 0.172 ms
  - planning: 0.112 ms
  - execution: 0.060 ms
    - I/O read: N/A
    - I/O write: N/A

Shared buffers:
  - hits: 1 (~8.00 KiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
projects_imported.total

Query:

SELECT
    COUNT(DISTINCT "projects"."creator_id")
FROM
    "projects"
WHERE
    "projects"."created_at" BETWEEN '2020-12-21 22:34:42.084581'
    AND '2021-01-18 22:34:42.084663'
    AND "projects"."import_type" IS NOT NULL
    AND "projects"."creator_id" >= 0
    AND "projects"."creator_id" < 1

Plan:

 Aggregate  (cost=3.46..3.47 rows=1 width=8) (actual time=0.032..0.032 rows=1 loops=1)
   Buffers: shared hit=3
   ->  Index Only Scan using index_projects_on_creator_id_import_type_and_created_at_partial on public.projects  (cost=0.43..3.45 rows=1 width=4) (actual time=0.021..0.022 rows=0 loops=1)
         Index Cond: ((projects.creator_id >= 0) AND (projects.creator_id < 1) AND (projects.created_at >= '2020-12-21 22:34:42.084581+00'::timestamp with time zone) AND (projects.created_at <= '2021-01-18 22:34:42.084663+00'::timestamp with time zone))
         Heap Fetches: 0
         Buffers: shared hit=3

Recommendations: Looks good

Statistics:

Time: 0.680 ms
  - planning: 0.613 ms
  - execution: 0.067 ms
    - I/O read: N/A
    - I/O write: N/A

Shared buffers:
  - hits: 3 (~24.00 KiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
projects_imported.(gitlab_project,gitlab,github,bitbucket,bitbucket_server,gitea,git,manifest) and issues_imported.(fogbugz,phabricator)

Query:

SELECT
    COUNT(DISTINCT "projects"."creator_id")
FROM
    "projects"
WHERE
    "projects"."import_type" = 'gitlab_project'
    AND "projects"."created_at" BETWEEN '2020-12-21 22:34:42.084581'
    AND '2021-01-18 22:34:42.084663'
    AND "projects"."import_type" IS NOT NULL
    AND "projects"."creator_id" >= 0
    AND "projects"."creator_id" < 1

Plan:

 Aggregate  (cost=3.46..3.47 rows=1 width=8) (actual time=0.028..0.028 rows=1 loops=1)
   Buffers: shared hit=3
   ->  Index Only Scan using index_projects_on_creator_id_import_type_and_created_at_partial on public.projects  (cost=0.43..3.46 rows=1 width=4) (actual time=0.017..0.018 rows=0 loops=1)
         Index Cond: ((projects.creator_id >= 0) AND (projects.creator_id < 1) AND (projects.import_type = 'gitlab_project'::text) AND (projects.created_at >= '2020-12-21 22:34:42.084581+00'::timestamp with time zone) AND (projects.created_at <= '2021-01-18 22:34:42.084663+00'::timestamp with time zone))
         Heap Fetches: 0
         Buffers: shared hit=3

Recommendations: Looks good

Statistics:

Time: 0.681 ms
  - planning: 0.618 ms
  - execution: 0.063 ms
    - I/O read: N/A
    - I/O write: N/A

Shared buffers:
  - hits: 3 (~24.00 KiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
groups_imported

Query:

SELECT
    COUNT(DISTINCT "group_import_states"."user_id")
FROM
    "group_import_states"
WHERE
    "group_import_states"."created_at" BETWEEN '2020-12-21 22:34:42.084581'
    AND '2021-01-18 22:34:42.084663'
    AND "group_import_states"."user_id" >= 0
    AND "group_import_states"."user_id" < 1

Plan:

 Aggregate  (cost=3.30..3.31 rows=1 width=8) (actual time=1.009..1.010 rows=1 loops=1)
   Buffers: shared hit=12 read=2
   I/O Timings: read=0.807
   ->  Index Scan using index_group_import_states_on_user_id on public.group_import_states  (cost=0.28..3.30 rows=1 width=8) (actual time=0.927..0.927 rows=0 loops=1)
         Index Cond: ((group_import_states.user_id >= 0) AND (group_import_states.user_id < 1))
         Filter: ((group_import_states.created_at >= '2020-12-21 22:34:42.084581+00'::timestamp with time zone) AND (group_import_states.created_at <= '2021-01-18 22:34:42.084663+00'::timestamp with time zone))
         Rows Removed by Filter: 0
         Buffers: shared hit=3 read=2
         I/O Timings: read=0.807

Recommendations: Looks good

Statistics:

Time: 1.210 ms
  - planning: 0.168 ms
  - execution: 1.042 ms
    - I/O read: 0.807 ms
    - I/O write: N/A

Shared buffers:
  - hits: 12 (~96.00 KiB) from the buffer pool
  - reads: 2 (~16.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

New Queries

bulk_imports.gitlab

Query:

SELECT
    COUNT("bulk_imports"."id")
FROM
    "bulk_imports"
WHERE
    "bulk_imports"."created_at" BETWEEN '2020-12-21 22:34:42.084581'
    AND '2021-01-18 22:34:42.084663'
    AND "bulk_imports"."id" >= 0
    AND "bulk_imports"."id" < 1

Plan:

 Aggregate  (cost=3.17..3.18 rows=1 width=8) (actual time=0.017..0.018 rows=1 loops=1)
   Buffers: shared hit=1
   ->  Index Scan using bulk_imports_pkey on public.bulk_imports  (cost=0.14..3.17 rows=1 width=8) (actual time=0.014..0.014 rows=0 loops=1)
         Index Cond: ((bulk_imports.id >= 0) AND (bulk_imports.id < 1))
         Filter: ((bulk_imports.created_at >= '2020-12-21 22:34:42.084581+00'::timestamp with time zone) AND (bulk_imports.created_at <= '2021-01-18 22:34:42.084663+00'::timestamp with time zone))
         Rows Removed by Filter: 0
         Buffers: shared hit=1

Recommendations: Looks good

Statistics:

Time: 0.173 ms
  - planning: 0.108 ms
  - execution: 0.065 ms
    - I/O read: N/A
    - I/O write: N/A

Shared buffers:
  - hits: 1 (~8.00 KiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
project_imports.(gitlab_project,gitlab,github,bitbucket,bitbucket_server,gitea,git,manifest)

Query:

SELECT
    COUNT(DISTINCT "projects"."creator_id")
FROM
    "projects"
WHERE
    "projects"."import_type" = 'gitlab_project'
    AND "projects"."created_at" BETWEEN '2020-12-21 22:34:42.084581'
    AND '2021-01-18 22:34:42.084663'
    AND "projects"."import_type" IS NOT NULL
    AND "projects"."creator_id" >= 0
    AND "projects"."creator_id" < 1

Plan:

 Aggregate  (cost=3.46..3.47 rows=1 width=8) (actual time=0.030..0.030 rows=1 loops=1)
   Buffers: shared hit=3
   ->  Index Only Scan using index_projects_on_creator_id_import_type_and_created_at_partial on public.projects  (cost=0.43..3.46 rows=1 width=4) (actual time=0.019..0.019 rows=0 loops=1)
         Index Cond: ((projects.creator_id >= 0) AND (projects.creator_id < 1) AND (projects.import_type = 'gitlab_project'::text) AND (projects.created_at >= '2020-12-21 22:34:42.084581+00'::timestamp with time zone) AND (projects.created_at <= '2021-01-18 22:34:42.084663+00'::timestamp with time zone))
         Heap Fetches: 0
         Buffers: shared hit=3

Recommendations: Looks good

Statistics:

Time: 0.749 ms
  - planning: 0.683 ms
  - execution: 0.066 ms
    - I/O read: N/A
    - I/O write: N/A

Shared buffers:
  - hits: 3 (~24.00 KiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
project_imports.gitlab_migration

Query:

SELECT
    COUNT("bulk_import_entities"."id")
FROM
    "bulk_import_entities"
WHERE
    "bulk_import_entities"."created_at" BETWEEN '2020-12-21 22:34:42.084581'
    AND '2021-01-18 22:34:42.084663'
    AND "bulk_import_entities"."source_type" = 1
    AND "bulk_import_entities"."id" >= 0
    AND "bulk_import_entities"."id" < 1

Plan:

 Aggregate  (cost=3.17..3.18 rows=1 width=8) (actual time=0.025..0.025 rows=1 loops=1)
   Buffers: shared hit=1
   ->  Index Scan using bulk_import_entities_pkey on public.bulk_import_entities  (cost=0.14..3.17 rows=1 width=8) (actual time=0.022..0.022 rows=0 loops=1)
         Index Cond: ((bulk_import_entities.id >= 0) AND (bulk_import_entities.id < 1))
         Filter: ((bulk_import_entities.created_at >= '2020-12-21 22:34:42.084581+00'::timestamp with time zone) AND (bulk_import_entities.created_at <= '2021-01-18 22:34:42.084663+00'::timestamp with time zone) AND (bulk_import_entities.source_type = 1))
         Rows Removed by Filter: 0
         Buffers: shared hit=1

Recommendations: Looks good

Statistics:

Time: 0.254 ms
  - planning: 0.173 ms
  - execution: 0.081 ms
    - I/O read: N/A
    - I/O write: N/A

Shared buffers:
  - hits: 1 (~8.00 KiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
group_imports.group_import

Query:

SELECT
    COUNT("group_import_states"."group_id")
FROM
    "group_import_states"
WHERE
    "group_import_states"."created_at" BETWEEN '2020-12-21 22:34:42.084581'
    AND '2021-01-18 22:34:42.084663'
    AND "group_import_states"."group_id" >= 0
    AND "group_import_states"."group_id" < 1

Plan:

 Aggregate  (cost=3.30..3.31 rows=1 width=8) (actual time=0.011..0.012 rows=1 loops=1)
   Buffers: shared hit=2
   ->  Index Scan using index_group_import_states_on_group_id on public.group_import_states  (cost=0.28..3.30 rows=1 width=8) (actual time=0.008..0.008 rows=0 loops=1)
         Index Cond: ((group_import_states.group_id >= 0) AND (group_import_states.group_id < 1))
         Filter: ((group_import_states.created_at >= '2020-12-21 22:34:42.084581+00'::timestamp with time zone) AND (group_import_states.created_at <= '2021-01-18 22:34:42.084663+00'::timestamp with time zone))
         Rows Removed by Filter: 0
         Buffers: shared hit=2

Recommendations: Looks good

Statistics:

Time: 0.235 ms
  - planning: 0.196 ms
  - execution: 0.039 ms
    - I/O read: N/A
    - I/O write: N/A

Shared buffers:
  - hits: 2 (~16.00 KiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
group_imports.gitlab_migration

Query:

SELECT
    COUNT("bulk_import_entities"."id")
FROM
    "bulk_import_entities"
WHERE
    "bulk_import_entities"."created_at" BETWEEN '2020-12-21 22:34:42.084581'
    AND '2021-01-18 22:34:42.084663'
    AND "bulk_import_entities"."source_type" = 0
    AND "bulk_import_entities"."id" >= 0
    AND "bulk_import_entities"."id" < 1

Plan:

 Aggregate  (cost=3.17..3.18 rows=1 width=8) (actual time=0.030..0.031 rows=1 loops=1)
   Buffers: shared hit=1
   ->  Index Scan using bulk_import_entities_pkey on public.bulk_import_entities  (cost=0.14..3.17 rows=1 width=8) (actual time=0.026..0.026 rows=0 loops=1)
         Index Cond: ((bulk_import_entities.id >= 0) AND (bulk_import_entities.id < 1))
         Filter: ((bulk_import_entities.created_at >= '2020-12-21 22:34:42.084581+00'::timestamp with time zone) AND (bulk_import_entities.created_at <= '2021-01-18 22:34:42.084663+00'::timestamp with time zone) AND (bulk_import_entities.source_type = 0))
         Rows Removed by Filter: 0
         Buffers: shared hit=1

Recommendations: Looks good

Statistics:

Time: 0.284 ms
  - planning: 0.196 ms
  - execution: 0.088 ms
    - I/O read: N/A
    - I/O write: N/A

Shared buffers:
  - hits: 1 (~8.00 KiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team
Edited by Mayra Cabrera

Merge request reports