Skip to content

WIP: Start to quantify data that violates pods

What does this MR do and why?

I started playing with the data in the issue_links table slowly growing a list of namespaces that we expect to be part of the GitLab Organization when we migrate to organizations. I got to 13 groups before I was finding lots of personal namespaces and groups that aren't owned by GitLab and then I decided to try and count how many links there would be outside of our organization into our organization to determine the scale of data that will need to be resolved in some way.

The query I was playing with to grow the organization was:

with gitlab_organization_namespace_ids as (
  select unnest(ARRAY[9970,
    6543,
    6737739,
    2065085,
    1400979,
    9361910,
    650153,
    253555,
    4347861,
    2008720,
    5085244,
    2299361,
    5723700]::bigint[])
)
select
source_id,
target_id,
source_root_namespaces.path as source_root_path,
target_root_namespaces.path as target_root_path,
source_root_namespaces.id as source_root_namspace_id,
target_root_namespaces.id as target_root_namspace_id
from issue_links
inner join issues as source_issues on source_issues.id = source_id
inner join issues as target_issues on target_issues.id = target_id
inner join namespaces as source_namespaces on source_namespaces.id = source_issues.namespace_id
inner join namespaces as target_namespaces on target_namespaces.id = target_issues.namespace_id
inner join namespaces as source_root_namespaces on source_root_namespaces.id = source_namespaces.traversal_ids[1]
inner join namespaces as target_root_namespaces on target_root_namespaces.id = target_namespaces.traversal_ids[1]
where
(source_root_namespaces.id not in (select * from gitlab_organization_namespace_ids) and target_root_namespaces.id in (select * from gitlab_organization_namespace_ids))
or
(target_root_namespaces.id not in (select * from gitlab_organization_namespace_ids) and source_root_namespaces.id in (select * from gitlab_organization_namespace_ids))
limit 20;

Then to count the violations I had:

with gitlab_organization_namespace_ids as (
  select unnest(ARRAY[9970,
    6543,
    6737739,
    2065085,
    1400979,
    9361910,
    650153,
    253555,
    4347861,
    2008720,
    5085244,
    2299361,
    5723700]::bigint[])
)
select
count(*)
from issue_links
inner join issues as source_issues on source_issues.id = source_id
inner join issues as target_issues on target_issues.id = target_id
inner join namespaces as source_namespaces on source_namespaces.id = source_issues.namespace_id
inner join namespaces as target_namespaces on target_namespaces.id = target_issues.namespace_id
inner join namespaces as source_root_namespaces on source_root_namespaces.id = source_namespaces.traversal_ids[1]
inner join namespaces as target_root_namespaces on target_root_namespaces.id = target_namespaces.traversal_ids[1]
where
(source_root_namespaces.id not in (select * from gitlab_organization_namespace_ids) and target_root_namespaces.id in (select * from gitlab_organization_namespace_ids))
or
(target_root_namespaces.id not in (select * from gitlab_organization_namespace_ids) and source_root_namespaces.id in (select * from gitlab_organization_namespace_ids));

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Merge request reports