[SPIKE] Investigate dormant member query
Background
In [Gitlab.com] Automatic removal of dormant members (&7533) we intend to add a feature to identify and remove dormant users from Namespaces for GitLab.com
This is similar to the existing SM feature ( &5519), but rather than deactivating users, we will remove members.
To achieve this, we will be adding a way to track member’s last activity timestamp within a namespace, so that we can identify dormant members and remove them if they've been dormant for greater than a predefined time period (e.g. 90 days).
Purpose
The purpose of this issue is to:
Investigate the best or most performant approach to querying for dormant members
Considerations
We will need to:
- query daily (limited capacity worker?)
- for every top level group that has enabled the setting
- find all members in the group's hierarchy that have not been active > N (defined in group setting) days
In previous discussions (internal slack), we've considered:
- One query that finds all members in the hierarchy, e.g. using the
Member#in_hierarchy
scope. On a warm cache with the GL group, this has been very performant in the past, but we might want to test against larger groups/cold caches (example) - An iterative loop over groups in the hierarchy, which fires off a query for each one to find the dormant members
The question between the two is which is better, many fast queries, or one slower query
Outcome
The desired outcome of this issue is to preferably have decided on the approach that will satisfy GitLab performance guidelines.
Outcome
We'll use batches to loop over matching groups and find / remove dormant members within those batches.
example code and query performances added in !153100 (diffs)