Simplify Namespace relationships
A Group can exist in a parent-child hierarchy, or as one group shared in to another group. Our architecture treats these as two different types of relationships, when in fact they are the same. They are different expressions of the same underlying "directed acyclic graph" (DAG). This divide in the architecture has led to engineering complexity (e.g. #227793, #217937 (closed), #33534 (closed)) and limitations on product design (e.g. https://about.gitlab.com/company/team/structure/working-groups/simplify-groups-and-projects/, &2885 (closed)).
My proposal is to obsolete the parent-child hierarchy and unify within the GroupGroupLink join model. This will greatly reduce complexity for engineering, and the product design will be free to link and traverse across the hierarchy as easily as it can up and down the hierarchy.
The Problem
Our parent-child group hierarchy is currently viewed as:
graph TD;
grandparent---parentA;
grandparent---parentB;
parentA---childA;
parentA---childB;
parentB---childC;
parentB---childD;
Members of childA are the union of the members from childA U parentA U grandparent
The equivalent in "group sharing" terms for childA membership becomes ...
graph LR;
grandparent-- shared into -->parentA;
parentA -- shared into --> childA;
This demonstrates that the parent-child structure can be described from a "group sharing" perspective.
The "parent-child hierarchy" is thought of as a one to many structure, and "group sharing" as many to many. When we apply a many to many requirement on the parent-child hierarchy we get:
graph TD;
grandparent---parentA;
grandparent---parentB;
parentA---childA;
parentA---childB;
parentB---childC;
parentB---childD;
parentB---childA;
Where childA now has the two parents of parentA and parentB.
From a "group sharing" perspective we get:
graph LR;
grandparent-- shared into -->parentA;
parentA -- shared into --> childA;
grandparent-- shared into -->parentB;
parentB -- shared into --> childA;
childA membership is now the union of (childA U parentA U grandparent) U (childA U parentB U grandparent) which is just childA U parentA U parentB U grandparent
We can not represent this multiple parent structure using the parent_id attribute of the Namespace model. In this case, we have to turn to "group sharing" with its corresponding GroupGroupLink join model. The problem is that once we do this, we have split the group architecture across two different implementations. One being the parent_id attribute, and the other being the GroupGroupLink join model.
The Solution
The solution is to unify everything under the GroupGroupLink because it is able to represent both the group sharing and parent-child structure. Note that there is no need to demarcate between the parent-child hierarchy and the "group sharing" links as they are the same thing.
We can search the GroupGroupLink table using the same methods we use today. Either with a recursive CTE or linearly with traversal_ids.
The part of the above example where childA has two parents of parentA and parentB would be described in the GroupGroupLink table like so:
| shared_group | shared_with_group | traversal_ids |
|---|---|---|
| grandparent | parentA | grandparent / parentA |
| parentA | childA | grandparent / parentA / childA |
| grandparent | parentB | grandparent / parentB |
| parentB | childA | grandparent / parentB / childA |
Members of childA are all the members of childA and its ancestors, which would be traversal_ids matching * / childA.
Search with recursive CTE would match our use today, but there is a difference with how we use traversal_ids. In the parent-child structure there is only one path from the root to each node. But, in the shared group structure there can be multiple paths. For example, going from grandparent to childA we can walk either path of:
grandparent / parentA / childAgrandparent / parentB / childA
This means we need to store all paths of traversal. Say childA has a subchildA such that:
graph LR;
grandparent-- shared into -->parentA;
parentA -- shared into --> childA;
grandparent-- shared into -->parentB;
parentB -- shared into --> childA;
childA -- shared into --> subchildA;
We would have to store subchildA like so:
| shared_group | shared_with_group | traversal_ids |
|---|---|---|
| childA | subchildA |
grandparent / parentA / childA / subchildAgrandparent / parentB / childA / subchildA
|
Implications
We are liberated away from the rigid hierarchical parent-child structure to a much more free flowing and flexible interconnected structure. Because the graph still runs top-down we can maintain the valuable "roll up" concept, while at the same time avoid having to impose a rigid structure on organizations.
By unifying the group hierarchy and sharing structure we can reduce engineering complexity. This also enables the product team to rethink how we can more appropriately structure teams of people and groups of resources to better align and adapt to organizational requirements.
Acyclic Caveat
There is one important caveat with a DAG which is that we can not allow cycles to occur. A cycle is something like:
graph TD;
grandparent---parentA;
grandparent---parentB;
parentA---childA;
parentA---childB;
parentB---childC;
parentB---childD;
childA---grandparent;
The cycle being more clearly illustrated from a sharing perspective:
graph LR;
grandparent-- shared into -->parentA;
parentA -- shared into --> childA;
childA -- shared into --> grandparent;
The net result is that grandparent, parentA, and childA all contain the same members. This caveat should be fine in practical terms as it is an invalid representation of an organizational structure. From a technical perspective we can detect these fine and ensure integrity through model validations. They are already detected through the recursive CTE methods as a by product of querying.