Design: Approach for iterative tables classification between clusterwide and pod local databases
Discussion
Kamil: Everything will move to gitlab_main_pod or gitlab_main_cluster. This differs from what is being described in that gitlab_main won't exist in the end. We want to start converting tables one at a time to an explicit _pod (or _cluster) schema, and anything without _pod or _cluster is still "undefined".
If a query uses gitlab_main and gitlab_main_cluster then this query is not a violation yet, because gitlab_main is still undefined. If your query uses gitlab_main_cluster and gitlab_main_pod then this is clearly not allowed. You can assign the users table to gitlab_main_cluster and the namespaces table to gitlab_main_pod, so the tooling should allow either of them to be joined to gitlab_main or to tables in their own schema, but never allow gitlab_main_cluster and gitlab_main_pod together. You can target a single table at a time.
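The rule described above can be sketched in Ruby roughly as follows. This is only an illustration, not the code in GitLab's tooling: the TABLE_SCHEMAS mapping and the schemas_for and allowed_together? helpers are hypothetical names, and the table assignments are just the examples used in the discussion.

```ruby
# Hypothetical table-to-schema assignments, using the examples from the discussion.
# Any table not listed here is treated as still being in "undefined" gitlab_main.
TABLE_SCHEMAS = {
  'users' => 'gitlab_main_cluster',
  'namespaces' => 'gitlab_main_pod'
}.freeze

DEFAULT_SCHEMA = 'gitlab_main'

# Returns the distinct schemas touched by a list of tables.
def schemas_for(tables)
  tables.map { |table| TABLE_SCHEMAS.fetch(table, DEFAULT_SCHEMA) }.uniq
end

# A combination is rejected only when it mixes the cluster and pod schemas.
# Mixing either of them with gitlab_main is still allowed.
def allowed_together?(tables)
  schemas = schemas_for(tables)
  !(schemas.include?('gitlab_main_cluster') && schemas.include?('gitlab_main_pod'))
end

puts allowed_together?(%w[users personal_access_tokens]) # cluster + main
puts allowed_together?(%w[namespaces projects])          # pod + main
puts allowed_together?(%w[users namespaces])             # cluster + pod
```

Only the last combination is rejected, which is why assigning one table at a time should surface a limited set of violations.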
If we start with users and namespaces, this should give a smaller list of violations to fix straight away. For example, we don't need to tackle the interaction between users and personal_access_tokens. If we record all tables being touched in the workflow of group creation, then we could focus only on these tables, and this should shrink the problem space. This should make it easier, but we'll need to try it out to be sure.
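One way to picture the "record the tables touched by a workflow" idea is the Ruby sketch below. The TouchedTables class and violations_in_scope helper are hypothetical, and the recorded tables are made-up sample data rather than a real recording of group creation.

```ruby
require 'set'

# Hypothetical recorder for the tables touched while one workflow runs.
class TouchedTables
  attr_reader :tables

  def initialize
    @tables = Set.new
  end

  # Call once per executed query with the tables that query referenced.
  def record(query_tables)
    @tables.merge(query_tables)
    self
  end
end

# Keeps only the violations whose tables were all seen in the workflow.
def violations_in_scope(violations, workflow_tables)
  violations.select { |tables| tables.all? { |table| workflow_tables.include?(table) } }
end

# Illustrative data only, not a real recording of group creation.
recorder = TouchedTables.new
recorder.record(%w[users namespaces])
recorder.record(%w[namespaces members])

all_violations = [
  %w[users namespaces],
  %w[users personal_access_tokens]
]

p violations_in_scope(all_violations, recorder.tables)
```

With this sample data, the users and personal_access_tokens violation falls outside the recorded workflow and is left for later.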
SELECT * from namespaces inner join projects ....
=> gitlab_main_pod,gitlab_main
=> the only relevant information is that this is gitlab_main_pod, as this is the most specific context you are working in. tables_to_schemas(['namespaces','projects']) => ['gitlab_main_pod'].
This works in most places, except that the cross-modification check builds up an array of schemas over time, and it needs to evict gitlab_main when it sees a more specific gitlab_main_... before detecting violations. That still needs to be added to !108462 (0d566e7e).
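A minimal Ruby sketch of the "most specific schema wins" idea, including the eviction step for schemas accumulated over time, might look like this. It is not the implementation in !108462: TABLE_SCHEMAS, evict_generic, and SchemaAccumulator are hypothetical names, and this tables_to_schemas is a standalone stand-in for the real helper.

```ruby
# Hypothetical assignments; unlisted tables stay in the generic gitlab_main schema.
TABLE_SCHEMAS = {
  'users' => 'gitlab_main_cluster',
  'namespaces' => 'gitlab_main_pod'
}.freeze

GENERIC_SCHEMA = 'gitlab_main'
SPECIFIC_PREFIX = 'gitlab_main_'

# Drops gitlab_main whenever a more specific gitlab_main_* schema is present.
def evict_generic(schemas)
  has_specific = schemas.any? { |schema| schema.start_with?(SPECIFIC_PREFIX) }
  (has_specific ? schemas - [GENERIC_SCHEMA] : schemas).uniq
end

# Stand-in for the helper named in the discussion.
def tables_to_schemas(tables)
  evict_generic(tables.map { |table| TABLE_SCHEMAS.fetch(table, GENERIC_SCHEMA) })
end

# Accumulates schemas across several statements, evicting gitlab_main
# each time before any violation check is made.
class SchemaAccumulator
  attr_reader :schemas

  def initialize
    @schemas = []
  end

  def record(tables)
    @schemas = evict_generic(@schemas + tables_to_schemas(tables))
    self
  end

  # In this sketch, more than one remaining schema counts as a violation.
  def violation?
    @schemas.length > 1
  end
end

p tables_to_schemas(%w[namespaces projects])

acc = SchemaAccumulator.new
acc.record(%w[projects]).record(%w[namespaces])
p acc.schemas
p acc.violation?
```

Recording a users statement in the same accumulator would leave both gitlab_main_pod and gitlab_main_cluster in the array, which is the case the tooling is meant to flag.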
Conclusion
Some conclusions from this discussion (from @DylanGriffith):
- The choice of new schema name, gitlab_main_cluster or gitlab_cluster, is mostly just down to naming, considering how we're implementing this.
- This approach does require adapting our schema violation tooling, but we'd likely need updates no matter what we choose. The approach described as tables_to_schemas(['namespaces','projects']) => ['gitlab_main_pod'] should simplify the implementation, and Kamil will continue with this and get it working in !108462 (closed).
- We likely won't have much to fix in the application_settings decomposition, and it's mostly done in this MR anyway. Things will only start to get more complicated when we mark users: gitlab_main_cluster, namespaces: gitlab_main_pod, as this will reveal a bunch of violations. We plan to cascade this further only to tables that are written during the creation of a new namespace. Along with the tables_to_schemas trick in the second point above, this will hopefully narrow our focus to a smaller set of tables for now.
- application_settings has already raised a few important issues to be resolved by other teams, and we'd like to start engaging these teams early to brainstorm ideas on how to proceed.
- Development on the "user creates namespace" workflow should be concurrent with working with other teams to evaluate the user-facing impacts of the problems we discover along the way, as this helps derisk some major unknowns.
- We don't want to fan out tasks to too many teams too widely initially until we have some clear documented patterns for how specific problems can be solved (a template of sorts).
Recording
From this recorded discussion https://www.youtube.com/watch?v=NLZvSZpb3fQ