If you are unsure about the correct group, please do not leave the issue without a group label, and refer to
GitLab's shared responsibility functionality guidelines
for more information on how to triage this kind of issue.
Do we plan to use organization_id as a sharding key for this table? It doesn't seem like the best fit, as we have a mix of rows connected to either group or project, and even global. In general we should avoid using organization_id as a sharding key, unless there is no other way - https://docs.gitlab.com/ee/development/cells/#using-organization_id-as-sharding-key. Ideally we should split the table per each type of push rule, though I do not currently see a column to distinguish between the different types.
Global push rule (only one per instance, has organization_id=1)
Is this global for the instance (i.e. clusteriwide data), or global for the organization? Either way we can't rely on organization_id=1, as the default organization is something temporary and will be removed at some point.
/cc @tkuah@alexpooley as they have some knowledge about organization isolation.
Do we plan to use organization_id as a sharding key for this table?
That was the plan. Well, then I should reconsider that.
push_rules table is quite a mess that includes 3 different types of records:
For project push rules, we can use project_id as a sharding key.
For group push rules is trickier, because there is no group_id field on the table. Instead it's a reverse relation from namespaces.push_rule_id that points to push_rules.
Global push rule is instance wide. It was connected to the default organization. It doesn't have a connection to group or projects.
I really appreciate any suggestions and ideas. Currently, I don't see a viable option apart from redesigning the whole table (it will take a while) or using organization_id as a sharding key.
What is a global push rule ? Does it really have to be instance wide, or can it be made per-organization ?
I think it can be moved to organization level from the instance level. Do you know if we have an example of a similar migration of instance level setting into the organization level with a backward compatibility?
I think it will work, but I'm concerned about the scope of this change.
It will be a large migration that requires many steps and milestones to be implemented. I'd say it will be similar in amount of work with an attempt to split the push_rules table into project_push_rules, group_push_rules and organization_push_rules.
I wonder if we have an option to introduce a Cells support with the current table layout or with less work required? I believe there is an expectation to have it done by November.
I wonder if we have an option to introduce a Cells support with the current table layout or with less work required? I believe there is an expectation to have it done by November.
The November deadline is only for sharding keys on organization_id. For the rest (project_id/namespace_id) we have more time.
In this case, what do you think about extracting group_push_rules and organization_push_rules from push_rules idea? If we have more time, then I see a reason to provide a clear separation between different push rules types rather than try to store them in the same table as we do now.
Proposal
Extract group push rules
Create table group_push_rules with a similar structure as push_rules but with a group_id instead of project_id
Update Rails code to save new group push rules in both push_rules and group_push_rules tables.
Migrate old group push rules from push_rules into group_push_rules
Verify that both tables are in sync and contain same records
Start reading from group_push_rules to fetch push rules data
Delete group rules from push_rules
Extract organizations push rules
Create table organization_push_rules with a similar structure as push_rules but with a organization_id instead of project_id
Update Rails code to save modification of organization push rules in both push_rules and organization_push_rules tables.
Copy a single global push rule record into organization_push_rules
Delete global push rule from push_rules
Clean-up
Remove organization_id from push_rules
Enforce project_id presence in push_rules
Result
We will have three tables:
push_rules - for project push rules (sharding_key: :project_id)
group_push_rules - for group push rules (sharding_key: :group_id)
organization_push_rules - for organization push rules (sharding_key: :organization_id)
@vyaklushin Splitting tables that hold data with mixed scope (e.g. instance/group/projects) is good idea in general. As long we're going to have this on time (which Cells phase 7, not sure for the exact date), we can do this. @tkuah can you provide some more details on the timeframe?
Update Rails code to save new group push rules in both push_rules and group_push_rules tables.
This may also be done with database triggers.
Start reading from group_push_rules to fetch push rules data
What about writes? At the time of switch we'll have some nodes using the push_rules table and some the new tables? Ideally we should keep it reversible.
It will be a large migration that requires many steps and milestones to be implemented. I'd say it will be similar in amount of work with an attempt to split the push_rules table into project_push_rules, group_push_rules and organization_push_rules.
What about writes? At the time of switch we'll have some nodes using the push_rules table and some the new tables? Ideally we should keep it reversible.
Writes should be already in sync after step 2 (Update Rails code to save new group push rules in both 'push_rules' and 'group_push_rules' tables.). Basically, for each update of group push rules we will update both push_rules and group_push_rules in the same transaction.
After it's done, we switch reading operations from push_rules to group_push_rules and treat group_push_rules as a SSOT for group push rules. Rails can stop updating group push rules in push_rules and we can remove group related records from push_rules.
Probably yes, but let's consider our end goal. I think our preference is to have separate tables for each type of push rules, right?
So, rather than migrating twice (from current state to #476212 (comment 2128054383) and eventually to #476212 (comment 2135610514)), we can move directly to the final state skipping intermediate stops. It might take more time but it still will be faster than two migrations combined.
@vyaklushin OK. BTW for uploads I am planning to split the table at the database level, but keep the app unchanged, while also leave the door open to start using each new table directly.
@vyaklushin Let's either keep it open until all requirements are met, or update the list with an issue that will achieve this. Will make tracking this effort easier.