Simplify and make selective sync queries faster
Problem to solve
With Geo and selective sync, we make expensive joins everywhere we need to list, count, or display the projects that fall under the selective sync constraints. We also do not remove the related project_registry entries when a project is moved to a different namespace/shard.
Further details
If we restrict the expensive queries to the cursor only, and reuse the existing database as the single source of truth, we can skip the joins everywhere else and rely on project_registry alone.
Proposal
Have a flag in project_registry to mark a repository as disabled, so we know we need to remove it from disk. We can restrict counts and the display of data by filtering on that flag, while deferring the actual removal to a later step.
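To illustrate how filtering on the flag could replace the joins, here is a minimal sketch using Python and an in-memory SQLite database. The table layout is an assumption made for illustration: only the project_registry name and the disabled flag come from this proposal, and the project_id column and the helper names are hypothetical.

```python
import sqlite3


def create_registry(conn):
    # Hypothetical, simplified schema: the real project_registry has more columns.
    conn.execute(
        """
        CREATE TABLE project_registry (
            project_id INTEGER PRIMARY KEY,
            disabled   INTEGER NOT NULL DEFAULT 0  -- 1 = pending removal from disk
        )
        """
    )


def count_enabled(conn):
    # No join against selective sync rules: the flag alone decides what is counted.
    row = conn.execute(
        "SELECT COUNT(*) FROM project_registry WHERE disabled = 0"
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
create_registry(conn)
conn.executemany(
    "INSERT INTO project_registry (project_id, disabled) VALUES (?, ?)",
    [(1, 0), (2, 1), (3, 0)],
)
enabled_count = count_enabled(conn)
```

With this shape, any listing or counting query only needs the WHERE disabled = 0 condition, whatever the current selective sync rules are.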
The idea is that whenever the selective sync rules change, the existing code generates an event (RepositoriesChangedEvent) to re-evaluate existing projects.
That event will first trigger a mass UPDATE in project_registry, setting disabled=true on any project that does not match the new constraints. Doing that early gives almost 'instant' feedback in our management data (API, Admin UI, etc.). The event then continues with what it currently does (cleaning up repositories), and removes the project_registry record afterwards.
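The flow described above could look roughly like the sketch below, again in Python with SQLite. It is not the actual event handler: handle_repositories_changed, allowed_project_ids, and cleanup_repository are hypothetical names, and the new constraints are simplified to a set of project IDs that still match the selective sync rules.

```python
import sqlite3


def handle_repositories_changed(conn, allowed_project_ids, cleanup_repository):
    """Flag non-matching projects first, then clean them up one by one."""
    allowed = sorted(allowed_project_ids)

    # Step 1: mass UPDATE, done early so counts and listings are correct at once.
    with conn:
        if allowed:
            placeholders = ",".join("?" for _ in allowed)
            conn.execute(
                "UPDATE project_registry SET disabled = 1 "
                f"WHERE project_id NOT IN ({placeholders})",
                allowed,
            )
        else:
            conn.execute("UPDATE project_registry SET disabled = 1")

    # Step 2: slow part. Remove the repository, then its registry record.
    disabled_ids = [
        row[0]
        for row in conn.execute(
            "SELECT project_id FROM project_registry "
            "WHERE disabled = 1 ORDER BY project_id"
        )
    ]
    for project_id in disabled_ids:
        cleanup_repository(project_id)  # assumed to remove the data from disk
        with conn:
            conn.execute(
                "DELETE FROM project_registry WHERE project_id = ?",
                (project_id,),
            )
    return disabled_ids


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE project_registry ("
    "project_id INTEGER PRIMARY KEY, disabled INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO project_registry (project_id) VALUES (?)", [(1,), (2,), (3,)]
)

removed = []  # stands in for the on-disk cleanup
result = handle_repositories_changed(conn, {1, 3}, removed.append)
remaining = [
    row[0]
    for row in conn.execute(
        "SELECT project_id FROM project_registry ORDER BY project_id"
    )
]
```

Because the flag is set before any cleanup starts, a slow or failed cleanup only delays the removal from disk. The management data already excludes the disabled projects in the meantime.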
What does success look like, and how can we measure that?
We would be able to simplify every query that deals with selective sync by relying on the existing project_registry instead of repeating the same joins.
Links / references
cc @dbalexandre