Simplify and make selective sync queries faster

Problem to solve

With Geo and selective sync, we make expensive joins everywhere we need to list, count, or display the projects that fall under the selective sync constraints. We also do not remove the related project_registry entries when a project is moved to a different namespace/shard.

Further details

If we restrict the expensive queries to the cursor only, and reuse the existing database as the single source of truth, we can skip the joins everywhere else and rely on project_registry alone.

Proposal

Add a flag to project_registry that marks a repository as disabled, so we know it needs to be removed from disk. We can then restrict counts and displayed data by filtering on that flag, and defer the actual removal to a later step.

The idea is that whenever the selective sync rules change, the existing code generates an event (RepositoriesChangedEvent) to re-evaluate existing projects.

Early in its processing, that event will trigger a mass UPDATE on project_registry, setting disabled=true for every project that does not match the new constraints. Doing this early gives almost 'instant' feedback in our management data (API, Admin UI, etc.). The event then continues with what it currently does (cleaning up repositories) and removes the project_registry record afterwards.
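The two phases described above can be sketched roughly as follows. This is only an illustration using Python and an in-memory SQLite database, not the actual Geo implementation: the simplified `projects` table, its `namespace_id` column, and the function names `apply_selective_sync_change` and `cleanup_disabled` are all hypothetical, and selective sync is reduced to a plain list of allowed namespace ids.

```python
import sqlite3


def apply_selective_sync_change(conn, allowed_namespace_ids):
    """Phase 1: flag registry rows whose project falls outside the new rules."""
    placeholders = ",".join("?" for _ in allowed_namespace_ids)
    # With no allowed namespaces, every project is outside the constraints.
    condition = (
        f"namespace_id NOT IN ({placeholders})" if allowed_namespace_ids else "1 = 1"
    )
    conn.execute(
        f"""
        UPDATE project_registry
        SET disabled = 1
        WHERE project_id IN (SELECT id FROM projects WHERE {condition})
        """,
        list(allowed_namespace_ids),
    )
    conn.commit()


def cleanup_disabled(conn, remove_repository):
    """Phase 2: remove each flagged repository, then drop its registry row."""
    rows = conn.execute(
        "SELECT project_id FROM project_registry WHERE disabled = 1 ORDER BY project_id"
    ).fetchall()
    for (project_id,) in rows:
        remove_repository(project_id)  # hypothetical hook for the on-disk cleanup
        conn.execute(
            "DELETE FROM project_registry WHERE project_id = ?", (project_id,)
        )
    conn.commit()


# Hypothetical, simplified schema for the demo.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE projects (id INTEGER PRIMARY KEY, namespace_id INTEGER NOT NULL);
    CREATE TABLE project_registry (
        project_id INTEGER PRIMARY KEY,
        disabled INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO projects VALUES (1, 10), (2, 20), (3, 20);
    INSERT INTO project_registry (project_id) VALUES (1), (2), (3);
    """
)

# New rules: only namespace 10 is synced.
apply_selective_sync_change(conn, [10])

# Management data can filter on the flag right away, before any cleanup runs.
visible = conn.execute(
    "SELECT COUNT(*) FROM project_registry WHERE disabled = 0"
).fetchone()[0]

removed = []
cleanup_disabled(conn, removed.append)
remaining = conn.execute("SELECT COUNT(*) FROM project_registry").fetchone()[0]
```

In this sketch the counts are already correct after the first phase, and the slower second phase only affects disk usage and the size of the registry table.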

What does success look like, and how can we measure that?

We would be able to simplify the queries everywhere that deals with selective sync: instead of repeating the same joins, they would rely on the existing project_registry.
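To make the intended simplification concrete, here is a rough before/after comparison of a count query, again as a Python and SQLite sketch rather than real Geo code. The `projects` table, the `namespace_id` column, and both function names are hypothetical; the registry rows are assumed to have already been flagged for rules that allow only namespace 10.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE projects (id INTEGER PRIMARY KEY, namespace_id INTEGER NOT NULL);
    CREATE TABLE project_registry (
        project_id INTEGER PRIMARY KEY,
        disabled INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO projects VALUES (1, 10), (2, 20), (3, 10);
    -- Project 2 is outside the rules and has already been flagged.
    INSERT INTO project_registry VALUES (1, 0), (2, 1), (3, 0);
    """
)


def count_with_join(conn, allowed_namespace_ids):
    """Before: every count re-applies the selective sync rules through a join."""
    placeholders = ",".join("?" for _ in allowed_namespace_ids)
    sql = f"""
        SELECT COUNT(*)
        FROM project_registry r
        JOIN projects p ON p.id = r.project_id
        WHERE p.namespace_id IN ({placeholders})
    """
    return conn.execute(sql, list(allowed_namespace_ids)).fetchone()[0]


def count_from_registry(conn):
    """After: the flag already encodes the rules, so no join is needed."""
    sql = "SELECT COUNT(*) FROM project_registry WHERE disabled = 0"
    return conn.execute(sql).fetchone()[0]


joined_count = count_with_join(db, [10])
flag_count = count_from_registry(db)
```

One way to measure success under these assumptions would be to check that both forms keep returning the same numbers while the flag-based form has the cheaper query plan.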

Links / references

cc @dbalexandre

Edited by Valery Sizov