
Research possible race condition when toggling FF - geo_project_repository_replication_v2


Research possible race conditions based on the conversation below:

The following discussion from !209787 (merged) should be addressed:

Aakriti Gupta @aakriti.gupta 2 days ago Author Maintainer

The replicator stays the same for both versions of replication whether based on projects or ProjectRepositories.

But, we want to process a project or project_repository update event only when the corresponding replication version is turned on.

If a model_record of ProjectRepository type is updated, we should process it only when v2 is on. Same for Project updates.

Without this logic, 2 events will be created for the same update.
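The gating rule described in this comment could be sketched roughly as follows. This is only an illustration: the `Struct` stand-ins, the `publish_event?` helper, and the `v2_enabled:` argument are hypothetical and do not reflect the actual replicator code in !209787.

```ruby
# Hypothetical stand-ins for the real models; only the class names
# come from the discussion above.
Project = Struct.new(:id)
ProjectRepository = Struct.new(:id, :project_id)

# Returns true when an update to `record` should produce a Geo event,
# given the current state of the v2 feature flag (passed in explicitly
# here; the real code would read the flag itself).
def publish_event?(record, v2_enabled:)
  case record
  when ProjectRepository then v2_enabled    # v2 events only when the flag is on
  when Project           then !v2_enabled   # v1 events only when the flag is off
  else false
  end
end
```

With this rule, exactly one of the two record types yields an event for any given flag state, which is what prevents the duplicate events mentioned above.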

Michael Kozono @mkozono 1 day ago Maintainer

Ok makes sense not to create 2 events for the same update 👍

Related to this, is there a small race condition? 🤔 When enabling the feature flag (or when upgrading to a version where the FF is defaulted on or removed), a ProjectRepository event may be created, and the secondary site processes it while the FF is still off due to caching. One solution might be to make the secondary site able to consume and act on either kind of event (Project or ProjectRepository) regardless of the FF state.

Aakriti Gupta @aakriti.gupta 22 hours ago Author Maintainer

Not sure I completely understand - how do you imagine both events being consumed? 🤔

Btw, there will not be a race condition between the FF being enabled and the ProjectRepositoryReplicator choosing ProjectRepository as the model_class, because the model_class method will read the value of the FF as well.

Michael Kozono @mkozono 11 hours ago Maintainer

Sorry, I’m not 100% certain there is one. But I wouldn’t be surprised if we found one. For example, if we automatically created a project repo every second and then flipped the flag on and off. There is an issue where the secondary site sometimes takes over a minute to propagate a feature flag change. We do have a special Geo event for invalidating the cache when a feature flag has changed, but since all events are processed by Sidekiq jobs, there’s an opportunity for a race there too.
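To make the suspected window concrete, here is a toy model of a secondary site reading a cached flag value. Everything in it is an assumption for illustration: the `CachedFlag` class, the 60-second TTL, and the timestamps are invented and are not how GitLab caches feature flags.

```ruby
# Hypothetical TTL cache for a feature flag value on the secondary site.
class CachedFlag
  def initialize(ttl:, source:)
    @ttl = ttl
    @source = source      # callable returning the authoritative flag value
    @value = nil
    @fetched_at = nil
  end

  # `now` is a plain number of seconds so the example stays deterministic.
  def enabled?(now)
    if @fetched_at.nil? || now - @fetched_at >= @ttl
      @value = @source.call
      @fetched_at = now
    end
    @value
  end
end

# Walks through one possible interleaving of flag toggle and event processing.
def simulate_stale_read(ttl: 60)
  primary_flag = false
  secondary = CachedFlag.new(ttl: ttl, source: -> { primary_flag })

  secondary.enabled?(0)                       # secondary caches "off" at t=0
  primary_flag = true                         # flag is enabled on the primary
  at_event = secondary.enabled?(5)            # a ProjectRepository event is processed at t=5
  after_expiry = secondary.enabled?(ttl + 1)  # the cache has expired by now

  { at_event: at_event, after_expiry: after_expiry }
end
```

In this model the event processed at t=5 still sees the flag as off, even though the primary created it with the flag on. Whether the real system can reach this interleaving is exactly what this issue is meant to research.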

I’m thinking that on the consume side you can get a project from a project repository ID and vice versa, so v1 and v2 can call or enqueue the other, if the FF is the opposite of what they expect. WDYT? Or maybe just open an issue to check on this before releasing?
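One way to read this suggestion is as a normalization step on the consume side: if an event arrives for the model type the local flag state does not expect, translate its ID and hand it to the other path. The sketch below assumes a one-to-one mapping between projects and project repositories; `IdLookup`, `Event`, and `normalize_event` are hypothetical names, not existing Geo classes.

```ruby
# Hypothetical in-memory mapping; the real lookup would query the database.
class IdLookup
  # pairs: { project_id => project_repository_id }
  def initialize(pairs)
    @repo_by_project = pairs
    @project_by_repo = pairs.invert
  end

  def project_repository_id_for(project_id)
    @repo_by_project.fetch(project_id)
  end

  def project_id_for(project_repository_id)
    @project_by_repo.fetch(project_repository_id)
  end
end

Event = Struct.new(:model_type, :model_id)

# Returns an event of the type the local flag state expects, translating
# the ID when the incoming event was created under the opposite flag state.
def normalize_event(event, v2_enabled:, lookup:)
  expected = v2_enabled ? :project_repository : :project
  return event if event.model_type == expected

  translated_id =
    if expected == :project
      lookup.project_id_for(event.model_id)
    else
      lookup.project_repository_id_for(event.model_id)
    end

  Event.new(expected, translated_id)
end
```

For example, with `lookup = IdLookup.new({ 1 => 10 })`, a `:project_repository` event for ID 10 arriving while the secondary still sees the flag as off would be handled as a `:project` event for ID 1, so the update is not dropped regardless of which side of the toggle each site is on.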

