Atomic CUD operations with Sidekiq

Description

Today, Sidekiq handles a lot of asynchronous Create/Update/Delete (CUD) operations.

- pipeline_creation:create_pipeline             # Create a pipeline
- pipeline_creation:run_pipeline_schedule       # Create a manual pipeline from a pipeline schedule
- pipeline_default:build_coverage               # Update coverage of ci_builds
- pipeline_default:build_trace_sections         # Create trace sections associated with ci_builds
- pipeline_default:create_trace_artifact        # Create a trace artifact
- cronjob:expire_build_artifacts                # Delete expired artifacts
- object_storage:object_storage_background_move # Migrate a trace artifact from FileStorage to ObjectStorage
- pipeline_default:pipeline_metrics             # Update metrics
- pipeline_default:update_head_pipeline_for_merge_request # Update head pipeline id for MR
- pipeline_processing:build_success             # Create deployments
- pipeline_processing:pipeline_process          # Process a pipeline
- pipeline_processing:pipeline_update           # Update pipeline final status 
- pipeline_processing:stage_update              # Update pipeline stage status
- cronjob:pipeline_schedule                     # Create a pipeline from pipeline schedule periodically
- cronjob:stuck_ci_jobs                         # Drop jobs which took a long time
- object_storage:archive_legacy_trace           # Archive live traces

The problem is that those workers are not atomic per subject (e.g. job, job artifact, pipeline, environment).

For example, ObjectStorage::BackgroundMoveWorker and ExpireBuildArtifactsWorker can run concurrently on the same job artifact. What happens if an artifact is being moved to another storage while it is concurrently erased? In such cases, we have to check the implementation and sort out possible race conditions, which is a very time-consuming task.

Also, we had a bug where the migrate! method incurred data loss when it was called concurrently. We fixed it by wrapping the method in an ExclusiveLease, but that doesn't make the subject fully atomic with respect to other CUD operations.
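
To make that concrete, here is a rough sketch of how guarding migrate! with an exclusive lease could look. It assumes the Gitlab::ExclusiveLease try_obtain/cancel API; the lease key and timeout are illustrative, not the actual implementation.

```ruby
# Hedged sketch: serialize migrate! per artifact with an exclusive lease so
# that two concurrent callers cannot interleave the file move.
def migrate!(new_store)
  lease_key = "ci_job_artifact:migrate:#{id}" # illustrative key name
  lease = Gitlab::ExclusiveLease.new(lease_key, timeout: 1.hour.to_i)

  uuid = lease.try_obtain
  raise 'exclusive lease already taken' unless uuid

  begin
    # ... move the file to new_store and update the record ...
  ensure
    Gitlab::ExclusiveLease.cancel(lease_key, uuid)
  end
end
```

Note that this only protects migrate! against itself; it does not make the artifact atomic against the other CUD operations listed above, which is exactly the gap this issue describes.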

Proposal

I wonder if we can enqueue Sidekiq jobs per subject. For example,

  1. SidekiqWorker-A is working on job_id: 123.
  2. SidekiqWorker-B is enqueued for job_id: 123; it will not be executed immediately because work is already queued for that subject.
  3. SidekiqWorker-B is enqueued for job_id: 76; it will be executed immediately because no work is queued for that subject.

This is a concept similar to a "Channel" in a message broker (e.g. RabbitMQ); see the sketch below.
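
A minimal sketch of what per-subject serialization could look like on top of plain Sidekiq, using a Redis SET NX lock keyed by the subject. The class name, lock key, TTL, and retry interval are hypothetical and only illustrate the idea:

```ruby
# Hedged sketch, not existing GitLab code: a worker that takes a per-subject
# lock before doing its CUD work. If another worker already holds the lock
# for the same job_id, the job is rescheduled instead of running concurrently.
class SerializedPerSubjectWorker
  include Sidekiq::Worker

  LOCK_TTL = 10 * 60 # seconds; illustrative

  def perform(job_id)
    lock_key = "subject-lock:ci_build:#{job_id}"

    Sidekiq.redis do |redis|
      # SET NX EX acts as a simple mutex per subject.
      acquired = redis.set(lock_key, jid, nx: true, ex: LOCK_TTL)

      unless acquired
        # Another operation is in flight for this subject; retry later.
        self.class.perform_in(10, job_id)
        return
      end

      begin
        process(job_id)
      ensure
        redis.del(lock_key)
      end
    end
  end

  private

  def process(job_id)
    # The actual Create/Update/Delete work for the subject goes here.
  end
end
```

Rescheduling rather than blocking keeps Sidekiq threads free while still behaving like a per-subject channel: only one job per subject is in flight at a time, the rest wait in the queue.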

Concerns

  • This doesn't take into account direct invocation (e.g. obj.migrate! in a Unicorn thread), so we may still need a locking mechanism alongside the application logic. The downside is that such a lock can only skip the operation instead of waiting for it, to avoid holding a Unicorn process, so routing the work through Sidekiq still has a lot of advantages (see the sketch below).
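
For illustration, a hedged sketch of that skip-instead-of-wait behaviour from a Unicorn request. EraseArtifactWorker and the lease key are hypothetical names, not existing code:

```ruby
# Hedged sketch: in a Unicorn request we cannot block on the lease, so if it
# is already taken we hand the work off to Sidekiq instead of waiting.
def erase_artifact!(artifact)
  lease_key = "ci_job_artifact:#{artifact.id}" # illustrative key name
  lease = Gitlab::ExclusiveLease.new(lease_key, timeout: 10.minutes.to_i)

  if (uuid = lease.try_obtain)
    begin
      artifact.destroy!
    ensure
      Gitlab::ExclusiveLease.cancel(lease_key, uuid)
    end
  else
    # Skip rather than wait, to avoid holding the Unicorn process;
    # EraseArtifactWorker is a hypothetical asynchronous fallback.
    EraseArtifactWorker.perform_in(1.minute, artifact.id)
  end
end
```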

Links / references

/cc @ayufan @grzesiek
