Add meta.root_caller_id for Sidekiq jobs
What does this MR do and why?
The idea is that we want to distinguish which jobs ultimately come from a Cronjob.
- Why? Because we want to do a controlled shutdown: we want to quickly drain any non-Cronjob-initiated jobs before we decide to shut down Sidekiq pods.
- Why? Because we want to minimise any non-Cronjob-initiated jobs after we bring up the site.
- Why? Because we want to validate with QA first. If QA fails, then we have to roll back. We want to minimise any "real" data loss if we have to roll back.
Related to gitlab-com/gl-infra/db-migration!181 (merged), gitlab-com/gl-infra/production#7064 (closed)
Note: We don't fill in meta.root_caller_id for web requests, as it's not needed currently.
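The propagation rule can be modelled in a few lines. This is a minimal sketch, not GitLab's actual middleware: it assumes, as the sample log entries in this MR suggest, that a job scheduled by cron gets the root "Cronjob", that a child job inherits its parent's meta.root_caller_id when one is set, and that otherwise the enqueuing worker itself becomes the root. The method name child_root_caller_id and its parameters are hypothetical.

```ruby
# Hypothetical sketch of how meta.root_caller_id could be derived when a job
# is enqueued. Not GitLab's real implementation; the rule is inferred from
# the sample log entries.
CRONJOB_ROOT = "Cronjob"

# parent_meta:  metadata of the currently running job, or nil when enqueuing
#               from a web request (no root is recorded in that case).
# parent_class: class name of the running worker, or nil outside Sidekiq.
# from_cron:    true when the cron scheduler itself is enqueuing the job.
def child_root_caller_id(parent_meta:, parent_class:, from_cron: false)
  return CRONJOB_ROOT if from_cron
  return nil if parent_meta.nil? # web request: root left unset

  # Inherit the parent's root if it has one; otherwise the parent worker
  # becomes the root of this chain of jobs.
  parent_meta["meta.root_caller_id"] || parent_class
end
```

With this rule, a job enqueued by BuildFinishedWorker inside a cron-initiated chain keeps "Cronjob", while a job enqueued by Ci::InitialPipelineProcessWorker (itself enqueued from a web request, so without a root) gets "Ci::InitialPipelineProcessWorker", matching the two sample log lines.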
How to set up and validate locally
There are a few things to observe.
Cronjobs
- Simply observe log/sidekiq.log.
- You will eventually notice some Cronjobs run, which in turn trigger other jobs. You can then see meta.root_caller_id:
  - Grep for the correlation_id. All Sidekiq jobs with that correlation_id should have the same meta.root_caller_id.
{"severity":"INFO","time":"2022-05-19T00:00:31.883Z","retry":3,"queue":"default","backtrace":true,"version":0,"queue_namespace":"pipeline_default","args":["1050"],"class":"Ci::MergeRequests::AddTodoWhenBuildFailsWorker","jid":"05c48439e7f815627473d231","created_at":"2022-05-19T00:00:31.882Z","meta.caller_id":"BuildFinishedWorker","correlation_id":"928841d7708c8f5e19f7cb480c6afe93","meta.feature_category":"continuous_integration","meta.project":"gitlab-org/gitlab-shell","meta.root_namespace":"gitlab-org","meta.client_id":"ip/","meta.root_caller_id":"Cronjob","worker_data_consistency":"always","idempotency_key":"resque:gitlab:duplicate:default:de082e6dfc9235edd9f5fe6811df04c3f9745151c058b4cb06c23f5d4d4829fc","size_limiter":"validated","enqueued_at":"2022-05-19T00:00:31.883Z","job_size_bytes":6,"pid":86190,"message":"Ci::MergeRequests::AddTodoWhenBuildFailsWorker JID-05c48439e7f815627473d231: start","job_status":"start","scheduling_latency_s":0.000434}
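The grep-by-correlation_id check can also be scripted. A minimal sketch, assuming log/sidekiq.log holds one JSON object per line as in the sample entry; the helper names are hypothetical. Entries without meta.root_caller_id are skipped, on the assumption that jobs enqueued directly from a web request carry no root.

```ruby
require "json"
require "set"

# Groups Sidekiq JSON log lines by correlation_id and collects the distinct
# meta.root_caller_id values seen for each. Non-JSON lines and entries
# lacking either field are ignored.
def root_caller_ids_by_correlation(lines)
  groups = Hash.new { |hash, key| hash[key] = Set.new }
  lines.each do |line|
    begin
      entry = JSON.parse(line)
    rescue JSON::ParserError
      next # skip lines that are not JSON
    end
    next unless entry.is_a?(Hash)

    correlation_id = entry["correlation_id"]
    root = entry["meta.root_caller_id"]
    next if correlation_id.nil? || root.nil?

    groups[correlation_id] << root
  end
  groups
end

# Returns the correlation_ids whose jobs disagree on meta.root_caller_id.
def inconsistent_correlations(lines)
  root_caller_ids_by_correlation(lines).select { |_id, roots| roots.size > 1 }.keys
end
```

Calling `inconsistent_correlations(File.foreach("log/sidekiq.log"))` should return an empty array if every correlation_id maps to a single root.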
Normal jobs
- The easiest way is to run a pipeline. It will trigger some Sidekiq jobs.
- In log/sidekiq.log, you can then see meta.root_caller_id:
  - Grep for the correlation_id. All Sidekiq jobs with that correlation_id should have the same meta.root_caller_id.
{"severity":"INFO","time":"2022-05-19T00:09:34.192Z","retry":3,"queue":"default","backtrace":true,"version":0,"queue_namespace":"pipeline_hooks","args":["1068"],"class":"BuildHooksWorker","jid":"152038142968abd249786d1a","created_at":"2022-05-19T00:09:32.225Z","correlation_id":"01G3CTBRV1QGF67GQHY85GZDJG","meta.user":"root","meta.project":"gitlab-org/gitlab-shell","meta.root_namespace":"gitlab-org","meta.client_id":"user/1","meta.caller_id":"Ci::InitialPipelineProcessWorker","meta.remote_ip":"127.0.0.1","meta.feature_category":"continuous_integration","meta.subscription_plan":"default","meta.root_caller_id":"Ci::InitialPipelineProcessWorker","worker_data_consistency":"delayed","wal_locations":{},"idempotency_key":"resque:gitlab:duplicate:default:8a9227dc7d726bc008e308898b4445ae2bcd24d2d2f70f86c429cb9657bab5e8","size_limiter":"validated","enqueued_at":"2022-05-19T00:09:32.225Z","job_size_bytes":6,"pid":86190,"message":"BuildHooksWorker JID-152038142968abd249786d1a: done: 1.965483 sec","job_status":"done","scheduling_latency_s":0.001313,"rugged_calls":1,"rugged_duration_s":0.001777,"redis_calls":1,"redis_duration_s":0.000363,"redis_read_bytes":10,"redis_write_bytes":312,"redis_queues_calls":1,"redis_queues_duration_s":0.000363,"redis_queues_read_bytes":10,"redis_queues_write_bytes":312,"db_count":45,"db_write_count":0,"db_cached_count":6,"db_replica_count":0,"db_primary_count":45,"db_main_count":39,"db_main_replica_count":0,"db_ci_count":6,"db_ci_replica_count":0,"db_replica_cached_count":0,"db_primary_cached_count":6,"db_main_cached_count":6,"db_main_replica_cached_count":0,"db_ci_cached_count":0,"db_ci_replica_cached_count":0,"db_replica_wal_count":0,"db_primary_wal_count":0,"db_main_wal_count":0,"db_main_replica_wal_count":0,"db_ci_wal_count":0,"db_ci_replica_wal_count":0,"db_replica_wal_cached_count":0,"db_primary_wal_cached_count":0,"db_main_wal_cached_count":0,"db_main_replica_wal_cached_count":0,"db_ci_wal_cached_count":0,"db_ci_replica_wal_cached_count":0,"db_replica_duration_s":0.0,"db_primary_duration_s":0.042,"db_main_duration_s":0.038,"db_main_replica_duration_s":0.0,"db_ci_duration_s":0.005,"db_ci_replica_duration_s":0.0,"cpu_s":0.029943,"rate_limiting_gates":[],"duration_s":1.965483,"completed_at":"2022-05-19T00:09:34.192Z","load_balancing_strategy":"primary_no_wal","db_duration_s":0.006113}
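To follow a single pipeline's jobs, a small sketch can list every job for one correlation_id together with its meta.caller_id and meta.root_caller_id. It assumes the one-JSON-object-per-line format and the job_status values ("start", "done") seen in the sample entries; job_chain is a hypothetical helper name.

```ruby
require "json"

# Returns [class, meta.caller_id, meta.root_caller_id] for each job run that
# logged a "start" entry with the given correlation_id, in log order.
# Keeping only "start" entries lists each run once rather than twice.
def job_chain(lines, correlation_id)
  lines.filter_map do |line|
    entry = begin
      JSON.parse(line)
    rescue JSON::ParserError
      nil # non-JSON line
    end
    next unless entry.is_a?(Hash)
    next unless entry["correlation_id"] == correlation_id && entry["job_status"] == "start"

    entry.values_at("class", "meta.caller_id", "meta.root_caller_id")
  end
end
```

For example, `job_chain(File.foreach("log/sidekiq.log"), "01G3CTBRV1QGF67GQHY85GZDJG")` would list the jobs from the pipeline above, and each row is expected to show the same root.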
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Edited by Thong Kuah