Avoid conflicts between ArchiveTracesCronWorker and ArchiveTraceWorker
What does this MR do?
ArchiveTraceWorker runs after a pipeline job finishes, as part of the generic job lifecycle. It archives the live trace and stores the trace data in permanent storage. This process can fail when an external service is not operational, for example during an S3 incident. In that case the live trace stays intact, and ArchiveTraceWorker does not attempt the archive again.
ArchiveTracesCronWorker runs periodically as a cron worker to archive live traces. Its main purpose is to rescue stale live traces that ArchiveTraceWorker could not archive. It runs once per hour so that no unarchived traces are left behind.
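The interplay of the two workers can be illustrated with a toy model in plain Ruby. This is only a sketch: ToyTrace, ToyStorage, StorageUnavailable, and archive! are hypothetical names made up for illustration, not GitLab classes or methods. It shows a first archive attempt failing while storage is down, the live trace staying intact, and a later periodic pass archiving it.

```ruby
# Toy model only; every class and method name here is hypothetical.
class StorageUnavailable < StandardError; end

class ToyStorage
  attr_accessor :available

  def initialize
    @available = true
  end

  # Pretend to persist data; fail while the external service is down.
  def write(data)
    raise StorageUnavailable, "permanent storage is down" unless @available

    data.dup
  end
end

class ToyTrace
  attr_reader :live_data, :archived_data

  def initialize(live_data)
    @live_data = live_data
    @archived_data = nil
  end

  def live?
    !@live_data.nil?
  end

  # Write to permanent storage first; drop the live copy only on success.
  def archive!(storage)
    @archived_data = storage.write(@live_data)
    @live_data = nil
  end
end

storage = ToyStorage.new
trace = ToyTrace.new("job log output")

# First attempt (standing in for the per-job worker) fails during an outage.
storage.available = false
begin
  trace.archive!(storage)
rescue StorageUnavailable
  # No retry happens here; the live trace is left untouched.
end
live_after_failure = trace.live?

# A later periodic pass (standing in for the cron worker) rescues the trace.
storage.available = true
trace.archive!(storage) if trace.live?
live_after_rescue = trace.live?

puts live_after_failure
puts live_after_rescue
```

Writing to permanent storage before clearing the live copy is what keeps the trace recoverable when the first attempt fails.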
The problem is that ArchiveTraceWorker and ArchiveTracesCronWorker can run simultaneously and cause a race condition. We suspect that this race condition may cause data loss in production. This MR avoids the case by explicitly targeting only stale live traces in ArchiveTracesCronWorker:
Ci::Build.with_stale_live_trace.find_each(batch_size: 100)
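The effect of selecting only stale live traces can be sketched in plain Ruby, without Rails. This is a model under assumptions, not the actual scope: the Build struct, the stale_live_traces helper, and the 12-hour cutoff are all hypothetical. The only detail taken from this MR is that the "After" query plan filters on finished_at being earlier than a cutoff timestamp.

```ruby
# Plain-Ruby model of the selection; not GitLab's actual scope.
Build = Struct.new(:id, :finished_at, :live_trace, keyword_init: true)

# Hypothetical cutoff; the MR text does not state the real value.
STALE_AFTER_SECONDS = 12 * 60 * 60

# Keep only builds that still have a live trace and finished before the cutoff.
def stale_live_traces(builds, now:, stale_after: STALE_AFTER_SECONDS)
  cutoff = now - stale_after
  builds.select do |build|
    build.live_trace && build.finished_at && build.finished_at < cutoff
  end
end

now = Time.utc(2019, 8, 2, 3, 46, 8)
day = 24 * 60 * 60

builds = [
  # Just finished: the per-job worker may still be archiving it, so skip it.
  Build.new(id: 1, finished_at: now - 60, live_trace: true),
  # Finished long ago and still live: a stale trace that needs rescuing.
  Build.new(id: 2, finished_at: now - day, live_trace: true),
  # Finished long ago and already archived: nothing to do.
  Build.new(id: 3, finished_at: now - day, live_trace: false),
  # Not finished yet: never a candidate.
  Build.new(id: 4, finished_at: nil, live_trace: true)
]

stale_ids = stale_live_traces(builds, now: now).map(&:id)
puts stale_ids.inspect
```

Because recently finished builds fall outside the cutoff, the periodic pass in this model never touches a trace that the per-job worker could still be processing.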
Query plan before this change
Limit  (cost=6718.91..6719.16 rows=100 width=1388) (actual time=10.097..10.123 rows=100 loops=1)
  Buffers: shared hit=3671
  ->  Sort  (cost=6718.91..6720.77 rows=742 width=1388) (actual time=10.096..10.113 rows=100 loops=1)
        Sort Key: ci_builds.id
        Sort Method: top-N heapsort  Memory: 172kB
        Buffers: shared hit=3671
        ->  Nested Loop  (cost=2243.47..6690.56 rows=742 width=1388) (actual time=3.255..9.283 rows=615 loops=1)
              Buffers: shared hit=3668
              ->  HashAggregate  (cost=2242.90..2252.54 rows=964 width=4) (actual time=3.229..3.327 rows=622 loops=1)
                    Group Key: ci_build_trace_chunks.build_id
                    Buffers: shared hit=555
                    ->  Seq Scan on public.ci_build_trace_chunks  (cost=0.00..2238.32 rows=1832 width=4) (actual time=0.007..2.786 rows=1179 loops=1)
                          Buffers: shared hit=555
              ->  Index Scan using ci_builds_pkey on public.ci_builds  (cost=0.57..4.59 rows=1 width=1388) (actual time=0.009..0.009 rows=1 loops=622)
                    Index Cond: (ci_builds.id = ci_build_trace_chunks.build_id)
                    Filter: (((ci_builds.type)::text = 'Ci::Build'::text) AND ((ci_builds.status)::text = ANY ('{success,failed,canceled}'::text[])))
                    Rows Removed by Filter: 0
                    Buffers: shared hit=3113
Planning time: 7.027 ms
Execution time: 10.188 ms
Total Cost: 6720.77
Buffers Hit: 3671
Buffers Written: 0
Buffers Read: 0
After
Limit  (cost=6717.63..6717.88 rows=100 width=1388) (actual time=9.521..9.552 rows=100 loops=1)
  Buffers: shared hit=3671
  ->  Sort  (cost=6717.63..6719.48 rows=740 width=1388) (actual time=9.519..9.536 rows=100 loops=1)
        Sort Key: ci_builds.id
        Sort Method: top-N heapsort  Memory: 172kB
        Buffers: shared hit=3671
        ->  Nested Loop  (cost=2243.47..6689.35 rows=740 width=1388) (actual time=3.050..8.745 rows=615 loops=1)
              Buffers: shared hit=3668
              ->  HashAggregate  (cost=2242.90..2252.54 rows=964 width=4) (actual time=3.020..3.141 rows=622 loops=1)
                    Group Key: ci_build_trace_chunks.build_id
                    Buffers: shared hit=555
                    ->  Seq Scan on public.ci_build_trace_chunks  (cost=0.00..2238.32 rows=1832 width=4) (actual time=0.008..2.595 rows=1179 loops=1)
                          Buffers: shared hit=555
              ->  Index Scan using ci_builds_pkey on public.ci_builds  (cost=0.57..4.59 rows=1 width=1388) (actual time=0.008..0.009 rows=1 loops=622)
                    Index Cond: (ci_builds.id = ci_build_trace_chunks.build_id)
                    Filter: ((ci_builds.finished_at < '2019-08-01 15:46:08.23754'::timestamp without time zone) AND ((ci_builds.type)::text = 'Ci::Build'::text))
                    Rows Removed by Filter: 0
                    Buffers: shared hit=3113
Planning time: 7.061 ms
Execution time: 9.619 ms
Total Cost: 6719.48
Buffers Hit: 3671
Buffers Written: 0
Buffers Read: 0
Does this MR meet the acceptance criteria?
Conformity
- Changelog entry for user-facing changes, or community contribution. Check the link for other scenarios.
- [-] Documentation created/updated or follow-up review issue created
- Code review guidelines
- Merge request performance guidelines
- Style guides
- Database guides
- Separation of EE specific content
Performance and testing
- Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- [-] Tested in all supported browsers
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- [-] Label as security and @ mention @gitlab-com/gl-security/appsec
- [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- [-] Security reports checked/validated by a reviewer from the AppSec team