Rollout - Cloud Native Build Logs - GitLab.com
Description
This issue describes the incremental rollout strategy we will use to enable Cloud Native Build Logs for everyone on gitlab.com.
Checklist
| Item | State | DRI |
|---|---|---|
| | | @grzesiek |
| | | @grzesiek |
| –› Fix build trace rate metric | | @grzesiek |
| | | @grzesiek |
| | | @grzesiek |
| –› Parse CRC32 checksum provided in hexadecimal | | @grzesiek |
| | | @grzesiek |
| –› Ensure that runner exponential backoff is an integer | | @grzesiek |
| | | @grzesiek |
| –› gitlab-org/gitlab rollout production change | | @grzesiek, @ahmadsherif |
| –› Extend exception about chunk data not fulfilled in a bucket | | @grzesiek |
| –› Make build trace correctness validation sticky | | @grzesiek |
| –› Retry a build trace chunk migration in case of an exception | | @grzesiek |
| –› Log detected invalid build trace chunks | | @grzesiek |
| –› Use optimistic locking to safely migrate a build trace chunk | | @grzesiek |
| | | @grzesiek |
| –› gitlab-com/www-gitlab-com rollout production change | | @grzesiek, @igorwwwwwwwwwwwwwwwwwwww |
| | | @grzesiek |
| –› | | @grzesiek, @igorwwwwwwwwwwwwwwwwwwww |
| | | @grzesiek |
| –› | | @grzesiek, @igorwwwwwwwwwwwwwwwwwwww |
| –› Resolve live trace read race condition using a retry | | @grzesiek |
| –› Reduce the noise generated by locked chunk migration | | @grzesiek |
| –› Delay archive trace operation to fix race condition | | @grzesiek |
| –› Add build trace chunks migration duration histogram metric | | @grzesiek |
| | | @grzesiek |
| –› | | @grzesiek, @hphilipps |
| –› Improve trace finalize histogram buckets | | @grzesiek |
| –› Fix NoMethodError when chunks are being removed | | @grzesiek |
| | | @grzesiek |
| –› | | @grzesiek, @hphilipps |
| –› Add Grape content logger to log content length and range | | @grzesiek |
| | | @grzesiek |
| –› | | @grzesiek, @hphilipps |
| –› Deduplicate build trace chunks flush worker | | @nmilojevic1 |
| | | @grzesiek |
| –› | | @grzesiek |
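One of the checklist items concerns parsing a CRC32 checksum that is provided in hexadecimal. As a rough illustration only (this is not the GitLab implementation), the Python sketch below shows how such a checksum could be parsed and compared with a locally computed CRC32. The function names, and the assumption that the checksum arrives as a plain hex string with an optional `0x` prefix, are hypothetical.

```python
import zlib


def parse_hex_crc32(value: str) -> int:
    """Parse a CRC32 given as hexadecimal text (hypothetical helper)."""
    # int(..., 16) accepts an optional "0x" prefix and either letter case.
    return int(value.strip(), 16)


def trace_matches_checksum(trace: bytes, hex_checksum: str) -> bool:
    """Return True when the CRC32 of the trace equals the supplied checksum."""
    # zlib.crc32 returns an unsigned 32-bit integer in Python 3.
    return zlib.crc32(trace) == parse_hex_crc32(hex_checksum)
```

Comparing integers rather than strings avoids false mismatches caused by letter case, a prefix, or leading zeros in the hexadecimal text.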
Metrics
- Sentry errors containing "trace" keyword -> https://sentry.gitlab.net/gitlab/gitlabcom/?query=trace
- API dashboard for build status / trace operations -> PUT /api/jobs/:id and PATCH /api/jobs/:id/trace
- Build details page -> GET trace.json / GET raw
- Redis memory -> Redis Overview Dashboard
New metrics exposed in Prometheus:
- gitlab_ci_trace_operations_total
- gitlab_ci_trace_rate_bytes
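To watch the new counter during a rollout step, it could be queried through the Prometheus HTTP API. The sketch below only builds the query URL. The base URL is a placeholder, and treating `gitlab_ci_trace_operations_total` as a counter suitable for `rate()` is an assumption based on its `_total` suffix. The metric's labels are not listed here, so the query aggregates over all of them.

```python
from urllib.parse import urlencode

# Hypothetical Prometheus base URL; replace with the real server address.
PROMETHEUS_URL = "http://prometheus.example.com"


def instant_query_url(promql: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build a URL for the Prometheus instant-query endpoint (/api/v1/query)."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


# Per-second rate of trace operations over the last five minutes,
# summed across all label values (assumes the metric is a counter).
operations_rate = "sum(rate(gitlab_ci_trace_operations_total[5m]))"
url = instant_query_url(operations_rate)
print(url)
```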
Logs
KQLs:
# PATCH trace
json.meta.project : "grzesiek/live-traces-sandbox" and json.method: "PATCH" and json.route: "/api/:version/jobs/:id/trace"
# PUT job
json.meta.project : "grzesiek/live-traces-sandbox" and json.method: "PUT" and json.route: "/api/:version/jobs/:id"
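The two queries above differ only in HTTP method and route, so the same filter can be generated for other projects as the rollout widens. The helper below is a small sketch of that idea. The function and constant names are made up for illustration, and the field names are copied from the queries above.

```python
def trace_request_kql(project: str, method: str, route: str) -> str:
    """Build a KQL filter for API requests of one project, method and route."""
    return (
        f'json.meta.project : "{project}" '
        f'and json.method: "{method}" '
        f'and json.route: "{route}"'
    )


# Routes taken from the queries above.
PATCH_TRACE_ROUTE = "/api/:version/jobs/:id/trace"
PUT_JOB_ROUTE = "/api/:version/jobs/:id"

project = "grzesiek/live-traces-sandbox"
patch_trace_query = trace_request_kql(project, "PATCH", PATCH_TRACE_ROUTE)
put_job_query = trace_request_kql(project, "PUT", PUT_JOB_ROUTE)
```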
Feature Flags
- ci_enable_live_trace - main feature flag to enable / disable cloud native build logs
- ci_accept_trace - feature flag for the new mechanism responsible for validating traces
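This issue does not say how the flags are ramped up between rollout steps. One common approach to an incremental rollout is to enable a flag for a deterministic percentage of projects. The sketch below illustrates that general technique under this assumption. It is not GitLab's feature flag implementation, and the function name and bucketing scheme are hypothetical.

```python
import zlib


def enabled_for(flag: str, project_id: int, percentage: int) -> bool:
    """Deterministically decide whether a flag is on for a project.

    The flag name and project ID are hashed into one of 100 buckets; the
    flag is on when the bucket index is below the rollout percentage.
    """
    bucket = zlib.crc32(f"{flag}:{project_id}".encode()) % 100
    return bucket < percentage
```

Because the bucket depends only on the flag name and project ID, a project that is enabled at a given percentage stays enabled as the percentage grows, which keeps each rollout step a superset of the previous one.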
Edited by Grzegorz Bizon