Cloud Native Build Logs on gitlab.com
## Description
This epic tracks the work needed to enable Cloud Native Build Logs (Live Traces) on gitlab.com, which removes one of the NFS dependencies and unblocks the Kubernetes migration.
For more information about what Cloud Native Build Logs are and what the new architecture looks like, see the architecture blueprint (draft merge request): https://gitlab.com/gitlab-com/www-gitlab-com/-/merge_requests/59964
The epic for making Cloud Native Build Logs generally available, which tracks an extension of this work, is available here :arrow_right: https://gitlab.com/groups/gitlab-org/-/epics/3791
## Roadmap
<table>
<thead>
<tr>
<th>Work item</th>
<th>Issue</th>
<th>Status</th>
<th>DRI</th>
<th>ETA</th>
</tr>
</thead>
<tbody>
<tr>
<td>Improve performance related to Redis usage</td>
<td>https://gitlab.com/gitlab-org/gitlab/-/issues/34781</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/37075) in %"13.3"</td>
<td>@grzesiek</td>
<td>In production</td>
</tr>
<tr>
<td>Validate Redis performance improvements</td>
<td>https://gitlab.com/gitlab-org/gitlab/-/issues/217988#rollout-plan</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab/-/issues/217988#note_395531515) in %"13.3"</td>
<td>@grzesiek</td>
<td>In production</td>
</tr>
<tr>
<td>Calculate CRC32 checksums of trace chunks</td>
<td>https://gitlab.com/gitlab-org/gitlab/-/issues/241490</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/40506) in %"13.4"</td>
<td>@grzesiek</td>
<td>In production</td>
</tr>
<tr>
<td>Send CRC32 checksum of a build trace from a runner</td>
<td>https://gitlab.com/gitlab-org/gitlab-runner/-/issues/26545</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab-runner/-/merge_requests/2375) in %"13.4"</td>
<td>@ajwalker, @ayufan</td>
<td>In production</td>
</tr>
<tr>
<td>Make cloud native build logs resilient</td>
<td>https://gitlab.com/gitlab-org/gitlab/-/issues/232533</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/41304) in %"13.4"</td>
<td>@grzesiek</td>
<td>In production</td>
</tr>
<tr>
<td colspan=2>
–› Refine underlying data models in Runner used for trace / build updates
</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab-runner/-/merge_requests/2388) in %"13.4"</td>
<td>@ayufan</td>
<td>In production</td>
</tr>
<tr>
<td colspan=2>
–› Add support for a variable build / trace update interval
</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab-runner/-/merge_requests/2389) in %"13.4"</td>
<td>@ayufan</td>
<td>In production</td>
</tr>
<tr>
<td colspan=2>
–› Make it possible to rewind a build trace in case of validation errors
</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab-runner/-/merge_requests/2390) in %"13.4"</td>
<td>@ayufan</td>
<td>In production</td>
</tr>
<tr>
<td colspan=2>
–› Make build trace flush worker idempotent
</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/41579) in %"13.4"</td>
<td>@grzesiek</td>
<td>In production</td>
</tr>
<tr>
<td colspan=2>
–› Introduce build pending state backend model
</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/41585) in %"13.4"</td>
<td>@grzesiek</td>
<td>In production</td>
</tr>
<tr>
<td>Avoid mutating build logs using legacy secrets masking</td>
<td>https://gitlab.com/gitlab-org/gitlab/-/issues/241189</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/40408) in %"13.4"</td>
<td>@grzesiek</td>
<td>In production</td>
</tr>
<tr>
<td>Improve observability</td>
<td>https://gitlab.com/gitlab-org/gitlab/-/issues/227182</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/42359) in %"13.4"</td>
<td>@grzesiek</td>
<td>In production</td>
</tr>
<tr>
<td>Validate correctness of traces and log invalid ones</td>
<td>https://gitlab.com/gitlab-org/gitlab/-/issues/228877</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/42829) in %"13.5"</td>
<td>@grzesiek</td>
<td>In production</td>
</tr>
<tr>
<td colspan=2>
–› Add build status update exponential backoff mechanism
</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab-runner/-/merge_requests/2389) in %"13.5"</td>
<td>@grzesiek</td>
<td>In production</td>
</tr>
<tr>
<td>Incremental rollout on gitlab.com</td>
<td>https://gitlab.com/gitlab-org/gitlab/-/issues/241471</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab/-/issues/241471#note_436999367) in %"13.6"</td>
<td>@grzesiek</td>
<td>In production</td>
</tr>
<tr>
<td colspan=2>
–› Fix trace rate metric
</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/43587) in %"13.5"</td>
<td>@grzesiek</td>
<td>In production</td>
</tr>
<tr>
<td colspan=2>
–› Parse CRC32 checksum provided in hexadecimal
</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/43718/) in %"13.5"</td>
<td>@grzesiek</td>
<td>In production</td>
</tr>
<tr>
<td colspan=2>
–› Ensure that runner exponential backoff is an integer
</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/43849) in %"13.5"</td>
<td>@grzesiek</td>
<td>In production</td>
</tr>
<tr>
<td colspan=2>
–› Extend exception about chunk data not fulfilled in a bucket
</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/44007) in %"13.5"</td>
<td>@grzesiek</td>
<td>In production</td>
</tr>
<tr>
<td colspan=2>
–› Make build trace correctness validation sticky
</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/44019) in %"13.5" </td>
<td>@grzesiek</td>
<td>In production</td>
</tr>
<tr>
<td colspan=2>
–› Retry a build trace chunk migration in case of an exception
</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/44299) in %"13.5" </td>
<td>@grzesiek</td>
<td>In production</td>
</tr>
<tr>
<td colspan=2>
–› Log detected invalid build trace chunks
</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/44409) in %"13.5" </td>
<td>@grzesiek</td>
<td>In production</td>
</tr>
<tr>
<td colspan=2>
–› Use optimistic locking to safely migrate a build trace chunk
</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/44588) in %"13.5" </td>
<td>@grzesiek</td>
<td>In production</td>
</tr>
<tr>
<td colspan=2>
–› Resolve live trace read race condition using a retry
</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/44988) in %"13.5" </td>
<td>@grzesiek</td>
<td>In production</td>
</tr>
<tr>
<td colspan=2>
–› Reduce the noise generated by locked chunk migration
</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/45128) in %"13.5" </td>
<td>@grzesiek</td>
<td>In production</td>
</tr>
<tr>
<td colspan=2>
–› Delay archive trace operation to fix race condition
</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/45043) in %"13.5"</td>
<td>@grzesiek</td>
<td>In production</td>
</tr>
<tr>
<td colspan=2>
–› Add build trace chunks migration duration histogram metric
</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/45516) in %"13.6"</td>
<td>@grzesiek</td>
<td>In production</td>
</tr>
<tr>
<td colspan=2>
–› Improve trace finalize histogram buckets
</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/45648) in %"13.6"</td>
<td>@grzesiek</td>
<td>In production</td>
</tr>
<tr>
<td colspan=2>
–› Fix NoMethodError when chunks are being removed
</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/45657) in %"13.6"</td>
<td>@grzesiek</td>
<td>In production</td>
</tr>
<tr>
<td colspan=2>
–› Deduplicate build trace chunks flush worker
</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/42223) in %"13.6"</td>
<td>@nmilojevic1</td>
<td>In production</td>
</tr>
<tr>
<td colspan=2>
–› Add Grape content logger to log content length and range
</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/46128) in %"13.6"</td>
<td>@grzesiek</td>
<td>In production</td>
</tr>
<tr>
<td colspan=5><b>Cloud Native Build Logs enabled on gitlab.com for 100% of projects</b></td>
</tr>
<tr>
<td colspan=2>Fix Azure cloud storage adapter</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/46209) in %"13.6"</td>
<td>@stanhu</td>
<td>In production</td>
</tr>
<tr>
<td>Observability improvements post-rollout</td>
<td>https://gitlab.com/gitlab-org/gitlab/-/issues/273756</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/46877#note_442223676) in %"13.6"</td>
<td>@grzesiek</td>
<td>In production</td>
</tr>
<tr>
<td colspan=2>
–› Log trace range violation errors with additional metadata
</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/46256) in %"13.6"</td>
<td>@grzesiek</td>
<td>In production</td>
</tr>
<tr>
<td colspan=2>
–› Reduce the noise created by pending state constraints violation
</td>
<td>:heavy_check_mark: [done](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/46877) in %"13.6"</td>
<td>@grzesiek</td>
<td>In production</td>
</tr>
</tbody>
</table>