Discuss: Tracing and Pipelines integration for CI/CD debugging/optimization use cases
Summary
- Internal teams use Tracing, through a custom-built tool, to visualize pipeline job execution and optimize GitLab Rails pipelines.
- This solves several identified problems with the current pipeline details page when debugging or optimizing pipeline execution.
- As a first step towards product integration, Igor suggested linking to a pipeline's trace, when available, from the pipeline details page.
Open Questions:
- Should we support this use case and provide a way to link the features when possible?
- If so, what UX and additional features could we provide in the future?
- How can we organize the work, as it requires cross-team collaboration?
- If we do not want to support this use case this way, what alternative should we communicate to customers?
Context
GitLab's platform engineering teams (including Engineering Productivity, Delivery, and Scalability) are using a custom-built tool that converts GitLab build pipelines into traces.
This enables them to use the trace timeline view to visualize the detailed execution timeline of GitLab Rails monolith pipelines, which can take hours to execute and affect the productivity of all GitLab engineering teams.
With this view, they can troubleshoot pipeline performance issues by identifying which jobs are on the critical path and slow down the entire process. Once these bottlenecks are identified, they can optimize job execution (for example, by making a job fail faster or by running more jobs in parallel) to improve overall pipeline efficiency.
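The internal tool's code isn't included here, so the following is only a hypothetical Python sketch of the two ideas above. It maps one pipeline to one trace (a root span plus one child span per job) and recovers the critical path by walking back from the last job to finish, each time following the dependency that finished last. The `Job` and `Span` types, the ID scheme, and the sample timings are all assumptions. The sketch follows explicit `needs` only and ignores implicit stage ordering.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Job:
    """Hypothetical job record; times are seconds since pipeline creation."""
    name: str
    start: float
    end: float
    needs: List[str] = field(default_factory=list)  # explicit `needs` only


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    name: str
    start: float
    end: float


def pipeline_to_spans(pipeline_id: int, jobs: List[Job]) -> List[Span]:
    """One trace per pipeline: a root span covering all jobs, one child span per job."""
    trace_id = f"pipeline-{pipeline_id}"  # hypothetical ID scheme
    root = Span(trace_id, "root", None, f"pipeline {pipeline_id}",
                min(j.start for j in jobs), max(j.end for j in jobs))
    children = [Span(trace_id, f"job-{j.name}", root.span_id, j.name, j.start, j.end)
                for j in jobs]
    return [root] + children


def critical_path(jobs: List[Job]) -> List[str]:
    """Walk back from the last job to finish, always following the
    dependency that finished last (the one that gated its start)."""
    by_name = {j.name: j for j in jobs}
    current = max(jobs, key=lambda j: j.end)
    path = [current.name]
    while current.needs:
        current = max((by_name[n] for n in current.needs), key=lambda j: j.end)
        path.append(current.name)
    return path[::-1]


# Usage with made-up timings:
jobs = [
    Job("compile", 0, 300),
    Job("lint", 0, 120),
    Job("rspec-unit", 300, 1500, ["compile"]),
    Job("rspec-integration", 300, 2400, ["compile"]),
    Job("package", 2400, 2600, ["rspec-unit", "rspec-integration", "lint"]),
]
print(critical_path(jobs))  # ['compile', 'rspec-integration', 'package']
```

In this example, speeding up `rspec-unit` or `lint` would not shorten the pipeline. Only jobs on the returned path would.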
- @igorwwwwwwwwwwwwwwwwwwww demonstrated the feature in this video (notes).
- Example traces are visible here.
- Example trace recording:
Distributed Tracing is part of our upcoming Observability offering and was initially built for application performance monitoring use cases.
While the current tracing UI is not optimized for monitoring pipelines, it provides a boring solution to some problems that our internal teams and customers have with GitLab pipeline visualization. These problems have been identified before, and similar solutions have been explored (based on a quick search of existing GitLab issues; feel free to edit):
- Visualize the complete pipeline execution, including downstream pipelines
- Visualize all dependencies, implicit or explicit: #336564
- Visualize pipeline job duration and progress: #330078, #323167, #2666, #460155
Suggested MVC solution
A first step, suggested by @igorwwwwwwwwwwwwwwwwwwww at the 24-minute mark of the video recording:
- From the pipeline details view, link to the corresponding trace when available:
  - as a tab, or as a link from a generic annotation (annotate the pipeline with additional URLs at creation)
  - with a webhook-based auto import, or an on-demand import, to avoid having to use the command line.
- This way, users can tell when a pipeline is "trace-enabled" and open the timeline view when needed to dig into its detailed execution for troubleshooting.
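A webhook-based auto import could react to GitLab pipeline events. The minimal Python sketch below assumes only the pipeline event payload's `object_kind` and `object_attributes` (`id`, `status`) fields. It imports a pipeline only once it reaches a final status, so the resulting trace is complete. The set of final statuses, the `trace_link` helper, and its URL scheme are hypothetical placeholders for wherever the trace would actually live.

```python
import json

# Pipeline statuses treated as final (assumed set), so the trace is complete.
FINAL_STATUSES = {"success", "failed", "canceled", "skipped"}


def should_import(payload: dict) -> bool:
    """Import only pipeline events whose pipeline has reached a final status."""
    if payload.get("object_kind") != "pipeline":
        return False
    status = payload.get("object_attributes", {}).get("status")
    return status in FINAL_STATUSES


def trace_link(payload: dict, base_url: str) -> str:
    """Link to show on the pipeline details page (hypothetical URL scheme)."""
    pipeline_id = payload["object_attributes"]["id"]
    return f"{base_url.rstrip('/')}/traces/pipeline-{pipeline_id}"


# Usage with a trimmed-down example payload:
event = json.loads('{"object_kind": "pipeline",'
                   ' "object_attributes": {"id": 123, "status": "success"}}')
if should_import(event):
    print(trace_link(event, "https://observability.example.com/"))
    # https://observability.example.com/traces/pipeline-123
```

Filtering on final statuses avoids importing partial traces for running pipelines. An on-demand import button could call the same conversion without the status check.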
Potential next iterations
If we decide to support this use case, we could further improve the UX and add other key functionality. Some ideas:
- Instead of requiring users to configure a command-line tool, the back end could collect pipeline data into traces, and users could enable the feature with a project-level setting.
- The trace list and timeline view UI could differ slightly from the default UI, with capabilities specific to pipelines rather than application monitoring.
- The Traces UI could be progressively integrated into the pipeline list and details pages (as a tab? as an API called by the pipeline view page?)
- Another feature could link individual jobs to their runner and to related infrastructure metrics and logs, so users can identify and fix issues coming from this part of the stack (related customer insight: https://gitlab.com/gitlab-com/user-interviews/-/issues/29#note_1569976852). Correlating this data would require additional back-end work in addition to the front-end UI.
- etc. Feel free to suggest other ideas.