Monitoring Integration of pipelines with Datadog
Why are we doing this work
We want a seamless integration with Datadog for users of GitLab CI. At Datadog we have an internal v1 that works with the standard webhooks, but we would like to provide a native GitLab integration for:
- Self-discovery by users.
- Making configuration easier.
- Preparing the way for additional features that are not available with the current webhooks.
The first version of this integration would only cover the automatic set-up of webhooks for our current implementation. The events that we need are Pipeline Hook and Job Hook.
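As a rough illustration of what a receiver for these two events could look like, here is a minimal sketch in Python. It relies on the `X-Gitlab-Event` header that GitLab sends with each webhook request; the function name `classify_event` and the plain-dict representation of headers are assumptions made for this example, not part of the existing Datadog implementation.

```python
import json

# Header values GitLab sends for the two events this integration needs.
PIPELINE_EVENT = "Pipeline Hook"
JOB_EVENT = "Job Hook"


def classify_event(headers, body):
    """Return ("pipeline" | "job", payload) for supported events, else None.

    `headers` is assumed to be a plain dict of request headers and `body`
    the raw JSON request body as a string (both hypothetical inputs).
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    event = normalized.get("x-gitlab-event")
    if event not in (PIPELINE_EVENT, JOB_EVENT):
        return None  # Ignore events outside the scope of the first version.
    kind = "pipeline" if event == PIPELINE_EVENT else "job"
    return kind, json.loads(body)
```

Any other event type is ignored, which matches the narrow scope described for the first version.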
What does our current implementation with webhooks do?
- Shows GitLab pipelines as Datadog traces.
- Provides metrics and dashboards for pipelines over time: average pipeline durations and stage/job time breakdowns (WIP).
Things we want to add/improve (ordered by priority)
- Exclude individual job retries from calculated metrics. Urgent since this blocks calculating accurate metrics over time.
- Link these individual retries as subtraces of the initial pipeline.
- Also link related traces for full pipeline retries and for downstream/upstream pipelines.
- Provide the full content of job log artifacts to integrate with Datadog logs.
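To make the first item concrete, the sketch below shows one way retried jobs could be excluded before computing metrics: keep only the most recent attempt of each job. This is an assumption-laden illustration, not the planned implementation. It assumes that each job record carries `id`, `stage`, `name` and `duration` fields, that a retry reuses the same stage and name, and that a later attempt has a higher `id`. All of these should be checked against the actual webhook payload.

```python
def latest_attempts(builds):
    """Keep only the most recent attempt of each job, dropping earlier retries.

    Assumes (hypothetically) that retries share (stage, name) and that a
    later attempt always has a larger id.
    """
    latest = {}
    for build in builds:
        key = (build["stage"], build["name"])
        current = latest.get(key)
        if current is None or build["id"] > current["id"]:
            latest[key] = build
    return list(latest.values())


def total_job_seconds(builds):
    """Sum job durations, counting each job once (its latest attempt)."""
    return sum(build["duration"] or 0 for build in latest_attempts(builds))


# Hypothetical sample data: "rspec" was retried once (id 1, then id 3).
sample = [
    {"id": 1, "stage": "test", "name": "rspec", "duration": 30.0},
    {"id": 2, "stage": "test", "name": "lint", "duration": 5.0},
    {"id": 3, "stage": "test", "name": "rspec", "duration": 20.0},
]
```

With this sample, only the second `rspec` attempt contributes to the total, so the first attempt no longer inflates the stage time.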
Relevant links
- Kickoff meeting docs
- Similar implementation of the Jenkins integration (provided by @Andysoiron in the meeting)
- Current Datadog integration. This one is for metrics and stats. We are working on something more developer-centric.
Implementation plan
v1
- Add project service integration with form to configure webhooks to Datadog. Hidden by feature flag until GA on both sides: !46564 (merged)
- Add hostname of the gitlab-runner to the hooks (delayed; it could be unblocked by #20688)
- Remove feature flag and make the integration public: #284088 (closed)
v1.1 (for discussion)
- Extend webhooks with information to differentiate retried jobs. This might be possible through pipeline.builds.latest
v1.2 (also for discussion)
- Extend webhooks with information on related pipelines: child downstream pipeline (or retry) of pipeline-XX
- Provide full content of job log artifacts to integrate with Datadog logs
Customer Questions
This issue is for features and technical discussion. The integration is still in private alpha. For customer questions about the alpha or about testing this integration, please reach out to Bryan.