Disable service finish instrumentation temporarily
What does this merge request do and why?
Partial revert of !5151 (merged).
As found in https://gitlab.com/gitlab-org/gitlab-development-kit/-/issues/2961#note_2726973576, crash-looping services cause a significant spike in GDK telemetry events without adding real value.
Proposal:
- Partially revert Instrument service finish (!5151 - merged) for now to bring the number of events back down
- Re-implement Instrument service finish (!5151 - merged) with crash-loop detection
See
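
As a rough illustration of what crash-loop detection could look like in a later re-implementation, the sketch below rate-limits reporting per service. Everything in it is an assumption made for illustration: the function name should_report, the state file, and the window and threshold values do not exist in GDK, and the real follow-up may take a different approach.

```shell
#!/bin/sh
# Hypothetical crash-loop guard; none of these names or values come from GDK.
WINDOW_SECONDS=60   # how far back to look (assumed value)
MAX_FINISHES=3      # finishes allowed inside the window (assumed value)

# should_report STATE_FILE NOW
# Records NOW in STATE_FILE, drops timestamps older than the window, and
# returns 0 only while the number of recent finishes is within MAX_FINISHES.
should_report() {
  state_file=$1
  now=$2
  cutoff=$((now - WINDOW_SECONDS))
  touch "$state_file"
  # Keep timestamps newer than the cutoff, then append the current one.
  { awk -v cutoff="$cutoff" '$1 > cutoff' "$state_file"; echo "$now"; } > "$state_file.tmp"
  mv "$state_file.tmp" "$state_file"
  count=$(wc -l < "$state_file" | tr -d ' ')
  [ "$count" -le "$MAX_FINISHES" ]
}

# Example: four finishes within a few seconds, then one much later.
state=$(mktemp)
for t in 100 101 102 103 200; do
  if should_report "$state" "$t"; then
    echo "t=$t: report"
  else
    echo "t=$t: suppress"
  fi
done
```

With these assumed values, the fourth finish inside the window is suppressed, and reporting resumes once the earlier finishes have aged out.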
Why partial?
Removing finish.erb does not remove generated finish files in sv/*/finish.
Instead of adding a migration to remove these files, we replace the content of the template with a no-op.
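
To make the no-op approach concrete, here is a minimal sketch of overwriting a generated finish file with a script that does nothing. The temporary directory stands in for a generated sv/<service> directory, and the script body is an assumption, not the actual content of finish.erb after this change. runsv calls ./finish with two arguments, the exit code of ./run and the least significant byte of its wait status, which the no-op simply ignores.

```shell
#!/bin/sh
# Sketch only: the directory and script body are assumptions, not GDK's real template.
sv_dir="$(mktemp -d)"   # stand-in for a generated sv/<service> directory

# Overwrite the generated finish file with a body that does nothing.
cat > "$sv_dir/finish" <<'EOF'
#!/bin/sh
# No-op: runsv passes the exit code and wait status of ./run as $1 and $2,
# but nothing is recorded or sent.
exit 0
EOF
chmod +x "$sv_dir/finish"

# Simulate runsv calling finish after ./run exited with code 1.
"$sv_dir/finish" 1 0
finish_status=$?
echo "finish exited with $finish_status"
```

Because the file still exists and is simply regenerated with an empty body, no cleanup of existing sv/*/finish files is needed.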
How to set up and validate locally
- gdk stop
- git checkout pl-disable-service-telemetry
- gdk start
- gdk stop
- Check for events named Custom service_finish in ClickHouse:
select derived_tstamp, user_id, custom_event_name, custom_event_props,
visitParamExtractString(custom_event_props, 'value') as service,
JSONExtractString(custom_event_props, 'extras', 'exit_code') as exit_code
from snowplow_events
where custom_event_name = 'Custom service_finish'
and user_id = '<YOUR USERID>'
order by derived_tstamp DESC
limit 100
Impacted categories
The following categories relate to this merge request:
- gdk-reliability - e.g. When a GDK action fails to complete.
- gdk-usability - e.g. Improvements or suggestions around how the GDK functions.
- gdk-performance - e.g. When a GDK action is slow or times out.
Merge request checklist
- This MR references an issue describing the change.
- This change is backward compatible. If not, please include steps to communicate to our users.
- Tests added for new functionality. If not, please raise an issue to follow-up.
- Documentation added/updated, if needed.
- Announcement added, if change is notable.
- gdk doctor test added, if needed.
Edited by Manuel Schönlaub