Switch to async Batched Telemetry for faster GDK commands
Overview
As of today, we flush telemetry synchronously (batched) in an exit hook. This slows down GDK commands that have many events to flush.
For example, gdk reconfigure is ~26% (about 2.4 s) faster when telemetry is disabled (GDK_TELEMETRY=0):
$ hyperfine -w 1 'CI=1 gdk reconfigure' 'GDK_TELEMETRY=0 gdk reconfigure'
Benchmark 1: CI=1 gdk reconfigure
Time (mean ± σ): 9.158 s ± 0.176 s [User: 11.047 s, System: 3.120 s]
Range (min … max): 8.871 s … 9.467 s 10 runs
Benchmark 2: GDK_TELEMETRY=0 gdk reconfigure
Time (mean ± σ): 6.751 s ± 0.081 s [User: 9.716 s, System: 2.862 s]
Range (min … max): 6.669 s … 6.929 s 10 runs
Summary
GDK_TELEMETRY=0 gdk reconfigure ran
1.36 ± 0.03 times faster than CI=1 gdk reconfigure
Proposed solution
A few options:
- Run GDK telemetry in a forked background process (detach from its parent) to not block the main process
- Add a new GDK service which collects telemetry from the main GDK process and syncs it later to the collector endpoint
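The first option could look roughly like the following Ruby sketch. It is only an illustration under stated assumptions: `flush_async` is a hypothetical helper, not an existing GDK method, and the block passed to it stands in for whatever code actually sends an event to the collector endpoint.

```ruby
# Hypothetical sketch: flush queued telemetry events in a detached child
# process so that the parent GDK command does not wait for the flush.
#
# `flush_async` and the `sender` block are assumptions for illustration;
# they are not part of GDK's current telemetry code.
def flush_async(events, &sender)
  return nil if events.empty?

  pid = Process.fork do
    # Start a new session so the child is not tied to the parent's terminal.
    Process.setsid

    # Detach from the parent's standard streams.
    $stdin.reopen(File::NULL)
    $stdout.reopen(File::NULL, 'w')
    $stderr.reopen(File::NULL, 'w')

    events.each { |event| sender.call(event) }
  rescue StandardError
    # Telemetry failures should never surface to the user.
  ensure
    # Skip at_exit hooks inherited from the parent, including the
    # telemetry exit hook itself.
    exit!(0)
  end

  # Reap the child from a background thread so it does not become a zombie
  # while the parent is still running.
  Process.detach(pid)
  pid
end
```

In an exit hook, this might be called as `flush_async(queued_events) { |event| client.track(event) }`, where `queued_events` and `client.track` are again placeholders. The second option would swap the fork for a write to a local queue that a long-running GDK service drains later, at the cost of one more service to manage.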
Impacted categories
The following categories relate to this issue:
- gdk-reliability - e.g. When a GDK action fails to complete.
- gdk-usability - e.g. Improvements or suggestions around how the GDK functions.
- gdk-performance - e.g. When a GDK action is slow or times out.
Edited by Mohga Gamea