Switch to async batched telemetry for faster GDK commands

Overview

Currently, we flush telemetry synchronously (as a batch) in an exit hook. This slows down GDK commands when they have many events to flush.

For example, gdk reconfigure takes ~26% (about 2.4 s) less time when telemetry is disabled (GDK_TELEMETRY=0):

$ hyperfine -w 1 'CI=1 gdk reconfigure' 'GDK_TELEMETRY=0 gdk reconfigure'
Benchmark 1: CI=1 gdk reconfigure
  Time (mean ± σ):      9.158 s ±  0.176 s    [User: 11.047 s, System: 3.120 s]
  Range (min … max):    8.871 s …  9.467 s    10 runs

Benchmark 2: GDK_TELEMETRY=0 gdk reconfigure
  Time (mean ± σ):      6.751 s ±  0.081 s    [User: 9.716 s, System: 2.862 s]
  Range (min … max):    6.669 s …  6.929 s    10 runs

Summary
  GDK_TELEMETRY=0 gdk reconfigure ran
    1.36 ± 0.03 times faster than CI=1 gdk reconfigure

Proposed solution

A few options:

Run GDK telemetry in a forked background process (detached from its parent) so that it does not block the main process
Add a new GDK service that collects telemetry from the main GDK process and later syncs it to the collector endpoint
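
To make the first option more concrete, here is a minimal sketch of flushing a batch from a detached child process. It is an illustration under assumptions, not GDK's actual telemetry code: `flush_in_background` and the `sender` block are hypothetical names, and the real flush call would take the place of the block. It also assumes a platform where `fork` is available (Linux, macOS).

```ruby
# Hypothetical helper, not part of GDK: hands a batch of events to a forked
# child so the parent command can exit without waiting for the network.
def flush_in_background(events, &sender)
  return nil if events.empty?

  pid = Process.fork do
    begin
      # Start a new session so the child is detached from the parent's
      # terminal and process group.
      Process.setsid

      # The child must not write to the user's terminal.
      $stdin.reopen(File::NULL)
      $stdout.reopen(File::NULL, 'w')
      $stderr.reopen(File::NULL, 'w')

      # Assumption: `sender` wraps whatever performs the actual flush.
      sender.call(events)
    rescue StandardError
      # Telemetry failures are deliberately ignored in this sketch.
    ensure
      # exit! skips at_exit hooks, so the child does not re-run the
      # parent's exit hook and try to flush a second time.
      exit!(0)
    end
  end

  # Reap the child in a background thread to avoid leaving a zombie.
  # The returned thread can be joined by callers that want to wait.
  Process.detach(pid)
end
```

In an exit hook this could be called as `at_exit { flush_in_background(queued_events) { |batch| send_batch(batch) } }`, where `queued_events` and `send_batch` are again placeholders. One trade-off of this design is that a failed flush in the child goes unnoticed by the parent, so events can be lost silently unless the child persists or retries them.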

Impacted categories

The following categories relate to this issue:

gdk-reliability - e.g. When a GDK action fails to complete.
gdk-usability - e.g. Improvements or suggestions around how the GDK functions.
gdk-performance - e.g. When a GDK action is slow or times out.

Edited Apr 03, 2025 by Mohga Gamea