
Add Snowplow event tracker and client wrapper

Tan Le requested to merge add-snowplow-tracker into main

What does this MR do and why?

This adds the ability to track Code Suggestions events using the Python Snowplow tracker.

  • Introduce Snowplow standard synchronous tracker and emitter
  • Introduce some environment variables to configure Snowplow tracking
    • SNOWPLOW_ENABLED - defaults to false; enable only in prod
    • SNOWPLOW_ENDPOINT - defaults to none; set only in prod
  • Add specs and ensure the dependency is stubbed
  • Use AsyncEmitter instead of synchronous emitter
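
As an illustration of the two environment variables above, here is a minimal sketch of how they might be read at startup. The names `TrackingSettings` and `load_tracking_settings`, and the set of accepted truthy spellings, are assumptions made for this example and are not taken from the MR's code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TrackingSettings:
    """Hypothetical container for the two Snowplow variables."""

    enabled: bool
    endpoint: Optional[str]


def load_tracking_settings(env: Mapping[str, str] = os.environ) -> TrackingSettings:
    # SNOWPLOW_ENABLED defaults to false. Which spellings count as "true"
    # is an assumption of this sketch.
    raw_enabled = env.get("SNOWPLOW_ENABLED", "false")
    enabled = raw_enabled.strip().lower() in {"1", "true", "yes"}

    # SNOWPLOW_ENDPOINT defaults to None when unset or empty.
    endpoint = env.get("SNOWPLOW_ENDPOINT") or None

    return TrackingSettings(enabled=enabled, endpoint=endpoint)
```

With neither variable set, this yields `TrackingSettings(enabled=False, endpoint=None)`, which matches the documented defaults for non-prod environments.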

Related to #192 (closed)

How to validate and test locally

We need to set up Snowplow Micro locally to inspect the events emitted from Model Gateway.

  1. Clone the Snowplow Micro repo
  2. Run bash ./snowplow-micro.sh
  3. In another terminal session, build and run a Model Gateway container
    $ docker buildx build --platform linux/amd64 -t code-suggestions-api:dev .
    $ docker run -it --platform linux/amd64 --rm -v "$PWD":/app code-suggestions-api:dev bash
    $ poetry run python
  4. Run the following script in the Python session started by Poetry
    >>> from codesuggestions.tracking import *
    None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
    >>> client = SnowplowClient(configuration=SnowplowClientConfiguration(endpoint="http://host.docker.internal:9090"))
    INFO:snowplow_tracker.emitters:Emitter initialized with endpoint http://host.docker.internal:9090/com.snowplowanalytics.snowplow/tp2
    >>> client.track(SnowplowEvent(context=SnowplowEventContext(request_counts=[RequestCount(requests=1, errors=0, accepts=1, lang="python", model_engine="vertex-ai", model_name="code-gecko")],prefix_length=2048, suffix_length=1024, language="python", user_agent="vs-code-gitlab-workflow", gitlab_realm="saas")))
    INFO:snowplow_tracker.emitters:Attempting to send 1 events
    INFO:snowplow_tracker.emitters:Sending POST request to http://host.docker.internal:9090/com.snowplowanalytics.snowplow/tp2...
  5. Verify the events received by Snowplow Micro via http://localhost:9090/micro/good
    $ curl -s http://localhost:9090/micro/good | jq '.[0].rawEvent'
    {
      "api": {
        "vendor": "com.snowplowanalytics.snowplow",
        "version": "tp2"
      },
      "parameters": {
        "e": "se",
        "eid": "e86c72b2-8272-476c-98b2-0a9a4b575e07",
        "aid": "gitlab_ai_gateway",
        "cx": "eyJzY2hlbWEiOiAiaWdsdTpjb20uc25vd3Bsb3dhbmFseXRpY3Muc25vd3Bsb3cvY29udGV4dHMvanNvbnNjaGVtYS8xLTAtMSIsICJkYXRhIjogW3sic2NoZW1hIjogImlnbHU6Y29tLmdpdGxhYi9jb2RlX3N1Z2dlc3Rpb25zX2NvbnRleHQvanNvbnNjaGVtYS8xLTAtMCIsICJkYXRhIjogeyJyZXF1ZXN0X2NvdW50cyI6IFt7InJlcXVlc3RzIjogMSwgImVycm9ycyI6IDAsICJhY2NlcHRzIjogMSwgImxhbmciOiAicHl0aG9uIiwgIm1vZGVsX2VuZ2luZSI6ICJ2ZXJ0ZXgtYWkiLCAibW9kZWxfbmFtZSI6ICJjb2RlLWdlY2tvIn1dLCAicHJlZml4X2xlbmd0aCI6IDIwNDgsICJzdWZmaXhfbGVuZ3RoIjogMTAyNCwgImxhbmd1YWdlIjogInB5dGhvbiIsICJ1c2VyX2FnZW50IjogInZzLWNvZGUtZ2l0bGFiLXdvcmtmbG93IiwgImdpdGxhYl9yZWFsbSI6ICJzYWFzIn19XX0=",
        "tna": "gl",
        "stm": "1691159810000",
        "tv": "py-1.0.1",
        "se_ac": "suggestions_requested",
        "se_ca": "code_suggestions",
        "p": "pc",
        "dtm": "1691159810428"
      },
      "contentType": "application/json",
      "source": {
        "name": "snowplow-micro-1.7.2-stdout$",
        "encoding": "UTF-8",
        "hostname": "host.docker.internal"
      },
      "context": {
        "timestamp": "2023-08-04T14:36:50.597Z",
        "ipAddress": "172.17.0.1",
        "useragent": "python-requests/2.31.0",
        "refererUri": null,
        "headers": [
          "Timeout-Access: <function1>",
          "Host: host.docker.internal:9090",
          "User-Agent: python-requests/2.31.0",
          "Accept-Encoding: gzip, deflate",
          "Accept: */*",
          "Connection: keep-alive",
          "application/json"
        ],
        "userId": "e4d1fce8-2737-4e5d-af85-878fda9a3267"
      }
    }
  6. The context payload cx can be Base64-decoded
    echo -n "eyJzY2hlbWEiOiAiaWdsdTpjb20uc25vd3Bsb3dhbmFseXRpY3Muc25vd3Bsb3cvY29udGV4dHMvanNvbnNjaGVtYS8xLTAtMSIsICJkYXRhIjogW3sic2NoZW1hIjogImlnbHU6Y29tLmdpdGxhYi9jb2RlX3N1Z2dlc3Rpb25zX2NvbnRleHQvanNvbnNjaGVtYS8xLTAtMCIsICJkYXRhIjogeyJyZXF1ZXN0X2NvdW50cyI6IFt7InJlcXVlc3RzIjogMSwgImVycm9ycyI6IDAsICJhY2NlcHRzIjogMSwgImxhbmciOiAicHl0aG9uIiwgIm1vZGVsX2VuZ2luZSI6ICJ2ZXJ0ZXgtYWkiLCAibW9kZWxfbmFtZSI6ICJjb2RlLWdlY2tvIn1dLCAicHJlZml4X2xlbmd0aCI6IDIwNDgsICJzdWZmaXhfbGVuZ3RoIjogMTAyNCwgImxhbmd1YWdlIjogInB5dGhvbiIsICJ1c2VyX2FnZW50IjogInZzLWNvZGUtZ2l0bGFiLXdvcmtmbG93IiwgImdpdGxhYl9yZWFsbSI6ICJzYWFzIn19XX0=" | base64 --decode
    {"schema": "iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-1", "data": [{"schema": "iglu:com.gitlab/code_suggestions_context/jsonschema/1-0-0", "data": {"request_counts": [{"requests": 1, "errors": 0, "accepts": 1, "lang": "python", "model_engine": "vertex-ai", "model_name": "code-gecko"}], "prefix_length": 2048, "suffix_length": 1024, "language": "python", "user_agent": "vs-code-gitlab-workflow", "gitlab_realm": "saas"}}]}
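
The same decoding can be done in Python, which is convenient when checking several events at once. The sketch below is only an illustration: `decode_cx` is a hypothetical helper, and the sample payload is rebuilt from a shortened version of the decoded JSON shown above, not copied from a live event.

```python
import base64
import json


def decode_cx(cx: str) -> dict:
    """Decode a Base64-encoded `cx` context payload into a dict."""
    # Restore any '=' padding that may have been stripped in transit.
    padded = cx + "=" * (-len(cx) % 4)
    # urlsafe_b64decode maps '-' and '_' back to '+' and '/', so it
    # handles both the URL-safe and the standard alphabet.
    return json.loads(base64.urlsafe_b64decode(padded))


# Build a sample payload with the same shape as the event above
# (fields shortened for the example).
sample_context = {
    "schema": "iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-1",
    "data": [
        {
            "schema": "iglu:com.gitlab/code_suggestions_context/jsonschema/1-0-0",
            "data": {"prefix_length": 2048, "suffix_length": 1024, "language": "python"},
        }
    ],
}
sample_cx = base64.urlsafe_b64encode(json.dumps(sample_context).encode()).decode()

decoded = decode_cx(sample_cx)
print(decoded["data"][0]["schema"])
```

To check a real event, pass the `cx` value from `rawEvent.parameters` in the Snowplow Micro response to `decode_cx`.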
Edited by Tan Le
