Skip to content

Track Historical Information for the gitlab_subscriptions table

Why is this Important?

Currently our GitLab.com gitlab_subscriptions table only captures the current state of our data. This is blocking us from doing any meaningful analysis on our subscriptions because we can only see who's subscribed right now. We have no data on who was subscribed two weeks ago or even two hours ago, all we know about is the state in this current moment.

The reasoning is because of the backend design of our application. In our database all records are overwritten to reflect the current state, which is a good design from an application point-of-view, but makes analysis of our data impossible. This "throwing out" of data is causing GitLab Inc. to miss out on capturing tons of valuable information.

For example: if a customers starts a trial on gitlab.com, we'll create a gitlab_subscriptions record with trial=true. If they later upgrade to a Gold plan, the existing record gets overwritten with a new record that has trial=false. By throwing out all information about the trial ever existing, we'll never be able to measure the value or success of trials at GitLab. The exact same thing can be said about free plans, we currently have no way to see any of those conversions.

By preserving historical information, we will be able to calculate the value of trials, free plans, and the LTV of all visitors to GitLab.com. Paul Machle (CFO) has asked us multiple times for an LTV analysis on free users to GitLab.com and this issue is the single thing blocking us from doing that.

Proposal

We plan to track changes to the gitlab_subscriptions using Snowplow events.

The gitlab_subscripition records the current state of our gitlab_subscriptions. The records are slowly mutable. On the contrary, Snowplow will receive and store sequences of immutable events. Upon any creation, update or deletion event (does that happen ?) in our database a new row will be added with all metadata attached to the specific object. This table will allow us to keep track of all changes happening in the database. For example status changes of a specific subscription (from active to expire, or from trial started to active subscriptions...)

Requirement

Ideally, this tacking event will have all columns that are in gitlab_subscriptions` + extra metadata:

  • change_type: creation, update, deletion
    Preserve Historical Information for the gitlab_subscriptions Table

Design

Resorted back to a database based solution for the gitlab_subscriptions table as it will be quicker to unblock blocked data issues without telemetry concerns.

Performance

  • We must make sure backend tracking events are not lost, and have a certain degree of reliability.
  • Only Gitlab.com
Edited by Alper Akgun