Preserve Historical Information for the `gitlab_subscriptions` Table
Why is this Important?
Currently our GitLab.com
gitlab_subscriptions table only captures the current state of our data. This is blocking us from doing any meaningful analysis on our subscriptions because we can only see who's subscribed right now. We have no data on who was subscribed two weeks ago or even two hours ago, all we know about is the state in this current moment.
The reasoning is because of the backend design of our application. In our database all records are overwritten to reflect the current state, which is a good design from an application point-of-view, but makes analysis of our data impossible. This "throwing out" of data is causing GitLab Inc. to miss out on capturing tons of valuable information.
For example: if a customers starts a trial on gitlab.com, we'll create a
gitlab_subscriptions record with
is_trial=True. If they later upgrade to a Gold plan, the existing record gets overwritten with a new record that has
is_trial=False. By throwing out all information about the trial ever existing, we'll never be able to measure the value or success of trials at GitLab. The exact same thing can be said about free plans, we currently have no way to see any of those conversions.
By preserving historical information, we will be able to calculate the value of trials, free plans, and the LTV of all visitors to GitLab.com. Paul Machle (CFO) has asked us multiple times for an LTV analysis on free users to GitLab.com and this issue is the single thing blocking us from doing that.
We would like to see a
gitlab_subscriptions_logs table added that keeps a record of all changes to
gitlab_subscriptions. Having a separate table would allow us to leave the original
gitlab_subscriptions alone and not having to risk breaking anything or slowing anything down.
The gitlab_subscripition records the current state of our gitlab_subscriptions. The records are slowly mutable. On the contrary, the gitlab_subscription_logs will store sequences of immutable events. Upon any creation, update or deletion event (does that happen ?) in our database a new row will be added with all metadata attached to the specific object. This table will allow us to keep track of all changes happening in the database. For example status changes of a specific subscription (from active to expire, or from trial started to active subscriptions...)
gitlab_subscription_logs table will have all columns that are in gitlab_subscriptions` + extra metadata:
- version: which version of the gitlab_subscription are we seeing. that will be a counter starting at 1
- action_type: creation, update, deletion
So I'll have a single
history_logs which could be used in the future for every historical log needs. And it will contain all previous historical values, and except for the current value for any record in logged table.
id, created_at, about_type # "GitlabSubscription" or "GitlabNamespace".. about_id, # gitlab_subscriptions.id or.. polymorphic id of the related table data, jsonb # binary if allowed) to keep the whole of a subscription version: # which version of the gitlab_subscription are we seeing. that will be a counter starting at action_type: # creation, update, deletion
- We must check the size and the monthly growth of the log
- Check with the database team