Improve data model logic and retention on namespace subscriptions
Problem to solve
During analysis of GitLab.com customers this month we were trying to understand how many free users converted to paid users. It was identified that we are overwriting the subscription information every time it changes, which leaves us with no historical record of when a free user might have become a paying user and when they change the namespaces for which payments are collected. In order to be able to quantify improvements to retention and conversion, we need to have a log / track subscription information changes. Single Client View has been a problem for banking for many years and is currently costing banks a lot of money. We should ensure that we address this problem early on.
Currently, we store plan information on namespaces in the follow places and data quality seems particularly bad (a couple of examples can be found in https://gitlab.com/gitlab-com/Product/issues/251#note_179112461, https://gitlab.com/gitlab-com/Product/issues/251#note_179113734):
- namespaces (gitlab.com) - under plan_id (the current plan under which the namespace is)
- gitlab_subscriptions, where we see when the subscription was created, updated and the plan
- orders, where we are supposed to see each transaction.
Any PM/ Data analyst looking at the data
@tipyn The more I look at the data, the more I am convinced that we need to have free and paid plans in 1 table with a transaction log for each change. Otherwise, there are a lot of issues with connecting the data. We should just have the user with a plan registered as free or core. We should probably have all of these have the one account number (A-S00015031 I assume has 1:1 mapping with customer_id in orders table) as we see them in the customers's portal but keep all history of changes.
I think we can leave
namespacesto contain only the most recent information, but we should add
product_rate_plan_idfrom the orders table in customers.gitlab.com in order to know which namespaces are actually free as per (#11991). If we can get which one is free gold vs paid would be ideal, but not a requirement as we can get a proxy from Zuora. I basically excluded :
Gold Plan - 1 Year (EDU or OSS)&
Ultimate - 1 Year (EDU or OSS), but I do not get all the right info this was as
Iowa State University of Science and Technologynamespace is still part of the dataset as it has just
Gold Planfor it.
I would imagine that
gitlab_subscriptionscomes partly from
orderstable in customers.gitlab.com. As per Eli's comments here we can add a row with every new subscription on the namespaces and have a column that indicates which one is active: gitlab-com/gl-infra/infrastructure#6763 (comment 178291594). We should also have the correct start and end date when a subscription is changed from 1 namespace to another so that it's easier to combine the information under 1 parent account and not have puzzling situations as in this example where the plan starts before the namespace is created (https://gitlab.com/gitlab-com/Product/issues/251#note_173884161).
I think we should ensure that every change is registered as a transaction so that when a user changes the namespace a subscription relates to we can see that a subscription on the previous namespace has ended and a new subscription on another namespace has started. It seems like some orders are cancelled, which we can see in Zuora but cannot see in orders. We should have a way to reflect this potentially in a separate column?
We seem to have situations where a salesperson enters an order and doesn't associate a namespace, so we don't know what namespace the subscription relates to. We should potentially enforce a namespace requirement so that sales cannot enter orders without the association? It seems when a namespace is a user one and not a group, we don't register the name, so cannot easily connect to any of its data before it became paid.
We seem to miss the plan id information for certain subscriptions so we cannot quantify them as self-managed vs hosted - ex: (https://gitlab.com/gitlab-com/Product/issues/251#note_179138487 might be an example of this)
We seem to miss orders for customers where we know there is a paid namespace - ex:
Yet to identify the solution for : https://gitlab.com/gitlab-com/Product/issues/251#note_179112461 & https://gitlab.com/gitlab-com/Product/issues/251#note_179113734 (bots, which we must have the info that sth was cancelled in some way?)
@amandarueda: am I missing anything here from the edge cases we discussed? Thank you!
@tipyn - please note that you should probably collaborate on this issue with @eli_kastelein more closely so that all edge cases are cleared out and the free to paid analysis can be done on a recurrent basis.