Updated the description to better describe it. It's what you guessed at. I'd also be curious about the number of projects with reports, so that we don't get skewed numbers from heavy usage by a single project, but if you don't think we need it I'm happy to leave it out!
@alexives I totally understand your point about the number of projects, and it's a valuable metric.
Actually, I'd iterate on it once more. In every usage scenario, total usage equals users * personal usage. Counting projects is a proxy for the number of users, and that's clearly a valuable metric. A variant on the same metric would be the number of namespaces that have projects with Terraform states. This could be useful from a business point of view, since at GitLab.com every subscription is tied to a namespace, and we already have reports on usage per namespace and license type (you might like this :)).
What do you think: shall we add the per-namespace report as stretch goal 2?
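To make the idea concrete, here is a hedged sketch of the per-namespace counter: count distinct namespaces that have at least one project with a Terraform state, so a single heavy-usage project can't skew the number. The table and column names below are toy assumptions, not the real GitLab schema.

```python
import sqlite3

# Toy stand-ins for the real backend tables (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects (id INTEGER PRIMARY KEY, namespace_id INTEGER);
CREATE TABLE terraform_states (id INTEGER PRIMARY KEY, project_id INTEGER);
-- two projects in namespace 1, one project in namespace 2
INSERT INTO projects VALUES (1, 1), (2, 1), (3, 2);
-- states exist for projects 1 and 2 only, i.e. only namespace 1 qualifies
INSERT INTO terraform_states VALUES (10, 1), (11, 2);
""")

# Distinct namespaces with at least one project that has a state:
# many states in one project still count that namespace once.
namespaces_with_states = conn.execute("""
    SELECT COUNT(DISTINCT p.namespace_id)
    FROM projects p
    JOIN terraform_states ts ON ts.project_id = p.id
""").fetchone()[0]
print(namespaces_with_states)  # 1
```

The `COUNT(DISTINCT ...)` is the whole point: a per-project count would report 2 here, while the per-namespace count reports 1.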
@nagyv-gitlab We can easily count artifacts for given reports until they expire; perhaps that's a solution that would allow us to implement the usage ping. I lack some understanding of how we send usage data to external tools, so this is something I still need to check out.
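The "count artifacts until they expire" idea boils down to filtering on an expiry timestamp. A minimal sketch, assuming a table with an `expire_at` column (names are assumptions, not the real schema):

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artifacts (id INTEGER PRIMARY KEY, expire_at TEXT)")

now = datetime(2020, 6, 1)
conn.executemany(
    "INSERT INTO artifacts (expire_at) VALUES (?)",
    [
        ((now + timedelta(days=7)).isoformat(),),   # still live
        ((now - timedelta(days=1)).isoformat(),),   # already expired
        ((now + timedelta(days=30)).isoformat(),),  # still live
    ],
)

# Only artifacts that have not yet expired are countable; once expire_at
# passes, the row no longer contributes to the usage number.
live_artifacts = conn.execute(
    "SELECT COUNT(*) FROM artifacts WHERE expire_at > ?", (now.isoformat(),)
).fetchone()[0]
print(live_artifacts)  # 2
```

This also shows the caveat with the approach: the counter is a point-in-time snapshot, so historical usage disappears as artifacts expire.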
@nagyv-gitlab I think that @jeromezng can help you understand how the usage pings are currently calculated.
Regarding the flow (from usage ping to available data in Periscope), we have the following steps:
1. On a weekly basis, any instance that has usage pings turned ON creates a usage ping JSON, defined mostly in this file. Most of the counters are counts of objects/user_ids on backend tables. For example, the number of open issues is defined here.
2. This JSON is then sent to the version app, which validates it and stores it in its Postgres database. Documentation here (you can see in the table, by the way, that dast_jobs are usage pings).
3. The data is then sent to our data warehouse (Snowflake), transformed, and made available in Periscope right away. Normally the data is sent from Version to our data warehouse automatically, on a daily basis. Currently, this process is broken and replaced by a manual process that used to run every 2 weeks, and might now be once a week...
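As a rough illustration of step 1: a usage ping is essentially a JSON document of named counters, each computed from a backend table. The counter names and the toy table below are made up for the sketch; the real definitions live in the GitLab backend file linked above.

```python
import json
import sqlite3

# Toy backend table standing in for a real one (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE issues (id INTEGER PRIMARY KEY, state TEXT);
INSERT INTO issues (state) VALUES ('opened'), ('opened'), ('closed');
""")

def count(sql):
    """Run a COUNT query and return the single scalar result."""
    return conn.execute(sql).fetchone()[0]

# Each key is one counter; the weekly job evaluates all of them and
# serializes the result as the usage ping payload.
usage_ping = {
    "counts": {
        "issues": count("SELECT COUNT(*) FROM issues"),
        "issues_open": count("SELECT COUNT(*) FROM issues WHERE state = 'opened'"),
    }
}
print(json.dumps(usage_ping))  # {"counts": {"issues": 3, "issues_open": 2}}
```

The payload is what then travels through steps 2 and 3: validated and stored by the version app, then loaded into the warehouse.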
The Data Team is not the DRI for that whole flow (apart from making the data available in Periscope, which is fully automated). @jeromezng (EM, Telemetry) and @sid_reddy (PM, Telemetry) are better touchpoints for any questions about the process explained above!
That said, @nagyv-gitlab, I'm super happy to help with any data-related questions.
One last link regarding the difference between Snowplow and usage pings: here.