Standard usage logging

As part of the Cloud Spend working group, we are trying to understand the usage patterns of users on GitLab.com so that we can optimise our cloud costs.

Something that has become apparent while doing this analysis is that it's very difficult to attribute resource usage on GitLab.com to different types of users.

I would like to start thinking about how we could address this with the smallest viable change that would start providing us with more insight into usage patterns.

Note that any effort in this area would also be very useful for the abuse team. cc @wvandenberg @rostrander @jurbanc

I'm open to any ideas that others have, and have one proposal myself 👇

Standardised Usage Structured Logs

This approach is very similar to what @stanhu did for audit logging, but applied to usage: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/22471

For events related to usage, we emit usage logs in a structured format (NDJSON, i.e. newline-delimited JSON).
The usage logs contain the following common details (when appropriate):
- timestamp
- usage_type: the type of usage
- user_id + username
- project_id + full_path
- plan_id: the subscription plan, e.g. free tier, Gold, etc.
- correlation_id: to correlate usage with other log data, allowing deeper analysis in future.
- amount: the quantity of usage
- unit: the unit of usage, e.g. minute or mebibyte
Some usage_type events would have additional details specific to their domain. For example, for runner minutes it's important to know which runner manager the jobs were executed on, since some runners are owned by GitLab while others are owned by users.
These events can be ingested into the logging system for later analysis. Currently, we use Elastic for this purpose, but in future this could even be done with Splunk or Hadoop jobs.
While probably not helpful for small installations, this data may also be useful for cross-departmental billing in larger self-managed instances.
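To make the record shape concrete, here is a minimal Python sketch of serialising one usage event as a single NDJSON line. It is not GitLab code (GitLab is a Rails application), and the function names `usage_log_line` and `emit_usage_event` are hypothetical; the sketch only assumes the common fields listed above, plus optional domain-specific extras such as `runner_manager`.

```python
import json
import sys
from datetime import datetime, timezone


def usage_log_line(usage_type, user_id, username, project_id, full_path,
                   plan_id, correlation_id, amount, unit, **extra):
    """Serialise one usage event as one NDJSON line (without the trailing newline).

    `extra` carries domain-specific fields, e.g. runner_manager for ci_runner.
    Hypothetical helper, for illustration only.
    """
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "usage_type": usage_type,
        "user_id": user_id,
        "username": username,
        "project_id": project_id,
        "full_path": full_path,
        "plan_id": plan_id,
        "correlation_id": correlation_id,
        "amount": amount,
        "unit": unit,
    }
    event.update(extra)
    # json.dumps escapes newlines inside strings, so one event is always one line.
    return json.dumps(event, separators=(",", ":"))


def emit_usage_event(stream, **fields):
    """Append one event to a log stream, e.g. sys.stdout or an open file."""
    stream.write(usage_log_line(**fields) + "\n")


if __name__ == "__main__":
    emit_usage_event(sys.stdout, usage_type="ci_runner", user_id=5,
                     username="andrewn", project_id=123,
                     full_path="andrewn/pirate", plan_id=4,
                     correlation_id="oiquoei123as", amount=26, unit="minute",
                     runner_manager="gitlab")
```

Because every line is a complete JSON document, a log shipper can forward the file line by line without any multi-line parsing.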

Anything with a significant cost associated with it could be logged with this approach: for example CI runner minutes, artifact storage, registry images, LFS objects, etc.

Here are some examples:

CI Runner Usage

{
  usage_type: "ci_runner",
  user_id: 5,
  username: "andrewn",
  project_id: 123,
  full_path: "andrewn/pirate",
  plan_id: 4,                     // Gold tier customer
  correlation_id: "oiquoei123as", // For further investigation
  runner_manager: "gitlab",       // Using the GitLab runners
  amount: 26,
  unit: "minute"                  // The unit for measuring runner usage is minutes
}
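One open question is how `amount` gets computed for runner usage. A minimal sketch, assuming a hypothetical `job` dict carrying the identity fields plus start and finish times, and assuming partial minutes are rounded up (the real accounting rule may well differ):

```python
import math
from datetime import datetime


def runner_minutes(started_at: datetime, finished_at: datetime) -> int:
    """Whole runner minutes for one job.

    ASSUMPTION: partial minutes are rounded up; illustrative only.
    """
    seconds = (finished_at - started_at).total_seconds()
    if seconds < 0:
        raise ValueError("finished_at must not be earlier than started_at")
    return math.ceil(seconds / 60)


def ci_runner_event(job: dict) -> dict:
    """Build a ci_runner usage record from a hypothetical `job` dict."""
    return {
        "usage_type": "ci_runner",
        "user_id": job["user_id"],
        "username": job["username"],
        "project_id": job["project_id"],
        "full_path": job["full_path"],
        "plan_id": job["plan_id"],
        "correlation_id": job["correlation_id"],
        "runner_manager": job["runner_manager"],  # which runner fleet ran the job
        "amount": runner_minutes(job["started_at"], job["finished_at"]),
        "unit": "minute",
    }
```

Under that rounding rule, a job running from 10:00:00 to 10:25:10 would be logged as 26 minutes.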

LFS storage

{
  usage_type: "lfs_storage",
  user_id: 5,
  username: "andrewn",
  project_id: 123,
  full_path: "andrewn/pirate",
  plan_id: 4,                     // Gold tier customer
  correlation_id: "oiquoei123as", // For further investigation
  amount: 221,
  unit: "mebibyte"                // The unit for measuring LFS object storage is mebibytes
}
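Note that a mebibyte is 2^20 (1,048,576) bytes, not 10^6. A small Python sketch of turning a project's LFS object sizes into the `amount` of an `lfs_storage` record; the input list, the helper names and the round-up rule are assumptions for illustration:

```python
MIB = 1024 * 1024  # 1 mebibyte (MiB) = 2**20 bytes; a megabyte (MB) is 10**6 bytes


def bytes_to_mib_ceil(n_bytes: int) -> int:
    """Ceiling division in integers, so large sizes suffer no float rounding."""
    return -(-n_bytes // MIB)


def lfs_storage_event(object_sizes_bytes, **identity) -> dict:
    """Total LFS storage for one project, in whole mebibytes.

    `object_sizes_bytes` is a hypothetical iterable of LFS object sizes;
    `identity` carries user_id, username, project_id, etc.
    ASSUMPTION: partial mebibytes are rounded up.
    """
    total = sum(object_sizes_bytes)
    return {"usage_type": "lfs_storage", **identity,
            "amount": bytes_to_mib_ceil(total), "unit": "mebibyte"}
```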

Git storage

Git storage usage logs could be written following GC/housekeeping operations:

{
  usage_type: "git_storage",
  user_id: 5,
  username: "andrewn",
  project_id: 123,
  full_path: "andrewn/pirate",
  plan_id: 4,                        // Gold tier customer
  correlation_id: "2312asdas31dasd", // For further investigation
  amount: 1503,
  unit: "mebibyte"                   // The unit for measuring git storage is mebibytes
}
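Since these records would be written after GC/housekeeping, one simple way to get the `amount` is to measure the repository's on-disk size at that point. The sketch below walks a repository directory with `os.walk`; the hook point, the `repo_path` argument, the helper names and the round-up rule are all assumptions, not how GitLab or Gitaly actually measure size:

```python
import os


def directory_size_bytes(path: str) -> int:
    """Sum the sizes of regular files under `path`, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so nothing outside the repository gets counted.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def git_storage_event(repo_path: str, **identity) -> dict:
    """Build a git_storage record; `identity` carries user_id, project_id, etc."""
    size = directory_size_bytes(repo_path)
    return {"usage_type": "git_storage", **identity,
            "amount": -(-size // (1024 * 1024)),  # round up to whole MiB
            "unit": "mebibyte"}
```

Objects shared through Git alternates (for example fork object pools) live outside the repository directory, so they would not be counted by this approach.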

Reporting

Once we are recording usage and are able to attribute it to users and plans, we can run queries such as:

- How many runner minutes is a particular user using?
- How many runner minutes are being used by each plan tier on GitLab.com?
- Which users are storing the most LFS objects?

We will be able to keep track of these and feed them into our KPIs.
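Because each line is self-contained JSON, these queries are easy to prototype offline before building dashboards in Elastic. A rough Python sketch over NDJSON lines, grouping `amount` by any field. One subtlety: runner minutes are increments that can be summed, but storage records are point-in-time sizes, so only the latest snapshot per project should be counted. All function names here are hypothetical:

```python
import json
from collections import Counter


def parse_ndjson(lines):
    """Yield one dict per non-blank NDJSON line."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def total_by(events, usage_type, key):
    """Sum `amount` for one usage_type, grouped by `key` (e.g. "username", "plan_id")."""
    totals = Counter()
    for event in events:
        if event.get("usage_type") == usage_type:
            totals[event[key]] += event["amount"]
    return totals


def latest_snapshots(events, usage_type):
    """Keep only the newest storage record per project.

    ASSUMPTION: every timestamp is an ISO 8601 UTC string in the same format,
    so plain string comparison orders them chronologically.
    """
    latest = {}
    for event in events:
        if event.get("usage_type") != usage_type:
            continue
        current = latest.get(event["project_id"])
        if current is None or event["timestamp"] > current["timestamp"]:
            latest[event["project_id"]] = event
    return list(latest.values())
```

For example, `total_by(events, "ci_runner", "plan_id")` answers the per-tier question, and `total_by(latest_snapshots(events, "lfs_storage"), "lfs_storage", "username").most_common(10)` lists the heaviest LFS users.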

cc @jeremy @broyer1 @wwright @jarv @glopezfernandez

Edited Apr 18, 2019 by Andrew Newdigate