Runner Fleet Dashboard - Admin View: Runner Compute Costs
Release notes
{placeholder for release notes}
Problem(s) to solve
Users who host their Fleet of runners on cloud platforms (AWS, GCP, Azure, and more) do not have an easy way to know how much their runners cost. Today, they find the total compute cost in their cloud platform and then try to attribute it manually to jobs run in GitLab. They also have no easy way to identify the top users of their Instance runners (the ones any project can use), so they cannot “charge” those groups or projects accordingly, or update their Fleet to optimize pipeline performance for certain groups and projects.
Out of scope
Finally, users don't have a way to optimize their costs based on previous data. For example, if they are spending $1K per week, how do they lower costs while still maintaining pipeline performance? As another example, if one project accounts for the majority of the runner costs, how can the user optimize their Fleet (or even their company organization) to account for that project's higher usage?
Intended users
Self-managed platform engineers who are using cloud platforms to create their Fleet of autoscaling runners (this is 80% of our users who bring their own runners).
Related research
I need to understand which projects are the biggest users of the shared runner pools that my team maintains.
For cost-distribution, the systems team wants to see who is using what type of runner and for how long.
As a GitLab admin, in order to track costs for shared runner minutes (instance) incurred by applications, I need a report of shared runner minutes usage per asset id.
Requirements
- Only support the following cloud providers for this feature:
  - GCP, AWS, Azure
- Only provide cost visibility for runners that are:
  - Instance type
  - Docker Autoscaler or Instance Autoscaler
- Users must be able to download the cost report as a .csv file from within the UI.
User experience goal
The user must be able to understand, at a glance, the total compute costs for their Runner Fleet.
Proposal (TBD)
Prior art from PM
Metrics date range selection options:
The default time range for the view is the current month.
Relative dates
- Last 7 days
- Last 30 days
Absolute dates
- Last month
- This month
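The date range options above could be resolved to concrete start and end dates roughly as follows. This is a minimal sketch, not a proposed implementation: the function name and the option keys are hypothetical, and it assumes that relative ranges include the current day and that all ranges are inclusive.

```python
from datetime import date, timedelta


def resolve_date_range(option: str, today: date) -> tuple[date, date]:
    """Return the inclusive (start, end) dates for a date range option.

    Assumption: "Last N days" counts today as one of the N days.
    """
    if option == "last_7_days":
        return today - timedelta(days=6), today
    if option == "last_30_days":
        return today - timedelta(days=29), today
    if option == "this_month":
        # From the first day of the current month up to today.
        return today.replace(day=1), today
    if option == "last_month":
        # Step back one day from the first of this month to land in last month.
        first_of_this_month = today.replace(day=1)
        last_of_previous = first_of_this_month - timedelta(days=1)
        return last_of_previous.replace(day=1), last_of_previous
    raise ValueError(f"unknown date range option: {option}")
```

Passing `today` explicitly keeps the function deterministic, which makes the default "current month" view easy to test.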
Filter results options
The default view displays the total runner fleet cloud costs for all projects, organized by group name. Results can be filtered by:
- Group
- Project
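To illustrate the default view and the two filters, the sketch below totals costs per project and organizes them by group name. The record shape (`group`, `project`, and `cost` keys) and the function name are assumptions made for this example, not an existing GitLab data model.

```python
from collections import defaultdict


def costs_by_group(jobs, group=None, project=None):
    """Sum job costs per project, organized by group name.

    `jobs` is an iterable of dicts with hypothetical keys
    "group", "project", and "cost". The optional `group` and
    `project` arguments act as the dashboard filters.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for job in jobs:
        if group is not None and job["group"] != group:
            continue
        if project is not None and job["project"] != project:
            continue
        totals[job["group"]][job["project"]] += job["cost"]
    # Convert to plain dicts, with groups sorted by name.
    return {name: dict(projects) for name, projects in sorted(totals.items())}
```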
Cost dashboard - Panel 1 - total costs and trends
- Total cloud costs = sum of cloud costs for all projects.
- Cost trends = (current_period_costs) - (previous_period_costs) for all projects.
Example query and output:
SELECT
    round(sum(ci_job_compute_cost)) AS cloud_costs,
    bar(cloud_costs, 0, 100, 80)
FROM ci_finished_builds
WHERE (created_at >= toDateTime('2023-07-01 00:00:00')) AND (created_at <= NOW())
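The two Panel 1 figures can be sketched in a few lines. This example assumes that the "previous period" is the window of equal length immediately before the selected range, which the proposal does not yet specify. The record keys (`day`, `cost`) and function names are hypothetical.

```python
from datetime import date, timedelta


def total_cost(jobs, start: date, end: date) -> float:
    """Sum the cost of all jobs whose day falls in [start, end]."""
    return sum(job["cost"] for job in jobs if start <= job["day"] <= end)


def cost_trend(jobs, start: date, end: date) -> float:
    """Return current period costs minus previous period costs.

    Assumption: the previous period has the same number of days as
    the current one and ends the day before `start`.
    """
    period_length = end - start + timedelta(days=1)
    previous_start = start - period_length
    previous_end = start - timedelta(days=1)
    current = total_cost(jobs, start, end)
    previous = total_cost(jobs, previous_start, previous_end)
    return current - previous
```

A positive result means costs went up compared with the previous period, and a negative result means they went down.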
Cost dashboard Panel 2 - cost chart
- Displays the total cost per day for all projects.
- Chart filters:
- Daily - the default chart time period is daily.
- Monthly - switches the chart time period to monthly.
- Bar chart - the default chart type.
- Line chart - changes the chart type to a line chart.
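The daily and monthly chart periods amount to bucketing the same cost records by a different key. A minimal sketch, assuming per-job records with hypothetical `day` and `cost` keys:

```python
from collections import defaultdict
from datetime import date


def cost_series(jobs, period: str = "daily"):
    """Return a sorted list of (label, total_cost) points for the chart.

    `period` is "daily" (the default) or "monthly". Labels are ISO
    dates ("2023-07-01") or year-month strings ("2023-07").
    """
    if period not in ("daily", "monthly"):
        raise ValueError(f"unknown chart period: {period}")
    totals = defaultdict(float)
    for job in jobs:
        day = job["day"]
        if period == "daily":
            label = day.isoformat()
        else:
            label = f"{day.year:04d}-{day.month:02d}"
        totals[label] += job["cost"]
    # ISO-style labels sort chronologically as plain strings.
    return sorted(totals.items())
```

The chart type (bar or line) only changes how these points are drawn, so it needs no separate aggregation.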
Configuration options for runner worker compute costs
Option 1 - the user enters the required cost attributes in the runner details.
The required cost attributes are:
- Runner Worker Machine Type - runner_worker_compute_type
- Runner Worker Compute Cost Per Hour - runner_worker_compute_cost_hr
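With these two attributes, a per-job compute cost could be derived from the job duration. The sketch below assumes the cost scales linearly with job duration and ignores idle or startup time on the runner worker. The class and function names are hypothetical, and the hourly rate in the usage example is a made-up value, not a vendor list price.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunnerCostAttributes:
    """Cost attributes entered by the user in the runner details."""
    runner_worker_compute_type: str
    runner_worker_compute_cost_hr: float


def job_compute_cost(attrs: RunnerCostAttributes, duration_seconds: float) -> float:
    """Estimate a job's compute cost as hourly rate x hours used.

    Assumption: cost is proportional to job duration only.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must not be negative")
    return attrs.runner_worker_compute_cost_hr * duration_seconds / 3600


# Usage: a 30-minute job on a worker with a hypothetical rate of 0.5 per hour.
attrs = RunnerCostAttributes("t2.medium", 0.5)
half_hour_cost = job_compute_cost(attrs, 1800)
```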
Option 2 - automatically retrieve the required cost attributes from the runner host
- Theoretically, we can implement a solution that automatically retrieves the instance type information from the instance. For example, you can use the AWS CLI and the command `aws ec2 describe-instances --instance-ids`.
Example command with output:
aws ec2 describe-instances \
--query "Reservations[*].Instances[*].{PublicIP:PublicIpAddress,Type:InstanceType,Name:Tags[?Key=='Name']|[0].Value,Status:State.Name}" \
--filters "Name=instance-state-name,Values=running" "Name=instance-type,Values='t2.medium','t2.micro'" \
--output table
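If the command is run with `--output json` and without `--query`, the instance type could be extracted from the response as sketched below. The function name is hypothetical. The sketch relies only on the `Reservations`, `Instances`, `InstanceId`, and `InstanceType` fields of the `describe-instances` response, and it covers AWS only.

```python
import json


def instance_types(describe_instances_json: str) -> dict[str, str]:
    """Map each instance ID to its instance type.

    `describe_instances_json` is the raw JSON text printed by
    `aws ec2 describe-instances --output json` (no --query applied).
    """
    data = json.loads(describe_instances_json)
    result = {}
    for reservation in data.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            result[instance["InstanceId"]] = instance["InstanceType"]
    return result
```

The returned instance type would populate `runner_worker_compute_type`. The hourly cost still has to come from another source.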
Note - OpenCost.io is "a vendor-neutral open source project for measuring and allocating infrastructure and container costs." However, as of 2023-07-26, the solution is "built for Kubernetes", so we have to explore other options for inputting the public cloud compute cost data into GitLab.
Open technical questions
- For the MVC, should we default to USD as the currency designation?
- For the MVC, do we attempt to use an automated solution to retrieve the compute host specs and vendor list price per hour for compute, or have the user manually enter that data?
- How do we track changes in runner_worker_compute_cost?
Disclaimer
This page may contain information related to upcoming products, features and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes. Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.