Plan to improve bottom line of .com
GitLab.com is expensive to run, and its costs are growing quickly. We need a comprehensive plan to reduce costs and increase revenue.
Since a markdown table is hard to keep evergreen, we're now tracking issues related to GitLab.com profitability in two places:
- A gitlab-com board, owned by @andrewn: https://gitlab.com/groups/gitlab-com/-/boards/993893?label_name[]=Cloud%20Spend
- A gitlab-org board, owned by @jeremy: https://gitlab.com/groups/gitlab-org/-/boards/1005109
Please also see the GitLab.com profitability agenda. Feel free to join and participate in the working group.
Deprecated content as of March 2019
What's Next and Why
The highest priority is committed use discounting for CI runners, owned by @dawsmith and in progress as of 2019-03-05.
Reducing costs
Issue | Owner | Comments | Anticipated impact |
---|---|---|---|
Committed use discounting for CI runners | Infrastructure | No blockers. We should forecast our vCPU usage for our CI runner fleet and commit usage to achieve some immediate relief. | Very High |
Move gitlab-production from SUDs to CUDs | Infrastructure | No blockers. | High |
10GB limit should include all storage types | Manage | Development needed. For us to implement this limit, we need to give users the ability to understand how they are using their allocated 10GB at the project level (at minimum) and the group level. | High |
Size down underutilized instances | Infrastructure | No blockers. Google's billing panel is recommending that we size down some of our instances, which we should investigate doing. GCP's billing panel estimates possible savings of $17K/month. | Medium/High |
Handle worst registry offenders | Jeremy | No blockers. We should handle users who are posting images to the container registry at a high pace, either individually or by setting a ceiling on frequency. | Medium/High |
Prune registry | Package | Development needed. Waiting on both adding cleanup and GCS support so we can run it on GitLab.com. | Medium/High |
Fix expired artifact worker and manually remove expired build artifacts if needed | Release | No blockers. Fixed in 11.8. Enables us to expire build artifacts by setting `artifacts: expire_in`. Artifacts are currently kept forever on GitLab.com, so we need to decide and execute on a limit. We planned on manually expiring artifacts first, but the worker may be able to pick this up and this may not be needed. | Medium |
Enforce limits and expire build artifacts | Release/Verify | Discussing in https://gitlab.com/gitlab-org/gitlab-ce/issues/41057. | Medium |
Reduce sidekiq node provisioning | Infrastructure | Needs investigation. Figure out new instance sizes we're comfortable with, experiment, roll out the new sizing everywhere. | Medium |
Move some object storage to cheaper storage | Infrastructure | Needs investigation. We considered this previously and decided the cost savings weren't worth pursuing. This may have changed. | Medium |
Require a credit card before allowing free runner access | Growth | For discussion, development needed | ? |
Build artifact restrictions | Verify | For discussion, development needed | ? |
Discuss Secure feature cost optimizations | Secure | For discussion, development needed. Not scheduled, approach TBD. | ? |
Enforce limits and expire container registry images | Package | Asked again in https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/6137#note_137235147. Previous feedback was that implementing a universal 10GB limit is the best solution. | ? |
Move old block storage to object storage | Gitaly / Create | For discussion. No proposal, not scheduled. | ? |
Reduce transfer costs by moving all storage to GCP | Infrastructure | Investigating, see https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/4684#note_137274878. | ? |
Change CI minute offering (e.g. restrict free minutes, require a credit card on file) | Fulfillment/Verify | For discussion, development needed. Not scheduled, approach TBD. | ? |
Migrate artifact store off S3 | Infrastructure | Needs development. In Backlog. | ? |
Deduplicate fork storage | Gitaly | Development needed. | Low |
Clean old registry buckets | Infrastructure | No blockers. We should be able to clean out this bucket. 42TB should be ~$800/month. | Low |
Remove unused Gitter volumes | Infrastructure | No blockers. Looks like ~500GB, so no huge cost savings here. | Low |
Infrastructure | Complete. After a consultation with Rackspace, we learned that most cloud optimization tools do not support GCP well and GCP's billing suite already provides cost savings recommendations that are superior to tools like Cloud Health. | N/A | |
Infrastructure | Complete. Non-CI compute is benefiting from a sustained use discount; CI runners are not discounted and could see significant savings from committed use discounting. | N/A | |
Jeremy | Complete. This appears to be validated; we shut down most Azure resources. The remaining ~$12K/month extends to ancillary services and is still needed. These aren't redundant services and are needed somewhere (if not in Azure, then on GCP). | N/A |
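
The committed use forecasting mentioned in the table above can be sketched as a small calculation. This is a rough illustration only, not the method Infrastructure used: the function names, the hourly usage series, the on-demand rate, and the discount are all hypothetical inputs, and the model assumes that usage above the commitment is billed at the plain on-demand rate with no other discounts applied.

```python
from typing import Sequence


def total_cost(usage: Sequence[float], commit: float,
               on_demand_rate: float, discount: float) -> float:
    """Cost of an hourly vCPU usage series for a given commitment.

    Assumption: the committed vCPUs are billed every hour at the
    discounted rate, whether or not they are used, and any usage above
    the commitment is billed at the on-demand rate.
    """
    committed_rate = on_demand_rate * (1.0 - discount)
    committed = commit * committed_rate * len(usage)
    overflow = sum(max(0.0, u - commit) for u in usage) * on_demand_rate
    return committed + overflow


def best_commitment(usage: Sequence[float],
                    on_demand_rate: float, discount: float) -> float:
    """Return the commitment level that minimizes total cost.

    Under this model the cost is piecewise linear and convex in the
    commitment, so the minimum is at zero or at an observed usage level.
    """
    candidates = sorted(set(usage) | {0.0})
    return min(candidates,
               key=lambda c: total_cost(usage, c, on_demand_rate, discount))


if __name__ == "__main__":
    # Hypothetical forecast: a steady baseline with one burst hour.
    forecast = [100, 100, 400, 100]
    commit = best_commitment(forecast, on_demand_rate=1.0, discount=0.5)
    print(commit, total_cost(forecast, commit, 1.0, 0.5))
```

In this toy forecast, committing to the steady baseline is cheaper than committing to the burst peak, which is the intuition behind forecasting the CI runner fleet before committing usage.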
Increasing revenue
Issue | Owner | Comments | Anticipated impact |
---|---|---|---|
Manual true-up cycle for GitLab.com groups | Jeremy | No blockers. We don't bill for new group members, so we've built up a backlog of groups on GitLab.com that have more members than they've paid for. Billing for new group members on add (see issue in row immediately below this one) requires building new functionality, so we should do a manual sweep first. | High |
Bill for new group members on add | Fulfillment | Development needed. Scheduled for 11.10. To solve for groups adding more members after subscription, we should charge for new members. We'll do a manual true-up to decouple the unclaimed revenue from building this feature. | High |
Begin selling add-on CI runner minutes for GitLab.com | Fulfillment | Development needed. Scheduled for 11.9. We should sell add-on CI minutes for shared runners on GitLab.com. Can we extend this to self-hosted? | High |
Begin selling add-on storage | Fulfillment | Development needed. Scheduled for Q2. Dependent on 10GB storage limit counting for all storage types above. | Medium |
Free users must pay for CI consumption | Growth, Fulfillment | Development needed, discussion needed. See gitlab-org&835 for shared runner access for self-managed. | ? |
Private runners | Verify, Fulfillment | Development, discussion, and scoping needed. | ? |
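
The manual true-up sweep described in the table above amounts to comparing each group's member count with its paid seat count. The sketch below is a minimal illustration of that comparison. The `Group` fields, the `true_up` function, and the per-seat price are hypothetical and do not reflect GitLab's actual billing data model or pricing.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Group:
    # Hypothetical shape of a subscription record.
    name: str
    paid_seats: int
    members: int


def true_up(groups: List[Group],
            price_per_seat: float) -> List[Tuple[str, int, float]]:
    """List groups with more members than paid seats.

    Returns (group name, unbilled seats, amount owed) tuples,
    largest amount first, so the sweep can start with the biggest gaps.
    """
    rows = []
    for g in groups:
        extra = g.members - g.paid_seats
        if extra > 0:
            rows.append((g.name, extra, extra * price_per_seat))
    return sorted(rows, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    sample = [Group("a", 10, 12), Group("b", 5, 5), Group("c", 3, 10)]
    for name, seats, owed in true_up(sample, price_per_seat=4.0):
        print(name, seats, owed)
```

Sorting by amount owed is a design choice for a manual process: it lets the sweep recover most of the unclaimed revenue from a small number of groups first.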
Analysis
Analysis | Owner | Comments |
---|---|---|
Analyze shared runner use on GitLab.com | Jeremy | Before we consider more substantive changes to how we offer shared runner minutes on GitLab.com, we should first understand current usage and our costs. |
Segment GitLab.com costs by user type | Growth | Dependencies on other issues, specifically setting up Snowplow. GitLab's own resource consumption is much higher than the average group's, so we should understand, track, and allocate it separately. |
Explore how registry images are used | Growth | |
CI minute use | Growth | |
Links/resources
Dashboards
Documents
- GitLab.com meeting for December 2018
- Master Google doc for September 2018 GCP planning
- GitLab.com Discount Analysis summary used during GCP capacity planning