Actionable analytics MVC
Problem to solve
Bringing CI/CD metrics to the forefront of pipeline management will help teams reach the next level of maturity in their build pipelines and realize more benefit from their DevOps transformation.
Create a dashboard aimed at managers that fits all information on one page and is potentially suitable for display on large monitors or television screens. Allow users to select which projects/groups they would like to see (rather than only using existing groups). Either take over the existing "Charts" page (and rename it) or introduce a new page for reports and analytics.
Report on some of the following statistics within the dashboard (for example):
- Which pipelines are green or red right now?
- Are we releasing more or less frequently over time? How many releases do we average per week/day?
- What is the average time from first commit to production?
- What is the average job time? What are our longest-running jobs, on average or individually? What are the answers to the same questions for pipelines?
- What % of time do pipelines stay broken?
- What is the status of our environments? What's deployed where?
- What is our pipeline success rate?
- How many concurrent branches/pipelines do we have?
- What % of our tasks are manual and how is that changing over time?
- How long do tests take to run on average? Is it getting slower or faster?
- Which tests take the most time?
- Which tests are flaky (often fail for reasons unrelated to code changes)?
- Who are the most active contributors?
- How has the pipeline duration changed over time? Are we getting better or worse? When actively trying to improve pipeline duration, can we track progress?
- Would splitting our pipeline into more parallel jobs be worth it?
- What is the right number of parallel jobs to optimize wall-clock time while paying attention to cost?
- How much does our CI/CD pipeline cost us?
- How much time is spent queueing (waiting for runners to be available)?
- How is our pipeline health?
- How often does master break? Who/what most often causes it?
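To make a few of the questions above concrete, here is a minimal sketch of how pipeline success rate, average pipeline duration, and the share of time a pipeline stays broken could be computed. The `PipelineRun` record and all function names are hypothetical and are not an existing API. "Broken" is assumed to mean that the most recently finished run failed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PipelineRun:
    # Hypothetical record shape, used only for illustration.
    started_at: datetime
    finished_at: datetime
    status: str  # assumed to be "success" or "failed"


def success_rate(runs):
    """Fraction of runs that succeeded, or None when there are no runs."""
    if not runs:
        return None
    return sum(r.status == "success" for r in runs) / len(runs)


def average_duration(runs):
    """Mean wall-clock duration of the runs, or None when there are no runs."""
    if not runs:
        return None
    total = sum((r.finished_at - r.started_at for r in runs), timedelta())
    return total / len(runs)


def broken_fraction(runs, window_start, window_end):
    """Share of the window during which the latest finished run was failed.

    Assumption: a failure keeps the pipeline "broken" from the moment the
    failed run finishes until the next run finishes (or the window ends).
    """
    ordered = sorted(runs, key=lambda r: r.finished_at)
    broken = timedelta()
    for i, run in enumerate(ordered):
        if run.status != "failed":
            continue
        start = max(run.finished_at, window_start)
        if i + 1 < len(ordered):
            end = min(ordered[i + 1].finished_at, window_end)
        else:
            end = window_end
        if end > start:
            broken += end - start
    return broken / (window_end - window_start)
```

Tracking these values per week would also answer the "is it getting better or worse over time?" variants of the questions.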
We can also look at common Jenkins usage among target customers to understand what they want and need most.
Meltano Use Case
This would be very useful for us in Meltano and Analytics.
We've just updated our schedules to run a bunch of separate pipelines so we can have better control over when data sources are extracted. For example, Zuora runs every 2 hours, but SFDC is set to run every 30 minutes.
From here I'd like to be able to see a few things. I'd like to be able to click in on a specific schedule and see how that pipeline has been running over the past X amount of time. If it's run 12 times in the last 24 hours, how many failed/succeeded?
I'd also like to know when a job was last run (not just when the next run is). I want to be able to look at that page and understand the state of my runs so I can say "OK, SFDC data is 23 minutes old and will be updated in 7 minutes" or something like that.
Some of this may be achievable with filters on the jobs / pipelines pages, but better analytics and a holistic understanding of CI jobs would be great!
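As a rough illustration of the per-schedule view described above, the sketch below summarizes one schedule's recent runs: how many succeeded or failed in the last 24 hours, how old the latest successful data is, and how long until the next run. The function name, the `(started_at, status)` tuple shape, and the fixed `interval` are all assumptions made for this example. Real schedules are typically cron-based.

```python
from collections import Counter
from datetime import datetime, timedelta


def summarize_schedule(runs, interval, now, window=timedelta(hours=24)):
    """Summarize a schedule's runs (hypothetical helper).

    runs: list of (started_at, status) tuples, status "success" or "failed".
    interval: assumed fixed spacing between scheduled runs.
    """
    past = [(t, s) for t, s in runs if t <= now]
    # Count outcomes only for runs inside the look-back window.
    counts = Counter(s for t, s in past if t >= now - window)
    last_run = max((t for t, _ in past), default=None)
    last_success = max((t for t, s in past if s == "success"), default=None)
    return {
        "succeeded": counts["success"],
        "failed": counts["failed"],
        # Age of the most recent successful extraction, if any.
        "data_age": None if last_success is None else now - last_success,
        # A negative value would mean the next run is overdue.
        "next_run_in": None if last_run is None else last_run + interval - now,
    }
```

With a 30-minute interval and a last successful run that started 23 minutes ago, `data_age` is 23 minutes and `next_run_in` is 7 minutes, which matches the SFDC example.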
What does success look like, and how can we measure that?
(If no way to measure success, link to an issue that will implement a way to measure this)