AI Impact Analytics Value Map | Exploration On Quantifying ROI & Productivity
Problem to solve
We need to credibly prove a positive impact of AI, and its ROI, to customers before the hefty price of the add-on can be justified. Examples of customer needs:
- One customer needs the ability to track metrics to justify their use of AI.
- Another needs the ability to track metrics to report on ROI and justify their investment in Code Suggestions.
- One customer's main goal in investing in AI is to accelerate developer productivity, so they need metrics to prove the investment's ROI.
- One customer is starting a Code Suggestions trial with specific success criteria to track, to prove its success against the other AI tools they are evaluating.
- One customer is evaluating GitLab Duo to improve developer productivity and would like to measure the actual increase in developer productivity.
What is Value?
- A monetary number
- A derivative currency and/or asset
Generic guidelines for estimating value
How value is estimated is incredibly variable and depends on specific business models and products, but there are a few general good practices to follow:
- Estimate the beneficial effects of the change
- Make the value equal to the cost of alternatives
- Make assumptions visible
- The goal is accuracy, not precision.
- The Lean “5-Whys” technique can help identify a feature's value. Keep asking “why” until you can identify one or more benefit types.
Value Profiles
Leveraging AI features within GitLab will naturally fall into one or more of the following universal value profiles:
- Increase Revenue: Increase sales to new or existing customers. Delight or disrupt to increase market share and size.
- Protect Revenue: Improvements and incremental innovation to sustain current market share and revenue figures.
- Reduce Costs: Costs we are currently incurring that can be reduced, yielding greater efficiency and an improved margin or contribution.
- Avoid Costs: Improvements to sustain the current cost base. We are not currently incurring costs, but we may in the future.
Hypothesis: How does AI impact value realized?
- Using AI throughout the SDLC directly correlates to an acceleration in realizing one or more of the four value profiles.
- For organizations that rely on software as a critical component of their value streams and want to accelerate the rate at which value is realized, cycle time (measured from when code writing starts to when a change is deployed) is the crucial bottleneck in all cases: every change that contributes to one of the four universal value profiles requires writing code, merging it into a branch, and deploying it to an environment where it can benefit the software's target consumers.
- AI features that yield time saved outside of the "code lifecycle," such as summarizing comments, generating work item descriptions, and anything we aim to build in the future, directly correlate to decreasing overall lead time (as defined by when a problem or opportunity is first identified [work item created] to when the solution is deployed to an environment where it can benefit the software's target consumers).
- In all cases, AI features that improve quality and security (fewer bugs, technical debt, incidents, vulnerabilities) correlate to organizations spending less time (money) resolving quality and security problems and more time (money) on adding net new value or improving existing value, which directly correlates to an acceleration in top-line revenue (faster acquisition/expansion) and protecting revenue (decreasing churn/increasing retention).
AI Features Value Map
Mermaid diagram inputs
flowchart TB
subgraph Income Statement
Revenue
Min["-"]
CostRev["Cost of Revenue"]
Min1["-"]
OpEx["Operating Expenses"]
Eq["="]
Profit
end
IRev["Goal: Increase Revenue"] --> Revenue
PRev["Goal: Protect Revenue"] --> Revenue
RCost["Goal: Reduce Costs"] --> OpEx
ACost["Goal: Avoid Costs"] --> OpEx
RCost --> CostRev
ACost --> CostRev
Cycle["Tactic: Reduce Cycle Time"] --> New["Strategy: Add New Value"]
Cycle --> Improve["Strategy: Improve Existing Value"]
Cycle --> Quality["Strategy: Improve Quality"]
Cycle --> Security["Strategy: Improve Security"]
New --> IRev
New --> PRev
Improve --> IRev
Improve --> PRev
Quality --> RCost
Quality --> ACost
Quality --> IRev
Quality --> PRev
Security --> ACost
Security --> PRev
subgraph Plan["Plan Lifecycle"]
direction TB
IdentP["Task: Identify Problem or Opportunity"] --> Prioritize["Task: Prioritize"]
Prioritize --> Understand["Task: Understand"]
Understand --> IdentS["Task: Identify Solution"]
end
subgraph Code["Code Lifecycle"]
direction TB
Write["Task: Write Code"] --> Review["Task: Review Code"]
Review -->Apply["Task: Apply Changes"]
Apply --> Approve["Task: Approve MR"]
Approve -->Merge["Task: Merge"]
Merge -->Build["Task: Build"]
Build --> Deploy["Task: Deploy"]
end
Plan --->Code
Code --> Cycle
CodeSug["AI: Code Suggestions"] -->|Metrics: <br> Decrease Time to MR and Review <br> Acceptance Rate|Write
CodeExp["AI: Code Explanation"]-->|Metrics: <br> Decrease Time to MR and Review|Write
CodeExp -->Review
TestGen["AI: Test Generation"]-->|Metrics: <br> Decrease Time to MR and Review |Write
TestGen -->|Metrics: <br> Decrease in defects/incidents|Quality
DuoCli["AI: Duo for CLI"] -->|Metrics: <br> Decrease Time to MR and Review <br> Acceptance Rate|Write
AutoSquash["AI: Automated Merge/Squash Commits"] -->|Metrics:<br>Decrease time to merge|Merge
VulnRes["AI: Vulnerability Resolution"]-->|Metrics: <br> Decrease time to identify solution|IdentS
VulnRes-->|Metrics: <br> Time to resolve vulnerabilities|Write
VulnRes-->|Metrics: <br> Decrease in vulnerabilities|Security
RCA["AI: Root Cause Analysis"]-->|Decreased time to passing pipeline|Build
VSF["Value Stream Forecasting"] -->|Metrics: <br> Predict future throughput|Plan
VSF -->|Metrics: <br> Predict future throughput|Code
Summary["Discussion Summary"]-->|Metrics: <br> Time to understand|Understand
Description["Issue Description Generation"]-->|Metrics: <br> Decreased time to capture next steps|IdentP
AIReview["Duo Code Review"] -->|Metrics: <br> Decrease time in review|Review
AI Impact Metrics
Code suggestions
- Time saved: This is the primary value proposition for code suggestions. However, it isn't easy to measure due to several factors:
- Establishing a baseline on the time it would take a single engineer to open an IDE, write some code, and then commit the code without code suggestions vs. with code suggestions.
- Two commits rarely have the same composition, which leads to comparing apples to oranges.
- It isn't easy to measure consistently and accurately.
- Acceptance Rate: This metric is leveraged as a proxy for time saved and inferred as a quality metric for the effectiveness of the model(s). In theory, the higher the acceptance rate, the more time an engineer saves.
- Cycle time: Time from the first commit to the MR being merged (or from a commit/MR mentioning an issue to that issue being closed). This is a lagging indicator, whereas quantifying time savings during the "code writing" step can be inferred as a leading indicator for improving cycle time (see the sketch after this list for how both could be computed). When we start measuring cycle time (after the first commit), the time savings gained by using code suggestions are moot because they happen before the commit. Using code suggestions frequently, and even an abnormally high acceptance rate overall, would have little to no role in the cycle time for an MR. Additionally, many factors could negatively impact an MR's cycle time, such as:
- How large is the change (LoC, files changed/added, scope of the MR, etc.)?
- How many reviewers need to review the MR?
- How long does an MR wait for reviews to be started and completed?
- How many change requests/suggestions are created based on the reviews? How long does it take from the completion of reviews until the original author picks the MR back up to implement the requested changes?
- How many approvals are required?
- How many merge conflicts need to be resolved before merging?
- How healthy are the pipelines, and how many times does a broken pipeline have to be fixed before merging?
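A minimal sketch of how acceptance rate and cycle time could be computed, assuming hypothetical per-suggestion and per-MR records; the field names and data shapes below are illustrative assumptions, not an existing GitLab API:

```python
from datetime import datetime

# Hypothetical event records; field names are illustrative assumptions.
suggestion_events = [
    {"user": "dev1", "suggestions_shown": 20, "suggestions_accepted": 13},
    {"user": "dev2", "suggestions_shown": 8, "suggestions_accepted": 2},
]

merge_requests = [
    {"first_commit_at": datetime(2024, 5, 1, 9, 0), "merged_at": datetime(2024, 5, 3, 15, 30)},
    {"first_commit_at": datetime(2024, 5, 2, 10, 0), "merged_at": datetime(2024, 5, 2, 18, 45)},
]

# Acceptance rate: proxy for time saved (suggestions accepted / suggestions shown).
accepted = sum(e["suggestions_accepted"] for e in suggestion_events)
shown = sum(e["suggestions_shown"] for e in suggestion_events)
acceptance_rate = accepted / shown if shown else 0.0

# Cycle time: first commit to MR merged, averaged across MRs (lagging indicator).
cycle_times_hours = [
    (mr["merged_at"] - mr["first_commit_at"]).total_seconds() / 3600 for mr in merge_requests
]
avg_cycle_time_hours = sum(cycle_times_hours) / len(cycle_times_hours)

print(f"Acceptance rate: {acceptance_rate:.1%}")
print(f"Average cycle time: {avg_cycle_time_hours:.1f} hours")
```

Grouping the same computation by user, project, or language would support the cohort questions in the next section.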
The bottom line
- Understanding and quantifying time savings during the "code writing" task is paramount to quantifying ROI for an AI feature contributing to an engineer potentially saving time.
- The ROI for saving time is not reduced costs but rather a decrease in overall lead time, which is a positive feedback loop for all of the core strategies and goals organizations are targeting (e.g., spending less time writing code while achieving the same outcome accelerates adding new value, improving existing value, improving quality, and improving security).
- If we want to correlate the adoption of AI features with decreasing lead and cycle time, we need to include all of the efficiency features integrated with the different "code lifecycle" tasks, such as code explanation, test generation, root cause analytics, code review, Duo for CLI, and code suggestions. Ideally, we would be able to quantify the time saved with the adoption of each additional AI feature beyond code suggestions for a single "code lifecycle" (e.g., an engineer opens the IDE, writes code, commits code, reviews the code, applies changes, approves the MR, local pipelines pass, MR is merged, and default branch pipelines pass).
- Later, we can extend the scope to include time savings during the "plan lifecycle" and everything that happens from the default branch pipelines passing to deploy (and eventually monitor), but there is high variability in the different deployment strategies organizations employ.
Things worth exploring further
- Can we measure the time spent in the following workflows for users who use code suggestions and those who do not?
  - Before (not using code suggestions): Single User > Open IDE > Write Code > Commit Code
  - After (using code suggestions): Single User > Open IDE > Write Code > Accept Suggestions > Commit Code
  - Time spent for single users before/after, aggregated to the account level, then aggregated across all accounts.
- Can we group these into logical cohorts such as batch size, language, number of code suggestions accepted, ...?
- Is there any correlation between commit frequency and code suggestions usage for an individual user?
- What are the downstream behaviors of code written with suggestions vs. code written without:
- Are there fewer changes requested by reviewers for code written with suggestions?
- Does using code suggestions have any impact on MR review time?
- Does the number of code suggestions applied in a single "code writing" session have any correlation with downstream behaviors on an MR?
- Does the scope of the MR (lines/files changed) correlate with cycle time?
- Does the number of reviewers on an MR correlate with cycle time?
- Does the number of approvals on an MR correlate with cycle time?
- Does the number of requested changes have any correlation with cycle time?
- Does using multiple AI features (e.g., Code Explanation, Duo for CLI, Test Generation) during "code writing" lead to increased time savings compared to using code suggestions alone or not using them at all?
- What impact does Vulnerability Resolution have on decreasing overall vulnerabilities, and on the cycle time for resolving a single vulnerability, for customers using this Duo Enterprise feature vs. those that do not?
Ideas
- AVG(LOC Accepted / Total Commit LOC) = xx% increase in developer productivity
- LOC Accepted / LOC Suggested = LOC Acceptance Rate
- (daily_commit_count_with_acceptance_per_engineer - daily_commit_count_without_acceptance_per_engineer) / 100
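A hedged sketch of how these candidate formulas could be expressed; the inputs (accepted/suggested LoC, per-commit LoC totals, and per-engineer daily commit counts) are hypothetical placeholders for whatever instrumentation eventually supplies them:

```python
def loc_acceptance_rate(loc_accepted: int, loc_suggested: int) -> float:
    """LOC Accepted / LOC Suggested."""
    return loc_accepted / loc_suggested if loc_suggested else 0.0


def productivity_increase_pct(loc_accepted_per_commit: list[int], total_loc_per_commit: list[int]) -> float:
    """AVG(LOC Accepted / Total Commit LOC), expressed as a percentage."""
    ratios = [a / t for a, t in zip(loc_accepted_per_commit, total_loc_per_commit) if t]
    return 100 * sum(ratios) / len(ratios) if ratios else 0.0


def commit_frequency_delta(daily_commits_with_acceptance: int, daily_commits_without_acceptance: int) -> float:
    """(daily_commit_count_with_acceptance_per_engineer - daily_commit_count_without_acceptance_per_engineer) / 100."""
    return (daily_commits_with_acceptance - daily_commits_without_acceptance) / 100


# Illustrative usage with made-up values.
print(loc_acceptance_rate(loc_accepted=120, loc_suggested=300))          # 0.4
print(productivity_increase_pct([30, 10, 0], [100, 50, 80]))             # ~16.7
print(commit_frequency_delta(daily_commits_with_acceptance=12, daily_commits_without_acceptance=9))
```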
Root Cause Analysis
The primary value driver here is time savings.
Hypothesis: When pipelines are broken, end-user value (in the form of unshipped code) is blocked from moving into a production environment. The time it takes to troubleshoot and resolve broken pipelines directly correlates to Mean Time To Merge (MTTM) and Cycle Time (issue first mentioned in a commit to the issue being closed). At a more granular level, we can derive time savings by comparing the average time to resolve failed pipelines without Root Cause Analysis to the average time to resolve failed pipelines with Root Cause Analysis. We can further measure adoption by counting unique users of RCA and RCA utilization rates at a macro level.
Important risk to be aware of: If we expose these metrics to customers and there is no change in pipeline resolution time when using RCA, it will potentially negatively impact sales and adoption of GitLab Duo Enterprise.
We can measure the impact of Root Cause Analysis with the following metrics:
- Total pipelines: Count of pipelines run over [insert period]
- Failed pipelines: Count of pipelines that failed over [insert period]
  - Measured as: pipeline starts and fails before successfully completing
  - Omit manually stopped pipelines and cancelled pipelines
- Pipeline resolution time: The average time it takes to resolve a pipeline after failure over [insert period]
  - Start event: A pipeline fails
  - End event: The same pipeline passes with no failures
- Root Cause Analysis pipeline resolution time: The average time it takes to resolve a pipeline after failure when Root Cause Analysis was used over [insert period]
  - Start event: A pipeline fails
  - Middle event: Root Cause Analysis button clicked
  - End event: The same pipeline passes with no failures
  - Display the average time and the % difference from pipeline resolution time without RCA
  - Improved pipeline success ratio (measured through https://gitlab.com/gitlab-org/gitlab/-/pipelines/charts)
- Root Cause Analysis utilization: The percentage of failed pipelines where RCA was used compared to overall failed pipelines over [insert period]
  - Failed pipelines using RCA / Total failed pipelines
- Root Cause Analysis Time Saved: Approximate hours/days saved by using Root Cause Analysis (see the sketch after this list)
  - (Pipeline resolution time (avg) - Root Cause Analysis pipeline resolution time (avg)) * count of failed pipelines where Root Cause Analysis was used to resolve the pipeline
- Root Cause Analysis unique users over [insert period]
  - Count of returning users (unique user has used Root Cause Analysis previously)
  - Count of new users (this is the first time the user has used RCA in [insert period])
- Correlated metrics:
  - Mean Time To Merge: Average time from when an MR is opened to when it is merged.
  - Cycle Time: Average time from when a commit (or MR) first mentions an issue to when that issue is closed.
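A minimal sketch of the time-saved and utilization calculations above, assuming we can collect resolution durations (hours from a pipeline failing to the same pipeline passing) split by whether Root Cause Analysis was used; the sample values are illustrative only:

```python
# Hypothetical resolution durations in hours; real data would come from pipeline events.
resolution_hours_without_rca = [6.5, 3.0, 8.2, 4.4]   # pipeline fails -> same pipeline passes
resolution_hours_with_rca = [2.1, 1.8, 3.0]           # RCA button clicked between fail and pass

avg_without_rca = sum(resolution_hours_without_rca) / len(resolution_hours_without_rca)
avg_with_rca = sum(resolution_hours_with_rca) / len(resolution_hours_with_rca)

# RCA utilization: failed pipelines using RCA / total failed pipelines.
total_failed = len(resolution_hours_without_rca) + len(resolution_hours_with_rca)
rca_utilization = len(resolution_hours_with_rca) / total_failed

# Time saved: (avg resolution time without RCA - avg with RCA) * count of RCA-resolved pipelines.
hours_saved = (avg_without_rca - avg_with_rca) * len(resolution_hours_with_rca)

print(f"RCA utilization: {rca_utilization:.0%}")
print(f"Approximate hours saved: {hours_saved:.1f}")
```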
Vulnerability Resolution
Hypothesis: Using GitLab Duo's Vulnerability Resolution will decrease the amount of time to resolve a found vulnerability, leading to an increased rate of vulnerabilities resolved and a decrease in total open vulnerabilities over time. Additionally, we can quantify the time saved using vulnerability resolution vs. not using it.
Important risk to be aware of: If we expose these metrics to customers and there is no change in vulnerability resolution time when using vulnerability resolution, it will potentially negatively impact sales and adoption of GitLab Duo Enterprise.
We can measure the impact of Vulnerability Resolution with the following metrics:
- Resolved vulnerabilities: Count of resolved vulnerabilities over [insert period]
- Mean Time To Resolve (MTTR): The average time to resolve a vulnerability over [insert period]
  - Start event: Vulnerability is first recorded
  - End event: Vulnerability status = resolved
- Vulnerability Resolution MTTR (VRMTTR): The average time to resolve a vulnerability with Vulnerability Resolution over [insert period]
  - Start event: Vulnerability is first recorded
  - Middle event: Explain with AI or Resolve with AI is selected
  - End event: Vulnerability status = resolved
  - Display in time and % change from vulnerability lead time
- Vulnerability Resolution utilization: Percentage (%) of vulnerabilities resolved with the Vulnerability Resolution AI feature over [insert period]
  - Count of vulnerabilities resolved with VR / count of total vulnerabilities resolved
- Vulnerability Resolution Time Saved: Approximate hours/days saved by using Vulnerability Resolution (see the sketch after this list)
  - (Vulnerability lead time (avg) - Vulnerability Resolution lead time (avg)) * count of vulnerabilities resolved with Vulnerability Resolution
- Vulnerability Resolution unique users over [insert period]
  - Count of returning users (unique user has used Vulnerability Resolution previously)
  - Count of new users (this is the first time the user has used Vulnerability Resolution in [insert period])
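The same pattern applies here; a sketch assuming we can collect per-vulnerability resolution durations (in days) split by whether the AI feature was used, with illustrative sample values:

```python
# Hypothetical resolution durations in days, split by whether Vulnerability Resolution was used.
mttr_days_without_vr = [14.0, 21.5, 9.0, 30.0]
mttr_days_with_vr = [4.0, 6.5, 3.0]

avg_mttr = sum(mttr_days_without_vr) / len(mttr_days_without_vr)
avg_vrmttr = sum(mttr_days_with_vr) / len(mttr_days_with_vr)

# Utilization: vulnerabilities resolved with VR / total vulnerabilities resolved.
vr_utilization = len(mttr_days_with_vr) / (len(mttr_days_with_vr) + len(mttr_days_without_vr))

# % change from the baseline MTTR, plus approximate days saved.
pct_change = (avg_vrmttr - avg_mttr) / avg_mttr * 100
days_saved = (avg_mttr - avg_vrmttr) * len(mttr_days_with_vr)

print(f"VR utilization: {vr_utilization:.0%}")
print(f"VRMTTR vs MTTR: {pct_change:+.0f}%")
print(f"Approximate days saved: {days_saved:.1f}")
```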
