Code Analytics - Files committed together
Problem to solve
As a follow-up to https://gitlab.com/gitlab-org/gitlab-ee/issues/12104: when we look at our code hotspots, it's useful to know how many files depend on them, so that we understand how important a piece of code is to the codebase. We would also like to surface dependencies that are not explicitly referenced in the code. Temporal coupling (files that repeatedly change in the same commits) is correlated with defects, and it can highlight bugs that come from seemingly unrelated places when one piece of code is referenced in unrelated parts of the codebase. By looking at how temporal coupling changes over time, using our filters, we can spot the evolution of architectural technical debt. For microservices architectures, we wouldn't expect a commit to touch more than one service.
Intended users
Engineering managers (EMs)
Further details
Proposal
This chart will be positioned after the code hotspots graph on https://gitlab.com/gitlab-org/gitlab-ee/issues/12683 and the same filters should apply.
- Let's show a graph of which files change together in commits during a specific timeframe. The stronger the relationship (% of shared commits), the redder the relationship on a green-to-red gradient, or the thicker the line, depending on how we choose to represent the coupling. Possible chart types and references:
  - https://observablehq.com/@d3/hierarchical-edge-bundling
  - https://echarts.apache.org/examples/en/editor.html?c=graph-circular-layout&theme=light
  - https://echarts.apache.org/examples/en/editor.html?c=graph (another alternative)
  - https://sail.cs.queensu.ca/Downloads/WCRE2006_AnimatedVisualizationOfSoftwareHistoryUsingEvolutionStoryboards.pdf
What we expect to see is code dependencies, relations between code and tests, copy and paste between files, etc.
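To make "% of shared commits" concrete, here is a minimal sketch of one way the relationship strength could be computed. The proposal does not define the denominator. This sketch assumes it is the number of commits that touch either file, so a pair that always changes together scores 1.0. The function name `coupling_strength` and the input shape (one list of file paths per commit) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def coupling_strength(commits):
    """Map each file pair to shared commits / commits touching either file.

    `commits` is a list of commits, each given as a list of file paths.
    The denominator is an assumption; the proposal only says
    "% of shared commits".
    """
    file_commits = Counter()   # commits touching each file
    pair_commits = Counter()   # commits touching both files of a pair
    for files in commits:
        unique = sorted(set(files))  # ignore duplicates, fix pair order
        file_commits.update(unique)
        pair_commits.update(combinations(unique, 2))

    strength = {}
    for (a, b), shared in pair_commits.items():
        touching_either = file_commits[a] + file_commits[b] - shared
        strength[(a, b)] = shared / touching_either
    return strength


# Illustrative data only, not taken from a real repository.
commits = [
    ["app/models/user.rb", "spec/models/user_spec.rb"],
    ["app/models/user.rb", "spec/models/user_spec.rb", "app/models/group.rb"],
    ["app/models/group.rb"],
]
strength = coupling_strength(commits)
```

In this sample the model and its spec always change together, which is the "relations between code and tests" case mentioned above. The value can then drive either the colour gradient or the line thickness.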
- On the code hotspots graph, let's show the number of other files the file has been updated together with in commits during the period. For example:
  - In the first commit, the file was updated together with 3 other files: the count is 3.
  - In the second commit, it was updated together with 1 file that is not among the 3 above: the total to show in the treemap is 4.

  The more often a file changes with others, the more likely it is to be architecturally important.
- I propose filters by group, subgroup, and project so that we don't load all the information at once, i.e. a user needs to select one or more groups in order to see the graph.
- Link to churn?
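The treemap number from the hotspots example above (3 files in the first commit, plus 1 new file in the second, giving 4) can be sketched as follows. This assumes the figure counts distinct co-changed files rather than co-change events, which is how the example reads. The function name `co_changed_file_counts` and the file names are hypothetical.

```python
from collections import defaultdict


def co_changed_file_counts(commits):
    """For each file, count the distinct other files it was committed with."""
    partners = defaultdict(set)
    for files in commits:
        unique = set(files)
        for path in unique:
            # Everything else in the same commit is a co-changed partner.
            partners[path] |= unique - {path}
    return {path: len(others) for path, others in partners.items()}


# Mirrors the example: 3 partners in the first commit, 1 new one in the second.
commits = [
    ["hotspot.rb", "a.rb", "b.rb", "c.rb"],
    ["hotspot.rb", "d.rb"],
]
counts = co_changed_file_counts(commits)
```

Because partners are stored in a set, a file that appears alongside the hotspot in several commits is only counted once.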
Permissions and Security
Tiers: Only clients/namespaces that are Premium and above can see the information.
Roles: The future state for roles will be Reporters and above, for the projects/groups they have permission for and that are Premium and above. For now: Admins.
Documentation
Testing
What does success look like, and how can we measure that?
What is the type of buyer?
Solution
Let's add a circular chart with two main components:
- Circle: a file, labelled with its name (the bigger the circle, the more times that file was changed)
- Line: 2 files were committed together (the thicker the line, the more times these two files were committed together)
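As a rough illustration of the data behind the two components, the sketch below turns a list of commits into circles (files with a change count) and lines (file pairs with a shared-commit count). The `nodes`/`links` layout and the `name`, `source`, `target`, and `value` keys are assumptions chosen for illustration, not a confirmed format for any charting library.

```python
from collections import Counter
from itertools import combinations


def build_graph(commits):
    """Build chart data: one node per file, one link per co-committed pair."""
    change_counts = Counter()  # drives circle size
    pair_counts = Counter()    # drives line thickness
    for files in commits:
        unique = sorted(set(files))
        change_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    nodes = [
        {"name": name, "value": count}
        for name, count in sorted(change_counts.items())
    ]
    links = [
        {"source": a, "target": b, "value": count}
        for (a, b), count in sorted(pair_counts.items())
    ]
    return {"nodes": nodes, "links": links}


# Illustrative data only.
graph = build_graph([
    ["a.rb", "b.rb"],
    ["a.rb", "b.rb", "c.rb"],
    ["a.rb"],
])
```

Here `a.rb` would get the largest circle (3 changes) and the `a.rb`/`b.rb` line would be the thickest (2 shared commits).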
Colours:
- Circle:
  - Default: #79AEE4
  - Hover: #1F78D1 ($blue-500)
  - All others when one is hovered: #BCD7F2
  - Connected to hovered: #4C93DB
- Line:
  - Default: $blue-500 at 20% opacity
  - Connected to hovered: $blue-500 at 60% opacity
  - All others when one is hovered: $blue-500 at 10% opacity
Mockups (images not included): Default, Hover
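The colour states listed above can be expressed as two small lookup functions. This is a sketch only: the function names, the `neighbours` mapping (file to the set of files it is connected to), and the use of `rgba(...)` strings are assumptions. The RGB triple is derived from #1F78D1, which the spec gives as the value of `$blue-500`.

```python
BLUE_500 = (31, 120, 209)  # #1F78D1, given in the spec as $blue-500


def circle_colour(node, hovered, neighbours):
    """Return the circle fill for `node`; `hovered` is None when nothing is hovered."""
    if hovered is None:
        return "#79AEE4"  # default
    if node == hovered:
        return "#1F78D1"  # hover
    if node in neighbours.get(hovered, set()):
        return "#4C93DB"  # connected to hovered
    return "#BCD7F2"      # all others when one is hovered


def line_colour(source, target, hovered):
    """Return the line stroke as $blue-500 at the opacity the spec lists."""
    if hovered is None:
        opacity = 0.2     # default
    elif hovered in (source, target):
        opacity = 0.6     # connected to hovered
    else:
        opacity = 0.1     # all others when one is hovered
    red, green, blue = BLUE_500
    return f"rgba({red}, {green}, {blue}, {opacity})"


# Hypothetical graph: a.rb is connected to b.rb only.
neighbours = {"a.rb": {"b.rb"}, "b.rb": {"a.rb"}, "c.rb": set()}
```

For example, hovering `a.rb` would leave `b.rb` at #4C93DB and fade `c.rb` to #BCD7F2.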