[BE] Productivity Analytics - Type of Work (pre-defined) - data harvesting
Problem to solve
This particular issue is about gathering Type of Work data only. Check related issues for other parts.
This issue adds a new chart to the Productivity Analytics page as described in https://gitlab.com/gitlab-org/gitlab-ee/issues/12246
BE Requirements
We should classify MR code as:
- New (number of LOC added),
- Churn (number of LOC changed/deleted that were previously modified less than 1 month ago),
- Refactoring (number of LOC changed/deleted that were previously modified more than 1 month ago).
We can process MR diffs block by block: a block counts as New if it consists only of added lines, and as Modified if it contains at least 1 deleted line. The Modified part should then be split into Refactoring or Churn.
Every MR against the repo's default branch should have 3 new metrics according to the lines it adds/modifies/removes:
- new_loc
- churned_loc
- refactored_loc
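The block-based classification above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the planned implementation: `DiffBlock`, `classify_loc`, and `CHURN_WINDOW_SECONDS` are hypothetical names, "1 month" is approximated as 30 days, only the deleted lines of a Modified block are counted, and a line exactly 30 days old is treated as Refactoring. The last-modified time of each deleted line is assumed to come from blame data gathered elsewhere.

```ruby
# Hypothetical sketch; none of these names exist in GitLab.
CHURN_WINDOW_SECONDS = 30 * 24 * 60 * 60 # assumption: "1 month" = 30 days

# One diff block (a run of consecutive changed lines).
#   added:              number of added lines in the block
#   deleted_line_times: last-modified Time of each deleted line
DiffBlock = Struct.new(:added, :deleted_line_times, keyword_init: true)

# Returns the three metrics for one MR, given its diff blocks and a
# reference time (for example the time the MR was merged).
def classify_loc(blocks, reference_time:)
  totals = { new_loc: 0, churned_loc: 0, refactored_loc: 0 }

  blocks.each do |block|
    if block.deleted_line_times.empty?
      # Only added lines: the whole block is New.
      totals[:new_loc] += block.added
      next
    end

    # At least one deleted line: the block is Modified.
    # Split its deleted lines by how recently they were last touched.
    block.deleted_line_times.each do |last_modified|
      age = reference_time - last_modified # seconds, as a Float
      key = age < CHURN_WINDOW_SECONDS ? :churned_loc : :refactored_loc
      totals[key] += 1
    end
  end

  totals
end
```

How added lines inside a Modified block should be counted is not settled by the requirements above; the sketch ignores them, and a real implementation would need to decide.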
Gitaly enhancements
- Need to modify Gitaly to support `git diff --color-moved` in a way that the ANSI color encoding is parsed. Purpose: Detect cut-and-paste line movements.
- Need to modify Gitaly to support `git diff --word-diff` (we can modify/subclass `Gitlab::Diff::Parser` to handle parsing it). Purpose: Detect line modifications and whitespace changes.
- Want to modify Gitaly to handle `git blame -L` with multiple `-L` options. Purpose: Improve performance; it's about 3x faster than ordinary `git blame`.
Note: Shelling out to git diff is currently possible, but not viable due to security, availability and versioning concerns.
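To illustrate the multiple `-L` idea, the sketch below collapses the line numbers of interest into contiguous ranges and builds the corresponding `git blame` argument list. It only constructs arguments and does not execute git; `line_ranges` and `blame_args` are hypothetical helper names, not part of Gitaly or GitLab, and the use of `--line-porcelain` is an assumption about which output format would be parsed.

```ruby
# Hypothetical helpers; not part of Gitaly or GitLab.

# Collapse line numbers into contiguous [first, last] ranges,
# e.g. [1, 2, 3, 7] becomes [[1, 3], [7, 7]].
def line_ranges(line_numbers)
  line_numbers.uniq.sort
              .slice_when { |a, b| b != a + 1 }
              .map { |run| [run.first, run.last] }
end

# Build the argument list for a single blame call covering all ranges.
# Each range becomes its own "-L <start>,<end>" option.
def blame_args(revision, path, line_numbers)
  range_flags = line_ranges(line_numbers).flat_map do |first, last|
    ['-L', "#{first},#{last}"]
  end

  ['blame', '--line-porcelain', *range_flags, revision, '--', path]
end
```

Restricting blame to the deleted lines of a diff this way means one call per file rather than one per range.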
Technical notes
- For details on determining the "default branch", see `def default_branch?` in app/services/git/branch_push_service.rb.
Edited by Dan Jensen
