AI translations overwrite human translations during MR merge due to Git change detection behavior

Context

This reasoning is extracted from @jcole-ext's comment on this MR.

While reviewing this MR, which pushed translated files to the main charts project, @marcel.amirault identified some unexpected reverts in the translations. The human-reviewed translations in Phrase looked correct, but the translations in the MR were incorrect. See the MR comment thread for more details and context.

Problem

Copied from @jcole-ext's comment thread:

Why GITTECHA-39 Human Translation MR has AI Translations

Originally, the Human translations were created in Phrase in GITTECHA-25. Then, in GITTECHA-39, AI translations overwrote some of the Human translations in Phrase. Translators reverted those segments to the original Human translations.

In globals.md, you can find segment 504 by searching for “Settings to configure the” in the source. In the Phrase Project for GITTECHA-25, this segment was reviewed and finalized by humans months ago.

More recently, in the Phrase Project for GITTECHA-39, AI overwrote these Human translations even though the source content had not changed. Human reviewers then reverted the segment to exactly what it was before.

GITTECHA-39 then created two Translation MRs: one for the AI Translations and one for the Human Translations. The problem is that the AI Translation MR updated segments that were 100% matches, so that MR contains changes to those segments. The Human Translation MR contains the original Human translations, because they were reverted in Phrase, so it has no updates to those segments.
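To make the asymmetry concrete, here is a minimal sketch that models a translated file as a hypothetical dict of segment IDs to strings. The values are placeholders, not the real globals.md translations, and `changed_segments` is an illustrative helper, not part of any tool in this workflow. Only segments that differ from the MR's base count as changes.

```python
# Hypothetical model: a translated file as {segment_id: translation}.
# The strings are placeholders, not the real globals.md content.
BASE = {504: "human-504", 505: "human-505"}      # main-translation before GITTECHA-39
AI_MR = {504: "ai-504", 505: "human-505"}        # AI overwrote segment 504
HUMAN_MR = {504: "human-504", 505: "human-505"}  # reviewers reverted 504 in Phrase


def changed_segments(base: dict[int, str], branch: dict[int, str]) -> dict[int, str]:
    """Return only the segments whose text differs from the base."""
    return {seg: text for seg, text in branch.items() if base.get(seg) != text}


print(changed_segments(BASE, AI_MR))     # {504: 'ai-504'}
print(changed_segments(BASE, HUMAN_MR))  # {}
```

The Human Translation MR carries the correct text for segment 504, but relative to its base it carries no change for it.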

[Image: Git blame of charts/globals.md for GITTECHA-39 Human Translations]

For these specific segments (where AI overwrote the Human translations and reviewers then reverted them), combining the two Translation MRs plays out as follows: the Human Translation MR has no changes to them and the AI Translation MR does, so git keeps the AI Translation MR's changes. This would have been the final outcome once both MRs were merged, no matter which method was used to merge them (rebase, merge from, or merge to).
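The decision rule can be sketched as a simplified per-segment three-way merge. Git actually merges line hunks rather than Phrase segments, but the rule it applies to each hunk is essentially this one. The segment data is hypothetical; segment 507 is invented only to show where a merge conflict would appear.

```python
def three_way_merge(base, ours, theirs):
    """Merge two branches against their common ancestor, segment by segment.

    Assumes all three dicts share the same segment IDs.
    Returns (merged, conflicts); conflicting segments are left out of merged.
    """
    merged, conflicts = {}, []
    for seg, b in base.items():
        o, t = ours[seg], theirs[seg]
        if o == t:        # both sides agree (possibly both unchanged)
            merged[seg] = o
        elif o == b:      # only "theirs" changed this segment: take it
            merged[seg] = t
        elif t == b:      # only "ours" changed this segment: keep it
            merged[seg] = o
        else:             # both changed it differently: a merge conflict
            conflicts.append(seg)
    return merged, conflicts


# Hypothetical placeholder data.
BASE = {504: "human-504", 507: "old-507"}
HUMAN_MR = {504: "human-504", 507: "human-507"}  # 504 reverted, 507 edited
AI_MR = {504: "ai-504", 507: "ai-507"}           # AI changed both

print(three_way_merge(BASE, HUMAN_MR, AI_MR))  # ({504: 'ai-504'}, [507])
print(three_way_merge(BASE, AI_MR, HUMAN_MR))  # ({504: 'ai-504'}, [507])
```

Swapping which branch is "ours" does not change the outcome: segment 504 silently takes the AI text, and only a segment both sides edited differently surfaces as a conflict.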

[Image: Git blame of charts/globals.md for GITTECHA-39 after Rasam merged the AI Translations]

Rasam merged what was already in the main-translation branch (the AI translations) into the Human Translation MR, which brought the AI changes into the MR. Had he rebased instead, git would have produced the same result. There was nothing reasonable Rasam could have done to prevent the Human translations from being overwritten with AI translations.
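A rebase replays the branch's own changes, measured against its old base, on top of the new base. The sketch below models that with hypothetical helpers (`diff`, `rebase`) and placeholder segments; it ignores conflict detection, which a real rebase would perform. Segment 506 is invented to show that a genuine human edit does survive.

```python
def diff(base, branch):
    """Segments whose text differs from the base."""
    return {seg: text for seg, text in branch.items() if base.get(seg) != text}


def rebase(old_base, branch, new_base):
    """Replay the branch's changes (relative to old_base) on top of new_base."""
    result = dict(new_base)
    result.update(diff(old_base, branch))
    return result


# Hypothetical placeholder data.
OLD_BASE = {504: "human-504", 506: "old-506"}
MAIN_AFTER_AI = {504: "ai-504", 506: "old-506"}   # main-translation after the AI merge
HUMAN_MR = {504: "human-504", 506: "human-506"}   # 504 reverted, 506 genuinely edited

print(rebase(OLD_BASE, HUMAN_MR, MAIN_AFTER_AI))  # {504: 'ai-504', 506: 'human-506'}
```

The human edit to 506 is replayed, but the revert of 504 is not a change relative to the old base, so the AI text stays.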

This is a fundamental aspect of git, the foundation of GitLab. Just as Contentful doesn’t store content as files but as fields, git doesn't merge files wholesale; it merges changes. Even though Argo hands GitLab a complete file, a merge or rebase compares that file against the common ancestor (the merge base) and keeps only what changed. Anything that is unchanged contributes nothing to the merge, so the other side's change wins.
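The effect of handing over a full file can be illustrated with Python's `difflib`. Git uses its own diff algorithm, not `difflib`, and the file lines here are placeholders, so this only demonstrates the principle: a complete file that is identical to the base produces an empty diff.

```python
import difflib

# Placeholder file contents, one segment per line.
base = ["Segment 504: human-504\n", "Segment 505: human-505\n"]
human_file = list(base)  # full file handed over, identical to the base
ai_file = ["Segment 504: ai-504\n", "Segment 505: human-505\n"]


def changed_lines(old, new):
    """Return only the removed/added lines of a unified diff (no headers)."""
    return [
        line
        for line in difflib.unified_diff(old, new, n=0)
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


print(changed_lines(base, human_file))  # []
print(changed_lines(base, ai_file))
```

The second call returns the removed `human-504` line followed by the added `ai-504` line; the first returns nothing, even though the human file was handed over in full.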

Solution going forward

We actually predicted this problem months ago, but we expected it to show up only as merge conflicts. In this case, it created no merge conflicts, so Rasam had no way to identify and replace all the AI translations.

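Our solution is to create an automated process that shifts the Human Translation MR’s base from before the AI merge to after it, so the human files are committed on top of a base that already contains the AI translations. Every segment where a human translation differs from the AI one then shows up as a change in the Human Translation MR, which means human translations will overwrite the AI translations every time instead of being merged together unpredictably.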
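The source does not describe how the automated process is built, so the sketch below only models the assumed effect: the human export is diffed against a base that already includes the AI merge. The helpers `diff` and `apply_changes` and all segment values are hypothetical.

```python
def diff(base, branch):
    """Segments whose text differs from the base."""
    return {seg: text for seg, text in branch.items() if base.get(seg) != text}


def apply_changes(base, changes):
    """Return a copy of base with the given segment changes applied."""
    result = dict(base)
    result.update(changes)
    return result


# Hypothetical placeholder data.
BEFORE_AI = {504: "human-504"}     # base before the AI merge
AFTER_AI = {504: "ai-504"}         # main-translation after the AI merge
HUMAN_EXPORT = {504: "human-504"}  # reviewed translations from Phrase

# Current flow: the human MR's changes are measured against the pre-AI base.
old_result = apply_changes(AFTER_AI, diff(BEFORE_AI, HUMAN_EXPORT))
# Proposed flow: the human export is measured against the post-AI base.
new_result = apply_changes(AFTER_AI, diff(AFTER_AI, HUMAN_EXPORT))

print(old_result)  # {504: 'ai-504'}
print(new_result)  # {504: 'human-504'}
```

Moving the base turns the silent revert into an explicit change, so the human text wins without relying on anyone spotting it during review.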