Deleted, moved, and redirected source language files

Background

The Technical Writing (TW) team moves and deletes source language files. The Argo GitLab Integration does not automatically move or delete the corresponding target files, so stale target files accumulate over time.

The redirects file is also directly tied to these deletions. It is not currently set up for multilingual support, so the hundreds of redirects in it will not work on the Japanese website.

Redirect Markdown files

When the Technical Writing team decides to move a whole file, or to move its contents into an existing file, they perform the move and replace the original file with a redirect Markdown file. This file is a stub that states where to redirect to and the date after which it should be deleted. The deletion date is three months after the file is created. Every month, the TW team goes through the files and deletes any expired redirects.

See doc/raketasks/spdx.md for an example.
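
The expiry rule described above can be sketched in Python. This is a minimal illustration, not the TW team's actual script: the stub layout shown here is an assumption, `remove_date` is the field name used elsewhere in this issue, and `redirect_to`, `parse_stub`, and `is_expired` are hypothetical names.

```python
import re
from datetime import date

# Hypothetical stub layout. `remove_date` is named in this issue;
# `redirect_to` and the body text are assumptions for illustration.
EXAMPLE_STUB = """---
redirect_to: 'new_location.md'
remove_date: '2025-04-01'
---

This document was moved to [another location](new_location.md).
"""

# Matches `key: value` lines for the two fields, with optional quotes.
FIELD = re.compile(
    r"^(redirect_to|remove_date):\s*['\"]?(.*?)['\"]?\s*$", re.MULTILINE
)


def parse_stub(text):
    """Return the redirect fields found in a stub as a dict."""
    return dict(FIELD.findall(text))


def is_expired(text, today):
    """Return True once `today` is later than the stub's remove_date."""
    fields = parse_stub(text)
    return today > date.fromisoformat(fields["remove_date"])
```

A monthly cleanup would call `is_expired(stub_text, date.today())` for each redirect file and delete the ones that return `True`.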

Stale files and Argo GitLab Integration

Argo GitLab Integration creates and modifies target files only when the corresponding source files are created or updated. When a source file is deleted, Argo stops tracking it in the Asset Dashboard and no longer sends it for translation. It makes no changes to the existing target file, such as deleting it to match the source.

The same applies to moving a file, which Argo treats as a deletion followed by the creation of a new file in another location.

Tasks

The following tasks must be completed for this issue to be resolved:

  • Automating cleanup of target files
    • Solution: Implement checks on forks that let us know what files we can delete
  • Updating redirects to work for all languages
  • Missing redirect targets
    • Solution: None decided yet

Automating cleanup of target files

Currently, files are deleted when redirect Markdown files in the Technical Writing documentation, located in /doc or /docs, are past their remove_date. Every month, the TW team checks these files with a script, deletes them from the docs, and adds an entry for each to the GitLab Docs redirects.yaml.

Removing target files is not part of this process. The TW team has stated that because the target content is under the localization team's control, the localization team should be responsible for the cleanup.

The consequences of not handling these stale files are quite minor:

  • End users will not notice a difference in the behavior of the website. There could be a very slight delay when loading a redirect page.
  • The target language directory will, over time, accumulate stale files, which will be visible in the repo.

Solution

To resolve this, the localization team shall implement tests in the pipeline. If target files exist in the repository that have no equivalent source file, the pipeline shall fail and list every such target file. The tests will trigger when files in the target directory are modified in one of our localization forks, so that changes upstream are not blocked.
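
A minimal sketch of such a check in Python follows. It assumes that target files mirror the source tree at the same relative paths. The function names and the directory names in the usage note are hypothetical, not the repository's confirmed layout.

```python
from pathlib import Path


def find_stale_targets(source_root, target_root):
    """Return target Markdown files with no source file at the same relative path."""
    source_root, target_root = Path(source_root), Path(target_root)
    stale = []
    for target in target_root.rglob("*.md"):
        relative = target.relative_to(target_root)
        # A target is stale when its source counterpart no longer exists.
        if not (source_root / relative).is_file():
            stale.append(relative.as_posix())
    return sorted(stale)


def check(source_root, target_root):
    """Print every stale target file and return a CI-style exit code."""
    stale = find_stale_targets(source_root, target_root)
    for path in stale:
        print(f"stale target file: {path}")
    # Non-zero fails the pipeline job; zero lets it pass.
    return 1 if stale else 0
```

A pipeline job could run `sys.exit(check("doc", "doc-locale/ja-jp"))`, where both directory names are placeholders, and could be limited to runs in which the target directory changed.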

Updating redirects to work for all languages

The redirects.yaml file contains hundreds of redirects, which a script uses to create redirects for GitLab Pages. The created redirects work only for English, not for other locales. There is a limit of 1000 total redirects, and over 300 are already used, so care must be taken not to multiply the number of redirects in use.

If this script is not updated, then the following behavior will occur:

  • Users who try visiting those redirects on the Japanese site will be led to a 404 page
  • If English fallback is implemented, then they will be directed to the English URL which will show them an English page, even though a translated Japanese version likely exists

Solution

Update the script to make redirects work for all languages. This can be accomplished without creating additional redirects by using splats.
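
The idea of one rule serving every locale can be modeled in Python. This sketch only illustrates the matching behavior. It is not the syntax used by the script or by GitLab Pages, and it is not taken from the MR. The `/ja-jp` URL prefix, the example paths, and all names are assumptions.

```python
# Hypothetical locale prefixes; this issue only names the Japanese site.
LOCALES = {"ja-jp"}

# One entry per redirect, written once without any locale prefix.
REDIRECTS = {"/old/page": "/new/page"}


def resolve(path, redirects, locales=LOCALES):
    """Return the redirect target for `path`, keeping any locale prefix, or None."""
    segments = path.strip("/").split("/")
    prefix = ""
    # Treat a leading locale segment as a wildcard part of the rule.
    if segments[0] in locales:
        prefix = "/" + segments[0]
        segments = segments[1:]
    target = redirects.get("/" + "/".join(segments))
    return None if target is None else prefix + target
```

Here a single entry in `REDIRECTS` resolves both `/old/page` and `/ja-jp/old/page`, so adding a locale does not add to the redirect count.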

MR that is implementing this change: gitlab-org/technical-writing/docs-gitlab-com!594 (merged)

Missing redirect targets

When a file is moved, the whole file has to be retranslated, which can take some time depending on the size of the file. A redirect file, by contrast, contains only a single string, so it can be translated very quickly. If a target redirect is missing, the whole site fails to build.

Current problems:

  • When a file is moved, Hugo Docs fails to build until translation is done

Solution

Adding a placeholder file would let the site build, but it would have to be done manually each time the failure occurs. A more robust solution is required.

Relevant links

Edited by Lauren Barker