Improve website translation rules based on Phrase parser findings

This MR was created with GitLab Duo. @opysaryuk prompted Duo Chat to review localization-team#674 (Configure Phrase for parsing marketing website yml content) and identify how the translation rules in specifications/core/website-translation-rules.md should be improved based on the edge cases and guidelines surfaced during the Phrase parser configuration work.

Why these changes

During the setup of the Phrase parser for marketing website YAML files, several gaps were identified between the existing translation rules and the actual requirements confirmed by the team. This MR addresses each gap:

1. Explicitly list id as non-translatable

@laurenbarker confirmed that id values are anchor link identifiers and must stay in English. The previous rules did not mention id at all.

2. Expand dataGA coverage to all variants

The previous rules only listed dataGaName:. @laurenbarker clarified that all dataGAxxx attributes are analytics attributes and must stay in English so analytics data remains in one language.

  • Relevant thread: "dataGAxxx = analytics - stays in english so all our data is in one language"
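
A minimal sketch of how items 1 and 2 could be checked by key name. The function name `is_non_translatable_key` and the case-insensitive prefix match are assumptions for illustration; the rules file itself may express this differently.

```python
def is_non_translatable_key(key: str) -> bool:
    """Return True for keys whose values must stay in English (items 1 and 2)."""
    # `id` values are anchor link identifiers and are never translated.
    if key == "id":
        return True
    # Assumption: every analytics attribute starts with "dataGa" or "dataGA",
    # so a case-insensitive prefix match covers dataGaName and all other variants.
    return key.lower().startswith("dataga")


print(is_non_translatable_key("dataGaName"))  # analytics attribute
print(is_non_translatable_key("title"))       # regular translatable key
```

A prefix match, rather than a fixed list of attribute names, means a new dataGAxxx variant is covered without updating the rule.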

3. Add config key block protection

@laurenbarker identified that class names, icons, images, and anything under the config key should not be translated. This structural rule was entirely missing.

  • Relevant thread: "TBD = There are others. Class names, icons, images, ect, anything under the config key."
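
The structural rule above could be sketched as a walk over the parsed YAML that skips everything under a `config` key. This is only an illustration: the function name `translatable_paths` and the sample page data are hypothetical, and the YAML is assumed to be already parsed into plain dicts and lists.

```python
def translatable_paths(node, path=()):
    """Yield (path, text) for every string that is not under a `config` key."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "config":
                continue  # class names, icons, images, etc. stay untouched
            yield from translatable_paths(value, path + (key,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from translatable_paths(item, path + (index,))
    elif isinstance(node, str):
        yield path, node


# Hypothetical page content, standing in for a parsed marketing YAML file.
page = {
    "title": "Pricing",
    "config": {"icon": "rocket", "class": "hero--dark"},
    "cards": [{"text": "Start free", "config": {"image": "/img/a.svg"}}],
}

for found_path, text in translatable_paths(page):
    print(found_path, text)
```

Skipping the whole subtree means nested `config` blocks are protected at any depth, not only at the top level of the file.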

4. Clarify href handling and post-processing script

The previous rules mentioned href: as a configuration key but didn't explain the workflow. @opysaryuk and @mjsibanez clarified that a link localization script (scripts/fix-links.mjs) handles locale-prefixing of internal URLs as post-processing, so the agent/translator must not modify href values at all.
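
To illustrate why translators should leave href values alone, here is a sketch of what locale-prefixing as a post-processing step might look like. This is not the contents of scripts/fix-links.mjs; the function name `localize_href`, the `/fr-fr/` prefix format, and the rule that internal URLs start with a single `/` are all assumptions.

```python
def localize_href(href: str, locale: str) -> str:
    """Prefix an internal URL with the locale; leave every other href unchanged."""
    # Assumption: internal links start with a single "/". External URLs,
    # anchors ("#faq"), and protocol-relative URLs ("//cdn...") are left alone.
    if not href.startswith("/") or href.startswith("//"):
        return href
    prefix = f"/{locale}"
    # Do not add the prefix twice if the link is already localized.
    if href == prefix or href.startswith(prefix + "/"):
        return href
    return prefix + href


print(localize_href("/pricing/", "fr-fr"))
print(localize_href("https://example.com/", "fr-fr"))
```

If a translator had already edited the href by hand, a step like this could produce a doubled or inconsistent prefix, which is why the rule keeps href values out of the translation pass entirely.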

5. Add heuristic for dynamically added non-translatable keys

@laurenbarker flagged that new non-translatable keys are added to the YAML files regularly by developers. A static list alone is insufficient, so this MR adds a general heuristic for identifying technical/config keys.

  • Relevant thread: "Additional non-translatable keys are added to the yml all the time too and will need a process to communicate these back to Phrase."

6. Add guidance for embedded Markdown and HTML in YAML

@wojciech.froelich identified that YAML content can contain both Markdown and HTML as embedded content, and @laurenbarker confirmed that the rules should account for both.

7. Add Schema.org structured data exception for config blocks

Added by GitLab Duo based on updated feedback. @hsmith-watson confirmed that Schema.org structured data fields (offers.name and offers.description) inside config blocks should be translated to match the page language for localized SEO, because Google's structured data guidelines recommend that schema content match the language of the page. @laurenbarker noted that these fields were previously left in English (e.g., in existing French translations), so translating them through the workflow will be an improvement.

  • Relevant thread: "Translating these is the correct move. Google's structured data guidelines recommend schema content matches the language of the page."
  • Relevant thread: @laurenbarker confirming offers.name and offers.description should be translated for localized SEO
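
The exception in item 7 could be sketched as an allowlist applied inside otherwise protected config blocks. The names `SCHEMA_EXCEPTIONS` and `translatable_strings`, the `schema` key in the sample data, and the choice to match on the last two keys of the path are assumptions made for illustration only.

```python
# Key pairs that are translated even inside a `config` block (item 7).
SCHEMA_EXCEPTIONS = {("offers", "name"), ("offers", "description")}


def translatable_strings(node, path=(), config_path=None):
    """Yield (path, text) for translatable strings.

    config_path is None outside a config block; inside one it holds the keys
    seen since entering the block (list indexes are not recorded).
    """
    if isinstance(node, dict):
        for key, value in node.items():
            if config_path is None:
                next_config = () if key == "config" else None
            else:
                next_config = config_path + (key,)
            yield from translatable_strings(value, path + (key,), next_config)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from translatable_strings(item, path + (index,), config_path)
    elif isinstance(node, str):
        # Outside config: translate. Inside config: only the allowlisted pairs.
        if config_path is None or config_path[-2:] in SCHEMA_EXCEPTIONS:
            yield path, node


# Hypothetical page content; the "schema" key is an invented example.
page = {
    "title": "Pricing",
    "config": {
        "icon": "rocket",
        "schema": {
            "offers": [{"name": "Premium", "description": "For teams", "price": "29"}],
        },
    },
}

for found_path, text in translatable_strings(page):
    print(found_path, text)
```

Keeping the exception as an explicit allowlist leaves the default for config blocks as "do not translate", so any further Schema.org fields would need to be added deliberately.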

Edited by Oleksandr Pysaryuk
