Improve website translation rules based on Phrase parser findings
This MR was created with GitLab Duo. @opysaryuk prompted Duo Chat to review localization-team#674 (Configure Phrase for parsing marketing website yml content) and identify how the translation rules in
specifications/core/website-translation-rules.md should be improved based on the edge cases and guidelines surfaced during the Phrase parser configuration work.
Why these changes
During the setup of the Phrase parser for marketing website YAML files, several gaps were identified between the existing translation rules and the actual requirements confirmed by the team. This MR addresses each gap:
1. Explicitly list id as non-translatable
@laurenbarker confirmed that id values are anchor link identifiers and must stay in English. The previous rules did not mention id at all.
- Relevant thread: "id = anchor links, need to stay in english"
2. Expand dataGA coverage to all variants
The previous rules only listed dataGaName:. @laurenbarker clarified that all dataGAxxx attributes are analytics attributes and must stay in English so analytics data remains in one language.
- Relevant thread: "dataGAxxx = analytics - stays in english so all our data is in one language"
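As a rough illustration of the "all variants" rule, the check below treats any key that starts with `dataGa`/`dataGA` as an analytics attribute. This is a minimal sketch, not the Phrase parser configuration itself: the function name and the case-insensitive prefix match are assumptions, and `dataGaLocation` is a hypothetical key used only as an example.

```python
def is_analytics_key(key: str) -> bool:
    """Return True for any dataGAxxx-style key (assumed prefix match)."""
    # Assumption: variants differ only in the suffix and in the casing of
    # "Ga"/"GA", so a case-insensitive prefix check covers all of them.
    return key.lower().startswith("dataga")


# dataGaName comes from the previous rules; dataGaLocation is hypothetical.
sample_keys = ["dataGaName", "dataGaLocation", "title"]
keep_in_english = [key for key in sample_keys if is_analytics_key(key)]
```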
3. Add config key block protection
@laurenbarker identified that class names, icons, images, and anything under the config key should not be translated. This structural rule was entirely missing.
- Relevant thread: "TBD = There are others. Class names, icons, images, ect, anything under the config key."
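One way to picture the block-level protection is a walker that collects translatable strings and skips everything nested under a `config` key. This is only a sketch over already-parsed YAML (plain dicts and lists); the function name and the sample page structure are assumptions, not taken from the rules file.

```python
from typing import Any, Iterator, Tuple


def translatable_strings(node: Any, path: Tuple = ()) -> Iterator[Tuple[Tuple, str]]:
    """Yield (path, text) for every string that is not under a config key."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "config":
                # Class names, icons, images, etc. live here: skip the subtree.
                continue
            yield from translatable_strings(value, path + (key,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from translatable_strings(item, path + (index,))
    elif isinstance(node, str):
        yield path, node


# Hypothetical page content, standing in for a parsed YAML file.
page = {
    "title": "Hello",
    "config": {"icon": "rocket", "class": "hero"},
    "items": [{"text": "Hi", "config": {"image": "/a.png"}}],
}
found = list(translatable_strings(page))
```

Skipping the whole subtree, rather than listing individual keys, means new keys added under `config` are protected without further changes.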
4. Clarify href handling and post-processing script
The previous rules mentioned href: as a configuration key but didn't explain the workflow. @opysaryuk and @mjsibanez clarified that a link localization script (scripts/fix-links.mjs) handles locale-prefixing of internal URLs as post-processing, so the agent/translator must not modify href values at all.
- Relevant thread: reference to the fix-links.mjs script
- Relevant thread: "href = urls, localized with post processing script"
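Because locale-prefixing is left to post-processing, a translated file should carry exactly the same `href` values as its source. A check along these lines could confirm that; it is a sketch under the assumption that both files are already parsed into dicts and lists, and the function names and sample data are hypothetical. It does not reproduce what scripts/fix-links.mjs does.

```python
from typing import Any, List


def href_values(node: Any) -> List[Any]:
    """Collect every value stored under an href key, in document order."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "href":
                found.append(value)
            else:
                found.extend(href_values(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(href_values(item))
    return found


def hrefs_untouched(source: Any, translated: Any) -> bool:
    """True when the translation left all href values exactly as in the source."""
    return href_values(source) == href_values(translated)


# Hypothetical source and translations.
source = {"cta": {"text": "Learn more", "href": "/pricing/"}}
good = {"cta": {"text": "En savoir plus", "href": "/pricing/"}}
bad = {"cta": {"text": "En savoir plus", "href": "/fr-fr/pricing/"}}
```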
5. Add heuristic for dynamically added non-translatable keys
@laurenbarker flagged that new non-translatable keys are added to the YAML files regularly by developers. A static list alone is insufficient, so this MR adds a general heuristic for identifying technical/config keys.
- Relevant thread: "Additional non-translatable keys are added to the yml all the time too and will need a process to communicate these back to Phrase."
6. Add guidance for embedded Markdown and HTML in YAML
@wojciech.froelich identified that YAML content can contain both Markdown and HTML as embedded content, and @laurenbarker confirmed to prepare for both.
- Relevant thread: "What can be embedded in YAML content? I see Markdown... but I also see HTML"
- Relevant thread: "Yup, prepare for both"
7. Add Schema.org structured data exception for config blocks
Added by GitLab Duo based on updated feedback. @hsmith-watson confirmed that Schema.org structured data fields (offers.name and offers.description) inside config blocks should be translated to match the page language for localized SEO. Google's structured data guidelines recommend schema content matches the language of the page. @laurenbarker noted that these fields were previously left in English (e.g., in existing French translations), and having them translated through the workflow will be an improvement.
- Relevant thread: "Translating these is the correct move. Google's structured data guidelines recommend schema content matches the language of the page."
- Relevant thread: @laurenbarker confirming offers.name and offers.description should be translated for localized SEO
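The exception can be pictured as an allowlist applied inside an otherwise protected `config` block. The sketch below assumes the block is already parsed into dicts and lists and matches on the last two keys of each path; the surrounding `schema` key, the `price` field, and the function name are hypothetical, since the exact nesting of `offers` is not described here.

```python
from typing import Any, Iterator, Tuple

# Only these two Schema.org fields are translated inside a config block.
SCHEMA_EXCEPTIONS = {("offers", "name"), ("offers", "description")}


def translatable_in_config(node: Any, path: Tuple = ()) -> Iterator[Tuple[Tuple, str]]:
    """Yield (path, text) for the allowlisted strings inside a config block."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from translatable_in_config(value, path + (key,))
    elif isinstance(node, list):
        for item in node:
            # List indices are left out of the path so offers[0].name
            # still ends in ("offers", "name").
            yield from translatable_in_config(item, path)
    elif isinstance(node, str):
        if path[-2:] in SCHEMA_EXCEPTIONS:
            yield path, node


# Hypothetical config block.
config = {
    "icon": "rocket",
    "schema": {
        "offers": [
            {"name": "Premium", "description": "For teams", "price": "29"}
        ]
    },
}
to_translate = list(translatable_in_config(config))
```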