Draft: fr-FR style guide with a 3-rule example spec
Related issue: Create and refine French (FR) style guide for G... (docs-site-localization#892)
This is a draft and a proposal, not a prescription. The intent is to invite feedback from Argos Multilingual language specialists.
What this is
A minimal, testable style guide for the FR (fr-FR) tech docs AI translation pipeline. Three rules only.
The word spec is intentional. We use it in the localization vision, and it appears in the GDATP folder structure and throughout the GDATP epic, because a spec is testable. A rule that cannot be verified (by regex, by heuristic, or by asking the model to judge) does not belong here.
This is the opposite of a textbook: no currency formatting, no tone-of-voice principles, no quality validation checklists. Three assertions, each with a before/after example, each applicable at segment level. The model translates segment by segment, not file by file, so every rule must work on a single segment in isolation.
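To make "testable" concrete, here is a minimal sketch of how one spec entry could be represented and checked against a single segment. Everything in it is an assumption for illustration: the `Rule` class, its field names, the `fr-colon-spacing` identifier, the example sentences, and the regex are hypothetical and are not taken from the GDATP pipeline or from the actual wording of the rules. The example uses the general French convention of a no-break space before a colon.

```python
import re
from dataclasses import dataclass

NBSP = "\u00a0"   # no-break space
NNBSP = "\u202f"  # narrow no-break space


@dataclass(frozen=True)
class Rule:
    """One hypothetical spec entry: an assertion plus a before/after pair."""
    rule_id: str
    assertion: str
    before: str            # segment that violates the rule
    after: str             # the same segment, corrected
    violation: re.Pattern  # regex that matches only when the rule is broken

    def passes(self, segment: str) -> bool:
        """Check one segment in isolation; no file-level context is used."""
        return self.violation.search(segment) is None

    def is_testable(self) -> bool:
        """The rule's own example pair must fail before and pass after."""
        return not self.passes(self.before) and self.passes(self.after)


# Illustrative rule only; the real rule text may differ.
colon_spacing = Rule(
    rule_id="fr-colon-spacing",
    assertion="A colon that ends a clause is preceded by a no-break space.",
    before="Remarque: ce paramètre est facultatif.",
    after=f"Remarque{NBSP}: ce paramètre est facultatif.",
    # A colon followed by whitespace or end of segment that is NOT
    # preceded by a (narrow) no-break space. Colons inside URLs such as
    # "https://" are followed by "/", so they are not matched.
    violation=re.compile(rf"(?<![{NBSP}{NNBSP}]):(?=\s|$)"),
)

print(colon_spacing.is_testable())
```

The `is_testable` check mirrors the admission criterion above: if a rule's own before/after pair cannot be told apart by the check, the rule is not verifiable in this form. A heuristic or a model-based judge could replace the regex behind the same `passes` interface.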
Why three example rules
The JA style guide has been incrementally improved over 14 months through production experience. Each update was a direct response to a real failure observed in MR review:
- !30 (merged): Colon handling added after the model consistently produced EN-style colons in JA output
- !43 (merged): Quotation mark rules refined after the wrong quotation character set appeared in production translations
- !51 (merged): Katakana section updated and an automated post-processing fix added simultaneously, after specific Katakana rendering failures were identified in MR review
- !83 (merged): Colon rules refined again with additional specificity after further production failures
- !86 (merged): A single long vowel mark rule added for one specific term (アクティビティ) after it was flagged by a human reviewer
- etc.
These are JA-specific corrections for JA-specific failure modes. FR is a different language, and there is no expectation that FR failures will mirror JA failures. The point is the pattern: start with the minimum set of rules you are confident the model needs, and add rules only when production output proves they are missing.
Why start with 30 rules if roughly 27 of them would be arbitrary or based on assumptions? Starting with 3 rules means every rule has been reasoned from evidence: real GitLab docs markdown files were read and analyzed, and real content patterns were confirmed. Rule 2 (colon spacing) also has direct historical precedent in the JA production failures, since both stem from the same root cause: a model trained predominantly on English text reproducing English punctuation conventions in the target language.