AI Quick Translator Glossary Feature
Purpose of the Glossary Learning Mechanism
This feature was originally built to capture reviewer preferences so that an AI (Claude) can quickly build a glossary of the differences between the Tech Docs and marketing content. "Glossary" is used loosely here, because the entries include phrases as well as terms. A second goal was to identify whether terms are edited differently depending on context (text type, target audience, and so on). The AI applies the resulting rules through a RAG implementation (Claude project knowledge).
How GitLab AI Quick Translator Detects New Glossary Entries
The tool identifies new terminology through differential analysis rather than traditional extraction methods:
Detection Process
1. Learning Mode Analysis (Primary Method). When given file pairs (a source file plus its translated and reviewed file), the AI:
- Compares English terms against existing translated terms
- Identifies consistent translation patterns that differ from the current glossary
- Flags terms appearing 3+ times with the same translation
- Generates confidence scores based on pattern frequency
2. Translation Delta Detection. During translation updates (translated vs. reviewed files), the AI:
- Extracts terms that were consistently changed by human reviewers
- Treats terms modified across multiple segments as indicators of glossary gaps
- Promotes high-confidence corrections (>0.80) to candidate entries
3. Failed Lookup Tracking. The system sends a notification when:
- A term has no match in any glossary source
- The AI must create a translation without terminology guidance
- The same term appears in multiple documents without a glossary entry
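The thresholds in the detection process can be illustrated with a minimal sketch. It is not the tool's actual implementation: the function name `find_candidates`, the input shape, and the confidence formula (the share of occurrences that agree on one translation) are assumptions made for illustration. Applying the 3+ occurrence rule and the >0.80 rule in a single pass is also an assumption.

```python
from collections import Counter, defaultdict

MIN_OCCURRENCES = 3    # "3+ times with the same translation"
MIN_CONFIDENCE = 0.80  # "high-confidence corrections (>0.80)"


def find_candidates(observations, glossary):
    """Return candidate glossary entries from reviewed translations.

    observations: iterable of (source_term, reviewed_translation) tuples,
        one per segment in which the term was seen (hypothetical shape).
    glossary: dict mapping source terms to their current translation.
    """
    by_term = defaultdict(Counter)
    for source_term, reviewed in observations:
        by_term[source_term][reviewed] += 1

    candidates = []
    for source_term, counts in by_term.items():
        translation, hits = counts.most_common(1)[0]
        total = sum(counts.values())
        # Assumed scoring: fraction of occurrences that use the top translation.
        confidence = hits / total
        if (
            hits >= MIN_OCCURRENCES
            and confidence > MIN_CONFIDENCE
            and glossary.get(source_term) != translation
        ):
            candidates.append((source_term, translation, round(confidence, 2)))
    return candidates
```

With three reviewed segments that all render "SSH key" as "SSHキー" and a glossary that still says "SSH鍵", the function returns one candidate for "SSH key". A term seen only once is not flagged.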
Key Difference from Traditional Tools
Traditional CAT tools use linguistic extractors (POS tagging, frequency analysis, n-gram detection) to identify term candidates from source text.
This tool instead relies on translation evidence. It identifies a terminology gap only when it sees:
- Human corrections to AI translations
- Consistent patterns across multiple translations
- Reviewer-validated translation pairs
Result: fewer false positives, but gaps only surface after review work has been done. New terms are discovered through usage, not prediction.
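As an illustration of what "human corrections to AI translations" can look like as evidence, the sketch below diffs an AI translation against its reviewed version and returns the replaced spans. This is an assumption for illustration only: the tool performs the comparison with an AI model, whereas this sketch uses a character-level diff from Python's standard `difflib` module. The function name `reviewer_changes` is hypothetical.

```python
import difflib


def reviewer_changes(ai_text, reviewed_text):
    """Return (before, after) pairs for spans the reviewer replaced.

    Uses a character-level diff, which suits Japanese text because it
    has no whitespace word boundaries. Pure insertions and deletions
    are ignored; only replacements are reported.
    """
    matcher = difflib.SequenceMatcher(None, ai_text, reviewed_text, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            changes.append((ai_text[i1:i2], reviewed_text[j1:j2]))
    return changes


print(reviewer_changes("SSH鍵を追加します", "SSHキーを追加します"))
# prints [('鍵', 'キー')]
```

A character-level diff returns only the changed fragment, not the full term, so a real pipeline would still need to map the fragment back to the source term it belongs to.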
Examples of Japanese Terms Added Through Updates
- agentic AI → 自律型AI
  - AI that acts autonomously and makes decisions
  - Added for modern AI terminology in marketing/product docs
- SSH key → SSHキー
  - Corrected from incorrect "SSH鍵" based on SA validation
  - Establishes pattern: technical "key" terms use キー not 鍵
- C-Suite survey → 企業経営調査
  - Executive-level business surveys
  - Specific to GitLab's C-Suite research materials
- feature planning → 機能設計
  - Planning and designing software features
  - More professional than literal translations
- findings → 発見 (security) / 検出結果 (test results)
  - Context-sensitive: different translations for security vs testing
  - Technical documentation specific
- major version → メジャーバージョン
  - Replaces outdated "主要バージョン"
  - Standardized technical terminology
- pipeline passes → パイプラインが成功する
  - More natural than "パイプラインがパスする"
  - Improved verb form for technical docs
- roll out → ロールアウトする
  - Replaces generic "展開する"
  - Maintains technical specificity
- Known Exploited Vulnerabilities (KEV) → 既知の悪用された脆弱性(KEV)
  - Specific security terminology
  - Corrected from mistranslation
- from idea to production → アイデア出しからリリースまで
  - Complete development lifecycle phrase
  - Natural Japanese flow for marketing content
These terms were identified through various learning modes including human reviewer feedback, SA validation, file pair analysis, and technical documentation reviews. They represent terminology gaps that weren't covered in the original glossaries but were discovered through actual translation work and quality improvements.
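Context-sensitive entries such as "findings" can be modeled as a lookup keyed by term and context. The sketch below is a hypothetical illustration, not the tool's storage format: the `GLOSSARY` structure, the context labels ("security", "test results", "default"), and the `lookup` function are all assumptions. The translations themselves come from the examples above.

```python
# Hypothetical structure: term -> {context label -> translation}.
GLOSSARY = {
    "findings": {"security": "発見", "test results": "検出結果"},
    "major version": {"default": "メジャーバージョン"},
    "SSH key": {"default": "SSHキー"},
}


def lookup(term, context=None):
    """Return the translation for a term in a context, or None.

    A None result corresponds to a failed lookup: either the term is
    missing, or it is context-sensitive and no usable context was given.
    """
    entry = GLOSSARY.get(term)
    if entry is None:
        return None
    if context in entry:
        return entry[context]
    # Fall back to the context-independent translation, if one exists.
    return entry.get("default")


print(lookup("findings", "security"))      # prints 発見
print(lookup("findings", "test results"))  # prints 検出結果
print(lookup("findings"))                  # prints None
```

Returning None for "findings" without a context is a deliberate choice in this sketch: it forces the caller to supply the text type, so the lookup never silently picks one of two valid translations.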