AI Quick Translator Glossary Feature

Purpose of the Glossary Learning Mechanism

This feature was originally designed to capture reviewer preferences so that an AI (Claude) could quickly build a glossary of the differences between Tech Docs and marketing content. It also identifies whether terms were edited differently depending on context (text type, target audience, etc.). "Glossary" is used loosely here: not all entries are single terms, and some are phrases. The AI applies the resulting rules through a RAG implementation (Claude project knowledge).

How GitLab AI Quick Translator Detects New Glossary Entries

The tool identifies new terminology through differential analysis rather than traditional extraction methods:

Detection Process

1. Learning Mode Analysis (Primary Method)

When given file pairs (a source file plus its translated and reviewed counterpart), the AI:

- Compares English terms against existing translated terms
- Identifies consistent translation patterns that differ from the current glossary
- Flags terms that appear 3 or more times with the same translation
- Generates confidence scores based on pattern frequency
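
The learning-mode steps above can be sketched in Python as follows. This is a minimal illustration, not the tool's implementation: it assumes the file pairs have already been aligned into (English term, reviewed translation) observations, which is a hypothetical upstream step, and it uses the share of occurrences that agree on the dominant translation as a stand-in confidence score. Only the 3-occurrence threshold comes from the description above; `find_candidates` and its input shapes are illustrative names.

```python
from collections import Counter, defaultdict

MIN_OCCURRENCES = 3  # from the description: flag terms seen 3+ times with the same translation


def find_candidates(observations, glossary):
    """Return glossary candidates found in reviewed file pairs (illustrative sketch).

    observations: iterable of (english_term, reviewed_translation) pairs, assumed to be
        extracted and aligned from source + reviewed files upstream (hypothetical step).
    glossary: dict mapping english_term -> currently approved translation.
    """
    counts = defaultdict(Counter)
    for term, translation in observations:
        counts[term][translation] += 1

    candidates = []
    for term, translations in counts.items():
        best, best_count = translations.most_common(1)[0]
        if best_count < MIN_OCCURRENCES:
            continue  # not enough evidence for a pattern yet
        if glossary.get(term) == best:
            continue  # the observed pattern already matches the glossary
        # Hypothetical confidence: share of all occurrences that use the dominant translation.
        confidence = best_count / sum(translations.values())
        candidates.append({
            "term": term,
            "translation": best,
            "occurrences": best_count,
            "confidence": round(confidence, 2),
        })
    return sorted(candidates, key=lambda c: c["confidence"], reverse=True)
```

With three observations of "SSH key" → "SSHキー", one of "SSH鍵", and a glossary that still lists "SSH鍵", this sketch returns a single candidate for "SSH key" with confidence 0.75; a term seen only twice is skipped.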

2. Translation Delta Detection

During update translations (translated vs. reviewed files):

- Terms that human reviewers consistently changed are extracted
- Terms modified across multiple segments indicate glossary gaps
- High-confidence corrections (score > 0.80) become candidate entries
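
A comparable sketch for delta detection, assuming each segment has already been reduced to a (term, AI translation, reviewed translation) triple. Using the consistency ratio as the confidence score, reading "multiple segments" as `MIN_SEGMENTS = 2`, and the `delta_candidates` name are all assumptions; only the 0.80 cut-off comes from the description above.

```python
from collections import Counter, defaultdict

CONFIDENCE_THRESHOLD = 0.80  # from the description: corrections above 0.80 become candidates
MIN_SEGMENTS = 2  # hypothetical reading of "modified across multiple segments"


def delta_candidates(edits):
    """Find consistent reviewer corrections (illustrative sketch).

    edits: iterable of (term, ai_translation, reviewed_translation), one triple per
        segment in which the term occurs (term alignment assumed to happen upstream).
    """
    seen = Counter()                # term -> number of segments containing it
    changes = defaultdict(Counter)  # term -> reviewer replacement -> count
    for term, ai, reviewed in edits:
        seen[term] += 1
        if ai != reviewed:
            changes[term][reviewed] += 1

    candidates = []
    for term, replacements in changes.items():
        replacement, count = replacements.most_common(1)[0]
        # Share of the term's segments in which the reviewer made this same change.
        consistency = count / seen[term]
        if count >= MIN_SEGMENTS and consistency > CONFIDENCE_THRESHOLD:
            candidates.append((term, replacement, round(consistency, 2)))
    return candidates
```

Because the comparison is strictly greater than 0.80, a term the reviewer changed in 4 of 5 segments (exactly 0.80) would not qualify under this sketch.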

3. Failed Lookup Tracking

The system raises a notification when:

- A term has no match in any glossary source
- The AI must create a translation without terminology guidance
- The same term appears in multiple documents without a glossary entry
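
Failed lookup tracking could be sketched as a small wrapper around the glossary sources. The `LookupTracker` class, the list-of-dicts representation of glossary sources, and the two-document threshold for a recurring gap are hypothetical; the sketch only records misses, since the actual notification mechanism is not described here.

```python
from collections import defaultdict


class LookupTracker:
    """Hypothetical tracker for terms that no glossary source could resolve."""

    def __init__(self, glossaries):
        self.glossaries = glossaries    # list of dicts, checked in priority order
        self.misses = defaultdict(set)  # term -> documents in which the lookup failed

    def lookup(self, term, document):
        for glossary in self.glossaries:
            if term in glossary:
                return glossary[term]
        # No glossary source matched: the AI has to translate without guidance.
        self.misses[term].add(document)
        return None

    def recurring_gaps(self, min_documents=2):
        """Terms that failed lookup in at least min_documents different documents."""
        return sorted(t for t, docs in self.misses.items() if len(docs) >= min_documents)
```

Tracking misses per document (a set) rather than per occurrence keeps a term repeated many times in one file from looking like a cross-document gap.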

Key Difference from Traditional Tools

Traditional CAT tools use linguistic extractors (POS tagging, frequency analysis, n-gram detection) to identify term candidates from source text.

This tool instead relies on translation evidence. It identifies terminology gaps only when it sees:

- Human corrections to AI translations
- Consistent patterns across multiple translations
- Reviewer-validated translation pairs

Result: fewer false positives, but gaps surface only after reviewers have done the review work. New terms are discovered through usage, not prediction.

Examples of Japanese Terms Added Through Updates

agentic AI → 自律型AI
- AI that acts autonomously and makes decisions
- Added for modern AI terminology in marketing/product docs
SSH key → SSHキー
- Corrected from incorrect "SSH鍵" based on SA validation
- Establishes pattern: technical "key" terms use キー not 鍵
C-Suite survey → 企業経営調査
- Executive-level business surveys
- Specific to GitLab's C-Suite research materials
feature planning → 機能設計
- Planning and designing software features
- More professional than literal translations
findings → 発見 (security) / 検出結果 (test results)
- Context-sensitive: different translations for security vs testing
- Specific to technical documentation
major version → メジャーバージョン
- Replaces outdated "主要バージョン"
- Standardized technical terminology
pipeline passes → パイプラインが成功する
- More natural than "パイプラインがパスする"
- Improved verb form for technical docs
roll out → ロールアウトする
- Replaces generic "展開する"
- Maintains technical specificity
Known Exploited Vulnerabilities (KEV) → 既知の悪用された脆弱性（KEV）
- Specific security terminology
- Corrected from a mistranslation
from idea to production → アイデア出しからリリースまで
- Complete development lifecycle phrase
- Natural Japanese flow for marketing content
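
Context-sensitive entries such as "findings" need a glossary structure that can hold more than one translation per term. One possible shape is sketched below; the `GlossaryEntry` fields, the context labels, and the fallback rule (a context-specific entry wins, otherwise a context-free entry is used) are assumptions, not the format actually stored in Claude project knowledge.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GlossaryEntry:
    source: str
    target: str
    context: Optional[str] = None  # e.g. "security"; None means the entry applies everywhere
    note: str = ""


# A few entries from the list above, expressed in this hypothetical schema.
ENTRIES = (
    GlossaryEntry("SSH key", "SSHキー", note="technical 'key' terms use キー, not 鍵"),
    GlossaryEntry("findings", "発見", context="security"),
    GlossaryEntry("findings", "検出結果", context="test results"),
    GlossaryEntry("roll out", "ロールアウトする", note="replaces 展開する"),
)


def lookup(term, context=None, entries=ENTRIES):
    """Prefer an entry whose context matches; otherwise fall back to a context-free entry."""
    general = None
    for entry in entries:
        if entry.source != term:
            continue
        if context is not None and entry.context == context:
            return entry.target
        if entry.context is None and general is None:
            general = entry.target
    return general
```

In this sketch, asking for "findings" without a context returns `None`, because neither entry is context-free; that would be a natural point to route the term to a reviewer instead of guessing.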

These terms were identified through several learning modes, including human reviewer feedback, SA validation, file pair analysis, and technical documentation reviews. They represent terminology gaps that the original glossaries did not cover and that surfaced through actual translation work and quality improvements.