Edge cases with Emphasis in Markdown
Issue
- Related to https://gitlab.com/gitlab-com/localization/tech-docs-forked-projects/prod/gitlab/-/merge_requests/247#note_2747812770
- Related to https://gitlab.com/gitlab-com/localization/tech-docs-forked-projects/prod/gitlab/-/merge_requests/236#note_2751696166
I reviewed the markdown spec for emphasis, and it's a bit of a mess: https://spec.commonmark.org/0.31.2/#emphasis-and-strong-emphasis
It explains (with crosslinks I didn't copy over):
A delimiter run is either a sequence of one or more * characters that is not preceded or followed by a non-backslash-escaped * character, or a sequence of one or more _ characters that is not preceded or followed by a non-backslash-escaped _ character.
A left-flanking delimiter run is a delimiter run that is (1) not followed by Unicode whitespace, and either (2a) not followed by a Unicode punctuation character, or (2b) followed by a Unicode punctuation character and preceded by Unicode whitespace or a Unicode punctuation character. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.
A right-flanking delimiter run is a delimiter run that is (1) not preceded by Unicode whitespace, and either (2a) not preceded by a Unicode punctuation character, or (2b) preceded by a Unicode punctuation character and followed by Unicode whitespace or a Unicode punctuation character. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.
Then following this, there are 131 separate examples showing the huge variety of edge cases and confusing issues with emphasis in Markdown.
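The two flanking definitions quoted above can be sketched in Python. This is only an illustration, not the code of any real renderer: the function names are made up for this issue, and it assumes the CommonMark 0.31.2 character classes (Unicode whitespace is category Zs plus tab, line feed, form feed, and carriage return; Unicode punctuation is any character in the P or S general categories). `before` and `after` are the single characters on either side of the delimiter run, with an empty string standing for the beginning or end of the line.

```python
import unicodedata


def is_unicode_whitespace(ch: str) -> bool:
    # An empty string stands for the beginning or end of the line,
    # which the spec treats as whitespace for these definitions.
    return ch == "" or ch in "\t\n\f\r" or unicodedata.category(ch) == "Zs"


def is_unicode_punctuation(ch: str) -> bool:
    # Assumption: CommonMark 0.31.2 definition (general categories P* and S*).
    return ch != "" and unicodedata.category(ch)[0] in "PS"


def is_left_flanking(before: str, after: str) -> bool:
    if is_unicode_whitespace(after):  # fails (1)
        return False
    if not is_unicode_punctuation(after):  # satisfies (2a)
        return True
    # (2b): followed by punctuation, so it must be preceded by whitespace or punctuation
    return is_unicode_whitespace(before) or is_unicode_punctuation(before)


def is_right_flanking(before: str, after: str) -> bool:
    if is_unicode_whitespace(before):  # fails (1)
        return False
    if not is_unicode_punctuation(before):  # satisfies (2a)
        return True
    # (2b): preceded by punctuation, so it must be followed by whitespace or punctuation
    return is_unicode_whitespace(after) or is_unicode_punctuation(after)


# Closing ** in "**done)**is": preceded by ")", followed by "i"
print(is_right_flanking(")", "i"))  # False
# Closing ** in "**done)** is": preceded by ")", followed by a space
print(is_right_flanking(")", " "))  # True
```

Per the spec, a `**` run can only close strong emphasis if it is part of a right-flanking run, and can only open it if it is part of a left-flanking run, so a `False` here means the asterisks are left as literal text.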
I think the most common issue we're running into is related to (2b). With Japanese, we sometimes have a right-flanking delimiter run preceded by a punctuation character and NOT followed by whitespace or punctuation.
For example:
- Preceded by punctuation, not followed by whitespace or punctuation:
  - **完了しました)**は
  - Renders as: **完了しました)**は (broken)
- Preceded by punctuation, followed by whitespace:
  - **完了しました)** は
  - Renders as: 完了しました) は (emphasized)
- Preceded by punctuation, followed by punctuation:
  - **完了しました)**,は
  - Renders as: 完了しました),は (emphasized)
Note that this is also the case with English punctuation and characters:
- Preceded by punctuation, not followed by whitespace or punctuation:
  - **done)**is
  - Renders as: **done)**is (broken)
- Preceded by punctuation, followed by whitespace:
  - **done)** is
  - Renders as: done) is (emphasized)
- Preceded by punctuation, followed by punctuation:
  - **done)**,is
  - Renders as: done),is (emphasized)
In English, every word is followed by either a space or punctuation, but that's not the case in Japanese, which is why we're running into this (and running into it with ? in the other issue linked above, and with : previously).
I found the perfect example of this issue in one of the earlier translated files: doc-locale/ja-jp/install/aws/_index.md:
- 各パブリックサブネットを順番に選択し、**Action(アクション)**、**Edit subet setting(サブネット設定の編集)** の順に選択します。**Enable auto-assign public IPv4 address(パブリックIPv4アドレスの自動割り当てを有効にする)**オプションをオンにして、保存します。
  - **Action(アクション)**、 has the right delimiter preceded by punctuation ()**), but it IS followed by punctuation (、), so it satisfies rule (2b) and should be emphasized.
  - **Edit subet setting(サブネット設定の編集)** の has the right delimiter preceded by punctuation ()**), but it IS followed by a space, so it satisfies rule (2b) and should be emphasized.
  - **Enable auto-assign public IPv4 address(パブリックIPv4アドレスの自動割り当てを有効にする)**オプション has the right delimiter preceded by punctuation ()**), but it is NOT followed by punctuation OR a space. Thus it does not satisfy rule (2b) and should not be emphasized.
- Then, we can verify that the first two are indeed emphasized and the last is not:
  - 各パブリックサブネットを順番に選択し、Action(アクション)、Edit subet setting(サブネット設定の編集) の順に選択します。**Enable auto-assign public IPv4 address(パブリックIPv4アドレスの自動割り当てを有効にする)**オプションをオンにして、保存します。
So this would suggest we need to add a space after ** any time there's punctuation as the last character in the emphasis but there's no space or punctuation after it. BUT...
Continuing the research, I checked for cases where the left delimiter doesn't follow the rules, and we can indeed find some perfect examples of the issue, this time in doc-locale/ja-jp/editor_extensions/visual_studio_code/_index.md.
Here, we have **(複数のプロジェクト)** formatted identically in two places, but the surrounding characters are different:
このような場合、拡張機能は**(複数のプロジェクト)**ラベルを追加して、アカウントを選択する必要があることを示します。
1. **(複数のプロジェクト)**を含む行を選択して、アカウントのリストを展開します。
Now, if we add a space to fix it, the LEFT delimiter is broken in the first line, because it is not preceded by a space or punctuation (inverse of right delimiter):
- Try to fix the first line by adding a space after the right delimiter:
  - このような場合、拡張機能は**(複数のプロジェクト)** ラベルを追加して、アカウントを選択する必要があることを示します。
  - Still broken because of the left delimiter issue: このような場合、拡張機能は**(複数のプロジェクト)** ラベルを追加して、アカウントを選択する必要があることを示します。
- Try to fix the second line by adding a space after the right delimiter:
  - 1. **(複数のプロジェクト)** を含む行を選択して、アカウントのリストを展開します。
  - Fixed, because the left delimiter is already preceded by a space: 1. (複数のプロジェクト) を含む行を選択して、アカウントのリストを展開します。
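Checking both sides by hand is tedious, so a small script could flag the suspicious delimiters in a translated line. The following is a rough sketch only, under several assumptions: the function name `misplaced_strong_runs` is hypothetical, the character classes follow CommonMark 0.31.2, and the pairing is a naive heuristic that treats the 1st, 3rd, ... `**` run on a line as an intended opener and the 2nd, 4th, ... as an intended closer (no nesting, no code spans, no `__`). The renderer we actually use may behave differently in places.

```python
import re
import unicodedata


def _is_space(ch: str) -> bool:
    # Empty string stands for the beginning or end of the line.
    return ch == "" or ch in "\t\n\f\r" or unicodedata.category(ch) == "Zs"


def _is_punct(ch: str) -> bool:
    # Assumption: CommonMark 0.31.2 definition (general categories P* and S*).
    return ch != "" and unicodedata.category(ch)[0] in "PS"


def _left_flanking(before: str, after: str) -> bool:
    if _is_space(after):
        return False
    return not _is_punct(after) or _is_space(before) or _is_punct(before)


def _right_flanking(before: str, after: str) -> bool:
    if _is_space(before):
        return False
    return not _is_punct(before) or _is_space(after) or _is_punct(after)


# Exactly two asterisks, not escaped and not part of a longer run.
_STRONG_RUN = re.compile(r"(?<![*\\])\*\*(?!\*)")


def misplaced_strong_runs(line: str) -> list[tuple[int, str]]:
    """Return (index, kind) for each ** run that cannot play its intended role.

    kind is "opener" when an intended opener is not left-flanking and
    "closer" when an intended closer is not right-flanking.
    """
    problems = []
    for i, match in enumerate(_STRONG_RUN.finditer(line)):
        before = line[match.start() - 1] if match.start() > 0 else ""
        after = line[match.end()] if match.end() < len(line) else ""
        if i % 2 == 0:
            if not _left_flanking(before, after):
                problems.append((match.start(), "opener"))
        elif not _right_flanking(before, after):
            problems.append((match.start(), "closer"))
    return problems


line = "このような場合、拡張機能は**(複数のプロジェクト)** ラベルを追加して"
print([kind for _, kind in misplaced_strong_runs(line)])  # ['opener']
```

An empty result for a line means every `**` run passes the flanking check for its assumed role. It does not guarantee correct rendering, since the full emphasis algorithm has more rules than these two.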