Improve LatinTerms test

Amy Qualls requested to merge aqualls-improve-latin-test into master

What does this MR do?

In this Slack thread, @eread noted a false positive for the word viable:

viable

For example, doc/development/contributing/merge_request_workflow.md contains edge cases that would've been flagged before: viable and trivial.

First iteration

I looked at LatinTerms.yml and can see why it's happening. Periods and spaces are word boundaries, so we created a non-word rule to try to catch variants of "e.g." and "i.e.". I did some digging and found another approach (https://github.com/errata-ai/Google/blob/master/Google/Latin.yml), but I can't figure out how to also match the version of each abbreviation that has a space between the first period and the second letter:

  # Won't catch 'e. g.' with a space in between
  '\b(?:eg|e\.g\.)(?=[\s,;])': for example

  # Won't catch 'i. e.' with a space in between
  '\b(?:ie|i\.e\.)(?=[\s,;])': that is

However, @cynthia noted the fix for via is simpler. We don't want to flag via when it's part of a longer word (such as viable or trivial), so test for word boundaries, like this: '\bvia\b'
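The effect of the word boundaries can be checked with a small sketch. Python's `re` module stands in for Vale's regex engine here, and the `flags_via` helper is a hypothetical name used only for this illustration.

```python
import re

# \b on both sides means "via" must stand alone as a word.
VIA_PATTERN = re.compile(r"\bvia\b")


def flags_via(text):
    """Return True if text contains 'via' as a standalone word."""
    return VIA_PATTERN.search(text) is not None


if __name__ == "__main__":
    for sample in ["Send it via email", "a viable option", "a trivial change"]:
        print(f"{sample!r}: {flags_via(sample)}")
```

Only the first sample is flagged. In viable and trivial, the letters around via are word characters, so no word boundary exists on at least one side.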

Second iteration

@marcel.amirault found ways to improve the regex further, and capture more variations on the phrases.

Before and after

vale --no-wrap --filter='.Name=="gitlab.LatinTerms"' doc/**/*.md

Findings for e.g. and i.e. dipped slightly in round 1 and rose in round 2, which captures more variations of the phrases. As expected, findings for via dropped more sharply:

type of finding   before   round 1   round 2
e.g. variants     133      128       155
i.e. variants     27       25        37
via               791      705       704
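To put the counts in proportion, the relative change against the "before" column can be computed with a short sketch. The numbers are copied from the table above; the `findings` dictionary and the `percent_change` helper are names made up for this illustration.

```python
# Counts copied from the table: (before, round 1, round 2).
findings = {
    "e.g. variants": (133, 128, 155),
    "i.e. variants": (27, 25, 37),
    "via": (791, 705, 704),
}


def percent_change(before, after):
    """Relative change from before to after, in percent, to one decimal."""
    return round((after - before) / before * 100, 1)


if __name__ == "__main__":
    for name, (before, round_1, round_2) in findings.items():
        print(
            f"{name}: round 1 {percent_change(before, round_1):+.1f}%, "
            f"round 2 {percent_change(before, round_2):+.1f}%"
        )
```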

Related issues

Author's checklist

If you are a GitLab team member and only adding documentation, do not add any of the following labels:

  • ~"frontend"
  • ~"backend"
  • ~"type::bug"
  • ~"database"

These labels cause the MR to be added to code verification QA issues.

Reviewer's checklist

Documentation-related MRs should be reviewed by a Technical Writer for a non-blocking review, based on Documentation Guidelines and the Style Guide.

If you aren't sure which tech writer to ask, use roulette or ask in the #docs Slack channel.

  • If the content requires it, ensure the information is reviewed by a subject matter expert.
  • Technical writer review items:
    • Ensure docs metadata is present and up-to-date.
    • Ensure the appropriate labels are added to this MR.
    • Ensure a release milestone is set.
    • If relevant to this MR, ensure content topic type principles are in use, including:
      • The headings should be something you'd do a Google search for. Instead of Default behavior, say something like Default behavior when you close an issue.
      • The headings (other than the page title) should be active. Instead of Configuring GDK, say something like Configure GDK.
      • Any task steps should be written as a numbered list.
      • If the content still needs to be edited for topic types, you can create a follow-up issue with the docs-technical-debt label.
  • Review by assigned maintainer, who can always request/require the reviews above. Maintainer's review can occur before or after a technical writer review.
Edited by Amy Qualls
