
Draft: Define regexes for script

Félix Veillette-Potvin requested to merge fvpotvin-modify-regex-tester into main

Rewrites most of the script from !5 (closed) to address issue #8 (closed).

@dappelt Questions:

From the issue description, I get the impression that the only output this effort needs is the Ruby regexes, right? I also assume there is no need to include anything that tokinator already covers, since that work has already been done, right?

My understanding is that we want regexes that capture the emails and phone numbers in the data set. Using this script, I found that the following regexes cover our data set:

      'phone number' => /\+\d{1,4}(?:[ -]\(?\d{2,}\)?)(?:[ -]\(?\d{2,}\)?)+/,
      'email' => /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i,
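A quick way to check what the two patterns accept is to run them against a few sample strings. The sample inputs below are made up for illustration and the `PATTERNS` hash name is an assumption; the regexes themselves are copied verbatim from above. One behavior worth noting: the email pattern is anchored with `\A` and `\z`, so it validates a whole string rather than finding an address inside longer text, while the phone pattern is unanchored and can be used with `String#scan`.

```ruby
# Regexes copied from the MR description; the PATTERNS name is hypothetical.
PATTERNS = {
  'phone number' => /\+\d{1,4}(?:[ -]\(?\d{2,}\)?)(?:[ -]\(?\d{2,}\)?)+/,
  'email' => /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i,
}.freeze

phone = PATTERNS['phone number']
email = PATTERNS['email']

# The phone pattern is unanchored, so it finds numbers inside longer text.
p "Call +1 514 555 0199 today".scan(phone)   # => ["+1 514 555 0199"]
p phone.match?("+44 (20) 7946 0958")          # => true
p phone.match?("+15145550199")                # => false (no space or hyphen between groups)

# The email pattern is anchored with \A and \z, so it only accepts a whole string.
p email.match?("jane.doe@example.com")          # => true
p email.match?("Contact: jane.doe@example.com") # => false
```

If the scanner feeds whole lines or documents rather than single tokens, the email pattern would need to be applied per token or have its anchors replaced.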

However, the script is not pretty and would need further refactoring. Do you actually need it anywhere, and if so, what are the requirements? Right now it processes archive files passed as arguments, but I don't think the rest of your scanner works that way, right?
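If the script does need to live on, one possible refactor is to separate the matching from the input handling, so the scanner can pass in text from whatever source it already reads. The sketch below assumes that design; `find_sensitive`, the token-based email check, and the plain-file command-line entry point are all hypothetical and not part of the existing script (which reads archives).

```ruby
# Hypothetical input-agnostic detector: the caller supplies the text,
# whether it came from an archive, a plain file, or another source.
PHONE = /\+\d{1,4}(?:[ -]\(?\d{2,}\)?)(?:[ -]\(?\d{2,}\)?)+/
EMAIL = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i

# Returns a hash of pattern name => array of matched strings.
def find_sensitive(text)
  # EMAIL is anchored, so test it against whitespace-separated tokens,
  # after trimming trailing punctuation such as "," or ".".
  emails = text.split
               .map { |tok| tok.sub(/[.,;:]+\z/, '') }
               .select { |tok| EMAIL.match?(tok) }
  { 'phone number' => text.scan(PHONE), 'email' => emails }
end

if __FILE__ == $PROGRAM_NAME
  # Assumed CLI for illustration only: plain-text file paths as arguments.
  ARGV.each do |path|
    puts "#{path}: #{find_sensitive(File.read(path)).inspect}"
  end
end
```

For example, `find_sensitive("Reach jane.doe@example.com, or +1 514 555 0199.")` reports the phone number `+1 514 555 0199` and the email `jane.doe@example.com`. Keeping I/O out of `find_sensitive` means archive handling could stay in a thin wrapper, or be dropped entirely if the scanner already provides the text.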
