
Add RFC3986 expressions

Jamie A. Jennings requested to merge Veratil:rfc3986 into master

Created by: Veratil

These are the expressions for URI matching based on RFC 3986. They are a possible fix for #17 (closed) and also improve IP address matching.

Example of the main URI expression:

Rosie> .match URI "ftp://bob:dole@www.google.com:80/a/b/c/page.cgi?query=something#offset"
{"URI": 
   {"subs": 
      [{"scheme": 
          {"text": "ftp", 
           "pos": 1.0}}, 
       {"hier_part": 
          {"subs": 
             [{"authority": 
                 {"subs": 
                    [{"userinfo": 
                        {"text": "bob:dole", 
                         "pos": 7.0}}, 
                     {"host": 
                        {"text": "www.google.com", 
                         "pos": 16.0}}, 
                     {"port": 
                        {"text": "80", 
                         "pos": 31.0}}], 
                  "pos": 7.0, 
                  "text": "bob:dole@www.google.com:80"}}, 
              {"path_abempty": 
                 {"text": "\/a\/b\/c\/page.cgi", 
                  "pos": 33.0}}], 
           "pos": 5.0, 
           "text": "\/\/bob:dole@www.google.com:80\/a\/b\/c\/page...."}}, 
       {"query": 
          {"text": "query=something", 
           "pos": 49.0}}, 
       {"fragment": 
          {"text": "offset", 
           "pos": 65.0}}], 
    "pos": 1.0, 
    "text": "ftp:\/\/bob:dole@www.google.com:80\/a\/b\/c\/p..."}}

There is one downside to this expression that I haven't found a better way to handle: the path can match as any of 5 different variants (path_abempty in the example above), and which one is returned depends on the input, so you cannot rely on a single path name in the output.

I also added an alternative based on the POSIX regex in Appendix B of the RFC:

Rosie> .match uri.alt "ftp://bob:dole@www.google.com:80/a/b/c/page.cgi?query=something#offset"
{"uri.alt": 
   {"subs": 
      [{"uri.alt.scheme": 
          {"text": "ftp", 
           "pos": 1.0}}, 
       {"uri.alt.authority": 
          {"text": "bob:dole@www.google.com:80", 
           "pos": 7.0}}, 
       {"uri.alt.path": 
          {"text": "\/a\/b\/c\/page.cgi", 
           "pos": 33.0}}, 
       {"uri.alt.query": 
          {"text": "query=something", 
           "pos": 49.0}}, 
       {"uri.alt.fragment": 
          {"text": "offset", 
           "pos": 65.0}}], 
    "pos": 1.0, 
    "text": "ftp:\/\/bob:dole@www.google.com:80\/a\/b\/c\/p..."}}

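For comparison outside Rosie, the Appendix B regular expression can be exercised directly. This is a minimal Python sketch and not part of this merge request; the `split_uri` helper and its output shape are assumptions made for illustration. It reports 1-based positions to mirror the `pos` fields above.

```python
import re

# Regular expression from RFC 3986, Appendix B.
APPENDIX_B = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

# Capture-group numbers as assigned in Appendix B.
GROUPS = {"scheme": 2, "authority": 4, "path": 5, "query": 7, "fragment": 9}


def split_uri(uri):
    """Return {component: (text, 1-based position)} for each component present."""
    m = APPENDIX_B.match(uri)  # every part is optional, so this always matches
    return {
        name: (m.group(n), m.start(n) + 1)
        for name, n in GROUPS.items()
        if m.group(n) is not None
    }


parts = split_uri("ftp://bob:dole@www.google.com:80/a/b/c/page.cgi?query=something#offset")
for name, (text, pos) in parts.items():
    print(name, text, pos)
```

Like uri.alt, this only splits the input into components; it does not validate the characters inside each one.
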
For breaking down the authority section, I added a second alternative:

Rosie> .match uri.alt2 "ftp://bob:dole@www.google.com:80/a/b/c/page.cgi?query=something#offset"
{"uri.alt2": 
   {"subs": 
      [{"uri.alt.scheme": 
          {"text": "ftp", 
           "pos": 1.0}}, 
       {"authority": 
          {"subs": 
             [{"userinfo": 
                 {"text": "bob:dole", 
                  "pos": 7.0}}, 
              {"host": 
                 {"text": "www.google.com", 
                  "pos": 16.0}}, 
              {"port": 
                 {"text": "80", 
                  "pos": 31.0}}], 
           "pos": 7.0, 
           "text": "bob:dole@www.google.com:80"}}, 
       {"uri.alt.path": 
          {"text": "\/a\/b\/c\/page.cgi", 
           "pos": 33.0}}, 
       {"uri.alt.query": 
          {"text": "query=something", 
           "pos": 49.0}}, 
       {"uri.alt.fragment": 
          {"text": "offset", 
           "pos": 65.0}}], 
    "pos": 1.0, 
    "text": "ftp:\/\/bob:dole@www.google.com:80\/a\/b\/c\/p..."}}

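The authority breakdown follows the RFC 3986 rule `authority = [ userinfo "@" ] host [ ":" port ]`. A rough Python sketch of the same split is shown here for illustration; `split_authority` is a hypothetical helper, not code from this branch, and it deliberately does not validate the characters inside each part.

```python
import re

# Hypothetical helper pattern: authority = [ userinfo "@" ] host [ ":" port ]
AUTHORITY = re.compile(
    r"(?:(?P<userinfo>[^@]*)@)?"   # optional userinfo, up to the "@"
    r"(?P<host>\[[^\]]*\]|[^:]*)"  # bracketed IP literal, or anything up to ":"
    r"(?::(?P<port>[0-9]*))?"      # optional port digits
)


def split_authority(authority):
    """Return a dict of userinfo/host/port, or None if the shape does not fit."""
    m = AUTHORITY.fullmatch(authority)
    return m.groupdict() if m else None


print(split_authority("bob:dole@www.google.com:80"))
print(split_authority("[2001:db8::1]:8080"))
```

The bracketed alternative for the host is tried first so that the colons inside an IPv6 literal are not mistaken for the port separator.
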
Matching only valid IP addresses is now possible, which is an improvement over the current ip_address expression:

Rosie> .match IPv4address "127.0.0.1"
{"*": 
   {"text": "127.0.0.1", 
    "pos": 1.0}}

The match fails when given an invalid IP:

Rosie> .match IPv4address "123.456.789.0"
     SEQUENCE: (dec_octet ~ "." ~ dec_octet ~ "." ~ dec_octet ~ "." ~ dec_octet)
     FAILED to match against input "123.456.789.0"
-----snip-----
 11...................................NAMED CHARSET: [:digit:]
                                      Matched "5" (against input "56.789.0")
                 REFERENCE: ~
                 FAILED to match against input "6.789.0"
                 This identifier is a built-in RPL pattern

Repl: No match  (turn debug off to hide the match evaluation trace)

The current ip_address expression, by contrast, accepts the same invalid input:

Rosie> .match ip_address "123.456.789.0"
{"*": 
   {"text": "123.456.789.0", 
    "pos": 1.0}}

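The difference comes from the dec-octet rule in RFC 3986, which limits each octet to 0 through 255. The Python sketch below illustrates the idea with regular expressions. `STRICT_IPV4` follows the RFC rule; `LOOSE_IPV4` is only an assumed stand-in for a pattern that counts digits, and is not the actual definition of the current ip_address expression.

```python
import re

# dec-octet from RFC 3986, section 3.2.2: 250-255, 200-249, 100-199, 10-99, 0-9.
DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])"
STRICT_IPV4 = re.compile(rf"{DEC_OCTET}(?:\.{DEC_OCTET}){{3}}")

# Assumed stand-in for a looser pattern that only checks the digit count.
LOOSE_IPV4 = re.compile(r"[0-9]{1,3}(?:\.[0-9]{1,3}){3}")

for candidate in ("127.0.0.1", "123.456.789.0"):
    strict = STRICT_IPV4.fullmatch(candidate) is not None
    loose = LOOSE_IPV4.fullmatch(candidate) is not None
    print(candidate, "strict:", strict, "loose:", loose)
```
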
IPv6 addresses are also matchable:

Rosie> .match IPv6address "2001:db8::ff00:32:8439"
{"*": 
   {"text": "2001:db8::ff00:32:8439", 
    "pos": 1.0}}

An address with too many groups will still match as a prefix (with a warning) unless a terminating match is added:

Rosie> .match IPv6address "::8:7:6:5:4:3:2:1"
{"*": 
   {"text": "::8:7:6:5:4:3:2", 
    "pos": 1.0}}
Warning: 2 unmatched characters at end of input

With a terminating match:

Rosie> .match IPv6address !. "::8:7:6:5:4:3:2:1"
     SEQUENCE: (IPv6address ~ !.)
     FAILED to match against input "::8:7:6:5:4:3:2:1"
----snip-----
  7.....PREDICATE: !.
        FAILED to match against input ":1"
        Explanation (EXPRESSION): !.
           REFERENCE: .
           Matched ":" (against input ":1")
           This identifier is a built-in RPL pattern

Repl: No match  (turn debug off to hide the match evaluation trace)
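
The same prefix-versus-whole-input behaviour can be reproduced with ordinary regular expressions. The Python sketch below uses a deliberately simplified stand-in for the IPv6 expression ("::" followed by at most seven hex groups), which is an assumption for illustration and not the full RFC 3986 grammar. The `(?!.)` lookahead plays the role of `!.` here.

```python
import re

# Simplified stand-in: "::" followed by one to seven 16-bit hex groups.
# This is NOT the full RFC 3986 IPv6address grammar.
H16 = r"[0-9A-Fa-f]{1,4}"
PATTERN = re.compile(rf"::{H16}(?::{H16}){{0,6}}")

text = "::8:7:6:5:4:3:2:1"

# Unanchored: matches the longest valid prefix and leaves the rest unmatched.
prefix = PATTERN.match(text)
leftover = len(text) - prefix.end()
print(prefix.group(), "leftover:", leftover)

# Anchored with a "no more input" lookahead, analogous to appending !.
terminated = re.compile(PATTERN.pattern + r"(?!.)", re.DOTALL)
print(terminated.match(text))
```
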
