Add RFC3986 expressions
Created by: Veratil
This adds expressions for URI matching based on RFC 3986. It is a possible fix for #17 (closed) and also improves IP address matching.
Example of the main match:
Rosie> .match URI "ftp://bob:dole@www.google.com:80/a/b/c/page.cgi?query=something#offset"
{"URI":
{"subs":
[{"scheme":
{"text": "ftp",
"pos": 1.0}},
{"hier_part":
{"subs":
[{"authority":
{"subs":
[{"userinfo":
{"text": "bob:dole",
"pos": 7.0}},
{"host":
{"text": "www.google.com",
"pos": 16.0}},
{"port":
{"text": "80",
"pos": 31.0}}],
"pos": 7.0,
"text": "bob:dole@www.google.com:80"}},
{"path_abempty":
{"text": "\/a\/b\/c\/page.cgi",
"pos": 33.0}}],
"pos": 5.0,
"text": "\/\/bob:dole@www.google.com:80\/a\/b\/c\/page...."}},
{"query":
{"text": "query=something",
"pos": 49.0}},
{"fragment":
{"text": "offset",
"pos": 65.0}}],
"pos": 1.0,
"text": "ftp:\/\/bob:dole@www.google.com:80\/a\/b\/c\/p..."}}
One downside of this expression, which I haven't found a better way to handle: the RFC defines five path variants (path_abempty above is one of them), so you cannot know in advance which path name will be returned for a given input.
I also added an alternative based on the POSIX regex in Appendix B of the RFC:
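To illustrate why the returned path name varies, here is a rough Python sketch of how RFC 3986 chooses among its five path rules. The function `classify_path` and its parameters are hypothetical names for illustration only, not part of this PR; the sketch only looks at the shape of the path and does not validate individual characters.

```python
def classify_path(path, has_authority, has_scheme=True):
    """Return the name of the RFC 3986 path rule that applies (sketch only)."""
    if has_authority:
        # After "//" authority, the path is empty or begins with "/".
        if path == "" or path.startswith("/"):
            return "path-abempty"
        raise ValueError("path after an authority must be empty or start with '/'")
    if path == "":
        return "path-empty"
    if path.startswith("/"):
        # Without an authority, a path may not begin with "//".
        if path.startswith("//"):
            raise ValueError("path without an authority cannot start with '//'")
        return "path-absolute"
    if has_scheme:
        return "path-rootless"
    # Relative reference: the first segment must not contain ":".
    if ":" in path.split("/", 1)[0]:
        raise ValueError("first segment of a relative path cannot contain ':'")
    return "path-noscheme"
```

For the example URI above, `classify_path("/a/b/c/page.cgi", has_authority=True)` gives `"path-abempty"`, which lines up with the `path_abempty` capture in the output.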
Rosie> .match uri.alt "ftp://bob:dole@www.google.com:80/a/b/c/page.cgi?query=something#offset"
{"uri.alt":
{"subs":
[{"uri.alt.scheme":
{"text": "ftp",
"pos": 1.0}},
{"uri.alt.authority":
{"text": "bob:dole@www.google.com:80",
"pos": 7.0}},
{"uri.alt.path":
{"text": "\/a\/b\/c\/page.cgi",
"pos": 33.0}},
{"uri.alt.query":
{"text": "query=something",
"pos": 49.0}},
{"uri.alt.fragment":
{"text": "offset",
"pos": 65.0}}],
"pos": 1.0,
"text": "ftp:\/\/bob:dole@www.google.com:80\/a\/b\/c\/p..."}}
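For comparison, the Appendix B regular expression can be tried directly in Python. This is only a sketch for checking the expected components; `split_uri` is a hypothetical helper name, and the group numbers (2, 4, 5, 7, 9) are the ones the RFC assigns to scheme, authority, path, query, and fragment.

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def split_uri(uri):
    """Split a URI reference into its five top-level components."""
    m = URI_RE.match(uri)  # every part is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }

parts = split_uri(
    "ftp://bob:dole@www.google.com:80/a/b/c/page.cgi?query=something#offset"
)
```

Like uri.alt, this splits the URI without validating it: the authority comes back as a single string and nothing checks that the host or port is well formed.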
To break down the authority section as well, I added a second alternative:
Rosie> .match uri.alt2 "ftp://bob:dole@www.google.com:80/a/b/c/page.cgi?query=something#offset"
{"uri.alt2":
{"subs":
[{"uri.alt.scheme":
{"text": "ftp",
"pos": 1.0}},
{"authority":
{"subs":
[{"userinfo":
{"text": "bob:dole",
"pos": 7.0}},
{"host":
{"text": "www.google.com",
"pos": 16.0}},
{"port":
{"text": "80",
"pos": 31.0}}],
"pos": 7.0,
"text": "bob:dole@www.google.com:80"}},
{"uri.alt.path":
{"text": "\/a\/b\/c\/page.cgi",
"pos": 33.0}},
{"uri.alt.query":
{"text": "query=something",
"pos": 49.0}},
{"uri.alt.fragment":
{"text": "offset",
"pos": 65.0}}],
"pos": 1.0,
"text": "ftp:\/\/bob:dole@www.google.com:80\/a\/b\/c\/p..."}}
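The authority breakdown follows the RFC's `[ userinfo "@" ] host [ ":" port ]` shape. A minimal Python sketch of that split is below; `AUTHORITY_RE` and `split_authority` are hypothetical names, and the pattern is a simplification that does not validate the host the way the RPL expressions in this PR do.

```python
import re

# userinfo is optional, host may be a bracketed IP literal, port is optional.
AUTHORITY_RE = re.compile(
    r"^(?:(?P<userinfo>[^@]*)@)?"
    r"(?P<host>\[[^\]]*\]|[^:]*)"
    r"(?::(?P<port>[0-9]*))?$"
)

def split_authority(authority):
    """Split an authority string into userinfo, host, and port (sketch only)."""
    m = AUTHORITY_RE.match(authority)
    if m is None:
        raise ValueError("not a recognizable authority: %r" % authority)
    return m.groupdict()

auth = split_authority("bob:dole@www.google.com:80")
```

Components that are absent come back as `None`, so `split_authority("example.org")` has no userinfo and no port.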
Matching only valid IP addresses is now possible (an improvement over the current ip_address expression):
Rosie> .match IPv4address "127.0.0.1"
{"*":
{"text": "127.0.0.1",
"pos": 1.0}}
The match fails when given an invalid IP:
Rosie> .match IPv4address "123.456.789.0"
SEQUENCE: (dec_octet ~ "." ~ dec_octet ~ "." ~ dec_octet ~ "." ~ dec_octet)
FAILED to match against input "123.456.789.0"
-----snip-----
11...................................NAMED CHARSET: [:digit:]
Matched "5" (against input "56.789.0")
REFERENCE: ~
FAILED to match against input "6.789.0"
This identifier is a built-in RPL pattern
Repl: No match (turn debug off to hide the match evaluation trace)
The current ip_address expression accepts the same invalid input:
Rosie> .match ip_address "123.456.789.0"
{"*":
{"text": "123.456.789.0",
"pos": 1.0}}
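The difference comes from the RFC's dec-octet rule, which limits each octet to 0-255 instead of accepting any run of digits. A small Python sketch of the same rule follows; `DEC_OCTET`, `IPV4_RE`, and `is_ipv4` are hypothetical names, and the alternatives are ordered longest-first so the regex tries three-digit forms before shorter ones.

```python
import re

# dec-octet from RFC 3986: 0-9, 10-99, 100-199, 200-249, 250-255.
DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])"
IPV4_RE = re.compile(r"\.".join([DEC_OCTET] * 4))

def is_ipv4(text):
    """True if the whole string is a dotted-quad address with octets in 0-255."""
    return IPV4_RE.fullmatch(text) is not None
```

With this rule, `is_ipv4("127.0.0.1")` is true and `is_ipv4("123.456.789.0")` is false, matching the IPv4address results above.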
IPv6 addresses are also matchable:
Rosie> .match IPv6address "2001:db8::ff00:32:8439"
{"*":
{"text": "2001:db8::ff00:32:8439",
"pos": 1.0}}
Without a terminating match, it will still match a valid prefix of an over-long address (with a warning):
Rosie> .match IPv6address "::8:7:6:5:4:3:2:1"
{"*":
{"text": "::8:7:6:5:4:3:2",
"pos": 1.0}}
Warning: 2 unmatched characters at end of input
With a terminating match, the same input is rejected:
Rosie> .match IPv6address !. "::8:7:6:5:4:3:2:1"
SEQUENCE: (IPv6address ~ !.)
FAILED to match against input "::8:7:6:5:4:3:2:1"
----snip-----
7.....PREDICATE: !.
FAILED to match against input ":1"
Explanation (EXPRESSION): !.
REFERENCE: .
Matched ":" (against input ":1")
This identifier is a built-in RPL pattern
Repl: No match (turn debug off to hide the match evaluation trace)
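The prefix behaviour can be cross-checked outside Rosie. The sketch below uses Python's standard `ipaddress` module as a stand-in whole-string validator; it is not the RFC 3986 grammar (it may accept forms the RPL expression does not), and `is_ipv6` is a hypothetical helper name.

```python
import ipaddress

def is_ipv6(text):
    """True if the entire string parses as an IPv6 address."""
    try:
        ipaddress.IPv6Address(text)
        return True
    except ipaddress.AddressValueError:
        return False

full = is_ipv6("::8:7:6:5:4:3:2:1")   # eight groups plus "::" is too long
prefix = is_ipv6("::8:7:6:5:4:3:2")   # the part Rosie matched is valid by itself
```

Here `full` is false and `prefix` is true, which is why the unanchored IPv6address match succeeds on the first 15 characters and only the `!.` version rejects the input.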