Question on (a)+
I've tested the `("a")+` pattern on the command line to figure out exactly what it captures. Let's start with:
echo "a a " | rosie match -o jsonpp '("a")*'
{"type": "*",
"e": 4,
"s": 1,
"data": "a a"}
I'll shorten this output in the examples below to:
1. echo "" | rosie match -o jsonpp '("a")*' => "data": ""
2. echo " " | rosie match -o jsonpp '("a")*' => "data": ""
3. echo " a" | rosie match -o jsonpp '("a")*' => "data": ""
4. echo "a" | rosie match -o jsonpp '("a")*' => "data": "a"
5. echo "a " | rosie match -o jsonpp '("a")*' => "data": "a"
6. echo "a a" | rosie match -o jsonpp '("a")*' => "data": "a a"
7. echo "a a " | rosie match -o jsonpp '("a")*' => "data": "a a"
Please take a closer look at 5 and 7. The capture does not include the trailing word boundary.
This means that `(a)+` is not equivalent to `{a ~}+` but rather to `{a {~ a}*}`, and `(a)*` translates to `{a {~ a}*} / ""`.
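To make the difference between the two candidate expansions concrete, here is a minimal Python sketch that emulates both with regular expressions and runs them on inputs 1 to 7 above. It does not call Rosie. The `TILDE` regex is only my assumed stand-in for `~` (a run of whitespace or a zero-width word boundary), and the names `TRAILING`, `INTERLEAVED`, and `capture` are made up for this illustration. The inputs are the echoed strings without the newline.

```python
import re

# Assumption: approximate Rosie's ~ as "a run of whitespace, or a zero-width
# word boundary". This is NOT Rosie's real definition of ~, only a stand-in
# that behaves plausibly on the inputs used in this issue.
TILDE = r"(?:\s+|\b)"

# Candidate A: {a ~}+  (a boundary after every "a", including the last one)
TRAILING = re.compile(rf"(?:a{TILDE})+")
# Candidate B: {a {~ a}*}  (boundaries only between consecutive "a"s)
INTERLEAVED = re.compile(rf"a(?:{TILDE}a)*")


def capture(pattern: re.Pattern, text: str) -> str:
    """Return the prefix matched at the start of text, or "" (the `/ ""` case)."""
    m = pattern.match(text)
    return m.group(0) if m else ""


# Inputs 1-7 paired with the "data" values reported in the list above.
observed = {"": "", " ": "", " a": "", "a": "a", "a ": "a", "a a": "a a", "a a ": "a a"}

for text, data in observed.items():
    print(
        f"input={text!r:7} {{a ~}}+ -> {capture(TRAILING, text)!r:7} "
        f"{{a {{~ a}}*}} -> {capture(INTERLEAVED, text)!r:6} rosie -> {data!r}"
    )
```

Under this approximation the two candidates differ only on the inputs with a trailing space (5 and 7), and only the interleaved form agrees with the reported output on all seven inputs.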
Some more examples
8. echo "b" | rosie match -o jsonpp '("a")* "b"' => "data": "b"
9. echo " b" | rosie match -o jsonpp '("a")* "b"' => "data": " b"
10. echo "a b" | rosie match -o jsonpp '("a")* "b"' => "data": "a b"
11. echo "ab" | rosie match -o jsonpp '("a")* "b"' => no match
12. echo "a ab" | rosie match -o jsonpp '("a")* "b"' => no match
13. echo "a a b" | rosie match -o jsonpp '("a")* "b"' => "data": "a a b"
At first I was confused by 9, but the leading space is captured by the word boundary between the two parts: `("a")* "b"` == `{("a")* ~ "b"}`. In summary, the results are explainable and intuitive.
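The reading of examples 8 to 13 can be checked the same way. The sketch below is a regex emulation in Python, not Rosie itself: it assumes `~` behaves like "whitespace run or zero-width word boundary" and that `("a")*` expands to `{a {~ a}*} / ""`. `SEQ` and `data` are hypothetical names used only here.

```python
import re

# Assumed stand-in for Rosie's ~ (whitespace run or zero-width word boundary).
TILDE = r"(?:\s+|\b)"

# {("a")* ~ "b"} with ("a")* read as {a {~ a}*} / ""
SEQ = re.compile(rf"(?:a(?:{TILDE}a)*)?{TILDE}b")


def data(text):
    """Return the matched prefix, or None when the emulated pattern fails."""
    m = SEQ.match(text)
    return m.group(0) if m else None


# Examples 8-13 with the results reported above (None stands for "no match").
reported = {
    "b": "b",
    " b": " b",
    "a b": "a b",
    "ab": None,
    "a ab": None,
    "a a b": "a a b",
}

for text, expected in reported.items():
    print(f"{text!r:8} -> {data(text)!r:9} (reported: {expected!r})")
```

In this emulation the leading space in example 9 is consumed by the boundary between the two parts, and `"ab"` fails because there is no boundary between `a` and `b`.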
All these examples also work if I translate `(a)+` into `{a ~}+`, except that your implementation does not capture the trailing space. My question is whether this is on purpose, and whether there is a reason for it that I don't yet understand.
If it's not on purpose, maybe it is something that can be clarified in RPL 2.0, by documenting the expected transformation happening behind the scenes and, if it is not a straightforward transformation, the reason for deliberately choosing something more complicated.
I hope that makes sense.