[![beta](https://img.shields.io/badge/version-1.1.0-093073.svg)](./CHANGELOG)
[![pipeline status](https://gitlab.com/rosie-pattern-language/rosie/badges/master/pipeline.svg)](https://gitlab.com/rosie-pattern-language/rosie/commits/master)

# Rosie Pattern Language (RPL)

## News
* v1.1.0 released (March 2019)
* Rosie has moved from
  [GitHub](https://github.com/jamiejennings/rosie-pattern-language) to
  [GitLab](https://gitlab.com/rosie-pattern-language/rosie) 
* [Brew installer for Mac OS X](https://gitlab.com/rosie-community/packages/homebrew-rosie)
* Pip installer for Python interface to librosie -- `pip install rosie`

## What is Rosie/RPL?

RPL is a variant of modern Regular Expressions (regex) that is used, for
example, to validate input and to extract key data from unstructured (and
semi-structured) text.  Rosie is an implementation of RPL that is designed
to scale to big data, many developers, and large collections of patterns.

The Rosie project provides a CLI (like Unix `grep`) and a library, `librosie`,
that can be used within programs written in Python, Go, Haskell, C, and other
languages.

In the screen capture below, the `net.url` pattern (from the `net` library) is
used to extract all of the URLs mentioned on the Google home page.

![](extra/examples/images/readme-fig3.png)

Suppose we wanted to see all the sub-domains of *google.com* that are
referenced on Google's home page.  The `net.fqdn` pattern matches domain names,
and the _look behind_ operator `<` can be used to look backwards to see if the
matched domain ended in `google.com`.  (In RPL, to match a literal string, you
place it in double quotes, like string literals in most programming languages.)

![](extra/examples/images/readme-fig4.png)
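
For readers who know conventional regex better than RPL, the look-behind idea can be sketched with Python's `re` module.  This is only an analogy, not RPL: the `FQDN` expression is a rough, assumed stand-in for `net.fqdn`, and the sample text is made up for illustration.

```python
import re

# Rough stand-in for net.fqdn (an assumption, not the real RPL pattern):
# two or more dot-separated labels made of letters, digits, and hyphens.
FQDN = r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"

# After matching a domain name, look backwards from the end of the match
# and require that the text just consumed ended in "google.com".
GOOGLE_HOST = re.compile(FQDN + r"(?<=google\.com)")

# Made-up sample text.
text = "See https://mail.google.com and https://www.example.org or maps.google.com/x"

hosts = [m.group(0) for m in GOOGLE_HOST.finditer(text)]
print(hosts)
```

The look-behind consumes no input.  It only checks what was already matched, which is why it can be appended to an existing pattern as a filter.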

## See more [examples](extra/examples/README.md)

## Read the [documentation](doc/README.md)

## Why RPL?

* More **readable** and **maintainable** than regex
  * RPL [looks like a programming language](rpl/num.rpl), with whitespace, comments, identifiers
  * Built-in unit test framework with `rosie test` (useful as regression tests when modifying patterns)
  * Plays well with development tools (like `diff`, CI tools)
  * Patterns can be (optionally) put into namespaces for easy sharing, e.g. `net` for network patterns
* Better **development experience** than regex
  * Rosie ships with a library of dozens of useful patterns (timestamps, network addresses, and more)
  * Want to see how a match succeeds or fails? Use the `rosie trace` command.
  * Pattern development tools: tracing, REPL, color-coded match output
  * The `rosie test` command compiles patterns and runs their unit tests.  Use
    this command during your project's build to avoid run-time errors.
* Rosie produces **match output in multiple formats** including:
  * Color, for human-readable use at the command line
  * Plain text (full text `-o data` or list of sub-matches `-o subs`) for scripting
  * JSON to use as input to other programs
  * Native data structures in Python, Haskell, Go, etc.
* RPL is a **Parsing Expression Grammar** (PEG) language
  * A superset of regular expressions, i.e. more powerful
  * Allows recursive grammars, so RPL can recognize recursively defined
    structures like [JSON](rpl/json.rpl)
  * The PEG formalism is an elegant alternative to the ad hoc
    extensions to regular expressions found in most "regex" libraries
  * Supports linear run-time (in the input size) for common use cases
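
To make the point about recursive grammars concrete, here is a minimal sketch in Python (not RPL) of a PEG-style recognizer for nested lists of numbers such as `[1,[2,[3,4]],5]`.  The two-rule grammar in the comments and all function names are hypothetical, chosen only for illustration.  A classical regular expression cannot check arbitrarily deep nesting like this.

```python
from typing import Optional

# Hypothetical grammar, written PEG-style:
#   value <- number / list
#   list  <- "[" (value ("," value)*)? "]"

def match_number(s: str, i: int) -> Optional[int]:
    """Match one or more ASCII digits at position i; return the end position or None."""
    j = i
    while j < len(s) and s[j] in "0123456789":
        j += 1
    return j if j > i else None

def match_value(s: str, i: int) -> Optional[int]:
    """Ordered choice: try number first, and only if it fails, try list."""
    end = match_number(s, i)
    return end if end is not None else match_list(s, i)

def match_list(s: str, i: int) -> Optional[int]:
    """A list contains values, and a value may itself be a list: this is the recursion."""
    if i >= len(s) or s[i] != "[":
        return None
    i += 1
    end = match_value(s, i)
    if end is not None:
        i = end
        # Zero or more ("," value) repetitions.
        while i < len(s) and s[i] == ",":
            nxt = match_value(s, i + 1)
            if nxt is None:
                break  # repetition stops; the "]" check below will then fail
            i = nxt
    if i < len(s) and s[i] == "]":
        return i + 1
    return None

def recognizes(s: str) -> bool:
    """True if the whole string is a single value."""
    return match_value(s, 0) == len(s)

print(recognizes("[1,[2,[3,4]],5]"))  # well-formed nesting
print(recognizes("[1,[2,3]"))         # missing a closing bracket
```

Each function corresponds to one grammar rule, and each rule either consumes input and reports where it stopped or fails without consuming anything.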


## Features

- Reasonably small:
  * The Rosie compiler/runtime/libraries take up less than 2MB on disk. <!-- 
  du -ch /usr/local/lib/librosie.a /usr/local/bin/rosie /usr/local/lib/rosie/lib /usr/local/lib/rosie/rpl 
  -->
  * See a [discussion of Rosie performance](doc/performance.md).
- Reasonably good performance: 
  * Faster than [Grok](https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html)
    (by a factor of 4 or more).
  * Slower than Unix [grep](https://en.wikipedia.org/wiki/Grep).
  * But Rosie does more than both of them -- and the CLI is still single-threaded.
  * See a [discussion of Rosie performance](doc/performance.md).
- Extensible pattern library
- Pattern development tools: CLI, REPL, unit tests, trace facility
- Rosie is fluent in Unicode (UTF-8), ASCII, and the
  [binary language of moisture vaporators](http://www.starwars.com/databank/moisture-vaporator)
  (arbitrary byte-encoded data).

## Installing

- [x] Build from source (see [local installation](#local-installation) below)
- [x] [Brew installer for Mac OS X](https://gitlab.com/rosie-community/packages/homebrew-rosie)
- [ ] RPM and Debian packages

### Repository organization

Releases are tagged, e.g. `v1.1.0`.  The head of the master branch is always the
latest release plus any post-release documentation updates (which will be folded
into the next release).

The dev branch should be a stable working development build, possibly with some
quirks and likely with out-of-date documentation.  If you are a contributor, you
probably want to be running the latest dev branch.

### Build dependencies

Tools: git, make, gcc/cc <br>
Libraries: readline, readline-devel <br>
Additional libraries for Linux: libbsd, libbsd-devel

### Supported platforms

Platforms: (most of these were tested with docker)

- [x] [Arch Linux](https://www.archlinux.org/) 
- [x] [CentOS Linux release 7.4.1708 (Core) and up](https://www.centos.org)
- [x] [Fedora release 25 (Twenty Five) and up](https://getfedora.org)
- [x] OS X (macOS 10.13 and up)
- [x] [Ubuntu 16.04.1 LTS (Xenial Xerus)](https://www.ubuntu.com/)
- [x] Windows Subsystem for Linux (see [example installation script](extra/WSL/rosie_install.sh))
- [ ] Windows (Help wanted!)

### Local installation

To install Rosie, clone this repository and `cd rosie-pattern-language` (the
_build directory_).  Then run `make`.

If the build succeeds, you will see a message like this:

```
Rosie Pattern Engine 1.1.0 built successfully!
    Use 'make install' to install into DESTDIR=/usr/local
    Use 'make uninstall' to uninstall from DESTDIR=/usr/local
    To run rosie from the build directory, use ./bin/rosie
    Try this example, and look for color text output: rosie match all.things test/resolv.conf
``` 

If the build fails, you probably need to install one of the dependencies listed
above, using a package management command like `apt-get`, `dnf install`, or
`brew install`.  More information on building Rosie is [here](doc/deployment.md).

After a successful build, you can run the Rosie CLI from the build directory
with `bin/rosie`.  Try the example suggested in the build message:

```
bin/rosie match all.things test/resolv.conf
```

The output is a color rendering of the file `test/resolv.conf` in which each
color indicates that a particular pattern matched.  The bold blue font, for
example, is used for strings that match the pattern `time.any`.

Try searching (using `rosie grep`) for lines containing this pattern:

```
bin/rosie grep -o color time.any test/resolv.conf 
```

The `-o color` option sets the output mode to color.  Otherwise, the `rosie
grep` command behaves like Unix `grep` and simply prints all matching lines.

Searching for literal strings requires double quotes. And since the shell will
strip off the double quotes, we have to put the entire RPL pattern in single quotes:

```
bin/rosie grep -o color '"domain"' test/resolv.conf 
```
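
The quoting rule is easy to check without running Rosie.  The sketch below uses Python's `shlex.split`, which follows POSIX shell word-splitting rules, as a stand-in for the shell, and shows which argument the `rosie` program would receive in each case.

```python
import shlex

# Double quotes only: the shell consumes them, so rosie would receive the
# bare word domain, which RPL reads as a pattern name, not a literal string.
unquoted = shlex.split('bin/rosie grep -o color "domain" test/resolv.conf')

# Single quotes around the double quotes: rosie would receive "domain" with
# its double quotes intact, which RPL reads as a literal string.
quoted = shlex.split("""bin/rosie grep -o color '"domain"' test/resolv.conf""")

print(unquoted[4])  # domain
print(quoted[4])    # "domain"
```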

Finally, let's search for lines that start with the word "domain" followed by a
network address.  We can use `rosie match` instead of `rosie grep` because we
want matching to start at the beginning of the line:

```
bin/rosie match '"domain" net.any' test/resolv.conf 
```

The default output format for `rosie match` is color.  To see JSON output, try
any of the commands above with `-o jsonpp` (for JSON pretty-printed) or `-o
json`. 
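
JSON output is convenient when calling the CLI from another program.  Below is a minimal sketch in Python, under two assumptions that are not stated above: that `-o json` writes one JSON object per matched input line, and that the caller does not need to inspect the exit status.  The helper names `rosie_argv` and `run_rosie_json` are hypothetical.

```python
import json
import subprocess
from typing import Any, Dict, List

def rosie_argv(command: str, pattern: str, path: str,
               output: str = "json", rosie: str = "bin/rosie") -> List[str]:
    """Build an argument list such as: bin/rosie match -o json <pattern> <path>."""
    return [rosie, command, "-o", output, pattern, path]

def run_rosie_json(pattern: str, path: str) -> List[Dict[str, Any]]:
    """Run `rosie match -o json` and decode each non-empty output line.

    Assumption: the CLI prints one JSON object per matched line.
    The exit status is deliberately not checked in this sketch.
    """
    argv = rosie_argv("match", pattern, path)
    done = subprocess.run(argv, capture_output=True, text=True)
    return [json.loads(line) for line in done.stdout.splitlines() if line.strip()]

# Because the arguments are passed as a list, no shell is involved and only
# the RPL double quotes are needed around the literal string.
print(rosie_argv("match", '"domain" net.any', "test/resolv.conf"))
```

From the build directory, `run_rosie_json('"domain" net.any', "test/resolv.conf")` would then return the decoded matches as Python dictionaries.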


### System installation

Running `make install` creates an installation directory in `/usr/local` (by
default, or in DESTDIR otherwise).  The executable is `/usr/local/bin/rosie`.
Other files used by Rosie can be found in `/usr/local/lib/rosie/`.


## Other sources of information

Rosie announcements on Twitter:
* [@jamietheriveter](https://twitter.com/jamietheriveter)

Rosie's home page, with blog posts and other info:
* [Rosie home page](http://tiny.cc/rosie)

Rosie on IBM developerWorks Open: _(N.B. Examples may be out of date.)_
* [Rosie blogs and talks](https://developer.ibm.com/code/category/rosie-pattern-language/)
* Including:
    * [Project Overview](https://developer.ibm.com/code/rosie-pattern-language/)
    * [Introduction](https://developer.ibm.com/code/2016/02/20/world-data-science-needs-rosie-pattern-language/)
    * [Parsing Spark logs](https://developer.ibm.com/code/2016/04/26/develop-test-rosie-pattern-language-patterns-part-1-parsing-log-files/)
    * [Parsing CSV files](https://developer.ibm.com/code/2016/10/14/develop-test-rosie-pattern-language-patterns-part-2-csv-data/)

For an introduction to Rosie and explanations of the key concepts, see
[Rosie's _raison d'etre_](doc/raisondetre.md).

I wrote some [notes](doc/geek.md) on Rosie's design and theoretical foundation
for my fellow PL and CS enthusiasts.

## Project roadmap

### Release History
- [x] v1.1.0 (2019 March)
- [x] v1.0.0 (2018 June)
- [x] v1.0.0-beta (2018 February)
- [x] v1.0.0-alpha (2017 September)

### API and language support
- [x] C API (`librosie`)
- [x] Python module (`rosie.py`)
- [x] C client of `librosie.so` (dynamic library)
- [x] C client of `librosie.o` (static linking)
- [x] Go module
- [x] Haskell module
- [ ] Ruby, node.js modules

### Roadmap (future)
- [ ] Ahead-of-time compilation
- [ ] Support JSON output for trace, config, list, and other commands
- [ ] Generate patterns automatically from locale data
- [ ] Linter
- [ ] Toolkit for user-developed macros
- [ ] Toolkit for user-developed output encoders
- [ ] Compiler optimizations

<hr>

## Contributing

### Write new patterns!

We are interested in adding more patterns to the "standard" library that ships
with Rosie in the [rpl directory](rpl).  And we have a group of Community
repositories for RPL patterns where you can be a maintainer of your own RPL
packages.

Of course, you can also publish your own RPL patterns, hosted wherever you keep
other code online.  In that case, let us know about it and we'll link to it!

### Calling Rosie from Go, C, Python, Haskell, ...  <a name="api-help"></a> 

Rosie is available as a [C library](src/librosie/librosie.c) that is compiled
as both a static and a dynamic (shared) library.  An assortment of interface
libraries (or "clients") are hosted in
[their own repositories](https://gitlab.com/rosie-community/clients/). We are
eager to get pull requests for improvements (some of the clients are first
drafts) and for interfaces to other languages.

Before Rosie v1.0, we had sample interfaces to node.js and Ruby as well.
These have not been updated, and there were many changes to `librosie` when v1.0
was released.  But we are confident that interfaces to those languages can be
created in a reasonably straightforward way.  Please open an issue to request a
particular language interface or to offer to work on one!

### Wanted: RPL tools

Because RPL is designed like a programming language (and it has an accessible
parser, [rpl_1_2.rpl](rpl/rosie/rpl_1_2.rpl)), new tools are relatively easy to
write.  Here are some ideas:

- **Package doc:** Given a package name, display the exported pattern names
  and, for each, a summary of the strings accepted and rejected.

- **Improved trace:** The current trace output could be improved,
  particularly to make it more compact.  A trace is represented internally as a
  table which could easily be rendered as JSON.  And since this data structure
  represents a complete trace, it is the right input to a new algorithm that
  produces a compact summary.  Or an animated output.

- **Linter:** Users of most programming languages are aided by a linting
  tool, in part because some syntactically valid expressions are not, in fact,
  what the programmer wanted.  For example, the character set `[_-.]` is a
  range in RPL, but it is an empty range.  Probably the author meant to write
  a set of 3 characters, like `[._-]`.

- **Notebook:** A Rosie kernel for a notebook would be useful to many
  people.  So would adding Rosie capabilities to a general notebook environment
  (e.g. [Jupyter](http://jupyter.org)).
  
- **Pattern generators:** A number of techniques hold promise for
  automatically generating RPL patterns, for example:
  * Convert a format string to a pattern, e.g. a `printf` format string, or the
    POSIX locale structure's fields that specify how to format numbers,
    dates/times, and monetary amounts.
  * Infer the format of each field in a CSV (or JSON, HTML, XML) file using
    analytics techniques such as statistics and machine learning.
  * Convert a regular expression to an RPL pattern.
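
As a starting point for the linter idea above, here is a minimal sketch in Python of a single check: flagging a character range whose lower bound sorts after its upper bound.  It handles only the simplest `[a-b]` form, and the function names are hypothetical.  A real linter would presumably work from the output of the RPL parser, not from raw strings.

```python
from typing import Optional, Tuple

def parse_simple_range(charset: str) -> Optional[Tuple[str, str]]:
    """Return (low, high) if charset has the exact form "[a-b]", else None."""
    if (len(charset) == 5 and charset[0] == "[" and charset[4] == "]"
            and charset[2] == "-"):
        return charset[1], charset[3]
    return None

def lint_charset(charset: str) -> Optional[str]:
    """Return a warning if charset is a range that matches nothing, else None."""
    bounds = parse_simple_range(charset)
    if bounds is None:
        return None  # not a simple range; this sketch has nothing to say
    low, high = bounds
    if ord(low) > ord(high):
        return (f"{charset}: empty range because {low!r} (code {ord(low)}) "
                f"sorts after {high!r} (code {ord(high)})")
    return None

print(lint_charset("[_-.]"))  # flagged: '_' is code 95, '.' is code 46
print(lint_charset("[._-]"))  # None: three listed characters, not a range
```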

## Acknowledgements

In addition to the people listed in the CONTRIBUTORS file, we wish to thank:

- Roberto Ierusalimschy, Waldemar Celes, and Luiz Henrique de Figueiredo, the
  creators of [the Lua language](http://www.lua.org) (MIT License); and again
  Roberto, for his [lpeg library](http://www.inf.puc-rio.br/~roberto/lpeg) (MIT
  License), which has been critical to implementing Rosie;

- The Lua community (at large);

- Mark Pulford, the author of
  [lua-cjson](http://www.kyne.com.au/%7Emark/software/lua-cjson.php) (MIT
  License);

- Brian Nash, the author of
  [lua-readline](https://github.com/bcnjr5/lua-readline) (MIT License);

- Peter Melnichenko, the author of
  [argparse](https://github.com/mpeterv/argparse) (MIT License).