Commit 042e3d0f authored by Jamie A. Jennings

First draft of librosie doc

parent e2013782
# Documentation for librosie
## Overview
* pthreads
* convention about return value indicating an API error
* define engine
* input data is sequence of bytes
* output encoder defn, produces linearized output as a sequence of bytes (except bool)
## Guide to librosie.h
### Types
Librosie uses fixed-width integer types (e.g. `uint32_t`) in all key data
structures. This ensures that we can do the following in platform-independent
ways:
- publish accurate limits on things like input size and number of capture names
in a pattern
- read and write data to disk in a single format
An `Engine` struct holds a pointer to the (opaque) engine state, and a lock to
restrict access to one thread at a time. In a multi-threaded program, an Engine
should be created for each thread that will use librosie. (The state of an
Engine is reasonably small. Example programs in C and Go have spawned 1,000+
threads, each with their own Engine.)

    typedef struct rosie_engine {
        void *L;
        pthread_mutex_t lock;
    } Engine;

Rosie strings have a length and pointer to data. They are not null terminated,
and may contain nulls. Input data from the caller must be passed to librosie in
this form.
Librosie does not modify the input data, making it possible to pass to librosie
a "native" pointer to the data if the client language provides one. For
example, the Python `cffi` binding to [libffi](https://sourceware.org/libffi/)
lets the Rosie Python client pass a pointer to the input data, which is a Python
`bytes` object. This avoids copying of the input data, saving time and memory.
Of course, the input data must be a contiguous sequence of bytes.

    typedef struct rosie_string {
        uint32_t len;
        byte_ptr ptr;
    } str;

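
As an illustration of the length-plus-pointer form, here is a minimal sketch of wrapping an existing buffer (which may contain nulls) without copying it. The struct is redeclared locally so the sketch compiles on its own, using the `str` name that the API signatures in this document use; real code would include librosie.h instead. `byte_ptr` is assumed to be a pointer to unsigned bytes, and `wrap_bytes` is a hypothetical helper, not a librosie function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins so the sketch compiles alone; real code would include
   librosie.h. byte_ptr is ASSUMED to be a pointer to unsigned bytes. */
typedef uint8_t *byte_ptr;
typedef struct rosie_string {
    uint32_t len;
    byte_ptr ptr;
} str;

/* Hypothetical helper (not part of librosie): point `out` at `data`
   without copying. Returns 0 on success, or -1 when `n` does not fit in
   the 32-bit length field. */
int wrap_bytes(const void *data, size_t n, str *out) {
    assert(out != NULL);
    if (n > UINT32_MAX) return -1;
    out->len = (uint32_t)n;
    out->ptr = (byte_ptr)data;  /* no copy: the caller keeps ownership */
    return 0;
}
```

Because the length travels with the pointer, a buffer such as `{'a', '\0', 'b'}` is passed as three bytes, not as the one-character C string `"a"`.
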
The `rosie_match` API returns a structure describing a match result. The fields
are:
- `data`, a string encoding of the results (**important:** see
[Interpreting match results](#interpreting-match-results))
- `leftover`, an integer number of bytes left unmatched (when the match
succeeded)
- `abend`, 0 when the match ended normally, 1 when it ended abnormally by encountering an
RPL `error` pattern
- `ttotal`, an integer number of microseconds spent in the call,
subject to the platform's clock resolution (see **clock()** in **time.h**)
- `tmatch`, an integer number of microseconds spent actually doing the matching
(whereas `ttotal` includes time spent encoding the results to produce `data`)

    typedef struct rosie_matchresult {
        str data;
        int leftover;
        int abend;
        int ttotal;
        int tmatch;
    } match;

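
The numeric fields can be combined in simple ways. The sketch below is one possible reading, assuming that `leftover` counts the bytes between the end of the match and the end of the input, and that `ttotal` is never meant to be smaller than `tmatch`. The structs are redeclared locally so the example stands alone, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins; real code would include librosie.h. */
typedef uint8_t *byte_ptr;
typedef struct rosie_string { uint32_t len; byte_ptr ptr; } str;
typedef struct rosie_matchresult {
    str data;
    int leftover;
    int abend;
    int ttotal;
    int tmatch;
} match;

/* Hypothetical helper: offset (from the start of the input) at which the
   match ended. Only meaningful when the match succeeded. */
int match_end_offset(uint32_t input_len, const match *m) {
    assert(m != NULL && m->leftover >= 0);
    return (int)input_len - m->leftover;
}

/* Hypothetical helper: microseconds spent outside the matching itself,
   such as encoding the results. Clamped at 0 to absorb clock jitter. */
int overhead_usec(const match *m) {
    assert(m != NULL);
    int d = m->ttotal - m->tmatch;
    return d > 0 ? d : 0;
}
```
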
### Tuning parameters
#### INITIAL_RPLX_SLOTS 32
Each compiled pattern is assigned a positive integer handle, which is returned
to the client. This many slots are allocated when an engine is created; more
are allocated on demand.
#### MIN_ALLOC_LIMIT_MB 8192
See [**rosie_alloc_limit**](#rosie_alloc_limit). Do not lower this value.
Increasing it simply raises the minimum allocation limit that can be set through
**rosie_alloc_limit**.
#### MAX_ENCODER_NAME_LENGTH 64
Each of Rosie's output encoders has a name, e.g. `color`, `byte`. The encoders
implemented in Lua are declared in **init.lua**, and those implemented in C are
named in `encoder_table` in **common.lua**, which maps from names to the numbers
used by the C code.
The `MAX_ENCODER_NAME_LENGTH` must be at least 1 more than the length of the
longest of these encoder names. Do not change this value.
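
A client can use the same bound as a cheap sanity check before passing an encoder name to librosie: a name that does not fit, terminator included, cannot be one of the known encoders. This is only a sketch; the constant is redefined locally with the value quoted in the heading above, and `encoder_name_fits` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value quoted in this document; real code would take it from librosie.h. */
#define MAX_ENCODER_NAME_LENGTH 64

/* Hypothetical helper: returns 1 when `name` plus its NUL terminator fits
   within MAX_ENCODER_NAME_LENGTH bytes, 0 otherwise. */
int encoder_name_fits(const char *name) {
    assert(name != NULL);
    return strlen(name) + 1 <= MAX_ENCODER_NAME_LENGTH;
}
```
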
### Uniform status codes returned
Each librosie API returns a status code:
- `SUCCESS` will always be defined as 0
- `ERR_OUT_OF_MEMORY`, when an allocation request fails
- `ERR_SYSCALL_FAILED`, when a system call fails
- `ERR_ENGINE_CALL_FAILED`, when a Rosie API call fails
The status `ERR_ENGINE_CALL_FAILED` indicates a bug in librosie. That is, it is
safe to print a message suggesting that this be reported as an issue when
encountering this return value.
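
A client might centralize its handling of these codes along the following lines. The numeric values of the `ERR_` constants are not given in this document, so the ones below are PLACEHOLDERS chosen only to make the sketch compile; real code should use the definitions in librosie.h. `report_status` is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>

#define SUCCESS 0
/* PLACEHOLDER values for illustration only; use librosie.h in real code. */
#define ERR_OUT_OF_MEMORY      (-2)
#define ERR_SYSCALL_FAILED     (-3)
#define ERR_ENGINE_CALL_FAILED (-4)

/* Hypothetical helper: print a message for a non-SUCCESS status.
   Returns 0 for SUCCESS and 1 for anything else. */
int report_status(FILE *out, int status) {
    assert(out != NULL);
    switch (status) {
    case SUCCESS:
        return 0;
    case ERR_OUT_OF_MEMORY:
        fprintf(out, "librosie: out of memory\n");
        return 1;
    case ERR_SYSCALL_FAILED:
        fprintf(out, "librosie: a system call failed\n");
        return 1;
    case ERR_ENGINE_CALL_FAILED:
        /* Documented as indicating a bug in librosie. */
        fprintf(out, "librosie: internal error, please report this as an issue\n");
        return 1;
    default:
        fprintf(out, "librosie: unexpected status %d\n", status);
        return 1;
    }
}
```
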
### Interpreting match results
In a `rosie_matchresult`, the `data` field is a `rosie_string` containing a
length and a pointer. When the pointer is non-null, it points to a string (byte
sequence) with the given length (**not** null terminated). This is the data
returned by the output encoder.
But when the `data` pointer is null, the `data` length field indicates the
actual result:
- `NO_MATCH` will always be defined as 0
- `MATCH_WITHOUT_DATA` will always be defined as 1, and is returned when the
output encoder produced no output data
- `ERR_NO_ENCODER`, when the output encoder or trace style passed to librosie is
invalid
- `ERR_NO_PATTERN`, when the pattern handle passed to librosie is invalid
- `ERR_NO_FILE`, when a filename passed to librosie cannot be found (`rosie_matchfile` only)
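
The two-step check described above (pointer first, then length) can be sketched as follows. Only `NO_MATCH` and `MATCH_WITHOUT_DATA` have values stated in this document, so every other length is reported as a generic error here; real code would compare it against the `ERR_` constants in librosie.h. The structs are redeclared locally, and `result_kind` and `classify_result` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins; real code would include librosie.h. */
typedef uint8_t *byte_ptr;
typedef struct rosie_string { uint32_t len; byte_ptr ptr; } str;
typedef struct rosie_matchresult {
    str data;
    int leftover, abend, ttotal, tmatch;
} match;

#define NO_MATCH 0            /* stated above as always 0 */
#define MATCH_WITHOUT_DATA 1  /* stated above as always 1 */

typedef enum {
    RESULT_HAS_DATA,      /* data.ptr points to data.len bytes of output */
    RESULT_NO_MATCH,
    RESULT_MATCH_NO_DATA,
    RESULT_ERROR          /* ERR_NO_ENCODER, ERR_NO_PATTERN, ERR_NO_FILE */
} result_kind;

/* Hypothetical helper: decide how to read a match result. */
result_kind classify_result(const match *m) {
    assert(m != NULL);
    if (m->data.ptr != NULL) return RESULT_HAS_DATA;
    switch (m->data.len) {
    case NO_MATCH:           return RESULT_NO_MATCH;
    case MATCH_WITHOUT_DATA: return RESULT_MATCH_NO_DATA;
    default:                 return RESULT_ERROR;
    }
}
```
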
## API
### Engine management
**Engine \*rosie_new(str \*messages)**
**void rosie_finalize(Engine \*e)**
**int rosie_libpath(Engine \*e, str \*newpath)**
**int rosie_alloc_limit(Engine \*e, int \*newlimit, int \*usage)**
The front-end of the RPL compiler, the CLI, and some of the output encoders
(such as `color` and `jsonpp`) are written in Lua, a language that has garbage
collection. The **rosie_alloc_limit** API allows the client program to set and
query a "soft limit" on the size of the Lua heap.
The functions **rosie_match**, **rosie_trace**, and **rosie_matchfile** check
whether the Lua heap has grown beyond the current limit and, if so, invoke the
garbage collector.
When called with **newlimit** of 0, the limit is removed and Lua's default
garbage collection settings apply.
When called with **newlimit** of -1, the call is a query. On return,
**newlimit** will be set to the current limit, and **usage** to the current Lua
heap usage.
The units of **newlimit** and **usage** are KB (1024 bytes).
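
Since the limit is expressed in units of 1024 bytes while many configuration values are given in megabytes, a small conversion helper can avoid unit mistakes. The sketch below is illustrative: the two constant names and `limit_units_from_mb` are hypothetical, and only the special values 0 and -1 come from the description above.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical names for the two special newlimit values described above. */
#define ALLOC_LIMIT_REMOVE 0     /* remove the limit */
#define ALLOC_LIMIT_QUERY  (-1)  /* query the current limit and usage */

/* Hypothetical helper: convert a limit in MB to the 1024-byte units used
   by newlimit. Returns 0 on success, or -1 when `mb` is not positive or
   the result would overflow an int. */
int limit_units_from_mb(int mb, int *out_units) {
    assert(out_units != NULL);
    if (mb <= 0 || mb > INT_MAX / 1024) return -1;
    *out_units = mb * 1024;
    return 0;
}
```
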
### Loading RPL into an engine
Strings containing RPL blocks are processed by an engine using **rosie_load**.
A block may contain a single statement (e.g. `d=[:digit:]`) or many statements.
A block may also contain comments, an RPL language version declaration, a
package declaration, and import statements.
**int rosie_load(Engine \*e, int \*ok, str \*src, str \*pkgname, str \*messages)**
The string **src** is read, compiled, and the resulting bindings are stored in
the engine's environment. If **ok** is 0 on return, no errors occurred. There
may still be **messages** (e.g. warnings).
If **ok** is non-zero, an error occurred, and **messages** will contain a
JSON-encoded error structure.
_TODO: Document the JSON violation structure._
The client is responsible for freeing **messages** with **rosie_free_string_ptr**.
If **src** contained a package declaration, the package name will be returned in
**pkgname**.
The client is responsible for freeing **pkgname** with **rosie_free_string_ptr**.
**int rosie_loadfile(Engine \*e, int \*ok, str \*fn, str \*pkgname, str \*messages)**
Same functionality as **rosie_load**, except **fn** is a filename and librosie
reads and processes the contents of that file.
**int rosie_import(Engine \*e, int \*ok, str \*pkgname, str \*as, str \*actual_pkgname, str \*messages)**
Calling **rosie_import** with package `<pkgname>` causes the same actions as
calling **rosie_load** with the string `import <pkgname>`, with one exception:
**rosie_import** will always find and load the RPL package `<pkgname>` in the
filesystem. By contrast, when **rosie_load** encounters `import <pkgname>`, the
package may have already been loaded into the engine.
Including a (string) value for the **as** parameter behaves like `import
<pkgname> as <as>` with the same caveats.
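
To make the correspondence concrete, the sketch below builds the RPL text that **rosie_load** would be given for the same package name and optional **as** value. It does not call librosie, and `format_import` is a hypothetical helper written only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write `import <pkgname>` or, when `as` is a
   non-empty string, `import <pkgname> as <as>` into `buf`.
   Returns the length written, or -1 if the text did not fit. */
int format_import(char *buf, size_t size, const char *pkgname, const char *as) {
    assert(buf != NULL && pkgname != NULL);
    int n = (as != NULL && as[0] != '\0')
        ? snprintf(buf, size, "import %s as %s", pkgname, as)
        : snprintf(buf, size, "import %s", pkgname);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```
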
### Compiling an RPL expression
An RPL expression must be compiled before it can be used to match (or trace)
with an input string.
**int rosie_compile(Engine \*e, str \*expression, int \*pat, str \*messages)**
The string **expression** is compiled into an _rplx_ object and an integer
handle to that object is returned. The object will be available until
explicitly freed, or until the engine **e** is freed with **rosie_finalize**.
If **pat** is non-zero upon return, it is the _rplx handle_, which behaves
somewhat like a Unix file descriptor in that (1) it remains valid until
explicitly freed (with **rosie_free_rplx**) and (2) the same integer value may
be reused by the engine afterwards.
Regardless of error status, **messages** may contain errors, warnings, or other
information.
The client is responsible for freeing **messages** with **rosie_free_string_ptr**.
**int rosie_free_rplx(Engine \*e, int pat)**
Call **rosie_free_rplx** to allow the engine to reclaim the compiled pattern **pat**.
### Matching and tracing
**int rosie_match(Engine \*e, int pat, int start, char \*encoder, str \*input, match \*match)**
Using engine **e** and its pattern **pat**, match the pattern against **input**
and produce match data (a string) using output encoder **encoder**. Note that
**encoder** is a null-terminated C-style string.
The **match** argument is a pointer to a **rosie_matchresult** structure that is
_allocated by the client program,_ into which the match results will be written.
A single struct may be used across repeated calls to **rosie_match**, and indeed
this is recommended.
As noted in the earlier section on [librosie types](#types), a
**rosie_matchresult** contains one dynamically allocated object, its **data**
field. The client program does not need to and _should not_ manage the storage
for **data** because librosie will automatically reuse it, making it larger as
needed (using **realloc**).
IMPORTANT: Because librosie reuses the match results **data** field (a string),
the client program must make a copy of that string, if necessary, before calling
**rosie_match** again.
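
One way to keep a result across calls is to copy the bytes into storage the client owns. The following is a minimal sketch with the structs redeclared locally; `copy_match_data` is a hypothetical helper, and the caller is responsible for freeing the copy with `free`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-ins; real code would include librosie.h. */
typedef uint8_t *byte_ptr;
typedef struct rosie_string { uint32_t len; byte_ptr ptr; } str;
typedef struct rosie_matchresult {
    str data;
    int leftover, abend, ttotal, tmatch;
} match;

/* Hypothetical helper: return a malloc'd copy of the match data and store
   its length in *out_len. Returns NULL (with *out_len set to 0) when there
   is no data to copy or when allocation fails. */
uint8_t *copy_match_data(const match *m, uint32_t *out_len) {
    assert(m != NULL && out_len != NULL);
    *out_len = 0;
    if (m->data.ptr == NULL || m->data.len == 0) return NULL;
    uint8_t *copy = malloc(m->data.len);
    if (copy == NULL) return NULL;
    memcpy(copy, m->data.ptr, m->data.len);  /* data is not null terminated */
    *out_len = m->data.len;
    return copy;
}
```

The copy stays valid after the next match call overwrites or reallocates the original **data** buffer.
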
**int rosie_matchfile(Engine \*e, int pat, char \*encoder, int wholefileflag,
char \*infilename, char \*outfilename, char \*errfilename,
int \*cin, int \*cout, int \*cerr,
str \*err)**
This is a convenience function, useful if you are writing a new CLI. With
the same meanings of **e**, **pat**, and **encoder** as above, this function
reads **infilename** line by line, unless **wholefileflag** is non-zero, in
which case the entire file is read at once. Match output, produced by
**encoder**, is written to **outfilename**, and input lines that did not match
are written to **errfilename**.
An empty string passed in for a filename argument defaults to the standard
input, output, and error channels, respectively. To ignore one of the outputs,
set its filename to "/dev/null" or the equivalent on your platform.
When the value returned in **cin** is 0 or more, **rosie_matchfile** executed
successfully, and **cin**, **cout**, and **cerr** contain the number of lines
read from the input, written to **outfilename**, and written to
**errfilename**, respectively.
If the value returned in **cin** is -1, then **cout** will contain an error
code such as `ERR_NO_FILE` and **err** will hold a human-readable explanation.
The client is responsible for freeing **err** with **rosie_free_string_ptr**.
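
Because **cout** carries either a line count or an error code depending on **cin**, it can help to unpack the three values into a single record right after the call. The sketch below assumes that any negative **cin** signals failure (this document only specifies -1); `matchfile_summary` and `summarize_matchfile` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical record describing the outcome of one rosie_matchfile call. */
typedef struct {
    int ok;          /* 1 when the call executed successfully */
    int lines_in;    /* lines read from the input */
    int lines_out;   /* lines written to outfilename */
    int lines_err;   /* lines written to errfilename */
    int error_code;  /* e.g. ERR_NO_FILE; meaningful only when ok is 0 */
} matchfile_summary;

/* Hypothetical helper: interpret the cin, cout, and cerr values that
   rosie_matchfile wrote through its pointer arguments. */
matchfile_summary summarize_matchfile(int cin, int cout, int cerr) {
    matchfile_summary s = {0, 0, 0, 0, 0};
    if (cin >= 0) {
        s.ok = 1;
        s.lines_in = cin;
        s.lines_out = cout;
        s.lines_err = cerr;
    } else {
        s.error_code = cout;  /* on failure, cout holds the error code */
    }
    assert(s.ok == 0 || s.error_code == 0);
    return s;
}
```
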
**int rosie_trace(Engine \*e, int pat, int start, char \*trace_style, str \*input, int \*matched, str \*trace)**
**str rosie_new_string(byte_ptr msg, size_t len)**
**str \*rosie_new_string_ptr(byte_ptr msg, size_t len)**
**str \*rosie_string_ptr_from(byte_ptr msg, size_t len)**
**str rosie_string_from(byte_ptr msg, size_t len)**
**void rosie_free_string(str s)**
**void rosie_free_string_ptr(str \*s)**
**int rosie_read_rcfile(Engine \*e, str \*filename, int \*file_exists, str \*options, str \*messages)**
**int rosie_execute_rcfile(Engine \*e, str \*filename, int \*file_exists, int \*no_errors, str \*messages)**
**int rosie_config(Engine \*e, str \*retvals)**
**int rosie_expression_refs(Engine \*e, str \*input, str \*refs, str \*messages)**
**int rosie_block_refs(Engine \*e, str \*input, str \*refs, str \*messages)**
**int rosie_expression_deps(Engine \*e, str \*input, str \*deps, str \*messages)**
**int rosie_block_deps(Engine \*e, str \*input, str \*deps, str \*messages)**
**int rosie_parse_expression(Engine \*e, str \*input, str \*parsetree, str \*messages)**
**int rosie_parse_block(Engine \*e, str \*input, str \*parsetree, str \*messages)**
/\*
Administrative:
+ status:int, engine:void\* = new(const char \*name)
+ status:int = finalize(void \*engine)
+ status:int, desc:string = config(void \*engine)
\* status:int = setlibpath(void \*engine, const char \*libpath)
+ set soft memory limit to m MB, with optional logging of when it is hit
logging level (to stderr)?
clone an engine? (to avoid setup cost; but cloned engine must be in new Lua state)
RPL:
+ status:int, pkgname:string, errors:strings = load(void \*engine, const char \*rpl)
+ status:int, pkgname:string, errors:strings = import(packageref, localname)
status:int = undefine(id)
test(rpl)?
testfile(filename)?
Match/trace:
+ status:int, pat:int, errors:strings = compile(void \*engine, const char \*expression)
+ status:int = free_rplx(void \*engine, int pat)
+ status:int = match(void \*engine, int pat, int start, str \*encoder,
str \*input, match \*match);
+ status:int, tracestring:\*buffer = trace(void \*engine, int pat, buffer \*input, int start, int encoder, int tracestyle)
status:int, cin:int, cout:int, cerr:int, errors:strings =
matchfile(void \*engine, int pat,
const char \*infilename, const char \*outfilename, const char \*errfilename,
int start, int encoder, int wholefile)
status:int, cin:int, cout:int, cerr:int, errors:strings =
tracefile(void \*engine, int pat,
const char \*infilename, const char \*outfilename, const char \*errfilename,
int start, int encoder, int readmethod, int tracestyle)
Debugging:
status:int, desc:string = lookup(void \*engine, const char \*id)
status:int, expr:string, errors:strings = expand(void \*engine, const char \*expr)
status:int, descs:strings = list(void \*engine, const char \*localnamefilter, const char \*packagenamefilter)
\*/