Commit 31080ad3 authored by Matt, committed by Dmitry Mozzherin

Fix #35 add gRPC method preserving order in output

parent aba68299
......@@ -2,7 +2,9 @@
## Unreleased
## [v0.6.0]
- Add [#35]: gRPC method to preserve order in output according to input
- Add [#30]: write inline and README documentation.
- Add [#29]: docker and dockerhub support.
- Add [#26]: get all parser rules to CamelCase format.
......@@ -27,7 +29,9 @@
This document follows [changelog guidelines]
[v0.5.1]: https://gitlab.com/gogna/gnparser/tree/v0.5.0
[v0.6.0]: https://gitlab.com/gogna/gnparser/compare/v0.5.1...v0.6.0
[v0.5.1]: https://gitlab.com/gogna/gnparser/tree/v0.5.1
[#30]: https://gitlab.com/gogna/gnparser/issues/30
[#29]: https://gitlab.com/gogna/gnparser/issues/29
......
......@@ -12,10 +12,10 @@ parsed into human readable information as follows:
This parser, written in Go, is the 3rd iteration of the project. The first,
[biodiversity] had been written in Ruby, the second, [also
gnparser][gnparser-scala], had been written in Go. This project is learned from
previous ones, and, when it matures, is going to be the substitution of other
two, and will be the only one that is maintained further.
All three projects were developed as a part of [Global Names
gnparser][gnparser-scala], had been written in Scala. This project learned
from the previous ones and, when it matures, it is going to be a
substitution for the other two, and will be the only one that is maintained
further. All three projects were developed as a part of [Global Names
Architecture Project][gna].
Try as a command tool under Windows, Mac or Linux by downloading the [latest
......@@ -33,8 +33,8 @@ gnparser -h
## Introduction
Global Names Parser or ``gnparser`` is a program written in Go for breaking up
scientific names into their different elements. It is uses [peg] -- a Parsing
Expression Grammar (PEG) library.
scientific names into their different elements. It uses [peg] -- a Parsing
Expression Grammar (PEG) tool.
Many other parsing algorithms for scientific names use regular expressions.
This approach works well for extracting canonical forms in simple cases.
......@@ -69,18 +69,18 @@ more efficient JSON conversion.
## Features
- Fastest parser ever.
- Very easy to install, just placing executable somewhere in the PATH is
sufficient.
- Extracts all elements from a name, not only canonical forms.
- Works with very complex scientific names, including hybrids.
- Includes gRPC server that can be used as if a native method call from C++,
C#, Java, Python, Ruby, PHP, JavaScript, Objective C, Dart.
- Use as a native library from Go projects.
- Can run as a command line application.
- Can be scaled to many CPUs and computers (if 300 millions names an
- Fastest parser ever.
- Very easy to install, just placing executable somewhere in the PATH is
sufficient.
- Extracts all elements from a name, not only canonical forms.
- Works with very complex scientific names, including hybrids.
- Includes a gRPC server that can be used as if it were a native method call
from C++, C#, Java, Python, Ruby, PHP, JavaScript, Objective C, Dart.
- Use as a native library from Go projects.
- Can run as a command line application.
- Can be scaled to many CPUs and computers (if 300 million names an
hour is not enough).
- Calculates a stable UUID version 5 ID from the content of a string.
- Calculates a stable UUID version 5 ID from the content of a string.
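The stable ID mentioned in the last feature can be illustrated with a minimal sketch of UUID version 5 generation (RFC 4122) that uses only the Go standard library. The `uuid5` and `format` helpers are hypothetical, and deriving the namespace from the `globalnames.org` domain is an assumption made for illustration; this diff does not show which namespace gnparser actually uses.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// namespaceDNS is the DNS namespace UUID defined in RFC 4122, Appendix C.
var namespaceDNS = [16]byte{
	0x6b, 0xa7, 0xb8, 0x10, 0x9d, 0xad, 0x11, 0xd1,
	0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8,
}

// uuid5 is a hypothetical helper: it hashes namespace+name with SHA-1 and
// sets the version (5) and variant (RFC 4122) bits.
func uuid5(ns [16]byte, name string) [16]byte {
	h := sha1.New()
	h.Write(ns[:])
	h.Write([]byte(name))
	var u [16]byte
	copy(u[:], h.Sum(nil))      // keep the first 16 bytes of the 20-byte digest
	u[6] = (u[6] & 0x0f) | 0x50 // version 5
	u[8] = (u[8] & 0x3f) | 0x80 // RFC 4122 variant
	return u
}

// format renders a UUID in the canonical 8-4-4-4-12 hexadecimal form.
func format(u [16]byte) string {
	return fmt.Sprintf("%x-%x-%x-%x-%x", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
}

func main() {
	// Assumed namespace: a UUID derived from a project domain name.
	ns := uuid5(namespaceDNS, "globalnames.org")
	// The same name-string always yields the same ID.
	fmt.Println(format(uuid5(ns, "Homo sapiens")))
}
```

Because the ID depends only on the namespace and the bytes of the string, two systems that agree on the namespace can compute identical IDs without exchanging any state.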
## Use Cases
......@@ -115,13 +115,13 @@ If there are problems with parsing a name, parser generates ``qualityWarnings``
messages and lowers parsing ``quality`` of the name. Quality values mean the
following:
- ``"quality": 1`` - No problems were detected
- ``"quality": 2`` - There were small problems, normalized result
should still be good
- ``"quality": 3`` - There were serious problems with the name, and the
final result is rather doubtful
- ``"quality": 0`` - A string could not be recognized as a scientific
name and parsing fails
- ``"quality": 1`` - No problems were detected
- ``"quality": 2`` - There were small problems, normalized result
should still be good
- ``"quality": 3`` - There were serious problems with the name, and the
final result is rather doubtful
- ``"quality": 0`` - A string could not be recognized as a scientific
name and parsing fails
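As a small illustration of the scale above, the following sketch maps a ``quality`` value to a short description. The `qualityLabel` function is hypothetical and not part of gnparser; it only restates the list in code.

```go
package main

import "fmt"

// qualityLabel is a hypothetical helper (not part of gnparser) that maps
// the "quality" value of a parsed name to a short explanation.
func qualityLabel(q int) string {
	switch q {
	case 1:
		return "no problems were detected"
	case 2:
		return "small problems, normalized result should still be good"
	case 3:
		return "serious problems, final result is rather doubtful"
	case 0:
		return "not recognized as a scientific name, parsing failed"
	default:
		// Values outside 0..3 are not described in the list above.
		return "unknown quality value"
	}
}

func main() {
	for _, q := range []int{1, 2, 3, 0} {
		fmt.Printf("quality %d: %s\n", q, qualityLabel(q))
	}
}
```

Note that `0` is not the best score: it marks a failed parse, so a consumer that filters by quality should treat it separately from the 1 to 3 range.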
### Creating stable GUIDs for name-strings
......@@ -236,6 +236,13 @@ will be directed to STDERR.
```bash
gnparser -j 200 names.txt > names_parsed.txt
```
To parse a file returning results in the same order as they are given (slower):
```bash
gnparser -j 1 names.txt > names_parsed.txt
```
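With a single job the output follows the input order because names are handled one at a time. As a sketch of an alternative, and not a description of how gnparser works, several workers can still produce ordered output if each result is stored at the index of its input. The `parse` function below is a placeholder assumed for this example, and `parseInOrder` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// parse is a placeholder for a real parsing call; it is an assumption made
// for this sketch and simply upper-cases its input.
func parse(name string) string {
	return strings.ToUpper(name)
}

// parseInOrder runs parse with the given number of workers and returns the
// results in the same order as the input names.
func parseInOrder(names []string, workers int) []string {
	if workers < 1 {
		workers = 1
	}
	results := make([]string, len(names))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is written by exactly one worker.
				results[i] = parse(names[i])
			}
		}()
	}
	for i := range names {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	names := []string{"Homo sapiens", "Pardosa moesta", "Bubo bubo"}
	for _, r := range parseInOrder(names, 4) {
		fmt.Println(r)
	}
}
```

This approach holds all results in memory until every worker has finished, so it trades memory for ordering and suits batches better than very large streams.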
The input file might contain millions of names, so creating one properly
formatted JSON document might be prohibitively expensive. The parser
therefore creates one JSON line per name (when ``compact`` format is used)
......@@ -293,8 +300,8 @@ func main() {
## Contributors
* [Dmitry Mozzherin]
* [Geoff Ower]
- [Dmitry Mozzherin]
- [Geoff Ower]
## License
......
......@@ -2,8 +2,8 @@ syntax = "proto3";
package grpc;
message Version {
string value = 1;
string build_time = 2;
string value = 1;
string build_time = 2;
}
message Void {}
......@@ -15,6 +15,11 @@ message Input {
}
}
message Output {
string value = 2;
string error = 3;
}
enum Format {
Compact = 0;
Pretty = 1;
......@@ -22,40 +27,8 @@ enum Format {
Debug = 3;
}
message Name {
bool parsed = 1;
int32 quality = 2;
string verbatim = 3;
string normalized = 4;
string canonical_simple = 5;
string canonical_full = 6;
Entry genus = 7;
Entry subgenus = 8;
Entry species = 9;
Entry subspicies = 10;
Entry variety = 11;
Entry form = 12;
}
message Entry {
string value = 1;
string norm_value = 2;
Pos position = 3;
string authors = 4;
string year = 5;
string score = 6;
}
message Pos {
int32 start = 1;
int32 end = 2;
}
message Output {
string value = 1;
string error = 2;
}
service GNparser {
rpc Ver(Void) returns(Version) {}
rpc Parse(stream Input) returns(stream Output) {}
rpc Ver(Void) returns(Version) {}
rpc Parse(stream Input) returns(stream Output) {}
rpc ParseInOrder(stream Input) returns(stream Output) {}
}
......@@ -65,6 +65,43 @@ func (gnps gnparserServer) Parse(stream GNparser_ParseServer) error {
}
}
func (gnps gnparserServer) ParseInOrder(stream GNparser_ParseInOrderServer) error {
	gnp := gnparser.NewGNparser()
	// The output format can only be set by the first message of the stream.
	firstRecord := true
	for {
		in, err := stream.Recv()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		switch c := in.Content.(type) {
		case *Input_Name:
			if firstRecord {
				firstRecord = false
			}
			// Each name is parsed and sent before the next one is received,
			// so the output order matches the input order.
			res, err := gnp.ParseAndFormat(c.Name)
			strError := ""
			if err != nil {
				strError = err.Error()
			}
			out := &Output{Value: res, Error: strError}
			err = stream.Send(out)
			if err != nil {
				return err
			}
		case *Input_Format:
			// A format message is ignored unless it arrives first.
			if firstRecord {
				firstRecord = false
				f := c.Format
				gnp = gnparser.NewGNparser(gnparser.Format(strFormat(f)))
			}
		}
	}
}
func strFormat(f Format) string {
switch f {
case Format_Compact:
......