# Global Names Parser: gnparser written in Go

## IMPORTANT

This repository is **archived** now, because `gnparser` development moved to
[GitHub](https://github.com/gnames/gnparser) to keep it closer to
the rest of the [Global Names Projects](https://github.com/gnames).
Please get the latest versions and report issues at its new
[home](https://github.com/gnames/gnparser).

## Summary

Try `gnparser` [online][parser-web].

``gnparser`` splits scientific names into their semantic elements with
associated meta-information. For example, ``"Homo sapiens Linnaeus"`` is
parsed into:

| Element  | Meaning         | Position |
| -------- | --------------- | -------- |
| Homo     | genus           | (0,4)    |
| sapiens  | specificEpithet | (5,12)   |
| Linnaeus | author          | (13,21)  |

This parser, written in Go, is the 3rd iteration of the project. The first,
[biodiversity], was written in Ruby; the second, [also
gnparser][gnparser-scala], was written in Scala. This project now replaces
the other two and is the only one that will be maintained further. All three
projects were developed as part of the
[Global Names Architecture Project][gna].

To use `gnparser` as a command line tool under Windows, Mac or Linux,
download the [latest release][releases], uncompress it, and copy the
`gnparser` binary somewhere in your PATH.

```bash
wget https://gitlab.com/gogna/gnparser/uploads/55d247b8fbade60116c7e3b650dd978c/gnparser-v0.9.0-linux.tar.gz
tar xvf gnparser-v0.9.0-linux.tar.gz
sudo cp gnparser /usr/local/bin
# for CSV output
gnparser "Homo sapiens Linnaeus"
# for JSON output
gnparser -f compact "Homo sapiens Linnaeus"
# or
gnparser -f pretty "Homo sapiens Linnaeus"
# to see all available flags
gnparser -h
```

<!-- vim-markdown-toc GitLab -->

* [Introduction](#introduction)
* [Speed](#speed)
* [Features](#features)
* [Use Cases](#use-cases)
  * [Getting the simplest possible canonical form](#getting-the-simplest-possible-canonical-form)
  * [Quickly partition names by the type](#quickly-partition-names-by-the-type)
  * [Normalizing name-strings](#normalizing-name-strings)
  * [Removing authorships from the middle of the name](#removing-authorships-from-the-middle-of-the-name)
  * [Figuring out if names are well-formed](#figuring-out-if-names-are-well-formed)
  * [Creating stable GUIDs for name-strings](#creating-stable-guids-for-name-strings)
  * [Assembling canonical forms etc. from original spelling](#assembling-canonical-forms-etc-from-original-spelling)
* [Installation](#installation)
  * [Linux or OS X](#linux-or-os-x)
  * [Windows](#windows)
  * [Install with Go](#install-with-go)
* [Usage](#usage)
  * [Command Line](#command-line)
  * [Pipes](#pipes)
  * [gRPC server](#grpc-server)
  * [Usage as a REST API Interface](#usage-as-a-rest-api-interface)
  * [Use as a Docker image](#use-as-a-docker-image)
  * [Use as a library in Go](#use-as-a-library-in-go)
  * [Use as a shared C library](#use-as-a-shared-c-library)
* [Parsing ambiguities](#parsing-ambiguities)
  * [Names with `filius` (ICN code)](#names-with-filius-icn-code)
  * [Names with subgenus (ICZN code) and genus author (ICN code)](#names-with-subgenus-iczn-code-and-genus-author-icn-code)
* [Authors](#authors)
* [Contributors](#contributors)
* [References](#references)
* [License](#license)

<!-- vim-markdown-toc -->

## Introduction

Global Names Parser, or ``gnparser``, is a program written in Go for breaking
up scientific names into their elements. It uses [peg], a Parsing Expression
Grammar (PEG) tool.

Many other parsing algorithms for scientific names use regular expressions.
This approach works well for extracting canonical forms in simple cases.
However, for complex scientific names, and for parsing scientific names into
all their semantic elements, regular expressions often fail because they
cannot handle the recursive nature of data embedded in names. By contrast,
``gnparser`` is able to deal with the most complex scientific name-strings.

``gnparser`` takes a name-string like ``Drosophila (Sophophora) melanogaster
Meigen, 1830`` and returns parsed components in `CSV` or `JSON` format.
Parsing scientific names can become surprisingly complex, and the `gnparser`
[test file] is a good source of information about the parser's capabilities,
its input and output.

## Speed

Number of names parsed per hour on an i7-8750H CPU
(6 cores, 12 threads, at 2.20 GHz), parser v0.5.1:

| Threads | names/hr    |
| ------- | ----------- |
| 1       | 48,000,000  |
| 2       | 63,000,000  |
| 4       | 128,000,000 |
| 8       | 202,000,000 |
| 16      | 248,000,000 |
| 100     | 293,000,000 |

For the simplest output, Go ``gnparser`` is roughly 2 times faster than Scala
``gnparser`` and about 100 times faster than the Ruby ``biodiversity`` parser.
For JSON formats it is approximately 8 times faster than the Scala version,
due to more efficient JSON conversion.

## Features

* Fastest parser ever.
* Very easy to install: placing the executable somewhere in the PATH is
  sufficient.
* Extracts all elements from a name, not only canonical forms.
* Works with very complex scientific names, including hybrid formulas.
* Includes a gRPC server that can be used as if it were a native method call
  from C++, C#, Java, Python, Ruby, PHP, JavaScript, Objective C, and Dart.
* Can be used as a native library from Go projects.
* Can run as a command line application.
* Can be scaled to many CPUs and computers (if 300 million names an
  hour is not enough).
* Calculates a stable UUID version 5 ID from the content of a string.
* Provides a C binding to incorporate the parser into other languages.

## Use Cases

### Getting the simplest possible canonical form

Canonical forms of a scientific name are its latinized components without
annotations, authors, or dates. They are great for matching names that differ
in less stable parts. Use the ``canonicalName -> simple`` or
``canonicalName -> full`` fields from parsing results for this use case. The
``full`` version of the canonical form includes infraspecific ranks and the
hybrid character for named hybrids.

The ``canonicalName -> full`` is good for presentation, as it keeps more
details.

The ``canonicalName -> simple`` field is good for matching names from
different sources, because dataset curators sometimes omit the hybrid sign in
named hybrids or remove ranks for infraspecific epithets.

The ``canonicalName -> stem`` field normalizes the `simple` canonical form even
further. The normalization follows the stemming rules for the Latin language
described in [Schinke R et al (1996)]. For example, letters `j` are converted
to `i`, letters `v` are converted to `u`, and suffixes are removed from the
specific and infraspecific epithets.
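A simplified Go sketch of the two normalizations described above, for illustration only; it is not gnparser's implementation. It maps `j` to `i` and `v` to `u`, then removes the longest matching noun suffix from our reading of the Schinke et al. suffix table, keeping the stem only if at least two letters remain. The verb suffixes and the `-que` rule from the paper are left out, `stemEpithet` is a hypothetical name, and gnparser's actual stems may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// nounSuffixes: noun suffixes after Schinke et al. (1996), longest first so
// that e.g. "-ius" is tried before "-us". Illustrative, not gnparser's list.
var nounSuffixes = []string{
	"ibus", "ius",
	"ae", "am", "as", "em", "es", "ia", "is", "nt", "os", "ud", "um", "us",
	"a", "e", "i", "o", "u",
}

// stemEpithet lowercases an epithet, maps j->i and v->u, then removes the
// longest matching suffix if at least two letters would remain.
// Byte lengths are fine here because Latin epithets are plain ASCII.
func stemEpithet(word string) string {
	w := strings.NewReplacer("j", "i", "v", "u").Replace(strings.ToLower(word))
	for _, suf := range nounSuffixes {
		if strings.HasSuffix(w, suf) {
			if len(w)-len(suf) >= 2 {
				return strings.TrimSuffix(w, suf)
			}
			return w // only the longest matching suffix is considered
		}
	}
	return w
}

func main() {
	for _, ep := range []string{"major", "vulgaris", "alba", "sapiens"} {
		fmt.Println(ep, "->", stemEpithet(ep))
	}
}
```

Trying only the longest matching suffix keeps short words such as `eius` intact instead of falling back to a shorter suffix.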

If you only care about the canonical form of a name, you can use the
``--format csv`` flag with the command line tool.

CSV output has the following fields:

| Field             | Meaning                                         |
| ------------------| ----------------------------------------------- |
| Id                | UUID v5 generated out of Verbatim               |
| Verbatim          | Input name-string without any changes           |
| Cardinality       | 0 - N/A, 1 - Uninomial, 2 - Binomial etc.       |
| CanonicalFull     | Canonical form with hybrid sign and ranks       |
| CanonicalSimple   | Simplest canonical form                         |
| CanonicalStem     | Simplest canonical form with removed suffixes   |
| Authors           | Author string of a name                         |
| Year              | Year of the name (if given)                     |
| Quality           | Parsing quality                                 |

### Quickly partition names by the type

Scientific names can usually be grouped according to the number of their
elements:

* Uninomial
* Binomial
* Trinomial
* Quadrinomial

The output of `gnparser` contains a `Cardinality` field that tells, when
possible, how many elements are detected in the name.

| Cardinality  | Name Type    |
| ------------ | ------------ |
| 0            | Undetermined |
| 1            | Uninomial    |
| 2            | Binomial     |
| 3            | Trinomial    |
| 4            | Quadrinomial |

For hybrid formulas, "approximate" names (with "sp.", "spp." etc.), unparsed
names, and names from the `BOLD` project, cardinality is 0 (Undetermined).

### Normalizing name-strings

There are many inconsistencies in how scientific names may be written.
Use the ``normalized`` field to bring them all to a common form (spelling,
spacing, ranks).

### Removing authorships from the middle of the name

Many data administrators store name-strings in two columns, splitting them into
a "name part" and an "authorship part". This practice loses some information
when dealing with names like "*Prosthechea cochleata* (L.) W.E.Higgins *var.
grandiflora* (Mutel) Christenson". However, for this use case, a combination
of ``canonicalName -> full`` with the authorship of the lowest taxon will do
the job. You can also use the ``--format csv`` flag of the ``gnparser``
command line tool.

### Figuring out if names are well-formed

If there are problems with parsing a name, the parser generates
``qualityWarnings`` messages and lowers the parsing ``quality`` of the name.
Quality values mean the following:

* ``"quality": 1`` - No problems were detected.
* ``"quality": 2`` - There were small problems; the normalized result
  should still be good.
* ``"quality": 3`` - There were serious problems with the name, and the
  final result is rather doubtful.
* ``"quality": 0`` - The string could not be recognized as a scientific
  name and parsing failed.

### Creating stable GUIDs for name-strings

``gnparser`` uses UUID version 5 to generate its ``id`` field.
There is an algorithmic 1:1 relationship between the name-string and the UUID.
Moreover, the same algorithm can be used in any popular language to
generate the same UUID. Such IDs can be used to globally connect information
about name-strings or information associated with name-strings.

More information about UUID version 5 can be found in the [Global Names
blog][uuid5].
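Because the ID is a standard RFC 4122 version 5 UUID, it can be recomputed with nothing more than SHA-1, as in this minimal Go sketch. The blog post describes the namespace gnparser uses; here we *assume* that namespace is itself derived as `uuid5(DNS namespace, "globalnames.org")`, so check a few IDs against real gnparser output before relying on them. The helper names are hypothetical.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// namespaceDNS is the RFC 4122 DNS namespace, 6ba7b810-9dad-11d1-80b4-00c04fd430c8.
var namespaceDNS = [16]byte{
	0x6b, 0xa7, 0xb8, 0x10, 0x9d, 0xad, 0x11, 0xd1,
	0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8,
}

// uuid5 computes an RFC 4122 version 5 (SHA-1, name-based) UUID.
func uuid5(ns [16]byte, name string) [16]byte {
	h := sha1.New()
	h.Write(ns[:])
	h.Write([]byte(name))
	var u [16]byte
	copy(u[:], h.Sum(nil))      // keep the first 16 of the 20 SHA-1 bytes
	u[6] = (u[6] & 0x0f) | 0x50 // set version 5
	u[8] = (u[8] & 0x3f) | 0x80 // set the RFC 4122 variant
	return u
}

// formatUUID renders the canonical 8-4-4-4-12 lowercase hex form.
func formatUUID(u [16]byte) string {
	return fmt.Sprintf("%x-%x-%x-%x-%x", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
}

func main() {
	// Assumption: the Global Names namespace is uuid5(DNS, "globalnames.org").
	gnNamespace := uuid5(namespaceDNS, "globalnames.org")
	// The CSV field table says the ID is generated from the verbatim string.
	fmt.Println(formatUUID(uuid5(gnNamespace, "Homo sapiens Linnaeus")))
}
```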

### Assembling canonical forms etc. from original spelling

``gnparser`` tries to correct problems with spelling, but sometimes it is
important to keep the original spelling of canonical forms or authorships.
The ``positions`` field attaches semantic meaning to every word in the
original name-string and allows users to create canonical forms or other
combinations using the original verbatim spelling of the words. Each element
in ``positions`` contains 3 parts:

1. semantic meaning of a word
2. start position of the word
3. end position of the word

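For example, ``["specificEpithet", 6, 11]`` means that a specific epithet
starts at offset 6 and ends *before* offset 11 of the string. Offsets are
zero-based, as in the ``Homo sapiens Linnaeus`` example at the top of this
document, where ``Homo`` occupies ``(0,4)``.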
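A minimal Go sketch of rebuilding a canonical form in its verbatim spelling from `positions`. It assumes offsets are zero-based and count Unicode characters (runes) rather than bytes, which matters for names such as `Särkinen`; the `position` type and both helpers are hypothetical, not part of gnparser's API.

```go
package main

import (
	"fmt"
	"strings"
)

// position mirrors one entry of the "positions" field:
// [semantic meaning, start offset, end offset (exclusive)].
type position struct {
	Meaning    string
	Start, End int
}

// verbatimWords returns the original spelling of every positioned word.
// Assumption: offsets are zero-based and count runes, not bytes.
// Out-of-range entries yield an empty string.
func verbatimWords(name string, pos []position) []string {
	runes := []rune(name)
	out := make([]string, len(pos))
	for i, p := range pos {
		if p.Start >= 0 && p.Start <= p.End && p.End <= len(runes) {
			out[i] = string(runes[p.Start:p.End])
		}
	}
	return out
}

// verbatimCanonical joins the words whose meaning is in keep, preserving
// the original spelling (e.g. genus + specificEpithet, without authors).
func verbatimCanonical(name string, pos []position, keep map[string]bool) string {
	words := verbatimWords(name, pos)
	var parts []string
	for i, p := range pos {
		if keep[p.Meaning] {
			parts = append(parts, words[i])
		}
	}
	return strings.Join(parts, " ")
}

func main() {
	name := "Homo sapiens Linnaeus"
	pos := []position{{"genus", 0, 4}, {"specificEpithet", 5, 12}, {"author", 13, 21}}
	keep := map[string]bool{"genus": true, "specificEpithet": true}
	fmt.Println(verbatimCanonical(name, pos, keep)) // Homo sapiens
}
```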

## Installation

Compiled programs in Go are self-sufficient and small (``gnparser`` is only a
few megabytes). As a result, the ``gnparser`` binary file is all you need to
make it work. You can install it by downloading the [latest version of the
binary][releases] for your operating system and placing it in your ``PATH``.

### Linux or OS X

Move the ``gnparser`` executable somewhere in your PATH
(for example ``/usr/local/bin``):

```bash
sudo mv path_to/gnparser /usr/local/bin
```

### Windows

One possible way is to create a default folder for executables and place
``gnparser`` there.

Press the ``Windows+R`` key combination and type "``cmd``". In the terminal
window that appears, type:

```cmd
mkdir C:\bin
copy path_to\gnparser.exe C:\bin
```

[Add ``C:\bin`` directory to your ``PATH``][winpath] environment variable.

### Install with Go

If you have Go installed on your computer, use:

```bash
go get -u gitlab.com/gogna/gnparser
cd $GOPATH/src/gitlab.com/gogna/gnparser
make install
```

Make sure your ``PATH`` includes ``$HOME/go/bin``.

## Usage

### Command Line

```bash
gnparser -f pretty "Quadrella steyermarkii (Standl.) Iltis & Cornejo"
```

Relevant flags:

``--help -h``
: help information about flags

``--format -f``
: output format. Can be ``compact``, ``pretty``, ``csv``, or ``debug``.
Default is ``csv``.

CSV format returns a header row and the CSV-compatible parsed result.

``--jobs -j``
: number of jobs running concurrently.

``--nocleanup -n``
: keeps HTML entities and tags if they are present in a name-string. If your
data is free of HTML tags and entities, you can use this flag to increase
performance.

To parse one name:

```bash
# CSV output (default)
gnparser "Parus major Linnaeus, 1788"
# or
gnparser -f csv "Parus major Linnaeus, 1788"

# JSON compact format
gnparser "Parus major Linnaeus, 1788" -f compact

# pretty format
gnparser -f pretty "Parus major Linnaeus, 1788"

# to parse a name from the standard input
echo "Parus major Linnaeus, 1788" | gnparser
```

To parse a file:

There is no flag for parsing a file. If the parser finds the given file path on
your computer, it will parse the content of the file, assuming that every line
is a new scientific name. If the file path is not found, ``gnparser`` will try
to parse the "path" as a scientific name.

Parsed results will stream to STDOUT, while progress of the parsing
will be directed to STDERR.

```bash
gnparser -j 200 names.txt > names_parsed.txt

# to parse files using pipes
cat names.txt | gnparser -f csv -j 200 > names_parsed.txt

# HTML tags and entities are removed by default
gnparser "<i>Pomatomus</i>&nbsp;<i>saltator</i>"
# use -n to keep HTML tags and entities; you gain a bit of performance
# with this option if your data does not contain HTML tags or entities
gnparser -n "<i>Pomatomus</i>&nbsp;<i>saltator</i>"
gnparser -n "Pomatomus saltator"
```

To parse a file returning results in the same order as they are given (slower):

```bash
gnparser -j 1 names.txt > names_parsed.txt
```

The input file might contain millions of names, so creating one properly
formatted JSON output might be prohibitively expensive. Instead, the parser
creates one JSON line per name (when the ``compact`` format is used).

You can use up to 20 times more "threads" than the number of your CPU cores to
reach the maximum parsing speed (``--jobs 200`` flag). This is practical
because additional threads are very cheap in Go, and they try to fill every
idle gap in the CPU usage.

### Pipes

Almost any language can use the pipes of the underlying operating system.
From inside your program you can make the `gnparser` CLI executable listen on
a STDIN pipe and produce output into a STDOUT pipe. Here is an example in
Ruby:

```ruby
require 'open3'

# Start one long-running gnparser process per output format and
# return a hash with the pipes of each process.
def self.start_gnparser
  io = {}

  ['compact', 'csv'].each do |format|
    stdin, stdout, stderr = Open3.popen3("./gnparser -j 200 --format #{format}")
    io[format.to_sym] = { stdin: stdin, stdout: stdout, stderr: stderr }
  end
  io
end
```

Such an arrangement gives you nearly native performance for large datasets.
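The same idea in Go, as a minimal sketch using only `os/exec`. `pipeLines` is a hypothetical helper that writes all names to the child process's STDIN, closes it, and collects STDOUT line by line; unlike the Ruby example, it does not keep the process running between batches. The `main` function assumes a `gnparser` binary on your PATH; with `-j` greater than 1 the output order may not match the input order.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os/exec"
)

// pipeLines starts an external command, writes lines to its STDIN, and
// collects every line the command prints to STDOUT.
func pipeLines(name string, args []string, lines []string) ([]string, error) {
	cmd := exec.Command(name, args...)
	stdin, err := cmd.StdinPipe()
	if err != nil {
		return nil, err
	}
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return nil, err
	}
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	// Write from a separate goroutine so a full output pipe cannot deadlock us.
	go func() {
		defer stdin.Close() // closing STDIN tells the child there is no more input
		for _, l := range lines {
			io.WriteString(stdin, l+"\n")
		}
	}()
	var out []string
	sc := bufio.NewScanner(stdout)
	for sc.Scan() {
		out = append(out, sc.Text())
	}
	scanErr := sc.Err()
	waitErr := cmd.Wait() // call Wait only after all output has been read
	if scanErr != nil {
		return nil, scanErr
	}
	return out, waitErr
}

func main() {
	res, err := pipeLines("gnparser", []string{"-f", "compact"},
		[]string{"Parus major Linnaeus, 1788"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range res {
		fmt.Println(r)
	}
}
```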

### gRPC server

Relevant flags:

``--help -h``
: help information about flags

``--grpc -g``
: sets a port to run the gRPC server, and starts gnparser in gRPC mode.

``--jobs -j``
: number of workers allocated per gRPC request. Default corresponds to the
  number of CPU threads. If you have full control over the gRPC server of
  `gnparser`, set this option to 100-300 jobs.

```bash
gnparser -g 8989 -j 200
```

For an example of how to use the gRPC server, check the ``gnparser``
[Ruby gem][gnparser ruby] as well as the [gRPC documentation].

It also helps to read the [gnparser.proto] file to understand the inputs and
outputs of the gRPC server.

### Usage as a REST API Interface

Use the web-server REST API as a slower but more widespread alternative to the
gRPC server. The web-based user interface and API are invoked by the
``--web-port`` or ``-w`` flag. To start a web server on
``http://0.0.0.0:9000``:

```bash
gnparser -w 9000
```

Opening a browser with this address will now show an interactive interface
to the parser. API calls are accessible at ``http://0.0.0.0:9000/api``.

Make sure to URL-encode name-strings for GET requests. An '&' character
needs to be converted to '%26'.

* ``GET /api?q=Aus+bus|Aus+bus+D.+%26+M.,+1870``
* ``POST /api`` with request body of JSON array of strings

```ruby
require 'json'
require 'net/http'

uri = URI('https://parser.globalnames.org/api')
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
request = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json',
                                   'accept' => 'json')
request.body = ['Solanum mariae Särkinen & S.Knapp',
                'Ahmadiago Vánky 2004'].to_json
response = http.request(request)
```
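Both request forms can also be prepared in Go with only the standard library. This minimal sketch uses the hypothetical helpers `getURL` and `postBody` and the local server address from the example; it only builds the URL and the JSON body, leaving out the actual request (for example `http.Get` or `http.Post` from `net/http`) so it runs offline.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strings"
)

// getURL builds GET /api?q=... : every name is URL-encoded on its own
// (so "&" becomes "%26") and the encoded names are joined with "|".
func getURL(base string, names []string) string {
	esc := make([]string, len(names))
	for i, n := range names {
		esc[i] = url.QueryEscape(n)
	}
	return base + "?q=" + strings.Join(esc, "|")
}

// postBody builds the JSON array of strings sent with POST /api.
// Note: json.Marshal escapes "&" as \u0026, which is still valid JSON.
func postBody(names []string) ([]byte, error) {
	return json.Marshal(names)
}

func main() {
	names := []string{"Aus bus", "Aus bus D. & M., 1870"}
	fmt.Println(getURL("http://0.0.0.0:9000/api", names))
	body, err := postBody(names)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(string(body))
}
```

`url.QueryEscape` also encodes the comma as `%2C`, which a standard query decoder turns back into `,`.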

### Use as a Docker image

You need to have [docker runtime installed](https://docs.docker.com/install/)
on your computer for these examples to work.

```bash
# run as a gRPC server on port 7777
docker run -p 0.0.0.0:7777:7777 gnames/gognparser -g 7777
# run gRPC on the default port 8778
docker run -p 0.0.0.0:8778:8778 gnames/gognparser
# to run as a daemon with 50 workers
docker run -d gnames/gognparser -g 7777 -j 50

# run as a website and a RESTful service
docker run -p 0.0.0.0:80:8080 gnames/gognparser -w 8080

# just parse something
docker run gnames/gognparser "Amaurorhinus bewichianus (Wollaston,1860) (s.str.)"
```

### Use as a library in Go

```go
package main

import (
  "fmt"

  "gitlab.com/gogna/gnparser"
)

func main() {
  opts := []gnparser.Option{
    gnparser.Format("csv"),
    gnparser.WorkersNum(100),
  }
  gnp := gnparser.NewGNparser(opts...)
  res, err := gnp.ParseAndFormat("Bubo bubo")
  if err != nil {
    fmt.Println(err)
    return
  }
  fmt.Println(res)
}
```

To get parsed results as Go objects instead of formatted output, use the
`gnp.ParseToObject` method. Use the [gnparser.proto] file as a reference for
the available object fields.

```go
package main

import (
  "fmt"

  "gitlab.com/gogna/gnparser"
  // assumed import path of the generated protobuf types (see pb/gnparser.proto)
  "gitlab.com/gogna/gnparser/pb"
)

func main() {
  gnp := gnparser.NewGNparser()
  o := gnp.ParseToObject("Homo sapiens")

  fmt.Println(o.Canonical.Simple)
  switch d := o.Details.(type) {
  case *pb.Parsed_Species:
    fmt.Println(d.Species.Genus)
  case *pb.Parsed_Uninomial:
    fmt.Println(d.Uninomial.Value)
  default:
    // other detail types are listed in gnparser.proto
  }
}
```

### Use as a shared C library

It is possible to bind `gnparser` functionality with languages that can use
the C Application Binary Interface, such as Python, Ruby, Rust, C, C++, and
Java (via JNI).

To compile the `gnparser` shared library for your platform/operating system of
choice, you need `GNU make` and the `GNU gcc` compiler installed:

```bash
make clib
cd binding
cp libgnparser* /path/to/some/project
```

For an example of how to use the shared library, check this [StackOverflow
question][ruby_ffi_go_usage] and the [biodiversity] Ruby gem. You can
find the shared functions in their [export file].

## Parsing ambiguities

Some name-strings cannot be parsed unambiguously without some additional data.

### Names with `filius` (ICN code)

For names like `Aus bus Linn. f. cus` the `f.` is ambiguous. It might mean
that the species was described by the son (`filius`) of Linn., or it might
mean that `cus` is a `forma` of `bus`. We provide a warning
"Ambiguous f. (filius or forma)" for such cases.

### Names with subgenus (ICZN code) and genus author (ICN code)

For names like `Aus (Bus) L.` or `Aus (Bus) cus L.` the `(Bus)` token would
mean the name of subgenus for ICZN names, but for ICN names it would be an
author of genus `Aus`. We created a list of ICN generic authors using data from
[IRMNG] to distinguish such names from each other. For detected ICN names we
provide a warning "Possible ICN author instead of subgenus".

## Authors

* [Dmitry Mozzherin]

## Contributors

* [Geoff Ower]
* [Hernan Lucas Pereira]

If you want to report a bug or add a feature, read the
[CONTRIBUTING] file.

## References

Rees, T. (compiler) (2019). The Interim Register of Marine and Nonmarine
Genera. Available from `http://www.irmng.org` at VLIZ. Accessed 2019-04-10.

## License

Released under [MIT license]

[releases]: https://gitlab.com/gogna/gnparser/-/releases
[biodiversity]: https://github.com/GlobalNamesArchitecture/biodiversity
[gnparser-scala]: https://github.com/GlobalNamesArchitecture/gnparser
[peg]: https://github.com/pointlander/peg
[gna]: http://globalnames.org
[test file]: https://gitlab.com/gogna/gnparser/raw/master/testdata/test_data.txt
[uuid5]: http://globalnames.org/news/2015/05/31/gn-uuid-0-5-0
[winpath]: https://www.computerhope.com/issues/ch000549.htm
[gnparser ruby]: https://gitlab.com/gnames/gnparser_rb
[gRPC documentation]: https://grpc.io/docs/quickstart
[Dmitry Mozzherin]: https://gitlab.com/dimus
[Geoff Ower]: https://gitlab.com/gdower
[Hernan Lucas Pereira]: https://gitlab.com/LocoDelAssembly
[MIT license]: https://gitlab.com/gogna/gnparser/raw/master/LICENSE
[parser-web]: https://parser.globalnames.org
[IRMNG]: http://www.irmng.org
[CONTRIBUTING]: https://gitlab.com/gogna/gnparser/blob/master/CONTRIBUTING.md
[gnparser.proto]: https://gitlab.com/gogna/gnparser/blob/master/pb/gnparser.proto
[Schinke R et al (1996)]: https://caio.ueberalles.net/a_stemming_algorithm_for_latin_text_databases-schinke_et_al.pdf
[ruby_ffi_go_usage]: https://stackoverflow.com/questions/58866962/how-to-pass-an-array-of-strings-and-get-an-array-of-strings-in-ruby-using-go-sha
[export file]: https://gitlab.com/gogna/gnparser/blob/master/binding/main.go