Commit eb4d9fa6 authored by Dmitry Mozzherin

Fix #66 remove HTML tags when parsing

parent 7213a91b
Pipeline #81385515 passed with stages
in 4 minutes and 8 seconds
......@@ -2,7 +2,10 @@
## Unreleased
- Add [#61]: handle authors that end with a word "bis"
## [v0.9.1]
- Add [#66]: parsing removes HTML tags as well.
- Add [#61]: handle authors that end with a word "bis".
- Add [#60]: handle correctly deprecated ranks with greek letters.
- Add [#62]: parser breaks on ``Drepanolejeunea (Spruce) (Steph.)``.
......@@ -92,7 +95,8 @@ array of names instead of a stream.
This document follows [changelog guidelines]
[v0.9.0]: https://gitlab.com/gogna/gnparser/compare/v0.0.8...v0.9.0
[v0.9.1]: https://gitlab.com/gogna/gnparser/compare/v0.9.0...v0.9.1
[v0.9.0]: https://gitlab.com/gogna/gnparser/compare/v0.8.0...v0.9.0
[v0.8.0]: https://gitlab.com/gogna/gnparser/compare/v0.7.5...v0.8.0
[v0.7.5]: https://gitlab.com/gogna/gnparser/compare/v0.7.4...v0.7.5
[v0.7.4]: https://gitlab.com/gogna/gnparser/compare/v0.7.3...v0.7.4
......@@ -103,6 +107,7 @@ This document follows [changelog guidelines]
[v0.6.0]: https://gitlab.com/gogna/gnparser/compare/v0.5.1...v0.6.0
[v0.5.1]: https://gitlab.com/gogna/gnparser/tree/v0.5.1
[#66]: https://gitlab.com/gogna/gnparser/issues/66
[#65]: https://gitlab.com/gogna/gnparser/issues/65
[#64]: https://gitlab.com/gogna/gnparser/issues/64
[#63]: https://gitlab.com/gogna/gnparser/issues/63
......
# Global Names Parser: gnparser written in Go
Try in [online][parser-web].
Try `gnparser` [online][parser-web].
``gnparser`` splits scientific names into their component elements with
associated meta information. For example, ``"Homo sapiens Linnaeus"`` is
parsed into human readable information as follows:
``gnparser`` splits scientific names into their semantic elements with
associated meta information. For example, ``"Homo sapiens Linnaeus"`` is
parsed into:
| Element | Meaning | Position
| -------- | ---------------- | --------
......@@ -14,10 +14,10 @@ parsed into human readable information as follows:
This parser, written in Go, is the 3rd iteration of the project. The first,
[biodiversity] had been written in Ruby, the second, [also
gnparser][gnparser-scala], had been written in Scala. This project learned
from the previous ones, and is now a substitution for the other two. It will be
the only one that is maintained further. All three projects were developed as
a part of [Global Names Architecture Project][gna].
gnparser][gnparser-scala], had been written in Scala. This project is now
a replacement for the other two. It will be the only one that is maintained
further. All three projects were developed as a part of
[Global Names Architecture Project][gna].
To use `gnparser` as a command line tool under Windows, Mac or Linux,
download the [latest release][releases], uncompress it, and copy `gnparser`
......@@ -72,14 +72,15 @@ Expression Grammar (PEG) tool.
Many other parsing algorithms for scientific names use regular expressions.
This approach works well for extracting canonical forms in simple cases.
However, for complex scientific names and to parse scientific names into
all semantic elements regular expressions often fail, unable to overcome
all semantic elements, regular expressions often fail, unable to overcome
the recursive nature of data embedded in names. By contrast, ``gnparser``
is able to deal with the most complex scientific name-strings.
``gnparser`` takes a name-string like ``Drosophila (Sophophora) melanogaster
Meigen, 1830`` and returns parsed components in `JSON` format. This behavior is
defined in its tests and the [test file] is a good source of information about
parser's capabilities, its input and output.
Meigen, 1830`` and returns parsed components in `JSON` format. Parsing
scientific names can become surprisingly complex, and the `gnparser`
[test file] is a good source of information about the parser's capabilities,
its input and output.
## Speed
......@@ -106,7 +107,7 @@ more efficient JSON conversion.
- Very easy to install, just placing executable somewhere in the PATH is
sufficient.
- Extracts all elements from a name, not only canonical forms.
- Works with very complex scientific names, including hybrids.
- Works with very complex scientific names, including hybrid formulas.
- Includes gRPC server that can be used as if a native method call from C++,
C#, Java, Python, Ruby, PHP, JavaScript, Objective C, Dart.
- Use as a native library from Go projects.
......@@ -120,8 +121,8 @@ more efficient JSON conversion.
### Getting the simplest possible canonical form
Canonical forms of a scientific name are the latinized components without
annotations, authors or dates. They are great for matching names despite
alternative spellings. Use the ``canonicalName -> simple`` or ``canonicalName
annotations, authors or dates. They are great for matching names that differ
in less stable parts. Use the ``canonicalName -> simple`` or ``canonicalName
-> full`` fields from parsing results for this use case. ``Full`` version of
canonical form includes infra-specific ranks and hybrid character for named
hybrids.
......@@ -134,7 +135,7 @@ The ``canonicalName -> full`` is good for presentation, as it keeps more
details.
If you only care about canonical form of a name you can use ``--format simple``
flag with command line tool or gRPC service.
flag with command line tool.
### Normalizing name-strings
......@@ -240,6 +241,10 @@ You do need your ``PATH`` to include ``$HOME/go/bin``
### Command Line
```bash
gnparser -f pretty "Quadrella steyermarkii (Standl.) Iltis & Cornejo"
```
Relevant flags:
``--help -h``
......@@ -252,8 +257,10 @@ Default is ``compact``.
``--jobs -j``
: number of jobs running concurrently.
``--cleanup -c``
: cleans up input from HTML entities and tags instead of parsing
``--nocleanup -n``
: keeps HTML entities and tags if they are present in a name-string. If your
data contains no HTML tags or entities, you can use this flag to improve
performance.
To parse one name:
......@@ -273,10 +280,10 @@ echo "Parus major Linnaeus, 1788" | gnparser
To parse a file:
There is no flag for parsing a file. If parser finds file path on your computer
it will parse the content of the file, assuming every line is a new scientific
name. If the file path is not found, ``gnparser`` will try to parse the "path"
as a scientific name.
There is no flag for parsing a file. If parser finds the given file path on
your computer, it will parse the content of the file, assuming every line is a
new scientific name. If the file path is not found, ``gnparser`` will try to
parse the "path" as a scientific name.
Parsed results will stream to STDOUT, while progress of the parsing
will be directed to STDERR.
......@@ -287,9 +294,11 @@ gnparser -j 200 names.txt > names_parsed.txt
# to parse files using pipes
cat names.txt | gnparser -f simple -j 200 > names_parsed.txt
# to clean names from html tags and entities first (no parsing
# or other changes), then parse
cat names.txt | gnparser -c | sed "s/.*|//" | gnparser > names_parsed.txt
# to keep html tags and entities during parsing. You gain a bit of performance
# with this option if your data does not contain HTML tags or entities.
gnparser "<i>Pomatomus</i>&nbsp;<i>saltator</i>"
gnparser -n "<i>Pomatomus</i>&nbsp;<i>saltator</i>"
gnparser -n "Pomatomus saltator"
```
To parse a file returning results in the same order as they are given (slower):
......@@ -307,28 +316,6 @@ reach maximum speed of parsing (``--jobs 200`` flag). It is practical because
additional threads are very cheap in Go and they try to fill out every idle
gap in the CPU usage.
To cleanup a name (no parsing here, it just removes HTML tags and entities,
and makes no other modifications):
The output contains the original name-string, and "HTML-normalized" one
separated by a pipe ("|") character.
```bash
gnparser -c "<i>Abacopteris glandulosa</i> (Bl.) F&eacute;e &amp; Chin"
```
To cleanup a file of names
```bash
gnparser -j 200 -c names.txt > no_html_names.txt
# using pipes
cat names.txt | gnparser -c -j 200 > no_html_names.txt
```
If you have data that has names with tags or HTML entities, the ``--cleanup
-c`` flag will help to normalize such names for parsing or other purposes.
### gRPC server
Relevant flags:
......@@ -341,15 +328,19 @@ Relevant flags:
``--jobs -j``
: number of workers allocated per gRPC request. Default corresponds to the
number of CPU threads.
number of CPU threads. If you have full control over the `gnparser` gRPC
server, set this option to 100-300 jobs.
```bash
gnparser -g 8989 -j 20
gnparser -g 8989 -j 200
```
For an example of how to use the gRPC server, check the ``gnparser`` [Ruby gem][gnparser
ruby] as well as [gRPC documentation].
It also helps to read [gnparser.proto] file to understand how to deal with
inputs and outputs of gRPC server.
### Usage as a REST API Interface
Use web-server REST API as a slower, but a more wide-spread alternative to
......@@ -357,7 +348,7 @@ gRPC server. Web-based user interface and API are invoked by ``--web-port`` or
``-w`` flag. To start web server on ``http://0.0.0.0:9000``
```bash
gnparser -w 9000
gnparser -w 9000
```
Opening a browser with this address will now show an interactive interface
......@@ -452,7 +443,7 @@ Some name-strings cannot be parsed unambiguously without some additional data.
### Names with `filius` (ICN code)
For names like `Aus bus Linn. f. cus` the `f.` is ambiguous. It might mean
that species were described by son of (`filius`) of Linn., or it might mean
that species were described by a son of (`filius`) Linn., or it might mean
that `cus` is `forma` of `bus`. We provide a warning
"Ambiguous f. (filius or forma)" for such cases.
......
This source diff could not be displayed because it is too large. You can view the blob instead.
......@@ -19,8 +19,12 @@ type GNparser struct {
workersNum int
// format defines the output format of the parser.
format
// removeHTML indicates that HTML tags have to be removed.
removeHTML bool
// nameString keeps parsed string
nameString string
// verbatim is originally entered name-string.
verbatim string
// isTest indicates that parsing is done for test purposes, so instead of
// the real version of the parser, output will contain "test_version" phrase.
isTest bool
......@@ -54,10 +58,18 @@ func IsTest() Option {
}
}
// RemoveHTML Option is true or false. When true, the preprocessing step
// removes HTML tags from name-strings.
func RemoveHTML(r bool) Option {
return func(gnp *GNparser) {
gnp.removeHTML = r
}
}
// NewGNparser constructor function takes options and returns
// configured GNparser.
func NewGNparser(opts ...Option) GNparser {
gnp := GNparser{workersNum: runtime.NumCPU(), format: Compact}
gnp := GNparser{workersNum: runtime.NumCPU(), format: Compact, removeHTML: true}
for _, opt := range opts {
opt(&gnp)
}
......@@ -77,12 +89,23 @@ func (gnp *GNparser) WorkersNum() int {
// `gnp.parser.SN` field.
func (gnp *GNparser) Parse(s string) {
gnp.nameString = s
preproc := preprocess.Preprocess([]byte(s))
tagsOrEntities := false
if gnp.removeHTML {
orig := gnp.nameString
gnp.nameString = preprocess.StripTags(gnp.nameString)
if orig != gnp.nameString {
tagsOrEntities = true
}
}
preproc := preprocess.Preprocess([]byte(gnp.nameString))
if preproc.NoParse {
gnp.parser.NewNotParsedScientificNameNode(preproc)
}
gnp.parser.Buffer = string(preproc.Body)
gnp.parser.FullReset()
if tagsOrEntities {
gnp.parser.AddWarn(grammar.HTMLTagsEntitiesWarn)
}
if len(preproc.Tail) > 0 {
gnp.parser.AddWarn(grammar.TailWarn)
}
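The strip-then-compare logic in `Parse` can be illustrated with a small self-contained sketch. It is an assumption-laden stand-in: `stripHTML` uses a naive regular expression plus the standard library's `html.UnescapeString`, which is not necessarily how `preprocess.StripTags` is implemented, and `cleanName` is a hypothetical helper that only mirrors the control flow shown in the diff.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
)

// tagRe is a deliberately naive pattern for HTML tags. It is an assumption
// made for this sketch, not the approach used by preprocess.StripTags.
var tagRe = regexp.MustCompile(`<[^>]*>`)

// stripHTML is a hypothetical stand-in for preprocess.StripTags: it drops
// tags and converts entities such as &amp; to their characters.
func stripHTML(s string) string {
	return html.UnescapeString(tagRe.ReplaceAllString(s, ""))
}

// cleanName returns the string that would be handed to the parser and
// reports whether a "HTML tags or entities" warning should be attached.
// The warning is raised only when cleanup actually changed the input.
func cleanName(s string, removeHTML bool) (string, bool) {
	if !removeHTML {
		return s, false
	}
	cleaned := stripHTML(s)
	return cleaned, cleaned != s
}

func main() {
	names := []string{
		"<i>Velutina halioides</i> (Linnaeus, 1758)",
		"Quadrella steyermarkii (Standl.) Iltis &amp; Cornejo",
		"Pomatomus saltator",
	}
	for _, n := range names {
		cleaned, warn := cleanName(n, true)
		fmt.Printf("%q -> %q (warn: %t)\n", n, cleaned, warn)
	}
}
```

Comparing the string before and after cleanup avoids a separate detection pass: a clean name costs one extra comparison, and only modified names receive the warning.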
......
......@@ -31,7 +31,6 @@ import (
"github.com/spf13/cobra"
"gitlab.com/gogna/gnparser"
"gitlab.com/gogna/gnparser/preprocess"
"gitlab.com/gogna/gnparser/rpc"
"gitlab.com/gogna/gnparser/web"
)
......@@ -52,8 +51,8 @@ gnparser "Homo sapiens Linnaeus 1753" [flags]
To parse many names from a file (one name per line):
gnparser names.txt [flags] > parsed_names.txt
To clean names from html tags and entities
gnparser names.txt -c > cleanded_names.txt
To leave HTML tags and entities intact when parsing (faster)
gnparser names.txt -n > parsed_names.txt
To start gRPC parsing service on port 3355 with a limit
of 10 concurrent jobs per request:
......@@ -69,7 +68,7 @@ gnparser -j 5 -g 8080
versionFlag(cmd)
wn := workersNumFlag(cmd)
cleanup := cleanupFlag(cmd)
nocleanup := skipCleanupFlag(cmd)
grpcPort := grpcFlag(cmd)
if grpcPort != 0 {
......@@ -92,16 +91,13 @@ gnparser -j 5 -g 8080
opts := []gnparser.Option{
gnparser.WorkersNum(wn),
gnparser.Format(f),
gnparser.RemoveHTML(!nocleanup),
}
if len(args) == 0 {
processStdin(cmd, cleanup, wn, opts)
processStdin(cmd, opts)
os.Exit(0)
}
data := getInput(cmd, args)
if cleanup {
cleanupData(data, wn)
os.Exit(0)
}
parse(data, opts)
},
}
......@@ -129,7 +125,7 @@ func init() {
rootCmd.Flags().IntP("jobs", "j", dj,
"number of threads to run. CPU's threads number is the default.")
rootCmd.Flags().BoolP("cleanup", "c", false, "removes HTML entities and tags instead of parsing.")
rootCmd.Flags().BoolP("nocleanup", "n", false, "keep HTML entities and tags when parsing.")
rootCmd.Flags().IntP("grpc_port", "g", 0, "starts gRPC server on the port.")
......@@ -151,13 +147,13 @@ func versionFlag(cmd *cobra.Command) {
}
}
func cleanupFlag(cmd *cobra.Command) bool {
cleanup, err := cmd.Flags().GetBool("cleanup")
func skipCleanupFlag(cmd *cobra.Command) bool {
nocleanup, err := cmd.Flags().GetBool("nocleanup")
if err != nil {
fmt.Println(err)
os.Exit(1)
}
return cleanup
return nocleanup
}
func grpcFlag(cmd *cobra.Command) int {
......@@ -197,16 +193,11 @@ func workersNumFlag(cmd *cobra.Command) int {
return i
}
func processStdin(cmd *cobra.Command, cleanup bool, wn int,
opts []gnparser.Option) {
func processStdin(cmd *cobra.Command, opts []gnparser.Option) {
if !checkStdin() {
cmd.Help()
return
}
if cleanup {
cleanupFile(os.Stdin, wn)
return
}
parseFile(os.Stdin, opts)
}
......@@ -298,48 +289,3 @@ func parseString(gnp gnparser.GNparser, data string) {
}
fmt.Println(res)
}
func cleanupData(data string, wc int) {
path := string(data)
if fileExists(path) {
f, err := os.OpenFile(path, os.O_RDONLY, os.ModePerm)
if err != nil {
log.Fatal(err)
os.Exit(1)
}
cleanupFile(f, wc)
f.Close()
} else {
res := preprocess.StripTags(data)
fmt.Println(data + "|" + res)
}
}
func cleanupFile(f io.Reader, wn int) {
in := make(chan string)
out := make(chan *preprocess.CleanupResult)
var wg sync.WaitGroup
wg.Add(1)
go preprocess.CleanupStream(in, out, wn)
go processCleanup(out, &wg)
sc := bufio.NewScanner(f)
count := 0
for sc.Scan() {
count++
if count%1000000 == 0 {
log.Printf("Cleaning %d-th line\n", count)
}
name := sc.Text()
in <- name
}
close(in)
wg.Wait()
}
func processCleanup(out <-chan *preprocess.CleanupResult, wg *sync.WaitGroup) {
defer wg.Done()
for r := range out {
fmt.Printf("%s|%s", r.Input, r.Output)
}
}
......@@ -64,9 +64,11 @@ require (
golang.org/x/oauth2 v0.0.0-20190130055435-99b60b757ec1 // indirect
golang.org/x/perf v0.0.0-20190124201629-844a5f5b46f4 // indirect
golang.org/x/text v0.3.1-0.20180807135948-17ff2d5776d2
golang.org/x/tools v0.0.0-20190816200558-6889da9d5479 // indirect
golang.org/x/tools v0.0.0-20190909194007-75be6cdcda07 // indirect
google.golang.org/genproto v0.0.0-20190201180003-4b09977fb922 // indirect
google.golang.org/grpc v1.18.0
honnef.co/go/tools v0.0.0-20190128043916-71123fcbb8fe // indirect
sourcegraph.com/sqs/pbtypes v1.0.0 // indirect
)
go 1.13
......@@ -376,6 +376,8 @@ golang.org/x/tools v0.0.0-20190330180304-aef51cc3777c h1:hbqcUGBwEHdDbhy8EluQIkb
golang.org/x/tools v0.0.0-20190330180304-aef51cc3777c/go.mod h1:LCzVGOaR6xXOjkQ3onu1FJEFr0SW1gC7cKk1uF8kGRs=
golang.org/x/tools v0.0.0-20190816200558-6889da9d5479 h1:lfN2PY/jymfnxkNHlbBF5DwPsUvhqUnrdgfK01iH2s0=
golang.org/x/tools v0.0.0-20190816200558-6889da9d5479/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo=
golang.org/x/tools v0.0.0-20190909194007-75be6cdcda07 h1:lttDGkFxUqcdkT522GTSuVHkN+ZqZ16zIIJguFMBzuk=
golang.org/x/tools v0.0.0-20190909194007-75be6cdcda07/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo=
golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
google.golang.org/api v0.0.0-20180910000450-7ca32eb868bf/go.mod h1:4mhQ8q/RsB7i+udVvVy5NUi08OU8ZlA0gRVgrF7VFY0=
google.golang.org/api v0.0.0-20181030000543-1d582fd0359e/go.mod h1:4mhQ8q/RsB7i+udVvVy5NUi08OU8ZlA0gRVgrF7VFY0=
......
......@@ -24,6 +24,7 @@ const (
GenusAbbrWarn
GenusUpperCharAfterDash
GreekLetterInRank
HTMLTagsEntitiesWarn
HybridCharNoSpaceWarn
HybridFormulaWarn
HybridFormulaIncompleteWarn
......
package output
const Version = "v0.9.0-6-ga17bfe4"
const Build = "2019-08-26_19:30:47UTC"
const Version = "v0.9.0-11-g9cf540b"
const Build = "2019-09-10_16:05:00UTC"
......@@ -97,6 +97,10 @@ var warningMap = map[grm.Warning]Warning{
Quality: 2,
Message: "Deprecated Greek letter enumeration in rank",
},
grm.HTMLTagsEntitiesWarn: Warning{
Quality: 3,
Message: "HTML tags or entities in the name",
},
grm.HybridCharNoSpaceWarn: Warning{
Quality: 3,
Message: "Hybrid char not separated by space",
......
......@@ -10,7 +10,6 @@ import (
"gitlab.com/gogna/gnparser/dict"
"gitlab.com/gogna/gnparser/output"
"gitlab.com/gogna/gnparser/pb"
"gitlab.com/gogna/gnparser/preprocess"
context "golang.org/x/net/context"
"google.golang.org/grpc"
)
......@@ -110,13 +109,10 @@ func (gnps gnparserServer) parseArray(ia *pb.InputArray) []*pb.Parsed {
func parseWorker(inCh <-chan string, outCh chan<- *parseArrayOutput,
skipClean bool, wg *sync.WaitGroup) {
defer wg.Done()
gnp := gnparser.NewGNparser()
opts := []gnparser.Option{gnparser.RemoveHTML(!skipClean)}
gnp := gnparser.NewGNparser(opts...)
for v := range inCh {
input := v
if !skipClean {
input = preprocess.StripTags(input)
}
res := gnp.ParseToObject(input)
res := gnp.ParseToObject(v)
outCh <- &parseArrayOutput{inputName: v, outputParsed: res}
}
}
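The worker above reads names from a channel, parses them, and sends each result back together with the untouched input. A self-contained sketch of that pattern follows. It assumes a trivial `strings.ToUpper` call in place of real parsing, and the names `result`, `worker` and `processAll` are hypothetical, introduced only for this illustration.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// result pairs the original input with its processed form, so the caller
// can match outputs to inputs even though workers finish in any order.
type result struct {
	input  string
	output string
}

// worker drains the input channel. strings.ToUpper stands in for parsing.
func worker(in <-chan string, out chan<- result, wg *sync.WaitGroup) {
	defer wg.Done()
	for v := range in {
		out <- result{input: v, output: strings.ToUpper(v)}
	}
}

// processAll fans names out to the given number of workers and collects
// the results into a map keyed by the original input.
func processAll(names []string, workers int) map[string]string {
	in := make(chan string)
	out := make(chan result)
	var wg sync.WaitGroup

	wg.Add(workers)
	for i := 0; i < workers; i++ {
		go worker(in, out, &wg)
	}
	// Feed the inputs, then signal that no more are coming.
	go func() {
		for _, n := range names {
			in <- n
		}
		close(in)
	}()
	// Close the output channel once every worker has finished.
	go func() {
		wg.Wait()
		close(out)
	}()

	res := make(map[string]string, len(names))
	for r := range out {
		res[r.input] = r.output
	}
	return res
}

func main() {
	res := processAll([]string{"Homo sapiens", "Parus major"}, 4)
	fmt.Println(len(res), res["Parus major"])
}
```

Keying results by the raw input is why the diff passes `v` unchanged as `inputName`: with cleanup moved inside the parser, the worker no longer needs a separately cleaned copy of the string.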
......
......@@ -3129,35 +3129,31 @@ c4dd80b7-984b-51f8-a4ec-573b4b32358b|Naviculadicta witkowskii LB & Metzeltin nov
#>
# Section HTML tags and entities<
# Velutina haliotoides (Linnaeus, 1758) <i>sensu</i> Fabricius, 1780
# Velutina haliotoides (Linnaeus, 1758) <i>sensu</i> Fabricius, 1780
# {"quality":2,"parsed":true,"verbatim":"Velutina haliotoides (Linnaeus, 1758) <i>sensu</i> Fabricius, 1780","surrogate":false,"qualityWarnings":[[2,"Name had to be changed by preprocessing"]],"normalized":"Velutina haliotoides (Linnaeus 1758)","canonicalName":{"value":"Velutina haliotoides","valueRanked":"Velutina haliotoides"},"virus":false,"positions":[["genus",0,8],["specificEpithet",9,20],["authorWord",22,30],["year",32,36]],"nameStringId":"189c94f6-96aa-52bb-b019-103a2103ce21","parserVersion":"test_version","hybrid":false,"details":[{"genus":{"value":"Velutina"},"specificEpithet":{"value":"haliotoides","authorship":{"value":"(Linnaeus 1758)","basionymAuthorship":{"authors":["Linnaeus"],"years":[{"value":"1758"}]}}}}],"bacteria":false}
# 189c94f6-96aa-52bb-b019-103a2103ce21|Velutina haliotoides (Linnaeus, 1758) <i>sensu</i> Fabricius, 1780|Velutina haliotoides|Velutina haliotoides|(Linnaeus 1758)|1758|2
#
# Velutina haliotoides (Linnaeus, 1758), <i>sensu</i> Fabricius, 1780
# Velutina haliotoides (Linnaeus, 1758)
# {"quality":2,"parsed":true,"verbatim":"Velutina haliotoides (Linnaeus, 1758), <i>sensu</i> Fabricius, 1780","surrogate":false,"qualityWarnings":[[2,"Name had to be changed by preprocessing"]],"normalized":"Velutina haliotoides (Linnaeus 1758)","canonicalName":{"value":"Velutina haliotoides","valueRanked":"Velutina haliotoides"},"virus":false,"positions":[["genus",0,8],["specificEpithet",9,20],["authorWord",22,30],["year",32,36]],"nameStringId":"b8d77a78-2698-5050-9c7a-638f615bd357","parserVersion":"test_version","hybrid":false,"details":[{"genus":{"value":"Velutina"},"specificEpithet":{"value":"haliotoides","authorship":{"value":"(Linnaeus 1758)","basionymAuthorship":{"authors":["Linnaeus"],"years":[{"value":"1758"}]}}}}],"bacteria":false}
# b8d77a78-2698-5050-9c7a-638f615bd357|Velutina haliotoides (Linnaeus, 1758), <i>sensu</i> Fabricius, 1780|Velutina haliotoides|Velutina haliotoides|(Linnaeus 1758)|1758|2
#
# Fusinus clavilithoides Landau, Harzhauser, Büyükmeriç & Breitenberger, 20
# Fusinus clavilithoides Landau, Harzhauser, Büyükmeriç & Breitenberger
# {"quality":3,"parsed":true,"verbatim":"Fusinus clavilithoides Landau, Harzhauser, Büyükmeriç & Breitenberger, 20","surrogate":false,"qualityWarnings":[[3,"Unparsed tail"]],"normalized":"Fusinus clavilithoides Landau, Harzhauser, Büyükmeriç & Breitenberger","canonicalName":{"value":"Fusinus clavilithoides","valueRanked":"Fusinus clavilithoides"},"virus":false,"positions":[["genus",0,7],["specificEpithet",8,22],["authorWord",23,29],["authorWord",31,41],["authorWord",43,53],["authorWord",56,69]],"nameStringId":"76a62c41-361e-532c-852c-ab3dc7ab09cb","parserVersion":"test_version","hybrid":false,"details":[{"genus":{"value":"Fusinus"},"specificEpithet":{"value":"clavilithoides","authorship":{"value":"Landau, Harzhauser, Büyükmeriç & Breitenberger","basionymAuthorship":{"authors":["Landau","Harzhauser","Büyükmeriç","Breitenberger"]}}}}],"bacteria":false,"unparsedTail":" 20"}
# 76a62c41-361e-532c-852c-ab3dc7ab09cb|Fusinus clavilithoides Landau, Harzhauser, Büyükmeriç & Breitenberger, 20|Fusinus clavilithoides|Fusinus clavilithoides|Landau, Harzhauser, Büyükmeriç & Breitenberger||3
#
# <i>Velutina halioides</i> (Linnaeus, 1758)
# Velutina halioides (Linnaeus, 1758)
# {"quality":2,"parsed":true,"verbatim":"<i>Velutina halioides</i> (Linnaeus, 1758)","surrogate":false,"qualityWarnings":[[2,"Name had to be changed by preprocessing"]],"normalized":"Velutina halioides (Linnaeus 1758)","canonicalName":{"value":"Velutina halioides","valueRanked":"Velutina halioides"},"virus":false,"positions":[["genus",3,11],["specificEpithet",12,25],["authorWord",27,35],["year",37,41]],"nameStringId":"653bbe42-aef4-5847-add4-8c7f8a4d1f9b","parserVersion":"test_version","hybrid":false,"details":[{"genus":{"value":"Velutina"},"specificEpithet":{"value":"halioides","authorship":{"value":"(Linnaeus 1758)","basionymAuthorship":{"authors":["Linnaeus"],"years":[{"value":"1758"}]}}}}],"bacteria":false}
# 653bbe42-aef4-5847-add4-8c7f8a4d1f9b|<i>Velutina halioides</i> (Linnaeus, 1758)|Velutina halioides|Velutina halioides|(Linnaeus 1758)|1758|2
#
# Quadrella steyermarkii (Standl.) Iltis &amp; Cornejo
# Quadrella steyermarkii (Standl.) Iltis & Cornejo
# {"quality":2,"parsed":true,"verbatim":"Quadrella steyermarkii (Standl.) Iltis &amp; Cornejo","surrogate":false,"qualityWarnings":[[2,"Name had to be changed by preprocessing"]],"normalized":"Quadrella steyermarkii (Standl.) Iltis & Cornejo","canonicalName":{"value":"Quadrella steyermarkii","valueRanked":"Quadrella steyermarkii"},"virus":false,"positions":[["genus",0,9],["specificEpithet",10,22],["authorWord",24,31],["authorWord",33,38],["authorWord",45,52]],"nameStringId":"fbd1b4fe-f8ed-5390-9cb1-e0f798691b1e","parserVersion":"test_version","hybrid":false,"details":[{"genus":{"value":"Quadrella"},"specificEpithet":{"value":"steyermarkii","authorship":{"value":"(Standl.) Iltis & Cornejo","basionymAuthorship":{"authors":["Standl."]},"combinationAuthorship":{"authors":["Iltis","Cornejo"]}}}}],"bacteria":false}
# fbd1b4fe-f8ed-5390-9cb1-e0f798691b1e|Quadrella steyermarkii (Standl.) Iltis &amp; Cornejo|Quadrella steyermarkii|Quadrella steyermarkii|(Standl.) Iltis & Cornejo||2
#
# Torymus bangalorensis (Mani &amp; Kurian, 1953)
# Torymus bangalorensis (Mani &amp; Kurian, 1953)
# {"quality":2,"parsed":true,"verbatim":"Torymus bangalorensis (Mani &amp; Kurian, 1953)","surrogate":false,"qualityWarnings":[[2,"Name had to be changed by preprocessing"]],"normalized":"Torymus bangalorensis (Mani & Kurian 1953)","canonicalName":{"value":"Torymus bangalorensis","valueRanked":"Torymus bangalorensis"},"virus":false,"positions":[["genus",0,7],["specificEpithet",8,21],["authorWord",23,27],["authorWord",34,40],["year",42,46]],"nameStringId":"8131ebda-dce6-5aaf-97ae-2370fe8e77d7","parserVersion":"test_version","hybrid":false,"details":[{"genus":{"value":"Torymus"},"specificEpithet":{"value":"bangalorensis","authorship":{"value":"(Mani & Kurian 1953)","basionymAuthorship":{"authors":["Mani","Kurian"],"years":[{"value":"1953"}]}}}}],"bacteria":false}
# 8131ebda-dce6-5aaf-97ae-2370fe8e77d7|Torymus bangalorensis (Mani &amp; Kurian, 1953)|Torymus bangalorensis|Torymus bangalorensis|(Mani & Kurian 1953)|1953|2
Velutina haliotoides (Linnaeus, 1758) <i>sensu</i> Fabricius, 1780
Velutina haliotoides (Linnaeus, 1758)
{"parsed":true,"quality":3,"qualityWarnings":[[3,"HTML tags or entities in the name"],[3,"Unparsed tail"]],"verbatim":"Velutina haliotoides (Linnaeus, 1758) \u003ci\u003esensu\u003c/i\u003e Fabricius, 1780","normalized":"Velutina haliotoides (Linnaeus 1758)","canonicalName":{"simple":"Velutina haliotoides","full":"Velutina haliotoides"},"details":[{"genus":{"value":"Velutina"},"specificEpithet":{"value":"haliotoides","authorship":{"value":"(Linnaeus 1758)","basionymAuthorship":{"authors":["Linnaeus"],"year":{"value":"1758"}}}}}],"positions":[["genus",0,8],["specificEpithet",9,20],["authorWord",22,30],["year",32,36]],"surrogate":false,"virus":false,"hybrid":false,"bacteria":false,"unparsedTail":" sensu Fabricius, 1780","nameStringId":"189c94f6-96aa-52bb-b019-103a2103ce21","parserVersion":"test_version"}
189c94f6-96aa-52bb-b019-103a2103ce21|Velutina haliotoides (Linnaeus, 1758) <i>sensu</i> Fabricius, 1780|Velutina haliotoides|Velutina haliotoides|(Linnaeus 1758)|1758|3
Velutina haliotoides (Linnaeus, 1758), <i>sensu</i> Fabricius, 1780
Velutina haliotoides (Linnaeus, 1758)
{"parsed":true,"quality":3,"qualityWarnings":[[3,"HTML tags or entities in the name"],[3,"Unparsed tail"]],"verbatim":"Velutina haliotoides (Linnaeus, 1758), \u003ci\u003esensu\u003c/i\u003e Fabricius, 1780","normalized":"Velutina haliotoides (Linnaeus 1758)","canonicalName":{"simple":"Velutina haliotoides","full":"Velutina haliotoides"},"details":[{"genus":{"value":"Velutina"},"specificEpithet":{"value":"haliotoides","authorship":{"value":"(Linnaeus 1758)","basionymAuthorship":{"authors":["Linnaeus"],"year":{"value":"1758"}}}}}],"positions":[["genus",0,8],["specificEpithet",9,20],["authorWord",22,30],["year",32,36]],"surrogate":false,"virus":false,"hybrid":false,"bacteria":false,"unparsedTail":", sensu Fabricius, 1780","nameStringId":"b8d77a78-2698-5050-9c7a-638f615bd357","parserVersion":"test_version"}
b8d77a78-2698-5050-9c7a-638f615bd357|Velutina haliotoides (Linnaeus, 1758), <i>sensu</i> Fabricius, 1780|Velutina haliotoides|Velutina haliotoides|(Linnaeus 1758)|1758|3
#AST is no parse because preprocessing removes tags
<i>Velutina halioides</i> (Linnaeus, 1758)
noparse
{"parsed":true,"quality":3,"qualityWarnings":[[3,"HTML tags or entities in the name"]],"verbatim":"\u003ci\u003eVelutina halioides\u003c/i\u003e (Linnaeus, 1758)","normalized":"Velutina halioides (Linnaeus 1758)","canonicalName":{"simple":"Velutina halioides","full":"Velutina halioides"},"details":[{"genus":{"value":"Velutina"},"specificEpithet":{"value":"halioides","authorship":{"value":"(Linnaeus 1758)","basionymAuthorship":{"authors":["Linnaeus"],"year":{"value":"1758"}}}}}],"positions":[["genus",0,8],["specificEpithet",9,18],["authorWord",20,28],["year",30,34]],"surrogate":false,"virus":false,"hybrid":false,"bacteria":false,"nameStringId":"653bbe42-aef4-5847-add4-8c7f8a4d1f9b","parserVersion":"test_version"}
653bbe42-aef4-5847-add4-8c7f8a4d1f9b|<i>Velutina halioides</i> (Linnaeus, 1758)|Velutina halioides|Velutina halioides|(Linnaeus 1758)|1758|3
Quadrella steyermarkii (Standl.) Iltis &amp; Cornejo
Quadrella steyermarkii (Standl.) Iltis
{"parsed":true,"quality":3,"qualityWarnings":[[3,"HTML tags or entities in the name"]],"verbatim":"Quadrella steyermarkii (Standl.) Iltis \u0026amp; Cornejo","normalized":"Quadrella steyermarkii (Standl.) Iltis \u0026 Cornejo","canonicalName":{"simple":"Quadrella steyermarkii","full":"Quadrella steyermarkii"},"details":[{"genus":{"value":"Quadrella"},"specificEpithet":{"value":"steyermarkii","authorship":{"value":"(Standl.) Iltis \u0026 Cornejo","basionymAuthorship":{"authors":["Standl."]},"combinationAuthorship":{"authors":["Iltis","Cornejo"]}}}}],"positions":[["genus",0,9],["specificEpithet",10,22],["authorWord",24,31],["authorWord",33,38],["authorWord",41,48]],"surrogate":false,"virus":false,"hybrid":false,"bacteria":false,"nameStringId":"fbd1b4fe-f8ed-5390-9cb1-e0f798691b1e","parserVersion":"test_version"}
fbd1b4fe-f8ed-5390-9cb1-e0f798691b1e|Quadrella steyermarkii (Standl.) Iltis &amp; Cornejo|Quadrella steyermarkii|Quadrella steyermarkii|(Standl.) Iltis & Cornejo||3
Torymus bangalorensis (Mani &amp; Kurian, 1953)
Torymus bangalorensis (Mani
{"parsed":true,"quality":3,"qualityWarnings":[[3,"HTML tags or entities in the name"]],"verbatim":"Torymus bangalorensis (Mani \u0026amp; Kurian, 1953)","normalized":"Torymus bangalorensis (Mani \u0026 Kurian 1953)","canonicalName":{"simple":"Torymus bangalorensis","full":"Torymus bangalorensis"},"details":[{"genus":{"value":"Torymus"},"specificEpithet":{"value":"bangalorensis","authorship":{"value":"(Mani \u0026 Kurian 1953)","basionymAuthorship":{"authors":["Mani","Kurian"],"year":{"value":"1953"}}}}}],"positions":[["genus",0,7],["specificEpithet",8,21],["authorWord",23,27],["authorWord",30,36],["year",38,42]],"surrogate":false,"virus":false,"hybrid":false,"bacteria":false,"nameStringId":"8131ebda-dce6-5aaf-97ae-2370fe8e77d7","parserVersion":"test_version"}
8131ebda-dce6-5aaf-97ae-2370fe8e77d7|Torymus bangalorensis (Mani &amp; Kurian, 1953)|Torymus bangalorensis|Torymus bangalorensis|(Mani & Kurian 1953)|1953|3
# #>
#SECTION: Underscores instead of spaces<
......