Overly long titles cause exceptions when ingesting into the new REST API
Background
DOIs 10.26782/jmcms.spl.10/2020.06.00048 and 10.11646/phytotaxa.387.2.4 have very long titles, longer than Elasticsearch (ES) allows. In ES, the title field is of type "keyword", which indexes the whole value as a single term, and a single term is limited to 32,766 bytes in its UTF-8 encoding.
As a result, ingestion of those DOIs fails with:
Elasticsearch work index failed: ({:index {:_index "work", :_type "work", :_id "10.26782/jmcms.spl.10/2020.06.00048", :status 400, :error {:type "illegal_argument_exception", :reason "Document contains at least one immense term in field=\"title\" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[83, 112, 101, 99, 105, 97, 108, 32, 73, 115, 115, 117, 101, 32, 78, 111, 46, 32, -30, -128, -109, 32, 49, 48, 44, 32, 74, 117, 110, 101]...', original message: bytes can be at most 32766 in length; got 50275", :caused_by {:type "max_bytes_length_exceeded_exception", :reason "bytes can be at most 32766 in length; got 50275"}}}})
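The limit is measured in UTF-8 bytes, not characters, so titles containing non-ASCII characters (such as the en dash, U+2013, encoded as the bytes -30, -128, -109 in the prefix above) reach it sooner than their character count suggests. The error prints the term prefix as signed Java bytes. The Python sketch below is only an illustration, not part of the ingestion code: it checks a title against the 32,766-byte limit and decodes the signed-byte prefix from the error back into text. The helper names are hypothetical.

```python
# Lucene's hard limit on the length of a single indexed term, in bytes.
MAX_TERM_BYTES = 32766


def utf8_length(value: str) -> int:
    """Return the size of value in UTF-8 bytes, which is what the limit counts."""
    return len(value.encode("utf-8"))


def exceeds_keyword_limit(title: str) -> bool:
    """True if the whole title, indexed as one "keyword" term, is too long."""
    return utf8_length(title) > MAX_TERM_BYTES


def decode_term_prefix(signed_bytes: list[int]) -> str:
    """Decode the signed (Java-style) byte prefix printed in the ES error."""
    # Java bytes are signed (-128..127); mask them back to 0..255 first.
    return bytes(b & 0xFF for b in signed_bytes).decode("utf-8", errors="replace")


if __name__ == "__main__":
    prefix = [83, 112, 101, 99, 105, 97, 108, 32, 73, 115, 115, 117, 101,
              32, 78, 111, 46, 32, -30, -128, -109, 32, 49, 48, 44, 32,
              74, 117, 110, 101]
    print(decode_term_prefix(prefix))  # Special Issue No. – 10, June
    # A 50,275-byte title, the size reported for the failing DOI.
    print(exceeds_keyword_limit("x" * 50275))  # True
```

A title indexed only as "text" avoids this check entirely, because the analyzer splits it into short tokens before indexing.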
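The simplest solution may be to stop indexing the title field as type "keyword" and index it only as type "text", as is already done for the abstract. A "text" field is split into tokens by its analyzer, so the byte limit applies to each token rather than to the whole title. The title is not used in filters or facets, and query.title searches a separate field, title-text, which is already of type "text". The "keyword" mapping of title should therefore not be needed.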
Note that this will require changing mappings and reindexing in staging.
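A sketch of the proposed mapping change, written as a Python dict that mirrors the JSON mapping body. The field list is a hypothetical excerpt: the real "work" index mapping is not shown in this issue, and the "abstract" field name is an assumption. Elasticsearch does not allow changing the type of an existing field in place, which is why the change requires a new mapping and a reindex.

```python
import json

# Hypothetical excerpt of the "work" index mapping; only fields mentioned in
# this issue are shown, and the "abstract" field name is assumed.
CURRENT_PROPERTIES = {
    "title": {"type": "keyword"},    # whole title is one term; fails above 32,766 bytes
    "title-text": {"type": "text"},  # the field query.title searches
    "abstract": {"type": "text"},    # already indexed as text only
}


def proposed_properties(properties: dict) -> dict:
    """Return a copy of the mapping with title indexed only as "text"."""
    updated = {name: dict(spec) for name, spec in properties.items()}
    updated["title"] = {"type": "text"}
    return updated


if __name__ == "__main__":
    body = {"properties": proposed_properties(CURRENT_PROPERTIES)}
    print(json.dumps(body, indent=2))
```

The original dict is left untouched so the old and new mappings can be compared side by side before reindexing in staging.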
Definition of ready
- Product owner: @ppolischuk1
- Tech lead: @dtkaczyk
- Service:: label applied
- Definition of done updated
- Acceptance testing plan: staging
- Weight applied
Definition of done
- Unit tests identified, implemented, and passing
- Code reviewed
- Available via a staging URL
- Knowledge base reviewed and updated
- Public documentation reviewed and updated
- Consider any impacts to current or future architecture/infrastructure, and update specifications and documentation as needed
- Acceptance criteria met:
  - Titles are indexed as type "text"
  - Overly long titles do not cause exceptions when ingesting into the new REST API
Notes
Edited by Patrick Polischuk