Support Elasticsearch's HTML strip char filter
https://gitlab.zendesk.com/agent/tickets/137301 (Internal)
As reported in gitlab#10693 (closed), certain code search cases are not handled.
The terms searched in the ZD ticket live in a pom.xml, and according to @amulvany, the search input is being interpreted incorrectly:
artifactId-spring-boot-starter-parent/artifactId is being filtered as a path
artifactId-spring-boot-starter-parent as a single string, which is common
spring-boot-starter-parent as a single string, which is not so common.
The suggested solution is to implement the HTML strip char filter (html_strip) so that the decoded value is returned.
Example:
POST _analyze
{
"tokenizer": "keyword",
"char_filter": [ "html_strip" ],
"text": "<p>I'm so <b>happy</b>!</p>"
}
Response:
[ \nI'm so happy!\n ] (instead of [ I'm, so, happy ])
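
To make the expected behaviour easier to reason about without a running cluster, here is a rough local approximation in Python. It is only a sketch and not Elasticsearch's implementation. The class `HtmlStripSketch`, the function `html_strip_sketch`, and the `BLOCK_TAGS` set are hypothetical names introduced for illustration, and the choice of which tags become a newline is an assumption based on the example above.

```python
from html.parser import HTMLParser

# Assumption: a small set of block-level tags that are replaced by a newline,
# which is what the <p> tag does in the example above. All other tags are
# simply dropped.
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3"}


class HtmlStripSketch(HTMLParser):
    """Collects text content, dropping tags and decoding entities."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; in text data.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_strip_sketch(text):
    """Return text with tags removed, approximating the html_strip output."""
    parser = HtmlStripSketch()
    parser.feed(text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


if __name__ == "__main__":
    print(repr(html_strip_sketch("<p>I'm so <b>happy</b>!</p>")))
    print(repr(html_strip_sketch(
        "<artifactId>spring-boot-starter-parent</artifactId>")))
```

In this sketch, an element such as `<artifactId>spring-boot-starter-parent</artifactId>` reduces to its text content, which is the kind of value the ticket expects to be searchable. Whether the real filter treats every tag in a pom.xml the same way should be confirmed with the _analyze request shown above.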