
Sanitize string provided to to_tsvector

Michael Trainor requested to merge tmike-search-vector-sanitize into master

What does this MR do and why?

Closes #428428 (closed).

We sanitize full-text search queries to remove non-allowed characters, but we don't do the same for the text used to generate the search vector itself.

When non-allowed characters are passed to to_tsvector, words next to them can be dropped from the resulting search vector.

Example: passing the string <gitlab> to to_tsvector results in an empty tsvector.

gitlabhq_production=# SELECT setweight(to_tsvector('english', '<gitlab>'), 'A');
 setweight
-----------

(1 row)

Sanitizing the non-allowed characters out allows words surrounded by them to be included in the search vector:

gitlabhq_development=# SELECT setweight(to_tsvector('english', ' gitlab '), 'A');
  setweight
-------------
 'gitlab':1A
(1 row)
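
The sanitizing step described above can be sketched in Ruby as follows. This is a minimal illustration, not the code from this MR: the constant DISALLOWED_CHARACTERS, the method sanitize_for_tsvector, and the allow-list itself (ASCII letters, digits, and spaces) are assumptions made for the example. Each disallowed character is replaced with a space rather than deleted, so neighbouring words stay separated.

```ruby
# Hypothetical allow-list for illustration only: ASCII letters, digits and spaces.
# The actual set of allowed characters used by this MR is not shown in the description.
DISALLOWED_CHARACTERS = /[^a-zA-Z0-9 ]/

# Replace every disallowed character with a space so that words wrapped in
# such characters (for example <gitlab>) survive as plain words.
def sanitize_for_tsvector(text)
  text.to_s.gsub(DISALLOWED_CHARACTERS, ' ')
end

puts sanitize_for_tsvector('<gitlab>').inspect                   # => " gitlab "
puts sanitize_for_tsvector('the <rain> is falling down').inspect # => "the  rain  is falling down"
```

The first result matches the string passed to to_tsvector in the second psql example above.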

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

Before: MR134715_before.png
After: MR134715-after.png

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

Pre-change:

  1. Create an issue with the title 'the <rain> is falling down'
  2. Check the issue search data's search vector: Issue.where(title:'the <rain> is falling down').first.search_data.search_vector
    1. The search vector does not contain the word rain
  3. Search for the term rain using Basic Search in the project
    1. There are no results in Issues

Post-change:

This assumes the issue from the pre-change steps already exists.

  1. Update the issue's search data: Issue.where(title:'the <rain> is falling down').first.update_search_data!
  2. Check the issue search data's search vector: Issue.where(title:'the <rain> is falling down').first.search_data.search_vector
    1. The search vector includes the word rain
  3. Search for the term rain using Basic Search in the project
    1. There is a result matching the issue
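
To make the "search vector includes the word" check less error-prone than reading the value by eye, a small helper along these lines could be pasted into the Rails console. It is only a sketch: tsvector_lexemes and includes_lexeme? are hypothetical names, and it assumes search_vector returns the tsvector in its text form, as in the psql output earlier ('gitlab':1A).

```ruby
# Hypothetical helper: list the lexemes found in the text form of a tsvector.
# Each entry looks like 'lexeme':positions; a quote inside a lexeme is doubled.
def tsvector_lexemes(tsvector_text)
  tsvector_text.to_s
               .scan(/'((?:[^']|'')+)'(?::|\s|\z)/)
               .flatten
               .map { |lexeme| lexeme.gsub("''", "'") }
end

# Hypothetical helper: true when the given lexeme is present in the vector.
def includes_lexeme?(tsvector_text, lexeme)
  tsvector_lexemes(tsvector_text).include?(lexeme)
end

puts includes_lexeme?("'gitlab':1A", 'gitlab') # => true
puts includes_lexeme?('', 'gitlab')            # => false
```

In the console this could be called as includes_lexeme?(issue.search_data.search_vector, 'rain'), where issue is the record found in the steps above. The vector stores stemmed lexemes, so the stem should be checked rather than the original word form.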

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Michael Trainor
