Elasticsearch v6 - migrate from parent/child relationships to joins
What does this MR do?
Since ES6 only allows one mapping type per index and we need to have documents in a single index in order to use join datatypes, I had to make quite a few changes to how we deal with elasticsearch in the codebase:
- I changed all our document types to be
doc
- In order to still be able to query per type, I implemented a custom type field in our elasticsearch documents, called simply
type
- The join type was created granularly, that is to say, it has a mapping per type: project->issue, project->blob, etc.
- We now use an
es_id
for elasticsearch's_id
field. This is comprised of the model class name and the ID in the database - Because of the
es_id
change a lot of changes were required to theelasticsearch-model
gem. These changes have been encapsulated in theee/lib/gem_extensions
folder and are injected inside the elasticsearch client initializer.
Breaking changes
We can no longer support Elasticsearch versions under 5.6
. This is because the join
field is only supported from versions 5.6 and upwards.
Wiki searches now must use a new type: wiki_blob
. I've edited existing usages in the codebase but this is good to keep in mind
What are the relevant issue numbers?
Does this MR meet the acceptance criteria?
-
Changelog entry added, if necessary -
Documentation created/updated -
Tests added for this feature/bug -
Conforms to the code review guidelines -
Conforms to the merge request performance guidelines -
Conforms to the style guides -
Conforms to the database guides -
EE specific content should be in the top level /ee
folder -
For a paid feature, have we considered GitLab.com plans, how it works for groups, and is there a design for promoting it to users who aren't on the correct plan?
Closes #4218 (closed)
Merge request reports
Activity
1 Error b2303efe: The commit subject must contain at least three words 1 Warning This merge request is quite big (more than 1019 lines changed), please consider splitting it into multiple merge requests. 1 Message This merge request adds or changes files that require a review from the docs team. Docs Review
The following files require a review from the Documentation team:
doc/development/elasticsearch.md
doc/integration/elasticsearch.md
To make sure these changes are reviewed, mention
@gl-docsteam
in a separate comment, and explain what needs to be reviewed by the team. Please don't mention the team until your changes are ready for review.Commit message standards
One or more commit messages do not meet our Git commit message standards. For more information on how to write a good commit message, take a look at How to Write a Git Commit Message.
Here is an example of a good commit message:
Reject ruby interpolation in externalized strings When using ruby interpolation in externalized strings, they can't be detected. Which means they will never be presented to be translated. To mix variables into translations we need to use `sprintf` instead. Instead of: _("Hello #{subject}") Use: _("Hello %{subject}") % { subject: 'world' }
This is an example of a bad commit message:
updated README.md
This commit message is bad because although it tells us that README.md is updated, it doesn't tell us why or how it was updated.
Generated by
DangerEdited by 🤖 GitLab Bot 🤖added 1 commit
- a84681ec - Use elasticsearch 5.6 in CI tests. We cannot support ES5.5 and under with ES JOIN
added 1 commit
- 65e9fff0 - Elasticsearch - `es_type` in base class and fix namespace resolution issue
added 300 commits
-
9267cded...25c8d7ce - 298 commits from branch
master
- 34ddd589 - Elasticsearch v6 support
- 4bd2404a - Update elasticsearch docs to reflect v6 support
-
9267cded...25c8d7ce - 298 commits from branch
added 2 commits
- Resolved by Nick Thomas
- Resolved by Mario de la Ossa
- Resolved by Nick Thomas
- Resolved by Mario de la Ossa
- Resolved by Mario de la Ossa
- Resolved by Mario de la Ossa
- Resolved by Mario de la Ossa
- Resolved by Mario de la Ossa
added 2 commits
added 2 commits
marked the checklist item Changelog entry added, if necessary as completed
marked the checklist item Tests added for this feature/bug as completed
marked the checklist item Documentation created/updated as completed
added documentation label
added 2 commits
@nick.thomas could u bother you for a review here please? This was tested against ES5.6 and 6.4
Also ccing @smcgivern in case he wants to take an early look at this :)
FYI I still need to create a CE branch to backport some changesthere was only one change which I removed. no need for a backport anymoreEdited by Mario de la Ossaassigned to @nick.thomas
If @vsizov has time, he might want to review too.
Oh yes I'd love for @vsizov to give this a look as well if he has time! This is a big change, the more eyes the better
- Resolved by Mario de la Ossa
As a follow-up to this issue I propose we should comb through all the elasticsearch-related tests and add the
:elastic
tag to them so we can add some jobs to the pipeline to test against elasticsearch5.6.12
, since there ARE some differences between 5.6 and 6.4 and we wouldn't want to cause a regression
added 1 commit
- 412d7341 - Elasticsearch - tag specs that use it with :elastic
I did a quick look, I have no time this week for thorough review :(
- The changes look reasonable to me(in a context of iterative changes, I think, we don't have to use ES gems anymore as I already said. I believe you estimated the cost of the full removal of the
elasticsearch-rails
gem and you consider it to be more expensive, right? - I like that schema is in one place, it's way easier to maintain now.
- Don't forget about existing customers who have to reindex everything. It should be noted everywhere: update guides, blog post and so on.
- Why can't we still use
{entity}_id
notion?
Edited by Valery Sizov- The changes look reasonable to me(in a context of iterative changes, I think, we don't have to use ES gems anymore as I already said. I believe you estimated the cost of the full removal of the
Thanks a ton for looking @vsizov!
I believe you estimated the cost of the full removal of the
elasticsearch-rails
gem and you consider it to be more expensive, right?That's correct. In order to remove the gem we need to build our own modules that know how to retrieve ActiveRecord objects from the Elasticsearch results. That was the main reason why I went with monkey-patches instead of fully ripping it out
Why can't we still use
{entity}_id
notion?Sorry, I'm not sure what you mean by that? We switched to using
{entity}_id
for elasticsearch_id
sEdited by Mario de la Ossa- Resolved by Mario de la Ossa
- Resolved by Mario de la Ossa
- Resolved by Nick Thomas
- Resolved by Nick Thomas
- Resolved by Nick Thomas
- Resolved by Mario de la Ossa
- Resolved by Nick Thomas
- Resolved by Nick Thomas
- Resolved by Nick Thomas
- Resolved by Nick Thomas
- Resolved by Mario de la Ossa