Add temporary search endpoint to zoekt indexer servers to support federated search across multiple zoekt webservers
What does this MR do and why?
Note: this is a temporary solution. Ultimately we want to introduce a gRPC endpoint on the webserver binary. Right now Rails fans out multi-node searches using threads, which is a scalability constraint. We intend to use this endpoint until a proper gRPC streaming search endpoint is available.
This MR introduces a new /search endpoint that distributes a search across multiple Zoekt webservers. Fanning out with Go's concurrency, instead of the thread-based approach currently used in Rails, is intended to make searches faster and more comprehensive. The endpoint also handles timeouts and partial or complete failures gracefully.
It is intended to be a near drop-in replacement for a Zoekt webserver URL. The difference is that the request must include a list of ForwardTo endpoints so the server knows which webservers to federate the search across.
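The fan-out described above can be sketched roughly as follows. This is a minimal illustration, not the code in this MR: the types Hit, Failure, Result, and SearchFunc, the functions Federate, fakeSearch, and runDemo, and the endpoint names are all hypothetical, and sorting by a numeric score is an assumption. Backends are simulated in-process, so the example runs without any Zoekt server.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sort"
	"sync"
	"time"
)

// Hit is a hypothetical, simplified search hit (the real Zoekt result is richer).
type Hit struct {
	File  string
	Score float64
}

// Failure records one endpoint that did not answer successfully.
type Failure struct {
	Endpoint string
	Error    string
}

// Result holds the merged hits plus the endpoints that failed.
type Result struct {
	Hits     []Hit
	Failures []Failure
}

// SearchFunc queries a single backend; a real proxy would issue an HTTP request here.
type SearchFunc func(ctx context.Context, endpoint string) ([]Hit, error)

// Federate queries every endpoint concurrently under one shared timeout,
// merges successful hits, and records failures instead of aborting.
func Federate(ctx context.Context, endpoints []string, timeout time.Duration, search SearchFunc) Result {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		res Result
	)
	for _, ep := range endpoints {
		wg.Add(1)
		go func(ep string) {
			defer wg.Done()
			hits, err := search(ctx, ep)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				res.Failures = append(res.Failures, Failure{Endpoint: ep, Error: err.Error()})
				return
			}
			res.Hits = append(res.Hits, hits...)
		}(ep)
	}
	wg.Wait()

	// Assumption: results are ordered by descending score.
	sort.SliceStable(res.Hits, func(i, j int) bool { return res.Hits[i].Score > res.Hits[j].Score })
	// Sort failures by endpoint so the output is deterministic.
	sort.Slice(res.Failures, func(i, j int) bool { return res.Failures[i].Endpoint < res.Failures[j].Endpoint })
	return res
}

// fakeSearch simulates two healthy backends, one slow backend, and one that is down.
func fakeSearch(ctx context.Context, endpoint string) ([]Hit, error) {
	switch endpoint {
	case "web-1":
		return []Hit{{File: "a.go", Score: 0.9}, {File: "c.go", Score: 0.2}}, nil
	case "web-2":
		return []Hit{{File: "b.go", Score: 0.5}}, nil
	case "slow":
		select {
		case <-time.After(time.Second):
			return []Hit{{File: "late.go", Score: 1}}, nil
		case <-ctx.Done():
			return nil, ctx.Err() // the shared timeout fired first
		}
	}
	return nil, errors.New("connection refused")
}

// runDemo runs a federated search with a 50ms budget against the fake backends.
func runDemo() Result {
	endpoints := []string{"web-1", "web-2", "slow", "down"}
	return Federate(context.Background(), endpoints, 50*time.Millisecond, fakeSearch)
}

func main() {
	res := runDemo()
	for _, h := range res.Hits {
		fmt.Printf("hit %s (%.1f)\n", h.File, h.Score)
	}
	for _, f := range res.Failures {
		fmt.Printf("failed %s: %s\n", f.Endpoint, f.Error)
	}
}
```

In this sketch a slow or unreachable backend only adds an entry to Failures; hits from the healthy backends are still returned. That mirrors the partial-failure behaviour described above. When every backend fails, Hits is empty and Failures lists every endpoint.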
Architecture
References
- Current multi node search in rails using threads: https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/gitlab/search/zoekt/client.rb#L56
- Long term solution: gitlab#500087
- MR in rails that will use this proxy: gitlab!171867
How to set up and validate locally
This assumes Zoekt has been set up by following the GDK instructions: <>
- Ensure both zoekt-webserver-development-1 and zoekt-webserver-development-2 are running with gdk status
- Stop the first development indexer server:
gdk stop gitlab-zoekt-indexer-development-1
- Start the indexer in gdk mode:
make gdk
- Examine the results from searching zoekt-webserver-development-1 directly:
curl --request POST \
--url http://localhost:6090/api/search \
--header 'Content-Type: application/json' \
--header 'User-Agent: insomnia/10.1.1' \
--data '{
"Q": "listen",
"Opts": {
"TotalMaxMatchCount": 10,
"NumContextLines": 3
},
"Timeout": 500
}'
- Examine the results from searching zoekt-webserver-development-2 directly:
curl --request POST \
--url http://localhost:6091/api/search \
--header 'Content-Type: application/json' \
--header 'User-Agent: insomnia/10.1.1' \
--data '{
"Q": "listen",
"Opts": {
"TotalMaxMatchCount": 10,
"NumContextLines": 3
},
"Timeout": 500
}'
- Examine the results when searching the indexer endpoint. Results from both webservers should be included and sorted. The third ForwardTo entry, http://localhost:6099/noop, is there to trigger a partial failure; notice that the failing endpoint is included in the list of failures.
curl --request POST \
--url http://localhost:6080/indexer/search \
--header 'Content-Type: application/json' \
--header 'User-Agent: insomnia/10.1.1' \
--data '{
"Q": "listen",
"Opts": {
"TotalMaxMatchCount": 10,
"NumContextLines": 3
},
"ForwardTo": [
{ "Endpoint": "http://localhost:6090/api/search" },
{ "Endpoint": "http://localhost:6091/api/search" },
{ "Endpoint": "http://localhost:6099/noop" }
],
"Timeout": 500
}'
- Examine the response when there is a complete failure (the only ForwardTo endpoint fails):
curl --request POST \
--url http://localhost:6080/indexer/search \
--header 'Content-Type: application/json' \
--header 'User-Agent: insomnia/10.1.1' \
--data '{
"Q": "listen",
"Opts": {
"TotalMaxMatchCount": 10,
"NumContextLines": 3
},
"ForwardTo": [
{ "Endpoint": "http://localhost:6099/noop" }
],
"Timeout": 500
}'
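For callers that prefer to build the request body in code rather than by hand, the JSON used in the curl commands above could be produced along these lines. This is only a sketch: the JSON keys (Q, Opts, TotalMaxMatchCount, NumContextLines, ForwardTo, Endpoint, Timeout) are taken from the examples above, but the Go type and function names are hypothetical and are not necessarily those used in the indexer.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SearchOptions mirrors the "Opts" object from the curl examples.
type SearchOptions struct {
	TotalMaxMatchCount int
	NumContextLines    int
}

// ForwardTarget mirrors one entry of the "ForwardTo" array.
type ForwardTarget struct {
	Endpoint string
}

// ProxyRequest mirrors the full request body sent to the indexer search endpoint.
// Timeout is copied verbatim from the examples; its unit is not stated there.
type ProxyRequest struct {
	Q         string
	Opts      SearchOptions
	ForwardTo []ForwardTarget
	Timeout   int
}

// buildRequest assembles the JSON body for a query forwarded to the given endpoints,
// using the same option values as the curl examples.
func buildRequest(query string, endpoints ...string) ([]byte, error) {
	req := ProxyRequest{
		Q:       query,
		Opts:    SearchOptions{TotalMaxMatchCount: 10, NumContextLines: 3},
		Timeout: 500,
	}
	for _, ep := range endpoints {
		req.ForwardTo = append(req.ForwardTo, ForwardTarget{Endpoint: ep})
	}
	return json.Marshal(req)
}

// demoPayload returns the body for a search federated across both local webservers.
func demoPayload() string {
	body, err := buildRequest("listen",
		"http://localhost:6090/api/search",
		"http://localhost:6091/api/search",
	)
	if err != nil {
		return ""
	}
	return string(body)
}

func main() {
	fmt.Println(demoPayload())
}
```

The printed body can be passed to curl via --data, or sent with any HTTP client, to the indexer search URL used in the steps above.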