Add temporary search endpoint to zoekt indexer servers to support federated search across multiple zoekt webservers (!310) · Merge requests · GitLab.org / Gitlab Zoekt Indexer

John Mason requested to merge jm-multi-node-search into main Nov 05, 2024

What does this MR do and why?

Note: this is a temporary solution. Ultimately we want to introduce a gRPC endpoint on the webserver binary. Today multi-node search runs on threads in Rails, which limits scalability. We plan to use this endpoint until a proper gRPC streaming search endpoint exists.

This MR introduces a new /search endpoint that distributes a search across multiple Zoekt web servers. The fan-out relies on Go's concurrency instead of the current thread-based approach in Rails, which should make multi-node searches faster. The endpoint also handles timeouts and partial or complete failures gracefully.
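The fan-out described above can be sketched as follows. This is a minimal illustration and not the code in this MR: the `searchFunc` signature, the `outcome` type, and the fake endpoints are all hypothetical, and the real proxy would issue HTTP requests instead of calling a stub.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// searchFunc queries one web server. Hypothetical signature: a real proxy
// would POST the query to the endpoint over HTTP.
type searchFunc func(ctx context.Context, endpoint string) ([]string, error)

// outcome holds the combined matches plus the endpoints that failed.
type outcome struct {
	Matches  []string
	Failures map[string]string // endpoint -> error message
}

// federate queries every endpoint concurrently under one shared timeout.
// A failing or slow endpoint is recorded instead of aborting the search.
func federate(endpoints []string, timeout time.Duration, search searchFunc) outcome {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		out = outcome{Failures: map[string]string{}}
	)
	for _, ep := range endpoints {
		wg.Add(1)
		go func(ep string) {
			defer wg.Done()
			matches, err := search(ctx, ep)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				out.Failures[ep] = err.Error()
				return
			}
			out.Matches = append(out.Matches, matches...)
		}(ep)
	}
	wg.Wait()
	return out
}

// fakeSearch stands in for a web server: "slow" exceeds the timeout,
// "down" fails immediately, anything else returns one match.
func fakeSearch(ctx context.Context, ep string) ([]string, error) {
	switch ep {
	case "slow":
		select {
		case <-time.After(time.Second):
			return []string{"late match"}, nil
		case <-ctx.Done():
			return nil, ctx.Err()
		}
	case "down":
		return nil, errors.New("connection refused")
	}
	return []string{"match from " + ep}, nil
}

// demo runs a search where two of the four endpoints fail.
func demo() outcome {
	return federate([]string{"a", "b", "slow", "down"}, 50*time.Millisecond, fakeSearch)
}

func main() {
	res := demo()
	fmt.Println(len(res.Matches), "matches,", len(res.Failures), "failures")
}
```

Sharing one context across all goroutines means the slowest healthy server bounds the total latency, and the timeout caps it. The order of `Matches` depends on which goroutine finishes first, so a real implementation would sort the results before returning them.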

It is intended to be a near drop-in replacement for a Zoekt webserver URL. The difference is that the request must include a list of ForwardTo endpoints so the server knows which web servers to federate the search across.
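To illustrate the drop-in idea, the sketch below models the request body in Go. The field names are taken from the JSON payloads in the validation steps of this description (where the list is spelled `ForwardTo`). The Go type names, the `withForwarding` helper, and the choice of `int` for `Timeout` are assumptions made for illustration; the unit of `Timeout` is not stated here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// searchOpts mirrors the "Opts" object used in the example requests.
type searchOpts struct {
	TotalMaxMatchCount int
	NumContextLines    int
}

// forwardTarget is one entry of the "ForwardTo" list.
type forwardTarget struct {
	Endpoint string
}

// searchRequest is a hypothetical Go view of the request body. ForwardTo is
// omitted when empty, so the same type also describes a direct web server
// request.
type searchRequest struct {
	Q         string
	Opts      searchOpts
	ForwardTo []forwardTarget `json:",omitempty"`
	Timeout   int // unit is not stated in this MR description
}

// withForwarding returns a copy of req addressed to the proxy: the query is
// unchanged and only the list of target web servers is added.
func withForwarding(req searchRequest, endpoints ...string) searchRequest {
	targets := make([]forwardTarget, 0, len(endpoints))
	for _, ep := range endpoints {
		targets = append(targets, forwardTarget{Endpoint: ep})
	}
	req.ForwardTo = targets
	return req
}

// encode serializes a request to its JSON body.
func encode(req searchRequest) string {
	b, err := json.Marshal(req)
	if err != nil {
		panic(err) // not expected for these plain field types
	}
	return string(b)
}

func main() {
	direct := searchRequest{
		Q:       "listen",
		Opts:    searchOpts{TotalMaxMatchCount: 10, NumContextLines: 3},
		Timeout: 500,
	}
	fmt.Println(encode(direct))
	fmt.Println(encode(withForwarding(direct,
		"http://localhost:6090/api/search",
		"http://localhost:6091/api/search")))
}
```

Because the proxied request is the direct request plus one extra field, a client only has to change the URL it posts to and attach the target list.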

Architecture

source

References

Current multi node search in rails using threads: https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/gitlab/search/zoekt/client.rb#L56
Long term solution: gitlab#500087
MR in rails that will use this proxy: gitlab!171867 (merged)

How to set up and validate locally

This assumes Zoekt has been set up by following the GDK instructions: <>

Ensure both zoekt-webserver-development-1 and zoekt-webserver-development-2 are running with gdk status
Stop the first development indexer server: gdk stop gitlab-zoekt-indexer-development-1
Start the indexer in gdk mode: make gdk
Examine the results from searching zoekt-webserver-development-1 directly

curl --request POST \
  --url http://localhost:6090/api/search \
  --header 'Content-Type: application/json' \
  --header 'User-Agent: insomnia/10.1.1' \
  --data '{
	"Q": "listen",
	"Opts": {
		"TotalMaxMatchCount": 10,
		"NumContextLines": 3
	},
	"Timeout": 500
}'

Examine the results from searching zoekt-webserver-development-2 directly

curl --request POST \
  --url http://localhost:6091/api/search \
  --header 'Content-Type: application/json' \
  --header 'User-Agent: insomnia/10.1.1' \
  --data '{
	"Q": "listen",
	"Opts": {
		"TotalMaxMatchCount": 10,
		"NumContextLines": 3
	},
	"Timeout": 500
}'

Examine the results when searching the indexer endpoint. There should be results from both webservers included and sorted. Notice that a partial failure includes the failing endpoint in the list of failures.

curl --request POST \
  --url http://localhost:6080/indexer/search \
  --header 'Content-Type: application/json' \
  --header 'User-Agent: insomnia/10.1.1' \
  --data '{
	"Q": "listen",
	"Opts": {
		"TotalMaxMatchCount": 10,
		"NumContextLines": 3
	},
	"ForwardTo": [
		{ "Endpoint": "http://localhost:6090/api/search" },
		{ "Endpoint": "http://localhost:6091/api/search" },
		{ "Endpoint": "http://localhost:6099/noop" }
	],
	"Timeout": 500
}'
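The merging behavior exercised by this request can be sketched as below. The response format is not shown in this description, so everything here is hypothetical: the `fileMatch` and `nodeResponse` types, the `Score` field, and the choice to sort by descending score are assumptions that only illustrate the idea of combining healthy responses while listing failed endpoints.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// fileMatch is a hypothetical, reduced search hit.
type fileMatch struct {
	FileName string
	Score    float64
}

// nodeResponse is what one forwarded request produced: hits or an error.
type nodeResponse struct {
	Endpoint string
	Files    []fileMatch
	Err      error
}

// merged is the combined view returned to the caller.
type merged struct {
	Files    []fileMatch
	Failures []string // endpoints that did not answer successfully
}

// merge combines the hits of all healthy nodes, sorts them by descending
// score (an assumed ordering), and lists the endpoints that failed.
func merge(responses []nodeResponse) merged {
	var out merged
	for _, r := range responses {
		if r.Err != nil {
			out.Failures = append(out.Failures, r.Endpoint)
			continue
		}
		out.Files = append(out.Files, r.Files...)
	}
	sort.SliceStable(out.Files, func(i, j int) bool {
		return out.Files[i].Score > out.Files[j].Score
	})
	return out
}

// allFailed reports a complete failure: at least one endpoint was queried
// and none of them answered.
func allFailed(responses []nodeResponse, m merged) bool {
	return len(responses) > 0 && len(m.Failures) == len(responses)
}

// demoResponses mimics two healthy web servers and one unreachable endpoint.
func demoResponses() []nodeResponse {
	return []nodeResponse{
		{Endpoint: "http://localhost:6090/api/search", Files: []fileMatch{{"a.go", 2.0}, {"c.go", 0.5}}},
		{Endpoint: "http://localhost:6091/api/search", Files: []fileMatch{{"b.go", 1.0}}},
		{Endpoint: "http://localhost:6099/noop", Err: errors.New("connection refused")},
	}
}

func main() {
	responses := demoResponses()
	m := merge(responses)
	for _, f := range m.Files {
		fmt.Println(f.FileName, f.Score)
	}
	fmt.Println("failures:", m.Failures, "complete failure:", allFailed(responses, m))
}
```

Keeping the failures alongside the results lets the caller decide whether a partial answer is acceptable, and a request where every endpoint fails can be reported as a complete failure.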

Examine the response when there is a complete failure

curl --request POST \
  --url http://localhost:6080/indexer/search \
  --header 'Content-Type: application/json' \
  --header 'User-Agent: insomnia/10.1.1' \
  --data '{
	"Q": "listen",
	"Opts": {
		"TotalMaxMatchCount": 10,
		"NumContextLines": 3
	},
	"ForwardTo": [
		{ "Endpoint": "http://localhost:6099/noop" }
	],
	"Timeout": 500
}'

Edited Nov 06, 2024 by John Mason
