
Add temporary search endpoint to zoekt indexer servers to support federated search across multiple zoekt webservers

John Mason requested to merge jm-multi-node-search into main

What does this MR do and why?

Note: this is a temporary solution. We ultimately want to introduce a gRPC endpoint on the webserver binary. Right now we fan out searches with threads in Rails, which is a scalability constraint. We plan to use this endpoint until a proper gRPC streaming search endpoint is available.

This MR introduces a new /indexer/search endpoint on the zoekt indexer that federates a single search across multiple Zoekt webservers. Fanning out from Go lets us use Go's concurrency instead of the current Rails approach, which relies on threads, so searches can be faster and cover more webservers. The endpoint also handles timeouts and partial or complete failures gracefully: if some webservers fail, results from the others are still returned, together with a list of the endpoints that failed.

It is intended to be a near drop-in replacement for a Zoekt webserver URL. The difference is that the request must also include a ForwardTo list of endpoints, so the server knows which webservers to federate the search across.
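A request to the federated endpoint therefore carries the usual search fields plus ForwardTo. The Go sketch below shows one plausible way to decode and validate such a body; the `searchRequest` and `forwardTarget` types and the `parseSearchRequest` helper are hypothetical, and the field names are taken only from the JSON used in the validation steps of this MR. Keeping Opts as raw JSON is an assumption that lets the indexer pass it through to each webserver without interpreting it.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// forwardTarget mirrors one entry of the "ForwardTo" list in the request body.
type forwardTarget struct {
	Endpoint string `json:"Endpoint"`
}

// searchRequest is a hypothetical shape for the /indexer/search body:
// the same Q, Opts and Timeout fields sent directly to a webserver, plus ForwardTo.
type searchRequest struct {
	Q         string          `json:"Q"`
	Opts      json.RawMessage `json:"Opts"`    // kept raw so it can be forwarded unchanged
	Timeout   int             `json:"Timeout"` // units assumed to match the webserver request
	ForwardTo []forwardTarget `json:"ForwardTo"`
}

// parseSearchRequest decodes the body and rejects requests with nowhere to forward to.
func parseSearchRequest(body string) (searchRequest, error) {
	var req searchRequest
	if err := json.NewDecoder(strings.NewReader(body)).Decode(&req); err != nil {
		return req, err
	}
	if len(req.ForwardTo) == 0 {
		return req, errors.New("ForwardTo must list at least one endpoint")
	}
	return req, nil
}

func main() {
	body := `{"Q": "listen", "Opts": {"TotalMaxMatchCount": 10}, "Timeout": 500,
		"ForwardTo": [{"Endpoint": "http://localhost:6090/api/search"}]}`
	req, err := parseSearchRequest(body)
	fmt.Println(req.Q, len(req.ForwardTo), err) // prints listen 1 <nil>
}
```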

Architecture

Architecture diagram (source: Code_Search_FigJam)

References

How to set up and validate locally

These steps assume zoekt has been set up in the GDK by following the instructions at: <>

  1. Run gdk status and ensure both zoekt-webserver-development-1 and zoekt-webserver-development-2 are running.
  2. Stop the first development indexer: gdk stop gitlab-zoekt-indexer-development-1
  3. Start the indexer from your local checkout in GDK mode: make gdk
  4. Examine the results from searching zoekt-webserver-development-1 directly:
```shell
curl --request POST \
  --url http://localhost:6090/api/search \
  --header 'Content-Type: application/json' \
  --header 'User-Agent: insomnia/10.1.1' \
  --data '{
    "Q": "listen",
    "Opts": {
      "TotalMaxMatchCount": 10,
      "NumContextLines": 3
    },
    "Timeout": 500
  }'
```
  5. Examine the results from searching zoekt-webserver-development-2 directly:
```shell
curl --request POST \
  --url http://localhost:6091/api/search \
  --header 'Content-Type: application/json' \
  --header 'User-Agent: insomnia/10.1.1' \
  --data '{
    "Q": "listen",
    "Opts": {
      "TotalMaxMatchCount": 10,
      "NumContextLines": 3
    },
    "Timeout": 500
  }'
```
  6. Examine the results when searching through the indexer endpoint. The response should include results from both webservers, merged and sorted. The third target, http://localhost:6099/noop, is not a running webserver, so this is a partial failure: that endpoint is included in the list of failures.
```shell
curl --request POST \
  --url http://localhost:6080/indexer/search \
  --header 'Content-Type: application/json' \
  --header 'User-Agent: insomnia/10.1.1' \
  --data '{
    "Q": "listen",
    "Opts": {
      "TotalMaxMatchCount": 10,
      "NumContextLines": 3
    },
    "ForwardTo": [
      { "Endpoint": "http://localhost:6090/api/search" },
      { "Endpoint": "http://localhost:6091/api/search" },
      { "Endpoint": "http://localhost:6099/noop" }
    ],
    "Timeout": 500
  }'
```
  7. Examine the response when there is a complete failure, i.e. every forwarded endpoint fails:
```shell
curl --request POST \
  --url http://localhost:6080/indexer/search \
  --header 'Content-Type: application/json' \
  --header 'User-Agent: insomnia/10.1.1' \
  --data '{
    "Q": "listen",
    "Opts": {
      "TotalMaxMatchCount": 10,
      "NumContextLines": 3
    },
    "ForwardTo": [
      { "Endpoint": "http://localhost:6099/noop" }
    ],
    "Timeout": 500
  }'
```
Edited by John Mason
