Manage and use our own Algolia search index
Currently, Algolia manages the search index for docs.antora.org. They run a hosted version of the docsearch scraper each night using this config. The index it populates ends up in an account owned by Algolia. The pages in the site then point to that index to get search results.
While it's convenient to not have to worry about managing the search index, it comes with several downsides:
- We have to submit pull requests to the docsearch-configs repository to update the config (which may incur delays that are out of our control)
- We don't get any statistics, analytics, or other insights about search queries
- We can't control when the scraper runs (which is usually too often)
- We can't easily test config changes
- We can't see the index to learn how to tune results and optimize it
Fortunately, we can run the docsearch scraper ourselves and host the index in an Algolia account (aka Algolia application) that's maintained by this project (yeah self-determination). Algolia offers a free plan for open source projects, which we've already secured. What's left is to create an index, set up the scraper to run in CI, and update the site build in CI so the pages point at our index.
For a reference on how to run the docsearch scraper, see https://github.com/couchbase/docs-site/blob/master/docsearch/README.adoc.
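As a rough sketch of what the "run the scraper in CI" step could look like, the following Python snippet builds the `docker run` invocation for the `algolia/docsearch-scraper` image. It assumes the scraper reads `APPLICATION_ID`, `API_KEY`, and `CONFIG` from the environment, which is how we understand the image to work. The index name, the selectors, and the `scraper_command` helper are hypothetical placeholders, not decisions made in this issue.

```python
import json
import shlex

# Hypothetical minimal scraper config. The index name and selectors are
# placeholders; the real values would come from our existing config.
CONFIG = {
    "index_name": "antora-docs",  # assumed name, not yet decided
    "start_urls": ["https://docs.antora.org/"],
    "selectors": {
        "lvl0": "h1",
        "lvl1": "h2",
        "text": "p",
    },
}


def scraper_command(config, app_id, api_key):
    """Build (but do not run) the docker command for the docsearch scraper."""
    return [
        "docker", "run", "--rm",
        "-e", f"APPLICATION_ID={app_id}",
        "-e", f"API_KEY={api_key}",
        # The scraper expects the whole config as a single JSON string.
        "-e", f"CONFIG={json.dumps(config)}",
        "algolia/docsearch-scraper",
    ]


if __name__ == "__main__":
    # Print the command with placeholder credentials. In CI, the real values
    # would come from protected variables and the list would be passed to
    # subprocess.run(cmd, check=True) instead of being printed.
    cmd = scraper_command(CONFIG, "<app-id>", "<write-api-key>")
    print(shlex.join(cmd))
```

Building the command as a list keeps the JSON config intact as one argument, so no shell quoting is needed when it is eventually executed.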
This issue is complete once the search box on docs.antora.org is using the index from our dedicated Algolia account. As a follow-up, we can ask Algolia to stop scraping our site (by disabling our config in docsearch-configs). We'll also be in a position to start tuning our own config to get better search results.