During the summit Sid and I talked about this and he asked me to follow up with @aherrmann and ask him to try to implement it asap.
We implemented Algolia for the GitLab Docs and after a few iterations and adjustments on how to index the site and display the results, it's working pretty well; we can actually find pretty much anything there.
It's particularly important for the handbook and the blog. Beyond that, if we make a separate index for the blog, we could consider adding it to the docs search page (https://docs.gitlab.com/search/) as an option to include blog tutorials in the docs search results. That way, we can help our users find the tutorials buried in the blog backlog and make them more useful/discoverable in the long term.
Ashton, can you please look into it? Please let me know if you need help.
DocSearch's terms state: "Your website must be a documentation website. We do not index blogs or commercial content."
The handbook might qualify, but not the whole site, which covers the blog as well.
With that said, we need to get in contact with the Algolia folks and find out how much this is going to cost us: https://www.algolia.com/pricing. There's a pay-as-you-go tier, but we need to calculate the records and indexing operations.
@marcia I'm pretty sure someone from Algolia will be at the Write the Docs conference, so we can approach them and find out more.
@axil thanks for the info! From what Fred from Algolia told us, the paid tiers would only make sense if we needed support. But let's try to find them at the conference, yes :)
@aherrmann I've received a follow-up email from Algolia asking how we're doing and seized the opportunity to kick off the conversation about this implementation.
He told me that there will be some costs attached, and asked me to schedule a meeting. I talked about this with the tech writing team, and we thought it would be better to introduce you to him so you can move forward with the conversation. We can help you with anything you need, but we'll leave it to you to lead the conversation.
I'll introduce you over email and cc the tech writing team as well as William so we're all aligned, but please take point on the subject and keep us informed.
Please let me know if you have any questions, and we're happy to help with whatever you need :)
@marcia @aherrmann The Support team is interested in using Algolia search too. We're developing the Support portal at https://support.gitlab.com/hc/en-us, and the ability to use a customized Algolia search for docs and troubleshooting guides would be very helpful.
I'd be happy to be part of any discussions on how we can make that happen. If there are costs involved that could be shared with the Support team, we'd be happy to discuss them.
@marcia @tatkins We have a call scheduled with Algolia for 13:00 PST on October 9th. I'm not entirely sure what to expect from this first call, but I'd be happy to have you there, @tatkins, so that we can ensure that we're asking questions based on our respective interests.
Once I have a calendar invite from them, I'll be sure to copy you on it.
@aherrmann I won't be available on Oct 9th, but please go ahead without me.
I think the first things to ask them and talk about in this call are:

- How much will it cost (for the handbook only, for the entire about website, and for the entire domain *.gitlab.com)?
- Will we have any limitations (bandwidth, number of queries, etc.)?
- What kind of support/guidance will we have to implement it?
- Where will we store the configuration file? (For the docs site, it's in a JSON file on GH, which means we can never change it by ourselves; we need to wait for them to merge - though they're very fast and we've never had a problem with it, afaik.)
- What kind of implementation will we need to be able to fetch results from docs.gitlab.com on about.gitlab.com and vice versa (in case we decide to try something like that)? I think we'll need to configure it in a way that supports searching multiple indexes.
If you want to take a look at our current settings (for the docs site), please check:

- docsearch.html, which globally configures the search on our docs site
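For anyone unfamiliar with it, a DocSearch setup like the one in docsearch.html boils down to a small JS snippet along these lines (the values below are placeholders for illustration, not our actual configuration):

```javascript
// Minimal DocSearch initialisation sketch.
// apiKey, indexName, and inputSelector below are placeholders.
docsearch({
  apiKey: 'SEARCH_ONLY_API_KEY', // search-only key, safe to expose client-side
  indexName: 'gitlab',           // hypothetical index name
  inputSelector: '#search-input',// the search <input> element on the page
  debug: false                   // keep the dropdown closed when not debugging
});
```

The index contents themselves live on Algolia's side; this snippet only wires an input field on the page to that index.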
@marcia, thank you so much for those questions and the links you've provided. I'll be sure to research how you're using Algolia in docs ahead of our call.
I'll also be sure to fill you in on the call after it's happened.
We met with Gabriel and Brian today, and they're going to set us up with a trial account to use over the next three weeks (while Brian's away for his honeymoon) as a proof of concept. We have a follow-up scheduled for November 1st.
To answer some of your questions @marcia (we ran out of time before I could knock out the full list):
- Pricing sounds like it can be structured a couple of different ways; if we implement on a smaller scale than the full site, we'd likely pay based on usage, which leads into the second question.
- We didn't discuss bandwidth limits, but we shouldn't run into issues with other limitations, aside from the fact that we'll pay more the more we use it.
We should be able to answer your other questions as we work with our trial account or once we meet for our follow-up call.
When you start the implementation, could you please ping me in the MRs? I want to keep an eye on them so that I can keep my mind open for possible future integrations with our implementation on the docs site :)
It might also be worth sharing updates in #algolia / #grp_cross-site-search for greater visibility. Generally, I want to keep up with all things Algolia, and I know there are other potential areas where we may want to use it, like the forum.
Hey, @tatkins, I've cleared enough off my plate that I'm just ready to start testing out implementation and figured we should sync up. Have you had any success yet?
I've done a bit of research into this and will try to summarise here. (There is also #grp_cross_site_search in Slack to discuss this topic / ask questions.)
Algolia
As mentioned above, we currently use their free DocSearch product for the docs. It cannot be used for other parts of our content due to its terms of service.
Their paid plans are for index-based searching, where you provide structured content (usually in the form of JSON files). To get what we need, we'd be on the Business plan: https://www.algolia.com/pricing. The cost of this may be prohibitive, and there's also work required to generate structured files from our static site generators.
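For context, the structured content for an index-based plan is just a list of JSON records. A hypothetical record for a blog post might look like this (every field name here is illustrative except objectID, which Algolia requires as the unique record identifier):

```json
[
  {
    "objectID": "blog-2018-10-01-example-post",
    "title": "Example blog post title",
    "url": "https://about.gitlab.com/2018/10/01/example-post/",
    "section": "blog",
    "content": "First few hundred characters of the post body used for matching..."
  }
]
```

Generating files like this is the piece of work our static site generators would need to do on every build.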
They do have a 'crawler' based solution (like Docsearch uses) but it requires a higher pricing tier: "Our web crawler is an add-on that is customised exclusively for customers on our Enterprise plan." See: https://discourse.algolia.com/t/algolia-web-crawler/862/5
Elasticsearch
We have in-house expertise with Elasticsearch, which we use for our application logging. That's quite different from the 'site search' type of functionality we're talking about here.
Elasticsearch is open source, so we could host and manage it all ourselves, but it would require developer time to create and maintain a full search solution using this approach.
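To give a sense of what that developer work involves, a basic full-text site-search query sent to a self-hosted Elasticsearch index (e.g. POST /site-content/_search, where the index name site-content and the fields title and body are assumptions for illustration) might look something like this:

```json
{
  "query": {
    "multi_match": {
      "query": "api",
      "fields": ["title^2", "body"]
    }
  },
  "highlight": {
    "fields": { "body": {} }
  }
}
```

The `^2` boosts title matches over body matches, and the highlight section returns snippets for the results page. Crawling our sites, mapping the fields, and building the results UI on top of queries like this is the part we'd have to develop and maintain ourselves.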
Swiftype
Please see how they return results from all their content across the forum, docs, blog, etc.: https://swiftype.com/#stq=api&stp=1 (this kind of search result would be great for us).
We currently use Swiftype site search service for about.gitlab.com and the handbook.
Our implementation is an out-of-the-box solution, so the results and functionality are limited. It can do a lot more (competitive with Algolia's features) for a better price.
If we configure the search engine and optimise our content I believe we can get excellent search results across our various properties with Swiftype.
Swiftype uses Elasticsearch on the backend and was recently acquired by Elastic. They are developing and expanding the service, and our existing relationship with Elastic means they are keen to help us get the best out of it.
Since the acquisition by Elastic there has been considerable development of Swiftype - some examples:
We currently have a legacy plan with Swiftype that covers our search for the handbook and about.gitlab.com. I've requested a quote to upgrade to the new 'Pro' plan described here: https://swiftype.com/site-search/pricing, based on possibly indexing the following:
One big improvement of the new plan is that it indexes content every 12 hours. The legacy plan we're currently on only indexes every 7 days, which may account for some of the impression of poor results.
I'll continue to investigate this next week and point the crawler at our docs so we can compare results and see if it might meet our needs.
Once again, do take a look at the results on https://swiftype.com/#stq=api&stp=1 to see how we could present results across our various information sources (bearing in mind that we can customize/limit results in various places as needed).
@cteskey I was collaborating on this with the docs team and others prior to moving out of my handbook role. If and when we hire a new handbook content manager, we may want to direct this to them.