Sitemap doesn't include index.html files
In https://gitlab.com/gitlab-com/gitlab-docs/blob/master/content/sitemap.xml.erb we have:
<%= xml_sitemap(
items: items.select { |item| !item[:is_hidden] && item.path.end_with?('.html') }
) %>
which skips files that set is_hidden in their frontmatter and only keeps items whose path ends in .html. It might have something to do with the route defined in https://gitlab.com/gitlab-com/gitlab-docs/blob/ce58118e812e50796bc25ff80d453d57d066830d/Rules#L121-127:
route '/**/*.{html,md}' do
  if item.identifier =~ '/index.*'
    '/index.html'
  else
    item.identifier.without_ext + '.html'
  end
end
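As a rough sketch of why the filter could drop index pages: if the path reported for an index page is directory-style (ending in "/") rather than ending in "index.html", the end_with?('.html') check rejects it. That path behavior is an assumption here, not something confirmed above. The Item struct, the sample paths, and relaxed_filter are all hypothetical and only stand in for the real Nanoc items.

```ruby
# Hypothetical stand-in for a Nanoc item; Struct#[] lets item[:is_hidden] work
# the same way the attribute lookup does in the template.
Item = Struct.new(:path, :is_hidden)

# Assumed sample data: the index page exposes a directory-style path.
ITEMS = [
  Item.new('/ee/user/group/saml_sso/', false), # index page (assumed path form)
  Item.new('/ee/ci/yaml/README.html', false),  # regular page
  Item.new('/ee/hidden/page.html', true)       # hidden page
].freeze

# Same predicate as the one used in sitemap.xml.erb.
def current_filter(items)
  items.select { |item| !item[:is_hidden] && item.path.end_with?('.html') }
end

# Hypothetical variant that also keeps directory-style paths.
def relaxed_filter(items)
  items.select { |item| !item[:is_hidden] && item.path.end_with?('.html', '/') }
end

puts current_filter(ITEMS).map(&:path).inspect
puts relaxed_filter(ITEMS).map(&:path).inspect
```

Under these assumptions the current predicate keeps only the regular page, while the relaxed one also keeps the index page and still excludes the hidden one.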
The bottom line is that the sitemap doesn't include those pages, so Algolia does not index them unless another page links to them.
For example, https://docs.gitlab.com/ee/user/group/saml_sso/ is neither in the sitemap nor linked from any other page, so it doesn't show up in search results. On the other hand, https://docs.gitlab.com/ee/topics/autodevops/ is also missing from the sitemap, but it is linked from another page. Since one of the two conditions holds, Algolia indexes it.
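The indexing behavior described here can be modeled, very loosely, as a single condition: a page gets indexed if it is in the sitemap or linked from another page. The Page struct and indexable? helper below are hypothetical and only restate the two examples; they say nothing about how the Algolia crawler is actually implemented.

```ruby
# Hypothetical model of a docs page and the two ways a crawler could reach it.
Page = Struct.new(:url, :in_sitemap, :linked_elsewhere)

# Assumed rule: either condition is enough for the page to be indexed.
def indexable?(page)
  page.in_sitemap || page.linked_elsewhere
end

PAGES = [
  Page.new('https://docs.gitlab.com/ee/user/group/saml_sso/', false, false),
  Page.new('https://docs.gitlab.com/ee/topics/autodevops/', false, true)
].freeze

PAGES.each { |page| puts "#{page.url} indexable: #{indexable?(page)}" }
```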