Elastic upgrade: Implement Go Indexer option "full"

Goal: Implement Go indexer option index_type: "full"

Suggested steps:

  • Use go-elasticsearch to implement indexer logic: https://github.com/elastic/go-elasticsearch/tree/main

  • Create a Go script that accepts a parameter called index_type, which controls the type of indexing to run. This issue covers only the full type.

  • Full indexing will:

    • Go through every project's .md doc files (doc/docs/doc-locale directories), using parallel processing.
    • Break each content page down by header section. Each header section needs to be processed to provide values for the index fields listed below.
  • Set an explicit index mapping, so that each field is stored with the correct data type.

  • When updating the index, use the following format:

    {
      "id": "docs/user/project/settings#access-tokens",
      "title": "Access Tokens",
      "page_title": "Project Settings",
      "anchor": "#access-tokens",
      "url_path": "/docs/user/project/settings/#access-tokens",
      "content": "You can create access tokens to authenticate with GitLab APIs. Personal access tokens are scoped to a user account, while project access tokens are scoped to a specific project.",
      "heading_hierarchy": [
        {
          "level": 1,
          "text": "Project Settings",
          "anchor": null
        },
        {
          "level": 2,
          "text": "Security",
          "anchor": "#security"
        },
        {
          "level": 3,
          "text": "Access Tokens",
          "anchor": "#access-tokens"
        }
      ],
      "gitlab_docs_breadcrumbs": "User > Project > Settings > Security > Access Tokens",
      "gitlab_docs_section": "user",
      "language": "en",
      "product": "gitlab",
      "version": "16.5",
      "last_updated": "2024-01-15T10:30:00Z",
      "last_indexed": "2024-01-15T14:45:00Z"
    }
  • Add the serverless project connection, including the CI/CD variables it needs.

  • Ensure the indexer respects “noindex” meta tags by skipping any page that has one.
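
As a starting point for the "break down each content page by header sections" step, here is a minimal sketch that uses only the standard library. The `Section` type, `splitSections`, and `slugify` are hypothetical names, and the anchor rule (lowercase, runs of non-alphanumerics collapsed to `-`) is an assumption that may not match how the docs site generates anchors. Text before the first heading (for example front matter) is dropped, and headings inside fenced code blocks are ignored.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Heading mirrors one entry of "heading_hierarchy" in the document format.
type Heading struct {
	Level  int     `json:"level"`
	Text   string  `json:"text"`
	Anchor *string `json:"anchor"` // nil (JSON null) for the level-1 page title
}

// Section is a hypothetical intermediate type: one header section of a page.
type Section struct {
	ID, Title, Anchor, Content string
	Hierarchy                  []Heading
}

var (
	headingRe = regexp.MustCompile(`^(#{1,6})\s+(.+?)\s*$`)
	nonSlugRe = regexp.MustCompile(`[^a-z0-9]+`)
)

const fence = "\x60\x60\x60" // three backticks, the Markdown code-fence marker

// slugify applies the assumed anchor rule described above.
func slugify(s string) string {
	return strings.Trim(nonSlugRe.ReplaceAllString(strings.ToLower(s), "-"), "-")
}

// splitSections cuts a Markdown page into one Section per ATX heading and
// records the chain of parent headings for each section.
func splitSections(pagePath, markdown string) []Section {
	var (
		sections []Section
		stack    []Heading // current heading path, outermost first
		body     []string
		cur      *Section
		inFence  bool
	)
	flush := func() {
		if cur != nil {
			cur.Content = strings.TrimSpace(strings.Join(body, "\n"))
			sections = append(sections, *cur)
		}
		body = nil
	}
	for _, line := range strings.Split(markdown, "\n") {
		if strings.HasPrefix(strings.TrimSpace(line), fence) {
			inFence = !inFence
		}
		m := headingRe.FindStringSubmatch(line)
		if inFence || m == nil {
			body = append(body, line)
			continue
		}
		flush()
		h := Heading{Level: len(m[1]), Text: m[2]}
		id, anchor := pagePath, ""
		if h.Level > 1 {
			anchor = "#" + slugify(h.Text)
			h.Anchor = &anchor
			id += anchor
		}
		// Drop headings at the same or a deeper level before pushing this one.
		for len(stack) > 0 && stack[len(stack)-1].Level >= h.Level {
			stack = stack[:len(stack)-1]
		}
		stack = append(stack, h)
		cur = &Section{ID: id, Title: h.Text, Anchor: anchor,
			Hierarchy: append([]Heading(nil), stack...)}
	}
	flush()
	return sections
}

func main() {
	md := "# Project Settings\n\n## Security\n\n### Access Tokens\n\nYou can create access tokens.\n"
	for _, s := range splitSections("docs/user/project/settings", md) {
		fmt.Printf("%s (%d headings): %q\n", s.ID, len(s.Hierarchy), s.Content)
	}
}
```

Because `splitSections` is a pure function of one file's path and contents, it could be called from a pool of goroutines, one file per task, to cover the parallel processing requirement. The remaining fields (page_title, url_path, breadcrumbs, version, timestamps) would still need to be filled in from the page and project context.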

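The index_type parameter and the explicit mapping could be sketched as follows, again with the standard library only. The function names, the default value of `full`, the `"dynamic": "strict"` setting, and every field type below are assumptions to be reviewed, not decisions made in this issue. In particular, heading_hierarchy is mapped as a plain object here, and `nested` may be a better fit if individual headings need to be queried. Sending the mapping to the cluster through go-elasticsearch is deliberately left out.

```go
package main

import (
	"encoding/json"
	"flag"
	"fmt"
	"os"
)

// parseIndexType reads -index_type from args. Only "full" is accepted in
// this sketch, since this issue covers only full indexing.
func parseIndexType(args []string) (string, error) {
	fs := flag.NewFlagSet("indexer", flag.ContinueOnError)
	indexType := fs.String("index_type", "full", "type of indexing to run")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	if *indexType != "full" {
		return "", fmt.Errorf("unsupported index_type %q", *indexType)
	}
	return *indexType, nil
}

// field is one property definition in the mapping.
type field map[string]any

// explicitMapping returns an index-creation body covering the fields of the
// document format. All field types are assumptions.
func explicitMapping() map[string]any {
	keyword := field{"type": "keyword"}
	text := field{"type": "text"}
	date := field{"type": "date"}
	return map[string]any{
		"mappings": map[string]any{
			"dynamic": "strict", // reject documents with unmapped fields
			"properties": map[string]any{
				"id":         keyword,
				"title":      text,
				"page_title": text,
				"anchor":     keyword,
				"url_path":   keyword,
				"content":    text,
				"heading_hierarchy": field{
					"properties": map[string]any{
						"level":  field{"type": "integer"},
						"text":   text,
						"anchor": keyword,
					},
				},
				"gitlab_docs_breadcrumbs": text,
				"gitlab_docs_section":     keyword,
				"language":                keyword,
				"product":                 keyword,
				"version":                 keyword,
				"last_updated":            date,
				"last_indexed":            date,
			},
		},
	}
}

func main() {
	indexType, err := parseIndexType(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	body, err := json.MarshalIndent(explicitMapping(), "", "  ")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("index_type=%s\n%s\n", indexType, body)
}
```

Keeping version as a keyword avoids "16.5" being parsed as a number, and a strict mapping makes a misspelled field fail at index time instead of silently creating a new field.
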
Edited by Hiru Fernando