GitLab.org / GitLab · Merge request !24298

Merged
Created Feb 03, 2020 by Nick Thomas (@nick.thomas), Maintainer · 3 of 13 tasks completed

Add a bulk processor for elasticsearch incremental updates


What does this MR do?

Currently, we store bookkeeping information for the Elasticsearch index in Sidekiq jobs. There are four types of information:

  • Backfill indexing for repositories
  • Backfill indexing for database records
  • Incremental indexing for repositories
  • Incremental indexing for database records

The first three use Elasticsearch bulk requests when indexing. The last does not.

This MR introduces a system that uses bulk requests when indexing incremental changes to database records. Instead of enqueuing a Sidekiq job for each change, the bookkeeping information is added to a Redis ZSET. A Sidekiq cron worker then takes batches from the ZSET and submits them to Elasticsearch via the bulk API.
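The producer/consumer pattern above can be sketched as follows. The class and method names (`BulkProcessQueue`, `track!`, `pop_batch`) are hypothetical, and a plain Hash stands in for the Redis ZSET (member => score) so the semantics are visible without a Redis server; the real implementation would issue ZADD, ZRANGEBYSCORE, and ZREM commands instead.

```ruby
# In-memory sketch of ZSET-based bookkeeping for incremental indexing.
# All names are illustrative, not the MR's actual API.
class BulkProcessQueue
  def initialize
    @zset = {}   # member => score, mimicking a Redis ZSET
    @clock = 0   # monotonic stand-in for a real score (e.g. a timestamp)
  end

  # Like ZADD: record that a database record changed. Re-adding an
  # existing member only updates its score, so repeated changes to the
  # same record collapse into one unit of work.
  def track!(ref)
    @clock += 1
    @zset[ref] = @clock
  end

  # Like ZRANGEBYSCORE + ZREM: the cron worker takes the oldest batch
  # and submits it to Elasticsearch via the bulk API.
  def pop_batch(limit)
    batch = @zset.sort_by { |_, score| score }.first(limit).map(&:first)
    batch.each { |ref| @zset.delete(ref) }
    batch
  end

  def size
    @zset.size
  end
end
```

Tracking `"Issue 1"` twice here leaves only one entry in the queue, which is the deduplication behaviour the ZSET gives us.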

This makes indexing slightly less responsive, but reduces its cost, both in terms of the load on Elasticsearch and the size of the bookkeeping information.

Since we're using a ZSET, we also get deduplication of work for free.
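On the submission side, the Elasticsearch Bulk API takes newline-delimited JSON: an action/metadata line followed by a document source line per operation, with a trailing newline. A minimal sketch of how the cron worker could assemble such a payload (the index name and record shape are made up for illustration):

```ruby
require 'json'

# Hypothetical helper: build an NDJSON bulk payload from a batch of
# records popped off the ZSET. Each record becomes an "index" action
# line plus its document line.
def bulk_payload(records)
  records.flat_map do |rec|
    [
      { index: { _index: 'gitlab-test', _id: rec[:id] } }.to_json,
      rec[:doc].to_json
    ]
  end.join("\n") + "\n" # the Bulk API requires a terminating newline
end
```

Submitting one batch as a single bulk request, rather than one request per change, is where the reduced Elasticsearch load comes from.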

Screenshots

Does this MR meet the acceptance criteria?

Conformity

  • Changelog entry
  • Documentation (if required)
  • Code review guidelines
  • Merge request performance guidelines
  • Style guides
  • Database guides
  • Separation of EE specific content

Availability and Testing

  • Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
  • Tested in all supported browsers
  • Informed Infrastructure department of a default or new setting change, if applicable per definition of done

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods, or other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Closes #34086 (closed)

Edited Feb 21, 2020 by Nick Thomas
Milestone: 12.9
Source branch: 34086-es-bulk-incremental-index-updates