[Discovery] Use Elasticsearch for some Analytics features?

Problem to solve

For data-intensive Analytics features like Productivity Analytics and Code Hotspots, Postgres is proving to be a limitation both for reading and writing. Elasticsearch has been proposed as an alternative data store. Does Elasticsearch suit our needs for these features?

Related issues

  • Discovery: Use InfluxDB for some Analytics features?

Further details

  • Elasticsearch is in an early-to-mid phase of implementation for search functionality in GitLab. Although it is fairly new infrastructure, it is expected to be long-lived.
  • @DylanGriffith started this conversation by asking "if we'd considered using Elasticsearch as the database for this kind of analytics data". The Analytics team has been actively searching for alternatives to Postgres.
  • @DylanGriffith noted "the search team is already going to need to work on reliably keeping ES in sync". Presumably the Analytics team could support that effort.
  • @ahegyi created a proof of concept and ran it on the gitlab repo. The results are promising: roughly 10 seconds to parse and store data for the entire repo, with a resulting size of roughly 15 MB. Those figures look manageable at scale.

Proposal

Elasticsearch satisfies our needs for Analytics if it accommodates:

  • Extremely fast create (for backfilling data on the scale of gitlab.com)
  • Extremely fast update (for adding attributes and/or changing values in existing records)
  • Extremely fast querying (for quick page load times)
  • A long lifetime in GitLab (not a technology that will be soon replaced)
  • A manageable increase in complexity
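
The first three criteria above are measurable, so one way to make them concrete is a small timing harness that compares measured create, update, and query times against agreed budgets. The following is only a sketch under stated assumptions: the budget values, the function names, and the in-memory dict standing in for a real Elasticsearch or Postgres client are all hypothetical and would need to be replaced with real workloads and thresholds.

```python
import time
from typing import Callable, Dict


def time_operation(operation: Callable[[], None], repeats: int = 5) -> float:
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        best = min(best, time.perf_counter() - start)
    return best


def evaluate(timings: Dict[str, float], budgets: Dict[str, float]) -> Dict[str, bool]:
    """Map each criterion to True if its measured time is within budget."""
    return {name: timings[name] <= budget for name, budget in budgets.items()}


# Stand-in store (assumption): a dict in place of a real datastore client.
store: Dict[int, dict] = {}


def create() -> None:
    # Simulates backfilling 1000 records.
    for i in range(1000):
        store[i] = {"commits": i}


def update() -> None:
    # Simulates changing a value in every existing record.
    for i in range(1000):
        store[i]["commits"] += 1


def query() -> None:
    # Simulates an aggregate read for a page load.
    sum(doc["commits"] for doc in store.values())


# Hypothetical budgets in seconds; real values would come from the team.
BUDGETS = {"create": 10.0, "update": 10.0, "query": 1.0}

timings = {
    "create": time_operation(create),
    "update": time_operation(update),
    "query": time_operation(query),
}
results = evaluate(timings, BUDGETS)
print(results)
```

Running the same harness against both datastores with identical workloads would give a like-for-like comparison for each criterion.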

Drawbacks

  • Not all self-managed instances will have Elasticsearch, so those without it would not have any Analytics features that rely on it.

What does success look like, and how can we measure that?

We want specific criteria for deciding "if and when" to use Elasticsearch over Postgres for Analytics features.

Edited Oct 30, 2019 by Dan Jensen