Verified commit 860507ee authored by Steve Xuereb

Document Google Spanner for Global Service

What
---
Add a new section in the Global Service blueprint about Google Spanner

Why
---
We only looked at Spanner briefly, so we needed to do more research on
the platform.

Reference: #454056


Signed-off-by: Steve Xuereb <sxuereb@gitlab.com>
parent 3d7374fe
1 merge request: !148308 Document Google Spanner for Global Service
@@ -5,6 +5,8 @@ description: 'Cells: Global Service'
status: accepted
---
<!-- vale gitlab.FutureTense = NO -->
# Cells: Global Service
This document describes design goals and architecture of Global Service
@@ -19,7 +21,7 @@ Global Service, that can be deployed in many regions.
1. **Technology.**
-The Global Service will be written in [Golang](https://go.dev/)
+The Global Service will be written in [Go](https://go.dev/)
and expose an API over [gRPC](https://grpc.io/).
1. **Cells aware.**
@@ -173,12 +175,80 @@ The original [Cells 1.0](iterations/cells-1.0.md) described [Primary Cell API](i
by various services (HTTP Routing Service, SSH Routing Service, each Cell).
1. As part of the Cells 1.0 PoC we discovered that we need to provide a robust classification API
to support more workflows than anticipated. We need to classify various resources
-(username for login, projects for ssh routing, etc.) to route to correct Cell.
+(username for login, projects for SSH routing, etc.) to route to correct Cell.
This would make us heavily dependent on the resilience of the First Cell.
1. Long term, we want the Global Service to pass information across Cells.
This is a first step in that direction, and makes it much easier for us to
add functions later.
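To make the classification requirement more concrete, here is a minimal in-memory sketch of what a classification lookup might look like. All names (`ClassificationType`, `Classifier`, `Claim`, `Classify`, the cell identifiers) are hypothetical and are not the Global Service API, which will be exposed over gRPC; a real implementation would be backed by the data store rather than a map.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ClassificationType is a hypothetical enumeration of the resource kinds
// that routing services need to classify.
type ClassificationType int

const (
	// Username is used, for example, when routing a login request.
	Username ClassificationType = iota
	// ProjectPath is used, for example, when routing an SSH request.
	ProjectPath
)

// ErrNotFound is returned when no Cell has claimed the resource.
var ErrNotFound = errors.New("resource not classified")

type key struct {
	typ   ClassificationType
	value string
}

// Classifier is a hypothetical in-memory stand-in for the classification API:
// it maps (type, value) pairs to the Cell that owns them.
type Classifier struct {
	mu     sync.RWMutex
	owners map[key]string
}

// NewClassifier returns an empty Classifier.
func NewClassifier() *Classifier {
	return &Classifier{owners: make(map[key]string)}
}

// Claim records that cellID owns the resource. Re-claiming by the same Cell
// is a no-op; a claim by a different Cell fails, so a resource never maps to two Cells.
func (c *Classifier) Claim(typ ClassificationType, value, cellID string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	k := key{typ, value}
	if owner, ok := c.owners[k]; ok && owner != cellID {
		return fmt.Errorf("%q already claimed by %s", value, owner)
	}
	c.owners[k] = cellID
	return nil
}

// Classify returns the Cell that a request for the resource should be routed to.
func (c *Classifier) Classify(typ ClassificationType, value string) (string, error) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	owner, ok := c.owners[key{typ, value}]
	if !ok {
		return "", ErrNotFound
	}
	return owner, nil
}

func main() {
	c := NewClassifier()
	_ = c.Claim(Username, "alice", "cell-1")
	_ = c.Claim(ProjectPath, "gitlab-org/gitlab", "cell-2")

	cell, _ := c.Classify(Username, "alice")
	fmt.Println(cell) // cell-1
	cell, _ = c.Classify(ProjectPath, "gitlab-org/gitlab")
	fmt.Println(cell) // cell-2
}
```

Keying on the (type, value) pair keeps, for example, a username and a project path with the same string from colliding.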
## Spanner
[Spanner](https://cloud.google.com/spanner) will be a new data store introduced into the GitLab stack. We are choosing Spanner because:
1. It supports multi-regional read-write access with far less operational overhead than PostgreSQL, which helps with our [regional DR](../disaster_recovery/index.md).
1. The data is read-heavy, not write-heavy.
1. Spanner provides a [99.999%](https://cloud.google.com/spanner/sla) availability SLA for multi-regional deployments.
1. It provides strong consistency while still being globally distributed.
1. Sharding ([splits](https://cloud.google.com/spanner/docs/schema-and-data-model#database-splits)) is handled for us.
The cons of using Spanner are:
1. Vendor lock-in: our data will be hosted in a proprietary data store.
   - How to mitigate this: the Global Service will use generic SQL.
1. It is not friendly to self-managed installations, which matters because we want the Global Service to be available to self-managed customers.
   - How to mitigate this: Spanner supports a PostgreSQL dialect.
1. It is a brand-new data store that we need to learn to operate and develop with.
### GoogleSQL vs PostgreSQL dialects
Spanner supports two dialects: [GoogleSQL](https://cloud.google.com/spanner/docs/reference/standard-sql/overview) and [PostgreSQL](https://cloud.google.com/spanner/docs/reference/postgresql/overview).
The dialect [doesn't change the performance characteristics of Spanner](https://cloud.google.com/spanner/docs/postgresql-interface#choose); it mostly affects how database schemas and queries are written.
Choosing a dialect is a one-way door decision: to change the dialect later, we would have to go through a data migration process.
We will use the `GoogleSQL` dialect for the Global Service, and [go-sql-spanner](https://github.com/googleapis/go-sql-spanner) to connect to it, because:
1. Using Go's standard library `database/sql` will allow us to swap implementations, which is needed to support self-managed.
1. GoogleSQL [data types](https://cloud.google.com/spanner/docs/reference/standard-sql/data-types) are narrower and leave less room for mistakes. For example, we can't accidentally choose a 32-bit integer, because GoogleSQL only supports `INT64`.
1. New features seem to be released for GoogleSQL first, for example <https://cloud.google.com/spanner/docs/ml>. We don't need this feature specifically, but it suggests that new features support GoogleSQL first.
1. It gives a clearer split in the code between using Google Spanner and native PostgreSQL, so we won't hit edge cases where Spanner's PostgreSQL dialect differs from native PostgreSQL.
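The ability to swap implementations comes from `database/sql` needing only a registered driver name and a DSN. Below is a minimal sketch, assuming the Global Service selects its backend from configuration. go-sql-spanner registers itself as the `spanner` driver and takes a `projects/PROJECT/instances/INSTANCE/databases/DATABASE` DSN; for native PostgreSQL we assume [pgx](https://github.com/jackc/pgx)'s `stdlib` package, which registers the `pgx` driver. The `Backend` type, `Config` fields, and example names are hypothetical. The driver imports are left as comments so the sketch builds with the standard library alone; without them, `sql.Open` returns an unknown-driver error.

```go
package main

import (
	"database/sql"
	"fmt"
	// In a real build, import the drivers for their side effect of
	// registering themselves with database/sql:
	// _ "github.com/googleapis/go-sql-spanner" // registers "spanner"
	// _ "github.com/jackc/pgx/v5/stdlib"       // registers "pgx"
)

// Backend selects which database the Global Service talks to (hypothetical).
type Backend string

const (
	Spanner  Backend = "spanner"  // Google Spanner, GoogleSQL dialect
	Postgres Backend = "postgres" // native PostgreSQL, for self-managed
)

// Config holds hypothetical connection settings.
type Config struct {
	Backend Backend
	// Spanner settings.
	Project, Instance, Database string
	// PostgreSQL settings, as a pgx connection string or URL.
	PostgresURL string
}

// driverAndDSN maps a Config to a database/sql driver name and DSN.
func driverAndDSN(cfg Config) (driver, dsn string, err error) {
	switch cfg.Backend {
	case Spanner:
		return "spanner", fmt.Sprintf("projects/%s/instances/%s/databases/%s",
			cfg.Project, cfg.Instance, cfg.Database), nil
	case Postgres:
		return "pgx", cfg.PostgresURL, nil
	default:
		return "", "", fmt.Errorf("unsupported backend %q", cfg.Backend)
	}
}

// Open returns a *sql.DB; code above this layer only depends on database/sql.
func Open(cfg Config) (*sql.DB, error) {
	driver, dsn, err := driverAndDSN(cfg)
	if err != nil {
		return nil, err
	}
	return sql.Open(driver, dsn)
}

func main() {
	driver, dsn, _ := driverAndDSN(Config{
		Backend: Spanner, Project: "my-project", Instance: "global-service", Database: "claims",
	})
	fmt.Println(driver, dsn)
	// spanner projects/my-project/instances/global-service/databases/claims
}
```

Keeping the backend choice behind one `Open` function means queries written in generic SQL can run unchanged against either driver, while dialect-specific code stays isolated.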
Citations:
1. Google (n.d.). _PostgreSQL interface for Spanner._ Google Cloud. Retrieved April 1, 2024, from <https://cloud.google.com/spanner/docs/postgresql-interface>
1. Google (n.d.). _Dialect parity between GoogleSQL and PostgreSQL._ Google Cloud. Retrieved April 1, 2024, from <https://cloud.google.com/spanner/docs/reference/dialect-differences>
### Multi-Regional
Running Multi-Regional read-write is one of the biggest selling points of Spanner.
When provisioning an instance, you can choose a regional or a multi-region configuration.
After provisioning, you can [move an instance](https://cloud.google.com/spanner/docs/move-instance) while it is running, but this is a manual process that requires assistance from GCP.
We will provision a Multi-Regional Cloud Spanner instance because:
1. It won't require a migration to multi-regional in the future.
1. We have multi-regional on day 0, which reduces the scope of multi-region deployments at GitLab.
This will, however, increase the cost considerably. Using public pricing from the GCP pricing calculator:
1. [Regional](https://cloud.google.com/products/calculator?hl=en&dl=CiRlMjU0ZDQyMy05MmE5LTRhNjktYjUzYi1hZWE2MjQ4N2JkNDcQIhokOTlGQUM4RjUtNjdBRi00QTY1LTk5NDctNThCODRGM0ZFMERC): $1,716
1. [Multi Regional](https://cloud.google.com/products/calculator?hl=en&dl=CiQzNjc2ODc5My05Y2JjLTQ4NDQtYjRhNi1iYzIzODMxYjRkYzYQIhokOTlGQUM4RjUtNjdBRi00QTY1LTk5NDctNThCODRGM0ZFMERC): $9,085
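For scale, the ratio and the difference between the two public estimates above (in USD):

```math
\frac{9085}{1716} \approx 5.3, \qquad 9085 - 1716 = 7369
```

That is, at the configurations used in those estimates, the multi-regional instance costs roughly five times as much as the regional one.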
Citations:
1. Google (n.d.). _Regional and multi-region configurations._ Google Cloud. Retrieved April 1, 2024, from <https://cloud.google.com/spanner/docs/instance-configurations>
1. Google (n.d.). _Replication._ Google Cloud. Retrieved April 1, 2024, from <https://cloud.google.com/spanner/docs/replication>
### Performance
We haven't run any benchmarks ourselves because we don't have a full schema designed.
However, according to the [performance documentation](https://cloud.google.com/spanner/docs/performance), both the read and write throughput of a Spanner instance scale linearly as you add more compute capacity.
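If throughput does scale linearly, capacity planning reduces to a division once we have our own benchmark numbers. A minimal sketch, assuming we measure a per-node throughput for our workload: the required node count is the peak throughput divided by the usable per-node throughput, rounded up. The function name, the headroom parameter, and all numbers in the usage are hypothetical placeholders, not Spanner figures.

```go
package main

import (
	"fmt"
	"math"
)

// requiredNodes estimates how many Spanner nodes are needed for a target
// throughput, assuming throughput scales linearly with node count.
// perNodeOps must come from our own benchmarks; headroom is the fraction of
// each node's capacity to keep spare (for example 0.25 for 25%).
func requiredNodes(peakOps, perNodeOps, headroom float64) (int, error) {
	if perNodeOps <= 0 {
		return 0, fmt.Errorf("perNodeOps must be positive, got %v", perNodeOps)
	}
	if headroom < 0 || headroom >= 1 {
		return 0, fmt.Errorf("headroom must be in [0, 1), got %v", headroom)
	}
	// Each node is only loaded to (1 - headroom) of its benchmarked capacity.
	usable := perNodeOps * (1 - headroom)
	n := int(math.Ceil(peakOps / usable))
	if n < 1 {
		n = 1 // an instance always has some compute capacity
	}
	return n, nil
}

func main() {
	// Hypothetical numbers, for illustration only.
	n, _ := requiredNodes(45000, 10000, 0.25)
	fmt.Println(n) // 6
}
```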
### Alternatives
1. PostgreSQL: a multi-regional deployment requires a lot of operational work.
1. ClickHouse: it's an `OLAP` database, not an `OLTP` one.
1. Elasticsearch: it's a search and analytics document store, not a transactional database.
## FAQ
1. Does Global Service implement all services for Cells 1.0?
@@ -192,7 +262,7 @@ The original [Cells 1.0](iterations/cells-1.0.md) described [Primary Cell API](i
1. How will we push all existing claims from the "First Cell" into the Global Service?
We would add a `rake gitlab:cells:claims:create` task. Then we would configure the First Cell
-to use Global Service, and execute the rake task. That way First Cell would claim all new
+to use Global Service, and execute the Rake task. That way First Cell would claim all new
records via Global Service, and concurrently we would copy data over.
1. How and where will the Global Service be deployed?