[Geo] Logical replication case

Yes, currently, using logical replication from a hot standby is not possible. Work in this direction was not finished (https://commitfest.postgresql.org/15/788/), and unfortunately it will not appear in the upcoming Postgres 11. The chances of having it in Postgres 12 are unclear for now.

I totally agree that, in this use case, using logical replication looks reasonable in general, and that dealing with two types of replication leads to a high level of complexity and a risk of technical issues and bugs. It's worth considering using only logical replication for everything.

How much work it will be and what types of issues we might encounter are questions worth investigating. I know that OnGres has solid experience with logical replication, so it's definitely worth discussing it with them (/cc @ahachete @teoincontatto @emanuel_ongres @Finotto – I'm putting this on our next call's agenda).

To give a more complete opinion here, I'd like to have more insight into the internals of the GEO replication process and to better understand the requirements. Despite what I know about the product and this ticket, I don't feel I have the full picture. Maybe external systems could be used, similar to the approach described in the famous https://www.confluent.io/blog/turning-the-database-inside-out-with-apache-samza/ (this is just hand waving as an example, not even a proposal).

But regarding logical replication as the only means of replication, the answer is unfortunately no. Logical replication, despite all the love that I have for it, is not ready (by far) to be used as the data replication mechanism underlying a high availability setup. It is probably not even designed for this. It has more lag and more resource usage, imposes several limitations on DDL, does not replicate DDL, sequences or large objects, and is definitely not proven as a solution for whole-database replication.

I'd suggest setting up a specific call for this to discuss further and understand the whole scenario.

I would like to better understand the requirements here, to see all our possible solutions and which of them is the most feasible considering the product and the implementation effort.

@ahachete @Finotto It's not entirely focused on the database part, but this may help you get a good understanding of Geo and the architectural decisions we have made so far: gitlab-com/www-gitlab-com!13968 (expand the diff to read)

> Synchronization. If two replications have different latencies we can have errors because of inconsistent data

This is expected with Logical Replication, as there is no snapshot isolation at the global level. Even if no latencies are observed, [both|N] nodes will act as standalone servers, each having its own timeline and snapshots. Addressing this is quite complex, although there are solutions.

Logical Replication does not report errors on incoming events unless they violate constraints (which are outside the replication component); this makes it unsuitable for mirroring. This also means that HA should be applied to each "queue" or shard (meaning shard as the entry point of a queue).

Dumping here what has been discussed in this Slack thread.

Each Geo node is a standalone database by itself, so HA should be set up at each point (backups and consistent failover). Unless its state is ephemeral or can be recovered systematically, each endpoint should have failover and recovery capabilities.

The queue will be streamed downstream from an origin to all nodes, each of which will record its state so that the view of the queue is consistent across the consumers (at the Logical Replication level we do not consider the consumers to be replicas).

The centralized queue can be considered as the catalog of a sharding cluster. That consistency can be reached through synchronous logical replication (section 26.2.8, Synchronous Replication, of the PostgreSQL documentation describes this via synchronous_standby_names; that variable is server-wide, although a subscription can be given its own synchronous_commit setting at creation), but if the expected stream is very large it may slow things down, and I can't guarantee how thoroughly this has been tested. Just to clarify, this synchronicity happens at the transaction level, meaning that it will only affect records in the pub/sub flow.

We also need to consider the workaround for a possible data mismatch: either recreating the subscription or fixing the data in place.

Based on our discussion in https://docs.google.com/document/d/1II7LwedfqDVdc5-nKz6TzETFNFRQKBOJZlWJ60c6N14/edit#:

At the moment, logical replication is not meant to replicate the whole database.
If logical replication breaks for some reason, some repair has to be done. This seems like a can of worms we don't want to open.

OnGres suggested we consider PostgreSQL's support for Change Data Capture: https://medium.com/@ramesh.esl/change-data-capture-cdc-in-postgresql-7dee2d467d1b

Actually, I suppose CDC is another form of logical replication?

> Exporting a full stream of changes from a database based on its commit log is known as change data capture or CDC. Since version 9.4, PostgreSQL exposes a CDC interface called logical decoding. Users define an output format by writing a logical decoding output plugin in C that gets access to the internal PostgreSQL structures representing a data change. Once a logical decoding plugin is in place, clients can connect to a replication slot and receive messages in their specified format.

https://www.simple.com/engineering/a-change-data-capture-pipeline-from-postgresql-to-kafka

@emanuel_ongres Do you know if using this logical decoding interface will break from one PostgreSQL version to another?

@stanhu Logical Replication (LR) is indeed based on Logical Decoding (LD). LD is a building block for CDC-based solutions, of which LR is one.

Since full database replication via LR is not recommended, we still need to rely on Streaming Replication (SR) for HA purposes. But LD can be used in parallel for GEO.

The LD interface is stable, and won't change in future versions (it is tied to the protocol level, and that's very hard to change and never in an incompatible way). What can change is the output of the logical decoding plugin, but that's in full control of the downstream consumer and ops, who can control what plugin to use (the downstream consumer) and what plugins are available (ops).

marked this issue as related to #6913 (closed)

marked this issue as related to #4516 (closed)

mentioned in issue #6913 (closed)

As I understand it, we can use CDC instead of LR to get a couple of benefits:

- We can filter events on the publisher side (only the pglogical extension is capable of that; PG 10 is not)
- We don't depend on the PG version (9.5+ is totally fine) or on any extension

But we also add one problem: we need to create a daemon that will be responsible for capturing events.

We also still have the same problems that LR has:

- We would have to have two replications working in parallel, which adds overall complexity. It’s harder to set up and maintain
- ~~This makes upgrades more complex. Not only because of more complex setup but also, because logical replication does not work well when the schema is changed on one of the sides~~ (CDC can solve this actually)
- Synchronization. If two replications have different latencies we can have errors because of inconsistent data, as events have references to records from the main database (UPDATE: As CDC or LR are slower, it may not be a problem, but still, there is no guarantee, right?)

I already asked this question in Slack but I will duplicate it here:

Logical replication and logical decoding both require creating a logical slot on the primary, and this creates lots of problems in itself. Is there a way of capturing events from the replica, given that the WAL is already streamed to the secondary? I mean, theoretically, we could logically decode the raw WAL we're getting on the secondary, so why would we bother the primary?! The question: do we have some means for that?

Hi @vsizov Replying to some of your comments:

> We can filter events on the publisher side (only the pglogical extension is capable of that; PG 10 is not)

CDC is implemented via the "Logical Decoding" mechanism (LD). LD allows you to plug in the code that will format (and filter, if you want) what is sent downstream. As you say, pglogical allows you to filter and PG 10 certainly does not, but you could just as well develop a plugin that does exactly what you want. I'm not necessarily advocating for this, but writing an LD plugin is not a big undertaking at all (you can write a basic one in a day).

> But we also add one problem, we need to create a daemon that will be responsible for events capturing.

Certainly. But this is less of a problem than the current mechanism, which is something that (periodically?) queries the database with costly queries to infer changes and act upon them. So not only do I believe you need this "daemon" in both cases, but with LD it gets significantly simpler. So I see this as an advantage much more than a problem.

> We would have to have two replications working in parallel which adds overall complexity. It’s harder to set up and maintain

The "two replications" are the Streaming Replication (SR) and the LD. I wouldn't call LD a replication. It's a separate thing.

So yes, you need two things: a mechanism for PG data replication and HA (SR), and one for event capturing. But I wouldn't consider them similar: they have different implementations and they serve different purposes.

Related to what I was saying before, I don't think this adds complexity; it diminishes it. The alternative is having one replication (SR) and the current GEO with its costly queries. And this is, I believe, more complex than an LD-based daemon that reacts to events.

> Synchronization. If two replications have different latencies we can have errors because of inconsistent data, as events have references to records from the main database (UPDATE: As CDC or LR are slower, it may not be a problem, but still, there is no guarantee, right?)

As I proposed at the end of the meeting, all the information that may be required to process the event downstream should be inserted in the database at the source, as part of the same transaction. This way, all the needed information will be present in the LD event. This greatly simplifies the architecture and makes the event stream processing almost stateless, which would be a great win (the only "state" might be control state about what events have been processed, but not event data).

In any case, any queue-like (like actually) or event-based system is, by its very nature, asynchronous. And there's no (easy) way of guaranteeing consistency in this case, be it with or without LD (as an example, the current system is not consistent). This is totally independent of the latencies and the relative positions of several replicas (which need not, at all, be processing at the same time). In other words: I see consistency under a queue-like or event-processing system as an almost impossible goal, and the real solution is to make the event data self-contained (and this should be more or less easy to implement).
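To illustrate what "self-contained" event data could look like downstream, here is a minimal sketch of a stateless handler. It assumes the changes arrive as wal2json-style (format version 1) transaction documents, where each change carries `kind`, `table`, `columnnames` and `columnvalues`. The function name and the `repository_path` column are hypothetical, chosen only for the example; `geo_event_log` is the table discussed in this thread.

```python
from typing import Dict, Iterator, List


def events_from_transaction(txn: dict, tables: List[str]) -> Iterator[Dict[str, object]]:
    """Yield one self-contained event per INSERT into the given tables.

    `txn` is assumed to be a decoded wal2json (format version 1) document.
    Everything the handler needs is taken from the change itself, so no
    lookup against the replicated database is required.
    """
    for change in txn.get("change", []):
        # Ignore updates/deletes and tables we are not interested in.
        if change.get("kind") != "insert" or change.get("table") not in tables:
            continue
        # Pair column names with their values to build the event payload.
        yield dict(zip(change["columnnames"], change["columnvalues"]))


# Example input: one event row plus an unrelated change in the same transaction.
# The `repository_path` column is hypothetical.
txn = {
    "change": [
        {
            "kind": "insert",
            "schema": "public",
            "table": "geo_event_log",
            "columnnames": ["id", "repository_path"],
            "columntypes": ["bigint", "text"],
            "columnvalues": [7, "group/project.git"],
        },
        {
            "kind": "update",
            "schema": "public",
            "table": "projects",
            "columnnames": ["id"],
            "columntypes": ["integer"],
            "columnvalues": [1],
        },
    ]
}
events = list(events_from_transaction(txn, ["geo_event_log"]))
```

Because the payload is built only from the decoded change, the handler does not depend on how far streaming replication has progressed on the secondary.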

> The logical replication or logical decoding require creating a logical slot on a primary and this creates lots of problems itself

Not sure what "lots of problems" those are. Slots by themselves do not create a problem. Only if they are unused would they cause increased disk usage on the master (but no other performance degradation), and eventually, if no action is taken, PG on the master may crash if the disk fills up completely. While this is a big problem in itself, it is the only problem I see that slots may create, and it is easy to work around: monitor appropriately and kill (maybe automatically) slots that remain unused for a long period of time. But otherwise, slots serve a great purpose and are a great feature of PostgreSQL, both for LD and for SR itself (so if they are already used for SR, the same monitoring and killing should already be in place, and adding extra slots for LD adds no more problems).
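The "monitor and kill unused slots" policy could be sketched roughly as below. This is only a sketch under stated assumptions: the rows are assumed to have been fetched already from the `pg_replication_slots` view (columns `slot_name`, `active`, `restart_lsn`) together with the server's current WAL position, and the `Slot` type, the function names and the threshold are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Optional


class Slot(NamedTuple):
    """One row of pg_replication_slots (only the columns used here)."""
    slot_name: str
    active: bool
    restart_lsn: Optional[str]  # e.g. "16/B374D848"; None if no WAL is reserved


def lsn_to_int(lsn: str) -> int:
    """Convert an LSN in PostgreSQL's 'hi/lo' hexadecimal text form to a byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) + int(lo, 16)


def slots_to_drop(slots: Iterable[Slot], current_lsn: str, max_retained_bytes: int) -> List[str]:
    """Return the names of inactive slots retaining more WAL than the threshold.

    The policy (inactive + too much retained WAL) is an assumption made for
    this example, not something decided in this issue.
    """
    now = lsn_to_int(current_lsn)
    candidates = []
    for slot in slots:
        if slot.active or slot.restart_lsn is None:
            continue  # in use, or not holding back any WAL
        if now - lsn_to_int(slot.restart_lsn) > max_retained_bytes:
            candidates.append(slot.slot_name)
    return candidates


slots = [
    Slot("geo_ld", False, "0/0"),        # abandoned consumer, 4 GiB behind
    Slot("standby", True, "0/0"),        # active, never a candidate
    Slot("fresh", False, "0/FFFFFF00"),  # inactive but only 256 bytes behind
    Slot("unused", False, None),         # no WAL reserved
]
to_drop = slots_to_drop(slots, current_lsn="1/0", max_retained_bytes=1024 ** 3)
```

Each returned name could then be passed to `pg_drop_replication_slot()`, or simply raised as an alert if automatic removal is considered too aggressive.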

By "lots of problems" I mean mostly shipping problems like:

- Monitoring. Some slots may be left without consumers
- Prepare instructions for the GDK
- Build it into the Omnibus package, along with monitoring for it
- Prepare instructions for installation from source
- Make sure that updates with the new system are smooth

Monitoring for sure, as I already mentioned. But this also applies to slots used for SR, so the monitoring infrastructure should be already in place.

About shipping, I don't fully understand. Slots need not be "shipped"; they are just created (up or downstream, that's a choice) and used. They don't require any further config file or source code. They are just a command sent over the replication (network) channel.

BTW: even though it's possible, it's worth noting here that a slot should only be used by one consumer at a time, except in very particular cases where you want to parallelize consuming from a slot and you don't require previous state.

> Slots need not be "shipped", they are just created

The GitLab team works on GitLab.com as well as on the GitLab CE and EE packages. That means we need to make sure it works for us on gitlab.com as well as for our customers with on-prem installations. I just want to say that it's one more moving part in our huge stack. I'm not saying it's a blocker for us, just that we should consider all the implications.

Sure, I understand. But a slot creation is like a SQL command. It works if it is on a supported version (since 9.4).

mentioned in issue #7567 (closed)

I think what @vsizov is saying is that configuring and setting up logical decoding for customers is a significant amount of work because:

1. It's yet another thing that has to be configured/automated in Omnibus for customers. In other words, it's another thing that can go wrong.
2. As we've seen with FDW, getting it right can be tricky. Customers will run into issues pretty quickly (e.g. max_replication_slots not high enough, upgrading from previous versions, etc.).
3. It doubles the amount of space on the primary for the WAL if the Geo secondary goes offline.

That being said, the alternative of putting in an event bus (https://gitlab.com/gitlab-org/gitlab-ce/issues/44578) is probably an even bigger change and adds even more complexity in the stack.

Sure. I agree with 1 and 2. However, I believe GEO requires a solution/alternative in any case, as the current solution, from what I understood, a) does not perform terribly well and b) may run into consistency issues. Any alternative may create its own problems, so alternatives need to be compared with each other, not with the current status.

However, 3 is not necessarily precise. An unused slot will grow the space requirements on the master, not limited (at all) to double the space. It could be less, or more. So what actually is required here is to monitor unused slots and kill them if they are unused.

So my question would be: what is going to create more problems and/or more tractable ones, an LD-based solution or an alternative solution which fixes current GEO problems by doing X and Y?

1 and 2 will be true for any solution. An LD-based solution minimizes the number of possible issues, because (a) it has been developed and used by Postgres community members for several years already (btw, @ahachete have you used it somewhere? can you describe some cases?), and (b) it's native, embedded in Postgres, part of its core (see https://gitlab.com/postgres/postgres/blob/master/src/backend/replication/logical/logical.c) – it's more "native" than even postgres_fdw, which is a contrib module (well, shipped in any standard Postgres package, but still a contrib module).

I think the biggest thing against CDC right now is that, to get it to a usable state for us, we will need to develop something entirely new (which may or may not have Ruby support right now) just to read a stream of changes from the database and then trigger events based on that, just to have a rudimentary PubSub solution.

I see the appeal for using CDC when you don't have access to do any modification to the "Publisher" (like when you are just an integrator for a proprietary, closed-source software).

The CDC is basically a "man in the middle" with the database, but as we do have access to the "Publisher", and the architecture allows us to easily add something that publishes events to a custom PubSub solution, the cost of having the CDC is not justifiable.

Also, people using managed solutions like AWS RDS may or may not be able to create or access the required replication slot, and we would like to support that as well.

I know we are trying to avoid having to use a dedicated solution here, but for the sake of giving an example on using PubSub and how we can iterate and introduce it to the current architecture, here is a simple proposal:

Let's say we decide to use something like NATS Streaming, which is Go-based (so no heavy Java/Erlang VM to support), has both Go and Ruby clients, and has the primitives we need for reliability: persistence, durable subscriptions (survive restarts, require ACK, etc.), and at-least-once delivery.

To "fix" the problem we have with the gaps today would require us to simply publish the IDs for the events we are generating and reading them on the other side.

With that information we know which events to expect and we don't need to care about the gaps anymore; we just run SELECT * FROM geo_event_log WHERE id IN (...).

This is the simplest change we can make that will fix this without putting a lot of load on the current machines or requiring a big re-architecture of how we create and consume events. It also allows us to get comfortable with the new PubSub.

In a next iteration, we can remove the geo_event_log completely and just use the PubSub to publish all events.

To use the PubSub with more than one node, we need to use the Queue Groups concept so each "node" has its own "mailbox"

@brodock At that point, I think implementing streaming with Queue Groups would be re-implementing what Kafka already brings in its feature set. If we are trying to avoid custom tools, I think we should reconsider implementing brokers.

@emanuel_ongres I'm not following the implementation part, Queue Groups is already "implemented", it's a feature from NATS Streaming. The point here is that CDC will require coding time, we are not even sure if there are working bindings for Ruby already, and there are possible issues integrating it with cloud solutions.

Even if we overcome that, it is still a subpar alternative, as it costs us a new replication slot (which will put extra pressure on the database servers, disk, etc.), and we still need to process every database change to actually use only a small percentage of it.

We also have internal knowledge of another solution, NSQ, which is also Go-based (very similar to NATS Streaming), from the Gemnasium people (Security Product).

Both NATS Streaming and NSQ are supposedly easier to integrate than Kafka for our Omnibus package. Kafka requires the JVM and ZooKeeper just as a starting point, which makes sense for companies already using both. That is not our case: we use Consul instead of ZooKeeper and we don't use anything that depends on the JVM, so integrating one daemon would mean carrying at least two additional software dependencies.

> To "fix" the problem we have with the gaps today would require us to simply publish the IDs for the events we are generating and reading them on the other side.
>
> With that information we know which events to expect, and we don't need to care about the gaps anymore, we just SELECT * FROM geo_event_log WHERE id IN (...).

I think that we still have a problem with replication lag, e.g., we read the event with ID 3 from the stream, but it's not visible in the secondary database yet.

That's true. We can handle it by dispatching one job per ID and having it retry until it can find the ID in the database, or by not ACKing the message until we can find it (so we don't need to do bookkeeping on the secondary). We will probably need to investigate which alternative is better and which one provides more insight into the queue size, replication state, etc.
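The "don't ACK until the row is visible" variant could look roughly like this. It is a minimal sketch, not tied to any particular broker: `fetch_visible` is a hypothetical callable standing in for a query on the secondary such as `SELECT id FROM geo_event_log WHERE id IN (...)`, and the actual ACK call depends on the client library that ends up being used.

```python
from typing import Callable, Iterable, List, Set, Tuple


def split_ack_retry(
    published_ids: Iterable[int],
    fetch_visible: Callable[[List[int]], Set[int]],
) -> Tuple[List[int], List[int]]:
    """Split published event IDs into (ack, retry).

    IDs already replicated to the secondary can be processed and ACKed.
    The others are left un-ACKed so that the broker redelivers them later
    (this relies on at-least-once delivery).
    """
    ids = list(published_ids)
    visible = fetch_visible(ids)
    ack = [i for i in ids if i in visible]
    retry = [i for i in ids if i not in visible]
    return ack, retry


# Simulated secondary: event 3 has been published but not yet replicated.
replicated = {1, 2}
first_pass = split_ack_retry([1, 2, 3], lambda ids: replicated & set(ids))

# Later, replication catches up and the redelivered message can be ACKed.
replicated.add(3)
second_pass = split_ack_retry([3], lambda ids: replicated & set(ids))
```

The size of the `retry` list over time would also give a rough signal of how far the secondary is lagging behind the event stream.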

Hi @brodock wanted to provide a few clarifications and further opinion :)

It is definitely true that a CDC solution requires coding. However: a) I believe the coding effort here may be substantially smaller than the coding effort on fixing the shortcomings of a home-grown, pub-sub based solution (more on this later); and b) I believe Ruby drivers are ready for the task, see https://github.com/MasahikoSawada/fluent-plugin-pg-logical as an example.

It's understandable to see CDC as a middleman, but I rather see it the opposite way: it's the pub-sub that is the middleman. CDC is part of the database infrastructure: it is provided by PostgreSQL, and you need to do nothing to produce the changes (only to consume them). No extra writes on the database, no state management (of what is consumed and what is not), no effort to allow subscribers, track them, or guarantee at-least-once delivery. On the other hand, a PubSub needs to provide all this (it needs to write the events, track which subscriber consumed what, which means more writes, etc.) as a layer on top of PostgreSQL, and hopefully provide the same guarantees. And, to compare fairly, at the same code quality, performance and robustness as a core PostgreSQL functionality that is already more than 4 years old.

On the other side, pubsub is, essentially, a queue. And queues in RDBMSs are arguably considered an anti-pattern. In my opinion they are typically just a bad pattern, but with significant downsides specifically on PostgreSQL, as PostgreSQL's MVCC does not perform very well on frequent UPDATEs/DELETEs, and that's exactly what a queue (or pubsub) does.

So if I compare the extra resources of CDC vs pubsub, I'd argue the latter is significant overhead versus the almost negligible overhead of the former. Specifically, pubsub requires double (or more) writes: the original data to be stored in the database (from which the event information is derived) and then the event itself, plus control data * N, where N is the number of subscribers. With CDC only the original data is required: no event data is written, and no subscriber control information is stored either. Once you add the vacuum pressure of a queue, resource consumption gives a CDC-based solution a significant advantage.

Please note that a replication slot is something extremely cheap. It is just a pointer into the WAL stream. Consuming from the slot may bring some I/O, but since it is based on reading WAL that has already been generated, that WAL is almost always in RAM, unless the secondary is heavily delayed. For clarification: CDC in PostgreSQL is not like CDC in MongoDB, for example, where change data is written internally to another collection (hence duplicating data). In PostgreSQL it is based on decoding (hence the name logical decoding) the WAL that is already written for PostgreSQL durability and replication, and just parsing it to transform it from the physical representation (arch, OS and PG version dependent) into a logical one (arch, OS and PG version independent).

More on the philosophical side, CDC already provides all that is required, while generating and replicating event data to consumers is like reinventing the wheel, (possibly) without the correctness guarantees that a core PostgreSQL functionality provides. I believe, and could be totally wrong here, that developing this based on a pubsub model will soon provide diminishing returns. At the beginning it sounds easy and doable. And it works. But then adding new features or solving current problems starts costing more and more, until it becomes too complicated or even impossible. Specifically, providing consistency may be a difficult problem in pubsub, while it is provided by default via CDC.

So I'd say the decision may be a trade-off, where the CDC solution brings a cleaner architecture, better performance and less resource consumption, but understandably is newer to the teams and may end up involving a bigger effort/problem for GL, provided it works well with Ruby (which it seems to, but that needs to be verified).

Hope this helped anyway, feel free to come back if you need further help in the future once a decision has been taken! :)

In reply to @NikolayS: we're actually working with a very large customer (I may not reproduce the customer name in public) that has many critical production databases and requires a solution for a cross-cloud migration. They require a high-performance solution and we're building that based on LD. You also surely know the architecture of our colleagues at Zalando, where they consume the CDC to push the changes to Kafka and then many micro-services consume the changes from there (see https://jobs.zalando.com/tech/blog/data-integration-in-a-world-of-microservices).

P.S. Slightly off-topic, but the JVM can no longer be considered "heavy". Since Java 9 you can create a "custom" JVM with only the parts it needs to execute your program, and it's lightweight; and with newer projects like GraalVM, you can generate native Linux executables of a few MB that don't require a JVM at all to run. All self-contained.

Ah, I forgot to mention that LD is perfectly supported on RDS PostgreSQL (actually you can use either the test_decoding or the wal2json output plugins), but it certainly is not on Aurora PostgreSQL (at least as of today). Last time I checked, it was not supported on Cloud SQL for PostgreSQL on GCP, but I believe it should be supported soon, as it is considered a critical functionality by many.

@ahachete thanks for the clarification. Here is my understanding from reading your proposal:

pg_recvlogical is the process that should run on the secondary node that will "read" from the extra replication slot and it will pipe to wal2json (that's what fluent is using).

Instead of using a native driver it is using a subshell and probably reading from the unix pipe in this part of the code: https://github.com/MasahikoSawada/fluent-plugin-pg-logical/blob/master/lib/fluent/plugin/in_pg_logical.rb#L97-L169

What we would need to do to ship this is:

- bundle wal2json in omnibus (BSD-3)
- enable the extra replication slot in omnibus
- modify the geo-logcursor to do something like the fluent plugin (load pg_recvlogical piping to wal2json in a subshell and read from it)

A few concerns on my side: when we read from this subshell we will need a buffer, as we can't control the speed at which we receive data (that is delegated to the subshell). So we need to be careful when implementing it: if we receive more data than we can process, we may run out of system memory, crash and lose data. I think this is where the proposal to offload to Kafka came in during the call (I remember just parts of that conversation, so please excuse me if I got it wrong).

On the trade-offs:
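One way to keep memory bounded when reading from a subprocess is a fixed-size buffer between the reader and the processing code. The sketch below is in Python rather than Ruby purely for illustration, and assumes the stream delivers one JSON document per line. The function name and buffer size are hypothetical. As far as I understand, wal2json runs as an output plugin inside the server, so a `pg_recvlogical` subprocess would emit the JSON on its stdout directly.

```python
import json
import queue
import threading
from typing import Iterable, Iterator


def buffered_changes(lines: Iterable[str], max_buffered: int = 100) -> Iterator[dict]:
    """Yield decoded JSON documents from `lines` through a bounded buffer.

    When the consumer is slow the buffer fills up and the reader thread
    blocks, so at most `max_buffered` documents are held in memory.
    """
    buf: "queue.Queue" = queue.Queue(maxsize=max_buffered)
    done = object()  # sentinel marking the end of the stream

    def reader() -> None:
        try:
            for line in lines:
                line = line.strip()
                if line:
                    buf.put(json.loads(line))  # blocks while the buffer is full
        finally:
            buf.put(done)  # always unblock the consumer, even on a parse error

    threading.Thread(target=reader, daemon=True).start()
    while True:
        item = buf.get()
        if item is done:
            return
        yield item


# Simulated stream standing in for the subprocess's stdout.
fake_stdout = ['{"n": %d}' % i for i in range(50)]
received = [doc["n"] for doc in buffered_changes(fake_stdout, max_buffered=4)]
```

With a real subprocess, `lines` would be the process's stdout. Once the reader stops draining the pipe, the operating system pipe fills and the producer blocks in turn, so the backlog should end up as WAL retained by the slot on the server rather than as memory in the consumer. That shifts the risk to disk usage on the primary, which is the monitoring concern discussed earlier.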

> So if I compare the extra resources of CDC vs pubsub, I'd argue the latter is significant overhead versus the almost negligible overhead of the former. Specifically, pubsub requires double (or more) writes: the original data to be stored in the database (from which the event information is derived) and then the event itself, plus control data * N, where N is the number of subscribers. With CDC only the original data is required: no event data is written, and no subscriber control information is stored either. Once you add the vacuum pressure of a queue, resource consumption gives a CDC-based solution a significant advantage.

This does not map 100% to our use case as not all operations are stored on the database in the first place and to be able to make a correct comparison we have to consider the data model / data structure on both cases and benchmark.

As an example, the equivalent of the "create repository" event which today is implemented by doing an insert to the geo_event_log and another one to the geo_repository_created_events would be implemented in a pubsub just by publishing the relevant information in the pubsub and not doing any operation on the database.

Also on the pubsub alternative the load doesn't have to be on the database machine, as it's a different daemon, it can be installed in a different machine, so it has zero impact on the database.

mentioned in issue #5876 (closed)

Customer ticket -> https://gitlab.zendesk.com/agent/tickets/106018 (internal use)

I'm the customer for Aric's ticket referenced yesterday, and our use case is a bit different from those mentioned so far. For us, our security architecture team wants to always separate databases onto a separate subnet/VLAN from applications, and both of those separately from web presentation as well. So, that means several servers for everything, and then when you get into multiple instances for Geo, you're multiplying a whole bunch of things and that server count (with complexity and cost) goes up really fast. So, for some things it can make sense to combine within a tier to make up for what you separated across tiers. In our case we did that at the database level, having one set of database servers (one per datacenter location) that hosted databases for multiple applications, with GitLab being just one of those. (These were then external database servers, set up manually, not using GitLab Omnibus to manage them.) That approach makes streaming replication not a good option, because streaming replication requires replicating the entire database cluster, and would overwrite the databases for other applications outside of GitLab. So I figured I'd just use logical replication (the one built into PG 10) for that, so they could have separation.

TL;DR: I'd be looking for a replication solution that could be set up on a per-database level for the use case of database servers hosting DBs for multiple independent applications.

@tonyyarusso We had previous discussions about using Logical Replication for Geo (I was one of the advocates for moving in that direction), but there are some major concerns about how logical replication works today.

The main one is that it's not "plug and play", there are things that it can't replicate automatically yet, in the same way as streaming replication does.

It's complicated to work around this specific limitation, given the way GitLab and its underlying infrastructure work. That means most of the automation we built may not work as expected, or may not work at all, in that environment.

Another important concern is that there are performance limitations and it can't handle the same scalability scenarios that are possible with streaming replication (there are advantages as well, but the cons are bigger here).

Even with PostgreSQL 11.0 the downsides still exist, so I don't see it as a product fit in the foreseeable future.

What I can propose for your specific use case is to run a dedicated database instance on the same machine as the "shared one" you just mentioned. All you need to do is run it on a different port. It's possible to do that by provisioning a new instance yourself or by running one from omnibus-gitlab (you can specify in the configuration file to just boot up Postgres).

That will alleviate the problem of having to provision new machines, and reduce the burden when updating GitLab.

We can provide guidance on how to do that on the Zendesk ticket if you wish.

@brodock Yeah, the separate instance on the same machine approach is what I've been in the process of setting up today. I have Postgres up and running with that second process and got firewalls open to use it, but haven't gotten to recreating the gitlab user and migrating data to it yet. I imagine that should be okay - I just don't look forward to explaining it to my peers that have to help maintain it. (None of us are actually DBAs, but I'm playing one on TV...)

@tonyyarusso

Just out of curiosity, have you considered using the managed instance from the omnibus package instead of provisioning one by hand? And if so, why did you end up going the manual route? I'm asking for feedback, to understand what can be improved on our side.

@brodock Because we have other applications using PostgreSQL besides just GitLab. It's bad enough that we have to have two clusters of it, but I really don't think I can expect my colleagues to deal with the two clusters being managed and updated through completely different processes. The Omnibus package is great if you're either doing everything all on one machine OR if GitLab is all your organization does, but in an enterprise environment where you have to split things up AND have other applications in the mix, that approach falls apart because the administrative overhead of having an extra layer of "remember that thing1 is different from thing2" is taxing on brain space.

@tonyyarusso thanks for sharing your perspective, I can see now why you want to have it done in a standardized way.

mentioned in issue #8353 (closed)

mentioned in issue #9900

marked this issue as related to #13720

mentioned in epic &2376 (closed)

mentioned in epic &2184 (closed)

mentioned in issue omnibus-gitlab#5346 (closed)

mentioned in issue gitlab-org/quality/reference-architectures#79

mentioned in issue #437281

marked this issue as related to gitlab-org#13721 (closed)

marked this issue as related to #473894

marked this issue as related to #502540

marked this issue as related to #502546

As a part of our refinement process, we are doing a clean up where we are closing issues older than 5 years. Please reopen this issue if you are still seeing this issue or would like further action on this issue.

closed

Note: Org Mover is looking to reuse Geo code, and it may use logical replication for PG data in the future. So this may become more feasible.

https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/cells/migration/

Hmm it's interesting how the API would have let me do this one, I specifically mention skipping issues with a lot of comments. This issue should have gotten excluded. I'll reopen this one

@sranasinghe should we elevate this to an epic level since we already seem to have a pretty good write up here? Is there more you would like to add?

reopened
