Skip to content

WIP: Allow system database schema to be overridden when PostgreSQL is used

Merge Request Checklist

  • Link to an issue
  • Include the proposed fix or feature
  • Include and update tests for the modified code
  • Include a documentation change
  • Add a CHANGELOG.md entry in the Unreleased section

Once your merge request is complete, please assign it a Meltano maintainer for review cycle. Once the review cycle finished, the reviewer shall approve the merge request so it can be merged by any Meltano maintainer.

Closes #2082 (closed)

I thought I'd take a stab at this one thinking it would be easy, but there's a few gotchas here. What I've committed so far is a conversation starter, it doesn't work yet.

Some problems with this solution:

  • The env var is hard coded and spread around a few places. Ideally this setting would be configured like everything else, but there's kind of a circular dependency between the config code and the database
  • core/db.py supports a database engine per project, but I'm not really sure under what situations there would be more than one project. SQLAlchemy seems to set the schema either on the the tables themselves or via MetaData, in either case these would be shared across multiple projects. I'm not sure if that's a concern or not
  • Getting the migrations to find out about the configuration option has been a lot harder than I expected. It doesn't seem to recognize the option from MetaData. Some sources online suggested using:
    def upgrade(self, silent=False):
        conn = self.engine.connect().execution_options(
            schema_translate_map={
                None: os.getenv('MELTANO_DATABASE_SCHEMA')
            }
        )

This seems to change the schema for Alembic, but it causes a database error when the migrations are run:

[2020-07-27 18:42:59,321] [21398|MainThread|root] [ERROR] (psycopg2.errors.UndefinedObject) type "job_state" does not exist
LINE 5:  state job_state, 
               ^
 [SQL: '\nCREATE TABLE meltano.job (\n\tid SERIAL NOT NULL, \n\tjob_id VARCHAR, \n\tstate job_state, \n\tstarted_at TIMESTAMP WITHOUT TIME ZONE, \n\tended_at TIMESTAMP WITHOUT TIME ZONE, \n\tpayload VARCHAR, \n\tpayload_flags INTEGER, \n\tPRIMARY KEY (id)\n)\n\n'] (Background on this error at: http://sqlalche.me/e/f405)

With a little direction I'd still be willing to contribute. Otherwise I can leave this as context for a future implementer.

Edited by Charles Julian Knight

Merge request reports