Not possible to define designs in m5o files without joins
What is the current bug behavior?
I am reporting this as a bug, but it could also be a feature proposal, depending on our understanding of what should be supported in .m5o files.
When creating a .topic.m5o file, I want to be able to only check a single table.
Let's say, for example, that in tap-gitlab's transforms we generate a stats table, gitlab_stats_per_user, with various stats grouped together.
This table already has everything I need to generate my reports: both the dimensions (attributes to group by, like user name, project or milestone) and the values to be aggregated (e.g. issues authored or assigned, MRs, etc.).
I want to be able to generate a design using only that table:
{
  version = 1
  name = gitlab
  connection = postgres_db
  label = Gitlab
  designs {
    gitlab_stats_per_user {
      label = Gitlab Stats
      from = gitlab_stats_per_user
      description = "Gitlab Stats per User, Project and Milestone"
    }
  }
}
Unfortunately, the design above fails to compile because we try to access the joins attribute of the definition:
File "/home/iroussos/work/meltano/src/meltano/core/m5o/m5o_file_parser.py", line 148, in graph_design
joins = deepcopy(design["joins"])
KeyError: 'joins'
For a more complete log, check the section below.
My understanding is that this type of design should be supported out of the box per the definition of Models. This is also supported by the fact that the .topic.m5o file above passes our syntactical analysis.
What is the expected correct behavior?
Users should be able to create .topic.m5o files with designs that have a single table without joins.
Steps to reproduce
You can use the topic and models defined in the model-gitlab project to test and reproduce this issue. Check out the branch I am working on in model-gitlab!1 (merged), as it depends on fixing this issue.
If you want to check it with real data, create a new project, add the GitLab token stored in our Vault to extract data from the GitLab API, and check the transformed results in the analytics schema:
meltano init tap-gitlab-project --no_usage_stats
cd tap-gitlab-project
# update .env
source .env
meltano elt tap-gitlab target-postgres --transform run
# Copy the models from the model-gitlab project to the /model/ directory of your project
meltano ui
.env required:
export FLASK_ENV=development
export SQLITE_DATABASE=meltano
export PG_DATABASE=warehouse
export PG_PASSWORD=
export PG_USERNAME=
export PG_ADDRESS=localhost
export PG_PORT=5432
export PG_SCHEMA='tap_gitlab'
export GITLAB_API_TOKEN='OUR GITLAB TOKEN'
export GITLAB_API_GROUPS='meltano'
export GITLAB_API_PROJECTS=''
export GITLAB_API_START_DATE='2018-01-01T00:00:00Z'
Relevant logs and/or screenshots
Traceback (most recent call last):
File "/home/iroussos/work/meltano/venv/lib/python3.7/site-packages/flask/app.py", line 2309, in __call__
return self.wsgi_app(environ, start_response)
File "/home/iroussos/work/meltano/venv/lib/python3.7/site-packages/flask/app.py", line 2295, in wsgi_app
response = self.handle_exception(e)
File "/home/iroussos/work/meltano/venv/lib/python3.7/site-packages/flask_restful/__init__.py", line 269, in error_router
return original_handler(e)
File "/home/iroussos/work/meltano/venv/lib/python3.7/site-packages/flask_cors/extension.py", line 161, in wrapped_function
return cors_after_request(app.make_response(f(*args, **kwargs)))
File "/home/iroussos/work/meltano/venv/lib/python3.7/site-packages/flask/app.py", line 1741, in handle_exception
reraise(exc_type, exc_value, tb)
File "/home/iroussos/work/meltano/venv/lib/python3.7/site-packages/flask/_compat.py", line 35, in reraise
raise value
File "/home/iroussos/work/meltano/venv/lib/python3.7/site-packages/flask/app.py", line 2292, in wsgi_app
response = self.full_dispatch_request()
File "/home/iroussos/work/meltano/venv/lib/python3.7/site-packages/flask/app.py", line 1815, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/home/iroussos/work/meltano/venv/lib/python3.7/site-packages/flask_restful/__init__.py", line 269, in error_router
return original_handler(e)
File "/home/iroussos/work/meltano/venv/lib/python3.7/site-packages/flask_cors/extension.py", line 161, in wrapped_function
return cors_after_request(app.make_response(f(*args, **kwargs)))
File "/home/iroussos/work/meltano/venv/lib/python3.7/site-packages/flask/app.py", line 1718, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/home/iroussos/work/meltano/venv/lib/python3.7/site-packages/flask/_compat.py", line 35, in reraise
raise value
File "/home/iroussos/work/meltano/venv/lib/python3.7/site-packages/flask/app.py", line 1813, in full_dispatch_request
rv = self.dispatch_request()
File "/home/iroussos/work/meltano/venv/lib/python3.7/site-packages/flask/app.py", line 1799, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/home/iroussos/work/meltano/src/meltano/api/controllers/repos.py", line 143, in sync
return lint_all(True)
File "/home/iroussos/work/meltano/src/meltano/api/controllers/repos.py", line 120, in lint_all
compiler.compile()
File "/home/iroussos/work/meltano/src/meltano/core/compiler/project_compiler.py", line 42, in compile
self.m5o_parse.compile(self.topics)
File "/home/iroussos/work/meltano/src/meltano/core/m5o/m5o_file_parser.py", line 165, in compile
topic = self.graph_topic(topic)
File "/home/iroussos/work/meltano/src/meltano/core/m5o/m5o_file_parser.py", line 155, in graph_topic
design_graph = self.graph_design(design)
File "/home/iroussos/work/meltano/src/meltano/core/m5o/m5o_file_parser.py", line 148, in graph_design
joins = deepcopy(design["joins"])
KeyError: 'joins'
Possible fixes
File "/home/iroussos/work/meltano/src/meltano/core/m5o/m5o_file_parser.py", line 148, in graph_design
joins = deepcopy(design["joins"])
Fixing the KeyError above is pretty simple and could be as easy as writing something like design.get("joins", None) instead of design["joins"] (or checking the result and skipping the deepcopy operation).
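A minimal sketch of that idea, not the actual parser code: copy_joins is a hypothetical helper name, and the empty-list default assumes the parsed joins are a list that later code iterates over. If they turn out to be a mapping, the default would need to be {} instead.

```python
from copy import deepcopy


def copy_joins(design):
    """Return a deep copy of the design's joins, or an empty list if it has none.

    Hypothetical helper for illustration only; it is not part of m5o_file_parser.py.
    """
    # design.get("joins") returns None when the key is missing, and `or []`
    # also covers an explicit null/empty value, so deepcopy always gets a list.
    return deepcopy(design.get("joins") or [])


# A design with a single table and no joins, as in the topic file above.
design = {"label": "Gitlab Stats", "from": "gitlab_stats_per_user"}
joins = copy_joins(design)  # no KeyError; joins is an empty list
```

Defaulting to an empty list rather than None keeps any loop over the joins working unchanged, which may matter if other parts of the parser make the same assumption.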
What I am not sure about is whether there are other places in our m5o parser and the UI where we implicitly assume that a design always has a joins key as part of its structure, so we should handle this one with care and check our implementation end to end.
Further regression test
We should add at least one test with a .topic.m5o file without joins.
- Write additional adequate test cases and submit test results
- Test results should be reviewed by a person from the team
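As a rough sketch of what such a regression test could look like: graph_design below is a hypothetical stand-in that models only the joins handling, since the real method lives on the parser class and needs a project to run against. The design dict mirrors the topic file from this report.

```python
from copy import deepcopy


def graph_design(design):
    # Hypothetical stand-in for the parser's graph_design method; it only
    # models the joins lookup that currently raises the KeyError.
    joins = deepcopy(design.get("joins") or [])
    return {"name": design["from"], "joins": joins}


def test_design_without_joins():
    design = {
        "label": "Gitlab Stats",
        "from": "gitlab_stats_per_user",
        "description": "Gitlab Stats per User, Project and Milestone",
    }
    graph = graph_design(design)  # must not raise KeyError
    assert graph["name"] == "gitlab_stats_per_user"
    assert graph["joins"] == []
    assert "joins" not in design  # the input design is left unmodified


test_design_without_joins()
```

A real test would load a .topic.m5o fixture without joins and run it through the full compile step, so that any other implicit assumptions about the joins key surface as well.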