Opened Aug 28, 2018 by Yannis Roussos (@iroussos)

Add support for Data Manifest files in the new Meltano Architecture

The details/specification of the schema to be created for each entity should be provided to the Loader through a Loader specific Data Manifest.

  • This manifest may differ depending on the type of Data Store (e.g. a Relational Database vs. a Document Store) and should be provided at run time by the user of the framework. In other words, the Data Manifest should be a run-time parameter, not a schema definition hard-coded in the Extractors.

  • The manifest format provided in the remotes/origin/original_integrated_monorepo branch is the format we agreed on in past sync meetings (meltano/analytics#276, meltano/analytics#260), so we should build on top of it. @mbergeron has already provided a working implementation for reading/writing such manifests, so that code should be the starting point. (I just think that we should move away from the Entities approach and use sqlalchemy directly, as in the current implementation.)

  • Creating the schema and loading the extracted data into the Data Store should not be a concern of the Extractor (extractors should be agnostic to the nature of the target data store). The extractor just sends the data for each entity (e.g. as Pandas Data Frames in the current iteration) and should not worry about what happens in the next step. This is why the Data Manifest should be provided directly to the Loader: each user/execution can then choose a different subset of the extracted entities and attributes to store in the target Data Store.

  • We could also support a fallback: defaulting to a schema derived from the Data Frame provided to the Loader.
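
The fallback in the last bullet could look roughly like the sketch below. It is only an illustration: the function name `table_from_dataframe` and the dtype-to-column-type mapping are assumptions made for this example, not existing Meltano code.

```python
import pandas as pd
from pandas.api import types as pdt
from sqlalchemy import Boolean, Column, DateTime, Float, Integer, MetaData, Table, Text


def table_from_dataframe(name, df, metadata):
    """Hypothetical fallback: infer a sqlalchemy Table from a Data Frame's dtypes."""
    columns = []
    for col_name, dtype in df.dtypes.items():
        if pdt.is_bool_dtype(dtype):
            col_type = Boolean
        elif pdt.is_integer_dtype(dtype):
            col_type = Integer
        elif pdt.is_float_dtype(dtype):
            col_type = Float
        elif pdt.is_datetime64_any_dtype(dtype):
            col_type = DateTime
        else:
            # Anything unrecognised (e.g. strings/objects) is stored as text.
            col_type = Text
        columns.append(Column(str(col_name), col_type))
    return Table(name, metadata, *columns)


# Example: the Loader receives a Data Frame and no manifest.
frame = pd.DataFrame({"id": [1, 2], "amount": [1.5, 2.0], "name": ["a", "b"]})
demo_table = table_from_dataframe("demo", frame, MetaData())
```

A schema inferred this way has no primary keys or constraints, which is one reason an explicit manifest would remain the preferred input.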

This feature is required in order to move forward with #72 (closed) and then to support other similar extractors:

  • The existing SFDC Extractor is a great example of how those manifest files are built and used, so this feature is required and its design will be driven by the example manifest file in the SFDC extractor.

  • The first step will be to update this manifest to the agreed manifest format (meltano/analytics#276, meltano/analytics#260).

  • What is needed in the Meltano framework is a manifest loader that reads this (updated) yaml file and converts it to the sqlalchemy table definitions used by the new implementation. The manifest readers/writers built by Micael are a great starting point and seem to need only very small updates to support exporting to sqlalchemy table definitions instead of the old Schema format.
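
As a rough sketch of the conversion step described above, the snippet below turns an already-parsed manifest into sqlalchemy table definitions. The manifest layout (`entities` / `attributes` / `type` / `primary_key`), the `TYPE_MAP` names and the function `tables_from_manifest` are all hypothetical, chosen only for illustration; the agreed format from meltano/analytics#276 and meltano/analytics#260 may differ. The input is assumed to be the mapping that a yaml parser (e.g. `yaml.safe_load`) would return for the manifest file.

```python
from sqlalchemy import Boolean, Column, DateTime, Float, Integer, MetaData, String, Table

# Hypothetical mapping from manifest type names to sqlalchemy column types.
TYPE_MAP = {
    "string": String,
    "integer": Integer,
    "float": Float,
    "boolean": Boolean,
    "timestamp": DateTime,
}


def tables_from_manifest(manifest, metadata):
    """Build one sqlalchemy Table per entity listed in the (assumed) manifest layout."""
    tables = {}
    for entity_name, entity in manifest["entities"].items():
        columns = [
            Column(
                attr_name,
                TYPE_MAP[attr["type"]],
                primary_key=attr.get("primary_key", False),
            )
            for attr_name, attr in entity["attributes"].items()
        ]
        tables[entity_name] = Table(entity_name, metadata, *columns)
    return tables


# Example manifest, written as the dict a yaml parser would produce.
example_manifest = {
    "entities": {
        "account": {
            "attributes": {
                "id": {"type": "integer", "primary_key": True},
                "name": {"type": "string"},
                "created_at": {"type": "timestamp"},
            }
        }
    }
}
example_tables = tables_from_manifest(example_manifest, MetaData())
```

Because only the entities and attributes listed in the manifest become tables and columns, a user could store a subset of what the extractor emits simply by trimming the manifest.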

For this issue to be complete, we should: add data manifests to the existing implementation for both the Demo and Fastly Sources, so that we can test the implementation; support manifests as parameters in the CLI; and add support for providing the manifest files in the Flask API and the Meltano Analysis Web App.

Edited Aug 28, 2018 by Yannis Roussos
Milestone: 0.7.0 (Past due)
Reference: meltano/meltano#80