Metadata Provider to Assist Column Lineage Analysis #477

@reata

Quoting sqllineage docs:

Column-level lineage will not be 100% accurate because that would require metadata information. However, there’s no unified metadata service for all kinds of SQL systems. For the moment, in column-level lineage, column-to-table resolution is conducted in a best-effort way, meaning we only provide possible table candidates for situation like select * or select col from tab1 join tab2.

Proposed Solution:
Build a metadata provider interface that returns all the columns of a given table. Implementations can vary, from a naive provider where the user stores the metadata in a dictionary, to more complex ones that query a metadata service (for example, querying the Hive metastore via its Thrift API, executing show tables SQL, or querying information_schema).
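As a rough illustration of the proposal, the interface and the naive dictionary-backed provider might look like the sketch below. The names `MetaDataProvider`, `DictMetaDataProvider`, and `get_table_columns`, as well as the "schema.table" key format, are assumptions made for this example and are not an existing sqllineage API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class MetaDataProvider(ABC):
    """Hypothetical interface: return the column names of a table."""

    @abstractmethod
    def get_table_columns(self, schema: str, table: str) -> List[str]:
        ...


class DictMetaDataProvider(MetaDataProvider):
    """Naive provider backed by a user-supplied dictionary."""

    def __init__(self, metadata: Dict[str, List[str]]):
        # Assumed key format: "schema.table"; values are lists of column names.
        # Keys are lower-cased so lookups are case-insensitive.
        self._metadata = {k.lower(): list(v) for k, v in metadata.items()}

    def get_table_columns(self, schema: str, table: str) -> List[str]:
        # Unknown tables yield an empty list instead of raising, so a caller
        # can fall back to best-effort resolution.
        return list(self._metadata.get(f"{schema}.{table}".lower(), []))


provider = DictMetaDataProvider(
    {"main.tab1": ["id", "col"], "main.tab2": ["id", "name"]}
)
print(provider.get_table_columns("main", "tab1"))
```

Returning an empty list for unknown tables is one possible design choice; raising an error would be equally defensible if missing metadata should be treated as a configuration problem.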

This way, users can register their metadata for sqllineage to use when resolving columns during lineage analysis.
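To make the benefit concrete, here is a minimal sketch of how registered metadata could narrow the table candidates for an unqualified column such as `col` in `select col from tab1 join tab2`. The function `resolve_column` and the plain-dictionary metadata format are hypothetical, and this is not how sqllineage is implemented internally.

```python
from typing import Dict, List


def resolve_column(
    column: str, tables: List[str], metadata: Dict[str, List[str]]
) -> List[str]:
    """Return the tables from `tables` that may own `column`.

    A table with registered metadata is kept only if it lists the column.
    A table without metadata stays a candidate (best-effort behaviour).
    """
    return [
        t for t in tables
        if t not in metadata or column in metadata[t]
    ]


# Hypothetical metadata registered by the user.
metadata = {"tab1": ["id", "col"], "tab2": ["id", "name"]}

# "col" exists only in tab1, so the ambiguity disappears.
print(resolve_column("col", ["tab1", "tab2"], metadata))
# "id" exists in both tables, so both remain candidates.
print(resolve_column("id", ["tab1", "tab2"], metadata))
# Without metadata, every joined table remains a candidate.
print(resolve_column("col", ["tab1", "tab2"], {}))
```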

We'll start with the naive solution, which covers the parts common to every provider. Ultimately, we'll try to provide common implementations such as HiveMetaStoreMetaDataProvider and SQLAlchemyMetaDataProvider, so users only need to supply something like a database URL to get accurate column lineage with metadata assistance.
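A database-backed provider could follow the same shape. The sketch below uses the standard-library `sqlite3` module purely as a stand-in, since the proposed HiveMetaStoreMetaDataProvider and SQLAlchemyMetaDataProvider are not specified here. The class name `SQLiteMetaDataProvider` and its method are assumptions for illustration, and it takes an open connection rather than a database URL to keep the example self-contained.

```python
import sqlite3
from typing import List


class SQLiteMetaDataProvider:
    """Hypothetical provider that reads column names from a SQLite database."""

    def __init__(self, connection: sqlite3.Connection):
        self._conn = connection

    def get_table_columns(self, table: str) -> List[str]:
        # PRAGMA statements do not accept bound parameters, so the table
        # name is quoted as an identifier with embedded quotes doubled.
        quoted = '"' + table.replace('"', '""') + '"'
        rows = self._conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        return [row[1] for row in rows]


# Demo against an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (id INTEGER, col TEXT)")
conn.execute("CREATE TABLE tab2 (id INTEGER, name TEXT)")

provider = SQLiteMetaDataProvider(conn)
print(provider.get_table_columns("tab1"))
print(provider.get_table_columns("missing"))
```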

This will be the major feature of the v1.5.x release.

Labels: enhancement
