Description
Quoting sqllineage docs:
Column-level lineage will not be 100% accurate because that would require metadata information. However, there’s no unified metadata service for all kinds of SQL systems. For the moment, in column-level lineage, column-to-table resolution is conducted in a best-effort way, meaning we only provide possible table candidates for situation like select * or select col from tab1 join tab2.
Proposed Solution:
Build a metadata provider interface that returns all the columns for a given table name. Implementations can vary, from a naive provider where the user stores the metadata in a dictionary to more complex ones that query a metadata service (for example, querying the Hive metastore via the Thrift API, executing show tables SQL, or querying information_schema).
This way, users can register their metadata for sqllineage to use when resolving columns during lineage analysis.
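As a rough illustration of the idea, here is a minimal sketch of such an interface with a dictionary-backed provider. The names `MetadataProvider`, `DictMetadataProvider`, `get_columns`, and `resolve_column` are hypothetical and are not sqllineage's actual API; the sketch only shows how registered metadata could narrow the candidate tables for a case like select col from tab1 join tab2.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class MetadataProvider(ABC):
    """Hypothetical interface: return the column names of a table."""

    @abstractmethod
    def get_columns(self, table: str) -> List[str]:
        ...


class DictMetadataProvider(MetadataProvider):
    """Naive provider: the user registers metadata in a plain dictionary."""

    def __init__(self, metadata: Dict[str, List[str]]) -> None:
        self._metadata = metadata

    def get_columns(self, table: str) -> List[str]:
        # Tables without registered metadata yield an empty column list.
        return list(self._metadata.get(table, []))


def resolve_column(
    column: str, candidates: List[str], provider: MetadataProvider
) -> List[str]:
    """Narrow the candidate tables to those that actually contain the column."""
    matches = [t for t in candidates if column in provider.get_columns(t)]
    # With no metadata match, fall back to the best-effort candidate list.
    return matches or candidates


# Assumed example metadata for the two joined tables.
provider = DictMetadataProvider({"tab1": ["id", "col"], "tab2": ["id", "other"]})
resolved = resolve_column("col", ["tab1", "tab2"], provider)
ambiguous = resolve_column("id", ["tab1", "tab2"], provider)
```

In this sketch, `col` resolves to `tab1` alone because only that table declares it, while `id` stays ambiguous because both tables have it. Falling back to the original candidates when nothing matches keeps the current best-effort behaviour for tables with no registered metadata.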
We'll start with the naive solution, which walks us through the most common parts. Ultimately, we'll try to provide common implementations such as HiveMetaStoreMetaDataProvider and SQLAlchemyMetaDataProvider, so that users only need to supply something like a database URL to get accurate column lineage with metadata assistance.
This will be the major feature of the v1.5.x release.
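To give a feel for a provider backed by a live database, here is a small stand-in that uses only Python's standard `sqlite3` module. It is not the proposed SQLAlchemyMetaDataProvider or HiveMetaStoreMetaDataProvider: the class name `SQLiteMetadataProvider` and its `get_columns` method are hypothetical, and it takes an open connection rather than a database URL so the example can run against an in-memory database.

```python
import sqlite3
from typing import List


class SQLiteMetadataProvider:
    """Hypothetical provider that reads column names from a SQLite database."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def get_columns(self, table: str) -> List[str]:
        # The table name is an identifier here, so it is quoted by hand
        # (embedded double quotes are doubled).
        quoted = '"' + table.replace('"', '""') + '"'
        rows = self._conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        return [row[1] for row in rows]


# Assumed example schema, created in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (id INTEGER, col TEXT)")
sqlite_provider = SQLiteMetadataProvider(conn)
tab1_columns = sqlite_provider.get_columns("tab1")
```

The point of the sketch is that the caller no longer maintains metadata by hand: the database catalog is the source of truth, and a table that does not exist simply returns an empty list.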