
[Feature Request] Ability to support Spark-based DBT models where all references will be <tablename>.<column name>; they throw an error in the code #71

@Sathish-Metcash

Description


📝 Feature Description

Support DBT models without a database / catalog (Lakehouse pattern), i.e. support two-part references instead of always requiring the three-part naming the code currently expects.

🤔 Problem Statement

The goal is to support DBT models built with a Lakehouse adapter (FabricSpark) using the 'spark' dialect, which is well supported by the sqlglot library.

The issue is that the current code expects at least three-part naming (<db/catalog>.<schema>.<table>). Without a three-part reference it throws an error like "'NoneType' object has no attribute 'meta'".

Even after providing a default value for the database, the table-linking code for lineage still searches for the <catalog>.<schema>.<table> pattern; in our case it looks up the table in the DBT manifest with an empty catalog, i.e. a key with a leading ".". That leading "." forces the lineage to be created as '_HARDCODED__REF' even though the table already exists in manifest.json.
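
To illustrate, here is a minimal sketch (not dbt-colibri's actual code; the key construction below is a hypothetical reconstruction of the behaviour described above) showing how a two-part Spark reference leaves the catalog empty in sqlglot, and how a naive "catalog.schema.table" key then starts with a leading ".":

```python
import sqlglot
from sqlglot import exp

# Two-part reference, as produced by a Lakehouse (FabricSpark) model.
parsed = sqlglot.parse_one("select * from analytics.orders", dialect="spark")
table = parsed.find(exp.Table)

print(table.catalog)  # ""          -- no catalog in a two-part reference
print(table.db)       # "analytics"
print(table.name)     # "orders"

# Hypothetical lookup key mirroring the issue description:
lookup_key = f"{table.catalog}.{table.db}.{table.name}"
print(lookup_key)     # ".analytics.orders" -- leading "." breaks the manifest lookup
```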

🚀 Proposed Solution

Applying the following two fixes to the Python source code would make this easy to support:

  1. When parsing the schema from manifest.json (created by dbt docs), check whether the database is null and default it to "default" to avoid the 'NoneType' runtime error.
  2. When linking tables, if the catalog is not provided, remove the leading "." from the lookup key so that lineage links to the correct tables; without this fix, the model is linked to a HARDCODED table without any column information (see the sketch after this list).
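
Below is a rough sketch of both fixes, using hypothetical function and variable names (the actual dbt-colibri internals may differ):

```python
from typing import Optional

def resolve_database(node: dict) -> str:
    # Fix 1: manifest.json nodes from a Lakehouse adapter may have
    # "database": null; fall back to "default" instead of letting a
    # None value trigger "'NoneType' object has no attribute 'meta'".
    return node.get("database") or "default"

def build_lookup_key(catalog: Optional[str], schema: str, table: str) -> str:
    # Fix 2: only prepend the catalog when it is actually present, so the
    # lookup key matches the two-part name stored in the manifest instead
    # of producing ".schema.table" and falling back to a hardcoded ref.
    parts = [p for p in (catalog, schema, table) if p]
    return ".".join(parts)

# Examples:
# build_lookup_key(None, "analytics", "orders")        -> "analytics.orders"
# build_lookup_key("lake", "analytics", "orders")       -> "lake.analytics.orders"
```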

📋 Alternatives

🙋 Contribution Interest

  • I’d like to help implement this feature

✅ Additional Context

We appreciate dbt-colibri; it looks quite promising for our use case.
