Skip to content

Added tsql_utils.surrogate_key() macro#32

Merged
dataders merged 4 commits into
dbt-msft:mainfrom
infused-kim:tsql_surrogate_key
Apr 29, 2021
Merged

Added tsql_utils.surrogate_key() macro#32
dataders merged 4 commits into
dbt-msft:mainfrom
infused-kim:tsql_surrogate_key

Conversation

@infused-kim
Copy link
Copy Markdown
Contributor

Here is a suggested attempt at addressing the two issues I have raised related to the default surrogate_key() implementation: #26 and #25

This creates a a separate tsql_utils.surrogate_key() macro in addition to the default dbt_utils.surrogate_key() macro that allows for more t-sql specific customization.

This macro adds three additional features:

1. Field column type

It allows you to customize the column type into which the fields are cast. One use case for this is if you have nvarchar columns that you want to generate a surrogate key for. Since dbt_utils.surrogate_key() by default casts them into varchar, it's possible to get duplicate surrogate keys.

With this macro you can use tsql_utils.surrogate_key(["col"], col_type="nvarchar(4000)) to solve the problem

2. Generate binary hash

The current tsql_utils adapters cause dbt_utils.surrogate_key() to generate a hash that is stored as a varchar
string that uses 32 bytes of data.

This macro allows you to use tsql_utils.surrogate_key(["col"], use_binary_hash=True).

This will keep the key as varbinary that only uses 16 bytes of data. This will reduce space in the database and can potentially increase join performance, but the column has to be converted into varchar before it can be used in Power BI for relationships.

To help with that issue, this PR provides a second macro cast_hash_to_str() that allows you to convert the varbinary surrogate keys to varchar inside your report views before importing them into Power BI to allow relationships on your surrogate key columns.

3. Adjust default values through dbt_project vars:

You can also customize both settings through variables in your dbt_project.yml:

vars:
  dbt_utils_dispatch_list: ['tsql_utils']
  tsql_utils_surrogate_key_col_type: 'nvarchar(1234)'
  tsql_utils_surrogate_key_use_binary_hash: True

Let me know what you think of this and if you would like me to make any adjustments.

Copy link
Copy Markdown
Contributor

@dataders dataders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! @alittlesliceoftom what do you think?

@dataders
Copy link
Copy Markdown
Contributor

@infused-kim can you add a 0.6.6 section to the CHANGELOG.md?

@dataders dataders merged commit b916816 into dbt-msft:main Apr 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants