Skip to content

Consider using varbinary in hash() instead of varchar #26

@infused-kim

Description

@infused-kim

Hi Guys,

First of all, thank you for creating this package. Your efforts are much appreciated.

I noticed the hash adapter converts the md5 bytes into a varchar:

{% macro sqlserver__hash(field) %}
    convert(varchar(50), hashbytes('md5', {{field}}), 2)
{% endmacro %}

If we leave the hash as varbinary, it uses only 16 bytes of data, but as varchar it uses 32. I suspect it's probably also slower to do joins on 32 character varchar than on a varbinary.

Would it be possible to take out that conversion?

I have been using varbinary in my own adapter for several months without problems. The only issue is that Power Bi doesn't support relationships on varbinary columns.

To solve that problem I created a separate macro that does the conversion when it's needed:

{% macro cast_hash_to_str(col) -%}
  convert(varchar(32), {{ col }}, 2)
{%- endmacro %}

And in my final model that is being consumed by PBI I just do something like...

final as (
    select
        {{ cast_hash_to_str('url_key') }} as UrlKey,
        ...
    from whatever
)

This way the hashes use less space in the DB, perform faster joins and when necessary, such as in Power Bi they are converted to strings.

Metadata

Metadata

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions