Skip to content

[ADAP-383] [Feature] Support Dynamic Data Masking in CTAS Statements #85

@jdoldis

Description

@jdoldis

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-snowflake functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Currently you cannot specify column masking policies in Snowflake CTAS statements with dbt-snowflake. For example, CREATE TABLE <table_name>(<col_name> <col_type> WITH MASKING POLICY <policy_name>) AS SELECT <query>.

As a workaround masking policies can be applied to columns in a dbt post hook using an ALTER TABLE statement. The issue with doing this is that the CTAS and ALTER TABLE statements cannot be issued in the same transaction, as per the Snowflake documentation - "Each DDL statement executes as a separate transaction". As a result there is there is a small window of time between the CTAS and ALTER TABLE <table_name> MODIFY COLUMN <column_name> SET MASKING POLICY <policy_name> statements where the data is not masked, and if the ALTER TABLE statement fails it would remain that way.

Supporting masking policy specification in the CTAS statement would fix this. As per the Snowflake documentation "Executing a CREATE TABLE … AS SELECT (CTAS) statement applies any masking policies on columns included in the statement before the data is populated in the new table".

It also may not be difficult to support this given the recent work on model contracts which provides the CREATE TABLE <table_name>(<col_name> <col_type>) AS SELECT <query> syntax. All that would need to be added is the WITH MASKING POLICY <policy_name> part of the statement.

One way to provide the config would be something like:

columns:
      - name: id
        description: "Primary key for this model"
        data_type: integer
        masking_policy: <policy_name>
        contraints: <constraint_list>
        ...

The masking policy could then be applied in get_columns_spec_ddl.

Describe alternatives you've considered

Using an ALTER TABLE statement in a post hook to apply the masking policy. As described above due to Snowflake DDL statements always being executed in separate transactions this leaves the possibility of unmasked data.

Who will this benefit?

Anyone that wants to take advantage of dynamic data masking in Snowflake using dbt.

Are you interested in contributing this feature?

Yes

Anything else?

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions