-
Notifications
You must be signed in to change notification settings - Fork 270
Description
Is this your first time submitting a feature request?
- I have read the expectations for open source contributors
- I have searched the existing issues, and I could not find an existing issue for this feature
- I am requesting a straightforward extension of existing dbt-snowflake functionality, rather than a Big Idea better suited to a discussion
Describe the feature
Currently you cannot specify column masking policies in Snowflake CTAS statements with dbt-snowflake. For example, CREATE TABLE <table_name>(<col_name> <col_type> WITH MASKING POLICY <policy_name>) AS SELECT <query>.
As a workaround masking policies can be applied to columns in a dbt post hook using an ALTER TABLE statement. The issue with doing this is that the CTAS and ALTER TABLE statements cannot be issued in the same transaction, as per the Snowflake documentation - "Each DDL statement executes as a separate transaction". As a result there is there is a small window of time between the CTAS and ALTER TABLE <table_name> MODIFY COLUMN <column_name> SET MASKING POLICY <policy_name> statements where the data is not masked, and if the ALTER TABLE statement fails it would remain that way.
Supporting masking policy specification in the CTAS statement would fix this. As per the Snowflake documentation "Executing a CREATE TABLE … AS SELECT (CTAS) statement applies any masking policies on columns included in the statement before the data is populated in the new table".
It also may not be difficult to support this given the recent work on model contracts which provides the CREATE TABLE <table_name>(<col_name> <col_type>) AS SELECT <query> syntax. All that would need to be added is the WITH MASKING POLICY <policy_name> part of the statement.
One way to provide the config would be something like:
columns:
- name: id
description: "Primary key for this model"
data_type: integer
masking_policy: <policy_name>
contraints: <constraint_list>
...
The masking policy could then be applied in get_columns_spec_ddl.
Describe alternatives you've considered
Using an ALTER TABLE statement in a post hook to apply the masking policy. As described above due to Snowflake DDL statements always being executed in separate transactions this leaves the possibility of unmasked data.
Who will this benefit?
Anyone that wants to take advantage of dynamic data masking in Snowflake using dbt.
Are you interested in contributing this feature?
Yes
Anything else?
No response