Skip to content

Conversation

@Vitor-Avila
Copy link
Contributor

This PR adds support for syncing calculated columns from the dbt model. These should be defined under $model.meta.superset.calculated_columns (list).

The behavior is:

Default sync (reload_columns set to True and merge_metadata set to False)

In this configuration, metrics from dbt are synced to Superset, and metrics that were only present in the dataset (and not in the model) are deleted. This is the same behavior for calculated columns:

  • If there isn't any calculated columns defined in dbt, then existing calculated columns from the dataset are preserved. This is to avoid breaking changes (as currently calculated columns aren't deleted in this flow).
  • If there's any calculated column defined in dbt, then these are synced and Superset-only calculated columns are removed.

Merge metadata (merge_metadata set to True and reload_columns set to False)

  • Superset-only calculated columns are preserved.
  • Metadata defined in dbt is synced.

Preserve Metadata (both merge_metadata and reload_columns set to False)

  • Superset is the source of truth.
  • new dbt calculated columns are synced.

@Vitor-Avila Vitor-Avila force-pushed the feat/dbt-sync-calc-columns branch from 09e2a0e to e0f7942 Compare January 7, 2025 14:00
@Vitor-Avila Vitor-Avila merged commit 856b2d4 into main Jan 7, 2025
5 checks passed
@Vitor-Avila Vitor-Avila deleted the feat/dbt-sync-calc-columns branch January 7, 2025 14:21
@adamcunnington-mlg
Copy link

@Vitor-Avila please can you update description of this MR to show an example valid YAML specification of a computed column. I can see from the code that the CLI is just passing it through - but it's going through several functions so difficult for me to see the exact right spec. I can also see that the docs have not been updated... https://docs.preset.io/docs/dbt-sync

I'm assuming an example is like this:

models:
- name: foo
  meta:
    superset:
      calculated_columns:
      - name: test
        filterable: true
        groupby: true
        expression: CONCAT("foo", "bar")

?

@Vitor-Avila
Copy link
Contributor Author

Hey @adamcunnington-mlg,

You're correct that all keys are just passed through during the sync, so this is not really a CLI configuration, but it actually has to be in accordance to the Superset API schema for a column:
image

Note that most of these are non-required, and when using --merge-metadata we'll use the column_name to decide if it's a new/existing column. For example, id and uuid would be generated if you're creating a new column. In the next sync, the column already exists so we'll find it by its column_name, then re-use the id and uuid in the update event.

@adamcunnington-mlg
Copy link

@Vitor-Avila thanks - can you confirm the following is a valid config - e.g. all property names and structure are valid?

models:
- name: foo
  meta:
    superset:
      calculated_columns:
      - column_name: test
        filterable: true
        groupby: true
        expression: CONCAT("foo", "bar")
        verbose_name: Test

@Vitor-Avila
Copy link
Contributor Author

@adamcunnington-mlg yup that looks correct. Let me know if you face any errors

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants