Creating a dbt Glossary with Term blocks #12500
jenna-jordan started this conversation in Ideas
I'd like to propose a new concept for dbt Core, which will require the creation of a new Jinja macro and accompanying functionality. If this is better suited as an issue, I can submit an issue instead!
The simple idea: term blocks. The big idea: a dbt-style glossary that turns into a concept map (and even a taxonomy), relating the physical dbt models to the overall conceptual model.
I first proposed this idea in this LinkedIn post, and I also brought it up during the dbt Catalog product roundtable at Coalesce 2025, where other participants seemed to agree that the idea would be useful.
While dbt doc blocks are used primarily as a convenience for long, formatted, and/or DRY descriptions, they have the potential to be more. I am proposing "term" blocks so as to not interfere with existing doc block functionality, but it may be preferable to instead re-think doc blocks and expand their functionality - I leave this up for debate.
Doc blocks are primarily used to document column descriptions, but can also be used to document model descriptions, exposure descriptions, etc. Regardless, the idea is that doc blocks are for documenting existing dbt artifacts.
Term blocks would work differently - they would exist independent of dbt artifacts, but could be called (ref'd) in those descriptions/doc blocks. A term block should be able to ref other term blocks, as well. The primary purpose of a term block is to document a concept, or term, such as "customer". While there may be, for example, documented models for dim_customer or documented column descriptions for customer_id, there is currently no way to communicate that these are about the underlying concept "customer" beyond naming conventions. Defining this concept, then ref'ing this concept in the model and column descriptions, provides a way to link the physical table/attribute back to the conceptual model.
Put together, these term blocks would turn into a business glossary. The business glossary is a vital component of enterprise-level data catalogs and data governance functions. By adding these terms to dbt models/columns/exposures, you can produce a map of how these concepts are represented in the data, which could then be visualized in a graph, similar to the lineage DAG.
I see two possible ways for these terms to be represented in a dbt project: (a) one following the doc block pattern, and (b) one following the semantic model pattern.
option A:
option B:
Option A should be familiar and relatively simple to implement, but is limited in future potential. Option B is more complex and less familiar (especially for those who do not use the semantic layer), but offers a lot more potential for expanding terms into a full taxonomy. In option A, you can only assume that some kind of relationship exists between the terms customer and order, while in option B you can fully articulate what that relationship is. Perhaps both options could even be supported, depending on which syntax and set of capabilities the project needs.
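To make the "concept map" idea a bit more concrete, here is a minimal Python sketch of how term refs might be collected into a graph under an option A style syntax. Everything in it is an assumption for illustration: the `{% term %}` / `{% endterm %}` block, the `{{ term('...') }}` ref, and the artifact names are hypothetical and do not exist in dbt today. Note that the resulting edges are untyped: the sketch can tell that customer mentions order, but not how the two relate.

```python
import re

# Hypothetical syntax modeled on dbt doc blocks; NOT an existing dbt feature.
TERM_BLOCK = re.compile(r"\{%\s*term\s+(\w+)\s*%\}(.*?)\{%\s*endterm\s*%\}", re.DOTALL)
TERM_REF = re.compile(r"\{\{\s*term\(\s*['\"](\w+)['\"]\s*\)\s*\}\}")


def parse_terms(text):
    """Return {term_name: definition} for every term block found in the text."""
    return {name: body.strip() for name, body in TERM_BLOCK.findall(text)}


def term_refs(text):
    """Return the term names referenced in a description, in order, deduplicated."""
    return list(dict.fromkeys(TERM_REF.findall(text)))


def build_concept_map(terms, descriptions):
    """For each term, list the other terms and the artifacts that reference it."""
    concept_map = {name: {"terms": [], "artifacts": []} for name in terms}

    def lookup(ref, source):
        if ref not in concept_map:
            raise ValueError(f"{source} references undefined term '{ref}'")
        return concept_map[ref]

    # Term-to-term edges (a term definition that refs another term).
    for name, body in terms.items():
        for ref in term_refs(body):
            lookup(ref, name)["terms"].append(name)
    # Artifact-to-term edges (model/column descriptions that ref a term).
    for artifact, description in descriptions.items():
        for ref in term_refs(description):
            lookup(ref, artifact)["artifacts"].append(artifact)
    return concept_map


# Illustrative glossary file and descriptions (all names are made up).
GLOSSARY = """
{% term customer %}
A person or organization that has placed at least one {{ term('order') }}.
{% endterm %}

{% term order %}
A confirmed request to purchase one or more products.
{% endterm %}
"""

DESCRIPTIONS = {
    "model.dim_customer": "One row per {{ term('customer') }}.",
    "model.fct_orders": "One row per {{ term('order') }}, keyed to a {{ term('customer') }}.",
    "column.dim_customer.customer_id": "Surrogate key for a {{ term('customer') }}.",
}

concept_map = build_concept_map(parse_terms(GLOSSARY), DESCRIPTIONS)
```

Here `concept_map["customer"]["artifacts"]` lists every model and column whose description refs the customer term, which is the data a glossary page or a graph view alongside the lineage DAG would need. A ref to an undefined term raises an error, similar in spirit to how a missing doc block fails today.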
I'd recommend looking at SKOS (Simple Knowledge Organization System) as the foundation for implementing option B, and will also point to Jessica Talisman's guide on taxonomies.
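As a rough sketch of what a SKOS-inspired foundation for option B might look like, the Python below models concepts with typed relations. The relation names (broader, narrower, related) and the label/definition fields are borrowed from SKOS; the `Concept` and `Glossary` classes, their methods, and the example terms are hypothetical and only meant to show the shape of the data, not a proposed dbt implementation.

```python
from dataclasses import dataclass, field

# SKOS semantics: broader and narrower are inverses of each other; related is symmetric.
INVERSE = {"broader": "narrower", "narrower": "broader", "related": "related"}


@dataclass
class Concept:
    pref_label: str                      # cf. skos:prefLabel
    definition: str = ""                 # cf. skos:definition
    alt_labels: list = field(default_factory=list)  # cf. skos:altLabel
    relations: dict = field(default_factory=lambda: {name: set() for name in INVERSE})


class Glossary:
    """Hypothetical in-memory glossary with typed, SKOS-style relations."""

    def __init__(self):
        self.concepts = {}

    def add(self, key, pref_label, definition="", alt_labels=()):
        self.concepts[key] = Concept(pref_label, definition, list(alt_labels))

    def relate(self, source, relation, target):
        """Record a typed relation and its inverse in one step."""
        if relation not in INVERSE:
            raise ValueError(f"unknown relation '{relation}'")
        self.concepts[source].relations[relation].add(target)
        self.concepts[target].relations[INVERSE[relation]].add(source)

    def ancestors(self, key):
        """Follow 'broader' links transitively (cf. skos:broaderTransitive)."""
        found = []
        queue = sorted(self.concepts[key].relations["broader"])
        while queue:
            current = queue.pop(0)
            if current in found:
                continue  # guards against cycles and repeated paths
            found.append(current)
            queue.extend(sorted(self.concepts[current].relations["broader"]))
        return found


# Illustrative terms (made up for this example).
glossary = Glossary()
glossary.add("party", "Party", "Any person or organization we do business with.")
glossary.add("customer", "Customer", "A party that has placed an order.", ["client"])
glossary.add("enterprise_customer", "Enterprise customer")
glossary.add("order", "Order", "A confirmed request to purchase products.")

glossary.relate("customer", "broader", "party")
glossary.relate("enterprise_customer", "broader", "customer")
glossary.relate("customer", "related", "order")
```

With this structure, `glossary.ancestors("enterprise_customer")` walks up the hierarchy through customer to party, which is the kind of taxonomy that untyped refs cannot express. SKOS itself only distinguishes hierarchical and associative links, so naming a relationship more precisely (for example, that a customer "places" an order) would need an extension beyond this sketch.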
I think that supporting this type of feature will be very important to dbt projects in the future, especially given the importance of ontologies/taxonomies to improving the results of LLMs querying data. I'll refer to Juan Sequeda's body of work here, which has inspired dbt Labs in the past. Furthermore, I think that dbt projects are uniquely suited to introducing data teams to this type of knowledge engineering work, and supporting terms in dbt Core and dbt Catalog can go a long way toward making dbt Catalog a fully capable data catalog solution.