Fusion Diaries: readiness checklist, logging UX, autofix glow up (2025-11-20) #1038
dataders announced in Announcements · 1 comment
hey @dataders! i hope you don't mind but i've updated this discussion to add a link to the fusion upgrade guide for package maintainers: https://arc.net/l/quote/prvoafui
Long time no talk! Here’s what we’ve been up to since Coalesce last month in Las Vegas.
Coalesce on the road
This week we're in Stockholm, Paris, London, and Madrid. In December we'll be in Munich, Amsterdam, and Tokyo!
We’d love to say hi in person so sign up if your city is on the list!
If you can't make it, at least be sure to check out Elias's demo linked below.
TL;DR
Velocity
We shipped `preview.47` through `preview.72`. Per usual, check out dbt-fusion's CHANGELOG for specifics.
big rocks
- `logs/dbt.log` now!
- `dbt-autofix` glow up
- in progress: Python models
big themes
We continue marching towards general availability! There are two main themes to our work right now:
Conformance
The Fusion team uses "conformance" to mean: does a team's dbt project work with the dbt Fusion engine, with only automated changes via `dbt-autofix`?
When we began the dbt Fusion engine, our original conformance goal was on parse: can the Fusion engine parse all of a project's `.sql` and `.yml` files and create a `manifest.json`? Once we achieved a significant enough level of parse conformance across dbt projects, we graduated to compile conformance.

Since Coalesce, our focus has been on build conformance: does a project `build` with no errors while also producing the same state on the data warehouse? This is no easy task! The bugs we're discovering in our quest for build conformance tend to be tangled up in SQL understanding, differences between Rust and Python, and state deferral. This work has even surfaced several issues in dbt Core (e.g. dbt-core#12152).
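As a toy illustration of what "produces the same state on the data warehouse" means, imagine fingerprinting every relation a run created and diffing a dbt Core run against a Fusion run. The fingerprinting scheme below is an invented sketch for intuition, not the Fusion team's actual conformance harness.

```python
# Hypothetical sketch of a build-conformance diff: fingerprint every
# relation a run produced, then compare Core's state against Fusion's.
# The order-insensitive sorted-row hash is illustrative only.
import hashlib


def fingerprint(rows):
    """Order-insensitive hash of a relation's rows."""
    digest = hashlib.sha256()
    for row in sorted(repr(r) for r in rows):
        digest.update(row.encode())
    return digest.hexdigest()


def diff_states(core_state, fusion_state):
    """Return names of relations whose contents differ between the two runs."""
    names = set(core_state) | set(fusion_state)
    return sorted(
        n for n in names
        if fingerprint(core_state.get(n, [])) != fingerprint(fusion_state.get(n, []))
    )


# Toy warehouse states: "orders" matches (row order ignored), "customers" differs.
core = {"orders": [(1, "a"), (2, "b")], "customers": [(1,)]}
fusion = {"orders": [(2, "b"), (1, "a")], "customers": [(1,), (2,)]}
print(diff_states(core, fusion))  # → ['customers']
```

Real build conformance is far harder than this sketch suggests, since row contents can legitimately differ in type representation, ordering semantics, and null handling across engines.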
Roadmap
In the Path to GA blog from May, we spelled out five milestones that are still relevant today. The Milestones in the dbt-fusion repo largely still correspond to our original roadmap and are a great place to check in on our progress. Nearly all of our current feature work falls into the "Fast follows" and "Feature parity" buckets, such as: logging UX, model governance, semantic layer, state modified conformance, and Python models.
Big rocks: What shipped this week
Readiness checklist
Our amazing docs team shipped a great Fusion readiness checklist that spells out the below steps as the recommended path to getting Fusion working for you!
Please check it out and share with anyone you know who hopes to start using Fusion soon!
logging experience
We've asked for your feedback many times about the UX of logging, and y'all have delivered! The discussion created to collect feedback, dbt-fusion#584, has ~20 asks from the community. In the past month we've shipped 4 of them related to `logs/dbt.log`, touching how it interacts with `stdout` and `logs/query_log.sql`.

What's keeping us from further improvements is polishing the underlying platform. We're making strong progress on the OpenTelemetry-inspired tracing and telemetry system that replaces dbt Core's structured logging. This work not only brings much-needed improvements to log rendering and UX, but serves as a platform upon which we can continue to deliver improvements beyond what Core can even do.
Looking ahead, we plan to release open-source tooling that will let the community build integrations on top of this telemetry using strictly typed, well-defined Python APIs. More details coming soon.
In the meantime, if you'd like a taste of what the new logging provides, I'd encourage you to experiment with new flags like `--otel-file-name` or `--otel-parquet-file-name`. I'm personally very excited at the prospect of really drilling down into the performance of dbt invocations to discover the biggest bottlenecks.
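To make the "drill down into performance" idea concrete: once you've exported telemetry, finding bottlenecks is mostly aggregating span durations by name. The sketch below assumes a simple record shape (`name`, `duration_ms`); Fusion's actual exported schema may differ, and the data here is synthetic.

```python
# Hypothetical sketch: aggregate exported telemetry spans to find the
# slowest parts of a dbt invocation. The record shape ("name",
# "duration_ms") is an assumption, not Fusion's actual OTel schema.
from collections import defaultdict


def slowest_spans(spans, top_n=3):
    """Sum duration per span name and return the top_n slowest."""
    totals = defaultdict(float)
    for span in spans:
        totals[span["name"]] += span["duration_ms"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Toy data standing in for spans read out of an exported telemetry file:
spans = [
    {"name": "parse", "duration_ms": 120.0},
    {"name": "compile", "duration_ms": 340.0},
    {"name": "run model.orders", "duration_ms": 2100.0},
    {"name": "run model.customers", "duration_ms": 900.0},
    {"name": "compile", "duration_ms": 60.0},
]
print(slowest_spans(spans))  # → run model.orders first, at 2100.0 ms
```

The same aggregation works whether you read spans from the JSON or the Parquet export; only the loading step changes.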
dbt-autofix improvements!
`dbt-autofix` is already a great tool for finding (and fixing!) code that contains deprecated functionality, but the team has been hard at work making it even better for getting your project ready to run on Fusion. Fixes and new functionality we've added lately include:

- a `--include-private-packages` flag in `dbt-autofix deprecations` to identify and fix deprecations in private packages you've installed in your project. If any of the private packages have deprecations, this flag tells you if your project will run once those packages are fixed; and if you own those packages, you can fix them and make your project Fusion-compatible.
- a `dbt-autofix packages` command to identify which package dependencies in your project are Fusion-compatible and optionally upgrade incompatible versions to a Fusion-compatible version
- an `--all` option in `dbt-autofix deprecations` to fix invalid YAML, deprecation warnings, and more complex behavior changes all in one go
- fixes for `dbt_project.yml` tags (e.g. `+ tags` -> `+tags`)
To get started with `dbt-autofix`, run it in the Studio IDE or install the `dbt-autofix` Python package on your computer (see the README for more details). If you have problems you think we could autofix, please let us know by opening an issue on GitHub.
In the coming weeks, we plan to launch the full version of `dbt-autofix packages` to upgrade all your packages to Fusion-compatible versions, and to continue expanding the range of deprecation warnings we can fix for you.

a new Databricks Arrow ADBC Driver
In partnership with our colleagues at both Databricks and Columnar, we’ve contributed a new driver to Arrow ADBC!
This week we plan to swap over to the new driver, which unblocks Fusion on Databricks to support both Python models and targeting All-Purpose Compute Clusters, not just SQL warehouses.
The work that our Fusion adapters team is doing is phenomenal. This doesn't just benefit users of the dbt Fusion engine, but also the data ecosystem at large. We expect the broader data industry to begin adopting ADBC soon. Collaborating across the industry on this emerging standard lets data tooling focus more on what "makes their beer taste better" rather than reinventing the wheel. End users benefit as well, from tools that connect to data warehouses more performantly and consistently.
🚧 Work in progress
Python Models
Python models are the oldest open issue on the dbt-fusion repo (dbt-fusion#3)!
However, it's not so simple: shipping support for Python models in Fusion is a two-step process.
The first part is lower-hanging fruit. In fact, we've already shipped support for Python models on Snowflake behind a flag. To try them out, set the following environment variable: `DBT_ENABLE_BETA_PYTHON_MODELS=true`

The second step is more challenging! Currently, we can infer all the column datatypes when they're defined in SQL, but Python is of course a different language! It's still early, but more than likely it will require some annotation in YAML of what the output column types will be, so that that information can be used downstream.
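It's an open design question what those annotations would look like, but one plausible shape reuses the `columns` / `data_type` properties dbt already supports in model YAML. Treat this as a sketch of the idea, not a confirmed Fusion design:

```yaml
# models/schema.yml -- hypothetical sketch; Fusion's eventual annotation
# format for Python model output types may differ.
models:
  - name: my_python_model
    description: "Python model whose output schema is declared up front"
    columns:
      - name: customer_id
        data_type: integer
      - name: lifetime_value
        data_type: float
```

With declared types like these, downstream SQL models could be type-checked against a Python model's output without executing any Python.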
Likely what needs to come first is a path forward on our approach for storing table schema information. See Looking for Feedback below for more info!
packages and `require-dbt-version`

One of the biggest efforts in getting to general availability is ensuring that dbt's amazing package ecosystem works with the new engine! Many package maintainers have done heroic work to get their packages working with Fusion. We've created a package upgrade guide to help maintainers make their packages Fusion-compatible.
What we don’t have yet is a way for users to know for certain if their package will work with the dbt Fusion engine.
@Grace Goheen does a great job explaining our solution:
dbt-autofix autoupgrading your packages

The new `dbt-autofix packages` command (currently in beta) identifies which package dependencies in your project are Fusion-compatible and can optionally upgrade incompatible versions to a Fusion-compatible version. See the `dbt-autofix` improvements section above for the full rundown of recent fixes; in the coming weeks, we plan to launch the full version of `dbt-autofix packages` to upgrade all your packages to Fusion-compatible versions.

more flexible CSV parsing for seeds
In our conformance work, we've seen Fusion struggle to `seed` some `.csv`s. The reason is that Fusion's current CSV parser is much more strict than what dbt Core uses.

The real reason is that `.csv` is woefully underspecified! A great explanation of the common CSV headaches is in this blog from a decade ago: So You Want To Write Your Own CSV code?

While we could choose to force dbt users to make their CSVs prettier in order to use Fusion, this isn't very practical. So dbt-fusion#1004 describes the work underway to support CSVs that dbt Core could previously parse without issue.

Looking for feedback!
In the last Fusion diary, I mentioned how often we've heard users encountered dbt-fusion#615. While trying to figure out a way to address this, we found ourselves rather disappointed with the schema cache in general, and landed on a proposal to not only solve that paper cut but create a much better user experience. We'd love to hear from as many folks as possible on this proposal!
dbt-fusion#1042: source schemas should be first-class, versioned artifacts
👓 Stuff you should ~~read~~ watch

If there's one thing you should watch if you missed the keynotes, it's Elias's demo of the dbt Fusion engine! Seriously, stop what you're doing and check it out! This demo is such a validation of all the hard work put in by so many over the past year to build the future tooling of analytics engineers. A huge shoutout is in order, not just to the Fusion team but even more so to the community, with whom we collaborated hand in hand for the past year. This demo should validate all the love and sweat that's gone into the product.
🏁 Made it to the meme
This almost-10-year-old meme certainly makes me feel old! Still, the point stands. A lot of Fusion's magic is aggressive caching! There's no free lunch, though: it's on us to make this experience as smooth and intuitive as possible. To that end, here's a work-in-progress docs page that explains the various ways that dbt and the Fusion engine cache things (PR: docs.getdbt.com#8084).
But above all, check out dbt-fusion#1042: source schemas should be first-class, versioned artifacts