add precision + exclude_columns option to equality test by rlh1994 · Pull Request #765 · dbt-labs/dbt-utils

rlh1994 · 2023-02-06T15:50:17Z

resolves #757, resolves #734, resolves #785, replaces #737, resolves #828

This is a:

documentation update
bug fix with no breaking changes
new functionality
a breaking change

All pull requests from community contributors should target the main branch (default).

Description & motivation

This adds a precision argument to the equality test so that floating point columns (those where the column is_numeric or is_float) can be compared more easily, reducing false errors. It is fully backwards compatible (the default arguments retain the existing behaviour and code) and has been tested on, and supports, BQ/Postgres/Databricks/Snowflake/Redshift.
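To illustrate the idea only, here is a minimal Python sketch of comparing rows after rounding float values to a fixed number of decimal places. It is not the SQL/Jinja in equality.sql; the function name `rows_equal` and the sample rows are hypothetical.

```python
# Hypothetical illustration: compare two rows, optionally rounding floats first.
# This is not the dbt-utils macro, only a sketch of the underlying idea.
def rows_equal(row_a, row_b, precision=None):
    """Return True if both rows have the same keys and (rounded) values."""
    if row_a.keys() != row_b.keys():
        return False
    for key, a in row_a.items():
        b = row_b[key]
        # Only floats are rounded; other types are compared exactly.
        if precision is not None and isinstance(a, float) and isinstance(b, float):
            a, b = round(a, precision), round(b, precision)
        if a != b:
            return False
    return True


left = {"id": 1, "amount": 0.1 + 0.2}  # stored as 0.30000000000000004
right = {"id": 1, "amount": 0.3}

exact_match = rows_equal(left, right)                 # False: floats differ
rounded_match = rows_equal(left, right, precision=4)  # True: equal after rounding
print(exact_match, rounded_match)
```

Leaving `precision` as `None` keeps the exact comparison, which mirrors the backwards-compatible default described above.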

It also includes a few new tests, with both positive and negative assertions.

It also includes the work done in #737 to save rebasing if one of these was merged independently.

Finally, it adds a fix for #785 that ensures both tables have the same columns when no compare/ignore columns list is provided.
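As a rough sketch of the column check, assuming names are compared case-insensitively and excluded columns are dropped first (the commit list mentions both), the logic might look like this in Python. The function name `column_differences` and the sample column names are hypothetical, and the real check lives in Jinja.

```python
# Hypothetical illustration of a "same columns" check between two relations.
def column_differences(cols_a, cols_b, exclude_columns=()):
    """Return (only_in_a, only_in_b), ignoring case and excluded columns."""
    excluded = {c.lower() for c in exclude_columns}
    a = {c.lower() for c in cols_a} - excluded
    b = {c.lower() for c in cols_b} - excluded
    return sorted(a - b), sorted(b - a)


# The second table has an extra column, so the two tables do not match.
only_in_a, only_in_b = column_differences(
    ["ID", "amount"], ["id", "AMOUNT", "loaded_at"]
)
print(only_in_a, only_in_b)
```

If either returned list is non-empty, the two tables do not share the same columns and the check should not pass.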

Checklist

adammarples · 2023-04-05T13:51:45Z

@rlh1994 this PR would be very useful to me. Any idea why the tests are failing? It looks like errors in seeding on redshift but it's not clear to me why.

rlh1994 · 2023-04-05T15:18:04Z

@adammarples I don't think it's related to this change, as it seems to be failing across multiple PRs. I was just waiting for @joellabes to review.

joellabes

You and @brunocostalopes have done some great stuff here @rlh1994!

I particularly love the robust set of tests.

I will want to come back and do a deeper read of the specific jinja in the equality.sql file itself, but it makes sense to do that in one hit alongside any changes necessary with #785, as I imagine that'll lead to some changes too.

The good news is that the tests are now up and running again on Redshift, so we should be able to move much faster and with more confidence.

rlh1994 · 2023-05-02T09:34:36Z

@joellabes I've added it in, but a few notes on this final change:

1. I've made it a separate block, which means that currently it hits the warehouse twice for the model to get the columns. I tried to do this as part of the normal get_columns... part, but it seems like set() alters the object it is acting on (at least the way I was using it for the generator that's returned), which caused a load of new failures when it shouldn't have. I'm not aware if there is the equivalent of a deep_copy in dbt/jinja so I've done it this way just to be sure.
2. It feels like this should actually raise a compiler error rather than fail the test itself, but I wasn't sure how to add tests for such a case. Let me know your preference on this and I can swap it out to raise an error instead.
3. I think this would now qualify as a breaking change, as previously passing tests would now fail. A workaround would be to add a new argument to the macro to enable/disable this check and set the default (until a breaking change version is released) to be false.
4. I had to make a change to one of the other tests as this highlighted that the tables had different columns (which means it works!)
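One possible explanation for the set() behaviour in the first note, offered as an assumption rather than a confirmed diagnosis of the macro: in plain Python, passing a generator to set() consumes it, so nothing is left for later code that iterates the same object. The generator `get_columns` below is a hypothetical stand-in.

```python
# Hypothetical stand-in for a call that returns a generator of column names.
def get_columns():
    yield from ["id", "amount", "updated_at"]


columns = get_columns()
as_set = set(columns)      # iterating the generator exhausts it
remaining = list(columns)  # nothing is left to iterate

# Materialising the generator into a list first keeps the data reusable.
columns_list = list(get_columns())
as_set_again = set(columns_list)
still_there = list(columns_list)

print(remaining, still_there)
```

Copying into a list before building the set avoids the second lookup in plain Python; whether the same approach works inside dbt's Jinja context is not confirmed here.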

nachimehta · 2023-05-10T05:50:21Z

Also looking forward to this. At the moment the equality test silently passes if your comparison model has more columns than the tested model, which seems like unexpected behavior. I believe your fix rectifies this issue.

joellabes · 2023-05-16T05:55:57Z

> I'm not aware if there is the equivalent of a deep_copy in dbt/jinja so I've done it this way just to be sure.

Me either, I think this is OK

> It feels like this should actually raise a compiler error rather than fail the test itself, but I wasn't sure how to add tests for such a case, let me know your preference on this and I can swap it out to raise an error instead.

Yeah it should raise a compiler error. I think that means we can't have a test running for it 😬 but better to do the right thing on the end user's side.

> I think this would now qualify as a breaking change, as previously passing tests would now fail. A workaround would be to add a new argument to the macro to enable/disable this check and set the default (until a breaking change version is released) to be false.

They’ll fail because the data is invalid though, right? So I am OK with that - it's a bug that we're fixing as opposed to deviating from documented behaviour

> I had to make a change to one of the other tests as this highlighted that the tables had different columns (which means it works!)

Ha! Great news 🎉

Will go over this properly soon

rlh1994 · 2023-07-04T09:10:16Z

Hey @joellabes just wondering if you know when you'll have a chance to look at this please?

macros/generic_tests/equality.sql

rlh1994 · 2023-10-24T18:17:10Z

Redshift failing due to a deadlock, not a failing test

brunocostalopes · 2023-10-25T08:30:07Z

@rlh1994 I noticed you've updated the equality test and renamed the arguments from ignore to exclude, just a note that when discussing with @joellabes he was of the opinion that we should name them "ignore" rather than "exclude". See here: #734 (comment)

rlh1994 · 2023-10-25T08:38:10Z

> @rlh1994 I noticed you've updated the equality test and renamed the arguments from ignore to exclude, just a note that when discussing with @joellabes he was of the opinion that we should name them "ignore" rather than "exclude". See here: #734 (comment)

Ah okay, I went with exclude based on this comment here: #829 (comment)

To be honest I have no strong feelings either way and can easily change it once this gets reviewed 😅

brunocostalopes · 2023-10-25T09:55:55Z

Oh, I missed that PR. So we now have the same feature implemented in #737 (my original PR), #765 (which includes those changes), and #829 (new PR). I wonder if any of these will ever get merged :(

seub · 2023-10-25T13:18:14Z

@brunocostalopes Sorry I had missed your PR as well! It appears that there's a real need for these ignore/exclude columns! ;)

It would indeed help avoid a stack of redundant PRs if some of them finally got reviewed/approved @joellabes

gwenwindflower

Looks good! Left one comment for consideration but otherwise this seems good to me and everything is passing now in CI. Thanks again for your patience on this!

macros/generic_tests/equality.sql

gwenwindflower · 2024-03-05T23:56:53Z

Love where you landed with the new comments @rlh1994 ! Thanks again for the patience, this is a fantastic addition to the package, really appreciate the work on this one. 💜

jomccr · 2024-03-07T16:07:52Z

This has already helped me a ton, thanks for this contribution!

rlh1994 changed the title ~~add precision option to equality test~~ add precision + exclude_columns option to equality test Feb 7, 2023

rlh1994 mentioned this pull request Apr 25, 2023

equality test passes when first table has a subset of second tables columns #785

Closed

5 tasks

joellabes reviewed May 2, 2023

rlh1994 mentioned this pull request Sep 25, 2023

Equality exclude columns #829

Closed

17 tasks

bruno and others added 13 commits September 25, 2023 15:58

add exclude columns to equality test

aefaa9d

add precision option to equality test

d5853e3

CI fix?

24105b4

CI fix 2.0

12e3750

Update CHANGELOG.md

aa25bd2

Check for subset of columns (Close #785)

c8a3956

cast type

b722128

cast type across warehouses

f786fda

swap to copiler error, account for ignore columns

405d8ec

Update CL

a036436

allow for different cased names

d4caa52

fix CL

4ef3c06

linting

a481ad1

seub reviewed Oct 24, 2023

macros/generic_tests/equality.sql (outdated, resolved)

rlh1994 added 2 commits October 24, 2023 19:07

Rename to exclude_columns

e22fc2c

Fix typo

311d37e

joellabes and others added 3 commits March 1, 2024 09:06

Merge branch 'main' into add-precision-to-equality-test

76ac835

Add package-lock.yaml to .gitignore

ae95585

Merge branch 'main' into add-precision-to-equality-test

908f410

gwenwindflower approved these changes Mar 1, 2024

macros/generic_tests/equality.sql (outdated, resolved)

Update comments

cd7bd1c

gwenwindflower approved these changes Mar 5, 2024

gwenwindflower added this pull request to the merge queue Mar 5, 2024

Merged via the queue into dbt-labs:main with commit 23da1f4 Mar 5, 2024

rlh1994 deleted the add-precision-to-equality-test branch March 7, 2024 16:24

seub mentioned this pull request Apr 22, 2024

Add optional exclude_columns to the equality test #828

Closed

b-per mentioned this pull request Jun 26, 2024

Change of behavior in 1.2 makes passing tests with 1.1 fail with 1.2 #923

Closed


8 participants
