Fix at_least_one test when group_by_columns is configured#922
Fix at_least_one test when group_by_columns is configured#922dbeatty10 merged 8 commits intodbt-labs:mainfrom katieclaiborne-duet:kc-revert-at-least-one-pruning
at_least_one test when group_by_columns is configured#922Conversation
|
@dbeatty10, would you mind to review when you have a chance? |
dbeatty10
left a comment
There was a problem hiding this comment.
A crucial step before merging this would be the addition of test(s) as described in #922 (comment)
at_least_one test
dbeatty10
left a comment
There was a problem hiding this comment.
Thanks for raising this PR @katieclaiborne-duet 🤩
I used #934 to quickly check that the new test you added correctly surfaces the error case discovered in #838. Looks good ✅
It would be nice to keep the optimization and also fix the bug 🤞
When group_by_columns is an empty list, is there a way to keep the optimizations in #776 but avoid the bug reported in #838?
dbeatty10
left a comment
There was a problem hiding this comment.
This looks awesome @katieclaiborne-duet 🤩
Thank you for creating a relevant test for this bug and implementing a targeted fix 🧠
at_least_one testat_least_one test when group_by_columns is configured
|
Thank you so much for the fix, @katieclaiborne-duet! 🙌 |
resolves #838
Problem
The pruning logic introduced in #776 disrupted the grouping behavior in the
at_least_onetest.Specifically, the initial
pruned_rowsCTE includes alimit 1statement, which selects a single record where the target column is not null.As a result, the test passes inappropriately when the
group_by_columnsparameter is configured, as long as the target column has at least one not-null value.Essentially, the
group_by_columnsparameter currently has no effect on the test. It evaluates target column values as a single group, regardless of whether or not the parameter is present.Solution
Restore the
at_least_onetest to its prior state.Since the goal of #776 was to improve the performance of the test, I also considered a set-logic approach to evaluating a column at the group level:
In BigQuery, however, this version of the test took twice as long as the simple reverted version.
Checklist