Skip to content

Conversation

@stikkireddy
Copy link
Contributor

added jacoco plugin into sbt
added ruletype enum to make it easier to manage rule types for testing
added entry to contributing.md on how to submit pull requests
added information to readme on how to run tests
altered gitignore to include plugins.sbt which is for jacoco test reports
altered build.sbt to include jacoco test reporting & added scalatest
added 6 tests regarding validation.

added ruletype enum to make it easier to manage rule types for testing
added entry to contributing.md on how to submit pull requests
added information to readme on how to run tests
altered gitignore to include plugins.sbt which is for jacoco test reports
altered build.sbt to include jacoco test reporting & added scalatest
added 6 tests regarding validation.
@stikkireddy stikkireddy marked this pull request as ready for review April 14, 2020 15:07
@stikkireddy
Copy link
Contributor Author

@GeekSheikh let me know what you think 😄

@stikkireddy stikkireddy changed the title WIP: Adding unit tests and test reporting to the dataframe-rules-engine Adding unit tests and test reporting to the dataframe-rules-engine Apr 14, 2020
@GeekSheikh
Copy link
Contributor

Will review ASAP. Thank you!

@GeekSheikh GeekSheikh self-assigned this Apr 23, 2020
@GeekSheikh GeekSheikh added the enhancement New feature or request label Apr 23, 2020
@GeekSheikh
Copy link
Contributor

Hey, Sri,

This is great, thanks for building up the scaffolding and throwing in a few tests. If you get some time, could you review this PR? It fixed a bug with the output report, if you get time, please add some additional tests to validate the output df.

Also, is the coverage sufficient for all the tests without explicitly doing the test with a group by and without a group by? I saw the validations on the group by cols, but not 100% sure that covers all cases.

We also need to add an agg column in each test to validate the agg expressions vs simple column expressions. We also need to add a test for a complex column to each test case such as col("retail_price" - col("scan_price") to ensure we're validating expressions properly.

I believe these would be simple, quick additions, do you think you will have time to add them soon? Appreciate the contribution.

Summary

  • Output Report Validations
  • Add group by tests where/if needed
  • Add aggregate columns to each test with Rules
  • Add complex expression to each test with Rules

Copy link
Contributor

@GeekSheikh GeekSheikh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work. Made a few changes and added some docs. Thanks so much for getting the test suite started.

}

test("The input rule should have 3 invalid count for MinMax_Scan_Price_Minus_Retail_Price_min for failing complex type.") {
val expectedDF = Seq(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stikkireddy -- Changing this, this is not correct. Since each row of Retail_Price - Scan Price evaluates to -1, both the Min and max rules should fail as the min and the max both are -1 which violates the lower bound rule of 0.0. Additionally, aggregate InvalidCounts always evaluate to 0 or 1 since the min/max results in a single value for an entire column thus the sum of a single value (1) cannot > 1.

Aggregate columns return an invalid count of 1 or 0
Non-aggregate expression return the sum on invalids

@GeekSheikh GeekSheikh merged commit f7ebf98 into databrickslabs:master May 8, 2020
@GeekSheikh GeekSheikh linked an issue Aug 15, 2021 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Create a base set of tests

2 participants