@lekv
Contributor

@lekv lekv commented May 5, 2017

PR #46 introduced ColumnOrder with the limitation that a reader should
ignore stats for a column if the corresponding ColumnOrder in
FileMetaData contains an unknown value.

This change adds a special column order 'InvalidOrder' that can be used
in tests and should not be used otherwise.

This change also fixes the paths in Makefile.

@rdblue
Contributor

rdblue commented May 5, 2017

+1

@julienledem
Member

Hi @lekv,
I'd suggest creating a separate Thrift IDL in the test folder for this purpose.
That way we can test with something that is actually not defined in the reference IDL.
Here is an example of such a test:
apache/parquet-java#405
You can write with one IDL and read with another and just have a few differences.

@julienledem
Member

I'd prefer we don't change the reference metadata so I'd say -1 on this approach.

@lekv
Contributor Author

lekv commented May 10, 2017

Here are the different options for testing the ColumnOrder logic that we discussed in today's sync. Please let me know if I forgot any.

  1. Create a copy of parquet.thrift (or the relevant parts of it) and add a testing ColumnOrder there. This copied IDL could be used to generate test data, which can then be used to validate the implementation using the parquet.thrift IDL.
  2. Create a test file once by editing the parquet.thrift file in my local repo, then undoing the edit. The InvalidOrder field should have a large ID so it doesn't collide with future IDs.
  3. Keep the proposal in the PR. It would serve as a reminder to any implementor that the logic to ignore unsupported ColumnOrders should be tested, and as a hint on how to do so.

I'm leaning towards option 3, but option 2 would also work for our purpose.

@julienledem
Member

I think we are trying to make a unit test for something that is being caught by the compiler.
Having this extra value will not force implementors to add a default case to their switch statement.
The code that handles this looks something like this:

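boolean isSortUnderstood() {
  switch (columnOrder.getTypeSet()) {
    case TypeDefinedOrder:
      // The only order this reader knows how to interpret.
      return true;
    default:
      // Any other order, including ones added to the IDL later, is not understood.
      return false;
  }
}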

We don't really need to check this with an enum value that exists but that we don't understand, since it is the same case as an enum value that doesn't exist in the IDL.
To test the code depending on that method, you can just abstract it in an interface and test the code with new IsSupportedOrder() { boolean isSortUnderstood() { return false; } }.
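
A minimal sketch of that interface-based approach might look like the following. Only IsSupportedOrder and isSortUnderstood() come from the comment above; ColumnStats, StatsReader and usableStats are hypothetical names made up for illustration and are not part of parquet-format or parquet-java.

```java
import java.util.Optional;

// Abstraction over "does this reader understand the column order?".
// The name and method are taken from the comment above.
interface IsSupportedOrder {
    boolean isSortUnderstood();
}

// Hypothetical stand-in for the min/max statistics of a column.
final class ColumnStats {
    final long min;
    final long max;

    ColumnStats(long min, long max) {
        this.min = min;
        this.max = max;
    }
}

// Hypothetical reader-side code that depends on the abstraction
// instead of switching on the Thrift-generated type directly.
final class StatsReader {
    private final IsSupportedOrder order;

    StatsReader(IsSupportedOrder order) {
        this.order = order;
    }

    // Returns the stats only when the column order is understood;
    // otherwise the stats are ignored, as PR #46 requires.
    Optional<ColumnStats> usableStats(ColumnStats raw) {
        return order.isSortUnderstood() ? Optional.of(raw) : Optional.empty();
    }
}
```

A test can then pass a stub such as `new StatsReader(() -> false)` and check that `usableStats` returns an empty Optional, without needing an extra value in the reference IDL.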

@lekv
Contributor Author

lekv commented Jun 15, 2017

I agree with Julien that implementing this using a switch statement should make it more obvious that an implementation is correct. Since there was no consensus on whether we should add an extra test-only value, I suggest closing this PR.

@lekv lekv closed this Sep 13, 2017