
Conversation


@xiarixiaoyao xiarixiaoyao commented Jul 23, 2021


What is the purpose of the pull request

PR for RFC-28: support the Z-order curve. Z-order is a technique that maps multidimensional data onto a single dimension. We can use this feature to improve query performance.

Query performance test:
We used the Databricks test case:
https://databricks.com/blog/2018/07/31/processing-petabytes-of-data-in-seconds-with-databricks-delta.html?spm=a2c4g.11186623.2.11.2a8f1693QNwUSH&_ga=2.29295480.552083878.1584501563-968665100.1584501563
https://help.aliyun.com/document_detail/168137.html?spm=a2c4g.11186623.6.563.10d758ccclYtVb
Prepared data:
10 billion rows in 50k parquet files
Test environment:
spark-3.1.0
hadoop-3.1.0
hudi-0.8.0
Test steps:

  • step 1: prepare a Hudi COW table with 10 billion rows in 50k files
  • step 2: optimize with Z-order, sorting on src_ip and dst_ip, producing 50k files: df.optimizeByZOrder(Seq("src_ip", "dst_ip"), options, outputFileNum = 50000)
  • step 3: optimize with Z-order, sorting on src_ip, src_port, dst_ip and dst_port, producing 50k files: df.optimizeByZOrder(Seq("src_ip", "src_port", "dst_ip", "dst_port"), options, outputFileNum = 50000)
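The Z-order key behind steps 2 and 3 can be sketched as bit interleaving. This is a simplified illustration of the idea, not Hudi's actual implementation (which handles arbitrary column types and more than two dimensions):

```java
// Simplified sketch of a Z-order (Morton) value: interleave the bits of two
// unsigned 32-bit dimension values (e.g. src_ip and dst_ip) into one 64-bit
// key. Sorting records by this key keeps rows with nearby values in BOTH
// columns close together, so min/max file pruning can skip most files.
public class ZOrderSketch {
    public static long interleave(int a, int b) {
        long z = 0;
        for (int i = 0; i < 32; i++) {
            z |= ((long) ((a >>> i) & 1)) << (2 * i + 1); // bits of a at odd positions
            z |= ((long) ((b >>> i) & 1)) << (2 * i);     // bits of b at even positions
        }
        return z;
    }

    public static void main(String[] args) {
        System.out.println(interleave(1, 0)); // 2
        System.out.println(interleave(0, 1)); // 1
        System.out.println(interleave(3, 3)); // 15
    }
}
```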

Test results:

| Table | File count | Data size | First query | Second query | Query resources | Data skipping |
|---|---|---|---|---|---|---|
| Hudi table | 50k | 10 billion rows | 76,980 ms | 29,228 ms | 330 cores, 1 TB memory | Scans 50k files |
| Z-order, two-column sort | 50k | 10 billion rows | 4,662 ms | 3,855 ms | 16 cores, 96 GB memory | Scans 4 files |
| Z-order, four-column sort | 50k | 10 billion rows | 6,168 ms | 3,344 ms | 16 cores, 96 GB memory | Scans 6 files |
| Hilbert sort | 50k | 10 billion rows | 5,694 ms | 4,029 ms | 16 cores, 96 GB memory | Scans 5 files |

Brief change log


Verify this pull request

Unit tests added.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.


hudi-bot commented Jul 23, 2021

CI report:

  • 133379deca564ca42f10a1f3e59bb4aa17d80964 UNKNOWN
  • 7b975e5 Azure: SUCCESS
Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run travis re-run the last Travis build
  • @hudi-bot run azure re-run the last Azure build

@xiarixiaoyao (Contributor, Author)

RFC-27 is not implemented yet; once RFC-27 is merged, I will update the code to adapt to it.
In this PR, even without data skipping we still achieve a good result, since the data is sorted by Z-order.

The Hilbert implementation will come soon.

@xiarixiaoyao xiarixiaoyao force-pushed the zorder branch 2 times, most recently from 93897c1 to b474498 Compare July 23, 2021 04:36
@xiarixiaoyao xiarixiaoyao changed the title [HUDI-2101][WIP]support z-order for hudi [HUDI-2101]support z-order for hudi Jul 23, 2021
@xiarixiaoyao (Contributor, Author)

@leesf could you please help review this PR?

@xiarixiaoyao xiarixiaoyao changed the title [HUDI-2101]support z-order for hudi [HUDI-2101][RFC-28]support z-order for hudi Jul 23, 2021


Do we need another write type for z-order? My first thought is that z-order is a layout optimization for the table, much like order-by. Since we already have clustering to optimize the layout via order-by, can we implement z-order under clustering? That way we can support two kinds of data clustering:

Clustering table order by col1;
Clustering table zorder by col2;

Contributor Author (@xiarixiaoyao)

We support two operation types:

  1. cluster support
  2. Z-order optimization by overwriting the table.

@pengzhiwei2018 Jul 23, 2021

It makes sense to me now. But one thing I want to understand is: what is the difference between the two operation types you list, if cluster support cannot cover the case of Z-order optimization by overwriting the table?

@xiarixiaoyao (Contributor, Author) Jul 23, 2021

@pengzhiwei2018 You mean SQL? Good suggestion; currently we only support the API. I will add SQL support in another PR. Thanks for your suggestion.


No, I mean that if 1. cluster support can cover the function of 2. Z-order optimization by overwriting the table, we can implement z-order only under clustering. There is no need to introduce a new write type for z-order. As you know, Hudi currently has too many write types, which adds to the cost of use for our users.

Contributor Author (@xiarixiaoyao)

@pengzhiwei2018 Due to the implementation of Hudi itself, optimizing by overwriting the table is more convenient to use, since the cluster operation needs many properties to be set.
Another consideration: as we know, Delta Lake supports Z-order; we want to keep the same shape as the Delta Lake API.

Member

I'm +1 to keep this as a clustering strategy. If needed, we can make the top-level API different. But underneath, the WriteClient API and ActionExecutor will benefit from sharing the same code (I do see a lot of copy-pasted code there right now). Any change we make in one ActionExecutor would have to be copied to the other, which is going to be a challenge.


xiarixiaoyao commented Jul 23, 2021

@allwefantasy could you please help review the Z-order parts of the code? Thanks.

@xiarixiaoyao (Contributor, Author)

@leesf thanks for your review. I will try to resolve all comments.

@xiarixiaoyao (Contributor, Author)

@leesf addressed all comments. Made the code more general, to adapt to both Z-order and Hilbert optimization.

@xiarixiaoyao xiarixiaoyao force-pushed the zorder branch 5 times, most recently from 533d02e to cbc214a Compare July 27, 2021 08:43

xiarixiaoyao commented Jul 27, 2021

@leesf @garyli1019 @nsivabalan @pengzhiwei2018 please help review this PR again, thanks.
This PR implements the following functions:

  1. A new write mode, optimize, which supports optimizing the data layout with Z-order and Hilbert curves.
  2. Clustering mode supports Z-order/Hilbert.
  3. Simple data skipping is implemented (of course, even without data skipping, Z-order/Hilbert alone can accelerate queries). Once RFC-27 is implemented, I will adapt to it.
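The data skipping mentioned above comes down to min/max range pruning per file. A minimal sketch of the idea, with illustrative names rather than Hudi's actual API:

```java
// Minimal sketch of min/max data skipping (names are illustrative, not
// Hudi's actual API): a file can be skipped when the query's value range
// does not overlap the [min, max] statistics collected for that file's
// sort column at write time. Z-order/Hilbert sorting makes these per-file
// ranges narrow, which is what makes the pruning effective.
public class DataSkippingSketch {
    public static boolean canSkipFile(long fileMin, long fileMax, long queryLo, long queryHi) {
        // skip iff [fileMin, fileMax] and [queryLo, queryHi] do not overlap
        return queryHi < fileMin || queryLo > fileMax;
    }

    public static void main(String[] args) {
        System.out.println(canSkipFile(100, 200, 300, 400)); // true: query range above file range
        System.out.println(canSkipFile(100, 200, 150, 160)); // false: ranges overlap
    }
}
```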

@xiarixiaoyao xiarixiaoyao force-pushed the zorder branch 2 times, most recently from 78a02f4 to 6912152 Compare July 28, 2021 01:29

vinothchandar commented Oct 27, 2021

(WIP) Pending items (for my own tracking)

  • Zoptimize.scala to java
  • Using LAYOUT_OPTIMIZE* prefix for all configs, property names instead of DATA_OPTIMIZE_*
  • Should sort columns and optimize columns be separate configs or not?
  • Check out the HoodieTable API changes
  • Protect the query side changes with a flag.

@vinothchandar (Member) left a comment

@xiarixiaoyao I pushed some small changes to config names and such. The only issue to resolve is to move the data skipping config.

  • Add it to the Spark DataSourceReadOptions and remove from HoodieWriteConfig? Then we can rework the code in HoodieFileIndex to be protected by this config, also defensively protect the existing code paths.
  • For updating statistics during writing, we can just check whether layout optimization is enabled? (no need to check data skipping enabled there, since its a query side config).
  • Also, can we make sure this works for Spark SQL as well (not just the datasource)?

What do you think?

Member

We probably need to guard these changes with a config as well.

Contributor Author (@xiarixiaoyao)

Yes, agreed.

@vinothchandar (Member)

Side note: at least from IDE builds, I get the following error when running tests. Did you face this? Have you tried running this on a cluster using a Spark bundle?

(screenshot: IDE build error when running tests)


xiarixiaoyao commented Oct 29, 2021

@vinothchandar thanks for your suggestion.
The question: "Side note: at least from IDE builds, I get the following error when running tests. Did you face this? Have you tried running this on a cluster using a Spark bundle?"

Answer: I used Java 8 to run this code, and of course I tested it on a cluster. But I found that if we run this code in the IDE with Java 11, the IDE throws "package sun.misc does not exist", since the com.sun.* and sun.* packages hold internal JDK classes and should not generally be used by third-party applications. I have now removed all unsafe functions, so it should be OK for Java 11.

Updated the code:
1. Rewrote most of the Scala functions (ZOptimize.scala) in Java in ZCurveOptimizeHelper.java, and removed ZOptimize.scala.
2. Removed the unsafe lexicographical comparison in favor of a plain Java lexicographical comparison; no unsafe classes remain.
3. Data skipping now works with Spark SQL as well, not just the datasource.
4. Introduced a new config ENABLE_DATA_SKIPPING in DataSourceReadOptions to protect the query-side changes.

I think we may not need to move the data skipping config to DataSourceReadOptions; instead we can introduce a new config in DataSourceReadOptions. WDYT?
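The Unsafe-free comparator in item 2 can be sketched as a plain unsigned lexicographic byte[] comparison. A simplified illustration of the idea, not the exact Hudi code:

```java
// Sketch of a lexicographic byte[] comparator without sun.misc.Unsafe:
// compare byte by byte as UNSIGNED values; on a common prefix, the
// shorter array sorts first. Serialized Z-values sorted with this
// comparator keep their space-filling-curve order.
public class LexCompareSketch {
    public static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xFF) - (b[i] & 0xFF); // mask to treat each byte as unsigned
            if (cmp != 0) return cmp;
        }
        return a.length - b.length; // shared prefix: shorter array first
    }

    public static void main(String[] args) {
        System.out.println(compareUnsigned(new byte[]{1}, new byte[]{2}) < 0);           // true
        System.out.println(compareUnsigned(new byte[]{(byte) 0xFF}, new byte[]{1}) > 0); // true: unsigned
        System.out.println(compareUnsigned(new byte[]{1}, new byte[]{1, 0}) < 0);        // true: prefix
    }
}
```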

@xiarixiaoyao (Contributor, Author)

@hudi-bot run azure

@vinothchandar (Member)

we can introduce a config in DataSourceReadOptions. WDYT?

Yes, we can introduce a new one there. But I'm wondering whether we still need the data skipping flag in HoodieWriteConfig? What purpose do you think it will serve? I am fine either way.

Let me know when/if you have tested the latest code on your cluster and verified that the performance gains are still the same (after removing the unsafe code, etc.).

Thanks for working through this with me

@xiarixiaoyao (Contributor, Author)

@vinothchandar
The data skipping flag in HoodieWriteConfig is a write-side flag; the config in DataSourceReadOptions is a read-side flag. I think it's better to keep read and write separate.
I will verify the performance gains today.

I hope we can merge this feature before the 0.10 release, and I will try my best to resolve all the comments.


xiarixiaoyao commented Nov 1, 2021

@vinothchandar I tested the performance of Z-order clustering using the unsafe sort and the Java sort.
Test environment: spark-3.1.1 --master yarn-client --driver-memory 20g --executor-cores 3 --num-executors 70 --executor-memory 8g
Test case: https://help.aliyun.com/document_detail/168137.html?spm=a2c4g.11186623.6.563.10d758ccclYtVb
Test data: 53 GB (100 million rows)
Test results:

| | Unsafe sort (sort time only) | Java sort (sort time only) |
|---|---|---|
| First run | 3.1 min | 2.8 min |
| Second run | 2.9 min | 2.7 min |

The results show no performance degradation.

@xiarixiaoyao (Contributor, Author)

@hudi-bot run azure

@vinothchandar (Member)

Sounds good. I'll take a final pass and land this, this week.

@vinothchandar (Member) left a comment

Only a few minor things. LGTM overall; will push out a few tweaks and land.

@vinothchandar (Member)

@xiarixiaoyao do you plan to add hilbert curves in the 0.10.0 release? (3rd week of nov)

right now, this config is sort of unused?

      .key(LAYOUT_OPTIMIZE_PARAM_PREFIX + "strategy")
      .defaultValue("z-order")
      .sinceVersion("0.10.0")
      .withDocumentation("Type of layout optimization to be applied, current only supports `z-order` and `hilbert` curves.");

@vinothchandar (Member)

@hudi-bot run azure

@xiarixiaoyao (Contributor, Author)

@vinothchandar yes, the Hilbert curve is ready; I think there is enough time to add it.
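For reference, a 2-D Hilbert index can be computed with the classic bit-twiddling conversion below. This is a generic sketch of the curve itself, not necessarily the code that landed in this PR:

```java
// Classic (x, y) -> Hilbert distance conversion for an n x n grid, where n
// is a power of two. Like the Z-value, sorting by this distance keeps points
// that are close in 2-D space close in the 1-D sort order; unlike Z-order,
// the Hilbert curve avoids long "jumps" between adjacent quadrants.
public class HilbertSketch {
    public static long xy2d(long n, long x, long y) {
        long d = 0;
        for (long s = n / 2; s > 0; s /= 2) {
            long rx = (x & s) > 0 ? 1 : 0;
            long ry = (y & s) > 0 ? 1 : 0;
            d += s * s * ((3 * rx) ^ ry);
            // rotate the quadrant so the curve stays contiguous
            if (ry == 0) {
                if (rx == 1) {
                    x = s - 1 - x;
                    y = s - 1 - y;
                }
                long t = x; x = y; y = t;
            }
        }
        return d;
    }

    public static void main(String[] args) {
        // on a 2x2 grid this orientation visits (0,0), (0,1), (1,1), (1,0)
        System.out.println(xy2d(2, 0, 0)); // 0
        System.out.println(xy2d(2, 0, 1)); // 1
        System.out.println(xy2d(2, 1, 1)); // 2
        System.out.println(xy2d(2, 1, 0)); // 3
    }
}
```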

@vinothchandar (Member)

Sounds good. Will land once CI passes; right now it's queued up here:

https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=3067&view=results


xiarixiaoyao commented Nov 9, 2021

@xushiyan
Test results for Z-order.
Query resources: --executor-memory 12G --executor-cores 4 --num-executors 4

| Table | File count | Data size | 1st query | 2nd query | 3rd query (data skipping on) | 4th query (data skipping on) | Data skipping |
|---|---|---|---|---|---|---|---|
| Hudi origin table | 20k | 10 billion | 446,806 ms | 397,131 ms | not supported | not supported | |
| Z-order, two-column sort | 20k | 10 billion | 94,104 ms | 80,778 ms | 5,913 ms | 2,769 ms | 20,000 total files, 5 after skipping (99.975% skipped) |
| Z-order, four-column sort | 20k | 10 billion | 111,393 ms | 91,664 ms | 5,868 ms | 3,186 ms | 20,000 total files, 5 after skipping (99.975% skipped) |

@vinothchandar (Member)

@xiarixiaoyao let's write a blog around Z-ordering? If you are up for it, I can help you with an initial draft and you can add more to it. What do you think?

@xiarixiaoyao (Contributor, Author)

@vinothchandar Of course, let's finish this together.

@xiarixiaoyao xiarixiaoyao deleted the zorder branch December 3, 2021 02:51