Update code #1

qianchutao · 2022-04-29T09:25:32Z

Update code

… columns (#4818)

NOTE: This change is the first part of a series to clean up Hudi's Spark DataSource related implementations, making sure there is minimal code duplication among them and that the implementations are consistent and performant.

This PR makes sure that BaseFileOnlyViewRelation only reads projected columns and avoids unnecessary serde from Row to InternalRow.

Brief change log
- Introduced HoodieBaseRDD as a base for all custom RDD impls
- Extracted common fields/methods to HoodieBaseRelation
- Cleaned up and streamlined HoodieBaseFileViewOnlyRelation
- Fixed all of the Relations to avoid superfluous Row <> InternalRow conversions

….compact.inline.max.delta.commits (#4976)

* Update CompactionHoodiePathCommand.scala: fix NPE when running schedule using spark-sql if the number of commits is less than hoodie.compact.inline.max.delta.commits
* Update CompactionHoodiePathCommand.scala: fix IndexOutOfBoundsException when there's no schedule for compaction
* Update CompactionHoodiePathCommand.scala: fix CI issue

… instead of source columns (#5364)

- Scaffolded `Spark24HoodieParquetFileFormat` extending `ParquetFileFormat` and overriding the behavior of adding partition columns to every row
- Amended `SparkAdapter`'s `createHoodieParquetFileFormat` API to be able to configure whether to append partition values or not
- Fall back to appending partition values in cases when the source columns are not persisted in the data file
- Fixed HoodieBaseRelation incorrectly handling mandatory columns

) This PR fixes the projection logic around a nested field which is used as the pre-combined key field. The fix is to only check and append the root level field for projection, i.e., "a", for a nested field "a.b.c" in the mandatory columns.

- Changes the logic to check and append the root level field for a required nested field in the mandatory columns in HoodieBaseRelation.appendMandatoryColumns
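The root-level rule described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Hudi's Scala code: the function name `append_mandatory_columns` and its list-based signature are hypothetical, and only the rule itself (append "a" for a mandatory nested field "a.b.c") comes from the description.

```python
def append_mandatory_columns(requested, mandatory):
    """Return the requested columns plus the root of every mandatory column.

    Hypothetical helper for illustration only; it does not mirror the real
    signature of HoodieBaseRelation.appendMandatoryColumns.
    """
    result = list(requested)
    for column in mandatory:
        # For a nested field such as "a.b.c", only the root field "a" is checked.
        root = column.split(".", 1)[0]
        if root not in result:
            result.append(root)
    return result


if __name__ == "__main__":
    # The nested pre-combine field "a.b.c" contributes only its root "a".
    print(append_mandatory_columns(["id", "name"], ["a.b.c"]))
```

Checking the root rather than the full dotted path also avoids appending a column twice when the root field is already part of the requested projection.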

… w/ Spark 3.2.0 (#5378)

- Because Spark 3.2.1 is not backward compatible (non-BWC) with 3.2.0, all of these incompatibilities have to be handled in Spark32HoodieParquetFileFormat. This PR addresses that.

Co-authored-by: Raymond Xu <[email protected]>

XuQianJin-Stars and others added 30 commits March 9, 2022 18:04

[MINOR] Add IT CI Test timeout option (#5003)

ca0b8fc

[HUDI-3581] Reorganize some clazz for hudi flink (#4983)

ec24407

[HUDI-3602][DOCS] Update docker README to build multi-arch images using buildx (#5011)

4e09545

[HUDI-3586] Add Trino Queries in integration tests (#4988)

fa5e750

[HUDI-3595] Fixing NULL schema provider for empty batch (#5002)

9dc6df5

[HUDI-3522] Introduce DropColumnSchemaPostProcessor to support drop columns from schema (#4972)

83cff3a

* [HUDI-3522] Introduce DropColumnSchemaPostProcessor to support drop columns from schema
* Fix case sensitivity

[HUDI-2999] [RFC-42] RFC for consistent hashing index (#4326)

18cdad9

* [HUDI-2999] rfc for consistent hashing index
* [HUDI-2999] review: add metadata table & non-dual-write solution (virtual log file) for resizing

Co-authored-by: xiaoyuwei <[email protected]>

[HUDI-3566] Add thread factory in BoundedInMemoryExecutor (#4926)

faed699

Co-authored-by: 苏承祥 <[email protected]>

[HUDI-3575] Use HoodieTestDataGenerator#TRIP_SCHEMA as example schema in TestSchemaPostProcessor (#5019)

b001803

[HUDI-3567] Refactor HoodieCommonUtils to make code more reasonable (#4982)

56cb494

[HUDI-3513] Make sure Column Stats does not fail in case it fails to load previous Index Table state (#5015)

5d59bf6

[HUDI-3592] Fix NPE of DefaultHoodieRecordPayload if Property is empty (#4999)

93277b2

Co-authored-by: Rex An <[email protected]>

[HUDI-3569] Introduce ChainedJsonKafkaSourePostProcessor to support setting multi processors at once (#4969)

e8918b6

[HUDI-3556] Re-use rollback instant for rolling back of clustering and compaction if rollback failed mid-way (#4971)

e7bb041

[HUDI-3593] Restore TypedProperties and flush checksum in table config (#5013)

eee96e9

Create new TypedProperties while performing clustering
Add OrderedProperties and minor refactoring
Add javadoc and remove getters from OrderedProperties

[HUDI-3583] Fix MarkerBasedRollbackStrategy NoSuchElementException (#4984)

e60acc1

Co-authored-by: Y Ethan Guo <[email protected]>

[HUDI-3501] Support savepoints command based on Call Produce Command (#5025)

6c8224c

[HUDI-3613] Adding/fixing yamls for metadata (#5029)

1ba8220

[HUDI-3600] Tweak the default cleaning strategy to be more streaming friendly for flink (#5010)

465d553

[MINODR] Remove repeated kafka-clients dependencies (#5034)

003c6ee

[HUDI-3621] Fixing NullPointerException in DeltaStreamer (#5039)

22c3ce7

[HUDI-3623] Removing hive sync node from non hive yamls (#5040)

30cf393

[HUDI-3620] Adding spark3.2.0 profile (#5038)

d40adfa

[HUDI-3547] Introduce MaxwellSourcePostProcessor to extract data from Maxwell json string (#4987)

3b59b76

* [HUDI-3547] Introduce MaxwellSourcePostProcessor to extract data from Maxwell json string
* add ut
* Address comment

[HUDI-3606] Add org.objenesis:objenesis to hudi-timeline-server-bundle pom (#5017)

6ed7106

[HUDI-3619] Fix HoodieOperation fromValue using wrong constant value (#5033)

9bdda2a

Co-authored-by: root <[email protected]>

[HUDI-3514] Rebase Data Skipping flow to rely on MT Column Stats index (#4948)

5e8ff8d

[HUDI-3633] Allow non-string values to be set in TypedProperties (#5045)

d514570

* [HUDI-3633] Allow non-string values to be set in TypedProperties
* Override getProperty to ignore instanceof string check

yuzhaojing and others added 29 commits April 19, 2022 23:31

[HUDI-3904] Claim RFC number for Improve timeline server (#5354)

6a3ce92

[HUDI-3912] Fix lose data when rollback in flink async compact (#5357)

408663c

* Stop adding events when there is a failed compact event

Co-authored-by: wxp <[email protected]>

[HUDI-3938] Fix default value for num retries to acquire lock (#5380)

a9506aa

[HUDI-3935] Adding config to fallback to enabled Partition Values extraction from Partition path (#5377)

4b296f7

[MINOR] Increase azure CI timeout to 120m (#5384)

4e1ac46

[HUDI-3940] Fix retry count increment in lock manager (#5387)

de5fa1f

[HUDI-3921] Fixed schema evolution cannot work with HUDI-3855 (#5376)

037f89e

- When column names are renamed (schema evolution enabled), the renamed columns were not handled correctly while copying records from the old data file with HoodieMergeHandle.

[DOCS] Add commit activity, twitter badgers, and Hudi logo in README (#5336)

20781a5

[HUDI-3947] Fixing Hive conf usage in HoodieSparkSqlWriter (#5401)

7523542

[HUDI-3950] add parquet-avro to gcp-bundle (#5399)

505ee67

[HUDI-3948] Fix presto bundle missing HBase classes (#5398)

8633bd6

[HUDI-3923] Fix cast exception while reading boolean type of partitioned field (#5373)

5e5c177

support generan parameter 'sink.parallelism' for flink-hudi (#5405)

bda3db0

Co-authored-by: hehuiyuan1 <[email protected]>

[HUDI-3946] Validate option path in flink hudi sink (#5397)

d994c58

Revert "[HUDI-3951]support generan parameter 'sink.parallelism' for flink-hudi (#5405)" (#5421)

9054b85

This reverts commit bda3db0.

[HUDI-3085] Improve bulk insert partitioner abstraction (#4441)

f2ba0fe

[HUDI-3972] Fixing hoodie.properties/tableConfig for no preCombine field with writes (#5424)

762623a

Fixed instantiation of a new table to set preCombine to null if it is not explicitly set by the user.

[HUDI-3478] Claim RFC 51 For CDC (#5437)

77e3332

[MINOR] Update alter rename command class type for pattern matching (#5381)

6ec039b

[HUDI-3977] Flink hudi table with date type partition path throws HoodieNotSupportedException (#5432)

e1ccf2e

Claim RFC 52 for Introduce Secondary Index to Improve HUDI Query Performance (#5441)

924e2e9

[HUDI-3945] After the async compaction operation is complete, the task should exit. (#5391)

cacbd98

Co-authored-by: y00617041 <[email protected]>

[HUDI-3815] Fix docs description of metadata.compaction.delta_commits default value error (#5368)

52953c8

Co-authored-by: pusheng.li01 <[email protected]>

[HUDI-3943] Some description fixes for 0.10.1 docs (#5447)

4e928a6

[MINOR] support different cleaning policy for flink (#5459)

b27e8b5

[HUDI-3758] Fix duplicate fileId error in MOR table type with flink bucket hash Index (#5185)

e421d53

* fix duplicate fileId with bucket Index
* replace to load FileGroup from FileSystemView

qianchutao merged commit 3e0dcf4 into qianchutao:master Apr 29, 2022

20 participants
