fix: compaction & lookup performance optimization and SST fixes #207
Merged
lxy-9602 merged 4 commits into alibaba:main on Apr 3, 2026
lxy-9602 commented on Apr 2, 2026
lszskye reviewed on Apr 2, 2026
Pull request overview
This PR focuses on compaction/lookup hot-path performance improvements and SST read/write correctness/efficiency by reducing heap allocations, avoiding unnecessary deserialization, and adjusting parquet reader buffering to reduce compaction memory usage.
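To illustrate why moving `MemorySlice` from `shared_ptr` to a value type removes heap allocations, here is a minimal sketch that assumes a slice is a (shared segment, offset, length) triple. `Slice`, `Segment`, `FromString`, `Lexicographic` and `LowerBound` are hypothetical names, not Paimon's API. The comparator is taken by const reference, mirroring the PR's change to `SliceComparator`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical backing storage shared by many slices.
using Segment = std::shared_ptr<const std::vector<uint8_t>>;

// Value-type slice: copying it copies a pointer, an offset and a length and
// bumps the segment's reference count; no new heap object is created.
class Slice {
 public:
  Slice(Segment segment, size_t offset, size_t length)
      : segment_(std::move(segment)), offset_(offset), length_(length) {}

  const uint8_t* data() const { return segment_->data() + offset_; }
  size_t size() const { return length_; }

  // A sub-slice shares the same segment and is returned by value.
  Slice Sub(size_t offset, size_t length) const {
    return Slice(segment_, offset_ + offset, length);
  }

 private:
  Segment segment_;
  size_t offset_;
  size_t length_;
};

Slice FromString(const std::string& s) {
  Segment segment = std::make_shared<std::vector<uint8_t>>(s.begin(), s.end());
  return Slice(std::move(segment), 0, s.size());
}

using SliceComparator = std::function<int(const Slice&, const Slice&)>;

// Byte-wise lexicographic order; a shorter prefix sorts first.
int Lexicographic(const Slice& a, const Slice& b) {
  size_t n = std::min(a.size(), b.size());
  int c = n == 0 ? 0 : std::memcmp(a.data(), b.data(), n);
  if (c != 0) return c < 0 ? -1 : 1;
  if (a.size() == b.size()) return 0;
  return a.size() < b.size() ? -1 : 1;
}

// Taking the std::function by const reference avoids copying the wrapped
// callable (which may allocate) on every call.
size_t LowerBound(const std::vector<Slice>& keys, const Slice& target,
                  const SliceComparator& cmp) {
  size_t lo = 0;
  size_t hi = keys.size();
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (cmp(keys[mid], target) < 0) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}
```

The design point is that sub-slicing and passing slices around only copy a small struct, so the hot path no longer pays for one `make_shared` per slice.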
Changes:
- Refactors SST/memory-slice APIs from `shared_ptr`-heavy usage to value types and reduces iterator work during block binary search.
- Adds the parquet `parquet.read.enable-pre-buffer` option and disables pre-buffering in compaction read contexts to reduce memory consumption.
- Consolidates key-value compaction integration tests into `pk_compaction_inte_test` and adds IO-exception coverage.
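The reduced iterator work during block binary search (replacing `polled_` with a remembered position and adding `ReadKeyAndSkipValue()`, per the PR description) can be sketched as below. The entry layout with fixed 4-byte length prefixes is an assumption and differs from Paimon's real block format; apart from `ReadKeyAndSkipValue`, every name here is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical entry layout: [u32 key_len][key][u32 value_len][value],
// lengths in host byte order. No bounds checking in this sketch.
class BlockInput {
 public:
  explicit BlockInput(std::string_view data) : data_(data) {}
  void SetPosition(size_t pos) { pos_ = pos; }

  std::string_view ReadKey() { return ReadChunk(); }
  std::string_view ReadValue() { return ReadChunk(); }

  // Reads the key and jumps over the value without materializing it.
  std::string_view ReadKeyAndSkipValue() {
    std::string_view key = ReadChunk();
    pos_ += ReadU32();
    return key;
  }

 private:
  uint32_t ReadU32() {
    uint32_t v;
    std::memcpy(&v, data_.data() + pos_, sizeof(v));
    pos_ += sizeof(v);
    return v;
  }
  std::string_view ReadChunk() {
    uint32_t len = ReadU32();
    std::string_view out = data_.substr(pos_, len);
    pos_ += len;
    return out;
  }

  std::string_view data_;
  size_t pos_ = 0;
};

// Lower-bound search over entry start offsets. Instead of building an entry
// object per probe, it only remembers where the best candidate starts.
std::optional<size_t> SeekTo(BlockInput& in, const std::vector<size_t>& offsets,
                             std::string_view target) {
  size_t lo = 0;
  size_t hi = offsets.size();
  std::optional<size_t> polled_position;
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    in.SetPosition(offsets[mid]);
    if (in.ReadKeyAndSkipValue() < target) {
      lo = mid + 1;
    } else {
      polled_position = offsets[mid];
      hi = mid;
    }
  }
  return polled_position;
}

// Test helper: appends one entry and records its start offset.
void AppendEntry(std::string& block, std::vector<size_t>& offsets,
                 std::string_view key, std::string_view value) {
  offsets.push_back(block.size());
  for (std::string_view part : {key, value}) {
    uint32_t len = static_cast<uint32_t>(part.size());
    block.append(reinterpret_cast<const char*>(&len), sizeof(len));
    block.append(part.data(), part.size());
  }
}
```

Only the entry found at the end is decoded in full (`SetPosition` to the returned offset, then `ReadKey` and `ReadValue`), so the values of all probed entries are never deserialized.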
Reviewed changes
Copilot reviewed 44 out of 44 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| test/inte/pk_compaction_inte_test.cpp | Merges key-value compaction IT coverage into PK compaction suite; adds new parameterized tests and IO-exception test. |
| test/inte/key_value_compaction_inte_test.cpp | Removes standalone key-value compaction IT file after consolidation. |
| test/inte/CMakeLists.txt | Drops the removed key-value compaction IT target from CMake. |
| src/paimon/format/parquet/parquet_format_defs.h | Adds PARQUET_READ_ENABLE_PRE_BUFFER option definition. |
| src/paimon/format/parquet/parquet_file_batch_reader.cpp | Plumbs parquet.read.enable-pre-buffer into parquet reader properties. |
| src/paimon/core/utils/fields_comparator.h | Removes CompareVariant API used for variant boxing comparisons. |
| src/paimon/core/utils/fields_comparator.cpp | Removes CompareVariant implementation. |
| src/paimon/core/operation/append_only_file_store_write.cpp | Disables parquet pre-buffering during compaction read path via read context options. |
| src/paimon/core/mergetree/lookup/persist_processor_test.cpp | Updates string assertions to use GetStringView() API. |
| src/paimon/core/mergetree/lookup_levels.cpp | Disables parquet pre-buffering for lookup-level reads; short-circuits T=bool and reads only key fields for T=FilePosition when creating SST files. |
| src/paimon/core/mergetree/compact/merge_tree_compact_rewriter.cpp | Disables parquet pre-buffering during merge-tree compaction reads; adds short-circuit when no rolling writer exists. |
| src/paimon/core/mergetree/compact/lookup_merge_tree_compact_rewriter.cpp | Disables parquet pre-buffering during lookup-merge-tree compaction reads. |
| src/paimon/common/utils/bit_set.h | Switches BitSet::ToSlice() to return MemorySlice by value. |
| src/paimon/common/sst/sst_file_writer.h | Updates writer APIs to use value MemorySlice / value BlockHandle. |
| src/paimon/common/sst/sst_file_writer.cpp | Implements value-based writes/flush/index/footer write paths; updates bloom filter writing accordingly. |
| src/paimon/common/sst/sst_file_reader.h | Simplifies ReadBlock overloads; adjusts trailer ownership type in decompression helper. |
| src/paimon/common/sst/sst_file_reader.cpp | Updates block cache interaction to support decompression callback; adapts to value-based slice/handle/trailer APIs. |
| src/paimon/common/sst/sst_file_io_test.cpp | Updates comparator signature to value MemorySlice API. |
| src/paimon/common/sst/block_writer.h | Updates BlockWriter::Finish() to return MemorySlice by value. |
| src/paimon/common/sst/block_writer.cpp | Implements value-returning Finish(). |
| src/paimon/common/sst/block_trailer.h | Changes trailer read/write APIs to pointer input + value slice output. |
| src/paimon/common/sst/block_trailer.cpp | Implements updated trailer read/write APIs. |
| src/paimon/common/sst/block_reader.h | Switches block storage/input handling to value MemorySlice and MemorySliceInput. |
| src/paimon/common/sst/block_reader.cpp | Implements value-based block parsing and input handling. |
| src/paimon/common/sst/block_iterator.h | Refactors iterator to store input by value and reduce allocations during seek. |
| src/paimon/common/sst/block_iterator.cpp | Implements key-only seek path and polled-position approach. |
| src/paimon/common/sst/block_handle.h | Changes handle read/write APIs to pointer input + value slice output. |
| src/paimon/common/sst/block_handle.cpp | Implements updated handle read/write APIs. |
| src/paimon/common/sst/block_footer.h | Switches footer to store index handle by value; updates read/write APIs. |
| src/paimon/common/sst/block_footer.cpp | Implements updated footer read/write APIs. |
| src/paimon/common/sst/block_entry.h | Stores key/value as value MemorySlice instead of shared_ptr. |
| src/paimon/common/sst/block_cache.h | Extends block cache API to accept a decompression function and cache the resulting data. |
| src/paimon/common/memory/memory_slice.h | Converts MemorySlice to a value type; updates comparator and accessors for const/value semantics. |
| src/paimon/common/memory/memory_slice.cpp | Implements value-based wrapping/slicing/input conversion. |
| src/paimon/common/memory/memory_slice_output.h | Makes ToSlice() return MemorySlice by value. |
| src/paimon/common/memory/memory_slice_output.cpp | Implements value-returning ToSlice(). |
| src/paimon/common/memory/memory_slice_input.h | Makes input own MemorySlice by value and return slices by value. |
| src/paimon/common/memory/memory_slice_input.cpp | Implements value-based input operations and const correctness. |
| src/paimon/common/lookup/sort/sort_lookup_store_factory.cpp | Adapts to value BlockHandle returned from WriteIndexBlock(). |
| src/paimon/common/io/cache/cache_key.h | Adds virtual HashCode() and attempts to adjust hashing/equality for cache keys. |
| src/paimon/common/data/serializer/row_compacted_serializer.h | Adds field metadata and direct field comparison; switches string reads to string_view. |
| src/paimon/common/data/serializer/row_compacted_serializer.cpp | Implements direct CompareField; switches comparator to avoid variant boxing; ensures deserialized rows hold backing bytes. |
| src/paimon/common/data/serializer/row_compacted_serializer_test.cpp | Updates tests to validate GetStringView()-based behavior. |
| src/paimon/common/data/generic_row.h | Adds byte holder support and renames internal holder vector to preserve string_view lifetimes. |
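The `cache_key.h` change addresses a classic pitfall: by default, `std::unordered_map<std::shared_ptr<CacheKey>, V>` hashes and compares pointer addresses, so two equal keys created separately never match. A minimal sketch of the fix described in the PR (custom `std::hash` and `std::equal_to` specializations backed by a virtual `HashCode()`) follows; `Equals`, `PositionCacheKey` and its fields are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

class CacheKey {
 public:
  virtual ~CacheKey() = default;
  virtual size_t HashCode() const = 0;
  virtual bool Equals(const CacheKey& other) const = 0;  // hypothetical
};

// Hypothetical key identifying a byte range of a file.
class PositionCacheKey : public CacheKey {
 public:
  PositionCacheKey(std::string file, int64_t position, int length)
      : file_(std::move(file)), position_(position), length_(length) {}

  size_t HashCode() const override {
    size_t h = std::hash<std::string>{}(file_);
    h = h * 31 + std::hash<int64_t>{}(position_);
    return h * 31 + std::hash<int>{}(length_);
  }
  bool Equals(const CacheKey& other) const override {
    const auto* o = dynamic_cast<const PositionCacheKey*>(&other);
    return o != nullptr && file_ == o->file_ && position_ == o->position_ &&
           length_ == o->length_;
  }

 private:
  std::string file_;
  int64_t position_;
  int length_;
};

// Hash and compare the pointed-to keys instead of the pointer addresses.
namespace std {
template <>
struct hash<shared_ptr<CacheKey>> {
  size_t operator()(const shared_ptr<CacheKey>& key) const {
    return key ? key->HashCode() : 0;
  }
};
template <>
struct equal_to<shared_ptr<CacheKey>> {
  bool operator()(const shared_ptr<CacheKey>& a,
                  const shared_ptr<CacheKey>& b) const {
    if (!a || !b) return a == b;
    return a->Equals(*b);
  }
};
}  // namespace std
```

Because `std::unordered_map` uses `std::hash<Key>` and `std::equal_to<Key>` by default, these specializations take effect without changing any map declarations; they must be visible before the map type is first instantiated.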
Resolved review thread: src/paimon/core/mergetree/compact/lookup_merge_tree_compact_rewriter.cpp
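Per the file summary, `lookup_merge_tree_compact_rewriter.cpp` and the other compaction read paths disable parquet pre-buffering through read-context options, and the parquet reader turns `parquet.read.enable-pre-buffer` into a reader property. A minimal sketch of that plumbing, assuming string-valued options; only the option key comes from the PR, while `ReaderProps`, `BuildReaderProps`, `CompactionReadOptions` and the default value are hypothetical. In Arrow C++ the matching knob is `parquet::ArrowReaderProperties::set_pre_buffer`, but whether Paimon maps the option exactly onto it is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Option key taken from the PR.
const char* const kParquetReadEnablePreBuffer = "parquet.read.enable-pre-buffer";

// Hypothetical stand-in for the reader properties the option feeds.
struct ReaderProps {
  bool pre_buffer = true;  // assumed default when the option is absent
};

using Options = std::map<std::string, std::string>;

bool ParseBool(std::string value, bool fallback) {
  std::transform(value.begin(), value.end(), value.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  if (value == "true") return true;
  if (value == "false") return false;
  return fallback;
}

ReaderProps BuildReaderProps(const Options& options) {
  ReaderProps props;
  auto it = options.find(kParquetReadEnablePreBuffer);
  if (it != options.end()) {
    props.pre_buffer = ParseBool(it->second, props.pre_buffer);
  }
  return props;
}

// Compaction copies the table options and turns pre-buffering off, so the
// reader does not prefetch column-chunk byte ranges ahead of decoding.
Options CompactionReadOptions(Options table_options) {
  table_options[kParquetReadEnablePreBuffer] = "false";
  return table_options;
}
```

Overriding the option on a copy keeps normal reads unchanged while compaction, which scans many files at once, trades some read coalescing for lower peak memory.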
Purpose
Linked issue: #93
Compaction Efficiency Optimization
- …: `shared_ptr` to value type, eliminating heap allocations on the hot path.
- …: `shared_ptr` to value type, and pass `SliceComparator` by const reference to avoid `std::function` copies.
- Replace `polled_` (`unique_ptr<BlockEntry>`) with `polled_position_` to avoid re-creating `BlockEntry` during binary search; add `ReadKeyAndSkipValue()` to skip value deserialization when only the key is needed.
- Replace the `FieldReader` + `VariantComparatorFunc` comparison with a direct `CompareField` on `RowReader`, avoiding intermediate `VariantType` boxing; change `ReadString()` to `ReadStringView()` to avoid copies.
- Add `bytes_holder_` to hold raw bytes during deserialization, ensuring `string_view` lifetime safety.
- Short-circuit `T=bool` in `CreateSstFileFromDataFile` to skip unnecessary data file reading; for `T=FilePosition`, read only key fields instead of the full schema when creating SST files.
- Add the `parquet.read.enable-pre-buffer` option and disable pre-buffering during compaction to reduce memory consumption.

Bug fix
- Add a `decompress_func` parameter to `GetBlock`; fix `shared_ptr<CacheKey>` map key comparison by adding custom `hash` and `equal_to` specializations with a virtual `HashCode()`.

Tests
- Merge `key_value_compaction_inte_test` into `pk_compaction_inte_test`

API and Format
Documentation
Generative AI tooling
Partially Generated-by: Claude-4.6-Opus