Skip to content

refactor: extract WriteBuffer from MergeTreeWriter#206

Merged
zjw1111 merged 11 commits intoalibaba:mainfrom
zjw1111:refactor/write-buffer
Apr 3, 2026
Merged

refactor: extract WriteBuffer from MergeTreeWriter#206
zjw1111 merged 11 commits intoalibaba:mainfrom
zjw1111:refactor/write-buffer

Conversation

@zjw1111
Copy link
Copy Markdown
Collaborator

@zjw1111 zjw1111 commented Apr 1, 2026

Purpose

Linked issue: #149

Refactor MergeTreeWriter by extracting the in-memory buffering logic into WriteBuffer. This moves buffered batch ingestion, memory tracking, and flush-to-reader preparation into a dedicated component, while keeping MergeTreeWriter focused on file writing and compaction orchestration.

Tests

TEST_F(WriteBufferTest.*)

  • added focused unit tests for WriteBuffer flush state reset and sequence number advancement
  • added focused unit tests for WriteBuffer row kind preservation
  • added memory estimation coverage for WriteBuffer

TEST_F(CleanInteTest.TestOrphanFilesCleanWithIOException)

  • fix unstable case in TestOrphanFilesCleanWithIOException

Existing unit and integration tests

  • fixed arrow::struct_(FieldVector) call sites across affected test files so schema construction matches the Arrow API

API and Format

No API, storage format, or protocol changes.

Documentation

No.

Generative AI tooling

Generated-by: Aone Copilot(Claude-4.6-Opus) and GitHub Copilot(GPT-5.4)

Extract batch buffering logic from MergeTreeWriter into a dedicated
WriteBuffer class. WriteBuffer manages batch_vec_, row_kinds_vec_,
and current_memory_in_bytes_, and provides Write() and Flush() methods.

This is a pure internal refactoring that:
- Does not change any external API or behavior
- Prepares for future Spill-to-Disk functionality
- Improves separation of concerns in MergeTreeWriter
Copilot AI review requested due to automatic review settings April 1, 2026 10:46
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Extracts MergeTreeWriter’s in-memory buffering and flush logic into a dedicated WriteBuffer abstraction and adds unit tests to validate flush behavior and memory estimation.

Changes:

  • Introduces new WriteBuffer class for buffering, flushing, and memory usage tracking.
  • Refactors MergeTreeWriter to delegate buffering/flush responsibilities to WriteBuffer.
  • Adds focused tests for flush state reset, sequence advancement, row kind preservation, and memory estimation.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
src/paimon/core/mergetree/write_buffer.h Adds the WriteBuffer API used by MergeTreeWriter.
src/paimon/core/mergetree/write_buffer.cpp Implements buffering, flushing to in-memory readers, and memory estimation.
src/paimon/core/mergetree/write_buffer_test.cpp Adds new unit tests for WriteBuffer flush semantics and memory estimation.
src/paimon/core/mergetree/merge_tree_writer.h Replaces ad-hoc buffer state with a WriteBuffer member.
src/paimon/core/mergetree/merge_tree_writer.cpp Delegates write buffering and flush reader creation to WriteBuffer.
src/paimon/core/mergetree/merge_tree_writer_test.cpp Refactors batch creation helper and removes now-moved memory estimation tests.
src/paimon/CMakeLists.txt Wires write_buffer.cpp and write_buffer_test.cpp into the build.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

lucasfang
lucasfang previously approved these changes Apr 3, 2026
Copy link
Copy Markdown
Collaborator

@lucasfang lucasfang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Copy Markdown
Collaborator

@lucasfang lucasfang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@zjw1111 zjw1111 merged commit f052b2b into alibaba:main Apr 3, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants