Skip to content

Conversation

@dqhl76
Copy link
Collaborator

@dqhl76 dqhl76 commented Oct 31, 2025

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

  • Replace the final aggregate spiller with a sync writer from SpillsBufferPool
  • Refactor the experimental final aggregate processor to split local/shared state, reuse hash tables, and bound recursive spilling via max_aggregate_spill_level
  • Rename the feature flag to enable_experiment_aggregate, add the max_aggregate_spill_level setting

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

This change is Reviewable

@github-actions github-actions bot added the pr-refactor this PR changes the code base without new features or bugfix label Oct 31, 2025
@dqhl76 dqhl76 requested a review from Copilot November 2, 2025 09:45
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the aggregate spill implementation by renaming settings and introducing a new spiller architecture with multi-level spill support. The main changes consolidate spill logic, add a maximum recursion depth setting for aggregate spills, and improve memory management during aggregation.

  • Renamed enable_experiment_aggregate_final to enable_experiment_aggregate with updated default from 0 to 1
  • Added max_aggregate_spill_level setting to limit recursion depth during aggregate spilling
  • Replaced FinalAggregateSpiller with NewAggregateSpiller using parquet-based spilling

Reviewed Changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
settings_getter_setter.rs Renamed getter method to match new setting name and added getter for max spill level
settings_default.rs Renamed setting, changed default to 1, and added max_aggregate_spill_level setting
new_transform_final_aggregate.rs Major refactor to extract state management, improve spilling logic, and add multi-level spill support
new_final_aggregate_state.rs New file containing extracted state management structs (RoundPhase, LocalRoundState, FinalAggregateSharedState)
new_aggregate_spiller.rs New file implementing parquet-based spiller replacing legacy serialization approach
final_aggregate_spiller.rs Deleted file - legacy spiller implementation removed
aggregate_meta.rs Added NewSpilledPayload struct and NewBucketSpilled enum variant
transform_partition_bucket*.rs Added unreachable handlers for NewBucketSpilled variant
payload/partitioned_payload/aggregate_hashtable Added reset_for_reuse methods to support hashtable reuse
build_partition_bucket.rs Updated to use new spiller and pass max_aggregate_spill_level parameter
physical_aggregate_final.rs Updated to use renamed setting

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@dqhl76 dqhl76 added ci-benchmark Benchmark: run all test and removed ci-benchmark Benchmark: run all test labels Nov 2, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Nov 2, 2025

Docker Image for PR

  • tag: pr-18907-07241e7-1762098381

note: this image tag is only available for internal use.

@dqhl76 dqhl76 force-pushed the performance-final-aggregate-31 branch from 7ceabe9 to 4587399 Compare November 4, 2025 06:21
@dqhl76 dqhl76 force-pushed the performance-final-aggregate-31 branch from 4587399 to 1ba503a Compare November 4, 2025 06:35
@dqhl76 dqhl76 added ci-benchmark Benchmark: run all test and removed ci-benchmark Benchmark: run all test labels Nov 4, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Nov 4, 2025

Docker Image for PR

  • tag: pr-18907-af8289a-1762247328

note: this image tag is only available for internal use.

@dqhl76 dqhl76 added ci-benchmark Benchmark: run all test and removed ci-benchmark Benchmark: run all test labels Nov 4, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Nov 4, 2025

Docker Image for PR

  • tag: pr-18907-39ec537-1762267174

note: this image tag is only available for internal use.

@dqhl76 dqhl76 marked this pull request as ready for review November 5, 2025 01:09
@dqhl76 dqhl76 requested a review from zhang2014 November 5, 2025 01:09
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@zhang2014 zhang2014 merged commit 5b874fa into databendlabs:main Nov 5, 2025
95 of 173 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-refactor this PR changes the code base without new features or bugfix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants