Skip to content

Conversation

@peasee
Copy link

@peasee peasee commented Nov 26, 2025

  • Updates CollectLeftAccumulator struct to MinMaxLeftAccumulator
  • Creates the CollectLeftAccumulator and implements it for MinMaxLeftAccumulator
  • Updates HashJoinExec::try_new to set a default accumulator for MinMaxLeftAccumulator. This ensures the refactor is not a breaking change by defaulting the generic type.
  • Updates ColumnBounds to MinMaxColumnBounds
  • Creates the ColumnBounds trait which returns Arc<dyn PhysicalExpr> representing the expression for the bounds returned by the accumulator. Implements it for MinMaxColumnBounds

This refactor maintains existing min-max bounds functionality, while providing the ability for consumers to override HashJoinExec's with custom accumulators for their specific use case.

  • Makes some structs and methods from pruning public, to make it easier to build custom pruners.

@peasee peasee self-assigned this Nov 26, 2025
@peasee peasee added the enhancement New feature or request label Nov 26, 2025
@peasee
Copy link
Author

peasee commented Nov 26, 2025

Physical planner tests - failure is unrelated to changes, and due to feature flagging:

---- spill::tests::test_spill_compression stdout ----

thread 'spill::tests::test_spill_compression' panicked at /home/owner/.cargo/git/checkouts/arrow-rs-ddec0d7286d46bb5/a55ba62/arrow-ipc/src/writer.rs:1134:14:
StreamWriter is configured to not error on dictionary replacement: InvalidArgumentError("zstd IPC compression requires the zstd feature")


failures:
    spill::tests::test_spill_compression

test result: FAILED. 920 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 9.08s

@peasee peasee changed the title Peasee backport 2 refactor: Make CollectLeftAccumulator a trait for custom left-side accumulators Nov 26, 2025
@peasee peasee requested a review from Copilot November 26, 2025 01:00
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the hash join bounds accumulation system to support custom accumulator implementations through traits. The refactor maintains backward compatibility by providing a default MinMaxLeftAccumulator implementation while enabling consumers to implement custom accumulator logic.

Key changes:

  • Introduces CollectLeftAccumulator and ColumnBounds traits for extensibility
  • Renames CollectLeftAccumulator struct to MinMaxLeftAccumulator and ColumnBounds struct to MinMaxColumnBounds
  • Makes HashJoinExec generic over the accumulator type with MinMaxLeftAccumulator as the default
  • Exposes previously internal types and functions through public APIs to support custom implementations

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
datafusion/pruning/src/pruning_predicate.rs Exposes internal utilities (BoolVecBuilder, build_statistics_record_batch, StatisticsType) as public APIs
datafusion/pruning/src/lib.rs Re-exports newly public types from pruning_predicate module
datafusion/physical-plan/src/joins/mod.rs Re-exports new trait types and renamed accumulator from hash_join module
datafusion/physical-plan/src/joins/hash_join/shared_bounds.rs Introduces ColumnBounds trait, renames ColumnBounds struct to MinMaxColumnBounds, updates types to use trait objects
datafusion/physical-plan/src/joins/hash_join/mod.rs Re-exports new public types (CollectLeftAccumulator, MinMaxLeftAccumulator, ColumnBounds)
datafusion/physical-plan/src/joins/hash_join/exec.rs Makes HashJoinExec generic over accumulator type, introduces CollectLeftAccumulator trait, renames struct to MinMaxLeftAccumulator, adds try_new_with_accumulator method

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@peasee peasee requested review from a team and phillipleblanc November 26, 2025 01:03
@peasee peasee merged commit d4649aa into spiceai-50 Nov 26, 2025
@peasee peasee deleted the peasee-backport-2 branch November 26, 2025 03:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants