CX-39170: DF51 / Arrow 57 upgrade#420

Draft
avantgardnerio wants to merge 10 commits into 51_base from brent/df51

@avantgardnerio

Summary

  • Point arrow/parquet deps at CX arrow-rs fork (rev 7d5c1c973, branch brent/arrow57)
  • CI cleanup: remove push trigger, delete unused workflows, trim feature checks
  • Update PR template

Test plan

  • CI green on this PR
  • Cherry-pick CX patches from v50.3
  • DQE builds against this branch

🤖 Generated with Claude Code

- Point arrow/parquet deps at CX fork (rev 7d5c1c973)
- Relax object_store version (>=0.12.4, <0.13)
- CI cleanup: remove push trigger, delete unused workflows, trim feature checks
- Update PR template

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Comment thread Cargo.toml Outdated
] }
apache-avro = { version = "0.20", default-features = false }
-arrow = { version = "57.0.0", features = [
+arrow = { git = "https://github.com/Coralogix/arrow-rs.git", rev = "7d5c1c973", features = [

We don't need to do this, right? We override the arrow version in DQE anyway.

Author

Then CI doesn't test datafusion against our fork changes, which is the purpose of this PR. Does anything in coralogix use datafusion other than DQE?

Not that I know of

Combines three fork-only commits from v49:
- Hook for doing distributed CollectLeft joins (#269/apache#12523)
- Add JoinContext with JoinLeftData to TaskContext in HashJoinExec (#300)
- Make HASH_JOIN_SEED public (fork-only)

Adds SharedJoinState/SharedJoinStateImpl trait for distributed probe
coordination, JoinContext for sharing build-side state via TaskContext,
contains_hash on JoinHashMapType, and converts process_unmatched_build_batch
to async for shared state polling.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
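
The shared-state coordination described in this commit message can be illustrated with a minimal, std-only sketch. Everything in it is hypothetical: the trait `SharedJoinStateSketch`, its two methods, and `InMemoryJoinState` are illustrative names, not the fork's actual `SharedJoinState`/`SharedJoinStateImpl` API. The sketch assumes the problem being solved is that several probe partitions share one build side, and unmatched build-side rows should be emitted once, by whichever probe partition finishes last.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical stand-in for a shared join-state hook (not the fork's real trait).
pub trait SharedJoinStateSketch: Send + Sync {
    /// Record build-side row indices that matched in some probe partition.
    fn mark_visited(&self, build_rows: &[usize]);

    /// Report that one probe partition is done. Returns the unmatched
    /// build-side rows if the caller was the last partition to finish,
    /// otherwise `None`.
    fn finish_probe(&self) -> Option<Vec<usize>>;
}

/// Single-process implementation; a distributed one would presumably
/// consult a remote coordinator instead (assumption).
pub struct InMemoryJoinState {
    visited: Mutex<Vec<bool>>,
    remaining_probes: AtomicUsize,
}

impl InMemoryJoinState {
    pub fn new(build_rows: usize, probe_partitions: usize) -> Self {
        Self {
            visited: Mutex::new(vec![false; build_rows]),
            remaining_probes: AtomicUsize::new(probe_partitions),
        }
    }
}

impl SharedJoinStateSketch for InMemoryJoinState {
    fn mark_visited(&self, build_rows: &[usize]) {
        let mut visited = self.visited.lock().unwrap();
        for &row in build_rows {
            // Out-of-range indices are ignored in this sketch.
            if let Some(slot) = visited.get_mut(row) {
                *slot = true;
            }
        }
    }

    fn finish_probe(&self) -> Option<Vec<usize>> {
        // `fetch_sub` returns the previous value; 1 means we were the last.
        if self.remaining_probes.fetch_sub(1, Ordering::AcqRel) != 1 {
            return None;
        }
        let visited = self.visited.lock().unwrap();
        Some(
            visited
                .iter()
                .enumerate()
                .filter(|(_, seen)| !**seen)
                .map(|(row, _)| row)
                .collect(),
        )
    }
}
```

With four build rows and two probe partitions, the first `finish_probe` call returns `None` and the second returns the rows no partition marked. The sketch does not guard against calling `finish_probe` more often than there are partitions.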
fsdvh and others added 2 commits April 30, 2026 12:24
* ignore writer shutdown error

* cargo check

---
[Cherry-pick summary: v46→v47]
Source commit: eaf5520 (Ignore writer shutdown error (#271))
Strategy: cherry-picked cleanly
Upstream PR: fork-only
Test coverage: insufficient (no dedicated unit test for this error path; behaviour is a runtime edge case)
Tests: cargo nextest run -p datafusion-datasource passed

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
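
Since the summary notes that this error path has no dedicated unit test, the following is a rough sketch of what "ignore writer shutdown error" could look like and how it might be exercised. The `SinkWriter` trait, `finalize` function, and `FlakyWriter` test double are hypothetical and synchronous for brevity; they are not the code in `datafusion-datasource`, which is presumably async.

```rust
use std::io;

/// Hypothetical writer interface used only for this illustration.
pub trait SinkWriter {
    fn flush(&mut self) -> io::Result<()>;
    fn shutdown(&mut self) -> io::Result<()>;
}

/// Flush errors still propagate, but a shutdown error is swallowed.
/// The returned bool says whether shutdown succeeded, so a caller
/// could log the failure instead of failing the whole write.
pub fn finalize<W: SinkWriter>(writer: &mut W) -> io::Result<bool> {
    writer.flush()?;
    Ok(writer.shutdown().is_ok())
}

/// Test double that can be told to fail either step.
pub struct FlakyWriter {
    pub fail_flush: bool,
    pub fail_shutdown: bool,
}

impl SinkWriter for FlakyWriter {
    fn flush(&mut self) -> io::Result<()> {
        if self.fail_flush {
            Err(io::Error::new(io::ErrorKind::Other, "flush failed"))
        } else {
            Ok(())
        }
    }

    fn shutdown(&mut self) -> io::Result<()> {
        if self.fail_shutdown {
            Err(io::Error::new(io::ErrorKind::Other, "shutdown failed"))
        } else {
            Ok(())
        }
    }
}
```

The design choice shown here is to treat data already flushed as durable, so a failure while closing the writer does not fail the query.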
---
[Cherry-pick summary: v46→v47]
Source commit: 4fff23e (Disable grouping set in CSE (fork only))
Strategy: cherry-picked cleanly
Upstream PR: fork-only
Test coverage: insufficient (no dedicated test for this early-return path; the change prevents a panic/incorrect optimization with GroupingSet expressions)
Tests: cargo nextest run -p datafusion-optimizer passed (579 tests)

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
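
The early-return path mentioned above can be sketched on a toy expression type. This is an assumption-laden illustration, not the optimizer's code: the `Expr` enum, `contains_grouping_set`, and `cse_candidates` are hypothetical, and the duplicate detection is deliberately naive. It only shows the shape of the guard: if any input expression contains a grouping set, common subexpression elimination is skipped entirely.

```rust
/// Toy expression tree (hypothetical; far smaller than DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Add(Box<Expr>, Box<Expr>),
    GroupingSet(Vec<Expr>),
}

/// True if the expression is, or contains, a grouping set.
pub fn contains_grouping_set(expr: &Expr) -> bool {
    match expr {
        Expr::Column(_) => false,
        Expr::Add(l, r) => contains_grouping_set(l) || contains_grouping_set(r),
        Expr::GroupingSet(_) => true,
    }
}

/// Collect every `Add` node, including nested ones, in visit order.
fn collect_adds(expr: &Expr, out: &mut Vec<Expr>) {
    if let Expr::Add(l, r) = expr {
        out.push(expr.clone());
        collect_adds(l, out);
        collect_adds(r, out);
    }
}

/// Return `Add` subexpressions that occur more than once, or nothing
/// at all when a grouping set is present (the early return).
pub fn cse_candidates(exprs: &[Expr]) -> Vec<Expr> {
    if exprs.iter().any(contains_grouping_set) {
        return Vec::new();
    }
    let mut seen = Vec::new();
    for expr in exprs {
        collect_adds(expr, &mut seen);
    }
    let mut duplicates = Vec::new();
    for (i, expr) in seen.iter().enumerate() {
        if seen[..i].contains(expr) && !duplicates.contains(expr) {
            duplicates.push(expr.clone());
        }
    }
    duplicates
}
```

A dedicated test for the real change would presumably follow the same pattern: one plan with a repeated subexpression and no grouping set, and one identical plan with a grouping set added, asserting that the second is left unchanged.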
avantgardnerio and others added 3 commits April 30, 2026 13:13
apache#20063) v53

Extends v49 cherry-pick a296c12 (decode-only) with the encode-side fix:
seed DictionaryTracker via schema_to_bytes_with_dictionary_tracker before
encoded_batch, so IPC has dict IDs for nested dictionary arrays.

Adapted from upstream apache#20063 (which targets arrow 57's new encode API; we
retain the arrow-56 encoded_batch API and just add the seed call).
@github-actions github-actions Bot added the core label Apr 30, 2026
@avantgardnerio avantgardnerio force-pushed the brent/df51 branch 4 times, most recently from 5717991 to 532c45a Compare May 4, 2026 17:03
Also makes topk module public for downstream access to TopKDynamicFilters.
7 participants