@ethan-tyler
Contributor

@ethan-tyler ethan-tyler commented Nov 7, 2025

Description

As of today, the Rust meta-crate re-exports the storage crates but never calls their register_handlers helpers. That means every Rust binary still has to remember to call deltalake::gcp::register_handlers(None) (and the equivalents for S3/Azure/etc.) before using cloud URIs, even though the Python bindings auto-register. This PR brings the meta-crate to parity so DeltaOps::try_from_uri("gs://…") works out of the box when the gcs feature is enabled.

Problem

  • Users of the deltalake crate must manually register each storage backend before working with gs://, s3://, abfss://, etc.

  • Forgetting the call leads to DeltaTableError::InvalidTableLocation("Unknown scheme: gs"), which blocks workflows like DataFusion writers on GCS.

  • Docs/examples didn’t make it obvious when manual registration was still required.
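The failure mode above can be sketched with a small stand-in registry. This is a simplified model, not delta-rs code: `HandlerRegistry`, `register`, and `resolve` are hypothetical names, and the error string only mirrors the message quoted in this PR.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified stand-in for a URI-scheme handler registry.
/// The real registry in delta-rs is not shown in this PR.
#[derive(Default)]
struct HandlerRegistry {
    schemes: HashSet<String>,
}

impl HandlerRegistry {
    /// Record that a handler exists for `scheme` (for example "gs").
    fn register(&mut self, scheme: &str) {
        self.schemes.insert(scheme.to_string());
    }

    /// Return the scheme of `uri` if a handler was registered for it,
    /// otherwise an error message shaped like the one quoted in this PR.
    fn resolve(&self, uri: &str) -> Result<String, String> {
        let (scheme, _rest) = uri
            .split_once("://")
            .ok_or_else(|| format!("Not a URI: {uri}"))?;
        if self.schemes.contains(scheme) {
            Ok(scheme.to_string())
        } else {
            Err(format!("Unknown scheme: {scheme}"))
        }
    }
}
```

In this model, resolving a `gs://` URI on a fresh registry fails with `Unknown scheme: gs`, and the same call succeeds once `register("gs")` has run, which is the step binaries currently have to remember.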

Solution

  • Add feature-gated ctor hooks in crates/deltalake/src/lib.rs that call register_handlers(None) for AWS, Azure, GCS, HDFS, LakeFS, and Unity at program startup whenever the corresponding Cargo feature is enabled.

  • Pull in the lightweight ctor = "0.2" dependency so the hooks run at startup.

  • Add a small regression test that exercises DeltaTableBuilder::from_uri("gs://…") with the gcs feature enabled.

  • Update the GCS integration docs and changelog to explain that the meta-crate now auto-registers backends while deltalake-core users still need to call the storage crates explicitly.
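For comparison, here is a std-only sketch of one-time registration. It is not what this PR does: the PR uses the ctor crate to run the hooks eagerly at startup, while this sketch swaps in lazy initialization with `std::sync::Once`. `register_handlers_stub`, `ensure_registered`, and `registration_count` are hypothetical names used only for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

// Guards the registration so it runs at most once per process.
static REGISTER_ONCE: Once = Once::new();
// Counts how many times the stub actually ran (for demonstration only).
static REGISTRATIONS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a storage crate's `register_handlers(None)`.
fn register_handlers_stub() {
    REGISTRATIONS.fetch_add(1, Ordering::SeqCst);
}

/// Run the registration on first call; later calls are no-ops.
fn ensure_registered() {
    REGISTER_ONCE.call_once(register_handlers_stub);
}

/// Number of times the stub has run so far.
fn registration_count() -> usize {
    REGISTRATIONS.load(Ordering::SeqCst)
}
```

The trade-off is that a lazy guard like this has to be called from every entry point that needs the handlers, whereas a startup hook needs no call site at all, which is the property this PR is after.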

Changes

  • crates/deltalake/src/lib.rs: new #[ctor::ctor] modules for s3, azure, gcs, hdfs, lakefs, and unity.

  • crates/deltalake/Cargo.toml: add ctor dependency.

  • crates/deltalake/tests/gcs_auto_registration.rs: new smoke tests for gs:// URI recognition when the gcs feature is enabled.

  • docs/integrations/object-storage/gcs.md & CHANGELOG.md: document the auto-registration behavior.

Testing

  • cargo check -p deltalake --all-features
  • cargo test -p deltalake --features gcs
  • cargo test --test gcs_auto_registration --features gcs
  • cargo build --example pharma_pipeline_gcs --features gcs,datafusion

Documentation

  • docs/integrations/object-storage/gcs.md
  • CHANGELOG.md

@github-actions github-actions bot added the binding/rust Issues for the Rust crate label Nov 7, 2025
@ethan-tyler ethan-tyler marked this pull request as draft November 7, 2025 03:06
@ethan-tyler ethan-tyler force-pushed the gcs-auto-registration branch from cc6a80e to 2faa989 Compare November 7, 2025 03:12
@ethan-tyler ethan-tyler marked this pull request as ready for review November 7, 2025 03:26
@ethan-tyler ethan-tyler marked this pull request as draft November 7, 2025 03:42
@ethan-tyler ethan-tyler marked this pull request as ready for review November 7, 2025 05:12
@codecov

codecov bot commented Nov 7, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.93%. Comparing base (16621c4) to head (c1f704f).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3923      +/-   ##
==========================================
+ Coverage   73.91%   73.93%   +0.02%     
==========================================
  Files         152      153       +1     
  Lines       39465    39455      -10     
  Branches    39465    39455      -10     
==========================================
+ Hits        29170    29172       +2     
+ Misses       8971     8960      -11     
+ Partials     1324     1323       -1     


@ion-elgreco
Collaborator

Lgtm, can you fix the markdown links check and add tests for each store

@ethan-tyler
Contributor Author

@ion-elgreco Fixed both items:

  • Added crates/deltalake/tests/storage_auto_registration.rs, which exercises each scheme

  • Updated docs/integrations/object-storage/gcs.md to link to the GitHub changelog directly

Lmk if you’d like any other schemes covered.

@ion-elgreco ion-elgreco merged commit d7dea29 into delta-io:main Nov 8, 2025
29 checks passed
roeap pushed a commit to roeap/delta-rs that referenced this pull request Nov 18, 2025
Signed-off-by: Ethan Urbanski <[email protected]>
hntd187 pushed a commit to hntd187/delta-rs that referenced this pull request Nov 24, 2025
Signed-off-by: Ethan Urbanski <[email protected]>
Signed-off-by: Stephen Carman <[email protected]>