Conversation
Signed-off-by: Ethan Urbanski <[email protected]>
Force-pushed from cc6a80e to 2faa989
Codecov Report ✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
##             main    #3923      +/-   ##
==========================================
+ Coverage   73.91%   73.93%   +0.02%
==========================================
  Files         152      153       +1
  Lines       39465    39455      -10
==========================================
+ Hits        29170    29172       +2
+ Misses       8971     8960      -11
+ Partials     1324     1323       -1
LGTM, can you fix the markdown links check and add tests for each store?
Signed-off-by: Ethan Urbanski <[email protected]>
@ion-elgreco Fixed both items:
Lmk if you’d like any other schemes covered.
# Description
As of today, the Rust meta-crate re-exports the storage crates but never
calls their `register_handlers` helpers. That means every Rust binary
still has to remember to call `deltalake::gcp::register_handlers(None)`
(and the equivalents for S3/Azure/etc.) before using cloud URIs, even
though the Python bindings auto-register. This PR brings the meta-crate
to parity so `DeltaOps::try_from_uri("gs://…")` works out of the box when
the `gcs` feature is enabled.
# Problem
- Users of the `deltalake` crate must manually register each storage
backend before working with `gs://`, `s3://`, `abfss://`, etc.
- Forgetting the call leads to
`DeltaTableError::InvalidTableLocation("Unknown scheme: gs")`, which
blocks workflows like DataFusion writers on GCS.
- Docs/examples didn’t make it obvious when manual registration was
still required.
# Solution
- Add feature-gated ctor hooks in `crates/deltalake/src/lib.rs` that call
`register_handlers(None)` for AWS, Azure, GCS, HDFS, LakeFS, and Unity as
soon as their features are enabled.
- Pull in the lightweight `ctor = "0.2"` dependency so the hooks run at
startup.
- Add a small regression test that exercises
`DeltaTableBuilder::from_uri("gs://…")` with the `gcs` feature enabled.
- Update the GCS integration docs and changelog to explain that the
meta-crate now auto-registers backends while `deltalake-core` users still
need to call the storage crates explicitly.
# Changes
- `crates/deltalake/src/lib.rs`: new `#[ctor::ctor]` modules for s3, azure,
gcs, hdfs, lakefs, and unity.
- `crates/deltalake/Cargo.toml`: add the `ctor` dependency.
- `crates/deltalake/tests/gcs_auto_registration.rs`: new smoke tests for
`gs://` URI recognition when the `gcs` feature is enabled.
- `docs/integrations/object-storage/gcs.md` & `CHANGELOG.md`: document the
auto-registration behavior.
# Testing
- `cargo check -p deltalake --all-features`
- `cargo test -p deltalake --features gcs`
- `cargo test --test gcs_auto_registration --features gcs`
- `cargo build --example pharma_pipeline_gcs --features gcs,datafusion`
# Documentation
- `docs/integrations/object-storage/gcs.md`
- `CHANGELOG.md`
---------
Signed-off-by: Ethan Urbanski <[email protected]>
Signed-off-by: Stephen Carman <[email protected]>