A parser for creating virtual Zarr stores from TIFF files using VirtualiZarr 2.0 and async-tiff.
First, some thoughts on why we should virtualize GeoTIFFs and/or COGs:
- Provide faster access to non-cloud-optimized GeoTIFFs that contain some form of internal tiling, without any data duplication (see notebook #1).
- Provide fully async I/O for both GeoTIFFs and COGs using Zarr-Python
- Allow loading a stack of GeoTIFFs/COGs into a data cube while minimizing the number of GET requests relative to using stackstac/odc-stac, thereby decreasing cost and increasing performance
- Provide users access to a lazily loaded DataTree providing both the data and the overviews, allowing scientists to use the overviews not only for tile-based visualization but also for quickly iterating on analytics
- Include ETags in the virtualized datasets to support reproducibility
- A motivation that's less clear to me, but maybe possible, is using the virtualization layer to access COGs with disparate CRSs as a single dataset (zarr-developers/geozarr-spec#53)
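
To make the "no data duplication" point concrete, here is a minimal conceptual sketch of what virtualizing a tiled TIFF amounts to: each internal tile becomes a Zarr chunk that is described only by a byte range in the original file. This is not how virtual-tiff or VirtualiZarr implement it. The function name `tile_manifest`, the `"row.col"` key format, the example path, and all offsets and byte counts are hypothetical; only the TIFF `TileOffsets` and `TileByteCounts` tags and their row-major tile ordering come from the TIFF format itself.

```python
import math


def tile_manifest(path, image_shape, tile_shape, tile_offsets, tile_byte_counts):
    """Map each tile of a tiled TIFF to a (path, offset, length) byte-range reference.

    Illustrative only: a real parser reads the offsets and byte counts from the
    TileOffsets and TileByteCounts tags of the chosen IFD.
    """
    # TIFF tiles are padded to full size, so the grid is the ceiling of the division.
    rows = math.ceil(image_shape[0] / tile_shape[0])
    cols = math.ceil(image_shape[1] / tile_shape[1])
    if len(tile_offsets) != rows * cols or len(tile_byte_counts) != rows * cols:
        raise ValueError("expected one offset and one byte count per tile")

    manifest = {}
    # Tiles are listed left to right, then top to bottom (row-major order).
    for index, (offset, length) in enumerate(zip(tile_offsets, tile_byte_counts)):
        row, col = divmod(index, cols)
        manifest[f"{row}.{col}"] = {"path": path, "offset": offset, "length": length}
    return manifest


# Hypothetical values for a 1024 x 1024 single-band image with 512 x 512 tiles.
manifest = tile_manifest(
    path="s3://example-bucket/example.tif",
    image_shape=(1024, 1024),
    tile_shape=(512, 512),
    tile_offsets=[4096, 200000, 410000, 600000],
    tile_byte_counts=[195000, 209000, 189000, 201000],
)
```

Reading chunk `"1.0"` then only requires a ranged GET for the bytes recorded under that key in the original file; the pixel data itself is never copied.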
The library can be installed from PyPI:
python -m pip install virtual-tiff

You can use Virtual TIFF to load data directly:
import obstore
from virtualizarr.registry import ObjectStoreRegistry
from virtual_tiff import VirtualTIFF
import xarray as xr
# Configuration
bucket_url = "s3://e84-earth-search-sentinel-data/"
file_url = f"{bucket_url}sentinel-2-c1-l2a/10/T/FR/2023/12/S2B_T10TFR_20231223T190950_L2A/B04.tif"
# Setup and open dataset
s3_store = obstore.store.from_url(bucket_url, region="us-west-2", skip_signature=True)
registry = ObjectStoreRegistry({bucket_url: s3_store})
parser = VirtualTIFF(ifd=0)
manifest_store = parser(url=file_url, registry=registry)
ds = xr.open_zarr(manifest_store, zarr_format=3, consolidated=False)
ds.load()

Or create a virtual dataset:
import obstore
from virtualizarr import open_virtual_dataset
from virtualizarr.registry import ObjectStoreRegistry
from virtual_tiff import VirtualTIFF
# Configuration
bucket_url = "s3://e84-earth-search-sentinel-data/"
file_url = f"{bucket_url}sentinel-2-c1-l2a/10/T/FR/2023/12/S2B_T10TFR_20231223T190950_L2A/B04.tif"
# Setup and open dataset
s3_store = obstore.store.from_url(bucket_url, region="us-west-2", skip_signature=True)
registry = ObjectStoreRegistry({bucket_url: s3_store})
ds = open_virtual_dataset(
    url=file_url,
    registry=registry,
    parser=VirtualTIFF(ifd=0),
)

- Clone the repository:
git clone https://github.com/virtual-zarr/virtual-tiff.git
- Pull baseline image data from the DVC remote:
pixi run -e test download-test-images
WARNING: This will download ~1.4GB of TIFFs for testing to your machine.
- Run the test suite:
pixi run -e test run-tests
WARNING: Some tests will fail because the implementation is incomplete.
- Start a shell in the development environment, if needed:
pixi run -e test zsh

virtual-tiff is distributed under the terms of the MIT license.