cloudnativegeo · wildintellect · Jun 10, 2025 · May 20, 2025 · May 21, 2025 · May 21, 2025
diff --git a/flatgeobuf/environment.yml b/flatgeobuf/environment.yml
@@ -3,8 +3,8 @@ channels:
   - conda-forge
 dependencies:
   - python=3.11
-  - geopandas==0.13.2
-  - pyogrio==0.6.0
+  - geopandas==1.0.1
+  - pyogrio==0.11.0
   - ipykernel
   - jupyterlab
   - pyarrow
diff --git a/flatgeobuf/flatgeobuf.ipynb b/flatgeobuf/flatgeobuf.ipynb
@@ -15,15 +15,13 @@
    "source": [
     "The primary way to interact with FlatGeobuf in Python is via bindings to GDAL, as there is no pure-Python implementation of FlatGeobuf.\n",
     "\n",
-    "There are two different Python libraries for interacting between Python and GDAL's vector support: `fiona` and `pyogrio`. Both of these are integrated into [`geopandas.read_file`](https://geopandas.org/en/stable/docs/reference/api/geopandas.read_file.html) via the `engine` keyword, but `pyogrio` is much faster. Set `engine=\"pyogrio\"` when using `read_file` or [`GeoDataFrame.to_file`](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.to_file.html) to speed up reading and writing significantly. We also suggest passing `use_arrow=True` when reading for a slight extra speedup (this is not supported when writing).\n",
+    "There are two Python libraries that bind to GDAL's vector support: `fiona` and `pyogrio`. Both are integrated into [`geopandas.read_file`](https://geopandas.org/en/stable/docs/reference/api/geopandas.read_file.html) via the `engine` keyword, but `pyogrio` is much faster. From `geopandas` version 1.0.0, the default is `pyogrio`. For prior `geopandas` versions, set `engine=\"pyogrio\"` when using `read_file` or [`GeoDataFrame.to_file`](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.to_file.html) to speed up reading and writing significantly. We also suggest passing `use_arrow=True` when reading for a slight extra speedup (this is not supported when writing).\n",
     "\n",
     "::: {.callout-note}\n",
     "\n",
-    "[`fiona`](https://github.com/Toblerity/Fiona) is the default engine for `geopandas.read_file`. It provides full-featured bindings to GDAL but does not implement _vectorized_ operations. [Vectorization](https://wesmckinney.com/book/numpy-basics#ndarray_binops) refers to operating on whole arrays of data at once rather than operating on individual values using a Python for loop. `fiona`'s non-vectorized approach means that each row of the source file is read individually with Python, and a Python for loop. In contrast, [`pyogrio`](https://github.com/geopandas/pyogrio)'s vectorized implementation reads all rows in C before passing the data to Python, allowing it to achieve vast speedups (up to 40x) over `fiona`.\n",
+    "[`fiona`](https://github.com/Toblerity/Fiona) was the default engine for `geopandas.read_file` prior to `geopandas` version 1.0.0. It provides full-featured bindings to GDAL but does not implement _vectorized_ operations. [Vectorization](https://wesmckinney.com/book/numpy-basics#ndarray_binops) refers to operating on whole arrays of data at once rather than operating on individual values using a Python for loop. `fiona`'s non-vectorized approach means that each row of the source file is read individually in a Python for loop. In contrast, [`pyogrio`](https://github.com/geopandas/pyogrio)'s vectorized implementation reads all rows in C before passing the data to Python, allowing it to achieve vast speedups (up to 40x) over `fiona`.\n",
     "\n",
-    "You can opt in to using `pyogrio` with `geopandas.read_file` by passing `engine=\"pyogrio\"`.\n",
-    "\n",
-    "Additionally, if you're using GDAL version 3.6 or higher (usually the case when using pyogrio), you can pass `use_arrow=True` to `geopandas.read_file` to use `pyogrio`'s support for [GDAL's RFC 86](https://gdal.org/development/rfc/rfc86_column_oriented_api.html), which speeds up data reading even more.\n",
+    "If you're using GDAL version 3.6 or higher (usually the case when using pyogrio), you can pass `use_arrow=True` to `geopandas.read_file` to use `pyogrio`'s support for [GDAL's RFC 86](https://gdal.org/development/rfc/rfc86_column_oriented_api.html), which speeds up data reading even more.\n",
     "\n",
     ":::"
    ]
@@ -60,7 +58,13 @@
     "Alternatively, you can install the versions of `pyogrio` and `geopandas` used in this notebook with pip:\n",
     "\n",
     "```bash\n",
-    "pip install pyogrio==0.6.0 geopandas==0.13.2\n",
+    "pip install pyogrio==0.11.0 geopandas==1.0.1\n",
+    "```\n",
+    "\n",
+    "To reproduce the comparisons between `fiona` and `pyogrio` below, you also need to install `fiona` separately. If you prefer to run only the (faster) `pyogrio` examples, you can skip the `fiona` install and the `fiona` examples altogether.\n",
+    "\n",
+    "```bash\n",
+    "pip install fiona==1.10.1\n",
     "```"
    ]
   },
@@ -115,7 +119,7 @@
    "source": [
     "In each of the cases below, we use `geopandas.read_file` to read the file into a `GeoDataFrame`.\n",
     "\n",
-    "First we'll show that reading this file with `engine=\"fiona\"` (the default) is slower. Taking an extra 500 milliseconds might not seem like a lot, but this file contains only 3,000 rows, so this difference gets magnified with larger files."
+    "First we'll show that reading this file with `engine=\"fiona\"` is slower. Taking an extra couple of seconds might not seem like a lot, but this file contains only 3,000 rows, so this difference gets magnified with larger files."
    ]
   },
   {
@@ -127,8 +131,8 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "CPU times: user 565 ms, sys: 16.9 ms, total: 582 ms\n",
-      "Wall time: 685 ms\n"
+      "CPU times: user 1.85 s, sys: 34.7 ms, total: 1.88 s\n",
+      "Wall time: 2.35 s\n"
      ]
     }
    ],
@@ -140,7 +144,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Passing `engine=\"pyogrio\"` speeds up loading by 18x here!"
+    "Using `engine=\"pyogrio\"` (the default since `geopandas` 1.0.0) speeds up loading by 22x here!"
    ]
   },
   {
@@ -152,8 +156,8 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "CPU times: user 25.3 ms, sys: 6.84 ms, total: 32.1 ms\n",
-      "Wall time: 31.3 ms\n"
+      "CPU times: user 69.6 ms, sys: 15.9 ms, total: 85.5 ms\n",
+      "Wall time: 89.6 ms\n"
      ]
     }
    ],
@@ -165,7 +169,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Using `use_arrow=True` often makes loading slightly faster again! We're now 21x faster than using fiona."
+    "Using `use_arrow=True` often makes loading slightly faster again! Measured by total CPU time, we're now 24x faster than using `fiona`."
    ]
   },
   {
@@ -177,8 +181,8 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "CPU times: user 19.7 ms, sys: 10.1 ms, total: 29.7 ms\n",
-      "Wall time: 29.1 ms\n"
+      "CPU times: user 48 ms, sys: 30.7 ms, total: 78.8 ms\n",
+      "Wall time: 118 ms\n"
      ]
     }
    ],
@@ -206,13 +210,13 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "CPU times: user 362 ms, sys: 44.4 ms, total: 407 ms\n",
-      "Wall time: 418 ms\n"
+      "CPU times: user 875 ms, sys: 53.2 ms, total: 928 ms\n",
+      "Wall time: 944 ms\n"
      ]
     }
    ],
    "source": [
-    "%time gdf.to_file(f\"{tmpdir.name}/out_fiona.fgb\")"
+    "%time gdf.to_file(f\"{tmpdir.name}/out_fiona.fgb\", engine=\"fiona\")"
    ]
   },
   {
@@ -224,8 +228,8 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "CPU times: user 60.8 ms, sys: 23.4 ms, total: 84.2 ms\n",
-      "Wall time: 83.5 ms\n"
+      "CPU times: user 62.9 ms, sys: 13.9 ms, total: 76.8 ms\n",
+      "Wall time: 76.8 ms\n"
      ]
     }
    ],
@@ -250,7 +254,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "url = \"https://data.source.coop/cholmes/eurocrops/unprojected/flatgeobuf/FR_2018_EC21.fgb\""
+    "url = \"https://s3.us-west-2.amazonaws.com/us-west-2.opendata.source.coop/cholmes/eurocrops/unprojected/flatgeobuf/FR_2018_EC21.fgb\""
    ]
   },
   {
@@ -268,19 +272,28 @@
     {
      "data": {
       "text/plain": [
-       "{'crs': 'EPSG:4326',\n",
+       "{'layer_name': 'FR_2018_EC21',\n",
+       " 'crs': 'EPSG:4326',\n",
        " 'encoding': 'UTF-8',\n",
        " 'fields': array(['ID_PARCEL', 'SURF_PARC', 'CODE_CULTU', 'CODE_GROUP', 'CULTURE_D1',\n",
        "        'CULTURE_D2', 'EC_org_n', 'EC_trans_n', 'EC_hcat_n', 'EC_hcat_c'],\n",
        "       dtype=object),\n",
        " 'dtypes': array(['object', 'float64', 'object', 'object', 'object', 'object',\n",
        "        'object', 'object', 'object', 'object'], dtype=object),\n",
+       " 'fid_column': '',\n",
+       " 'geometry_name': '',\n",
        " 'geometry_type': 'MultiPolygon',\n",
        " 'features': 9517874,\n",
+       " 'total_bounds': (-6.047022416643922,\n",
+       "  -3.916364769838749,\n",
+       "  68.89050422648864,\n",
+       "  51.075100624023094),\n",
        " 'driver': 'FlatGeobuf',\n",
-       " 'capabilities': {'random_read': 1,\n",
-       "  'fast_set_next_by_index': 0,\n",
-       "  'fast_spatial_filter': 1},\n",
+       " 'capabilities': {'random_read': True,\n",
+       "  'fast_set_next_by_index': False,\n",
+       "  'fast_spatial_filter': True,\n",
+       "  'fast_feature_count': True,\n",
+       "  'fast_total_bounds': True},\n",
        " 'layer_metadata': None,\n",
        " 'dataset_metadata': None}"
       ]
@@ -328,13 +341,13 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "CPU times: user 144 ms, sys: 21.4 ms, total: 165 ms\n",
-      "Wall time: 6 s\n"
+      "CPU times: user 195 ms, sys: 20.6 ms, total: 216 ms\n",
+      "Wall time: 2.51 s\n"
      ]
     }
    ],
    "source": [
-    "%time crops_gdf = gpd.read_file(url, bbox=bounds)"
+    "%time crops_gdf = gpd.read_file(url, bbox=bounds, engine=\"fiona\")"
    ]
   },
   {
@@ -353,8 +366,8 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "CPU times: user 26.9 ms, sys: 2.98 ms, total: 29.9 ms\n",
-      "Wall time: 490 ms\n"
+      "CPU times: user 26.4 ms, sys: 20 ms, total: 46.4 ms\n",
+      "Wall time: 1.91 s\n"
      ]
     }
    ],
@@ -565,8 +578,8 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "CPU times: user 25 ms, sys: 2.47 ms, total: 27.4 ms\n",
-      "Wall time: 706 ms\n"
+      "CPU times: user 12.9 ms, sys: 0 ns, total: 12.9 ms\n",
+      "Wall time: 155 ms\n"
      ]
     }
    ],
@@ -671,13 +684,6 @@
    "source": [
     "crops_gdf.head()"
    ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": []
   }
  ],
  "metadata": {
@@ -696,7 +702,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.11.4"
+   "version": "3.13.1"
   },
   "orig_nbformat": 4
  },

diff --git a/images/data-structures.png b/images/data-structures.png
diff --git a/images/virtual-zarr.png b/images/virtual-zarr.png