2 changes: 2 additions & 0 deletions docs/source/_toctree.yml
@@ -88,6 +88,8 @@
title: Load document data
- local: document_dataset
title: Create a document dataset
- local: nifti_dataset
title: Create a medical imaging dataset
title: "Vision"
- sections:
- local: nlp_load
130 changes: 130 additions & 0 deletions docs/source/nifti_dataset.mdx
@@ -0,0 +1,130 @@
# Create a NIfTI dataset

This page shows how to create and share a dataset of medical images in NIfTI format (.nii / .nii.gz) using the `datasets` library.

You can share a dataset with your team or with anyone in the community by creating a dataset repository on the Hugging Face Hub. Once it is uploaded, anyone can load it with `load_dataset`:

```py
from datasets import load_dataset

dataset = load_dataset("<username>/my_nifti_dataset")
```

There are two common ways to create a NIfTI dataset:

- Create a dataset from local NIfTI files in Python and upload it with `Dataset.push_to_hub`.
- Use a folder-based convention (one file per example) and a small helper to convert it into a `Dataset`.

> [!TIP]
> You can control access to your dataset by requiring users to share their contact information first. Check out the [Gated datasets](https://huggingface.co/docs/hub/datasets-gated) guide for more information.

## Local files

If you already have a list of file paths to NIfTI files, the easiest workflow is to create a `Dataset` from that list and cast the column to the `Nifti` feature.

```py
from datasets import Dataset, Nifti

# simple example: create a dataset from file paths
files = ["/path/to/scan_001.nii.gz", "/path/to/scan_002.nii.gz"]
ds = Dataset.from_dict({"nifti": files}).cast_column("nifti", Nifti())

# access a decoded nibabel image (if decode=True)
# ds[0]["nifti"] will be a nibabel.Nifti1Image object when decode=True
# or a dict {'bytes': None, 'path': '...'} when decode=False
```

The `Nifti` feature supports a `decode` parameter. When `decode=True` (the default), it loads the NIfTI file into a `nibabel.nifti1.Nifti1Image` object. You can access the image data as a numpy array with `img.get_fdata()`. When `decode=False`, it returns a dict with the file path and bytes.

```py
from datasets import Dataset, Nifti

ds = Dataset.from_dict({"nifti": ["/path/to/scan.nii.gz"]}).cast_column("nifti", Nifti(decode=True))
img = ds[0]["nifti"] # instance of: nibabel.nifti1.Nifti1Image
arr = img.get_fdata()
```

After preparing the dataset you can push it to the Hub:

```py
ds.push_to_hub("<username>/my_nifti_dataset")
```

This will create a dataset repository containing your NIfTI dataset with a `data/` folder of parquet shards.

## Folder conventions and metadata

If you organize your dataset in folders you can create splits automatically (train/test/validation) by following a structure like:

```
dataset/train/scan_0001.nii
dataset/train/scan_0002.nii
dataset/validation/scan_1001.nii
dataset/test/scan_2001.nii
```

If you have labels or other metadata, provide a `metadata.csv`, `metadata.jsonl`, or `metadata.parquet` in the folder so files can be linked to metadata rows. The metadata must contain a `file_name` (or `*_file_name`) field with the relative path to the NIfTI file next to the metadata file.

Example `metadata.csv`:

```csv
file_name,patient_id,age,diagnosis
scan_0001.nii.gz,P001,45,healthy
scan_0002.nii.gz,P002,59,disease_x
```

The `Nifti` feature works with archived datasets too: each ZIP archive can contain NIfTI files and a metadata file, which is useful when uploading large datasets. Gzip-compressed (`.nii.gz`) and uncompressed (`.nii`) files can also be mixed freely, so your dataset structure could look like this:
```
dataset/train/scan_0001.nii.gz
dataset/train/scan_0002.nii
dataset/validation/scan_1001.nii.gz
dataset/test/scan_2001.nii
```

## Converting to PyTorch tensors

Use the [`~Dataset.set_transform`] function to apply a transformation on-the-fly to batches of the dataset:

```py
import torch

def transform_to_pytorch(example):
    # decode each nibabel image in the batch into a torch tensor
    example["nifti_torch"] = [torch.tensor(img.get_fdata()) for img in example["nifti"]]
    return example

ds.set_transform(transform_to_pytorch)
```

Accessing elements now (e.g. `ds[0]`) yields torch tensors under the `"nifti_torch"` key.


## Usage of Nifti1Image

NIfTI is a format for storing 3-dimensional (or, with a time axis, 4-dimensional) brain scans: three spatial dimensions (x, y, z) and optionally a time dimension (t). Voxel indices only give positions relative to the scanner grid, so the file also stores an affine transformation that lifts them to real-world coordinates.
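To make the voxel-to-world mapping concrete, here is a minimal sketch using a toy 4x4 affine (the matrix values are made up for illustration; with nibabel you would read the real one from `img.affine`):

```python
import numpy as np

# A NIfTI header carries a 4x4 affine mapping voxel indices (i, j, k)
# to world coordinates (x, y, z) in millimeters.
# Toy affine: 2 mm isotropic voxels with an origin offset.
affine = np.array([
    [2.0, 0.0, 0.0, -90.0],
    [0.0, 2.0, 0.0, -126.0],
    [0.0, 0.0, 2.0, -72.0],
    [0.0, 0.0, 0.0, 1.0],
])

def voxel_to_world(affine, ijk):
    """Map one voxel index to world coordinates via the affine."""
    ijk1 = np.append(np.asarray(ijk, dtype=float), 1.0)  # homogeneous coordinates
    return (affine @ ijk1)[:3]

print(voxel_to_world(affine, (45, 63, 36)))  # the origin voxel for this affine
```

nibabel also provides `nibabel.affines.apply_affine` for applying an affine to whole arrays of coordinates at once.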

You can visualize NIfTI files, for instance with `matplotlib`:
```python
import matplotlib.pyplot as plt
from datasets import load_dataset

def show_slices(slices):
    """Display a row of image slices."""
    fig, axes = plt.subplots(1, len(slices))
    for i, slc in enumerate(slices):
        axes[i].imshow(slc.T, cmap="gray", origin="lower")

nifti_ds = load_dataset("<username>/my_nifti_dataset", split="train")
for example in nifti_ds:
    data = example["nifti"].get_fdata()
    show_slices([data[:, :, 16], data[26, :, :], data[:, 30, :]])
    plt.show()
```

For further reading, see the [nibabel documentation](https://nipy.org/nibabel/index.html), especially the [tutorial on coordinate systems](https://nipy.org/nibabel/coordinate_systems.html).
---
6 changes: 6 additions & 0 deletions docs/source/package_reference/loading_methods.mdx
@@ -103,6 +103,12 @@ load_dataset("csv", data_dir="path/to/data/dir", sep="\t")

[[autodoc]] datasets.packaged_modules.pdffolder.PdfFolder

### Nifti

[[autodoc]] datasets.packaged_modules.niftifolder.NiftiFolderConfig

[[autodoc]] datasets.packaged_modules.niftifolder.NiftiFolder

### WebDataset

[[autodoc]] datasets.packaged_modules.webdataset.WebDataset
4 changes: 4 additions & 0 deletions docs/source/package_reference/main_classes.mdx
@@ -271,6 +271,10 @@ Dictionary with split names as keys ('train', 'test' for example), and `Iterable

[[autodoc]] datasets.Pdf

### Nifti

[[autodoc]] datasets.Nifti

## Filesystems

[[autodoc]] datasets.filesystems.is_remote_filesystem
4 changes: 4 additions & 0 deletions setup.py
@@ -186,6 +186,7 @@
"polars[timezone]>=0.20.0",
"Pillow>=9.4.0", # When PIL.Image.ExifTags was introduced
"torchcodec>=0.7.0", # minimum version to get Windows support
"nibabel>=5.3.1",
]

NUMPY2_INCOMPATIBLE_LIBRARIES = [
@@ -207,6 +208,8 @@

PDFS_REQUIRE = ["pdfplumber>=0.11.4"]

NIBABEL_REQUIRE = ["nibabel>=5.3.2"]

EXTRAS_REQUIRE = {
"audio": AUDIO_REQUIRE,
"vision": VISION_REQUIRE,
@@ -224,6 +227,7 @@
"benchmarks": BENCHMARKS_REQUIRE,
"docs": DOCS_REQUIRE,
"pdfs": PDFS_REQUIRE,
"nibabel": NIBABEL_REQUIRE,
}

setup(
1 change: 1 addition & 0 deletions src/datasets/config.py
@@ -139,6 +139,7 @@
TORCHCODEC_AVAILABLE = importlib.util.find_spec("torchcodec") is not None
TORCHVISION_AVAILABLE = importlib.util.find_spec("torchvision") is not None
PDFPLUMBER_AVAILABLE = importlib.util.find_spec("pdfplumber") is not None
NIBABEL_AVAILABLE = importlib.util.find_spec("nibabel") is not None

# Optional compression tools
RARFILE_AVAILABLE = importlib.util.find_spec("rarfile") is not None
2 changes: 2 additions & 0 deletions src/datasets/features/__init__.py
@@ -15,10 +15,12 @@
"TranslationVariableLanguages",
"Video",
"Pdf",
"Nifti",
]
from .audio import Audio
from .features import Array2D, Array3D, Array4D, Array5D, ClassLabel, Features, LargeList, List, Sequence, Value
from .image import Image
from .nifti import Nifti
from .pdf import Pdf
from .translation import Translation, TranslationVariableLanguages
from .video import Video
6 changes: 6 additions & 0 deletions src/datasets/features/features.py
@@ -42,6 +42,7 @@
from ..utils.py_utils import asdict, first_non_null_value, zip_dict
from .audio import Audio
from .image import Image, encode_pil_image
from .nifti import Nifti
from .pdf import Pdf, encode_pdfplumber_pdf
from .translation import Translation, TranslationVariableLanguages
from .video import Video
@@ -1270,6 +1271,7 @@ def __repr__(self):
Image,
Video,
Pdf,
Nifti,
]


@@ -1428,6 +1430,7 @@ def decode_nested_example(schema, obj, token_per_repo_id: Optional[dict[str, Uni
Image.__name__: Image,
Video.__name__: Video,
Pdf.__name__: Pdf,
Nifti.__name__: Nifti,
}


@@ -1761,6 +1764,9 @@ class Features(dict):
- [`Pdf`] feature to store the absolute path to a PDF file, a `pdfplumber.pdf.PDF` object
or a dictionary with the relative path to a PDF file ("path" key) and its bytes content ("bytes" key).
This feature loads the PDF lazily with a PDF reader.
- [`Nifti`] feature to store the absolute path to a NIfTI neuroimaging file, a `nibabel.Nifti1Image` object
or a dictionary with the relative path to a NIfTI file ("path" key) and its bytes content ("bytes" key).
This feature loads the NIfTI file lazily with nibabel.
- [`Translation`] or [`TranslationVariableLanguages`] feature specific to Machine Translation.
"""
