

@The-Obstacle-Is-The-Way

Summary

Single-line fix: `dataobj=nifti_image.get_fdata()` becomes `dataobj=nifti_image.dataobj`.

Preserves nibabel's ArrayProxy for true lazy loading instead of eagerly loading entire NIfTI files into memory.

Why

Discovered while testing the BIDS loader with OpenNeuro data: a corrupted file caused an EOFError with no indication of which file was bad. The eager get_fdata() call was reading voxel data at decode time instead of at access time.

Also opened upstream:

huggingface#7887

lhoestq and others added 30 commits October 9, 2025 09:55
* more parquet stream arg docs

* minor

* minor
less api calls when resolving data_files
* fix polars cast_column issue

* remove debug statements

* cast large_strings to string for image handling
* keep hffs cache in workers when streaming

* bonus: reorder hffs args to improve caching
* Update document_dataset.mdx

* Update document_dataset.mdx OCR
* Add custom suffix support to from_generator

* Renamed a new arg to fingerprint

* Changed name to config_id in builder

* Change version

* Added a test

* Version update

* Update version

* Update tests/test_arrow_dataset.py

* Rename config_id to fingerprint in generator.py

* Apply suggestions from code review

* Update src/datasets/io/generator.py

* Apply suggestions from code review

---------

Co-authored-by: Quentin Lhoest <[email protected]>
* Add nifti support

* update docs

* update nifti after testing locally and from remote hub

* update setup.py to add nibabel and update docs

* add nifti_dataset

* fix nifti dataset documentation

* add nibabel to test dependency

* Add section for creating a medical imaging dataset

---------

Co-authored-by: Quentin Lhoest <[email protected]>
* WIP: shuffle working, interleave_ds not yet

* remove debug statements

* add test

* update test

* use recursive overwriting of generator seeds

* update test description

* remove debugging strings

* return instances of baseexiterable instead of modifying inplace

* add test to make sure multiple iterations over data are deterministic
* fix ci compressionfs

* again

* style
* update signature for _batch_setitems

* arguments passthrough
…huggingface#7831)

* Fix argument passing in stratified shuffle split

NumPy 2.0 changed the behavior of the `copy=False` parameter to be stricter. When `train_test_split` converted Arrow arrays to NumPy format for stratification, it triggered this error for non-contiguous arrays. Using `np.asarray()` allows copying when necessary, which is the recommended migration path per NumPy 2.0 documentation.

* make style

---------

Co-authored-by: Quentin Lhoest <[email protected]>
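The NumPy 2.0 behaviour described in this commit can be shown in isolation. This is a sketch with plain NumPy, not the `train_test_split` code itself: `np.asarray` avoids a copy when it can and copies when it must, while under NumPy 2.x `np.array(..., copy=False)` raises `ValueError` whenever a copy is unavoidable. `to_numpy_allow_copy` and `strict_no_copy_raises` are hypothetical names.

```python
import numpy as np


def to_numpy_allow_copy(values, dtype=None):
    """Hypothetical helper: avoid a copy if possible, copy if necessary."""
    # Same meaning as NumPy 1.x copy=False, and valid on NumPy 2.x too.
    return np.asarray(values, dtype=dtype)


def strict_no_copy_raises(values):
    """Return True if NumPy 2.x refuses copy=False for these values."""
    if np.lib.NumpyVersion(np.__version__) < "2.0.0":
        return None  # NumPy 1.x silently copies here; nothing to show
    try:
        np.array(values, copy=False)  # NumPy 2.x: "never copy"
    except ValueError:
        return True
    return False


labels = np.array([0, 1, 0, 1], dtype=np.int64)
same = to_numpy_allow_copy(labels)                         # no copy needed
converted = to_numpy_allow_copy(labels, dtype=np.float64)  # copy required
from_list = to_numpy_allow_copy([0, 1, 0, 1])              # copy required
```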
* add 3.14

* update ci

* go home tf

* torchcodec

* numba

* fix ci

* no lz4 in python 3.14

* fix tests

* again

* again

* again
* WIP: add audio, tests failing

* WIP: add mono argument, tests failing

* change from mono to num_channels in documentation, audio tests passing

* update docs and move test for audio

* update audio

* update docstring for audio

* Apply suggestions from code review

---------

Co-authored-by: Quentin Lhoest <[email protected]>
lhoestq and others added 12 commits November 4, 2025 11:44
…gingface#7848)

remove mode parameter in docstring of pdf and video feature
* WIP: allow uploading of nifti

* remove debug statements and fix test

* remove debug statements

* remove debug statements
Change arxiv to hg papers
* fix some broken links

* some more

---------

Co-authored-by: Quentin Lhoest <[email protected]>
* WIP: nifti vis working, now improve

* seems to work fine, tests not there yet

* remove uncommented lines
* try latest papaya

* try niivue

* update repr_html for nifti to work better with niivue

* remove papaya files

* remove papaya from setup.py

* use ipyniivue

* update nifti feature to use ipyniivue

* add 3d crosshair for orientation

* remove docstring
- Change dataobj initialization from get_fdata() to .dataobj
- Preserves nibabel's ArrayProxy for lazy loading
- Prevents OOM errors on large 4D fMRI files
- Defers I/O errors to usage time for better error handling

Resolves #1
@The-Obstacle-Is-The-Way deleted the feat/nifti-lazy-loading branch on November 29, 2025, 14:56
@The-Obstacle-Is-The-Way restored the feat/nifti-lazy-loading branch on November 29, 2025, 14:58
@The-Obstacle-Is-The-Way
Author

Apologies for the accidental close. I was cleaning up branches on my fork, which auto-closed the linked PRs.

This lazy-loading fix should integrate cleanly with your NiiVue work, which just merged upstream.

@The-Obstacle-Is-The-Way
Author

Closing: this was reviewed directly on the upstream PR, huggingface#7887. Thanks!

