Description
Describe the bug
When running validation, test, or prediction with DeepSpeed ZeRO-3, the dataloaders for non-training datasets can lack the `.datasets` attribute and the sampler behavior that the training dataloader path provides. This appears to break or destabilize ZeRO-3 inference and evaluation flows in the current OpenFold3-preview codebase.
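A minimal sketch of the mismatch described above, assuming downstream code reads `.datasets` from whatever dataset the loader holds. `PlainDataset`, `TrainingWrapper`, and `dataset_names` are hypothetical stand-ins, not OpenFold3 classes: the training path hands over a wrapper that exposes `.datasets`, while the eval path hands over an unwrapped dataset, so the same access raises `AttributeError`.

```python
# Hypothetical stand-ins that only illustrate the mismatch;
# they are not OpenFold3's real dataset or wrapper classes.


class PlainDataset:
    """A plain map-style dataset with no `.datasets` attribute."""

    def __init__(self, n_items: int) -> None:
        self.n_items = n_items

    def __len__(self) -> int:
        return self.n_items

    def __getitem__(self, idx: int) -> int:
        if not 0 <= idx < self.n_items:
            raise IndexError(idx)
        return idx


class TrainingWrapper:
    """Hypothetical training-path wrapper that exposes `.datasets`."""

    def __init__(self, datasets: list) -> None:
        self.datasets = datasets

    def __len__(self) -> int:
        return sum(len(d) for d in self.datasets)


def dataset_names(dataset) -> list:
    """Downstream code written against the training path assumes `.datasets`."""
    return [type(d).__name__ for d in dataset.datasets]


train_ds = TrainingWrapper([PlainDataset(8)])
eval_ds = PlainDataset(4)  # eval/predict path: left unwrapped

print(dataset_names(train_ds))  # ['PlainDataset']
try:
    dataset_names(eval_ds)
except AttributeError as err:
    print(f"eval path fails: {err}")
```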
To reproduce
- Configure a non-training dataset and run validation or prediction under ZeRO-3.
- Use the existing DeepSpeed ZeRO-3 config and the DataModule setup path for non-training modes.
- Observe failures or instability in sampler/dataloader behavior before or during evaluator execution.
Expected behavior
Validation, test, and prediction dataloaders should mirror the training dataset's wrapper behavior and preserve sampler semantics, so that ZeRO-3 runs without unexpected DataModule errors.
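One plausible reason sampler semantics matter here (our understanding, not stated in the report): ZeRO-3 gathers partitioned parameters with collective operations during each forward pass, so if one rank runs more eval batches than another, it can block waiting on peers that have already finished. A minimal sketch of padded round-robin sharding, following the default (`drop_last=False`, no shuffle) behavior of `torch.utils.data.DistributedSampler`, shows how every rank can be given the same number of indices. `shard_indices` is a hypothetical helper written in plain Python.

```python
import math


def shard_indices(n_items: int, world_size: int, rank: int) -> list[int]:
    """Round-robin shard with padding so every rank gets the same count.

    Follows the default (drop_last=False, shuffle=False) behavior of
    torch.utils.data.DistributedSampler, in plain Python.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    indices = list(range(n_items))
    per_rank = math.ceil(n_items / world_size)
    total = per_rank * world_size
    padding = total - n_items
    # Repeat leading indices to pad the list up to a multiple of world_size.
    if padding <= len(indices):
        indices += indices[:padding]
    else:
        indices += (indices * math.ceil(padding / len(indices)))[:padding]
    # Each rank takes every world_size-th index starting at its own rank.
    return indices[rank:total:world_size]


print([len(shard_indices(10, 4, r)) for r in range(4)])  # [3, 3, 3, 3]
```

The trade-off is that padding repeats a few samples (here indices 0 and 1 appear twice), so per-sample eval metrics may need de-duplication.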
Notes from TCE fork
Our fork fixed this by:
- wrapping validation/test/prediction datasets in `SamplerDataset` during `DataModule.setup()`;
- preserving the eval sampler length (`epoch_len=len(dataset)`) for consistent sampler semantics.
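The fix described above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the fork's patch: `SamplerDatasetStub` stands in for `SamplerDataset`, whose real constructor may differ (the `datasets` and `epoch_len` arguments are assumed), and `DataModuleSketch` is a hypothetical DataModule using Lightning-style stage names. Every split is wrapped during `setup()`, training keeps its configured epoch length, and eval/test/predict use `len(dataset)`.

```python
from typing import Dict, Optional, Sequence, Tuple


class SamplerDatasetStub:
    """Hypothetical stand-in for the fork's SamplerDataset wrapper.

    Exposes `.datasets` like the training path does and reports
    `epoch_len` as its length, i.e. the number of draws per epoch.
    """

    def __init__(self, datasets: Sequence[Sequence], epoch_len: int) -> None:
        self.datasets = list(datasets)
        self.epoch_len = epoch_len

    def __len__(self) -> int:
        return self.epoch_len


class DataModuleSketch:
    """Hypothetical DataModule showing where the wrapping happens."""

    # Lightning-style stage names mapped to the splits each stage needs.
    STAGE_SPLITS: Dict[str, Tuple[str, ...]] = {
        "fit": ("train", "val"),
        "validate": ("val",),
        "test": ("test",),
        "predict": ("predict",),
    }

    def __init__(self, splits: Dict[str, Optional[Sequence]], train_epoch_len: int) -> None:
        self.splits = splits
        self.train_epoch_len = train_epoch_len
        self.wrapped: Dict[str, SamplerDatasetStub] = {}

    def setup(self, stage: Optional[str] = None) -> None:
        names = self.STAGE_SPLITS[stage] if stage else tuple(self.splits)
        for name in names:
            ds = self.splits.get(name)
            if ds is None:
                continue
            # Training keeps its configured epoch length; eval/test/predict
            # use len(dataset) so one epoch is one pass over the data.
            epoch_len = self.train_epoch_len if name == "train" else len(ds)
            # Every split gets the same wrapper, so `.datasets` always exists.
            self.wrapped[name] = SamplerDatasetStub([ds], epoch_len=epoch_len)


dm = DataModuleSketch(
    {"train": range(100), "val": range(7), "test": range(5), "predict": None},
    train_epoch_len=1000,
)
dm.setup("validate")
print(len(dm.wrapped["val"]), hasattr(dm.wrapped["val"], "datasets"))  # 7 True
```

Using `len(dataset)` for eval means one epoch visits each sample once, whereas a training `epoch_len` (1000 here, an arbitrary value) can differ from the dataset size if the training sampler resamples; whether OpenFold3's training sampler does so is an assumption in this sketch.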
If useful, I can provide the exact patch/commit references from the fork.