Open
Labels
enhancement (New feature or request)
Description
Is your feature request related to a problem? Please describe.
Right now there are a few hiccups when using interleave_datasets. The interleaved dataset iterates only until the smallest dataset exhausts its iterator, so larger datasets may never complete a full epoch.
This also complicates epoch accounting, since there is no way to track how many epochs each dataset inside interleave_datasets has completed.
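The behaviour described above can be illustrated with a small pure-Python sketch. It is a simplified stand-in for a round-robin interleave that stops at the first exhausted source, not the library's actual implementation; the function name `interleave_first_exhausted` and the sample lists are hypothetical.

```python
def interleave_first_exhausted(sources):
    """Round-robin over `sources`, stopping as soon as any one is exhausted.

    Hypothetical helper that only mimics the stopping behaviour described
    in this issue; it does not call the datasets library.
    """
    iterators = [iter(s) for s in sources]
    while True:
        for it in iterators:
            try:
                yield next(it)
            except StopIteration:
                # The first exhausted source ends the whole interleaved stream.
                return


small = ["a0", "a1"]
large = ["b0", "b1", "b2", "b3", "b4"]

seen = list(interleave_first_exhausted([small, large]))
# Items of `large` that were never reached before iteration stopped.
unseen_large = [x for x in large if x not in seen]
```

Here `seen` ends after the second item of `small`, so three of the five items in `large` are never produced and `large` does not complete an epoch.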
Describe the solution you'd like
For the interleave_datasets module:
- Add a boolean argument `--stop-iter` in `interleave_datasets` that controls whether the datasets iterate indefinitely. With `--stop-iter=False`, it should never raise `StopIteration`.
- Add an internal list variable `iter_cnt` that records how many times (in steps/epochs) each dataset has been iterated at a given point.
- Add an argument `--max-iter` (list type) that specifies the maximum number of times each dataset can be iterated. After one dataset completes its `--max-iter`, the other datasets should continue to be sampled; `StopIteration` should be raised only when all datasets have finished their respective `--max-iter`.
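The three requested behaviours could look roughly like the following pure-Python sketch. It is only an illustration of the proposed semantics under stated assumptions: the class name `InterleaveSketch` is hypothetical, `max_iter=None` stands in for `--stop-iter=False`, sources are assumed to be re-iterable (for example lists), and nothing here is part of the datasets API.

```python
from itertools import islice

_DONE = object()  # sentinel marking an exhausted iterator


class InterleaveSketch:
    """Hypothetical round-robin interleaver with per-source epoch limits."""

    def __init__(self, sources, max_iter=None):
        self.sources = list(sources)
        # max_iter: one epoch limit per source; None means "never stop".
        self.max_iter = max_iter
        # iter_cnt[i]: number of full passes completed over source i.
        self.iter_cnt = [0] * len(self.sources)

    def __iter__(self):
        self.iter_cnt = [0] * len(self.sources)
        iterators = [iter(s) for s in self.sources]
        active = list(range(len(self.sources)))
        while active:
            for i in list(active):
                item = next(iterators[i], _DONE)
                if item is _DONE:
                    self.iter_cnt[i] += 1
                    limit_hit = (
                        self.max_iter is not None
                        and self.iter_cnt[i] >= self.max_iter[i]
                    )
                    if limit_hit:
                        active.remove(i)  # retire this source, keep the others
                        continue
                    iterators[i] = iter(self.sources[i])  # start a new epoch
                    item = next(iterators[i], _DONE)
                    if item is _DONE:  # empty source: drop it to avoid spinning
                        active.remove(i)
                        continue
                yield item


small = ["a0", "a1"]
large = ["b0", "b1", "b2"]

# Bounded: two epochs of `small`, one epoch of `large`.
bounded = InterleaveSketch([small, large], max_iter=[2, 1])
bounded_items = list(bounded)
bounded_counts = list(bounded.iter_cnt)

# Unbounded: never stops on its own, so take only the first six items.
endless = InterleaveSketch([small, large])
first_six = list(islice(iter(endless), 6))
endless_counts = list(endless.iter_cnt)
```

In the bounded case, iteration ends only after both sources reach their limits, and `iter_cnt` finishes at `[2, 1]`. Counting an epoch when a source is found exhausted, rather than when its last item is yielded, is one possible design choice; the issue does not specify which is intended.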
Note: I'm new to the datasets API. Maybe these features already exist in datasets.
Since multitask training is a current trend, I believe this feature would make the datasets API more popular.