Closed
Labels
bug: Something isn't working
Description
Describe the bug
As reported by @partiallytyped (and by @Muennighoff):
There is an issue with the indices of a filtered dataset.
Steps to reproduce the bug
from datasets import Dataset

ds = Dataset.from_dict({"num": [0, 1, 2, 3]})
# Keep only even numbers, evaluating the predicate in batches of 2
ds = ds.filter(lambda num: num % 2 == 0, input_columns="num", batch_size=2)
assert all(item["num"] % 2 == 0 for item in ds)
Expected results
The indices of the filtered dataset should correspond to the examples with "language" equal to "english".
Actual results
Indices of items with other languages are included in the filtered dataset's indices.
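To make the expected indices concrete for the reproduction above, here is a minimal pure-Python sketch of which row indices a batched filter should keep. It is not the library's implementation: `expected_filter_indices` is a hypothetical helper written only for illustration, and it assumes a single integer column and a per-value predicate.

```python
from typing import Callable, List, Sequence


def expected_filter_indices(
    values: Sequence[int],
    predicate: Callable[[int], bool],
    batch_size: int,
) -> List[int]:
    """Return the global row indices a batched filter is expected to keep.

    Hypothetical helper for illustration only; not part of `datasets`.
    """
    kept: List[int] = []
    for start in range(0, len(values), batch_size):
        batch = values[start:start + batch_size]
        for offset, value in enumerate(batch):
            if predicate(value):
                # Convert the position inside the batch to a global row index
                kept.append(start + offset)
    return kept


nums = [0, 1, 2, 3]
indices = expected_filter_indices(nums, lambda num: num % 2 == 0, batch_size=2)
print(indices)  # [0, 2]
```

For the four-row dataset in the reproduction, the kept indices should therefore point at rows 0 and 2, regardless of the batch size used.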
Preliminary investigation
It seems to be a bug introduced by:
PartiallyUntyped