
Add some iteration method on a dataset column (specific for inference) #4180

@Narsil

Description

Is your feature request related to a problem? Please describe.

Currently, dataset["audio"] loads EVERY element of the dataset into RAM, which can be quite large for an audio dataset.
Having an iterator (or sequence) type of object would make inference with transformers' pipeline easier to use and much less memory hungry.
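
For illustration, the difference between the two access patterns today looks roughly like this (a sketch; the dataset name is just an example):

from datasets import load_dataset

dataset = load_dataset("common_voice", "en", split="test")  # any audio dataset

# Materializes and decodes the entire column in RAM at once:
all_audio = dataset["audio"]

# Row-by-row access only decodes one example at a time:
one_audio = dataset[0]["audio"]  # {"array": np.array(...), "sampling_rate": ...}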

Describe the solution you'd like

For a non-breaking change:

for audio in dataset.iterate("audio"):
    # {"array": np.array(...), "sampling_rate":...}

For a breaking-change solution (not necessary), change the type of dataset["audio"] to a sequence type so that

pipe = pipeline(model="...")
for out in pipe(dataset["audio"]):
    # {"text":....}

could work.
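
For concreteness, the sequence-type object could behave roughly like the lazy view sketched below (purely illustrative, not an actual datasets API):

class LazyColumn:
    """Sequence-like view over a single column; rows are decoded on access."""

    def __init__(self, dataset, key):
        self.dataset = dataset
        self.key = key

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, i):
        # Only the requested row is loaded/decoded here.
        return self.dataset[i][self.key]

    def __iter__(self):
        for i in range(len(self)):
            yield self[i]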

Describe alternatives you've considered

def iterate(dataset, key):
    for item in dataset:
        yield item[key]

pipe = pipeline(model="...")
for out in pipe(iterate(dataset, "audio")):
    # {"text": ...}

This works but requires the helper function, which feels slightly clunky.
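
A generator expression would avoid the named helper, though it reads about the same (again just a sketch, reusing the pipe from above):

for out in pipe(item["audio"] for item in dataset):
    # {"text": ...}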

Additional context

The context is actually to showcase better integration between pipeline and datasets in the Quicktour demo: https://github.com/huggingface/transformers/pull/16723/files

@lhoestq
