Create new sections for audio and vision in guides (#4519)
* 📝 first draft
* 📝 create modality specific pages
* 📝 create NLP section
* 📝 update set_format section
* 📝 add use tf/torch to toctree
* 🖍 remove visual cues
* 🖍 apply quentin review
* 🖍 minor edits
* 🖍 apply mario review
* 🖍 collapse some sections
* 🖍 try collapse again
* Update _toctree.yml
* 🖍 collapse all nested sections except for general usage
* 🖍 add link to install dependencies for audio/vision sections
* ✨ add text decoration for different guides
* 🖍 remove text decorations for now
Co-authored-by: Mishig Davaadorj <[email protected]>
Audio datasets are loaded from the `audio` column, which contains three important fields:
* `array`: the decoded audio data represented as a 1-dimensional array.
* `path`: the path to the downloaded audio file.
* `sampling_rate`: the sampling rate of the audio data.
<Tip>
To work with audio datasets, you need to have the `audio` dependency installed. Check out the [installation](./installation#audio) guide to learn how to install it.
</Tip>
When you load an audio dataset and call the `audio` column, the [`Audio`] feature automatically decodes and resamples the audio file:
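As a minimal sketch (assuming the Common Voice Turkish split used later in this guide; the exact values and paths will differ on your machine):

```py
from datasets import load_dataset

# Loading the dataset does not decode anything yet.
dataset = load_dataset("common_voice", "tr", split="train")

# Accessing the "audio" column of a single row decodes and resamples that file.
audio = dataset[0]["audio"]
print(audio["array"])          # 1-dimensional array of decoded audio samples
print(audio["path"])           # path to the downloaded audio file
print(audio["sampling_rate"])  # sampling rate, e.g. 48000 for Common Voice
```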
<Tip>

Index into an audio dataset using the row index first and then the `audio` column - `dataset[0]["audio"]` - to avoid decoding and resampling all the audio files in the dataset. Otherwise, this can be a slow and time-consuming process if you have a large dataset.
</Tip>
For a guide on how to load any type of dataset, take a look at the [general loading guide](./loading).
## Local files
The `path` is useful for loading your own dataset. Use the [`~Dataset.cast_column`] function to take a column of audio file paths and decode them into `array`s with the [`Audio`] feature:
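For example, a rough sketch with placeholder file paths (substitute your own audio files):

```py
from datasets import Dataset, Audio

# Build a dataset from a plain column of audio file paths.
audio_dataset = Dataset.from_dict({"audio": ["path/to/audio_1.wav", "path/to/audio_2.wav"]})

# Casting the column to the Audio feature means each entry is decoded into
# {"array": ..., "path": ..., "sampling_rate": ...} when it is accessed.
audio_dataset = audio_dataset.cast_column("audio", Audio())
```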
If you only want to load the underlying path to the audio dataset without decoding the audio file into an `array`, set `decode=False` in the [`Audio`] feature:
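A sketch of the same idea with decoding disabled (again with placeholder paths); accessing the column then returns a lightweight reference instead of a decoded array:

```py
from datasets import Dataset, Audio

audio_dataset = Dataset.from_dict({"audio": ["path/to/audio_1.wav"]})

# decode=False keeps the underlying path/bytes without decoding the audio.
audio_dataset = audio_dataset.cast_column("audio", Audio(decode=False))

print(audio_dataset[0]["audio"])  # reference to the file, not a decoded array
```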
🤗 Datasets supports an [`Audio`] feature, enabling users to load and process raw audio files for training. This guide will show you how to:
This guide shows specific methods for processing audio datasets. Learn how to:
- Load your own custom audio dataset.
- Resample audio files.
- Use [`Dataset.map`] with audio files.
- Resample the sampling rate.
- Use [`~Dataset.map`] with audio datasets.
## Installation
For a guide on how to process any type of dataset, take a look at the [general process guide](./process).
The [`Audio`] feature should be installed as an extra dependency in 🤗 Datasets. Install the [`Audio`] feature (and its dependencies) with pip:
## Cast
```bash
pip install datasets[audio]
```
<Tip warning={true}>
On Linux, the non-Python `libsndfile` package must be installed manually with your distribution's package manager, for example:
```bash
sudo apt-get install libsndfile1
```
</Tip>
To support loading audio datasets containing MP3 files, users should additionally install [torchaudio](https://pytorch.org/audio/stable/index.html), so that audio data is handled with high performance.
```bash
pip install torchaudio
```
<Tip warning={true}>
torchaudio's `sox_io` [backend](https://pytorch.org/audio/stable/backend.html#) supports decoding `mp3` files. Unfortunately, the `sox_io` backend is only available on Linux/macOS and is not supported on Windows.
</Tip>
Then you can load an audio dataset the same way you would load a text dataset. For example, load the [Common Voice](https://huggingface.co/datasets/common_voice) dataset with the Turkish configuration:
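In code, that might look like this (dataset name and configuration as they appear on the Hub at the time of writing):

```py
from datasets import load_dataset

# Load the Turkish ("tr") configuration of Common Voice like any other dataset.
common_voice = load_dataset("common_voice", "tr", split="train")
```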
When you access an audio file, it is automatically decoded and resampled. Generally, you should query an audio file like: `common_voice[0]["audio"]`. If you query an audio file with `common_voice["audio"][0]` instead, **all** the audio files in your dataset will be decoded and resampled. This process can take a long time if you have a large dataset.
`path` or `file` is an absolute path to an audio file.
The `path` is useful if you want to load your own audio dataset. In this case, provide a column of audio file paths to [`Dataset.cast_column`]:
The [`~Dataset.cast_column`] function is used to cast a column to another feature to be decoded. When you use this function with the [`Audio`] feature, you can resample the sampling rate:
Some models expect the audio data to have a certain sampling rate due to how the model was pretrained. For example, the [XLSR-Wav2Vec2](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) model expects the input to have a sampling rate of 16kHz, but an audio file from the Common Voice dataset has a sampling rate of 48kHz. You can use [`Dataset.cast_column`] to downsample the sampling rate to 16kHz:
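A sketch of that cast (sampling rate in Hz; the audio column is assumed to be named `audio`):

```py
from datasets import load_dataset, Audio

common_voice = load_dataset("common_voice", "tr", split="train")

# Resample from Common Voice's native 48kHz down to the 16kHz the model expects.
common_voice = common_voice.cast_column("audio", Audio(sampling_rate=16_000))

print(common_voice[0]["audio"]["sampling_rate"])  # 16000
```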
The next time you load the audio file, the [`Audio`] feature will load and resample it to 16kHz.
## Map
Just like text datasets, you can apply a preprocessing function over an entire dataset with [`Dataset.map`], which is useful for preprocessing all of your audio data at once. Start with a [speech recognition model](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=downloads) of your choice, and load a `processor` object that contains:
The [`~Dataset.map`] function helps preprocess your entire dataset at once. Depending on the type of model you're working with, you'll need to either load a [feature extractor](https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoFeatureExtractor) or a [processor](https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoProcessor).
1. A feature extractor to convert the speech signal to the model's input format. Every speech recognition model on the 🤗 [Hub](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=downloads) contains a predefined feature extractor that can be easily loaded with `AutoFeatureExtractor.from_pretrained(...)`.
- For pretrained speech recognition models, load a feature extractor and tokenizer and combine them in a `processor`:
2. A tokenizer to convert the model's output format to text. Fine-tuned speech recognition models, such as [facebook/wav2vec2-base-960h](https://huggingface.co/facebook/wav2vec2-base-960h), contain a predefined tokenizer that can be easily loaded with `AutoTokenizer.from_pretrained(...)`.
For pretrained speech recognition models, such as [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53), a tokenizer needs to be created from the target text as explained [here](https://huggingface.co/blog/fine-tune-wav2vec2-english). The following example demonstrates how to load a feature extractor, tokenizer and processor for a pretrained speech recognition model:
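In rough outline, that could look like the sketch below. It assumes you have already built a `vocab.json` vocabulary file from your target text (as described in the linked blog post) and uses Common Voice's `sentence` column as the transcription target:

```py
from datasets import load_dataset, Audio
from transformers import AutoFeatureExtractor, Wav2Vec2CTCTokenizer, Wav2Vec2Processor

# Feature extractor shipped with the pretrained checkpoint.
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-large-xlsr-53")

# Tokenizer built from a vocabulary created from your own target text.
tokenizer = Wav2Vec2CTCTokenizer("vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|")

# Combine both into a single processor object.
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# Resample the audio to the 16kHz the model expects, then preprocess with map.
common_voice = load_dataset("common_voice", "tr", split="train")
common_voice = common_voice.cast_column("audio", Audio(sampling_rate=16_000))

def prepare_dataset(batch):
    audio = batch["audio"]
    batch["input_values"] = processor(audio["array"], sampling_rate=audio["sampling_rate"]).input_values[0]
    with processor.as_target_processor():
        batch["labels"] = processor(batch["sentence"]).input_ids
    return batch

common_voice = common_voice.map(prepare_dataset, remove_columns=common_voice.column_names)
```

The `remove_columns` argument drops the raw columns once they have been converted into model inputs.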
Our how-to guides will show you how to complete a specific task. These guides are intended to help you apply your knowledge of 🤗 Datasets to real-world problems you may encounter. Want to flatten a column or load a dataset from a local file? We got you covered! You should already be familiar and comfortable with the 🤗 Datasets basics, and if you aren't, we recommend reading our [tutorial](./tutorial) first.
The how-to guides offer a more comprehensive overview of all the tools 🤗 Datasets offers and how to use them. This will help you tackle messier real-world datasets where you may need to manipulate the dataset structure or content to get it ready for training.
The how-to guides will cover eight key areas of 🤗 Datasets:
The guides assume you are familiar and comfortable with the 🤗 Datasets basics. We recommend newer users check out our [tutorials](tutorial) first.
* How to load a dataset from other data sources.
<Tip>
* How to process a dataset.
Interested in learning more? Take a look at [Chapter 5](https://huggingface.co/course/chapter5/1?fw=pt) of the Hugging Face course!
* How to use a dataset with your favorite ML/DL framework.
</Tip>
* How to stream large datasets.
The guides are organized into five sections:
* How to upload and share a dataset.
- **General usage**: Functions for general dataset loading and processing. The functions shown in this section are applicable across all dataset modalities.
- **Audio**: How to load, process, and share audio datasets.
- **Vision**: How to load, process, and share image datasets.
- **Text**: How to load, process, and share text datasets.
- **Dataset repository**: How to share and upload a dataset to the [Hub](https://huggingface.co/datasets).
* How to create a dataset loading script.
* How to create a dataset card.
* How to compute metrics.
* How to manage the cache.
You can also find guides on how to process massive datasets with Beam, how to integrate with cloud storage providers, and how to add an index to search your dataset.
If you have any questions about 🤗 Datasets, feel free to join and ask the community on our [forum](https://discuss.huggingface.co/c/datasets/10).
`docs/source/how_to_metrics.mdx` (6 additions & 0 deletions)
# Metrics
<Tip warning={true}>
Metrics will soon be deprecated in 🤗 Datasets. To learn more about how to use metrics, take a look at our newest library 🤗 [Evaluate](https://huggingface.co/docs/evaluate/index)! In addition to metrics, we've also added more tools for evaluating models and datasets.
</Tip>
Metrics are important for evaluating a model's predictions. In the tutorial, you learned how to compute a metric over an entire evaluation set. You have also seen how to load a metric.
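As a quick reminder of that pattern, a minimal sketch with the built-in `accuracy` metric (keeping in mind the deprecation note above):

```py
from datasets import load_metric

# Load a metric by name and compute it over predictions and references.
metric = load_metric("accuracy")
result = metric.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0])
print(result)  # {'accuracy': 0.75}
```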