Create new sections for audio and vision in guides #4519

stevhliu · 2022-06-16T21:38:24Z

This PR creates separate sections in the guides for audio, vision, text, and general usage so it is easier for users to find the loading, processing, or sharing guides specific to the dataset type they're working with. It will also let us scale the docs to additional dataset types (time series, tabular, etc.) while preserving the docs' information architecture.

Some other changes include:

~~Experimented with decorating text with some CSS to highlight guides specific to each modality. Hopefully, it'll be easier for users to find and realize that these different docs exist!~~ Will experiment with this in a different PR.
Added deprecation warning for Metrics and redirect to Evaluate.
Updated set_format section to recommend using the new to_tf_dataset function if you need to convert to a TensorFlow dataset.
Reorganized toctree to nest general usage, audio, vision, and text sections under the how-to guides.
Quickly reviewed and edited the Load and Process docs for clarity.

HuggingFaceDocBuilderDev · 2022-06-16T21:45:22Z

The documentation is not available anymore as the PR was closed or merged.

lhoestq

Nice thank you !

We can also add the "Load text data" page, currently it feels weird to not have it ;)

In particular, users can do load_dataset("text", data_dir=...) or load_dataset("text", data_files=...), and they can use glob patterns to select several files.
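As a rough illustration of how a shell-style wildcard pattern selects several files at once, here is a minimal sketch using only the Python standard library. The file names and the `select` helper are hypothetical, and the library's own pattern resolution may differ in details; this only shows the general idea.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, for illustration only.
files = ["train_0.txt", "train_1.txt", "test_0.txt", "notes.md"]


def select(pattern, names):
    """Return the names that match a shell-style wildcard pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


train_files = select("train_*.txt", files)  # only the train shards
text_files = select("*.txt", files)         # every .txt file

# With the library, a pattern would be passed in a similar spirit
# (hypothetical pattern, not taken from the PR):
#   load_dataset("text", data_files="train_*.txt")
```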

The "text" loader has a few parameters. The main parameter is sample_by. By default sample_by="line", so one example = one line from the text files, but you can change it to "paragraph" or "document".
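To make the three sample_by modes concrete, here is a small pure-Python approximation. It is not the library's implementation: the `split_examples` helper is hypothetical, and the assumptions that paragraphs are separated by a blank line and that blank lines are kept in "line" mode are mine.

```python
from typing import List


def split_examples(text: str, sample_by: str = "line") -> List[str]:
    """Approximate how one text file is cut into examples (illustrative only)."""
    if sample_by == "line":
        # One example per line (assumption: blank lines are kept).
        return text.splitlines()
    if sample_by == "paragraph":
        # Assumption: paragraphs are separated by a blank line.
        return [p for p in text.split("\n\n") if p.strip()]
    if sample_by == "document":
        # The whole file becomes a single example.
        return [text]
    raise ValueError(f"unknown sample_by: {sample_by!r}")


sample = "first line\nsecond line\n\nthird line\n"
lines = split_examples(sample, "line")
paragraphs = split_examples(sample, "paragraph")
documents = split_examples(sample, "document")

# With the library itself, the mode is passed as a keyword argument, e.g.
#   load_dataset("text", data_files=..., sample_by="paragraph")
```

The point of the sketch is only that the same file yields a different number of examples depending on the mode: the fewest with "document" and the most with "line".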

A bit unrelated, but I feel like those pages can also be grouped together in a "Dataset repository" section:

Share
Create a dataset loading script
Create a dataset card
Structure your repository

This way we can decouple "General usage" (how to use datasets) from "Dataset repositories" (how to create a repository)

docs/source/loading.mdx

docs/source/audio_load.mdx

stevhliu · 2022-06-24T22:02:33Z

Ready for review!

The toctree is a bit longer now with the sections. I think if we keep the audio/vision/text/dataset repository sections collapsed by default, and keep the general usage expanded, it may look a little cleaner and not as overwhelming. Let me know what you think! 😄

mariosasko

Cool!

Just one nit.

docs/source/process.mdx

lhoestq

Thank you! I think it's OK to leave the new sections uncollapsed though, whichever you prefer.

docs/source/audio_process.mdx

stevhliu added 5 commits June 16, 2022 11:38

📝 first draft

390730d

📝 create modality specific pages

7d22e2c

📝 create NLP section

d3ae512

📝 update set_format section

e5863a0

📝 add use tf/torch to toctree

493213b

stevhliu added the documentation Improvements or additions to documentation label Jun 16, 2022

stevhliu requested review from lhoestq and mariosasko June 16, 2022 21:38

lhoestq reviewed Jun 23, 2022

docs/source/loading.mdx (outdated)

docs/source/audio_load.mdx (outdated)

docs/source/audio_load.mdx (outdated)

stevhliu added 3 commits June 23, 2022 14:01

🖍 remove visual cues

b5f290c

🖍 apply quentin review

375d217

🖍 minor edits

aa497e9

stevhliu marked this pull request as ready for review June 24, 2022 21:58

mariosasko reviewed Jun 28, 2022

docs/source/process.mdx (outdated)

stevhliu and others added 5 commits June 28, 2022 10:46

🖍 apply mario review

8b9c7ea

🖍 collapse some sections

54f9b3b

🖍 try collapse again

e41728f

Update _toctree.yml

b203023

🖍 collapse all nested sections except for general usage

cdc963a

lhoestq approved these changes Jul 6, 2022

docs/source/audio_process.mdx

stevhliu added 3 commits July 6, 2022 11:41

🖍 add link to install dependencies for audio/vision sections

768d6c7

✨ add text decoration for different guides

104eac9

🖍 remove text decorations for now

df6cb4f

stevhliu merged commit 28946e2 into huggingface:main Jul 7, 2022

stevhliu deleted the reorg-structure branch July 7, 2022 15:25
