Create new sections for audio and vision in guides #4519
The documentation is not available anymore as the PR was closed or merged.
lhoestq
left a comment
Nice, thank you!
We can also add the "Load text data" page; currently it feels weird to not have it ;)
In particular, users can do load_dataset("text", data_dir=...) or load_dataset("text", data_files=...), and they can use glob patterns to select several files.
The "text" loader has a few parameters. The main one is sample_by. By default sample_by="line", so one example = one line from the text files, but you can change it to "paragraph" or "document".
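To make the three sample_by granularities concrete, here is a minimal pure-Python sketch. It is not the library's implementation: the helper sample_text, the file names, and the choice to split paragraphs on a blank line are assumptions made for illustration only. With the datasets library, the equivalent would be the load_dataset("text", data_files=..., sample_by=...) call described above.

```python
import tempfile
from pathlib import Path
from typing import List


def sample_text(text: str, sample_by: str = "line") -> List[str]:
    """Hypothetical helper: split raw text into examples at a given granularity."""
    if sample_by == "line":
        # One example per line; blank lines are kept as empty examples in this sketch.
        return text.splitlines()
    if sample_by == "paragraph":
        # Assumption: paragraphs are separated by a blank line.
        return [p for p in text.split("\n\n") if p.strip()]
    if sample_by == "document":
        # The whole file is a single example.
        return [text]
    raise ValueError(f"unsupported sample_by: {sample_by!r}")


with tempfile.TemporaryDirectory() as tmp:
    # Two small illustrative files.
    files = {"a.txt": "one\ntwo\n\nthree", "b.txt": "four\n\nfive"}
    for name, body in files.items():
        (Path(tmp) / name).write_text(body, encoding="utf-8")

    # A glob pattern selects several files at once.
    paths = sorted(Path(tmp).glob("*.txt"))
    texts = [p.read_text(encoding="utf-8") for p in paths]

by_line = [ex for t in texts for ex in sample_text(t, "line")]
by_paragraph = [ex for t in texts for ex in sample_text(t, "paragraph")]
by_document = [ex for t in texts for ex in sample_text(t, "document")]

print(len(by_line), len(by_paragraph), len(by_document))
```

The same two files yield a different number of examples depending on the granularity, which is the trade-off a "Load text data" page would need to explain.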
A bit unrelated, but I feel like those pages can also be grouped together in a "Dataset repository" section:
- Share
- Create a dataset loading script
- Create a dataset card
- Structure your repository
This way we can decouple "General usage" (how to use datasets) from "Dataset repositories" (how to create a repository)
Ready for review! The
mariosasko
left a comment
Cool!
Just one nit.
lhoestq
left a comment
Thank you! I think it's OK to leave the new sections uncollapsed though, as you prefer.
This PR creates separate sections in the guides for audio, vision, text, and general usage, so it is easier for users to find the loading, processing, or sharing guides specific to the dataset type they're working with. It'll also allow us to scale the docs to additional dataset types - like time series, tabular, etc. - while preserving the docs' information architecture.
Some other changes include:
- Experimented with decorating text with some CSS to highlight guides specific to each modality. Hopefully, it'll be easier for users to find and realize that these different docs exist! Will experiment with this in a different PR.
- Updated the set_format section to recommend using the new to_tf_dataset function if you need to convert to a TensorFlow dataset.
- Updated the toctree to nest the general usage, audio, vision, and text sections under the how-to guides.