Update docs around audio and vision #4440
Conversation
The documentation is not available anymore as the PR was closed or merged.
mariosasko
left a comment
Love the changes!
We plan to address this with end-to-end examples (for each modality) more focused on preprocessing than the ones in the Transformers docs.
lhoestq
left a comment
Awesome, thanks!
Let me know what you think, especially if we should include some code samples for training a model in the audio/vision sections. I left this out since we already showed it in the NLP section.
I'd add the conversion to pytorch DataLoader and TF Dataset as well for audio and vision, if it's not too much information. But I think the training loop itself needs to be part of an end-to-end example as @mariosasko suggested.
Once there is a PyTorch DataLoader or a TF Dataset, we're ready for training; I don't think it's necessary to show the complexity of one particular task/model and how it's trained in the quickstart.
Maybe the quickstart can redirect to its corresponding end-to-end example for those who would like to see a complete example with more context? Something like "Want to see more in a concrete example? See how a dataset can be prepared for speech recognition with transformers, etc."
lhoestq
left a comment
Love the links you provided! Later I think we can provide end-to-end examples in the datasets docs themselves, using a much simpler training loop instead of redirecting to the Transformers docs that use the Trainer.
My last comments:
Color gradients look all good now, thanks!
CI failures are unrelated to this PR btw - you can ignore them.
Feel free to merge if it's all good for you @stevhliu
As part of the strategy to center the docs around the different modalities, this PR updates the quickstart to include audio and vision examples. This improves the developer experience by making audio and vision content more discoverable, enabling users working in these modalities to also quickly get started without digging too deeply into the docs.
Other changes include:
- Fixed the `tf.data.Dataset` example because it was throwing an error. The `to_tensor()` bit was redundant and removing it fixed the error (please double-check me here!).
- The `torch` text is different from the `tf` text.

Let me know what you think, especially if we should include some code samples for training a model in the audio/vision sections. I left this out since we already showed it in the NLP section. I want to keep the focus on using Datasets to load and process a dataset, and not so much the training part. Maybe we can add links to the Transformers docs instead?