Add args description to DatasetInfo #2384
lhoestq
left a comment
Thanks :) added some suggestions
src/datasets/info.py
Outdated
| homepage (str): A URL to the official homepage for the dataset. | ||
| license (str): The dataset's license. | ||
| features (Features, optional): The features used to specify the dataset's columns, types and conversion methods. | ||
| post_processed (PostProcessedInfo, optional): |
| post_processed (PostProcessedInfo, optional): | |
| post_processed (PostProcessedInfo, optional): Info regarding the resources of a possible post-processing of a dataset. It can contain the information of an index for example. |
src/datasets/info.py
Outdated
| splits (dict, optional): The mapping between split name and metadata. | ||
| download_checksums (dict, optional): The mapping between the URL to download the dataset's checksums and corresponding metadata. | ||
| download_size (int, optional): The size of the compressed dataset in bytes. | ||
| post_processing_size (int, optional): |
| post_processing_size (int, optional): | |
| post_processing_size (int, optional): Size of the dataset after the post-processing, if any. |
src/datasets/info.py
Outdated
| download_checksums (dict, optional): | ||
| download_size (int, optional): | ||
| supervised_keys (SupervisedKeysData, optional): Specifies the input feature and the label for supervised learning if applicable for the dataset. | ||
| builder_name (str, optional): The name of the :class:`GeneratorBasedBuilder` subclass used to create the dataset. Usually matched to the corresponding script name, but with CamelCase instead of snake_case. |
| builder_name (str, optional): The name of the :class:`GeneratorBasedBuilder` subclass used to create the dataset. Usually matched to the corresponding script name, but with CamelCase instead of snake_case. | |
| builder_name (str, optional): The name of the :class:`GeneratorBasedBuilder` subclass used to create the dataset. Usually matched to the corresponding script name. It is also the snake_case version of the dataset builder class name. |
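To illustrate the CamelCase versus snake_case relationship discussed in this suggestion, here is a minimal sketch. The helper `camel_to_snake` is hypothetical and written only for this example; it is not claimed to be the function the library itself uses.

```python
import re


def camel_to_snake(name: str) -> str:
    """Hypothetical helper: convert a CamelCase class name to snake_case.

    Inserts "_" before every capital letter that is not at the start of the
    string, then lowercases the result. Runs of capitals (acronyms) are split
    letter by letter, which a real implementation may handle differently.
    """
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


# Illustrative builder class names (assumptions, not taken from the PR).
print(camel_to_snake("MyDataset"))
print(camel_to_snake("SquadV2"))
```

Under this sketch, a builder class named `MyDataset` would correspond to a `builder_name` of `my_dataset`.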
src/datasets/info.py
Outdated
| splits (dict, optional): | ||
| download_checksums (dict, optional): | ||
| download_size (int, optional): | ||
| supervised_keys (SupervisedKeysData, optional): Specifies the input feature and the label for supervised learning if applicable for the dataset. |
| supervised_keys (SupervisedKeysData, optional): Specifies the input feature and the label for supervised learning if applicable for the dataset. | |
| supervised_keys (SupervisedKeysData, optional): Specifies the input feature and the label for supervised learning if applicable for the dataset (legacy from TFDS). |
src/datasets/info.py
Outdated
| description (str): A description of the dataset. | ||
| citation (str): A BibTeX citation of the dataset. | ||
| homepage (str): A URL to the official homepage for the dataset. | ||
| license (str): The dataset's license. |
| license (str): The dataset's license. | |
| license (str): The dataset's license. It can be the name of the license or a paragraph containing the terms of the license. |
src/datasets/info.py
Outdated
| size_in_bytes (int, optional): | ||
| task_templates (List[TaskTemplate], optional): | ||
| dataset_size (int, optional): The combined size of the Apache Arrow tables for all splits in bytes. | ||
| size_in_bytes (int, optional): The combined size of all files associated with the dataset. |
| size_in_bytes (int, optional): The combined size of all files associated with the dataset. | |
| size_in_bytes (int, optional): The combined size of all files associated with the dataset (downloaded files + arrow files) |
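The relationship between the size fields described in these suggestions can be sketched as follows. `SizeFields` is a hypothetical stand-in written for this example, not the real `DatasetInfo` class, and the formula simply follows the reviewer's wording "(downloaded files + arrow files)"; treating a missing value as zero is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SizeFields:
    """Hypothetical container for the size-related fields (not the real DatasetInfo)."""

    download_size: Optional[int] = None  # bytes of the files downloaded to generate the dataset
    dataset_size: Optional[int] = None   # combined bytes of the Arrow tables for all splits

    def size_in_bytes(self) -> int:
        # Assumption: total = downloaded files + Arrow files; None is counted as 0.
        return (self.download_size or 0) + (self.dataset_size or 0)


sizes = SizeFields(download_size=100, dataset_size=250)
print(sizes.size_in_bytes())
```

This makes the distinction explicit: `dataset_size` covers only the Arrow files, while `size_in_bytes` also counts what was downloaded.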
src/datasets/info.py
Outdated
| version (str or Version, optional): The version of the dataset. | ||
| splits (dict, optional): The mapping between split name and metadata. | ||
| download_checksums (dict, optional): The mapping between the URL to download the dataset's checksums and corresponding metadata. | ||
| download_size (int, optional): The size of the compressed dataset in bytes. |
| download_size (int, optional): The size of the compressed dataset in bytes. | |
| download_size (int, optional): The size of the files to download to generate the dataset, in bytes. |
src/datasets/info.py
Outdated
| citation (str): A BibTeX citation of the dataset. | ||
| homepage (str): A URL to the official homepage for the dataset. | ||
| license (str): The dataset's license. | ||
| features (Features, optional): The features used to specify the dataset's columns, types and conversion methods. |
| features (Features, optional): The features used to specify the dataset's columns, types and conversion methods. | |
| features (Features, optional): The features used to specify the dataset's columns types. |
Thanks for the suggestions! I've included them and made a few minor tweaks along the way.
Please merge master into this branch to fix the CI; I just fixed the metadata validation tests.
Add args description to DatasetInfo (huggingface#2384)
Closes #2354
I am not sure what `post_processed` and `post_processing_size` correspond to, so have left them empty for now. I also took a guess at some of the other fields like `dataset_size` vs `size_in_bytes`, so might have misunderstood their meaning.