Commit adfbc76

Merge branch 'main' into dataset_infos-in-yaml

2 parents: 3c940ca + 50bc312


68 files changed: 2485 additions, 742 deletions

.github/hub/update_hub_repositories.py

Lines changed: 2 additions & 7 deletions
@@ -194,13 +194,8 @@ def __call__(self, dataset_name: str) -> bool:
         commit_args += (f"-m Commit from {DATASETS_LIB_COMMIT_URL.format(hexsha=current_commit.hexsha)}",)
         commit_args += (f"--author={author_name} <{author_email}>",)
 
-        for _tag in datasets_lib_repo.tags:
-            # Add a new tag if this is a `datasets` release
-            if _tag.commit == current_commit and re.match(r"^[0-9]+\.[0-9]+\.[0-9]+$", _tag.name):
-                new_tag = _tag
-                break
-        else:
-            new_tag = None
+        # we don't add a new tag as we used to when there's a release
+        new_tag = None
 
         changed_files_since_last_commit = [
             path
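The deleted block walked the library's tags looking for one that points at the current commit and is named like a bare semantic version. A standalone sketch of that release-detection logic, using simple namedtuples in place of GitPython tag refs (an assumption for illustration; the regex is the one from the removed code):

```python
import re
from collections import namedtuple

# Simplified stand-in for a GitPython tag ref (illustrative assumption).
Tag = namedtuple("Tag", ["name", "hexsha"])

# A tag counts as a `datasets` release only if its name is a bare X.Y.Z
# version: no "v" prefix, no "rc" suffix.
RELEASE_TAG_RE = re.compile(r"^[0-9]+\.[0-9]+\.[0-9]+$")

def find_release_tag(tags, current_hexsha):
    """Return the first tag on the current commit whose name is X.Y.Z, else None."""
    for tag in tags:
        if tag.hexsha == current_hexsha and RELEASE_TAG_RE.match(tag.name):
            return tag
    return None
```

The `for`/`else` in the original code expressed the same "first match or None" search; after this commit the hub-update script simply never tags.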

.github/workflows/ci.yml

Lines changed: 10 additions & 0 deletions
@@ -72,3 +72,13 @@ jobs:
       - name: Test with pytest
         run: |
           python -m pytest -rfExX -m ${{ matrix.test }} -n 2 --dist loadfile -sv ./tests/
+      - name: Install dependencies to test torchaudio>=0.12 on Ubuntu
+        if: ${{ matrix.os == 'ubuntu-latest' }}
+        run: |
+          pip uninstall -y torchaudio torch
+          pip install "torchaudio>=0.12"
+          sudo apt-get -y install ffmpeg
+      - name: Test torchaudio>=0.12 on Ubuntu
+        if: ${{ matrix.os == 'ubuntu-latest' }}
+        run: |
+          python -m pytest -rfExX -m torchaudio_latest -n 2 --dist loadfile -sv ./tests/features/test_audio.py
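The new CI step selects tests with `pytest -m torchaudio_latest`. A hypothetical sketch of how a test in `tests/features/test_audio.py` would opt into that marker (the test name and body are illustrative assumptions, not quoted from the repo):

```python
import pytest

# Tests carrying this marker are the only ones collected by
# `pytest -m torchaudio_latest` in the new CI step.
@pytest.mark.torchaudio_latest
def test_mp3_decoding_with_latest_torchaudio():
    # importorskip skips (rather than fails) the test when torchaudio>=0.12
    # is not installed, so the suite stays green on other platforms.
    torchaudio = pytest.importorskip("torchaudio", minversion="0.12")
    assert hasattr(torchaudio, "load")
```

To avoid "unknown marker" warnings, the marker would also be registered under `markers` in the project's pytest configuration.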

datasets/aeslc/README.md

Lines changed: 29 additions & 15 deletions
@@ -1,15 +1,27 @@
 ---
+annotations_creators:
+- crowdsourced
 language:
 - en
-paperswithcode_id: aeslc
-pretty_name: AESLC
+language_creators:
+- found
+license:
+- unknown
+multilinguality:
+- monolingual
+pretty_name: "AESLC: Annotated Enron Subject Line Corpus"
+size_categories:
+- 10K<n<100K
+source_datasets:
+- original
 task_categories:
 - summarization
 task_ids:
 - summarization-other-email-headline-generation
 - summarization-other-conversations-summarization
 - summarization-other-multi-document-summarization
 - summarization-other-aspect-based-summarization
+paperswithcode_id: aeslc
 ---
 
 # Dataset Card for "aeslc"

@@ -40,9 +52,9 @@ task_ids:
 
 ## Dataset Description
 
-- **Homepage:** [https://github.com/ryanzhumich/AESLC](https://github.com/ryanzhumich/AESLC)
-- **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
-- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+- **Homepage:**
+- **Repository:** https://github.com/ryanzhumich/AESLC
+- **Paper:** [This Email Could Save Your Life: Introducing the Task of Email Subject Line Generation](https://arxiv.org/abs/1906.03497)
 - **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
 - **Size of downloaded dataset files:** 11.10 MB
 - **Size of the generated dataset:** 14.26 MB

@@ -153,19 +165,21 @@ The data fields are the same among all splits.
 ### Citation Information
 
 ```
-
-@misc{zhang2019email,
-    title={This Email Could Save Your Life: Introducing the Task of Email Subject Line Generation},
-    author={Rui Zhang and Joel Tetreault},
-    year={2019},
-    eprint={1906.03497},
-    archivePrefix={arXiv},
-    primaryClass={cs.CL}
+@inproceedings{zhang-tetreault-2019-email,
+    title = "This Email Could Save Your Life: Introducing the Task of Email Subject Line Generation",
+    author = "Zhang, Rui and
+      Tetreault, Joel",
+    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
+    month = jul,
+    year = "2019",
+    address = "Florence, Italy",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/P19-1043",
+    doi = "10.18653/v1/P19-1043",
+    pages = "446--456",
 }
-
 ```
 
-
 ### Contributions
 
 Thanks to [@patrickvonplaten](https://github.com/patrickvonplaten), [@thomwolf](https://github.com/thomwolf), [@lewtun](https://github.com/lewtun) for adding this dataset.
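The commit adds the same set of tag fields to each card's YAML front matter. A minimal stdlib-only sketch of checking a card for those fields (a real validator would use a YAML parser; this relies only on the flat `key:` / `- value` layout these cards use, and the field list is taken from this diff):

```python
# Top-level tag fields this commit adds to the dataset cards.
REQUIRED = {
    "annotations_creators", "language", "language_creators", "license",
    "multilinguality", "pretty_name", "size_categories", "source_datasets",
    "task_categories", "task_ids",
}

def frontmatter_keys(readme_text):
    """Return the set of top-level keys between the opening and closing `---`."""
    lines = readme_text.splitlines()
    assert lines[0].strip() == "---", "card must start with YAML front matter"
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":  # end of front matter
            break
        # Top-level keys are unindented `key:` lines; skip `- value` items.
        if line and not line.startswith((" ", "-", "#")):
            keys.add(line.split(":", 1)[0].strip())
    return keys

def missing_fields(readme_text):
    return REQUIRED - frontmatter_keys(readme_text)
```

Running `missing_fields` on a card before this commit would report most of the new tags as absent; after it, the set is empty.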

datasets/amazon_us_reviews/README.md

Lines changed: 47 additions & 6 deletions
@@ -1,8 +1,32 @@
 ---
+annotations_creators:
+- no-annotation
 language:
 - en
+language_creators:
+- found
+license:
+- other
+multilinguality:
+- monolingual
+pretty_name: Amazon US Reviews
+size_categories:
+- 100M<n<1B
+source_datasets:
+- original
+task_categories:
+- summarization
+- text-generation
+- fill-mask
+- text-classification
+task_ids:
+- text-scoring
+- language-modeling
+- masked-language-modeling
+- sentiment-classification
+- sentiment-scoring
+- topic-classification
 paperswithcode_id: null
-pretty_name: AmazonUsReviews
 ---
 
 # Dataset Card for "amazon_us_reviews"

@@ -407,14 +431,31 @@ The data fields are the same among all splits.
 
 ### Licensing Information
 
-[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+https://s3.amazonaws.com/amazon-reviews-pds/LICENSE.txt
+
+By accessing the Amazon Customer Reviews Library ("Reviews Library"), you agree that the
+Reviews Library is an Amazon Service subject to the [Amazon.com Conditions of Use](https://www.amazon.com/gp/help/customer/display.html/ref=footer_cou?ie=UTF8&nodeId=508088)
+and you agree to be bound by them, with the following additional conditions:
+
+In addition to the license rights granted under the Conditions of Use,
+Amazon or its content providers grant you a limited, non-exclusive, non-transferable,
+non-sublicensable, revocable license to access and use the Reviews Library
+for purposes of academic research.
+You may not resell, republish, or make any commercial use of the Reviews Library
+or its contents, including use of the Reviews Library for commercial research,
+such as research related to a funding or consultancy contract, internship, or
+other relationship in which the results are provided for a fee or delivered
+to a for-profit organization. You may not (a) link or associate content
+in the Reviews Library with any personal information (including Amazon customer accounts),
+or (b) attempt to determine the identity of the author of any content in the
+Reviews Library.
+If you violate any of the foregoing conditions, your license to access and use the
+Reviews Library will automatically terminate without prejudice to any of the
+other rights or remedies Amazon may have.
 
 ### Citation Information
 
-```
-
-```
-
+No citation information.
 
 ### Contributions
 
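Tags like `100M<n<1B` in `size_categories` encode a range for the number of examples. A hedged sketch of turning such a tag into numeric bounds (the K/M/B suffix multipliers are an assumption based on how these tags read, not defined in this diff):

```python
import re

# Suffix multipliers assumed from common usage of these tags.
_MULT = {"": 1, "K": 10**3, "M": 10**6, "B": 10**9}
_BOUND = re.compile(r"^(\d+)([KMB]?)$")

def _to_int(bound):
    """Convert a suffixed bound like '100M' to an int."""
    m = _BOUND.match(bound)
    if not m:
        raise ValueError(f"bad bound: {bound!r}")
    return int(m.group(1)) * _MULT[m.group(2)]

def parse_size_category(tag):
    """Parse a tag of the form '<lower><n<<upper>' into a pair of ints."""
    lower, upper = tag.split("<n<")
    return _to_int(lower), _to_int(upper)
```

So `100M<n<1B` parses to bounds of one hundred million and one billion examples, consistent with a corpus of Amazon reviews.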
datasets/art/README.md

Lines changed: 30 additions & 19 deletions
@@ -1,8 +1,26 @@
 ---
+annotations_creators:
+- crowdsourced
 language:
 - en
-paperswithcode_id: art-dataset
+language_creators:
+- found
+license:
+- unknown
+multilinguality:
+- monolingual
 pretty_name: Abductive Reasoning in narrative Text
+size_categories:
+- 100K<n<1M
+source_datasets:
+- original
+task_categories:
+- multiple-choice
+- text-classification
+task_ids:
+- natural-language-inference
+- text-classification-other-abductive-natural-language-inference
+paperswithcode_id: art-dataset
 ---
 
 # Dataset Card for "art"

@@ -34,16 +52,18 @@ pretty_name: Abductive Reasoning in narrative Text
 ## Dataset Description
 
 - **Homepage:** [https://leaderboard.allenai.org/anli/submissions/get-started](https://leaderboard.allenai.org/anli/submissions/get-started)
-- **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
-- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+- **Repository:** https://github.com/allenai/abductive-commonsense-reasoning
+- **Paper:** [Abductive Commonsense Reasoning](https://arxiv.org/abs/1908.05739)
 - **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
 - **Size of downloaded dataset files:** 4.88 MB
 - **Size of the generated dataset:** 32.77 MB
 - **Total amount of disk used:** 37.65 MB
 
 ### Dataset Summary
 
-the Abductive Natural Language Inference Dataset from AI2
+ART consists of over 20k commonsense narrative contexts and 200k explanations.
+
+The Abductive Natural Language Inference Dataset from AI2.
 
 ### Supported Tasks and Leaderboards
 

@@ -55,8 +75,6 @@ the Abductive Natural Language Inference Dataset from AI2
 
 ## Dataset Structure
 
-`
-
 ### Data Instances
 
 #### anli

@@ -150,22 +168,15 @@ The data fields are the same among all splits.
 ### Citation Information
 
 ```
-@InProceedings{anli,
-    author = "Chandra, Bhagavatula
-        and Ronan, Le Bras
-        and Chaitanya, Malaviya
-        and Keisuke, Sakaguchi
-        and Ari, Holtzman
-        and Hannah, Rashkin
-        and Doug, Downey
-        and Scott, Wen-tau Yih
-        and Yejin, Choi",
-    title = "Abductive Commonsense Reasoning",
-    year = "2020",
+@inproceedings{Bhagavatula2020Abductive,
+    title={Abductive Commonsense Reasoning},
+    author={Chandra Bhagavatula and Ronan Le Bras and Chaitanya Malaviya and Keisuke Sakaguchi and Ari Holtzman and Hannah Rashkin and Doug Downey and Wen-tau Yih and Yejin Choi},
+    booktitle={International Conference on Learning Representations},
+    year={2020},
+    url={https://openreview.net/forum?id=Byg1v1HKDB}
 }
 ```
 
-
 ### Contributions
 
 Thanks to [@patrickvonplaten](https://github.com/patrickvonplaten), [@thomwolf](https://github.com/thomwolf), [@mariamabarham](https://github.com/mariamabarham), [@lewtun](https://github.com/lewtun), [@lhoestq](https://github.com/lhoestq) for adding this dataset.
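The abductive NLI task this card tags (`natural-language-inference`, `multiple-choice`) asks which of two hypotheses better explains a pair of observations. A hypothetical sketch of one instance; the field names and example text are illustrative assumptions, not quoted from this commit:

```python
# One illustrative abductive-NLI instance: given observations O1 and O2,
# choose the hypothesis that better explains what happened in between.
example = {
    "observation_1": "Dotty was being very grumpy.",
    "observation_2": "Dotty felt much better afterwards.",
    "hypothesis_1": "Dotty ate something bad.",
    "hypothesis_2": "Dotty called her friend to talk about her feelings.",
    "label": 2,  # hypothesis_2 is the more plausible explanation
}

def plausible_hypothesis(ex):
    """Return the text of the hypothesis the label points at."""
    return ex["hypothesis_1"] if ex["label"] == 1 else ex["hypothesis_2"]
```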

datasets/discofuse/README.md

Lines changed: 21 additions & 8 deletions
@@ -1,8 +1,24 @@
 ---
+annotations_creators:
+- machine-generated
 language:
 - en
-paperswithcode_id: discofuse
+language_creators:
+- found
+license:
+- cc-by-sa-3.0
+multilinguality:
+- monolingual
 pretty_name: DiscoFuse
+size_categories:
+- 10M<n<100M
+source_datasets:
+- original
+task_categories:
+- text2text-generation
+task_ids:
+- text2text-generation-other-sentence-fusion
+paperswithcode_id: discofuse
 ---
 
 # Dataset Card for "discofuse"

@@ -33,17 +49,16 @@ pretty_name: DiscoFuse
 
 ## Dataset Description
 
-- **Homepage:** [https://github.com/google-research-datasets/discofuse](https://github.com/google-research-datasets/discofuse)
-- **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
-- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+- **Repository:** https://github.com/google-research-datasets/discofuse
+- **Paper:** [DiscoFuse: A Large-Scale Dataset for Discourse-Based Sentence Fusion](https://arxiv.org/abs/1902.10526)
 - **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
 - **Size of downloaded dataset files:** 5764.06 MB
 - **Size of the generated dataset:** 20547.64 MB
 - **Total amount of disk used:** 26311.70 MB
 
 ### Dataset Summary
 
-DISCOFUSE is a large scale dataset for discourse-based sentence fusion.
+DiscoFuse is a large scale dataset for discourse-based sentence fusion.
 
 ### Supported Tasks and Leaderboards
 

@@ -180,7 +195,7 @@ The data fields are the same among all splits.
 
 ### Licensing Information
 
-[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+The data is licensed under [Creative Commons Attribution-ShareAlike 3.0](https://creativecommons.org/licenses/by-sa/3.0/) license.
 
 ### Citation Information
 

@@ -192,10 +207,8 @@ The data fields are the same among all splits.
     note = {arXiv preprint arXiv:1902.10526},
     year = {2019}
 }
-
 ```
 
-
 ### Contributions
 
 Thanks to [@thomwolf](https://github.com/thomwolf), [@patrickvonplaten](https://github.com/patrickvonplaten), [@mariamabarham](https://github.com/mariamabarham), [@lewtun](https://github.com/lewtun) for adding this dataset.
