diff --git a/datasets/aeslc/README.md b/datasets/aeslc/README.md
Lines changed: 29 additions & 15 deletions
diff --git a/datasets/empathetic_dialogues/README.md b/datasets/empathetic_dialogues/README.md
Lines changed: 37 additions & 12 deletions
diff --git a/datasets/event2Mind/README.md b/datasets/event2Mind/README.md
Lines changed: 35 additions & 10 deletions
diff --git a/datasets/gap/README.md b/datasets/gap/README.md
Lines changed: 33 additions & 21 deletions
diff --git a/datasets/iwslt2017/README.md b/datasets/iwslt2017/README.md
Lines changed: 48 additions & 19 deletions
diff --git a/datasets/iwslt2017/dataset_infos.json b/datasets/iwslt2017/dataset_infos.json
Lines changed: 1 addition & 1 deletion
@@ -1,15 +1,27 @@
 ---
+annotations_creators:
+- crowdsourced
 language:
 - en
-paperswithcode_id: aeslc
-pretty_name: AESLC
+language_creators:
+- found
+license:
+- unknown
+multilinguality:
+- monolingual
+pretty_name: "AESLC: Annotated Enron Subject Line Corpus"
+size_categories:
+- 10K<n<100K
+source_datasets:
+- original
 task_categories:
 - summarization
 task_ids:
 - summarization-other-email-headline-generation
 - summarization-other-conversations-summarization
 - summarization-other-multi-document-summarization
 - summarization-other-aspect-based-summarization
+paperswithcode_id: aeslc
 ---
 
 # Dataset Card for "aeslc"
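The front-matter block between the `---` lines is the machine-readable part of each card, and this PR fills in the same set of keys for every dataset. To illustrate that shape only, here is a minimal Python sketch that reads the flat `key: value`, `key: []` and `key:` / `- item` forms used in these headers into a dict. `parse_card_metadata` is a hypothetical helper, not an API of `datasets` or `huggingface_hub`; real tooling should use a full YAML parser.

```python
from typing import Dict, List, Optional, Union

# A metadata value is either a scalar string or a list of strings.
Value = Union[str, List[str]]


def parse_card_metadata(card_text: str) -> Dict[str, Value]:
    """Read the `---`-delimited header of a dataset card into a dict.

    Handles only the flat shapes used in these cards: `key: value`,
    `key: []`, and `key:` followed by `- item` lines. Not a YAML parser.
    """
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("card must start with a '---' metadata block")
    try:
        end = lines.index("---", 1)  # closing delimiter of the header
    except ValueError:
        raise ValueError("metadata block is not closed by '---'") from None

    meta: Dict[str, Value] = {}
    current: Optional[List[str]] = None  # list being filled by '- item' lines
    for line in lines[1:end]:
        if not line.strip():
            continue
        if line.startswith("- "):
            if current is None:
                raise ValueError(f"list item without a list key: {line!r}")
            current.append(line[2:].strip())
            continue
        key, _, value = line.partition(":")  # split on the first colon only
        key, value = key.strip(), value.strip()
        if value == "":
            current = []  # the items follow on the next lines
            meta[key] = current
        elif value == "[]":
            meta[key] = []
            current = None
        else:
            meta[key] = value.strip('"')  # drop surrounding quotes, if any
            current = None
    return meta
```

Applied to the new AESLC header, it would return `["10K<n<100K"]` for `size_categories` and the unquoted title for `pretty_name`.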
@@ -40,9 +52,9 @@ task_ids:
 
 ## Dataset Description
 
-- **Homepage:** [https://github.com/ryanzhumich/AESLC](https://github.com/ryanzhumich/AESLC)
-- **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
-- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+- **Homepage:**
+- **Repository:** https://github.com/ryanzhumich/AESLC
+- **Paper:** [This Email Could Save Your Life: Introducing the Task of Email Subject Line Generation](https://arxiv.org/abs/1906.03497)
 - **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
 - **Size of downloaded dataset files:** 11.10 MB
 - **Size of the generated dataset:** 14.26 MB
@@ -153,19 +165,21 @@ The data fields are the same among all splits.
 ### Citation Information
 
 ```
-
-@misc{zhang2019email,
-    title={This Email Could Save Your Life: Introducing the Task of Email Subject Line Generation},
-    author={Rui Zhang and Joel Tetreault},
-    year={2019},
-    eprint={1906.03497},
-    archivePrefix={arXiv},
-    primaryClass={cs.CL}
+@inproceedings{zhang-tetreault-2019-email,
+    title = "This Email Could Save Your Life: Introducing the Task of Email Subject Line Generation",
+    author = "Zhang, Rui  and
+      Tetreault, Joel",
+    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
+    month = jul,
+    year = "2019",
+    address = "Florence, Italy",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/P19-1043",
+    doi = "10.18653/v1/P19-1043",
+    pages = "446--456",
 }
-
 ```
 
-
 ### Contributions
 
 Thanks to [@patrickvonplaten](https://github.com/patrickvonplaten), [@thomwolf](https://github.com/thomwolf), [@lewtun](https://github.com/lewtun) for adding this dataset.
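The citation updates in this PR replace arXiv/DBLP-style entries with ACL Anthology ones, whose `author` fields are written as `Last, First`. Below is a minimal sketch of that name rewrite, assuming the last whitespace-separated token of each name is the surname (so particles such as "van" and multi-word surnames are not handled). `to_anthology_authors` is a hypothetical helper. The doubled spaces and line breaks in the Anthology entries are cosmetic, since BibTeX splits the field on the word `and`.

```python
def to_anthology_authors(authors: str) -> str:
    """Rewrite "First Last and First Last" as "Last, First and Last, First".

    Assumes the final token of each name is the surname.
    """
    names = [name.strip() for name in authors.split(" and ")]
    converted = []
    for name in names:
        parts = name.split()
        if len(parts) < 2:
            converted.append(name)  # single-token name: leave unchanged
            continue
        surname, given = parts[-1], " ".join(parts[:-1])
        converted.append(f"{surname}, {given}")
    return " and ".join(converted)
```

For example, the old AESLC field `Rui Zhang and Joel Tetreault` becomes `Zhang, Rui and Tetreault, Joel`, matching the new entry.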
@@ -1,7 +1,25 @@
 ---
-pretty_name: EmpatheticDialogues
+annotations_creators:
+- crowdsourced
 language:
 - en
+language_creators:
+- crowdsourced
+license:
+- cc-by-nc-4.0
+multilinguality:
+- monolingual
+pretty_name: EmpatheticDialogues
+size_categories:
+- 10K<n<100K
+source_datasets:
+- original
+task_categories:
+- conversational
+- question-answering
+task_ids:
+- dialogue-generation
+- open-domain-qa
 paperswithcode_id: empatheticdialogues
 ---
 
@@ -34,8 +52,8 @@ paperswithcode_id: empatheticdialogues
 ## Dataset Description
 
 - **Homepage:** [https://github.com/facebookresearch/EmpatheticDialogues](https://github.com/facebookresearch/EmpatheticDialogues)
-- **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
-- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+- **Repository:** https://github.com/facebookresearch/EmpatheticDialogues
+- **Paper:** [Towards Empathetic Open-domain Conversation Models: a New Benchmark and Dataset](https://arxiv.org/abs/1811.00207)
 - **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
 - **Size of downloaded dataset files:** 26.72 MB
 - **Size of the generated dataset:** 23.97 MB
@@ -149,21 +167,28 @@ The data fields are the same among all splits.
 
 ### Licensing Information
 
-[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+Creative Commons [Attribution-NonCommercial 4.0 International](https://creativecommons.org/licenses/by-nc/4.0/).
 
 ### Citation Information
 
 ```
-@inproceedings{rashkin2019towards,
-  title = {Towards Empathetic Open-domain Conversation Models: a New Benchmark and Dataset},
-  author = {Hannah Rashkin and Eric Michael Smith and Margaret Li and Y-Lan Boureau},
-  booktitle = {ACL},
-  year = {2019},
+@inproceedings{rashkin-etal-2019-towards,
+    title = "Towards Empathetic Open-domain Conversation Models: A New Benchmark and Dataset",
+    author = "Rashkin, Hannah  and
+      Smith, Eric Michael  and
+      Li, Margaret  and
+      Boureau, Y-Lan",
+    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
+    month = jul,
+    year = "2019",
+    address = "Florence, Italy",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/P19-1534",
+    doi = "10.18653/v1/P19-1534",
+    pages = "5370--5381",
 }
-
 ```
 
-
 ### Contributions
 
-Thanks to [@thomwolf](https://github.com/thomwolf), [@patrickvonplaten](https://github.com/patrickvonplaten), [@lewtun](https://github.com/lewtun) for adding this dataset.
+Thanks to [@thomwolf](https://github.com/thomwolf), [@patrickvonplaten](https://github.com/patrickvonplaten), [@lewtun](https://github.com/lewtun) for adding this dataset.
@@ -1,8 +1,24 @@
 ---
+annotations_creators:
+- crowdsourced
 language:
 - en
-paperswithcode_id: event2mind
+language_creators:
+- found
+license:
+- unknown
+multilinguality:
+- monolingual
 pretty_name: Event2Mind
+size_categories:
+- 10K<n<100K
+source_datasets:
+- original
+task_categories:
+- text2text-generation
+task_ids:
+- text2text-generation-other-common-sense-inference
+paperswithcode_id: event2mind
 ---
 
 # Dataset Card for "event2Mind"
@@ -34,9 +50,9 @@ pretty_name: Event2Mind
 ## Dataset Description
 
 - **Homepage:** [https://uwnlp.github.io/event2mind/](https://uwnlp.github.io/event2mind/)
-- **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
-- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
-- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+- **Repository:** https://github.com/uwnlp/event2mind
+- **Paper:** [Event2Mind: Commonsense Inference on Events, Intents, and Reactions](https://arxiv.org/abs/1805.06939)
+- **Point of Contact:** [Hannah Rashkin](mailto:[email protected]), [Maarten Sap](mailto:[email protected])
 - **Size of downloaded dataset files:** 1.24 MB
 - **Size of the generated dataset:** 6.90 MB
 - **Total amount of disk used:** 8.14 MB
@@ -152,15 +168,24 @@ The data fields are the same among all splits.
 ### Citation Information
 
 ```
-@inproceedings{event2Mind,
-    title={Event2Mind: Commonsense Inference on Events, Intents, and Reactions},
-    author={Hannah Rashkin and Maarten Sap and Emily Allaway and Noah A. Smith† Yejin Choi},
-    year={2018}
+@inproceedings{rashkin-etal-2018-event2mind,
+    title = "{E}vent2{M}ind: Commonsense Inference on Events, Intents, and Reactions",
+    author = "Rashkin, Hannah  and
+      Sap, Maarten  and
+      Allaway, Emily  and
+      Smith, Noah A.  and
+      Choi, Yejin",
+    booktitle = "Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
+    month = jul,
+    year = "2018",
+    address = "Melbourne, Australia",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/P18-1043",
+    doi = "10.18653/v1/P18-1043",
+    pages = "463--473",
 }
-
 ```
 
-
 ### Contributions
 
 Thanks to [@thomwolf](https://github.com/thomwolf), [@patrickvonplaten](https://github.com/patrickvonplaten), [@lewtun](https://github.com/lewtun) for adding this dataset.
@@ -1,8 +1,24 @@
 ---
+annotations_creators:
+- crowdsourced
 language:
 - en
-paperswithcode_id: gap
+language_creators:
+- found
+license:
+- unknown
+multilinguality:
+- monolingual
 pretty_name: GAP Benchmark Suite
+size_categories:
+- 1K<n<10K
+source_datasets:
+- original
+task_categories:
+- token-classification
+task_ids:
+- coreference-resolution
+paperswithcode_id: gap
 ---
 
 # Dataset Card for "gap"
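The `size_categories` values set in this PR (`1K<n<10K` for GAP, `10K<n<100K` for AESLC, EmpatheticDialogues and Event2Mind, `1M<n<10M` for IWSLT 2017) are order-of-magnitude buckets over the number of examples. The sketch below shows that bucketing, assuming the decade boundaries implied by the labels. How an exact boundary such as 1,000 is assigned, and the catch-all `n>1B` label, are this sketch's own choices rather than the Hub's definition.

```python
def size_category(n: int) -> str:
    """Map an example count to a size bucket label (assumed decade scheme)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # (exclusive upper bound, label); a count exactly on a boundary
    # falls into the next bucket up in this sketch.
    buckets = [
        (1_000, "n<1K"),
        (10_000, "1K<n<10K"),
        (100_000, "10K<n<100K"),
        (1_000_000, "100K<n<1M"),
        (10_000_000, "1M<n<10M"),
        (100_000_000, "10M<n<100M"),
        (1_000_000_000, "100M<n<1B"),
    ]
    for upper, label in buckets:
        if n < upper:
            return label
    return "n>1B"  # placeholder catch-all for very large datasets
```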
@@ -35,8 +51,8 @@ pretty_name: GAP Benchmark Suite
 
 - **Homepage:** [https://github.com/google-research-datasets/gap-coreference](https://github.com/google-research-datasets/gap-coreference)
 - **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
-- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
-- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+- **Paper:** [Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns](https://arxiv.org/abs/1810.05201)
+- **Point of Contact:** [gap-coreference@google.com](mailto:gap-coreference@google.com)
 - **Size of downloaded dataset files:** 2.29 MB
 - **Size of the generated dataset:** 2.32 MB
 - **Total amount of disk used:** 4.61 MB
@@ -163,27 +179,23 @@ The data fields are the same among all splits.
 ### Citation Information
 
 ```
-
-@article{DBLP:journals/corr/abs-1810-05201,
-  author    = {Kellie Webster and
-               Marta Recasens and
-               Vera Axelrod and
-               Jason Baldridge},
-  title     = {Mind the {GAP:} {A} Balanced Corpus of Gendered Ambiguous Pronouns},
-  journal   = {CoRR},
-  volume    = {abs/1810.05201},
-  year      = {2018},
-  url       = {http://arxiv.org/abs/1810.05201},
-  archivePrefix = {arXiv},
-  eprint    = {1810.05201},
-  timestamp = {Tue, 30 Oct 2018 20:39:56 +0100},
-  biburl    = {https://dblp.org/rec/bib/journals/corr/abs-1810-05201},
-  bibsource = {dblp computer science bibliography, https://dblp.org}
+@article{webster-etal-2018-mind,
+    title = "Mind the {GAP}: A Balanced Corpus of Gendered Ambiguous Pronouns",
+    author = "Webster, Kellie  and
+      Recasens, Marta  and
+      Axelrod, Vera  and
+      Baldridge, Jason",
+    journal = "Transactions of the Association for Computational Linguistics",
+    volume = "6",
+    year = "2018",
+    address = "Cambridge, MA",
+    publisher = "MIT Press",
+    url = "https://aclanthology.org/Q18-1042",
+    doi = "10.1162/tacl_a_00240",
+    pages = "605--617",
 }
-
 ```
 
-
 ### Contributions
 
 Thanks to [@thomwolf](https://github.com/thomwolf), [@patrickvonplaten](https://github.com/patrickvonplaten), [@otakumesi](https://github.com/otakumesi), [@lewtun](https://github.com/lewtun) for adding this dataset.
@@ -1,8 +1,32 @@
 ---
-paperswithcode_id: null
-pretty_name: IWSLT 2017
+annotations_creators:
+- crowdsourced
+language:
+- ar
+- de
+- en
+- fr
+- it
+- ja
+- ko
+- nl
+- ro
+- zh
+language_creators:
+- expert-generated
 license:
 - cc-by-nc-nd-4.0
+multilinguality:
+- translation
+pretty_name: IWSLT 2017
+size_categories:
+- 1M<n<10M
+source_datasets:
+- original
+task_categories:
+- translation
+task_ids: []
+paperswithcode_id: iwslt-2017
 ---
 
 # Dataset Card for IWSLT 2017
@@ -35,19 +59,17 @@ license:
 
 - **Homepage:** [https://sites.google.com/site/iwsltevaluation2017/TED-tasks](https://sites.google.com/site/iwsltevaluation2017/TED-tasks)
 - **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
-- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+- **Paper:** [Overview of the IWSLT 2017 Evaluation Campaign](https://aclanthology.org/2017.iwslt-1.1/)
 - **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
 - **Size of downloaded dataset files:** 4046.89 MB
 - **Size of the generated dataset:** 1087.28 MB
 - **Total amount of disk used:** 5134.17 MB
 
 ### Dataset Summary
 
-The IWSLT 2017 Evaluation Campaign includes a multilingual TED Talks MT task. The languages involved are five:
-
-  German, English, Italian, Dutch, Romanian.
-
-For each language pair, training and development sets are available through the entry of the table below: by clicking, an archive will be downloaded which contains the sets and a README file. Numbers in the table refer to millions of units (untokenized words) of the target side of all parallel training sets.
+The IWSLT 2017 Multilingual Task addresses text translation, including zero-shot translation, with a single MT system
+across all directions among English, German, Dutch, Italian and Romanian. As an unofficial task, conventional
+bilingual text translation is offered between English and Arabic, French, Japanese, Chinese, German and Korean.
 
 ### Supported Tasks and Leaderboards
 
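The new IWSLT 2017 summary describes two kinds of translation directions. This small sketch enumerates them, assuming "all directions" means every ordered pair of the five multilingual-task languages and "between English and X" means both X-to-English and English-to-X. The `iwslt2017-{src}-{tgt}` string produced by `config_name` is a hypothetical naming pattern for illustration; check the dataset's loading script for the actual configuration names.

```python
from itertools import permutations
from typing import List, Tuple

Direction = Tuple[str, str]  # (source language, target language)

# Multilingual task: one MT system covers every direction among these.
MULTILINGUAL = ["en", "de", "nl", "it", "ro"]
# Unofficial bilingual task: English paired with each of these.
BILINGUAL_PARTNERS = ["ar", "fr", "ja", "zh", "de", "ko"]


def translation_directions() -> Tuple[List[Direction], List[Direction]]:
    """Return (multilingual directions, bilingual directions)."""
    multilingual = list(permutations(MULTILINGUAL, 2))  # all ordered pairs
    bilingual: List[Direction] = []
    for lang in BILINGUAL_PARTNERS:
        bilingual.append(("en", lang))  # English to X
        bilingual.append((lang, "en"))  # X to English
    return multilingual, bilingual


def config_name(src: str, tgt: str) -> str:
    # Hypothetical naming pattern, for illustration only.
    return f"iwslt2017-{src}-{tgt}"
```

Under these assumptions there are 20 multilingual directions and 12 bilingual ones, with English-German and German-English appearing in both sets.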
@@ -227,19 +249,26 @@ See the (TED Talks Usage Policy)[https://www.ted.com/about/our-organization/our-
 ### Citation Information
 
 ```
-@inproceedings{cettoloEtAl:EAMT2012,
-Address = {Trento, Italy},
-Author = {Mauro Cettolo and Christian Girardi and Marcello Federico},
-Booktitle = {Proceedings of the 16$^{th}$ Conference of the European Association for Machine Translation (EAMT)},
-Date = {28-30},
-Month = {May},
-Pages = {261--268},
-Title = {WIT$^3$: Web Inventory of Transcribed and Translated Talks},
-Year = {2012}}
-
+@inproceedings{cettolo-etal-2017-overview,
+    title = "Overview of the {IWSLT} 2017 Evaluation Campaign",
+    author = {Cettolo, Mauro  and
+      Federico, Marcello  and
+      Bentivogli, Luisa  and
+      Niehues, Jan  and
+      St{\"u}ker, Sebastian  and
+      Sudoh, Katsuhito  and
+      Yoshino, Koichiro  and
+      Federmann, Christian},
+    booktitle = "Proceedings of the 14th International Conference on Spoken Language Translation",
+    month = dec # " 14-15",
+    year = "2017",
+    address = "Tokyo, Japan",
+    publisher = "International Workshop on Spoken Language Translation",
+    url = "https://aclanthology.org/2017.iwslt-1.1",
+    pages = "2--14",
+}
 ```
 
-
 ### Contributions
 
 Thanks to [@thomwolf](https://github.com/thomwolf), [@Narsil](https://github.com/Narsil) for adding this dataset.