Commit dc26489

Fix missing tags in dataset cards (#4991)
1 parent 4cb235b commit dc26489

10 files changed

+298
-118
lines changed

datasets/aeslc/README.md

Lines changed: 29 additions & 15 deletions
Original file line number | Diff line number | Diff line change
@@ -1,15 +1,27 @@
11
---
2+
annotations_creators:
3+
- crowdsourced
24
language:
35
- en
4-
paperswithcode_id: aeslc
5-
pretty_name: AESLC
6+
language_creators:
7+
- found
8+
license:
9+
- unknown
10+
multilinguality:
11+
- monolingual
12+
pretty_name: "AESLC: Annotated Enron Subject Line Corpus"
13+
size_categories:
14+
- 10K<n<100K
15+
source_datasets:
16+
- original
617
task_categories:
718
- summarization
819
task_ids:
920
- summarization-other-email-headline-generation
1021
- summarization-other-conversations-summarization
1122
- summarization-other-multi-document-summarization
1223
- summarization-other-aspect-based-summarization
24+
paperswithcode_id: aeslc
1325
---
1426

1527
# Dataset Card for "aeslc"
@@ -40,9 +52,9 @@ task_ids:
4052

4153
## Dataset Description
4254

43-
- **Homepage:** [https://github.com/ryanzhumich/AESLC](https://github.com/ryanzhumich/AESLC)
44-
- **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
45-
- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
55+
- **Homepage:**
56+
- **Repository:** https://github.com/ryanzhumich/AESLC
57+
- **Paper:** [This Email Could Save Your Life: Introducing the Task of Email Subject Line Generation](https://arxiv.org/abs/1906.03497)
4658
- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
4759
- **Size of downloaded dataset files:** 11.10 MB
4860
- **Size of the generated dataset:** 14.26 MB
@@ -153,19 +165,21 @@ The data fields are the same among all splits.
153165
### Citation Information
154166

155167
```
156-
157-
@misc{zhang2019email,
158-
title={This Email Could Save Your Life: Introducing the Task of Email Subject Line Generation},
159-
author={Rui Zhang and Joel Tetreault},
160-
year={2019},
161-
eprint={1906.03497},
162-
archivePrefix={arXiv},
163-
primaryClass={cs.CL}
168+
@inproceedings{zhang-tetreault-2019-email,
169+
title = "This Email Could Save Your Life: Introducing the Task of Email Subject Line Generation",
170+
author = "Zhang, Rui and
171+
Tetreault, Joel",
172+
booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
173+
month = jul,
174+
year = "2019",
175+
address = "Florence, Italy",
176+
publisher = "Association for Computational Linguistics",
177+
url = "https://aclanthology.org/P19-1043",
178+
doi = "10.18653/v1/P19-1043",
179+
pages = "446--456",
164180
}
165-
166181
```
167182

168-
169183
### Contributions
170184

171185
Thanks to [@patrickvonplaten](https://github.com/patrickvonplaten), [@thomwolf](https://github.com/thomwolf), [@lewtun](https://github.com/lewtun) for adding this dataset.

datasets/empathetic_dialogues/README.md

Lines changed: 37 additions & 12 deletions
Original file line number | Diff line number | Diff line change
@@ -1,7 +1,25 @@
11
---
2-
pretty_name: EmpatheticDialogues
2+
annotations_creators:
3+
- crowdsourced
34
language:
45
- en
6+
language_creators:
7+
- crowdsourced
8+
license:
9+
- cc-by-nc-4.0
10+
multilinguality:
11+
- monolingual
12+
pretty_name: EmpatheticDialogues
13+
size_categories:
14+
- 10K<n<100K
15+
source_datasets:
16+
- original
17+
task_categories:
18+
- conversational
19+
- question-answering
20+
task_ids:
21+
- dialogue-generation
22+
- open-domain-qa
523
paperswithcode_id: empatheticdialogues
624
---
725

@@ -34,8 +52,8 @@ paperswithcode_id: empatheticdialogues
3452
## Dataset Description
3553

3654
- **Homepage:** [https://github.com/facebookresearch/EmpatheticDialogues](https://github.com/facebookresearch/EmpatheticDialogues)
37-
- **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
38-
- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
55+
- **Repository:** https://github.com/facebookresearch/EmpatheticDialogues
56+
- **Paper:** [Towards Empathetic Open-domain Conversation Models: a New Benchmark and Dataset](https://arxiv.org/abs/1811.00207)
3957
- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
4058
- **Size of downloaded dataset files:** 26.72 MB
4159
- **Size of the generated dataset:** 23.97 MB
@@ -149,21 +167,28 @@ The data fields are the same among all splits.
149167

150168
### Licensing Information
151169

152-
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
170+
Creative Commons [Attribution-NonCommercial 4.0 International](https://creativecommons.org/licenses/by-nc/4.0/).
153171

154172
### Citation Information
155173

156174
```
157-
@inproceedings{rashkin2019towards,
158-
title = {Towards Empathetic Open-domain Conversation Models: a New Benchmark and Dataset},
159-
author = {Hannah Rashkin and Eric Michael Smith and Margaret Li and Y-Lan Boureau},
160-
booktitle = {ACL},
161-
year = {2019},
175+
@inproceedings{rashkin-etal-2019-towards,
176+
title = "Towards Empathetic Open-domain Conversation Models: A New Benchmark and Dataset",
177+
author = "Rashkin, Hannah and
178+
Smith, Eric Michael and
179+
Li, Margaret and
180+
Boureau, Y-Lan",
181+
booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
182+
month = jul,
183+
year = "2019",
184+
address = "Florence, Italy",
185+
publisher = "Association for Computational Linguistics",
186+
url = "https://aclanthology.org/P19-1534",
187+
doi = "10.18653/v1/P19-1534",
188+
pages = "5370--5381",
162189
}
163-
164190
```
165191

166-
167192
### Contributions
168193

169-
Thanks to [@thomwolf](https://github.com/thomwolf), [@patrickvonplaten](https://github.com/patrickvonplaten), [@lewtun](https://github.com/lewtun) for adding this dataset.
194+
Thanks to [@thomwolf](https://github.com/thomwolf), [@patrickvonplaten](https://github.com/patrickvonplaten), [@lewtun](https://github.com/lewtun) for adding this dataset.

datasets/event2Mind/README.md

Lines changed: 35 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,24 @@
11
---
2+
annotations_creators:
3+
- crowdsourced
24
language:
35
- en
4-
paperswithcode_id: event2mind
6+
language_creators:
7+
- found
8+
license:
9+
- unknown
10+
multilinguality:
11+
- monolingual
512
pretty_name: Event2Mind
13+
size_categories:
14+
- 10K<n<100K
15+
source_datasets:
16+
- original
17+
task_categories:
18+
- text2text-generation
19+
task_ids:
20+
- text2text-generation-other-common-sense-inference
21+
paperswithcode_id: event2mind
622
---
723

824
# Dataset Card for "event2Mind"
@@ -34,9 +50,9 @@ pretty_name: Event2Mind
3450
## Dataset Description
3551

3652
- **Homepage:** [https://uwnlp.github.io/event2mind/](https://uwnlp.github.io/event2mind/)
37-
- **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
38-
- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
39-
- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
53+
- **Repository:** https://github.com/uwnlp/event2mind
54+
- **Paper:** [Event2Mind: Commonsense Inference on Events, Intents, and Reactions](https://arxiv.org/abs/1805.06939)
55+
- **Point of Contact:** [Hannah Rashkin](mailto:[email protected]), [Maarten Sap](mailto:[email protected])
4056
- **Size of downloaded dataset files:** 1.24 MB
4157
- **Size of the generated dataset:** 6.90 MB
4258
- **Total amount of disk used:** 8.14 MB
@@ -152,15 +168,24 @@ The data fields are the same among all splits.
152168
### Citation Information
153169

154170
```
155-
@inproceedings{event2Mind,
156-
title={Event2Mind: Commonsense Inference on Events, Intents, and Reactions},
157-
author={Hannah Rashkin and Maarten Sap and Emily Allaway and Noah A. Smith† Yejin Choi},
158-
year={2018}
171+
@inproceedings{rashkin-etal-2018-event2mind,
172+
title = "{E}vent2{M}ind: Commonsense Inference on Events, Intents, and Reactions",
173+
author = "Rashkin, Hannah and
174+
Sap, Maarten and
175+
Allaway, Emily and
176+
Smith, Noah A. and
177+
Choi, Yejin",
178+
booktitle = "Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
179+
month = jul,
180+
year = "2018",
181+
address = "Melbourne, Australia",
182+
publisher = "Association for Computational Linguistics",
183+
url = "https://aclanthology.org/P18-1043",
184+
doi = "10.18653/v1/P18-1043",
185+
pages = "463--473",
159186
}
160-
161187
```
162188

163-
164189
### Contributions
165190

166191
Thanks to [@thomwolf](https://github.com/thomwolf), [@patrickvonplaten](https://github.com/patrickvonplaten), [@lewtun](https://github.com/lewtun) for adding this dataset.

datasets/gap/README.md

Lines changed: 33 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,24 @@
11
---
2+
annotations_creators:
3+
- crowdsourced
24
language:
35
- en
4-
paperswithcode_id: gap
6+
language_creators:
7+
- found
8+
license:
9+
- unknown
10+
multilinguality:
11+
- monolingual
512
pretty_name: GAP Benchmark Suite
13+
size_categories:
14+
- 1K<n<10K
15+
source_datasets:
16+
- original
17+
task_categories:
18+
- token-classification
19+
task_ids:
20+
- coreference-resolution
21+
paperswithcode_id: gap
622
---
723

824
# Dataset Card for "gap"
@@ -35,8 +51,8 @@ pretty_name: GAP Benchmark Suite
3551

3652
- **Homepage:** [https://github.com/google-research-datasets/gap-coreference](https://github.com/google-research-datasets/gap-coreference)
3753
- **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
38-
- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
39-
- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
54+
- **Paper:** [Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns](https://arxiv.org/abs/1810.05201)
55+
- **Point of Contact:** [gap-coreference@google.com](mailto:gap-coreference@google.com)
4056
- **Size of downloaded dataset files:** 2.29 MB
4157
- **Size of the generated dataset:** 2.32 MB
4258
- **Total amount of disk used:** 4.61 MB
@@ -163,27 +179,23 @@ The data fields are the same among all splits.
163179
### Citation Information
164180

165181
```
166-
167-
@article{DBLP:journals/corr/abs-1810-05201,
168-
author = {Kellie Webster and
169-
Marta Recasens and
170-
Vera Axelrod and
171-
Jason Baldridge},
172-
title = {Mind the {GAP:} {A} Balanced Corpus of Gendered Ambiguous Pronouns},
173-
journal = {CoRR},
174-
volume = {abs/1810.05201},
175-
year = {2018},
176-
url = {http://arxiv.org/abs/1810.05201},
177-
archivePrefix = {arXiv},
178-
eprint = {1810.05201},
179-
timestamp = {Tue, 30 Oct 2018 20:39:56 +0100},
180-
biburl = {https://dblp.org/rec/bib/journals/corr/abs-1810-05201},
181-
bibsource = {dblp computer science bibliography, https://dblp.org}
182+
@article{webster-etal-2018-mind,
183+
title = "Mind the {GAP}: A Balanced Corpus of Gendered Ambiguous Pronouns",
184+
author = "Webster, Kellie and
185+
Recasens, Marta and
186+
Axelrod, Vera and
187+
Baldridge, Jason",
188+
journal = "Transactions of the Association for Computational Linguistics",
189+
volume = "6",
190+
year = "2018",
191+
address = "Cambridge, MA",
192+
publisher = "MIT Press",
193+
url = "https://aclanthology.org/Q18-1042",
194+
doi = "10.1162/tacl_a_00240",
195+
pages = "605--617",
182196
}
183-
184197
```
185198

186-
187199
### Contributions
188200

189201
Thanks to [@thomwolf](https://github.com/thomwolf), [@patrickvonplaten](https://github.com/patrickvonplaten), [@otakumesi](https://github.com/otakumesi), [@lewtun](https://github.com/lewtun) for adding this dataset.

datasets/iwslt2017/README.md

Lines changed: 48 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,32 @@
11
---
2-
paperswithcode_id: null
3-
pretty_name: IWSLT 2017
2+
annotations_creators:
3+
- crowdsourced
4+
language:
5+
- ar
6+
- de
7+
- en
8+
- fr
9+
- it
10+
- ja
11+
- ko
12+
- nl
13+
- ro
14+
- zh
15+
language_creators:
16+
- expert-generated
417
license:
518
- cc-by-nc-nd-4.0
19+
multilinguality:
20+
- translation
21+
pretty_name: IWSLT 2017
22+
size_categories:
23+
- 1M<n<10M
24+
source_datasets:
25+
- original
26+
task_categories:
27+
- translation
28+
task_ids: []
29+
paperswithcode_id: iwslt-2017
630
---
731

832
# Dataset Card for IWSLT 2017
@@ -35,19 +59,17 @@ license:
3559

3660
- **Homepage:** [https://sites.google.com/site/iwsltevaluation2017/TED-tasks](https://sites.google.com/site/iwsltevaluation2017/TED-tasks)
3761
- **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
38-
- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
62+
- **Paper:** [Overview of the IWSLT 2017 Evaluation Campaign](https://aclanthology.org/2017.iwslt-1.1/)
3963
- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
4064
- **Size of downloaded dataset files:** 4046.89 MB
4165
- **Size of the generated dataset:** 1087.28 MB
4266
- **Total amount of disk used:** 5134.17 MB
4367

4468
### Dataset Summary
4569

46-
The IWSLT 2017 Evaluation Campaign includes a multilingual TED Talks MT task. The languages involved are five:
47-
48-
German, English, Italian, Dutch, Romanian.
49-
50-
For each language pair, training and development sets are available through the entry of the table below: by clicking, an archive will be downloaded which contains the sets and a README file. Numbers in the table refer to millions of units (untokenized words) of the target side of all parallel training sets.
70+
The IWSLT 2017 Multilingual Task addresses text translation, including zero-shot translation, with a single MT system
71+
across all directions including English, German, Dutch, Italian and Romanian. As unofficial task, conventional
72+
bilingual text translation is offered between English and Arabic, French, Japanese, Chinese, German and Korean.
5173

5274
### Supported Tasks and Leaderboards
5375

@@ -227,19 +249,26 @@ See the (TED Talks Usage Policy)[https://www.ted.com/about/our-organization/our-
227249
### Citation Information
228250

229251
```
230-
@inproceedings{cettoloEtAl:EAMT2012,
231-
Address = {Trento, Italy},
232-
Author = {Mauro Cettolo and Christian Girardi and Marcello Federico},
233-
Booktitle = {Proceedings of the 16$^{th}$ Conference of the European Association for Machine Translation (EAMT)},
234-
Date = {28-30},
235-
Month = {May},
236-
Pages = {261--268},
237-
Title = {WIT$^3$: Web Inventory of Transcribed and Translated Talks},
238-
Year = {2012}}
239-
252+
@inproceedings{cettolo-etal-2017-overview,
253+
title = "Overview of the {IWSLT} 2017 Evaluation Campaign",
254+
author = {Cettolo, Mauro and
255+
Federico, Marcello and
256+
Bentivogli, Luisa and
257+
Niehues, Jan and
258+
St{\"u}ker, Sebastian and
259+
Sudoh, Katsuhito and
260+
Yoshino, Koichiro and
261+
Federmann, Christian},
262+
booktitle = "Proceedings of the 14th International Conference on Spoken Language Translation",
263+
month = dec # " 14-15",
264+
year = "2017",
265+
address = "Tokyo, Japan",
266+
publisher = "International Workshop on Spoken Language Translation",
267+
url = "https://aclanthology.org/2017.iwslt-1.1",
268+
pages = "2--14",
269+
}
240270
```
241271

242-
243272
### Contributions
244273

245274
Thanks to [@thomwolf](https://github.com/thomwolf), [@Narsil](https://github.com/Narsil) for adding this dataset.

datasets/iwslt2017/dataset_infos.json

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.
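
The cards touched above all end up with the same set of top-level keys in their YAML front matter. A minimal sketch of how missing keys could be detected is given below. It is not part of this commit or of the `datasets` tooling: the function names and the `EXPECTED_KEYS` list are hypothetical, the list is simply the set of keys visible in the updated cards in this diff, and the parser only looks at top-level `key:` lines rather than parsing YAML fully.

```python
import re

# Keys that appear in every updated card shown in this diff. Treating them
# as "required" is an assumption made for this sketch.
EXPECTED_KEYS = [
    "annotations_creators",
    "language",
    "language_creators",
    "license",
    "multilinguality",
    "pretty_name",
    "size_categories",
    "source_datasets",
    "task_categories",
    "task_ids",
    "paperswithcode_id",
]

# A top-level key starts at column 0 and is followed by a colon.
_KEY_RE = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*):")


def front_matter_keys(card_text):
    """Return the top-level keys of the leading '---' block, in order."""
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no front matter at the top of the card
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the front matter block
        match = _KEY_RE.match(line)
        if match:
            keys.append(match.group(1))
    return keys


def missing_keys(card_text, expected=EXPECTED_KEYS):
    """Return the expected keys that the card's front matter lacks."""
    present = set(front_matter_keys(card_text))
    return [key for key in expected if key not in present]


if __name__ == "__main__":
    # Front matter of datasets/aeslc/README.md before this commit
    # (task_ids shortened to one entry for brevity).
    old_card = "\n".join([
        "---",
        "language:",
        "- en",
        "paperswithcode_id: aeslc",
        "pretty_name: AESLC",
        "task_categories:",
        "- summarization",
        "task_ids:",
        "- summarization-other-email-headline-generation",
        "---",
        "",
        '# Dataset Card for "aeslc"',
    ])
    print(missing_keys(old_card))
```

Run against the pre-commit `aeslc` front matter, the sketch reports the six keys this commit adds to that card. A real check would more likely use a YAML parser and the Hub's own metadata validation; the line-based approach here only avoids a third-party dependency.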
