Commit c6434a2

Fix missing tags in dataset cards (#4833)
1 parent a7557a3 commit c6434a2

16 files changed: +328 -32 lines changed
datasets/boolq/README.md

Lines changed: 19 additions & 3 deletions
@@ -1,11 +1,27 @@
 ---
+annotations_creators:
+- crowdsourced
+language_creators:
+- found
 language:
 - en
+license:
+- cc-by-sa-3.0
+multilinguality:
+- monolingual
+size_categories:
+- 10K<n<100K
+source_datasets:
+- original
+task_categories:
+- text-classification
+task_ids:
+- natural-language-inference
 paperswithcode_id: boolq
-pretty_name: Boolean Questions
+pretty_name: BoolQ
 ---

-# Dataset Card for "boolq"
+# Dataset Card for Boolq

 ## Table of Contents
 - [Dataset Description](#dataset-description)
@@ -144,7 +160,7 @@ The data fields are the same among all splits.

 ### Licensing Information

-[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+BoolQ is released under the [Creative Commons Share-Alike 3.0](https://creativecommons.org/licenses/by-sa/3.0/) license.

 ### Citation Information

datasets/break_data/README.md

Lines changed: 16 additions & 2 deletions
@@ -1,6 +1,22 @@
 ---
+annotations_creators:
+- crowdsourced
+language_creators:
+- crowdsourced
 language:
 - en
+license:
+- unknown
+multilinguality:
+- monolingual
+size_categories:
+- 10K<n<100K
+source_datasets:
+- original
+task_categories:
+- text2text-generation
+task_ids:
+- open-domain-abstractive-qa
 paperswithcode_id: break
 pretty_name: BREAK
 ---
@@ -250,10 +266,8 @@ The data fields are the same among all splits.
 journal={Transactions of the Association for Computational Linguistics},
 year={2020},
 }
-
 ```

-
 ### Contributions

 Thanks to [@patrickvonplaten](https://github.com/patrickvonplaten), [@lewtun](https://github.com/lewtun), [@thomwolf](https://github.com/thomwolf) for adding this dataset.

datasets/definite_pronoun_resolution/README.md

Lines changed: 17 additions & 1 deletion
@@ -1,6 +1,22 @@
 ---
+annotations_creators:
+- expert-generated
+language_creators:
+- crowdsourced
 language:
 - en
+license:
+- unknown
+multilinguality:
+- monolingual
+size_categories:
+- 1K<n<10K
+source_datasets:
+- original
+task_categories:
+- token-classification
+task_ids:
+- word-sense-disambiguation
 paperswithcode_id: definite-pronoun-resolution-dataset
 pretty_name: Definite Pronoun Resolution Dataset
 ---
@@ -33,7 +49,7 @@ pretty_name: Definite Pronoun Resolution Dataset

 ## Dataset Description

-- **Homepage:** [http://www.hlt.utdallas.edu/~vince/data/emnlp12/](http://www.hlt.utdallas.edu/~vince/data/emnlp12/)
+- **Homepage:** [https://www.hlt.utdallas.edu/~vince/data/emnlp12/](https://www.hlt.utdallas.edu/~vince/data/emnlp12/)
 - **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
 - **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
 - **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

datasets/emo/README.md

Lines changed: 16 additions & 0 deletions
@@ -1,6 +1,22 @@
 ---
+annotations_creators:
+- expert-generated
+language_creators:
+- crowdsourced
 language:
 - en
+license:
+- unknown
+multilinguality:
+- monolingual
+size_categories:
+- 10K<n<100K
+source_datasets:
+- original
+task_categories:
+- text-classification
+task_ids:
+- sentiment-classification
 paperswithcode_id: emocontext
 pretty_name: EmoContext
 ---

datasets/kor_nli/README.md

Lines changed: 24 additions & 2 deletions
@@ -1,4 +1,26 @@
 ---
+annotations_creators:
+- crowdsourced
+language_creators:
+- machine-generated
+- expert-generated
+language:
+- ko
+license:
+- cc-by-sa-4.0
+multilinguality:
+- monolingual
+size_categories:
+- 100K<n<1M
+source_datasets:
+- extended|multi_nli
+- extended|snli
+- extended|xnli
+task_categories:
+- text-classification
+task_ids:
+- natural-language-inference
+- multi-input-text-classification
 paperswithcode_id: kornli
 pretty_name: KorNLI
 ---
@@ -41,7 +63,7 @@ pretty_name: KorNLI

 ### Dataset Summary

-Korean Natural Language Inference datasets
+Korean Natural Language Inference datasets.

 ### Supported Tasks and Leaderboards

@@ -179,7 +201,7 @@ The data fields are the same among all splits.

 ### Licensing Information

-[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+The dataset is licensed under Creative Commons [Attribution-ShareAlike license (CC BY-SA 4.0)](http://creativecommons.org/licenses/by-sa/4.0/).

 ### Citation Information

datasets/pg19/README.md

Lines changed: 16 additions & 3 deletions
@@ -1,8 +1,22 @@
 ---
+annotations_creators:
+- expert-generated
+language_creators:
+- expert-generated
 language:
 - en
 license:
 - apache-2.0
+multilinguality:
+- monolingual
+size_categories:
+- 10K<n<100K
+source_datasets:
+- original
+task_categories:
+- text-generation
+task_ids:
+- language-modeling
 paperswithcode_id: pg-19
 pretty_name: PG-19
 ---
@@ -37,7 +51,7 @@ pretty_name: PG-19

 - **Homepage:** [https://github.com/deepmind/pg19](https://github.com/deepmind/pg19)
 - **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
-- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+- **Paper:** [Compressive Transformers for Long-Range Sequence Modelling](https://arxiv.org/abs/1911.05507)
 - **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
 - **Size of downloaded dataset files:** 11196.60 MB
 - **Size of the generated dataset:** 10978.29 MB
@@ -154,7 +168,7 @@ The data fields are the same among all splits.

 ### Licensing Information

-Apache 2.0
+The dataset is licensed under [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0.html).

 ### Citation Information

@@ -167,7 +181,6 @@ Apache 2.0
 url = {https://arxiv.org/abs/1911.05507},
 year = {2019},
 }
-
 ```


datasets/quartz/README.md

Lines changed: 19 additions & 2 deletions
@@ -1,8 +1,25 @@
 ---
+annotations_creators:
+- crowdsourced
+language_creators:
+- crowdsourced
 language:
 - en
+license:
+- cc-by-4.0
+multilinguality:
+- monolingual
+size_categories:
+- 1K<n<10K
+source_datasets:
+- original
+task_categories:
+- question-answering
+task_ids:
+- extractive-qa
+- open-domain-qa
 paperswithcode_id: quartz
-pretty_name: QuaRTz Dataset
+pretty_name: QuaRTz
 ---

 # Dataset Card for "quartz"
@@ -183,7 +200,7 @@ The data fields are the same among all splits.

 ### Licensing Information

-[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+The dataset is licensed under Creative Commons [Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/).

 ### Citation Information

datasets/sciq/README.md

Lines changed: 17 additions & 3 deletions
@@ -1,6 +1,22 @@
 ---
+annotations_creators:
+- no-annotation
+language_creators:
+- crowdsourced
 language:
 - en
+license:
+- cc-by-nc-3.0
+multilinguality:
+- monolingual
+size_categories:
+- 10K<n<100K
+source_datasets:
+- original
+task_categories:
+- question-answering
+task_ids:
+- closed-domain-qa
 paperswithcode_id: sciq
 pretty_name: SciQ
 ---
@@ -147,7 +163,7 @@ The data fields are the same among all splits.

 ### Licensing Information

-[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+The dataset is licensed under the [Creative Commons Attribution-NonCommercial 3.0 Unported License](http://creativecommons.org/licenses/by-nc/3.0/).

 ### Citation Information

@@ -158,10 +174,8 @@ The data fields are the same among all splits.
 year={2017},
 journal={arXiv:1707.06209v1}
 }
-
 ```

-
 ### Contributions

 Thanks to [@patrickvonplaten](https://github.com/patrickvonplaten), [@lewtun](https://github.com/lewtun), [@thomwolf](https://github.com/thomwolf) for adding this dataset.

datasets/squad_es/README.md

Lines changed: 20 additions & 2 deletions
@@ -1,4 +1,22 @@
 ---
+annotations_creators:
+- machine-generated
+language_creators:
+- machine-generated
+language:
+- es
+license:
+- cc-by-4.0
+multilinguality:
+- monolingual
+size_categories:
+- 10K<n<100K
+source_datasets:
+- extended|squad
+task_categories:
+- question-answering
+task_ids:
+- extractive-qa
 paperswithcode_id: squad-es
 pretty_name: SQuAD-es
 ---
@@ -41,7 +59,7 @@ pretty_name: SQuAD-es

 ### Dataset Summary

-automatic translation of the Stanford Question Answering Dataset (SQuAD) v2 into Spanish
+Automatic translation of the Stanford Question Answering Dataset (SQuAD) v2 into Spanish

 ### Supported Tasks and Leaderboards

@@ -148,7 +166,7 @@ The data fields are the same among all splits.

 ### Licensing Information

-[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+The SQuAD-es dataset is licensed under the [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) license.

 ### Citation Information

datasets/wmt14/README.md

Lines changed: 23 additions & 2 deletions
@@ -1,11 +1,32 @@
 ---
-pretty_name: WMT14
-paperswithcode_id: wmt-2014
+annotations_creators:
+- no-annotation
+language_creators:
+- found
+language:
+- cs
+- de
+- en
+- fr
+- hi
+- ru
+license:
+- unknown
 multilinguality:
 - translation
+size_categories:
+- 10M<n<100M
+source_datasets:
+- extended|europarl_bilingual
+- extended|giga_fren
+- extended|news_commentary
+- extended|un_multi
+- extended|hind_encorp
 task_categories:
 - translation
 task_ids: []
+pretty_name: WMT14
+paperswithcode_id: wmt-2014
 ---

 # Dataset Card for "wmt14"
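
As a rough illustration of the kind of gap this commit closes, the sketch below reports which metadata tags are absent from a card's YAML front matter. The `REQUIRED_TAGS` list and the two function names are assumptions made for this example, taken from the keys added in the diffs above; they are not the validation logic of the `datasets` library. The parser only recognizes flat top-level keys and is no substitute for a real YAML parser.

```python
# Assumed tag list: the keys this commit adds or keeps in each card's
# front matter. This is NOT an official list of required tags.
REQUIRED_TAGS = [
    "annotations_creators",
    "language_creators",
    "language",
    "license",
    "multilinguality",
    "size_categories",
    "source_datasets",
    "task_categories",
    "task_ids",
    "pretty_name",
]


def front_matter_keys(card_text):
    """Return the top-level keys in the leading '---' block of a card."""
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()  # no front matter at all
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the front matter block
        # Top-level keys start in column 0 and are not list items.
        is_top_level = line and not line[0].isspace() and not line.startswith("-")
        if is_top_level and ":" in line:
            keys.add(line.split(":", 1)[0].strip())
    return keys


def missing_tags(card_text, required=REQUIRED_TAGS):
    """List the required tags that do not appear in the front matter."""
    present = front_matter_keys(card_text)
    return [tag for tag in required if tag not in present]


if __name__ == "__main__":
    # Front matter of the boolq card as it looked before this commit.
    old_boolq = (
        "---\n"
        "language:\n"
        "- en\n"
        "paperswithcode_id: boolq\n"
        "pretty_name: Boolean Questions\n"
        "---\n"
        "# Dataset Card\n"
    )
    print(missing_tags(old_boolq))
```

Run against the pre-commit boolq front matter, the check flags every tag except `language` and `pretty_name`, which matches the set of keys the boolq diff adds.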
