
Commit a0b6402

Fix missing tags in dataset cards (#4921)
* Fix missing tags in dataset cards
* Force CI re-run
1 parent 7380140 · commit a0b6402


10 files changed, +265 -71 lines changed
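
The tags being fixed live in the YAML front matter at the top of each dataset card (between the leading `---` markers). As a rough, hypothetical sketch of how cards missing tag keys could be spotted — not part of this commit; the key list and script name are assumptions inferred from the diffs below:

```python
# check_tags.py (hypothetical): flag dataset cards whose YAML front matter is
# missing any of the tag keys this commit adds. The key list is an assumption
# based on the diffs below, not an official requirement list.
import sys
import yaml  # PyYAML

REQUIRED_KEYS = {
    "annotations_creators", "language", "language_creators", "license",
    "multilinguality", "pretty_name", "size_categories", "source_datasets",
    "task_categories", "task_ids",
}

def front_matter(path):
    """Return the YAML block between the leading '---' markers of a README."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    if not text.startswith("---"):
        return {}
    return yaml.safe_load(text.split("---", 2)[1]) or {}

for path in sys.argv[1:]:
    missing = REQUIRED_KEYS - front_matter(path).keys()
    if missing:
        print(f"{path}: missing {sorted(missing)}")
```

Usage would look like `python check_tags.py datasets/*/README.md`.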


datasets/eraser_multi_rc/README.md

Lines changed: 54 additions & 16 deletions
@@ -1,7 +1,24 @@
 ---
-pretty_name: Eraser Multi Rc
+annotations_creators:
+- crowdsourced
 language:
 - en
+language_creators:
+- found
+license:
+- other
+multilinguality:
+- monolingual
+pretty_name: Eraser MultiRC (Multi-Sentence Reading Comprehension)
+size_categories:
+- 10K<n<100K
+source_datasets:
+- original
+task_categories:
+- multiple-choice
+task_ids:
+- multiple-choice-qa
+- multiple-choice-other-inference
 paperswithcode_id: null
 ---

@@ -33,23 +50,24 @@ paperswithcode_id: null

 ## Dataset Description

-- **Homepage:** [https://cogcomp.seas.upenn.edu/multirc/](https://cogcomp.seas.upenn.edu/multirc/)
-- **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
-- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+- **Homepage:** http://cogcomp.org/multirc/
+- **Repository:** https://github.com/CogComp/multirc
+- **Paper:** [Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences](https://cogcomp.seas.upenn.edu/page/publication_view/833)
 - **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
 - **Size of downloaded dataset files:** 1.59 MB
 - **Size of the generated dataset:** 60.70 MB
 - **Total amount of disk used:** 62.29 MB

 ### Dataset Summary

-Eraser Multi RC is a dataset for queries over multi-line passages, along with
-answers and a rationalte. Each example in this dataset has the following 5 parts
-1. A Mutli-line Passage
-2. A Query about the passage
-3. An Answer to the query
-4. A Classification as to whether the answer is right or wrong
-5. An Explanation justifying the classification
+MultiRC (Multi-Sentence Reading Comprehension) is a dataset of short paragraphs and multi-sentence questions that can be answered from the content of the paragraph.
+
+We have designed the dataset with three key challenges in mind:
+- The number of correct answer-options for each question is not pre-specified. This removes the over-reliance of current approaches on answer-options and forces them to decide on the correctness of each candidate answer independently of others. In other words, unlike previous work, the task here is not to simply identify the best answer-option, but to evaluate the correctness of each answer-option individually.
+- The correct answer(s) is not required to be a span in the text.
+- The paragraphs in our dataset have diverse provenance by being extracted from 7 different domains such as news, fiction, historical text etc., and hence are expected to be more diverse in their contents as compared to single-domain datasets.
+
+The goal of this dataset is to encourage the research community to explore approaches that can do more than sophisticated lexical-level matching.

 ### Supported Tasks and Leaderboards

@@ -149,26 +167,46 @@ The data fields are the same among all splits.

 ### Licensing Information

-[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+https://github.com/CogComp/multirc/blob/master/LICENSE
+
+Research and Academic Use License
+Cognitive Computation Group
+University of Illinois at Urbana-Champaign
+
+Downloading software implies that you accept the following license terms:
+
+Under this Agreement, The Board of Trustees of the University of Illinois ("University"), a body corporate and politic of the State of Illinois with its principal offices at 506 South Wright Street, Urbana, Illinois 61801, U.S.A., on behalf of its Department of Computer Science on the Urbana-Champaign Campus, provides the software ("Software") described in Appendix A, attached hereto and incorporated herein, to the Licensee identified below ("Licensee") subject to the following conditions:
+
+1. Upon execution of this Agreement by Licensee below, the University grants, and Licensee accepts, a roylaty-free, non-exclusive license:
+A. To use unlimited copies of the Software for its own academic and research purposes.
+B. To make derivative works. However, if Licensee distributes any derivative work based on or derived from the Software (with such distribution limited to binary form only), then Licensee will (1) notify the University (c/o Professor Dan Roth, e-mail: [email protected]) regarding its distribution of the derivative work and provide a copy if requested, and (2) clearly notify users that such derivative work is a modified version and not the original Software distributed by the University.
+C. To redistribute (sublicense) derivative works based on the Software in binary form only to third parties provided that (1) the copyright notice and any accompanying legends or proprietary notices are reproduced on all copies, (2) no royalty is charged for such copies, and (3) third parties are restricted to using the derivative work for academic and research purposes only, without further sublicensing rights.
+No license is granted herein that would permit Licensee to incorporate the Software into a commercial product, or to otherwise commercially exploit the Software. Should Licensee wish to make commercial use of the Software, Licensee should contact the University, c/o the Office of Technology Management ("OTM") to negotiate an appropriate license for such commercial use. To contact the OTM: [email protected]; telephone: (217)333-3781; fax: (217) 265-5530.
+2. THE UNIVERSITY GIVES NO WARRANTIES, EITHER EXPRESSED OR IMPLIED, FOR THE SOFTWARE AND/OR ASSOCIATED MATERIALS PROVIDED UNDER THIS AGREEMENT, INCLUDING, WITHOUT LIMITATION, WARRANTY OF MERCHANTABILITY AND WARRANTY OF FITNESS FOR A PARTICULAR PURPOSE, AND ANY WARRANTY AGAINST INFRINGEMENT OF ANY INTELLECTUAL PROPERTY RIGHTS.
+3. Licensee understands the Software is a research tool for which no warranties as to capabilities or accuracy are made, and Licensee accepts the Software on an "as is, with all defects" basis, without maintenance, debugging , support or improvement. Licensee assumes the entire risk as to the results and performance of the Software and/or associated materials. Licensee agrees that University shall not be held liable for any direct, indirect, consequential, or incidental damages with respect to any claim by Licensee or any third party on account of or arising from this Agreement or use of the Software and/or associated materials.
+4. Licensee understands the Software is proprietary to the University. Licensee will take all reasonable steps to insure that the source code is protected and secured from unauthorized disclosure, use, or release and will treat it with at least the same level of care as Licensee would use to protect and secure its own proprietary computer programs and/or information, but using no less than reasonable care.
+5. In the event that Licensee shall be in default in the performance of any material obligations under this Agreement, and if the default has not been remedied within sixty (60) days after the date of notice in writing of such default, University may terminate this Agreement by written notice. In the event of termination, Licensee shall promptly return to University the original and any copies of licensed Software in Licensee's possession. In the event of any termination of this Agreement, any and all sublicenses granted by Licensee to third parties pursuant to this Agreement (as permitted by this Agreement) prior to the date of such termination shall nevertheless remain in full force and effect.
+6. The Software was developed, in part, with support from the National Science Foundation, and the Federal Government has certain license rights in the Software.
+7. This Agreement shall be construed and interpreted in accordance with the laws of the State of Illinois, U.S.A..
+8. This Agreement shall be subject to all United States Government laws and regulations now and hereafter applicable to the subject matter of this Agreement, including specifically the Export Law provisions of the Departments of Commerce and State. Licensee will not export or re-export the Software without the appropriate United States or foreign government license.
+
+By its registration below, Licensee confirms that it understands the terms and conditions of this Agreement, and agrees to be bound by them. This Agreement shall become effective as of the date of execution by Licensee.

 ### Citation Information

 ```
-
 @unpublished{eraser2019,
 title = {ERASER: A Benchmark to Evaluate Rationalized NLP Models},
 author = {Jay DeYoung and Sarthak Jain and Nazneen Fatema Rajani and Eric Lehman and Caiming Xiong and Richard Socher and Byron C. Wallace}
 }
 @inproceedings{MultiRC2018,
 author = {Daniel Khashabi and Snigdha Chaturvedi and Michael Roth and Shyam Upadhyay and Dan Roth},
 title = {Looking Beyond the Surface:A Challenge Set for Reading Comprehension over Multiple Sentences},
-booktitle = {NAACL},
+booktitle = {Proceedings of North American Chapter of the Association for Computational Linguistics (NAACL)},
 year = {2018}
 }
-
 ```

-
 ### Contributions

 Thanks to [@lewtun](https://github.com/lewtun), [@patrickvonplaten](https://github.com/patrickvonplaten), [@thomwolf](https://github.com/thomwolf) for adding this dataset.
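
For orientation (not part of this commit), loading the dataset behind this card with the `datasets` library looks roughly like the sketch below; the split and field names are assumptions and should be verified against the card's Data Fields section.

```python
# Minimal sketch: load eraser_multi_rc and inspect one example.
# Field names ("passage", "query_and_answer", "label", "evidences") are assumptions.
from datasets import load_dataset

multirc = load_dataset("eraser_multi_rc")      # splits: train / validation / test
example = multirc["train"][0]
print(example["passage"][:200])                # multi-sentence passage
print(example["query_and_answer"])             # question paired with one candidate answer
print(example["label"], example["evidences"])  # correctness label plus rationale sentences
```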

datasets/hotpot_qa/README.md

Lines changed: 21 additions & 8 deletions
@@ -1,8 +1,24 @@
 ---
+annotations_creators:
+- crowdsourced
 language:
 - en
-paperswithcode_id: hotpotqa
+language_creators:
+- found
+license:
+- cc-by-sa-4.0
+multilinguality:
+- monolingual
 pretty_name: HotpotQA
+size_categories:
+- 100K<n<1M
+source_datasets:
+- original
+task_categories:
+- question-answering
+task_ids:
+- question-answering-other-multi-hop
+paperswithcode_id: hotpotqa
 ---

 # Dataset Card for "hotpot_qa"
@@ -34,16 +50,16 @@ pretty_name: HotpotQA
 ## Dataset Description

 - **Homepage:** [https://hotpotqa.github.io/](https://hotpotqa.github.io/)
-- **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
-- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+- **Repository:** https://github.com/hotpotqa/hotpot
+- **Paper:** [HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering](https://arxiv.org/abs/1809.09600)
 - **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
 - **Size of downloaded dataset files:** 1213.88 MB
 - **Size of the generated dataset:** 1186.81 MB
 - **Total amount of disk used:** 2400.69 MB

 ### Dataset Summary

-HotpotQA is a new dataset with 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the questions are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; (3) we provide sentence-level supporting facts required for reasoning, allowingQA systems to reason with strong supervisionand explain the predictions; (4) we offer a new type of factoid comparison questions to testQA systems’ ability to extract relevant facts and perform necessary comparison.
+HotpotQA is a new dataset with 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the questions are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; (3) we provide sentence-level supporting facts required for reasoning, allowingQA systems to reason with strong supervision and explain the predictions; (4) we offer a new type of factoid comparison questions to test QA systems’ ability to extract relevant facts and perform necessary comparison.

 ### Supported Tasks and Leaderboards

@@ -203,22 +219,19 @@ The data fields are the same among all splits.

 ### Licensing Information

-[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+HotpotQA is distributed under a [CC BY-SA 4.0 License](http://creativecommons.org/licenses/by-sa/4.0/).

 ### Citation Information

 ```
-
 @inproceedings{yang2018hotpotqa,
 title={{HotpotQA}: A Dataset for Diverse, Explainable Multi-hop Question Answering},
 author={Yang, Zhilin and Qi, Peng and Zhang, Saizheng and Bengio, Yoshua and Cohen, William W. and Salakhutdinov, Ruslan and Manning, Christopher D.},
 booktitle={Conference on Empirical Methods in Natural Language Processing ({EMNLP})},
 year={2018}
 }
-
 ```

-
 ### Contributions

 Thanks to [@albertvillanova](https://github.com/albertvillanova), [@ghomasHudson](https://github.com/ghomasHudson) for adding this dataset.
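
As with the card above, a quick way to see the sentence-level supporting facts mentioned in the summary is to load the dataset; this is only a sketch, and the config name ("distractor") and field names are assumptions to double-check against the card.

```python
# Minimal sketch: load HotpotQA (distractor setting) and look at one example's
# supporting facts. Config and field names are assumptions, not taken from this diff.
from datasets import load_dataset

hotpot = load_dataset("hotpot_qa", "distractor")
ex = hotpot["train"][0]
print(ex["question"])
print(ex["answer"])
print(ex["supporting_facts"])  # paragraph titles + sentence ids justifying the answer
```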

datasets/metooma/README.md

Lines changed: 4 additions & 1 deletion
@@ -5,6 +5,8 @@ language_creators:
 - found
 language:
 - en
+license:
+- cc0-1.0
 multilinguality:
 - monolingual
 size_categories:
@@ -50,8 +52,9 @@ pretty_name: '#MeTooMA dataset'
 ## Dataset Description

 - **Homepage:** https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/JN4EYU
+- **Repository:** https://github.com/midas-research/MeTooMA
 - **Paper:** https://ojs.aaai.org//index.php/ICWSM/article/view/7292
-- **Point of Contact:** https://github.com/midas-research/MeTooMA
+- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)


 ### Dataset Summary

datasets/movie_rationales/README.md

Lines changed: 37 additions & 9 deletions
@@ -1,7 +1,23 @@
 ---
-pretty_name: MovieRationales
+annotations_creators:
+- crowdsourced
 language:
 - en
+language_creators:
+- found
+license:
+- unknown
+multilinguality:
+- monolingual
+pretty_name: MovieRationales
+size_categories:
+- 1K<n<10K
+source_datasets:
+- original
+task_categories:
+- text-classification
+task_ids:
+- sentiment-classification
 paperswithcode_id: null
 ---

@@ -33,9 +49,9 @@ paperswithcode_id: null

 ## Dataset Description

-- **Homepage:** [http://www.cs.jhu.edu/~ozaidan/rationales/](http://www.cs.jhu.edu/~ozaidan/rationales/)
-- **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
-- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+- **Homepage:**
+- **Repository:** https://github.com/jayded/eraserbenchmark
+- **Paper:** [ERASER: A Benchmark to Evaluate Rationalized NLP Models](https://aclanthology.org/2020.acl-main.408/)
 - **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
 - **Size of downloaded dataset files:** 3.72 MB
 - **Size of the generated dataset:** 8.33 MB
@@ -145,10 +161,23 @@ The data fields are the same among all splits.
 ### Citation Information

 ```
-
-@unpublished{eraser2019,
-title = {ERASER: A Benchmark to Evaluate Rationalized NLP Models},
-author = {Jay DeYoung and Sarthak Jain and Nazneen Fatema Rajani and Eric Lehman and Caiming Xiong and Richard Socher and Byron C. Wallace}
+@inproceedings{deyoung-etal-2020-eraser,
+title = "{ERASER}: {A} Benchmark to Evaluate Rationalized {NLP} Models",
+author = "DeYoung, Jay and
+Jain, Sarthak and
+Rajani, Nazneen Fatema and
+Lehman, Eric and
+Xiong, Caiming and
+Socher, Richard and
+Wallace, Byron C.",
+booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
+month = jul,
+year = "2020",
+address = "Online",
+publisher = "Association for Computational Linguistics",
+url = "https://aclanthology.org/2020.acl-main.408",
+doi = "10.18653/v1/2020.acl-main.408",
+pages = "4443--4458",
 }
 @InProceedings{zaidan-eisner-piatko-2008:nips,
 author = {Omar F. Zaidan and Jason Eisner and Christine Piatko},
@@ -157,7 +186,6 @@ The data fields are the same among all splits.
 month = {December},
 year = {2008}
 }
-
 ```


datasets/qanta/README.md

Lines changed: 19 additions & 5 deletions
@@ -1,8 +1,24 @@
 ---
+annotations_creators:
+- machine-generated
 language:
 - en
-paperswithcode_id: quizbowl
+language_creators:
+- found
+license:
+- unknown
+multilinguality:
+- monolingual
 pretty_name: Quizbowl
+size_categories:
+- 100K<n<1M
+source_datasets:
+- original
+task_categories:
+- question-answering
+task_ids:
+- question-answering-other-quizbowl
+paperswithcode_id: quizbowl
 ---

 # Dataset Card for "qanta"
@@ -35,8 +51,8 @@ pretty_name: Quizbowl

 - **Homepage:** [http://www.qanta.org/](http://www.qanta.org/)
 - **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
-- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
-- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+- **Paper:** [Quizbowl: The Case for Incremental Question Answering](https://arxiv.org/abs/1904.04792)
+- **Point of Contact:** [Jordan Boyd-Graber](mailto:[email protected])
 - **Size of downloaded dataset files:** 162.84 MB
 - **Size of the generated dataset:** 140.36 MB
 - **Total amount of disk used:** 303.20 MB
@@ -183,15 +199,13 @@ The data fields are the same among all splits.
 ### Citation Information

 ```
-
 @article{Rodriguez2019QuizbowlTC,
 title={Quizbowl: The Case for Incremental Question Answering},
 author={Pedro Rodriguez and Shi Feng and Mohit Iyyer and He He and Jordan L. Boyd-Graber},
 journal={ArXiv},
 year={2019},
 volume={abs/1904.04792}
 }
-
 ```


datasets/quora/README.md

Lines changed: 19 additions & 5 deletions
@@ -1,8 +1,24 @@
 ---
+annotations_creators:
+- expert-generated
 language:
 - en
+language_creators:
+- found
+license:
+- unknown
+multilinguality:
+- monolingual
+pretty_name: Quora Question Pairs
+size_categories:
+- 100K<n<1M
+source_datasets:
+- original
+task_categories:
+- text-classification
+task_ids:
+- semantic-similarity-classification
 paperswithcode_id: null
-pretty_name: quora
 ---

 # Dataset Card for "quora"
@@ -142,13 +158,11 @@ The data fields are the same among all splits.

 ### Licensing Information

-[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+Unknown license.

 ### Citation Information

-```
-
-```
+Unknown.


 ### Contributions
