
Commit a0b6402

Fix missing tags in dataset cards (#4921)
* Fix missing tags in dataset cards
* Force CI re-run
1 parent 7380140 · commit a0b6402


10 files changed, +265 -71 lines changed
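
The tags being fixed live in the YAML front matter at the top of each dataset card (between the leading `---` markers). As a rough, hypothetical sketch of how cards missing tag keys could be spotted — not part of this commit; the key list and script name are assumptions inferred from the diffs below:

```python
# check_tags.py (hypothetical): flag dataset cards whose YAML front matter is
# missing any of the tag keys this commit adds. The key list is an assumption
# based on the diffs below, not an official requirement list.
import sys
import yaml  # PyYAML

REQUIRED_KEYS = {
    "annotations_creators", "language", "language_creators", "license",
    "multilinguality", "pretty_name", "size_categories", "source_datasets",
    "task_categories", "task_ids",
}

def front_matter(path):
    """Return the YAML block between the leading '---' markers of a README."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    if not text.startswith("---"):
        return {}
    return yaml.safe_load(text.split("---", 2)[1]) or {}

for path in sys.argv[1:]:
    missing = REQUIRED_KEYS - front_matter(path).keys()
    if missing:
        print(f"{path}: missing {sorted(missing)}")
```

Usage would look like `python check_tags.py datasets/*/README.md`.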


datasets/eraser_multi_rc/README.md

Lines changed: 54 additions & 16 deletions
@@ -1,7 +1,24 @@
 ---
-pretty_name: Eraser Multi Rc
+annotations_creators:
+- crowdsourced
 language:
 - en
+language_creators:
+- found
+license:
+- other
+multilinguality:
+- monolingual
+pretty_name: Eraser MultiRC (Multi-Sentence Reading Comprehension)
+size_categories:
+- 10K<n<100K
+source_datasets:
+- original
+task_categories:
+- multiple-choice
+task_ids:
+- multiple-choice-qa
+- multiple-choice-other-inference
 paperswithcode_id: null
 ---

@@ -33,23 +50,24 @@ paperswithcode_id: null

 ## Dataset Description

-- **Homepage:** [https://cogcomp.seas.upenn.edu/multirc/](https://cogcomp.seas.upenn.edu/multirc/)
-- **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
-- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+- **Homepage:** http://cogcomp.org/multirc/
+- **Repository:** https://github.com/CogComp/multirc
+- **Paper:** [Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences](https://cogcomp.seas.upenn.edu/page/publication_view/833)
 - **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
 - **Size of downloaded dataset files:** 1.59 MB
 - **Size of the generated dataset:** 60.70 MB
 - **Total amount of disk used:** 62.29 MB

 ### Dataset Summary

-Eraser Multi RC is a dataset for queries over multi-line passages, along with
-answers and a rationalte. Each example in this dataset has the following 5 parts
-1. A Mutli-line Passage
-2. A Query about the passage
-3. An Answer to the query
-4. A Classification as to whether the answer is right or wrong
-5. An Explanation justifying the classification
+MultiRC (Multi-Sentence Reading Comprehension) is a dataset of short paragraphs and multi-sentence questions that can be answered from the content of the paragraph.
+
+We have designed the dataset with three key challenges in mind:
+- The number of correct answer-options for each question is not pre-specified. This removes the over-reliance of current approaches on answer-options and forces them to decide on the correctness of each candidate answer independently of others. In other words, unlike previous work, the task here is not to simply identify the best answer-option, but to evaluate the correctness of each answer-option individually.
+- The correct answer(s) is not required to be a span in the text.
+- The paragraphs in our dataset have diverse provenance by being extracted from 7 different domains such as news, fiction, historical text etc., and hence are expected to be more diverse in their contents as compared to single-domain datasets.
+
+The goal of this dataset is to encourage the research community to explore approaches that can do more than sophisticated lexical-level matching.

 ### Supported Tasks and Leaderboards

@@ -149,26 +167,46 @@ The data fields are the same among all splits.

 ### Licensing Information

-[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+https://github.com/CogComp/multirc/blob/master/LICENSE
+
+Research and Academic Use License
+Cognitive Computation Group
+University of Illinois at Urbana-Champaign
+
+Downloading software implies that you accept the following license terms:
+
+Under this Agreement, The Board of Trustees of the University of Illinois ("University"), a body corporate and politic of the State of Illinois with its principal offices at 506 South Wright Street, Urbana, Illinois 61801, U.S.A., on behalf of its Department of Computer Science on the Urbana-Champaign Campus, provides the software ("Software") described in Appendix A, attached hereto and incorporated herein, to the Licensee identified below ("Licensee") subject to the following conditions:
+
+1. Upon execution of this Agreement by Licensee below, the University grants, and Licensee accepts, a roylaty-free, non-exclusive license:
+A. To use unlimited copies of the Software for its own academic and research purposes.
+B. To make derivative works. However, if Licensee distributes any derivative work based on or derived from the Software (with such distribution limited to binary form only), then Licensee will (1) notify the University (c/o Professor Dan Roth, e-mail: [email protected]) regarding its distribution of the derivative work and provide a copy if requested, and (2) clearly notify users that such derivative work is a modified version and not the original Software distributed by the University.
+C. To redistribute (sublicense) derivative works based on the Software in binary form only to third parties provided that (1) the copyright notice and any accompanying legends or proprietary notices are reproduced on all copies, (2) no royalty is charged for such copies, and (3) third parties are restricted to using the derivative work for academic and research purposes only, without further sublicensing rights.
+No license is granted herein that would permit Licensee to incorporate the Software into a commercial product, or to otherwise commercially exploit the Software. Should Licensee wish to make commercial use of the Software, Licensee should contact the University, c/o the Office of Technology Management ("OTM") to negotiate an appropriate license for such commercial use. To contact the OTM: [email protected]; telephone: (217)333-3781; fax: (217) 265-5530.
+2. THE UNIVERSITY GIVES NO WARRANTIES, EITHER EXPRESSED OR IMPLIED, FOR THE SOFTWARE AND/OR ASSOCIATED MATERIALS PROVIDED UNDER THIS AGREEMENT, INCLUDING, WITHOUT LIMITATION, WARRANTY OF MERCHANTABILITY AND WARRANTY OF FITNESS FOR A PARTICULAR PURPOSE, AND ANY WARRANTY AGAINST INFRINGEMENT OF ANY INTELLECTUAL PROPERTY RIGHTS.
+3. Licensee understands the Software is a research tool for which no warranties as to capabilities or accuracy are made, and Licensee accepts the Software on an "as is, with all defects" basis, without maintenance, debugging , support or improvement. Licensee assumes the entire risk as to the results and performance of the Software and/or associated materials. Licensee agrees that University shall not be held liable for any direct, indirect, consequential, or incidental damages with respect to any claim by Licensee or any third party on account of or arising from this Agreement or use of the Software and/or associated materials.
+4. Licensee understands the Software is proprietary to the University. Licensee will take all reasonable steps to insure that the source code is protected and secured from unauthorized disclosure, use, or release and will treat it with at least the same level of care as Licensee would use to protect and secure its own proprietary computer programs and/or information, but using no less than reasonable care.
+5. In the event that Licensee shall be in default in the performance of any material obligations under this Agreement, and if the default has not been remedied within sixty (60) days after the date of notice in writing of such default, University may terminate this Agreement by written notice. In the event of termination, Licensee shall promptly return to University the original and any copies of licensed Software in Licensee's possession. In the event of any termination of this Agreement, any and all sublicenses granted by Licensee to third parties pursuant to this Agreement (as permitted by this Agreement) prior to the date of such termination shall nevertheless remain in full force and effect.
+6. The Software was developed, in part, with support from the National Science Foundation, and the Federal Government has certain license rights in the Software.
+7. This Agreement shall be construed and interpreted in accordance with the laws of the State of Illinois, U.S.A..
+8. This Agreement shall be subject to all United States Government laws and regulations now and hereafter applicable to the subject matter of this Agreement, including specifically the Export Law provisions of the Departments of Commerce and State. Licensee will not export or re-export the Software without the appropriate United States or foreign government license.
+
+By its registration below, Licensee confirms that it understands the terms and conditions of this Agreement, and agrees to be bound by them. This Agreement shall become effective as of the date of execution by Licensee.

 ### Citation Information

 ```
-
 @unpublished{eraser2019,
 title = {ERASER: A Benchmark to Evaluate Rationalized NLP Models},
 author = {Jay DeYoung and Sarthak Jain and Nazneen Fatema Rajani and Eric Lehman and Caiming Xiong and Richard Socher and Byron C. Wallace}
 }
 @inproceedings{MultiRC2018,
 author = {Daniel Khashabi and Snigdha Chaturvedi and Michael Roth and Shyam Upadhyay and Dan Roth},
 title = {Looking Beyond the Surface:A Challenge Set for Reading Comprehension over Multiple Sentences},
-booktitle = {NAACL},
+booktitle = {Proceedings of North American Chapter of the Association for Computational Linguistics (NAACL)},
 year = {2018}
 }
-
 ```

-
 ### Contributions

 Thanks to [@lewtun](https://github.com/lewtun), [@patrickvonplaten](https://github.com/patrickvonplaten), [@thomwolf](https://github.com/thomwolf) for adding this dataset.
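
For orientation (not part of this commit), loading the dataset behind this card with the `datasets` library looks roughly like the sketch below; the split and field names are assumptions and should be verified against the card's Data Fields section.

```python
# Minimal sketch: load eraser_multi_rc and inspect one example.
# Field names ("passage", "query_and_answer", "label", "evidences") are assumptions.
from datasets import load_dataset

multirc = load_dataset("eraser_multi_rc")      # splits: train / validation / test
example = multirc["train"][0]
print(example["passage"][:200])                # multi-sentence passage
print(example["query_and_answer"])             # question paired with one candidate answer
print(example["label"], example["evidences"])  # correctness label plus rationale sentences
```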

datasets/hotpot_qa/README.md

Lines changed: 21 additions & 8 deletions
@@ -1,8 +1,24 @@
 ---
+annotations_creators:
+- crowdsourced
 language:
 - en
-paperswithcode_id: hotpotqa
+language_creators:
+- found
+license:
+- cc-by-sa-4.0
+multilinguality:
+- monolingual
 pretty_name: HotpotQA
+size_categories:
+- 100K<n<1M
+source_datasets:
+- original
+task_categories:
+- question-answering
+task_ids:
+- question-answering-other-multi-hop
+paperswithcode_id: hotpotqa
 ---

 # Dataset Card for "hotpot_qa"
@@ -34,16 +50,16 @@ pretty_name: HotpotQA
 ## Dataset Description

 - **Homepage:** [https://hotpotqa.github.io/](https://hotpotqa.github.io/)
-- **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
-- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+- **Repository:** https://github.com/hotpotqa/hotpot
+- **Paper:** [HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering](https://arxiv.org/abs/1809.09600)
 - **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
 - **Size of downloaded dataset files:** 1213.88 MB
 - **Size of the generated dataset:** 1186.81 MB
 - **Total amount of disk used:** 2400.69 MB

 ### Dataset Summary

-HotpotQA is a new dataset with 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the questions are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; (3) we provide sentence-level supporting facts required for reasoning, allowingQA systems to reason with strong supervisionand explain the predictions; (4) we offer a new type of factoid comparison questions to testQA systems’ ability to extract relevant facts and perform necessary comparison.
+HotpotQA is a new dataset with 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the questions are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; (3) we provide sentence-level supporting facts required for reasoning, allowingQA systems to reason with strong supervision and explain the predictions; (4) we offer a new type of factoid comparison questions to test QA systems’ ability to extract relevant facts and perform necessary comparison.

 ### Supported Tasks and Leaderboards

@@ -203,22 +219,19 @@ The data fields are the same among all splits.

 ### Licensing Information

-[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+HotpotQA is distributed under a [CC BY-SA 4.0 License](http://creativecommons.org/licenses/by-sa/4.0/).

 ### Citation Information

 ```
-
 @inproceedings{yang2018hotpotqa,
 title={{HotpotQA}: A Dataset for Diverse, Explainable Multi-hop Question Answering},
 author={Yang, Zhilin and Qi, Peng and Zhang, Saizheng and Bengio, Yoshua and Cohen, William W. and Salakhutdinov, Ruslan and Manning, Christopher D.},
 booktitle={Conference on Empirical Methods in Natural Language Processing ({EMNLP})},
 year={2018}
 }
-
 ```

-
 ### Contributions

 Thanks to [@albertvillanova](https://github.com/albertvillanova), [@ghomasHudson](https://github.com/ghomasHudson) for adding this dataset.
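
As with the card above, a quick way to see the sentence-level supporting facts mentioned in the summary is to load the dataset; this is only a sketch, and the config name ("distractor") and field names are assumptions to double-check against the card.

```python
# Minimal sketch: load HotpotQA (distractor setting) and look at one example's
# supporting facts. Config and field names are assumptions, not taken from this diff.
from datasets import load_dataset

hotpot = load_dataset("hotpot_qa", "distractor")
ex = hotpot["train"][0]
print(ex["question"])
print(ex["answer"])
print(ex["supporting_facts"])  # paragraph titles + sentence ids justifying the answer
```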

datasets/metooma/README.md

Lines changed: 4 additions & 1 deletion
@@ -5,6 +5,8 @@ language_creators:
 - found
 language:
 - en
+license:
+- cc0-1.0
 multilinguality:
 - monolingual
 size_categories:
@@ -50,8 +52,9 @@ pretty_name: '#MeTooMA dataset'
 ## Dataset Description

 - **Homepage:** https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/JN4EYU
+- **Repository:** https://github.com/midas-research/MeTooMA
 - **Paper:** https://ojs.aaai.org//index.php/ICWSM/article/view/7292
-- **Point of Contact:** https://github.com/midas-research/MeTooMA
+- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)


 ### Dataset Summary

datasets/movie_rationales/README.md

Lines changed: 37 additions & 9 deletions
@@ -1,7 +1,23 @@
 ---
-pretty_name: MovieRationales
+annotations_creators:
+- crowdsourced
 language:
 - en
+language_creators:
+- found
+license:
+- unknown
+multilinguality:
+- monolingual
+pretty_name: MovieRationales
+size_categories:
+- 1K<n<10K
+source_datasets:
+- original
+task_categories:
+- text-classification
+task_ids:
+- sentiment-classification
 paperswithcode_id: null
 ---

@@ -33,9 +49,9 @@ paperswithcode_id: null

 ## Dataset Description

-- **Homepage:** [http://www.cs.jhu.edu/~ozaidan/rationales/](http://www.cs.jhu.edu/~ozaidan/rationales/)
-- **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
-- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+- **Homepage:**
+- **Repository:** https://github.com/jayded/eraserbenchmark
+- **Paper:** [ERASER: A Benchmark to Evaluate Rationalized NLP Models](https://aclanthology.org/2020.acl-main.408/)
 - **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
 - **Size of downloaded dataset files:** 3.72 MB
 - **Size of the generated dataset:** 8.33 MB
@@ -145,10 +161,23 @@ The data fields are the same among all splits.
 ### Citation Information

 ```
-
-@unpublished{eraser2019,
-title = {ERASER: A Benchmark to Evaluate Rationalized NLP Models},
-author = {Jay DeYoung and Sarthak Jain and Nazneen Fatema Rajani and Eric Lehman and Caiming Xiong and Richard Socher and Byron C. Wallace}
+@inproceedings{deyoung-etal-2020-eraser,
+title = "{ERASER}: {A} Benchmark to Evaluate Rationalized {NLP} Models",
+author = "DeYoung, Jay and
+Jain, Sarthak and
+Rajani, Nazneen Fatema and
+Lehman, Eric and
+Xiong, Caiming and
+Socher, Richard and
+Wallace, Byron C.",
+booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
+month = jul,
+year = "2020",
+address = "Online",
+publisher = "Association for Computational Linguistics",
+url = "https://aclanthology.org/2020.acl-main.408",
+doi = "10.18653/v1/2020.acl-main.408",
+pages = "4443--4458",
 }
 @InProceedings{zaidan-eisner-piatko-2008:nips,
 author = {Omar F. Zaidan and Jason Eisner and Christine Piatko},
@@ -157,7 +186,6 @@ The data fields are the same among all splits.
 month = {December},
 year = {2008}
 }
-
 ```


datasets/qanta/README.md

Lines changed: 19 additions & 5 deletions
@@ -1,8 +1,24 @@
 ---
+annotations_creators:
+- machine-generated
 language:
 - en
-paperswithcode_id: quizbowl
+language_creators:
+- found
+license:
+- unknown
+multilinguality:
+- monolingual
 pretty_name: Quizbowl
+size_categories:
+- 100K<n<1M
+source_datasets:
+- original
+task_categories:
+- question-answering
+task_ids:
+- question-answering-other-quizbowl
+paperswithcode_id: quizbowl
 ---

 # Dataset Card for "qanta"
@@ -35,8 +51,8 @@ pretty_name: Quizbowl

 - **Homepage:** [http://www.qanta.org/](http://www.qanta.org/)
 - **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
-- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
-- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+- **Paper:** [Quizbowl: The Case for Incremental Question Answering](https://arxiv.org/abs/1904.04792)
+- **Point of Contact:** [Jordan Boyd-Graber](mailto:[email protected])
 - **Size of downloaded dataset files:** 162.84 MB
 - **Size of the generated dataset:** 140.36 MB
 - **Total amount of disk used:** 303.20 MB
@@ -183,15 +199,13 @@ The data fields are the same among all splits.
 ### Citation Information

 ```
-
 @article{Rodriguez2019QuizbowlTC,
 title={Quizbowl: The Case for Incremental Question Answering},
 author={Pedro Rodriguez and Shi Feng and Mohit Iyyer and He He and Jordan L. Boyd-Graber},
 journal={ArXiv},
 year={2019},
 volume={abs/1904.04792}
 }
-
 ```


datasets/quora/README.md

Lines changed: 19 additions & 5 deletions
@@ -1,8 +1,24 @@
 ---
+annotations_creators:
+- expert-generated
 language:
 - en
+language_creators:
+- found
+license:
+- unknown
+multilinguality:
+- monolingual
+pretty_name: Quora Question Pairs
+size_categories:
+- 100K<n<1M
+source_datasets:
+- original
+task_categories:
+- text-classification
+task_ids:
+- semantic-similarity-classification
 paperswithcode_id: null
-pretty_name: quora
 ---

 # Dataset Card for "quora"
@@ -142,13 +158,11 @@ The data fields are the same among all splits.

 ### Licensing Information

-[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+Unknown license.

 ### Citation Information

-```
-
-```
+Unknown.


 ### Contributions
