60 changes: 49 additions & 11 deletions datasets/fever/README.md
@@ -2,6 +2,24 @@
languages:
- en
paperswithcode_id: fever
annotations_creators:
- crowdsourced
language_creators:
- found
licenses:
- cc-by-sa-3.0
- gpl-3.0
multilinguality:
- monolingual
pretty_name: FEVER
size_categories:
- 100K<n<1M
source_datasets:
- extended|wikipedia
task_categories:
- text-classification
task_ids:
- text-classification-other-knowledge-verification
---

# Dataset Card for "fever"
@@ -46,15 +64,17 @@ With billions of individual pages on the web providing information on almost eve

The FEVER workshops are a venue for work in verifiable knowledge extraction and to stimulate progress in this direction.

FEVER v1.0 consists of claims generated by altering sentences extracted from Wikipedia, which were subsequently verified without knowledge of the sentence they were derived from. Annotators classify each claim as SUPPORTED, REFUTED, or NOTENOUGHINFO.

### Supported Tasks and Leaderboards

The task is verification of textual claims against textual sources.

Compared to textual entailment (TE)/natural language inference, the key difference is that in those tasks the passage used to verify each claim is given and, in recent years, typically consists of a single sentence, while in verification systems the evidence must be retrieved from a large set of documents.

### Languages

The dataset is in English.
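The splits can be loaded with the Hugging Face `datasets` library. A minimal sketch — the config name `"v1.0"` follows the configurations documented below, and the hard-coded record mirrors the Data Instances example; treat both as assumptions:

```python
def load_fever_train():
    # Requires the `datasets` package and a network connection on first use;
    # downloads are cached afterwards.
    from datasets import load_dataset
    return load_dataset("fever", "v1.0", split="train")

# A record has the shape shown in the Data Instances section below:
example = {
    "id": 75397,
    "label": "SUPPORTS",
    "claim": "Nikolaj Coster-Waldau worked with the Fox Broadcasting Company.",
    "evidence_wiki_url": "Nikolaj_Coster-Waldau",
}
print(example["label"])  # SUPPORTS
```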

## Dataset Structure

@@ -70,7 +90,13 @@ We show detailed information for up to 5 configurations of the dataset.

An example of 'train' looks as follows.
```
{'claim': 'Nikolaj Coster-Waldau worked with the Fox Broadcasting Company.',
'evidence_wiki_url': 'Nikolaj_Coster-Waldau',
'label': 'SUPPORTS',
'id': 75397,
'evidence_id': 104971,
'evidence_sentence_id': 7,
'evidence_annotation_id': 92206}
```

#### v2.0
@@ -81,7 +107,13 @@ An example of 'train' looks as follows.

An example of 'validation' looks as follows.
```
{'claim': "There is a convicted statutory rapist called Chinatown's writer.",
'evidence_wiki_url': '',
'label': 'NOT ENOUGH INFO',
'id': 500000,
'evidence_id': -1,
'evidence_sentence_id': -1,
'evidence_annotation_id': 269158}
```
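As the example above shows, claims labelled `NOT ENOUGH INFO` carry sentinel evidence fields (`-1` and an empty string). A small helper to separate verifiable claims from NEI ones — a sketch, not part of the dataset's API:

```python
def has_evidence(record: dict) -> bool:
    # Records with no retrievable evidence use -1 as a sentinel evidence_id.
    return record["evidence_id"] != -1

sample = {
    "claim": "There is a convicted statutory rapist called Chinatown's writer.",
    "evidence_wiki_url": "",
    "label": "NOT ENOUGH INFO",
    "id": 500000,
    "evidence_id": -1,
    "evidence_sentence_id": -1,
    "evidence_annotation_id": 269158,
}
print(has_evidence(sample))  # False
```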

#### wiki_pages
@@ -92,14 +124,17 @@ An example of 'validation' looks as follows.

An example of 'wikipedia_pages' looks as follows.
```
{'text': 'The following are the football -LRB- soccer -RRB- events of the year 1928 throughout the world . ',
'lines': '0\tThe following are the football -LRB- soccer -RRB- events of the year 1928 throughout the world .\n1\t',
'id': '1928_in_association_football'}
```
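The `text` and `lines` fields keep Penn Treebank-style bracket tokens (`-LRB-`, `-RRB-`) from the processed Wikipedia dump. A minimal, approximate detokenizing helper — the mapping below is an assumption covering the common bracket tokens, and surrounding spaces are left untouched:

```python
PTB_BRACKETS = {
    "-LRB-": "(",  # left round bracket
    "-RRB-": ")",  # right round bracket
    "-LSB-": "[",
    "-RSB-": "]",
    "-LCB-": "{",
    "-RCB-": "}",
}

def restore_brackets(text: str) -> str:
    """Replace PTB bracket tokens with the literal characters."""
    for token, char in PTB_BRACKETS.items():
        text = text.replace(token, char)
    return text

print(restore_brackets("the football -LRB- soccer -RRB- events"))
# the football ( soccer ) events
```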

### Data Fields

The data fields are the same among all splits.

#### v1.0

- `id`: an `int32` feature.
- `label`: a `string` feature.
- `claim`: a `string` feature.
@@ -109,6 +144,7 @@ The data fields are the same among all splits.
- `evidence_sentence_id`: an `int32` feature.

#### v2.0

- `id`: an `int32` feature.
- `label`: a `string` feature.
- `claim`: a `string` feature.
@@ -118,6 +154,7 @@ The data fields are the same among all splits.
- `evidence_sentence_id`: an `int32` feature.

#### wiki_pages

- `id`: a `string` feature.
- `text`: a `string` feature.
- `lines`: a `string` feature.
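The `lines` field packs a page's sentences as newline-separated `index<TAB>text` entries, with empty trailing entries like `'1\t'` (see the wiki_pages example above). A sketch of a parser — note that real dumps may append further tab-separated link anchors after the sentence text, which this sketch keeps verbatim:

```python
def parse_lines(lines: str) -> dict:
    """Split a wiki_pages `lines` field into {sentence_id: sentence_text}."""
    sentences = {}
    for entry in lines.split("\n"):
        idx, _, text = entry.partition("\t")
        if idx and text:  # skip empty trailing entries like '1\t'
            sentences[int(idx)] = text
    return sentences

print(parse_lines("0\tThe following are the football events .\n1\t"))
# {0: 'The following are the football events .'}
```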
@@ -194,20 +231,21 @@ The data fields are the same among all splits.

### Licensing Information


FEVER license:

```
These data annotations incorporate material from Wikipedia, which is licensed pursuant to the Wikipedia Copyright Policy. These annotations are made available under the license terms described on the applicable Wikipedia article pages, or, where Wikipedia license terms are unavailable, under the Creative Commons Attribution-ShareAlike License (version 3.0), available at http://creativecommons.org/licenses/by-sa/3.0/ (collectively, the “License Terms”). You may not use these files except in compliance with the applicable License Terms.
```

### Citation Information

```bibtex
@inproceedings{Thorne18Fever,
author = {Thorne, James and Vlachos, Andreas and Christodoulopoulos, Christos and Mittal, Arpit},
title = {{FEVER}: a Large-scale Dataset for Fact Extraction and VERification},
booktitle = {NAACL-HLT},
year = {2018}
}

```


6 changes: 3 additions & 3 deletions datasets/fever/fever.py
```diff
@@ -212,8 +212,8 @@ def _generate_examples(self, filepath):
                     "evidence_sentence_id": -1,
                 }
         elif self.config.name == "wiki_pages":
-            for file in filepath:
+            for file_id, file in enumerate(filepath):
                 with open(file, encoding="utf-8") as f:
-                    for id_, row in enumerate(f):
+                    for row_id, row in enumerate(f):
                         data = json.loads(row)
-                        yield id_, data
+                        yield f"{file_id}_{row_id}", data
```
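The change above matters because `enumerate(f)` restarts at 0 for every file, so rows from different wiki dump files would collide on the same example key; prefixing the file index keeps keys unique. A self-contained sketch, with in-memory lists standing in for the real JSONL files (the contents are hypothetical):

```python
files = [["rowA", "rowB"], ["rowC"]]  # stand-ins for two wiki dump files

keys = []
for file_id, rows in enumerate(files):
    for row_id, _row in enumerate(rows):
        keys.append(f"{file_id}_{row_id}")

print(keys)  # ['0_0', '0_1', '1_0'] — unique across files
assert len(keys) == len(set(keys))
```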