This repository contains the NarrativeQA dataset. It includes the list of documents with Wikipedia summaries, links to full stories, and questions and answers.
For a detailed description of this see the paper The NarrativeQA Reading Comprehension Challenge. Please cite the paper if you use this corpus in your work.
- documents.csv - contains document_id, set, kind, story_url, story_file_size, wiki_url, wiki_title, story_word_count, story_start, story_end. The word count is approximate after some basic cleanup and tokenization.
- third_party/wikipedia/summaries.csv - contains document_id, set, summary, summary_tokenized. The summaries are from Wikipedia.
- qaps.csv - contains document_id, set, question, answer1, answer2, question_tokenized, answer1_tokenized, answer2_tokenized.
- download_stories.sh - script to download the stories.
- compare.sh - compare downloaded story's file size to the document size we had. (At the time of publication, all stories have <3.5% file difference (except one), likely due to punctuation encoding.)
@article{narrativeqa,
author = {Tom\'a\v s Ko\v cisk\'y and Jonathan Schwarz and Phil Blunsom and
Chris Dyer and Karl Moritz Hermann and G\'abor Melis and
Edward Grefenstette},
title = {The {NarrativeQA} Reading Comprehension Challenge},
journal = {Transactions of the Association for Computational Linguistics},
url = {https://TBD},
volume = {TBD},
year = {2018},
pages = {TBD},
}
The following table is necessary for this dataset to be indexed by search engines such as Google Dataset Search.
| property | value | ||||||
|---|---|---|---|---|---|---|---|
| name | The NarrativeQA Reading Comprehension Challenge Dataset |
||||||
| alternateName | NarrativeQA |
||||||
| url | https://github.com/deepmind/narrativeqa |
||||||
| sameAs | https://github.com/deepmind/narrativeqa |
||||||
| description | This repository contains the NarrativeQA dataset. It includes the list of
documents with Wikipedia summaries, links to full stories, and questions and answers. |
||||||
| provider |
|
||||||
| license |
|
||||||
| citation | https://identifiers.org/arxiv:1712.07040 |