diff --git a/datasets/bookcorpus/README.md b/datasets/bookcorpus/README.md
index 322f3d52a04..bea1fe72d41 100644
--- a/datasets/bookcorpus/README.md
+++ b/datasets/bookcorpus/README.md
@@ -153,7 +153,9 @@ The data fields are the same among all splits.
 
 ### Licensing Information
 
-[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+The books have been crawled from https://www.smashwords.com, see their [terms of service](https://www.smashwords.com/about/tos) for more information.
+
+A data sheet for this dataset has also been created and published in [Addressing "Documentation Debt" in Machine Learning Research: A Retrospective Datasheet for BookCorpus](https://arxiv.org/abs/2105.05241).
 
 ### Citation Information
 
@@ -165,10 +167,8 @@ The data fields are the same among all splits.
     month = {December},
     year = {2015}
 }
-
 ```
 
-
 ### Contributions
 
-Thanks to [@lewtun](https://github.com/lewtun), [@richarddwang](https://github.com/richarddwang), [@lhoestq](https://github.com/lhoestq), [@thomwolf](https://github.com/thomwolf) for adding this dataset.
\ No newline at end of file
+Thanks to [@lewtun](https://github.com/lewtun), [@richarddwang](https://github.com/richarddwang), [@lhoestq](https://github.com/lhoestq), [@thomwolf](https://github.com/thomwolf) for adding this dataset.