
@HallerPatrick
Contributor

Because I regularly work with enwik8, I would like to contribute the dataset loader 🤗

Member

@lhoestq lhoestq left a comment

Cool, thanks for adding this dataset :)

Before merging, it would be nice to include some dummy data so that the dataset can be tested properly.

Here are a few steps to create them:

  • create a directory datasets/enwik8/dummy/enwik8/1.1.0/dummy_data
  • inside that directory, create another directory named enwik8.zip (yes, a directory, not an actual ZIP archive)
  • inside enwik8.zip, place two or three .raw files
  • zip the dummy_data folder to datasets/enwik8/dummy/enwik8/1.1.0/dummy_data.zip
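
The steps above could be scripted roughly as follows. This is only a sketch: the helper name `build_dummy_data`, the sample file names, and their contents are hypothetical, and it uses nothing but the Python standard library rather than any tooling from the datasets repository.

```python
import shutil
from pathlib import Path


def build_dummy_data(repo_root, samples):
    """Create the dummy_data layout from the steps above and zip it.

    `samples` maps hypothetical .raw file names to their text content.
    Returns the path of the resulting dummy_data.zip.
    """
    version_dir = Path(repo_root) / "datasets" / "enwik8" / "dummy" / "enwik8" / "1.1.0"

    # "enwik8.zip" is deliberately a plain directory, not an archive.
    inner_dir = version_dir / "dummy_data" / "enwik8.zip"
    inner_dir.mkdir(parents=True, exist_ok=True)

    # Place the small .raw sample files inside the enwik8.zip directory.
    for name, text in samples.items():
        (inner_dir / name).write_text(text, encoding="utf-8")

    # Zip the dummy_data folder itself, so the archive's entries start with
    # "dummy_data/". The archive is written next to the folder as dummy_data.zip.
    archive = shutil.make_archive(
        str(version_dir / "dummy_data"),
        "zip",
        root_dir=str(version_dir),
        base_dir="dummy_data",
    )
    return Path(archive)
```

Run from the repository root, something like `build_dummy_data(".", {"sample_0.raw": "some text", "sample_1.raw": "more text"})` would leave both the dummy_data folder and dummy_data.zip under datasets/enwik8/dummy/enwik8/1.1.0.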

@HallerPatrick
Contributor Author

@lhoestq Thank you for the great feedback! Looks like all tests are passing now :)

Member

@lhoestq lhoestq left a comment

Thank you!

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Jun 1, 2022

The documentation is not available anymore as the PR was closed or merged.

@lhoestq lhoestq merged commit d68a747 into huggingface:master Jun 1, 2022
@HallerPatrick HallerPatrick deleted the enwik8-dataset branch June 1, 2022 14:27

3 participants