Adding dataset enwik8 #4321
lhoestq left a comment:
Cool, thanks for adding this dataset :)
Before merging, it would be nice to include some dummy data so the dataset can be tested properly.
Here are a few steps to create them:
- create a directory `datasets/enwik8/dummy/enwik8/1.1.0/dummy_data`
- inside that directory, create another directory named `enwik8.zip` (yes, a directory, not an actual ZIP archive)
- inside `enwik8.zip`, place two or three `.raw` files
- zip the `dummy_data` folder to `datasets/enwik8/dummy/enwik8/1.1.0/dummy_data.zip`
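The steps above can be scripted with the Python standard library alone. This is a minimal sketch, not part of the `datasets` tooling: the helper name `make_dummy_data`, the `dummy_{i}.raw` file names, and their placeholder contents are hypothetical choices for illustration. Real dummy files should mimic a small slice of the actual enwik8 data.

```python
import tempfile
import zipfile
from pathlib import Path


def make_dummy_data(repo_root: Path, n_files: int = 2) -> Path:
    """Build the enwik8 dummy-data layout under repo_root and zip it.

    Hypothetical helper: file names and contents are placeholders.
    """
    version_dir = repo_root / "datasets" / "enwik8" / "dummy" / "enwik8" / "1.1.0"
    dummy_dir = version_dir / "dummy_data"

    # A *directory* named enwik8.zip stands in for the downloaded archive.
    fake_archive_dir = dummy_dir / "enwik8.zip"
    fake_archive_dir.mkdir(parents=True, exist_ok=True)

    # Two or three small .raw files (placeholder names and contents).
    for i in range(n_files):
        (fake_archive_dir / f"dummy_{i}.raw").write_bytes(b"<page>dummy text</page>\n")

    # Zip the dummy_data folder so that dummy_data/ is the archive's top level.
    zip_path = version_dir / "dummy_data.zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(dummy_dir.rglob("*")):
            zf.write(path, arcname=path.relative_to(version_dir).as_posix())
    return zip_path


if __name__ == "__main__":
    # Build the layout in a throwaway directory and list the archive entries.
    with tempfile.TemporaryDirectory() as tmp:
        archive = make_dummy_data(Path(tmp), n_files=3)
        with zipfile.ZipFile(archive) as zf:
            print(zf.namelist())
```

Zipping paths relative to the `1.1.0` directory keeps the `dummy_data/enwik8.zip/...` prefix inside the archive, which mirrors the folder structure described in the steps.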
Co-authored-by: Quentin Lhoest <[email protected]>
@lhoestq Thank you for the great feedback! Looks like all tests are passing now :)
lhoestq left a comment:
Thank you!
The documentation is not available anymore as the PR was closed or merged.
Because I regularly work with enwik8, I would like to contribute the dataset loader 🤗