lhoestq (Member) commented Aug 31, 2021

Update the size of the Openwebtext dataset

I also regenerated dataset_infos.json, but neither the data file checksum nor the number of examples (8013769) changed.

Close #2839, close #726.
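
For anyone who wants to double-check a downloaded data file by hand, here is a minimal sketch of comparing a local file against a recorded checksum and size. It assumes the recorded checksum is a SHA-256 hex digest; the helper names `sha256_of_file` and `matches_recorded` are hypothetical and are not part of the `datasets` API.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded(path, recorded_checksum, recorded_num_bytes):
    """Hypothetical helper: True if both size and checksum match the recorded values."""
    size_ok = Path(path).stat().st_size == recorded_num_bytes
    return size_ok and sha256_of_file(path) == recorded_checksum
```

If the size matches but the digest does not, the file content differs from what was recorded; if both match, only the recorded dataset size metadata would need updating, which is the case this PR addresses.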

lhoestq (Member, Author) commented Sep 7, 2021

Merging, since the CI error is unrelated to this PR and is already fixed on master.

Development

Successfully merging this pull request may close these issues.

OpenWebText: NonMatchingSplitsSizesError
"Checksums didn't match for dataset source files" error while loading openwebtext dataset