adding masahaner dataset #2465
Thank you for the review.
lhoestq
left a comment
Thanks for adding this dataset!
I just added a few comments.
Also the CI is failing for two reasons:
- you need to run make style to format the code
- the dummy data are missing. They're used to quickly test that the script works as expected. You can generate them automatically using this command:
datasets-cli dummy_data datasets/masakhaner --auto_generate --n_lines 20
For more information about the dummy data generation, please read: https://github.com/huggingface/datasets/blob/master/ADD_NEW_DATASET.md#automatically-add-code-metadata
Ideally the resulting dummy_data.zip files should be less than 20KB each. Let me know if this is not the case.
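The size guideline above can be checked locally before pushing. The following is only a minimal sketch, not part of the datasets tooling: the helper name oversized_dummy_data, the reading of "20KB" as 20 * 1024 bytes, and the example folder path are all assumptions made for illustration.

```python
from pathlib import Path

# Assumption: "20KB" is read as 20 * 1024 bytes.
SIZE_LIMIT_BYTES = 20 * 1024


def oversized_dummy_data(dataset_dir, limit=SIZE_LIMIT_BYTES):
    """Return sorted (path, size_in_bytes) pairs for every dummy_data.zip
    under dataset_dir whose size is strictly greater than limit."""
    root = Path(dataset_dir)
    return sorted(
        (str(path), path.stat().st_size)
        for path in root.rglob("dummy_data.zip")
        if path.stat().st_size > limit
    )


if __name__ == "__main__":
    # Hypothetical path: point this at the dataset folder in your checkout.
    for path, size in oversized_dummy_data("datasets/masakhaner"):
        print(f"{path}: {size} bytes is over the limit")
```

If the script prints nothing, every dummy_data.zip it found is within the limit (or none were found under that folder).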
Co-authored-by: Quentin Lhoest <[email protected]>
Thanks a lot for the corrections and comments. I have resolved point 2. However, make style still throws some errors; please see below:
black --line-length 119 --target-version py36 tests src benchmarks datasets/**/*.py metrics
Can you help resolve this?
Thank you very much @lhoestq for the help.
lhoestq
left a comment
This all looks good now!
Thanks a lot :)
Adding Masakhane dataset https://github.com/masakhane-io/masakhane-ner
@lhoestq, can you please review?