@dadelani dadelani commented Jun 8, 2021

Adding the Masakhane NER dataset: https://github.com/masakhane-io/masakhane-ner

@lhoestq, could you please review?

@dadelani
Contributor Author

Thank you for the review.

Member

@lhoestq lhoestq left a comment

Thanks for adding this dataset!

I just added a few comments.

Also, the CI is failing for two reasons:

  1. You need to run make style to format the code.
  2. The dummy data are missing. They're used to quickly test that the script works as expected. You can generate them automatically using this command:
datasets-cli dummy_data datasets/masakhaner --auto_generate --n_lines 20

For more information on dummy data generation, please read: https://github.com/huggingface/datasets/blob/master/ADD_NEW_DATASET.md#automatically-add-code-metadata

Ideally, the resulting dummy_data.zip files should be less than 20KB each. Let me know if this is not the case.
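
As a quick way to check the size guideline above, here is a minimal sketch that lists any dummy_data.zip files at or above 20KB under a given directory. The function name `oversized_dummy_archives` and the idea of scanning recursively are assumptions made for illustration; this is not part of datasets-cli.

```python
from pathlib import Path

# Size guideline taken from the review comment: each archive should be under 20KB.
LIMIT_BYTES = 20 * 1024


def oversized_dummy_archives(root, limit_bytes=LIMIT_BYTES):
    """Return (path, size_in_bytes) pairs for dummy_data.zip files of limit_bytes or more.

    Hypothetical helper: `root` is any directory to scan recursively.
    """
    results = []
    for archive in sorted(Path(root).rglob("dummy_data.zip")):
        size = archive.stat().st_size
        if size >= limit_bytes:
            results.append((archive, size))
    return results
```

Calling `oversized_dummy_archives(...)` on the dataset folder should return an empty list when every archive is within the guideline.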

@dadelani
Contributor Author

Thanks a lot for the corrections and comments.

I have resolved point 2. However, make style still fails; please see the output below:

black --line-length 119 --target-version py36 tests src benchmarks datasets/**/*.py metrics
/bin/sh: 1: black: not found
Makefile:13: recipe for target 'style' failed
make: *** [style] Error 127

Can you help me resolve this?
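
For context, exit status 127 together with "black: not found" means the shell could not find a black executable on PATH. It does not point to a formatting problem in the code. A minimal sketch for checking this before running make style, assuming a hypothetical helper named `missing_tools` (only black appears in the log above; any other tool names passed in are the caller's assumption):

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    # Only "black" is taken from the error log; extend the list as needed.
    for name in missing_tools(["black"]):
        print(f"{name} is not installed in the current environment")
```

If black is reported missing, installing it into the active environment (for example with pip install black) would normally clear this error, though the repository may expect a specific version.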

@dadelani
Contributor Author

Thank you very much @lhoestq for the help.

Member

@lhoestq lhoestq left a comment

This all looks good now!

Thanks a lot :)

@lhoestq lhoestq merged commit 04fdb95 into huggingface:master Jun 14, 2021