This repository was archived by the owner on Jun 30, 2022. It is now read-only.

@gchhablani
Contributor

Add more size categories in the tagging app.

Member

@lhoestq lhoestq left a comment

Cool! Should we change 1T to 1B?

EDIT: actually, it looks like 1B is missing; once it is added, we can keep 1T for trillion.

@gchhablani
Contributor Author

@lhoestq 🙈 I missed it.

Member

@lhoestq lhoestq left a comment

Thanks :)

@lhoestq
Member

lhoestq commented Mar 22, 2021

cc @yjernite can you take a look at these changes?
I don't have permissions on this repo.

@yjernite
Member

Looks good to me, although I'm not sure how many datasets we have that fall in any of these categories (the size here is the total number of rows in the table in a single config)

@gchhablani did you have specific datasets in mind?

@gchhablani
Contributor Author

gchhablani commented Mar 22, 2021

Hi @yjernite,

Among the existing datasets, some have more than a billion rows and some have more than 10 million, so I thought finer-grained categories would suit them better.

You can check the PR I made on datasets: a lot of the datasets (configs) fall between 1M and 10M, and some are above 10M, where they were all previously labelled as "n>1M".

I don't have anything specific in mind for 1T, but I think we can have it there, just in case something comes along.

Up to you and @lhoestq to decide :) I'll change everything back to "n>1M" if you'd like on the PR.
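
For illustration, here is a minimal sketch of how a row count for a single config could be mapped onto finer-grained size buckets like the ones discussed in this thread. Only the "n>1M" label is quoted in the conversation; the function name `size_category`, the other label strings, the power-of-ten boundaries, and the choice to treat each lower bound as inclusive are all assumptions made for this example, not the tagging app's actual code.

```python
import bisect

# Hypothetical bucket boundaries (powers of ten from 1K up to 1T).
# Only the "n>1M" label style comes from the thread; the rest is assumed.
_BOUNDS = [10**3, 10**4, 10**5, 10**6, 10**7, 10**8, 10**9, 10**10, 10**11, 10**12]
_NAMES = ["1K", "10K", "100K", "1M", "10M", "100M", "1B", "10B", "100B", "1T"]


def size_category(num_rows: int) -> str:
    """Return a size-category label for the total number of rows in one config."""
    if num_rows < 0:
        raise ValueError("num_rows must be non-negative")
    # Count how many boundaries are <= num_rows (lower bounds are inclusive here).
    idx = bisect.bisect_right(_BOUNDS, num_rows)
    if idx == 0:
        return "n<1K"
    if idx == len(_BOUNDS):
        return "n>1T"
    return f"{_NAMES[idx - 1]}<n<{_NAMES[idx]}"


if __name__ == "__main__":
    for rows in (500, 5_000_000, 25_000_000, 2_000_000_000):
        print(rows, size_category(rows))
```

With buckets like these, a config with 5 million rows and one with 2 billion rows no longer share a single "n>1M" label, which is the motivation given above.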

@yjernite yjernite merged commit fd60c83 into huggingface:main Mar 22, 2021
theo-m pushed a commit to huggingface/datasets that referenced this pull request Mar 24, 2021
SBrandeis added a commit to huggingface/datasets that referenced this pull request Apr 26, 2021
* basic validation

* ci script and test change

* color is better

* check all option

* validate size cats & multiling, point to reference file urls on error

* add validation to ci and rename files

* spurious change to trigger CI

* add qa reqs

* disallow empty lists

* better error msg: show all invalid values rather than first one

* some code shuffling & better error msg for langcodes

* add pyyaml to qa reqs

* fix package file loading

* include json resources

* reflect changes to size cats from huggingface/datasets-tagging#11

* trying another format for package_data

* ci works! fixing the readme like a good citizen 🤗

* escape validation everywhere it's allowed in the tagging app

* code review: more json files, conditional import

* pointers to integrate readme metadata in class (wip)

* no pydantic

* fix docs?

* Revert "fix docs?"

This reverts commit ab82a6c.

* remove pointers to add readme to loader

* Get rid of langcodes, some refactor

* Update languages.json

* Refactor, add tests

* I said, tests!!

Co-authored-by: theo <[email protected]>
Co-authored-by: SBrandeis <[email protected]>
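
Several of the commit notes above ("validate size cats", "disallow empty lists", "show all invalid values rather than first one") describe a validation behaviour that can be sketched roughly as follows. This is a hedged illustration, not the code that landed in huggingface/datasets: the function name `validate_size_categories`, the `allowed` parameter, and the error wording are hypothetical.

```python
from typing import Iterable, List, Set


def validate_size_categories(values: Iterable[str], allowed: Set[str]) -> List[str]:
    """Check tag values against an allowed set and report every invalid value at once.

    Hypothetical helper for illustration; names and messages are assumptions.
    """
    values = list(values)
    # Empty lists are rejected, mirroring the "disallow empty lists" note.
    if not values:
        raise ValueError("size_categories must be a non-empty list")
    # Collect all offending values instead of stopping at the first one.
    invalid = [v for v in values if v not in allowed]
    if invalid:
        raise ValueError(
            f"Invalid size categories: {invalid}. Allowed values: {sorted(allowed)}"
        )
    return values


if __name__ == "__main__":
    allowed = {"n<1K", "1M<n<10M", "n>1T"}  # assumed subset of labels
    print(validate_size_categories(["1M<n<10M"], allowed))
```

Listing every invalid value in one error saves the author of a dataset card from fixing tags one CI run at a time.
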
lvwerra pushed a commit to huggingface/evaluate that referenced this pull request Mar 31, 2022