BIG-bench #4125
Merged
98 commits
4b317d6
Draft PR for bigbench datasets
andersjohanandreassen 241a4c2
Update handeling tasks with non-rectangular targets
andersjohanandreassen be1c6ca
updates before PR
andersjohanandreassen 575e383
Modifying behavior for min num examples
andersjohanandreassen 5107096
Updating error message
andersjohanandreassen 58774a4
adding strip to inputs and targets to match bbseqio implementation
andersjohanandreassen ca11c22
Updated readme
andersjohanandreassen d03005f
added config doc string
andersjohanandreassen 497a9d8
trimming linewidths to pass tests
andersjohanandreassen db42258
reformatting using black
andersjohanandreassen ca3693a
reformatting using black with py36
andersjohanandreassen bf27af2
using isort to fix import order
andersjohanandreassen 9f46efc
formatting for flake8
andersjohanandreassen adbf59c
Merge branch 'huggingface:master' into master
andersjohanandreassen 36460a1
updated readme
andersjohanandreassen 3d755dc
added YAML tags
andersjohanandreassen 00ee938
Adding bigbench github to additional requirements to pass tests
andersjohanandreassen adf89b9
replaing print statement with logger
andersjohanandreassen 1c3128b
isort formatting
andersjohanandreassen aab36f6
adding missing t5 dependency to additional requirements to pass tests
andersjohanandreassen c2364d7
adding missing seqio dependency to additional requirements to pass tests
andersjohanandreassen ba07c27
adding missing tfds dependency to additional requirements to pass tests
andersjohanandreassen 4f5f961
adding missing tf metadata dependency to additional requirements to p…
andersjohanandreassen b602a79
Edits to make tests pass. Some modifications to args (task_name -> na…
andersjohanandreassen b4ff987
fixed configs and moved checks for subtask and task names
andersjohanandreassen d615d7f
update error messages and some formatting
andersjohanandreassen 804aac4
more test requirements
andersjohanandreassen 04a3fbf
file formatting
andersjohanandreassen b5b8363
adding dummy.zip
andersjohanandreassen 9501673
additional requirements again
andersjohanandreassen 80ea48c
tensorflow hub requirement
andersjohanandreassen c948370
removing dummy.zip file. wrong format
andersjohanandreassen 0e035e5
adding individual dummy_data.zip for each config
andersjohanandreassen 58cc744
immutable dict requirement
andersjohanandreassen 57a37c6
removing yaml flags text
andersjohanandreassen e241675
added en to languages
andersjohanandreassen 5849ebd
Merge branch 'huggingface:master' into master
andersjohanandreassen 175e805
adding dataset_infos.json
andersjohanandreassen d68d469
Update datasets/bigbench/README.md
andersjohanandreassen 51184c1
Update datasets/bigbench/README.md
andersjohanandreassen 1ebb159
adding bigbench to setup.py
andersjohanandreassen f50c546
removing dependencies from additional-test-requirements.txt as they w…
andersjohanandreassen e0b186f
cleaning up bigbench import
andersjohanandreassen 630eee6
delete dummy data
andersjohanandreassen 8c54f80
adding back in abstract_narrative_understanding dummy data
andersjohanandreassen 2386aad
updating setup dependencies to temp branch that imports bigbench with…
andersjohanandreassen e7e72f1
fixing isort format issue
andersjohanandreassen 1a764b9
more formatting for isort and flake8
andersjohanandreassen caac766
Merge branch 'huggingface:master' into master
andersjohanandreassen cb98a40
commenting out bigbench to test apache-beam issue
andersjohanandreassen 128597b
adding bigbench back into setup.py
andersjohanandreassen 4a8bec1
Merge branch 'huggingface:master' into master
andersjohanandreassen ae1e201
updating language section in readme
andersjohanandreassen 675c369
Updated Dataset Creation in readme
andersjohanandreassen 75f57f3
More updates to README
andersjohanandreassen f55dedd
updated index in readme
andersjohanandreassen 2015896
update path to bigbench install
andersjohanandreassen fe6e2d1
Merge branch 'huggingface:master' into master
andersjohanandreassen 3969599
dummy update to trigger tests
andersjohanandreassen 987ae39
switching to experimental bigbench branch to resolve dependency issues
andersjohanandreassen 8b3107a
switching to experimental bigbench branch to resolve dependency issues
andersjohanandreassen 3f5f8eb
trigger test
andersjohanandreassen 4cce1ba
trigger test
andersjohanandreassen e6ae146
trigger test
andersjohanandreassen 4b7ff88
trigger test
andersjohanandreassen 1ad556f
add version
andersjohanandreassen d6d6810
switching back to tar install for bigbenhc
andersjohanandreassen 70e81eb
formatting
andersjohanandreassen c175589
Merge branch 'huggingface:master' into master
andersjohanandreassen 86f4c22
trigger test
andersjohanandreassen ef33a84
trigger test
andersjohanandreassen 650b4c2
Adding back in empty sections to readme
andersjohanandreassen 7fd8348
fix sections of readme
andersjohanandreassen bbb061d
Merge branch 'huggingface:master' into master
andersjohanandreassen 3eaf284
Modifying dummy data. Passing empty folder to test failure
andersjohanandreassen 6f2da0c
deleting dummy data
andersjohanandreassen 23bb70b
pin s3 dependencies to speed up pip backtracking
lhoestq af41167
disable s3 test on windows for now
lhoestq 88aeb20
disable other test too
lhoestq 62ff3d9
explicitely require sentencepiece
lhoestq 0441a0f
Apply suggestions from code review
lhoestq 890a49f
style
lhoestq 1317e8f
Minor updates to README
andersjohanandreassen e356fb6
trigger test
andersjohanandreassen 54894e0
temporarily commenting out 4/5 test to focus on windows issue.
andersjohanandreassen bd72c7f
switching to private github repo for bigbench while debugging current…
andersjohanandreassen 9bb5eb8
long filenames workaround attempt
andersjohanandreassen 1cc4d9c
trigger new tests after adding editdistance on bigbench requirements …
andersjohanandreassen 1f85c7a
adding sacremoses
andersjohanandreassen 7828cd2
changing to bigbench tar file
andersjohanandreassen db51cd2
reverting circleci to run all test
andersjohanandreassen 6bb3707
Experiment: pinning apache-beam and removing sacremoses
andersjohanandreassen 40fcd54
reverting circleci edit
andersjohanandreassen 0e621cf
adding sacremoses back in and reverting apache-beam version
andersjohanandreassen 9c25117
someone changed dependencies of bigbench. testing to see if it causes…
andersjohanandreassen 363d421
updated bigbench requirements. triggering tests
andersjohanandreassen 1b706ed
Update format_fn after breaking change from bigbench
andersjohanandreassen a4e03a8
code formatting
andersjohanandreassen
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,230 @@ | ||
| --- | ||
| annotations_creators: | ||
| - crowdsourced | ||
| - expert-generated | ||
| - machine-generated | ||
| language_creators: | ||
| - crowdsourced | ||
| - expert-generated | ||
| - machine-generated | ||
| - other | ||
| languages: | ||
| - en | ||
| licenses: | ||
| - apache-2.0 | ||
| multilinguality: | ||
| - multilingual | ||
| - monolingual | ||
| pretty_name: bigbench | ||
| size_categories: | ||
| - unknown | ||
| source_datasets: | ||
| - original | ||
| task_categories: | ||
| - multiple-choice | ||
| - question-answering | ||
| - text-classification | ||
| - text-generation | ||
| - zero-shot-classification | ||
| - other | ||
| task_ids: | ||
| - multiple-choice-qa | ||
| - extractive-qa | ||
| - open-domain-qa | ||
| - closed-domain-qa | ||
| - fact-checking | ||
| - acceptability-classification | ||
| - intent-classification | ||
| - multi-class-classification | ||
| - multi-label-classification | ||
| - text-scoring | ||
| - hate-speech-detection | ||
| - language-modeling | ||
| --- | ||
|
|
||
| # Dataset Card for BIG-bench | ||
|
|
||
| ## Table of Contents | ||
| - [Table of Contents](#table-of-contents) | ||
| - [Dataset Description](#dataset-description) | ||
| - [Dataset Summary](#dataset-summary) | ||
| - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards) | ||
| - [Languages](#languages) | ||
| - [Dataset Structure](#dataset-structure) | ||
| - [Data Instances](#data-instances) | ||
| - [Data Fields](#data-fields) | ||
| - [Data Splits](#data-splits) | ||
| - [Dataset Creation](#dataset-creation) | ||
| - [Curation Rationale](#curation-rationale) | ||
| - [Source Data](#source-data) | ||
| - [Annotations](#annotations) | ||
| - [Personal and Sensitive Information](#personal-and-sensitive-information) | ||
| - [Considerations for Using the Data](#considerations-for-using-the-data) | ||
| - [Social Impact of Dataset](#social-impact-of-dataset) | ||
| - [Discussion of Biases](#discussion-of-biases) | ||
| - [Other Known Limitations](#other-known-limitations) | ||
| - [Additional Information](#additional-information) | ||
| - [Dataset Curators](#dataset-curators) | ||
| - [Licensing Information](#licensing-information) | ||
| - [Citation Information](#citation-information) | ||
| - [Contributions](#contributions) | ||
|
|
||
| ## Dataset Description | ||
|
|
||
| - **Homepage/Repository:** [https://github.com/google/BIG-bench](https://github.com/google/BIG-bench) | ||
| - **Paper:** In progress | ||
| - **Leaderboard:** | ||
| - **Point of Contact:** [[email protected]](mailto:[email protected]) | ||
|
|
||
|
|
||
| ### Dataset Summary | ||
|
|
||
| The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their future capabilities. Tasks included in BIG-bench are summarized by keyword [here](https://github.com/google/BIG-bench/blob/main/bigbench/benchmark_tasks/keywords_to_tasks.md), and by task name [here](https://github.com/google/BIG-bench/blob/main/bigbench/benchmark_tasks/README.md). A paper introducing the benchmark, including evaluation results on large language models, is currently in preparation. | ||
|
|
||
| ### Supported Tasks and Leaderboards | ||
|
|
||
| BIG-bench consists of both JSON and programmatic tasks. | ||
| This implementation in HuggingFace datasets covers: | ||
|
|
||
| - 24 BIG-bench Lite tasks | ||
|
|
||
| - 167 BIG-bench json tasks (includes BIG-bench Lite) | ||
|
|
||
| To study the remaining programmatic tasks, please see the [BIG-bench GitHub repo](https://github.com/google/BIG-bench). | ||
|
|
||
| ### Languages | ||
|
|
||
| Although predominantly English, BIG-bench contains tasks in over 1000 written languages, as well as some synthetic and programming languages. | ||
| See [BIG-bench organized by keywords](https://github.com/google/BIG-bench/blob/main/bigbench/benchmark_tasks/keywords_to_tasks.md). Relevant keywords include `multilingual`, `non-english`, `low-resource-language`, `translation`. | ||
|
|
||
| For tasks specifically targeting low-resource languages, see the table below: | ||
|
|
||
| Task Name | Languages | | ||
| --|--| | ||
| Conlang Translation Problems | English, German, Finnish, Abma, Apinayé, Inapuri, Ndebele, Palauan| | ||
| Kannada Riddles | Kannada| | ||
| Language Identification | 1000 languages | | ||
| Swahili English Proverbs | Swahili | | ||
| Which Wiki Edit | English, Russian, Spanish, German, French, Turkish, Japanese, Vietnamese, Chinese, Arabic, Norwegian, Tagalog| | ||
|
|
||
|
|
||
|
|
||
|
|
||
| ## Dataset Structure | ||
|
|
||
| ### Data Instances | ||
|
|
||
| Each dataset contains 5 features. For example, an instance from the `emoji_movie` task is: | ||
|
|
||
| ``` | ||
| { | ||
| "idx": 0, | ||
| "inputs": "Q: What movie does this emoji describe? 👦👓⚡️\n choice: harry potter\n. choice: shutter island\n. choice: inglourious basterds\n. choice: die hard\n. choice: moonlight\nA:", | ||
| "targets": ["harry potter"], | ||
| "multiple_choice_targets":["harry potter", "shutter island", "die hard", "inglourious basterds", "moonlight"], | ||
| "multiple_choice_scores": [1, 0, 0, 0, 0] | ||
| } | ||
| ``` | ||
|
|
||
| For tasks that do not have multiple choice targets, the lists are empty. | ||
|
|
||
|
|
||
| ### Data Fields | ||
|
|
||
| Every example has the following fields: | ||
| - `idx`: an `int` feature | ||
| - `inputs`: a `string` feature | ||
| - `targets`: a sequence of `string` features | ||
| - `multiple_choice_targets`: a sequence of `string` features | ||
| - `multiple_choice_scores`: a sequence of `int` features | ||
|
|
||
| ### Data Splits | ||
|
|
||
| Each task has a `default`, a `train`, and a `validation` split. | ||
| The `default` split uses all the samples for each task (it is the same as the `all` split in the `bigbench.bbseqio` implementation). | ||
| For standard evaluation on BIG-bench, we recommend using the `default` split; the `train` and `validation` splits are intended for training a model on BIG-bench. | ||
|
|
||
| ## Dataset Creation | ||
|
|
||
| BIG-bench tasks were collaboratively submitted through GitHub pull requests. | ||
|
|
||
| Each task went through a review and meta-review process with criteria outlined in the [BIG-bench repository documentation](https://github.com/google/BIG-bench/blob/main/docs/doc.md#submission-review-process). | ||
| Each task was required to describe the data source and curation methods on the task README page. | ||
|
|
||
| ### Curation Rationale | ||
|
|
||
| [More Information Needed] | ||
|
|
||
| ### Source Data | ||
|
|
||
| #### Initial Data Collection and Normalization | ||
|
|
||
| [More Information Needed] | ||
|
|
||
| #### Who are the source language producers? | ||
|
|
||
| [More Information Needed] | ||
|
|
||
|
|
||
| ### Annotations | ||
|
|
||
| #### Annotation process | ||
|
|
||
| [More Information Needed] | ||
|
|
||
| #### Who are the annotators? | ||
|
|
||
| [More Information Needed] | ||
|
|
||
|
|
||
| ### Personal and Sensitive Information | ||
|
|
||
| [More Information Needed] | ||
|
|
||
|
|
||
| ## Considerations for Using the Data | ||
|
|
||
| BIG-bench contains a wide range of tasks, some of which are sensitive and should be used with care. | ||
|
|
||
| Some tasks are specifically designed to test biases and failures common to large language models, and so may elicit inappropriate or harmful responses. | ||
| For a more thorough discussion, see the BIG-bench paper (in progress). | ||
|
|
||
| To view tasks designed to probe pro-social behavior, including alignment, social, racial, gender, religious or political bias; toxicity; inclusion; and other issues please see tasks under the [pro-social behavior keywords](https://github.com/google/BIG-bench/blob/main/bigbench/benchmark_tasks/keywords_to_tasks.md#pro-social-behavior) on the BIG-bench repository. | ||
|
|
||
|
|
||
| ### Social Impact of Dataset | ||
|
|
||
| [More Information Needed] | ||
|
|
||
|
|
||
| ### Discussion of Biases | ||
|
|
||
| [More Information Needed] | ||
|
|
||
|
|
||
| ### Other Known Limitations | ||
|
|
||
| [More Information Needed] | ||
|
|
||
|
|
||
| ## Additional Information | ||
|
|
||
| For a more thorough discussion of all aspects of BIG-bench, including dataset creation and evaluations, see the BIG-bench repository [https://github.com/google/BIG-bench](https://github.com/google/BIG-bench) and the paper (in progress). | ||
|
|
||
| ### Dataset Curators | ||
|
|
||
| [More Information Needed] | ||
|
|
||
|
|
||
| ### Licensing Information | ||
|
|
||
| [Apache License 2.0](https://github.com/google/BIG-bench/blob/main/LICENSE) | ||
|
|
||
| ### Citation Information | ||
|
|
||
| To be added soon. | ||
|
|
||
| ### Contributions | ||
| For a full list of contributors to the BIG-bench dataset, see the paper. | ||
|
|
||
| Thanks to [@andersjohanandreassen](https://github.com/andersjohanandreassen) and [@ethansdyer](https://github.com/ethansdyer) for adding this dataset to HuggingFace. | ||
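The record layout described under Data Instances and Data Fields in the card above can be illustrated with a small, self-contained sketch. It is not part of this PR or of the `datasets` library: `check_example` and `best_choices` are hypothetical helper names, and the sketch assumes that `multiple_choice_targets` and `multiple_choice_scores` are parallel lists aligned by position, which is what the `emoji_movie` instance suggests.

```python
from typing import Any, Dict, List

# Field names and Python types mirroring the "Data Fields" list in the card.
EXPECTED_FIELDS = {
    "idx": int,
    "inputs": str,
    "targets": list,
    "multiple_choice_targets": list,
    "multiple_choice_scores": list,
}


def check_example(example: Dict[str, Any]) -> None:
    """Raise ValueError if `example` does not match the documented layout."""
    missing = set(EXPECTED_FIELDS) - set(example)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    for name, expected_type in EXPECTED_FIELDS.items():
        if not isinstance(example[name], expected_type):
            raise ValueError(f"{name} should be a {expected_type.__name__}")
    # Assumption: choices and scores are aligned position by position.
    if len(example["multiple_choice_targets"]) != len(example["multiple_choice_scores"]):
        raise ValueError("choices and scores must have the same length")


def best_choices(example: Dict[str, Any]) -> List[str]:
    """Return the highest-scoring choices, or [] for tasks without choices."""
    choices = example["multiple_choice_targets"]
    scores = example["multiple_choice_scores"]
    if not choices:
        return []
    top = max(scores)
    return [choice for choice, score in zip(choices, scores) if score == top]


# The `emoji_movie` instance from the card, with the prompt shortened.
example = {
    "idx": 0,
    "inputs": "Q: What movie does this emoji describe? 👦👓⚡️\nA:",
    "targets": ["harry potter"],
    "multiple_choice_targets": [
        "harry potter", "shutter island", "die hard",
        "inglourious basterds", "moonlight",
    ],
    "multiple_choice_scores": [1, 0, 0, 0, 0],
}

check_example(example)
print(best_choices(example))  # ['harry potter']
```

For a task without multiple-choice targets, both lists are empty, so `best_choices` returns an empty list and only `targets` carries the reference answers.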
Feel free to open a new PR to add the paper once it's ready :)