Merged
98 commits
4b317d6
Draft PR for bigbench datasets
andersjohanandreassen Mar 14, 2022
241a4c2
Update handeling tasks with non-rectangular targets
andersjohanandreassen Mar 14, 2022
be1c6ca
updates before PR
andersjohanandreassen Apr 6, 2022
575e383
Modifying behavior for min num examples
andersjohanandreassen Apr 7, 2022
5107096
Updating error message
andersjohanandreassen Apr 7, 2022
58774a4
adding strip to inputs and targets to match bbseqio implementation
andersjohanandreassen Apr 7, 2022
ca11c22
Updated readme
andersjohanandreassen Apr 7, 2022
d03005f
added config doc string
andersjohanandreassen Apr 7, 2022
497a9d8
trimming linewidths to pass tests
andersjohanandreassen Apr 7, 2022
db42258
reformatting using black
andersjohanandreassen Apr 7, 2022
ca3693a
reformatting using black with py36
andersjohanandreassen Apr 7, 2022
bf27af2
using isort to fix import order
andersjohanandreassen Apr 7, 2022
9f46efc
formatting for flake8
andersjohanandreassen Apr 7, 2022
adbf59c
Merge branch 'huggingface:master' into master
andersjohanandreassen Apr 7, 2022
36460a1
updated readme
andersjohanandreassen Apr 8, 2022
3d755dc
added YAML tags
andersjohanandreassen Apr 8, 2022
00ee938
Adding bigbench github to additional requirements to pass tests
andersjohanandreassen Apr 8, 2022
adf89b9
replaing print statement with logger
andersjohanandreassen Apr 8, 2022
1c3128b
isort formatting
andersjohanandreassen Apr 8, 2022
aab36f6
adding missing t5 dependency to additional requirements to pass tests
andersjohanandreassen Apr 8, 2022
c2364d7
adding missing seqio dependency to additional requirements to pass tests
andersjohanandreassen Apr 8, 2022
ba07c27
adding missing tfds dependency to additional requirements to pass tests
andersjohanandreassen Apr 8, 2022
4f5f961
adding missing tf metadata dependency to additional requirements to p…
andersjohanandreassen Apr 8, 2022
b602a79
Edits to make tests pass. Some modifications to args (task_name -> na…
andersjohanandreassen Apr 8, 2022
b4ff987
fixed configs and moved checks for subtask and task names
andersjohanandreassen Apr 8, 2022
d615d7f
update error messages and some formatting
andersjohanandreassen Apr 8, 2022
804aac4
more test requirements
andersjohanandreassen Apr 8, 2022
04a3fbf
file formatting
andersjohanandreassen Apr 8, 2022
b5b8363
adding dummy.zip
andersjohanandreassen Apr 8, 2022
9501673
additional requirements again
andersjohanandreassen Apr 8, 2022
80ea48c
tensorflow hub requirement
andersjohanandreassen Apr 8, 2022
c948370
removing dummy.zip file. wrong format
andersjohanandreassen Apr 8, 2022
0e035e5
adding individual dummy_data.zip for each config
andersjohanandreassen Apr 8, 2022
58cc744
immutable dict requirement
andersjohanandreassen Apr 8, 2022
57a37c6
removing yaml flags text
andersjohanandreassen Apr 8, 2022
e241675
added en to languages
andersjohanandreassen Apr 8, 2022
5849ebd
Merge branch 'huggingface:master' into master
andersjohanandreassen Apr 8, 2022
175e805
adding dataset_infos.json
andersjohanandreassen Apr 9, 2022
d68d469
Update datasets/bigbench/README.md
andersjohanandreassen Apr 11, 2022
51184c1
Update datasets/bigbench/README.md
andersjohanandreassen Apr 11, 2022
1ebb159
adding bigbench to setup.py
andersjohanandreassen Apr 11, 2022
f50c546
removing dependencies from additional-test-requirements.txt as they w…
andersjohanandreassen Apr 11, 2022
e0b186f
cleaning up bigbench import
andersjohanandreassen Apr 11, 2022
630eee6
delete dummy data
andersjohanandreassen Apr 11, 2022
8c54f80
adding back in abstract_narrative_understanding dummy data
andersjohanandreassen Apr 11, 2022
2386aad
updating setup dependencies to temp branch that imports bigbench with…
andersjohanandreassen Apr 11, 2022
e7e72f1
fixing isort format issue
andersjohanandreassen Apr 11, 2022
1a764b9
more formatting for isort and flake8
andersjohanandreassen Apr 11, 2022
caac766
Merge branch 'huggingface:master' into master
andersjohanandreassen Apr 11, 2022
cb98a40
commenting out bigbench to test apache-beam issue
andersjohanandreassen Apr 11, 2022
128597b
adding bigbench back into setup.py
andersjohanandreassen Apr 12, 2022
4a8bec1
Merge branch 'huggingface:master' into master
andersjohanandreassen Apr 13, 2022
ae1e201
updating language section in readme
andersjohanandreassen Apr 13, 2022
675c369
Updated Dataset Creation in readme
andersjohanandreassen Apr 13, 2022
75f57f3
More updates to README
andersjohanandreassen Apr 13, 2022
f55dedd
updated index in readme
andersjohanandreassen Apr 13, 2022
2015896
update path to bigbench install
andersjohanandreassen Apr 15, 2022
fe6e2d1
Merge branch 'huggingface:master' into master
andersjohanandreassen Apr 15, 2022
3969599
dummy update to trigger tests
andersjohanandreassen Apr 15, 2022
987ae39
switching to experimental bigbench branch to resolve dependency issues
andersjohanandreassen Apr 19, 2022
8b3107a
switching to experimental bigbench branch to resolve dependency issues
andersjohanandreassen Apr 19, 2022
3f5f8eb
trigger test
andersjohanandreassen Apr 19, 2022
4cce1ba
trigger test
andersjohanandreassen Apr 19, 2022
e6ae146
trigger test
andersjohanandreassen Apr 19, 2022
4b7ff88
trigger test
andersjohanandreassen Apr 19, 2022
1ad556f
add version
andersjohanandreassen Apr 20, 2022
d6d6810
switching back to tar install for bigbenhc
andersjohanandreassen Apr 20, 2022
70e81eb
formatting
andersjohanandreassen Apr 20, 2022
c175589
Merge branch 'huggingface:master' into master
andersjohanandreassen Apr 21, 2022
86f4c22
trigger test
andersjohanandreassen Apr 21, 2022
ef33a84
trigger test
andersjohanandreassen Apr 21, 2022
650b4c2
Adding back in empty sections to readme
andersjohanandreassen Apr 21, 2022
7fd8348
fix sections of readme
andersjohanandreassen Apr 21, 2022
bbb061d
Merge branch 'huggingface:master' into master
andersjohanandreassen Apr 25, 2022
3eaf284
Modifying dummy data. Passing empty folder to test failure
andersjohanandreassen Apr 26, 2022
6f2da0c
deleting dummy data
andersjohanandreassen Apr 26, 2022
23bb70b
pin s3 dependencies to speed up pip backtracking
lhoestq May 5, 2022
af41167
disable s3 test on windows for now
lhoestq May 5, 2022
88aeb20
disable other test too
lhoestq May 6, 2022
62ff3d9
explicitely require sentencepiece
lhoestq May 6, 2022
0441a0f
Apply suggestions from code review
lhoestq May 6, 2022
890a49f
style
lhoestq May 6, 2022
1317e8f
Minor updates to README
andersjohanandreassen May 16, 2022
e356fb6
trigger test
andersjohanandreassen May 18, 2022
54894e0
temporarily commenting out 4/5 test to focus on windows issue.
andersjohanandreassen May 19, 2022
bd72c7f
switching to private github repo for bigbench while debugging current…
andersjohanandreassen May 19, 2022
9bb5eb8
long filenames workaround attempt
andersjohanandreassen May 19, 2022
1cc4d9c
trigger new tests after adding editdistance on bigbench requirements …
andersjohanandreassen May 19, 2022
1f85c7a
adding sacremoses
andersjohanandreassen May 20, 2022
7828cd2
changing to bigbench tar file
andersjohanandreassen May 20, 2022
db51cd2
reverting circleci to run all test
andersjohanandreassen May 20, 2022
6bb3707
Experiment: pinning apache-beam and removing sacremoses
andersjohanandreassen May 20, 2022
40fcd54
reverting circleci edit
andersjohanandreassen May 20, 2022
0e621cf
adding sacremoses back in and reverting apache-beam version
andersjohanandreassen May 20, 2022
9c25117
someone changed dependencies of bigbench. testing to see if it causes…
andersjohanandreassen May 27, 2022
363d421
updated bigbench requirements. triggering tests
andersjohanandreassen May 27, 2022
1b706ed
Update format_fn after breaking change from bigbench
andersjohanandreassen Jun 1, 2022
a4e03a8
code formatting
andersjohanandreassen Jun 2, 2022
230 changes: 230 additions & 0 deletions datasets/bigbench/README.md
@@ -0,0 +1,230 @@
---
annotations_creators:
- crowdsourced
- expert-generated
- machine-generated
language_creators:
- crowdsourced
- expert-generated
- machine-generated
- other
languages:
- en
licenses:
- apache-2.0
multilinguality:
- multilingual
- monolingual
pretty_name: bigbench
size_categories:
- unknown
source_datasets:
- original
task_categories:
- multiple-choice
- question-answering
- text-classification
- text-generation
- zero-shot-classification
- other
task_ids:
- multiple-choice-qa
- extractive-qa
- open-domain-qa
- closed-domain-qa
- fact-checking
- acceptability-classification
- intent-classification
- multi-class-classification
- multi-label-classification
- text-scoring
- hate-speech-detection
- language-modeling
---

# Dataset Card for BIG-bench

## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage/Repository:** [https://github.com/google/BIG-bench](https://github.com/google/BIG-bench)
- **Paper:** In progress
Member


Feel free to open a new PR to add the paper once it's ready :)

- **Leaderboard:**
- **Point of Contact:** [[email protected]](mailto:[email protected])


### Dataset Summary

The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their future capabilities. The tasks included in BIG-bench are summarized [by keyword](https://github.com/google/BIG-bench/blob/main/bigbench/benchmark_tasks/keywords_to_tasks.md) and [by task name](https://github.com/google/BIG-bench/blob/main/bigbench/benchmark_tasks/README.md). A paper introducing the benchmark, including evaluation results on large language models, is currently in preparation.

### Supported Tasks and Leaderboards

BIG-bench consists of both JSON and programmatic tasks.
This Hugging Face `datasets` implementation includes:

- 24 BIG-bench Lite tasks

- 167 BIG-bench JSON tasks (including the BIG-bench Lite tasks)

To study the remaining programmatic tasks, please see the [BIG-bench GitHub repo](https://github.com/google/BIG-bench).

### Languages

Although predominantly English, BIG-bench contains tasks in over 1000 written languages, as well as some synthetic and programming languages.
See [BIG-bench organized by keywords](https://github.com/google/BIG-bench/blob/main/bigbench/benchmark_tasks/keywords_to_tasks.md). Relevant keywords include `multilingual`, `non-english`, `low-resource-language`, `translation`.

For tasks specifically targeting low-resource languages, see the table below:

Task Name | Languages |
--|--|
Conlang Translation Problems | English, German, Finnish, Abma, Apinayé, Inapuri, Ndebele, Palauan|
Kannada Riddles | Kannada|
Language Identification | 1000 languages |
Swahili English Proverbs | Swahili |
Which Wiki Edit | English, Russian, Spanish, German, French, Turkish, Japanese, Vietnamese, Chinese, Arabic, Norwegian, Tagalog|




## Dataset Structure

### Data Instances

Every example in each task has five features. For example, an instance from the `emoji_movie` task is:

```
{
"idx": 0,
"inputs": "Q: What movie does this emoji describe? 👦👓⚡️\n choice: harry potter\n. choice: shutter island\n. choice: inglourious basterds\n. choice: die hard\n. choice: moonlight\nA:",
"targets": ["harry potter"],
"multiple_choice_targets": ["harry potter", "shutter island", "die hard", "inglourious basterds", "moonlight"],
"multiple_choice_scores": [1, 0, 0, 0, 0]
}
```

For tasks without multiple-choice targets, `multiple_choice_targets` and `multiple_choice_scores` are empty lists.
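The sketch below is a rough illustration of how these fields fit together when grading a single prediction. For a multiple-choice example, it picks the choice the model scored highest and returns that choice's entry in `multiple_choice_scores`. For an example whose multiple-choice lists are empty, it falls back to exact string match against `targets`. The per-choice model scores (for example, log-likelihoods) and the function name `grade_example` are hypothetical. This is not the official BIG-bench metric implementation.

```python
from typing import Optional, Sequence


def grade_example(
    example: dict,
    choice_scores: Optional[Sequence[float]] = None,
    generated_text: Optional[str] = None,
) -> float:
    """Grade one prediction against a BIG-bench-style example (illustrative sketch)."""
    mc_targets = example["multiple_choice_targets"]
    if mc_targets:
        # Multiple-choice task: assume the model gave one score per entry of
        # `multiple_choice_targets` (e.g. a log-likelihood); higher is better.
        if choice_scores is None or len(choice_scores) != len(mc_targets):
            raise ValueError("need one model score per multiple-choice target")
        best = max(range(len(mc_targets)), key=lambda i: choice_scores[i])
        return float(example["multiple_choice_scores"][best])
    # Generative task: both multiple-choice lists are empty, so compare the
    # generated text to every acceptable target after stripping whitespace.
    if generated_text is None:
        raise ValueError("need generated text for a task without choices")
    stripped_targets = [t.strip() for t in example["targets"]]
    return float(generated_text.strip() in stripped_targets)
```

For the `emoji_movie` instance above, `grade_example(example, choice_scores=[-1.2, -3.4, -5.0, -2.2, -4.1])` returns `1.0`, because the highest-scored choice is `"harry potter"`, whose entry in `multiple_choice_scores` is `1`.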


### Data Fields

Every example has the following fields:
- `idx`: an `int` feature
- `inputs`: a `string` feature
- `targets`: a sequence of `string` features
- `multiple_choice_targets`: a sequence of `string` features
- `multiple_choice_scores`: a sequence of `int` features
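The following is a minimal plain-Python sketch for checking that a record matches the field types listed above, for example when preparing custom data in the same format. The function name `check_bigbench_example` is hypothetical. The final length check is an assumption based on the example instance, where each score corresponds to the choice at the same position.

```python
EXPECTED_FIELDS = (
    "idx",
    "inputs",
    "targets",
    "multiple_choice_targets",
    "multiple_choice_scores",
)


def _is_seq_of(value, item_type) -> bool:
    # A list/tuple whose items all have the given type (rejects bare strings).
    return isinstance(value, (list, tuple)) and all(isinstance(x, item_type) for x in value)


def check_bigbench_example(example: dict) -> None:
    """Raise ValueError/TypeError if `example` does not match the documented fields."""
    missing = [f for f in EXPECTED_FIELDS if f not in example]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if not isinstance(example["idx"], int):
        raise TypeError("idx must be an int")
    if not isinstance(example["inputs"], str):
        raise TypeError("inputs must be a string")
    for name in ("targets", "multiple_choice_targets"):
        if not _is_seq_of(example[name], str):
            raise TypeError(f"{name} must be a sequence of strings")
    if not _is_seq_of(example["multiple_choice_scores"], int):
        raise TypeError("multiple_choice_scores must be a sequence of ints")
    # Assumption: one score per multiple-choice target, aligned by position.
    if len(example["multiple_choice_targets"]) != len(example["multiple_choice_scores"]):
        raise ValueError("multiple_choice_targets and multiple_choice_scores differ in length")
```

The function returns `None` for a valid record and raises on the first problem it finds. Failing early like this keeps the error message specific.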

### Data Splits

Each task has `default`, `train`, and `validation` splits.
The `default` split contains all samples for the task and is equivalent to the `all` split in the `bigbench.bbseqio` implementation.
For standard evaluation on BIG-bench, we recommend using the `default` split. Use the `train` and `validation` splits if you want to train a model on BIG-bench.
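The split recommendation above can be captured in a small helper. This sketch assumes the dataset is available on the Hugging Face Hub as `"bigbench"`, with one configuration per task name (for example, `"emoji_movie"`). Loading may also require the `bigbench` package that the loading script depends on. The names `split_for_purpose` and `load_bigbench_task` are hypothetical. `datasets.load_dataset` is imported lazily, so you can use the split logic without the library installed.

```python
SPLITS = ("default", "train", "validation")


def split_for_purpose(purpose: str) -> tuple:
    """Map a use case to the split(s) recommended in this card."""
    if purpose == "evaluation":
        # Standard BIG-bench evaluation uses all samples of the task.
        return ("default",)
    if purpose == "training":
        # Train on `train`, and hold out `validation` for model selection.
        return ("train", "validation")
    raise ValueError(f"unknown purpose {purpose!r}; expected 'evaluation' or 'training'")


def load_bigbench_task(task_name: str, split: str = "default"):
    """Load one split of a BIG-bench task (assumes the Hub dataset name "bigbench")."""
    if split not in SPLITS:
        raise ValueError(f"split must be one of {SPLITS}, got {split!r}")
    from datasets import load_dataset  # lazy import; requires `pip install datasets`

    return load_dataset("bigbench", task_name, split=split)
```

For example, `load_bigbench_task("emoji_movie", split_for_purpose("evaluation")[0])` would load the `default` split for evaluation. Avoid evaluating on `default` after training on `train`: `default` contains every sample, including the training samples.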

## Dataset Creation

BIG-bench tasks were collaboratively submitted through GitHub pull requests.

Each task went through a review and meta-review process with criteria outlined in the [BIG-bench repository documentation](https://github.com/google/BIG-bench/blob/main/docs/doc.md#submission-review-process).
Each task was required to describe the data source and curation methods on the task README page.

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]


### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]


### Personal and Sensitive Information

[More Information Needed]


## Considerations for Using the Data

BIG-bench contains a wide range of tasks, some of which are sensitive and should be used with care.

Some tasks are specifically designed to test biases and failures common to large language models, and so may elicit inappropriate or harmful responses.
For a more thorough discussion, see the BIG-bench paper (in progress).

To view tasks designed to probe pro-social behavior, including alignment; social, racial, gender, religious, or political bias; toxicity; inclusion; and other issues, please see the tasks under the [pro-social behavior keywords](https://github.com/google/BIG-bench/blob/main/bigbench/benchmark_tasks/keywords_to_tasks.md#pro-social-behavior) in the BIG-bench repository.


### Social Impact of Dataset

[More Information Needed]


### Discussion of Biases

[More Information Needed]


### Other Known Limitations

[More Information Needed]


## Additional Information

For a more thorough discussion of all aspects of BIG-bench, including dataset creation and evaluations, see the [BIG-bench repository](https://github.com/google/BIG-bench) and the BIG-bench paper (in progress).

### Dataset Curators

[More Information Needed]


### Licensing Information

[Apache License 2.0](https://github.com/google/BIG-bench/blob/main/LICENSE)

### Citation Information

To be added soon.

### Contributions
For a full list of contributors to the BIG-bench dataset, see the paper.

Thanks to [@andersjohanandreassen](https://github.com/andersjohanandreassen) and [@ethansdyer](https://github.com/ethansdyer) for adding this dataset to Hugging Face.