
Commit 72e8679

BIG-bench (#4125)
* Draft PR for bigbench datasets
* Update handling of tasks with non-rectangular targets
* updates before PR
* Modifying behavior for min num examples
* Updating error message
* adding strip to inputs and targets to match bbseqio implementation
* Updated readme
* added config docstring
* trimming line widths to pass tests
* reformatting using black
* reformatting using black with py36
* using isort to fix import order
* formatting for flake8
* updated readme
* added YAML tags
* Adding bigbench github to additional requirements to pass tests
* replacing print statement with logger
* isort formatting
* adding missing t5 dependency to additional requirements to pass tests
* adding missing seqio dependency to additional requirements to pass tests
* adding missing tfds dependency to additional requirements to pass tests
* adding missing tf metadata dependency to additional requirements to pass tests
* Edits to make tests pass. Some modifications to args (task_name -> name), some edits to logging errors
* fixed configs and moved checks for subtask and task names
* update error messages and some formatting
* more test requirements
* file formatting
* adding dummy.zip
* additional requirements again
* tensorflow hub requirement
* removing dummy.zip file, wrong format
* adding individual dummy_data.zip for each config
* immutable dict requirement
* removing yaml flags text
* added en to languages
* adding dataset_infos.json
* Update datasets/bigbench/README.md Co-authored-by: Quentin Lhoest <[email protected]>
* Update datasets/bigbench/README.md Co-authored-by: Quentin Lhoest <[email protected]>
* adding bigbench to setup.py
* removing dependencies from additional-test-requirements.txt as they will now be in setup.py
* cleaning up bigbench import
* delete dummy data
* adding back in abstract_narrative_understanding dummy data
* updating setup dependencies to temp branch that imports bigbench without datasets=1.17. PR in bigbench on the way
* fixing isort format issue
* more formatting for isort and flake8
* commenting out bigbench to test apache-beam issue
* adding bigbench back into setup.py
* updating language section in readme
* Updated Dataset Creation in readme
* More updates to README
* updated index in readme
* update path to bigbench install
* dummy update to trigger tests
* switching to experimental bigbench branch to resolve dependency issues
* switching to experimental bigbench branch to resolve dependency issues
* trigger test
* trigger test
* trigger test
* trigger test
* add version
* switching back to tar install for bigbench
* formatting
* trigger test
* trigger test
* Adding back in empty sections to readme
* fix sections of readme
* Modifying dummy data. Passing empty folder to test failure
* deleting dummy data
* pin s3 dependencies to speed up pip backtracking
* disable s3 test on windows for now
* disable other test too
* explicitly require sentencepiece
* Apply suggestions from code review
* style
* Minor updates to README
* trigger test
* temporarily commenting out 4/5 tests to focus on windows issue
* switching to private github repo for bigbench while debugging current errors
* long filenames workaround attempt
* trigger new tests after adding editdistance to bigbench requirements remotely
* adding sacremoses
* changing to bigbench tar file
* reverting circleci to run all tests
* Experiment: pinning apache-beam and removing sacremoses
* reverting circleci edit
* adding sacremoses back in and reverting apache-beam version
* someone changed dependencies of bigbench; testing to see if it causes issues
* updated bigbench requirements, triggering tests
* Update format_fn after breaking change from bigbench
* code formatting

Co-authored-by: Anders Johan Andreassen <[email protected]>
Co-authored-by: Quentin Lhoest <[email protected]>
1 parent 8935ccd commit 72e8679

File tree

6 files changed: +516 −3 lines changed

datasets/bigbench/README.md

Lines changed: 230 additions & 0 deletions
---
annotations_creators:
- crowdsourced
- expert-generated
- machine-generated
language_creators:
- crowdsourced
- expert-generated
- machine-generated
- other
languages:
- en
licenses:
- apache-2.0
multilinguality:
- multilingual
- monolingual
pretty_name: bigbench
size_categories:
- unknown
source_datasets:
- original
task_categories:
- multiple-choice
- question-answering
- text-classification
- text-generation
- zero-shot-classification
- other
task_ids:
- multiple-choice-qa
- extractive-qa
- open-domain-qa
- closed-domain-qa
- fact-checking
- acceptability-classification
- intent-classification
- multi-class-classification
- multi-label-classification
- text-scoring
- hate-speech-detection
- language-modeling
---

# Dataset Card for BIG-bench

## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage/Repository:** [https://github.com/google/BIG-bench](https://github.com/google/BIG-bench)
- **Paper:** In progress
- **Leaderboard:**
- **Point of Contact:** [[email protected]](mailto:[email protected])

### Dataset Summary

The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their future capabilities. Tasks included in BIG-bench are summarized by keyword [here](https://github.com/google/BIG-bench/blob/main/bigbench/benchmark_tasks/keywords_to_tasks.md), and by task name [here](https://github.com/google/BIG-bench/blob/main/bigbench/benchmark_tasks/README.md). A paper introducing the benchmark, including evaluation results on large language models, is currently in preparation.

### Supported Tasks and Leaderboards

BIG-bench consists of both json and programmatic tasks.
This implementation in HuggingFace datasets covers

- 24 BIG-bench Lite tasks
- 167 BIG-bench json tasks (includes BIG-bench Lite)

To study the remaining programmatic tasks, please see the [BIG-bench GitHub repo](https://github.com/google/BIG-bench). Each supported task is loaded by passing its name as the dataset configuration, as in the sketch below.

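A minimal loading sketch, assuming the `datasets` library is installed; `emoji_movie`, the task shown under Data Instances below, is used as the example configuration:

```python
from datasets import load_dataset

# Each BIG-bench json task is exposed as one dataset configuration,
# named after the task.
dataset = load_dataset("bigbench", "emoji_movie")

print(dataset)  # shows the available splits and example counts
```
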
### Languages

Although predominantly English, BIG-bench contains tasks in over 1000 written languages, as well as some synthetic and programming languages.
See [BIG-bench organized by keywords](https://github.com/google/BIG-bench/blob/main/bigbench/benchmark_tasks/keywords_to_tasks.md). Relevant keywords include `multilingual`, `non-english`, `low-resource-language`, and `translation`.

For tasks specifically targeting low-resource languages, see the table below:

| Task Name | Languages |
|--|--|
| Conlang Translation Problems | English, German, Finnish, Abma, Apinayé, Inapuri, Ndebele, Palauan |
| Kannada Riddles | Kannada |
| Language Identification | 1000 languages |
| Swahili English Proverbs | Swahili |
| Which Wiki Edit | English, Russian, Spanish, German, French, Turkish, Japanese, Vietnamese, Chinese, Arabic, Norwegian, Tagalog |

## Dataset Structure

### Data Instances

Each dataset contains 5 features. For example, an instance from the `emoji_movie` task is:

```
{
  "idx": 0,
  "inputs": "Q: What movie does this emoji describe? 👦👓⚡️\n choice: harry potter\n. choice: shutter island\n. choice: inglourious basterds\n. choice: die hard\n. choice: moonlight\nA:",
  "targets": ["harry potter"],
  "multiple_choice_targets": ["harry potter", "shutter island", "die hard", "inglourious basterds", "moonlight"],
  "multiple_choice_scores": [1, 0, 0, 0, 0]
}
```

For tasks that do not have multiple choice targets, the lists are empty.

### Data Fields

Every example has the following fields (see the access sketch after this list):
- `idx`: an `int` feature
- `inputs`: a `string` feature
- `targets`: a sequence of `string` features
- `multiple_choice_targets`: a sequence of `string` features
- `multiple_choice_scores`: a sequence of `int` features

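A sketch of reading these fields, assuming the `emoji_movie` task loaded as in the example above:

```python
from datasets import load_dataset

ds = load_dataset("bigbench", "emoji_movie", split="default")
example = ds[0]

print(example["idx"])                      # int index of the example
print(example["inputs"])                   # the prompt string
print(example["targets"])                  # list of gold target strings
print(example["multiple_choice_targets"])  # candidate answers; empty for non-multiple-choice tasks
print(example["multiple_choice_scores"])   # 1 for correct candidates, 0 otherwise
```
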
### Data Splits

Each task has a `default`, `train` and `validation` split.
The `default` split uses all the samples for each task (it is the same as the `all` split in the `bigbench.bbseqio` implementation).
For standard evaluation on BIG-bench, we recommend the `default` split; use the `train` and `validation` splits if you want to train a model on BIG-bench, as in the sketch below.

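A minimal sketch of selecting splits, again using `emoji_movie` as the example task:

```python
from datasets import load_dataset

# "default" contains every sample for the task;
# "train" and "validation" are intended for training on BIG-bench.
full = load_dataset("bigbench", "emoji_movie", split="default")
train = load_dataset("bigbench", "emoji_movie", split="train")
validation = load_dataset("bigbench", "emoji_movie", split="validation")

print(len(full), len(train), len(validation))
```
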
## Dataset Creation

BIG-bench tasks were collaboratively submitted through GitHub pull requests.

Each task went through a review and meta-review process, with criteria outlined in the [BIG-bench repository documentation](https://github.com/google/BIG-bench/blob/main/docs/doc.md#submission-review-process).
Each task was required to describe the data source and curation methods on the task README page.

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

BIG-bench contains a wide range of tasks, some of which are sensitive and should be used with care.

Some tasks are specifically designed to test biases and failures common to large language models, and so may elicit inappropriate or harmful responses.
For a more thorough discussion, see the BIG-bench paper (in preparation).

To view tasks designed to probe pro-social behavior, including alignment; social, racial, gender, religious, or political bias; toxicity; inclusion; and other issues, please see tasks under the [pro-social behavior keywords](https://github.com/google/BIG-bench/blob/main/bigbench/benchmark_tasks/keywords_to_tasks.md#pro-social-behavior) on the BIG-bench repository.

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

For a more thorough discussion of all aspects of BIG-bench, including dataset creation and evaluations, see the BIG-bench repository at [https://github.com/google/BIG-bench](https://github.com/google/BIG-bench) and the accompanying paper (in preparation).

### Dataset Curators

[More Information Needed]

### Licensing Information

[Apache License 2.0](https://github.com/google/BIG-bench/blob/main/LICENSE)

### Citation Information

To be added soon!

### Contributions

For a full list of contributors to the BIG-bench dataset, see the paper.

Thanks to [@andersjohanandreassen](https://github.com/andersjohanandreassen) and [@ethansdyer](https://github.com/ethansdyer) for adding this dataset to HuggingFace.
