diff --git a/datasets/wino_bias/README.md b/datasets/wino_bias/README.md
index 9f795e0df3f..b71370be3ca 100644
--- a/datasets/wino_bias/README.md
+++ b/datasets/wino_bias/README.md
@@ -58,8 +58,7 @@ pretty_name: WinoBias
 ### Dataset Summary
 
 WinoBias, a Winograd-schema dataset for coreference resolution focused on gender bias.
-The corpus contains Winograd-schema style sentences with entities corresponding to people
-referred by their occupation (e.g. the nurse, the doctor, the carpenter).
+The corpus contains Winograd-schema style sentences with entities corresponding to people referred to by their occupation (e.g. the nurse, the doctor, the carpenter).
 
 ### Supported Tasks and Leaderboards
 
@@ -72,7 +71,11 @@ English
 
 ### Data Instances
 
-[More Information Needed]
+The dataset has 4 subsets: `type1_pro`, `type1_anti`, `type2_pro` and `type2_anti`.
+
+The `*_pro` subsets contain sentences that reinforce gender stereotypes (e.g. mechanics are male, nurses are female), whereas the `*_anti` subsets contain "anti-stereotypical" sentences (e.g. mechanics are female, nurses are male).
+
+The `type1` (*WB-Knowledge*) subsets contain sentences for which world knowledge is necessary to resolve the co-references, whereas the `type2` (*WB-Syntax*) subsets require only the syntactic information present in the sentence to resolve them.
 
 ### Data Fields
 
@@ -97,7 +100,7 @@ Dev and Test Split available
 
 ### Curation Rationale
 
-[More Information Needed]
+The WinoBias dataset was introduced in 2018 (see [paper](https://arxiv.org/abs/1804.06876)) for the task of *coreference resolution*, which aims to identify mentions that refer to the same entity or person.
 
 ### Source Data
 
@@ -107,7 +110,7 @@ Dev and Test Split available
 
 #### Who are the source language producers?
 
-[More Information Needed]
+The dataset was created by researchers familiar with the WinoBias project, based on two prototypical templates provided by the authors, in which entities interact in plausible ways.
 
 ### Annotations
 
@@ -117,7 +120,7 @@ Dev and Test Split available
 
 #### Who are the annotators?
 
-[More Information Needed]
+"Researchers familiar with the [WinoBias] project"
 
 ### Personal and Sensitive Information
 
@@ -131,7 +134,7 @@ Dev and Test Split available
 
 ### Discussion of Biases
 
-Gender Bias is discussed with the help of this dataset.
+[Recent work](https://www.microsoft.com/en-us/research/uploads/prod/2021/06/The_Salmon_paper.pdf) has shown that this dataset contains grammatical issues, incorrect or ambiguous labels, and stereotype conflation, among other limitations.
 
 ### Other Known Limitations
 
@@ -141,7 +144,7 @@ Gender Bias is discussed with the help of this dataset.
 
 ### Dataset Curators
 
-[More Information Needed]
+Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez and Kai-Wei Chang
 
 ### Licensing Information
 