This is the XED dataset. It consists of emotion-annotated movie subtitles from OPUS, annotated with Plutchik's 8 core emotions. The data is multilabel. The original annotations were sourced mainly for English and Finnish.
For the English data we used Stanford NER (named entity recognition) (Finkel et al., 2005) to replace names and locations with the tags: [PERSON] and [LOCATION] respectively.
For the Finnish data, we replaced names and locations using the Turku NER corpus (Luoma et al., 2020).
{ "sentence": "A confession that you hired [PERSON] ... and are responsible for my father's murder."
-"labels": [1, 6]
+"labels": [1, 6] # anger, sadness
}
```
### Data Fields
- sentence: a line from the dataset
-- labels: labels corresponding to the emotion
+- labels: labels corresponding to the emotion as an integer
-Where the number indicates the emotion in ascending alphabetical order: anger:1, anticipation:2, disgust:3, fear:4, joy:5, sadness:6, surprise:7, trust:8, with neutral:0 where applicable.
+Where the number indicates the emotion in ascending alphabetical order: anger:1, anticipation:2, disgust:3, fear:4, joy:5, sadness:6, surprise:7, trust:8, with neutral:0 where applicable.
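
The integer-to-emotion mapping described above can be sketched in Python as follows. This is a minimal illustration, not part of the dataset's own tooling: the names `ID_TO_EMOTION`, `decode_labels`, and `to_multi_hot` are hypothetical helpers, and the multi-hot layout (index 0 for neutral) is an assumption made for this example.

```python
# Mapping taken from the card text: emotions in ascending alphabetical
# order, with 0 reserved for neutral where applicable.
ID_TO_EMOTION = {
    0: "neutral",
    1: "anger",
    2: "anticipation",
    3: "disgust",
    4: "fear",
    5: "joy",
    6: "sadness",
    7: "surprise",
    8: "trust",
}


def decode_labels(labels):
    """Return the emotion names for a list of integer labels (hypothetical helper)."""
    return [ID_TO_EMOTION[i] for i in labels]


def to_multi_hot(labels, num_classes=9):
    """Encode a multilabel instance as a 0/1 vector indexed by label id (assumed layout)."""
    vec = [0] * num_classes
    for i in labels:
        vec[i] = 1
    return vec


example = {
    "sentence": "A confession that you hired [PERSON] ... and are responsible for my father's murder.",
    "labels": [1, 6],
}
print(decode_labels(example["labels"]))  # ['anger', 'sadness']
print(to_multi_hot(example["labels"]))   # [0, 1, 0, 0, 0, 0, 1, 0, 0]
```

Because the data is multilabel, a vector encoding like the one above is a common input format for a classifier, but nothing in the card prescribes it.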
### Data Splits
For English:
-Number of unique data points: 17530 + 6420 (neutral)
-Number of emotions: 8 (+pos, neg, neu)
+Number of unique data points: 17528 ('en_annotated' config) + 9675 ('en_neutral' config)
+Number of emotions: 8 (+neutral)
+
+For Finnish:
+Number of unique data points: 14449 ('fi_annotated' config) + 10794 ('fi_neutral' config)
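
As a quick check on the updated counts, the per-language totals can be computed as below. This is only a sketch: the config names and sizes are copied from the added lines, while `CONFIG_SIZES` and `total_for_language` are hypothetical names introduced for this example.

```python
# Sizes as stated in the updated card text, keyed by the config names it quotes.
CONFIG_SIZES = {
    "en_annotated": 17528,
    "en_neutral": 9675,
    "fi_annotated": 14449,
    "fi_neutral": 10794,
}


def total_for_language(lang):
    """Sum the annotated and neutral config sizes for a language prefix (hypothetical helper)."""
    return sum(n for name, n in CONFIG_SIZES.items() if name.startswith(lang + "_"))


print(total_for_language("en"))  # 27203
print(total_for_language("fi"))  # 25243
```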