Merged
50 changes: 37 additions & 13 deletions datasets/wnut_17/README.md
@@ -1,7 +1,24 @@
---
annotations_creators:
- crowdsourced
language_creators:
- found
languages:
- en
licenses:
- cc-by-4.0
multilinguality:
- monolingual
paperswithcode_id: wnut-2017-emerging-and-rare-entity
pretty_name: WNUT 17
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- structure-prediction
task_ids:
- named-entity-recognition
---

# Dataset Card for "wnut_17"
@@ -62,12 +79,8 @@ The goal of this task is to provide a definition of emerging and of rare entitie

## Dataset Structure

We show detailed information for up to 5 configurations of the dataset.

### Data Instances

#### wnut_17

- **Size of downloaded dataset files:** 0.76 MB
- **Size of the generated dataset:** 1.66 MB
- **Total amount of disk used:** 2.43 MB
@@ -83,18 +96,29 @@ An example of 'train' looks as follows.

### Data Fields

The data fields are the same among all splits.

#### wnut_17
- `id`: a `string` feature.
- `tokens`: a `list` of `string` features.
- `ner_tags`: a `list` of classification labels, with possible values including `O` (0), `B-corporation` (1), `I-corporation` (2), `B-creative-work` (3), `I-creative-work` (4).
The data fields are the same among all splits:
- `id` (`string`): ID of the example.
- `tokens` (`list` of `string`): Tokens of the example text.
- `ner_tags` (`list` of class labels): NER tags of the tokens (using IOB2 format), with possible values:
  - 0: `O`
  - 1: `B-corporation`
  - 2: `I-corporation`
  - 3: `B-creative-work`
  - 4: `I-creative-work`
  - 5: `B-group`
  - 6: `I-group`
  - 7: `B-location`
  - 8: `I-location`
  - 9: `B-person`
  - 10: `I-person`
  - 11: `B-product`
  - 12: `I-product`
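
As a minimal sketch of how the integer `ner_tags` relate to the label names listed above, the snippet below decodes IOB2 tags into entity spans in plain Python. The `NER_LABELS` list simply mirrors the order of the list above; the helper name `decode_entities` and the example sentence are hypothetical, chosen for illustration and not taken from the dataset or from any library.

```python
# Label names in the order listed above (list index == class id).
NER_LABELS = [
    "O",
    "B-corporation", "I-corporation",
    "B-creative-work", "I-creative-work",
    "B-group", "I-group",
    "B-location", "I-location",
    "B-person", "I-person",
    "B-product", "I-product",
]


def decode_entities(tokens, ner_tags):
    """Group IOB2-tagged tokens into (entity_type, text) pairs.

    Hypothetical helper: an `I-` tag that does not continue an open span
    of the same type is dropped (a strict reading of IOB2).
    """
    entities = []
    current_type, current_tokens = None, []
    for token, tag_id in zip(tokens, ner_tags):
        label = NER_LABELS[tag_id]
        if label.startswith("B-"):
            # A B- tag always opens a new span; close any open one first.
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_type == label[2:]:
            # Continuation of the currently open span.
            current_tokens.append(token)
        else:
            # "O", or an I- tag with no matching open span: close and reset.
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


# Made-up example in the same shape as a dataset row (not real data).
example = {
    "tokens": ["Empire", "State", "Building", "is", "in", "New", "York"],
    "ner_tags": [7, 8, 8, 0, 0, 7, 8],
}
print(decode_entities(example["tokens"], example["ner_tags"]))
# → [('location', 'Empire State Building'), ('location', 'New York')]
```

Joining tokens with a single space is a simplification; the original spacing of the source text is not recoverable from `tokens` alone.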

### Data Splits

| name |train|validation|test|
|-------|----:|---------:|---:|
|wnut_17| 3394| 1009|1287|
|train|validation|test|
|----:|---------:|---:|
| 3394| 1009|1287|

## Dataset Creation
