3 changes: 2 additions & 1 deletion datasets/acronym_identification/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ task_categories:
task_ids:
- structure-prediction-other-acronym-identification
paperswithcode_id: acronym-identification
pretty_name: Acronym Identification Dataset
---

# Dataset Card for Acronym Identification Dataset
Expand Down Expand Up @@ -117,7 +118,7 @@ The training, validation, and test set contain `14,006`, `1,717`, and `1750` sen
> This is unfortunate as rules are in general not able to capture all the diverse forms to express acronyms and their long forms in text.
> Second, most of the existing datasets are in the medical domain, ignoring the challenges in other scientific domains.
> In order to address these limitations this paper introduces two new datasets for Acronym Identification.
> Notably, our datasets are annotated by human to achieve high quality and have substantially larger numbers of examples than the existing AI datasets in the non-medical domain.

### Source Data

38 changes: 21 additions & 17 deletions datasets/ade_corpus_v2/README.md
Expand Up @@ -33,6 +33,10 @@ task_ids:
Ade_corpus_v2_drug_dosage_relation:
- coreference-resolution
paperswithcode_id: null
pretty_name:
Ade_corpus_v2_classification: Adverse Drug Reaction Data v2 (Ade_corpus_v2_classification)
Ade_corpus_v2_drug_ade_relation: Adverse Drug Reaction Data v2 (Ade_corpus_v2_drug_ade_relation)
Ade_corpus_v2_drug_dosage_relation: Adverse Drug Reaction Data v2 (Ade_corpus_v2_drug_dosage_relation)
---

# Dataset Card for Adverse Drug Reaction Data v2
Expand Down Expand Up @@ -92,7 +96,7 @@ English
#### Config - `Ade_corpus_v2_classification`
```
{
  'label': 1,
  'text': 'Intravenous azithromycin-induced ototoxicity.'
}

Expand All @@ -101,21 +105,21 @@ English
#### Config - `Ade_corpus_v2_drug_ade_relation`

```
{
  'drug': 'azithromycin',
  'effect': 'ototoxicity',
  'indexes': {
    'drug': {
      'end_char': [24],
      'start_char': [12]
    },
    'effect': {
      'end_char': [44],
      'start_char': [33]
    }
  },
  'text': 'Intravenous azithromycin-induced ototoxicity.'
}

```
Expand All @@ -124,17 +128,17 @@ English

```
{
  'dosage': '4 times per day',
  'drug': 'insulin',
  'indexes': {
    'dosage': {
      'end_char': [56],
      'start_char': [41]
    },
    'drug': {
      'end_char': [40],
      'start_char': [33]
    }
  },
  'text': 'She continued to receive regular insulin 4 times per day over the following 3 years with only occasional hives.'
}

Expand All @@ -147,7 +151,7 @@ English

- `text` - Input text.
- `label` - Whether the text is related to an adverse drug effect (ADE) (1) or not (0).

#### Config - `Ade_corpus_v2_drug_ade_relation`

- `text` - Input text.
Expand All @@ -172,7 +176,7 @@ English
### Data Splits

| Train |
| ------ |
| 23516 |

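The `indexes` fields in the relation configs shown above appear to be 0-based character offsets into `text`, with `end_char` exclusive; this is inferred from the two sample records, not from any stated specification. Under that assumption, a minimal sketch of recovering the annotated spans looks like the following. The helper name `extract_spans` is hypothetical and is not part of the dataset or of any library.

```python
from typing import Dict, List


def extract_spans(text: str, span_index: Dict[str, List[int]]) -> List[str]:
    """Slice `text` with parallel lists of start/end character offsets.

    Assumes 0-based offsets with an exclusive end, which matches the
    sample records in this card but is not documented there.
    """
    return [
        text[start:end]
        for start, end in zip(span_index["start_char"], span_index["end_char"])
    ]


# Record copied from the `Ade_corpus_v2_drug_ade_relation` sample instance.
example = {
    "drug": "azithromycin",
    "effect": "ototoxicity",
    "indexes": {
        "drug": {"end_char": [24], "start_char": [12]},
        "effect": {"end_char": [44], "start_char": [33]},
    },
    "text": "Intravenous azithromycin-induced ototoxicity.",
}

drug_spans = extract_spans(example["text"], example["indexes"]["drug"])
effect_spans = extract_spans(example["text"], example["indexes"]["effect"])
print(drug_spans, effect_spans)  # ['azithromycin'] ['ototoxicity']
```

The offsets are stored as lists, so the helper returns a list of spans rather than a single string; in both sample records each list holds exactly one offset.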
## Dataset Creation
5 changes: 5 additions & 0 deletions datasets/adversarial_qa/README.md
Expand Up @@ -19,6 +19,11 @@ task_ids:
- extractive-qa
- open-domain-qa
paperswithcode_id: adversarialqa
pretty_name:
adversarialQA: adversarialQA (adversarialQA)
dbert: adversarialQA (dbert)
dbidaf: adversarialQA (dbidaf)
droberta: adversarialQA (droberta)
---

# Dataset Card for adversarialQA
3 changes: 2 additions & 1 deletion datasets/aeslc/README.md
Expand Up @@ -2,6 +2,7 @@
languages:
- en
paperswithcode_id: aeslc
pretty_name: '"aeslc"'
---

# Dataset Card for "aeslc"
Expand Down Expand Up @@ -162,4 +163,4 @@ The data fields are the same among all splits.

### Contributions

Thanks to [@patrickvonplaten](https://github.com/patrickvonplaten), [@thomwolf](https://github.com/thomwolf), [@lewtun](https://github.com/lewtun) for adding this dataset.
5 changes: 3 additions & 2 deletions datasets/afrikaans_ner_corpus/README.md
Expand Up @@ -18,6 +18,7 @@ task_categories:
task_ids:
- named-entity-recognition
paperswithcode_id: null
pretty_name: Afrikaans Ner Corpus
---

# Dataset Card for Afrikaans Ner Corpus
Expand Down Expand Up @@ -69,7 +70,7 @@ The language supported is Afrikaans.

### Data Instances

A data point consists of sentences separated by an empty line, with tab-separated tokens and tags.
{'id': '0',
'ner_tags': [0, 0, 0, 0, 0],
'tokens': ['Vertaling', 'van', 'die', 'inligting', 'in']
Expand Down Expand Up @@ -171,4 +172,4 @@ The data is under the [Creative Commons Attribution 2.5 South Africa License](ht
```
### Contributions

Thanks to [@yvonnegitau](https://github.com/yvonnegitau) for adding this dataset.
3 changes: 2 additions & 1 deletion datasets/ag_news/README.md
Expand Up @@ -18,6 +18,7 @@ task_categories:
task_ids:
- topic-classification
paperswithcode_id: ag-news
pretty_name: '"ag_news"'
---

# Dataset Card for "ag_news"
Expand Down Expand Up @@ -184,4 +185,4 @@ The data fields are the same among all splits.

### Contributions

Thanks to [@jxmorris12](https://github.com/jxmorris12), [@thomwolf](https://github.com/thomwolf), [@lhoestq](https://github.com/lhoestq), [@lewtun](https://github.com/lewtun) for adding this dataset.
3 changes: 3 additions & 0 deletions datasets/ai2_arc/README.md
Expand Up @@ -19,6 +19,9 @@ task_ids:
- open-domain-qa
- multiple-choice-qa
paperswithcode_id: null
pretty_name:
ARC-Challenge: '"ai2_arc" (ARC-Challenge)'
ARC-Easy: '"ai2_arc" (ARC-Easy)'
---

# Dataset Card for "ai2_arc"
7 changes: 5 additions & 2 deletions datasets/air_dialogue/README.md
Expand Up @@ -21,6 +21,9 @@ task_ids:
- dialogue-modeling
- language-modeling
paperswithcode_id: null
pretty_name:
air_dialogue_data: air_dialogue (air_dialogue_data)
air_dialogue_kb: air_dialogue (air_dialogue_kb)
---

# Dataset Card for air_dialogue
Expand Down Expand Up @@ -55,7 +58,7 @@ paperswithcode_id: null
- **Repository:** https://github.com/google/airdialogue
- **Paper:** https://www.aclweb.org/anthology/D18-1419/
- **Leaderboard:** https://worksheets.codalab.org/worksheets/0xa79833f4b3c24f4188cee7131b120a59
- **Point of Contact:** [AirDialogue-Google](mailto:[email protected])
[Aakash Gupta](mailto:[email protected])

### Dataset Summary
Expand Down Expand Up @@ -199,4 +202,4 @@ cc-by-nc-4.0

### Contributions

Thanks to [@skyprince999](https://github.com/skyprince999) for adding this dataset.
19 changes: 10 additions & 9 deletions datasets/ajgt_twitter_ar/README.md
Expand Up @@ -18,6 +18,7 @@ task_categories:
task_ids:
- sentiment-classification
paperswithcode_id: null
pretty_name: MetRec
---

# Dataset Card for MetRec
Expand Down Expand Up @@ -59,7 +60,7 @@ Arabic Jordanian General Tweets (AJGT) Corpus consisted of 1,800 tweets annotate

### Supported Tasks and Leaderboards

The dataset was published in this [paper](https://link.springer.com/chapter/10.1007/978-3-319-60042-0_66).

### Languages

Expand All @@ -69,19 +70,19 @@ The dataset is based on Arabic.

### Data Instances

A binary dataset with negative and positive sentiments.

### Data Fields

[More Information Needed]

### Data Splits

The dataset is not split.

|          | Train |
|----------|-------|
| no split | 1,800 |

## Dataset Creation

Expand All @@ -95,11 +96,11 @@ The dataset is not split.

#### Initial Data Collection and Normalization

Contains 1,800 tweets collected from Twitter.

#### Who are the source language producers?

From Twitter.

### Annotations

Expand Down Expand Up @@ -143,4 +144,4 @@ The dataset does not contain any additional annotations.

### Contributions

Thanks to [@zaidalyafeai](https://github.com/zaidalyafeai), [@lhoestq](https://github.com/lhoestq) for adding this dataset.
7 changes: 4 additions & 3 deletions datasets/allegro_reviews/README.md
Expand Up @@ -18,9 +18,10 @@ task_categories:
task_ids:
- sentiment-scoring
paperswithcode_id: allegro-reviews
pretty_name: Allegro Reviews
---

# Dataset Card for [Dataset Name]
# Dataset Card for Allegro Reviews

## Table of Contents
- [Dataset Description](#dataset-description)
Expand Down Expand Up @@ -78,7 +79,7 @@ Polish

### Data Instances

Two TSV files (train, dev) with two columns (text, rating) and one TSV file (test) with a single column (text).

### Data Fields

Expand Down Expand Up @@ -159,4 +160,4 @@ Dataset licensed under CC BY-SA 4.0

### Contributions

Thanks to [@abecadel](https://github.com/abecadel) for adding this dataset.