Elmo results inconsistent

When I run the ELMo code multiple times on the same data, the results vary significantly between runs, and they also exceed the results reported in the literature. What am I doing wrong?

The script I'm running:

```
python3 elmoft.py \
  --task bc5cdr-chem \
  --seq2vec boe \
  --options_path /path/to/options.json \
  --weights_path /path/to/weights.hdf5 \
  --maxlen 128 \
  --fchdim 500 \
  --lr 0.001 \
  --pdrop 0.5 \
  --do_norm \
  --norm_type batch \
  --do_lastdrop \
  --initln \
  --earlystop \
  --epochs 20 \
  --bsize 64 \
  --data_dir /path/to/data
```

The pre-trained `weights.hdf5` and `options.json` files were downloaded from:
[ELMo PubMed AllenNLP](https://allennlp.org/elmo)

The code outputs the following F1 scores for task `bc5cdr-chem` (the literature reports around 91.5% for ELMo):

```
accuracy: 0.9943132108
macro avg: 0.9489234576
weighted avg: 0.9941723561
```

The code outputs the following F1 scores for task `bc5cdr-dz` (the literature reports around 83.9% for ELMo):

```
accuracy: 0.988988989
macro avg: 0.909805591
weighted avg: 0.9888870565
```
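
One thing that may be worth checking: the rows above (accuracy, macro avg, weighted avg) look like a per-token classification report, whereas NER results in the literature are usually entity-level (exact span match) F1. Whether `elmoft.py` really scores per token is an assumption here, not something confirmed. The sketch below uses hypothetical helper names and assumes plain `B`/`I`/`O` tags; it only illustrates how far the two metrics can diverge on the same predictions.

```python
def extract_spans(tags):
    """Return the set of (start, end) entity spans in a B/I/O tag sequence."""
    spans, start = set(), None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        if tag != "I" and start is not None:
            spans.add((start, i))  # end index is exclusive
            start = None
        if tag == "B":
            start = i
    return spans


def entity_f1(gold, pred):
    """Exact-match span F1: a predicted entity counts only if both ends match."""
    gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_spans)
    recall = true_pos / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


def token_accuracy(gold, pred):
    """Fraction of tokens whose tag matches, including the dominant O class."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Toy sequences (made up for illustration, not taken from BC5CDR).
gold = ["O", "B", "I", "O", "O", "O", "B", "O", "O", "O"]
pred = ["O", "B", "O", "O", "O", "O", "B", "O", "O", "O"]

print(token_accuracy(gold, pred))  # one wrong token out of ten
print(entity_f1(gold, pred))       # one of two entities matched exactly
```

A single mislabeled token barely moves the token-level score but invalidates a whole entity, so token-level numbers can sit well above entity-level ones.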

The datasets were downloaded from:
[bert_data.zip](https://github.com/ncbi-nlp/BLUE_Benchmark/releases/tag/0.1)
Two additional columns were added so that the labels are in the column the code expects.
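
For reference, the column change was along the lines of the sketch below. The layout is an assumption for illustration: tab-separated rows with the label last, blank lines separating sentences, and two filler columns inserted before the label. `pad_columns` and the `-` filler are hypothetical names and values, not part of the repository.

```python
def pad_columns(line, n_extra=2, filler="-"):
    """Insert n_extra filler columns before the last (label) column of a TSV row."""
    line = line.rstrip("\n")
    if not line.strip():
        return line  # keep blank sentence separators untouched
    *fields, label = line.split("\t")
    return "\t".join(fields + [filler] * n_extra + [label])


# Toy rows (made up for illustration): token, label, then a sentence break.
rows = ["Naloxone\tB\n", "reverses\tO\n", "\n"]
for row in rows:
    print(repr(pad_columns(row)))
```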

Am I doing something wrong? Or is it a bug in the implementation?


