Install the environment from environment.yml
conda env create -f environment.yml
Then activate your environment.
Each line of the dataset should have the following format:
i am about to s ##cre ##am ma ##dly in the office / especially \t when they bring more papers to pi ##le higher on my des ##k . \n
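As a rough illustration of this format, the sketch below reads one such line back into two token lists and rejoins the WordPiece fragments. It assumes that `\t` is a real tab character separating the two halves of the pair and that `\n` ends the line. The helper names `parse_pair` and `merge_wordpieces` are hypothetical and are not part of this repository.

```python
def parse_pair(line):
    """Split one dataset line into two token lists (assumes exactly one tab)."""
    first, second = line.rstrip("\n").split("\t", 1)
    # str.split() with no argument also drops the padding spaces around the tab.
    return first.split(), second.split()


def merge_wordpieces(tokens):
    """Rejoin WordPiece fragments: a token starting with '##' continues the previous word."""
    words = []
    for token in tokens:
        if token.startswith("##") and words:
            words[-1] += token[2:]
        else:
            words.append(token)
    return words


example = (
    "i am about to s ##cre ##am ma ##dly in the office / especially \t"
    " when they bring more papers to pi ##le higher on my des ##k . \n"
)
first, second = parse_pair(example)
print(merge_wordpieces(first))
print(merge_wordpieces(second))
```

Running this on the example line recovers the words "scream", "madly", "pile", and "desk" from their fragments.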
You can download the raw dataset from Wiki Dataset and put it under the data directory.
Then run dataset/create_dataset.py to generate the dataset, or use your own dataset.
tokenization.py is adapted from BERT-Official.
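For readers unfamiliar with where tokens such as `s ##cre ##am` come from, here is a minimal sketch of greedy longest-match WordPiece splitting, the general approach used by BERT-style tokenizers. It is not the code in tokenization.py: the function name `wordpiece_tokenize` and the toy vocabulary are assumptions made for illustration, and a real vocabulary would be loaded from a vocab file.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split one word into WordPiece tokens by greedy longest-match-first."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a "##" prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk_token]  # nothing matched, so the whole word is unknown
        tokens.append(piece)
        start = end
    return tokens


# Toy vocabulary, hypothetical and far smaller than a real one.
toy_vocab = {"s", "##cre", "##am", "ma", "##dly", "office"}

for word in ["scream", "madly", "office", "xyz"]:
    print(word, "->", wordpiece_tokenize(word, toy_vocab))
```

With this toy vocabulary, "scream" splits into `s ##cre ##am`, matching the sample line above, while a word with no matching pieces falls back to the unknown token.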
Run dataset/create_dataset.py
Run main.py
| | Loss | Accuracy |
|---|---|---|
| Train | 7.804 | 82.319 |
| Test | 7.823 | 80.426 |
If you get better results on this dataset or have any questions, feel free to open an issue.