BERT implementation with PyTorch

1. Install the environment

Create the conda environment from environment.yml:

conda env create -f environment.yml

Then activate your environment.

2. Prepare dataset

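Each line of the dataset should hold two WordPiece-tokenized text segments, separated by a tab (\t) and terminated by a newline (\n), like this: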

i am about to s ##cre ##am ma ##dly in the office / especially \t when they bring more papers to pi ##le higher on my des ##k . \n
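To make the line format concrete, here is a minimal sketch of reading one such line back into two token lists. It assumes the `\t` and `\n` shown above stand for a real tab and newline character; `parse_pair` is a hypothetical helper, not a function from this repository.

```python
def parse_pair(line):
    """Split one dataset line into two lists of WordPiece tokens."""
    # Drop the trailing newline, then split on the single tab separator.
    first, second = line.rstrip("\n").split("\t")
    # str.split() with no argument also discards the padding spaces.
    return first.split(), second.split()


line = (
    "i am about to s ##cre ##am ma ##dly in the office / especially \t"
    " when they bring more papers to pi ##le higher on my des ##k . \n"
)
sent_a, sent_b = parse_pair(line)
print(sent_a)
print(sent_b)
```

Tokens that start with `##` continue the previous token, so `s ##cre ##am` is the word "scream".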

Download the raw dataset from Wiki Dataset and put it under the data directory.
Then run dataset/create_dataset.py to generate the dataset, or use your own dataset in the same format.

tokenization.py is based on BERT-Official.
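The `##` pieces in the sample line come from WordPiece tokenization, which splits each word greedily into the longest pieces found in the vocabulary. The sketch below illustrates only that core step under stated assumptions: `toy_vocab` is a hypothetical vocabulary chosen to reproduce the sample, and the real tokenization.py also does text cleanup and punctuation splitting that is omitted here.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into WordPiece tokens."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry the ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece matched, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, chosen only to reproduce the sample line.
toy_vocab = {"s", "##cre", "##am", "ma", "##dly", "office"}
print(wordpiece("scream", toy_vocab))
print(wordpiece("madly", toy_vocab))
```

A word with no matching pieces is mapped to a single unknown token rather than being partially split.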

3. Generate the vocab file

Run dataset/create_dataset.py
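What the script writes is not described here. As a rough illustration only, a vocab file commonly maps each token to an integer id, with BERT's special tokens placed first. The sketch below assumes that layout; `build_vocab`, `SPECIAL_TOKENS`, and `min_count` are hypothetical names, not taken from this repository.

```python
from collections import Counter

# BERT's usual special tokens; their order here is an assumption.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]


def build_vocab(lines, min_count=1):
    """Map each token to an integer id, special tokens first."""
    # line.split() splits on spaces, tabs, and newlines alike.
    counts = Counter(tok for line in lines for tok in line.split())
    # Sort by descending frequency, then alphabetically for a stable order.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    tokens = SPECIAL_TOKENS + [tok for tok, n in ordered if n >= min_count]
    return {tok: idx for idx, tok in enumerate(tokens)}


sample = ["i am in the office \t the office is quiet \n"]
vocab = build_vocab(sample)
print(vocab)
```

Raising `min_count` drops rare tokens, which then fall back to `[UNK]` at lookup time.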

4. Pretrain your BERT

Run main.py
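For context, BERT pretraining centers on masked language modeling: about 15% of tokens are selected for prediction, and of those 80% become `[MASK]`, 10% become a random token, and 10% stay unchanged. The sketch below shows that rule from the BERT paper in plain Python. Whether main.py follows it exactly is an assumption, and `mask_tokens` is a hypothetical helper.

```python
import random


def mask_tokens(tokens, vocab_tokens, rng, mask_prob=0.15):
    """Return (masked_tokens, labels); a label is None where nothing is predicted."""
    masked, labels = [], []
    for tok in tokens:
        if rng.random() >= mask_prob:
            # Not selected: keep the token and predict nothing here.
            masked.append(tok)
            labels.append(None)
            continue
        labels.append(tok)  # the model must recover the original token
        roll = rng.random()
        if roll < 0.8:
            masked.append("[MASK]")  # 80%: replace with the mask token
        elif roll < 0.9:
            masked.append(rng.choice(vocab_tokens))  # 10%: random token
        else:
            masked.append(tok)  # 10%: leave the token unchanged
    return masked, labels


tokens = "when they bring more papers to pi ##le higher on my des ##k .".split()
rng = random.Random(0)  # fixed seed so repeated runs give the same masking
masked, labels = mask_tokens(tokens, vocab_tokens=tokens, rng=rng)
print(masked)
print(labels)
```

Passing the random generator in explicitly keeps the masking reproducible, which helps when comparing training runs.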

RESULT

	Loss	Accuracy
Train	7.804	82.319
Test	7.823	80.426

Contributing

If you get better results on this dataset or have any questions, feel free to open an issue.

Reference

[BERT-pytorch]
[BERT-Official]
