Can't reproduce the same results as the paper on LSUN churches dataset.

Thank you so much for this work. I'm trying to reproduce the results in the paper, and I kept all hyperparameters the same as in this repository's source code.

I trained the first-stage VQGAN for 2.2M steps, the same as the pretrained model you provided. I then reconstructed the whole LSUN Churches dataset and calculated the FID using `calc_VQGAN_FID.py`. As training progressed, the FID decreased at first but started increasing after 0.8M steps, as shown in Figure 1 below.
![Fig1_1-stage_vqgan](https://github.com/samb-t/unleashing-transformers/assets/60313002/019cc08a-e2c6-499b-87ae-4e6bff9c41d4)
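
To pin down where the first-stage FID turns around, the per-checkpoint scores could be scanned with something like the sketch below. This is only a minimal illustration: the helper names `best_checkpoint` and `first_increase` are hypothetical (they are not part of this repository), and the numbers in `example` are placeholders, not the values measured in these runs.

```python
def best_checkpoint(fid_by_step):
    """Return (step, fid) for the evaluated checkpoint with the lowest FID."""
    if not fid_by_step:
        raise ValueError("no checkpoints evaluated")
    step = min(fid_by_step, key=fid_by_step.get)
    return step, fid_by_step[step]


def first_increase(fid_by_step):
    """Return the first evaluated step whose FID is higher than at the
    previous evaluated step, or None if the FID never increases."""
    steps = sorted(fid_by_step)
    for prev, cur in zip(steps, steps[1:]):
        if fid_by_step[cur] > fid_by_step[prev]:
            return cur
    return None


# Placeholder values for illustration only (step -> reconstruction FID).
example = {200_000: 3.0, 800_000: 2.0, 2_200_000: 2.5}
print(best_checkpoint(example))
print(first_increase(example))
```

If the lowest-FID checkpoint is well before 2.2M steps, training the second stage on that checkpoint instead of the final one would be one way to test whether the first-stage degradation explains part of the gap.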

Then I trained the second-stage diffusion model for 2.0M steps, sampled 50K images at a temperature of 0.9, and calculated the FID. My local reproduction gets an FID of 5.53, compared with the 4.07 reported in the paper.
![Fig2_2-stage_vqgan](https://github.com/samb-t/unleashing-transformers/assets/60313002/77ea0ef9-a53b-4e61-9335-36ed16166993)
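
For reference, the comparison above assumes both numbers are the standard Fréchet Inception Distance, computed from the mean and covariance of Inception features of the real images ($\mu_r, \Sigma_r$) and the generated images ($\mu_g, \Sigma_g$). That the evaluation scripts here use exactly this definition, with the same reference statistics as the paper, is an assumption and not something confirmed in this thread.

```math
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)
```

Because the score depends on the reference set and on the number of samples used to estimate the statistics, 5.53 and 4.07 are only directly comparable if both were computed against the same real-image statistics with the same sample count.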

I did all my local experiments on a single NVIDIA V100 Volta GPU.

So I wonder whether the hyperparameter settings in the [hparams/defaults/](https://github.com/samb-t/unleashing-transformers/tree/master/hparams/defaults/) folder are the ones you used to get the final results. Or are there any tricks I might have missed? Also, do you have any idea what causes the unexpected FID increase when training the first-stage VQGAN?

Thank you so much!
