This looks great!
Could you share some information on what setup you used for the training of the transformer model?
- how many GPUs, and for how long
- how many steps
- what batch size
It would be helpful to have this information to better understand the cost of training DALL-E models.