Yuhui Yuan, Rao Fu, Lang Huang, Weihong Lin, Chao Zhang, Xilin Chen, and Jingdong Wang. "HRFormer: High-Resolution Transformer for Dense Prediction." arXiv preprint arXiv:2110.09408v2 (2021).
| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
|---|---|---|---|---|---|---|---|
| OCRNet | HRFormer_small | 1024x512 | 80000 | 80.62% | 80.82% | 80.98% | model \| log |
| OCRNet | HRFormer_base | 1024x512 | 80000 | 80.35% | 80.63% | 80.87% | model \| log |

With HRFormer_base as the backbone, the model's accuracy is lower than the result reported in the original paper. We attribute this gap to a difference in the OCRNet configuration: the original implementation fixes the number of hidden channels in aux_head to 512, whereas in our OCRNet implementation the number of hidden channels in aux_head equals input_channel.
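
To make the aux_head difference concrete, here is a minimal sketch in plain Python. It assumes the auxiliary head is two stacked layers (input to hidden, hidden to classes). The function name `aux_head_layer_shapes` and the values `IN_CHANNELS = 720` and `NUM_CLASSES = 19` are hypothetical, chosen only for illustration; they are not taken from either implementation.

```python
from typing import List, Optional, Tuple


def aux_head_layer_shapes(
    in_channels: int,
    num_classes: int,
    hidden_channels: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Return (in, out) channel pairs for an assumed two-layer auxiliary head.

    hidden_channels=None mirrors the behaviour described for this
    implementation: the hidden width follows the input width.
    Passing hidden_channels=512 mirrors the fixed width attributed
    to the original implementation.
    """
    hidden = in_channels if hidden_channels is None else hidden_channels
    return [(in_channels, hidden), (hidden, num_classes)]


# Hypothetical values, for illustration only.
IN_CHANNELS = 720
NUM_CLASSES = 19

ours = aux_head_layer_shapes(IN_CHANNELS, NUM_CLASSES)
original = aux_head_layer_shapes(IN_CHANNELS, NUM_CLASSES, hidden_channels=512)

print("hidden = input_channel:", ours)
print("hidden = 512:          ", original)
```

Under these assumptions the two configurations differ only in the hidden width, so matching the original setup would amount to passing a fixed hidden width of 512 instead of deriving it from the input.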