# FlexiViT: One Model for All Patch Sizes
*by Lucas Beyer, Pavel Izmailov, Alexander Kolesnikov, Mathilde Caron, Simon Kornblith, Xiaohua Zhai, Matthias Minderer, Michael Tschannen, Ibrahim Alabdulmohsin, Filip Pavetic*

## Introduction
We publish all pre-trained FlexiViT models, the configurations used to train
them, and the training logs of one run.

Please read the main [big_vision README](/README.md) to learn how to run
configs, and remember that each config file contains an example invocation in
its top-level comment.

## Pre-trained paper models

Here are the models that we used as backbones in the paper. See the tables in
the appendix of the paper for expected scores at various patch sizes and on
various datasets.

First, the recommended models, which we used for all experiments.
Note that the input resolution is 240px, not 224px:

| Dataset | Model | Download link | Notes |
| :--- | :---: | :---: | :---: |
| ImageNet-1k | FlexiViT-L | [link](https://storage.googleapis.com/big_vision/flexivit/flexivit_l_i1k.npz) | 1200ep version |
| ImageNet-1k | FlexiViT-B | [link](https://storage.googleapis.com/big_vision/flexivit/flexivit_b_i1k.npz) | 1200ep version |
| ImageNet-1k | FlexiViT-S | [link](https://storage.googleapis.com/big_vision/flexivit/flexivit_s_i1k.npz) | 1200ep version |
| ImageNet-21k | FlexiViT-B | [link](https://storage.googleapis.com/big_vision/flexivit/flexivit_b_i21k_300ep.npz) | 300ep version. The 1000ep version below is better, but was not used in the paper, to keep the comparison with the baselines fair. |
| ImageNet-21k | ViT-B/16 | [link](https://storage.googleapis.com/big_vision/flexivit/vit_b16_i21k_300ep.npz) | Apples-to-apples non-flexi baseline used throughout the paper. |
| ImageNet-21k | ViT-B/30 | [link](https://storage.googleapis.com/big_vision/flexivit/vit_b30_i21k_300ep.npz) | Apples-to-apples non-flexi baseline used throughout the paper. |
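
A bit of arithmetic shows why the input resolution matters when the patch size varies: a patch size `p` tiles a square `s`x`s` image exactly only if `p` divides `s`, giving `(s/p)**2` tokens. Between 8 and 48 (illustrative bounds chosen for this example), 240 has ten such divisors while 224 has only five. The helpers `valid_patch_sizes` and `num_tokens` are hypothetical and not part of big_vision.

```python
def valid_patch_sizes(image_size: int, lo: int = 8, hi: int = 48) -> list[int]:
    """Patch sizes in [lo, hi] that tile a square image of side image_size exactly."""
    return [p for p in range(lo, hi + 1) if image_size % p == 0]


def num_tokens(image_size: int, patch_size: int) -> int:
    """Number of patch tokens (not counting any class token) for a square image."""
    if image_size % patch_size:
        raise ValueError(f"patch size {patch_size} does not divide {image_size}")
    return (image_size // patch_size) ** 2


if __name__ == "__main__":
    # Compare the exact-tiling patch sizes (and token counts) of 240px and 224px inputs.
    for size in (240, 224):
        print(size, {p: num_tokens(size, p) for p in valid_patch_sizes(size)})
```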

The models in the table above can be used directly in our codebase by specifying
`model_name = "proj.flexi.vit"` and, for example, `model_init = "FlexiViT-L i1k"`.
See the file `models/proj/flexi/vit.py` for the other available names.

*Important detail:* When re-using these models with a flexible patch size, we
recommend keeping the patch-embedding parameter buffer at its original size and
changing the patch size on the fly using PI-resize, rather than resizing the
parameter buffer at load time. When re-using the models with a fixed patch size,
either way is fine. The reason is that multiple resizes cannot be chained
without loss: for example, resizing 32->8->32 does not recover the original
weights.
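
The recommendation is easier to see with a small example. Below is a minimal NumPy sketch of PI-resize as we read it from the paper: if resizing a `p`x`p` patch is a linear map `B`, the new kernel `w_new = pinv(B^T) @ w` yields exactly the same token for every resized input whenever `B` has full column rank, as it does for the upsampling used here. The sketch assumes a separable linear-interpolation resize with half-pixel centers and no anti-aliasing, and a kernel laid out as `(patch_h, patch_w, in_channels, embed_dim)`. The helpers `interp_matrix`, `resize_matrix`, `resize_patch` and `pi_resize` are hypothetical and are not the big_vision API. The demo also shows the lossy 32->8->32 round trip.

```python
import numpy as np


def interp_matrix(n_in: int, n_out: int) -> np.ndarray:
    """1-D linear interpolation as an (n_out, n_in) matrix, half-pixel centers."""
    m = np.zeros((n_out, n_in))
    for i in range(n_out):
        # Position of output pixel i in input-pixel coordinates, clamped to the edges.
        pos = min(max((i + 0.5) * n_in / n_out - 0.5, 0.0), n_in - 1.0)
        lo = int(np.floor(pos))
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        m[i, lo] += 1.0 - frac
        m[i, hi] += frac
    return m


def resize_matrix(p_old: int, p_new: int) -> np.ndarray:
    """Matrix B with B @ patch.ravel() == resized patch (single channel, row-major)."""
    a = interp_matrix(p_old, p_new)
    # A separable 2-D resize is the Kronecker product of the 1-D resizes.
    return np.kron(a, a)  # shape (p_new**2, p_old**2)


def resize_patch(x: np.ndarray, p_new: int) -> np.ndarray:
    """Resize a (p, p, channels) input patch to (p_new, p_new, channels)."""
    a = interp_matrix(x.shape[0], p_new)
    return np.einsum("ih,hwc,jw->ijc", a, x, a)


def pi_resize(w: np.ndarray, p_new: int) -> np.ndarray:
    """PI-resize a (p, p, in_channels, embed_dim) kernel to patch size p_new."""
    p_old = w.shape[0]
    b = resize_matrix(p_old, p_new)
    flat = w.reshape(p_old * p_old, -1)    # spatial positions x (channels * dim)
    new_flat = np.linalg.pinv(b.T) @ flat  # w_new = pinv(B^T) @ w
    return new_flat.reshape(p_new, p_new, *w.shape[2:])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 8, 3))     # one 8x8 RGB patch
    w = rng.normal(size=(8, 8, 3, 4))  # toy patch-embedding kernel, embed_dim=4
    token = np.einsum("hwc,hwcd->d", x, w)
    # Upsample the patch to 16x16 and PI-resize the kernel: the token is unchanged.
    token_16 = np.einsum("hwc,hwcd->d", resize_patch(x, 16), pi_resize(w, 16))
    print("8->16 max token error:", np.abs(token - token_16).max())
    # Chaining resizes is lossy: 32->8->32 does not give back the original kernel.
    w32 = rng.normal(size=(32, 32, 3, 4))
    w_round_trip = pi_resize(pi_resize(w32, 8), 32)
    print("32->8->32 max weight error:", np.abs(w32 - w_round_trip).max())
```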

Second, for completeness, the full list of released models:

| Dataset | Model | Download link | Notes |
| :--- | :---: | :---: | :---: |
| ImageNet-21k | FlexiViT-B | [link](https://storage.googleapis.com/big_vision/flexivit/flexivit_b_i21k_1000ep.npz) | 1000ep version. This should be the best available FlexiViT-B model. |
| ImageNet-21k | FlexiViT-B | [link](https://storage.googleapis.com/big_vision/flexivit/flexivit_b_i21k_90ep.npz) | 90ep version |
| ImageNet-1k | FlexiViT-L | [link](https://storage.googleapis.com/big_vision/flexivit/flexivit_l_i1k_600ep.npz) | 600ep version |
| ImageNet-1k | FlexiViT-L | [link](https://storage.googleapis.com/big_vision/flexivit/flexivit_l_i1k_300ep.npz) | 300ep version |
| ImageNet-1k | FlexiViT-L | [link](https://storage.googleapis.com/big_vision/flexivit/flexivit_l_i1k_90ep.npz) | 90ep version |
| ImageNet-1k | FlexiViT-B | [link](https://storage.googleapis.com/big_vision/flexivit/flexivit_b_i1k_600ep.npz) | 600ep version |
| ImageNet-1k | FlexiViT-B | [link](https://storage.googleapis.com/big_vision/flexivit/flexivit_b_i1k_300ep.npz) | 300ep version |
| ImageNet-1k | FlexiViT-B | [link](https://storage.googleapis.com/big_vision/flexivit/flexivit_b_i1k_90ep.npz) | 90ep version |
| ImageNet-1k | FlexiViT-S | [link](https://storage.googleapis.com/big_vision/flexivit/flexivit_s_i1k_600ep.npz) | 600ep version |
| ImageNet-1k | FlexiViT-S | [link](https://storage.googleapis.com/big_vision/flexivit/flexivit_s_i1k_300ep.npz) | 300ep version |
| ImageNet-1k | FlexiViT-S | [link](https://storage.googleapis.com/big_vision/flexivit/flexivit_s_i1k_90ep.npz) | 90ep version |

## Results

We provide full training logs for a run of this public code on Google Cloud
that reproduces the FlexiViT-S 90ep ImageNet-1k results:
 - [metrics](https://storage.googleapis.com/big_vision/flexivit/deit3_i1k_s_90ep_12-15_2254/big_vision_metrics.txt)
 - [config](https://storage.googleapis.com/big_vision/flexivit/deit3_i1k_s_90ep_12-15_2254/config.json)
 - or the whole directory at `gs://big_vision/flexivit/deit3_i1k_s_90ep_12-15_2254`.