Commit 9560348

fix benchmark (#411)
1 parent d9976c4 commit 9560348

File tree: 2 files changed (+21 / -30 lines)


docs/en/benchmark.md

Lines changed: 11 additions & 15 deletions
@@ -1,12 +1,15 @@
 ## Benchmark

 ### Backends
+
 CPU: ncnn, ONNXRuntime, OpenVINO

 GPU: ncnn, TensorRT, PPLNN

 ### Latency benchmark
+
 #### Platform
+
 - Ubuntu 18.04
 - ncnn 20211208
 - Cuda 11.3
@@ -15,19 +18,19 @@ GPU: ncnn, TensorRT, PPLNN
 - NVIDIA tesla T4 tensor core GPU for TensorRT.

 #### Other settings
+
 - Static graph
 - Batch size 1
 - Synchronize devices after each inference.
 - We count the average inference performance of 100 images of the dataset.
 - Warm up. For ncnn, we warm up 30 iters for all codebases. As for other backends: for classification, we warm up 1010 iters; for other codebases, we warm up 10 iters.
 - Input resolution varies for different datasets of different codebases. All inputs are real images except for `mmediting` because the dataset is not large enough.

-
 Users can directly test the speed through [how_to_measure_performance_of_models.md](tutorials/how_to_measure_performance_of_models.md). And here is the benchmark in our environment.
+
 <details>
 <summary style="margin-left: 25px;">MMCls</summary>
 <div style="margin-left: 25px;">
-
 <table class="docutils">
 <thead>
 <tr>
@@ -180,14 +183,12 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
 </tr>
 </tbody>
 </table>
-
 </div>
 </details>

 <details>
 <summary style="margin-left: 25px;">MMDet</summary>
 <div style="margin-left: 25px;">
-
 <table class="docutils">
 <thead>
 <tr>
@@ -405,7 +406,6 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
 <details>
 <summary style="margin-left: 25px;">MMEdit</summary>
 <div style="margin-left: 25px;">
-
 <table class="docutils">
 <thead>
 <tr>
@@ -475,7 +475,6 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
 </tr>
 </tbody>
 </table>
-
 </div>
 </details>

@@ -568,7 +567,6 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
 <details>
 <summary style="margin-left: 25px;">MMSeg</summary>
 <div style="margin-left: 25px;">
-
 <table class="docutils">
 <thead>
 <tr>
@@ -673,7 +671,6 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
 </tr>
 </tbody>
 </table>
-
 </div>
 </details>

@@ -684,7 +681,6 @@ Users can directly test the performance through [how_to_evaluate_a_model.md](tut
 <details>
 <summary style="margin-left: 25px;">MMCls</summary>
 <div style="margin-left: 25px;">
-
 <table class="docutils">
 <thead>
 <tr>
@@ -781,7 +777,7 @@ Users can directly test the performance through [how_to_evaluate_a_model.md](tut
 <td align="center">93.84</td>
 </tr>
 <tr>
-<td align="center" rowspan="2">ShuffleNetV1 1.0x</td>
+<td align="center" rowspan="2">ShuffleNetV1</td>
 <td align="center" rowspan="2">Classification</td>
 <td align="center">top-1</td>
 <td align="center">68.13</td>
@@ -791,7 +787,7 @@ Users can directly test the performance through [how_to_evaluate_a_model.md](tut
 <td align="center">68.13</td>
 <td align="center">67.71</td>
 <td align="center">68.11</td>
-<td rowspan="2">$MMCLS_DIR/configs/shufflenet_v1/shufflenet_v1_1x_b64x16_linearlr_bn_nowd_imagenet.py</td>
+<td rowspan="2">$MMCLS_DIR/configs/shufflenet_v1/shufflenet-v1-1x_16xb64_in1k.py</td>
 </tr>
 <tr>
 <td align="center">top-5</td>
@@ -804,7 +800,7 @@ Users can directly test the performance through [how_to_evaluate_a_model.md](tut
 <td align="center">87.80</td>
 </tr>
 <tr>
-<td align="center" rowspan="2">ShuffleNetV2 1.0x</td>
+<td align="center" rowspan="2">ShuffleNetV2</td>
 <td align="center" rowspan="2">Classification</td>
 <td align="center">top-1</td>
 <td align="center">69.55</td>
@@ -814,7 +810,7 @@ Users can directly test the performance through [how_to_evaluate_a_model.md](tut
 <td align="center">69.54</td>
 <td align="center">69.10</td>
 <td align="center">69.54</td>
-<td rowspan="2">$MMCLS_DIR/configs/shufflenet_v2/shufflenet_v2_1x_b64x16_linearlr_bn_nowd_imagenet.py</td>
+<td rowspan="2">$MMCLS_DIR/configs/shufflenet_v2/shufflenet-v2-1x_16xb64_in1k.py</td>
 </tr>
 <tr>
 <td align="center">top-5</td>
@@ -837,7 +833,7 @@ Users can directly test the performance through [how_to_evaluate_a_model.md](tut
 <td align="center">71.87</td>
 <td align="center">70.91</td>
 <td align="center">71.84</td>
-<td rowspan="2">$MMEDIT_DIR/configs/restorers/real_esrgan/realesrnet_c64b23g32_12x4_lr2e-4_1000k_df2k_ost.py</td>
+<td rowspan="2">$MMEDIT_DIR/configs/mobilenet_v2/mobilenet-v2_8xb32_in1k.py</td>
 </tr>
 <tr>
 <td align="center">top-5</td>
@@ -1819,8 +1815,8 @@ Users can directly test the performance through [how_to_evaluate_a_model.md](tut
 </div>
 </details>

-
 ### Notes
+
 - As some datasets contain images with various resolutions in codebase like MMDet. The speed benchmark is gained through static configs in MMDeploy, while the performance benchmark is gained through dynamic ones.

 - Some int8 performance benchmarks of TensorRT require Nvidia cards with tensor core, or the performance would drop heavily.
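
The "Other settings" list in this file fully determines how the latency numbers are produced: batch size 1, a warm-up phase, device synchronization after every inference, and an average over 100 images. The snippet below is a minimal sketch of that loop in plain PyTorch, written only to make those settings concrete; `model` and `images` are hypothetical placeholders, and the tutorial linked in the diff (how_to_measure_performance_of_models.md) remains the supported way to run the benchmark.

```python
import time

import torch


def measure_latency(model, images, warmup_iters=10, device="cuda"):
    """Average single-image latency under the settings described above:
    batch size 1, a warm-up phase before timing, and a device
    synchronization after every inference. `model` and `images` are
    placeholders for illustration, not MMDeploy APIs."""
    model = model.to(device).eval()
    with torch.no_grad():
        # Warm-up: the docs use 30 iters for ncnn, 1010 for classification
        # backends, and 10 for other codebases.
        for img in images[:warmup_iters]:
            model(img.unsqueeze(0).to(device))
            torch.cuda.synchronize(device)

        # Timed run over 100 images, synchronizing after each inference.
        timed = images[:100]
        elapsed = 0.0
        for img in timed:
            start = time.perf_counter()
            model(img.unsqueeze(0).to(device))
            torch.cuda.synchronize(device)
            elapsed += time.perf_counter() - start

    return elapsed / len(timed) * 1000.0  # average latency in ms
```

Synchronizing inside the timed region is what makes the per-image latency meaningful on GPU backends, since CUDA kernel launches are asynchronous and would otherwise return before the work finishes.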

docs/zh_cn/benchmark.md

Lines changed: 10 additions & 15 deletions
@@ -1,13 +1,15 @@
 ## Benchmark

 ### Backends
+
 CPU: ncnn, ONNXRuntime, OpenVINO

 GPU: ncnn, TensorRT, PPLNN

 ### Latency benchmark

 #### Platform
+
 - Ubuntu 18.04 operating system
 - ncnn 20211208
 - Cuda 11.3
@@ -16,19 +18,19 @@ GPU: ncnn, TensorRT, PPLNN
 - NVIDIA tesla T4 GPU.

 #### Other settings
+
 - Static graph export
 - Batch size 1
 - Synchronize after each inference
 - For the latency benchmark, we compute the average latency over 100 images from each dataset.
 - Warm up. For the ncnn backend we warm up for 30 iterations; for other backends: 1010 iterations for classification and 10 iterations for other tasks.
 - Input resolution varies with the dataset of each codebase; except for `mmediting`, all codebases use real images as input.

-
 Users can obtain the desired speed test results directly via [how to measure latency](tutorials/how_to_measure_performance_of_models.md). Below are the results in our environment:
+
 <details>
 <summary style="margin-left: 25px;">MMCls</summary>
 <div style="margin-left: 25px;">
-
 <table class="docutils">
 <thead>
 <tr>
@@ -181,14 +183,12 @@ GPU: ncnn, TensorRT, PPLNN
 </tr>
 </tbody>
 </table>
-
 </div>
 </details>

 <details>
 <summary style="margin-left: 25px;">MMDet</summary>
 <div style="margin-left: 25px;">
-
 <table class="docutils">
 <thead>
 <tr>
@@ -406,7 +406,6 @@ GPU: ncnn, TensorRT, PPLNN
 <details>
 <summary style="margin-left: 25px;">MMEdit</summary>
 <div style="margin-left: 25px;">
-
 <table class="docutils">
 <thead>
 <tr>
@@ -476,7 +475,6 @@ GPU: ncnn, TensorRT, PPLNN
 </tr>
 </tbody>
 </table>
-
 </div>
 </details>

@@ -569,7 +567,6 @@ GPU: ncnn, TensorRT, PPLNN
 <details>
 <summary style="margin-left: 25px;">MMSeg</summary>
 <div style="margin-left: 25px;">
-
 <table class="docutils">
 <thead>
 <tr>
@@ -674,7 +671,6 @@ GPU: ncnn, TensorRT, PPLNN
 </tr>
 </tbody>
 </table>
-
 </div>
 </details>

@@ -686,7 +682,6 @@ GPU: ncnn, TensorRT, PPLNN
 <details>
 <summary style="margin-left: 25px;">MMCls</summary>
 <div style="margin-left: 25px;">
-
 <table class="docutils">
 <thead>
 <tr>
@@ -783,7 +778,7 @@ GPU: ncnn, TensorRT, PPLNN
 <td align="center">93.84</td>
 </tr>
 <tr>
-<td align="center" rowspan="2">ShuffleNetV1 1.0x</td>
+<td align="center" rowspan="2">ShuffleNetV1</td>
 <td align="center" rowspan="2">Classification</td>
 <td align="center">top-1</td>
 <td align="center">68.13</td>
@@ -793,7 +788,7 @@ GPU: ncnn, TensorRT, PPLNN
 <td align="center">68.13</td>
 <td align="center">67.71</td>
 <td align="center">68.11</td>
-<td rowspan="2">$MMCLS_DIR/configs/shufflenet_v1/shufflenet_v1_1x_b64x16_linearlr_bn_nowd_imagenet.py</td>
+<td rowspan="2">$MMCLS_DIR/configs/shufflenet_v1/shufflenet-v1-1x_16xb64_in1k.py</td>
 </tr>
 <tr>
 <td align="center">top-5</td>
@@ -806,7 +801,7 @@ GPU: ncnn, TensorRT, PPLNN
 <td align="center">87.80</td>
 </tr>
 <tr>
-<td align="center" rowspan="2">ShuffleNetV2 1.0x</td>
+<td align="center" rowspan="2">ShuffleNetV2</td>
 <td align="center" rowspan="2">Classification</td>
 <td align="center">top-1</td>
 <td align="center">69.55</td>
@@ -816,7 +811,7 @@ GPU: ncnn, TensorRT, PPLNN
 <td align="center">69.54</td>
 <td align="center">69.10</td>
 <td align="center">69.54</td>
-<td rowspan="2">$MMCLS_DIR/configs/shufflenet_v2/shufflenet_v2_1x_b64x16_linearlr_bn_nowd_imagenet.py</td>
+<td rowspan="2">$MMCLS_DIR/configs/shufflenet_v2/shufflenet-v2-1x_16xb64_in1k.py</td>
 </tr>
 <tr>
 <td align="center">top-5</td>
@@ -839,7 +834,7 @@ GPU: ncnn, TensorRT, PPLNN
 <td align="center">71.87</td>
 <td align="center">70.91</td>
 <td align="center">71.84</td>
-<td rowspan="2">$MMEDIT_DIR/configs/restorers/real_esrgan/realesrnet_c64b23g32_12x4_lr2e-4_1000k_df2k_ost.py</td>
+<td rowspan="2">$MMEDIT_DIR/configs/mobilenet_v2/mobilenet-v2_8xb32_in1k.py</td>
 </tr>
 <tr>
 <td align="center">top-5</td>
@@ -1807,8 +1802,8 @@ GPU: ncnn, TensorRT, PPLNN
 </div>
 </details>

-
 ### Notes
+
 - Because some datasets in codebases such as MMDet contain images of various resolutions, the speed benchmark is obtained with static configs in MMDeploy, while the performance benchmark is obtained with dynamic ones.

 - Some int8 performance benchmarks of TensorRT require Nvidia cards with tensor cores; otherwise performance drops heavily.
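
The Notes in both files distinguish static configs (used for the speed benchmark) from dynamic ones (used for the performance benchmark). As a rough illustration of what that distinction means at export time, the sketch below pins the input shape for a "static" ONNX export and declares dynamic axes otherwise; the function name, tensor names, and default resolution are assumptions made for the example, and MMDeploy's own deploy configs express this choice in their own format rather than through a helper like this.

```python
import torch


def export_onnx(model, output_path, dynamic=False, input_size=(1, 3, 224, 224)):
    """Illustrative only: a static export fixes the input shape baked into
    the dummy input, while a dynamic export marks batch, height, and width
    as variable axes. This is not the MMDeploy config format."""
    dummy_input = torch.randn(*input_size)
    dynamic_axes = None
    if dynamic:
        dynamic_axes = {
            "input": {0: "batch", 2: "height", 3: "width"},
            "output": {0: "batch"},
        }
    torch.onnx.export(
        model.eval(),
        dummy_input,
        output_path,
        input_names=["input"],
        output_names=["output"],
        dynamic_axes=dynamic_axes,
        opset_version=11,
    )
```

This matches the note above: speed is measured with statically shaped models, while accuracy on datasets with varying image resolutions is evaluated with the dynamic variant.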
