diff --git a/docs/en/benchmark.md b/docs/en/benchmark.md
index 9e66f19b85..e1609c6f67 100644
--- a/docs/en/benchmark.md
+++ b/docs/en/benchmark.md
@@ -1,12 +1,15 @@
 ## Benchmark

 ### Backends
+
 CPU: ncnn, ONNXRuntime, OpenVINO
 GPU: ncnn, TensorRT, PPLNN

 ### Latency benchmark
+
 #### Platform
+
 - Ubuntu 18.04
 - ncnn 20211208
 - Cuda 11.3
@@ -15,6 +18,7 @@ GPU: ncnn, TensorRT, PPLNN
 - NVIDIA tesla T4 tensor core GPU for TensorRT.

 #### Other settings
+
 - Static graph
 - Batch size 1
 - Synchronize devices after each inference.
@@ -22,12 +26,11 @@ GPU: ncnn, TensorRT, PPLNN
 - Warm up. For ncnn, we warm up 30 iters for all codebases. As for other backends: for classification, we warm up 1010 iters; for other codebases, we warm up 10 iters.
 - Input resolution varies for different datasets of different codebases. All inputs are real images except for `mmediting` because the dataset is not large enough.

 Users can directly test the speed through [how_to_measure_performance_of_models.md](tutorials/how_to_measure_performance_of_models.md). And here is the benchmark in our environment.
+
 <details>
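The latency protocol above (fixed warm-up iterations, batch size 1, a device synchronization after every inference) can be sketched as a plain timing loop. This is a minimal illustration, not MMDeploy's actual profiling tool; `infer` and `synchronize` are stand-in callables you would replace with your backend's calls (e.g. `torch.cuda.synchronize` on CUDA):

```python
import statistics
import time


def benchmark_latency(infer, warmup=10, iters=100, synchronize=lambda: None):
    """Mean per-inference latency in milliseconds.

    Follows the settings described above: run `warmup` untimed iterations
    first, then time `iters` runs, calling `synchronize` after each
    inference so asynchronous device work is included in the measurement.
    """
    for _ in range(warmup):
        infer()
        synchronize()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        infer()        # batch size 1: one input per call
        synchronize()  # no-op on CPU; wait for device work on GPU backends
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)


# Stand-in "model" that takes roughly 1 ms per call.
mean_ms = benchmark_latency(lambda: time.sleep(0.001), warmup=5, iters=20)
```

Warm-up matters because the first iterations pay one-off costs (memory allocation, kernel autotuning, cache population) that would otherwise skew the mean.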
 <summary>MMCls</summary>
[the hunks below are whitespace-only cleanup inside the collapsible benchmark tables; table rows are unchanged]
@@ -180,14 +183,12 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
 <summary>MMDet</summary>
@@ -405,7 +406,6 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
 <summary>MMEdit</summary>
@@ -475,7 +475,6 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
@@ -568,7 +567,6 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
 <summary>MMSeg</summary>
@@ -673,7 +671,6 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
@@ -684,7 +681,6 @@ Users can directly test the performance through [how_to_evaluate_a_model.md](tut
 <summary>MMCls</summary>
[five one-line hunks in the MMCls performance tables, each replacing an old-style config path with its new-style name:]
@@ -781,7 +777,7 @@ Users can directly test the performance through [how_to_evaluate_a_model.md](tut
@@ -791,7 +787,7 @@ Users can directly test the performance through [how_to_evaluate_a_model.md](tut
@@ -804,7 +800,7 @@ Users can directly test the performance through [how_to_evaluate_a_model.md](tut
@@ -814,7 +810,7 @@ Users can directly test the performance through [how_to_evaluate_a_model.md](tut
@@ -837,7 +833,7 @@ Users can directly test the performance through [how_to_evaluate_a_model.md](tut
@@ -1819,8 +1815,8 @@ Users can directly test the performance through [how_to_evaluate_a_model.md](tut
-
 ### Notes
+
 - As some datasets in codebases such as MMDet contain images of various resolutions, the speed benchmark is obtained through static configs in MMDeploy, while the performance benchmark is obtained through dynamic ones.
 - Some int8 performance benchmarks of TensorRT require NVIDIA cards with tensor cores, or the performance drops heavily.

diff --git a/docs/zh_cn/benchmark.md b/docs/zh_cn/benchmark.md
index 3225c44fd8..2a96884ac7 100644
--- a/docs/zh_cn/benchmark.md
+++ b/docs/zh_cn/benchmark.md
@@ -1,6 +1,7 @@
 ## Benchmark

 ### Backends
+
 CPU: ncnn, ONNXRuntime, OpenVINO
 GPU: ncnn, TensorRT, PPLNN

@@ -8,6 +9,7 @@ GPU: ncnn, TensorRT, PPLNN
 ### Latency benchmark

 #### Platform
+
 - Ubuntu 18.04 operating system
 - ncnn 20211208
 - Cuda 11.3
@@ -16,6 +18,7 @@ GPU: ncnn, TensorRT, PPLNN
 - NVIDIA Tesla T4 GPU.

 #### Other settings
+
 - Static graph export
 - Batch size 1
 - Synchronize after each inference
@@ -23,12 +26,11 @@ GPU: ncnn, TensorRT, PPLNN
 - Warm-up. For the ncnn backend, we warm up for 30 iterations; for other backends, 1010 iterations for classification and 10 iterations for other tasks.
 - Input resolution varies with each codebase's dataset; except for `mmediting`, all codebases use real images as input.

 Users can obtain the desired latency results directly through [how to measure latency](tutorials/how_to_measure_performance_of_models.md). Below are the results in our environment:
+
 <details>
 <summary>MMCls</summary>
[displaced table cells from the one-line config-path hunks; the recoverable changes are:]
-        <td>$MMCLS_DIR/configs/shufflenet_v1/shufflenet_v1_1x_b64x16_linearlr_bn_nowd_imagenet.py</td>
+        <td>$MMCLS_DIR/configs/shufflenet_v1/shufflenet-v1-1x_16xb64_in1k.py</td>
-        <td>$MMCLS_DIR/configs/shufflenet_v2/shufflenet_v2_1x_b64x16_linearlr_bn_nowd_imagenet.py</td>
+        <td>$MMCLS_DIR/configs/shufflenet_v2/shufflenet-v2-1x_16xb64_in1k.py</td>
[remaining cells are cross-garbled residue: ShuffleNetV1 top-1 68.13 / 68.13 / 67.71 / 68.11, top-5 87.80, 93.84; ShuffleNetV2 top-1 69.55 / 69.54 / 69.10 / 69.54; 71.87 / 70.91 / 71.84; $MMEDIT_DIR/configs/restorers/real_esrgan/realesrnet_c64b23g32_12x4_lr2e-4_1000k_df2k_ost.py; $MMEDIT_DIR/configs/mobilenet_v2/mobilenet-v2_8xb32_in1k.py]
@@ -181,14 +183,12 @@ GPU: ncnn, TensorRT, PPLNN
 <summary>MMDet</summary>
@@ -406,7 +406,6 @@ GPU: ncnn, TensorRT, PPLNN
 <summary>MMEdit</summary>
@@ -476,7 +475,6 @@ GPU: ncnn, TensorRT, PPLNN
@@ -569,7 +567,6 @@ GPU: ncnn, TensorRT, PPLNN
 <summary>MMSeg</summary>
@@ -674,7 +671,6 @@ GPU: ncnn, TensorRT, PPLNN
@@ -686,7 +682,6 @@ GPU: ncnn, TensorRT, PPLNN
 <summary>MMCls</summary>
[five one-line config-path hunks in the Chinese performance tables:]
@@ -783,7 +778,7 @@ GPU: ncnn, TensorRT, PPLNN
@@ -793,7 +788,7 @@ GPU: ncnn, TensorRT, PPLNN
@@ -806,7 +801,7 @@ GPU: ncnn, TensorRT, PPLNN
@@ -816,7 +811,7 @@ GPU: ncnn, TensorRT, PPLNN
@@ -839,7 +834,7 @@ GPU: ncnn, TensorRT, PPLNN
@@ -1807,8 +1802,8 @@ GPU: ncnn, TensorRT, PPLNN
-
 ### Notes
+
 - As some datasets in codebases such as MMDet contain images of various resolutions, the speed benchmark is obtained through static configs in MMDeploy, while the performance benchmark is obtained through dynamic ones.
 - Some int8 performance benchmarks of TensorRT require NVIDIA cards with tensor cores, or the performance drops heavily.
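The first note contrasts static and dynamic configs. As a hedged sketch of what that difference typically looks like in a deploy config (the `onnx_config` / `input_shape` / `dynamic_axes` field names follow MMDeploy's documented convention, but the concrete shapes and axis names here are illustrative assumptions, not the exact configs used for these tables):

```python
# Static-shape export: used for the speed benchmark. The input resolution
# is fixed at export time (values below are illustrative only).
onnx_config_static = dict(
    input_shape=[224, 224],  # fixed W, H baked into the exported graph
    dynamic_axes=None,
)

# Dynamic-shape export: used for the performance benchmark, so each image
# keeps its own resolution at evaluation time.
onnx_config_dynamic = dict(
    input_shape=None,
    dynamic_axes={'input': {0: 'batch', 2: 'height', 3: 'width'}},
)
```

A static graph lets backends such as TensorRT pick shape-specialized kernels (hence faster, stable latency numbers), while dynamic axes preserve dataset-faithful resolutions for accuracy evaluation.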