Add v2 dist benchmark vgg #7539
… dist_train_benchmark_vgg16
…onzero/Paddle into dist_train_benchmark_vgg16
benchmark/cluster/vgg16/README.md
Outdated
| Batch Size | 32 | 64 | 128 | 256 |
| -- | -- | -- | -- | -- |
| PaddlePaddle Fluid | - | 247.40 | - | - |
It seems Fluid's performance is 247.40/64 = 3.866 batches per second, while v2's performance is 256.14/128 = 2.001 batches per second.
The difference seems huge. Do you have any idea why? (Also, could you please check whether my math is correct?)

Sorry, wrong column. I'll update this PR with the full test results.
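The arithmetic in the question above can be re-checked with a short sketch. It assumes, as the reviewer did, that the table values are throughput in samples per second, so dividing by the batch size gives mini-batches per second. The function name is illustrative and not part of the PR.

```python
def batches_per_second(samples_per_second, batch_size):
    """Convert samples/sec to mini-batches/sec (assumes the input really is samples/sec)."""
    return float(samples_per_second) / batch_size


# Values quoted in the review comment above.
fluid = batches_per_second(247.40, 64)
v2 = batches_per_second(256.14, 128)

print("Fluid: %.3f batches/sec" % fluid)
print("v2:    %.3f batches/sec" % v2)
print("ratio: %.2fx" % (fluid / v2))
```

If the values were read from the wrong column, as the reply says, the comparison itself does not hold, even though the division is correct.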
gongweibao
left a comment
As discussed offline, we should think about how to avoid duplicating content that already exists in PaddleCloud.

Thanks! Looks like we have a nice improvement over V2 at batch size 256!
benchmark/cluster/vgg16/Dockerfile
Outdated
| #RUN mkdir -p /workspace
| #ADD reader.py /workspace/
| #RUN python /workspace/reader.py
| FROM python:2.7.14
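Since this is for testing, I think it would be better to use paddle:dev instead of this image.
- There is no need to install other dependencies.
- When debugging, you can enter the container and use various commands to inspect the system state.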
benchmark/cluster/vgg16/Dockerfile
Outdated
| RUN pip install /*.whl && rm -f /*.whl
| ENV LD_LIBRARY_PATH=/usr/local/lib
| ADD reader.py /workspace/
| RUN python /workspace/reader.py
This download almost never succeeds, so we need to add a hint telling users to use a proxy.
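One way to add such a hint is to wrap the download step and point users at the proxy settings when it fails. This is only a sketch: `download_with_proxy_hint` and the `download` callable are hypothetical names, not functions from `reader.py`, and `http_proxy` / `https_proxy` are the conventional environment variables that many tools read.

```python
import os

PROXY_HINT = (
    "Dataset download failed. If the dataset host is not reachable from your "
    "network, set the http_proxy / https_proxy environment variables "
    "(for a Docker build, pass them with --build-arg) and try again."
)


def download_with_proxy_hint(download):
    """Run a hypothetical `download` callable; on failure, re-raise with a proxy hint."""
    try:
        return download()
    except Exception as exc:
        proxy = os.environ.get("https_proxy") or os.environ.get("http_proxy")
        status = "currently set to %s" % proxy if proxy else "currently not set"
        raise RuntimeError(
            "%s (proxy is %s; original error: %s)" % (PROXY_HINT, status, exc)
        )
```

A plain comment in the Dockerfile or a sentence in the README would serve the same purpose. The wrapper only makes the hint appear at the moment the download fails.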
| - name: TOPOLOGY
|   value: ""
| - name: ENTRY
|   value: "cd /workspace && MKL_NUM_THREADS=1 python /workspace/vgg16_v2.py"
| - name: TOPOLOGY
|   value: ""
| - name: ENTRY
|   value: "python train.py"
benchmark/cluster/vgg16/README.md
Outdated
| PaddlePaddle v2 | 15.97 | 17.04 | 17.60 | 17.83 |
| TensorFlow | - | - | - | - |
| ### different batch size
different batch size
=>
Different Batch Size
benchmark/cluster/vgg16/README.md
Outdated
| TensorFlow | - | - | - | - |
| ### Accelerate rate
benchmark/cluster/vgg16/README.md
Outdated
| PaddlePaddle v2 (need more tests) | 326.85 | 534.58 | 853.30 | 1041.99 |
| TensorFlow | - | - | - | - |
| ### different pserver number
benchmark/cluster/vgg16/README.md
Outdated
| TensorFlow | - | - | - | - |
| ### Accelerate rate
This is a rate metric, so maybe we should calculate this value as described in https://github.com/PaddlePaddle/Paddle/tree/develop/benchmark/cluster#measure-parallel-efficiency-by-increasing-trainer-count ?
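A rate of this kind could be computed along the following lines. This is a minimal sketch that assumes the usual definitions (speedup is the throughput with more trainers divided by the baseline throughput, and efficiency is the speedup divided by the relative increase in trainer count). The linked README may define it differently, and the numbers in the usage lines are made up for illustration. They are not benchmark results.

```python
def speedup(base_throughput, throughput):
    """How many times faster the larger run is than the baseline run."""
    return float(throughput) / base_throughput


def parallel_efficiency(base_trainers, base_throughput, trainers, throughput):
    """Achieved speedup divided by the ideal (linear) speedup; 1.0 means perfect scaling."""
    ideal = float(trainers) / base_trainers
    return speedup(base_throughput, throughput) / ideal


# Hypothetical numbers: doubling the trainers from 10 to 20 raises throughput from 100 to 180.
eff = parallel_efficiency(10, 100.0, 20, 180.0)
print("parallel efficiency: %.0f%%" % (eff * 100))
```

Reporting the metric as a ratio makes rows with different trainer or pserver counts directly comparable, which raw throughput numbers are not.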
Add results.
Yancey0623
left a comment
LGTM. Please also refine the titles using this website: http://www.titlecase.com
benchmark/cluster/vgg16/README.md
Outdated
| - Trainer Count: 60
| - Batch Size: 128
| - Metrics: mini-batch / sec
mini-batch / sec
Do you mean samples / sec?
benchmark/cluster/vgg16/README.md
Outdated
| ## Enable verbos logs
| Edit `pserver.yaml` and `trainer.yaml` and add an environment variable `GLOG_v=3` to see what happend in detail.
I'm not sure whether we also need to add GLOG_logtostderr=1. If you have tested it, please ignore this comment.
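For reference, the two settings discussed here could be expressed as `env` entries in the same name/value shape used by the YAML files in this PR. `GLOG_v` and `GLOG_logtostderr` are environment variables read by glog. The helper name `glog_env` is hypothetical, and whether `GLOG_logtostderr=1` is actually needed for this job is the open question in the comment, not something this sketch settles.

```python
def glog_env(verbosity=3, log_to_stderr=True):
    """Build name/value pairs shaped like the `env` lists in pserver.yaml / trainer.yaml."""
    env = [{"name": "GLOG_v", "value": str(verbosity)}]
    if log_to_stderr:
        env.append({"name": "GLOG_logtostderr", "value": "1"})
    return env


# Print the entries as YAML list items that could be pasted under `env:`.
for entry in glog_env():
    print("- name: %s" % entry["name"])
    print('  value: "%s"' % entry["value"])
```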
| RUN pip install -U kubernetes opencv-python && apt-get update -y && apt-get install -y iputils-ping libgtk2.0-dev
| # NOTE: By default CI built wheel packages turn WITH_DISTRIBUTE=OFF,
| # so we must build one with distribute support to install in this image.
| RUN pip install paddlepaddle
Is this pip install redundant? Could the dataset download be moved to after line 12?

No. This line is there to make debugging faster: the lines below change often and downloading the dataset is slow.
| @@ -1,3 +1,16 @@
| // Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve.
The copyright message is duplicated.