Fluid distributed training benchmark#7410

Merged
Yancey0623 merged 3 commits into PaddlePaddle:develop from Yancey0623:cluster_benchmark_design
Jan 12, 2018
Conversation

@Yancey0623
Contributor

Fixed #7409

@Yancey0623 Yancey0623 changed the title from "add cluster training bencharmk design" to "Fluid distributed training benchmark" on Jan 10, 2018
Contributor

@typhoonzero left a comment

Should we put this doc into design or a separate repo?

- Docker Image

We use different base Docker Images to run the benchmark on Kubernetes:
- PaddlePaddle v2: paddlepaddle/paddle:latest
Contributor

We should use a static tag, so that when the latest tag updates, this benchmark can still be reproduced.

Contributor Author

Sure, but we don't have a static tag for fluid distributed training; how about a commit ID?

- TensorFlow: tensorflow/tensorflow:latest

- Model
A digit-recognition model and the MNIST dataset are used in this benchmark.
Contributor

@helinwang Jan 11, 2018

I think this model is too small. Maybe vgg-16 (probably around 500MB) is closer to real usage.

Contributor Author

Done.
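As a quick sanity check on the "around 500MB" figure, the parameter count of the standard VGG-16 architecture can be tallied directly. This is an illustrative back-of-the-envelope calculation, not part of the benchmark code:

```python
# Back-of-the-envelope parameter count for the standard VGG-16 architecture
# (13 conv layers + 3 fully connected layers), to sanity-check the
# "around 500MB" estimate for fp32 weights.

conv_channels = [
    (3, 64), (64, 64),                   # block 1
    (64, 128), (128, 128),               # block 2
    (128, 256), (256, 256), (256, 256),  # block 3
    (256, 512), (512, 512), (512, 512),  # block 4
    (512, 512), (512, 512), (512, 512),  # block 5
]

# 3x3 kernels: weights = 3*3*c_in*c_out, plus one bias per output channel.
params = sum(3 * 3 * cin * cout + cout for cin, cout in conv_channels)
params += 7 * 7 * 512 * 4096 + 4096      # fc6 (224 input halved by 5 poolings -> 7x7)
params += 4096 * 4096 + 4096             # fc7
params += 4096 * 1000 + 1000             # fc8 (1000 ImageNet classes)

size_mib = params * 4 / 2**20            # fp32: 4 bytes per parameter
print(params, round(size_mib))           # ~138.4M parameters, ~528 MiB
```

So an fp32 VGG-16 model is roughly 528 MiB, consistent with the estimate above.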

- PServer count of the training job.

- Invariant
- The number of trainers.
Contributor

What is the trainer count we plan to try?

Contributor Author

Done.
And @typhoonzero reminded me that we also need to measure parallel efficiency by increasing the trainer count.
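To make the parallel-efficiency measurement concrete: efficiency at N trainers is the speedup over the single-trainer throughput divided by N. A minimal sketch, with hypothetical placeholder throughput numbers rather than real benchmark results:

```python
# Sketch: scaling efficiency from measured training throughput.
#   efficiency(n) = (throughput(n) / throughput(base)) / (n / base)
# The measurements below are hypothetical placeholders.

def parallel_efficiency(throughput_by_trainers):
    base_n, base_tp = min(throughput_by_trainers.items())
    return {
        n: (tp / base_tp) / (n / base_n)
        for n, tp in sorted(throughput_by_trainers.items())
    }

# Hypothetical images/sec for 1, 2, 4, 8 trainers:
measured = {1: 100.0, 2: 190.0, 4: 360.0, 8: 640.0}
efficiency = parallel_efficiency(measured)
for n, eff in efficiency.items():
    print(f"{n} trainers: efficiency {eff:.2f}")
```

An efficiency near 1.0 means near-linear scaling; the gap from 1.0 as trainers are added quantifies the communication overhead.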

@Yancey0623
Contributor Author

From @typhoonzero

Should we put this doc into design or a separate repo?

Maybe not. I saw that https://github.com/dzhwinter/benchmark is working on the Fluid benchmark, and I learned from @dzhwinter that it will be merged into the Paddle repo this week.

- Docker Image

We use different base Docker Images to run the benchmark on Kubernetes:
- PaddlePaddle v2: paddlepaddle/paddle:[commit-id]
Contributor

v2 should use the 0.10.0 tag, and fluid should use a commit ID.

Contributor Author

@Yancey0623 Jan 11, 2018

Done. Since 0.10.0 does not support v2 distributed training, I used 0.11.0 instead.
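A minimal sketch of how the pinned images might appear in a Kubernetes pod spec; the container names are illustrative placeholders, and `[commit-id]` stands for the exact commit under test:

```yaml
# Illustrative pod-spec fragment: pin each benchmark image to a fixed tag
# so the run stays reproducible even after :latest moves on.
containers:
  - name: paddle-v2-trainer
    image: paddlepaddle/paddle:0.11.0       # released tag for v2
  - name: fluid-trainer
    image: paddlepaddle/paddle:[commit-id]  # replace with the commit ID under test
```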

Contributor

@typhoonzero left a comment

LGTM++

@Yancey0623 Yancey0623 merged commit 5dbd537 into PaddlePaddle:develop Jan 12, 2018
@Yancey0623 Yancey0623 deleted the cluster_benchmark_design branch January 12, 2018 03:25
