Add some dist-training robust cases into fluid benchmark test by velconia · Pull Request #11207 · PaddlePaddle/Paddle

velconia · 2018-06-05T14:01:04Z

2. Add learning rate decay feature to the fluid benchmark test
3. Add L1 & L2 regularization feature to the fluid benchmark test
4. Add error clipping feature to the fluid benchmark test
5. Add gradient clipping feature to the fluid benchmark test

typhoonzero · 2018-06-05T15:01:10Z

benchmark/fluid/models/machine_translation.py

 import paddle.fluid.core as core
 import paddle.fluid.framework as framework
 from paddle.fluid.executor import Executor
+from models.model_base import get_decay_learning_rate


Is model_base not uploaded?

Thanks for the review. I added the benchmark/fluid/models/model_base.py file in the next commit.


typhoonzero · 2018-06-06T15:05:28Z

benchmark/fluid/models/machine_translation.py


-    optimizer = fluid.optimizer.Adam(learning_rate=args.learning_rate)
+    # set gradient clip
+    set_gradient_clip(args.gradient_clip_method, args.gradient_clip_norm)


Is there a way to disable these settings if the args are empty?

If clip_method in args is None, these settings are disabled. If the user does NOT specify --gradient_clip_method, the arg is None by default.

The code looks like this:

    def set_gradient_clip(clip_method, clip_norm=1.):
        if not clip_method:
            return None
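The behaviour described in this reply can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `parse_args` and the returned dictionary are hypothetical stand-ins, and the real function would presumably configure a Paddle clip object where the placeholder is. It only shows that an omitted flag yields `None` from argparse and that the early return then leaves clipping disabled.

```python
import argparse


def parse_args(argv=None):
    # Hypothetical parser; only the two flag names come from the PR discussion.
    parser = argparse.ArgumentParser()
    # No default is given, so argparse stores None when the flag is omitted.
    parser.add_argument('--gradient_clip_method', type=str)
    parser.add_argument('--gradient_clip_norm', type=float, default=1.0)
    return parser.parse_args(argv)


def set_gradient_clip(clip_method, clip_norm=1.0):
    # An empty or None method means gradient clipping stays disabled.
    if not clip_method:
        return None
    # Placeholder for the real clip setup (assumption: the actual code
    # would build and register a clip object here).
    return {'method': clip_method, 'clip_norm': clip_norm}


# Flag omitted: the argument is None and nothing is configured.
args = parse_args([])
disabled = set_gradient_clip(args.gradient_clip_method, args.gradient_clip_norm)

# Flag given: the settings are passed through.
args = parse_args(['--gradient_clip_method', 'GlobalNorm',
                   '--gradient_clip_norm', '5'])
enabled = set_gradient_clip(args.gradient_clip_method, args.gradient_clip_norm)
```

The value `GlobalNorm` is only an example string; the set of accepted method names is not stated in this thread.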

typhoonzero · 2018-06-08T02:41:51Z

I'm currently thinking that we can test all these cases using unit tests rather than CE. Running e2e tests with CE may take a lot of time.

velconia · 2018-06-08T06:10:41Z

Actually, the unit tests already cover most of these features. However, what we need is:

Running a program in a distributed environment, e.g., running learning rate decay with the parallel executor on 2 parameter servers and 2 trainers, which unit tests cannot cover.
These tests do not actually run to the end; we run only a few iterations each time, so they will not cost much time.

velconia · 2018-06-08T06:12:12Z

So I think putting these features in the fluid benchmark and adding about 6-7 cases to CE is a good choice; each case will take around 30s.


typhoonzero · 2018-06-10T02:01:09Z

benchmark/fluid/args.py

+        choices=[],
+        help='Error clipping method, not allowed yet')
+    parser.add_argument(
+        '--error_clip_min',


Can we remove clipping and the other optimization configurations from the arguments? It might be cleaner to leave these settings to the model configs. Thanks!


typhoonzero · 2018-06-11T07:11:57Z

python/paddle/fluid/tests/unittests/test_listen_and_serv_op.py

                return
            except os.error:
-                retry_times -= 1
+                retry_times -= sleep_time


This change does not seem to belong to the current PR. Also, retry_times seems to be the total number of attempts, not a time.

I renamed retry_times to start_left_time to indicate that it is the time left for the pserver to start. This change is needed to pass the CI.
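The difference between an attempt counter and a time budget can be sketched as below. This is an illustrative helper, not the code in test_listen_and_serv_op.py: `wait_until_ready` and `check` are hypothetical names, and only `start_left_time`, `sleep_time`, and the `os.error` handling come from the discussion above.

```python
import os
import time


def wait_until_ready(check, start_left_time=30.0, sleep_time=0.5):
    """Poll check() until it succeeds or the time budget is used up.

    check is a hypothetical callable that raises os.error (OSError)
    while the server is not ready yet.
    """
    while start_left_time >= 0:
        try:
            check()
            return True
        except os.error:
            time.sleep(sleep_time)
            # Subtract the time that was slept, not one attempt, so the
            # variable really tracks the remaining seconds.
            start_left_time -= sleep_time
    return False


# Usage: a check that fails twice before succeeding.
attempts = {'count': 0}


def flaky_check():
    attempts['count'] += 1
    if attempts['count'] < 3:
        raise OSError('not ready yet')


ready = wait_until_ready(flaky_check, start_left_time=1.0, sleep_time=0.01)
```

With a plain counter, the total wait depends on how long each sleep is; with a budget in seconds, changing `sleep_time` alters the polling frequency but not the overall timeout.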

typhoonzero · 2018-06-11T08:10:42Z

Ref #11213 for adding unit tests

typhoonzero

LGTM!

velconia added 2 commits June 5, 2018 21:51

Add some document to README.md under benchmark/fluid/ repo

3bd8f9e

typhoonzero reviewed Jun 5, 2018


velconia added 3 commits June 6, 2018 11:39

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

2da0ef7

… benchmark

Add model_base.py

3bf93b3

Fix bugs in test_listen_and_serv_op

8041e8d

typhoonzero reviewed Jun 6, 2018


velconia added 6 commits June 8, 2018 15:18

1. remove args out of fluid_benchmark.py

4dd0ded

2. remove lr_decay, regularization, clipping out of fluid_benchmark.py

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

7e0afd5

… benchmark

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

e67392e

… benchmark

add async_mode description to doc and remove the clipping description…

9c2e68d

… out

for restart build

d11e2bf

to restart build

2da70cc

typhoonzero reviewed Jun 10, 2018


velconia added 3 commits June 11, 2018 11:21

remove optimization args from args.py

95cbb43

1. remove optimization from models

e140844

2. fix bug in test_listen_and_serv_op

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

4779338

… benchmark

typhoonzero reviewed Jun 11, 2018


velconia added 2 commits June 11, 2018 15:16

change the name retry_times to left_time

0a90eee

change retry_times to the pserver start left time

c950d22

typhoonzero approved these changes Jun 11, 2018


typhoonzero merged commit 1cfd3cb into PaddlePaddle:develop Jun 11, 2018

2 participants