Conversation

@sneaxiy sneaxiy commented Aug 17, 2021

PR types

New features

PR changes

Others

Describe

Enable program passes on Fleet APIs. Related doc PR: PaddlePaddle/docs#3854

@paddle-bot-old

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@sneaxiy sneaxiy changed the title [WIP] Program pass fleet Enable program passes on Fleet APIs Aug 24, 2021

def apply_ir_passes(main_program, startup_program, config):
    build_strategy = config._user_defined_strategy.build_strategy._copy()
    if not paddle.fluid.core.globals()['FLAGS_apply_pass_to_program']:
Contributor

You can use _global_flags() to replace paddle.fluid.core.globals()

Collaborator Author

Done.
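The flag-gating pattern discussed above can be sketched without Paddle itself. In this sketch, `GLOBAL_FLAGS`, `_global_flags`, and `apply_ir_passes` are illustrative stand-ins for Paddle's internals, not the real implementation:

```python
# Sketch of gating IR-pass application on a process-wide flag.
# GLOBAL_FLAGS, _global_flags and apply_ir_passes are hypothetical
# stand-ins, not Paddle's actual API.
GLOBAL_FLAGS = {'FLAGS_apply_pass_to_program': False}

def _global_flags():
    # In Paddle this would return the global flag registry; here it is a dict.
    return GLOBAL_FLAGS

def apply_ir_passes(program, config):
    if not _global_flags()['FLAGS_apply_pass_to_program']:
        # Flag disabled: leave the program untouched.
        return program
    # Each "pass" is modeled as a function transforming the program.
    for ir_pass in config.get('passes', []):
        program = ir_pass(program)
    return program
```

With the flag off, `apply_ir_passes` is a no-op; flipping the flag applies every registered pass in order, which mirrors why the reviewer cares about how the flag is looked up.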

#include "paddle/fluid/platform/mkldnn_helper.h"
#endif

DEFINE_bool(apply_pass_to_program, false,
Contributor

Better put it in flags.cc

Collaborator Author

Done.

'bias': 0.0,
'bias_after_scale': False
})
new_grad.op._set_attr(op_maker.kOpRoleAttrName(),
Contributor

Use main_program._optimized_guard()?

Collaborator Author

Some of these operators are marked as kBackward, so main_program._optimized_guard() (which tags ops as Optimize) is not applicable here.
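The distinction above can be illustrated with a small sketch. The `OpRole`, `Op`, and `Block` classes here only mimic Paddle's op-role mechanism (the real `_optimized_guard` and `_set_attr` live on Paddle's `Program`/`Operator` objects); the point is that a guard stamps every op created inside it with one role, so an op that must stay Backward needs its role set explicitly:

```python
from contextlib import contextmanager

# Hypothetical mock of Paddle's op-role tagging, for illustration only.
class OpRole:
    Forward, Backward, Optimize = 0, 1, 2

class Op:
    def __init__(self, op_type, role):
        self.type = op_type
        self.attrs = {'op_role': role}

    def _set_attr(self, name, value):
        # Explicitly override an attribute, e.g. the op role.
        self.attrs[name] = value

class Block:
    def __init__(self):
        self.ops = []
        self._current_role = OpRole.Forward

    @contextmanager
    def _optimized_guard(self):
        # Every op appended inside this guard is tagged Optimize.
        prev = self._current_role
        self._current_role = OpRole.Optimize
        try:
            yield
        finally:
            self._current_role = prev

    def append_op(self, op_type):
        op = Op(op_type, self._current_role)
        self.ops.append(op)
        return op
```

An op created inside `_optimized_guard()` comes out tagged Optimize; a gradient-scaling op that must count as Backward therefore has to be created outside the guard and tagged via `_set_attr('op_role', OpRole.Backward)`, which is what the snippet under review does.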

sandyhouse previously approved these changes Aug 26, 2021

@sandyhouse sandyhouse left a comment

LGTM

JZ-LIANG previously approved these changes Aug 31, 2021

Contributor
@JZ-LIANG JZ-LIANG left a comment

LGTM

zhiqiu previously approved these changes Sep 7, 2021

Contributor
@zhiqiu zhiqiu left a comment

LGTM

if(WITH_DISTRIBUTE)
    set_tests_properties(test_new_group_api PROPERTIES TIMEOUT 120)
-   set_tests_properties(test_pipeline PROPERTIES TIMEOUT 120)
+   set_tests_properties(test_pipeline PROPERTIES TIMEOUT 240)
Contributor

Please avoid raising unit-test timeouts if possible; first check whether the test can be optimized, otherwise the burden on CI will be heavy.

Collaborator Author

The unit test has been split, but after splitting there is still one test, test_ir_pass_pipeline, with a 120s timeout.

Contributor

@kolinwei please confirm the unit-test timeout issue, thanks.

 * Fleet APIs.
 * Note: Apply IR passes to the program. Only effective when using Fleet APIs.
 */
DEFINE_bool(apply_pass_to_program, false,
Contributor

Has the flag documentation on the official website been updated accordingly?

Collaborator Author

lanxianghit previously approved these changes Sep 7, 2021

Contributor
@lanxianghit lanxianghit left a comment

LGTM

XieYunshen previously approved these changes Sep 7, 2021

Contributor
@XieYunshen XieYunshen left a comment

LGTM

@sneaxiy sneaxiy dismissed stale reviews from XieYunshen and lanxianghit via f0a2a0a September 7, 2021 10:48
@PaddlePaddle PaddlePaddle locked and limited conversation to collaborators Sep 7, 2021
@PaddlePaddle PaddlePaddle unlocked this conversation Sep 7, 2021
@sneaxiy sneaxiy requested a review from xiegegege September 8, 2021 00:03
Contributor

@lanxianghit lanxianghit left a comment

LGTM

@sneaxiy sneaxiy merged commit 5f36988 into PaddlePaddle:develop Sep 8, 2021
@sneaxiy sneaxiy deleted the program_pass_fleet branch September 8, 2021 03:58
2742195759 pushed a commit to 2742195759/Paddle that referenced this pull request Sep 10, 2021
* add fleet api for program pass

* turn on apply pass for CI test

* fix disable fuse_all_optimizer bug

* try to test ci

* fix CI

* fill unspecified op role

* fix fuse_allreduce

* add ut to improve coverage

* remove useless change

* improve c++ coverage

* follow some comments

* test ir pass pipeline

* update doc

* reduce ut time again
block = self.main_program.global_block()

last_backward_op_idx = None
for i, op in enumerate(reversed(gm_block.ops)):
Contributor

This should be reversed(list(enumerate(gm_block.ops))), shouldn't it?

Collaborator Author

Done in #35704 .
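The reviewer's point can be checked with plain Python: `enumerate(reversed(ops))` numbers the ops from the end of the list, so `i` is no longer the op's position in the original block, while `reversed(list(enumerate(ops)))` enumerates first and keeps each op paired with its true index:

```python
ops = ['forward', 'backward', 'optimize']

# Indices restart from 0 over the reversed sequence -- wrong if you
# want each op's position in the original list.
wrong = list(enumerate(reversed(ops)))
# [(0, 'optimize'), (1, 'backward'), (2, 'forward')]

# Enumerate first, then reverse: indices stay tied to the original order.
right = list(reversed(list(enumerate(ops))))
# [(2, 'optimize'), (1, 'backward'), (0, 'forward')]
```

With the buggy form, `last_backward_op_idx` would record a position counted from the tail, which is then wrong when used to insert into the forward-ordered op list.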

return

gm_block._insert_op(
    last_backward_op_idx,
Contributor

This should be last_backward_op_idx + 1, i.e., inserted after the last backward op and before the optimize ops. In that case the default value of last_backward_op_idx should be -1.

Collaborator Author

Done in #35704 .
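The off-by-one can be illustrated with a plain list standing in for the block's op list (`list.insert` plays the role of `_insert_op` here): inserting at `last_backward_op_idx` would land the new op before the last backward op, while `last_backward_op_idx + 1` places it between the backward and optimize ops, and the suggested default of `-1` still does the right thing when no backward op is found:

```python
ops = ['backward_0', 'backward_1', 'optimize_0']
last_backward_op_idx = 1  # index of the last backward op

# Insert AFTER the last backward op and BEFORE the optimize ops.
ops.insert(last_backward_op_idx + 1, 'allreduce')
# ops == ['backward_0', 'backward_1', 'allreduce', 'optimize_0']

# With no backward op found, a default of -1 inserts at the front,
# since -1 + 1 == 0.
empty = ['optimize_0']
last_backward_op_idx = -1
empty.insert(last_backward_op_idx + 1, 'allreduce')
# empty == ['allreduce', 'optimize_0']
```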

    outputs={'Out': g},
    attrs={
        'ring_id': ring_id,
        OP_ROLE_KEY: OpRole.Backward,
Contributor

Using OpRole.Optimize here might be more accurate; in ParallelExecutor it is placed in backward mainly so it can overlap with computation. It makes little practical difference though: pipeline has its own gradient merge, and the other strategies will not use this.

Collaborator Author

@sneaxiy sneaxiy Sep 14, 2021

Yeah, this OpRole is currently set too casually; there is no unified convention, and it gets set ad hoc as each need arises...

AnnaTrainingG pushed a commit to AnnaTrainingG/Paddle that referenced this pull request Sep 29, 2021