Support control flow in DataParallel #31625
Conversation
Thanks for your contribution!

    // TODO(liuyuhui) support XPU set constant
    VLOG(3) << "XPU doesn't support set_constant";
    }
    if (platform::is_xpu_place(group_tensor.place())) {
paddle/fluid/imperative/reducer.cc
Outdated

    VLOG(3) << "Local used vars : "
            << string::join_strings(local_used_vars_, ',');
    // TODO(liuyuhui): support bkcl in using TensorToVector/TensorFromVector
    #if defined(PADDLE_WITH_NCCL)
    });
    #elif defined(PADDLE_WITH_RCCL) || defined(PADDLE_WITH_NCCL)
-   FusedAllReduceSchedule(run_order, group);
+   FusedAllReduceSchedule(run_order, group, next_group_);

The BKCL branch is missing next_group_ =.=
paddle/fluid/imperative/reducer.cc
Outdated

    if (find_unused_vars_) {
    // TODO(liuyuhui) support xpu about Tensorcopy
    #if defined(PADDLE_WITH_NCCL)
    strategy.find_unused_parameters = True
    """

    return self.strategy.sync_batch_norm

-   last_comm_group_size_MB)
+   last_comm_group_size_MB,
+   find_unused_parameters=self._user_defined_strategy.
+   find_unused_parameters)
    }

    void BKCLParallelContext::SynchronizeCompute() {
    // TODO(wangxi16): [Performance optimize] Maybe need to put Wait and

Please delete this TODO and the one above. 0.0
    virtual void WaitComm(int ring_id) = 0;

    // synchronize compute stream
    virtual void SynchronizeCompute() = 0;

Liang, while you're at it, please also add SynchronizeCompute in bkcl_context~
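The request above is to implement the new pure-virtual `SynchronizeCompute` hook in the BKCL context as well. A minimal sketch of the pattern, with simplified stand-in classes (the real `ParallelContext` and `BKCLParallelContext` in Paddle have more members; `FakeBKCLParallelContext` is hypothetical and only records calls instead of touching real XPU streams):

```cpp
#include <string>
#include <vector>

// Hedged sketch: simplified stand-in for imperative::ParallelContext.
class ParallelContext {
 public:
  virtual ~ParallelContext() = default;
  // wait until communication on ring `ring_id` finishes
  virtual void WaitComm(int ring_id) = 0;
  // synchronize compute stream
  virtual void SynchronizeCompute() = 0;
};

// Toy BKCL-like context: records calls so the override can be observed.
class FakeBKCLParallelContext : public ParallelContext {
 public:
  void WaitComm(int ring_id) override {
    calls.push_back("WaitComm(" + std::to_string(ring_id) + ")");
  }
  void SynchronizeCompute() override {
    calls.push_back("SynchronizeCompute()");
  }
  std::vector<std::string> calls;
};
```

The point of adding the hook to the base class is that `Reducer` can call `SynchronizeCompute()` through the `ParallelContext*` interface without knowing whether the backend is NCCL, RCCL, or BKCL.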
paddle/fluid/imperative/reducer.cc
Outdated

    }

    if (find_unused_vars_) {
    // TODO(liuyuhui) support xpu about Tensorcopy

Doesn't TensorCopy already support the XPU place?~

OK, but the two items above are not supported yet, so let's delete this TODO the next time XPU support is added.
    class TestFleetDygraphControlFlowSame(TestDygraphControlFlowSame):
        def _setup_config(self):
            self._sync_mode = False
            self._nccl2_mode = True

Add new UT case about bkcl_mode for XPU?
    self._dygraph = True

-   def test_mnist(self):
+   def test_net(self):

Add new UT case about bkcl_mode for XPU?
paddle/fluid/imperative/reducer.cc
Outdated

-   auto dtype = var->DataType();
-   auto place = var->Place();
+   const auto dtype = var->DataType();
+   const auto place = var->Place();
paddle/fluid/imperative/reducer.cc
Outdated

    void Reducer::PrepareForBackward(
        const std::vector<std::shared_ptr<imperative::VarBase>> &outputs) {
-     VLOG(3) << "start reseting count..";
+     VLOG(3) << "start forward and reset count.";

Why "start forward"? This function is called after forward, right?
-   auto tensor =
-       var_warpper->MutableVar()->GetMutable<framework::LoDTensor>();
+   auto var_base = vars_[var_index]->GradVarBase();
+   auto tensor = var_base->MutableVar()->GetMutable<framework::LoDTensor>();
-   void Reducer::FusedAllReduceSchedule(int run_order, Group &group) {
+   void Reducer::FusedAllReduceSchedule(const int run_order, Group &group,
+                                        const int curr_group_index) {

Why add curr_group_index?
    std::vector<bool> vars_marked_ready_;

    // Following variables are to help control flow,
    std::vector<int> local_used_vars_;

Could we find a better name, or add more comments to explain this vector? After the all-reduce, what it denotes is global_used_vars rather than local_used_vars.
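The naming concern above can be illustrated with a small, self-contained sketch (plain Python, not Paddle code; all names here are illustrative): each rank marks which gradient vars its forward pass actually produced, and an elementwise-sum "all-reduce" over those bitmaps then yields a *global* used-vars count, which is why the vector no longer denotes local usage after the reduction.

```python
# Hedged sketch of the used-vars bookkeeping discussed in the review.
# mark_local_used / allreduce_sum are hypothetical helper names.

def mark_local_used(num_vars, produced_indices):
    """Per-rank bitmap: 1 if this rank produced a grad for the var."""
    used = [0] * num_vars
    for i in produced_indices:
        used[i] = 1
    return used

def allreduce_sum(bitmaps):
    """Simulated all-reduce over ranks: elementwise sum."""
    return [sum(col) for col in zip(*bitmaps)]

# Two ranks take different control-flow branches, so they use different vars.
rank0 = mark_local_used(4, produced_indices=[0, 1])
rank1 = mark_local_used(4, produced_indices=[0, 2])
global_used = allreduce_sum([rank0, rank1])
# global_used counts, per var, how many ranks used it this step;
# a count of 0 means the var was unused on every rank.
```

A var with a global count of zero was unused everywhere and can be skipped, while a nonzero count for a var some rank skipped is exactly the control-flow case this PR handles.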
jzhang533 left a comment:
LGTM for the find_unused_parameters API exposed in Python.
PR types
New features

PR changes
Others

Describe
Support control flow in DataParallel

DataParallel exposes the find_unused_parameters interface, which is used to detect whether there are unused parameters in the network. This is a backward-compatible upgrade. This PR also does the following work:
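The problem the new interface addresses can be sketched in plain Python (this is a toy illustration, not the Paddle implementation; the function names below are hypothetical): with data-dependent control flow, some parameters may receive no gradient in a given step, and detecting them lets the reducer avoid waiting forever on an all-reduce that will never be fed.

```python
# Hedged sketch: why control flow can leave parameters unused in one step.

def forward_and_collect_grads(params, use_branch_b):
    """Toy 'backward': grads exist only for params the taken branch touched."""
    touched = ["w_shared", "w_b"] if use_branch_b else ["w_shared", "w_a"]
    return {name: 1.0 for name in params if name in touched}

def find_unused_parameters(params, grads):
    """Parameters that produced no gradient this step."""
    return sorted(set(params) - set(grads))

params = ["w_shared", "w_a", "w_b"]
grads = forward_and_collect_grads(params, use_branch_b=True)
unused = find_unused_parameters(params, grads)
# The taken branch skipped w_a, so it is reported as unused for this step
# and its gradient slot can be marked ready without real data.
```

In real usage the flag is passed when wrapping the model, e.g. `paddle.DataParallel(model, find_unused_parameters=True)`, so the check runs automatically each backward pass.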