Add async ssa graph executor #15409
Conversation
paddle::framework::TensorCopy(main_tensor, cpu, t);
};

auto copy_memory = [&] { t->ShareDataWith(main_tensor); };
Seems copy_memory and share_memory are reversed?
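For context, here is a hedged sketch of the distinction the comment points at, with `paddle::framework::Tensor` reduced to a toy `MiniTensor`. All names below (`MiniTensor`, the free-function forms of `ShareDataWith`/`TensorCopy`) are illustrative assumptions, not the real Paddle API:

```cpp
#include <memory>
#include <vector>

// Toy tensor: storage is a shared buffer, so aliasing vs. copying is visible.
struct MiniTensor {
  std::shared_ptr<std::vector<float>> data;
};

// "share" semantics: the destination aliases the source buffer (no allocation).
// Writes through either tensor are visible to both.
inline void ShareDataWith(MiniTensor& dst, const MiniTensor& src) {
  dst.data = src.data;
}

// "copy" semantics: the destination gets its own buffer with the same contents.
// Later writes to the source do not affect the destination.
inline void TensorCopy(const MiniTensor& src, MiniTensor& dst) {
  dst.data = std::make_shared<std::vector<float>>(*src.data);
}
```

Under these definitions, a lambda named `copy_memory` that calls `ShareDataWith` would indeed have the two behaviors swapped, which is what the comment asks about.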
namespace details {

AsyncSSAGraphExecutor::AsyncSSAGraphExecutor(
    const ExecutionStrategy &strategy, const std::vector<Scope *> &local_scopes,
lodtensor_ptrs.push_back(&fetch_data.at(scope_idx).at(fetch_idx));
}
ret.emplace_back();
ret.back().MergeLoDTensor(lodtensor_ptrs, platform::CPUPlace());
When num_iteration_per_run_ > 1, the threads run at different speeds; is it meaningful to merge the results from each local_scope?
I feel this can actually be removed; execution is already fully asynchronous anyway, so it just reduces the amount of data used for eval a bit.
The threads are out of step and their parameter versions differ, so it should indeed be removed; observing a single thread is enough.
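To make the two fetch strategies under discussion concrete, here is a hedged sketch with `LoDTensor` reduced to a plain `std::vector<float>`. `MergeAllScopes` models what `MergeLoDTensor` does across local scopes; `FirstScopeOnly` models the reviewers' suggestion of observing a single thread. Both names and the flattened tensor type are assumptions of this sketch, not Paddle code:

```cpp
#include <vector>

// Stand-in for a fetched LoDTensor, flattened to its values.
using FakeLoDTensor = std::vector<float>;

// Current behavior (sketched): concatenate every local scope's fetch result.
// With async workers at different parameter versions, the merged tensor mixes
// inconsistent snapshots.
inline FakeLoDTensor MergeAllScopes(
    const std::vector<FakeLoDTensor>& per_scope) {
  FakeLoDTensor merged;
  for (const auto& t : per_scope) {
    merged.insert(merged.end(), t.begin(), t.end());
  }
  return merged;
}

// Suggested behavior (sketched): report only one worker's result.
inline FakeLoDTensor FirstScopeOnly(
    const std::vector<FakeLoDTensor>& per_scope) {
  return per_scope.empty() ? FakeLoDTensor{} : per_scope.front();
}
```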
… add-async-ssa-graph-executor test=develop
Force-pushed c9bf8e2 to 10393dd (Compare)
member_->use_cuda_, member_->nccl_ctxs_.get());
if (build_strategy.async_mode_ && !build_strategy.is_distribution_) {
  VLOG(3) << "use local async mode";
  for (size_t i = 0; i < member_->places_.size(); ++i) {
@panyx0718 has a PR that passes a graph instead of a program: #15425. And ParallelGraphExecutor does not depend on program_desc: #15716?
… add-async-ssa-graph-executor test=develop
test=develop
if (pool_) {
  for (auto &f : run_futures) {
    if (exception_holder_.IsCaught()) {
      f.wait();
// num_trainers is 1, so the current fields of build_strategy doesn't tell if
// it's distributed model.
bool is_distribution_{false};
bool async_mode_{false};
What is the relationship between async_mode and is_distribution?
// it's distributed model.
bool is_distribution_{false};
bool async_mode_{false};
int num_trainers_{1};
Can num_trainers be greater than 1 while is_distribution is false?
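The questions above can be phrased as a small predicate over the strategy fields. The struct and helper below are a hedged sketch (hypothetical names, not Paddle's actual logic): they only express that, as the quoted comment says, "distributed" cannot be inferred from num_trainers_ alone, so the explicit is_distribution_ flag is what gates local async mode:

```cpp
// Hypothetical mirror of the BuildStrategy fields quoted in the diff.
struct FakeBuildStrategy {
  bool is_distribution_{false};
  bool async_mode_{false};
  int num_trainers_{1};
};

// Mirrors the check in ParallelExecutor:
//   if (build_strategy.async_mode_ && !build_strategy.is_distribution_) ...
// num_trainers_ deliberately plays no role here, which is exactly why the
// reviewer asks how the two flags relate.
inline bool UseLocalAsyncMode(const FakeBuildStrategy& s) {
  return s.async_mode_ && !s.is_distribution_;
}
```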
const std::vector<Scope *> &local_scopes,
const ExecutionStrategy &exec_strategy, const BuildStrategy &build_strategy,
ir::Graph *graph)
std::vector<ir::Graph *> graphs)
Avoid multiple graphs; a single graph can contain multiple sub-graphs.
if (build_strategy.async_mode_ && !build_strategy.is_distribution_) {
  VLOG(3) << "use local async mode";
  temp_owned_graph =
      build_strategy.Apply(std::move(temp_owned_graph), {member_->places_[0]},
Why does each graph need to go through the multi-device pass?
# step7: init ParallelExecutor
# ParallelExecutor API will be deprecated, don't support parallel graph.
self._graph = core.Graph(main.desc)
self._graphs = []
parallel_executor.py is deprecated.
const ExecutionStrategy &exec_strategy,
const BuildStrategy &build_strategy,
ir::Graph *graph);
std::vector<ir::Graph *> graphs);
Don't use multiple graphs.
… add-async-ssa-graph-executor
namespace operators {
namespace reader {

BufferedReader::~BufferedReader() {
  VLOG(1) << "~BufferedReader";
… add-async-ssa-graph-executor