Add async ssa graph executor#15409

Merged
jacquesqiao merged 46 commits into PaddlePaddle:develop from jacquesqiao:add-async-ssa-graph-executor
Apr 2, 2019
Conversation

@jacquesqiao (Member) commented Jan 18, 2019

[image attachment]

paddle::framework::TensorCopy(main_tensor, cpu, t);
};

auto copy_memory = [&] { t->ShareDataWith(main_tensor); };
Contributor:

It seems copy_memory and share_memory are reversed?

Member Author:

Fix done.
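For context, the mix-up was between aliasing and deep-copying the tensor buffer: ShareDataWith makes the local tensor alias the main tensor's memory, while TensorCopy gives it an independent buffer. A minimal numpy sketch of the distinction (the names `shared` and `copied` are illustrative, not Paddle APIs):

```python
import numpy as np

main_tensor = np.arange(4, dtype=np.float32)

# "share" semantics (like ShareDataWith): the local tensor aliases
# the main tensor's buffer, so writes to one are visible in the other
shared = main_tensor.view()

# "copy" semantics (like TensorCopy): the local tensor owns an
# independent buffer, so later writes to main_tensor are not seen
copied = main_tensor.copy()

main_tensor[0] = 42.0
# the view observes the write; the copy still holds the old value
```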

@jacquesqiao jacquesqiao requested a review from panyx0718 January 18, 2019 15:49
@guru4elephant guru4elephant self-requested a review January 22, 2019 03:34
namespace details {

AsyncSSAGraphExecutor::AsyncSSAGraphExecutor(
const ExecutionStrategy &strategy, const std::vector<Scope *> &local_scopes,
Member:

Where are the local_scopes created and passed in from?

Member Author:

lodtensor_ptrs.push_back(&fetch_data.at(scope_idx).at(fetch_idx));
}
ret.emplace_back();
ret.back().MergeLoDTensor(lodtensor_ptrs, platform::CPUPlace());
Member:

When num_iteration_per_run_ > 1, the threads run at different speeds; is it meaningful to merge the results from the different local scopes?

Member Author:

I think this can actually be removed. Since execution is already fully asynchronous anyway, keeping it only slightly reduces the amount of data used for eval.

Member Author:

The threads are not in step and their parameter versions differ, so this should indeed be removed; observing a single thread is enough.
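For reference, the merge in question concatenates the per-scope fetch results into one tensor. A rough numpy analogue (the shapes and the `fetch_data` list are made up for illustration):

```python
import numpy as np

# hypothetical fetch results, one per local scope (4 scopes here),
# each scope having run a different number of async steps
fetch_data = [np.full((2, 3), float(i)) for i in range(4)]

# MergeLoDTensor-style merge: concatenate along the batch dimension
merged = np.concatenate(fetch_data, axis=0)  # shape (8, 3)

# in fully async mode each scope holds a different parameter version,
# so observing a single scope's result is just as informative
single = fetch_data[0]
```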

@jacquesqiao jacquesqiao requested a review from velconia January 28, 2019 09:15
@jacquesqiao jacquesqiao force-pushed the add-async-ssa-graph-executor branch from c9bf8e2 to 10393dd February 25, 2019 02:14
member_->use_cuda_, member_->nccl_ctxs_.get());
if (build_strategy.async_mode_ && !build_strategy.is_distribution_) {
VLOG(3) << "use local async mode";
for (size_t i = 0; i < member_->places_.size(); ++i) {
Contributor:

@panyx0718 has a PR that passes a graph instead of a program: #15425. And ParallelGraphExecutor no longer depends on program_desc: #15716?

if (pool_) {
for (auto &f : run_futures) {
if (exception_holder_.IsCaught()) {
f.wait();
Contributor:

why sync here?
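As I read it, the wait ensures that once the exception holder has caught an error, the remaining worker futures still run to completion before the error is propagated, so no worker thread outlives the run call. A standalone Python sketch of the same pattern (`run_workers` and the task functions are hypothetical, not Paddle code):

```python
from concurrent.futures import ThreadPoolExecutor

def run_workers(tasks, max_workers=4):
    """Submit all tasks; if any task raises, still wait for the rest
    to finish before re-raising (the analogue of the f.wait() loop
    guarded by exception_holder_.IsCaught())."""
    first_error = None
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(t) for t in tasks]
        for f in futures:
            try:
                f.result()  # blocks like f.wait(), rethrows the task's error
            except Exception as exc:
                if first_error is None:
                    first_error = exc
    # every worker has finished by now; only then surface the failure
    if first_error is not None:
        raise first_error

done = []
def ok_a(): done.append("a")
def boom(): raise RuntimeError("worker failed")
def ok_b(): done.append("b")
```

The design point is that rethrowing before joining the workers would leave threads mutating shared state (scopes, graphs) while the caller unwinds.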

// num_trainers is 1, so the current fields of build_strategy doesn't tell if
// it's distributed model.
bool is_distribution_{false};
bool async_mode_{false};
Contributor:

What is the relationship between async_mode and is_distribution?

// it's distributed model.
bool is_distribution_{false};
bool async_mode_{false};
int num_trainers_{1};
Contributor:

Can num_trainers be > 1 while is_distribution is false?

const std::vector<Scope *> &local_scopes,
const ExecutionStrategy &exec_strategy, const BuildStrategy &build_strategy,
ir::Graph *graph)
std::vector<ir::Graph *> graphs)
Contributor:

Avoid multiple graphs. A single graph can contain multiple sub-graphs.

if (build_strategy.async_mode_ && !build_strategy.is_distribution_) {
VLOG(3) << "use local async mode";
temp_owned_graph =
build_strategy.Apply(std::move(temp_owned_graph), {member_->places_[0]},
Contributor:

Why does each graph need to go through the multi-device pass?

# step7: init ParallelExecutor
# ParallelExecutor API will be deprecated, don't support parallel graph.
self._graph = core.Graph(main.desc)
self._graphs = []
Contributor:

parallel_executor.py is deprecated.

const ExecutionStrategy &exec_strategy,
const BuildStrategy &build_strategy,
ir::Graph *graph);
std::vector<ir::Graph *> graphs);
Contributor:

Don't use multiple graphs.

namespace operators {
namespace reader {
BufferedReader::~BufferedReader() {
VLOG(1) << "~BufferedReader";
Collaborator:

Delete this VLOG.

# step7: init ParallelExecutor
# ParallelExecutor API will be deprecated, don't support parallel graph.
self._graph = core.Graph(main.desc)
self._graphs = []
Collaborator:

The comment is wrong.

@jacquesqiao jacquesqiao merged commit 5e8de51 into PaddlePaddle:develop Apr 2, 2019
6 participants