Merged
99 commits
b985745
add auto_parallel dir
Jun 28, 2021
b79e749
mv to paddle.distributed
Jun 28, 2021
1671850
add shard_xx api
Jul 1, 2021
ec55a43
add distributed attrs for var
Jul 8, 2021
25abc00
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
Jul 9, 2021
bf24fb7
add ut, test=develop
Jul 9, 2021
8ea9363
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
Jul 18, 2021
9e4b3d8
add dist
Jul 21, 2021
e65f77e
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
Jul 22, 2021
8b95c1e
update
Jul 26, 2021
ccae6ae
update
Jul 26, 2021
d107751
update
Jul 27, 2021
f7e70ea
update
Jul 27, 2021
3111159
update
Jul 27, 2021
70cdb69
update, test=develop
Jul 27, 2021
9e5b0f0
update, test=develop
Jul 27, 2021
59936ef
update, test=develop
Jul 27, 2021
27ee413
update, test=develop
Jul 27, 2021
3a8ceef
update, test=develop
Jul 27, 2021
d11f317
update, test=develop
Jul 28, 2021
f5ef245
update, test=develop
Jul 28, 2021
7293b4f
update
Jul 28, 2021
1240edc
update
Jul 28, 2021
05455fb
update
Jul 28, 2021
3e1b3a0
update
Jul 28, 2021
8950c35
update
Jul 28, 2021
b94a9f2
update, test=develop
Jul 28, 2021
e121349
update, test=develop
Jul 28, 2021
fe51aa3
update
Jul 28, 2021
4563d42
update
Jul 28, 2021
192580d
Merge branch 'develop' into auto_parallel_basic
Jul 28, 2021
2e69980
delete unused proto
Jul 28, 2021
608dd3f
resotre op_desc
Jul 28, 2021
cb9b6bf
restore type_defs
Jul 28, 2021
8e6559e
update var_desc
Jul 28, 2021
00f5f4d
remove dimss_mapping for proto_pybind
Jul 28, 2021
1aa94da
update interface.py
Jul 28, 2021
97a446c
update framework.py
Jul 28, 2021
c586fc6
update
Jul 28, 2021
fc6cde9
update
Jul 29, 2021
9d1a664
add auto_parallel dir
Jun 28, 2021
5d1b472
mv to paddle.distributed
Jun 28, 2021
d1aabad
add shard_xx api
Jul 1, 2021
e6ba855
add distributed attrs for var
Jul 8, 2021
3bf613c
add ut, test=develop
Jul 9, 2021
8942a99
[WIP] Add the auto completion feature and related codes
aoyulong Jul 16, 2021
6916cf2
[WIP] Improve the auto completion and related codes
aoyulong Jul 18, 2021
cafdd18
[WIP] Make the auto completion to support data-parallel
aoyulong Jul 19, 2021
4d6dd52
[WIP] Make the completion support mp and dp+mp
aoyulong Jul 19, 2021
3f05d09
[WIP] Refactor auto completion unit test for MLP
aoyulong Jul 20, 2021
2c56e12
[WIP] Refactor the implementation of DistributedOperatorImpl
aoyulong Jul 21, 2021
a83e9cd
[WIP] Improve dims_mapping update rule and fix a bug
aoyulong Jul 21, 2021
203ea14
[WIP] Support auto completion for one transformer decoder layer
aoyulong Jul 21, 2021
bbc2c39
[WIP] Add a minor change
aoyulong Jul 21, 2021
2b6f992
[WIP] Fix a bug within the uint test
aoyulong Jul 22, 2021
921c53d
Shard XShape tensor, add embedding completion and refactor code
aoyulong Jul 27, 2021
a03d503
Add the distributed_operators dir to setup.py.in
aoyulong Jul 28, 2021
3770f13
Improve the completion process and add the unittest for gpt
aoyulong Jul 29, 2021
967d0e7
fix process_mesh ut
Jul 29, 2021
cd1e390
fix process_mesh ut
Jul 29, 2021
f48ec91
update
Jul 29, 2021
b07affa
update, test=develop
Jul 30, 2021
f304b47
Add support for automatically completing distributed attrs of special…
aoyulong Jul 30, 2021
a00fe9e
update
Jul 30, 2021
da9fe30
update
Aug 2, 2021
3daecf2
update
Aug 2, 2021
5640879
fix doc sample codes, test=develop
Aug 2, 2021
05b0f82
improve coverage, test=develop
Aug 2, 2021
fe93d0e
add static_mode check, test=develop
Aug 2, 2021
033c541
Model the cluster for cost model and physical mapping
aoyulong Aug 4, 2021
9856d47
update, test=develop
Aug 4, 2021
890c70c
add set_placement, test=develop
Aug 5, 2021
6291697
Add the check to make sure the candidate tensors' size is great than …
aoyulong Aug 5, 2021
4b90b03
update doc, test=develop
Aug 5, 2021
c395b84
update doc, test=develop
Aug 5, 2021
8390e01
update doc, test=develop
Aug 5, 2021
f7d5631
update doc, test=develop
Aug 6, 2021
3a2666e
update, test=develop
Aug 6, 2021
fa98e39
Auto mark dist attrs annotated by user
aoyulong Aug 9, 2021
b5b8b9b
Merge branch 'PaddlePaddle:develop' into develop
aoyulong Aug 9, 2021
70bc589
Merge branch 'PaddlePaddle:develop' into develop
aoyulong Aug 9, 2021
b9bd421
Merge PR#33804
aoyulong Aug 9, 2021
b59bc33
Merge branch 'PaddlePaddle:develop' into develop
aoyulong Aug 9, 2021
773516b
update ndarray to nested list, test=develop
Aug 10, 2021
685504f
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
Aug 10, 2021
632eeac
Merge branch 'PaddlePaddle:develop' into develop
aoyulong Aug 10, 2021
87abb4b
Merge branch 'pr_33804' into auto_parallel
aoyulong Aug 10, 2021
7ac6299
update, test=develop
Aug 10, 2021
c724593
Add auto-completion module for auto-parallel (based on PR#33804)
aoyulong Aug 11, 2021
63e66bc
Merge branch 'pr_33804' into auto_parallel
aoyulong Aug 11, 2021
7087b1e
Merge branch 'PaddlePaddle:develop' into develop
aoyulong Aug 11, 2021
1908acf
Merge branch 'develop' of https://github.com/aoyulong/Paddle into aut…
aoyulong Aug 11, 2021
86ccd47
Remove unnecessary files
aoyulong Aug 11, 2021
3f7dca2
Remove unrelated files for the auto completion pr
aoyulong Aug 11, 2021
ed02152
Update the unit test to improve the coverage
aoyulong Aug 12, 2021
88e9e23
Modify codes based on reviews
aoyulong Aug 16, 2021
63a6ec6
Minor changes for CI
aoyulong Aug 17, 2021
6b77bc8
Improve some codes based on new comments
aoyulong Aug 17, 2021
411507d
Merge branch 'auto_parallel_completion' of https://github.com/aoyulon…
aoyulong Aug 19, 2021
8 changes: 8 additions & 0 deletions paddle/fluid/framework/op_desc.cc
@@ -353,6 +353,14 @@ void OpDesc::CopyFrom(const OpDesc &op_desc) {
outputs_ = op_desc.outputs_;
attrs_ = op_desc.attrs_;
need_update_ = true;
// When creating a graph from a program, the creation of an op node will
// create a new OpDesc instead of referring to the original one. To find the
// original OpDesc of the op node, the id has to be copied to the new OpDesc.
// The var node has the same situation, but the default copy constructor can
// copy the id automatically.
id_ = op_desc.id_;
}

OpDesc::OpDesc(const proto::OpDesc &desc, BlockDesc *block)
15 changes: 15 additions & 0 deletions paddle/fluid/framework/op_desc.h
Expand Up @@ -14,6 +14,7 @@ limitations under the License. */

#pragma once

#include <atomic>
#include <string>
#include <unordered_map>
#include <utility>
@@ -151,6 +152,18 @@ class OpDesc {

const BlockDesc *Block() const { return this->block_; }

// This thread-safe implementation seems to be redundant since neural
// networks are usually constructed in a single thread.
static uint64_t GenerateId() {
static std::atomic<std::uint64_t> id{0};
return ++id;
}

// Note: the id is currently only used as a key for referring to the
// op's distributed attribute.
uint64_t Id() { return id_; }

private:
template <typename MapType>
static std::vector<typename MapType::key_type> MapKeys(const MapType &map) {
@@ -173,6 +186,8 @@
// need_update_ indicates that there are local changes not yet synchronized. If
// local changes should be synchronized, need_update_ should be set to true.
bool need_update_{false};

uint64_t id_ = GenerateId();
};
} // namespace framework
} // namespace paddle
13 changes: 13 additions & 0 deletions paddle/fluid/framework/var_desc.h
Expand Up @@ -15,6 +15,7 @@ limitations under the License. */
#pragma once

#include <algorithm>
#include <atomic>
#include <string>
#include <vector>

@@ -150,6 +151,17 @@ class VarDesc {

Attribute GetAttr(const std::string &name) const;

// This thread-safe implementation seems to be redundant since neural
// networks are usually constructed in a single thread.
static uint64_t GenerateId() {
static std::atomic<std::uint64_t> uid{0};
return ++uid;
}

// Note: the id is currently only used as a key for referring to the
// var's distributed attribute.
uint64_t Id() { return id_; }
Contributor:

Might it be named dist_attr_id? Since, for now, it is only used for determining the dist_attr identity.

Contributor Author:

The dist_attr_id is obsolete in the new code because it cannot work well in different distributed contexts.

Contributor:

I also have this question. Maybe the comment could state that id_ is only used for determining the dist_attr identity in auto_parallel now, to avoid confusing developers who read the code.

private:
const proto::VarType::TensorDesc &tensor_desc() const;
std::vector<proto::VarType::TensorDesc> tensor_descs() const;
@@ -158,6 +170,7 @@

proto::VarDesc desc_;
AttributeMap attrs_;
uint64_t id_ = GenerateId();
};

bool operator==(const VarDesc &left, const VarDesc &right);
3 changes: 2 additions & 1 deletion paddle/fluid/pybind/protobuf.cc
@@ -24,7 +24,6 @@ limitations under the License. */
#include "paddle/fluid/framework/program_desc.h"
#include "paddle/fluid/framework/var_desc.h"
#include "paddle/fluid/framework/version.h"

#include "paddle/fluid/pybind/pybind_boost_headers.h"

namespace paddle {
@@ -201,6 +200,7 @@ void BindVarDsec(pybind11::module *m) {
.def("attr_names", &pd::VarDesc::AttrNames)
.def("_set_attr", &pd::VarDesc::SetAttr)
.def("remove_attr", &pd::VarDesc::RemoveAttr)
.def("id", &pd::VarDesc::Id)
.def("attr", &pd::VarDesc::GetAttr);

pybind11::enum_<pd::proto::VarType::Type> vartype(var_desc, "VarType", "");
@@ -293,6 +293,7 @@ void BindOpDesc(pybind11::module *m) {
.def("serialize_to_string", SerializeMessage<pd::OpDesc>)
.def("block", [](pd::OpDesc &self) { return self.Block(); },
pybind11::return_value_policy::reference)
.def("id", &pd::OpDesc::Id)
.def("inputs", &pd::OpDesc::Inputs)
.def("outputs", &pd::OpDesc::Outputs);
}
3 changes: 2 additions & 1 deletion python/paddle/distributed/__init__.py
@@ -53,7 +53,8 @@
from . import cloud_utils # noqa: F401
from . import utils # noqa: F401

__all__ = [ #noqa

__all__ = [ # noqa
"spawn",
"scatter",
"broadcast",
1 change: 1 addition & 0 deletions python/paddle/distributed/auto_parallel/__init__.py
@@ -18,5 +18,6 @@
from .interface import set_offload_device # noqa: F401
from .interface import set_pipeline_stage # noqa: F401
from .interface import ProcessMesh # noqa: F401
from .completion import complete_annotation # noqa: F401

__all__ = []