supports distributed classification by gavin1332 · Pull Request #18690 · PaddlePaddle/Paddle

gavin1332 · 2019-07-19T02:11:10Z

minimum functional code support for distributed classification training
design doc: http://agroup.baidu.com/paddlepaddle/md/article/1924796

test=develop

test=develop test=document_preview

guru4elephant

I think a range is a concept with at least two values, e.g. [0, 10]. How about changing index_range to max_index so that it is easier to understand?

guru4elephant · 2019-07-19T05:58:31Z

AUTHORS.md

 | lcy-seso | Ying Cao |
 | cjld | Dun Liang |
 | lipeng-unisound | Peng Li |
+| liuyi05 | Yi Liu |


are you gavin?

Oops, I've been so busy my head is spinning.

guru4elephant · 2019-07-19T06:09:04Z

paddle/fluid/operators/shard_index_op.h

+  void Compute(const framework::ExecutionContext& context) const override {
+    auto* in = context.Input<LoDTensor>("X");
+    auto* out = context.Output<LoDTensor>("Out");
+    int index_range = context.Attr<int>("index_range");


Could you use max_index or some other more specific name for this variable? index_range reads like a width, so a more descriptive variable name would be better.

The attribute name "index_range" is indeed ambiguous, and it needs a better name. In most cases users already hold a variable containing the number of indices, so a "max_index" attribute would require them to subtract 1 manually, and we would then have to add it back for the "shard_size" calculation later. I have therefore renamed the attribute from "index_range" to "index_num", which denotes the number of indices precisely.
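The trade-off described in this reply can be sketched in a few lines of C++. This is only an illustration, not code from the PR: the helper names `ShardSizeFromIndexNum` and `ShardSizeFromMaxIndex` are hypothetical, and only the expression `index_num / nshards` is taken from the diff.

```cpp
#include <cassert>

// Hypothetical helper: the attribute is the number of indices, so the
// shard size is the plain integer division that appears in the diff.
int ShardSizeFromIndexNum(int index_num, int nshards) {
  assert(nshards > 0);
  return index_num / nshards;
}

// Hypothetical helper: if the attribute were the largest valid index,
// the operator would first have to add 1 back to recover the count.
int ShardSizeFromMaxIndex(int max_index, int nshards) {
  assert(nshards > 0);
  return (max_index + 1) / nshards;
}
```

With 20 indices and 2 shards, a caller would pass `20` to the first helper but `19` to the second to get the same shard size, which is the extra subtraction the reply wants to avoid.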

guru4elephant · 2019-07-19T06:10:41Z

paddle/fluid/operators/shard_index_op.h

+    PADDLE_ENFORCE(shard_id >= 0 && shard_id < nshards,
+                   "shard_id(%d) is not in range [0, %d)", shard_id, nshards);
+
+    int shard_range = index_range / nshards;


Same as above. Would shard_width be better? shard_range could mean the width of a shard, but it could also mean how many shards we have.

It has been renamed to "shard_size".
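For readers who want to see how "shard_size" is used, here is a minimal CPU-side sketch. The range checks and the `index_num / nshards` expression follow the diff excerpts; the mapping rule itself (an index belongs to shard `index / shard_size` and is rewritten to its offset inside that shard, otherwise replaced by a placeholder) and the `ignore_value` parameter are assumptions that do not appear in the excerpts quoted here, and the free function `ShardIndex` is a hypothetical stand-in for the operator kernel.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the operator's Compute(): maps global indices
// to per-shard offsets. The mapping rule and ignore_value are assumptions.
std::vector<int64_t> ShardIndex(const std::vector<int64_t>& in, int index_num,
                                int nshards, int shard_id,
                                int64_t ignore_value) {
  // Same condition as the PADDLE_ENFORCE check in the diff.
  if (nshards <= 0 || shard_id < 0 || shard_id >= nshards) {
    throw std::invalid_argument("shard_id is not in range [0, nshards)");
  }
  // Truncating division, as in the diff. If index_num is not a multiple of
  // nshards, the trailing indices belong to no shard in this sketch.
  const int shard_size = index_num / nshards;
  if (shard_size <= 0) {
    throw std::invalid_argument("index_num must be at least nshards");
  }
  std::vector<int64_t> out;
  out.reserve(in.size());
  for (int64_t v : in) {
    // Same validity condition as the assert in the CUDA kernel.
    if (v < 0 || v >= index_num) {
      throw std::out_of_range("input index is not in range [0, index_num)");
    }
    // Assumed rule: keep the offset if the index falls in this shard.
    out.push_back(v / shard_size == shard_id ? v % shard_size : ignore_value);
  }
  assert(out.size() == in.size());
  return out;
}
```

Under these assumptions, calling `ShardIndex({1, 6, 12, 19}, 20, 2, 0, -1)` keeps the first two indices and replaces the last two with `-1`, while `shard_id = 1` does the opposite and rewrites 12 and 19 to offsets 2 and 9.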

test=document_preview test=develop

guru4elephant

LGTM. Please write some technical documentation so that this is easy to use.

sneaxiy

LGTM.

sandyhouse

LGTM

chengduoZH · 2019-07-22T09:09:56Z

paddle/fluid/operators/shard_index_op.cu

+  int shard_size = index_num / nshards;
+  int idx = blockIdx.x * blockDim.x + threadIdx.x;
+  if (idx < numel) {
+    assert(in_data[idx] >= 0 && in_data[idx] < index_num);


Why do you use assert here? Have you checked whether it works?

I just want to make sure the input is in the valid range.

gavin1332 · 2019-07-22T09:22:24Z

TODO: replace raw assert with PADDLE_ASSERT_MSG

guoshengCS

LGTM

gavin1332 added 2 commits July 19, 2019 09:25

supports distributed classification training

1659787

test=develop

update API.spec

57e6dc9

test=develop test=document_preview

gavin1332 force-pushed the distfc branch from 04b39ec to 57e6dc9 July 19, 2019 02:40

gavin1332 requested review from guru4elephant, sandyhouse and sneaxiy July 19, 2019 03:02

fix evenly division in python3

b8cbf50

test=develop test=document_preview

gavin1332 requested a review from xsrobin July 19, 2019 03:35

guru4elephant reviewed Jul 19, 2019

change "index_range" to "index_num" in shard_index operator

b924274

test=document_preview test=develop

gavin1332 requested a review from chengduoZH July 22, 2019 01:52

guru4elephant approved these changes Jul 22, 2019

xsrobin approved these changes Jul 22, 2019

sneaxiy approved these changes Jul 22, 2019

sandyhouse approved these changes Jul 22, 2019

chengduoZH reviewed Jul 22, 2019

chengduoZH approved these changes Jul 22, 2019

gavin1332 requested a review from hutuxian July 22, 2019 09:35

hutuxian approved these changes Jul 22, 2019

gavin1332 requested a review from kuke July 22, 2019 09:56

guoshengCS approved these changes Jul 22, 2019

gavin1332 merged commit 157211c into PaddlePaddle:develop Jul 23, 2019

gavin1332 deleted the distfc branch July 31, 2019 08:51

gavin1332 restored the distfc branch July 31, 2019 08:51
