## Background

The previous implementations of the parameter server do not run a
fluid sub-program. Parameter initialization, optimizer computation, network
communication and checkpointing are implemented twice, on both the
trainer and the parameter server.

It would be great if we could write the code once and use it on both
the trainer and the parameter server, since this reduces code duplication
and improves extensibility. Given that after the current refactoring we
already represent everything as a computation graph on the trainer,
representing everything as a computation graph on the parameter
server becomes a natural extension.

## Design

The transpiler converts the user-defined fluid program into sub-programs
to be scheduled on different nodes with the following steps:

1. OP placement: the OPs will be placed on different nodes according
   to a heuristic that minimizes the estimated total computation
   time. Currently we will use a simple heuristic that puts parameter
   variables on parameter server workers and everything else on trainer
   workers (see the sketch after this list).
1. Add communication OPs to enable the communication between nodes.

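As a concrete illustration of step 1, here is a minimal sketch of such a
placement heuristic. It uses plain Python dictionaries instead of the real
fluid `Program` and operator descriptions; the op list, variable names, and
the `parameters` set are made-up assumptions, not the actual framework API.

```python
# Toy representation of a program: each OP lists its inputs and outputs,
# and the parameter variables are known up front. Not the fluid API.
ops = [
    {"type": "mul",      "inputs": ["x", "W"],      "outputs": ["y"]},
    {"type": "mul_grad", "inputs": ["y_grad", "x"], "outputs": ["W_grad"]},
    {"type": "sgd",      "inputs": ["W", "W_grad"], "outputs": ["W"]},
]
parameters = {"W"}  # variables holding model parameters

def place(op):
    # Simple heuristic: OPs that write a parameter variable (the optimizer)
    # go to a parameter server worker, everything else stays on the trainer.
    writes_parameter = any(name in parameters for name in op["outputs"])
    return "pserver" if writes_parameter else "trainer"

placement = {op["type"]: place(op) for op in ops}
# placement == {'mul': 'trainer', 'mul_grad': 'trainer', 'sgd': 'pserver'}
```
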
After converting:

<img src="src/dist-graph.png" width="700"/>

1. The parameter variable W and its optimizer program are placed on the parameter server.
1. Operators are added to the program (see the sketch after this list).
   - *Send* sends data to the connected *Recv* operator. The
     scheduler on the receive node will only schedule the *Recv* operator
     to run when the *Send* operator has run (the *Send* OP will mark
     the *Recv* OP runnable automatically).
   - *Enqueue* enqueues the input variable; it can block until space
     becomes available in the queue.
   - *Dequeue* outputs a configurable number of tensors from the
     queue. It will block until the queue has the required number of
     tensors.
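
To make the wiring of these operators easier to follow, below is a rough
sketch of what the two converted sub-programs could look like, written as
plain Python op records rather than real operator implementations. The op
names, attribute names, and variable names are illustrative assumptions,
not the actual generated programs.

```python
# Trainer sub-program: compute the gradient, ship it to the parameter server,
# then wait for the updated parameter to come back.
trainer_program = [
    {"type": "mul_grad", "inputs": ["y_grad", "x"], "outputs": ["W_grad"]},
    {"type": "send",     "inputs": ["W_grad"],      "outputs": []},
    {"type": "recv",     "inputs": [],              "outputs": ["W"]},  # runnable once the peer send has run
]

# Parameter server sub-program: buffer incoming gradients in a queue, run the
# optimizer once enough tensors have arrived, then send the parameter back.
pserver_program = [
    {"type": "recv",    "inputs": [],             "outputs": ["W_grad"]},
    {"type": "enqueue", "inputs": ["W_grad"],     "outputs": ["grad_queue"]},    # may block if the queue is full
    {"type": "dequeue", "inputs": ["grad_queue"], "outputs": ["W_grad_batch"],
     "attrs": {"min_count": 1}},                                                 # blocks until min_count tensors exist
    {"type": "sgd",     "inputs": ["W", "W_grad_batch"], "outputs": ["W"]},
    {"type": "send",    "inputs": ["W"],          "outputs": []},
]
```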

### Benefits

- Model parallelism becomes easier to implement: it is an extension to
  the trainer-parameter server approach. We can have several "Transpilers"
  to achieve different goals.
- A user-defined optimizer is easier to add: the user can now express it as
  a sub-program.

### Challenges

- It is important to balance the parameter shards across multiple
  parameter servers. If a single parameter is very big (for example, a
  word-embedding, fully connected, or softmax layer), we need to
  automatically partition the single parameter onto different
  parameter servers when possible (only element-wise optimizers depend
  on the parameter variable; see the sketch after this list).
- In the "Async SGD" figure, the "W" variable on the parameter server
  could be read and written concurrently. See
  [here](https://github.com/PaddlePaddle/Paddle/pull/6394) for more
  details about concurrent programs in Fluid.
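
To make the first challenge concrete, here is a small NumPy sketch (with
made-up shapes and learning rate) of why an element-wise optimizer lets a
big parameter be partitioned: each parameter server can update its own
shard without ever seeing the rest of the parameter.

```python
import numpy as np

num_pservers = 4
lr = 0.01
W = np.random.rand(10000, 512).astype("float32")       # a large embedding-like parameter
W_grad = np.random.rand(10000, 512).astype("float32")  # its gradient

# Partition the parameter and its gradient row-wise, one shard per server.
W_shards = np.array_split(W, num_pservers, axis=0)
grad_shards = np.array_split(W_grad, num_pservers, axis=0)

# An element-wise optimizer such as plain SGD only needs the matching shard,
# so every parameter server can update its shard independently.
updated_shards = [w - lr * g for w, g in zip(W_shards, grad_shards)]

# Stitching the shards together matches updating the whole parameter at once.
assert np.allclose(np.concatenate(updated_shards, axis=0), W - lr * W_grad)
```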

### Discussion

- Can the Enqueue OP be implemented under our current tensor design
  (put the input tensor into the queue tensor)?
- The *Dequeue* OP will have a variable number of outputs (depending on the
  `min_count` attribute); does our current design support it? (A similar
  question applies to the *Add* OP.) A sketch of the intended blocking
  semantics follows this list.
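
The following is a small, framework-free Python sketch of the queue
behaviour these questions refer to: *Enqueue* blocks when the queue is full,
and *Dequeue* blocks until at least `min_count` tensors are available and
then returns all of them, so the number of outputs varies between calls.
The class and method names are illustrative, not an existing Paddle API.

```python
import threading
from collections import deque

class TensorQueue:
    """Toy queue with Enqueue / Dequeue(min_count) semantics."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        self.cond = threading.Condition()

    def enqueue(self, tensor):
        # Block until space becomes available in the queue.
        with self.cond:
            while len(self.items) >= self.capacity:
                self.cond.wait()
            self.items.append(tensor)
            self.cond.notify_all()

    def dequeue(self, min_count):
        # Block until the queue has at least `min_count` tensors, then return
        # everything currently queued: the number of outputs is variable.
        with self.cond:
            while len(self.items) < min_count:
                self.cond.wait()
            out = list(self.items)
            self.items.clear()
            self.cond.notify_all()
            return out
```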