Correcting some grammatical mistakes in the design docs #4378
Merged
abhinavarora merged 4 commits into PaddlePaddle:develop from abhinavarora:fix_doc_typos on Sep 26, 2017

Changes from 3 commits
@@ -2,7 +2,7 @@

## Motivation

-In Neural Network, many model is solved by the the backpropagation algorithm(known as BP) at present. Technically it caculates the gradient of the loss function, then distributed back through the networks. Follows the chain rule, so we need a module chains the gradient operators/expressions together with to construct the backward pass. Every forward network needs a backward network to construct the full computation graph, the operator/expression's backward pass will be generated respect to forward pass.
+In Neural Network, most models are solved by the the backpropagation algorithm(known as **BP**) at present. Technically, BP calculates the gradient of the loss function, then propagates it back through the networks following the chain rule. Hence we need a module that chains the gradient operators/expressions together to construct the backward pass. Every forward network needs a backward network to construct the full computation graph. The operator/expression's backward pass will be generated with respect to the forward pass.

## Implementation
@@ -24,9 +24,9 @@ A backward network is built up with several backward operators. Backward operato

| **Operator::inputs_** | Inputs | Inputs, Outputs, OutputGradients |
| **Operator::outputs_** | Outputs | InputGradients |

-In most cases, there is a one-to-one correspondence between the forward and backward operators. These correspondences are recorded by a global hash map(`OpInfoMap`). To follow the philosophy of minimum core and make operators pluggable, the registry mechanism is introduced.
+In most cases, there is a one-to-one relation between the forward and backward operators. These relations are recorded by a global hash map(`OpInfoMap`). To follow the philosophy of minimum core and to make operators pluggable, the registry mechanism is introduced.

-For example, we have got a `mul_op`, and we can register its information and corresponding backward operator by the following macro:
+For example, we have `mul_op`, and we can register its information and corresponding backward operator by the following macro:

```cpp
REGISTER_OP(mul, MulOp, MulOpMaker, mul_grad, MulOpGrad);
```
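To make the registry idea in the changed paragraph concrete, here is a minimal, self-contained sketch of a forward-to-backward lookup table. It is only an illustration of the mechanism: the `OpInfo` struct, `OpRegistry()` accessor, and `RegisterOp` helper are invented for this example and are not PaddlePaddle's actual `OpInfoMap` or `REGISTER_OP` implementation.

```cpp
#include <iostream>
#include <map>
#include <string>

// Hypothetical per-operator record: which grad op corresponds to a forward op.
struct OpInfo {
  std::string grad_op_type;  // e.g. "mul_grad" for "mul"
};

// A minimal stand-in for a global registry keyed by forward op type.
std::map<std::string, OpInfo>& OpRegistry() {
  static std::map<std::string, OpInfo> registry;
  return registry;
}

// What a REGISTER_OP-style macro could expand to, in spirit: record the
// forward type together with the name of its gradient operator.
void RegisterOp(const std::string& type, const std::string& grad_type) {
  OpRegistry()[type] = OpInfo{grad_type};
}

int main() {
  RegisterOp("mul", "mul_grad");  // analogous to registering mul / mul_grad
  std::cout << "grad op for mul: " << OpRegistry().at("mul").grad_op_type << "\n";
  return 0;
}
```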
@@ -48,49 +48,49 @@ The function `BuildGradOp` will sequentially execute following processes:

1. Get the `type_` of given forward operator, and then get the corresponding backward operator's type by looking up the `OpInfoMap`.

-2. Build two maps named `inputs` and `outputs` to temporary storage backward operator's inputs and outputs. Copy forward operator's `inputs_` and `outputs_` to map `inputs`, except these, are not necessary for gradient computing.
+2. Build two maps named `inputs` and `outputs` to temporarily store backward operator's inputs and outputs. Copy forward operator's `inputs_` and `outputs_` to map `inputs`, except these, are not necessary for gradient computing.

3. Add forward inputs' gradient variables into map `output`, adding forward outputs' gradient variables into map `input`.

4. Building backward operator with `inputs`, `outputs` and forward operator's attributes.
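The four steps listed in this hunk could be sketched roughly as follows. This is a simplified, hypothetical illustration rather than the real `BuildGradOp`: the `SimpleOp` struct, `VarNameMap` alias, `kGradOpType` table, and `GradVarName` helper are assumptions made for the example, and attribute copying (step 4) is only noted in a comment.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Simplified operator description, assumed only for this sketch.
using VarNameMap = std::map<std::string, std::vector<std::string>>;

struct SimpleOp {
  std::string type_;
  VarNameMap inputs_;
  VarNameMap outputs_;
};

// Hypothetical lookup from forward op type to backward op type (step 1).
const std::map<std::string, std::string> kGradOpType = {{"mul", "mul_grad"}};

std::string GradVarName(const std::string& name) { return name + "@GRAD"; }

SimpleOp BuildGradOpSketch(const SimpleOp& fwd,
                           const std::set<std::string>& not_needed) {
  SimpleOp grad;
  // Step 1: get the backward operator's type from the lookup table.
  grad.type_ = kGradOpType.at(fwd.type_);

  // Step 2: copy forward inputs_ and outputs_ into the backward op's inputs,
  // skipping variables that are not needed for gradient computation.
  for (const auto& kv : fwd.inputs_) {
    if (not_needed.count(kv.first) == 0) grad.inputs_[kv.first] = kv.second;
  }
  for (const auto& kv : fwd.outputs_) {
    if (not_needed.count(kv.first) == 0) grad.inputs_[kv.first] = kv.second;
  }

  // Step 3: forward outputs' gradients become backward inputs, and
  // forward inputs' gradients become backward outputs.
  for (const auto& kv : fwd.outputs_) {
    std::vector<std::string> grads;
    for (const auto& name : kv.second) grads.push_back(GradVarName(name));
    grad.inputs_[GradVarName(kv.first)] = grads;
  }
  for (const auto& kv : fwd.inputs_) {
    std::vector<std::string> grads;
    for (const auto& name : kv.second) grads.push_back(GradVarName(name));
    grad.outputs_[GradVarName(kv.first)] = grads;
  }

  // Step 4: the real implementation would also copy the forward operator's
  // attributes into the backward operator; attributes are omitted here.
  return grad;
}
```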
### Backward Network Building

-A backward network is a series of backward operators. The main idea of building a backward network is creating backward operators in the inverted sequence and append them together one by one. There is some corner case need to process specially.
+A backward network is a series of backward operators. The main idea of building a backward network is creating backward operators in the inverted sequence and appending them together one by one. There are some corner cases that need special processing.

1. Op

-When the input forward network is an Op, return its gradient Operator Immediately. If all of its outputs are in no gradient set, then return a special `NOP`.
+When the input forward network is an Op, return its gradient Operator immediately. If all of its outputs are in no gradient set, then return a special `NOP`.

2. NetOp

In our design, the network itself is also a kind of operator(**NetOp**). So the operators contained by a big network may be some small network. When the input forward network is a NetOp, it needs to call the sub NetOp/Operators backward function recursively. During the process, we need to collect the `OutputGradients` name according to the forward NetOp.

3. RnnOp

-RnnOp is a nested stepnet operator. Backward module need to recusively call `Backward` for every stepnet.
+RnnOp is a nested stepnet operator. Backward module needs to recusively call `Backward` for every stepnet.

4. Sharing Variables

-**sharing variables**. As illustrated in the pictures, two operator's share the same variable name of W@GRAD, which will overwrite their sharing input variable.
+As illustrated in the figure 1 and figure 2, two operators share the same variable name **W@GRAD**, which will overwrite their shared input variable.

<p align="center">
<img src="./images/duplicate_op.png" width="50%" ><br/>

-pic 1. Sharing variables in operators.
+Figure 1. Sharing variables in operators.

</p>

-Sharing variable between operators or same input variable used in multiple operators leads to a duplicate gradient variable. As demo show above, we need to rename gradient name recursively and add a generic add operator to replace the overwrite links.
+Sharing variable between operators or same input variable used in multiple operators can lead to duplicate gradient variables. As illustrated in figure 2, we need to rename the gradient names recursively and add a generic add operator to prevent overwriting.

<p align="center">
<img src="images/duplicate_op2.png" width="40%" ><br/>

-pic 2. Replace sharing variable's gradient with `Add` operator.
+Figure 2. Replace sharing variable's gradient with `Add` operator.

</p>

-Because our framework finds variables accord to their names, we need to rename the output links. We add a suffix of number to represent its position in clockwise.
+Because the framework finds variables according to their names, we need to rename the output links. We add an integer suffix to represent its position in the clockwise direction.

5. Part of Gradient is Zero.
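To visualize the control flow described in cases 1 and 2 of this hunk, here is a hypothetical recursive sketch. The `Op`, `NetOp`, `MakeNop`, and `MakeGradOp` names are placeholders invented for the example, not the framework's real classes or helpers.

```cpp
#include <algorithm>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Placeholder operator types, assumed only for this sketch.
struct Op {
  virtual ~Op() = default;
  virtual bool IsNetOp() const { return false; }
  std::vector<std::string> outputs;
};

struct NetOp : Op {
  bool IsNetOp() const override { return true; }
  std::vector<std::shared_ptr<Op>> ops;  // sub-operators in forward order
};

// Stand-ins for "return a special NOP" and "return its gradient Operator".
std::shared_ptr<Op> MakeNop() { return std::make_shared<Op>(); }
std::shared_ptr<Op> MakeGradOp(const Op&) { return std::make_shared<Op>(); }

// Recursive backward construction mirroring cases 1 (Op) and 2 (NetOp).
std::shared_ptr<Op> Backward(const Op& forward,
                             const std::set<std::string>& no_grad) {
  if (!forward.IsNetOp()) {
    // Case 1: a plain Op. If every output is in the no-gradient set,
    // return a special NOP; otherwise return its gradient operator.
    bool all_no_grad = std::all_of(
        forward.outputs.begin(), forward.outputs.end(),
        [&](const std::string& name) { return no_grad.count(name) > 0; });
    return all_no_grad ? MakeNop() : MakeGradOp(forward);
  }

  // Case 2: a NetOp. Recursively build gradients for its sub-operators in
  // the inverted sequence and append them into a backward NetOp.
  const auto& net = static_cast<const NetOp&>(forward);
  auto backward_net = std::make_shared<NetOp>();
  for (auto it = net.ops.rbegin(); it != net.ops.rend(); ++it) {
    backward_net->ops.push_back(Backward(**it, no_grad));
  }
  return backward_net;
}
```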
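For case 4 (sharing variables), a possible simplified de-duplication pass is sketched below: gradient outputs that share a name such as W@GRAD are renamed with an integer suffix, and the returned map records which renamed variables an assumed generic `add` operator would sum back into the original gradient variable. The struct and function names are illustrative only, not the framework's actual API.

```cpp
#include <iostream>
#include <map>
#include <string>
#include <vector>

// A gradient "output link" produced by some backward operator.
struct GradOutput {
  std::string op_name;
  std::string var_name;  // e.g. "W@GRAD"
};

// Rename duplicated gradient outputs with an integer suffix and record which
// renamed variables an assumed generic add operator should sum back into the
// original name (e.g. W@GRAD = W@GRAD@0 + W@GRAD@1).
std::map<std::string, std::vector<std::string>> DeduplicateGradients(
    std::vector<GradOutput>& outputs) {
  std::map<std::string, int> count;
  for (const auto& out : outputs) count[out.var_name]++;

  std::map<std::string, int> next_suffix;
  std::map<std::string, std::vector<std::string>> to_sum;
  for (auto& out : outputs) {
    if (count[out.var_name] < 2) continue;  // unique names stay as they are
    std::string original = out.var_name;
    out.var_name = original + "@" + std::to_string(next_suffix[original]++);
    to_sum[original].push_back(out.var_name);
  }
  return to_sum;  // each entry becomes one inserted add operator
}

int main() {
  std::vector<GradOutput> outs = {{"fc1_grad", "W@GRAD"}, {"fc2_grad", "W@GRAD"}};
  auto to_sum = DeduplicateGradients(outs);
  for (const auto& kv : to_sum) {
    std::cout << kv.first << " = sum(";
    for (const auto& name : kv.second) std::cout << name << " ";
    std::cout << ")\n";
  }
  return 0;
}
```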
Contributor comment: Maybe "In our implement" => "In our implementation"? I am not sure about the grammar error. :)
Review comment: "by the the backpropagation" => "by the backpropagation"