PaddingRNN model memory optimize #16144
Conversation
test=develop
    framework::slice_ddim(label_dims, 0, rank - 1),
    "Input(X) and Input(Label) shall have the same shape "
    "except the last dimension.");
}
Why make a different check here?
It is just copied from CrossEntropyOp.
 public:
  using framework::OperatorWithKernel::OperatorWithKernel;

  void InferShape(framework::InferShapeContext* ctx) const override {
Most of the logic of InferShape is the same as CrossEntropyOp::InferShape; you can factor it into a base class and inherit from it.
template <typename T>
struct CrossEntropyForwardFunctor {
  CrossEntropyForwardFunctor(const T *x, T *y, const int64_t *label,
Why don't you reuse CrossEntropyKernel?
If platform::ForRange is faster, we should replace CrossEntropyKernel with platform::ForRange.
It is not about speed. I just made a separate op here to avoid confusion with the original cross entropy op.
  const int64_t *label_;
  int64_t ignore_index_;
  int64_t feature_size_;
};
The above code should be placed in cross_entropy.h.
Done. Moved to cross_entropy_op_base.h.
namespace ops = paddle::operators;
REGISTER_OPERATOR(expand, ops::ExpandOp, ops::ExpandOpMaker,
-                 paddle::framework::DefaultGradOpDescMaker<true>);
+                 ops::ExpandGradOpDescMaker);
We should take care of compatibility. The saved training model may be unavailable if we replace DefaultGradOpDescMaker<true> with ops::ExpandGradOpDescMaker directly.
Compatibility is OK. The saved training model may contain extra variable names, but that does not matter.
@@ -0,0 +1,137 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Refine the cross_entropy and expand ops to save memory. About 12 GB of GPU memory is saved.