-
Notifications
You must be signed in to change notification settings - Fork 6k
complete data layout transform #7440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 14 commits
03db455
f50d9b1
90f33ba
acc4d9a
13059a0
70b7636
2f05d64
040c236
1d567ae
fd56a0b
bebe1ce
20c1daa
e685b90
9a7e143
01eb487
8715edb
2c23b82
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| /* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve. | ||
|
|
||
| Licensed under the Apache License, Version 2.0 (the "License"); | ||
| you may not use this file except in compliance with the License. | ||
| You may obtain a copy of the License at | ||
|
|
||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
|
|
||
| Unless required by applicable law or agreed to in writing, software | ||
| distributed under the License is distributed on an "AS IS" BASIS, | ||
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| See the License for the specific language governing permissions and | ||
| limitations under the License. */ | ||
|
|
||
| #include "paddle/framework/data_layout_transform.h" | ||
|
|
||
| #include "gtest/gtest.h" | ||
| #include "paddle/platform/device_context.h" | ||
|
|
||
| TEST(DataTransform, DataLayoutFunction) { | ||
| using namespace paddle::framework; | ||
| using namespace paddle::platform; | ||
|
|
||
| auto place = CPUPlace(); | ||
| Tensor in = Tensor(); | ||
| Tensor out = Tensor(); | ||
| in.mutable_data<double>(make_ddim({2, 3, 1, 2}), place); | ||
| in.set_layout(DataLayout::kNHWC); | ||
|
|
||
| auto kernel_nhwc = OpKernelType(proto::DataType::FP32, place, | ||
| DataLayout::kNHWC, LibraryType::kPlain); | ||
| auto kernel_ncwh = OpKernelType(proto::DataType::FP32, place, | ||
| DataLayout::kNCHW, LibraryType::kPlain); | ||
|
|
||
| TransDataLayout(kernel_nhwc, kernel_ncwh, in, &out); | ||
|
|
||
| EXPECT_TRUE(out.layout() == DataLayout::kNCHW); | ||
| EXPECT_TRUE(out.dims() == make_ddim({2, 2, 3, 1})); | ||
|
|
||
| TransDataLayout(kernel_ncwh, kernel_nhwc, in, &out); | ||
|
|
||
| EXPECT_TRUE(in.layout() == DataLayout::kNHWC); | ||
| EXPECT_TRUE(in.dims() == make_ddim({2, 3, 1, 2})); | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -15,18 +15,58 @@ limitations under the License. */ | |
| #include "paddle/framework/data_transform.h" | ||
|
|
||
| #include "paddle/framework/data_device_transform.h" | ||
| #include "paddle/framework/data_layout_transform.h" | ||
|
|
||
| namespace paddle { | ||
| namespace framework { | ||
|
|
||
| static void free_tmp_tensor(const Tensor* in_ptr, const Tensor* tmp_ptr) { | ||
| if (in_ptr != tmp_ptr) { | ||
| delete tmp_ptr; | ||
| } | ||
| } | ||
|
|
||
| void DataTransform(const OpKernelType& expected_kernel_type, | ||
| const OpKernelType& kernel_type_for_var, | ||
| const Tensor& input_tensor, Tensor* out) { | ||
| // in_ptr is used to store the input of data transform function. | ||
| // it need to be deleted by free_tmp_tensor because ip_ptr may be | ||
| // equal to &input_tensor and input_tensor should not be deleted here. | ||
| const Tensor* in_ptr = &input_tensor; | ||
|
|
||
| // out_ptr is used to store the result of data transform function. | ||
| // when one transform is done, the in_ptr should be deleted and the out_ptr | ||
| // should be assigned to in_ptr for next data transformation. | ||
| Tensor* out_ptr = new Tensor(); | ||
|
|
||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Where is the datatype transform ?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. in next pr |
||
| // do layout transform | ||
| if (NeedTransformLayout(expected_kernel_type.data_layout_, | ||
| kernel_type_for_var.data_layout_)) { | ||
| TransDataLayout(kernel_type_for_var, expected_kernel_type, *in_ptr, | ||
| out_ptr); | ||
| free_tmp_tensor(&input_tensor, in_ptr); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. free_tmp_tensor looks awful too...
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. updated |
||
| in_ptr = out_ptr; | ||
| out_ptr = new Tensor(); | ||
| } | ||
|
|
||
| // do device transform | ||
| if (!platform::is_same_place(kernel_type_for_var.place_, | ||
| expected_kernel_type.place_)) { | ||
| DeviceTransform(input_tensor, expected_kernel_type.place_, out); | ||
| DeviceTransform(*in_ptr, expected_kernel_type.place_, out_ptr); | ||
|
|
||
| free_tmp_tensor(&input_tensor, in_ptr); | ||
| in_ptr = out_ptr; | ||
| out_ptr = new Tensor(); | ||
| } | ||
| PADDLE_ENFORCE_NOT_NULL(out, "out should not be null"); | ||
|
|
||
| PADDLE_ENFORCE_NE(in_ptr, &input_tensor, | ||
| "no transform is done, please check!"); | ||
| // get output data | ||
| out->ShareDataWith(*in_ptr); | ||
|
|
||
| // clean up | ||
| free_tmp_tensor(&input_tensor, in_ptr); | ||
| delete out_ptr; | ||
| } | ||
|
|
||
| void CopyVariableWithTensor(const Variable& in_var, const Tensor& tensor, | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -85,9 +85,14 @@ inline std::string KernelTypeToString(const OpKernelType& kernel_key) { | |
| return stream.str(); | ||
| } | ||
|
|
||
| inline bool NeedTransformLayout(const DataLayout& l, const DataLayout& r) { | ||
| return l != DataLayout::kAnyLayout && r != DataLayout::kAnyLayout && l != r; | ||
| } | ||
|
|
||
| inline bool TransFromNeeded(const OpKernelType& l, const OpKernelType& r) { | ||
| return (!platform::places_are_same_class(l.place_, r.place_)) || | ||
| (l.data_type_ != r.data_type_) || (l.data_layout_ != r.data_layout_); | ||
| (l.data_type_ != r.data_type_) || | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I believe we are in the wrong way. let's start from a simple convolution case: data -> conv2d -> batch_norm -> fc -> softmaxwe may have a quite different layout pipelines choices. NHWC->NCHW -> NCHW -> NCHW
NHWC->NHWC -> NCHW -> NCHW
NHWC->NHWC -> NHWC -> NCHW
....
NCHW->NCHW -> NCHW -> NCHW
NCHW->NHWC -> NCHW -> NCHW
....All of them are legal, but only one of them is the best. in cudnn, NHWC is 20% faster than other formats. In the case above, I want to claim that we do not run on mixed devices, we can run on different device type..
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To be clear, we support multiple device, but not mixed device.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes, I Agree with you, we have a data process chain, and there should be some best practice. On the other hand, I also think that the current data transform just does transform according to expected_kernel_type and actual_kernel_type_for_var, it does not consider what kind of expected_kernel_type it should use, I think to find the best expected_kernel_type should be implemented in GetExpectedKernelType by the user or by the framework. So it does not conflict with the current implementation. |
||
| NeedTransformLayout(l.data_layout_, r.data_layout_); | ||
| } | ||
|
|
||
| } // namespace framework | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new and delete inside DataTransform looks like awful. Furthermore, the memory optimizer cannot track these actions.
How about create 3 Tensor(for each channel transform) in scope?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for this comment~~
This data transform is a runtime process, and will not be in ProgramDesc, so memory optimizer can never reach here, it's like some inner logic inside an Operator.
I think here we should have two principles,
I will find a better way to do this