Conversation
Thanks for your contribution!
Please include the unit test results in the Describe section.
paddle/fluid/platform/cudnn_helper.h
Outdated
```cpp
static const cudnnDataType_t type = CUDNN_DATA_BFLOAT16;
#else
static const cudnnDataType_t type = CUDNN_DATA_HALF;
#endif
```
The #else branch shouldn't be needed, right? When the cudnn version is < 8.1, the whole class should not be compiled, so shouldn't the entire class be wrapped in #if / #endif instead?
Because conv2d has to compile successfully for bfloat16, the CudnnDataType&lt;bfloat16&gt; part of the code gets instantiated; if the whole template specialization is guarded with the preprocessor, compilation fails.
Below cudnn 8.1, the half type shouldn't be used either; it should fail outright. Also, everything you added is a compile-time #ifdef check; shouldn't a runtime check be added as well? You can refer to the following (which is also incomplete; a cudnn version check still needs to be added):
Paddle/paddle/fluid/operators/conv_op.cc
Lines 187 to 191 in 79f7ba6
done
Xreki left a comment
cudnn bf16 uses TensorCore for computation; does cudnnSetConvolutionMathType need to be set to a specific value?
```cpp
class CUDNNConvOpKernel : public framework::OpKernel<T> {
 public:
  void Compute(const framework::ExecutionContext& ctx) const override {
#if CUDNN_VERSION_MIN(8, 1, 0)
```
Can this check be moved to a common place, for example into CudnnDataType&lt;bfloat16&gt;?
CudnnDataType&lt;bfloat16&gt; can only do a compile-time check, so here I changed it to simply not register the bfloat16 kernel below cudnn 8.1.
```python
create_test_cudnn_bf16_class(TestWithStride, grad_check=False)
create_test_cudnn_bf16_class(TestWithGroup, grad_check=False)
create_test_cudnn_bf16_class(TestWith1x1, grad_check=False)
create_test_cudnn_bf16_class(TestWithInput1x1Filter1x1, grad_check=False)
```
I previously tried to follow the bf16 tests on CPU as a reference; the newly committed code now adds the backward test by default.
```python
globals()[cls_name] = TestConv2DCUDNNFp16


def create_test_cudnn_bf16_class(parent, grad_check=True):
```
Yes, it is needed. I have merged the latest code and synced the OpTest unit-test framework changes.
Sorry to inform you that cd612c5's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
```python
self.attrs['use_seq'] = use_seq
self.check_output(check_dygraph=False, no_check_set=["Cell"])
self.check_output(
    check_dygraph=False, no_check_set=["Cell"], atol=2e-2)
```
atol is specified here because you changed the atol value in op_test.py. That will still affect the unit tests of other ops, won't it? I think it's better not to modify op_test.py; overriding the OpTest function would avoid affecting the other ops' tests.
This only affects the bfloat16 forward accuracy test. The test framework used to hard-code 0.03; this PR just removes that fixed value so each test that needs a tolerance can specify its own.
The accuracy-checking method in the op unit tests was changed, which affects the mkldnn tests; @luotao1 please review.
paddle/fluid/operators/conv_op.cc
Outdated
```cpp
platform::errors::InvalidArgument(
    "bfloat16 can only be used when CUDNN is used"));
#else
PADDLE_ENFORCE_NE(
```
- The logic of this #else branch seems unnecessary; it looks like it can never be reached.
- The runtime cudnn version also needs to be checked to be >= 8.1.0.
```python
globals()[cls_name] = TestConv2DCUDNNFp16


def create_test_cudnn_bf16_class(parent, check_grad=False):
```
Shouldn't check_grad=True be set here, along with the gradient-check-related variable no_need_check_grad?
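For context on the create_test_cudnn_bf16_class pattern discussed in this thread, here is a minimal sketch of how OpTest-style test files generate dtype-specific subclasses dynamically. All names here (TestConvBase, create_test_bf16_class, init_data_type) are illustrative stand-ins, not Paddle's actual helpers, and the real helper additionally guards on CUDA/cudnn availability and on the grad-check flag:

```python
import unittest


class TestConvBase(unittest.TestCase):
    """Stand-in for a parent op test; real Paddle tests subclass OpTest."""

    def init_data_type(self):
        self.dtype = "float32"

    def test_dtype_is_set(self):
        self.init_data_type()
        self.assertIn(self.dtype, ("float32", "uint16"))


def create_test_bf16_class(parent, grad_check=True):
    # Subclass the parent test and override the dtype hook. Paddle's
    # create_test_cudnn_bf16_class follows the same pattern, skipping
    # gradient tests when grad_check is False.
    class TestBf16Case(parent):
        def init_data_type(self):
            # uint16 is the storage type Paddle uses to carry bfloat16.
            self.dtype = "uint16"

    cls_name = "{0}_{1}".format(parent.__name__, "BF16")
    TestBf16Case.__name__ = cls_name
    # Registering in globals() lets unittest discovery pick the class up.
    globals()[cls_name] = TestBf16Case


create_test_bf16_class(TestConvBase)
```

Registering the generated class in `globals()` is what makes lines like `globals()[cls_name] = ...` appear in the diffs above.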
```python
}
self.inputs_fp32 = {
    'Input': OpTest.np_dtype_to_fluid_dtype(input),
    'Filter': OpTest.np_dtype_to_fluid_dtype(filter)
```
So an fp32 conv2d is also constructed here? Please explain the unit-test checking logic in the PR description.
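The reason for keeping fp32 inputs is that the backward reference is computed numerically, and a numeric gradient is most accurate on the original float data. A simplified, self-contained stand-in for the framework's get_numeric_gradient (central finite differences; the function name and delta are illustrative, not Paddle's exact implementation) looks like:

```python
import numpy as np


def numeric_gradient(f, x, delta=1e-3):
    # Central finite differences over each element of x; a simplified
    # stand-in for OpTest's get_numeric_gradient. Feeding the original
    # float32 inputs (self.inputs_fp32) avoids the extra error a
    # float -> bfloat16 -> float roundtrip would add to the reference.
    x = np.array(x, dtype=np.float64)
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        orig = x[idx]
        x[idx] = orig + delta
        f_plus = f(x)
        x[idx] = orig - delta
        f_minus = f(x)
        x[idx] = orig
        grad[idx] = (f_plus - f_minus) / (2.0 * delta)
    return grad


# d/dx sum(x^2) = 2x, so the result is approximately [2., 4.]
g = numeric_gradient(lambda v: float(np.sum(v ** 2)), np.array([1.0, 2.0]))
print(g)
```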
```python
self.attrs['use_seq'] = use_seq
self.check_output(check_dygraph=False, no_check_set=["Cell"])
self.check_output(
    check_dygraph=False, no_check_set=["Cell"], atol=2e-2)
```
In python/paddle/fluid/tests/unittests/op_test.py, atol = 0.03 is not a good way to check forward accuracy. This PR changes the accuracy check for the bfloat16 data type to use a relative error and removes the hard-coded 0.03 limit. atol = 2e-2 is added here to keep the same accuracy bound as before and ensure the test passes.
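The switch from a fixed atol to a dtype-dependent rtol can be illustrated with plain numpy. The rtol values mirror the ones discussed in this PR (1e-2 for bfloat16, numpy's default 1e-5 otherwise); the helper name is hypothetical:

```python
import numpy as np


def allclose_for_dtype(actual, expected, is_bfloat16):
    # bfloat16 keeps only ~8 mantissa bits, so a relative tolerance of
    # 1e-2 is appropriate; other dtypes keep numpy's default rtol=1e-5.
    rtol = 1e-2 if is_bfloat16 else 1e-5
    return np.allclose(actual, expected, rtol=rtol)


expected = np.array([100.0, 200.0])
bf16_like = np.array([100.5, 201.0])  # ~0.5% relative error

print(allclose_for_dtype(bf16_like, expected, is_bfloat16=True))   # True
print(allclose_for_dtype(bf16_like, expected, is_bfloat16=False))  # False
```

A relative tolerance scales with the magnitude of the expected value, so it works for both small and large outputs, unlike a single absolute cutoff like 0.03.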
```cpp
paddle::operators::CUDNNConvDoubleGradOpKernel<float>,
paddle::operators::CUDNNConvDoubleGradOpKernel<plat::float16>);
#else
#if CUDNN_VERSION_MIN(8, 1, 0)
```
The registration code now exceeds 100 lines and can be simplified. These registrations boil down to just 3 cases:
- CUDA, CUDNN < 8.1: supports float, double, float16
- CUDA, CUDNN >= 8.1: supports float, double, float16, bfloat16
- ROCM: supports float, float16

How about defining some registration macros, e.g. REGISTER_CONV_CUDNN_KERNEL_WITH_FP64_BF16, REGISTER_CONV_CUDNN_KERNEL_WITH_FP64, REGISTER_CONV_CUDNN_KERNEL_WITH_BF16?
```python
def init_kernel_type(self):
    self.use_cudnn = True
    self.no_need_check_grad = True
```
self.no_need_check_grad = True is still left in; does it have any effect?
PR types
Others
PR changes
Others
Describe
PR features

- Add bfloat16 data type support to four OPs: conv2d, conv2d_grad, conv2d_grad_grad, and depthwise_conv2d_grad_grad.
- In the test_conv2d_op.py unit tests, add create_test_cudnn_bf16_class to test conv2d with the bfloat16 data type.
- In the OpTest unit-test framework, when check_output_with_place verifies forward results, use a relative error instead of an absolute error for bfloat16.

PR changes

conv2d gains bfloat16 support under CUDNN:
- In the SearchAlgorithm&lt;cudnnConvolutionFwdAlgoPerf_t&gt; logic, handle the bfloat16 data type: when bfloat16 is used, pass CUDNN_DEFAULT_MATH to cudnnSetConvolutionMathType.
- Add a runtime check in GetExpectedKernelType: when the bfloat16 type is used, require the library to be framework::LibraryType::kCUDNN and platform::CudnnVersion() to be greater than 8100.
- Use CUDNN_VERSION to decide at compile time whether the bfloat16 kernels are registered; they are only registered when building against cudnn 8.1 or above.

test_conv2d_op.py adds bfloat16 unit tests (with OpTest framework support for bfloat16):
- Add the create_test_cudnn_bf16_class method to extend the conv2d unit tests to bfloat16: test_check_output tests the forward computation, while test_check_grad_no_filter and test_check_filter_no_grad test the backward computation.
- In the TestConv2DOp bfloat16 tests, output is computed and represented as float; input and filter are converted from float with convert_float_to_uint16 into a uint16 representation (which Paddle treats as bfloat16). self.inputs_fp32 records the original float input and filter, which are used later to compute the backward reference values.
- When test_check_grad_no_filter and test_check_filter_no_grad check the backward pass, they use the get_numeric_gradient method provided by the OpTest framework; unlike the checks for other types, the inputs argument uses self.inputs_fp32, which reduces the error introduced by repeated data conversions.

The OpTest framework uses a relative error when checking forward results:
- In check_output_with_place, accuracy is checked with numpy.allclose; an rtol argument is added to that call, set to rtol=1e-2 when the data type is bfloat16 and rtol=1e-5 (the default) otherwise.

Self-test results
The forward and backward results for the bfloat16 data type pass both locally and in CI. The network used to check the bfloat16 forward and backward results is shown in the figure below; note that only conv2d uses the bfloat16 type (currently uint16 represents bfloat16):
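The "uint16 represents bfloat16" convention mentioned above works because bfloat16 is exactly the upper 16 bits of an IEEE float32. A minimal sketch of the conversion follows; Paddle's actual convert_float_to_uint16 rounds to nearest-even, whereas this sketch simply truncates the low mantissa bits, so function names and rounding behavior here are simplifications:

```python
import numpy as np


def convert_float_to_uint16_sketch(x):
    # Reinterpret float32 as uint32 and keep only the high 16 bits;
    # that bit pattern, stored as uint16, is the bfloat16 value.
    x = np.ascontiguousarray(x, dtype=np.float32)
    return np.right_shift(x.view(np.uint32), 16).astype(np.uint16)


def convert_uint16_to_float_sketch(x):
    # Re-widen to float32 by placing the 16 bits back in the high half.
    return np.left_shift(x.astype(np.uint32), 16).view(np.float32)


x = np.array([1.0, -2.5, 3.14159], dtype=np.float32)
roundtrip = convert_uint16_to_float_sketch(convert_float_to_uint16_sketch(x))
print(roundtrip)  # close to x; values with short mantissas survive exactly
```

Dropping 16 mantissa bits is also why the unit tests need a looser relative tolerance (rtol=1e-2) for bfloat16 than for float32.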