Skip to content

多线程调用C++推理库进行RNN算子崩溃问题!!!! #49737

@marsbzp

Description

@marsbzp

bug描述 Describe the Bug

系统环境/System Environment:Centos
版本号/Version:Paddle_inference:2.3 PaddleOCR: 2.4 cuda:10.1
运行指令/Command Code:
完整报错/Complete Error Message:

多线程调用C++推理库进行OCR推理时出现报错,使用的是官网下载的识别模型ch_ppocr_server_v2.0_rec_infer,
非必现,线程数越多,越容易出现。不报错时可以正常分析,基本都是崩在了RNN算子那里,是否是Paddle实现的RNN算子有问题(每个线程有单独的检测识别模型,各线程间模型不共享)
和下面这个issue基本一致可以参照里面描述进行复现
PaddlePaddle/PaddleOCR#6514
PaddlePaddle/FastDeploy#1143
使用fastdeploy库paddle后端进行推理也会有相同的问题

报错信息如下

The detection visualized image saved in ./ocr_vis.png
Detected boxes num: 8
Detected boxes num: 8
Detected boxes num: 8
The detection visualized image saved in ./ocr_vis.png
The detection visualized image saved in ./ocr_vis.png
The detection visualized image saved in ./ocr_vis.png

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffd397fe700 (LWP 5157)]
0x00007fff5308b04e in ?? () from /lib64/libcuda.so.1
Missing separate debuginfos, use: debuginfo-install glibc-2.17-324.el7_9.x86_64 libgcc-4.8.5-44.el7.x86_64 libgomp-4.8.5-44.el7.x86_64
(gdb) bt
#0 0x00007fff5308b04e in ?? () from /lib64/libcuda.so.1
#1 0x00007fff53204b0f in ?? () from /lib64/libcuda.so.1
#2 0x00007fff530751e0 in ?? () from /lib64/libcuda.so.1
#3 0x00007fff531dded6 in ?? () from /lib64/libcuda.so.1
#4 0x00007fff52f85a1b in ?? () from /lib64/libcuda.so.1
#5 0x00007fff52f85c98 in ?? () from /lib64/libcuda.so.1
#6 0x00007fff52f85cde in ?? () from /lib64/libcuda.so.1
#7 0x00007fff5310c806 in cuLaunchKernel () from /lib64/libcuda.so.1
#8 0x00007fff64b5aa19 in ?? () from /usr/local/cuda-10.1/lib64/libcudnn.so.7
#9 0x00007fff64b5aaa7 in ?? () from /usr/local/cuda-10.1/lib64/libcudnn.so.7
#10 0x00007fff64b90e9b in ?? () from /usr/local/cuda-10.1/lib64/libcudnn.so.7
#11 0x00007fff647f83de in ?? () from /usr/local/cuda-10.1/lib64/libcudnn.so.7
#12 0x00007fff647f29ea in ?? () from /usr/local/cuda-10.1/lib64/libcudnn.so.7
#13 0x00007fff646d7a76 in ?? () from /usr/local/cuda-10.1/lib64/libcudnn.so.7
#14 0x00007fff647a4bf3 in ?? () from /usr/local/cuda-10.1/lib64/libcudnn.so.7
#15 0x00007fff647e908f in ?? () from /usr/local/cuda-10.1/lib64/libcudnn.so.7
#16 0x00007fff647ebfd8 in ?? () from /usr/local/cuda-10.1/lib64/libcudnn.so.7
#17 0x00007fff647ec7bf in ?? () from /usr/local/cuda-10.1/lib64/libcudnn.so.7
#18 0x00007fff647f23b0 in ?? () from /usr/local/cuda-10.1/lib64/libcudnn.so.7
#19 0x00007fff645a4342 in ?? () from /usr/local/cuda-10.1/lib64/libcudnn.so.7
#20 0x00007fff645cf335 in ?? () from /usr/local/cuda-10.1/lib64/libcudnn.so.7
#21 0x00007fff645d3e81 in ?? () from /usr/local/cuda-10.1/lib64/libcudnn.so.7
#22 0x00007fff645c1d20 in ?? () from /usr/local/cuda-10.1/lib64/libcudnn.so.7
#23 0x00007fff645c277f in cudnnRNNForwardInference () from /usr/local/cuda-10.1/lib64/libcudnn.so.7
#24 0x00007fff995599ef in ?? () from /home/share/disk2/zhangjinlong21/inference_project/ocr_inference/cpp_infer_release/3rdparty/paddle_inference_2.3/paddle/lib/libpaddle_inference.so
#25 0x00007fff9955e2f4 in void phi::RnnKernel<float, phi::GPUContext>(phi::GPUContext const&, phi::DenseTensor const&, std::vector<phi::DenseTensor const*, std::allocator<phi::DenseTensor const*> > const&, std::vector<phi::DenseTensor const*, std::allocator<phi::DenseTensor const*> > const&, paddle::optional<phi::DenseTensor const&>, float, bool, int, int, int, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, bool, phi::DenseTensor*, phi::DenseTensor*, std::vector<phi::DenseTensor*, std::allocatorphi::DenseTensor* >, phi::DenseTensor*) () from /home/share/disk2/zhangjinlong21/inference_project/ocr_inference/cpp_infer_release/3rdparty/paddle_inference_2.3/paddle/lib/libpaddle_inference.so
#26 0x00007fff9955ee5d in void phi::KernelImpl<void ()(phi::GPUContext const&, phi::DenseTensor const&, std::vector<phi::DenseTensor const, std::allocator<phi::DenseTensor const*> > const&, std::vector<phi::DenseTensor const*, std::allocator<phi::DenseTensor const*> > const&, paddle::optional<phi::DenseTensor const&>, float, bool, int, int, int, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, bool, phi::DenseTensor*, phi::DenseTensor*, std::vector<phi::DenseTensor*, std::allocatorphi::DenseTensor* >, phi::DenseTensor*), &(void phi::RnnKernel<float, phi::GPUContext>(phi::GPUContext const&, phi::DenseTensor const&, std::vector<phi::DenseTensor const*, std::allocator<phi::DenseTensor const*> > const&, std::vector<phi::DenseTensor const*, std::allocator<phi::DenseTensor const*> > const&, paddle::optional<phi::DenseTensor const&>, float, bool, int, int, int, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, bool, phi::DenseTensor*, phi::DenseTensor*, std::vector<phi::DenseTensor*, std::allocatorphi::DenseTensor* >, phi::DenseTensor*))>::KernelCallHelper<std::vector<phi::DenseTensor const*, std::allocator<phi::DenseTensor const*> > const&, std::vector<phi::DenseTensor const*, std::allocator<phi::DenseTensor const*> > const&, paddle::optional<phi::DenseTensor const&>, float, bool, int, int, int, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, bool, phi::DenseTensor*, phi::DenseTensor*, std::vector<phi::DenseTensor*, std::allocatorphi::DenseTensor* >, phi::DenseTensor*, phi::TypeTag >::Compute<1, 1, 0, 0, phi::GPUContext const, phi::DenseTensor const>(phi::KernelContext*, phi::GPUContext const&, phi::DenseTensor const&) ()
from /home/share/disk2/zhangjinlong21/inference_project/ocr_inference/cpp_infer_release/3rdparty/paddle_inference_2.3/paddle/lib/libpaddle_inference.so
#27 0x00007fff9cefaa3a in paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, phi::Place const&, paddle::framework::RuntimeContext*) const ()
from /home/share/disk2/zhangjinlong21/inference_project/ocr_inference/cpp_infer_release/3rdparty/paddle_inference_2.3/paddle/lib/libpaddle_inference.so
#28 0x00007fff9cefb629 in paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, phi::Place const&) const ()
from /home/share/disk2/zhangjinlong21/inference_project/ocr_inference/cpp_infer_release/3rdparty/paddle_inference_2.3/paddle/lib/libpaddle_inference.so
#29 0x00007fff9ceec10b in paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, phi::Place const&) () from /home/share/disk2/zhangjinlong21/inference_project/ocr_inference/cpp_infer_release/3rdparty/paddle_inference_2.3/paddle/lib/libpaddle_inference.so
#30 0x00007fff96fa23d0 in paddle::framework::NaiveExecutor::Run() () from /home/share/disk2/zhangjinlong21/inference_project/ocr_inference/cpp_infer_release/3rdparty/paddle_inference_2.3/paddle/lib/libpaddle_inference.so
#31 0x00007fff96bfd8fb in paddle::AnalysisPredictor::ZeroCopyRun() () from /home/share/disk2/zhangjinlong21/inference_project/ocr_inference/cpp_infer_release/3rdparty/paddle_inference_2.3/paddle/lib/libpaddle_inference.so
#32 0x000000000044198c in PaddleOCR::CRNNRecognizer::Run (this=0x7ffd397ecfd0, img_list=..., times=0x7ffd397ecf70) at /home/share/disk1/bzp/ocr_inference_multithread/cpp_infer_qm/src/ocr_rec.cpp:63
#33 0x000000000042ce4a in predictor_thread (cv_all_img_names=..., thread_id=14) at /home/share/disk1/bzp/ocr_inference_multithread/cpp_infer_qm/src/main.cpp:139
#34 0x00000000004314e7 in std::__invoke_impl<void, void ()(std::vector<cv::String, std::allocatorcv::String >, int), std::vector<cv::String, std::allocatorcv::String >, int>(std::__invoke_other, void (&&)(std::vector<cv::String, std::allocatorcv::String >, int), std::vector<cv::String, std::allocatorcv::String >&&, int&&) (__f=<unknown type in /home/share/disk1/bzp/ocr_inference_multithread/cpp_infer_qm/build/ppocr, CU 0x0, DIE 0x40211>,
__args#0=<unknown type in /home/share/disk1/bzp/ocr_inference_multithread/cpp_infer_qm/build/ppocr, CU 0x0, DIE 0x40234>, __args#1=<unknown type in /home/share/disk1/bzp/ocr_inference_multithread/cpp_infer_qm/build/ppocr, CU 0x0, DIE 0x40244>)
at /usr/local/gcc-8.2/include/c++/8.2.0/bits/invoke.h:60
#35 0x000000000042fca9 in std::__invoke<void ()(std::vector<cv::String, std::allocatorcv::String >, int), std::vector<cv::String, std::allocatorcv::String >, int>(void (&&)(std::vector<cv::String, std::allocatorcv::String >, int), std::vector<cv::String, std::allocatorcv::String >&&, int&&) (__fn=<unknown type in /home/share/disk1/bzp/ocr_inference_multithread/cpp_infer_qm/build/ppocr, CU 0x0, DIE 0x42458>,
__args#0=<unknown type in /home/share/disk1/bzp/ocr_inference_multithread/cpp_infer_qm/build/ppocr, CU 0x0, DIE 0x4247a>, __args#1=<unknown type in /home/share/disk1/bzp/ocr_inference_multithread/cpp_infer_qm/build/ppocr, CU 0x0, DIE 0x42489>)
---Type to continue, or q to quit---
at /usr/local/gcc-8.2/include/c++/8.2.0/bits/invoke.h:95
#36 0x00000000004350b9 in std::thread::_Invoker<std::tuple<void ()(std::vector<cv::String, std::allocatorcv::String >, int), std::vector<cv::String, std::allocatorcv::String >, int> >::_M_invoke<0ul, 1ul, 2ul> (this=0x2ace388)
at /usr/local/gcc-8.2/include/c++/8.2.0/thread:234
#37 0x0000000000435058 in std::thread::_Invoker<std::tuple<void (
)(std::vector<cv::String, std::allocatorcv::String >, int), std::vector<cv::String, std::allocatorcv::String >, int> >::operator() (this=0x2ace388) at /usr/local/gcc-8.2/include/c++/8.2.0/thread:243
#38 0x000000000043503c in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(std::vector<cv::String, std::allocatorcv::String >, int), std::vector<cv::String, std::allocatorcv::String >, int> > >::_M_run (this=0x2ace380)
at /usr/local/gcc-8.2/include/c++/8.2.0/thread:186
#39 0x00007ffff7f3f19d in std::execute_native_thread_routine (__p=0x2ace380) at /home/nwani/m3/conda-bld/compilers_linux-64_1560109574129/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libstdc++-v3/src/c++11/thread.cc:80
#40 0x00007fff62ccaea5 in start_thread () from /lib64/libpthread.so.0
#41 0x00007fff624db9fd in clone () from /lib64/libc.so.6
(gdb)
(gdb)
(gdb)
(gdb)
(gdb)

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions