Paddle multi-CPU training and inference problem #19354

@xuzhenglei1991

Description

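I had been training with Paddle 1.2, but it was too slow, so I switched to multi-CPU training. Calling the multi-CPU interface in Paddle 1.2 fails with the following error: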

File "python/train_multi_cpu.py", line 397, in train
    feed=feeder.feed(full_batch))
  File "/home/disk7/paddle_release_home/python/lib/python2.7/site-packages/paddle/fluid/parallel_executor.py", line 247, in run
    feed_tensor_dict)
paddle.fluid.core.EnforceNotMet: Enforce failed. Expected member_->places_.size() == lod_tensors.size(), but received member_->places_.size():48 != lod_tensors.size():40.
The number of samples of current batch is less than the count of devices, currently, it is not allowed. (48 vs 40) at [/paddle/paddle/fluid/framework/parallel_executor.cc:314]
PaddlePaddle Call Stacks:
0       0x7f96bfe40986p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 486
1       0x7f96bff2ec4ep paddle::framework::ParallelExecutor::FeedAndSplitTensorIntoLocalScopes(std::unordered_map<std::string, paddle::framework::LoDTensor, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, paddle::framework::LoDTensor> > > const&) + 1118
2       0x7f96bfe8e7f1p
3       0x7f96bfe7a7c0p
4       0x7f974c3d9bb8p PyEval_EvalFrameEx + 25016
5       0x7f974c3dd0bdp PyEval_EvalCodeEx + 2061
6       0x7f974c3da345p PyEval_EvalFrameEx + 26949
7       0x7f974c3da460p PyEval_EvalFrameEx + 27232
8       0x7f974c3dd0bdp PyEval_EvalCodeEx + 2061
9       0x7f974c3dd1f2p PyEval_EvalCode + 50
10      0x7f974c405f42p PyRun_FileExFlags + 146
11      0x7f974c4072d9p PyRun_SimpleFileExFlags + 217
12      0x7f974c41d00dp Py_Main + 3149
13      0x7f974b61abd5p __libc_start_main + 245
14            0x4007a1p
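
The message above says the current batch held 40 samples while 48 CPU places were in use, which usually points at a short final batch. One possible workaround, shown here only as a minimal sketch, is to skip any batch that has fewer samples than devices before feeding it. The helper names `device_count` and `drop_small_batches` are hypothetical and not part of Paddle. The sketch assumes the device count follows the `CPU_NUM` environment variable, falling back to the machine's core count.

```python
import multiprocessing
import os


def device_count():
    """Assumed number of CPU places: CPU_NUM if set, else the core count."""
    return int(os.environ.get("CPU_NUM", multiprocessing.cpu_count()))


def drop_small_batches(batch_reader, num_devices):
    """Wrap a batch reader and skip batches with fewer samples than devices.

    batch_reader is a zero-argument callable that yields lists of samples
    (hypothetical stand-in for the reader used in train_multi_cpu.py).
    """
    def reader():
        for batch in batch_reader():
            # A batch must contain at least one sample per device.
            if len(batch) >= num_devices:
                yield batch
    return reader
```

With a wrapper like this, the training loop would iterate over `drop_small_batches(train_reader, device_count())()` instead of the raw reader, where `train_reader` stands for whatever batch reader the script already uses. Lowering `CPU_NUM` below the smallest batch size is another option that avoids discarding samples.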

So I switched to Paddle 1.5 for training.

That led to another problem: the inference code is C++ built against paddle1.2.0_pb32, and it core-dumps when loading the model. The stack trace is:

(gdb) bt
#0  0x00007ffc829783f7 in raise () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
#1  0x00007ffc829797d8 in abort () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
#2  0x00007ffc83268c65 in __gnu_cxx::__verbose_terminate_handler () at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x00007ffc83266e06 in __cxxabiv1::__terminate (handler=<optimized out>) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:38
#4  0x00007ffc83266e33 in std::terminate () at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
#5  0x00007ffc83267052 in __cxxabiv1::__cxa_throw (obj=0x7ffc740319b0, tinfo=0x1785710 <typeinfo for paddle::platform::EnforceNotMet>, dest=
    0x64ddb4 <paddle::platform::EnforceNotMet::~EnforceNotMet()>) at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:87
#6  0x000000000077c5f4 in paddle::framework::ExtractAttribute<std::vector<int, std::allocator<int> > >::operator()(boost::variant<boost::blank, int, float, std::string, std::vector<int, std::allocator<int> >, std::vector<float, std::allocator<float> >, std::vector<std::string, std::allocator<std::string> >, bool, std::vector<bool, std::allocator<bool> >, paddle::framework::BlockDesc*, long, std::vector<paddle::framework::BlockDesc*, std::allocator<paddle::framework::BlockDesc*> >, std::vector<long, std::allocator<long> > >&) const ()
#7  0x00000000007801cf in paddle::framework::TypedAttrChecker<std::vector<int, std::allocator<int> > >::operator()(std::unordered_map<std::string, boost::variant<boost::blank, int, float, std::string, std::vector<int, std::allocator<int> >, std::vector<float, std::allocator<float> >, std::vector<std::string, std::allocator<std::string> >, bool, std::vector<bool, std::allocator<bool> >, paddle::framework::BlockDesc*, long, std::vector<paddle::framework::BlockDesc*, std::allocator<paddle::framework::BlockDesc*> >, std::vector<long, std::allocator<long> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, boost::variant<boost::blank, int, float, std::string, std::vector<int, std::allocator<int> >, std::vector<float, std::allocator<float> >, std::vector<std::string, std::allocator<std::string> >, bool, std::vector<bool, std::allocator<bool> >, paddle::framework::BlockDesc*, long, std::vector<paddle::framework::BlockDesc*, std::allocator<paddle::framework::BlockDesc*> >, std::vector<long, std::allocator<long> > > > > >*) const ()
#8  0x000000000067a959 in paddle::framework::OpRegistry::CreateOp(std::string const&, std::map<std::string, std::vector<std::string, std::allocator<std::string> >, std::less<std::string>, std::allocator<std::pair<std::string const, std::vector<std::string, std::allocator<std::string> > > > > const&, std::map<std::string, std::vector<std::string, std::allocator<std::string> >, std::less<std::string>, std::allocator<std::pair<std::string const, std::vector<std::string, std::allocator<std::string> > > > > const&, std::unordered_map<std::string, boost::variant<boost::blank, int, float, std::string, std::vector<int, std::allocator<int> >, std::vector<float, std::allocator<float> >, std::vector<std::string, std::allocator<std::string> >, bool, std::vector<bool, std::allocator<bool> >, paddle::framework::BlockDesc*, long, std::vector<paddle::framework::BlockDesc*, std::allocator<paddle::framework::BlockDesc*> >, std::vector<long, std::allocator<long> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, boost::variant<boost::blank, int, float, std::string, std::vector<int, std::allocator<int> >, std::vector<float, std::allocator<float> >, std::vector<std::string, std::allocator<std::string> >, bool, std::vector<bool, std::allocator<bool> >, paddle::framework::BlockDesc*, long, std::vector<paddle::framework::BlockDesc*, std::allocator<paddle::framework::BlockDesc*> >, std::vector<long, std::allocator<long> > > > > >) ()
#9  0x000000000067aae3 in paddle::framework::OpRegistry::CreateOp(paddle::framework::OpDesc const&) ()
#10 0x0000000000653194 in paddle::framework::Executor::Prepare(paddle::framework::ProgramDesc const&, int, std::vector<std::string, std::allocator<std::string> > const&) ()
#11 0x000000000063b67e in visionary::lac::MainTagger::create_buff (this=0x19f0b00, buff=0x7ffc740008c0) at baidu/visionary/lac/src/main_tagger.cpp:100
#12 0x00000000006330ab in visionary::lac::Lac::create_buff (this=0x19f09d0) at baidu/visionary/lac/src/lac.cpp:107
#13 0x000000000062ef13 in visionary::lac::lac_buff_create (lac_handle=0x19f09d0) at baidu/visionary/lac/src/ilac.cpp:43
#14 0x0000000000629f3f in tagging (max_result_num=1000) at baidu/visionary/lac/tools/lac_class_demo.cpp:135
#15 0x000000000062a5b9 in thread_worker (arg=0x7fffa242eaf0) at baidu/visionary/lac/tools/lac_class_demo.cpp:205

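Could you please take a look at how to resolve this? Should I move training back to the Paddle 1.2 multi-CPU training interface, or should I switch the inference library to paddle1.5_pb32?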
