Segmentation fault occurs during distributed training with the fluid.incubate.fleet API #17900

@lzha106

I0531 16:14:44.043287 461 communicator.cc:208] communicator stopped, recv thread exit
I0531 16:14:44.223714 460 communicator.cc:169] communicator stopped, send thread exit
I0531 16:14:44.223843 443 communicator.cc:307] Communicator stop done
*** Aborted at 1559290484 (unix time) try "date -d @1559290484" if you are using GNU date ***
PC: @ 0x0 (unknown)
*** SIGSEGV (@0x0) received by PID 443 (TID 0x7fd54155c700) from PID 0; stack trace: ***
@ 0x7fd541139390 (unknown)
@ 0x7fd523e403eb paddle::memory::allocation::Allocator::FreeImpl()
@ 0x7fd522b4a9b9 std::_Sp_counted_base<>::_M_release()
@ 0x7fd522b4b588 paddle::framework::Variable::PlaceholderImpl<>::~PlaceholderImpl()
@ 0x7fd523defc4d paddle::framework::Scope::~Scope()
@ 0x7fd522d4bdb4 paddle::operators::distributed::Communicator::~Communicator()
@ 0x7fd522c7bd8a std::_Sp_counted_ptr<>::_M_dispose()
@ 0x7fd522b4a9b9 std::_Sp_counted_base<>::_M_release()
@ 0x7fd540d97ff8 (unknown)
@ 0x7fd540d98045 exit
@ 0x7fd540d7e837 __libc_start_main
@ 0x493299 _start
@ 0x0 (unknown)
Segmentation fault
