If there are too many parameter servers, or too many parameter server ports (or sparse ports), some parameter servers will wait forever.
When a parameter server starts up, it logs:
W0522 12:00:09.495564 35864 ParameterServer2.cpp:269] --ports_num or --ports_num_for_sparse might be too large, or total dense parameter size or sparse parameters size might be too small, this psever doesn't store any parameter.
In ParameterServer2.cpp:
void ParameterServer2::setParameter(const SendParameterRequest& request,
                                    std::vector<Buffer>& inputBuffers,
                                    SendParameterResponse* response,
                                    std::vector<Buffer>* outputBuffers) {
  ...
  if (!request.blocks().size()) {
    LOG(WARNING)
        << "--ports_num or --ports_num_for_sparse might be too large, "
        << "or total dense parameter size or sparse parameters size "
        << "might be too small, this psever doesn't store any parameter.";
    return;
  }
  ...
void ParameterServer2::addGradient(const SendParameterRequest& request,
                                   std::vector<Buffer>& inputBuffers,
                                   SendParameterResponse* response,
                                   std::vector<Buffer>* outputBuffers) {
  if (!numPassFinishClients_) {
    REGISTER_BARRIER_DELTA_SERVER_SET(
        *statSet_,
        "forwardbackwardDelta",
        FLAGS_num_gradient_servers,
        request.trainer_id(),
        request.forwardbackward_time(),
        isSparseServer_ ? "_sparseUpdater" : "_denseUpdater");
  }
However, the hang seems to have some other cause. I still need to work out what happens when there are more parameter blocks than pserver instances.
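To make the warning above concrete, here is a rough sketch of how a pserver can end up with no blocks at all. It is only an illustration: `blocksPerServer` and `emptyServers` are hypothetical helpers, and the round-robin assignment is an assumption, not necessarily how Paddle actually shards parameters across pserver instances.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Paddle. Splits `totalSize` parameter
// values into blocks of `blockSize` and deals the blocks out round-robin
// to `numServers` pserver instances (assumed sharding scheme).
// Returns the number of blocks each server receives.
std::vector<std::size_t> blocksPerServer(std::size_t totalSize,
                                         std::size_t blockSize,
                                         std::size_t numServers) {
  assert(blockSize > 0 && numServers > 0);
  // Round up so a partial trailing block still counts as one block.
  std::size_t numBlocks = (totalSize + blockSize - 1) / blockSize;
  std::vector<std::size_t> counts(numServers, 0);
  for (std::size_t b = 0; b < numBlocks; ++b) {
    counts[b % numServers] += 1;
  }
  return counts;
}

// Counts the servers that received no blocks. Under the assumption above,
// these are the instances that would hit the empty-blocks warning.
std::size_t emptyServers(const std::vector<std::size_t>& counts) {
  std::size_t n = 0;
  for (std::size_t c : counts) {
    if (c == 0) ++n;
  }
  return n;
}
```

For example, with 3000 parameter values, a block size of 1000 and 4 pserver instances, there are only 3 blocks, so `emptyServers(blocksPerServer(3000, 1000, 4))` is 1. With 5000 values every instance gets at least one block.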