| // TODO(gongwb): add more retries. | ||
| ClientBase* c = static_cast<ClientBase*>(tag); | ||
| if (!c->status_.ok()) { | ||
| LOG(ERROR) << "proc param error:" << c->var_h_.String(); |
Log this error only once.
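One way to read the suggestion above is to report the first failure and stay quiet for the rest. A minimal sketch, assuming a hypothetical helper named FirstErrorLogger (it is not part of Paddle or glog) and a plain std::ostream standing in for LOG(ERROR):

```cpp
#include <atomic>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper, not part of Paddle or glog: writes a log line for the
// first error it sees and only counts the later ones.
class FirstErrorLogger {
 public:
  explicit FirstErrorLogger(std::ostream& out) : out_(out) {}

  // Returns true if this call actually wrote a log line.
  bool Report(const std::string& var_name) {
    const int seen_before = count_.fetch_add(1);
    if (seen_before != 0) return false;  // already logged once, stay quiet
    out_ << "proc param error:" << var_name << '\n';
    return true;
  }

  // Total number of errors reported, logged or not.
  int count() const { return count_.load(); }

 private:
  std::ostream& out_;
  std::atomic<int> count_{0};
};
```

The counter is atomic so that the sketch stays safe if several completion-queue threads report errors at once; whether that matters here depends on how the client is actually driven.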
paddle/operators/send_op.cc
| client_.wait(); | ||
| if (!client_.wait()) { | ||
| LOG(ERROR) << "send op exit"; | ||
| exit(1); |
Do not call exit in operators; use PADDLE_ENFORCE instead.
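To illustrate why an enforce-style check is preferable to exit(1): a thrown error can be caught, logged, and cleaned up by the caller, while exit ends the whole process. The sketch below does not use Paddle's real PADDLE_ENFORCE; SKETCH_ENFORCE, FakeClient, RunSend, and TryRunSend are all hypothetical stand-ins chosen so the example compiles on its own.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an enforce-style check. This is NOT Paddle's
// PADDLE_ENFORCE; it only mimics the idea of throwing instead of exiting.
#define SKETCH_ENFORCE(cond, msg)                                 \
  do {                                                            \
    if (!(cond)) {                                                \
      std::ostringstream oss;                                     \
      oss << "Enforce failed: " << #cond << " (" << (msg) << ")"; \
      throw std::runtime_error(oss.str());                        \
    }                                                             \
  } while (0)

// Hypothetical client whose Wait() reports success or failure.
struct FakeClient {
  bool ok;
  bool Wait() const { return ok; }
};

// Mimics the operator body: a failed wait becomes an exception that the
// caller can catch instead of the process terminating.
void RunSend(const FakeClient& client) {
  SKETCH_ENFORCE(client.Wait(), "send op: waiting for RPC completion failed");
}

// Returns the error message, or an empty string on success.
std::string TryRunSend(const FakeClient& client) {
  try {
    RunSend(client);
    return "";
  } catch (const std::runtime_error& e) {
    return e.what();
  }
}
```

With this shape the framework that runs the operator decides how to handle the failure, and the message carries the failed condition along with the context string.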
| @@ -0,0 +1,169 @@ | |||
| # Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved | |||
Can you please split the fix and the book dist sample into two PRs?
| break; | ||
| } | ||
| assert(tag); |
paddle/operators/recv_op.cc
| // TODO(gognwb): simply this loop. | ||
| // Get from multiple trainers, we don't care about order in which | ||
| // the gradient arrives, just add suffix 0~n then average the gradient. | ||
| VLOG(4) << "param_count:" << param_count |
Reduce the number of VLOG calls.
| grpc::ServerCompletionQueue* cq) | ||
| : service_(service), cq_(cq), status_(PROCESS) {} | ||
| : service_(service), cq_(cq), status_(PROCESS) { | ||
| assert(cq_); |
| << base->GetReqName(); | ||
| // FIXME(gongwb): delete the old one? | ||
| TryToRegisterNewOne(); | ||
| delete base; |
Then there is no other place to release this memory.
I'm not sure whether this is a gRPC bug or a bug in our application.
When base is deleted often, I run into an error.
| VLOG(4) << cq_name << " recv no regular event"; | ||
| LOG(WARNING) << cq_name << " recv no regular event:argument name" | ||
| << base->GetReqName(); | ||
| // FIXME(gongwb): delete the old one? |
This comment does not make things clear.
We can't get more context when ok != true.
paddle/operators/send_op.cc
| client_.wait(); | ||
| if (!client_.wait()) { | ||
| LOG(ERROR) << "send op exit"; |
This log is too simple.
Detailed logs are already written by the functions it calls.
Fix #7520 (comment)
Fix grpc/grpc#13983