Take into account network latency when syncing #55
…etting stuck in an always lib catchup state. Co-authored-by: Farhad Shahabi <farhad.shahabi@block.one>
plugins/net_plugin/net_plugin.cpp
Outdated
sync_reset_lib_num(c);

auto current_time_ns = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::system_clock::now().time_since_epoch()).count();
auto network_latency_ns = current_time_ns - msg.time; // net latency in nanoseconds
What if this is negative?
A negative value would mean clock skew between the nodes; I guess we should just clamp it to 0 if it is < 0.
That makes sense if the clock skew is known to be small... but you removed the check for skew. If the clock skew is close to the latency, then one side will see double latency and the other will see 0 latency.
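To make the concern above concrete, here is a minimal sketch of what each side would compute when the clock skew is close to the one-way latency. The helper name `observed_latency_ns` and its parameters are hypothetical and are not part of net_plugin; the clamp at zero follows the suggestion made earlier in this thread.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not net_plugin code): the one-way latency a receiver
// would estimate from a sender timestamp, clamped at zero.
//   true_latency_ns   - actual one-way network delay
//   receiver_ahead_ns - receiver clock minus sender clock (negative if behind)
inline int64_t observed_latency_ns(int64_t true_latency_ns, int64_t receiver_ahead_ns) {
    // receiver_now - sender_stamp == true latency + clock offset
    const int64_t raw = true_latency_ns + receiver_ahead_ns;
    // A negative result can only come from skew, so clamp it to zero.
    return std::max<int64_t>(raw, 0);
}
```

With a true latency of 50 ms and a skew of 50 ms, the node whose clock is ahead computes `observed_latency_ns(50000000, 50000000)`, which is 100 ms (double the latency), while the node whose clock is behind computes `observed_latency_ns(50000000, -50000000)`, which is 0.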
The check for skew that was removed never worked (see the comment I just added to the PR for that section of code). I'm open to suggestions on alternatives, but I don't think there is any way to improve that, right?
It would have to be based on RTT (which can be measured independent of clock skew) rather than one-way latency.
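As a rough illustration of why RTT is independent of clock skew, the sketch below uses the familiar four-timestamp scheme (as in NTP). Everything here is hypothetical: the `rtt_sample` struct and `round_trip_ns` function do not exist in net_plugin, and the protocol does not currently carry such an exchange.

```cpp
#include <cstdint>

// Hypothetical four-timestamp sample; none of these fields exist in the
// current protocol.
struct rtt_sample {
    int64_t t1_local_send_ns;  // local clock when the request left
    int64_t t2_remote_recv_ns; // remote clock when the request arrived
    int64_t t3_remote_send_ns; // remote clock when the reply left
    int64_t t4_local_recv_ns;  // local clock when the reply arrived
};

// Round-trip time: total elapsed time on the local clock minus the remote
// processing time on the remote clock. Each difference is taken between two
// readings of the same clock, so any constant offset between the clocks cancels.
inline int64_t round_trip_ns(const rtt_sample& s) {
    return (s.t4_local_recv_ns - s.t1_local_send_ns)
         - (s.t3_remote_send_ns - s.t2_remote_recv_ns);
}
```

For example, with 50 ms each way, 10 ms of remote processing, and a remote clock that is 1 s ahead, `round_trip_ns(rtt_sample{0, 1050000000, 1060000000, 110000000})` is 100 ms, the same value it would be with no skew at all.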
Yes, that would work, but it would require a new protocol version and an RT message.
I think it's probably fine to assume low clock skew for now, since we've survived so long without a working check for clock skew. This PR doesn't make things worse in that regard.
("peer", msg.p2p_address)("time", "1 second")); // TODO Add to_variant for std::chrono::system_clock::duration
return false;
}
Adding a note to this PR here for future documentation of why this was removed. I removed this code because it could never have worked: time is in microseconds whereas msg.time is in nanoseconds, so time - msg.time is always negative.
Also there is no way to do what this was trying to do. You don't know how much network latency is involved so you have no idea what clock skew is involved.
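The unit mismatch described above can be sketched as follows. The function names are hypothetical and this is not the removed code; it only shows, under the assumption that both values are raw counts since the epoch, why the subtraction is always negative and how `std::chrono` types avoid mixing units.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical illustration: the same instant expressed once as a raw
// microsecond count and once as a raw nanosecond count, then subtracted.
inline int64_t mismatched_diff(std::chrono::nanoseconds since_epoch) {
    const int64_t local_us =
        std::chrono::duration_cast<std::chrono::microseconds>(since_epoch).count();
    const int64_t msg_ns = since_epoch.count();
    // The nanosecond count is about 1000x larger, so this is negative
    // for any realistic (positive) timestamp.
    return local_us - msg_ns;
}

// Keeping the values as chrono durations lets the library convert both
// operands to a common unit before subtracting.
inline int64_t consistent_diff_ns(std::chrono::microseconds local, std::chrono::nanoseconds msg) {
    return (std::chrono::nanoseconds(local) - msg).count();
}
```

Note that fixing the units alone would not make the removed check meaningful, since the difference would still mix network latency with clock skew, as the comment above points out.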
plugins/net_plugin/net_plugin.cpp
Outdated
}
// number of blocks syncing node is behind from a peer node
uint32_t nblk_behind_by_net_latency = static_cast<uint32_t>(network_latency_ns / block_interval_ns);
// Multiplied by 2 to compensate the time it takes for message to reach peer node, and plus 1 to compensate for integer division truncation
I think the "multiplied by 2" will be clearer if we change the wording to "to reach back to that peer node".
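The arithmetic described in that code comment can be sketched as below. This is an assumption about how the pieces combine, since the diff excerpt shows only the division and the comment; the function name `blocks_behind_allowance` is hypothetical, and the zero clamp follows the suggestion from the earlier thread.

```cpp
#include <cstdint>

// Hypothetical sketch of the allowance described in the code comment:
// blocks produced during one one-way trip, doubled to cover the trip back
// to the peer, plus 1 to compensate for integer division truncation.
inline uint32_t blocks_behind_allowance(int64_t network_latency_ns, int64_t block_interval_ns) {
    if (network_latency_ns < 0) {
        network_latency_ns = 0; // negative implies clock skew; treat as no latency
    }
    const uint32_t nblk_behind_by_net_latency =
        static_cast<uint32_t>(network_latency_ns / block_interval_ns);
    return 2 * nblk_behind_by_net_latency + 1;
}
```

For instance, assuming a 500 ms block interval and a measured latency of 1.2 s, the division truncates to 2 blocks and the allowance is 2 * 2 + 1 = 5 blocks; with zero or negative latency the allowance is 1 block.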
Take into account network latency when syncing from a node to avoid getting stuck in an always lib catchup state.
p2p_high_latency_test.py) as it requires either iproute-tc or iproute2 installed, depending on platform.
Co-authored-by: Farhad Shahabi <farhad.shahabi@block.one>