Collators sometimes miss blocks #5349
Description
Is there an existing issue?
- I have searched the existing issues
Experiencing problems? Have you tried our Stack Exchange first?
- This is not a support question.
Description of bug
A bug introduced in #3308 causes collators to sometimes miss blocks, which leads to longer block times and triggers a reorg. You can reproduce the issue by building the parachain template and polkadot, then running them with zombienet. If you look at the latency screen, you will see long block times appear periodically.
Reverting everything in #3308 fixes the problem, and the collators no longer periodically miss blocks. The bug seems to happen whether or not async backing is enabled, but I mostly tested on older versions without it.
In my testing, the bug only occurs if both the relaychain and parachain are using binaries that include that commit. If either one was built from a version before that commit, the collator performs normally.
We hit this bug on our testnet when upgrading Enjin Blockchain to Polkadot SDK v1.9.0. We worked backwards to find the first version that worked correctly, and worked around the issue by forking the SDK and reverting the changes in #3308.
Steps to reproduce
- Build the parachain template
- Build polkadot
- Run them with zombienet
- Check the latency and notice long block times periodically
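The last step can also be checked without watching the latency screen. Below is a minimal sketch, assuming you have already collected the parachain block timestamps (in milliseconds) by some means. The function name `find_long_blocks`, the 12-second expected block time, and the 1.5x tolerance are all illustrative assumptions, not values taken from the SDK; adjust them to your chain's configuration.

```python
from typing import List, Tuple


def find_long_blocks(
    timestamps_ms: List[int],
    expected_ms: int = 12_000,  # assumed target block time, adjust as needed
    tolerance: float = 1.5,     # assumed threshold multiplier
) -> List[Tuple[int, int]]:
    """Return (index, gap_ms) for each block whose gap to the previous
    block exceeds expected_ms * tolerance."""
    long_blocks = []
    for i in range(1, len(timestamps_ms)):
        gap = timestamps_ms[i] - timestamps_ms[i - 1]
        if gap > expected_ms * tolerance:
            long_blocks.append((i, gap))
    return long_blocks


# Hypothetical timestamps: the fourth block arrives one slot late.
print(find_long_blocks([0, 12_000, 24_000, 48_000, 60_000]))
```

With the hypothetical timestamps above, only the block at index 3 is reported, with a gap of 24000 ms. A periodic pattern of such entries would match the behaviour described in this issue.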
Here is the zombienet config I used:
[settings]
timeout = 1000

[relaychain]
default_command = "path/to/polkadot"
chain = "rococo-local"

[[relaychain.nodes]]
name = "Alice"
validator = true

[[relaychain.nodes]]
name = "Bob"
validator = true

[[parachains]]
id = 1000
name = "parachain-template-node"
cumulus_based = true
add_to_genesis = true
register_para = true

[[parachains.collators]]
name = "Alice"
command = "path/to/template-node"
args = ["-ldebug"]

[[parachains.collators]]
name = "Bob"
command = "path/to/template-node"
args = ["-ldebug"]