Fix duties override bug in VC #5305
// in the initial query. There was previously a bug here where we assumed the duties from the
// initial query were "new" and needed to be inserted into the map, which overrode information
// in the `subscription_slots` and caused expired subscriptions to be sent.
// Seeing as the initial batch is so small, the worst-case bandwidth wastage here is minimal,
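A minimal Rust sketch of the behaviour this comment describes, under a simplified model: `DutyAndState` and `merge_initial_batch` are hypothetical names (only the `subscription_slots` field comes from the diff), and a duty is reduced to a single slot number. Duties from the initial query only fill in validators that have no entry yet, so existing `subscription_slots` are never overwritten.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified duty record: the slot of one attester duty plus
/// the slots at which subnet subscriptions have already been sent for it.
#[derive(Debug, Clone, PartialEq)]
struct DutyAndState {
    duty_slot: u64,
    subscription_slots: Vec<u64>,
}

/// Merge duties from the small initial query into the map of known duties.
/// Only validators with no entry yet are filled in; existing entries, and the
/// `subscription_slots` they carry, are left untouched.
fn merge_initial_batch(
    known: &mut HashMap<u64, DutyAndState>,
    initial_batch: &[(u64, u64)], // (validator_index, duty_slot) pairs
) {
    for &(validator_index, duty_slot) in initial_batch {
        known
            .entry(validator_index)
            .or_insert_with(|| DutyAndState {
                duty_slot,
                subscription_slots: Vec::new(),
            });
    }
}
```

`entry(..).or_insert_with(..)` makes "insert only if absent" a single map operation, so there is no code path in this sketch that can replace an existing record.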
Wouldn't `uninitialized_validators.len() == local_pubkeys.len()` right after a restart?
So we would end up making two requests, with `indices_to_request.len() == uninitialized_validators.len()` in that case.
I think this is okay, but I just wanted to confirm that this would be the case.
Oh yeah, you're right! I got a bit carried away with the one-validator case.
Will try to fix it up so that we do reuse the info from the first query, but don't attempt any overrides.
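To illustrate the reviewer's point, here is a sketch with simplified, assumed types; the function names mirror the variables in the comment but are hypothetical, not the Lighthouse code. Right after a restart no duties are known, so every local validator is uninitialized, and a follow-up request built from that list would repeat the validator already covered by the initial query unless it is filtered out.

```rust
use std::collections::HashSet;

/// Validators with no known duties for the epoch. Right after a restart
/// `known` is empty, so this returns every local validator.
fn uninitialized_validators(local_indices: &[u64], known: &HashSet<u64>) -> Vec<u64> {
    local_indices
        .iter()
        .copied()
        .filter(|index| !known.contains(index))
        .collect()
}

/// Indices for the follow-up request. Validators already answered by the
/// initial query are skipped so their duties can be reused instead of being
/// fetched twice (an assumed optimisation, not necessarily what the PR does).
fn indices_to_request(uninitialized: &[u64], initial_query: &[u64]) -> Vec<u64> {
    uninitialized
        .iter()
        .copied()
        .filter(|index| !initial_query.contains(index))
        .collect()
}
```

Without the second filter, `indices_to_request` would simply equal `uninitialized_validators` after a restart, which is the situation the review comment describes.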
Pushed that update. I still want to roll the changes from the other PR into this one to handle the startup case. I noticed a bunch of warnings on restart when I deployed this PR.
This is ready for review. No more
pawanjay176 left a comment:
Looks great!
Really want to get this in soon so that it's less noisy to debug the `InsufficientPeers` issues.
@Mergifyio queue
✅ The pull request has been merged automatically at cff6258.
* Fix duties override bug in VC
* Use initial request efficiently
* Prevent expired subscriptions by construction
* Clean up selection proof logic
* Add test
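One way to read "prevent expired subscriptions by construction" is that send slots are only ever derived in a way that cannot yield a past or too-late slot. The sketch below assumes a fixed set of lookahead offsets and a hypothetical `schedule_subscription_slots` helper; it illustrates the idea and is not necessarily how this PR implements it.

```rust
/// Hypothetical scheduler: candidate send slots are `duty_slot - lookahead`
/// for each lookahead. Candidates that would underflow, that are already in
/// the past, or that are not strictly before the duty slot are never
/// produced, so an expired subscription cannot be scheduled.
fn schedule_subscription_slots(duty_slot: u64, current_slot: u64, lookaheads: &[u64]) -> Vec<u64> {
    lookaheads
        .iter()
        .filter_map(|&ahead| duty_slot.checked_sub(ahead))
        .filter(|&slot| slot >= current_slot && slot < duty_slot)
        .collect()
}
```

For a duty at slot 100 seen at slot 80 with lookaheads `[32, 16, 8, 0]`, only slots 84 and 92 survive; once the current slot reaches the duty slot, nothing is scheduled at all.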
Issue Addressed
Alternative to:
Proposed Changes
I wasn't fully satisfied with the fix from #5296: while it works, it's a bit of a band-aid.
This PR attempts to address a bug that I found in how we calculate duties, which I think is the root cause. We were previously requesting duties for a single validator to work out whether any updates were required, and then making a request for all validators that we determined needed updating. The problem was that we assumed the duties for the single validator were "new" and relevant, and used them to overwrite the existing duties for that validator, thus removing our knowledge of the subscriptions we'd already sent.
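The overwrite described above comes down to an unconditional map insert. In this hypothetical sketch (simplified `DutyAndState` type, validator indices as keys, `apply_probe_buggy` is an illustrative name), `HashMap::insert` replaces the existing entry and hands back the old one, so the record of already-sent subscriptions is dropped even though the duty itself has not changed.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified duty record (see the assumptions above).
#[derive(Debug, Clone, PartialEq)]
struct DutyAndState {
    duty_slot: u64,
    subscription_slots: Vec<u64>,
}

/// The buggy pattern: treat the single-validator probe's duty as "new" and
/// insert it unconditionally. `HashMap::insert` replaces any existing value
/// and returns the old one, so its `subscription_slots` are discarded.
fn apply_probe_buggy(
    known: &mut HashMap<u64, DutyAndState>,
    validator_index: u64,
    duty_slot: u64,
) -> Option<DutyAndState> {
    known.insert(
        validator_index,
        DutyAndState {
            duty_slot,
            subscription_slots: Vec::new(),
        },
    )
}
```

After the call, the entry for the probed validator has the same duty slot but an empty `subscription_slots`, so nothing in the map records that subscriptions for that duty were already sent.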
This explains why some users noticed the expired-subscription warnings more often on nodes with validators changing their status from pending to active. Those validators would be deemed in need of updating, which allowed the overwrite to go through for the single validator from the initial query (which is usually not one of the validators that changed status).
I've also updated the duty update logic to be a bit more defensive about overriding data, logging a warning in cases where we would previously override (which I hope is now unreachable).
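A sketch of what "more defensive" could look like, under the same simplified, hypothetical model (`update_duties` is an illustrative name): unknown validators get a fresh entry, identical duties are left alone, and a differing duty for a known validator triggers a warning instead of an overwrite. The description does not say whether the real code skips or still applies the update after warning; this sketch keeps the existing entry, and `eprintln!` stands in for the client's logger.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified duty record (see the assumptions above).
#[derive(Debug, Clone, PartialEq)]
struct DutyAndState {
    duty_slot: u64,
    subscription_slots: Vec<u64>,
}

/// Apply freshly fetched `(validator_index, duty_slot)` pairs without ever
/// overwriting an existing entry. Returns the indices for which an override
/// was refused so the caller (or a test) can observe it.
fn update_duties(known: &mut HashMap<u64, DutyAndState>, fresh: &[(u64, u64)]) -> Vec<u64> {
    let mut refused = Vec::new();
    for &(index, duty_slot) in fresh {
        if let Some(existing) = known.get(&index) {
            if existing.duty_slot != duty_slot {
                // Previously this case would have overwritten the entry.
                eprintln!(
                    "WARN: refusing to override duty for validator {}: known slot {}, new slot {}",
                    index, existing.duty_slot, duty_slot
                );
                refused.push(index);
            }
            // Identical duty: keep the entry and its `subscription_slots`.
        } else {
            known.insert(
                index,
                DutyAndState {
                    duty_slot,
                    subscription_slots: Vec::new(),
                },
            );
        }
    }
    refused
}
```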
Additional Info
This PR could be merged in addition to #5296 if we want to really guarantee (defense in depth) that we don't send bad subscriptions.