Skip to content

Fix duties override bug in VC#5305

Merged
mergify[bot] merged 5 commits intosigp:unstablefrom
michaelsproul:vc-past-subscriptions-alt
Mar 4, 2024
Merged

Fix duties override bug in VC#5305
mergify[bot] merged 5 commits intosigp:unstablefrom
michaelsproul:vc-past-subscriptions-alt

Conversation

@michaelsproul
Copy link
Copy Markdown
Member

Issue Addressed

Alternative to:

Proposed Changes

I wasn't fully satisfied with the fix from #5296, as while it works, it's a bit of a bandaid.

This PR attempts to address a bug that I found in how we calculate duties which I think is the root cause. We were previously requesting duties for a single validator to work out whether any updates were required, and then making a request for all validators that we determined needed updating. The problem was we would assume that the duties for the single validator were "new" and relevant, and then use them to overwrite existing duties for that validator, thus removing our knowledge of the subscriptions we'd already sent.

This explains why some users noticed that the expired subscription warnings occurred more often on nodes with validators changing their status from pending to active. These validators would be deemed in need of updating, and would allow the update to go through for the single validator (which is not usually the same validator).

I've also updated the duty update logic to be a bit more defensive about overriding data, logging a warning in cases where we would previously override (which I hope is now unreachable).

Additional Info

This PR could be merged in addition to #5296 if we want to really guarantee (defense in depth) that we don't send bad subscriptions.

@michaelsproul michaelsproul added bug Something isn't working val-client Relates to the validator client binary ready-for-review The code is ready for review labels Feb 26, 2024
// in the initial query. There was previously a bug here where we assumed the duties from the
// initial query were "new" and needed to be inserted into the map, which overrode information
// in the `subscription_slots` and caused expired subscriptions to be sent.
// Seeing as the initial batch is so small, the worst-case bandwidth wastage here is minimal,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't uninitialized_validators.len() == local_pubkeys.len() right after a restart?
So we would end up making 2 requests with indices_to_request.len() == uninitialized_validators.len() in that case.
I think this is okay, but just wanted to confirm that this would be the case.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah you are right! I got a bit carried away with the 1 validator case

Will try to fix it up so we do reuse the info from the first query, but don't attempt any overrides

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed that update. I still want to roll the changes from the other PR into this one to handle the startup case. I noticed a bunch of warnings on restart when I deployed this PR.

@michaelsproul michaelsproul added the v5.1.0 Q2 2024 label Feb 27, 2024
@michaelsproul
Copy link
Copy Markdown
Member Author

This is ready for review. No more WARN Not enough time for a discovery search logs after running this PR on our Goerli validators. The new warning log that I added also hasn't been hit.

Copy link
Copy Markdown
Member

@pawanjay176 pawanjay176 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!
Really want to get this in soon so that it's less noisy to debug the InsufficientPeers issues

@michaelsproul
Copy link
Copy Markdown
Member Author

@Mergifyio queue

@mergify
Copy link
Copy Markdown

mergify bot commented Mar 4, 2024

queue

✅ The pull request has been merged automatically

Details

The pull request has been merged automatically at cff6258

mergify bot added a commit that referenced this pull request Mar 4, 2024
@mergify mergify bot merged commit cff6258 into sigp:unstable Mar 4, 2024
@michaelsproul michaelsproul deleted the vc-past-subscriptions-alt branch March 5, 2024 01:20
pawanjay176 pushed a commit to pawanjay176/lighthouse that referenced this pull request Mar 5, 2024
* Fix duties override bug in VC

* Use initial request efficiently

* Prevent expired subscriptions by construction

* Clean up selection proof logic

* Add test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working ready-for-review The code is ready for review v5.1.0 Q2 2024 val-client Relates to the validator client binary

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants