Skip to content

*: fix attestation committee index#3820

Merged
obol-bulldozer[bot] merged 3 commits into
mainfrom
kalo/track-attestation-failures
Jun 26, 2025
Merged

*: fix attestation committee index#3820
obol-bulldozer[bot] merged 3 commits into
mainfrom
kalo/track-attestation-failures

Conversation

@KaloyanTanev
Copy link
Copy Markdown
Collaborator

@KaloyanTanev KaloyanTanev commented Jun 25, 2025

During our testing of v1.5.0-rc1 and rc2, we found an issue with clusters that had Nimbus VCs. Upon closer look an incompatibility was found.

Pre electra, VCs upon asking for attestation data from the BN, they were required to specify the committee_index of the validator. Given that the committee index was removed from the signed data post electra, the endpoint staid the same, but it is now required to simply put committee_index=0. However, most VCs do not comply with that and continue asking for specific committee index. BNs on their hand, simply ignore this field and return the data.

On our end as Charon we store the unsigned attestation data in a map, where part of its key structure is the committee index itself. We continued asking for all committee indices in our fetcher, so we saved multiple entries as pre electra. This worked fine, as VCs were asking as well for specific committee indices. BNs on their end are caching the result, so the responses were almost instant.

Nimbus VC though was the only VC working correctly and asking only for attestation data for committee_index=0. This resulted in failures to fetch.

Example:

  1. Fetcher of a leader non-Nimbus fetched for indices 1, 5, 10.
  2. Nimbus VC asked for attestation data for index 0.
  3. Charon of Nimbus VC had only for indices 1, 5, 10 and its request to fetch attestation data for index 0 timed out.

Given that some VCs ask for the specific committee indices and some ask for only 0, this fix keeps the existing fetcher logic to ask for all, but also adds committee index 0 in Charon's DB. This way we have duplicated data, but we ensure compatibility. Note that Charon's memory usage will increase slightly. Once we can confirm all VCs will ask only for committee index 0, we can remove the fetcher logic to ask for all committee indices and reduce Charon's memory usage by a lot.

Note that Lodestar had this enabled only for DVs. It should be fixed in their next release. It is likely the case is similar with other clients as well.

category: bug
ticket: none

@KaloyanTanev KaloyanTanev self-assigned this Jun 25, 2025
@KaloyanTanev KaloyanTanev marked this pull request as draft June 25, 2025 16:27
@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 25, 2025

Codecov Report

Attention: Patch coverage is 60.41667% with 38 lines in your changes missing coverage. Please review.

Project coverage is 54.13%. Comparing base (f48d03a) to head (2e506f9).
Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
app/eth2wrap/utils.go 47.50% 16 Missing and 5 partials ⚠️
app/app.go 0.00% 11 Missing ⚠️
core/dutydb/memory.go 80.00% 4 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3820      +/-   ##
==========================================
- Coverage   54.15%   54.13%   -0.03%     
==========================================
  Files         227      227              
  Lines       35735    35806      +71     
==========================================
+ Hits        19354    19383      +29     
- Misses      14254    14289      +35     
- Partials     2127     2134       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread core/fetcher/fetcher.go
awaitAttDataFunc func(ctx context.Context, slot, commIdx uint64) (*eth2p0.AttestationData, error)
builderEnabled bool
graffitiBuilder *GraffitiBuilder
electraSlot eth2p0.Slot
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is currently not utilised. It will be once all VCs start asking. Then, if we are post electra, we can safely ask the BN only for 1 attestation data - with a committee index 0.

If there are strong objections not to keep it in the meantime, I'm OK with removing it.

@KaloyanTanev KaloyanTanev marked this pull request as ready for review June 26, 2025 08:51
@KaloyanTanev KaloyanTanev requested review from DiogoSantoss and pinebit and removed request for pinebit June 26, 2025 08:51
@sonarqubecloud
Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
27.5% Duplication on New Code (required ≤ 15%)

See analysis details on SonarQube Cloud

@KaloyanTanev KaloyanTanev requested a review from pinebit June 26, 2025 08:51
@KaloyanTanev
Copy link
Copy Markdown
Collaborator Author

Sonarcloud is complaining about the repeated Endpoint: "/eth/v1/config/spec", in beaconmock.go, however, I don't see this as an issue at all... If it has to be refactored not to be duplicated, it will make it more unreadable.

@KaloyanTanev KaloyanTanev added the merge when ready Indicates bulldozer bot may merge when all checks pass label Jun 26, 2025
@obol-bulldozer obol-bulldozer Bot merged commit 79e96f8 into main Jun 26, 2025
11 of 12 checks passed
@obol-bulldozer obol-bulldozer Bot deleted the kalo/track-attestation-failures branch June 26, 2025 10:46
KaloyanTanev added a commit that referenced this pull request Jun 26, 2025
During our testing of v1.5.0-rc1 and rc2, we found an issue with clusters that had Nimbus VCs. Upon closer look an incompatibility was found.

Pre electra, VCs upon asking for attestation data from the BN, they were required to specify the `committee_index` of the validator. Given that the committee index was removed from the signed data post electra, the endpoint staid the same, but it is now required to simply put `committee_index=0`. However, most VCs do not comply with that and continue asking for specific committee index. BNs on their hand, simply ignore this field and return the data.

On our end as Charon we store the unsigned attestation data in a map, where part of its key structure is the committee index itself. We continued asking for all committee indices in our fetcher, so we saved multiple entries as pre electra. This worked fine, as VCs were asking as well for specific committee indices. BNs on their end are caching the result, so the responses were almost instant.

Nimbus VC though was the only VC working correctly and asking only for attestation data for `committee_index=0`. This resulted in failures to fetch.

Example:
1. Fetcher of a leader non-Nimbus fetched for indices 1, 5, 10.
2. Nimbus VC asked for attestation data for index 0.
3. Charon of Nimbus VC had only for indices 1, 5, 10 and its request to fetch attestation data for index 0 timed out.

Given that some VCs ask for the specific committee indices and some ask for only 0, this fix keeps the existing fetcher logic to ask for all, but also adds committee index 0 in Charon's DB. This way we have duplicated data, but we ensure compatibility. Note that Charon's memory usage will increase slightly. Once we can confirm all VCs will ask only for committee index 0, we can remove the fetcher logic to ask for all committee indices and reduce Charon's memory usage by a lot.

Note that Lodestar had this enabled only for DVs. It [should be fixed](ChainSafe/lodestar#7958) in their next release. It is likely the case is similar with other clients as well.

category: bug
ticket: none
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge when ready Indicates bulldozer bot may merge when all checks pass

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants