Expand attestation related queues#4058
Conversation
|
By my math, a queue of 32,768 full of attestations will be 9MiB. So, I don't think we're creating any significant DoS risk by expanding this queue. Workings: >>> attestation_data = 8 + 8 + 32 + 32 + 8 + 32 + 8
>>> sig = 96
>>> bitlist = bitlist = 32768 / 64 / 8 # NUM_VALIDATORS_PER_SLOT / MAX_COMMITTEES_PER_SLOT / BITS_PER_BYTE
>>> attestation = bitlist + attestation_data + sig
>>> full_queue_of_attestations = attestation * 32768
>>> full_queue_of_attestations / 1024 / 1024
9.0 |
michaelsproul
left a comment
There was a problem hiding this comment.
Looks good on the whole, but I'm not sure how to decide on the minutiae of the different queue sizes
| const MAX_UNAGGREGATED_ATTESTATION_REPROCESS_QUEUE_LEN: usize = 8_192; | ||
| /// The maximum number of queued `Attestation` objects that reference an unknown | ||
| /// block that will be stored before we start dropping them. | ||
| const MAX_UNAGGREGATED_ATTESTATION_REPROCESS_QUEUE_LEN: usize = 32_768; |
There was a problem hiding this comment.
I wonder whether we should keep the reprocess queue limit at half the regular limit, i.e. only bump it to 16K.
I'd like to have some good basis for this decision (queueing theory?) but I don't, other than "if it ain't broke, don't fix it".
|
|
||
| /// The maximum size of the channel for re-processing work events. | ||
| const MAX_SCHEDULED_WORK_QUEUE_LEN: usize = 3 * MAX_WORK_EVENT_QUEUE_LEN / 4; | ||
| const MAX_SCHEDULED_WORK_QUEUE_LEN: usize = MAX_WORK_EVENT_QUEUE_LEN; |
There was a problem hiding this comment.
Similarly, I wonder if we should keep this at 3/4?
|
Thanks for the comments @michaelsproul. After reading those, looking back at #3239 and doing some more thinking, I'm going to close this PR in favor of #4101. Expanding the queues comes with some risk and I'm not sure that v4.0.0 is a release that needs any more risk. With #4101 we should be able to gather more information and make a better informed decision in the future, with very little v4.0.0 risk |
## Issue Addressed NA ## Proposed Changes Replaces #4058 to attempt to reduce `ERRO Failed to send scheduled attestation` spam and provide more information for diagnosis. With this PR we achieve: - When dequeuing attestations after a block is received, send only one log which reports `n` failures (rather than `n` logs reporting `n` failures). - Make a distinction in logs between two separate attestation dequeuing events. - Add more information to both log events to help assist with troubleshooting. ## Additional Info NA
## Issue Addressed NA ## Proposed Changes Replaces sigp#4058 to attempt to reduce `ERRO Failed to send scheduled attestation` spam and provide more information for diagnosis. With this PR we achieve: - When dequeuing attestations after a block is received, send only one log which reports `n` failures (rather than `n` logs reporting `n` failures). - Make a distinction in logs between two separate attestation dequeuing events. - Add more information to both log events to help assist with troubleshooting. ## Additional Info NA
Issue Addressed
NA
Proposed Changes
We've used 16,384 as the maximum length for a bunch of queues since we didn't really expect to see more than that many attestations per slot.
We would see 16,384 attestations per slot once we reach
16384 * 32 = 524288validators. We're now at ~542k 😬Having small queues is not a big deal, since the queues only fill up if we're not processing attestations. However, it does result in a bunch of
ERRO Failed to send scheduled attestationwhen we see a block later than the rest of the network."Why 32,768?", you may ask. It's the next power-of-two and it seems like a decent long-shot for the validator count in the future. I'm open to other suggestions.
Additional Info
I also ended up increasing the max size of the naive aggregation pool. I don't think this will have any effect since we don't actually expect to hit these limits. I just though it would be nice to be consistent.