Skip to content

v3.1: turbine: Fix and cleanup redundant metric (backport of #9287)#9303

Merged
steviez merged 1 commit intov3.1from
mergify/bp/v3.1/pr-9287
Nov 26, 2025
Merged

v3.1: turbine: Fix and cleanup redundant metric (backport of #9287)#9303
steviez merged 1 commit intov3.1from
mergify/bp/v3.1/pr-9287

Conversation

@mergify
Copy link

@mergify mergify bot commented Nov 26, 2025

Problem

The broadcast-process-shreds-interrupted-stats and broadcast-process-shreds-stats metrics have duplicate fields for the number of shreds per slot (probably an artifact of when we rolled merkle shreds out):

  • num_data_shreds & num_merkle_data_shreds
  • num_coding_shreds & num_merkle_coding_shreds

Additionally, there is currently a bug that will report the wrong values when a slot is interrupted. When the slot is interrupted, we call the below function to create a LAST_IN_SLOT shred which will signal to the rest of the cluster that we're abandoning the block:

if self.slot != bank.slot() {
// Finish previous slot if it was interrupted.
if !self.completed {
let shreds =
self.finish_prev_slot(keypair, bank.ticks_per_slot() as u8, process_stats);

In the function, we accumulate into the passed in stats:
fn finish_prev_slot(
&mut self,
keypair: &Keypair,
max_ticks_in_slot: u8,
stats: &mut ProcessShredsStats,
) -> Vec<Shred> {
if self.completed {
return vec![];
}
// Set the reference_tick as if the PoH completed for this slot
let reference_tick = max_ticks_in_slot;
let shreds: Vec<_> =
Shredder::new(self.slot, self.parent, reference_tick, self.shred_version)
.unwrap()
.make_merkle_shreds_from_entries(
keypair,
&[], // entries
true, // is_last_in_slot,
self.chained_merkle_root,
self.next_shred_index,
self.next_code_index,
&self.reed_solomon_cache,
stats,
)
.inspect(|shred| stats.record_shred(shred))
.collect();

but then report metrics on self.process_shred_stats without accumulating the just modified stats:
self.report_and_reset_stats(/*was_interrupted:*/ true);

Instead, the stats object now has some counters updated for the interrupted slot that will get aggregated into self.process_shred_stats (the next "active" slot) later on:
self.process_shreds_stats += *process_stats;

Assuming we generated S shreds in this function, this results in us under reporting the number of shreds for the interrupted slot by S and over reporting the number of shreds for the next slot by S

Summary of Changes

  • First commit: Operate on self.process_shred_stats within finish_prev_slot()
  • Second commit: Remove the duplicate metric and keep the num_data_shreds; including merkle seemed redundant to me since all shreds are now of the Merkle variant

There is probably some more cleanup that could be done; ie, this feels unnecessary:

.inspect(|shred| {
process_stats.record_shred(shred);
let next_index = match shred.shred_type() {
ShredType::Code => &mut self.next_code_index,
ShredType::Data => &mut self.next_shred_index,
};
*next_index = (*next_index).max(shred.index() + 1);

However, I decided to keep the PR slimmer to keep our options for BP open


This is an automatic backport of pull request #9287 done by Mergify.

The broadcast-process-shred-stats (and interrupted variant) have
duplicate fields for the number of data and coding shred. This is
probably a relic of when we rolled merkle shreds out. So, remove the
duplicate fields.

This also addresses a small issue where some shreds would be counted
in metrics for the wrong slot when we have an interrupted slot

(cherry picked from commit 89ead14)
@mergify mergify bot requested a review from a team as a code owner November 26, 2025 17:11
@codecov-commenter
Copy link

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.2%. Comparing base (8887656) to head (1d00f6b).
⚠️ Report is 1 commits behind head on v3.1.

Additional details and impacted files
@@            Coverage Diff            @@
##             v3.1    #9303     +/-   ##
=========================================
- Coverage    83.2%    83.2%   -0.1%     
=========================================
  Files         865      865             
  Lines      375625   375605     -20     
=========================================
- Hits       312710   312634     -76     
- Misses      62915    62971     +56     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@steviez steviez merged commit 2efc616 into v3.1 Nov 26, 2025
44 checks passed
@steviez steviez deleted the mergify/bp/v3.1/pr-9287 branch November 26, 2025 19:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants