Added janitor task for automatic deposit recovery with Prometheus metrics #1095

Merged
sigurpol merged 10 commits into main from clear_old_round_data
Jun 20, 2025

@sigurpol sigurpol commented Jun 19, 2025

Implements automatic cleanup of old discarded submissions to reclaim deposits.
Close #980.

Features:

  • Triggers when the election round increments (runs once per new round)
  • Scans at most the last 5 rounds for old submissions (normally only the previous round needs clearing)
  • Calls clear_old_round_data() to recover deposits and clean storage
  • Non-blocking integration with existing mining operations
  • Comprehensive error handling (critical vs recoverable errors)
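
As a rough illustration of the round-based trigger and the bounded scan described above, here is a minimal sketch. `RoundTracker`, `observe`, and `rounds_to_scan` are hypothetical names that do not come from this PR, and the newest-first scan order is an assumption; the actual implementation may differ.

```rust
/// Hypothetical helper (not from the PR): remembers the last round seen so
/// that cleanup is triggered once per new round.
#[derive(Debug, Default)]
pub struct RoundTracker {
    last_round: Option<u32>,
}

impl RoundTracker {
    /// Returns `true` only when `round` is greater than the previously
    /// observed round. The very first observation returns `false`, because
    /// no increment has been seen yet (an assumption of this sketch).
    pub fn observe(&mut self, round: u32) -> bool {
        let incremented = matches!(self.last_round, Some(prev) if round > prev);
        self.last_round = Some(round);
        incremented
    }
}

/// Rounds to scan for old submissions, newest first: at most `max_rounds`
/// rounds strictly before `current_round`, never going below round 0.
pub fn rounds_to_scan(current_round: u32, max_rounds: u32) -> Vec<u32> {
    let oldest = current_round.saturating_sub(max_rounds);
    (oldest..current_round).rev().collect()
}
```

With a current round of 10 and a limit of 5, this sketch yields rounds 9 down to 5, so the previous round (the usual case) is checked first.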

Architecture

  • Introduce JanitorMessage enum and janitor_task for deposit recovery
  • Use dedicated bounded channel for janitor communication
  • Update listener to send janitor ticks to janitor task
  • Ensure mining and janitor operations are independent and non-blocking
┌──────────────────────────────────────────────────────────────────────────┐
│   ┌─────────────┐                      ┌─────────────┐            ┌─────────────┐
└──▶│ Listener    │                      │   Miner     │            │ Blockchain  │
    │             │  Snapshot/Signed     │             │            │             │
    │ ┌─────────┐ │ ────────────────────▶│ ┌─────────┐ │ (solutions)│             │
    │ │ Stream  │ │  (mining work)       │ │ Mining  │ │───────────▶│             │
    │ └─────────┘ │                      │ └─────────┘ │            │             │
    │      │      │  Round++             │ ┌─────────┐ │            │             │
    │      ▼      │ ────────────────────▶│ │ Clear   │ │            │             │
    │ ┌─────────┐ │                      │ │ Snapshot│ │            │             │
    │ │ Phase   │ │                      │ └─────────┘ │            │             │
    │ │ Check   │ │  Round++             └─────────────┘            │             │
    │ └─────────┘ │ ────────────────────▶┌─────────────┐            │             │
    │             │  (deposit cleanup)   │  Janitor    │ (cleanup)  │             │
    │             │                      │ ┌─────────┐ │───────────▶│             │
    │             │                      │ │ Cleanup │ │            │             │
    │             │                      │ └─────────┘ │            │             │
    └─────────────┘                      └─────────────┘            └─────────────┘
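
The listener-to-janitor hand-off in the diagram can be sketched with a bounded channel. This is only an illustration under stated assumptions: it uses `std::sync::mpsc::sync_channel` and a plain thread to stay dependency-free, whereas the miner's own channel and task types are not shown in this PR description. The `Tick` variant, `send_tick`, and `spawn_janitor` are hypothetical names, and dropping a tick when the queue is full is an assumed policy, not a confirmed one.

```rust
use std::sync::mpsc::{Receiver, SyncSender, TrySendError};
use std::thread;

/// Message sent from the listener to the janitor.
/// The enum name comes from the PR; the variant is a hypothetical example.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JanitorMessage {
    /// A new round started; carries the current round number.
    Tick { current_round: u32 },
}

/// Listener side: never blocks. Returns `false` if the tick could not be
/// queued (queue full or janitor gone), so mining work is not delayed.
pub fn send_tick(tx: &SyncSender<JanitorMessage>, current_round: u32) -> bool {
    match tx.try_send(JanitorMessage::Tick { current_round }) {
        Ok(()) => true,
        Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => false,
    }
}

/// Janitor side: runs independently of the miner and returns the rounds it
/// handled once every sender has been dropped.
pub fn spawn_janitor(rx: Receiver<JanitorMessage>) -> thread::JoinHandle<Vec<u32>> {
    thread::spawn(move || {
        let mut handled = Vec::new();
        for msg in rx {
            match msg {
                JanitorMessage::Tick { current_round } => {
                    // Placeholder: the real task would scan old rounds and
                    // call clear_old_round_data() here.
                    handled.push(current_round);
                }
            }
        }
        handled
    })
}
```

Because the channel is bounded and the send is non-blocking, a slow cleanup (about 36 seconds in the logs below) cannot stall the listener or the miner in this sketch.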

Prometheus Metrics Added:

Counters:

  • staking_miner_janitor_cleanup_success_total: Successful cleanup operations
  • staking_miner_janitor_cleanup_failures_total: Failed cleanup operations

Gauges:

  • staking_miner_janitor_cleanup_duration_ms: Time taken for last cleanup
  • staking_miner_janitor_old_submissions_found: Old submissions discovered
  • staking_miner_janitor_old_submissions_cleared: Submissions successfully cleared

Key Metrics:

  • Success rate: success_total / (success_total + failures_total)
  • Performance: avg(cleanup_duration_ms)
  • Activity: old_submissions_cleared shows actual deposit recovery

Example Prometheus queries:

staking_miner_janitor_cleanup_success_total / (staking_miner_janitor_cleanup_success_total + staking_miner_janitor_cleanup_failures_total)

increase(staking_miner_janitor_cleanup_success_total[24h])

staking_miner_janitor_old_submissions_cleared / staking_miner_janitor_old_submissions_found
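
For reference, the two ratios above can be written out as plain arithmetic. This is a minimal sketch with hypothetical function names (`safe_ratio`, `success_rate`, `clearance_ratio`); it only makes the zero-denominator case explicit, which arises before any cleanup has run, and is not code from the miner.

```rust
/// Divides `numerator` by `denominator`, returning `None` when the
/// denominator is zero (for example, before any cleanup has been attempted).
pub fn safe_ratio(numerator: u64, denominator: u64) -> Option<f64> {
    if denominator == 0 {
        None
    } else {
        Some(numerator as f64 / denominator as f64)
    }
}

/// Success rate: success_total / (success_total + failures_total).
pub fn success_rate(success_total: u64, failures_total: u64) -> Option<f64> {
    // checked_add guards against overflow of the combined counter values.
    let attempts = success_total.checked_add(failures_total)?;
    safe_ratio(success_total, attempts)
}

/// Share of discovered old submissions that were actually cleared.
pub fn clearance_ratio(cleared: u64, found: u64) -> Option<f64> {
    safe_ratio(cleared, found)
}
```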

Logs

Example logs from a run in which a previous submission is cleared:

2025-06-20T20:09:03.221683Z DEBUG polkadot-staking-miner: Detected round increment 9 -> 10    
2025-06-20T20:09:03.221707Z TRACE polkadot-staking-miner: Sent janitor tick for round 10    
2025-06-20T20:09:03.221711Z DEBUG polkadot-staking-miner: Round increment in Off phase, signaling snapshot cleanup    
2025-06-20T20:09:03.221718Z TRACE polkadot-staking-miner: Block #792, Phase Off - nothing to do    
2025-06-20T20:09:03.221731Z TRACE polkadot-staking-miner: Running janitor cleanup for round 10    
2025-06-20T20:09:03.221745Z TRACE polkadot-staking-miner: Clearing snapshots    
2025-06-20T20:09:03.221749Z TRACE polkadot-staking-miner: Scanning round 9 for old submissions (current round: 10, scanning rounds from 9)    
2025-06-20T20:09:03.222313Z DEBUG polkadot-staking-miner: Found old submission in round 9 with 4 pages, attempting cleanup    
2025-06-20T20:09:03.222324Z DEBUG polkadot-staking-miner: Clearing old round data for round 9 with 4 witness pages    
2025-06-20T20:09:11.232603Z TRACE polkadot-staking-miner: Block #793, Phase Off - nothing to do    
2025-06-20T20:09:15.243377Z TRACE polkadot-staking-miner: Block #794, Phase Off - nothing to do    
2025-06-20T20:09:23.253658Z TRACE polkadot-staking-miner: Block #795, Phase Off - nothing to do    
2025-06-20T20:09:27.257928Z TRACE polkadot-staking-miner: Block #796, Phase Off - nothing to do    
2025-06-20T20:09:35.275623Z TRACE polkadot-staking-miner: Block #797, Phase Off - nothing to do    
2025-06-20T20:09:39.278849Z DEBUG polkadot-staking-miner: Successfully submitted clear_old_round_data for round 9    
2025-06-20T20:09:39.278892Z  INFO polkadot-staking-miner: Successfully cleaned up old submission from round 9 (4 witness pages)    
2025-06-20T20:09:39.278901Z  INFO polkadot-staking-miner: Janitor cleaned up 1 old submissions in 36057ms    

Integration test

An integration test now verifies the scenario in which two miners submit identical solutions. Both submissions succeed; one is rewarded, and the other is discarded after the miner with the non-winning solution explicitly calls clear_old_round_data().

2025-06-20T13:13:08.930566Z  INFO monitor: Bob solution discarded!    
2025-06-20T13:13:08.998134Z  INFO monitor: 🤑 Successfully completed two-miner test: both submitted solutions, one rewarded, one discarded! Duration: 1179.356788208s 🤑    
test submit_works ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 1179.38s

sigurpol added 3 commits June 19, 2025 16:17
Implements automatic cleanup of old discarded submissions to reclaim deposits:

Features:
- Triggers on Done → Off phase transitions (first Off block only)
- Scans last 5 rounds for old submissions
- Calls clear_old_round_data() to recover deposits and clean storage
- Non-blocking integration with existing mining operations
- Comprehensive error handling (critical vs recoverable errors)


Update janitor logic to run when transitioning from Done or Export phase
to Off, not just Done. Improve log message to include previous phase.

- Introduce JanitorMessage enum and janitor_task for deposit recovery
- Use dedicated bounded channel for janitor communication
- Update listener to send janitor ticks to janitor task
- Improve documentation and diagrams to reflect new architecture
- Ensure mining and janitor operations are independent and non-blocking
@sigurpol sigurpol force-pushed the clear_old_round_data branch from 033a2ff to 912685c Compare June 19, 2025 17:15
@sigurpol

My usual suspects as reviewers are all OOO these days. This PR is needed for Kusama AH migration.
@jsdw, @Overkillus or @tdimitrov: if you have time to spare / waste, I would appreciate a review 🙏 🙇

@sigurpol sigurpol force-pushed the clear_old_round_data branch from e595dfb to 64678c1 Compare June 20, 2025 11:04
We expect to see one solution rewarded and the other discarded.
Comment on lines +136 to +137
This ensures that deposits from unsuccessful submissions are automatically recovered, maintaining
the economic viability of long-term mining operations.
@jsdw jsdw Jun 20, 2025

For somebody who doesn't know this stuff so well: if this didn't exist, would the old deposits be locked up forever without such a "cleanup" phase to claim them back?

Contributor Author

Correct, there is no automatic reclaim: the election pallet does not automatically return deposits for valid solutions that are not the best one.

@jsdw jsdw Jun 20, 2025

Nice job on the README rewrite/tidyup!

@jsdw jsdw left a comment

I am not an expert in the specifics of staking, but the code looks clean and the approach makes sense to me, modulo a couple of tiny comments; nice one!

sigurpol added 3 commits June 20, 2025 20:39
The change switches from phase-based to round-based triggers for the
janitor cleanup task and for clearing the snapshot.
@sigurpol sigurpol force-pushed the clear_old_round_data branch from 3e19af9 to 138a3d9 Compare June 20, 2025 20:10
@sigurpol sigurpol merged commit 0e7b1c2 into main Jun 20, 2025
10 checks passed
@sigurpol sigurpol deleted the clear_old_round_data branch June 20, 2025 20:37
Development

Successfully merging this pull request may close these issues.

multi block miner: reclaim deposit
