Added janitor task for automatic deposit recovery with Prometheus metrics #1095
Implements automatic cleanup of old discarded submissions to reclaim deposits.

Features:
- Triggers on Done → Off phase transitions (first Off block only)
- Scans last 5 rounds for old submissions
- Calls clear_old_round_data() to recover deposits and clean storage
- Non-blocking integration with existing mining operations
- Comprehensive error handling (critical vs recoverable errors)

Prometheus Metrics Added:

Counters:
- staking_miner_janitor_cleanup_success_total: Successful cleanup operations
- staking_miner_janitor_cleanup_failures_total: Failed cleanup operations

Gauges:
- staking_miner_janitor_cleanup_duration_ms: Time taken for last cleanup
- staking_miner_janitor_old_submissions_found: Old submissions discovered
- staking_miner_janitor_old_submissions_cleared: Submissions successfully cleared

Key Metrics:
- Success rate: success_total / (success_total + failures_total)
- Performance: avg(cleanup_duration_ms)
- Activity: old_submissions_cleared shows actual deposit recovery

Example Prometheus queries:
- staking_miner_janitor_cleanup_success_total / (staking_miner_janitor_cleanup_success_total + staking_miner_janitor_cleanup_failures_total)
- increase(staking_miner_janitor_cleanup_success_total[24h])
- staking_miner_janitor_old_submissions_cleared / staking_miner_janitor_old_submissions_found
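For readers who want to sanity-check the two ratio metrics offline, here is a minimal Rust sketch of the same arithmetic. The `JanitorMetrics` struct and its field names are hypothetical (they only mirror the metric names listed in the commit message) and are not the miner's actual types. The sketch returns `None` when the denominator is zero, for example before the first cleanup has run.

```rust
/// Hypothetical snapshot of the janitor metrics named in the commit message.
/// This is an illustration, not the staking miner's real metrics type.
#[derive(Debug, Clone, Copy, Default)]
pub struct JanitorMetrics {
    pub cleanup_success_total: u64,
    pub cleanup_failures_total: u64,
    pub old_submissions_found: u64,
    pub old_submissions_cleared: u64,
}

impl JanitorMetrics {
    /// success_total / (success_total + failures_total).
    /// Returns `None` while no cleanup has been attempted.
    pub fn success_rate(&self) -> Option<f64> {
        let total = self.cleanup_success_total + self.cleanup_failures_total;
        if total == 0 {
            None
        } else {
            Some(self.cleanup_success_total as f64 / total as f64)
        }
    }

    /// old_submissions_cleared / old_submissions_found.
    /// Returns `None` when no old submissions were found.
    pub fn cleared_ratio(&self) -> Option<f64> {
        if self.old_submissions_found == 0 {
            None
        } else {
            Some(self.old_submissions_cleared as f64 / self.old_submissions_found as f64)
        }
    }
}
```

A dashboard built on the raw queries may want a similar guard, since the denominators are zero until the janitor has run at least once.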
Update janitor logic to run when transitioning from Done or Export phase to Off, not just Done. Improve log message to include previous phase.
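The trigger described in this commit can be sketched as a pure function over the previous and current phase. The `Phase` enum below is a simplified, hypothetical stand-in for the election phases (the real type carries more data), and `should_run_janitor` is an illustrative name, not a function from the codebase.

```rust
/// Simplified, hypothetical view of the election phases.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Off,
    Signed,
    SignedValidation,
    Unsigned,
    Done,
    Export,
}

/// Returns true only on the first `Off` block after `Done` or `Export`.
/// Subsequent `Off` blocks have `previous == Off` and do not trigger again.
pub fn should_run_janitor(previous: Phase, current: Phase) -> bool {
    matches!(
        (previous, current),
        (Phase::Done | Phase::Export, Phase::Off)
    )
}
```

Keeping the decision in a pure function like this makes the "first Off block only" behavior easy to unit test without a running chain.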
- Introduce JanitorMessage enum and janitor_task for deposit recovery
- Use dedicated bounded channel for janitor communication
- Update listener to send janitor ticks to janitor task
- Improve documentation and diagrams to reflect new architecture
- Ensure mining and janitor operations are independent and non-blocking
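A rough sketch of the channel layout described above, using the standard library's bounded `sync_channel` as a stand-in for whatever async channel the miner actually uses (an assumption). The `Tick { current_round }` variant, `notify_janitor`, and the return value of `janitor_task` are hypothetical. The point is that the listener uses `try_send`, so a busy janitor never blocks the listener or the mining path.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical message type; the real enum may carry different data.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JanitorMessage {
    Tick { current_round: u32 },
}

/// Listener side: try to hand a tick to the janitor without blocking.
/// Returns false if the bounded queue is full or the janitor has exited;
/// the tick is simply dropped in that case.
pub fn notify_janitor(tx: &SyncSender<JanitorMessage>, round: u32) -> bool {
    match tx.try_send(JanitorMessage::Tick { current_round: round }) {
        Ok(()) => true,
        Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => false,
    }
}

/// Janitor side: drain ticks until every sender is dropped.
/// Here the "cleanup" only records the round it was asked to handle.
pub fn janitor_task(rx: Receiver<JanitorMessage>) -> Vec<u32> {
    let mut handled = Vec::new();
    for msg in rx {
        match msg {
            JanitorMessage::Tick { current_round } => handled.push(current_round),
        }
    }
    handled
}
```

With a capacity of 1, a second tick sent while the first is still queued is rejected rather than queued, which is one simple way to keep the two tasks independent.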
My usual suspects as reviewers are all OOO these days. This PR is needed for Kusama AH migration.
We expect to see one solution rewarded and the other discarded.
> This ensures that deposits from unsuccessful submissions are automatically recovered, maintaining the economic viability of long-term mining operations.
For somebody who doesn't know this stuff so well: if this didn't exist, would the old deposits be locked up forever without such a "cleanup" phase to claim them back?
Correct, there is no automatic reclaim: the election pallet does not automatically return deposits for solutions that are valid but not the best.
Nice job on the README rewrite/tidyup!
jsdw left a comment
I am not an expert in the specifics of staking, but the code looks clean and the approach makes sense to me, modulo a couple of tiny comments; nice one!
The change switches from phase-based to round-based triggers for the janitor cleanup task and for clearing the snapshot.
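One way to picture a round-based trigger is a small tracker that fires when the observed round number increases. This is only a sketch under assumptions: `RoundTracker` and `observe` are hypothetical names, and the choice not to fire on the very first observation is made here for illustration, not taken from the PR.

```rust
/// Hypothetical helper that remembers the last round seen by the listener.
#[derive(Debug, Default)]
pub struct RoundTracker {
    last_round: Option<u32>,
}

impl RoundTracker {
    /// Records `round` and returns true only if it is greater than the
    /// previously observed round. The first observation never triggers,
    /// because there is nothing to compare against yet.
    pub fn observe(&mut self, round: u32) -> bool {
        let advanced = matches!(self.last_round, Some(prev) if round > prev);
        self.last_round = Some(round);
        advanced
    }
}
```

Compared with matching on specific phase transitions, this fires once per round regardless of which phase the chain was in when the round ended.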
Implements automatic cleanup of old discarded submissions to reclaim deposits.
Close #980.
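Features:
- Calls `clear_old_round_data()` to recover deposits and clean storage

Architecture
- Introduce `JanitorMessage` enum and `janitor_task` for deposit recovery

Prometheus Metrics Added:

Counters:
- `staking_miner_janitor_cleanup_success_total`: Successful cleanup operations
- `staking_miner_janitor_cleanup_failures_total`: Failed cleanup operations

Gauges:
- `staking_miner_janitor_cleanup_duration_ms`: Time taken for last cleanup
- `staking_miner_janitor_old_submissions_found`: Old submissions discovered
- `staking_miner_janitor_old_submissions_cleared`: Submissions successfully cleared

Key Metrics:
- Success rate: `success_total / (success_total + failures_total)`
- Performance: `avg(cleanup_duration_ms)`
- Activity: `old_submissions_cleared` shows actual deposit recovery

Example Prometheus queries:
- `staking_miner_janitor_cleanup_success_total / (staking_miner_janitor_cleanup_success_total + staking_miner_janitor_cleanup_failures_total)`
- `increase(staking_miner_janitor_cleanup_success_total[24h])`
- `staking_miner_janitor_old_submissions_cleared / staking_miner_janitor_old_submissions_found`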
Logs
An example of logs below when a previous submission is cleared
Integration test
An integration test now verifies the scenario where two miners submit identical solutions. Both solutions are successfully submitted; one is rewarded, while the other is discarded after the miner with the non-winning solution explicitly calls `clear_old_round_data()`.
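The scenario can be illustrated with a toy in-memory ledger. Everything here is hypothetical and deliberately simplified: `ToyLedger`, `reward_winner`, `clear_old`, and `janitor_sweep` are not the pallet's or the miner's API, and the deposit amounts are made up. The sketch only shows the shape of the flow: the winner's deposit is released when the round ends, the other deposit stays held until the janitor clears it in a later round, and the sweep looks back over the last 5 rounds as mentioned in the first commit.

```rust
use std::collections::HashMap;

/// Look-back window, taken from "Scans last 5 rounds" in the commit message.
pub const SCAN_DEPTH: u32 = 5;

/// Toy model: deposits held per (round, miner). Not the pallet's storage.
#[derive(Default)]
pub struct ToyLedger {
    held: HashMap<(u32, &'static str), u128>,
}

impl ToyLedger {
    /// A miner submits a solution for `round` and locks `deposit`.
    pub fn submit(&mut self, round: u32, who: &'static str, deposit: u128) {
        self.held.insert((round, who), deposit);
    }

    /// At the end of the round only the winner's deposit is released.
    pub fn reward_winner(&mut self, round: u32, winner: &'static str) -> Option<u128> {
        self.held.remove(&(round, winner))
    }

    /// Stand-in for an explicit cleanup call: releases a deposit only for
    /// rounds strictly older than `current_round`.
    pub fn clear_old(&mut self, round: u32, who: &'static str, current_round: u32) -> Option<u128> {
        if round < current_round {
            self.held.remove(&(round, who))
        } else {
            None
        }
    }
}

/// Scans the last `SCAN_DEPTH` rounds before `current_round` and returns
/// the total deposit recovered for `who`.
pub fn janitor_sweep(ledger: &mut ToyLedger, who: &'static str, current_round: u32) -> u128 {
    let oldest = current_round.saturating_sub(SCAN_DEPTH);
    (oldest..current_round)
        .filter_map(|round| ledger.clear_old(round, who, current_round))
        .sum()
}
```

In this model a sweep during the same round recovers nothing, a sweep in the next round recovers the losing deposit, and a repeated sweep is a no-op.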