-
Notifications
You must be signed in to change notification settings - Fork 20
Added janitor task for automatic deposit recovery with Prometheus metrics #1095
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 6 commits
80b5b66
6581550
db91666
f1218d0
64678c1
68cda80
1dbb867
d2eba32
138a3d9
29cff01
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2,37 +2,79 @@ | |
|
|
||
| [](https://github.com/paritytech/polkadot-staking-miner/actions/workflows/nightly.yml) | ||
|
|
||
| This is a re-write of the [staking miner](https://github.com/paritytech/polkadot/tree/master/utils/staking-miner) using [subxt](https://github.com/paritytech/subxt) to avoid hard dependency to each runtime version. | ||
| ## Why do we need a staking miner? | ||
|
|
||
| A key difference is that the miner only interacts with AssetHub nodes that support multi-page async staking and in particular the new multi-phase, multi-block election provider pallet (`EPMB`), while the old legacy miner is designed to work with the previous multi-phase, single-page election provider pallet. | ||
| Polkadot uses a **Nominated Proof-of-Stake (NPoS)** consensus mechanism where validators are | ||
| selected through a complex election process. This election determines the active validators in the | ||
| next era and how stake is distributed among them to maximize security and decentralization. | ||
|
|
||
| The binary itself embeds [multi-block static metadata](./artifacts/multi_block.scale) to | ||
| generate a rust codegen at compile-time that [subxt provides](https://github.com/paritytech/subxt). | ||
| The nomination process in a nutshell: | ||
|
|
||
| Runtime upgrades are handled by the polkadot-staking-miner by upgrading storage constants | ||
| and that will work unless there is a breaking change in the pallets `pallet-election-provider-multi-block` or `frame_system`. | ||
| ```text | ||
| Nominators → Stake DOT → Nominate Validators → Validator Set Selection | ||
| ``` | ||
|
|
||
| Because detecting breaking changes when connecting to a RPC node when using | ||
| `polkadot-staking-miner` is hard, this repo performs daily integration tests | ||
| against `polkadot master` and in most cases [updating the metadata](#update-metadata) | ||
| and fixing the compile errors are sufficient. | ||
| where | ||
|
|
||
| It's also possible to use the `info` command to check whether the metadata | ||
| embedded in the binary is compatible with a remote node, [see the info the command for further information](#info-command) | ||
| - Nominators choose validators they trust with their stake | ||
| - Phragmén algorithm selects the optimal validator with the following goals: | ||
| - **Maximize total stake**: Select validators with the most backing stake | ||
| - **Minimize variance**: Distribute stake as evenly as possible among selected validators | ||
| - **Support diversity**: Enable smaller validators to participate | ||
sigurpol marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| Each release will specify which runtime version it was tested against but | ||
| it's not possible to know in advance which runtimes it will work with. | ||
| This optimization is computationally intensive and cannot _yet_ be solved on-chain due to block time | ||
| and resource constraints. Instead, the blockchain runs a **multi-phase election**: | ||
|
|
||
| Thus, it's important to subscribe to releases to this repo or | ||
| add some logic that triggers an alert once the polkadot-staking-miner crashes. | ||
| 1. **Snapshot Phase**: Collect all staking data (validators, nominators, stakes) | ||
| 2. **Signed Phase**: Off-chain actors (miners) compute optimal solutions and submit them | ||
| 3. **Unsigned Phase**: Validators can submit solutions if no good ones were found | ||
| 4. **Export Phase**: The best solution is chosen and applied | ||
|
|
||
| ## Important Warning | ||
| **Staking miners** are essential participants who: | ||
|
|
||
| **Do not run multiple instances of the miner with the same account.** | ||
| - Compute high-quality election solutions off-chain | ||
| - Submit solutions during the signed phase to earn rewards | ||
| - Compete to provide the most optimal validator selection | ||
| - Ensure the election completes successfully even under high load | ||
|
|
||
| The miner uses the on-chain nonce for a given user to submit solutions, which can lead to nonce | ||
| collisions if multiple miners are running with the same account. This can cause transaction | ||
| failures and potentially result in lost rewards or other issues. | ||
| Without miners, elections could fail or produce suboptimal results, compromising network security | ||
| and decentralization. | ||
|
|
||
| More details about staking on Polkadot can be found [here](https://wiki.polkadot.network/learn/learn-staking/). | ||
|
|
||
| ## Overview | ||
|
|
||
| This is a re-write of the | ||
| [staking miner](https://github.com/paritytech/polkadot/tree/master/utils/staking-miner) using | ||
| [subxt](https://github.com/paritytech/subxt) to avoid hard dependency to each runtime version. | ||
|
|
||
| A key difference is that the miner only interacts with AssetHub nodes that support multi-page async | ||
sigurpol marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| staking and in particular the new multi-phase, multi-block election provider pallet (`EPMB`), while | ||
| the old legacy miner is designed to work with the previous multi-phase, single-page election | ||
| provider pallet. | ||
|
|
||
| The miner includes an automatic deposit recovery system that cleans up old discarded submissions to | ||
| reclaim locked deposits, ensuring optimal economic efficiency for operators. | ||
|
|
||
| The binary itself embeds [multi-block static metadata](./artifacts/multi_block.scale) to generate a | ||
| rust codegen at compile-time that [subxt provides](https://github.com/paritytech/subxt). | ||
|
|
||
| Runtime upgrades are handled by the polkadot-staking-miner by upgrading storage constants and that | ||
| will work unless there is a breaking change in the pallets `pallet-election-provider-multi-block` or | ||
| `frame_system`. | ||
|
|
||
| Because detecting breaking changes when connecting to a RPC node when using `polkadot-staking-miner` | ||
| is hard, this repo performs daily integration tests against `polkadot master` and in most cases | ||
| [updating the metadata](#update-metadata) and fixing the compile errors are sufficient. | ||
|
|
||
| It's also possible to use the `info` command to check whether the metadata embedded in the binary is | ||
| compatible with a remote node, [see the info the command for further information](#info-command) | ||
|
|
||
| Each release will specify which runtime version it was tested against but it's not possible to know | ||
| in advance which runtimes it will work with. | ||
|
|
||
| Thus, it's important to subscribe to releases to this repo or add some logic that triggers an alert | ||
| once the polkadot-staking-miner crashes. | ||
|
|
||
| ## Usage | ||
|
|
||
|
|
@@ -78,11 +120,26 @@ Here are some notable options you can use with the command: | |
|
|
||
| Refer to `--help` for the full list of options. | ||
|
|
||
| NOTE: the miner only listens to finalized blocks. Listening to the best blocks is not offered as an option. | ||
| NOTE: the miner only listens to finalized blocks. Listening to the best blocks is not offered as an | ||
| option. | ||
|
|
||
| ### Automatic Deposit Recovery | ||
|
|
||
| The miner automatically handles deposit recovery for discarded submissions: | ||
|
|
||
| - **Automatic Cleanup**: Scans for old submissions from previous rounds that were not selected as | ||
| winners | ||
| - **Deposit Recovery**: Calls `clear_old_round_data()` to reclaim locked deposits from discarded | ||
| solutions | ||
| - **Triggered on Phase Transitions**: Activates when transitioning from Done/Export to Off phase | ||
|
|
||
| This ensures that deposits from unsuccessful submissions are automatically recovered, maintaining | ||
| the economic viability of long-term mining operations. | ||
|
Comment on lines
+136
to
+137
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For somebody who doesn't know this stuff so well: if this didn't exist, would the old deposits be locked up forever without such a "cleanup" phase to claim them back?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. correct, there is no automatic reclaim, the election pallet does not automatically return deposits for valid not best solutions. |
||
|
|
||
| ### Prepare your SEED | ||
|
|
||
| While you could pass your seed directly to the cli or Docker, this is highly **NOT** recommended. Instead, you should use an ENV variable. | ||
| While you could pass your seed directly to the cli or Docker, this is highly **NOT** recommended. | ||
| Instead, you should use an ENV variable. | ||
|
|
||
| You can set it manually using: | ||
|
|
||
|
|
@@ -91,8 +148,8 @@ You can set it manually using: | |
| SEED=0x1234... | ||
| ``` | ||
|
|
||
| Alternatively, for development, you may store your seed in a `.env` file that | ||
| can be loaded to make your seed available: | ||
| Alternatively, for development, you may store your seed in a `.env` file that can be loaded to make | ||
| your seed available: | ||
|
|
||
| `.env`: | ||
|
|
||
|
|
@@ -119,6 +176,14 @@ docker run --rm -it \ | |
| polkadot-staking-miner monitor | ||
| ``` | ||
|
|
||
| ### Important Warning | ||
|
|
||
| **Do not run multiple instances of the miner with the same account.** | ||
|
|
||
| The miner uses the on-chain nonce for a given user to submit solutions, which can lead to nonce | ||
| collisions if multiple miners are running with the same account. This can cause transaction failures | ||
| and potentially result in lost rewards or other issues. | ||
|
|
||
| ## Update metadata | ||
|
|
||
| The static metadata files are stored at [artifacts/multi_block.scale](artifacts/multi_block.scale). | ||
|
|
@@ -138,9 +203,9 @@ $ subxt codegen --file artifacts/metadata.scale | rustfmt > code.rs | |
|
|
||
| ## Test locally | ||
|
|
||
| Because elections occurs quite rarely and if you want to test it locally | ||
| it's possible build the polkadot binary with `feature --fast-runtime` | ||
| to ensure that elections occurs more often (in order of minutes rather hours/days). | ||
| Because elections occurs quite rarely and if you want to test it locally it's possible build the | ||
| polkadot binary with `feature --fast-runtime` to ensure that elections occurs more often (in order | ||
| of minutes rather hours/days). | ||
|
|
||
| ```bash | ||
| # from the root of polkadot-sdk branch | ||
|
|
@@ -154,20 +219,33 @@ $ ./build-and-run-zn.sh | |
|
|
||
| ## Prometheus metrics | ||
|
|
||
| The staking-miner starts a prometheus server on port 9999 and that metrics can | ||
| be fetched by: | ||
| The staking-miner starts a prometheus server on port 9999 and that metrics can be fetched by: | ||
|
|
||
| ```bash | ||
| $ curl localhost:9999/metrics | ||
| ``` | ||
|
|
||
| ```bash | ||
| # Core Mining Metrics | ||
| # HELP staking_miner_balance The balance of the staking miner account | ||
| # TYPE staking_miner_balance gauge | ||
| staking_miner_balance 88756574897390270 | ||
| # HELP staking_miner_mining_duration_ms The mined solution time in milliseconds. | ||
| # TYPE staking_miner_mining_duration_ms gauge | ||
| staking_miner_mining_duration_ms 50 | ||
| # HELP staking_miner_submit_and_watch_duration_ms The time in milliseconds it took to submit the solution to chain and to be included in block | ||
| # TYPE staking_miner_submit_and_watch_duration_ms gauge | ||
| staking_miner_submit_and_watch_duration_ms 17283 | ||
|
|
||
| # Submission Tracking | ||
| # HELP staking_miner_submissions_started Number of submissions started | ||
| # TYPE staking_miner_submissions_started counter | ||
| staking_miner_submissions_started 12 | ||
| # HELP staking_miner_submissions_success Number of submissions finished successfully | ||
| # TYPE staking_miner_submissions_success counter | ||
| staking_miner_submissions_success 10 | ||
|
|
||
| # Solution Quality Metrics | ||
| # HELP staking_miner_score_minimal_stake The minimal winner, in terms of total backing stake | ||
| # TYPE staking_miner_score_minimal_stake gauge | ||
| staking_miner_score_minimal_stake 24426059484170936 | ||
|
|
@@ -177,17 +255,78 @@ staking_miner_score_sum_stake 2891461667266507300 | |
| # HELP staking_miner_score_sum_stake_squared The sum squared of the total backing of all winners, aka. the variance. | ||
| # TYPE staking_miner_score_sum_stake_squared gauge | ||
| staking_miner_score_sum_stake_squared 83801161022319280000000000000000000 | ||
| # HELP staking_miner_solution_length_bytes Number of bytes in the solution submitted | ||
| # TYPE staking_miner_solution_length_bytes gauge | ||
| staking_miner_solution_length_bytes 2947 | ||
| # HELP staking_miner_solution_weight Weight of the solution submitted | ||
| # TYPE staking_miner_solution_weight gauge | ||
| staking_miner_solution_weight 8285574626 | ||
| # HELP staking_miner_submit_and_watch_duration_ms The time in milliseconds it took to submit the solution to chain and to be included in block | ||
| # TYPE staking_miner_submit_and_watch_duration_ms gauge | ||
| staking_miner_submit_and_watch_duration_ms 17283 | ||
|
|
||
| # Runtime Upgrades | ||
| # HELP staking_miner_runtime_upgrades Number of runtime upgrades performed | ||
| # TYPE staking_miner_runtime_upgrades counter | ||
| staking_miner_runtime_upgrades 2 | ||
|
|
||
| # Janitor (Deposit Recovery) Metrics | ||
| # HELP staking_miner_janitor_cleanup_success_total Total number of successful janitor cleanup operations | ||
| # TYPE staking_miner_janitor_cleanup_success_total counter | ||
| staking_miner_janitor_cleanup_success_total 3 | ||
| # HELP staking_miner_janitor_cleanup_failures_total Total number of failed janitor cleanup operations | ||
| # TYPE staking_miner_janitor_cleanup_failures_total counter | ||
| staking_miner_janitor_cleanup_failures_total 0 | ||
| # HELP staking_miner_janitor_cleanup_duration_ms The time in milliseconds it takes to complete janitor cleanup | ||
| # TYPE staking_miner_janitor_cleanup_duration_ms gauge | ||
| staking_miner_janitor_cleanup_duration_ms 1250 | ||
| # HELP staking_miner_janitor_old_submissions_found Number of old submissions found during last janitor run | ||
| # TYPE staking_miner_janitor_old_submissions_found gauge | ||
| staking_miner_janitor_old_submissions_found 2 | ||
| # HELP staking_miner_janitor_old_submissions_cleared Number of old submissions successfully cleared during last janitor run | ||
| # TYPE staking_miner_janitor_old_submissions_cleared gauge | ||
| staking_miner_janitor_old_submissions_cleared 2 | ||
| ``` | ||
|
|
||
| ## Related projects | ||
| ## Architecture | ||
|
|
||
| The polkadot-staking-miner uses a multi-task architecture to efficiently handle different | ||
| responsibilities: | ||
|
|
||
| ### Task Overview | ||
|
|
||
| The miner consists of **three independent tasks** that communicate via bounded channels: | ||
|
|
||
| 1. **Listener Task**: Monitors blockchain for phase changes and new (`finalized` only) blocks. | ||
| 2. **Miner Task**: Handles solution mining and submission operations | ||
| 3. **Janitor Task**: Manages automatic deposit recovery from old submissions | ||
|
|
||
| ### Task Communication | ||
|
|
||
| ``` | ||
| (finalized blocks) | ||
| ┌────────────────────────────────────────────────────────────────────────────────────────────────┐ | ||
| │ | | ||
| │ ┌─────────────────┐ bounded ch(1) ┌─────────────────┐ ┌─────────────────┐ | ||
| └──▶│ Listener Task │ ─────────────────────▶│ Miner Task │ │ Blockchain │ | ||
| │ │ Snapshot / Signed │ │ │ │ | ||
| │ ┌─────────────┐ │ │ ┌─────────────┐ │ (solutions) │ │ | ||
| │ │Block Stream │ │ │ │ Process │ │ ──────────────▶│ │ | ||
| │ │Subscription │ │ │ │ Block │ │ │ │ | ||
| │ └─────────────┘ │ │ └─────────────┘ │ │ │ | ||
| │ │ │ │ │ │ │ | ||
| │ ▼ │ └─────────────────┘ │ │ | ||
| │ ┌─────────────┐ │ bounded ch(1) ┌─────────────────┐ │ │ | ||
| │ │Phase Check │ │ ─────────────────────▶│ Janitor Task │ │ │ | ||
| │ │(fast) │ │ Done/Extract(0)->Off │ │ │ │ | ||
| │ └─────────────┘ │ │ ┌─────────────┐ │ (cleanup) │ │ | ||
| │ │ │ │ Cleanup │ │ ──────────────▶│ │ | ||
| │ │ │ │ Old Rounds │ │ │ │ | ||
| │ │ │ └─────────────┘ │ │ │ | ||
| └─────────────────┘ └─────────────────┘ └─────────────────┘ | ||
| ``` | ||
|
|
||
| ### Key Design Principles | ||
|
|
||
| - **Separation of Concerns**: Each task has a single, well-defined responsibility | ||
| - **Non-Blocking Operations**: Tasks operate independently without blocking each other | ||
| - **Backpressure Handling**: Bounded channels prevent memory bloat and ensure fresh work | ||
| - **Fault Isolation**: Task failures are isolated and handled appropriately | ||
| - **Performance Optimization**: Mining operations are never delayed by cleanup tasks | ||
|
|
||
| ### Benefits | ||
|
|
||
| - [substrate-etl](https://github.com/gpestana/substrate-etl) - a tool fetch state from substrate-based chains. | ||
| - **Reliable Mining**: Core mining functionality is never blocked by ancillary operations | ||
| - **Automatic Maintenance**: Deposit recovery runs seamlessly in the background | ||
| - **Scalable Architecture**: Tasks can be optimized independently for their workloads | ||
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job on the README rewrite/tidyup!