feat: reclaim deposit for discarded solutions #1029

Closed
sigurpol wants to merge 18 commits into main from reclaim-deposit

Conversation

@sigurpol
Contributor

sigurpol commented Apr 22, 2025

This pull request introduces logic to track submitted rounds and automatically call the clear_submission function (which dispatches the clear_old_round_data extrinsic [1]) for any solution that is determined to be discarded (i.e., not the winning solution) in past rounds.

The monitor now checks past rounds on each block, and if a better solution has been validated or the round has ended and our submission still exists, it triggers the clearing process to reclaim the deposit.

We have adopted a queue-based approach for clearing rounds, along with a dedicated task to call clear_old_round_data. This strategy helps us avoid blocking the main event loop and potentially delaying the processing of blocks in monitor_cmd.

Dedicated Prometheus metrics have been added as well, tracking the number of failures, the queue size, and the duration of each clearing attempt.

This ensures that locked deposits for discarded solutions are properly returned to the miner without manual intervention.

Fix #980.

Driven-by: the miner now only listens to finalized blocks and not also to best blocks, avoiding potential race conditions and simplifying round submission handling.

Example:
Let's run a miner for both //Alice and //Bob in the usual Zombienet setup provided by the SDK. We are processing round 2; in round 1, Alice was rewarded and Bob's solution was discarded.

Relevant logs of the miner for Alice in round 2:

2025-04-22T07:12:17.825910Z DEBUG polkadot-staking-miner: Checking status of PAST rounds (< 2): [1]
2025-04-22T07:12:17.826488Z DEBUG polkadot-staking-miner: Submission metadata for past round 1 gone. Removing from tracking.
2025-04-22T07:12:17.826511Z  INFO polkadot-staking-miner: Removed past round 1 from local tracking.

Relevant logs for Bob in round 2:

2025-04-22T07:12:17.825890Z DEBUG polkadot-staking-miner: Checking status of PAST rounds (< 2): [1]
2025-04-22T07:12:17.826790Z  INFO polkadot-staking-miner: Submission for past round 1 detected as DISCARDED (Reason: Phase is past Signed/SignedValidation (or round gone) and metadata still exists). Attempting to clear.
2025-04-22T07:12:17.826806Z  INFO polkadot-staking-miner: Attempting to clear submission data for round 1 (16 pages)
2025-04-22T07:12:53.907228Z  INFO polkadot-staking-miner: Successfully cleared submission data for round 1 in block 0x7334857ad8a2a322198cbdbbdd788953ce52048352f6a66732447fa852b00ffa
2025-04-22T07:12:53.907293Z  INFO polkadot-staking-miner: Successfully cleared submission for past round 1
2025-04-22T07:12:53.907336Z  INFO polkadot-staking-miner: Removed past round 1 from local tracking.

On Polkadot JS I can see a multiBlockSigned.Discarded event associated with Bob after Bob has cleared his submission.

Footnotes

  1. Important distinction: bail is for self-withdrawing a solution during the active mining phase, whereas clear_old_round_data is for reclaiming deposits from solutions that were discarded after the validation phase.

sigurpol marked this pull request as draft April 22, 2025 07:52
sigurpol marked this pull request as ready for review April 22, 2025 09:44
sigurpol requested a review from niklasad1 April 22, 2025 09:45
@sigurpol
Contributor Author

@niklasad1 , @kianenigma PTAL 🙏

This commit introduces logic to track submitted
rounds and automatically call the clear_submission
function (which dispatches the clear_old_round_data
extrinsic) for any solution that is determined to
be discarded (i.e., not the winning solution) in
past rounds. The monitor now checks all previously
submitted rounds on each block, and if a better
solution has been validated or the round has ended
and our submission still exists, it triggers the
clearing process to reclaim the deposit.
This ensures that locked deposits for discarded
solutions are properly returned to the miner
without manual intervention.
if has_submitted(&storage, round, signer.account_id(), n_pages).await? {
// 2. If the solution has already been submitted:
// 2.1 Check local tracking first
if submitted_rounds.lock().await.contains_key(&round) {
Contributor


Because we support both best blocks and finalized blocks at the moment, it's possible that a block gets reverted, and then this local submitted_rounds may not be accurate with respect to the on-chain state.

Finalized blocks can't be reverted (except in hard forks, IIRC), so to support such functionality we would need to remove support for listening to best blocks, which may simplify our life a bit.

That's the reason why we are checking the state at the latest head, like:

if has_submitted(
        &utils::storage_at_head(&client, listen).await?,
        round,
        signer.account_id(),
        n_pages,
).await? { ... }

Contributor Author


Good point. What are you suggesting as the best option?

  1. Only listen to finalized blocks,
    vs.
  2. Always double-check with chain state, i.e. use local tracking as now but then validate against chain state, e.g.
// Check local tracking first, but validate against chain state
if submitted_rounds.lock().await.contains_key(&round) {
    // Verify with chain state before trusting local tracking
    let storage_head = utils::storage_at_head(&client, listen).await?;
    if has_submitted(&storage_head, round, signer.account_id(), n_pages).await? {
        // Confirmed both locally and on-chain
        return Ok(());
    } else {
        // Local tracking is wrong, remove it and continue
        untrack_round_submission(
            &submitted_rounds,
            round,
            Some(format!("Removing inconsistent local tracking for round {round}"))
        ).await;
        // Continue with submission attempt
    }
}

Contributor

@niklasad1 Apr 23, 2025


The entire point of best blocks is that they're faster and you may win against finalized blocks, but it may bite you in the ass if something got included in a block but never got finalized, or got reverted.

Personally I don't think it's worth the effort of maintaining such annoyances when we are just using finalized blocks when running the miner. My two cents: remove best-block listening completely.

This would be annoying to implement; alternatively, users who use best blocks simply have to accept that this weird stuff can happen and that best blocks are unreliable.

/cc @kianenigma what's your take on best blocks?

Contributor Author


I wouldn't be against just going with finalized blocks and checking the impact on the overall performance of the miner. I'll make a commit on top, if you agree.

Contributor Author

@sigurpol Apr 23, 2025


Changes to support ONLY finalized blocks in this commit

sigurpol force-pushed the reclaim-deposit branch 2 times, most recently from 319a705 to b71e2b3 on April 23, 2025 16:46
@sigurpol
Contributor Author

@niklasad1 , @kianenigma : CI test failures will be addressed once paritytech/polkadot-sdk#8310 is merged into polkadot-sdk, so please don't pay too much attention to them.

- Created a new helper function `storage_at_finalized` that gets
storage only from the finalized chain state
- Updated the `monitor_cmd` function to always use finalized blocks
- Replaced all instances of utils::storage_at_head(&client, listen)
with storage_at_finalized(&client) in three locations:
  - After acquiring the submission lock
  - After mining a solution
  - Before submitting a solution to check if it's better than what's
  on chain
- Removed the unused functions from utils.rs:
  - Removed storage_at_head which was replaced by our new
  storage_at_finalized function
  - Removed rpc_get_latest_head which is no longer needed since
  we're only using finalized blocks

This change ensures:

1. The miner only listens to finalized blocks, avoiding potential
race conditions
2. All storage queries are made against finalized blocks, ensuring
consistent state
3. We only submit solutions when we're certain they're better than
the current best solution on the finalized chain
This reverts commit 1a5718e and
leverages `is_solution_already_submitted`.

if maybe_submission_metadata.is_none() {
log::debug!(target: LOG_TARGET, "Submission metadata for past round {} gone. Removing from tracking.", round_to_check);
rounds_to_remove.push(round_to_check);
Contributor


remove directly instead of pushing to a vec?

Contributor Author

@sigurpol Apr 24, 2025


Wouldn't we invalidate the iterator, though? And I thought retain or similar would be problematic in this async code…

- Introduce a queue-based approach where we enqueue rounds to clear and
  a task processes the queue. We do not consider only the most recent
  past rounds, but we still cap the size of the queue.
- Add Prometheus metrics to monitor the clearing-round mechanism
@kianenigma
Contributor

kianenigma commented Apr 25, 2025

I would do this slightly differently; let me first elaborate on that before I do a more detailed review.

Note that I am not against this; if it works and the code is clear to you, you can keep the current approach.

now checks past rounds on each block

and the fact that you keep internal state of what you have submitted is my main point of disagreement. I would have done this more simply:

  1. Keep the event loop of monitor-and-submit as separate as possible. This thread should always look at the current round and, if in Phase::Signed, mine something and submit it. Its responsibilities end here. It won't keep track of what has been submitted or not.
  2. A totally separate thread is always looking at the SignedSubmissions storage, checking if there is any data starting with key (x, our_account) where x < current_round, and cleaning that up. This thread can run every block, every 10 blocks, or every day; it doesn't need to be super on-time.

In some sense, I am suggesting we don't re-store what is already in chain. The question of "okay, what have I submitted in any old round that needs to be cleared now?" is 100% answerable through looking at the chain state, we don't need to store it here again.

I think this would be a simpler approach, perhaps worth considering before you commit to this.

The benefit of your current approach, keeping an internal record of what you have submitted or not, is that if a lot of other submitters submit garbage and don't clean it up, then the query that we have to do ("checking if there is any data starting with key (x, our_account) where x < current_round") becomes a bit slow.

A downside of the current approach is that if the miner crashes and restarts, we lose track of what we have submitted in the past, and won't clear it up.

Another downside of the current approach is that it is not really compatible with the idea of allowing anyone to clear anyone else's data, if we want to do that.

// Process any rounds in the queue
let mut processed = false;

while let Ok(round_to_clear) = clear_receiver.try_recv() {
Contributor

@niklasad1 Apr 25, 2025


this will essentially poll the receiver; let's use the async API to avoid waking up the task so often when there is nothing to consume

Suggested change:

- while let Ok(round_to_clear) = clear_receiver.try_recv() {
+ while let Ok(round_to_clear) = clear_receiver.recv().await {

Contributor Author


Done a variant of this in 4a7899a

@sigurpol
Contributor Author

Thanks for the comment, @kianenigma. I see your valid points. Let me explore your approach and see how it would look. I'm also interested in @niklasad1's opinion on this!

@niklasad1
Contributor

niklasad1 commented Apr 25, 2025

and the fact that you keep your internal state of what you have submitted, is my main point of disagreement. I would have done this simpler:

  1. Keep the event loop of monitor-and-submit as separate as possible. This thread should always look at the current round, if Phase::Signed mine something and submit it. Its responsibilities end here. It won't keep track of what has submitted or not
  2. A totally separate thread is always looking at the SignedSubmissions storage, and checking if there are any data starting with key (x, our_account) where x < current_round in the storage, and clean that up. This thread can run every block, every 10 blocks, or every day, it doesn't need to be super on-time.

It seems like a good idea to me because the current implementation in this PR became a bit complicated, and I don't think reconnecting/restarting should take that long (tens of seconds is my guess), so the probability of missing the round transition should be quite low, though possible.

There is also the possibility of using the reconnecting-rpc-client from subxt to protect against this stuff as well, but it's still possible that the miner stops for some other reason.

I think Kian's suggestion would be a good first step; then, if losing submissions really turns out to be a big issue, we could improve it by writing them to a file or something to re-use on start-up.

but really it's up to you @sigurpol :)

@sigurpol
Contributor Author

and the fact that you keep internal state of what you have submitted is my main point of disagreement. I would have done this more simply:

  1. Keep the event loop of monitor-and-submit as separate as possible. This thread should always look at the current round and, if in Phase::Signed, mine something and submit it. Its responsibilities end here. It won't keep track of what has been submitted or not.
  2. A totally separate thread is always looking at the SignedSubmissions storage, checking if there is any data starting with key (x, our_account) where x < current_round, and cleaning that up. This thread can run every block, every 10 blocks, or every day; it doesn't need to be super on-time.

It seems like a good idea to me because the current implementation in this PR became a bit complicated, and I don't think reconnecting/restarting should take that long (tens of seconds is my guess), so the probability of missing the round transition should be quite low, though possible.

There is also the possibility of using the reconnecting-rpc-client from subxt to protect against this stuff as well, but it's still possible that the miner stops for some other reason.

I think Kian's suggestion would be a good first step; then, if losing submissions really turns out to be a big issue, we could improve it by writing them to a file or something to re-use on start-up.

but really it's up to you @sigurpol :)

Thanks @kianenigma and @niklasad1 . I do actually agree with @kianenigma 's suggestion. I think the pros are pretty strong:

  • Chain as source of truth: we no longer duplicate state that's already tracked on-chain
  • Crash resistance: If the miner crashes and restarts, it will still find and clear old submissions
  • Cleaner separation of concerns: The main monitor process focuses solely on current round submissions
  • Simplicity: No need for complex coordination between processes with channels and notifications

so it seems a clear winner 🚀

@sigurpol
Contributor Author

Closing this pull request; I will implement the feature on the miner's new architecture (now that we have a listener and a miner task, I will also add a janitor task responsible for calling clear_old_round_data(); it should be simpler now to keep concerns well isolated, famous last words 😅). Stay tuned 📻

sigurpol closed this Jun 19, 2025
@sigurpol
Contributor Author

Feature has been implemented in #1095


Development

Successfully merging this pull request may close these issues.

multi block miner: reclaim deposit