feat: reclaim deposit for discarded solutions #1029
Conversation
@niklasad1, @kianenigma PTAL 🙏
This commit introduces logic to track submitted rounds and automatically call the clear_submission function (which dispatches the clear_old_round_data extrinsic) for any solution that is determined to be discarded (i.e., not the winning solution) in past rounds. The monitor now checks all previously submitted rounds on each block, and if a better solution has been validated or the round has ended and our submission still exists, it triggers the clearing process to reclaim the deposit. This ensures that locked deposits for discarded solutions are properly returned to the miner without manual intervention.
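As a rough illustration, the "which tracked rounds should be cleared?" decision can be sketched as a pure function. All names here are hypothetical stand-ins; the actual monitor keeps the map behind an async `Mutex` and queries real chain state:

```rust
use std::collections::HashMap;

/// Minimal stand-in for the metadata kept per submitted round (hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct SubmissionMeta {
    deposit: u128,
}

/// Decide, for each tracked round, whether `clear_old_round_data` should be
/// dispatched to reclaim the deposit: the round must be in the past and our
/// (losing) submission must still exist on chain.
fn rounds_to_clear(
    tracked: &HashMap<u32, SubmissionMeta>,
    current_round: u32,
    submission_still_on_chain: impl Fn(u32) -> bool,
) -> Vec<u32> {
    let mut to_clear: Vec<u32> = tracked
        .keys()
        .copied()
        // only rounds strictly in the past can have been discarded
        .filter(|&round| round < current_round)
        // nothing to clear if the chain no longer holds our submission
        .filter(|&round| submission_still_on_chain(round))
        .collect();
    to_clear.sort_unstable();
    to_clear
}
```

For example, with rounds 1 and 2 tracked while round 3 is being processed, and both submissions still on chain, the function yields `[1, 2]`.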
src/commands/multi_block/monitor.rs
Outdated
```rust
if has_submitted(&storage, round, signer.account_id(), n_pages).await? {
    // 2. If the solution has already been submitted:
    // 2.1 Check local tracking first
    if submitted_rounds.lock().await.contains_key(&round) {
```
Because we support both best blocks and finalized blocks at the moment, it's possible that a block can be reverted, and this local `submitted_rounds` tracking may then not match the state on chain.
Finalized blocks can't be reverted (except on hard forks, IIRC), so to support such functionality we would need to remove support for listening to best blocks, which may simplify our life a bit.
That's the reason why we are checking the state at latest head like:
```rust
if has_submitted(
    &utils::storage_at_head(&client, listen).await?
)
```
Good point. What are you suggesting as the best option?
- Only listen to finalized blocks, or
- always double-check with chain state, i.e. use local tracking as now but then validate it against chain state, e.g.
```rust
// Check local tracking first, but validate against chain state
if submitted_rounds.lock().await.contains_key(&round) {
    // Verify with chain state before trusting local tracking
    let storage_head = utils::storage_at_head(&client, listen).await?;
    if has_submitted(&storage_head, round, signer.account_id(), n_pages).await? {
        // Confirmed both locally and on-chain
        return Ok(());
    } else {
        // Local tracking is wrong, remove it and continue
        untrack_round_submission(
            &submitted_rounds,
            round,
            Some(format!("Removing inconsistent local tracking for round {round}")),
        )
        .await;
        // Continue with submission attempt
    }
}
```
The entire point of best blocks is that they are faster, so you may win against finalized blocks, but it may bite you in the ass if something got included in a block that never got finalized or was reverted.
Personally I don't think it's worth the effort of maintaining such annoyances when we are just using finalized blocks when running the miner. My two cents is to remove best-block listening completely.
This would be annoying to implement; alternatively, users who use best blocks simply have to accept the fact that these weird things can happen and that it is unreliable.
/cc @kianenigma what's your take on best blocks?
I wouldn't be against just going with finalized blocks and checking the impact on the overall performance of the miner. I'll make a commit on top, if you agree.
Changes to support ONLY finalized blocks in this commit
@niklasad1, @kianenigma: the CI test failures will be addressed once paritytech/polkadot-sdk#8310 is merged into polkadot-sdk, so please don't pay too much attention to them.
- Created a new helper function `storage_at_finalized` that gets storage only from the finalized chain state
- Updated the `monitor_cmd` function to always use finalized blocks
- Replaced all instances of `utils::storage_at_head(&client, listen)` with `storage_at_finalized(&client)` in three locations:
  - After acquiring the submission lock
  - After mining a solution
  - Before submitting a solution, to check if it's better than what's on chain
- Removed the unused functions from `utils.rs`:
  - `storage_at_head`, which was replaced by the new `storage_at_finalized` function
  - `rpc_get_latest_head`, which is no longer needed since we only use finalized blocks

This change ensures:
1. The miner only listens to finalized blocks, avoiding potential race conditions
2. All storage queries are made against finalized blocks, ensuring consistent state
3. We only submit solutions when we're certain they're better than the current best solution on the finalized chain
This reverts commit 1a5718e and leverages `is_solution_already_submitted`.
src/commands/multi_block/monitor.rs
Outdated
```rust
if maybe_submission_metadata.is_none() {
    log::debug!(target: LOG_TARGET, "Submission metadata for past round {} gone. Removing from tracking.", round_to_check);
    rounds_to_remove.push(round_to_check);
```
remove directly instead of pushing to a vec?
Wouldn't we invalidate the iterator, though? And I thought `retain` or similar would be problematic in this async code...
- Introduce a queue-based approach where we enqueue rounds to clear and a dedicated task processes the queue. We no longer consider only the most recent past rounds, but we still cap the size of the queue.
- Add Prometheus metrics to monitor the round-clearing mechanism.
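For illustration, the enqueue-and-drain shape of that mechanism can be sketched with a bounded std channel. This is only a sketch: the real code uses an async channel and a tokio task, dispatching `clear_old_round_data` is simulated, and the capacity is arbitrary.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};
use std::thread;

/// Enqueue rounds on a bounded queue and let a dedicated worker drain it,
/// returning the rounds it "cleared" (a stand-in for dispatching the
/// `clear_old_round_data` extrinsic).
fn run_clearing_queue(rounds: &[u32], capacity: usize) -> Vec<u32> {
    // Bounded queue: this is how the queue size stays capped.
    let (clear_sender, clear_receiver) = sync_channel::<u32>(capacity);

    // Dedicated worker so the main event loop is never blocked by
    // clearing attempts.
    let worker = thread::spawn(move || {
        let mut cleared = Vec::new();
        // Blocks until a round is enqueued; ends when the sender is dropped.
        while let Ok(round) = clear_receiver.recv() {
            cleared.push(round);
        }
        cleared
    });

    for &round in rounds {
        // `try_send` fails instead of blocking when the queue is full.
        if let Err(TrySendError::Full(dropped)) = clear_sender.try_send(round) {
            eprintln!("clear queue full, dropping round {dropped}");
        }
    }
    drop(clear_sender); // closes the channel so the worker terminates
    worker.join().expect("worker panicked")
}
```

The single FIFO channel preserves enqueue order, so rounds are cleared oldest-first as long as they fit in the queue.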
I would do this slightly differently; let me first elaborate on that before I do a more detailed review. Note that I am not against this: if it works and the code is clear to you, you can keep the current approach.
The fact that you keep internal state of what you have submitted is my main point of disagreement. I would have done this more simply:
In some sense, I am suggesting we don't re-store what is already on chain. The question of "okay, what have I submitted in any old round that needs to be cleared now?" is 100% answerable by looking at the chain state; we don't need to store it here again. I think this would be a simpler approach, perhaps worth considering before you commit to this. The benefit of your current approach, keeping an internal record of what you have and have not submitted, is that if a lot of other submitters submit garbage and don't clean it up, then the query that we have to do ("checking if there is any data starting with key …") can become expensive, and the internal record lets us skip it.
A downside of the current approach is that if the miner crashes and restarts, we lose track of what we have submitted in the past, and won't clear it up. Another downside of the current approach is that it is not really compatible with the idea of allowing anyone to clear anyone's data, if we want to do that.
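The stateless alternative amounts to something like the following hedged sketch: `has_submission` stands in for the real storage query keyed by round and account, and the bounded scan window is an assumption made here to keep the scan cheap.

```rust
/// Stateless scan: derive the rounds to clear purely from chain state by
/// checking a bounded window of past rounds. No local record of past
/// submissions is kept, so a crash and restart loses nothing.
fn discover_rounds_to_clear(
    current_round: u32,
    window: u32,
    has_submission: impl Fn(u32) -> bool,
) -> Vec<u32> {
    // Scan at most `window` rounds back, never below round 0.
    let start = current_round.saturating_sub(window);
    (start..current_round)
        .filter(|&round| has_submission(round))
        .collect()
}
```

Because the answer is re-derived from chain state on every scan, this shape also extends naturally to clearing other accounts' stale data, should that ever be allowed.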
src/commands/multi_block/monitor.rs
Outdated
```rust
// Process any rounds in the queue
let mut processed = false;

while let Ok(round_to_clear) = clear_receiver.try_recv() {
```
this will essentially poll the receiver; let's use the async API to avoid waking the task that often when there is nothing to consume
```diff
-while let Ok(round_to_clear) = clear_receiver.try_recv() {
+while let Ok(round_to_clear) = clear_receiver.recv().await {
```
Thanks for the comment, @kianenigma. I see your valid points. Let me explore your approach and see how it would look. I'm also interested in @niklasad1's opinion on this!
It seems like a good idea to me, because the current implementation in this PR became a bit complicated, and I don't think reconnecting/restarting should take that long (tens of seconds is my guess), so the probability of missing the round transition should be quite low, though possible. There is also the possibility of using the reconnecting-rpc-client from subxt to protect against this, but the miner could still stop for some other reason. I think Kian's suggestion would be a good first step; then, if losing submissions really turns out to be a big issue, we could write them to a file or something to reuse on start-up. But really, it's up to you @sigurpol :)
Thanks @kianenigma and @niklasad1. I do agree with @kianenigma's suggestion actually. I think the pros are pretty strong:
so it seems a clear winner 🚀
Closing this pull request; I will implement the feature on the new miner's architecture (now that we have a listener and a miner task, I will also add a
Feature has been implemented in #1095
This pull request introduces logic to track submitted rounds and automatically call the `clear_submission` function (which dispatches the `clear_old_round_data` extrinsic[^1]) for any solution that is determined to be discarded (i.e., not the winning solution) in past rounds. The monitor now checks past rounds on each block, and if a better solution has been validated or the round has ended and our submission still exists, it triggers the clearing process to reclaim the deposit.
We have adopted a queue-based approach for clearing rounds, along with a dedicated task to call `clear_old_round_data`. This strategy helps us avoid blocking the main event loop and potentially delaying the processing of blocks in `monitor_cmd`. Dedicated Prometheus metrics have been added as well to keep track of the number of failures, the queue size, and the duration of a clearing attempt. This ensures that locked deposits for discarded solutions are properly returned to the miner without manual intervention.
Fix #980.
Driven-by: the miner now only listens to finalized blocks (and no longer also to best blocks), avoiding potential race conditions and simplifying round-submission handling.
Example:
Let's run a miner for both //Alice and //Bob in the usual Zombienet setup as provided by the SDK. We are processing round 2; in round 1 Alice got rewarded and Bob's solution was discarded.
Relevant logs of the miner for Alice in round 2:
Relevant logs for Bob in round 2:
On Polkadot JS I can see a `multiBlockSigned.Discarded` event associated with Bob after Bob has cleared his submission.
Footnotes

[^1]: Important distinction: `bail` is for self-withdrawing a solution during the active mining phase, whereas `clear_old_round_data` is for reclaiming deposits from solutions that were discarded after the validation phase.