
Add retry mechanics to pallet-scheduler#3060

Merged
georgepisaltu merged 44 commits into paritytech:master from georgepisaltu:retry-schedule
Feb 16, 2024

Conversation

@georgepisaltu
Contributor

Fixes #3014

This PR adds retry mechanics to pallet-scheduler, as described in the issue above.

Users can now set a retry configuration for a task so that, in case its scheduled run fails, it will be retried after a number of blocks, for a specified number of times or until it succeeds.

If a retried task runs successfully before running out of retries, its remaining retry counter will be reset to the initial value. If a retried task runs out of retries, it will be removed from the schedule.

Tasks which need to be scheduled for a retry are still subject to weight metering and agenda space, same as a regular task. Periodic tasks will have their periodic schedule put on hold while the task is retrying.
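The retry semantics described above (retry after a period, decrement the remaining counter on failure, reset it on success, remove the task when retries run out) can be sketched as a small standalone model. This is illustrative only; the type and field names are assumptions, not the pallet's actual API.

```rust
// Standalone model of the retry behaviour described above.
// `RetryConfig` mirrors the idea of a per-task retry configuration:
// how many retries were configured, how many remain, and how many
// blocks to wait before the next attempt.
#[derive(Clone, Copy, Debug, PartialEq)]
struct RetryConfig {
    total_retries: u8, // configured number of retries
    remaining: u8,     // retries left for the current run
    period: u32,       // blocks to wait before retrying
}

#[derive(Debug, PartialEq)]
enum Outcome {
    /// Task succeeded; the retry counter is reset to its initial value.
    Reset(RetryConfig),
    /// Task failed; schedule a retry at the given block with one fewer attempt.
    RetryAt(u32, RetryConfig),
    /// Task failed with no retries remaining; remove it from the schedule.
    Remove,
}

fn on_task_result(now: u32, cfg: RetryConfig, succeeded: bool) -> Outcome {
    if succeeded {
        // A successful run restores the counter for future (periodic) runs.
        Outcome::Reset(RetryConfig { remaining: cfg.total_retries, ..cfg })
    } else if cfg.remaining > 0 {
        Outcome::RetryAt(now + cfg.period, RetryConfig { remaining: cfg.remaining - 1, ..cfg })
    } else {
        Outcome::Remove
    }
}
```

Note that in the real pallet a retried task is still placed into an agenda and metered for weight like any other task; this model only captures the counter bookkeeping.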

Signed-off-by: georgepisaltu <george.pisaltu@parity.io>
@georgepisaltu georgepisaltu requested a review from muharem January 25, 2024 11:19
@georgepisaltu georgepisaltu self-assigned this Jan 25, 2024
@georgepisaltu georgepisaltu requested review from a team January 25, 2024 11:19
@mordamax
Contributor

bot bench-all pallet -v PIPELINE_SCRIPTS_REF=mak-bench-all-pallet --pallet=pallet_scheduler

@command-bot

command-bot bot commented Jan 25, 2024

@mordamax https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/5029575 was started for your command "$PIPELINE_SCRIPTS_DIR/commands/bench-all/bench-all.sh" --pallet=pallet_scheduler. Check out https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/pipelines?page=1&scope=all&username=group_605_bot to know what else is being executed currently.

Comment bot cancel 1-44309cf4-2563-41eb-9bf6-8c037436c2c7 to cancel this command or bot cancel to cancel all commands in this pull request.

@paritytech-review-bot paritytech-review-bot bot requested a review from a team January 25, 2024 15:47
@command-bot

command-bot bot commented Jan 25, 2024

@mordamax Command "$PIPELINE_SCRIPTS_DIR/commands/bench-all/bench-all.sh" --pallet=pallet_scheduler has finished. Result: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/5029575. If any artifacts were generated, you can download them from https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/5029575/artifacts/download.

@mordamax
Contributor

bot help

@command-bot

command-bot bot commented Jan 26, 2024

Here's a link to docs

@@ -1124,11 +1258,17 @@ impl<T: Config> Pallet<T> {
},
Err(()) => Err((Overweight, Some(task))),
Contributor


I thought that the task is removed if Unavailable, but now I see that we keep it in storage. This is why I proposed removing the retry as well. I also see now that Overweight does not change its address.

Comment on lines +1388 to +1397
if weight
.try_consume(T::WeightInfo::schedule_retry(T::MaxScheduledPerBlock::get()))
.is_err()
{
Self::deposit_event(Event::RetryFailed {
task: (when, agenda_index),
id: task.maybe_id,
});
return;
}
Contributor


I meant that it would be better to have this in the service_task context. As it stands, schedule_retry is fallible/a no-op, which is not clear from the service_task context.

Contributor Author


The downside is that in service_task you can't know ahead of time whether the task will fail and whether it has a retry config. You could check that you can consume the weight anyway, but then you might prevent users from running their tasks when it wasn't necessary.
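The check-on-failure approach under discussion can be sketched with a minimal standalone weight meter; the types and names here are illustrative assumptions, not the pallet's actual code. The point is that the retry weight is only consumed after a task has actually failed, so tasks that succeed, or have no retry config, never pay for it.

```rust
// Illustrative weight meter; `try_consume` mirrors the semantics of the
// meter used in the quoted diff, but this is not the pallet's actual type.
struct WeightMeter {
    remaining: u64,
}

impl WeightMeter {
    fn try_consume(&mut self, weight: u64) -> Result<(), ()> {
        match self.remaining.checked_sub(weight) {
            Some(left) => { self.remaining = left; Ok(()) }
            None => Err(()),
        }
    }
}

/// Check-on-failure: called only once a task has already failed, so the
/// retry weight is never reserved up front. Returns `true` if the retry
/// fits in the remaining block weight; on `false` the pallet would emit
/// a `RetryFailed` event instead of scheduling the retry.
fn schedule_retry(meter: &mut WeightMeter, retry_weight: u64) -> bool {
    meter.try_consume(retry_weight).is_ok()
}
```

Moving this check into service_task would mean consuming (or at least reserving) `retry_weight` before knowing whether the task fails, which is the cost the comment above is pointing at.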

Contributor

@NachoPal NachoPal left a comment


LGTM. Just wondering if you considered extending the Scheduled struct with RetryConfig fields instead of creating a new Retries storage item. That way it would natively support retries on schedule(), without needing two steps (schedule() + set_retry()).

@georgepisaltu
Contributor Author

LGTM. Just wondering if you considered extending the Scheduled struct with RetryConfig fields instead of creating a new Retries storage item. That way it would natively support retries on schedule(), without needing two steps (schedule() + set_retry()).

I considered it but I decided against it. This is a niche mechanic which we don't expect to be widely used right now. Doing it in Scheduled would bloat existing Scheduled entries. It's not backwards compatible in terms of storage (would require a migration) or API. It makes more sense for the few users of this mechanic to pay extra than for everyone to pay more for something that is not a core feature. If, in the future, we find that a lot of people are doing retries, we can reconsider in order to optimize costs.
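The trade-off described here, a separate opt-in storage item rather than extra fields on every Scheduled entry, can be modeled with two maps. This is a hypothetical sketch using plain collections; the names (schedule, set_retry, TaskAddress) echo the PR's concepts but are not the pallet's actual signatures.

```rust
use std::collections::HashMap;

// Illustrative model of the chosen design: retry configurations live in a
// separate map keyed by task address, so only tasks that opt in pay extra.
type TaskAddress = (u32, u32); // (block number, agenda index)

#[derive(Clone, Copy)]
struct RetryConfig {
    total_retries: u8,
    remaining: u8,
    period: u32,
}

#[derive(Default)]
struct Scheduler {
    agenda: HashMap<TaskAddress, String>,       // scheduled tasks (call stub)
    retries: HashMap<TaskAddress, RetryConfig>, // opt-in retry configs
}

impl Scheduler {
    // Step 1: schedule as usual; entries stay the same size for everyone.
    fn schedule(&mut self, addr: TaskAddress, call: &str) {
        self.agenda.insert(addr, call.to_string());
    }

    // Step 2: opt a specific, already-scheduled task into retries;
    // only that task's entry in `retries` grows storage.
    fn set_retry(&mut self, addr: TaskAddress, retries: u8, period: u32) -> Result<(), &'static str> {
        if !self.agenda.contains_key(&addr) {
            return Err("task not found");
        }
        self.retries.insert(addr, RetryConfig { total_retries: retries, remaining: retries, period });
        Ok(())
    }
}
```

Embedding the config in Scheduled instead would collapse the two steps into one, but every agenda entry would carry the extra fields and a storage migration would be needed, which is the cost the reply above is weighing.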

@georgepisaltu georgepisaltu added this pull request to the merge queue Feb 16, 2024
Merged via the queue into paritytech:master with commit 9346019 Feb 16, 2024
@georgepisaltu georgepisaltu deleted the retry-schedule branch February 16, 2024 12:07
bgallois pushed a commit to duniter/duniter-polkadot-sdk that referenced this pull request Mar 25, 2024
@redzsina redzsina moved this from Backlog to Scheduled in Security Audit (PRs) - SRLabs Jul 14, 2025
@rachsrl rachsrl moved this from Scheduled to In progress in Security Audit (PRs) - SRLabs Jul 22, 2025
@redzsina redzsina moved this from In progress to Audited in Security Audit (PRs) - SRLabs Jul 28, 2025
@redzsina redzsina moved this from Audited to In progress in Security Audit (PRs) - SRLabs Jul 28, 2025
@redzsina redzsina moved this from In progress to Audited in Security Audit (PRs) - SRLabs Jul 29, 2025

Labels

A0-needs_burnin: Pull request needs to be tested on a live validator node before merge. DevOps is notified via matrix.
I5-enhancement: An additional feature request.
T2-pallets: This PR/Issue is related to a particular pallet.

Projects

Status: Audited

Development

Successfully merging this pull request may close these issues.

Scheduler: Retry a Task on a Call Failure

7 participants