Skip to content

Anti-patterns for Retry in V7 #1527

@peter-csala

Description

@peter-csala

As it was discussed in this PR I've tried to collect those anti-patterns what I have seen in past 3+ years on StackOverflow.

In this issue I try to focus only on Retry. Later I will file new issues for Fallback, Circuit Breaker, etc..

1 - Using retry as a periodical executor

The retry policy can be defined in a way that it runs forever in a given frequency for example once in a day

❌ DON'T

var retryForeverDaily = Policy
    .HandleResult<bool>(_ => true)
    .WaitAndRetryForeverAsync(_ => TimeSpan.FromHours(24));

Reasoning:

  • The sleep period can be blocking or non blocking depending on how your policy is defined: synchronous or asynchronous respectively
  • Even if it is used in non blocking manner, it consumes (unnecessarily) memory which can't be garbage collected

✅ DO
Use appropriate tool to schedule recurring jobs like Quartz.Net or Hangfire

Reasoning:

  • Polly was never design to support this use case rather than its main aim is to help you overcome short transient failures
  • Dedicated job scheduler tools are more efficient (in terms of memory) and can be configured to withstand machine failure by utilizing persistence storage

2 - Combining multiple sleep duration strategies

The retry has several overloads where you can dynamically define the sleep durations

❌ DON'T
Mixing the ever an increasing values with constant ones

.WaitAndRetry(10, retryAttempt => retryAttempt switch
{
    <= 5 => TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)), 
    _ => TimeSpan.FromMinutes(3)
});

Reasoning:

  • By changing the behaviour based on state we basically created here a state machine
  • Even though it is a really compact/concise way to express sleep durations there are three main drawbacks
    • This approach does not embrace re-usability (you can't re-use only the quick retries)
    • The sleep duration logic is tightly coupled to the retryCount parameter of the policy
    • It is harder to unit test

✅ DO
Define two separate policies and then chain them

var quickRetries = ...
  .WaitAndRetry(Backoff.DecorrelatedJitterBackoffV2(TimeSpan.FromSeconds(1), 5));
var slowRetries = ...
  .WaitAndRetry(5, TimeSpan.FromMinutes(3));
  
var retries = Policy.Wrap(slowRetries, quickRetries); 

Reasoning:

  • Even though this approach is a bit more verbose (compared to the previous one) it is more flexible
  • You can compose the retry policies in any order (slower is the outer and quicker is the inner or vice versa)
  • You can define different triggers for the retry policies
    • Which allows you to switch back and forth between the policies based on the thrown exception or on the result
    • There is no strict order (quick retries comes first no matter what...) so, quick and slow retries can interleave

3 - Pre-generating massive amount of sleep durations

WaitAndRetry can can accept an IEnumerable<TimeSpan> for sleep durations

❌ DON'T
Creating a List with huge amount of timespans

private IEnumerable<TimeSpan> GetSleeps()
{
    var sleeps = new List<TimeSpan>();

    for (int i = 0; i < 1_000; i++)
    {
        sleeps.Add(TimeSpan.FromSeconds(300 + (i % 200)));
    }

    return sleeps;
}

var retry = ...
   .WaitAndRetry(sleepDurations: GetSleeps());

Reasoning:

  • It uses too much memory unnecessarily for fairly long period
  • For each and every sleep duration only a single element is consumed the rest 999 remains intact

✅ DO
Use yield return and sleepDurationProvider

private IEnumerable<TimeSpan> GetSleeps(int _)
{
    for (int i = 0; i < 1_000; i++)
    {
        yield return TimeSpan.FromSeconds(300 + (i % 200));
    }
}

var retry = ...
   .WaitAndRetry(sleepDurationProvider: GetSleeps);

Reasoning:

  • This approach will calculate the very next sleep duration whenever it needed
  • In other words this is the lazy approach the other one is the eager

4 - Branching retry logic based on request url

Lets suppose you have an HttpClient and you want to decorate it with a retry only if a request is against a certain endpoint

❌ DON'T
Using NoOp and ?: operator

.AddPolicyHandler((sp, req) =>
    shouldRetry(req.RequestUri)
        ? Policy<HttpResponseMessage>.Handle<HttpRequestException>()...
        : Policy.NoOpAsync<HttpResponseMessage>()

Reasoning:

  • In this case the triggering conditions/logic are scattered in multiple places
  • From future extensibility standpoint is also not desirable since it can easily become less legible as you add more and more conditions

✅ DO
Use the Handle / Or clauses to prevent triggering retry

.AddPolicyHandler((sp, req) =>
     Policy<HttpResponseMessage>
       .Handle<HttpRequestException>(_ => shouldRetry(req.RequestUri))
       ...

Reasoning:

  • The triggering conditions are located in a single, well-known place
  • There is no need to cover "what to do when policy shouldn't trigger"

5 - Calling a given method before/after each retry attempt

The onRetry/onRetryAsync user-defined delegates are called before the sleep

❌ DON'T
Calling explicitly the given method before Execute/ExecuteAsync

var retryPolicy = ...
    .WaitAndRetry(.., (exception, timeSpan) => BeforeEachRetry);
...
BeforeEachRetry();
var result = retryPolicy.Excecute(DoSomething);

Reasoning:

  • Since the onRetry is called before each retry attempt it won't be called before the very first initial attempt (which is not a retry)
  • If the policy is used in multiple places it is quite likely that you will forgot to call BeforeEachRetry before every Execute calls
  • Here the naming is very explicit but in real world scenario it might not prefixed with Before so, one might call it after the Execute call which is not the intended behaviour

✅ DO
Decorate the method call pair together

var retryPolicy = ...
    .WaitAndRetry(..., (exception, timeSpan) => {});
...
var result = retryPolicy.Excecute(() => 
{ 
   BeforeEachRetry();
   DoSomething();
});

Reasoning:

  • If the DoSomething and BeforeEachRetry coupled together then decorate them together
    • Or create a simple wrapper to call them in the desired order

6 - Having a single policy to cover multiple failures

Lets suppose we have an HttpClient which issues a request and then we try to parse a large Json

❌ DON'T
Having a single policy to cover everything

var retry = Policy
    .Handle<HttpRequestException>()
    .WaitAndRetryAsync(...);

var timeout = Policy
    .TimeoutAsync(timeout: TimeSpan.FromMinutes(1));

var strategy = Policy.WrapAsync(retry, timeout);

await strategy.ExecuteAsync(async (ct) =>
{
     var stream = await httpClient.GetStreamAsync(endpoint, ct);
     var foo = await JsonSerializer.DeserializeAsync<Foo>(stream, cancellationToken: ct);
     ...
}, CancellationToken.None);

Reasoning:

  • In the previous point it was suggested that if the X and Y coupled together then decorate them together
    • if they are all part of the same failure domain
    • in other words the decorator policy covers one group of transient failures
      • this is true for the combined policy as well

✅ DO
Define a policy per failure "mode"

var retry = Policy<Stream>
    .Handle<HttpRequestException>()
    .WaitAndRetryAsync(...);

var stream = await retry.ExecuteAsync((ct) => httpClient.GetStreamAsync(endpoint, ct), CancellationToken.None);

var timeout = Policy<Foo>
    .TimeoutAsync(timeout: TimeSpan.FromMinutes(1));

var foo = await timeout.ExecuteAsync((ct) => JsonSerializer.DeserializeAsync<Foo>(stream, cancellationToken: ct), CancellationToken.None);

Reasoning:

  • Network call's failure domain is different than deserialization's failures
    • Having dedicated policies makes the overall resilient strategy more robust against different transient failures

7 - Cancelling retry in case of given exception

After you receive a TimeoutException you don't want to perform any more retries

❌ DON'T
Adding cancellation logic inside onRetryAsync

...
.WaitAndRetryAsync(
        ...
        onRetryAsync:(ex, _, __, ___) =>
    {
        if (ex is TimeoutException)
        {
            var cts = context["shouldNotRetryCTS"] as CancellationTokenSource;
            cts.Cancel();
        }
        ...
    });

Reasoning:

  • The triggering logic/conditions should be placed inside Handle/Or/HandleResult/OrResult builder methods
  • "Jumping out from the policy" from a user-defined delegate either via an Exception or by a CancellationToken just complicates the policy (and its control flow) unnecessarily

✅ DO
Define triggering logic inside Handle

...
.Handle<Exception>(ex => ex is not TimeoutException))
.WaitAndRetryAsync(...);

Reasoning:

  • As it was stated above please use the dedicated builder methods to define triggering condition
    • Try to rephrase your exit condition in a way to express when should a given policy trigger

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions