Description
What version of gRPC are you using?
google.golang.org/grpc v1.65.0
What version of Go are you using (go version)?
1.23.0
What operating system (Linux, Windows, …) and version?
Linux
What did you do?
I have been trying to debug the effective behaviour of the client retryPolicy values when the server returns "UNAVAILABLE". After trying many different combinations of "InitialBackoff", "MaxBackoff", and "BackoffMultiplier", I have found the retry backoff times to be wildly unpredictable.
Example configs:
"retryPolicy": {
"MaxAttempts": 5,
"InitialBackoff": "0.5s",
"MaxBackoff": "5s",
"BackoffMultiplier": 4.0,
"RetryableStatusCodes": [ "UNAVAILABLE" ]
}
"retryPolicy": {
"MaxAttempts": 5,
"InitialBackoff": "10s",
"MaxBackoff": "20s",
"BackoffMultiplier": 2.0,
"RetryableStatusCodes": [ "UNAVAILABLE" ]
}What did you expect to see?
As I increase InitialBackoff and BackoffMultiplier, I would expect progressively longer backoff times between retries. The goal is to pick an overall timeout, such as 20s, and configure the retry policy so that the attempts span something close to that window.
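To make that concrete, here is a small sketch of my own (not library code) of what a purely deterministic exponential backoff would produce for the first policy above: the delays grow predictably and the total retry window is easy to reason about.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

func main() {
	// Values from the first policy above: a deterministic exponential backoff
	// would give delays of 0.5s, 2s, 5s, 5s between the 5 attempts (~12.5s total).
	initial := 500 * time.Millisecond
	max := 5 * time.Second
	multiplier := 4.0
	maxAttempts := 5

	var total time.Duration
	for n := 0; n < maxAttempts-1; n++ {
		d := time.Duration(float64(initial) * math.Pow(multiplier, float64(n)))
		if d > max {
			d = max
		}
		total += d
		fmt.Printf("delay before attempt %d: %v\n", n+2, d)
	}
	fmt.Printf("total retry window: %v\n", total)
}
```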
What did you see instead?
I observe wildly variable outcomes, where sometimes the total elapsed time for all retries could be 5s, or 10s, or 20s.
Here is an example of a pretty strange set of 5 attempts:
Policy:
"retryPolicy": {
"MaxAttempts": 5,
"InitialBackoff": "1.0s",
"MaxBackoff": "8s",
"BackoffMultiplier": 8.0,
"RetryableStatusCodes": [ "UNAVAILABLE" ]
}Connection logs:
```
2024/08/16 13:54:46 INFO: [core] [Channel #1 SubChannel #2]Subchannel picks a new address "host:port" to connect
2024/08/16 13:54:46 INFO: [core] [Channel #1 SubChannel #2]Subchannel picks a new address "host:port" to connect
2024/08/16 13:54:51 INFO: [core] [Channel #1 SubChannel #2]Subchannel picks a new address "host:port" to connect
2024/08/16 13:54:51 INFO: [core] [Channel #1 SubChannel #2]Subchannel picks a new address "host:port" to connect
2024/08/16 13:54:59 INFO: [core] [Channel #1 SubChannel #2]Subchannel picks a new address "host:port" to connect
```
Twice it retries with less than a second between attempts, then there is a roughly 5-second gap, and finally a full 8-second wait before the last attempt. The sequence is effectively random.
Further investigations
MaxAttempts is capped at 5, so it is not possible to squeeze more total retry time out of the process when the individual backoff times vary so much relative to the configuration.
The Go implementation randomizes the entire calculated backoff duration, so the actual delay can land anywhere between 0 and the calculated backoff:
https://github.com/grpc/grpc-go/blob/v1.65.0/stream.go#L702
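Here is a minimal sketch of the behaviour I read out of that line (paraphrased, not the actual library code): the whole calculated backoff is fed into the random draw, so the actual delay is uniform in [0, cur) and a later attempt can easily wait less than an earlier one.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"time"
)

// goStyleDelay paraphrases what I believe grpc-go does: compute the
// exponential backoff, cap it at MaxBackoff, then draw the actual delay
// uniformly from [0, cur).
func goStyleDelay(initial, max time.Duration, multiplier float64, retries int) time.Duration {
	cur := float64(initial) * math.Pow(multiplier, float64(retries))
	if cur > float64(max) {
		cur = float64(max)
	}
	return time.Duration(rand.Int63n(int64(cur))) // anywhere between 0 and cur
}

func main() {
	// Values from the third policy above: 1s initial, 8s max, multiplier 8.
	for n := 0; n < 4; n++ {
		fmt.Println(goStyleDelay(time.Second, 8*time.Second, 8, n))
	}
}
```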
The C++ implementation uses the calculated backoff duration itself, so the delay actually increases with each attempt; randomization is only applied as jitter:
https://github.com/grpc/grpc/blob/v1.65.4/src/core/lib/backoff/backoff.cc#L34
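By contrast, here is a sketch of the C-core style (again paraphrased, and the ±20% jitter fraction is my own assumption for illustration): the backoff itself is kept and only a proportional jitter is applied around it, so the delays still grow with each attempt.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"time"
)

// cppStyleDelay paraphrases the C-core behaviour linked above (not the exact
// code): keep the calculated backoff and only perturb it by a jitter fraction,
// so the sequence still grows roughly exponentially.
func cppStyleDelay(initial, max time.Duration, multiplier, jitter float64, retries int) time.Duration {
	cur := float64(initial) * math.Pow(multiplier, float64(retries))
	if cur > float64(max) {
		cur = float64(max)
	}
	factor := 1 + jitter*(2*rand.Float64()-1) // uniform in [1-jitter, 1+jitter]
	return time.Duration(cur * factor)
}

func main() {
	// Same policy values as above, with an assumed 0.2 jitter fraction.
	for n := 0; n < 4; n++ {
		fmt.Println(cppStyleDelay(time.Second, 8*time.Second, 8, 0.2, n))
	}
}
```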
It seems entirely nonsensical to randomize the whole backoff value between 0 and the calculated backoff. Am I reading the logic wrong? Doesn't it defeat the purpose of exponential backoff if the next delay can be smaller than the previous one? Why does the Go client not take a similar approach and apply only random jitter?
The end result is that the numbers I plug into the Go retryPolicy feel arbitrary and magical, and I have to keep tweaking them and re-running a test to see what range of retry delays I actually get.