balancer: connectivity state aggregation algorithm needs fixing

The `round_robin` LB policy's implementation is split into two pieces:
1. The base balancer implementation found [here](https://github.com/grpc/grpc-go/blob/master/balancer/base/balancer.go), and
2. The picker implementation specific to `round_robin` found [here](https://github.com/grpc/grpc-go/tree/master/balancer/roundrobin)

The base balancer implementation uses the connectivity state aggregation logic provided by `ConnectivityStateEvaluator`: https://github.com/grpc/grpc-go/blob/28de4866ce7440b675662abbdd5c43b476bd4dae/balancer/balancer.go#L379

The algorithm is as follows:
```
//  - If at least one SubConn in Ready, the aggregated state is Ready;
//  - Else if at least one SubConn in Connecting, the aggregated state is Connecting;
//  - Else if at least one SubConn is TransientFailure, the aggregated state is Transient Failure;
//  - Else if at least one SubConn is Idle, the aggregated state is Idle;
//  - Else there are no subconns and the aggregated state is Transient Failure
```
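
To make the ordering concrete, here is a minimal sketch of the rules in the comment above. It is not the grpc-go code: `State` is a local stand-in for `connectivity.State`, and `aggregateImplemented` is a hypothetical name that works on a plain slice of states instead of the counters the real evaluator keeps.

```go
package main

import "fmt"

// State is a local stand-in for grpc-go's connectivity.State so that this
// sketch compiles without external dependencies.
type State int

const (
	Idle State = iota
	Connecting
	Ready
	TransientFailure
)

var stateNames = []string{"IDLE", "CONNECTING", "READY", "TRANSIENT_FAILURE"}

func (s State) String() string { return stateNames[s] }

// aggregateImplemented (hypothetical name) applies the precedence described
// in the comment: Ready > Connecting > TransientFailure > Idle.
func aggregateImplemented(states []State) State {
	counts := make(map[State]int)
	for _, s := range states {
		counts[s]++
	}
	switch {
	case counts[Ready] > 0:
		return Ready
	case counts[Connecting] > 0:
		return Connecting
	case counts[TransientFailure] > 0:
		return TransientFailure
	case counts[Idle] > 0:
		return Idle
	default:
		// No SubConns at all.
		return TransientFailure
	}
}

func main() {
	// One idle and one failing SubConn: the failure wins under this ordering.
	fmt.Println(aggregateImplemented([]State{Idle, TransientFailure})) // prints TRANSIENT_FAILURE
}
```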

The algorithm defined in the load balancing [spec](https://github.com/grpc/grpc/blob/master/doc/load-balancing.md#round_robin), however, is as follows:
```
The policy sets the channel's connectivity state by aggregating the states of the subchannels:

- If any one subchannel is in READY state, the channel's state is READY.
- Otherwise, if there is any subchannel in state CONNECTING, the channel's state is CONNECTING.
- Otherwise, if there is any subchannel in state IDLE, the channel's state is IDLE.
- Otherwise, if all subchannels are in state TRANSIENT_FAILURE, the channel's state is TRANSIENT_FAILURE.

Note that when a given subchannel reports TRANSIENT_FAILURE, it is considered to still be in
TRANSIENT_FAILURE until it successfully reconnects and reports READY. In particular, we ignore 
the transition from TRANSIENT_FAILURE to CONNECTING.
```
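
A rough sketch of the spec's rules, including the sticky `TRANSIENT_FAILURE` behavior, could look like the following. Everything here is illustrative: `State` is a local stand-in for `connectivity.State`, and `tracker`, `update`, and the string subchannel IDs are hypothetical names. Two points are assumptions on my part: a subchannel stays in `TRANSIENT_FAILURE` on any transition other than to `READY` (not only to `CONNECTING`), and an empty set of subchannels aggregates to `TRANSIENT_FAILURE`.

```go
package main

import "fmt"

// State is a local stand-in for grpc-go's connectivity.State.
type State int

const (
	Idle State = iota
	Connecting
	Ready
	TransientFailure
)

var stateNames = []string{"IDLE", "CONNECTING", "READY", "TRANSIENT_FAILURE"}

func (s State) String() string { return stateNames[s] }

// tracker (hypothetical) remembers the effective state of each subchannel,
// keyed by an illustrative string ID.
type tracker struct {
	effective map[string]State
}

func newTracker() *tracker { return &tracker{effective: make(map[string]State)} }

// update records a state change for one subchannel and returns the new
// aggregated state. A subchannel in TransientFailure stays there until it
// reports Ready (assumption: this covers every non-Ready transition).
func (t *tracker) update(id string, s State) State {
	if t.effective[id] == TransientFailure && s != Ready {
		return t.aggregate()
	}
	t.effective[id] = s
	return t.aggregate()
}

// aggregate applies the spec's precedence: Ready > Connecting > Idle,
// and TransientFailure otherwise.
func (t *tracker) aggregate() State {
	counts := make(map[State]int)
	for _, s := range t.effective {
		counts[s]++
	}
	switch {
	case counts[Ready] > 0:
		return Ready
	case counts[Connecting] > 0:
		return Connecting
	case counts[Idle] > 0:
		return Idle
	default:
		return TransientFailure
	}
}

func main() {
	tr := newTracker()
	tr.update("a", Connecting)
	tr.update("a", TransientFailure)
	fmt.Println(tr.update("a", Connecting)) // prints TRANSIENT_FAILURE
	fmt.Println(tr.update("a", Ready))      // prints READY
}
```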

Note that the implemented algorithm gives precedence to `TRANSIENT_FAILURE` over `IDLE`, while the spec gives precedence to `IDLE` over `TRANSIENT_FAILURE`. The implemented ordering works fine for `round_robin` because in `round_robin`, we push the subConn into `CONNECTING` as soon as it enters `IDLE`. But if we want to use this connectivity state aggregation algorithm in other LB policies, `IDLE` should take precedence over `TRANSIENT_FAILURE`. For example, this is exactly what we do in `weightedtarget`: https://github.com/grpc/grpc-go/blob/28de4866ce7440b675662abbdd5c43b476bd4dae/balancer/weightedtarget/weightedaggregator/aggregator.go#L218

We even have a TODO in `weightedtarget` to use `balancer.ConnectivityStateEvaluator`: https://github.com/grpc/grpc-go/blob/28de4866ce7440b675662abbdd5c43b476bd4dae/balancer/weightedtarget/weightedaggregator/aggregator.go#L203. We cannot use `ConnectivityStateEvaluator` there unless we fix the algorithm implementation.

Also, the c-core implementation of `round_robin` sets the connectivity state of the subConn to `CONNECTING` when it enters `IDLE`, because the LB policy starts connecting as soon as the subConn enters `IDLE`. We also start connecting right away, but we do not set the state to `CONNECTING`.
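
As a rough illustration of that c-core behavior (not the actual grpc-go or c-core code), the state handed to the aggregator could be remapped when a subConn reports `IDLE`. `State` and `subConn` below are local stand-ins, and `onStateChange` is a hypothetical helper.

```go
package main

import "fmt"

// State is a local stand-in for grpc-go's connectivity.State.
type State int

const (
	Idle State = iota
	Connecting
	Ready
	TransientFailure
)

var stateNames = []string{"IDLE", "CONNECTING", "READY", "TRANSIENT_FAILURE"}

func (s State) String() string { return stateNames[s] }

// subConn is a hypothetical stand-in for balancer.SubConn that only counts
// how many times a connection attempt was requested.
type subConn struct {
	connectCalls int
}

func (sc *subConn) Connect() { sc.connectCalls++ }

// onStateChange (hypothetical) returns the state that should be recorded
// for aggregation. An Idle subConn is asked to connect immediately and is
// recorded as Connecting; all other states pass through unchanged.
func onStateChange(sc *subConn, s State) State {
	if s == Idle {
		sc.Connect()
		return Connecting
	}
	return s
}

func main() {
	sc := &subConn{}
	fmt.Println(onStateChange(sc, Idle)) // prints CONNECTING
	fmt.Println(sc.connectCalls)         // prints 1
}
```

With this remapping, `IDLE` never reaches the aggregator for `round_robin`, so the relative order of `IDLE` and `TRANSIENT_FAILURE` no longer affects that policy.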

