
Consumer facing higher latency after a preferred leader election #1927

@ppatierno

Description


Hi,
as part of the Strimzi project, we have a canary application that sends and receives messages to verify that the Kafka cluster is working correctly. This canary tool uses Sarama, and we are facing the following "problem".
Imagine a cluster with 3 brokers, where the canary application creates a topic with 3 partitions, one led by each broker.
A normal canary flow shows the following recurring log:

Metadata for __strimzi_canary topic
	{ID:0 Leader:0 Replicas:[0 1 2] Isr:[0 1 2] OfflineReplicas:[]}
	{ID:2 Leader:2 Replicas:[2 0 1] Isr:[2 0 1] OfflineReplicas:[]}
	{ID:1 Leader:1 Replicas:[1 2 0] Isr:[1 2 0] OfflineReplicas:[]}
Sending message: value={"producerId":"strimzi-canary-client","messageId":6421,"timestamp":1619525979038} on partition=0
Message sent: partition=0, offset=2140, duration=157 ms
Sending message: value={"producerId":"strimzi-canary-client","messageId":6422,"timestamp":1619525979195} on partition=1
Message received: value={ProducerID:strimzi-canary-client, MessageID:6421, Timestamp:1619525979038}, partition=0, offset=2140, duration=157 ms
Message sent: partition=1, offset=2140, duration=220 ms
Sending message: value={"producerId":"strimzi-canary-client","messageId":6423,"timestamp":1619525979415} on partition=2
Message received: value={ProducerID:strimzi-canary-client, MessageID:6422, Timestamp:1619525979195}, partition=1, offset=2140, duration=220 ms
Message sent: partition=2, offset=2140, duration=126 ms
... reconcile done
Message received: value={ProducerID:strimzi-canary-client, MessageID:6423, Timestamp:1619525979415}, partition=2, offset=2140, duration=127 ms

It shows an end-to-end latency that is quite often around 200 ms.
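For context, the latency in these logs is derived from the timestamp the canary embeds in each message at send time. A minimal sketch of that computation, assuming the JSON payload shape visible in the log lines (the struct and function names here are ours, not the canary's actual code):

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// CanaryMessage mirrors the JSON payload shown in the logs;
// field names come from the log lines, everything else is assumed.
type CanaryMessage struct {
	ProducerID string `json:"producerId"`
	MessageID  int    `json:"messageId"`
	Timestamp  int64  `json:"timestamp"` // epoch millis captured at send time
}

// roundTripMillis computes the end-to-end latency the canary reports:
// receive time minus the send timestamp embedded in the message.
func roundTripMillis(sentMillis, receivedMillis int64) int64 {
	return receivedMillis - sentMillis
}

func main() {
	raw := `{"producerId":"strimzi-canary-client","messageId":6421,"timestamp":1619525979038}`
	var msg CanaryMessage
	if err := json.Unmarshal([]byte(raw), &msg); err != nil {
		panic(err)
	}
	now := time.Now().UnixMilli()
	fmt.Printf("latency for message %d: %d ms\n", msg.MessageID, roundTripMillis(msg.Timestamp, now))
}
```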
Restarting broker 0 produces the following effect:

Metadata for __strimzi_canary topic
	{ID:0 Leader:1 Replicas:[0 1 2] Isr:[1 2] OfflineReplicas:[0]}
	{ID:2 Leader:2 Replicas:[2 0 1] Isr:[2 1] OfflineReplicas:[0]}
	{ID:1 Leader:1 Replicas:[1 2 0] Isr:[1 2] OfflineReplicas:[0]}
Sending message: value={"producerId":"strimzi-canary-client","messageId":6448,"timestamp":1619526024136} on partition=0
Message sent: partition=0, offset=2149, duration=88 ms
Sending message: value={"producerId":"strimzi-canary-client","messageId":6449,"timestamp":1619526024224} on partition=1
Message received: value={ProducerID:strimzi-canary-client, MessageID:6448, Timestamp:1619526024136}, partition=0, offset=2149, duration=89 ms
Message sent: partition=1, offset=2149, duration=3 ms
Sending message: value={"producerId":"strimzi-canary-client","messageId":6450,"timestamp":1619526024227} on partition=2
Message received: value={ProducerID:strimzi-canary-client, MessageID:6449, Timestamp:1619526024224}, partition=1, offset=2149, duration=4 ms
Message sent: partition=2, offset=2149, duration=162 ms
... reconcile done
Message received: value={ProducerID:strimzi-canary-client, MessageID:6450, Timestamp:1619526024227}, partition=2, offset=2149, duration=163 ms

Replica 0 is offline (because broker 0 is down), but the canary keeps sending messages to partition 0, whose leader is now broker 1. The latency is still fine.

When broker 0 restarts, it does not immediately become the leader for partition 0; meanwhile the canary still sends to partition 0 (leader on broker 1) and the latency remains fine.

Metadata for __strimzi_canary topic
	{ID:0 Leader:1 Replicas:[0 1 2] Isr:[1 2 0] OfflineReplicas:[]}
	{ID:2 Leader:2 Replicas:[2 0 1] Isr:[2 1 0] OfflineReplicas:[]}
	{ID:1 Leader:1 Replicas:[1 2 0] Isr:[1 2 0] OfflineReplicas:[]}
Sending message: value={"producerId":"strimzi-canary-client","messageId":6463,"timestamp":1619526049199} on partition=0
Message sent: partition=0, offset=2154, duration=53 ms
Sending message: value={"producerId":"strimzi-canary-client","messageId":6464,"timestamp":1619526049252} on partition=1
Message sent: partition=1, offset=2154, duration=47 ms
Sending message: value={"producerId":"strimzi-canary-client","messageId":6465,"timestamp":1619526049299} on partition=2
Message received: value={ProducerID:strimzi-canary-client, MessageID:6463, Timestamp:1619526049199}, partition=0, offset=2154, duration=100 ms
Message received: value={ProducerID:strimzi-canary-client, MessageID:6464, Timestamp:1619526049252}, partition=1, offset=2154, duration=48 ms
Message sent: partition=2, offset=2154, duration=154 ms
... reconcile done
Message received: value={ProducerID:strimzi-canary-client, MessageID:6465, Timestamp:1619526049299}, partition=2, offset=2154, duration=154 ms

When the preferred leader election finally happens and broker 0 is once again the leader for partition 0, the canary sees higher latency on partition 0: it jumps from around 200 ms to 940 ms, and averages 700-800 ms as the canary keeps sending and receiving messages.

Metadata for __strimzi_canary topic
	{ID:0 Leader:0 Replicas:[0 1 2] Isr:[1 2 0] OfflineReplicas:[]}
	{ID:2 Leader:2 Replicas:[2 0 1] Isr:[2 1 0] OfflineReplicas:[]}
	{ID:1 Leader:1 Replicas:[1 2 0] Isr:[1 2 0] OfflineReplicas:[]}
Sending message: value={"producerId":"strimzi-canary-client","messageId":6469,"timestamp":1619526059062} on partition=0
Message sent: partition=0, offset=2156, duration=352 ms
Sending message: value={"producerId":"strimzi-canary-client","messageId":6470,"timestamp":1619526059414} on partition=1
Message sent: partition=1, offset=2156, duration=84 ms
Sending message: value={"producerId":"strimzi-canary-client","messageId":6471,"timestamp":1619526059498} on partition=2
Message received: value={ProducerID:strimzi-canary-client, MessageID:6470, Timestamp:1619526059414}, partition=1, offset=2156, duration=85 ms
Message sent: partition=2, offset=2156, duration=69 ms
... reconcile done
Message received: value={ProducerID:strimzi-canary-client, MessageID:6471, Timestamp:1619526059498}, partition=2, offset=2156, duration=79 ms
Message received: value={ProducerID:strimzi-canary-client, MessageID:6469, Timestamp:1619526059062}, partition=0, offset=2156, duration=940 ms

Restarting the canary helps.
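Since a restart fixes it, one possible workaround is to do by hand what the restart achieves: poll the leader for the partition and, whenever fresh metadata reports a different leader than the broker we last subscribed against, close and recreate the PartitionConsumer. This is a sketch of a workaround, not how Sarama is supposed to behave; the `leaderLookup` function stands in for the real `sarama.Client` calls (`RefreshMetadata(topic)` followed by `Leader(topic, partition)`) so the sketch runs without a broker, and all names here are ours:

```go
package main

import (
	"log"
	"time"
)

// leaderLookup abstracts the two sarama.Client calls a real canary would
// use: RefreshMetadata(topic) followed by Leader(topic, partition), which
// returns the leader broker whose ID() we compare against.
type leaderLookup func(topic string, partition int32) (int32, error)

// shouldResubscribe decides to recreate the PartitionConsumer whenever
// fresh metadata reports a leader different from the broker we last
// subscribed against.
func shouldResubscribe(subscribedBroker, currentLeader int32) bool {
	return subscribedBroker != currentLeader
}

// watchLeader polls the leader of topic/partition and invokes resubscribe
// when it moves, which is roughly what restarting the canary achieves.
func watchLeader(lookup leaderLookup, topic string, partition, lastLeader int32,
	interval time.Duration, resubscribe func(leader int32)) {
	for range time.Tick(interval) {
		leader, err := lookup(topic, partition)
		if err != nil {
			log.Printf("leader lookup for %s/%d failed: %v", topic, partition, err)
			continue
		}
		if shouldResubscribe(lastLeader, leader) {
			lastLeader = leader
			resubscribe(leader) // caller closes the old PartitionConsumer and reopens
		}
	}
}

func main() {} // entry point unused in this sketch
```

The resubscribe callback would call Close() on the stale PartitionConsumer and ConsumePartition again against the refreshed client.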

Tracing through some Sarama internals, we noticed that when broker 0 goes down, something like this happens:

[Sarama] 2021/04/28 11:48:55 consumer/broker/0 disconnecting due to error processing FetchRequest: EOF
[Sarama] 2021/04/28 11:48:55 Closed connection to broker my-cluster-kafka-brokers.default.svc:9092
[Sarama] 2021/04/28 11:48:55 kafka: error while consuming __strimzi_canary/0: EOF
[Sarama] 2021/04/28 11:48:57 consumer/__strimzi_canary/0 finding new broker
[Sarama] 2021/04/28 11:48:57 client/metadata fetching metadata for [__strimzi_canary] from broker localhost:9092
[Sarama] 2021/04/28 11:48:57 client/metadata got error from broker -1 while fetching metadata: EOF
[Sarama] 2021/04/28 11:48:57 Closed connection to broker localhost:9092
[Sarama] 2021/04/28 11:48:57 client/metadata fetching metadata for [__strimzi_canary] from broker my-cluster-kafka-brokers.default.svc:9092
[Sarama] 2021/04/28 11:48:57 Failed to connect to broker my-cluster-kafka-brokers.default.svc:9092: dial tcp 127.0.0.1:9092: connect: connection refused
[Sarama] 2021/04/28 11:48:57 client/metadata got error from broker 0 while fetching metadata: dial tcp 127.0.0.1:9092: connect: connection refused
[Sarama] 2021/04/28 11:48:57 client/brokers deregistered broker #0 at my-cluster-kafka-brokers.default.svc:9092
[Sarama] 2021/04/28 11:48:57 client/metadata fetching metadata for [__strimzi_canary] from broker my-cluster-kafka-brokers.default.svc:9094
[Sarama] 2021/04/28 11:48:57 consumer/broker/1 added subscription to __strimzi_canary/0

so it's clear to me that the consumer is now consuming partition 0 from broker 1 (which is indeed the leader).
But when broker 0 comes back and the new leader is elected, we see:

[Sarama] 2021/04/28 11:48:58 producer/broker/0 state change to [closing] because dial tcp 127.0.0.1:9092: connect: connection refused
[Sarama] 2021/04/28 11:48:58 producer/leader/__strimzi_canary/0 state change to [retrying-1]
[Sarama] 2021/04/28 11:48:58 producer/leader/__strimzi_canary/0 abandoning broker 0
[Sarama] 2021/04/28 11:48:58 producer/broker/0 input chan closed
[Sarama] 2021/04/28 11:48:58 producer/broker/0 shut down
[Sarama] 2021/04/28 11:48:58 client/metadata fetching metadata for [__strimzi_canary] from broker my-cluster-kafka-brokers.default.svc:9094
[Sarama] 2021/04/28 11:48:59 producer/leader/__strimzi_canary/0 selected broker 1
[Sarama] 2021/04/28 11:48:59 producer/broker/1 state change to [open] on __strimzi_canary/0
[Sarama] 2021/04/28 11:48:59 producer/leader/__strimzi_canary/0 state change to [flushing-1]
[Sarama] 2021/04/28 11:48:59 producer/leader/__strimzi_canary/0 state change to [normal]

So only the producer noticed the leadership change and moved back to sending to broker 0, but there is no log from the consumer's perspective showing that it learned the leader is now broker 0 and should read from there.
Besides this missing log, I don't understand how the consumer keeps consuming at all, nor why the latency is higher from then on.
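For what it's worth, a manual recovery at this point would mirror the restart: close the stale PartitionConsumer, refresh metadata, and reopen from one past the last consumed offset (Sarama's ConsumePartition takes the offset of the first message wanted). The helper name below is ours, and the commented calls are only the shape such a recovery might take:

```go
package main

import "fmt"

// nextFetchOffset gives the offset to resume from after tearing down a
// PartitionConsumer: one past the last message successfully consumed.
func nextFetchOffset(lastConsumed int64) int64 {
	return lastConsumed + 1
}

func main() {
	// After the preferred leader election, a manual recovery might look like:
	//   pc.Close()                                  // drop the stale subscription on broker 1
	//   client.RefreshMetadata("__strimzi_canary")  // pick up the new leader (broker 0)
	//   pc, err = consumer.ConsumePartition("__strimzi_canary", 0, nextFetchOffset(2156))
	fmt.Println(nextFetchOffset(2156)) // resume from offset 2157
}
```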
