cgrp: Fix NULL dereference in LeaveGroup when coordinator unavailable by MAlostaz · Pull Request #5327 · confluentinc/librdkafka

MAlostaz (MAlostaz) · 2026-02-24T22:12:31Z

Summary

Fix SIGSEGV crash in rd_kafka_cgrp_handle_LeaveGroup() when the coordinator broker becomes unavailable during consumer close.

Problem

When a consumer is destroyed (rd_kafka_destroy()) and the group coordinator is unavailable, rd_kafka_cgrp_leave() calls rd_kafka_cgrp_handle_LeaveGroup() with a NULL broker pointer (rkb = rkcg->rkcg_coord). The error path then dereferences this NULL pointer:

`rd_kafka_dbg(rkb->rkb_rk, CGRP, "LEAVEGROUP", ...);  // CRASH: rkb is NULL`

This crash requires the coordinator to become unavailable at the exact moment the consumer is shutting down. Typical triggers:

Rolling broker upgrades where the coordinator broker restarts
Coordinator failover during consumer shutdown
Network partition isolating the coordinator

Solution

Replace rkb->rkb_rk with rk in the rd_kafka_dbg() calls. The rk parameter is always valid (passed directly to the function), and is semantically equivalent to rkb->rkb_rk when rkb is non-NULL.
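The change can be illustrated with a minimal, self-contained sketch. The types and functions below (`mock_rk_t`, `mock_rkb_t`, `mock_dbg`, `mock_handle_leavegroup`) are hypothetical stand-ins, not librdkafka's internal definitions; they only mirror the shape of the handler, which receives both a handle and a broker pointer that may be NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the client handle (rk). */
typedef struct mock_rk_s {
    const char *name;
    int dbg_lines; /* number of debug lines emitted so far */
} mock_rk_t;

/* Hypothetical stand-in for a broker (rkb) with a back-pointer to its handle. */
typedef struct mock_rkb_s {
    mock_rk_t *rkb_rk;
} mock_rkb_t;

/* Hypothetical debug logger: needs a valid handle, never a broker. */
static void mock_dbg(mock_rk_t *rk, const char *msg) {
    rk->dbg_lines++;
    fprintf(stderr, "%s: LEAVEGROUP: %s\n", rk->name, msg);
}

/* Handler sketch: rk is always valid, rkb may be NULL. */
static int mock_handle_leavegroup(mock_rk_t *rk, mock_rkb_t *rkb, int err) {
    assert(rk != NULL);
    if (err) {
        /* The pre-fix pattern would be mock_dbg(rkb->rkb_rk, ...),
         * which dereferences NULL when rkb == NULL. Logging through
         * rk avoids touching rkb at all. */
        mock_dbg(rk, rkb ? "LeaveGroup failed"
                         : "LeaveGroup failed (no coordinator)");
        return -1;
    }
    mock_dbg(rk, "LeaveGroup response received");
    return 0;
}
```

Calling `mock_handle_leavegroup(&rk, NULL, 1)` in this sketch logs one line and returns -1 instead of crashing, which is the behavior the fix aims for in the real error path.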

Without the fix (crashes):

Consumer process dies with SIGSEGV
Local resources may not be cleaned up (memory, file handles, etc.)
Broker doesn't get LeaveGroup
Broker waits for session timeout → rebalance

With the fix (graceful exit):

Consumer process exits cleanly
Local resources are properly cleaned up
Broker doesn't get LeaveGroup
Broker waits for session timeout → rebalance

The broker-side outcome is identical: in both cases the broker never receives LeaveGroup and must wait for the session timeout before rebalancing. This cannot be avoided because the coordinator is unavailable, but with the fix the consumer no longer crashes and dumps core.

Backtrace

    #0  rd_kafka_cgrp_handle_LeaveGroup (rk=0x..., rkb=0x0, err=RD_KAFKA_RESP_ERR__WAIT_COORD, ...)
        at rdkafka_cgrp.c:984
    #1  rd_kafka_cgrp_leave (rkcg=0x...) at rdkafka_cgrp.c:1158
    #2  rd_kafka_cgrp_terminate (rkcg=0x...) at rdkafka_cgrp.c:...
    #3  rd_kafka_destroy_internal (rk=0x...) at rdkafka.c:...

We observed intermittent SIGSEGV crashes in production during consumer shutdown
We captured a core dump and analyzed it with gdb
The backtrace above shows rkb=0x0 (NULL) in rd_kafka_cgrp_handle_LeaveGroup()

We traced the call site in rd_kafka_cgrp_leave() (line 1158):

    } else
        rd_kafka_cgrp_handle_LeaveGroup(rkcg->rkcg_rk, rkcg->rkcg_coord,  // <-- rkcg_coord is NULL here
                                        RD_KAFKA_RESP_ERR__WAIT_COORD,
                                        NULL, NULL, rkcg);

This else branch is taken when no coordinator is available (rkcg->rkcg_coord == NULL)
The function then attempts to log using rkb->rkb_rk at line 984, causing a NULL dereference
We confirmed that the rk parameter is always valid and equivalent to rkb->rkb_rk when rkb is non-NULL
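The call path traced above can be sketched in isolation. Everything in this block is hypothetical (`sketch_cgrp_t`, `sketch_cgrp_leave`, `sketch_handle_leave`, and the error enum); the condition guarding the real branch is not shown in the excerpt, so the sketch assumes a plain NULL check on the coordinator pointer. It is meant only to show why the handler must tolerate a NULL broker when it is invoked directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error codes; the backtrace shows the real one,
 * RD_KAFKA_RESP_ERR__WAIT_COORD. */
enum sketch_err { SKETCH_ERR_NONE = 0, SKETCH_ERR_WAIT_COORD = 1 };

typedef struct sketch_broker_s {
    int id;
} sketch_broker_t;

typedef struct sketch_cgrp_s {
    sketch_broker_t *coord;      /* NULL when no coordinator is available */
    enum sketch_err last_err;    /* error seen by the handler */
    int handler_saw_null_broker; /* 1 if the handler received rkb == NULL */
    int requests_sent;           /* LeaveGroup requests handed to a broker */
} sketch_cgrp_t;

/* Handler sketch: records what it was given without touching rkb's fields. */
static void sketch_handle_leave(sketch_broker_t *rkb, enum sketch_err err,
                                sketch_cgrp_t *cg) {
    cg->last_err = err;
    cg->handler_saw_null_broker = (rkb == NULL);
}

static void sketch_cgrp_leave(sketch_cgrp_t *cg) {
    if (cg->coord) {
        /* Coordinator known: a request would be sent to it and the
         * handler would run later, with a valid broker. */
        cg->requests_sent++;
    } else {
        /* No coordinator: the handler is invoked directly with the
         * (NULL) coordinator pointer and a wait-for-coordinator error. */
        sketch_handle_leave(cg->coord, SKETCH_ERR_WAIT_COORD, cg);
    }
}
```

In the no-coordinator case the handler always sees a NULL broker together with an error code, so any logging in its error path has to go through a pointer that does not depend on the broker.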

confluent-cla-assistant · 2026-02-24T22:12:44Z

🎉 All Contributor License Agreements have been signed. Ready to merge.
✅ MAlostaz
Please push an empty commit if you would like to re-run the checks to verify CLA status for all contributors.

rd_kafka_cgrp_handle_LeaveGroup() would crash with SIGSEGV when logging errors because it dereferenced rkb->rkb_rk when rkb was NULL. This can occur when the coordinator becomes unavailable during consumer shutdown. Use the always-valid `rk` parameter instead of `rkb->rkb_rk` in the rd_kafka_dbg() calls in the error path.

MAlostaz (MAlostaz) force-pushed the fix-leavegroup-null-broker branch from 74a8adf to 669e9f8 February 24, 2026 22:26

MAlostaz (MAlostaz) marked this pull request as ready for review February 24, 2026 22:31

MAlostaz (MAlostaz) requested a review from a team as a code owner February 24, 2026 22:31

MAlostaz (MAlostaz) mentioned this pull request Mar 5, 2026

NULL Dereference in LeaveGroup When Coordinator Is Unavailable #5347

Open

MAlostaz (MAlostaz) wants to merge 1 commit into confluentinc:master from MAlostaz:fix-leavegroup-null-broker
