cgrp: Fix NULL dereference in LeaveGroup when coordinator unavailable #5327
Open
MAlostaz (MAlostaz) wants to merge 1 commit into confluentinc:master from
🎉 All Contributor License Agreements have been signed. Ready to merge.
rd_kafka_cgrp_handle_LeaveGroup() would crash with SIGSEGV when logging errors because it dereferenced rkb->rkb_rk when rkb was NULL. This can occur when the coordinator becomes unavailable during consumer shutdown. Use the always-valid `rk` parameter instead of `rkb->rkb_rk` in the rd_kafka_dbg() calls in the error path.
Summary
Issue: #5347
Fix SIGSEGV crash in `rd_kafka_cgrp_handle_LeaveGroup()` when the coordinator broker becomes unavailable during consumer close.
Problem
When a consumer is destroyed (`rd_kafka_destroy()`) and the group coordinator is unavailable, `rd_kafka_cgrp_leave()` calls `rd_kafka_cgrp_handle_LeaveGroup()` with a NULL broker pointer (`rkb = rkcg->rkcg_coord`). The error path then dereferences this NULL pointer:
This crash requires the coordinator to become unavailable at the exact moment the consumer is shutting down. Typical triggers:
Solution
Replace `rkb->rkb_rk` with `rk` in the `rd_kafka_dbg()` calls. The `rk` parameter is always valid (passed directly to the function), and is semantically equivalent to `rkb->rkb_rk` when `rkb` is non-NULL.
Without the fix (crashes):
With the fix (graceful exit):
The broker-side outcome is identical. In both cases, the broker doesn't receive LeaveGroup and must wait for the session timeout. This can't be avoided because the coordinator is unavailable, but with the fix the client exits without a crash or core dump.
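The pattern behind the fix can be shown with a small self-contained sketch. This is not the librdkafka source: `client_t`, `broker_t`, `client_dbg()`, `handle_leave_group()` and `leave_group()` are hypothetical stand-ins for the internal types, `rd_kafka_dbg()`, `rd_kafka_cgrp_handle_LeaveGroup()` and `rd_kafka_cgrp_leave()`, and the request/response round trip is collapsed into a direct call. It only illustrates why logging through the always-valid `rk` argument is safe when `rkb` is NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the client and broker handles; only the
 * fields needed for this illustration are modeled. */
typedef struct client_s {
        const char *name;
        int dbg_count; /* number of debug lines emitted so far */
} client_t;

typedef struct broker_s {
        client_t *rkb_rk; /* back-pointer to the owning client */
} broker_t;

/* Hypothetical logger standing in for rd_kafka_dbg(). */
static void client_dbg(client_t *rk, const char *msg) {
        assert(rk != NULL); /* the client handle is always passed in */
        rk->dbg_count++;
        fprintf(stderr, "%s: %s\n", rk->name, msg);
}

/* Handler sketch: rkb may be NULL when no coordinator is available, so
 * logging goes through rk and never through rkb->rkb_rk.
 * Returns 0 on success and -1 on error. */
static int handle_leave_group(client_t *rk, broker_t *rkb, int err) {
        (void)rkb; /* intentionally not dereferenced */
        if (err) {
                client_dbg(rk, "LeaveGroup response error");
                return -1;
        }
        client_dbg(rk, "LeaveGroup response received");
        return 0;
}

/* Caller sketch: with a coordinator the request is assumed to succeed;
 * without one the handler is invoked directly with a NULL broker and a
 * non-zero (hypothetical) error code. */
static int leave_group(client_t *rk, broker_t *coord) {
        if (coord != NULL)
                return handle_leave_group(rk, coord, 0);
        return handle_leave_group(rk, NULL, 1);
}
```

In this sketch, `leave_group(&rk, NULL)` returns -1 and emits one debug line through `rk`. A handler that logged through `rkb->rkb_rk` would dereference NULL on that same call.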
Backtrace
- `rkb=0x0` (NULL) in `rd_kafka_cgrp_handle_LeaveGroup()`.
- `rd_kafka_cgrp_leave()` (line 1158): the `else` branch is taken when no coordinator is available (`rkcg->rkcg_coord == NULL`).
- `rkb->rkb_rk` is accessed at line 984, causing the NULL dereference.
- The `rk` parameter is always valid and equivalent to `rkb->rkb_rk` when `rkb` is non-NULL.