fix: ensure ResponseComplete hook always executes by LukeAVanDrie · Pull Request #2064 · kubernetes-sigs/gateway-api-inference-extension

LukeAVanDrie · 2026-01-05T23:34:39Z

What type of PR is this?
/kind bug

What this PR does / why we need it:
This PR ensures that the ResponseComplete plugin hook is always executed if a request was scheduled to a pod, regardless of how the request terminates (success, error, or client disconnect).

The Problem:
Stateful plugins (like the upcoming Concurrency Saturation Detector) rely on strict symmetry between PreRequest (increment) and ResponseComplete (decrement). Previously, several edge cases could break this symmetry, causing capacity leaks where the system believed it was saturated when it was actually empty:

- Errors & Disconnects: If `json.Marshal` failed or the client context was canceled after scheduling but before response generation, the function would return early, skipping the completion hook.
- Split Streaming Chunks: The streaming logic relied on string parsing (`strings.Contains(..., "[DONE]")`) to detect the end of a stream. If the TCP/Envoy chunk boundary split this message across two reads, the completion signal was missed.
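The following minimal Go sketch shows why per-chunk parsing misses the signal. It assumes an OpenAI-style server-sent-events stream that ends with `data: [DONE]`. The chunk contents and the helpers `doneInChunk` and `completedByContent` are hypothetical. They only model the old approach of checking each read in isolation.

```go
package main

import (
	"fmt"
	"strings"
)

// doneInChunk mimics per-chunk content parsing: it only ever sees one read.
func doneInChunk(chunk string) bool {
	return strings.Contains(chunk, "[DONE]")
}

// completedByContent reports whether any single chunk contained the marker.
func completedByContent(chunks []string) bool {
	for _, c := range chunks {
		if doneInChunk(c) {
			return true
		}
	}
	return false
}

func main() {
	// Illustrative stream: the final "data: [DONE]" event is split across two reads.
	chunks := []string{"data: {\"choices\":[]}\n\ndata: [DO", "NE]\n\n"}
	fmt.Println(completedByContent(chunks))                            // false
	fmt.Println(strings.Contains(strings.Join(chunks, ""), "[DONE]")) // true
}
```

Neither chunk contains the full marker, so the completion signal is lost even though the stream as a whole did end.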

The Fix:

- Safety Defer: Added a `defer` block in `server.Process` that checks whether a `TargetPod` was assigned but `ResponseComplete` was never called. If so, it forces the completion hook to run.
- Authoritative Streaming Signal: Moved the streaming completion trigger out of the content-parsing utility and into the main loop, relying on the gRPC `EndOfStream` boolean. This makes the logic robust against split chunks.
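The safety defer can be sketched as follows. This is a simplified model, not the actual `server.Process` code. `requestContext`, `responseCompleteCalled`, `hooks`, and `process` are hypothetical names. The `failAfterScheduling` flag stands in for an early return such as a marshaling error or a canceled client context.

```go
package main

import (
	"errors"
	"fmt"
)

// requestContext is a hypothetical stand-in for per-request server state.
type requestContext struct {
	TargetPod              string // set once the request has been scheduled
	responseCompleteCalled bool   // guards against running the hook twice
}

// hooks counts how many times the completion hook ran, for illustration.
type hooks struct{ completeCalls int }

// runResponseComplete runs the hook at most once per request.
func (h *hooks) runResponseComplete(rc *requestContext) {
	if rc.responseCompleteCalled {
		return
	}
	rc.responseCompleteCalled = true
	h.completeCalls++
}

// process sketches a request lifecycle with a safety defer.
func process(h *hooks, failAfterScheduling bool) error {
	rc := &requestContext{}
	defer func() {
		// Safety net: a pod was assigned but the normal path never reached
		// ResponseComplete, so run it now to keep PreRequest balanced.
		if rc.TargetPod != "" && !rc.responseCompleteCalled {
			h.runResponseComplete(rc)
		}
	}()

	rc.TargetPod = "pod-a" // scheduling succeeded; PreRequest would run here
	if failAfterScheduling {
		return errors.New("simulated failure after scheduling")
	}
	// Normal path: the response finished and the hook runs exactly once.
	h.runResponseComplete(rc)
	return nil
}

func main() {
	h := &hooks{}
	_ = process(h, false) // normal completion
	_ = process(h, true)  // early return; the defer fires the hook
	fmt.Println(h.completeCalls) // 2
}
```

The guard flag makes the hook idempotent. On the happy path, the deferred check sees that completion already ran and does nothing.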

Which issue(s) this PR fixes:

Prerequisite for #1793

Does this PR introduce a user-facing change?:

Fixed a bug where client disconnects, internal errors, or split streaming chunks could cause request capacity to leak, potentially leading to false saturation signals in the Flow Control layer.
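Here is a minimal Go sketch of the leak this note describes. It uses a hypothetical in-flight counter (`concurrencyTracker`), not the detector from #2062. `PreRequest` increments the count and `ResponseComplete` decrements it. A single skipped completion leaves a slot permanently held.

```go
package main

import (
	"fmt"
	"sync"
)

// concurrencyTracker is a hypothetical stateful plugin that counts in-flight
// requests per pod.
type concurrencyTracker struct {
	mu       sync.Mutex
	inFlight map[string]int
}

func newConcurrencyTracker() *concurrencyTracker {
	return &concurrencyTracker{inFlight: make(map[string]int)}
}

// PreRequest records that a request was scheduled to pod.
func (c *concurrencyTracker) PreRequest(pod string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.inFlight[pod]++
}

// ResponseComplete releases the slot taken in PreRequest.
func (c *concurrencyTracker) ResponseComplete(pod string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.inFlight[pod]--
}

// InFlight returns the current count for pod.
func (c *concurrencyTracker) InFlight(pod string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.inFlight[pod]
}

// Saturated reports whether pod has reached limit in-flight requests.
func (c *concurrencyTracker) Saturated(pod string, limit int) bool {
	return c.InFlight(pod) >= limit
}

func main() {
	tr := newConcurrencyTracker()
	tr.PreRequest("pod-a")
	tr.PreRequest("pod-a")
	tr.ResponseComplete("pod-a") // the second completion was skipped (e.g. disconnect)
	// Nothing is running any more, yet the tracker still reports saturation.
	fmt.Println(tr.InFlight("pod-a"), tr.Saturated("pod-a", 1)) // 1 true
}
```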

This guarantees request/response symmetry to prevent capacity leaks in stateful plugins (e.g., Concurrency Detector). Previously, errors during JSON marshaling, client disconnects, or split streaming chunks could cause the `ResponseComplete` hook to be skipped.

Changes:
- Add `defer` safety block to trigger completion on errors/disconnects.
- Move streaming completion trigger to the authoritative `EndOfStream` signal rather than relying on body content parsing.
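The following sketch drives completion from the transport flag instead of the body text. `bodyChunk` is a hypothetical stand-in, and its `EndOfStream` field is assumed to mirror the `end_of_stream` flag on Envoy ext_proc body messages. The real handler's types and plumbing differ. In this example the `[DONE]` marker is split across two chunks, yet completion fires exactly once.

```go
package main

import "fmt"

// bodyChunk is a hypothetical streamed response-body message.
type bodyChunk struct {
	Body        []byte
	EndOfStream bool
}

// handleStream forwards chunks and calls onComplete once, based only on the
// EndOfStream flag rather than on inspecting the body contents.
func handleStream(chunks []bodyChunk, onComplete func()) {
	for _, c := range chunks {
		// ... forward c.Body to the client here ...
		if c.EndOfStream {
			onComplete()
			return
		}
	}
}

func main() {
	calls := 0
	chunks := []bodyChunk{
		{Body: []byte("data: [DO")},
		{Body: []byte("NE]\n\n"), EndOfStream: true},
	}
	handleStream(chunks, func() { calls++ })
	fmt.Println(calls) // 1
}
```

In this sketch, a stream that stops without `EndOfStream` (for example, a client disconnect) never fires completion. That case would fall to the safety defer described above.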

netlify · 2026-01-05T23:34:45Z

✅ Deploy Preview for gateway-api-inference-extension ready!

Name	Link
🔨 Latest commit	`3800fe3`
🔍 Latest deploy log	https://app.netlify.com/projects/gateway-api-inference-extension/deploys/695c4a9331b48600082891f3
😎 Deploy Preview	https://deploy-preview-2064--gateway-api-inference-extension.netlify.app


k8s-ci-robot · 2026-01-05T23:34:50Z

Hi @LukeAVanDrie. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.


LukeAVanDrie · 2026-01-05T23:34:59Z

/assign @kfswain

kfswain · 2026-01-05T23:57:12Z

+//
+// Plugins should assume this is the final cleanup hook for a request.
+//
+// TODO: Consider passing an error or success bool; however, this is a breaking change and is deferred for now.


Good call, let's make an issue for this.

kfswain · 2026-01-06T00:11:41Z

/lgtm
/approve

Looks good.

I think we can make this change safely. I checked to make sure this plugin was run on both streaming and non-streaming requests, which it is. Other plugins may need to handle the failure case, but I think that's okay. Thanks!

k8s-ci-robot · 2026-01-06T00:11:49Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kfswain, LukeAVanDrie


Needs approval from an approver in each of these files:

~~OWNERS~~ [kfswain]


kfswain · 2026-01-06T00:15:26Z

/ok-to-test

nirrozenbaum · 2026-01-06T04:45:36Z

Getting back to my comments on the original PR that added the response hooks: the naming selection is really not clear IMO.
I think we should make the names self-explanatory to avoid confusion, or more concretely, I think all response hooks should be renamed.

LukeAVanDrie · 2026-01-06T04:57:18Z

> Getting back to my comments on the original PR that added the response hooks: the naming selection is really not clear IMO. I think we should make the names self-explanatory to avoid confusion, or more concretely, I think all response hooks should be renamed.

I can create an issue for this tomorrow morning and link it to this PR as a discussion starter. The semantics of ResponseComplete are a bit unclear. I do think making this symmetric with PreRequest is the right choice for a lot of different extension point implementations, though, specifically to handle closing resources or cleaning up memory references. For the concurrency-limit saturation detector, the use case is even more evident.
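Here is a sketch of the cleanup use case mentioned above. It assumes a hypothetical plugin (`requestScopedCache`) that keeps per-request data in a map between `PreRequest` and `ResponseComplete`. Because `ResponseComplete` is treated as the final cleanup hook, it is the only place the entry is deleted. A skipped call would therefore leak memory.

```go
package main

import (
	"fmt"
	"sync"
)

// requestScopedCache is a hypothetical plugin holding per-request data from
// PreRequest until ResponseComplete.
type requestScopedCache struct {
	mu    sync.Mutex
	state map[string][]byte // keyed by a request ID
}

func newRequestScopedCache() *requestScopedCache {
	return &requestScopedCache{state: make(map[string][]byte)}
}

// PreRequest stores data for a request once it has been scheduled.
func (p *requestScopedCache) PreRequest(requestID string, data []byte) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.state[requestID] = data
}

// ResponseComplete is the final cleanup hook: it drops the entry so the map
// cannot grow for the life of the process.
func (p *requestScopedCache) ResponseComplete(requestID string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.state, requestID)
}

// Len returns the number of requests whose state is still held.
func (p *requestScopedCache) Len() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return len(p.state)
}

func main() {
	c := newRequestScopedCache()
	c.PreRequest("req-1", []byte("example"))
	c.PreRequest("req-2", []byte("example"))
	c.ResponseComplete("req-1")
	c.ResponseComplete("req-2")
	fmt.Println(c.Len()) // 0
}
```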

nirrozenbaum · 2026-01-06T07:33:26Z

> > Getting back to my comments on the original PR that added the response hooks: the naming selection is really not clear IMO. I think we should make the names self-explanatory to avoid confusion, or more concretely, I think all response hooks should be renamed.

> I can create an issue for this tomorrow morning and link it to this PR as a discussion starter. The semantics of ResponseComplete are a bit unclear. I do think making this symmetric with PreRequest is the right choice for a lot of different extension point implementations, though, specifically to handle closing resources or cleaning up memory references. For the concurrency-limit saturation detector, the use case is even more evident.

@LukeAVanDrie Agreed, it definitely should be symmetric. I have no comments on the behavior, only on the name selection :).
It would be great if you could create an issue so we can come up with names that best describe what the extension points do.



k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jan 5, 2026

k8s-ci-robot requested review from liu-cong and shmuelk January 5, 2026 23:34

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 5, 2026

k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jan 5, 2026

k8s-ci-robot assigned kfswain Jan 5, 2026

LukeAVanDrie mentioned this pull request Jan 5, 2026

feat: Add concurrency saturation detector #2062

Merged

kfswain reviewed Jan 5, 2026


k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 6, 2026

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 6, 2026

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 6, 2026

k8s-ci-robot merged commit 66c6f1d into kubernetes-sigs:main Jan 6, 2026
12 checks passed

This was referenced Jan 6, 2026

Proposal: Rename Response Extension Points to reflect lifecycle semantics #2078

Open

Proposal: Pass error context to ResponseComplete extension point #2079

Open

LukeAVanDrie mentioned this pull request Mar 11, 2026

(refactor)remove ResponseComplete interface, replace it with ResponseStream using endOfstream #2507

Merged

k8s-ci-robot merged 1 commit into kubernetes-sigs:main from LukeAVanDrie:fix/handler-lifecycle-symmetry
