fix: ensure ResponseComplete hook always executes #2064
k8s-ci-robot merged 1 commit into kubernetes-sigs:main
Conversation
This guarantees request/response symmetry to prevent capacity leaks in stateful plugins (e.g., Concurrency Detector). Previously, errors during JSON marshaling, client disconnects, or split streaming chunks could cause the `ResponseComplete` hook to be skipped. Changes: - Add `defer` safety block to trigger completion on errors/disconnects. - Move streaming completion trigger to the authoritative `EndOfStream` signal rather than relying on body content parsing.
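A minimal sketch of the `defer` safety net described above. Names like `requestContext`, `TargetPod`, and `ResponseCompleteCalled` are illustrative stand-ins for the extension's real per-request state, not the actual fields:

```go
package main

import "fmt"

// requestContext is a hypothetical stand-in for the per-request state
// tracked by the ext-proc server.
type requestContext struct {
	TargetPod              string
	ResponseCompleteCalled bool
}

// responseComplete is the cleanup hook; here it just records the call.
func responseComplete(ctx *requestContext) {
	ctx.ResponseCompleteCalled = true
	fmt.Println("ResponseComplete fired for", ctx.TargetPod)
}

// process sketches the pattern: the deferred block guarantees the
// completion hook runs even if the body returns early on an error.
func process(ctx *requestContext, fail bool) error {
	defer func() {
		// Safety net: a pod was assigned but completion never ran.
		if ctx.TargetPod != "" && !ctx.ResponseCompleteCalled {
			responseComplete(ctx)
		}
	}()

	ctx.TargetPod = "pod-a"
	if fail {
		// Early return (e.g. marshal error, client disconnect):
		// without the defer, the hook would be skipped.
		return fmt.Errorf("simulated marshal failure")
	}
	responseComplete(ctx) // normal success path
	return nil
}

func main() {
	ctx := &requestContext{}
	_ = process(ctx, true)
	fmt.Println("hook ran:", ctx.ResponseCompleteCalled)
}
```

Because the deferred closure checks the `ResponseCompleteCalled` flag, the hook fires exactly once per scheduled request on both the success and error paths.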
/assign @kfswain |
```go
// Plugins should assume this is the final cleanup hook for a request.
//
// TODO: Consider passing an error or success bool; however, this is a breaking change and is deferred for now.
```
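For context, the hook pair being reviewed above can be sketched as an interface like the following. The signatures are illustrative only; the real interface lives in the inference-extension plugin framework:

```go
package main

import "fmt"

// Plugin sketches the symmetric request/response hook pair discussed
// in this PR. Signatures are hypothetical.
type Plugin interface {
	// PreRequest runs once when a request is scheduled to a pod
	// (e.g. increment an in-flight counter).
	PreRequest(pod string)
	// ResponseComplete is the final cleanup hook for a request and,
	// per this PR, is guaranteed to run for every scheduled request
	// (e.g. decrement the counter).
	ResponseComplete(pod string)
}

// counter is a toy stateful plugin relying on that symmetry.
type counter struct{ inFlight int }

var _ Plugin = (*counter)(nil)

func (c *counter) PreRequest(pod string)       { c.inFlight++ }
func (c *counter) ResponseComplete(pod string) { c.inFlight-- }

func main() {
	c := &counter{}
	c.PreRequest("pod-a")
	c.ResponseComplete("pod-a")
	fmt.Println("in-flight:", c.inFlight) // back to zero when hooks are symmetric
}
```

If `ResponseComplete` is ever skipped, the counter never returns to zero, which is exactly the capacity-leak scenario this PR closes.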
Good call, let's make an issue for this.
/lgtm Looks good. I think we can make this change safely. I checked to make sure this plugin runs on both streaming and non-streaming, which it does. Other plugins may need to handle the failure case, but I think that's okay. Thanks!
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: kfswain, LukeAVanDrie
/ok-to-test |
Getting back to my comments on the original PR that added the response hooks: the naming selection is really not clear, IMO.
I can create an issue for this tomorrow morning and link it to this PR as a discussion starter. The semantics of
@LukeAVanDrie agreed, definitely should be symmetric. I have no comments on the behavior, only on the naming :)
What type of PR is this?
/kind bug
What this PR does / why we need it:
This PR ensures that the `ResponseComplete` plugin hook is always executed if a request was scheduled to a pod, regardless of how the request terminates (success, error, or client disconnect).

The Problem:

Stateful plugins (like the upcoming Concurrency Saturation Detector) rely on strict symmetry between `PreRequest` (increment) and `ResponseComplete` (decrement). Previously, several edge cases could break this symmetry, causing capacity leaks where the system believed it was saturated when it was actually empty:

- If `json.Marshal` failed or the client context was canceled after scheduling but before response generation, the function would return early, skipping the completion hook.
- The streaming path relied on fragile substring matching (`strings.Contains(..., "[DONE]")`) to detect the end of a stream. If the TCP/Envoy chunk boundary split this message across two reads, the completion signal was missed.

The Fix:

- A `defer` block in `server.Process` now checks whether a `TargetPod` was assigned but `ResponseComplete` was never called; if so, it forces the completion hook to run.
- The streaming completion trigger now keys off the authoritative `EndOfStream` boolean rather than body content parsing. This makes the logic robust against split chunks.

Which issue(s) this PR fixes:
Prerequisite for #1793
Does this PR introduce a user-facing change?:
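The streaming-detection change above can be sketched as follows, assuming an Envoy ext_proc style body chunk carrying an `EndOfStream` flag (the struct and function names here are illustrative, not the extension's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// bodyChunk mimics the shape of an ext_proc response-body message:
// a byte slice plus an EndOfStream flag set by the proxy.
type bodyChunk struct {
	Body        []byte
	EndOfStream bool
}

// isCompleteFragile is the old approach: substring matching misses the
// marker if "[DONE]" is split across two chunks.
func isCompleteFragile(c bodyChunk) bool {
	return strings.Contains(string(c.Body), "[DONE]")
}

// isCompleteRobust keys off the proxy's authoritative signal instead.
func isCompleteRobust(c bodyChunk) bool {
	return c.EndOfStream
}

func main() {
	// "[DONE]" split across a chunk boundary:
	chunks := []bodyChunk{
		{Body: []byte("data: [DO")},
		{Body: []byte("NE]\n\n"), EndOfStream: true},
	}
	for i, c := range chunks {
		fmt.Printf("chunk %d fragile=%v robust=%v\n",
			i, isCompleteFragile(c), isCompleteRobust(c))
	}
}
```

With the marker split across reads, the substring check never fires on either chunk, while the `EndOfStream` check fires on the final one.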