fix(predictedlatency): keep prefillTokensInFlight from drifting negative#2848
Conversation
✅ Deploy Preview for gateway-api-inference-extension ready!
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: kaushikmitr. The full list of commands accepted by this bot can be found here. The pull request process is described here.
return v.(*atomic.Int64)
}

// decrementPodCounter subtracts delta from the counter at key with a hard
We work with endpoints, not pods; a pod may have one or more endpoints in DP mode, so I suggest renaming this to endpoint.
If there are instances in the predicted-latency set of plugins that use the term pod, please update those to endpoint in a follow-up PR.
Fixed this for now; will clean it up in a follow-up PR.
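For context on the thread above, here is a minimal sketch of the clamp-at-zero decrement being reviewed, assuming the counters live in a sync.Map keyed by endpoint name with *atomic.Int64 values. The helper name follows the endpoint rename suggested above; the signature and store layout are illustrative, not the exact code in this PR.

```go
package sketch

import (
	"sync"
	"sync/atomic"
)

// decrementEndpointCounter subtracts delta from the counter at key, but never
// lets it drift below zero. Decrementing a missing key is a no-op and does not
// create a zero entry. (Sketch only; names and layout are illustrative.)
func decrementEndpointCounter(counters *sync.Map, key string, delta int64) {
	v, ok := counters.Load(key)
	if !ok {
		return // no entry: nothing to decrement, and do not create one
	}
	ctr := v.(*atomic.Int64)
	for {
		cur := ctr.Load()
		next := cur - delta
		if next < 0 {
			next = 0 // clamp: a stray decrement cannot push the counter negative
		}
		if ctr.CompareAndSwap(cur, next) {
			return
		}
	}
}
```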
f0ea31b to 5ea53e9
The predicted-latency producer's prefillTokensInFlight counter can get stuck deeply negative, after which every bulk prediction call fails the sidecar's `greater_than_equal: 0` validation with HTTP 422 and the latency-prediction scheduling pipeline is effectively dead. An EPP restart doesn't help because the same race fires again on the next pod startup.

Trigger (reproduced on a test cluster): At EPP startup, the EPP's gRPC readiness can pass a few ms before the prediction sidecar is actually answering requests. Envoy has a persistent ext-proc connection to the service, so real traffic floods in the instant EPP becomes ready. For the first ~500 ms of the pod's life, every request's PrepareRequestData calls generatePredictions, which sends an HTTP POST to the sidecar. The sidecar's port is bound but the handler isn't answering yet, so the Go HTTP client waits up to HTTPTimeout (10s by default). The director's runPrepareDataPlugins has its own 400 ms deadline, so after 400 ms it returns "timed out" to the request handler and continues — but until kubernetes-sigs#2847 landed, the plugin's goroutine kept running with a live context. The downstream hooks then raced with the plugin:

1. PreRequest runs on the now-scheduled request, fails to read the SLO context (the plugin goroutine has not published it yet), and skips incrementing prefillTokensInFlight.
2. The plugin goroutine eventually returns from its HTTP call and stores the SLO context in the ttlcache.
3. ResponseBody fires at EOS, reads the (late-published) context, and unconditionally decrements prefillTokensInFlight.

Each orphan pair subtracts one request's input length from the endpoint's counter with no matching increment. Reproduced on a test cluster: 19 PrepareData timeouts across a 490 ms startup window drove the counter negative, producing 192 "prefill_tokens_in_flight: negative" log lines and 281 subsequent bulk prediction 422s even though the sidecar was fully healthy from second two onwards. Under real production load the startup burst is much larger (hundreds of requests) and drifts the counter millions of tokens negative in one shot. Steady-state CPU-induced timeouts contribute an additional slow drip on top, but the one-shot startup burst is the dominant source.

kubernetes-sigs#2847 fixed the underlying cancellation hole in the request-control executor so plugins now see ctx.Done() when the deadline fires, but plugins that already commit shared state at the end of PrepareRequestData need to cooperate. Add three layered guards in the predicted-latency producer:

1. PrepareRequestData: check ctx.Err() before publishing the SLO context (see the sketch below). If the director has already cancelled us, the context PreRequest needed never showed up, and storing it now recreates the orphan-decrement race even under cooperative cancellation.
2. ResponseBody: only decrement counters that PreRequest actually incremented. Mirror the prefillTokensAtDispatch{,OnPrefill} > 0 guard the OnEviction handler already uses, instead of decrementing based on prefillTargetMetadata != nil alone. Also tighten the OnEviction guard so each endpoint's decrement is gated on its own dispatch snapshot rather than a combined OR.
3. decrementEndpointCounter helper: a CAS-based clamp at zero that all decrement sites now go through. Decrementing a missing key is a no-op and does not create a zero entry.

This is the last line of defense — if any future bug somehow lands a stray decrement, the counter still cannot drift below zero and the prediction sidecar stays inside its validation contract.

Adds regression tests for each layer:

- TestPrepareRequestData_CancelledContextDoesNotPublish (and a live positive control) verifies the publication guard on a cancelled ctx.
- TestPredictedLatency_ResponseBody_NoOrphanDecrement_WhenPreRequestSkipped reproduces the startup race in a unit test: SLO context is published as if PrepareData raced past the director, PreRequest is skipped, ResponseBody fires at EOS, and the endpoint's counter must not go below zero.
- TestDecrementEndpointCounter_FloorAtZero exercises the helper directly: decrementing a missing key, decrementing past zero, decrementing after delete, and the normal partial-decrement path.
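To make guard (1) concrete, here is a minimal sketch of the publication check, assuming the SLO context is kept in a plain sync.Map rather than the plugin's actual ttlcache; the type, field, and function names are hypothetical, not the code as merged.

```go
package sketch

import (
	"context"
	"fmt"
	"sync"
)

// sloContext stands in for the per-request SLO state the plugin caches for
// PreRequest and ResponseBody; the field is illustrative.
type sloContext struct {
	prefillTokens int64
}

// publishSLOContext refuses to publish once the director's PrepareData
// deadline has already cancelled the plugin: PreRequest never saw this
// context, so storing it now would only hand ResponseBody an orphan
// decrement at end-of-stream.
func publishSLOContext(ctx context.Context, store *sync.Map, requestID string, sc *sloContext) error {
	if err := ctx.Err(); err != nil {
		return fmt.Errorf("skipping SLO context publication: %w", err)
	}
	store.Store(requestID, sc)
	return nil
}
```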
5ea53e9 to 15bdac6
/lgtm
…ive (kubernetes-sigs#2848) (cherry picked from commit c307ce7)

…ive (#2848) (#2853) (cherry picked from commit c307ce7) Minor rebase conflict: upstream/main renamed schedulingtypes.LLMRequest → schedulingtypes.InferenceRequest and the helper createTestLLMRequest → createTestInferenceRequest. release-1.5 still uses the pre-rename names, so the three call sites added by the new regression tests were adjusted back to createTestLLMRequest to compile against release-1.5. No behavior changes.

…ive (kubernetes-sigs/gateway-api-inference-extension#2848)
The predicted-latency producer can be left holding an SLO context whose PreRequest counterpart never ran, when PrepareData times out and the director moves on. The plugin's PrepareRequestData goroutine eventually finishes and publishes the context; ResponseBody later finds it and unconditionally decrements prefillTokensInFlight. Each orphan path subtracts one request's input length from the pod's counter with no matching increment, so under sustained load, and at startup when the EPP is ready but the prediction sidecars are not yet, the counter drifts deeply negative. Once that happens, every subsequent bulk prediction request carries a negative prefill_tokens_in_flight feature and the prediction sidecar rejects it with HTTP 422 "greater_than_equal: 0", killing the latency-prediction scheduling pipeline until the EPP pod is restarted.
#2847 fixed the underlying cancellation hole in the request-control executor so plugins now see ctx.Done() when the deadline fires, but plugins that already commit shared state at the end of PrepareRequestData need to cooperate. Add three layered guards in the predicted-latency producer:
1. PrepareRequestData: check ctx.Err() before publishing the SLO context. If the director has already cancelled us, the context PreRequest needed never showed up, and storing it now would create the orphan-decrement race even with cooperative cancellation.
2. ResponseBody: only decrement counters that PreRequest actually incremented. Mirror the prefillTokensAtDispatch{,OnPrefill} > 0 guard the OnEviction handler already uses, instead of decrementing based on prefillTargetMetadata != nil alone. Also tighten the OnEviction guard so each pod's decrement is gated on its own dispatch snapshot rather than a combined OR (see the sketch after this list).
3. decrementPodCounter helper: a CAS-based clamp at zero that all decrement sites now go through. Decrementing a missing key is a no-op and does not create a zero entry. This is the last line of defense — if any future bug somehow lands a stray decrement, the counter still cannot drift below zero and the prediction sidecar stays inside its validation contract.
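A rough sketch of how guards (2) and (3) fit together at end-of-stream, using hypothetical field names (prefillTokensAtDispatch, prefillTokensAtDispatchOnPrefill) for the dispatch-time snapshots PreRequest records only when it actually increments the counters; this illustrates the shape of the fix, not the merged code.

```go
package sketch

import "sync/atomic"

// requestState stands in for the cached SLO context; all names are illustrative.
type requestState struct {
	prefillTokensAtDispatch          int64 // recorded by PreRequest for the decode endpoint; 0 if PreRequest was skipped
	prefillTokensAtDispatchOnPrefill int64 // recorded by PreRequest for the prefill endpoint; 0 if skipped
	decodeCounter                    *atomic.Int64
	prefillCounter                   *atomic.Int64
}

// onEndOfStream decrements each endpoint's counter only from its own dispatch
// snapshot, mirroring the OnEviction guard, instead of keying off the mere
// presence of prefill target metadata.
func onEndOfStream(s *requestState) {
	if s.prefillTokensAtDispatch > 0 {
		clampedSub(s.decodeCounter, s.prefillTokensAtDispatch)
	}
	if s.prefillTokensAtDispatchOnPrefill > 0 {
		clampedSub(s.prefillCounter, s.prefillTokensAtDispatchOnPrefill)
	}
}

// clampedSub applies the same CAS-based floor at zero as the decrement helper:
// subtract delta but never let the counter go negative.
func clampedSub(c *atomic.Int64, delta int64) {
	for {
		cur := c.Load()
		next := cur - delta
		if next < 0 {
			next = 0
		}
		if c.CompareAndSwap(cur, next) {
			return
		}
	}
}
```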
Adds regression tests for each layer: