Skip to content

Conversation

@nirrozenbaum
Copy link
Contributor

@nirrozenbaum nirrozenbaum commented Aug 4, 2025

This PR adds a new mechanism, PluginState, that may be used in a plugin to share a state between different extension points.
a clarification - the intention is that a plugin shares state with itself in different extension points, not with other plugins.

so essentially we should have two mechanisms for sharing state across plugins:

  1. for sharing state between different scheduling plugins, one can use scheduling CycleState.
  2. for sharing state between different extension points of the same plugin, one can add PluginState as a field in the plugin.

This mechanism adds the option to write/read/delete keys in the context of a request, while keeping in mind that for a given request the plugins do not run in parallel (and therefore using internally a map works).

this looks and feels similar to CycleState in some things, so I exported some common things like StateKey and StateData to a common plugins/shared_state.go file (structs that used to share state in the context of plugins).

this PR only adds the generic mechanism, still not using it.
the next step would be to use it in Prefix scorer and completely remove PostCycle extension point.

LMKWYT.

cc: @kfswain @liu-cong @ahg-g @elevran

Fix #1084

…xtension point of a plugin

Signed-off-by: Nir Rozenbaum <[email protected]>
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 4, 2025
@netlify
Copy link

netlify bot commented Aug 4, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit dcf5530
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/6890fa77743e3800086e4b9e
😎 Deploy Preview https://deploy-preview-1299--gateway-api-inference-extension.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@k8s-ci-robot k8s-ci-robot requested a review from danehans August 4, 2025 17:59
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nirrozenbaum

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested a review from liu-cong August 4, 2025 17:59
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 4, 2025
Signed-off-by: Nir Rozenbaum <[email protected]>
@liu-cong
Copy link
Contributor

liu-cong commented Aug 4, 2025

This is my understanding of the pros/cons of this approach, vs. having the state as part of the "request context" (I am using request context to generally describe an object whose lifecycle is scoped to the request, and does not necessarily need to be the concrete requestContext type).

Pros:

  • Each plugin maintains its own PluginState and no risk of messing up with another plugin.

Cons:

  • Need to maintain the garbage collection via the stalessThreshold. That can be tricky given inference requests can take very long. If we set a sufficiently long threshold, we keep a lot of stale data in memory.

There is no right/wrong solution, but my preference is to model this in a request context storage similar to the CycleState for the following reasons:

  • Having some "request/thread local storage" is not uncommon (e.g., Java ThreadLocal, Go Context).
  • Let go runtime handle the garbage collection.
  • I think we generally consider the plugins as trusted (so preventing malicious plugin is out of the scope), and the risk of one plugin accidentally manipulate the state of another is low. We can even enforce this by only allowing a plugin to access this data key'ed by itself.

Would like to hear others' thoughts :)

@nirrozenbaum
Copy link
Contributor Author

nirrozenbaum commented Aug 5, 2025

This is my understanding of the pros/cons of this approach, vs. having the state as part of the "request context" (I am using request context to generally describe an object whose lifecycle is scoped to the request, and does not necessarily need to be the concrete requestContext type).

Pros:

  • Each plugin maintains its own PluginState and no risk of messing up with another plugin.

Cons:

  • Need to maintain the garbage collection via the stalessThreshold. That can be tricky given inference requests can take very long. If we set a sufficiently long threshold, we keep a lot of stale data in memory.

There is no right/wrong solution, but my preference is to model this in a request context storage similar to the CycleState for the following reasons:

  • Having some "request/thread local storage" is not uncommon (e.g., Java ThreadLocal, Go Context).
  • Let go runtime handle the garbage collection.
  • I think we generally consider the plugins as trusted (so preventing malicious plugin is out of the scope), and the risk of one plugin accidentally manipulate the state of another is low. We can even enforce this by only allowing a plugin to access this data key'ed by itself.

Would like to hear others' thoughts :)

there is another major cons for a requestContext that lives through the request lifecycle.
it basically opens the door for bypassing the well-defined interfaces between the different layers of request handling.

I'm not saying the current design or interfaces are perfect, I just think we should continuously adjust the interfaces between the layers based on new requirements (AI use cases are changing in real fast pace).

I'm concerned about adding this general (key,value) map that can be passed between all extension points with no enforcement of the interfaces between the layers.

@liu-cong
Copy link
Contributor

liu-cong commented Aug 5, 2025

I'm concerned about adding this general (key,value) map that can be passed between all extension points with no enforcement of the interfaces between the layers.

+1 on this concern, though personally I can live with it.

Ideally we can iterate the interface to enforce something, for example, a plugin can only access a specific key that it owns. Would that be something you will consider?

@kfswain
Copy link
Collaborator

kfswain commented Aug 5, 2025

plugin can only access a specific key that it owns

That's not something we should do, multiple plugins may want the same data. Ultimately, I'm thinking we should have the data layer plugins emit or 'publish' the data keys they have. Then other plugins can 'subscribe' to these keys. It's not a pub/sub messaging queue, but a similar idea.

@kfswain
Copy link
Collaborator

kfswain commented Aug 5, 2025

This is similar to the issue we were discussing in updating the prefix cache.

We allowed the plugin to manage its state, but its becoming clear that we should not be allowing that, and instead have a clean data communication system. I think thoughtful admission of data is a better system than to just pass all data throughout the system.

@liu-cong
Copy link
Contributor

liu-cong commented Aug 5, 2025

Some sort of data key management mechanism for plugins (e.g., the pub/sub idea @kfswain mentioned above) sounds plausible to me. Before we have that, I am in favor of having a general key/value store.

@nirrozenbaum
Copy link
Contributor Author

I'd be careful with targeting this pub-sub alike system. it sounds to me like an overkill for what we need.
the use cases we've identified so far that need data sharing are:

  • CycleState to share state between different scheduling plugins in the context of a given request. to be honest, I've not seen a concrete use case for this other than the prefix scorer which is exchanging information with itself. I'm not sure this is truly needed but let's assume it is.
  • I think we should have PluginState - a mechanism for a plugin to share state between different extension points that are not necessarily part of the same layer (also in the context of a given request). as a side note, once we have this in place, Prefix plugin will use PluginState and then CycleState has no plugin that uses it. a point to think about of whether we really need it or want it.
  • Considering the fact that plugins can potentially have cache, they are holding state based on pods they get as candidates for requests. there is a gap here - these stateful plugins don't know when a pod is deleted. but as far as I can tell this is the only information they need from the datastore. we might add a function callback as part of the plugins handle to notify a plugin when a pod is deleted. I don't think we need a full blown notification system here. I also think we SHOULD ALLOW a plugin to maintain its state. e.g., kv cache scorer should manage kv cache state. session scorer should manage active sessions state. etc.. all of these have much in common - the common is that this cached state should be evicted if it wasn't accessed during "XXX" time. we should supply the general mechanism and let a plugin owner decide how much time is considered reasonable.
  • My initial intention was to call pluginState.Delete(requestID) upon successful request handling. so for example Prefix plugin would call Delete(requestID) when it completed updating the prefix cache in PostResponse extension point.
    the go routine that performs cleanup in the background based on lastTimeAccess is just a backup plan for requests that went wrong and did not complete the processing successfully.
  • Data Layer will need separate solution from everything that is mentioned here. we will get there when the new pluggable data layer gets more mature.
  • another side note - even if we do decide to go with this pub-sub alike mechanism in the future, we should not allow access to everything, but rather the access will be as described above. that means that going with a general key/value store throughout the whole lifecycle of a request is a step in the wrong direction, cause it might introduce wrong/unexpected behavior in plugins (e.g., reading properties that are not allowed) and that will be very painful to fix.
    On the other hand, I think that going with the PluginState as suggested in this PR plus adding a podDeleted event notification through the plugins handle is covering all that we need at this point in time.

@kfswain
Copy link
Collaborator

kfswain commented Aug 7, 2025

Oh man. A lot to unpack there.

@liu-cong
Copy link
Contributor

liu-cong commented Aug 7, 2025

My two cents:

  1. I think exposing the "active pods", and perhaps more "system state" info to the plugins so they can manage their internal state is a separate issue. For this PR let's focus on the "per request cache state".

  2. I am cautious about over-designing on this and my preference has always been just using that generic map cause I think the risk is really low. But I hear the concerns. I see this PR is a tradeoff between that simple but potentially risky map vs. a complex pub/sub system. Given we just have one use case so far, I am supportive of giving this a try and iterate later if we really need to.

Will wait for others to comment but you get my lgtm.

@kfswain
Copy link
Collaborator

kfswain commented Aug 7, 2025

Because we are intended to act as a library, a predictable, safe exposure of data & an understandable way that the data interacts is key.

To me, PluginState generates a new source of truth beyond the data-layer/datastore. And simple map is easy to accidentally abuse. Given we snapshot data for scheduling runs, it would be prudent to centralize this data & limit its exposure.

To do this in a general way my preferred approach is to have the data sources emit the key that they populate, and plugins specify keys they 'consume'. Its not a full pub/sub queueing system. But just pulling inspiration from the publisher & subscriber idea. We have multiple use cases that need to reference other plugin data; i.e. SLO prediction takes in prefix token hit as input.

@kfswain
Copy link
Collaborator

kfswain commented Aug 7, 2025

If we want to do either implementation as an interim, that's totally fine, but I'm not entirely sold on the urgency.

@liu-cong
Copy link
Contributor

liu-cong commented Aug 7, 2025

To me, PluginState generates a new source of truth beyond the data-layer/datastore. And simple map is easy to accidentally abuse.

I think how to expose the data-layer to the plugins are a separate discussion. This PR is about how a plugin can maintain its own state (that may or may not be relevant to the data-layer). Just wanna keep the discussion focused so we don't explode the scope.

To be fair, a plugin can always maintain its own state in whatever form they want, and I am not sure if we can/should prevent that. This PR simply provides a useful library for the plugin, and this does not involve a change to the plugin framework. From that perspective, I am not too concerned whether this is the long-term solution or not.

@kfswain
Copy link
Collaborator

kfswain commented Aug 7, 2025

I wont block this PR. This is fine for now, however this discussion keeps coming up, and I think will continue to until we come to a real solve.

@kfswain
Copy link
Collaborator

kfswain commented Aug 7, 2025

/lgtm
/hold

Holding if others want to review.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 7, 2025
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 7, 2025
@nirrozenbaum
Copy link
Contributor Author

@kfswain @liu-cong thanks for taking the time to think and comment on this topic, It does worth a follow up discussion.

comparing to where we are today, I think this is a step towards the right direction and will allow us to finally deprecare the PostCycle.

will give it a day to see if anyone has more comments/concerns.

@liu-cong
Copy link
Contributor

liu-cong commented Aug 8, 2025

Thanks for the discussion everyone!

comparing to where we are today, I think this is a step towards the right direction and will allow us to finally deprecare the PostCycle.

+1 on this.

/lgtm

@nirrozenbaum Mind adding a unit test?

@nirrozenbaum
Copy link
Contributor Author

@nirrozenbaum Mind adding a unit test?

@liu-cong sure, absolutely.
I'll take that in a follow up PR and merge this one, cause I don't like holding PRs that are basically approved opened for a long time (it usually results in having to rebase multiple times).

atm no plugin uses PluginState, so my next PR would be to add unit-test and only then switch to using it in Prefix plugin.

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 8, 2025
@k8s-ci-robot k8s-ci-robot merged commit d27a716 into kubernetes-sigs:main Aug 8, 2025
9 checks passed
@nirrozenbaum
Copy link
Contributor Author

opened issue #1343

@nirrozenbaum nirrozenbaum deleted the plugin-state branch August 10, 2025 04:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Mechanism for plugins to share state between extension points

4 participants