
Support for vLLM Data parallel#1663

Merged
k8s-ci-robot merged 21 commits into kubernetes-sigs:main from shmuelk:data-parallel
Oct 17, 2025

Conversation

@shmuelk
Contributor

@shmuelk shmuelk commented Sep 28, 2025

What type of PR is this?
/kind feature

What this PR does / why we need it:
This PR adds support for the vLLM Data Parallel feature, in which the vLLM "launcher" starts multiple vLLM instances in the same Pod, each listening on a different port.

The InferencePool CRD has already been changed to support this by allowing up to eight TargetPorts to be specified. It is assumed that all pods in the InferencePool are configured identically with respect to Data Parallelism.

To minimize the amount of change to the code, the datastore has been modified to create "virtual pods" from the real pods found by the pod reconciler. Each virtual pod is named by concatenating the real pod's name with the string "-rank-N", where N is a number from zero to seven. The term "rank" is used because that is what each of the separate vLLM "servers" in a Data Parallel configuration is called.
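The naming scheme above can be sketched as follows. This is a hypothetical helper, not the actual datastore code; the function name and signature are illustrative only:

```go
package main

import "fmt"

// virtualPodNames derives "virtual pod" names from a real pod's name,
// one per configured target port (rank 0 through numTargetPorts-1).
// Hypothetical sketch of the scheme described in the PR description.
func virtualPodNames(realPodName string, numTargetPorts int) []string {
	names := make([]string, 0, numTargetPorts)
	for rank := 0; rank < numTargetPorts; rank++ {
		names = append(names, fmt.Sprintf("%s-rank-%d", realPodName, rank))
	}
	return names
}

func main() {
	// A pod with two target ports yields two virtual pods.
	fmt.Println(virtualPodNames("vllm-abc123", 2))
}
```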

The previous code had a notion of globally known ports for inference and metrics scraping. This has been eliminated; instead, inference-port and metrics-port fields have been added to the PodInfo struct. In addition, a PodName field was added that contains the name of the real pod used to create the "virtual pods".

Lastly, the API of the PreRequest extension point has been changed to remove the inference port parameter. Any PreRequest extension must now obtain the inference port of the pod(s) in question from PodInfo's GetPort() API.
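The per-pod port model described above can be sketched as below. PodInfo here is a simplified stand-in for the real struct in pkg/epp/datalayer; the field set and types are illustrative (the PR's commit history notes ports are stored as strings), not the actual definition:

```go
package main

import "fmt"

// PodInfo is a simplified sketch of the per-endpoint state described in
// the PR description. The real struct lives in pkg/epp/datalayer and has
// additional fields.
type PodInfo struct {
	PodName     string // name of the real pod backing this "virtual pod"
	Port        string // inference port for this rank
	MetricsPort string // metrics scrape port for this rank
}

// GetPort returns the inference port; PreRequest extensions now read the
// port from here instead of receiving a global port parameter.
func (p *PodInfo) GetPort() string { return p.Port }

func main() {
	p := &PodInfo{PodName: "vllm-abc123", Port: "8000", MetricsPort: "8001"}
	fmt.Println(p.GetPort())
}
```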

Which issue(s) this PR fixes:
Fixes #1519

Does this PR introduce a user-facing change?:


@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. labels Sep 28, 2025
@netlify

netlify Bot commented Sep 28, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit cbcb07e
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/68f0c20a3f4f7900086e4740
😎 Deploy Preview https://deploy-preview-1663--gateway-api-inference-extension.netlify.app

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 28, 2025
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 28, 2025
@danehans
Contributor

danehans commented Oct 2, 2025

In an attempt to minimize the amount of changes to the code, the datastore has been modified to create "virtual pods" from the real pods that are found by the pod reconciler.

Pods are composed of one or more containers. Will each vLLM engine instance run as a separate container in the vLLM pod?

Comment on lines -78 to -80
if p.ModelServerMetricsPort == 0 {
p.ModelServerMetricsPort = targetPortNumber
}
Contributor

I see that GetMetricsPort() does not implement this default behavior, which makes sense now that targetPortNumber is a list. We need to note this as a breaking change in the PR description.

Contributor Author

There is no breaking change. See the code in pkg/epp/datastore/datastore.go, lines 242-244. If there is only one targetPort in the InferencePool and the ModelServerMetricsPort from the command line is non-zero, that value is used to fill the metricsPort in the PodInfo struct. GetMetricsPort() simply returns what was placed in the struct earlier.
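The defaulting behavior described in this reply can be sketched as follows. Function and parameter names are hypothetical; the actual logic lives in pkg/epp/datastore/datastore.go:

```go
package main

import "fmt"

// metricsPortFor sketches the defaulting rule described above: with a
// single target port and a non-zero command-line metrics port, the flag
// value is used; otherwise each rank scrapes its own target port.
// Hypothetical helper, not the actual datastore code.
func metricsPortFor(targetPorts []int32, modelServerMetricsPort int32, rank int) int32 {
	if len(targetPorts) == 1 && modelServerMetricsPort != 0 {
		return modelServerMetricsPort
	}
	return targetPorts[rank]
}

func main() {
	// Single target port: the command-line flag wins.
	fmt.Println(metricsPortFor([]int32{8000}, 9090, 0))
	// Multiple target ports: each rank uses its own port.
	fmt.Println(metricsPortFor([]int32{8000, 8001}, 9090, 1))
}
```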

@danehans
Contributor

danehans commented Oct 4, 2025

@shmuelk please create a tracker issue for adding a conformance test that includes multiple InferencePool targetPorts.

@shmuelk
Contributor Author

shmuelk commented Oct 5, 2025

In an attempt to minimize the amount of changes to the code, the datastore has been modified to create "virtual pods"
from the real pods that are found by the pod reconciler.

Pods are comprised of one or more containers. Will each vLLM engine instance run as a separate container in the vLLM pod?

They run as separate processes in the same container, as far as I know. Data Parallel mode is enabled by a command-line parameter to vLLM, and I don't think a parameter can add containers to the pod.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 14, 2025
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 15, 2025
@shmuelk shmuelk changed the title WIP: Support for vLLM Data parallel Support for vLLM Data parallel Oct 15, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 15, 2025
Comment thread pkg/epp/datastore/datastore.go Outdated
}
}

func (ds *datastore) PodRemove(podName string) {
Collaborator

@kfswain kfswain Oct 15, 2025


Can we add this to the PodDelete function? It should be the same behavior it was for one engine per pod

Contributor Author

Done

Comment thread pkg/epp/backend/metrics/metrics.go Outdated
}
return fmt.Sprintf("%s://%s:%d%s", p.ModelServerMetricsScheme, pod.Address, p.ModelServerMetricsPort, p.ModelServerMetricsPath)
func (p *PodMetricsClientImpl) getMetricEndpoint(pod *backend.Pod) string {
return fmt.Sprintf("%s://%s:%d%s", p.ModelServerMetricsScheme, pod.GetIPAddress(), pod.GetMetricsPort(), p.ModelServerMetricsPath)
Collaborator

Just noticed this: since this is called in FetchMetrics, a highly trafficked function, can we cache this endpoint string instead of calling Sprintf? I don't really see a case in which it is mutated within the pod/endpoint lifecycle.

Understood that it was this way before this PR, but it would be a nice cleanup.

Contributor Author

PodInfo now has the metrics endpoint pre-built. This code is now a simple concatenation of strings.
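The cleanup being discussed can be sketched as below: build the host:port part once when the endpoint is created, so the hot scrape path is plain concatenation rather than a fmt.Sprintf per call. All names here are hypothetical, not the actual implementation:

```go
package main

import "fmt"

// podInfo sketches caching the metrics host once per endpoint lifecycle.
type podInfo struct {
	metricsHost string // e.g. "10.0.0.5:8001", built once at creation
}

// newPodInfo precomputes the host:port string a single time.
func newPodInfo(addr, metricsPort string) *podInfo {
	return &podInfo{metricsHost: addr + ":" + metricsPort}
}

// metricsURL runs on the scrape hot path: simple string concatenation,
// no formatting calls.
func (p *podInfo) metricsURL(scheme, path string) string {
	return scheme + "://" + p.metricsHost + path
}

func main() {
	p := newPodInfo("10.0.0.5", "8001")
	fmt.Println(p.metricsURL("http", "/metrics"))
}
```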

return fmt.Errorf("expected 1 target port, got %d", len(pool.Spec.TargetPorts))
}
updated, err := pm.pmc.FetchMetrics(ctx, pm.GetPod(), pm.GetMetrics(), int32(pool.Spec.TargetPorts[0].Number))
updated, err := pm.pmc.FetchMetrics(ctx, pm.GetPod(), pm.GetMetrics())
Collaborator

this is much cleaner, I like.

// PodInfo represents the relevant Kubernetes Pod state of an inference server.
type PodInfo struct {
NamespacedName types.NamespacedName
PodName string
Collaborator

We have an 'Endpoint` struct also defined in this package, would it make more sense to define that as the object that the rest of the system uses?

Contributor Author

Endpoint is actually an interface. It is more related to the data gathering done in the Datalayer.

I think a wholesale refactoring from Pod to Endpoint is not something to be done in this PR. It also needs to be done in phases due to the size of the effort. Some potential steps:

  1. Rename the current Endpoint interface to something like EndpointMetrics
  2. Replace backendmetrics.PodMetrics with datalayer.EndpointMetrics everywhere. In reality backendmetrics.PodMetrics is simply a reference to datalayer.EndPoint, which is somewhat confusing.
  3. Appropriately rename the pod-related structs, functions, and variables to Endpoint-related names.

Collaborator

SGTM. Can we open an issue to track this work?

In reality backendmetrics.PodMetrics is simply a reference to datalayer.EndPoint, which is somewhat confusing.

Agreed, we should get rid of excess layers where we can.

pm.UpdatePod(pod)
return ok

pods := []*datalayer.PodInfo{}
Collaborator

Yeah, I think using the nomenclature "pods" when referring to a specific rank/engine makes this PR more difficult to read, and that will subsequently impact the codebase. Suggest using Endpoint.

Contributor Author

See my comment above

Comment thread pkg/epp/datastore/datastore.go Outdated
Labels: labels,
})
}
if len(pods) == 1 && ds.modelServerMetricsPort != 0 {
Collaborator

nit: we can check the length of ds.pool.Spec.TargetPorts before iteration and assembling the endpoint list

Contributor Author

Done
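The nit above (validating the target-port count before iterating to build the endpoint list) could look like the sketch below. The constant and function are hypothetical, not the merged code; the eight-port limit comes from the InferencePool CRD described in the PR description:

```go
package main

import "fmt"

// maxTargetPorts mirrors the InferencePool CRD limit of eight target ports.
const maxTargetPorts = 8

// validateTargetPorts checks the port-count invariant up front, before any
// per-rank endpoint assembly. Hypothetical helper for illustration.
func validateTargetPorts(ports []int32) error {
	if len(ports) == 0 || len(ports) > maxTargetPorts {
		return fmt.Errorf("expected between 1 and %d target ports, got %d", maxTargetPorts, len(ports))
	}
	return nil
}

func main() {
	fmt.Println(validateTargetPorts([]int32{8000, 8001}))
	fmt.Println(validateTargetPorts([]int32{}))
}
```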

Comment thread pkg/epp/datalayer/podinfo.go Outdated
NamespacedName types.NamespacedName
PodName string
Address string
Port int32
Collaborator

Wondering if we want to make this a string (it would prevent Itoa calls); do we ever use this as a number?

Contributor Author

Done

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 15, 2025
@kfswain
Collaborator

kfswain commented Oct 16, 2025

/lgtm
/approve
/hold

Looks good to me; just holding to give others a chance to review if they want. If we don't hear anything after an hour or two we can unhold and keep moving.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 16, 2025
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 16, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kfswain, shmuelk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 16, 2025
@kfswain
Collaborator

kfswain commented Oct 17, 2025

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 17, 2025
@k8s-ci-robot k8s-ci-robot merged commit 51dff7b into kubernetes-sigs:main Oct 17, 2025
11 checks passed
BenjaminBraunDev pushed a commit to BenjaminBraunDev/gateway-api-inference-extension that referenced this pull request Oct 28, 2025
* Removed global inference port from Prerequest extension API

Signed-off-by: Shmuel Kallner <[email protected]>

* Inference port and metrics port now per pod

Signed-off-by: Shmuel Kallner <[email protected]>

* Differentiate between real pod delete and virtual pod delete

Signed-off-by: Shmuel Kallner <[email protected]>

* Pass default metrics port to datastore

Signed-off-by: Shmuel Kallner <[email protected]>

* Updates to reflect newer APIs

Signed-off-by: Shmuel Kallner <[email protected]>

* Updates to tests

Signed-off-by: Shmuel Kallner <[email protected]>

* Fail tests that have errors, don't just log the errors

Signed-off-by: Shmuel Kallner <[email protected]>

* Remove tests that are no longer applicable

Signed-off-by: Shmuel Kallner <[email protected]>

* Set an InferencePool into the datastore

Signed-off-by: Shmuel Kallner <[email protected]>

* Added tests with multiple TargetPorts

Signed-off-by: Shmuel Kallner <[email protected]>

* Fix lint issues

Signed-off-by: Shmuel Kallner <[email protected]>

* Updated a new test due to updated interface

Signed-off-by: Shmuel Kallner <[email protected]>

* Store inference port and metrics host as strings

Signed-off-by: Shmuel Kallner <[email protected]>

* Concatenate metrics URL parts together without fmt.Sprintf

Signed-off-by: Shmuel Kallner <[email protected]>

* Use already stored metrics host

Signed-off-by: Shmuel Kallner <[email protected]>

* No need to convert inference port to a string

Signed-off-by: Shmuel Kallner <[email protected]>

* Updates due to PodInfo changes

Signed-off-by: Shmuel Kallner <[email protected]>

* Test updates due to PodInfo changes

Signed-off-by: Shmuel Kallner <[email protected]>

* Merged PodRemove into PodDelete

Signed-off-by: Shmuel Kallner <[email protected]>

* Changes due to merging of PodRemove into PodDelete

Signed-off-by: Shmuel Kallner <[email protected]>

* Test changes due to merging of PodRemove into PodDelete

Signed-off-by: Shmuel Kallner <[email protected]>

---------

Signed-off-by: Shmuel Kallner <[email protected]>
@shmuelk shmuelk deleted the data-parallel branch November 16, 2025 14:18
elevran pushed a commit to llm-d/llm-d-inference-scheduler that referenced this pull request Apr 23, 2026
…-extension#1663)
nirrozenbaum pushed a commit to llm-d/llm-d-inference-payload-processor that referenced this pull request Apr 28, 2026
…-extension#1663)
elevran pushed a commit to llm-d/llm-d-inference-scheduler that referenced this pull request May 3, 2026
…-extension#1663)

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for vLLM's Data Parallel

4 participants