Currently the InferencePool specifies (the set of) Ports exposed by the selected inference Pods (e.g., #1336). These are assumed to be the model serving ports.
vLLM serves model traffic, metrics, and health checks on the same port - there is no native way to override or specify a different port for each function. However, IGW allows specifying a separate metrics port via a command-line option.
- Is overriding the metrics port only relevant to a specific use case or feature?
- If so, should InferencePool allow separating the port(s) by role (serving, metrics, health)? One could still default to the current Ports specification when no per-role specification is provided.
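For illustration, a per-role split might look like the sketch below. This is purely hypothetical: the `ports` map and its `serving`/`metrics`/`health` keys are not part of the current InferencePool API, and the field names are assumptions for discussion only.

```yaml
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: vllm-pool
spec:
  selector:
    app: vllm
  # Hypothetical per-role port specification (not in the current API):
  ports:
    serving: 8000   # model serving traffic
    metrics: 8001   # metrics scraping
    health: 8002    # liveness/readiness probes
  # If the per-role map is omitted, behavior could fall back to
  # today's Ports list, treated as model serving ports.
```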
/cc @robscott @smarterclayton