InferencePool port specification (e.g., serving, metrics and health) #1396

@elevran

Description

Currently, the InferencePool specifies the (set of) Ports exposed by the selected inference Pods (e.g., #1336). These are assumed to be the model serving ports.
vLLM uses the same port for model serving, metrics, and health; there is no native way to override or specify a different port for each function. However, IGW allows specifying a separate metrics port via a command-line option.

  1. Is overriding only the metrics port tied to a specific use case or feature?
  2. If so, should InferencePool allow separating the port(s) by role (serving, metrics, health)? One could still default to the current Ports specification if no per-role specification is provided.
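To make option 2 concrete, a per-role port split might look like the sketch below. This is purely hypothetical: the `portRoles` field and its `serving`/`metrics`/`health` keys do not exist in the current API and are assumptions for illustration; only `targetPortNumber` reflects the existing spec.

```yaml
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: vllm-pool
spec:
  selector:
    app: vllm
  # Current behavior: a single port, assumed to be the model serving port.
  targetPortNumber: 8000
  # Hypothetical per-role specification (not part of the current API);
  # any role left unset would default to targetPortNumber.
  portRoles:
    serving: 8000
    metrics: 9090
    health: 8080
```

With such a scheme, an implementation like vLLM that serves all three functions on one port would simply omit `portRoles` and keep today's behavior.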

/cc @robscott @smarterclayton

Labels: needs-triage (indicates an issue or PR lacks a `triage/foo` label and requires one)
