Currently the InferencePool specifies (the set of) Ports exposed by the selected inference Pods (e.g., #1336). These are assumed to be the model serving ports.
vLLM serves model traffic, metrics, and health checks on the same port - there is no native way to override or specify a different port for each function. However, IGW allows specifying a separate metrics port via a command-line option.
- Is overriding the metrics port only relevant to a specific use case or feature?
- If so, should InferencePool allow separating the port(s) by role (serving, metrics, health)? One could still default to the current Ports specification when no per-role specification is provided.
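For illustration, a per-role split might look like the sketch below. This is purely hypothetical: the `ports` map and its `serving`/`metrics`/`health` keys are not part of the current InferencePool API, and the field names are assumptions for discussion only.

```yaml
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: vllm-pool
spec:
  selector:
    app: vllm
  # Hypothetical per-role port specification (not in the current API):
  ports:
    serving: 8000   # model serving traffic
    metrics: 8001   # metrics scraping
    health: 8002    # liveness/readiness probes
  # If the per-role map is omitted, behavior could fall back to
  # today's Ports list, treated as model serving ports.
```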
/cc @robscott @smarterclayton