Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion api/v1alpha2/inferencemodel_types.go
Original file line number Diff line number Diff line change
Expand Up @@ -67,7 +67,7 @@ type InferenceModelSpec struct {
// ModelNames must be unique for a referencing InferencePool
// (names can be reused for a different pool in the same cluster).
// The modelName with the oldest creation timestamp is retained, and the incoming
// InferenceModel is sets the Ready status to false with a corresponding reason.
// InferenceModel's Ready status is set to false with a corresponding reason.
// In the rare case of a race condition, one Model will be selected randomly to be considered valid, and the other rejected.
// Names can be reserved without an underlying model configured in the pool.
// This can be done by specifying a target model and setting the weight to zero,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -88,7 +88,7 @@ spec:
ModelNames must be unique for a referencing InferencePool
(names can be reused for a different pool in the same cluster).
The modelName with the oldest creation timestamp is retained, and the incoming
InferenceModel is sets the Ready status to false with a corresponding reason.
InferenceModel's Ready status is set to false with a corresponding reason.
In the rare case of a race condition, one Model will be selected randomly to be considered valid, and the other rejected.
Names can be reserved without an underlying model configured in the pool.
This can be done by specifying a target model and setting the weight to zero,
Expand Down
2 changes: 1 addition & 1 deletion docs/proposals/002-api-proposal/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -236,7 +236,7 @@ type InferenceModelSpec struct {
// ModelNames are expected to be unique for a specific InferencePool
// (names can be reused for a different pool in the same cluster).
// The modelName with the oldest creation timestamp is retained, and the incoming
// InferenceModel is sets the Ready status to false with a corresponding reason.
// InferenceModel's Ready status is set to false with a corresponding reason.
// In the rare case of a race condition, one Model will be selected randomly to be considered valid, and the other rejected.
// Names can be reserved without an underlying model configured in the pool.
// This can be done by specifying a target model and setting the weight to zero,
Expand Down
2 changes: 1 addition & 1 deletion site-src/reference/spec.md
Original file line number Diff line number Diff line change
Expand Up @@ -211,7 +211,7 @@ _Appears in:_

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `modelName` _string_ | ModelName is the name of the model as it will be set in the "model" parameter for an incoming request.<br />ModelNames must be unique for a referencing InferencePool<br />(names can be reused for a different pool in the same cluster).<br />The modelName with the oldest creation timestamp is retained, and the incoming<br />InferenceModel is sets the Ready status to false with a corresponding reason.<br />In the rare case of a race condition, one Model will be selected randomly to be considered valid, and the other rejected.<br />Names can be reserved without an underlying model configured in the pool.<br />This can be done by specifying a target model and setting the weight to zero,<br />an error will be returned specifying that no valid target model is found. | | MaxLength: 256 <br />Required: \{\} <br /> |
| `modelName` _string_ | ModelName is the name of the model as it will be set in the "model" parameter for an incoming request.<br />ModelNames must be unique for a referencing InferencePool<br />(names can be reused for a different pool in the same cluster).<br />The modelName with the oldest creation timestamp is retained, and the incoming<br />InferenceModel's Ready status is set to false with a corresponding reason.<br />In the rare case of a race condition, one Model will be selected randomly to be considered valid, and the other rejected.<br />Names can be reserved without an underlying model configured in the pool.<br />This can be done by specifying a target model and setting the weight to zero,<br />an error will be returned specifying that no valid target model is found. | | MaxLength: 256 <br />Required: \{\} <br /> |
| `criticality` _[Criticality](#criticality)_ | Criticality defines how important it is to serve the model compared to other models referencing the same pool.<br />Criticality impacts how traffic is handled in resource constrained situations. It handles this by<br />queuing or rejecting requests of lower criticality. InferenceModels of an equivalent Criticality will<br />fairly share resources over throughput of tokens. In the future, the metric used to calculate fairness,<br />and the proportionality of fairness will be configurable.<br />Default values for this field will not be set, to allow for future additions of new field that may 'one of' with this field.<br />Any implementations that may consume this field may treat an unset value as the 'Standard' range. | | Enum: [Critical Standard Sheddable] <br /> |
| `targetModels` _[TargetModel](#targetmodel) array_ | TargetModels allow multiple versions of a model for traffic splitting.<br />If not specified, the target model name is defaulted to the modelName parameter.<br />modelName is often in reference to a LoRA adapter. | | MaxItems: 10 <br /> |
| `poolRef` _[PoolObjectReference](#poolobjectreference)_ | PoolRef is a reference to the inference pool, the pool must exist in the same namespace. | | Required: \{\} <br /> |
Expand Down