What needs to be done?: The Flow Control layer is designed to gracefully handle scale-from-zero scenarios by queueing incoming requests until backend pods are ready. This behavior needs to be rigorously validated to ensure it is robust.
Validation Steps:
- Create test scenarios where an InferencePool scales from 0 to N replicas while under load (or even from 1 to 0 to N).
- Verify that requests are correctly queued and not dropped, provided they do not exceed their own timeouts.
- Measure the end-to-end latency for the first few requests to confirm they are dispatched promptly once backends become available.
- Ensure the system remains stable and does not enter a deadlocked state during the scale-up process.
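
As a starting point for the expected outcomes, the checks above can be sketched as a small deterministic model. This is not the Flow Control implementation: `Request`, `Outcome`, `simulate_scale_from_zero`, and the `ready_at` parameter are hypothetical names for illustration, and the model assumes every queued request is dispatched the instant the first backend becomes ready. A real test would drive an actual InferencePool and compare observed behavior against expectations like these.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    name: str
    arrival: float  # seconds since test start
    timeout: float  # per-request timeout in seconds


@dataclass
class Outcome:
    name: str
    dispatched_at: Optional[float]  # None if the request timed out while queued
    queue_wait: Optional[float]     # seconds spent queued, None if timed out


def simulate_scale_from_zero(requests: List[Request], ready_at: float) -> List[Outcome]:
    """Toy model (assumption): requests queue until the first backend is ready."""
    outcomes = []
    for r in sorted(requests, key=lambda req: req.arrival):
        # A request is dispatched at arrival if backends are already up,
        # otherwise at the moment the first backend becomes ready.
        dispatch_time = max(r.arrival, ready_at)
        wait = dispatch_time - r.arrival
        if wait <= r.timeout:
            outcomes.append(Outcome(r.name, dispatch_time, wait))
        else:
            # Exceeded its own timeout before any backend was available.
            outcomes.append(Outcome(r.name, None, None))
    return outcomes


# Hypothetical load: the pool has 0 replicas until t=10s.
reqs = [
    Request("a", arrival=0.0, timeout=30.0),
    Request("b", arrival=5.0, timeout=2.0),   # expected to time out in the queue
    Request("c", arrival=9.0, timeout=30.0),
    Request("d", arrival=12.0, timeout=30.0),  # arrives after scale-up
]
results = simulate_scale_from_zero(reqs, ready_at=10.0)
timed_out = [o.name for o in results if o.dispatched_at is None]
waits = {o.name: o.queue_wait for o in results if o.queue_wait is not None}
```

In this model only the request whose timeout expires before scale-up is lost, and each surviving request waits no longer than the gap between its arrival and backend readiness. A deadlock in the real system would show up as requests that are neither dispatched nor timed out, which the real test should assert never happens.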