Description:
Currently, our LLM clients (e.g., Gemini Flash) sometimes experience excessive delays or errors (e.g., "stream closed because of a broken pipe") due to unbounded waiting times. This has led to multiple user reports and requests to configure timeouts for the client request lifecycle.
We propose adding configurable timeout settings directly in BAML so that developers can define:
- Connection Timeout: Maximum time to establish a connection.
- Response Timeout: Maximum time to wait for the first response or header.
- Total Timeout: Overall limit for the request lifecycle before the request is aborted.
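To make the three limits concrete, here is a minimal Python sketch of how they could nest. It is not BAML's implementation: `TimeoutConfig`, `PhaseTimeout`, and `request_with_timeouts` are hypothetical names, and the `connect`, `first_byte`, and `read_body` callables stand in for whatever HTTP client is actually used.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeoutConfig:
    """Hypothetical container for the three proposed limits, in seconds."""
    connection_timeout: float
    response_timeout: float
    total_timeout: float


class PhaseTimeout(Exception):
    """Hypothetical error that records which limit was exceeded."""

    def __init__(self, phase: str):
        super().__init__(f"{phase} timeout exceeded")
        self.phase = phase


async def _bounded(awaitable, limit: float, phase: str):
    # asyncio.wait_for cancels the awaitable once `limit` seconds have passed.
    try:
        return await asyncio.wait_for(awaitable, limit)
    except asyncio.TimeoutError:
        raise PhaseTimeout(phase) from None


async def request_with_timeouts(connect, first_byte, read_body, cfg: TimeoutConfig):
    """Run the request phases; each early phase has its own limit,
    and the whole lifecycle is bounded by cfg.total_timeout."""

    async def lifecycle():
        conn = await _bounded(connect(), cfg.connection_timeout, "connection")
        head = await _bounded(first_byte(conn), cfg.response_timeout, "response")
        # The rest of the body is bounded only by the total limit.
        return head + await read_body(conn)

    return await _bounded(lifecycle(), cfg.total_timeout, "total")
```

In this sketch the total limit wraps the other two, so a slow body read (for example, a stalled stream) is still aborted even when the connection and first byte arrived in time.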
Proposed Design Ideas:
- Extend Client Options Syntax:
  - Allow users to specify timeout values in the `options` block for an LLM client. For example:

        client<llm> MyClient {
          provider "openai"
          options {
            model "gpt-4o"
            connection_timeout 4 // seconds to establish the connection
            response_timeout 4   // seconds to wait for the first byte/response
            total_timeout 4      // overall request limit
          }
        }
- Integration with Fallbacks/Retry Policy:
  - A fallback would only have a `total_timeout` option, covering the entire fallback.
  - If a client within the fallback fails due to a timeout, then it would retry.
  - Expose a new `BamlTimeoutError`, derived from `BamlClientError`, to expose the client that failed.
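One possible shape for the error and the fallback behavior, sketched in Python. Only the names `BamlTimeoutError` and `BamlClientError` come from this proposal; the constructor fields, the `call_with_fallback` helper, and the reading of "retry" as "move on to the next client in the fallback" are assumptions for illustration.

```python
import time


class BamlClientError(Exception):
    """Stand-in for the existing base error (the real class lives in the BAML client library)."""


class BamlTimeoutError(BamlClientError):
    """Proposed error; the fields below are assumed, not an agreed interface."""

    def __init__(self, client_name: str, timeout_kind: str, limit_s: float):
        super().__init__(f"{client_name}: {timeout_kind} of {limit_s}s exceeded")
        self.client_name = client_name    # which client timed out
        self.timeout_kind = timeout_kind  # e.g. "total_timeout"
        self.limit_s = limit_s


def call_with_fallback(clients, total_timeout: float, clock=time.monotonic):
    """Try each (name, call) pair in order under one shared deadline.

    Each `call` receives the remaining budget in seconds and is expected
    to raise BamlTimeoutError itself if it cannot finish in time.
    """
    deadline = clock() + total_timeout
    last_error = None
    for name, call in clients:
        remaining = deadline - clock()
        if remaining <= 0:
            break  # the fallback's own budget is spent
        try:
            return call(remaining)
        except BamlTimeoutError as err:
            last_error = err  # assumed "retry": continue with the next client
    if last_error is not None and deadline - clock() > 0:
        raise last_error  # every client timed out before the deadline
    raise BamlTimeoutError("fallback", "total_timeout", total_timeout) from last_error
```

Because `BamlTimeoutError` subclasses `BamlClientError`, existing `except BamlClientError` handlers would keep working, while callers that care can inspect `client_name` to see which client failed.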
Note: this may cause a stream to be interrupted midway.
Other thoughts: do we expose the concept of a time unit in BAML? I.e., does `1s` translate to an implicit Duration type? What is the interface for this in other languages?
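As one data point for the generated-client side, a value such as `1s` could map onto Python's `datetime.timedelta`. The sketch below is hypothetical: the accepted suffixes (`ms`, `s`, `m`) and the `parse_duration` name are assumptions, and a bare number is treated as seconds to match the example above.

```python
import re
from datetime import timedelta

# Assumed unit suffixes, mapped to timedelta keyword arguments.
_UNITS = {"ms": "milliseconds", "s": "seconds", "m": "minutes"}
_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(ms|s|m)?")


def parse_duration(value) -> timedelta:
    """Convert 4, "4", "1s", "500ms", or "2m" into a timedelta."""
    if isinstance(value, (int, float)):
        return timedelta(seconds=value)  # bare numbers mean seconds
    match = _PATTERN.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    number, unit = match.groups()
    return timedelta(**{_UNITS[unit or "s"]: float(number)})
```

Other target languages have comparable types (for example `std::time::Duration` in Rust or `time.Duration` in Go), so an implicit Duration type in BAML would have a natural mapping in those languages as well.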