
Feature Proposal: Configurable LLM Client Timeouts #1630

@hellovai

Description

Currently, our LLM clients (e.g., Gemini Flash) sometimes hang for a long time or fail with errors such as "stream closed because of a broken pipe", because there is no upper bound on how long a request may wait. This has led to multiple user reports and requests to make timeouts configurable across the client request lifecycle.

We propose adding configurable timeout settings directly in BAML so that developers can define:

  • Connection Timeout: Maximum time to establish a connection.
  • Response Timeout: Maximum time to wait for the first response or header.
  • Total Timeout: Overall limit for the request lifecycle before the request is aborted.
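To make the three limits concrete, here is a minimal asyncio sketch of how a client runtime might enforce them. It is not BAML's implementation: TimeoutConfig, RequestTimeout, call_with_timeouts and fake_phase are hypothetical names, and a request is modeled as three awaitable phases (connect, first response, rest of the body). Each phase gets the tighter of its own limit and whatever remains of the total budget.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class TimeoutConfig:
    # Hypothetical mirror of the proposed options; values in seconds, None = no limit.
    connection_timeout: Optional[float] = None
    response_timeout: Optional[float] = None
    total_timeout: Optional[float] = None


class RequestTimeout(Exception):
    # Hypothetical error recording which limit was hit.
    def __init__(self, phase: str):
        super().__init__(f"request timed out ({phase} timeout)")
        self.phase = phase


async def call_with_timeouts(
    connect: Callable[[], Awaitable[object]],
    first_response: Callable[[object], Awaitable[str]],
    rest_of_body: Callable[[object], Awaitable[str]],
    cfg: TimeoutConfig,
) -> str:
    loop = asyncio.get_running_loop()
    deadline = None if cfg.total_timeout is None else loop.time() + cfg.total_timeout

    async def run(phase: str, awaitable: Awaitable, phase_limit: Optional[float]):
        # Use the tighter of the phase's own limit and the time left in the total budget.
        remaining = None if deadline is None else max(0.0, deadline - loop.time())
        if phase_limit is None or (remaining is not None and remaining < phase_limit):
            limit, label = remaining, "total"
        else:
            limit, label = phase_limit, phase
        try:
            return await asyncio.wait_for(awaitable, timeout=limit)
        except asyncio.TimeoutError:
            raise RequestTimeout(label) from None

    conn = await run("connection", connect(), cfg.connection_timeout)
    head = await run("response", first_response(conn), cfg.response_timeout)
    body = await run("total", rest_of_body(conn), None)
    return head + body


def fake_phase(delay: float, value):
    # Stand-in for a network step that takes `delay` seconds and returns `value`.
    async def phase(*_args):
        await asyncio.sleep(delay)
        return value
    return phase
```

Because every phase draws on the same deadline, a slow connection leaves less time for the model to answer, which matches the "overall limit" meaning of total_timeout.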

Proposed Design Ideas:

  1. Extend Client Options Syntax:

    • Allow users to specify timeout values in the options block for an LLM client. For example:
      client<llm> MyClient {
        provider "openai"
        options {
          model "gpt-4o"
          connection_timeout 4   // seconds to establish the connection
          response_timeout 4     // seconds to wait for the first byte/response
          total_timeout 4        // seconds for the entire request lifecycle
        }
      }
      
  2. Integration with Fallbacks/Retry Policy:

    • A fallback client would expose only a total_timeout option, covering the entire fallback chain.
    • If a client within the fallback fails due to a timeout, the fallback would retry.
  3. Expose a new BamlTimeoutError, derived from BamlClientError, that reports which client failed.
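The fallback behavior and the proposed error type can be sketched in Python as follows. This is an assumption-laden illustration, not the baml_py API: BamlClientError here is a local stand-in class, and run_fallback, per_client_timeout and fake_client are hypothetical. Each client gets its own limit, capped by what remains of the fallback's single total_timeout; a timed-out client moves the chain on to the next client.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence, Tuple


class BamlClientError(Exception):
    # Local stand-in so this sketch runs without baml_py; not the real class.
    pass


class BamlTimeoutError(BamlClientError):
    # Proposed error type: a client error that also records which client timed out.
    def __init__(self, client_name: str, timeout_s: float):
        super().__init__(f"client {client_name!r} timed out after {timeout_s:.3f}s")
        self.client_name = client_name
        self.timeout_s = timeout_s


# (client name, zero-argument coroutine function that performs the call)
Client = Tuple[str, Callable[[], Awaitable[str]]]


async def run_fallback(clients: Sequence[Client], per_client_timeout: float, total_timeout: float) -> str:
    if not clients:
        raise ValueError("a fallback needs at least one client")
    loop = asyncio.get_running_loop()
    deadline = loop.time() + total_timeout
    last_error: Optional[BamlTimeoutError] = None
    for name, call in clients:
        remaining = deadline - loop.time()
        if remaining <= 0:
            # The fallback-level total_timeout covers the whole chain, not each client.
            raise BamlTimeoutError("fallback", total_timeout) from last_error
        limit = min(per_client_timeout, remaining)
        try:
            return await asyncio.wait_for(call(), timeout=limit)
        except asyncio.TimeoutError:
            # A timed-out client does not end the fallback; try the next client.
            last_error = BamlTimeoutError(name, limit)
    assert last_error is not None
    raise last_error  # every client timed out; report the last one


def fake_client(delay: float, reply: str) -> Callable[[], Awaitable[str]]:
    # Stand-in for an LLM call that takes `delay` seconds.
    async def call() -> str:
        await asyncio.sleep(delay)
        return reply
    return call
```

Because BamlTimeoutError subclasses BamlClientError, existing `except BamlClientError` handlers would keep working, while new code can inspect client_name to see which client failed.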

Note: a timeout may interrupt a stream partway through.
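One possible way to handle a mid-stream timeout is to keep the chunks that already arrived and flag the interruption, leaving the caller to decide whether a partial answer is usable. The sketch below assumes the stream is a Python async generator; collect_stream and fake_stream are hypothetical helpers, not part of BAML.

```python
import asyncio
from typing import AsyncGenerator, List, Tuple


async def collect_stream(stream: AsyncGenerator[str, None], total_timeout: float) -> Tuple[List[str], bool]:
    # Hypothetical helper: drain `stream` until it ends or the total deadline passes.
    # Returns (chunks received so far, whether the deadline cut the stream short).
    loop = asyncio.get_running_loop()
    deadline = loop.time() + total_timeout
    chunks: List[str] = []
    try:
        while True:
            remaining = deadline - loop.time()
            if remaining <= 0:
                return chunks, True
            try:
                chunk = await asyncio.wait_for(stream.__anext__(), timeout=remaining)
            except StopAsyncIteration:
                return chunks, False
            except asyncio.TimeoutError:
                # The timeout fired mid-stream: keep what arrived and report the interruption.
                return chunks, True
            chunks.append(chunk)
    finally:
        # Close the underlying stream (e.g. the HTTP response) whichever way we exit.
        await stream.aclose()


async def fake_stream(parts: List[str], delay: float) -> AsyncGenerator[str, None]:
    # Stand-in for a streamed LLM response emitting one part every `delay` seconds.
    for part in parts:
        await asyncio.sleep(delay)
        yield part
```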

Other thoughts: should we expose the concept of a time unit in BAML? For example, does a literal like 1s translate to an implicit Duration type? And what would the interface for this look like in the other client languages?
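One way to read the open question: on the Python side, a literal such as 1s could map onto the standard datetime.timedelta. The helper below, parse_duration, is hypothetical, as are the accepted suffixes (ms, s, m, h); bare numbers are treated as seconds so the connection_timeout 4 example keeps its meaning.

```python
import re
from datetime import timedelta
from typing import Union

# Hypothetical suffix table; a real design would need to settle which units BAML accepts.
_UNITS = {"ms": "milliseconds", "s": "seconds", "m": "minutes", "h": "hours"}
_DURATION = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(ms|s|m|h)?\s*$")


def parse_duration(value: Union[int, float, str]) -> timedelta:
    # Map a BAML-style timeout literal to a timedelta; bare numbers mean seconds.
    if isinstance(value, bool):
        raise TypeError("booleans are not durations")
    if isinstance(value, (int, float)):
        return timedelta(seconds=value)
    match = _DURATION.match(value)
    if match is None:
        raise ValueError(f"not a duration: {value!r}")
    amount, unit = float(match.group(1)), match.group(2) or "s"
    return timedelta(**{_UNITS[unit]: amount})
```

Other generated clients would need their own mapping (for instance, a plain millisecond count), which is exactly the cross-language interface question raised here.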
