Skip to content

Conversation

@ultmaster
Copy link
Contributor

@ultmaster ultmaster commented Oct 23, 2025

This pull request refactors how global state and middleware are managed in agentlightning/llm_proxy.py, with a focus on improving initialization and dependency handling for the proxy, middleware, and telemetry. The changes are to prepare for the coming tinker custom LLM, making llm proxy restarting more robust and emit fewer warnings.

Here's what's on my mind:

  1. There should be at most one LLM proxy simulatenously running at one time. This is both limited by litellm's global variable design and opentelemetry's global tracer provider. If litellm cannot have dual instances, there is no point to have dual LLMProxy in one process. That's why we have an "active_llm_proxy".
  2. The active llm proxy must have a store, and that's exactly the middleware and callbacks are fetching from. Ideally, the callbacks and middlewares should only be initialized once and all subsequent restarts or reinitializations should reuse that.
  3. Since litellm will insert some inherent callbacks into the callback list, to avoid overwhelm that list, we clear the list every time before we start the server. This is by reverting that callbacks to an "initial state".
  4. We need to take special care of LightningOpenTelemetry, because to give way to other tracers, we might clear the tracer sometimes, which makes the LightningOpenTelemetry damaged. In that case, it needs to be re-initialized.
  5. Logging workers are reset by the way.
  6. If the exporter was used in one process, then used in another, the lock will die and get stuck. In that case, it needs to be re-initialized too.

Now here's the most tricky part (test).

  1. We know that, we don't have a reliable way to re-intiailize AgentOpsTracer. That means, if an AgentOpsTracer's tracer provider is cleared, there is no reliable way to get it back.
  2. We know that, AgentOpsTracer is only tested in tests/runner/test_agent_integration.py and tests/tracer/test_integration.py, and the latter one was reliably run in a clean subprocess.
  3. We know that, AgentOpsTracer and litellm's tracer can't live harmoneously in one process. You thus need to clear the agentops tracer to install litellm's tracer. but clear litellm's tracer won't give you agentops tracer back if you had it before.
  4. The only way this will work is we test agentops tracer and litellm's tracer in sequential order. They can't be tested interwined; at least agentops tests need to stick together.
  5. We need to find a reliable way to deal with all these global variables. Current hacks suck.

Copilot AI review requested due to automatic review settings October 23, 2025 17:38
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the global state management in agentlightning/llm_proxy.py to eliminate reliance on global variables and make dependencies explicit. The core change moves initialization logic from a global initialize() function into an instance method LLMProxy.initialize(), requiring explicit LightningStore injection rather than falling back to a global store.

Key changes:

  • Removes global _initialized flag and _global_store, replacing them with explicit dependency injection
  • Moves middleware and callback setup into LLMProxy.initialize() instance method
  • Adds helpers to preserve original user middleware and LiteLLM callbacks for idempotent re-initialization

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.


Idempotent. Installs:
def _ensure_global_user_middlewares(app: FastAPI) -> List[Any]:
"""Preserve a copy of `app.user_middlewares` before AGL's middlewares is installed."""
Copy link

Copilot AI Oct 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected 'middlewares' to 'middleware' in the docstring.

Suggested change
"""Preserve a copy of `app.user_middlewares` before AGL's middlewares is installed."""
"""Preserve a copy of `app.user_middleware` before AGL's middleware is installed."""

Copilot uses AI. Check for mistakes.
@ultmaster ultmaster closed this Oct 24, 2025
@ultmaster ultmaster reopened this Oct 24, 2025
@ultmaster ultmaster marked this pull request as draft October 24, 2025 10:02
@ultmaster ultmaster marked this pull request as ready for review October 24, 2025 10:02
@ultmaster ultmaster marked this pull request as draft October 24, 2025 10:18
@ultmaster ultmaster marked this pull request as ready for review October 24, 2025 10:18
@ultmaster ultmaster marked this pull request as draft October 24, 2025 11:00
@ultmaster ultmaster marked this pull request as ready for review October 24, 2025 11:00
@ultmaster ultmaster marked this pull request as draft October 24, 2025 15:25
@ultmaster ultmaster marked this pull request as ready for review October 24, 2025 15:25
@ultmaster ultmaster marked this pull request as draft October 24, 2025 16:20
@ultmaster ultmaster marked this pull request as ready for review October 24, 2025 16:20
@ultmaster ultmaster merged commit 56fa8d6 into main Oct 24, 2025
34 checks passed
@ultmaster ultmaster deleted the llm-proxy-init-issue branch October 28, 2025 01:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants