
Conversation

@ultmaster
Contributor

Summary

  • reset LiteLLM's global logging worker when the proxy boots so it binds to the fresh event loop
  • add a regression test ensuring a restart replaces the logging worker singleton

Testing

  • pytest tests/test_llm_proxy_restart.py
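As a sketch, the regression test could look roughly like the following. This is a hypothetical reconstruction, not the actual contents of tests/test_llm_proxy_restart.py; it assumes the _reset_litellm_logging_worker helper described in the review below and litellm's GLOBAL_LOGGING_WORKER singleton visible in the traceback later in this thread:

from litellm.litellm_core_utils import logging_worker

from agentlightning.llm_proxy import _reset_litellm_logging_worker


def test_restart_replaces_logging_worker_singleton():
    before = logging_worker.GLOBAL_LOGGING_WORKER
    _reset_litellm_logging_worker()
    # A fresh singleton means its internal queue will bind to the new loop.
    assert logging_worker.GLOBAL_LOGGING_WORKER is not before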

https://chatgpt.com/codex/tasks/task_e_68f4dab56bc4832e9e5ef8a9f053433f

Copilot AI review requested due to automatic review settings October 20, 2025 01:24
Contributor

Copilot AI left a comment


Pull Request Overview

This PR fixes a bug where LiteLLM's global logging worker raises a RuntimeError when the LLM proxy is restarted: the worker's asyncio queue stays bound to the old event loop after Uvicorn creates a new one.

  • Adds a new function _reset_litellm_logging_worker() to recreate LiteLLM's global logging worker on proxy restart
  • Calls the reset function in LLMProxy.start() to ensure the logging worker uses the fresh event loop
  • Includes a regression test to verify the logging worker is properly replaced between restarts
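For illustration, a minimal sketch of what the reset helper could look like, pieced together from the diff excerpt quoted below. The GLOBAL_LOGGING_WORKER and LoggingWorker names come from litellm's litellm_core_utils/logging_worker.py (visible in the traceback later in this thread); constructing LoggingWorker with no arguments is an assumption, and the actual implementation in the PR may differ:

def _reset_litellm_logging_worker() -> None:
    """Recreate litellm's logging worker so its queue binds to the fresh event loop."""
    try:
        from litellm.litellm_core_utils import logging_worker as litellm_logging_worker

        # Replace the module-level singleton; the new worker's asyncio.Queue
        # will bind to whichever event loop first awaits on it.
        litellm_logging_worker.GLOBAL_LOGGING_WORKER = litellm_logging_worker.LoggingWorker()

        # Keep the alias re-exported through litellm.utils pointing at the same object.
        import litellm.utils as litellm_utils

        litellm_utils.GLOBAL_LOGGING_WORKER = litellm_logging_worker.GLOBAL_LOGGING_WORKER
    except Exception:  # pragma: no cover - best-effort hygiene
        pass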

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File                            | Description
agentlightning/llm_proxy.py     | Adds logging worker reset functionality and calls it during proxy startup
tests/test_llm_proxy_restart.py | New test file verifying the logging worker is refreshed on restart


        import litellm.utils as litellm_utils

        litellm_utils.GLOBAL_LOGGING_WORKER = litellm_logging_worker.GLOBAL_LOGGING_WORKER
    except Exception:  # pragma: no cover - best-effort hygiene

Copilot AI Oct 20, 2025


The bare except Exception is too broad and could mask unexpected errors. Consider catching more specific exceptions like ImportError or AttributeError that are expected when the module doesn't exist or the attribute is missing.

Suggested change
except Exception: # pragma: no cover - best-effort hygiene
except (ImportError, AttributeError): # pragma: no cover - best-effort hygiene

@ultmaster added the bug (Something isn't working) label on Oct 20, 2025
@ultmaster
Contributor Author

Exception:

The following exception sometimes happens when I try to restart the LLMProxy. Figure out what might be wrong here:

(APIServer pid=126879) INFO:     127.0.0.1:39246 - "POST /v1/chat/completions HTTP/1.1" 200 OK
Task exception was never retrieved
future: <Task finished name='Task-7390' coro=<LoggingWorker._worker_loop() done, defined at /mnt/vss/_work/agent-lightning/agent-lightning/.venv/lib/python3.13/site-packages/litellm/litellm_core_utils/logging_worker.py:57> exception=RuntimeError('<Queue at 0x783cbf633f20 maxsize=50000> is bound to a different event loop')>
Traceback (most recent call last):
  File "/mnt/vss/_work/agent-lightning/agent-lightning/.venv/lib/python3.13/site-packages/litellm/litellm_core_utils/logging_worker.py", line 65, in _worker_loop
    task = await self._queue.get()
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cloudtest/.local/share/uv/python/cpython-3.13.9-linux-x86_64-gnu/lib/python3.13/asyncio/queues.py", line 183, in get
    getter = self._get_loop().create_future()
             ~~~~~~~~~~~~~~^^
  File "/home/cloudtest/.local/share/uv/python/cpython-3.13.9-linux-x86_64-gnu/lib/python3.13/asyncio/mixins.py", line 20, in _get_loop
    raise RuntimeError(f'{self!r} is bound to a different event loop')
RuntimeError: <Queue at 0x783cbf633f20 maxsize=50000> is bound to a different event loop
INFO:     10.1.72.12:59254 - "POST /chat/completions HTTP/1.1" 200 OK
2025-10-19 12:24:09,412 [INFO] (Process-124055 agentlightning.store.client_server)   127.0.0.1:50034 - "GET /ge
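
For context, this failure mode reproduces without litellm at all: since Python 3.10, asyncio.Queue binds lazily to the running event loop the first time a coroutine waits on it, and any later wait from a different loop raises exactly this RuntimeError. A minimal standalone repro:

import asyncio

queue: "asyncio.Queue[int]" = asyncio.Queue(maxsize=50000)


async def wait_briefly() -> None:
    # Awaiting get() on an empty queue calls _get_loop(), which binds the
    # queue to the currently running event loop.
    try:
        await asyncio.wait_for(queue.get(), timeout=0.01)
    except asyncio.TimeoutError:
        pass


asyncio.run(wait_briefly())  # first loop: times out harmlessly and binds the queue
asyncio.run(wait_briefly())  # second loop: RuntimeError: ... is bound to a different event loop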

@ultmaster closed this Oct 23, 2025
@ultmaster reopened this Oct 23, 2025
@ultmaster merged commit e28fb8c into main Oct 23, 2025
34 checks passed
@ultmaster deleted the codex/investigate-llmproxy-restart-exception branch October 28, 2025 01:18