
[Python SDK] graceful shutdown does not track or cancel in-flight tasks #356

@santoshkumarradha

Summary

agentfield.agent_server.AgentServer._graceful_shutdown does not track in-flight asyncio tasks, so when shutdown fires, long-running reasoner tasks keep executing in the background past the shutdown deadline. The agent process then exits while work is still in progress, which can leave partial state, dropped responses, and a confusing audit trail in the control plane.

Where

  • File: sdk/python/agentfield/agent_server.py
  • Function: AgentServer._graceful_shutdown
  • Discovered by: sdk/python/tests/test_agent_graceful_shutdown.py::test_graceful_shutdown_cancels_in_flight_tasks_within_deadline (skipped with pytest.skip("source bug: graceful shutdown does not track or cancel in-flight tasks"))

Reproduction

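import asyncio

from agentfield.agent_server import AgentServer

async def reproduce():
    # make_shutdown_agent() is the agent factory used by the test; its definition is not shown in this issue
    agent = make_shutdown_agent()
    server = AgentServer(agent)

    async def long_running():
        await asyncio.sleep(60)

    tasks = [asyncio.create_task(long_running()) for _ in range(5)]
    await server._graceful_shutdown(timeout_seconds=0)

    # Expected: all tasks done (completed or cancelled) before _graceful_shutdown returns
    # Actual:   tasks still running after _graceful_shutdown; the server exits and leaves them orphaned
    print([t.done() for t in tasks])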

Expected behavior

_graceful_shutdown should:

  1. Maintain a registry of in-flight reasoner tasks (created when a request arrives, removed when it completes)
  2. On shutdown, await all pending tasks up to the configured deadline
  3. After the deadline expires, cancel any remaining tasks (see [Python SDK] graceful shutdown does not enforce timeout-based task cancellation #357)
  4. Only call os._exit after the task drain is complete

Reference implementation: most ASGI servers (uvicorn, hypercorn) follow this pattern — track requests, drain on shutdown, cancel on deadline.
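
The steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the SDK's code: the InFlightTasks class and its spawn and drain methods are hypothetical names introduced here, and how they would be wired into AgentServer and its exit path is left open.

```python
import asyncio


class InFlightTasks:
    """Hypothetical registry of in-flight tasks; not part of the AgentField SDK."""

    def __init__(self) -> None:
        self._tasks: set[asyncio.Task] = set()

    def spawn(self, coro) -> asyncio.Task:
        # Register on creation; the done-callback removes the task once it finishes.
        task = asyncio.create_task(coro)
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return task

    async def drain(self, timeout_seconds: float) -> int:
        """Wait up to the deadline, then cancel stragglers. Returns how many were cancelled."""
        pending = set(self._tasks)
        if not pending:
            return 0
        # asyncio.wait cancels nothing on timeout; it only reports what is still pending.
        _, pending = await asyncio.wait(pending, timeout=timeout_seconds)
        for task in pending:
            task.cancel()
        # Wait for the cancellations to be processed so every task is done on return.
        await asyncio.gather(*pending, return_exceptions=True)
        return len(pending)


async def demo() -> tuple[int, list[bool]]:
    registry = InFlightTasks()
    tasks = [registry.spawn(asyncio.sleep(60)) for _ in range(5)]
    cancelled = await registry.drain(timeout_seconds=0)
    return cancelled, [t.done() for t in tasks]
```

Running asyncio.run(demo()) mirrors the reproduction case: with a zero-second deadline all five tasks are cancelled and report done() before drain returns. A shutdown routine built this way would call drain first and only then proceed to process exit.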

Acceptance criteria

  • AgentServer maintains a set of in-flight reasoner tasks
  • _graceful_shutdown awaits all in-flight tasks before exit
  • Tasks that finish before the deadline complete cleanly
  • The skipped test in test_agent_graceful_shutdown.py::test_graceful_shutdown_cancels_in_flight_tasks_within_deadline is unskipped and passes
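
One way to picture what these criteria ask a test to observe is the self-contained check below. It is only a sketch: drain is a hypothetical stand-in for the drain step rather than an SDK function, and the real test would exercise AgentServer._graceful_shutdown instead.

```python
import asyncio


async def drain(tasks, timeout_seconds):
    # Hypothetical stand-in for the drain step (not an SDK function):
    # wait until the deadline, then cancel whatever is still pending.
    _, pending = await asyncio.wait(tasks, timeout=timeout_seconds)
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)


async def quick(value):
    await asyncio.sleep(0.01)
    return value


async def check_drain_outcomes():
    fast = [asyncio.create_task(quick(i)) for i in range(3)]
    slow = [asyncio.create_task(asyncio.sleep(60)) for _ in range(2)]

    await drain(fast + slow, timeout_seconds=0.2)

    # Tasks that finished before the deadline keep their results.
    assert [t.result() for t in fast] == [0, 1, 2]
    assert not any(t.cancelled() for t in fast)
    # Tasks still running at the deadline end up cancelled, not orphaned.
    assert all(t.cancelled() for t in slow)
    return True
```

The check separates the two outcomes the criteria care about: work that completes in time is left untouched, and only work that outlives the deadline is cancelled.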

Related

Discovered via

PR #352 (test coverage improvements). One of 5 source bugs surfaced while writing failure-mode tests for the Python SDK.

Metadata

Labels

  • ai-friendly: Well-documented task suitable for AI-assisted development
  • bug: Something isn't working
  • sdk:python: Python SDK related
  • tests: Unit test improvements and coverage
