
[Bug]: Background tasks via agents chat --background killed during agent reload #3275

@ekzhu


CoPaw Version

1.0.2

Description

Background tasks dispatched via copaw agents chat --background are spontaneously cancelled when the target agent undergoes a workspace reload. The reload's graceful shutdown logic has a blind spot: it only checks CoPaw's per-workspace TaskTracker for active tasks, but background tasks submitted through /api/agent/process/task are managed by agentscope_runtime's AgentApp and are invisible to that tracker. The old workspace is therefore stopped immediately, killing all in-flight background tasks.

All affected sessions end with _is_interrupted=True and "The tool call has been interrupted by the user" — but no user issued a stop command.

Related PR(s): N/A

Security considerations: N/A

Component(s) Affected

  • Core / Backend (app, agents, config, providers, utils, local_models)
  • Console (frontend web UI)
  • Channels (DingTalk, Feishu, QQ, Discord, iMessage, etc.)
  • Skills
  • CLI
  • Documentation (website)
  • Tests
  • CI/CD
  • Scripts / Deploy

Environment

  • CoPaw version: 1.0.2
  • OS: Linux
  • Install method: from source
  • Python version: 3.10+

Steps to Reproduce

  1. Start CoPaw with a configured agent
  2. Dispatch multiple background tasks to the same target agent:
    for i in $(seq 1 5); do
      copaw agents chat --background \
        --from-agent default --to-agent <target> \
        --text "Run a long task: sleep 120 && echo done"
    done
  3. Trigger a reload from another terminal or the UI:
    curl -X PUT http://localhost:8088/api/agent/running-config \
      -H "Content-Type: application/json" \
      -H "X-Agent-Id: <target>" \
      -d '{"temperature": 0.8}'
  4. Check task status:
    copaw agents chat --background --task-id <any-task-id>

Any config change endpoint that calls schedule_agent_reload() triggers this — including PUT /agent/running-config, PUT /agent/system-prompt-files, PUT /agents/{agentId}, and PUT /config/channels.

Actual vs Expected

  • Actual: All background tasks are cancelled within seconds of a reload. Sessions show _is_interrupted=True. Task status API shows "pending" or "cancelled".
  • Expected: Background tasks should survive agent reloads (or at least be given a grace period to complete). The graceful shutdown should be aware of all running tasks, not just those tracked by CoPaw's TaskTracker.

Logs / Screenshots

All interrupted sessions end with the same pattern:

{
  "metadata": {"_is_interrupted": true},
  "content": "I noticed that you have interrupted me. What can I do for you?"
}

Preceding tool result:

<system-info>The tool call has been interrupted by the user.</system-info>

Root Cause Analysis

Affected code path

CLI: copaw agents chat --background --to-agent kM8Z4E --text "..."
  → POST /api/agent/process/task          (agentscope_runtime AgentApp)
  → DynamicMultiAgentRunner.stream_query  (src/copaw/app/_app.py:104)
  → workspace_runner.stream_query         (agentscope_runtime Runner)
  → AgentRunner.query_handler             (src/copaw/app/runner/runner.py:349)
  → CoPawAgent.reply → tool execution     (src/copaw/agents/react_agent.py)

The task lifecycle is owned by agentscope_runtime, not by CoPaw's TaskTracker (src/copaw/app/runner/task_tracker.py).

Step-by-step

1. Tasks are dispatched and running. Each goes through agentscope_runtime's /api/agent/process/task endpoint, which creates an asyncio.Task wrapping DynamicMultiAgentRunner.stream_query(). At task start, the runner resolves to workspace A's AgentRunner and holds a reference to it.
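This stale-reference behavior can be sketched in a few lines (illustrative names; the `agents` dict stands in for MultiAgentManager's registry): because the runner is resolved once at task start, the later atomic swap never reroutes an in-flight task.

```python
import asyncio

# Sketch: an in-flight task keeps the workspace it resolved at start,
# so swapping the registry entry does not affect it.
agents = {"kM8Z4E": "workspace-A"}

async def stream_query(agent_id: str) -> str:
    runner = agents[agent_id]   # resolved once, at task start
    await asyncio.sleep(0)      # the reload happens while we run
    return runner               # still workspace-A

async def demo() -> tuple[str, str]:
    task = asyncio.create_task(stream_query("kM8Z4E"))
    await asyncio.sleep(0)              # let the task resolve its runner
    agents["kM8Z4E"] = "workspace-B"    # reload_agent's atomic swap
    return await task, agents["kM8Z4E"]
```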

2. A config change triggers schedule_agent_reload() (src/copaw/app/utils.py:15), which fires MultiAgentManager.reload_agent() in the background.

3. reload_agent() (src/copaw/app/multi_agent_manager.py:208-319) creates a new Workspace with a fresh TaskTracker, starts it, atomically swaps it in (self.agents[agent_id] = new_instance, line 312), then calls _graceful_stop_old_instance().

4. _graceful_stop_old_instance() — THE BUG (src/copaw/app/multi_agent_manager.py:91-186):

has_active = await old_instance.task_tracker.has_active_tasks()  # line 105
if has_active:
    # Wait up to 60s for tasks, then stop ...
else:
    # No active tasks — stop immediately       ← THIS PATH IS TAKEN
    await old_instance.stop(final=False)        # line 176

has_active returns False because TaskTracker only tracks tasks registered via attach_or_start() (console channel, messaging channels). Background tasks from /api/agent/process/task are managed by agentscope_runtime's AgentApp and are never registered in CoPaw's TaskTracker.
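A minimal stand-in illustrates the blind spot (TrackerStub is illustrative, not CoPaw's real TaskTracker): a tracker that only counts tasks registered through its own attach_or_start() reports no active tasks while an externally created asyncio.Task is still running.

```python
import asyncio

class TrackerStub:
    """Only sees tasks it started itself -- like CoPaw's per-workspace tracker."""
    def __init__(self) -> None:
        self._tasks: set = set()

    def attach_or_start(self, coro) -> asyncio.Task:
        task = asyncio.create_task(coro)
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return task

    async def has_active_tasks(self) -> bool:
        return any(not t.done() for t in self._tasks)

async def demo() -> bool:
    tracker = TrackerStub()
    # Background task created by another component (e.g. AgentApp),
    # bypassing the tracker entirely:
    external = asyncio.create_task(asyncio.sleep(10))
    await asyncio.sleep(0)
    active = await tracker.has_active_tasks()  # False, despite the running task
    external.cancel()
    try:
        await external
    except asyncio.CancelledError:
        pass
    return active
```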

5. Old workspace is stopped immediately. stop(final=False) (workspace.py:363) calls ServiceManager.stop_all(final=False) which stops the runner, MCP clients, and channels. The in-flight tasks receive asyncio.CancelledError.

6. CancelledError propagates to agent interrupt:

# runner.py:541-545
except asyncio.CancelledError as exc:
    if agent is not None:
        await agent.interrupt()      # cancels agent's reply task
    raise AgentException("Task has been cancelled!") from exc

agent.interrupt() (react_agent.py:1031-1046) cancels the agent's _reply_task, producing the _is_interrupted=True metadata.
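The whole failure chain can be reproduced in miniature (stub classes with hypothetical names, not CoPaw's real ones): cancelling the in-flight task delivers CancelledError at the await point, the handler calls interrupt(), and the session ends with _is_interrupted=True even though no user issued a stop.

```python
import asyncio

class AgentStub:
    def __init__(self) -> None:
        self.metadata: dict = {}

    async def interrupt(self) -> None:
        # Stand-in for react_agent.py cancelling _reply_task and
        # closing the session with the interrupted marker.
        self.metadata["_is_interrupted"] = True

async def query_handler(agent: AgentStub) -> None:
    try:
        await asyncio.sleep(120)  # in-flight tool call
    except asyncio.CancelledError:
        await agent.interrupt()   # mirrors runner.py's except-branch
        raise

async def demo() -> dict:
    agent = AgentStub()
    task = asyncio.create_task(query_handler(agent))
    await asyncio.sleep(0)   # let the task reach its await point
    task.cancel()            # what stopping the old workspace does
    try:
        await task
    except asyncio.CancelledError:
        pass
    return agent.metadata
```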

Why the evidence matches

  • All 10 tasks interrupted — all used the same old workspace runner
  • Different durations before death (12s–134s) — tasks were dispatched at different times but killed by the same reload event
  • All interrupted during sleep commands — sleep yields to the event loop, where CancelledError is delivered
  • _is_interrupted=True in all sessions — the standard agentscope interrupt response to CancelledError
  • Task status API shows "pending/cancelled" — agentscope_runtime's tracker is separate; status doesn't update correctly after the workspace stops
  • No user-initiated /stop — the stop was triggered by the workspace reload, not a user

Suggested Fix

Option A (Preferred): Register AgentApp tasks with CoPaw's TaskTracker

In DynamicMultiAgentRunner.stream_query() (_app.py:104), register each background task with the resolved workspace's TaskTracker so _graceful_stop_old_instance waits for them.
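A sketch of Option A, assuming TaskTracker grows register/unregister methods (both TrackerStub and those method names are hypothetical): stream_query registers the current asyncio.Task for its whole lifetime, so _graceful_stop_old_instance would see it via has_active_tasks() and wait.

```python
import asyncio

class TrackerStub:
    """Hypothetical tracker surface for externally owned tasks."""
    def __init__(self) -> None:
        self.active: dict = {}

    def register(self, task_id: str, task: asyncio.Task) -> None:
        self.active[task_id] = task

    def unregister(self, task_id: str) -> None:
        self.active.pop(task_id, None)

    async def has_active_tasks(self) -> bool:
        return bool(self.active)

async def stream_query(tracker: TrackerStub, task_id: str):
    tracker.register(task_id, asyncio.current_task())
    try:
        for chunk in ("thinking", "answer"):
            await asyncio.sleep(0)
            yield chunk
    finally:
        tracker.unregister(task_id)  # always deregister, even on cancellation

async def demo():
    tracker = TrackerStub()
    chunks = [c async for c in stream_query(tracker, "task-1")]
    return chunks, await tracker.has_active_tasks()
```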

Option B: Delay old workspace stop unconditionally

Always schedule a delayed cleanup with a configurable grace period (e.g. 60–300s) before stopping the old workspace after a reload, instead of relying solely on has_active_tasks().
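A sketch of Option B (OldInstanceStub is a stand-in for the old Workspace; the polling helper is hypothetical, not existing CoPaw code): poll for activity for up to a configurable grace period before stopping, instead of stopping immediately when the tracker reports idle.

```python
import asyncio

class OldInstanceStub:
    """Stand-in for the old Workspace; reports busy for a few polls."""
    def __init__(self, polls_until_idle: int) -> None:
        self._polls = polls_until_idle
        self.stopped = False

    async def has_active_tasks(self) -> bool:
        self._polls -= 1
        return self._polls > 0

    async def stop(self, final: bool = False) -> None:
        self.stopped = True

async def stop_after_grace(old, grace_seconds: float, poll: float = 0.01) -> None:
    loop = asyncio.get_running_loop()
    deadline = loop.time() + grace_seconds
    # Keep the old workspace alive while work may still be in flight.
    while loop.time() < deadline and await old.has_active_tasks():
        await asyncio.sleep(poll)
    await old.stop(final=False)
```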

Key files to modify

  • src/copaw/app/multi_agent_manager.py:91-186 — _graceful_stop_old_instance: add awareness of AgentApp-managed tasks
  • src/copaw/app/_app.py:104-126 — DynamicMultiAgentRunner.stream_query: register background tasks with the TaskTracker
  • src/copaw/app/runner/task_tracker.py — may need API additions for external task registration
