Name and Version
after b7793
Operating systems
Windows
GGML backends
CUDA
Hardware
irrelevant
Models
irrelevant
Problem description & steps to reproduce
When llama-server is running, if the HTTP connection is interrupted on the client side during generation, the original Chat Completions API (/v1/chat/completions) cancels the stream and stops generation, but the new Responses API (/v1/responses) does not.
First Bad Commit
#18486
Relevant log output