
Eval bug: Responses API (/v1/responses) can't cancel a stream to stop generation #19173

@wqerrewetw

Description


Name and Version

after b7793

Operating systems

Windows

GGML backends

CUDA

Hardware

irrelevant

Models

irrelevant

Problem description & steps to reproduce

When running llama-server, if the client interrupts the HTTP connection during generation, the existing Chat Completions API (/v1/chat/completions) cancels the stream and stops generation, but the new Responses API (/v1/responses) does not.
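A minimal reproduction sketch of the client-side disconnect described above, using only the Python standard library. The host, port, and payload shape are assumptions (a locally running llama-server on 127.0.0.1:8080 with streaming enabled); the point is only to open a streaming request, read a few chunks, and then drop the connection abruptly.

```python
import json
import http.client

# Assumed local server address; adjust to your llama-server instance.
HOST = "127.0.0.1"
PORT = 8080

def streaming_payload(prompt: str) -> dict:
    """Build a minimal streaming request body (shape is an assumption)."""
    return {"input": prompt, "stream": True}

def cancel_mid_stream(path: str, payload: dict, chunks_to_read: int = 3) -> None:
    """Open a streaming request, read a few chunks, then close the socket.

    Per the bug report: with /v1/chat/completions the server notices the
    disconnect and stops generating; with /v1/responses it keeps going.
    """
    conn = http.client.HTTPConnection(HOST, PORT, timeout=30)
    conn.request(
        "POST", path,
        body=json.dumps(payload),
        headers={"Content-Type": "application/json"},
    )
    resp = conn.getresponse()
    for _ in range(chunks_to_read):
        if not resp.read(256):
            break
    conn.close()  # abrupt client-side disconnect mid-stream

if __name__ == "__main__":
    cancel_mid_stream("/v1/responses", streaming_payload("Write a long story."))
    # Then watch the server log: generation continues after the disconnect.
```

Running this against /v1/chat/completions and /v1/responses and comparing the server logs after the disconnect shows the difference in behavior.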

First Bad Commit

#18486

Relevant log output

