forked from waybarrios/vllm-mlx
server: close out the upstream /v1/responses merge plan #21
Closed
Description
Upstream reference: waybarrios#214
Problem:
We want an OpenAI-compatible /v1/responses endpoint for local coding-agent workflows, including Codex-style request normalization. The core PR is broad and important, but it is not yet clearly prioritized against the engine/scheduler correctness backlog.
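To make "Codex-style request normalization" concrete, here is a minimal sketch of the idea: fold the top-level `instructions` field and any developer-role input items into a single system message before handing the request to the chat path. The function name and the exact field handling here are illustrative assumptions, not the PR's actual API.

```python
def normalize_responses_request(body: dict) -> list[dict]:
    """Flatten a /v1/responses request body into chat-style messages.

    Hypothetical sketch: Codex clients send a top-level `instructions`
    string plus developer-role items in `input`; both are merged into
    one leading system message.
    """
    messages = []
    system_parts = []
    if body.get("instructions"):
        system_parts.append(body["instructions"])
    items = body.get("input", [])
    if isinstance(items, str):
        # `input` may be a bare user string rather than a list of items
        items = [{"role": "user", "content": items}]
    for item in items:
        if item.get("role") == "developer":
            # developer-role items are treated as system-level guidance
            system_parts.append(item["content"])
        else:
            messages.append({"role": item.get("role"), "content": item["content"]})
    if system_parts:
        messages.insert(0, {"role": "system", "content": "\n\n".join(system_parts)})
    return messages
```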
Status:
- the prompt-normalization work from "responses: normalize developer and instructions for Codex" (waybarrios/vllm-mlx#219) is already folded into "server: add OpenAI-compatible /v1/responses endpoint" (waybarrios/vllm-mlx#214)
- no substantive review feedback is present yet
- this is the largest and riskiest open PR in the stack
Path to completion:
- Rebase on current upstream `main` and confirm the diff is still coherent after folding the Codex normalization work into the main PR.
- Re-run `tests/test_responses_api.py` and any broader server tests.
- Audit unsupported semantics again so the endpoint fails explicitly rather than silently accepting partially-implemented behavior.
- Consider whether the PR should be split into a minimal core endpoint plus follow-up semantics if upstream review stalls on scope.
- Make sure the PR description clearly distinguishes what is implemented, what intentionally degrades, and what is explicitly unsupported.
- Only push this aggressively once the higher-priority engine/scheduler correctness items are out of draft, unless local product needs force earlier action.
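The "fail explicitly" audit item above amounts to rejecting request fields the endpoint does not yet implement instead of silently dropping them. A minimal sketch of that pattern, where the specific field names and error shape are assumptions for illustration only:

```python
# Fields the endpoint has not implemented yet (hypothetical set for
# illustration; the real list must come from the audit).
UNSUPPORTED_FIELDS = {"tools", "previous_response_id", "truncation"}

def reject_unsupported(body: dict) -> None:
    """Raise on any present-but-unimplemented field.

    Raising here makes the gap visible to the client instead of
    accepting partially-implemented behavior.
    """
    present = UNSUPPORTED_FIELDS & {k for k, v in body.items() if v is not None}
    if present:
        raise ValueError(
            f"unsupported /v1/responses fields: {sorted(present)}; "
            "these are not implemented and will not be silently ignored"
        )
```

In a real server this would surface as a structured 400 error rather than a bare `ValueError`, but the gating logic is the same.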
Acceptance criteria:
- /v1/responses behavior is clearly specified and tested
- unsupported semantics fail explicitly
- the PR is either merged as-is or intentionally split into smaller reviewable pieces