forked from waybarrios/vllm-mlx
server: close out the upstream /v1/responses merge plan #21
Closed
Description
Upstream reference: waybarrios#214
Problem:
We want an OpenAI-compatible /v1/responses endpoint for local coding-agent workflows, including Codex-style request normalization. The core PR is broad and important, but it is not yet clearly prioritized against the engine/scheduler correctness backlog.
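To make "Codex-style request normalization" concrete, here is a minimal sketch of the idea: fold the top-level `instructions` field and any developer-role input items into a single system message before handing the request to the chat path. The function name and the exact field handling here are illustrative assumptions, not the PR's actual API.

```python
def normalize_responses_request(body: dict) -> list[dict]:
    """Flatten a /v1/responses request body into chat-style messages.

    Hypothetical sketch: Codex clients send a top-level `instructions`
    string plus developer-role items in `input`; both are merged into
    one leading system message.
    """
    messages = []
    system_parts = []
    if body.get("instructions"):
        system_parts.append(body["instructions"])
    items = body.get("input", [])
    if isinstance(items, str):
        # `input` may be a bare user string rather than a list of items
        items = [{"role": "user", "content": items}]
    for item in items:
        if item.get("role") == "developer":
            # developer-role items are treated as system-level guidance
            system_parts.append(item["content"])
        else:
            messages.append({"role": item.get("role"), "content": item["content"]})
    if system_parts:
        messages.insert(0, {"role": "system", "content": "\n\n".join(system_parts)})
    return messages
```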
Status:
- the prompt-normalization work from "responses: normalize developer and instructions for Codex" (waybarrios/vllm-mlx#219) is already folded into "server: add OpenAI-compatible /v1/responses endpoint" (waybarrios/vllm-mlx#214)
- no substantive review feedback is present yet
- this is the largest and riskiest open PR in the stack
Path to completion:
- Rebase on current upstream `main` and confirm the diff is still coherent after folding the Codex normalization work into the main PR.
- Re-run `tests/test_responses_api.py` and any broader server tests.
- Audit unsupported semantics again so the endpoint fails explicitly rather than silently accepting partially-implemented behavior.
- Consider whether the PR should be split into a minimal core endpoint plus follow-up semantics if upstream review stalls on scope.
- Make sure the PR description clearly distinguishes what is implemented, what intentionally degrades, and what is explicitly unsupported.
- Only push this aggressively once the higher-priority engine/scheduler correctness items are out of draft, unless local product needs force earlier action.
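The "fail explicitly" audit item above amounts to rejecting request fields the endpoint does not yet implement instead of silently dropping them. A minimal sketch of that pattern, where the specific field names and error shape are assumptions for illustration only:

```python
# Fields the endpoint has not implemented yet (hypothetical set for
# illustration; the real list must come from the audit).
UNSUPPORTED_FIELDS = {"tools", "previous_response_id", "truncation"}

def reject_unsupported(body: dict) -> None:
    """Raise on any present-but-unimplemented field.

    Raising here makes the gap visible to the client instead of
    accepting partially-implemented behavior.
    """
    present = UNSUPPORTED_FIELDS & {k for k, v in body.items() if v is not None}
    if present:
        raise ValueError(
            f"unsupported /v1/responses fields: {sorted(present)}; "
            "these are not implemented and will not be silently ignored"
        )
```

In a real server this would surface as a structured 400 error rather than a bare `ValueError`, but the gating logic is the same.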
Acceptance criteria:
- /v1/responses behavior is clearly specified and tested
- unsupported semantics fail explicitly
- the PR is either merged as-is or intentionally split into smaller reviewable pieces