docs: migrate Speculative Decoding docs to three-tier structure #6001

dagil-nvidia merged 3 commits into main from
Conversation
Copy speculative decoding documentation to docs/features/speculative_decoding/ as a cross-cutting feature with backend support matrix.

Changes:
- Create docs/features/speculative_decoding/README.md with feature overview
- Add deprecation notice to docs/backends/vllm/speculative_decoding.md
- Add cross-reference from vLLM README to new feature location
- Add redirect in conf.py for old path
- Add toctree entry in hidden_toctree.rst

Part of docs hierarchy refactoring effort.

Signed-off-by: Dan Gil <[email protected]>
Co-authored-by: Cursor <[email protected]>
Walkthrough

Adds a new Speculative Decoding feature guide and vLLM-specific page, inserts cross-references from existing vLLM docs, adds a Sphinx redirect, and registers the new pages in the hidden toctree. All changes are documentation-only.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 inconclusive
/ok to test 3dd4010
- Create docs/features/speculative_decoding/speculative_decoding_vllm.md with vLLM-specific content following the <feature>_<backend>.md convention
- Update README.md to link to the local backend-specific file
- Update the original backends file to redirect to the new location
- Update the conf.py redirect to point to the vLLM-specific file

Signed-off-by: Dan Gil <[email protected]>
Co-authored-by: Cursor <[email protected]>
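The conf.py redirect itself isn't shown in this thread. As a rough sketch only — assuming the sphinx-reredirects extension and illustrative paths, neither of which is confirmed by the PR diff — the update described above might look like:

```python
# Hypothetical fragment of docs/conf.py.
# Assumes the sphinx-reredirects extension; the mapping below is
# illustrative, not copied from the actual PR.
extensions = ["sphinx_reredirects"]

redirects = {
    # old vLLM-backend page -> new backend-specific feature page
    "backends/vllm/speculative_decoding":
        "../../features/speculative_decoding/speculative_decoding_vllm.html",
}
```

With sphinx-reredirects, each key is the old docname and the value is a target relative to the old page, so old bookmarks keep resolving after the move.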
/ok to test 0d6bbcb
@coderabbitai full review
✅ Actions performed: full review triggered.
Actionable comments posted: 1
In `@docs/features/speculative_decoding/speculative_decoding_vllm.md`:
- Around lines 89-105: The example response JSON for the /v1/chat/completions example uses choices[].text. Update it to the OpenAI-compatible chat schema by replacing choices[].text with a choices[].message object containing role (e.g., "assistant") and content fields. Keep id, model, and usage as-is; each choice should carry message.role and message.content instead of text, matching the vLLM/Dynamo integration and the OpenAI API.
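The corrected choice shape can be sketched as below. The field names follow the OpenAI chat-completions schema; the id, model name, content, and token counts are placeholders, not values from the PR.

```python
# Illustrative /v1/chat/completions response body: each choice carries a
# message with role/content, rather than the completions-style text field.
response = {
    "id": "chatcmpl-123",          # placeholder id
    "object": "chat.completion",
    "model": "example-model",      # placeholder model name
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! How can I help you today?",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21},
}
```

A legacy /v1/completions response would instead put the generated string in choices[].text, which is exactly the mismatch the review comment flags.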
🧹 Nitpick comments (1)
docs/features/speculative_decoding/speculative_decoding_vllm.md (1)
51-55: Avoid a fixed approval-time promise. Hugging Face approval latency varies; a softer statement reduces the risk of stale guidance.
📝 Suggested tweak
-Approval usually takes around **5 minutes**.
+Approval time can vary depending on Hugging Face review/traffic.
- Fix response schema: use message.role/content instead of text
- Soften approval time claim (varies vs fixed 5 minutes)

Signed-off-by: Dan Gil <[email protected]>
Co-authored-by: Cursor <[email protected]>
/ok to test a3df9f9
…ynamo#6001) Signed-off-by: Dan Gil <[email protected]> Co-authored-by: Cursor <[email protected]>
Summary
- Create docs/features/speculative_decoding/README.md with feature overview and backend support matrix
- Add deprecation notice to docs/backends/vllm/speculative_decoding.md pointing to new location
- Add redirect in conf.py for backward compatibility
- Add toctree entry in hidden_toctree.rst

Part of the docs hierarchy refactoring effort to organize cross-cutting features.
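The hidden_toctree.rst registration mentioned above isn't reproduced in this thread; a plausible sketch (entry paths are assumptions based on the file layout described, not the actual diff) would be:

```rst
.. toctree::
   :hidden:

   features/speculative_decoding/README
   features/speculative_decoding/speculative_decoding_vllm
```

Registering pages in a hidden toctree lets Sphinx build them without warnings while keeping them out of the visible navigation tree.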
Test plan