Skip to content

docs: migrate Speculative Decoding docs to three-tier structure#6001

Merged
dagil-nvidia merged 3 commits intomainfrom
dagil/docs-speculative-migration
Feb 5, 2026
Merged

docs: migrate Speculative Decoding docs to three-tier structure#6001
dagil-nvidia merged 3 commits intomainfrom
dagil/docs-speculative-migration

Conversation

@dagil-nvidia
Copy link
Copy Markdown
Collaborator

@dagil-nvidia dagil-nvidia commented Feb 5, 2026

Summary

  • Create docs/features/speculative_decoding/README.md with feature overview and backend support matrix
  • Add deprecation notice to docs/backends/vllm/speculative_decoding.md pointing to new location
  • Add cross-reference from vLLM README to new feature documentation
  • Add redirect in conf.py for backward compatibility
  • Add toctree entry in hidden_toctree.rst

Part of the docs hierarchy refactoring effort to organize cross-cutting features.

Test plan

  • Docs build passes with standard commands:
    # One-time setup
    uv venv .venv-docs
    uv pip install --python .venv-docs --group docs
    
    # Build docs
    uv run --python .venv-docs --no-project docs/generate_docs.py
  • No broken links in new content
  • Redirect configured for old path

Summary by CodeRabbit

  • Documentation
    • Added new documentation pages covering speculative decoding and vLLM backend integration with setup instructions and examples
    • Added cross-references between related documentation sections
    • Implemented documentation redirects for improved navigation

Copy speculative decoding documentation to docs/features/speculative_decoding/
as a cross-cutting feature with backend support matrix.

Changes:
- Create docs/features/speculative_decoding/README.md with feature overview
- Add deprecation notice to docs/backends/vllm/speculative_decoding.md
- Add cross-reference from vLLM README to new feature location
- Add redirect in conf.py for old path
- Add toctree entry in hidden_toctree.rst

Part of docs hierarchy refactoring effort.

Signed-off-by: Dan Gil <[email protected]>
Co-authored-by: Cursor <[email protected]>
@dagil-nvidia dagil-nvidia requested review from a team as code owners February 5, 2026 16:19
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot bot commented Feb 5, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added docs documentation Improvements or additions to documentation labels Feb 5, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Feb 5, 2026

Walkthrough

Adds a new Speculative Decoding feature guide and vLLM-specific page, inserts cross-references from existing vLLM docs, adds a Sphinx redirect, and registers the new pages in the hidden toctree. All changes are documentation-only.

Changes

Cohort / File(s) Summary
Feature Overview
docs/features/speculative_decoding/README.md, docs/features/speculative_decoding/speculative_decoding_vllm.md
New comprehensive Speculative Decoding feature guide and a vLLM-specific page outlining workflow, prerequisites, quick-start (vLLM + Eagle3), examples, and backend notes.
vLLM Backend Docs
docs/backends/vllm/README.md, docs/backends/vllm/speculative_decoding.md
Added cross-reference links to the new feature guide in Advanced Examples/Quickstart; added notes that content moved and will be removed in future.
Docs Configuration
docs/conf.py, docs/hidden_toctree.rst
Added a Sphinx redirect mapping backends/vllm/speculative_decoding../../features/speculative_decoding/speculative_decoding_vllm.html and registered new feature pages in the hidden toctree.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐇 I hopped through docs to find a trail,
Left notes of drafts and a verification tale,
Eagles and vLLM peeking through the trees,
Tokens scurry fast on speculative breeze,
A tiny rabbit cheers: "Read on, please!" 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Description check ❓ Inconclusive The description provides a clear summary of changes, includes a test plan with verification steps, but is missing a 'Related Issues' section with issue references as specified in the template. Add a 'Related Issues' section citing the relevant GitHub issue(s) using action keywords like 'Closes' or 'Relates to' as shown in the template.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: migrating speculative decoding documentation to a three-tier structure, which aligns with the substantive changes across multiple documentation files.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@dagil-nvidia dagil-nvidia changed the title docs: add speculative decoding to docs/features/ docs: migrate Speculative Decoding docs to three-tier structure Feb 5, 2026
@dagil-nvidia dagil-nvidia requested a review from rmccorm4 February 5, 2026 16:30
@dagil-nvidia
Copy link
Copy Markdown
Collaborator Author

/ok to test 3dd4010

- Create docs/features/speculative_decoding/speculative_decoding_vllm.md
  with vLLM-specific content following <feature>_<backend>.md convention
- Update README.md to link to local backend-specific file
- Update original backends file to redirect to new location
- Update conf.py redirect to point to vllm-specific file

Signed-off-by: Dan Gil <[email protected]>
Co-authored-by: Cursor <[email protected]>
@dagil-nvidia
Copy link
Copy Markdown
Collaborator Author

/ok to test 0d6bbcb

@dagil-nvidia
Copy link
Copy Markdown
Collaborator Author

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Feb 5, 2026

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@docs/features/speculative_decoding/speculative_decoding_vllm.md`:
- Around line 89-105: The example response JSON for the /v1/chat/completions
example is using choices[].text; update it to the OpenAI-compatible chat schema
by replacing choices[].text with choices[].message containing a role (e.g.,
"assistant") and content (string or object) fields—i.e., change the choice
object structure used in the example (keep id, model, usage) so that each choice
uses message.role and message.content instead of text to match the vLLM/Dynamo
integration and OpenAI API.
🧹 Nitpick comments (1)
docs/features/speculative_decoding/speculative_decoding_vllm.md (1)

51-55: Avoid a fixed approval-time promise.

Hugging Face approval latency varies; a softer statement reduces the risk of stale guidance.

📝 Suggested tweak
-Approval usually takes around **5 minutes**.
+Approval time can vary depending on Hugging Face review/traffic.

- Fix response schema: use message.role/content instead of text
- Soften approval time claim (varies vs fixed 5 minutes)

Signed-off-by: Dan Gil <[email protected]>
Co-authored-by: Cursor <[email protected]>
@dagil-nvidia
Copy link
Copy Markdown
Collaborator Author

/ok to test a3df9f9

@dagil-nvidia dagil-nvidia merged commit 8aa7335 into main Feb 5, 2026
33 checks passed
@dagil-nvidia dagil-nvidia deleted the dagil/docs-speculative-migration branch February 5, 2026 23:45
soodoshll pushed a commit to soodoshll/dynamo that referenced this pull request Feb 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs documentation Improvements or additions to documentation size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants