docs: migrate Multimodal docs to three-tier structure#5999
docs: migrate Multimodal docs to three-tier structure#5999dagil-nvidia merged 3 commits intomainfrom
Conversation
WalkthroughThis pull request relocates and expands multimodal documentation from Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 8
🤖 Fix all issues with AI agents
In `@docs/features/multimodal/README.md`:
- Around line 129-130: The product name capitalization is inconsistent: change
the instance "SGlang" in the line "Decode-first, used by SGlang" to "SGLang" so
it matches earlier usage; search for and normalize any other occurrences of
"SGlang" to "SGLang" in this doc (and related multimodal/readme content) to
ensure consistent branding across the text.
In `@docs/features/multimodal/sglang.md`:
- Around line 140-143: The Markdown link to the SGLang backend README is broken;
update the reference in the Workflow section to point to the correct location
under docs/backends (replace the relative link ../backends/sglang/README.md with
the correct path docs/backends/sglang/README.md) so that the sentence
referencing the LLM aggregated serving example (near mentions of
MultimodalEncodeWorker and MultimodalWorker) uses the valid README link.
In `@docs/features/multimodal/trtllm.md`:
- Around line 450-452: Update the typo "compatibilty" to "compatibility" in the
docs text shown: change both occurrences in the TRTLLM notes (the
llava-v1.6-mistral-7b-hf model crash line and the Embeddings file crash line) so
the phrase reads "compatibility with `TensorRT LLM version: 1.2.0rc6.post1`";
verify the surrounding references (including the mention of
attach_multimodal_embeddings()) remain unchanged and commit the corrected doc.
- Around line 386-393: The fenced log-output blocks in trtllm.md lack a language
identifier which triggers MD040; update each triple-backtick fence that
surrounds the logs (e.g., the block containing "INFO dynamo_run::input::http:
Watching for remote model at models" and the block containing "INFO
dynamo_llm::discovery::watcher: added model
model_name=\"meta-llama/Llama-4-Maverick-17B-128E-Instruct\"") to include a
language token such as text (i.e., change ``` to ```text) so markdownlint
recognizes them as plain log output.
In `@docs/templates/component_design.md`:
- Around line 27-55: The markdown has unlabeled fenced code blocks (the ASCII
diagram and the "Pseudocode or formula" block) which triggers MD040; update the
two triple-backtick fences around the diagram and the pseudocode to use a
language identifier (e.g., replace ``` with ```text) so markdownlint recognizes
them. Locate the unlabeled fences around the ASCII art and the block under
"Algorithms" (the ``` blocks in component_design.md) and change them to labeled
fences such as ```text to satisfy MD040.
In `@docs/templates/EXAMPLE_SKILL.md`:
- Around line 50-57: The Inputs table's "Target Category" allowed values list is
missing two entries from the documented 9-category hierarchy; update the table
under "## Inputs" so the "Target Category" row includes both "reference" and
"design_docs" in addition to the existing values (components, backends,
features, deploy, performance, infrastructure, integrations), and ensure the
example and description text (if any) reflect the full 9-category set so
migrations for reference and design_docs are covered.
In `@docs/templates/MIGRATION_GUIDE.md`:
- Around line 158-165: Update the table row whose first column is "Code block
languages" to use inline code spans for the language examples instead of raw
backticks, i.e., replace the cell text "Always specify (```python, ```bash,
```yaml)" with "Always specify (`python`, `bash`, `yaml`)" so the markdown
renders properly and satisfies markdownlint MD038.
- Around line 15-26: The fenced code block in MIGRATION_GUIDE.md is missing a
language specifier which triggers markdownlint MD040; update the opening
triple-backtick for the directory tree to include a language (e.g., change ```
to ```text) so the block becomes a fenced "text" code block, preserving the
directory listing content exactly; look for the directory tree block in
docs/templates/MIGRATION_GUIDE.md and modify the opening fence accordingly.
🧹 Nitpick comments (2)
docs/templates/README.md (1)
47-48: Consider clarifying the "Tier 2.5" designation.The "Tier 2.5" label for
docs/backends/README.mdbreaks the clean three-tier pattern and may cause confusion. Consider either:
- Renaming it to indicate it's a special cross-cutting index (e.g., "Backend Index" or "Tier 2 Index")
- Adding a note explaining why this deviates from the standard tier numbering
This appears intentional for the backend comparison table, but explicit clarification would help future contributors understand the exception.
docs/templates/SOURCE_TARGET_MAPPING.md (1)
154-156: Consider simplifying filenames to avoid redundancy.The target filenames (e.g.,
multimodal_vllm.md,multimodal_sglang.md,multimodal_trtllm.md) repeat "multimodal" from the parent directory pathdocs/features/multimodal/.While this follows the
<feature>_<backend>.mdpattern from the README template, it creates redundancy. Consider using simpler names likevllm.md,sglang.md,trtllm.mdsince the parent directory already provides the "multimodal" context.However, if the redundancy is intentional for search/clarity when files are referenced out of context, this is acceptable as-is.
15737f7 to
8328dd2
Compare
|
/ok to test 8328dd2 |
|
/ok to test a845476 |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@docs/features/multimodal/multimodal_vllm.md`:
- Around line 395-398: The pip extras specifier vllm["audio"] can be
glob-expanded by zsh; update the install line to quote the requirement and use
the correct extras syntax (e.g., replace vllm["audio"] with 'vllm[audio]' or
"vllm[audio]") so the command becomes pip install 'vllm[audio]' accelerate to
prevent shell globbing.
🧹 Nitpick comments (3)
docs/features/multimodal/multimodal_vllm.md (3)
68-76: Avoid “latest tag” ambiguity in git checkout.
git describe --tagscan select pre-releases or non-release tags. Consider instructing users to checkout the release tag shown on the release page to avoid accidental RC/nightly use.♻️ Proposed doc tweak
-We recommend using the latest stable release of dynamo to avoid breaking changes: +We recommend using the latest stable release of dynamo to avoid breaking changes: [](https://github.com/ai-dynamo/dynamo/releases/latest) -You can find the [latest release](https://github.com/ai-dynamo/dynamo/releases/latest) and check out the corresponding branch with: +You can find the [latest release](https://github.com/ai-dynamo/dynamo/releases/latest) and check out the corresponding tag with: ```bash -git checkout $(git describe --tags $(git rev-list --tags --max-count=1)) +git fetch --tags +git checkout <latest_release_tag></details> --- `468-474`: **Clarify ModelInput.Tokens behavior in multimodal.** “Rust SDK would tokenize (but bypassed in multimodal)” reads contradictory. Consider rewording to make the bypass explicit in the same sentence. <details> <summary>✍️ Suggested wording</summary> ```diff -| `ModelInput.Tokens` | Rust SDK would tokenize (but bypassed in multimodal) | Components expecting pre-tokenized input | +| `ModelInput.Tokens` | Tokenization is normally done by the Rust SDK, but multimodal requests bypass that path | Components expecting pre-tokenized input |
116-142: Add a note clarifying that content ordering is flexible.
The example shows text before image, which is a common convention but not required. OpenAI's multimodal API accepts content in any order—arrays are processed in the sequence provided. Adding a brief note would help users understand they can arrange text, images, audio, and video in any order that fits their use case.
|
/ok to test 6a3dd20 |
Copy multimodal docs from docs/multimodal/ to docs/features/multimodal/ as part of the documentation hierarchy refactoring: - Copy index.md as README.md in new location - Copy vllm.md, sglang.md, trtllm.md to new location - Update internal links from index.md to README.md - Add deprecation notice to original docs/multimodal/index.md - Add redirects in conf.py for old paths - Update navigation in index.rst to point to new location - Add old multimodal files to hidden_toctree.rst Signed-off-by: Dan Gil <[email protected]> Co-authored-by: Cursor <[email protected]>
- Fix SGlang → SGLang capitalization - Fix broken link to SGLang backend README - Add language specifier to fenced code blocks (MD040) - Fix "compatibilty" typo - Rename backend files to <feature>_<backend>.md convention Signed-off-by: Dan Gil <[email protected]> Co-authored-by: Cursor <[email protected]>
Signed-off-by: Dan Gil <[email protected]> Co-authored-by: Cursor <[email protected]>
6a3dd20 to
cd55320
Compare
|
/ok to test cd55320 |
Signed-off-by: Dan Gil <[email protected]> Co-authored-by: Cursor <[email protected]>
Summary
docs/multimodal/todocs/features/multimodal/as part of the documentation hierarchy refactoringindex.mdtoREADME.mdin the new locationindex.mdtoREADME.mddocs/multimodal/index.mdpointing to new locationconf.pyfor seamless URL transitionindex.rstto point to new locationThis is a non-destructive migration - old files remain with deprecation notices, and redirects ensure existing links continue to work.
Test Plan
Summary by CodeRabbit